Monday 07 April 2025
Understanding complex biological systems has long been a challenge for scientists. With the vast amount of data now available, it can be overwhelming to identify meaningful patterns and relationships among genes, proteins, and cellular components. Researchers have now made progress in this area by developing a novel framework for evaluating how well large language models (LLMs) can infer causal relationships in biology.
The team used text mining to gather information on 100 well-known cancer genes and the relationships among them. They then designed a series of prompts asking the LLMs to quantify the extent to which one gene has a causal effect on another. Different versions of the prompt supplied additional context, such as experimental findings, gene descriptions, and literature evidence.
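To illustrate how such prompt variants can be assembled, here is a short Python sketch that adds optional context to a base question. It is not the authors' code: the function name build_causal_prompt, the 0 to 10 rating scale, and the exact wording are assumptions made for illustration only.

```python
from typing import Dict, Optional


def build_causal_prompt(
    source: str,
    target: str,
    gene_descriptions: Optional[Dict[str, str]] = None,
    experimental_findings: Optional[str] = None,
    literature_evidence: Optional[str] = None,
) -> str:
    """Hypothetical prompt builder: optional context lines, then a rating question."""
    parts = []
    # Gene descriptions are included only for the two genes in the question.
    if gene_descriptions:
        for gene in (source, target):
            if gene in gene_descriptions:
                parts.append(f"{gene}: {gene_descriptions[gene]}")
    if experimental_findings:
        parts.append(f"Experimental findings: {experimental_findings}")
    if literature_evidence:
        parts.append(f"Literature evidence: {literature_evidence}")
    # The 0-10 scale is an assumed convention, not taken from the study.
    parts.append(
        f"On a scale from 0 (no effect) to 10 (strong direct effect), "
        f"how strongly does {source} causally affect {target}? "
        "Answer with a single integer."
    )
    return "\n".join(parts)


# Example: a variant that adds a gene description (gene names are placeholders).
print(build_causal_prompt("TP53", "MDM2", gene_descriptions={"TP53": "tumour suppressor"}))
```

Each keyword argument corresponds to one kind of context, so comparing prompt variants amounts to calling the same function with different arguments switched on.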
The results showed that even relatively small LLMs can capture meaningful aspects of causal structure in biological systems when given the right prompts. The best-performing prompt variant included both cancer-related information and mRNA measurements, which helped the LLMs identify direct causal links (edges in the causal graph) between genes.
To evaluate the LLMs further, the researchers also computed their predictions using a transitive closure approach, which takes indirect relationships between genes into account: if gene A affects gene B and gene B affects gene C, then A is also counted as having a causal effect on C. They found that this approach improved the accuracy of the LLMs in identifying causal relationships.
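Transitive closure itself is a standard graph operation. The Python sketch below computes it for a set of directed gene-to-gene edges using Warshall's algorithm. The article does not say exactly how the study applied the closure to its predictions, so the function name and the edge-set representation here are assumptions.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]


def transitive_closure(edges: Set[Edge]) -> Set[Edge]:
    """Return every (a, c) pair where c is reachable from a via one or more edges."""
    nodes = sorted({n for edge in edges for n in edge})
    # reach[a] starts as the set of direct successors of a.
    reach = {a: {b for (x, b) in edges if x == a} for a in nodes}
    # Warshall's algorithm: allow paths through intermediate node k, one k at a time.
    for k in nodes:
        for i in nodes:
            if k in reach[i]:
                reach[i] |= reach[k]
    return {(i, j) for i in nodes for j in reach[i]}


direct = {("A", "B"), ("B", "C")}
closure = transitive_closure(direct)
print(sorted(closure - direct))  # indirect relationships only
```

With direct edges A to B and B to C, the only indirect relationship added is A to C. Note that cycles produce self-pairs such as (X, X), which is the correct closure but may need filtering depending on how the results are scored.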
The study also explored the impact of chain-of-thought reasoning on the performance of the LLMs. This involved asking the models to provide step-by-step reasoning for their predictions, including evidence and counter-evidence for a causal effect. The results showed that this approach did not improve the accuracy of the LLMs, but it did allow them to generate more coherent and well-reasoned explanations for their predictions.
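A chain-of-thought prompt of this kind, together with a helper that extracts a final score from the model's free-text answer, might look like the Python sketch below. The template wording, the "Score: N" convention, and the name parse_score are hypothetical; the article does not describe how the models' answers were formatted or parsed.

```python
import re
from typing import Optional

# Hypothetical chain-of-thought template: evidence for, evidence against, then a score.
COT_TEMPLATE = (
    "Question: does {source} have a causal effect on {target}?\n"
    "Think step by step. First list evidence for a causal effect, "
    "then evidence against it, then weigh the two.\n"
    "Finish with a line of the form 'Score: N', where N is an integer from 0 to 10."
)


def parse_score(response: str) -> Optional[int]:
    """Return the last 'Score: N' value in a response, or None if missing or out of range."""
    matches = re.findall(r"Score:\s*(\d+)", response)
    if not matches:
        return None
    # Use the last match, in case the model revises its answer during reasoning.
    score = int(matches[-1])
    return score if 0 <= score <= 10 else None


prompt = COT_TEMPLATE.format(source="GENE_A", target="GENE_B")
print(parse_score("Evidence for: ...\nEvidence against: ...\nScore: 7"))
```

Only the number after the last "Score:" counts toward accuracy; the reasoning text before it is what makes the explanations readable, which fits the finding that chain-of-thought improved explanations rather than accuracy.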
The research has a broad range of potential applications. Using LLMs to infer causal relationships in biology could help scientists understand complex biological systems more quickly and, eventually, develop new treatments for diseases. The study's authors believe their framework could be used to identify more accurate biomarkers, predict the effects of gene mutations, and even design new therapeutic strategies.
The development of this framework is a significant step forward in applying language models to biology. It demonstrates the potential of LLMs not only to process and analyze large amounts of data but also to generate insights that can inform scientific discovery and medical practice. As researchers continue to refine their methods, further advances in our understanding of the complex interactions within living organisms are likely.
Cite this article: “Unraveling Causal Relationships in Biology: Large Language Models Take the Reins”, The Science Archive, 2025.
Large Language Models, Biological Systems, Causal Relationships, Gene Expression, Cancer Genes, Text Mining, Prompts, Transitive Closure, Chain-of-Thought Reasoning, Biomarkers