AI-Assisted Data Extraction: A Promising Solution for Research Synthesis

Wednesday 22 January 2025


The quest for efficiency in research synthesis has led scientists to explore the potential of large language models (LLMs) in data extraction. A recent study published in Research Synthesis Methods investigated the accuracy of LLMs in extracting data from studies, with promising results.


The researchers used three freely available LLMs (Google’s Gemini 1.5 Flash and Gemini 1.5 Pro, and Mistral AI’s Mistral Large 2) to extract data from the 112 studies included in a published scoping review. They found that the LLMs extracted explicit data, such as study characteristics and participant demographics, fairly accurately, with agreement rates against the human-extracted data ranging from 71% to 84%. Their performance was weaker for derived data, such as categorical variables, where agreement rates ranged from 56% to 70%.
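One simple way to read an "agreement rate" is as percent agreement: the share of data points where the model's value matches the human-coded value. The Python sketch below computes that for one hypothetical study. The field values, the light normalization step, and the `percent_agreement` function are illustrative assumptions, and the study may have scored matches differently.

```python
# Minimal sketch: simple percent agreement between LLM-extracted values and
# human-coded values. The values below are hypothetical examples,
# not data from the study.

def normalize(value):
    """Lower-case and trim a value so trivial formatting differences don't count as disagreement."""
    return str(value).strip().lower()

def percent_agreement(llm_values, human_values):
    """Return the percentage of items where the LLM value matches the human value."""
    if len(llm_values) != len(human_values):
        raise ValueError("both lists must have the same length")
    if not llm_values:
        raise ValueError("cannot compute agreement on empty lists")
    matches = sum(
        normalize(llm) == normalize(human)
        for llm, human in zip(llm_values, human_values)
    )
    return 100.0 * matches / len(llm_values)

# Hypothetical codings for one study: country, year, sample size,
# participant level, and a derived design category.
human = ["USA", "2019", "120", "undergraduate", "experimental"]
llm = ["usa", "2019", "112", "Undergraduate", "quasi-experimental"]

print(percent_agreement(llm, human))  # 60.0
```

Exact matching after normalization is deliberately strict. A real evaluation would likely need rules for near-matches, such as rounded numbers or synonymous category labels.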


Despite these limitations, the researchers were optimistic about the potential of LLMs to streamline research synthesis. They noted that LLMs could reduce the time spent on manual data extraction and make the process more efficient.


However, they also emphasized that human oversight remains essential for accurate data extraction. To support this, they developed AIDE (AI-assisted Data Extraction), a free, open-source graphical user interface program that lets researchers use LLMs for data extraction while keeping control over the process.


AIDE is designed to be user-friendly and flexible: researchers can choose their preferred LLM provider and model. The program also parses the extracted data automatically and scrolls to the relevant pages of the PDF file.
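The article does not describe how AIDE parses model output, so the following is only a hypothetical Python sketch of what "automatic parsing" could involve. It assumes the model is asked to return JSON with a value and a page number for each field. The parser turns that response into records a reviewer can check and flags any field with a missing value or page. The JSON shape, the field names, and `parse_extraction` are assumptions, not AIDE's actual format or code.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch only: the JSON shape and field names are assumptions,
# not AIDE's actual output format.
REQUIRED_FIELDS = ["country", "sample_size", "study_design"]

@dataclass
class ExtractedField:
    name: str
    value: Optional[str]
    page: Optional[int]   # page a PDF viewer could scroll to for verification
    needs_review: bool    # True when the value or page is missing

def parse_extraction(raw_response: str) -> list[ExtractedField]:
    """Parse a JSON response of the form {"field": {"value": ..., "page": ...}}."""
    data = json.loads(raw_response)
    fields = []
    for name in REQUIRED_FIELDS:
        entry = data.get(name) or {}  # a missing field becomes an empty entry
        value = entry.get("value")
        page = entry.get("page")
        fields.append(ExtractedField(
            name=name,
            value=None if value is None else str(value),
            page=page if isinstance(page, int) else None,
            needs_review=value is None or not isinstance(page, int),
        ))
    return fields

response = '{"country": {"value": "USA", "page": 3}, "sample_size": {"value": 120, "page": 5}}'
for field in parse_extraction(response):
    print(field.name, field.value, field.page, field.needs_review)
```

Flagging incomplete fields is one way to route the model's output back to a human reviewer instead of accepting it silently. The sketch assumes well-formed JSON in which each present field is an object. A real tool would also need to handle malformed responses.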


The authors acknowledge that further research is needed to quantify the resources saved by using AIDE and to refine its performance. Nevertheless, their study demonstrates the potential of LLMs in research synthesis and highlights the importance of human oversight in ensuring accuracy.


The use of LLMs in research synthesis has significant implications for the scientific community. As researchers face increasing pressure to publish high-quality studies, the need for efficient and accurate data extraction methods is more pressing than ever. AIDE offers one promising approach, letting researchers take advantage of LLMs while keeping control over the data extraction process.


Moreover, the development of AIDE highlights the potential for collaborative research between human experts and AI systems. By combining the strengths of both, researchers can create innovative solutions that enhance efficiency, accuracy, and productivity in research synthesis.


Cite this article: “AI-Assisted Data Extraction: A Promising Solution for Research Synthesis”, The Science Archive, 2025.


Large Language Models, Research Synthesis, Data Extraction, Artificial Intelligence, Efficiency, Accuracy, Human Oversight, AIDE, Collaborative Research, Scientific Community


Reference: Noah L. Schroeder, Chris Davis Jaldi, Shan Zhang, “Large Language Models with Human-In-The-Loop Validation for Systematic Review Data Extraction” (2025).

