Model information
Description
The intervention group extraction model automatically extracts the intervention groups reported in a study.
Inputs
Large Language Model (LLM)
The model extracts the intervention groups from the full PDF.
Outputs
Text string (suggested extraction field values).
Model data
Large language model
We use existing large language models to extract relevant data.
Training dataset
We use samples from a historical dataset to inform model design. Instead of using the data directly, we analyse patterns from the selected dataset to create prompts.
Training approach
We do not train a model directly. Instead, we use existing large language models, such as Gemini Flash, and tailor prompts to guide the model towards accurate extractions.
Evaluation
Methodology
We evaluated the Large Language Model (LLM) output against a benchmark dataset of 56 open-access studies.
To create the benchmark dataset, four Covidence employees annotated all studies, extracting every reported intervention group from the full-text PDF. Two other employees, who were not part of the initial annotation, then reached consensus on the individual annotations, grouping similar intervention groups.
We used our model to extract intervention groups from the benchmark studies. We presented these predictions, along with the benchmark data, to three reviewers for assessment.
Based on their evaluations, we calculated the following metrics:
Precision: The proportion of intervention groups extracted by the model that are correct.
Recall: The proportion of benchmark intervention groups successfully extracted by the model.
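As a minimal sketch of how these two metrics follow from the reviewers' assessments, the snippet below computes precision and recall from counts of correct, incorrect, and missed intervention groups. The function name and the counts are hypothetical and chosen only for illustration; they are not the benchmark figures.

```python
def precision_recall(correct: int, incorrect: int, missed: int) -> tuple[float, float]:
    """Return (precision, recall) from reviewer-assessed counts.

    correct   -- extracted groups judged correct (true positives)
    incorrect -- extracted groups judged wrong (false positives)
    missed    -- benchmark groups the model did not extract (false negatives)
    """
    extracted = correct + incorrect   # everything the model returned
    benchmark = correct + missed      # everything it should have returned
    precision = correct / extracted if extracted else 0.0
    recall = correct / benchmark if benchmark else 0.0
    return precision, recall


# Illustrative counts only (not taken from the evaluation).
precision, recall = precision_recall(correct=9, incorrect=1, missed=3)
print(f"precision={precision:.2%}, recall={recall:.2%}")
```

With these illustrative counts, 9 of the 10 extracted groups are correct and 9 of the 12 benchmark groups are recovered.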
To assess suitability, we compared the precision and recall to typical human rates reported in existing publications, such as Tang et al. (2025). Tang et al. report an error rate of 39.54% to 44.88% for data extraction of results data, which implies a human accuracy of approximately 55% to 60%.
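The human accuracy range follows from the reported error rates by simple subtraction. The sketch below assumes that accuracy is the complement of the error rate; the function name is hypothetical.

```python
def accuracy_from_error_rate(error_rate_percent: float) -> float:
    """Convert an error rate (in percent) to an accuracy (in percent).

    Assumes accuracy is simply 100% minus the error rate.
    """
    return 100.0 - error_rate_percent


# Error rates reported for human data extraction (Tang et al., 2025).
lowest_accuracy = accuracy_from_error_rate(44.88)
highest_accuracy = accuracy_from_error_rate(39.54)
print(f"{lowest_accuracy:.2f}% to {highest_accuracy:.2f}%")  # 55.12% to 60.46%
```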
Results
The model achieved a precision of 98.15% and a recall of 95.16%, well above the typical human accuracy of approximately 55% to 60%.
Intended usage & limitations
Benefit & intended usage
The model provides highly accurate extraction suggestions to reviewers completing data extraction, saving time and effort when extracting intervention groups.
Known limitations
Model limitations
Only available for studies with a DOI linked in Covidence.
Access to the full-text PDF is required, either through an accessible open access link or user-uploaded content.
Evaluation limitations
Only a limited number of studies (56) were used to evaluate performance.
Studies used for evaluation were exclusively in the medical and health science domain.