Tuesday 08 April 2025
Machine learning, a branch of artificial intelligence, has become increasingly popular in medical research. It is used to develop predictive models that can help diagnose diseases and make treatment decisions more accurate. However, determining the right sample size for these studies is a crucial step that is often overlooked.
A new paper proposes a method for calculating the sample size needed for machine learning studies in medicine. The authors recognize that previous research has focused on specific aspects of sample size determination, but there’s been a lack of general guidelines for medical research using machine learning methods.
The proposed method prioritizes the testing set, which is used to evaluate the performance of the machine learning model. It then calculates the training set and total sample sizes based on predetermined ratios or proportions. The approach is meant to ensure that the testing set is large enough to estimate performance with the desired precision, while the training set scales with it according to the chosen split.
To illustrate this method, the authors provide an example of a study aimed at developing a machine learning model for diagnosing COVID-19 using plain chest X-rays. The researchers want to estimate the sensitivity and specificity of their model with 5% precision at a 95% confidence level. Using the proposed method, they calculate that they need around 2,920 suspected cases.
The paper’s authors emphasize that their approach can be applied to outcomes with multiple categories or continuous variables, making it a versatile tool for medical researchers. They also acknowledge that their method is not without limitations and may require modifications depending on the specific study design and goals.
The increasing use of machine learning in medical research has created a need for clear guidelines on sample size determination. This paper offers a general method for calculating sample sizes that takes into account both training and testing sets. As machine learning continues to shape the future of medicine, this approach could help researchers design their studies well and obtain reliable results.
The authors’ contribution fills a gap in the current literature by providing a general method for sample size determination in medical research that uses machine learning. Their work has the potential to improve the accuracy and reliability of machine learning models in medicine, which could ultimately lead to better patient outcomes.
Cite this article: “Machine Learning in Medicine: A Guide to Determining Sample Sizes for Accurate Results”, The Science Archive, 2025.
Machine Learning, Medical Research, Sample Size Determination, Artificial Intelligence, Predictive Models, Diagnostic Accuracy, Treatment Decisions, COVID-19, Plain Chest X-Rays, Study Design.







