A data-driven approach to PCOS Diagnosis: Systematic review of machine learning applications in reproductive health

DOI: 10.2478/amma-2025-0054

Background and aim: Polycystic Ovary Syndrome (PCOS) is a prevalent endocrine disorder in reproductive-aged women, characterized by hormonal imbalances, anovulation, and metabolic abnormalities. This systematic review aims to evaluate the effectiveness, types, and diagnostic performance of machine learning (ML) algorithms applied to PCOS detection and classification, and to identify the most frequently used input features and methodological challenges in existing studies.
Methods: A systematic search was conducted across scholarly databases, including but not limited to PubMed, Scopus, and Google Scholar, for studies published between 2014 and 2024, using keywords related to PCOS and machine learning. Inclusion criteria focused on original, peer-reviewed studies applying ML models to PCOS diagnosis. Data were extracted on model type, input features, diagnostic accuracy, and study design. Quality assessment was performed using the PROBAST tool.
Results: Of 450 identified studies, 34 met the inclusion criteria and passed the quality assessment. Supervised learning models such as Random Forest, Support Vector Machine (SVM), and XGBoost showed high accuracy (up to 99%). Deep learning approaches, particularly Convolutional Neural Networks (CNNs), achieved accuracies between 95% and 99.89% in analyzing ultrasound images. Hybrid models integrating clinical and imaging data further enhanced performance. Common input features included body mass index (BMI), the luteinizing hormone to follicle-stimulating hormone (LH/FSH) ratio, anti-Müllerian hormone (AMH), and ultrasound-based ovarian morphology. However, few studies validated models on external datasets, and input feature selection lacked standardization.
Conclusion: ML approaches, including supervised, deep learning, and hybrid models, show strong potential to improve PCOS diagnosis by identifying complex patterns across multi-dimensional datasets. Challenges such as limited generalizability and a lack of data standardization remain; future studies should therefore focus on developing explainable ML tools, validating models in clinical settings, and leveraging diverse data types for robust, personalized PCOS diagnosis.
