Introduction:
Predictive models are increasingly used as clinical decision-support tools for the diagnosis and risk stratification of patients with prostate cancer. Although machine learning and artificial intelligence methods can yield precise prediction models, recent research has shown that data-driven models often retain systematic biases present in the underlying data and can propagate inequalities through their predictions. This issue is particularly concerning in prostate cancer, where algorithmic bias can exacerbate existing disparities in care for vulnerable populations, including racial minorities. In this study, we examined the potential for racial bias in machine learning prediction models for prostate cancer survival and compared strategies for mitigating that bias.
Methods:
We used the National Cancer Database (NCDB) to identify patients diagnosed with localized prostate cancer between 2004 and 2022. Demographic information, clinical data, and disease-specific factors were extracted for these patients. We categorized patients into National Institutes of Health (NIH) subgroups: ‘Non-Hispanic White’, ‘Non-Hispanic Black’, ‘Hispanic’, and ‘Asian’. The dataset was divided into training and testing sets in a 70%/30% split stratified by these subgroups. A deep Cox proportional hazard model was trained to predict the risk of death. We evaluated the model's prediction of death at 5 years in the overall test set and in each NIH race subgroup using balanced accuracy and the C-index. The primary metric for assessing prediction disparity was the equalized odds ratio (eOR). We compared three bias mitigation techniques to encourage prediction parity: a threshold optimizer, a Fair Cox proportional hazard model, and a distributionally robust optimization (DRO) Cox proportional hazard model.
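As an illustration of the disparity metric, the sketch below computes per-group balanced accuracy and an equalized odds ratio on toy data. The abstract does not define the eOR formally; the code assumes the common definition (used, for example, by Fairlearn) in which the eOR is the smaller of the min/max ratio of group true positive rates and the min/max ratio of group false positive rates, so that 1.0 means parity. The function names and the toy labels are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Return {group: (TPR, FPR)} for binary labels and binary predictions."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        key = ("tp" if p else "fn") if t else ("fp" if p else "tn")
        counts[g][key] += 1
    rates = {}
    for g, c in counts.items():
        # Assumes every group contains at least one positive and one negative.
        tpr = c["tp"] / (c["tp"] + c["fn"])
        fpr = c["fp"] / (c["fp"] + c["tn"])
        rates[g] = (tpr, fpr)
    return rates


def balanced_accuracy(tpr, fpr):
    """Mean of sensitivity (TPR) and specificity (1 - FPR)."""
    return 0.5 * (tpr + (1.0 - fpr))


def equalized_odds_ratio(rates):
    """Assumed definition: min of the TPR and FPR min/max ratios across groups."""
    tprs = [r[0] for r in rates.values()]
    fprs = [r[1] for r in rates.values()]
    # Assumes the largest TPR and the largest FPR are both non-zero.
    return min(min(tprs) / max(tprs), min(fprs) / max(fprs))


# Toy data (hypothetical): 1 = death within 5 years, 0 = alive at 5 years.
y_true = [1, 1, 1, 1, 0, 0, 0, 0] * 2
y_pred = [1, 1, 1, 0, 1, 0, 0, 0] + [1, 1, 0, 0, 1, 1, 0, 0]
groups = ["A"] * 8 + ["B"] * 8

rates = group_rates(y_true, y_pred, groups)
bal_acc = {g: balanced_accuracy(*r) for g, r in rates.items()}
eor = equalized_odds_ratio(rates)
print(rates, bal_acc, eor)
```

In this toy example, group A has a TPR of 0.75 and an FPR of 0.25, while group B has 0.5 and 0.5, so the FPR ratio (0.25/0.5) is the binding term and the eOR is 0.5.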
Results:
We identified 361,883 patients who met the inclusion criteria, comprising 290,805 (80.3%) non-Hispanic White (NHW), 47,710 (13.2%) non-Hispanic Black (NHB), 16,405 (4.5%) Hispanic, and 6,963 (2%) Asian patients. The deep Cox model had a balanced accuracy of 0.591 (95% CI 0.581-0.601) and an overall C-index of 0.667 (95% CI 0.653-0.681). This model performed best for Asian patients (balanced accuracy 0.616; 95% CI 0.584-0.648) and worst for NHW patients (C-index 0.588; 95% CI 0.580-0.597); the equalized odds ratio (eOR) of this model was 0.681 (95% CI 0.663-0.699), indicating a prediction disparity across racial and ethnic groups. Applying a threshold optimizer significantly improved the eOR to 0.797 (95% CI 0.779-0.815; p < 10^-5) with a balanced accuracy of 0.597 (95% CI 0.582-0.612), indicating no loss in predictive performance. Detailed results are illustrated in Figure 1. The Fair Cox proportional hazard model achieved a balanced accuracy of 0.611 (95% CI 0.606-0.616) and an eOR of 0.798 (95% CI 0.771-0.823), while the DRO Cox proportional hazard model demonstrated a balanced accuracy of 0.629 (95% CI 0.624-0.634) and an eOR of 0.799 (95% CI 0.779-0.815).
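To help interpret the C-index values reported here, the following is a minimal sketch of Harrell's concordance index for right-censored data. It is not the study's implementation: the function name and toy values are hypothetical, pairs with tied follow-up times are simply skipped, and the quadratic loop is meant for illustration rather than for a cohort of this size.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index (simplified sketch).

    times:  follow-up time for each patient
    events: 1 if death was observed, 0 if the patient was censored
    risks:  model risk score, where higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            # A censored patient cannot be the earlier member of a pair.
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0  # earlier death had the higher risk
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk scores count as half
    # Assumes at least one comparable pair exists.
    return concordant / comparable


# Toy data (hypothetical): four patients, the second one censored.
times = [2, 4, 6, 8]
events = [1, 0, 1, 1]
risks = [0.9, 0.3, 0.5, 0.6]
c_index = concordance_index(times, events, risks)
print(c_index)
```

Here four pairs are comparable and three of them are ordered correctly by the risk score, giving a C-index of 0.75; a value of 0.5 corresponds to random ordering and 1.0 to perfect ordering.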
Conclusion:
We developed machine learning models to predict prostate cancer survival and observed that a naïve model exhibited prediction disparities, performing worse for some racial groups than for others. By applying bias mitigation techniques, we reduced these disparities and improved prediction parity without compromising overall performance. Our study underscores the importance of examining prediction disparities in risk models and employing bias mitigation strategies to enhance model equity in healthcare applications.
Funding: N/A
MITIGATING DISPARITIES IN PROSTATE CANCER THROUGH FAIR MACHINE LEARNING MODELS
Category
Prostate Cancer > Potentially Localized
Description
Poster #199
Presented By: Hyungrok Do
Authors:
Hyungrok Do
Jesse Persily
Judy Zhong
Yassamin Neshatvar
Katie Murray
Madhur Nayan
