Accurate grade group determination at the time of prostate cancer diagnosis or surgery is paramount in the decision to pursue definitive treatment, active surveillance, or adjuvant therapy. However, known reader variability in the grading of pathologic specimens remains a significant hurdle to standardizing the treatment of prostate cancer. Deep learning (DL) algorithms have recently been developed to augment and standardize the detection and grading of prostate cancer. The purpose of this study was to assess a novel DL algorithm for accurate detection and grade group classification of prostate cancer on whole-mount pathology slides.
A DL algorithm was previously trained for detection and grading of prostate cancer using publicly available data from prostate biopsies, tissue microarrays, and surgical sections. The system uses an ensemble of ResNet architectures for cancer detection and grading from 100 µm × 100 µm image patches at 20x magnification (patch-level validation performance: 92% detection accuracy, 78% Gleason classification accuracy). Congruent lesions with an area >2 mm² were considered positive for DL-based detection. This system was applied to patients with available digitized whole-mount pathology and foci-level annotations of disease burden. The burden of each focus within each slide was marked in ink under the microscope and mapped digitally for quantitative comparison. DL-based detection accuracy and grading assessment were compared with ground-truth foci-level annotations and correlated with patient-level ISUP grade group (GG), stratified by GG1/2 vs GG3/4/5 disease.
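The area cutoff described above can be illustrated with a minimal sketch. It assumes that "congruent lesions" means connected groups of positive patches on a regular grid and that neighbours are joined by 4-connectivity; neither detail is stated in the abstract, and the function name `lesions_above_area` is hypothetical. Only the patch size (100 µm × 100 µm) and the 2 mm² cutoff come from the methods.

```python
from collections import deque

# One 100 µm x 100 µm patch expressed in mm² (patch size taken from the methods).
PATCH_AREA_MM2 = 0.1 * 0.1


def lesions_above_area(mask, min_area_mm2=2.0):
    """Return connected groups of positive patches whose area exceeds the cutoff.

    `mask` is a list of rows; truthy cells are patches called cancer.
    4-connectivity is an assumption of this sketch, not a detail from the study.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    kept = []
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            # Breadth-first search collects one connected component.
            seen[r0][c0] = True
            queue = deque([(r0, c0)])
            component = []
            while queue:
                r, c = queue.popleft()
                component.append((r, c))
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and mask[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            # Keep the component only if its total area is above the cutoff.
            if len(component) * PATCH_AREA_MM2 > min_area_mm2:
                kept.append(component)
    return kept
```

At this patch size, a 2 mm² cutoff corresponds to more than 200 connected positive patches.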
Fifty patients (n=24 GG1/2, n=26 GG3/4/5 on surgical pathology) with available digital images were selected for assessment. Patient-level cancer detection accuracy was 96% (48/50); both false-negative (FN) patients harbored GG1/2 disease. Of the annotated foci (n=85), 66 were correctly identified (77.6% sensitivity), at the cost of 115 false positives (FP) occurring at a median rate of 1 FP/patient (range 0-20). Of the 19 FN foci, 16 were from patients harboring GG1/GG2 disease, demonstrating the algorithm's ability to capture most patients with clinically significant disease (>GG2). Although foci-level PPV was poor at 36.5%, median patch-level PPV across patients was 88.9% (range 0-100%). The poor foci-level PPV likely reflects the observation that FP lesions were significantly smaller than detected regions within pathologist-identified foci, as well as the paucity of benign surgical specimen pathology in the algorithm's training data.
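The foci-level and patient-level figures above follow from the reported counts. The short sketch below reproduces the arithmetic, assuming the standard definitions sensitivity = TP / (TP + FN) and PPV = TP / (TP + FP); the variable and function names are illustrative only.

```python
def sensitivity(tp, fn):
    """Fraction of true foci that were detected."""
    return tp / (tp + fn)


def ppv(tp, fp):
    """Fraction of detections that correspond to true foci."""
    return tp / (tp + fp)


# Counts reported in the results.
TP_FOCI, FN_FOCI, FP_LESIONS = 66, 19, 115
PATIENTS, FN_PATIENTS = 50, 2

foci_sensitivity = sensitivity(TP_FOCI, FN_FOCI)        # 66 / 85
foci_ppv = ppv(TP_FOCI, FP_LESIONS)                     # 66 / 181
patient_detection = (PATIENTS - FN_PATIENTS) / PATIENTS  # 48 / 50

print(f"Foci-level sensitivity: {100 * foci_sensitivity:.1f}%")
print(f"Foci-level PPV: {100 * foci_ppv:.1f}%")
print(f"Patient-level detection: {100 * patient_detection:.0f}%")
```

Under these definitions the counts give 77.6% sensitivity, 36.5% PPV, and 96% patient-level detection, which agrees with the reported values.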
Our novel DL algorithm demonstrated excellent accuracy in identifying cancer on whole-mount prostate pathology. Foci-level accuracy remains considerable; however, PPV remains an area for improvement. Future research will focus on excluding false positives on the basis of size to improve output.
APPLICATION OF DEEP LEARNING DETECTION AND GRADING SYSTEM FOR IDENTIFICATION OF CLINICALLY SIGNIFICANT PROSTATE CANCER ON WHOLE MOUNT PATHOLOGY
Category
Prostate Cancer > Potentially Localized
Description
Presented By: Nitin Yerram
Authors: Nitin Yerram
Stephanie Harmon
Luke O'Connor
Alex Wang
Sherif Mehralivand
Samira Masoudi
Michael Daneshvar
Maria Merino
Brad Wood
Peter Choyke
Peter Pinto
Baris Turkbey