Introduction:
Manual chart review has long been the standard for retrospective data collection; however, this approach is time-intensive and prone to error from human fatigue, misinterpretation, and inter- and intra-rater variability. Large language models (LLMs) are neural networks trained on large amounts of data, enabling them to perform natural language tasks including semantic comprehension, context retention, and information extraction. Our group previously developed a branching logic prompt design framework that leverages LLMs to handle complex reasoning queries and adapt to varied inputs. We sought to evaluate the feasibility and accuracy of this framework using Qwen3-8B, an open-source LLM, for automated, local extraction of structured data from radical cystectomy pathology reports.
Methods:
Patients undergoing cystectomy from 2001 to 2025 were included in a retrospective analysis. Prompts were designed to extract nine variables from surgical pathology notes, including pT stage, pN stage, number of lymph nodes examined and positive, margin status, variant histology, and lymphovascular invasion. A manually extracted database served as the reference. Patients with missing manual data were excluded on a variable-by-variable basis. Agreement between LLM-extracted and manually extracted data was assessed using percent agreement and Cohen’s kappa statistic.
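As an illustration of the comparison described above, the following is a minimal Python sketch of per-variable percent agreement and unweighted Cohen's kappa, with pairs dropped when the manual reference value is missing. The function names, the use of None to mark missing data, and the toy "pos"/"neg" labels are assumptions made for this example; the abstract does not state which software was used or whether a weighted kappa was applied.

```python
from collections import Counter
from typing import Hashable, List, Optional, Sequence, Tuple

Pair = Tuple[Hashable, Hashable]


def complete_pairs(manual: Sequence[Optional[Hashable]],
                   llm: Sequence[Hashable]) -> List[Pair]:
    """Pair manual and LLM values, dropping patients whose manual value is missing.

    Assumption: a missing manual value is encoded as None.
    """
    return [(m, l) for m, l in zip(manual, llm) if m is not None]


def percent_agreement(pairs: Sequence[Pair]) -> float:
    """Percentage of pairs in which both sources give the same value."""
    return 100.0 * sum(m == l for m, l in pairs) / len(pairs)


def cohens_kappa(pairs: Sequence[Pair]) -> float:
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(pairs)
    p_o = sum(m == l for m, l in pairs) / n  # observed agreement
    manual_counts = Counter(m for m, _ in pairs)
    llm_counts = Counter(l for _, l in pairs)
    # Agreement expected by chance, from each source's marginal frequencies.
    p_e = sum(manual_counts[c] * llm_counts[c] for c in manual_counts) / (n * n)
    if p_e == 1.0:
        return float("nan")  # undefined when both sources use a single category
    return (p_o - p_e) / (1.0 - p_e)


# Toy data for one hypothetical variable (not taken from the study).
manual_values = ["pos", "neg", "neg", None, "neg", "pos"]
llm_values = ["pos", "neg", "pos", "neg", "neg", "pos"]

pairs = complete_pairs(manual_values, llm_values)
agreement = percent_agreement(pairs)
kappa = cohens_kappa(pairs)
print(f"n = {len(pairs)}, agreement = {agreement:.1f}%, kappa = {kappa:.3f}")
```

Because kappa corrects for agreement expected by chance, a variable with a very uneven category distribution can show high percent agreement alongside a low kappa, which is why both measures are reported.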
Results:
In total, 1898 radical cystectomy patients were included in the analysis. The LLM generated 16046 datapoints across nine variables, with an overall agreement of 85.8% compared with manually abstracted data. There were no missing LLM-generated datapoints for any variable. Agreement and statistical comparison by variable are shown in Figure 1. Lymphovascular invasion demonstrated the highest statistical agreement (n = 1734, kappa = 0.78, agreement 91.5%), and urethral margin status demonstrated the lowest (kappa = 0.136).
Conclusion:
We demonstrated the ability of branching logic prompting with an open-source LLM to accurately review, interpret, and extract pathology data for patients undergoing radical cystectomy. Iterative refinements to improve agreement for tumor stage and margin status are underway. Manually extracted values are themselves imperfect, and re-review will be required to determine whether the automated or manual approach achieved higher overall accuracy. Further refinement of prompt design may be needed to improve LLM accuracy before LLMs can serve as the primary means of pathology data extraction.
Funding: This work was supported in part by the Department of Defense under Award Number HT94252310918. Additional funding was provided by Climb 4 Kidney Cancer, a nonprofit organization dedicated to advancing research, education, and advocacy for kidney cancer.
DEVELOPMENT OF LARGE LANGUAGE MODEL FRAMEWORK FOR AUTOMATED EXTRACTION OF PATHOLOGY DATA IN RADICAL CYSTECTOMY
Category
Bladder Cancer > Muscle Invasive Bladder Cancer
Description
Poster #41
Presented By: Jacob Knorr
Authors:
Jacob Knorr
Sean McSweeney
Rishi Jonnalagadda
Rikhil Seshandri
Haya Abusafieh
Daniel Jevnikar
Gabriela Diaz
Sahil Patel
Laura Bukavina
Nima Almassi
Nicholas Heller
Christopher Weight
