Introduction:
Clinical databases are essential for clinical and translational research. Traditionally, curating a clinical database involves manually collecting data from free-text notes within the electronic medical record (EMR), but this process is time-consuming and error-prone. Recently, large language models (LLMs) such as OpenAI's ChatGPT and Google's Gemini have demonstrated impressive semantic understanding of free text and could be used to automate extraction tasks that previously could be performed only by human experts and trainees. Unfortunately, these free-text notes often contain protected health information and, moreover, constitute a valuable asset, leading health systems to restrict their transfer to entities such as third-party AI providers. The goal of this study is to evaluate the feasibility of avoiding data transfer altogether by using an open-source AI model to generate a clinical database of kidney cancer patients from free-text radiology, pathology, and operative notes.
Methods:
Using Epic's Clarity database, all patients who underwent nephrectomy were identified, and their clinical notes were extracted as text files. Prompts were carefully designed to elicit unambiguous categorical and numerical values from these free-text notes. The Llama-3-8B model was instantiated using the HuggingFace Python API, and the Guidance-AI library was used to constrain model outputs. The full clinical note was provided to the model as context, followed by the series of prompts. Existing, manually curated institutional databases of clear cell and papillary renal cell carcinoma patients served as the reference standard against which the LLM-generated values were compared. In cases of disagreement between the LLM and the reference database, an additional human rater arbitrated while blinded to which source provided which value.
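As a rough sketch of this pipeline (not the authors' actual code; the model identifier, prompt wording, and variable names are assumptions for illustration), the constrained extraction step might look like the following:

    import torch
    from guidance import models, gen, select

    # Load Llama-3-8B at FP16 through the HuggingFace backend (model path
    # and device placement are assumptions for illustration).
    llm = models.Transformers(
        "meta-llama/Meta-Llama-3-8B",
        torch_dtype=torch.float16,
        device_map="cuda",
    )

    def extract(note_text: str) -> dict:
        # Feed the full clinical note as context, then ask a series of
        # prompts whose outputs are constrained to valid categorical or
        # numerical answers.
        lm = llm + f"Clinical note:\n{note_text}\n\n"
        lm += "Sarcomatoid features present? " + select(
            ["yes", "no", "not stated"], name="sarcomatoid")
        lm += "\nLargest tumor dimension in cm: " + gen(
            regex=r"[0-9]+(\.[0-9]+)?", name="tumor_size_cm")
        return {"sarcomatoid": lm["sarcomatoid"],
                "tumor_size_cm": lm["tumor_size_cm"]}

Constraining generation in this way guarantees that every answer parses directly into a database column, rather than relying on post hoc parsing of free-form model output.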
Results:
A total of 3,500 patients represented in both the Clarity query and the manually curated institutional databases were randomly selected for analysis. Agreement values for selected variables are shown in Figure 1. Agreement ranged from 96.1% at the low end (e.g., tumor size in centimeters) to more than 99% at the high end (e.g., sarcomatoid and rhabdoid features). In cases of disagreement, the second human rater agreed with the LLM more often than with the original human rater for every variable except tumor histologic type. These data are shown in Figure 2. The model took about 4.1 seconds to parse each clinical note on a single NVIDIA A100 GPU with the model instantiated at FP16 precision.
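For context, this rate corresponds to roughly four hours of GPU time per 3,500 notes (3,500 × 4.1 s ≈ 14,350 s); extrapolating (an estimate, not a measured result), a corpus of 100,000 notes would require on the order of five days on a single GPU.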
Conclusion:
This study demonstrates that the 8B-parameter variant of Meta AI's open-source Llama-3 model can extract relevant information from clinical notes with accuracy at least as high as that of manual raters, if not higher. The model is small enough to run on widely available hardware at reasonable cost and fast enough to generate large-scale clinical databases in weeks or even days. This presents a practical alternative that avoids the challenges posed by closed-source commercial AI solutions for free-text data extraction. One observation from this work is the importance of carefully constructed prompts and branching logic, illustrated in the sketch below. Further studies are necessary to determine whether the same level of performance would be achieved by using the same prompts at a different institution, or whether prompts will need to be fine-tuned from institution to institution.
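As a hypothetical illustration of such branching logic (in the same Guidance style as the Methods sketch; the variable names and categories are invented), a follow-up question can be asked only when an earlier constrained answer makes it relevant:

    lm += "Histologic type: " + select(
        ["clear cell RCC", "papillary RCC", "chromophobe RCC", "other"],
        name="histology")
    # Only ask for a grade when the histology is one to which WHO/ISUP
    # grading applies in this sketch.
    if lm["histology"] in ("clear cell RCC", "papillary RCC"):
        lm += "\nWHO/ISUP grade: " + select(["1", "2", "3", "4"], name="grade")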
Funding: This work was supported in part by the Department of Defense under Award Number HT94252310918.
AUTOMATING RENAL CANCER CHART REVIEW USING LARGE LANGUAGE MODELS
Category: Kidney Cancer > Clinical
Description: Poster #141
Presented By: Nicholas Heller
Authors:
Nicholas Heller
Angelica Bartholomew
Clara Goebel
Rikhil Seshadri
Beatriz Lopez Morato
Gabriel Wallerstein-King
Betty Wang
Jayant Siva
Jason Scovell
Rebecca Campbell
Michal Ozery-Flato
Vesna Barros
Maria Gabrani
Michal Rosen-Zvi
Ryan Ward
Steven Campbell
Erick Remer
Christopher Weight
Robert Abouassaly