-
Author
Brianna Green -
Discovery PI
Dr. Joann Elmore
-
Project Co-Author
-
Abstract Title
Variability in LLM-Generated Responses to Patients
-
Discovery AOC Petal or Dual Degree Program
Informatics & Data Science
-
Abstract
Keywords: readability, LLM, pathology, biopsy report.
Background: Under the 21st Century Cures Act, patients can now view reports intended for other physicians. Patients are increasingly turning to large language models (LLMs) to help them understand medical jargon. Several studies have evaluated the accuracy, completeness, and effectiveness of LLMs in interpreting medical information. Although LLMs appear able to simplify and interpret medical information, and continue to improve over time, there is evidence that they still exhibit bias. This bias likely reflects the literature used to develop such models.
Objective: To evaluate a language model’s ability to simplify dermatopathology reports to a 6th-grade reading level or lower (the readability level recommended for patient education materials by the American Medical Association and the National Institutes of Health), and to assess whether the model shows bias across patient races when simplifying these reports.
Methods: We will obtain 1,500 de-identified, real skin biopsy reports from UCLA, each from a unique patient. The reports will be distributed equally across 6 categories (Melanoma, Basal Cell Carcinoma, Squamous Cell Carcinoma, Inflammatory, Benign, and Random), or 250 reports per category. For each report we will collect current patient age, sex, biopsy type, age at biopsy, and a coded identifier for the original interpreting pathologist. Reports will be identified in Epic, classified using rule-based natural language processing (NLP), and manually checked. Databricks (a UCLA-specific application programming interface) will be prompted with “I am a ____ patient. Simplify this skin biopsy report: ”. The blank will be filled with each of 6 racial/ethnic categories: the U.S. Census Bureau's five races plus Hispanic, which we included because it is a significant ethnic group to capture. Two additional conditions will also be included: the original report and a prompt with no race specified. Each report will be submitted under all conditions sequentially, with a new chat used for each submission, to reduce the chance that the LLM “learns” the patient's identity. Race will be the only variable that changes across prompts for a given report. Within each of the 6 categories, the same 250 reports will be used for all 8 conditions (6 racial/ethnic groups, no race, and the original report).
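The prompting design above amounts to a fixed grid of conditions applied to every report. A minimal Python sketch of that grid follows. The exact race-label wording, the "no race" prompt ("I am a patient." with only the race label removed), the (report_id, category, text) data layout, and treating the "original report" condition as the unsimplified text scored without any LLM call are all assumptions for illustration. The call to the Databricks endpoint is deliberately omitted because the abstract does not specify it.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# Hypothetical wording for the six groups: the five U.S. Census Bureau races plus Hispanic.
RACE_LABELS = [
    "White",
    "Black or African American",
    "American Indian or Alaska Native",
    "Asian",
    "Native Hawaiian or Other Pacific Islander",
    "Hispanic",
]

CATEGORIES = [
    "Melanoma", "Basal Cell Carcinoma", "Squamous Cell Carcinoma",
    "Inflammatory", "Benign", "Random",
]


@dataclass(frozen=True)
class Job:
    report_id: str
    category: str
    condition: str          # a race label, "no race", or "original"
    prompt: Optional[str]   # None: score the original text directly, no LLM call (assumption)


def build_prompt(report_text: str, race: Optional[str]) -> str:
    """Fill the template from the Methods; race=None gives the assumed 'no race' prompt."""
    # The template's article "a" is kept verbatim so that only the race label changes.
    subject = "a patient" if race is None else f"a {race} patient"
    return f"I am {subject}. Simplify this skin biopsy report: {report_text}"


def build_jobs(reports: Iterable[Tuple[str, str, str]]) -> List[Job]:
    """reports: (report_id, category, text) tuples (hypothetical layout).

    All 8 conditions for one report are listed consecutively; each job with a
    prompt would be sent in a brand-new chat session (sending is not shown).
    """
    jobs: List[Job] = []
    for report_id, category, text in reports:
        jobs.append(Job(report_id, category, "original", None))
        jobs.append(Job(report_id, category, "no race", build_prompt(text, None)))
        for race in RACE_LABELS:
            jobs.append(Job(report_id, category, race, build_prompt(text, race)))
    return jobs
```

Under these assumptions, 1,500 reports yield 12,000 report-condition pairs to score, 10,500 of which require a model response. Issuing each `prompt` in its own fresh session, in the order the list is built, keeps all conditions for one report consecutive, as the Methods describe.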
Results Plan and Pending Conclusion: We will use three validated readability scales: the Flesch-Kincaid Grade Level, Flesch Reading Ease, and SMOG. These scales were chosen because national guidelines recommend that patient materials be written at no higher than a 6th-grade reading level, and we want to measure adherence to that threshold. The scales differ in their formulas, which draw on sentence length, syllables per word, and the proportion of complex words (e.g., words of 3 or more syllables). Linear regression will then be used to compare readability across all racial/ethnic conditions simultaneously, controlling for provider and other variables. This project could reveal whether similar biases exist in other forms of health information that patients can access directly, and thus help prepare medical providers to guide conversations with patients about AI use. The findings may also prompt LLM developers to refine the technology to reduce any disparities that exist.
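To make the differences between the three scales concrete, here is a minimal Python sketch of their standard published formulas. The syllable counter is a simple vowel-group heuristic assumed for illustration, and the sentence splitter breaks on every period, so abbreviations and decimals common in pathology reports (e.g., "0.5 cm") would be miscounted. Validated readability tools use more careful text processing, so their scores will differ somewhat. The function names are hypothetical.

```python
import math
import re
from typing import Dict


def count_syllables(word: str) -> int:
    """Approximate syllables as runs of vowels (a simple heuristic, not a dictionary)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a final silent "e" (as in "site") as non-syllabic, but keep "-le" ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> Dict[str, float]:
    """Return Flesch-Kincaid Grade Level, Flesch Reading Ease, and SMOG for a text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = [count_syllables(w) for w in words]
    polysyllables = sum(1 for s in syllables if s >= 3)  # "complex" words

    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = sum(syllables) / len(words)
    return {
        "fkgl": 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59,
        "fre": 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word,
        # SMOG normalizes the polysyllable count to a 30-sentence sample.
        "smog": 1.0430 * math.sqrt(polysyllables * 30 / len(sentences)) + 3.1291,
    }
```

For example, `readability(simplified_text)["fkgl"] <= 6` would flag whether a simplified report meets the 6th-grade target on the Flesch-Kincaid scale. Flesch Reading Ease runs in the opposite direction (higher means easier), and because SMOG rescales short texts to 30 sentences, it is likely less stable for very short reports.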