Spotlight Articles

Generative AI and the large language models (LLMs) that drive it are receiving extensive (and appropriate) attention in the media, in regulatory contexts, and in the academic literature. We feature here two articles that explore some of the intersections between generative AI and consent. The first, by Decker et al., examines how LLMs might be employed to enhance informed consent form (ICF) readability, improve the articulation of risks, and “ease documentation burden for physicians.” The second, by Litt et al., also explores improvements in readability, aggregating ICFs posted with oncology clinical trials registered on clinicaltrials.gov and then applying a generative AI tool to generate more useful consent content. We anticipate a good deal more activity exploring how generative AI can contribute to consent content, effectiveness, and integrity.

Large Language Model−Based Chatbot vs Surgeon-Generated Informed Consent Documentation for Common Procedures
Original Investigation – Surgery
Hannah Decker, Karen Trang, Joel Ramirez, Alexis Colley, Logan Pierce, Melissa Coleman, Tasce Bongiovanni, Genevieve B. Melton, Elizabeth Wick
JAMA Network Open, 9 October 2023; 6(10)
Abstract
Importance
Informed consent is a critical component of patient care before invasive procedures, yet it is frequently inadequate. Electronic consent forms have the potential to facilitate patient comprehension if they provide information that is readable, accurate, and complete; it is not known whether large language model (LLM)-based chatbots can improve informed consent documentation by generating accurate and complete information that is easily understood by patients.
Objective
To compare the readability, accuracy, and completeness of LLM-based chatbot- vs surgeon-generated information on the risks, benefits, and alternatives (RBAs) of common surgical procedures.
Design, Setting, and Participants
This cross-sectional study compared randomly selected surgeon-generated RBAs used in signed electronic consent forms at an academic referral center in San Francisco with LLM-based chatbot-generated (ChatGPT-3.5, OpenAI) RBAs for 6 surgical procedures (colectomy, coronary artery bypass graft, laparoscopic cholecystectomy, inguinal hernia repair, knee arthroplasty, and spinal fusion).
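The authors worked through the public ChatGPT-3.5 interface rather than the API. Purely as an illustration, a comparable RBA request made programmatically might look like the sketch below; the prompt wording, model name, and helper function are our assumptions, not the study protocol.

```python
# Illustrative sketch only: the study used the public ChatGPT-3.5 interface,
# not this API call, and the prompt wording below is our assumption.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROCEDURES = [
    "colectomy",
    "coronary artery bypass graft",
    "laparoscopic cholecystectomy",
    "inguinal hernia repair",
    "knee arthroplasty",
    "spinal fusion",
]

def generate_rba(procedure: str) -> str:
    """Ask the model for a risks/benefits/alternatives (RBA) statement."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # closest API analogue to ChatGPT-3.5
        messages=[{
            "role": "user",
            "content": (
                f"Write the risks, benefits, and alternatives section of an "
                f"informed consent document for a {procedure}, in language "
                f"a patient can understand."
            ),
        }],
    )
    return response.choices[0].message.content

for proc in PROCEDURES:
    print(f"--- {proc} ---")
    print(generate_rba(proc))
```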
Main Outcomes and Measures
Readability was measured using previously validated scales (Flesch-Kincaid grade level, Gunning Fog index, the Simple Measure of Gobbledygook, and the Coleman-Liau index). Scores range from 0 to greater than 20 and indicate the years of education required to understand a text. Accuracy and completeness were assessed using a rubric developed with recommendations from the Leapfrog Group, the Joint Commission, and the American College of Surgeons. Both composite and RBA subgroup scores were compared.
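The abstract does not name the software used for scoring, but all four indices are easy to reproduce; here is a minimal sketch using the open-source textstat package (one possible tool, not necessarily the authors’ pipeline, and the sample text is invented):

```python
# Sketch: computing the four readability indices named above with the
# open-source textstat package (the authors' scoring tool is not stated).
import textstat

sample_rba = (
    "Risks of this operation include bleeding, infection, and injury to "
    "nearby organs. Benefits include relief of symptoms. Alternatives "
    "include watchful waiting and nonsurgical treatment."
)

scores = {
    "Flesch-Kincaid grade level": textstat.flesch_kincaid_grade(sample_rba),
    "Gunning Fog index": textstat.gunning_fog(sample_rba),
    "SMOG (Simple Measure of Gobbledygook)": textstat.smog_index(sample_rba),
    "Coleman-Liau index": textstat.coleman_liau_index(sample_rba),
}

for name, grade in scores.items():
    # Each score approximates the years of education needed to read the text.
    print(f"{name}: {grade:.1f}")
```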
Results
The total sample consisted of 36 RBAs, with 1 RBA generated by the LLM-based chatbot and 5 RBAs generated by surgeons for each of the 6 surgical procedures. The mean (SD) readability score for the LLM-based chatbot RBAs was 12.9 (2.0) vs 15.7 (4.0) for surgeon-generated RBAs (P = .10). The mean (SD) composite completeness and accuracy score was lower for surgeons’ RBAs at 1.6 (0.5) than for LLM-based chatbot RBAs at 2.2 (0.4) (P < .001). The LLM-based chatbot scores were higher than the surgeon-generated scores for descriptions of the benefits of surgery (2.3 [0.7] vs 1.4 [0.7]; P < .001) and alternatives to surgery (2.7 [0.5] vs 1.4 [0.7]; P < .001). There was no significant difference in chatbot vs surgeon RBA scores for risks of surgery (1.7 [0.5] vs 1.7 [0.4]; P = .38).
Conclusions and Relevance
The findings of this cross-sectional study suggest that, although not perfect, LLM-based chatbots have the potential to enhance informed consent documentation. If an LLM were embedded in electronic health records in a manner compliant with the Health Insurance Portability and Accountability Act, it could be used to provide personalized risk information while easing documentation burden for physicians.

Improving clinical trial consent form readability through artificial intelligence
Conference Presentation – ASCO Quality Care Symposium 2023
Henry Kazunaru Litt, Emma Greenstreet Akman, Dame Idossa, Narjust Florez, Ana I. Velazquez Manana
JCO Oncology Practice – Health Care Access, Equity, and Disparities, 26 October 2023; 18(11) suppl
Abstract
Background
High literacy levels are needed to understand oncology clinical trial (CT) informed consent forms (ICFs), a requirement that represents a barrier to enrollment for older adults and diverse populations. ChatGPT-4 is an artificial intelligence chatbot that responds to user prompts and can summarize large amounts of text. We tested whether ChatGPT-4 could simplify CT information from ICFs.
Methods
On May 22, 2023, we searched clinicaltrials.gov for interventional, therapeutic, NIH-funded CTs involving adults with the 14 most prevalent cancer types. Only CTs with available study protocols whose status was “recruiting”, “enrolling by invitation”, or “active, not recruiting” were included. Trials that were diagnostic, preventative, or supportive were excluded. Publicly available ICFs from the resulting CTs were downloaded and analyzed. Using the ChatGPT-4 plugin askyourpdf.com, we asked ChatGPT-4 to review each ICF and answer, at a 6th grade reading level, 8 questions recommended by the NCCN for patients considering a CT. Our prompt included the following 8 questions: “1) What are the treatments used in the clinical trial? 2) Has the treatment been used for other types of cancer? 3) What are the risks and benefits of this treatment? 4) What side effects should I expect and how will they be managed? 5) How long will I be in the clinical trial? 6) Will I be able to get other treatment if this doesn’t work? 7) How will you know if the treatment is working? 8) Will the clinical trial cost me anything?” Reading level (readability) was assessed for both the ICFs and ChatGPT-4’s question responses with the validated Flesch-Kincaid (FK), Gunning Fog (GF), and SMOG indices, computed using the online Readable app. Data were summarized with descriptive statistics, and a t-test was used to compare text reading levels between ICFs and ChatGPT-4’s answers.
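The authors ran this workflow through the askyourpdf.com plugin inside ChatGPT-4. A rough programmatic approximation of the same idea (extract the ICF text, then pose the 8 NCCN questions at a 6th grade level) might look like the sketch below; the file name, prompt framing, use of pypdf, and direct API call are our assumptions, not the study’s method.

```python
# Rough approximation of the ICF question-answering workflow described above.
# The study used the askyourpdf.com plugin inside ChatGPT-4; here we instead
# extract the PDF text locally (pypdf) and call the API directly. File name,
# prompt framing, and model choice are assumptions for illustration.
from openai import OpenAI
from pypdf import PdfReader

NCCN_QUESTIONS = """\
1) What are the treatments used in the clinical trial?
2) Has the treatment been used for other types of cancer?
3) What are the risks and benefits of this treatment?
4) What side effects should I expect and how will they be managed?
5) How long will I be in the clinical trial?
6) Will I be able to get other treatment if this doesn't work?
7) How will you know if the treatment is working?
8) Will the clinical trial cost me anything?"""

def answer_icf_questions(pdf_path: str) -> str:
    # Pull the raw text out of the consent form PDF.
    # (Very long ICFs may need chunking to fit the model's context window.)
    reader = PdfReader(pdf_path)
    icf_text = "\n".join(page.extract_text() or "" for page in reader.pages)

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": (
                "Review the clinical trial informed consent form below and "
                "answer these questions at a 6th grade reading level:\n"
                f"{NCCN_QUESTIONS}\n\n--- CONSENT FORM ---\n{icf_text}"
            ),
        }],
    )
    return response.choices[0].message.content

print(answer_icf_questions("example_icf.pdf"))  # hypothetical file
```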
Results
Our search yielded 83 therapeutic oncology CTs, of which 70 had publicly available ICFs. ChatGPT-4 successfully analyzed 66 of the 70 ICFs (94.3%). The mean text reading levels of its answers were 6.2 (95% CI: 5.9-6.5), 8.6 (95% CI: 8.2-8.9), and 9.2 (95% CI: 8.9-9.4) based on the FK, GF, and SMOG indices, respectively. Of the 70 ICFs, 54 (77.1%) contained text that could be evaluated for readability and were included in the analysis. Their mean text reading levels were 7.9 (95% CI: 7.7-8.1), 9.3 (95% CI: 9.1-9.6), and 10.5 (95% CI: 10.2-10.8) based on the FK, GF, and SMOG indices, respectively. ChatGPT-4’s responses had a significantly lower reading level than the ICF text on all three readability indices (FK: p<0.01, GF: p=0.02, SMOG: p<0.01).
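The abstract says only “t-test”, without specifying the variant. As a sketch of how such a comparison can be run, here is an independent-samples (Welch’s) t-test in SciPy; the score lists are invented placeholders, not the study’s data, and the choice of Welch’s test is our assumption.

```python
# Sketch of the readability comparison: an independent-samples (Welch's)
# t-test on Flesch-Kincaid grade levels. The abstract does not specify the
# exact test variant, and these score lists are invented placeholders,
# not the study's data.
from scipy import stats

icf_fk_grades = [8.1, 7.6, 9.0, 7.8, 8.4]      # hypothetical ICF scores
chatgpt_fk_grades = [6.0, 6.4, 5.9, 6.5, 6.2]  # hypothetical answer scores

t_stat, p_value = stats.ttest_ind(
    icf_fk_grades,
    chatgpt_fk_grades,
    equal_var=False,  # Welch's correction: don't assume equal variances
)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```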
Conclusions
ChatGPT-4 presented key information from oncology CT ICFs at a 6th to 9th grade reading level, significantly lower than that of the original ICFs. While further studies are needed to assess ChatGPT-4’s accuracy, this study shows its potential as a tool for improving patients’ understanding of oncology CTs.