Abstract
Introduction Artificial intelligence and large language models (LLMs) have emerged as potentially disruptive technologies in healthcare. In this study, GPT-3.5, an accessible LLM, was assessed for its accuracy and reliability in performing guideline-based evaluation of neuraxial bleeding risk in hypothetical patients on anticoagulation medication. The study also explored the impact of structured prompt guidance on the LLM’s performance.
Methods A dataset of 10 hypothetical patient stems and 26 anticoagulation profiles (260 unique combinations) was developed based on American Society of Regional Anesthesia and Pain Medicine guidelines. Five prompts were created for the LLM, ranging from minimal guidance to explicit instructions. The model’s responses were compared with a “truth table” based on the guidelines. Performance was measured with metrics including accuracy and area under the receiver operating characteristic curve (AUC).
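The evaluation design described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' code (which is in the repository listed under the data availability statement); the stem and profile names, the binary label encoding, and the helper functions `accuracy` and `auc` are all hypothetical placeholders chosen for illustration. The sketch shows how 10 stems and 26 profiles yield 260 cases, and how model output might be scored against a truth table.

```python
from itertools import product

# Hypothetical placeholders: the study's actual stems and profiles are not reproduced here.
stems = [f"stem_{i}" for i in range(1, 11)]        # 10 patient stems
profiles = [f"profile_{j}" for j in range(1, 27)]  # 26 anticoagulation profiles
cases = list(product(stems, profiles))             # every stem paired with every profile


def accuracy(truth, predicted):
    """Fraction of cases where the model's label matches the truth table."""
    return sum(t == p for t, p in zip(truth, predicted)) / len(truth)


def auc(truth, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Ties count as half a win. Assumes truth labels are encoded as 1 or 0.
    """
    pos = [s for t, s in zip(truth, scores) if t == 1]
    neg = [s for t, s in zip(truth, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels for illustration only; these are not study data.
toy_truth = [1, 0, 1, 0]
toy_model = [1, 0, 0, 0]
print(len(cases), accuracy(toy_truth, toy_model), auc(toy_truth, toy_model))
```

With binary model outputs, as in the toy example, the AUC reduces to the average of sensitivity and specificity, so it complements raw accuracy when the truth table is imbalanced.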
Results Baseline performance of GPT-3.5 was slightly above chance. With detailed prompts and explicit guidelines, performance improved significantly (AUC 0.70, 95% CI 0.64 to 0.77). Performance varied among medication classes.
Discussion LLMs show potential for assisting in clinical decision making but rely on accurate and relevant prompts. The tested LLM followed this pattern in assessing neuraxial bleeding risk: its performance depended on precise prompts. Integration of LLMs should be approached cautiously, with attention to safety, privacy, and the models’ limitations. Future research should focus on optimizing LLM performance, addressing complex scenarios, and understanding LLM capabilities and limitations in healthcare.
- Anticoagulation
- Nerve Block
- Pain Management
- TECHNOLOGY
Data availability statement
Data are available in a public, open access repository. All data relevant to the study are included in the article or uploaded as supplementary information. Additionally, code is available at https://github.com/Nathan-C-Hurley/DangerDangerGastonLabat.
Footnotes
Twitter @dr_rajgupta, @KristopherSchr6
Contributors NCH: Coding, statistical analysis, core idea contribution, manuscript preparation, and submission. NCH is responsible for the overall content as guarantor. RKG: Manuscript preparation and review. KMS: Idea conception, test scenarios, manuscript preparation, and review. ASH: Statistical analysis, core idea contribution, manuscript preparation, and review.
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests RKG is responsible for the creation and maintenance of the ASRA-PM Coags app, is a Member of the ASRA-PM Board of Directors and is an Associate Editor for Regional Anesthesia and Pain Medicine.
Provenance and peer review Not commissioned; externally peer reviewed.
Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.