Using AI in infection prevention

Anyone with internet access can type a prompt into a large language model (LLM) tool, commonly referred to as AI, to generate a tailored response. LLMs provide human-like responses within seconds by converting a prompt into a sequence of units called tokens, then predicting which tokens should follow, drawing on pre-training across expansive and varied text material. Although debates continue about AI’s implications, institutions like University of Iowa Health Care have already piloted AI tools that summarize patient charts and draft clinical documentation, with the goal of easing the administrative burden on physicians. Now, internists and infection preventionists (IPs) are exploring the possibilities of implementing AI for healthcare guidance.
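
As a rough illustration of that tokenize-then-predict loop, the short Python sketch below uses the open GPT-2 model through the Hugging Face transformers library. GPT-2 is an illustrative stand-in, not one of the tools the team studied, and the sample prompt is invented.

```python
# A minimal sketch of the tokenize-then-predict loop described above,
# using the open GPT-2 model (an illustrative stand-in, not a study model).
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Hypothetical prompt, invented for illustration.
prompt = "What isolation precautions are required for a patient with measles?"
inputs = tokenizer(prompt, return_tensors="pt")  # prompt -> sequence of token IDs

# The model repeatedly predicts the most likely next token to extend the sequence.
output_ids = model.generate(**inputs, max_new_tokens=40, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```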

Infection Control & Hospital Epidemiology recently published research led by UI Health Care infection preventionist Oluchi J. Abosi, MBChB, MPH, CIC, that assessed the accuracy and completeness of LLM responses to clinical-scenario IP questions. Clinical Associate Professor in Infectious Diseases Karen Brust, MD, and IntMed training program alumni Natalie Ross, MD, and Takaaki Kobayashi, MD, MPH, worked with Abosi throughout the research and publication process. Ross presented the initial research at the Society for Healthcare Epidemiology of America in May 2024, where she received positive feedback.

With a 500% surge in infection prevention and control (IPC) consultations during the COVID-19 pandemic, Abosi said she and her team were compelled to explore AI’s potential to accurately resolve IPC inquiries. The volume of IPC consultations has not returned to its pre-pandemic baseline, as COVID-19 underscored IPC’s essential role in preventing and mitigating healthcare-associated infections (HAIs). Still, many facilities staff only one IP or lack an IPC program altogether, Abosi said.

“It’s not one of the things that many hospitals prioritize. They may end up leveraging a nurse to do IP work, but don’t have an official role,” Abosi said. “In these cases, we wondered if AI could communicate the information and guidelines that exist from multiple different sources and put together an accessible and reliable summary for addressing IP consultations.”

Evaluating AI responses to IP queries
The research team entered 31 sample questions about transmission-based precautions, communicable disease exposures, and environmental cleaning into four commercially available LLMs: Microsoft Copilot, GPT-3.5, GPT-4.0, and OpenEvidence. These questions were generated from 2022 consultation data to replicate real-world clinical scenarios. Ross fed each question to every LLM, and Abosi, Brust, and Kobayashi, blinded to which tool produced each response, assessed the responses on a 5-point Likert scale for accuracy and a 6-point Likert scale for completeness. The team counted responses scoring ≥3 as accurate and those scoring ≥4 as complete.
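
To make those scoring rules concrete, the sketch below shows how the two thresholds translate into the percentages reported later in this article. The scores here are invented for illustration only; they are not study data.

```python
# A hypothetical sketch of the thresholds described above: reviewer scores
# on a 5-point accuracy scale and a 6-point completeness scale, counted as
# accurate at >=3 and complete at >=4. Scores below are invented.
accuracy_scores = [5, 4, 3, 2, 5, 4]      # one reviewer score per response, 1-5
completeness_scores = [6, 4, 3, 5, 4, 2]  # one reviewer score per response, 1-6

pct_accurate = 100 * sum(s >= 3 for s in accuracy_scores) / len(accuracy_scores)
pct_complete = 100 * sum(s >= 4 for s in completeness_scores) / len(completeness_scores)

print(f"Accurate: {pct_accurate:.1f}%   Complete: {pct_complete:.1f}%")
```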

The team’s research yielded results Abosi found surprising. OpenEvidence, the LLM she anticipated would produce the most accurate responses, was the least accurate of the four (83.9%). GPT-4.0 generated the most accurate and complete responses overall (98.9%). Meanwhile, GPT-3.5, the free version of the highest-performing LLM, provided the least complete responses (67.7%). OpenEvidence also lagged in completeness (72%).

Abosi said she was surprised the LLMs performed so well on accuracy, because the vast text material used in LLM training would include a multitude of organizational guidelines, revised versions of the same guidelines, and evolving institutional research. The team was also interested in how restricting the LLMs to a single organization’s guidance would affect response accuracy and completeness. In a follow-up test, Ross prompted the LLMs with the same 31 questions, but this time appended “Follow CDC Guidelines in the United States.”
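
In practice, that follow-up test amounts to appending the restriction sentence to every prompt. The sketch below illustrates the idea using the openai Python client; the sample question, model name, and client usage are illustrative assumptions, not the team’s actual code.

```python
# A sketch of the follow-up test described above: the same questions, each
# with the CDC-restriction sentence appended. Client usage and model name
# are assumptions for illustration, not the study's implementation.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

questions = [
    "What isolation precautions are required for a patient with measles?",
    # ... the remaining study questions
]

for question in questions:
    restricted = f"{question} Follow CDC Guidelines in the United States."
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": restricted}],
    )
    print(response.choices[0].message.content)
```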

Restricting the LLMs to national CDC guidelines decreased response accuracy for every LLM except GPT-3.5 (90.3%). Completeness declined along the same trend. Abosi hypothesizes that accuracy and completeness dropped because the LLMs could not draw on their full scope of training text when limited to a single source. However, she believes that experts can train and retrain AI with specific local, state, or federal infection guidelines to increase response accuracy. She pointed out that researchers have already obtained promising results when testing AI’s ability to determine transmission routes and identify infections.

The future of AI-assisted infection prevention
Abosi pointed to work from University of Pittsburgh professor Alexander Sundermann, MD, whose research showed that an AI-supplemented detection system identified transmission routes in 65.7% of patient clusters, while traditional IP investigation identified a transmission event in only 3.8%. Former UI professor and hospital epidemiologist Jorge L. Salinas, MD, also contributed to a publication in which an LLM powered by OpenAI’s GPT-4.0 demonstrated high sensitivity for identifying central line-associated bloodstream infections using real clinical notes.

Abosi, like other experts, does not propose that LLMs can or should replace IP specialists. Instead, she is interested in how AI could assist under-resourced IPC programs or provide information that a specialist could review in real time to strengthen IP interventions.

“AI won’t train itself, at least not now,” Abosi said. “I think the more people we have working in this field of AI and infection control, then the more data is out there that people can use to find different ways to leverage AI in their fields.”
