Where Does It Hurt?
For LLMs To Help Patients and Physicians, AI Needs To Ask Better Questions
By Marylee Williams
Chatbots and other artificial intelligence tools could one day be the first resource people turn to when seeking medical advice or deciding whether to make an appointment with their physician. But before AI can help people determine what's wrong with them, it needs to ask better questions — a critical skill physicians have that helps them gather relevant information for making decisions.
Researchers at Carnegie Mellon University's School of Computer Science (SCS) joined forces with colleagues at the University of Washington, the Geisel School of Medicine at Dartmouth College and the Allen Institute for AI (Ai2) to develop a method that improves LLMs' question-asking by breaking a good question down into its component attributes and teaching models to ask better questions as they gather information.
"LLMs are trained to have some medical knowledge, but these tools aren't evaluated on patient-centered or personalized interactions and whether they have the appropriate level of reasoning to answer or interactively aid with questions about health or medical issues, which is how people are using these tools," said Jimin Mun, a doctoral student in CMU's Language Technologies Institute (LTI).
The researchers note that improving LLMs' question-asking in clinical settings could enhance the patient experience. These tools wouldn't replace physicians, but they could improve triage when patients schedule appointments and give physicians more time with each patient.
To improve LLM question-asking, the researchers developed a method called ALFA (Alignment via Fine-Grained Attributes), which has three components: decompose, synthesize and align. LLMs trained with ALFA asked higher-quality questions and reached more accurate diagnoses than baseline LLMs.
Within the ALFA method, the decompose component identifies the attributes of a good question, in this research, for a clinical setting. In their paper, "Aligning LLMs to Ask Good Questions: A Case Study in Clinical Reasoning," the authors note that a good question in general is clear, targeted and answerable. A good clinical question must also be medically accurate, diagnostically relevant and framed to avoid differential-diagnosis bias. The researchers drew on the medical communication and psychology literature to inform this component of the method.
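For readers who think in code, the decomposition can be pictured as a small set of named quality dimensions, with each synthetic rewrite of a question tagged by the one attribute it changes. The sketch below is illustrative only: the attribute names follow the article's description, but the data structure and field names are assumptions, not the authors' code.

```python
# A sketch of the "decompose" step. Attribute names follow the article's
# description; the dataclass and field names are illustrative assumptions.
from dataclasses import dataclass

# General attributes of a good question.
GENERAL_ATTRIBUTES = ["clarity", "targetedness", "answerability"]

# Additional attributes a good question needs in a clinical setting.
CLINICAL_ATTRIBUTES = ["medical_accuracy", "diagnostic_relevance", "avoids_ddx_bias"]

@dataclass
class QuestionVariant:
    """One synthetic rewrite of a question, altered along a single attribute."""
    text: str
    attribute: str   # which dimension was changed, e.g. "clarity"
    direction: str   # "enhanced" or "corrupted"
```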
For the synthesize component, the researchers created a new dataset, MediQ-AskDocs, containing 17,000 questions and follow-ups from the r/AskDocs subreddit. From this data, they generated enhanced and corrupted variations of each original question. For example, if the original question was "Do you have a family history of breast cancer?" the enhanced question would be "Has your mother, sister or daughter been diagnosed with breast cancer?" The corrupted question would be much less specific, such as "Has anyone in your family been sick?" These synthetic variations let the researchers alter a single attribute of a question, such as whether it was clear or ambiguous, while holding the other attributes constant.
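A rough sketch of how such attribute-specific variations might be generated appears below. The `call_llm` helper is a hypothetical stand-in for whatever model API produces the rewrites, and the prompt wording is invented for illustration, not taken from the paper.

```python
# A sketch of the "synthesize" step. call_llm is a hypothetical stand-in
# for whatever model API generates the rewrites; the prompt wording is
# invented for illustration and is not the authors' template.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in a model API here")

def synthesize_variants(question: str, attribute: str) -> dict:
    """Create an enhanced and a corrupted rewrite of one question,
    changing only the given attribute while holding the others fixed."""
    enhanced = call_llm(
        f"Rewrite this clinical question to improve its {attribute}, "
        f"changing nothing else: {question}"
    )
    corrupted = call_llm(
        f"Rewrite this clinical question to degrade its {attribute}, "
        f"changing nothing else: {question}"
    )
    # Each record later yields two preference pairs for alignment:
    # (enhanced preferred over original) and (original over corrupted).
    return {"original": question, "attribute": attribute,
            "enhanced": enhanced, "corrupted": corrupted}

# The article's example, varying how targeted the question is:
# synthesize_variants("Do you have a family history of breast cancer?",
#                     "targetedness")
```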
"The broad thing is that you can't just tell an LLM to make the question better generally. That's not as informative and it's also not going to lead to a good model," said Maarten Sap, an assistant professor in the LTI. "You have to decompose questions into these various attributes and teach the model the factors that matter and give it examples of good and bad versions along the specific dimension."
Finally, to ensure the LLMs asked good questions while juggling multiple objectives, the researchers integrated the synthesized training data and aligned the model. Because each synthesized example targets a single attribute of a good question, such as clarity or answerability, combining the data across attributes aligned the model with every aspect of a good clinical question at once.
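One common way to perform that kind of alignment, and a plausible reading of the align step, is a preference-based objective such as Direct Preference Optimization (DPO), where enhanced questions are preferred over their corrupted counterparts. The PyTorch sketch below shows a standard DPO loss over pooled attribute pairs; the paper's exact training objective may differ.

```python
# A sketch of the "align" step as a Direct Preference Optimization (DPO)
# loss, assuming pairs from every attribute are pooled into one training
# set. This is a standard DPO formulation, not the authors' exact code.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp: torch.Tensor,
             policy_rejected_logp: torch.Tensor,
             ref_chosen_logp: torch.Tensor,
             ref_rejected_logp: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Preference loss over one batch of question pairs.

    Each tensor holds the summed token log-probabilities of the preferred
    ("chosen", e.g. an enhanced question) or dispreferred ("rejected",
    e.g. a corrupted one) text under the policy or a frozen reference model.
    """
    # How much more the policy favors the chosen question over the
    # rejected one, relative to the reference model's preference.
    logits = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp)
    return -F.logsigmoid(beta * logits).mean()

# Pooling pairs from all attributes (clarity, answerability, medical
# accuracy, ...) into one loss forces the model to balance every
# dimension of a good question at once.
```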
LLMs trained with the ALFA method made almost 57 percent fewer diagnostic errors than the baseline LLMs. And while this study focused on medicine, ALFA can be adapted to any field where clear, targeted question-asking is essential to better interactions.
The research team included Sap and Mun from the LTI; Jonathan Ilgen, Yulia Tsvetkov and Shuyue Stella Li from the University of Washington; Faeze Brahman from Ai2; and Bryceton Thomas, Jessica Sin and Bing Ren from the Geisel School of Medicine at Dartmouth College. Pedram Hosseini from Lavita AI also contributed to this work.