July 7, 2023

Doctor AI Will See You Now

ChatGPT and other AI programs can offer medical advice. But how good are they?

By Tanya Lewis, Josh Fischman & Carin Leong

Tanya Lewis: Hi, and welcome to Your Health, Quickly, a Scientific American podcast series!

Josh Fischman: On this show, we highlight the latest vital health news, discoveries that affect your body and your mind.

Every episode, we dive into one topic. We discuss diseases, treatments, and some controversies.

Lewis: And we demystify the medical research in ways you can use to stay healthy.

I’m Tanya Lewis.

Fischman: I’m Josh Fischman.

Lewis: We’re Scientific American’s senior health editors.

Fischman: Today, we’re talking about how chat-based AI programs are being used to help diagnose medical problems. They’re surprisingly accurate. But they do come with a few pitfalls.

[Clip: Show theme music]

Lewis: Josh, have you ever used Google to try to diagnose a medical issue?

Fischman: You mean like typing in “What are the causes of low back pain?” or “Is drug X the best treatment for glaucoma?” Yeah, that happens pretty much every other day, either for me or for someone in my family.

Lewis: Yeah, I have, too. And I know it’s a bad idea, because somehow every search that I do ends up suggesting that whatever I have is cancer. But now there’s a new way to get medical info online: generative AI chatbots.

Fischman: Like ChatGPT?

Lewis: Yeah, like OpenAI’s ChatGPT and Microsoft’s Bing (which is based on the algorithm that powers ChatGPT). And others that are designed specifically to provide medical info, like Google’s Med-PaLM.

They’re all based on large language models, or LLMs, which predict the next word in a sentence. They’re trained on huge amounts of data gleaned from all over the internet, and in some cases, info from medical exams and real doctor-patient interactions.

Fischman: Do those things work better than our simple Internet searches?

Lewis: I wanted to know that, too. To find out more, I talked to Sara Reardon, a science journalist based in Bozeman, Montana, and regular SciAm contributor, who has been reporting on AI in medicine for us.

Reardon: Doctors have been concerned for a long time about people googling their symptoms. There's this term “Dr. Google,” which is really frustrating to a lot of physicians, because people come in and think that they know what they have without having the actual expertise or context, just by having looked up, “I have a headache. What does it mean?”

GPT software is much better at actually being accurate in determining what patients have, and at sometimes asking follow-up questions that will help it further home in on the correct diagnosis.

Lewis: Companies are starting to study this. And preliminary research suggests the AIs are surprisingly accurate. Studies have shown that they work better than online symptom checkers, which are websites that take your symptoms as input and spit out a diagnosis. They’re also better than some people without medical training.

Reardon: In a study posted on the preprint server medRxiv in February, which has not yet been peer reviewed, epidemiologist Andrew Beam of Harvard University and his colleagues wrote 48 prompts phrased as descriptions of his patients’ symptoms. When they fed these to OpenAI's GPT-3, which was the version of the algorithm that powered ChatGPT at the time, the top three potential diagnoses for each case included the correct one 88 percent of the time. Physicians, by comparison, could do this 96 percent of the time when given the same prompts, but people without medical training could do so only 54 percent of the time.

Fischman: Okay, so the AIs are good. But physicians were still better in that study. So I’d still rather go to a real one.

Lewis: Yeah, absolutely—these AI programs should not be used to diagnose a serious illness, and many of them say so. For that, you should definitely see a doctor.

But they’re probably a step up from just googling your symptoms. I tried telling ChatGPT about some characteristic stroke symptoms like numbness in my face and trouble speaking and walking. It came back with a list of likely causes, with stroke at the top, followed by transient ischemic attacks and multiple sclerosis.

To its credit, it also told me to seek immediate medical care. But I haven’t tried it with more complex or vague symptoms.

Now, some health care providers are already using these AIs to help communicate with patients.

Reardon: Some of the doctors that I spoke with are starting to play with it, helping them phrase things, helping them sort of condense their thoughts into what would be a short, concise text message. There's a lot of talk about hospitals that might start actually incorporating some of the software soon.

Lewis: These AI programs could help doctors deal with some of the administrative grunt work so that they have more time to actually spend with patients.

And the breakthrough isn’t just the AI itself. It’s the fact that you can ask it questions in plain English, rather than listing off a bunch of symptoms and having it calculate the statistical likelihood of some diagnosis.

But there are some dangers, too.

Reardon: It's not nearly as accurate as the doctor. And there's this known problem with GPT and some of these other similar AI programs where they will do what's called hallucinate and just come up with information on their own, just make stories up, come up with references that don't exist.

Lewis: And that’s not the only concern.

Reardon: There's this huge history in medicine of racism, classism, lots of other isms, and that's baked into a lot of medical literature—some of these assumptions about how Black people respond to pain medication, for instance, that have been completely dismissed as junk science nowadays, but still exist in a lot of the literature that ChatGPT and other programs are trained on. There's a huge problem with racism in medicine in general, but that sort of thing could just be amplified if it's drawing from a program rather than someone who's consciously thinking about these things.

Fischman: Hmm, so these AIs might actually replicate some of the existing human biases that are already in medicine.

Lewis: Exactly. And these programs also pose privacy concerns.

Reardon: The big tech companies, Google, OpenAI, others that are making these programs are, to varying extents, using some of the information that people put into it to help inform better versions of the algorithm in the future.

So there's a lot of concern about how that's going to be dealt with from a regulatory standpoint going forward, and about making sure that these companies are protecting patient privacy.

Lewis: When Sara talked to OpenAI, Google and other companies, they said they are aware of all of these concerns and are trying to develop versions of their software that are more accurate and secure.

Fischman: Well, if tech companies do address these issues, is there a health specialty where AI could be particularly useful?

Lewis: Yeah, it’s actually becoming quite common in mental health, through the use of therapy apps.

Reardon: So mental health is actually one of the areas where people think that the software could be most helpful, for several reasons, one of which is that a lot of the therapies are based on chat.

Lewis: This could help address the severe shortage of therapists in this country—if we can get it right.

Fischman: It sounds like, whether we like it or not, AI is going to be a big part of medicine.

Lewis: It is. But when it comes to our health, we need to ensure that these programs first do no harm.

[Clip: Show theme music]

Fischman: Your Health, Quickly is produced by Tulika Bose, Jeff DelViscio, Kelso Harper and Carin Leong. It’s edited by Elah Feder and Alexa Lim. Our music is composed by Dominic Smith.

Lewis: Our show is a part of Scientific American’s podcast Science, Quickly. You can subscribe wherever you get your podcasts. If you like the show, give us a rating or review!

And if you have ideas for topics we should cover, send us an email at Yourhealthquickly@sciam.com. That’s your health quickly at S-C-I-A-M dot com.

Fischman: For a daily dose of science, sign up for our new Today in Science newsletter. Our colleague Andrea Gawrylewski delivers some of the most interesting and awe-inspiring science news and opinion to your inbox each afternoon. We think you’ll enjoy it. Check it out at sciam.com/newsletters.

Lewis: Yeah, it’s a great read. I’m Tanya Lewis.

Fischman: I’m Josh Fischman.

Lewis: We’ll be back in two weeks. Thanks for listening!