March 8, 2026: The shift from Googling disease symptoms to asking artificial intelligence chatbots about medical emergencies has become commonplace. Millions of people turn to AI chatbots in the middle of the night for guidance on chest tightness, a child’s fever, or thoughts of self-harm. The chatbots always have an answer, but new research warns they are not equipped to handle such medical emergencies.
According to an independent study, ChatGPT Health, OpenAI’s dedicated health assistant launched in January 2026, frequently fails to assess medical situations accurately or adequately. The finding echoes an earlier Oxford study, which concluded that AI chatbots performed no better, and often worse, than internet searches or people’s own judgment.
The study was conducted by researchers at the Icahn School of Medicine at Mount Sinai and published in the peer-reviewed journal Nature Medicine in February. It is the first independent safety evaluation of ChatGPT Health since the tool’s public launch. The research team designed 60 structured clinical scenarios spanning 21 medical specialties, ranging from minor ailments manageable at home to life-threatening emergencies. Three independent physicians determined the correct urgency level for each scenario using guidelines from 56 medical societies. Each scenario was then tested across 16 different contextual conditions, including variations in race, gender, and social dynamics.
More than half of emergencies missed
ChatGPT Health failed to direct users toward emergency care in 52% of cases that physicians independently classified as genuine medical emergencies. In conditions such as diabetic ketoacidosis and impending respiratory failure, the AI recommended a 24-to-48-hour evaluation window rather than an immediate visit to the emergency department — a dangerous delay that could cost lives.
Lead author Dr. Ashwin Ramaswamy, an instructor of urology at Mount Sinai, explained the pattern: “ChatGPT Health performed well in textbook emergencies such as stroke or severe allergic reactions. But it struggled in more nuanced situations where the danger is not immediately obvious, and those are often the cases where clinical judgment matters most.”
At the other end of the spectrum, the AI over-triaged nearly 65% of non-urgent cases, recommending an unnecessary clinician visit when home care would have been sufficient. The pattern revealed what researchers called an “inverted U-shape” of performance: competent in the middle, dangerously unreliable at the extremes.
Troubling blind spot: Suicide risk alerts
Perhaps the most alarming discovery concerned the tool’s mental health safeguards. ChatGPT Health is designed to direct high-risk users to the 988 Suicide and Crisis Lifeline. But researchers found the system’s alerts were “inverted relative to clinical risk”: the bot performed more reliably in lower-risk situations and sometimes failed to trigger when users described a specific plan of self-harm.
Girish Nadkarni, Mount Sinai’s chief AI officer and a study co-author, called this finding “particularly surprising and concerning.” He noted that in clinical practice, specificity of method is a recognized indicator of elevated immediate danger, exactly the scenario where ChatGPT Health’s alerts were less likely to fire. This concern carries special weight: OpenAI itself has disclosed that more than one million ChatGPT users each week send messages with explicit indicators of suicidal planning or intent.
A broader pattern: AI and medical misinformation
The Mount Sinai findings do not stand alone. A separate study published in The Lancet Digital Health in early 2026 by researchers at the same institution analysed over one million prompts across 20 leading AI models, including ChatGPT, Meta’s Llama, and Google’s Gemma. It found that, on average, AI systems accepted fabricated medical claims roughly 32% of the time when those claims were phrased in authoritative medical language. Smaller or less advanced models accepted false claims more than 60% of the time.
Meanwhile, Oxford University researchers published a randomised trial involving nearly 1,300 participants in early 2026, concluding that people who used AI chatbots to identify health conditions performed no better and often worse than those relying on traditional methods such as online searches or their own judgment. Dr. Rebecca Payne of Oxford’s Nuffield Department of Primary Care Health Sciences was direct: “Despite all the hype, AI just isn’t ready to take on the role of the physician.”
The risks are amplified by scale and context. OpenAI has reported that approximately 40 million people use ChatGPT daily for health-related questions. Critically, 70% of those conversations take place outside normal clinic hours, and more than 580,000 weekly health inquiries in the United States originate from so-called “hospital deserts”, areas more than a 30-minute drive from the nearest hospital. For people in these communities, ChatGPT Health may be their only accessible first point of contact for urgent guidance.
Isaac S. Kohane, MD, PhD, Chair of Biomedical Informatics at Harvard Medical School, summarised the challenge plainly: “LLMs have become patients’ first stop for medical advice, but in 2026, they are least safe at the clinical extremes, where judgment separates missed emergencies from needless alarm. When millions of people are using an AI system to decide whether they need emergency care, the stakes are extraordinarily high. Independent evaluation should be routine, not optional.”
OpenAI responds
OpenAI acknowledged the research but pushed back on its methodology. A company spokesperson argued that the study did not reflect how ChatGPT Health is typically used, noting it is designed for multi-turn conversations with follow-up questions, not one-shot prompts. The company added that ChatGPT Health is still available only to a limited number of users and that safety improvements are ongoing. OpenAI maintains that the product is “not intended to diagnose or treat disease.”
Experts are consistent in their guidance: AI chatbots are not a substitute for medical professionals. If you are experiencing symptoms that concern you, especially symptoms that come on suddenly or intensely, contact a licensed healthcare provider, call emergency services, or go to an emergency department. If you or someone you know is experiencing a mental health crisis or suicidal thoughts, contact suicide and crisis lifelines.
The nonprofit patient safety organization ECRI has ranked misuse of AI chatbots in healthcare as the single greatest health technology hazard for 2026 — above issues like device malfunctions and medication errors. As AI tools become more woven into daily life, independent research, regulatory oversight, and informed public awareness remain the most important safeguards.

