AI Chatbots Enter Patient Portals: A New Frontier in Healthcare with Promise and Peril

As a growing number of Americans turn to large language models (LLMs) for health advice, healthcare systems across the nation are strategically integrating their own branded AI chatbots into patient portals, aiming to harness this popular digital tool while guiding individuals toward their established services. This burgeoning trend, however, immediately raises critical questions and concerns within the United States’ complex and often underperforming healthcare landscape. Executives champion these new offerings as a crucial convenience, meeting patients where they are and promoting digital equity, while also positing their proprietary chatbots as a safer alternative to the myriad commercial versions already in widespread use.
"We are at an inflection point in healthcare," stated Allon Bloch, CEO of clinical AI company K Health, underscoring the accelerating demand for digital health solutions. "Demand is accelerating, and patients are already using AI to navigate their lives." K Health is actively collaborating with Hartford HealthCare in Connecticut to deploy its PatientGPT chatbot to tens of thousands of existing patients, marking a significant step in this digital transformation. Bloch further elaborated on this vision, asserting, "The question isn’t whether AI will shape healthcare, it’s about how we do it in a safe, transparent way, inside a health system that connects to your medical records and your care team. PatientGPT represents that turning point."
Despite the industry’s enthusiasm, a chorus of experts expresses caution regarding these rapid rollouts. Fundamental questions persist: are these chatbots truly ready for such prominent branded debuts? Will there be sufficient, continuous monitoring of their performance? What will the framework for liability look like if a chatbot provides inaccurate or harmful advice? Moreover, critics ponder whether these AI tools genuinely address the root causes of patient access and care problems, or merely offer a technological veneer over deeper systemic issues. Adam Rodman, a clinical reasoning researcher and internist at Beth Israel Deaconess Medical Center in Boston, captured this sentiment, telling Stat News recently, "It’s a tempting idea," but stressing the current lack of empirical evidence demonstrating that integrating chatbots into health systems tangibly improves patient outcomes. "We’re not there yet," he concluded.
The Troubled Landscape of U.S. Healthcare: A Catalyst for AI Adoption
To fully appreciate the potential and pitfalls of AI in this context, it is imperative to understand the broader backdrop of the U.S. healthcare system. Despite being one of the wealthiest nations globally, the United States consistently lags behind other high-income countries in key health metrics. Americans face lower life expectancy, a higher incidence of avoidable deaths, elevated rates of maternal and infant mortality, and a greater prevalence of obesity and chronic conditions. Access to care remains a significant challenge, with a 2023 report by the National Association of Community Health Centers (NACHC) revealing that nearly one-third of Americans, more than 100 million people, lack a primary care provider. This deficit contributes to worse overall health outcomes and creates fertile ground for alternative, often unverified, sources of health information.
Into this complex mix, artificial intelligence has arrived, offering a seemingly accessible and immediate solution. Anyone with an internet connection can engage with comforting, confident-sounding LLM-powered chatbots, and a substantial segment of the American populace is turning to these tools for health and medical inquiries. A KFF (Kaiser Family Foundation) poll published last month illuminated this trend, finding that a striking one in three adults has used an AI chatbot for health information, a figure comparable to those using social media for similar purposes.
The KFF poll further elucidated the motivations and behaviors of these AI users. A significant 41 percent reported uploading personal medical information, such as test results, to these AI tools. When asked about their primary reasons for consulting AI, a notable 19 percent cited an inability to afford care, while 18 percent pointed to the absence of a regular healthcare provider or difficulties in securing timely appointments. For a majority, 65 percent, the simple desire for a quick answer was the driving force. Alarmingly, many respondents indicated they did not subsequently consult a doctor after their AI interactions; this included 58 percent who sought advice on mental health issues and 42 percent who inquired about physical health concerns. This trend raises profound questions about patient safety and the potential for delayed or missed diagnoses.
A Chronology of Emerging Risks and Cautious Rollouts
The rapid adoption of AI to bridge healthcare gaps has, predictably, given rise to mounting cautionary tales and alarming incidents. These examples underscore critical pitfalls related to both the quality of user prompts and the information LLMs "hoover" up from the vast, often unfiltered, expanse of the internet.

A study published in Nature Medicine in February assessed the medical accuracy of leading LLMs (specifically GPT-4o, Llama 3, and Command R+) in real-world interactions involving nearly 1,300 participants. When researchers provided the LLMs with precisely formulated text describing specific medical scenarios, the models identified the correct medical condition approximately 95 percent of the time and suggested an appropriate next step, such as visiting an emergency department, in about 56 percent of cases. However, performance declined sharply when participants used their own, less structured prompts to ask about the same scenarios. Under these conditions, the LLMs correctly identified medical conditions only about one-third of the time and steered participants to the appropriate next step in just 43 percent of instances.
Lead author Andrew Bean, an AI researcher at Oxford University, highlighted the study’s core finding to NPR last month: "people don’t know what they are supposed to be telling the model." Senior author Adam Mahdi echoed this concern, adding, "The disconnect between benchmark scores and real-world performance should be a wake-up call for AI developers and regulators." This research starkly illustrates that the efficacy of AI chatbots is heavily dependent on the user’s ability to articulate their symptoms and questions accurately, a skill often lacking in individuals seeking medical advice.
Further compounding concerns about information integrity, Nature News reported just last week that LLMs were actively discussing "bixonimania," a fabricated skin condition invented entirely by researchers in Sweden. The team had intentionally posted two fake studies about this condition online to gauge how easily medical misinformation could infiltrate and be propagated by AI tools. The outcome confirmed their fears: the misinformation was integrated far too easily. The studies have since been removed, but the incident serves as a chilling demonstration of LLMs’ vulnerability to ingesting and disseminating unverified, or even deliberately false, medical information.
Despite these clear concerns, several healthcare systems are pressing ahead with their AI chatbot initiatives. Hartford HealthCare and K Health’s PatientGPT, for instance, began its beta rollout to a select group of patients last month and is slated for expansion to tens of thousands more this week, according to Stat News. Hartford HealthCare published a pre-print study (not yet peer reviewed) involving 75 participants, suggesting that its iterative stress testing, a “red teaming” approach, successfully reduced the chatbot’s failure rate over time, particularly in “high-risk” scenarios. The study claimed a reduction in the failure rate for high-risk situations from 30 percent to 8.5 percent. However, the real-world implications of that 8.5 percent failure rate, and the severity of the potential failures behind it, remain ambiguous and a point of considerable debate.
PatientGPT operates in two distinct modes. The first is a generic medical question-and-answer mode, which may incorporate relevant information about the patient from their records. The second is a more structured “medical intake” mode. In this mode, once a patient begins providing symptom information, the chatbot becomes less conversational and systematically navigates clinical flowcharts. After gathering sufficient information, the AI agent proposes a next step, which could range from scheduling a follow-up appointment with primary care to recommending urgent or emergency care. Critically, if emergency care is advised, the chatbot stops responding to further questions, presumably to prevent interactions that might delay critical care.
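K Health has not published PatientGPT’s internals, but the intake behavior described above follows a familiar pattern: a guided walk through a decision tree that ends in a disposition, with the session locking once emergency care is advised. The minimal Python sketch below illustrates only that pattern; the flowchart contents, names, and questions are invented for illustration, not drawn from the actual product.

from dataclasses import dataclass
from enum import Enum, auto

class Disposition(Enum):
    SELF_CARE = auto()
    PRIMARY_CARE = auto()
    URGENT_CARE = auto()
    EMERGENCY = auto()

# A toy "clinical flowchart." Each node holds a yes/no question, the
# disposition reached on "yes," and the next node on "no" (None means
# self-care). All content here is hypothetical.
FLOWCHART = {
    "chest_pain": ("Are you having chest pain right now?",
                   Disposition.EMERGENCY, "breathing"),
    "breathing": ("Are you short of breath at rest?",
                  Disposition.URGENT_CARE, "duration"),
    "duration": ("Have your symptoms lasted more than three days?",
                 Disposition.PRIMARY_CARE, None),
}

@dataclass
class IntakeSession:
    node: str = "chest_pain"
    disposition: Disposition | None = None
    locked: bool = False  # once True, the session answers no further questions

    def next_question(self) -> str | None:
        """Return the next intake question, or None once a disposition is set."""
        if self.disposition is not None:
            return None
        return FLOWCHART[self.node][0]

    def answer(self, yes: bool) -> None:
        if self.locked:
            return  # mirrors the reported behavior: no replies after an emergency referral
        _, on_yes, on_no = FLOWCHART[self.node]
        if yes:
            self.disposition = on_yes
            self.locked = self.disposition is Disposition.EMERGENCY
        elif on_no is None:
            self.disposition = Disposition.SELF_CARE
        else:
            self.node = on_no

session = IntakeSession()
session.answer(yes=True)  # chest pain -> EMERGENCY, and the session locks
print(session.disposition, session.locked)

In the real system, an LLM presumably maps a patient’s free-text replies onto branches like these, which is precisely where the prompt-quality problems documented in the Nature Medicine study could resurface.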
Hartford HealthCare has stated its commitment to continuous monitoring of PatientGPT’s performance during its broader deployment. During the piloting phase, every interaction was reportedly monitored by humans. However, with the expanded rollout, human review will scale down to approximately 20 interactions per day, with a separate AI agent overseeing the remaining interactions. Additionally, batch studies of every 1,000 conversations will be conducted. Jeff Flaks, president and CEO of Hartford HealthCare, reiterated the organization’s strategic imperative last month: "We’re on a mission to be the most consumer-centric health system in the country. So much of healthcare has traditionally been organized around the provider, but it’s clear we have to meet people where they are and where they desire to be met. With PatientGPT, we are introducing a new tool that supports your health and provides access to a 24/7 care team, while protecting the human relationships at the heart of care."
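The monitoring scheme Stat News describes, roughly 20 human-reviewed interactions per day, an AI agent watching the rest, and batch studies of every 1,000 conversations, amounts to a sampling and routing problem. Here is a minimal, hypothetical Python sketch of such a router; Hartford HealthCare’s actual sampling strategy and tooling are not public, so every name below, and the choice of uniform sampling, is an assumption.

import random
from collections import deque

HUMAN_REVIEWS_PER_DAY = 20  # figure reported for the expanded rollout
BATCH_SIZE = 1_000          # cadence of the reported batch studies

class OversightRouter:
    def __init__(self, expected_daily_volume: int):
        # Sample uniformly so that roughly 20 conversations per day reach
        # a human reviewer, whatever the total volume turns out to be.
        self.human_rate = min(1.0, HUMAN_REVIEWS_PER_DAY / expected_daily_volume)
        self.pending_batch: deque[str] = deque()

    def route(self, conversation_id: str) -> str:
        """Assign one finished conversation to a review channel."""
        self.pending_batch.append(conversation_id)
        if len(self.pending_batch) >= BATCH_SIZE:
            self.run_batch_study()
        return "human_review" if random.random() < self.human_rate else "ai_review"

    def run_batch_study(self) -> None:
        batch = [self.pending_batch.popleft() for _ in range(BATCH_SIZE)]
        # Placeholder: in practice this would kick off an offline audit job.
        print(f"queuing batch study of {len(batch)} conversations")

router = OversightRouter(expected_daily_volume=2_000)
print(router.route("conv-0001"))  # prints "human_review" or "ai_review"

Even under this sketch’s assumptions, one tension is visible: uniform sampling means human reviewers mostly see routine conversations, so the rare high-risk failures the pre-print worried about would be caught chiefly by the AI overseer or the delayed batch audits.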
A More Cautious Approach: Epic’s Emmie
Beyond K Health’s PatientGPT, Epic Systems, the electronic health records (EHR) behemoth behind the widely used MyChart patient portal, is also rolling out its own AI chat assistant named Emmie. Several health systems, including California-based Sutter Health and Indiana-based Reid Health, are progressively introducing Emmie to users via their online patient portals.
During an executive address last year, Judy Faulkner, Epic’s founder and CEO, outlined Emmie’s intended functionalities. As reported by Becker’s Hospital Review, Emmie is designed to assist patients in preparing for appointments by drafting visit agendas and, post-appointment, to help them comprehend test results and answer follow-up questions by summarizing information already present in their medical charts.
Sutter Health’s FAQ page on Emmie explicitly details the chatbot’s capabilities: it can "answer general health questions, and find or summarize information already visible in your chart—such as notes, results, past visits or messages." Crucially, Sutter Health emphasizes strict limitations: Emmie "doesn’t give personalized medical advice or make care decisions. Emmie is not intended for use in the diagnosis of disease or other conditions, or in the cure, mitigation, treatment or prevention of disease. Emmie is also not intended to replace, modify or be substituted for a physician’s professional clinical judgment." This clear demarcation underscores a more conservative approach, positioning Emmie as a sophisticated information assistant rather than a diagnostic or treatment tool. Currently, Emmie is offered only to a small subset of Sutter patients, who can provide feedback on the chatbot’s responses through simple thumbs-up or thumbs-down reactions, facilitating continuous refinement.

Reid Health has followed Sutter Health as the second adopter of Emmie. In a recent interview with Becker’s Hospital Review, Muhammad Siddiqui, CIO at Reid Health, explained the strategic rationale. Given that Reid Health primarily serves rural communities, the organization views Emmie as a valuable tool to broaden access to care and assist patients in navigating the often-intimidating healthcare system. "Patients want clearer answers, easier access and more guidance between visits," Siddiqui remarked. "If we can provide that inside the health system experience, in a way that is connected to trusted clinical workflows, that is a much better path than leaving people on their own with public tools that may or may not be accurate." This perspective highlights the appeal of integrating AI within a controlled, institutionally vetted environment, contrasting it with the unregulated and potentially hazardous realm of public AI models.
Broader Implications, Ethical Challenges, and the Path Forward
The advent of AI chatbots in patient portals presents a multifaceted challenge, transcending mere technological implementation to touch upon profound ethical, legal, and societal implications.
Accuracy and Patient Safety: The most immediate concern remains the accuracy of the advice provided. Even a small percentage of "high-risk failures," as cited in the PatientGPT pre-print, can have severe consequences in a medical context. A misdiagnosis or inappropriate recommendation could lead to delayed treatment, worsening conditions, or even fatalities. The Nature Medicine study starkly revealed that ordinary users struggle to prompt LLMs effectively for accurate medical advice, highlighting a critical interface problem that health systems must address.
Liability and Regulation: The question of liability in cases of erroneous AI-generated medical advice is largely uncharted territory. Existing medical malpractice laws are designed for human providers, not algorithms. If a chatbot, endorsed by a health system, provides harmful advice, who bears the responsibility? The AI developer? The health system that deployed it? The individual clinician overseeing the patient’s care? Regulatory bodies like the U.S. Food and Drug Administration (FDA) are still developing frameworks for AI as a medical device, but the rapid pace of development often outstrips regulatory capacity. Clear legal guidelines are desperately needed to protect both patients and healthcare providers.
Data Privacy and Security: The KFF poll revealed that a significant portion of users upload personal medical information to AI chatbots. While branded chatbots within patient portals are ostensibly more secure and HIPAA-compliant, the handling, storage, and processing of sensitive patient data by AI models raise new privacy considerations. Robust cybersecurity measures and transparent data governance policies are paramount to maintaining patient trust.
The Human Element and Depersonalization of Care: While AI promises efficiency, there is a risk of eroding the vital human connection in healthcare. Complex medical decisions, emotional support, and the nuanced understanding of a patient’s social and psychological context often require human empathy and judgment that AI cannot replicate. For sensitive areas like mental health, where many AI users forgo professional follow-up, this depersonalization could be particularly detrimental.
Digital Equity vs. Exacerbated Disparities: While proponents argue that chatbots promote digital equity by offering 24/7 access, they could also inadvertently widen existing disparities. Patients lacking reliable internet access, digital literacy, or comfort with technology may be left behind. The promise of "meeting people where they are" must be tempered with efforts to ensure equitable access and usability for all patient demographics.
Official Responses and Stakeholder Perspectives: Medical associations, such as the American Medical Association (AMA), have generally urged caution regarding AI in healthcare, emphasizing the necessity of human oversight, rigorous validation, and adherence to ethical principles. Patient advocacy groups will likely focus on informed consent, data protection, and ensuring that AI tools augment, rather than replace, genuine access to human clinicians. AI developers, while acknowledging risks, continue to stress the potential for efficiency and improved access, often highlighting their internal testing and monitoring protocols as a safeguard.
The Future Landscape: The integration of AI chatbots into patient portals represents an undeniable shift in healthcare delivery. These tools have the potential to streamline administrative tasks, provide instant information, and guide patients to appropriate care levels. However, for these innovations to truly benefit patients and strengthen the healthcare system, rather than create new vulnerabilities, they must be built on a foundation of rigorous scientific validation, robust regulatory oversight, transparent accountability, and an unwavering commitment to patient safety and ethical practice. The "inflection point" that healthcare currently faces demands not just technological advancement, but also profound ethical deliberation and proactive policy-making to ensure that AI truly serves the best interests of patients. Without robust evidence demonstrating improved patient outcomes and a clear framework for managing risks, the promise of AI in healthcare remains largely hypothetical, underscoring Adam Rodman’s cautionary assessment: "We’re not there yet."