What exactly did Google Health Coach hallucinate?

According to Android Authority (published 2026-05-14), Google's Health Coach told reviewer Will Sattelberg of 9to5Google that he had completed a 5-mile run he never actually took. The coach referenced the phantom run as if it were real, grounded data. When Sattelberg challenged the claim, the coach acknowledged the error but then suggested he might have simply forgotten to record the run, effectively blaming the user for the AI's fabrication.

Is this a Google-specific problem or a general AI problem?

It is a general AI problem that affects any fitness app using a large language model without tight grounding in the user's actual logged data. LLMs are trained to produce plausible-sounding outputs. When the prompt includes an instruction like "summarize the user's recent activity", the model fills gaps in the context with statistically likely activity patterns rather than refusing to answer. Google's system is not uniquely flawed. It ran into the same architectural problem that any general-purpose AI coach faces when its retrieved context is incomplete.

Does Zealova hallucinate workout data?

Zealova uses LLMs, and LLMs can hallucinate. The distinction is not immunity. It is data grounding. Zealova's workout plan generator and chat coach receive the user's actual logged workout history as structured context before any AI call. The model is asked to reason about real data, not infer it. That does not eliminate hallucination, but it removes the specific failure mode of fabricating activity: there is no gap to fill because the history is in the prompt. Zealova can still hallucinate exercise recommendations, form cues, or nutritional estimates, and those are genuine limitations to be aware of.

What is data grounding in AI fitness apps?

Grounding means attaching real, verified data to the context window before the model generates a response. In fitness, that means the AI reads your actual logged sets, reps, weights, and timestamps before answering a question about your training. An ungrounded model is asked to reason about a user it knows little about and fills the gap with plausible-sounding output. A grounded model is asked to reason about specific data. A February 2026 developer guide on LLM grounding (neuledge.com) cites a 42-68% reduction in hallucinations for RAG (retrieval-augmented generation) systems compared to prompting the model with no retrieved context.

Why do large-scale health apps struggle with this more than smaller focused apps?

Large aggregators pull from many data sources: wearable steps, inferred calorie burn, connected third-party apps, manual logs, and health records. The more sources, the more opportunities for missing, conflicting, or low-quality data to enter the context. A focused app that only tracks what a user explicitly logs has a cleaner, more verifiable context to give the model. That does not make it better at everything, but it does make the grounding more reliable for the specific domain it covers.

Should I trust AI fitness app workout summaries?

Cross-check any AI summary against your raw logged data, especially for anything that sounds surprising. This applies to Google Health, Zealova, and any other AI fitness coach. If the coach references an activity you do not remember doing, check the source: did your phone step counter infer it, did a connected app log it, or did the AI fabricate it? The fastest check is to pull your actual workout log and compare. For critical decisions like injury recovery load management, always verify the numbers yourself.

Is Google Health a bad app because of this incident?

No. Google Health is genuinely strong on wearable biometrics, sleep tracking, Apple Health integration, MyFitnessPal (MFP) connectivity, and ecosystem breadth. The hallucination issue is a real limitation worth knowing about, not a reason to dismiss the app entirely. The honest recommendation: cross-check AI summaries against your actual data regardless of which app you use, and weight your trust in AI summaries by how explicitly the app shows you the source of each data point.

Published 2026-05-18 by Sai

Google Health Coach Invented a Workout. Here's Why AI Fitness Apps Do This.

On May 14, 2026, Will Sattelberg of 9to5Google published a hands-on review of Google Health Coach. The coach told him he had completed a 5-mile run that he never actually did. When he pushed back, the coach acknowledged the fabrication, then suggested he might have forgotten to record the run.

This is not a Google-specific bug. It is a predictable failure mode for any AI fitness product that asks a large language model to summarize a user's activity without first grounding that model in the user's actual logged data. The model fills gaps with statistically plausible output. A 5-mile run is plausible for an active person. So the model says it happened.

This post covers what happened, why it happens at a technical level, and what actually separates a grounded AI fitness coach from an ungrounded one. It also covers where Google Health is genuinely strong: wearable breadth, a free tier, and ecosystem integrations. The goal is not to sell Zealova. It is to give you a mental model for evaluating any AI fitness product's claims.

Zealova AI coach showing workout plan grounded in real logged history

What happened, exactly

Android Authority covered Sattelberg's review the same day it was published (May 14, 2026). This account draws on two sources: the Android Authority article (androidauthority.com) and the original 9to5Google hands-on. The full sequence:

1. Sattelberg opens Google Health Coach in the pre-launch period (before the May 19 public launch).
2. The coach summarizes his recent activity. It correctly references sleep data and a real prior workout.
3. It then cites a 5-mile run Sattelberg never completed.
4. Sattelberg challenges the claim. The coach concedes the run was fabricated.
5. The coach then suggests Sattelberg might have "failed to record" the run, deflecting blame to the user.

The reviewer also noted the advice itself was "quite basic" and "excessively verbose," with length substituting for substance. Both are hallmarks of an LLM that lacks strong grounding and compensates with generic, high-confidence-sounding output.

This is a paid product. Google Health Premium costs $9.99/month and launches May 19. The hallucination surfaced in pre-launch access, just days before the full public rollout.

Key takeaways

5-mile run: fabricated by Google Health Coach in a pre-launch hands-on (May 14, 2026). Source: Android Authority / 9to5Google, 2026-05-14.

42-68%: hallucination reduction when LLMs are grounded with retrieved real data (RAG) vs ungrounded prompts. Source: neuledge.com, February 2026.

2 billion: monthly users exposed to Google AI Overviews, which a January 2026 Guardian investigation found spread false health information. Source: Guardian / almcorp.com, January 2026.

$9.99/mo: cost of Google Health Premium, the tier that includes the AI Coach that produced the hallucination. Source: store.google.com, verified 2026-05-14.

Why AI fitness apps hallucinate workout data

Large language models do not know what you did yesterday. They predict the next token based on the patterns in their training data. When you ask a fitness coach chatbot "what have I been up to this week?", the model needs your workout data injected into the context window before it can answer accurately.

If that injection is missing, incomplete, or noisy, the model does not say "I don't know." It generates a plausible-sounding response. Active users run. Active users run about 3-7 miles. So the model produces a run that fits the pattern. This is not malice. It is how next-token prediction works when the grounding is weak.
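One way to handle the missing-data case is at the application layer: decide before calling the model whether there is anything to summarize. The sketch below is illustrative only. `LoggedWorkout`, `build_summary_request`, `NO_DATA_REPLY`, and the prompt wording are hypothetical names and choices, not taken from Google Health or Zealova.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class LoggedWorkout:
    """One entry the user explicitly logged (hypothetical schema)."""
    day: date
    activity: str
    detail: str


# Fixed reply the app can return without involving the model at all.
NO_DATA_REPLY = "I don't have any logged workouts for that period."


def build_summary_request(log: list[LoggedWorkout], today: date,
                          window_days: int = 7) -> Optional[str]:
    """Return a prompt containing the real log, or None if nothing was logged.

    Returning None lets the caller answer with NO_DATA_REPLY instead of
    asking the model to summarize an empty context.
    """
    start = today - timedelta(days=window_days)
    recent = [w for w in log if start <= w.day <= today]
    if not recent:
        return None
    lines = [f"- {w.day.isoformat()}: {w.activity} ({w.detail})"
             for w in sorted(recent, key=lambda w: w.day)]
    return (
        "Summarize ONLY the workouts listed below. "
        "Do not mention any activity that is not in the list.\n"
        + "\n".join(lines)
    )
```

The instruction in the prompt is a request, not a guarantee. The part that does not depend on model behavior is the early `None` return: with no log, the model is never asked.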

Three conditions that make hallucination more likely:

Missing context. The model is asked to summarize activity but the user's actual log is not included in the prompt. The model infers activity from training data priors.
Noisy multi-source aggregation. Data comes from a wearable step counter, a third-party connected app, a manual entry, and a health record. These sources conflict or have gaps. The model smooths over the gaps with inference.
Optimizing for plausible output, not verifiable output. A model trained with RLHF (reinforcement learning from human feedback) learns that confident, coherent responses get better ratings. Generic, verbose advice scores well. Saying "I don't have enough data to answer" scores poorly. So the model says something rather than nothing.

All three conditions were likely present in the Google Health Coach incident. The coach correctly referenced real data in some places (sleep, a real workout), which suggests grounding was partial. The 5-mile run appears to have been the model filling a gap it could not verify.

The aggregator problem: more data sources, more gaps

Google Health is an aggregator. It pulls from your Fitbit, Apple Health, MyFitnessPal, Peloton, connected apps, manual logs, and health records. This breadth is one of its genuine strengths for biometric tracking. But it creates a specific challenge for AI summarization.

When that many data sources feed into a context window, each with different schemas, update frequencies, and reliability levels, the resulting context is messy. Gaps appear between sources. A step count from the wearable might not align with a workout from a connected app. An inferred calorie burn from the phone accelerometer exists alongside a manual food log entry.

The model has to reason across all of this. When it cannot reconcile a gap, it fills it. A specialized app that only tracks what a user explicitly logs has a narrower but cleaner context. The model has less to hallucinate across.
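To make the reconciliation problem concrete, here is a minimal sketch of one possible mitigation: keep the source of every record and flag disagreements explicitly, rather than leaving the model to smooth them over. The `SourceRecord` schema, the `reconcile` function, and the 0.5 km tolerance are all assumptions made for illustration. Nothing here describes how Google Health actually merges data.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SourceRecord:
    """A single activity record from one data source (hypothetical schema)."""
    source: str        # e.g. "wearable", "manual", "connected_app"
    day: date
    activity: str
    distance_km: float


def reconcile(records: list[SourceRecord],
              tolerance_km: float = 0.5) -> list[dict]:
    """Group records by (day, activity) and flag groups whose sources disagree."""
    groups: dict[tuple[date, str], list[SourceRecord]] = defaultdict(list)
    for r in records:
        groups[(r.day, r.activity)].append(r)

    merged = []
    for key in sorted(groups):
        day, activity = key
        recs = groups[key]
        distances = [r.distance_km for r in recs]
        merged.append({
            "day": day.isoformat(),
            "activity": activity,
            # Provenance stays attached so a summary can cite its source.
            "sources": sorted(r.source for r in recs),
            "distances_km": {r.source: r.distance_km for r in recs},
            # True when sources report distances further apart than the tolerance.
            "conflict": (max(distances) - min(distances)) > tolerance_km,
        })
    return merged
```

A context built from this output can show the model, and the user, that two sources disagree about a run instead of presenting a single blended number.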

Worth noting

This is not unique to Google Health. Any app that pulls from multiple sources (wearable, phone sensors, manual logs, connected apps) faces the same aggregation noise problem. The larger the ecosystem, the harder the grounding problem becomes. Smaller, more focused apps trade ecosystem breadth for cleaner context.

How grounding helps, and what Zealova does differently

Grounding means attaching verified data to the model's context before it generates a response. In fitness, that means: before the AI answers a question about your training, it reads your actual logged sets, reps, weights, exercises, and dates.

Research on retrieval-augmented generation (RAG) systems reports that grounding lowers hallucination rates substantially. A developer guide on LLM grounding (neuledge.com, published February 2026) cites a 42-68% reduction in hallucinations when models are given retrieved real context versus no context. A clinical RAG evaluation (MDPI Electronics 14(21):4227) reports a hallucination rate of 5.8% for systems that add a self-reflective verification layer on top of retrieval.

Zealova's workout plan generator and coach do this: before any AI call related to your training, the system pulls your actual logged workout history and injects it as structured context. The model is not asked to guess what you have been doing. It reads what you logged.
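As a rough illustration of what "structured context" can look like, the sketch below serializes logged sets into JSON grouped by exercise. It is not Zealova's actual implementation: the `LoggedSet` schema, the `to_structured_context` function, and the `logged_history` key are hypothetical.

```python
import json
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LoggedSet:
    """One logged set (hypothetical schema, not Zealova's data model)."""
    day: str          # ISO date, e.g. "2026-05-12"
    exercise: str
    reps: int
    weight_kg: float


def to_structured_context(sets: list[LoggedSet]) -> str:
    """Serialize logged sets as JSON grouped by exercise, oldest first."""
    by_exercise: dict[str, list[dict]] = defaultdict(list)
    # ISO dates sort correctly as strings, so a plain sort gives oldest first.
    for s in sorted(sets, key=lambda s: (s.exercise, s.day)):
        by_exercise[s.exercise].append(
            {"day": s.day, "reps": s.reps, "weight_kg": s.weight_kg}
        )
    return json.dumps({"logged_history": by_exercise}, indent=2, sort_keys=True)
```

The resulting string would be placed in the prompt ahead of the user's question, so the model reads dated sets and weights rather than inferring them.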

Zealova per-exercise workout history showing real logged sets and weights

What this does NOT mean

Zealova is not immune to hallucination. No LLM-based product is. Grounding removes the specific failure mode of fabricating logged activity. There is no gap to fill because your history is in the prompt. But Zealova can still:

Hallucinate exercise recommendations for equipment it was not told you own
Produce overconfident nutritional estimates from food photos
Generate generic advice when the logged data is thin (e.g. a new user with 2 sessions logged)
Make errors in form cues or exercise descriptions pulled from the exercise library

Cross-checking AI output against your raw data is good practice regardless of which app you use. If the coach references something that surprises you, check the source.
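The cross-check can also be automated in a simple form. The sketch below pulls run distances out of a coach summary with a regular expression and reports any that have no matching entry in the log. It is deliberately narrow: it only recognizes phrases like "5-mile run", it ignores dates, and `LoggedRun`, `RUN_CLAIM`, and `unsupported_run_claims` are hypothetical names for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class LoggedRun:
    """A run the user actually logged (hypothetical schema)."""
    day: str
    miles: float


# Matches phrases such as "5-mile run" or "3.1 mile run".
RUN_CLAIM = re.compile(r"(\d+(?:\.\d+)?)[- ]mile run", re.IGNORECASE)


def unsupported_run_claims(summary: str, log: list[LoggedRun],
                           tolerance: float = 0.25) -> list[float]:
    """Return run distances mentioned in the summary with no matching log entry."""
    claimed = [float(m) for m in RUN_CLAIM.findall(summary)]
    return [c for c in claimed
            if not any(abs(c - r.miles) <= tolerance for r in log)]
```

For example, with a log containing only a 3.1-mile run, a summary that mentions both a "5-mile run" and a "3.1 mile run" would return `[5.0]`: the one claim that needs a closer look.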

Where Google Health is genuinely strong

Writing about a competitor's failure without acknowledging what they do well is not useful analysis. Google Health has real strengths.

+ Wearable biometrics that no phone-only app can match
Continuous heart rate, HRV, SpO2, sleep stages, readiness scores. These require hardware. If you own a Fitbit or Pixel Watch, Google Health surfaces data that a phone-only app cannot approximate. That data is genuinely useful for recovery and sleep optimization.

+ Free tier with no hardware gate for basic logging
Google Health has a free option for basic activity and food tracking. Zealova requires a subscription after a 7-day trial. If someone needs a free calorie logger with wearable integration, Google Health is the obvious choice.

+ Ecosystem breadth: Apple Health, MyFitnessPal, Peloton, medical records
Google Health connects to more third-party services than most competitors. If your health data is spread across multiple apps and devices, a single aggregator that pulls it together has real value, as long as you understand that aggregation is also what creates the hallucination conditions described above.

+ 3-month free trial vs 7 days
Google Health Premium offers 3 months free for new users. That is a longer evaluation window than Zealova's 7-day trial. For a product that relies on wearable baseline data to be useful, a longer trial makes sense, since the coaching improves as the model accumulates real data.

+ iOS support right now
Zealova is Android-only. Google Health is live on iOS and Android. If you are on iPhone, Google Health is an option; Zealova is not yet.

FAQ

Methodology and disclosure

The Google Health Coach hallucination incident is sourced from Android Authority (androidauthority.com, published 2026-05-14) and the original hands-on by Will Sattelberg of 9to5Google. RAG hallucination reduction figures are from neuledge.com (LLM grounding guide, published 2026-02-20). The Guardian AI Overviews investigation (January 2026) is cited via almcorp.com summary. Clinical RAG figures (5.8% hallucination) are from MDPI Electronics 14(21):4227, "Evaluating Retrieval-Augmented Generation Variants for Clinical Decision Support." Google Health pricing verified at store.google.com, 2026-05-14.

I am the founder of Zealova. I have a direct financial interest in Zealova being the better product. I have tried to concede every honest Google Health advantage. If something looks wrong, email me: sai@zealova.com.

Try Zealova free for 7 days

AI workout plans grounded in what you actually logged. No hardware required. Android live now.


Last updated 2026-05-18 by Sai. Sources: Android Authority (androidauthority.com, 2026-05-14); 9to5Google original hands-on (2026-05-14); neuledge.com LLM grounding guide (2026-02-20); MDPI Electronics 14(21):4227; Guardian AI Overviews investigation summary (almcorp.com, January 2026); store.google.com pricing (verified 2026-05-14). Zealova pricing and features as of 2026-05-18. Refresh cycle: 60 days or if new hallucination incidents are reported.