I Asked Gemini Spark for a Doc and It Cited a Product That Doesn't Exist

Google announced Gemini Spark at I/O 2026, and the pitch is genuinely different from the Gemini chatbot most people have used. Spark is agentic and always on. It runs 24/7 on Google Cloud VMs, takes proactive multi-step action instead of answering one prompt at a time, and plugs natively into Workspace, so it can read your Drive and write to Docs, Sheets, and Slides on its own.

It is built on Gemini 3.5 Flash plus Antigravity. Right now it is rolling out to trusted testers first, then to Google AI Ultra subscribers (18+, US) and select business users.

I gave it a simple, realistic job: generate a useful Google Doc from a plain-language request. Here is how it went, including the part where it confidently invented a product that does not exist.

The task

I asked Spark to produce a practical guide on reducing LLM API costs without sacrificing output quality. Concrete title, a numbered section outline, and a "save it to my Drive" instruction. Nothing exotic, the kind of thing you would actually delegate.

The first thing worth saying: the workflow promise is real. I gave it the task, closed my laptop, and the finished Doc was sitting in my Drive later. Spark runs fully server-side, so it does not need your device awake to keep working. For background, multi-step tasks, that is a meaningful shift from a chatbot you have to babysit.

The second thing worth saying: the output needed serious correction.

Where it went wrong

It drifted off-topic. I asked for a cost-saving guide. It produced a general prompt-engineering guide, with the one section on actual cost optimization missing entirely. The model latched onto an adjacent, more common topic and wandered there. The formatting was clean and authoritative, which made the drift easy to miss on a quick skim.

It hallucinated a fact, with total confidence. The doc referenced a model called "Claude Mythos 5." That model does not exist. It was not a typo or a stale name; it was invented out of thin air and presented exactly as credibly as the real facts around it. This is the failure mode that should scare you, because polish does not equal accuracy, and Spark's output is very polished.
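Both failures can be screened for mechanically before you read the draft closely. What follows is a minimal sketch, assuming you have exported the Doc to plain text. The helper names, the vendor prefixes in the regex, and the sample draft text are all made up for illustration. The "verified" set is a list you maintain yourself after checking official sources. A check like this only flags candidates for manual review; it cannot confirm that anything is true.

```python
import re

def missing_sections(required, doc_text):
    """Return required section titles that never appear in the doc (case-insensitive)."""
    lowered = doc_text.lower()
    return [title for title in required if title.lower() not in lowered]

# Assumed pattern: a vendor word, optional capitalized words, a version number,
# then optional capitalized words, e.g. "Gemini 3.5 Flash". Extend the vendor
# alternation for whatever products your docs tend to mention.
NAME_PATTERN = re.compile(
    r"\b(?:Gemini|Claude)(?: [A-Z][a-z]+)* \d+(?:\.\d+)?(?: [A-Z][a-z]+)*"
)

def unverified_names(doc_text, verified):
    """Return model-like names found in the doc that are not in the verified set."""
    found = {match.group(0) for match in NAME_PATTERN.finditer(doc_text)}
    return sorted(found - set(verified))

# Made-up sample draft, standing in for the exported Doc text.
draft = (
    "Prompt engineering basics\n"
    "Gemini 3.5 Flash handles short prompts well.\n"
    "Claude Mythos 5 is recommended for long documents.\n"
)
required = ["Prompt caching", "Batch processing"]
verified = {"Gemini 3.5 Flash"}

print(missing_sections(required, draft))  # ['Prompt caching', 'Batch processing']
print(unverified_names(draft, verified))  # ['Claude Mythos 5']
```

A name that comes back from `unverified_names` is not necessarily invented. It is just one you have not confirmed yet, which is exactly the set worth checking by hand.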

The fix: prompt like you don't trust it

The recovery came from a stricter prompt, not a smarter model. Three changes did most of the work.

1. Pin the topic hard and hand it the full outline. Vague requests give an agentic model room to wander toward whatever is most common in its training data. A concrete title plus a numbered section list leaves far less room to drift:

Create a Google Doc titled "LLM Cost Optimization — Best Practices".

Goal: a practical guide on reducing API/usage costs WITHOUT sacrificing
output quality, latency, or efficiency.

Structure the doc with these sections, using headings, short intros, and bullets:
1. Executive summary (3–4 bullets)
2. Choosing the right model (when the cheaper model is "good enough")
3. Prompt caching (how it works, what to cache, expected savings)
4. Batch processing (discounts, when to use it)
5. Token reduction (concise prompts, trimming context, output limits)
6. Architectural tactics (cascading cheap→expensive models, RAG vs long context)
7. Monitoring & cost controls (usage tracking, budgets, rate limits)
8. Quick-reference checklist
9. Sources (link every doc/pricing page you cite)

Keep it skimmable and actionable. Save it to my Drive.

2. Add an explicit anti-hallucination guardrail. Spark can browse, so make it verify instead of recall:

Before writing, check the current official documentation and pricing so all
product names, prices, and features are accurate as of today. Do not invent
names, prices, or features. If you are unsure, say so rather than guessing.

3. Revise in the same conversation. When I corrected the doc with a follow-up prompt in the same Spark thread, it edited the existing document rather than spawning a new one. That keeps version history tidy and is cheaper than regenerating from scratch. Treat the first output as a draft and iterate in place.
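If you delegate docs like this regularly, the three steps above are easy to make repeatable. Here is a minimal sketch, assuming you keep the title, goal, and outline as plain data and paste the generated text into Spark by hand. `build_doc_prompt` and `build_revision_prompt` are hypothetical helpers, not part of any Google API, and the wording of the revision prompt is my own assumption rather than something Spark requires.

```python
GUARDRAIL = (
    "Before writing, check the current official documentation and pricing so all "
    "product names, prices, and features are accurate as of today. Do not invent "
    "names, prices, or features. If you are unsure, say so rather than guessing."
)

def build_doc_prompt(title, goal, sections, guardrail=GUARDRAIL):
    """Assemble a pinned-topic prompt: title, goal, numbered outline, guardrail."""
    outline = "\n".join(f"{i}. {name}" for i, name in enumerate(sections, start=1))
    return (
        f'Create a Google Doc titled "{title}".\n\n'
        f"Goal: {goal}\n\n"
        "Structure the doc with these sections, using headings, short intros, and bullets:\n"
        f"{outline}\n\n"
        f"{guardrail}\n\n"
        "Keep it skimmable and actionable. Save it to my Drive."
    )

def build_revision_prompt(missing):
    """Follow-up for the same thread: ask for edits in place, not a new doc."""
    bullet_list = "\n".join(f"- {name}" for name in missing)
    return (
        "Edit the existing document in place; do not create a new one.\n"
        "Add the following missing sections:\n"
        f"{bullet_list}"
    )

# Example values abbreviated from the prompt above.
prompt = build_doc_prompt(
    title="LLM Cost Optimization",
    goal="a practical guide on reducing API/usage costs WITHOUT sacrificing output quality.",
    sections=["Executive summary", "Prompt caching", "Batch processing", "Sources"],
)
print(prompt)
print(build_revision_prompt(["Batch processing"]))
```

Keeping the guardrail as a default argument means every prompt carries it unless you deliberately remove it, which is the safer failure direction.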

Gotchas that will cost you ten minutes each

A few operational notes from the rollout that are easy to trip over:

App connections default to OFF. Spark cannot touch Drive or Docs until you enable those connections in settings. If it "can't write the doc," check this first.
"Thinking it through…" is normal. That is Spark planning a multi-step task, not hanging.
Generic "Something went wrong" errors are common. It is an early rollout. My fix order: retry, then reload and re-auth, then try a minimal test doc, then check the Drive/Docs connection, then verify AI Ultra access, then check the Workspace status dashboard. Live browsing is the most likely timeout culprit, so simplifying the prompt often helps.
Access is gated. Without AI Ultra (or trusted-tester/business access), Spark will not appear at all. You can get most of the same doc-generation value today through Gemini's "Help me write" panel inside Google Docs.
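The fix order for the generic "Something went wrong" error is worth keeping as an ordered checklist so you do not skip the cheap steps. A minimal sketch: the step names come from the list above, `next_step` is a hypothetical helper, and nothing here calls Spark or any Google API.

```python
# Cheapest checks first, in the order given in the troubleshooting note.
FIX_ORDER = [
    "Retry the request",
    "Reload and re-authenticate",
    "Try a minimal test doc",
    "Check the Drive/Docs connection",
    "Verify AI Ultra access",
    "Check the Workspace status dashboard",
]

def next_step(already_tried):
    """Return the first step in FIX_ORDER not yet tried, or None when exhausted."""
    tried = set(already_tried)
    for step in FIX_ORDER:
        if step not in tried:
            return step
    return None

print(next_step([]))             # Retry the request
print(next_step(FIX_ORDER[:2]))  # Try a minimal test doc
```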

The honest verdict

Gemini Spark is a real preview of where assistants are heading: server-side, proactive, wired into the tools where your work actually lives. The "set a task, close the laptop, find it done" loop is genuinely useful.

But it is a preview, and it behaves like one. It drifts toward generic topics, it hallucinates with a straight face, and the output looks trustworthy whether or not it is. The practical takeaway is not "avoid it." It is to delegate to it the way you would to a fast, capable, slightly overconfident intern: give precise instructions, demand verification, and always fact-check the result yourself. Do that, and it saves you real time. Skip those steps, and it will hand you a beautifully formatted document about a model that was never released.

Have you tried Spark, or another always-on agentic assistant? I'm especially curious whether the drift and hallucination patterns show up the same way on other tasks, or whether doc generation is just a particularly easy place to wander.
