Why generative AI creates new quality challenges
Traditional AI models trained on structured data produced outputs that could usually be judged as simply right or wrong. With generative AI, those boundaries blur. An LLM might summarize a document but omit a key fact, misattribute a quote, or confidently reference a study that doesn’t exist.
Real-world incidents highlight the stakes:
- Routine unreliability: According to Vice, citing CNN interviews with anonymous FDA employees, the AI assistant “Elsa” is fine for note-taking but “entirely unreliable for anything of actual importance,” with one staffer noting, “Anything that you don’t have time to double-check is unreliable. It hallucinates confidently.”
- Medical missteps: The Verge described a case where a Google AI model combined two distinct anatomical terms—“basal ganglia” and “basilar artery”—into the nonexistent “basilar ganglia.” In a clinical setting, such an error could lead to dangerous treatment decisions.
- Healthcare bias: In research reported by Renal and Urology News, an AI model was more likely to offer advanced imaging to high-income patients while giving lower-income patients fewer diagnostic options. Dr. Klang noted these disparities were frequent, consistent, and “not explained by legitimate clinical reasoning.”
For enterprises in healthcare, legal, or finance, these examples show why factuality isn’t just a nice-to-have—it’s central to trust, compliance, and brand reputation.
How human annotators create ground truth
Sigma’s Truth workflows combine human expertise with structured evaluation. Annotators compare AI outputs against verified sources — news archives, academic literature, or proprietary databases — scoring factual alignment, flagging omissions, and rewriting inaccurate segments.
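A claim-level scoring workflow like this can be made concrete in code. The sketch below is illustrative only: the record fields, labels, and scoring function are assumptions for this example, not Sigma's actual schema.

```python
from dataclasses import dataclass

@dataclass
class ClaimAnnotation:
    """One atomic claim from a model output, checked by a human annotator.

    Hypothetical schema for illustration, not a real Sigma data model.
    """
    claim: str            # atomic claim extracted from the AI output
    source: str           # verified reference the annotator checked against
    verdict: str          # "supported" | "contradicted" | "unverifiable"
    correction: str = ""  # annotator's rewrite when the claim is wrong

def factual_alignment(annotations: list[ClaimAnnotation]) -> float:
    """Fraction of claims fully supported by a verified source."""
    if not annotations:
        return 0.0
    supported = sum(a.verdict == "supported" for a in annotations)
    return supported / len(annotations)

def unsupported_claims(annotations: list[ClaimAnnotation]) -> list[ClaimAnnotation]:
    """Claims flagged for rewriting or escalation."""
    return [a for a in annotations if a.verdict != "supported"]
```

Scoring at the claim level, rather than per document, is what lets reviewers flag a single omission or misattributed quote inside an otherwise accurate summary.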
This human-in-the-loop process is critical in high-risk domains. The legal sector has already seen damage from unverified AI output:
- Reuters reported that federal judges in Mississippi and New Jersey withdrew rulings after discovering factual inaccuracies and invented allegations in AI-generated text.
- LegalDive documented multiple cases of fake legal citations, beginning with a 2023 New York case where an attorney submitted nonexistent precedents, followed by similar missteps from other lawyers, including former Donald Trump attorney Michael Cohen.
To minimize such risks, Sigma often employs iterative review cycles, where one annotation team checks the AI’s work and another independently verifies it. In a medical context, for example, annotators have confirmed symptom descriptions against Mayo Clinic guidelines to block fabricated advice. This multi-pass approach routinely achieves inter-annotator agreement scores above 0.85 — far beyond what most crowdsourced teams deliver.
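Agreement figures like the 0.85 above are typically computed with a chance-corrected statistic such as Cohen's kappa, which discounts the agreement two annotators would reach by guessing alone. A minimal sketch for two annotators; the label values and sample data are invented for illustration:

```python
from collections import Counter

def cohens_kappa(a: list[str], b: list[str]) -> float:
    """Cohen's kappa for two annotators labeling the same items.

    kappa = (p_observed - p_expected) / (1 - p_expected),
    where p_expected is the agreement expected by chance from each
    annotator's label frequencies. Undefined when p_expected == 1.
    """
    assert len(a) == len(b) and a, "need paired labels for the same items"
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    counts_a, counts_b = Counter(a), Counter(b)
    expected = sum(counts_a[k] * counts_b[k] for k in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1 - expected)
```

A kappa above 0.8 is conventionally read as near-perfect agreement, which is why a sustained 0.85 is a meaningful bar for annotation quality rather than an arbitrary threshold.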
The business impact of truth workflows
Integrating truth validation into AI development prevents costly rework and reputational harm. By reducing hallucinations early, companies adopting Sigma’s approach accelerate product launches and maintain defensible, source-linked output — an advantage as regulators increasingly demand transparency in AI decisions.
The lesson from “Elsa,” Google’s “basilar ganglia” slip, biased medical recommendations, and fabricated legal citations is clear: raw LLM outputs cannot be trusted blindly. Without a factuality framework, even the most advanced AI can undermine trust in seconds.
High-quality human data annotation is more than labeling — it’s a strategic safeguard. By combining expert judgment, rigorous source validation, and structured review cycles, enterprises can teach AI to distinguish what merely sounds plausible from what is actually true.
Building truth into gen AI is a multi-faceted challenge that includes managing bias and defining a clear strategic roadmap. Read our guide on Preventing AI bias: How to ensure fairness in data annotation, as bias often leads to skewed or untrue outputs. For a strategic plan to scale these high-quality, trustworthy models from pilot to production, explore the strategies in Accelerating the new AI.
Talk to Sigma experts to learn how to build bias-resistant, trustworthy AI systems.