Building trustworthy AI: How Sigma ensures reliable AI outputs

A global leader in AI development needed to ensure their models produced reliable and trustworthy content. To achieve this, they partnered with Sigma for rigorous human oversight and evaluation. By leveraging our pre-existing workflows and deep expertise, we were able to quickly assemble a skilled team to research, validate, and rewrite AI-generated responses. Our approach reduced annotators’ learning curve by up to 65%, accelerating the client’s timeline for obtaining high-quality data.

3 specialized internal AI evaluation projects to build and refine methodologies.

6 customized skill tests to source and evaluate the most capable annotators.

65% reduction in annotators’ learning curve to ensure our team delivered quality from day one.

Challenge

  • Sourcing and training a team of annotators with the specialized skills needed to critically fact-check AI outputs using external sources.
  • Identifying annotators with strong analytical skills who could accurately judge if AI answers were fully supported by facts.
  • Ensuring annotators could rewrite information in multiple creative ways and had the cultural context to interpret and validate the nuances of AI-generated answers.

Solution

  • We quickly deployed a highly trained team of annotators with prior experience on similar projects, ready to start immediately.
  • We sourced and selected annotators with the right skills for the project, using six customized skill tests developed in-house to evaluate abilities such as language logic, research, and rephrasing.
  • We implemented our factual rewriting workflow to improve the team’s ability to identify and use reliable sources.
  • Our previously designed use case guidelines and internal workflows helped reduce ambiguity and increase consistency.

Project story

Our client, a global leader in AI development, faced a major challenge: ensuring their AI models delivered answers that were not just accurate but also factually supported and culturally reliable. 

The outputs of generative AI and Large Language Models are complex and often subjective, open to multiple interpretations, and deeply influenced by cultural nuances. Addressing these challenges requires expert human-in-the-loop validation, coordinated by a partner who can source and scale quickly.

Sigma was uniquely equipped to find and train skilled annotators and had proven processes in place to ensure every output was factual and backed by credible sources.

Sigma’s proactive approach to annotation

At Sigma, we constantly strengthen our capabilities and expertise to deliver high-quality data with both speed and precision. Instead of building from scratch, we leverage our deep expertise and lessons learned from our in-house initiatives to provide rapid solutions to our clients’ most challenging requests.

In this case, we were able to jump-start the project as a result of our internal efforts:

  • We had already designed and executed three specialized AI evaluation projects, refining our methodologies for complex tasks, such as:
    • Attribution: Assessing if AI answers were fully supported by the provided reference texts.
    • Accuracy scoring: Evaluating AI sentences against reference passages for precision and cultural relevance. 
    • Factual rewriting: Reviewing AI-generated sentences against reference passages and rewriting them to ensure factual accuracy, clarity, and grammatical correctness.
  • We also developed six customized generative AI skill tests to source and vet annotators with the right abilities for any given assignment, including language logic, research, and rephrasing.

Implementing a human-in-the-loop validation process

To ensure high-quality, trustworthy AI outputs, we implemented a human-in-the-loop evaluation process tailored to the project’s unique needs.

This multi-step approach included:  

  • Targeted skill testing: We identified annotators with the precise capabilities required for the project.
  • Refining annotation guidelines: We enhanced the original instructions by incorporating case-specific guidance and examples. This helped annotators capture subtle nuances in AI-generated responses, prevent AI bias, and maintain consistency across all evaluations.
  • Research, validate, and rewrite: Our annotators followed a structured workflow. They generated research questions, conducted fact-checking using reliable sources, rated the factual accuracy of the original answer, and rewrote any unsupported content so that every claim was backed by a credible source.

Delivering high-quality outcomes at every stage

High-quality data is non-negotiable for building responsible AI.

Rooted in years of prior experience and proven workflows, Sigma has designed a rigorous evaluation process that combines highly trained human annotators, refined guidelines, and a strong fact-checking framework. The result? High-quality, reliable outcomes delivered with speed and precision. If you’d like to learn more, talk to an expert at Sigma.
