Launching and Scaling Video Transcription in 24 Languages and Dialects

Case Summary

Providing human-in-the-loop transcriptions in all languages is part of Sigma’s daily work. When a major client in the technology services industry wanted 2000 hours of video transcribed by humans in 24 languages and dialects, all launched in the same month, it put Sigma’s scalability to the test. Sigma delivered, thanks to their experience in optimizing annotation processes and their global pool of 25,000+ trained and vetted annotators.


Technology Services


Speech Recognition – Automated Transcriptions


Data Annotation

Speech and Text



2000 video hours to transcribe by human annotators


24 languages and regional dialects covered


100% accurate transcriptions delivered


  • Transcribe speech from video material in 24 languages and dialects
  • Include diarization, i.e. notate timestamps and changes in speakers
  • Source, launch and scale annotation teams for all languages simultaneously
  • All transcriptions carried out and reviewed by human annotators
  • All delivered transcriptions need to achieve 100% accuracy after human annotation


  • Successfully sourced and simultaneously launched annotation teams for all 24 languages and dialects
  • Adapted and localized guidelines for transcription and diarization in all 24 languages and dialects
  • Implemented process and user interface customizations to reduce transcription time by 32%
  • Efficiently delivered 100% accurate, diarized, human-in-the-loop transcriptions thanks to optimized review processes

Project Story

Automated transcription is becoming more and more prevalent. It’s now a standard component in many audio and video creator tools, in both professional editing software and across social media, and extremely important for web accessibility. But no automated transcriptions would exist without human annotators first teaching machines how to recognize speech. Providing human transcriptions of video and audio is part of Sigma’s core business – and their experience in the field, along with their global pool of expert linguists, helped them win over a client with a big scaling challenge. 

A major technology services provider needed 2000 hours of video material transcribed and diarized, meaning annotators would need to notate timestamps and changes in speakers along with the transcribed dialogue. Annotators also needed to provide detailed descriptions of background noise in the videos. The guidelines on how to annotate dialogue and sound included exact requirements, for example how to notate speakers, timestamps and noise, and how to segment the dialogue when many different speakers overlapped. The client wanted human transcription for all the material, and the final product needed to come out of the review process with 100% accuracy. The catch: the dataset included video material in 24 languages and specific regional dialects, and the client wanted to launch the project in all languages at once.
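To make the diarization requirements concrete, here is a minimal sketch of how one diarized segment might be represented and how overlapping speech could be detected. The `Segment` schema, its field names, and the `overlapping_pairs` helper are assumptions for illustration only; the client's actual guideline format is not described in this case study.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Segment:
    """One diarized stretch of a video (hypothetical schema, not the client's)."""
    start: float   # segment start, in seconds from the beginning of the video
    end: float     # segment end, in seconds
    speaker: str   # speaker label, e.g. "S1"
    text: str      # transcribed dialogue
    noise: List[str] = field(default_factory=list)  # background-noise descriptions


def overlapping_pairs(segments: List[Segment]) -> List[Tuple[int, int]]:
    """Return index pairs of segments from different speakers that overlap in time."""
    ordered = sorted(range(len(segments)), key=lambda i: segments[i].start)
    pairs = []
    for pos, i in enumerate(ordered):
        for j in ordered[pos + 1:]:
            if segments[j].start >= segments[i].end:
                break  # sorted by start time, so no later segment can overlap i
            if segments[i].speaker != segments[j].speaker:
                pairs.append((i, j))
    return pairs


# Invented sample data, purely for illustration.
segments = [
    Segment(0.0, 4.2, "S1", "Welcome back to the show.", noise=["music fades out"]),
    Segment(3.8, 6.0, "S2", "Thanks for having me."),
    Segment(6.0, 9.5, "S1", "Let's get started."),
]
print(overlapping_pairs(segments))  # [(0, 1)]
```

A check like this could flag the stretches where the guidelines' rules for segmenting overlapping speakers apply.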

Relying on Vetted Annotators Cuts Ramp-Up Times

Launching in 24 languages and dialects simultaneously meant sourcing 24 annotator teams, localizing the annotation guidelines for all 24 languages and dialects, and then training 24 teams on how to implement those guidelines in their transcriptions. 

Thanks to Sigma’s existing pool of 25,000+ vetted annotators covering over 250 languages and dialects, they could quickly find the right people for the project, including for specific regions and dialects in Central, Southern and East Asia. Relying on vetted annotators was an important factor in accelerating the project: because the annotators had previous experience with these kinds of transcriptions, they could understand and implement the complex guidelines quickly, which made it easier to bring teams up to speed. 

Precise Guideline Localization Leads to Higher Quality

Sigma’s experienced project managers know that guideline refinement — adapting the rules that annotators use to label or transcribe data to make them understandable and easy to use — is a critical part of assuring quality and accuracy in the final product. 

When working across so many languages, that meant localizing the guidelines to account for linguistic and cultural differences. The guidelines needed to take into account, for example, how the same words are used or spelled differently in different dialects, or how to treat commonly used words borrowed from other languages. 

Project managers remained in constant contact with the annotators from start to finish, training them on the guidelines, supporting them through practice exercises and testing, and maintaining continuous feedback throughout the actual transcriptions. Their experience working with annotators across the globe helped them apply communication styles localized to the needs of the different teams and bring them up to productivity fast. 

Process Automation Enables Better, Faster Transcription

Manual transcription is an incredibly focused and detailed process. A central part of Sigma’s approach to annotation is finding the right combination of technology and human-in-the-loop inputs to arrive at the highest quality result in the most efficient possible way. This often means customizing the tools annotators use, and optimizing or automating parts of the process that are time-consuming or distracting to the annotators. This case was no different: the engineering team and product managers made a number of changes to streamline the transcription process so that the annotators could focus on what they do best. 

Providing annotators with machine-generated pre-transcriptions is one automation step that greatly reduces the time needed to reach an accurate, finished transcription. Designing the interface to include input templates pre-formatted according to the guidelines is another. Annotators also received exactly the video snippet they were assigned to transcribe next, eliminating any time wasted in finding the right segments of data. 
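The steps described above (a machine draft, a pre-formatted template, and automatic delivery of the next snippet) can be sketched roughly as follows. This is only an illustration under assumed names: `prefill`, `SnippetQueue`, and the dictionary fields are hypothetical and do not describe Sigma's actual tooling.

```python
from collections import deque


def prefill(draft):
    """Turn a machine-generated draft into an editable, guideline-shaped template.

    `draft` is assumed to be a list of (start, end, text) tuples produced by
    some speech recognizer; speaker and noise fields are left for the annotator.
    """
    return [
        {"start": start, "end": end, "speaker": "", "text": text, "noise": []}
        for start, end, text in draft
    ]


class SnippetQueue:
    """Hands each annotator the next assigned snippet, already pre-filled."""

    def __init__(self, snippets):
        # Each item is assumed to be a (snippet_id, draft) pair.
        self._pending = deque(snippets)

    def next_task(self):
        """Return the next pre-filled task, or None when nothing is left."""
        if not self._pending:
            return None
        snippet_id, draft = self._pending.popleft()
        return {"snippet_id": snippet_id, "segments": prefill(draft)}


# Invented sample data, purely for illustration.
queue = SnippetQueue([
    ("clip-001", [(0.0, 2.5, "hello and welcome")]),
    ("clip-002", []),
])
task = queue.next_task()
print(task["snippet_id"], task["segments"][0]["text"])
```

In this sketch the annotator corrects the drafted text and fills in the empty speaker and noise fields rather than typing everything from scratch.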

The process of error analysis, review and feedback also happened right within the user interface, eliminating time lost through context-switching between several apps. Within the user interface, annotators could communicate with project managers and with each other to ask questions on the guidelines, discuss edge cases and share knowledge about guidelines, terms and topics. 

The engineering team also built in small, specific features like transcriber keyboard shortcuts, for example to cut segments and add speaker labels. Even small customizations that shave seconds off annotation time can add up to months of time saved in high-volume projects. 
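As a rough illustration of how seconds add up, the arithmetic can be written out as below. All of the numbers are hypothetical assumptions chosen for the example (seconds saved per segment, number of segments, and working hours per month); none of them come from the project itself.

```python
def months_saved(seconds_per_segment, segment_count, work_hours_per_month=160):
    """Convert a small per-segment saving into working months of annotator time."""
    total_hours = seconds_per_segment * segment_count / 3600
    return total_hours / work_hours_per_month


# Hypothetical inputs: 3 seconds saved on each of 1,000,000 segments.
print(round(months_saved(3, 1_000_000), 1))  # 5.2
```

With these assumed inputs, a 3-second shortcut is worth roughly five working months across the whole project.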

Collaboration Expands Thanks to Data Quality at Scale

After the months-long project came to a close, the client was happy with both the 100% accurate transcriptions and Sigma’s precise adherence to their guidelines. That adherence was possible because Sigma understands both how annotators work and how important quality training data is to a well-functioning algorithm. The client engaged Sigma for future transcription projects, with more volume and new languages.

