Launching and scaling video transcription in 24 languages and dialects

Providing human-in-the-loop transcriptions in all languages is part of Sigma’s daily work. When a major client in the technology services industry wanted 2000 hours of video transcribed, by humans, in 24 languages and dialects, all launched in the same month, it put their scalability to the test. Sigma delivered, thanks to their experience in optimizing annotation processes and their global pool of 25,000+ trained and vetted annotators.


2000: Video hours to transcribe by human annotators
24: Languages and regional dialects covered
100%: Accurate transcriptions delivered


  • Transcribe speech from video material in 24 languages and dialects
  • Include diarization, i.e. notate timestamps and changes in speakers
  • Source, launch and scale annotation teams for all languages simultaneously
  • All transcriptions carried out and reviewed by human annotators
  • All delivered transcriptions need to achieve 100% accuracy after human annotation


  • Successfully sourced and simultaneously launched annotation teams for all 24 languages and dialects
  • Adapted and localized guidelines for transcription and diarization in all 24 languages and dialects
  • Implemented process and user interface customizations to reduce transcription time by 32%
  • Efficiently delivered 100% accurate, diarized, human-in-the-loop transcriptions thanks to optimized review processes

Project story

Automated transcription is becoming more and more prevalent. It’s now a standard component in many audio and video creator tools, in both professional editing software and across social media, and extremely important for web accessibility. But no automated transcriptions would exist without human annotators first teaching machines how to recognize speech. Providing human transcriptions of video and audio is part of Sigma’s core business – and their experience in the field, along with their global pool of expert linguists, helped them win over a client with a big scaling challenge. 

A major technology services provider needed 2000 hours of video material transcribed and diarized — annotators would need to notate timestamps and changes in speakers along with the transcribed dialogue. They also needed to provide detailed descriptions of background noise in the videos. The guidelines on how to annotate dialogue and sound included exact requirements, for example how to notate speakers, timestamps and noise, and how to segment the dialogue when many different speakers overlapped. They wanted human transcription for all the material, and the final product needed to come out of the review process with 100% accuracy. The catch: The dataset included video material in 24 languages and specific regional dialects. And they wanted to launch the project in all languages at once. 
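To make the diarization requirement concrete, here is a minimal sketch of how one diarized segment might be represented and how overlapping speech could be flagged for careful segmentation. The field names, the speaker labels and the `overlapping_pairs` helper are illustrative assumptions, not the client's actual guideline format or Sigma's tooling.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One diarized transcript segment (hypothetical schema)."""
    speaker: str      # assumed label style, e.g. "SPEAKER_1"
    start: float      # seconds from the start of the video
    end: float        # seconds from the start of the video
    text: str         # transcribed dialogue
    noise: str = ""   # free-text description of background noise


def overlapping_pairs(segments):
    """Return index pairs of segments by different speakers that overlap in time."""
    # Visit segments in order of start time so the inner loop can stop early.
    ordered = sorted(range(len(segments)), key=lambda i: segments[i].start)
    pairs = []
    for pos, i in enumerate(ordered):
        for j in ordered[pos + 1:]:
            if segments[j].start >= segments[i].end:
                break  # later segments start even later, so none can overlap
            if segments[j].speaker != segments[i].speaker:
                pairs.append((i, j))
    return pairs


segments = [
    Segment("SPEAKER_1", 0.0, 4.2, "Welcome back to the show."),
    Segment("SPEAKER_2", 3.8, 6.0, "Thanks for having me.", noise="applause"),
    Segment("SPEAKER_1", 6.0, 9.5, "Let's get started."),
]
print(overlapping_pairs(segments))  # [(0, 1)]
```

In this made-up example, the first two segments overlap between 3.8 and 4.2 seconds, which is the kind of case the guidelines would need to spell out.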

Relying on vetted annotators cuts ramp-up times

Launching in 24 languages and dialects simultaneously meant sourcing 24 annotator teams, localizing the annotation guidelines into 24 dialects, and then training 24 teams on how to implement those guidelines in their transcriptions. 

Thanks to Sigma’s existing pool of 25,000+ vetted annotators covering over 250 languages and dialects, they could quickly find the right people for the project, including for specific regions and dialects in Central, Southern and East Asia. Relying on vetted annotators was an important factor in accelerating the project: because the annotators already had experience with this kind of transcription, they could understand and implement the complex guidelines quickly, which made it easier to bring teams up to speed. 

Precise guideline localization leads to higher quality

Sigma’s experienced project managers know that guideline refinement — adapting the rules that annotators use to label or transcribe data to make them understandable and easy to use — is a critical part of assuring quality and accuracy in the final product. 

When working across so many languages, that meant localizing the guidelines to account for linguistic and cultural differences. The guidelines needed to take into account, for example, how the same words are used or spelled differently in different dialects, or how to treat commonly-used words borrowed from other languages. 

Project managers remained in constant contact with the annotators from start to finish, training them on the guidelines, supporting them through practice exercises and testing, and maintaining continuous feedback throughout the actual transcriptions. Their experience working with annotators across the globe helped them adapt their communication style to the needs of each team and bring the teams up to full productivity quickly. 

Process automation enables better, faster transcription

Manual transcription is an incredibly focused and detailed process. A central part of Sigma’s approach to annotation is finding the right combination of technology and human-in-the-loop inputs to arrive at the highest quality result in the most efficient possible way. This often means customizing the tools annotators use, and optimizing or automating parts of the process that are time-consuming or distracting to the annotators. This case was no different: the engineering team and product managers made a number of changes to streamline the transcription process so that the annotators could focus on what they do best. 

Providing annotators with machine-generated pre-transcriptions is one automation step that greatly reduces the time needed to reach an accurate, finished transcription. Another is designing the interface to include input templates pre-formatted according to the guidelines. Annotators also received exactly the video snippet they were assigned to transcribe next, eliminating any time wasted in finding the right segments of data. 

The process of error analysis, review and feedback also happened right within the user interface, eliminating time lost through context-switching between several apps. Within the user interface, annotators could communicate with project managers and with each other to ask questions on the guidelines, discuss edge cases and share knowledge about guidelines, terms and topics. 

The engineering team also built in small, specific features like transcriber keyboard shortcuts, for example to cut segments and add speaker labels. Even small customizations that shave seconds off annotation time can add up to months of time saved in high-volume projects. 
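A quick back-of-the-envelope calculation shows how seconds can turn into months. Only the 2000 video hours come from this project; the average segment length, the seconds saved per segment and the 160 working hours per month are purely illustrative assumptions, not measured figures.

```python
def person_months_saved(video_hours, avg_segment_seconds,
                        seconds_saved_per_segment, work_hours_per_month=160):
    """Estimate person-months saved by a small per-segment speedup."""
    # Number of segments annotators handle across the whole dataset.
    segments = video_hours * 3600 / avg_segment_seconds
    # Total annotation time saved, converted from seconds to hours.
    hours_saved = segments * seconds_saved_per_segment / 3600
    return hours_saved / work_hours_per_month


# Assumed inputs: 5-second segments, 3 seconds saved on each one.
print(person_months_saved(2000, 5, 3))  # 7.5
```

Under these assumptions, a shortcut that saves three seconds per segment would free up about seven and a half months of one annotator's working time.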

Collaboration expands thanks to data quality at scale

After the months-long project came to a close, the client was happy with their 100% accurate transcriptions and with Sigma’s precise adherence to their guidelines. That adherence was possible because Sigma understands both how annotators work and how important quality training data is to a well-functioning algorithm. The client engaged Sigma for future transcription projects, with more volume and new languages.
