ML-Assisted Data Annotation

While the ultimate goal is to perform data annotation entirely automatically, doing so remains challenging for every type of annotation and data. Consequently, the best strategy is to find the optimal combination of automation and manual annotation.

Sigma’s annotation platform automates repetitive processes that involve the manipulation of large volumes of data, the pre-processing of data that feeds the data pipeline, the annotation process, and quality assessment. Each process can be automated fully or partially, depending on its complexity.

Sigma’s platform provides software modules that automatically or semi-automatically annotate audio, images, videos, and text using speech recognition, speaker diarization and identification, speaker characterization, computer vision, natural language processing, and signal processing technologies. Here are a few examples of real-world use cases that illustrate the advantage of using technology to help in the annotation process and allow human annotators to focus their effort where it is most needed:

Success Stories

In transcription of short audio recordings, human annotators are able to transcribe 70% more recordings when the transcription field has been pre-filled using an automatic speech recognizer.

In polygon annotation, when an AI model is used to predict the shape of the objects, the number of polygons annotated per hour increases by 60%.

Simply by optimizing the size of the audio files, we were able to reduce annotation time by 20%.

By using image selection technology, we typically reduce annotation time by 25%.

Adapting the user interface to the specific speech annotation task in a 2,000-hour annotation project increased productivity by 32%, which allowed us to reduce the number of annotators by 25%.

In a one-year annotation project, workflow automation alone saved 1.5 months of project manager time, 3 months of annotator time, and 4.5 months of QA time.

Adapting the user interface in an object detection, classification and annotation project resulted in a 60% reduction in the number of annotators.

Automatically pairing speakers and organizing their schedules saved 2.5 months of project manager work in a six-month conversation recording project.
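
Several of the figures above relate a productivity gain to a reduction in the number of annotators. The short sketch below shows how the two numbers connect, assuming that the workload and the schedule stay fixed and that every annotator benefits equally. Both function names are hypothetical and are used here only for illustration; they are not part of Sigma’s platform.

```python
def headcount_reduction(productivity_gain: float) -> float:
    """Fraction of annotators no longer needed when each one becomes
    `productivity_gain` more productive (0.32 means +32%), assuming the
    same workload and the same schedule."""
    if productivity_gain <= -1.0:
        raise ValueError("productivity_gain must be greater than -1")
    return 1.0 - 1.0 / (1.0 + productivity_gain)


def implied_productivity_gain(headcount_cut: float) -> float:
    """Inverse relation: per-annotator productivity gain needed to absorb
    a given headcount cut (0.60 means 60% fewer annotators)."""
    if not 0.0 <= headcount_cut < 1.0:
        raise ValueError("headcount_cut must be in [0, 1)")
    return 1.0 / (1.0 - headcount_cut) - 1.0


if __name__ == "__main__":
    # A 32% productivity gain frees up roughly a quarter of the team.
    print(f"{headcount_reduction(0.32):.1%}")
    # A 60% smaller team implies each annotator handles 2.5x the work.
    print(f"{implied_productivity_gain(0.60):.0%}")
```

Under these assumptions, a 32% productivity increase corresponds to a headcount reduction of about 24%, which is in line with the 25% reported for the speech annotation project. By the same relation, the 60% reduction in annotators in the object detection project would correspond to each annotator handling about 2.5 times as much work.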