Creating Efficient Processes Through Iteration and Customization
Because annotation quality is so crucial to how an AI model performs, knowing exactly where to build efficiencies is a challenge. High-quality annotation can be done efficiently and affordably, but cost savings are constrained by quality requirements as well as compliance with privacy and security regulations.
At Sigma, we’ve developed iterative processes to shave off valuable annotation hours while maintaining the highest standard of quality. Rather than relying on a fixed process, we tailor our approach to your unique project specifications. We carefully analyze parameters like data type, the volume of data required, audio or video recording conditions, demographic characteristics, geographic coverage, and many more to identify the combination of people, processes, and technology that will deliver maximum quality at maximum efficiency.
Thanks to our massive network of vetted and trained data annotators, we’re able to scale your annotation project at a moment’s notice, even on very short lead times. Our workforce includes over 25,000 annotators and specialists with subject matter expertise in fields such as linguistics, radiology, biology, sports, finance, and engineering. Because this pool is already in place, we can quickly scale to meet exact project needs, even for highly specific use cases.
We curate an annotation team for each project according to its specific characteristics. Because we have a broad pool of candidates on hand, we can quickly select those with the right skills, knowledge, and prior experience on similar projects. Our experience shows that team curation is vital to meeting goals efficiently while ensuring annotation quality.
To make data collection and annotation more efficient, we continuously identify optimization potential in each step of the process, making incremental improvements that can add up to tens of thousands of annotator hours saved. We also streamline the overall workflow, structuring the sequence of steps so that they flow without interruption.
Process efficiency needs to be in balance with quality. Our approach to data collection and annotation is human-centric. We believe achieving the highest quality in training data requires humans in the loop. When it comes to creating massive amounts of training data, it’s essential to understand how people can best contribute to the process, and where to take human limitations into account. For example, when we organize annotator working procedures, we maintain an appropriate number of breaks. Not only does this improve the physical health and mental well-being of the annotators, but it can also eliminate the effects of fatigue or lack of attention, improving the quality of the final output, limiting time spent on re-work and corrections, and producing a constant flow of data.
Our quality assurance methodology is also based on understanding where human error can naturally arise, and intervening as early in the process as possible. In addition to building error-preventing measures into the tools annotators use, we establish continuous feedback cycles between project managers and annotators. This allows project managers to refine annotation guidelines and adapt processes while they are running rather than after the fact, creating more efficiency and producing higher-quality annotations as the project progresses.
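One way to surface guideline problems early, sketched here with illustrative function and parameter names rather than any specific Sigma tooling, is to flag items where annotators disagree beyond a threshold so that project managers can refine guidelines before errors compound:

```python
from collections import Counter

def flag_disagreements(item_labels, min_agreement=0.75):
    """Flag items whose labels fall below an agreement threshold.

    item_labels: dict mapping item id -> list of labels from
    different annotators. The 0.75 threshold is illustrative.
    """
    flagged = []
    for item_id, labels in item_labels.items():
        # Fraction of annotators who chose the most common label.
        top_count = Counter(labels).most_common(1)[0][1]
        agreement = top_count / len(labels)
        if agreement < min_agreement:
            flagged.append(item_id)
    return flagged
```

Run continuously over incoming annotations, a check like this turns disagreement into an early signal for the feedback cycle instead of a surprise at final review.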
Tools have to be intuitive to speed up the data collection and annotation process. At the same time, they have to be designed to facilitate the most efficient workflow possible while minimizing the probability of annotator mistakes.
At Sigma, we design our suite of data collection and annotation tools for the optimal combination of human judgment, automation and machine learning capabilities. Here are some of the principles we apply to our tooling:
- Identify steps in the data collection or annotation process that stand to benefit from full or partial automation, and implement tooling to create greater efficiency
- Build tools to handle an intense flow of data with no latency, so that many people can collect or annotate data in parallel
- Design features that help find and correct systematic errors
- Provide intuitive interaction models that minimize errors and speed up annotation
- Integrate communication modules so project managers can give direct feedback on annotator work
- Give annotators easy access to guidelines and training materials
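As an illustration of the systematic-error principle above, a simple check (a hypothetical sketch, not Sigma's actual tooling) can estimate each annotator's rate of disagreement with the per-item majority label; a consistently high rate often points to a misread guideline rather than random slips:

```python
from collections import Counter, defaultdict

def systematic_disagreement(records):
    """Rate of disagreement with the majority label, per annotator.

    records: list of (item_id, annotator_id, label) tuples.
    Returns a dict mapping annotator id -> disagreement rate.
    """
    by_item = defaultdict(list)
    for item_id, annotator, label in records:
        by_item[item_id].append((annotator, label))

    disagreements = Counter()
    totals = Counter()
    for votes in by_item.values():
        # Majority label among all annotators for this item.
        majority = Counter(label for _, label in votes).most_common(1)[0][0]
        for annotator, label in votes:
            totals[annotator] += 1
            if label != majority:
                disagreements[annotator] += 1
    return {a: disagreements[a] / totals[a] for a in totals}
```

An annotator who disagrees with the majority on nearly every item is a candidate for targeted retraining or a guideline clarification, which is cheaper than re-working their output later.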
In addition to our automation technologies, we can further increase efficiency with machine-learning-assisted annotation. This technology uses models trained on existing annotated data to pre-fill labels for new items, decreasing the time needed per annotation over time.
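A minimal sketch of this kind of pre-labeling, assuming a hypothetical `model.predict` interface that returns a label and a confidence score (both names are illustrative):

```python
def preannotate(model, items, confidence_threshold=0.9):
    """Pre-fill labels so annotators confirm or correct rather
    than label from scratch.

    model.predict(item) is assumed to return (label, confidence);
    the 0.9 threshold is an illustrative choice.
    """
    prelabeled = []
    for item in items:
        label, confidence = model.predict(item)
        # Only suggest labels the model is reasonably sure about;
        # uncertain items go to annotators without a suggestion.
        suggestion = label if confidence >= confidence_threshold else None
        prelabeled.append({"item": item, "suggested_label": suggestion})
    return prelabeled
```

Keeping a confidence threshold matters: a wrong suggestion an annotator must undo can cost more time than an empty field, so only high-confidence predictions are worth surfacing.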