Pixel-perfect image annotation for product recognition: A case study

Object recognition is a complex task for AI. While we effortlessly distinguish a mug from a glass based on years of implicit learning, AI systems require explicit training to make such distinctions, often within a drastically shorter timeframe. Image annotation provides that training data, enabling AI to distinguish between objects quickly and accurately.

For a robotics client, the goal was ambitious: develop an AI-powered robotic arm capable of identifying and labeling products amidst a chaotic mix of packaging types. The algorithm needed to discern each product's size, shape, material, packaging features (lids, QR codes, etc.), and position, even when products overlapped, were partially visible, or were nestled within soft, shape-shifting packaging. The image annotation underpinning this AI training demanded extreme precision: pixel-perfect accuracy, with a tolerance of just one pixel. And, of course, the client required rapid delivery.

Custom pre-filtering algorithm to choose relevant images automatically

Before data annotation, Sigma collaborated with the client to strategically filter their dataset, selecting the 30% of images most likely to yield effective AI training. This involved identifying images showcasing the greatest diversity in packaging, position, size, and shape. For example, images of bottles in varied arrangements, including some out of alignment, are far more informative for training the AI to recognize different positions than images of consistently aligned bottles.

Manually comparing and filtering the data would have been not just extremely time-consuming but impossible, since a human reviewer can only process a limited subset of images at any given time.

Sigma’s engineers developed a custom algorithm to pre-filter the dataset, selecting the top 30% of images based on the client’s criteria. Using signal processing techniques, they automated the image selection process, which allowed them to compare all images simultaneously and choose the most “interesting” ones. Signal processing transforms each image so that comparison is easier, reducing the image to the information most relevant for selection. This could involve, for example, separating the object from a complex background so that the object is more easily identifiable.
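The case study does not describe the internals of Sigma's algorithm, so the following is only a minimal sketch of one way a diversity-based pre-filter could work, not the method actually used. It assumes each image is a grayscale grid of pixel values, reduces it to a coarse block-averaged feature vector (a simple downsampling step), and then applies greedy farthest-point selection to keep roughly 30% of the images. The function names `block_average` and `select_diverse` and the `grid` and `fraction` parameters are hypothetical.

```python
import math

def block_average(image, grid=4):
    """Reduce a grayscale image (list of rows) to grid*grid block means.

    Assumes the image is at least grid pixels wide and tall.
    """
    h, w = len(image), len(image[0])
    feats = []
    for gy in range(grid):
        for gx in range(grid):
            y0, y1 = gy * h // grid, (gy + 1) * h // grid
            x0, x1 = gx * w // grid, (gx + 1) * w // grid
            block = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            feats.append(sum(block) / len(block))
    return feats

def select_diverse(features, fraction=0.3):
    """Greedy farthest-point selection; returns indices of the chosen images."""
    n = len(features)
    k = max(1, round(n * fraction))
    dim = len(features[0])
    # Start from the image farthest from the dataset mean.
    mean = [sum(f[d] for f in features) / n for d in range(dim)]
    first = max(range(n), key=lambda i: math.dist(features[i], mean))
    chosen = [first]
    # Track each image's distance to its nearest already-chosen image.
    nearest = [math.dist(f, features[first]) for f in features]
    while len(chosen) < k:
        nxt = max(range(n), key=lambda i: nearest[i])
        chosen.append(nxt)
        for i in range(n):
            nearest[i] = min(nearest[i], math.dist(features[i], features[nxt]))
    return chosen

def flat(value, size=4):
    """Helper for the demo: a size x size image filled with one value."""
    return [[value] * size for _ in range(size)]

if __name__ == "__main__":
    # Seven near-identical images plus three that differ from the rest.
    half = [[255] * 4, [255] * 4, [0] * 4, [0] * 4]
    images = [flat(0) for _ in range(7)] + [flat(100), flat(200), half]
    feats = [block_average(img, grid=2) for img in images]
    print(select_diverse(feats, fraction=0.3))
```

Because each new pick is the image farthest from everything already chosen, a run of near-duplicate images (such as consistently aligned bottles) contributes at most a few representatives, while unusual arrangements are favored. A production system would likely use richer features than block means.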

Freeing up annotators to focus on precision work

When annotators label images, they trace the outline of each object with points and line segments, forming a polygon. Many of the objects were soft or had curved edges, which often required hours of concentrated focus while the annotator built the polygon from minuscule line segments. In other cases, only a small portion of a package was visible because of overlap, and the annotator had to estimate the size, material, and position of the rest of the object from context clues, a judgment that only a human annotator can make.
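To make the polygon representation and the one-pixel tolerance concrete, here is a minimal sketch of how an annotated polygon might be checked against a reference outline. The case study does not say how tolerance was measured, so this is an assumed approach: a polygon is a list of (x, y) vertices, and the check passes when every annotated vertex lies within the tolerance of the reference boundary. The function names are hypothetical.

```python
import math

def point_segment_distance(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        # Degenerate segment: a and b are the same point.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def distance_to_polygon(p, polygon):
    """Shortest distance from p to the boundary of a closed polygon."""
    n = len(polygon)
    return min(
        point_segment_distance(p, polygon[i], polygon[(i + 1) % n])
        for i in range(n)
    )

def within_tolerance(annotated, reference, tolerance=1.0):
    """True if every annotated vertex is within tolerance of the reference."""
    return all(distance_to_polygon(v, reference) <= tolerance for v in annotated)

if __name__ == "__main__":
    reference = [(0, 0), (10, 0), (10, 10), (0, 10)]
    annotated = [(0.5, 0), (10, 0.5), (10.5, 10), (0, 9.5)]
    print(within_tolerance(annotated, reference))
```

This check only tests the annotated vertices against the reference boundary. A stricter review would also test in the other direction, or sample points along each edge, since a polygon with too few vertices can cut across a curved outline even when every vertex is accurate.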

To meet the client’s required rate of 220 images per day at the level of accuracy and refinement they needed, the annotation team had to become productive and scale quickly. The first step was to source a group of annotators with the right skills and mindset for such incredibly detailed labeling. Because Sigma maintains a pool of 25,000+ trained and vetted annotators, it was possible to find the right people and build the team on short notice.

Product managers and the engineering team then needed to figure out ways to shave off time and complexity on several fronts, and free up annotators to do precision work without getting caught up in process details.

Guidelines adapted to annotator needs

The client knew what results they needed to train the algorithm they had built, and they had drafted precise guidelines based on its requirements. But many of the guidelines needed refinement before annotators could use them effectively. Sigma’s team drew on decades of experience working with annotation teams to help the client adapt the guidelines so that they clearly described the reasoning used to label objects. They also replaced specialized technical language with terms that annotators typically use, so that annotators could apply the guidelines without confusion or delays.

Increasing speed through workflow automations

Drawing on many years of experience with fast-turnaround, extremely large-scale image annotation projects, the project managers knew that work would be much simpler for annotators in a customized tool that reduced task complexity and integrated all processes into one streamlined interface. The engineering team customized an open source tool so that annotators could reference the guidelines and training materials while they annotated. They also added purpose-built keyboard shortcuts and the exact labels annotators would need, so that they could move faster.

The engineering team also designed the interface so that client inputs would be part of one seamless workflow. The client could access the tool to upload packets of image data directly. Further time savings came from integrating the review and feedback process into the tool: the client received a notification when results were delivered, and could then view the labeled data and leave feedback for annotators directly.

These workflow automations saved a total of 6,480 working hours across quality assurance, project management, and annotation, the equivalent of nine months of work over the course of a year-long project.

Every pixel in place

The client was excited about the quality and precision of the image annotation, which exceeded that of the competition: they had no corrections at all to the annotated data after it was delivered. The pace at which Sigma was able to build, train, and scale the team up to production capacity was also a major factor. For the next iteration of the project, they requested not only that Sigma continue to work with them, but also that it double the team’s capacity and test a video pilot.