Sigma’s Image / Video Annotation Services

Improve your computer vision models with high-quality image annotations.

Our ML-assisted tools can reduce annotation time and cost significantly.

Sigma has highly secure facilities to annotate confidential and PII data.

What is Image / Video Annotation?

Image annotation is the process of associating a whole image, or parts of it, with a set of pre-defined labels. The labels identify the objects and living beings in the image and can provide additional information such as their attributes, pose, and position. Image annotation is used to train computer vision systems so that they can detect and identify objects and living beings.

The number of labels per image depends on the annotation objectives. For example, if the objective is to classify images, each image has a single label that represents the content of the entire image. If the goal is to identify all the objects and living beings within an image, each of them has to be located, marked, and labeled.

Video annotation is similar to image annotation but more difficult, because the objects and living beings are usually in motion, either because they move by themselves or because the camera that records the video sequence is moving. Videos have to be annotated frame by frame to help computer vision systems detect, identify, and track objects and living beings accurately.

Image / Video Annotation Services

2D Bounding Boxes

In this type of annotation, annotators draw a box around each object of interest. Each box usually has one or more associated labels that identify the type of object and its attributes.

Sigma’s skilled annotators, together with Sigma’s ML-assisted annotation tools, can produce consistent, high-quality annotations at scale.
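
As an illustration, a 2D bounding box annotation can be stored as four corner coordinates plus a label and optional attributes. The sketch below is a minimal example, not Sigma’s actual data format: the class name `BoundingBox2D`, its field names, and the pixel coordinate convention (origin at the top-left corner) are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class BoundingBox2D:
    """Hypothetical axis-aligned box in pixel coordinates (origin at top-left)."""
    x_min: int
    y_min: int
    x_max: int
    y_max: int
    label: str                                       # type of object, e.g. "car"
    attributes: dict = field(default_factory=dict)   # optional extra labels

    @property
    def width(self) -> int:
        return self.x_max - self.x_min

    @property
    def height(self) -> int:
        return self.y_max - self.y_min

    @property
    def area(self) -> int:
        return self.width * self.height


# One box with a type label and two attribute labels.
box = BoundingBox2D(10, 20, 110, 70, "car", {"color": "red", "occluded": False})
print(box.width, box.height, box.area)  # 100 50 5000
```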

3D Bounding Boxes

This type of annotation is similar to 2D bounding boxes in that annotators draw boxes around the objects of interest. In this case, however, they also mark anchor points so that the 3D box shows the length, width, and depth of each object. The anchor points have to be placed at the edges of the object for the annotation to be accurate.

3D bounding boxes are also called cuboids.
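
One common way to describe a cuboid, assumed here for illustration only, is by its center, its length, width, and depth, and a rotation around the vertical axis. The sketch below derives the eight corner points from that description. The function name `cuboid_corners` and the axis convention (z pointing up) are hypothetical choices, not part of any specific annotation tool.

```python
import math
from itertools import product


def cuboid_corners(center, size, yaw=0.0):
    """Return the 8 corners of a cuboid.

    center: (x, y, z) of the cuboid center
    size:   (length, width, depth) along the local x, y, z axes
    yaw:    rotation around the vertical (z) axis, in radians
    """
    cx, cy, cz = center
    length, width, depth = size
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    corners = []
    # Every combination of +/- half-extent along the three local axes.
    for sx, sy, sz in product((-0.5, 0.5), repeat=3):
        dx, dy, dz = sx * length, sy * width, sz * depth
        # Rotate the horizontal offset by yaw, then translate to the center.
        x = cx + dx * cos_y - dy * sin_y
        y = cy + dx * sin_y + dy * cos_y
        z = cz + dz
        corners.append((x, y, z))
    return corners


corners = cuboid_corners(center=(0.0, 0.0, 0.0), size=(4.0, 2.0, 1.5))
print(len(corners))  # 8
```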


Polygon Annotation

Polygon annotation is normally used for objects with irregular shapes, since polygons make it possible to mark the objects of interest accurately regardless of their shape.

As with bounding boxes, the area inside the polygon is labeled to identify the type of object and its attributes.
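
To make the relationship between polygons and bounding boxes concrete, the sketch below computes the area enclosed by a polygon (using the shoelace formula) and the smallest axis-aligned box that contains it. It assumes the polygon is given as an ordered list of (x, y) vertices and does not cross itself. The function names are hypothetical.

```python
def polygon_area(points):
    """Area of a simple (non-self-intersecting) polygon via the shoelace formula."""
    twice_area = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def polygon_bounds(points):
    """Smallest axis-aligned box (x_min, y_min, x_max, y_max) containing the polygon."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)


# An L-shaped object: a 4x4 square with a 2x2 corner removed.
l_shape = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
print(polygon_area(l_shape))    # 12.0
print(polygon_bounds(l_shape))  # (0, 0, 4, 4)
```

In this example the polygon covers 12 of the 16 pixels of its enclosing box, which is why polygons tend to describe irregular shapes more tightly than boxes do.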

Semantic Segmentation

While bounding boxes and polygons annotate certain objects of interest only, semantic segmentation annotates every pixel in the image.

The image is segmented into categories that are identified by a color code. For example, in the image above, vehicles are colored blue, pedestrians are red, buildings are grey, trees are green, etc.
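
The color coding described above amounts to a lookup from category to color, applied to every pixel. The sketch below is a minimal illustration that uses the four categories from the example. The exact RGB values and the use of category names in the mask (real masks typically store integer class IDs) are assumptions.

```python
from collections import Counter

# Assumed RGB values for the color code in the example.
CLASS_COLORS = {
    "vehicle": (0, 0, 255),        # blue
    "pedestrian": (255, 0, 0),     # red
    "building": (128, 128, 128),   # grey
    "tree": (0, 128, 0),           # green
}


def colorize(mask):
    """Turn a 2D grid of category names into a 2D grid of RGB colors."""
    return [[CLASS_COLORS[category] for category in row] for row in mask]


def pixel_counts(mask):
    """Count how many pixels belong to each category."""
    return Counter(category for row in mask for category in row)


# A tiny 2x3 "image" in which every pixel has exactly one category.
mask = [
    ["building", "building", "tree"],
    ["vehicle", "pedestrian", "tree"],
]
colored = colorize(mask)
print(colored[1][0])               # (0, 0, 255)
print(pixel_counts(mask)["tree"])  # 2
```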

Landmark Annotation

Annotators place points to mark facial features, gestures, or body postures.

This type of annotation helps computer vision algorithms accurately detect emotions and gestures, identify people through facial recognition, and assess posture changes in sports.

Lines and Splines

Annotators draw the straight or curved lines that mark the boundaries of the lanes.

This type of annotation helps train AI models for autonomous vehicles so that they can detect lanes accurately.
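
A curved lane boundary can be represented by a handful of control points with a smooth curve passing through them. The sketch below uses a uniform Catmull-Rom spline, which is only one possible choice and is assumed here for illustration. The function names and the sample lane coordinates are hypothetical.

```python
def catmull_rom_point(p0, p1, p2, p3, t):
    """Point on the uniform Catmull-Rom segment between p1 and p2, for t in [0, 1]."""
    t2, t3 = t * t, t * t * t
    return tuple(
        0.5 * (2 * b + (c - a) * t
               + (2 * a - 5 * b + 4 * c - d) * t2
               + (-a + 3 * b - 3 * c + d) * t3)
        for a, b, c, d in zip(p0, p1, p2, p3)
    )


def sample_spline(points, samples_per_segment=10):
    """Sample a smooth curve that passes through every control point."""
    if len(points) < 2:
        raise ValueError("need at least two control points")
    # Duplicate the end points so the first and last segments are defined.
    padded = [points[0]] + list(points) + [points[-1]]
    curve = []
    for i in range(len(padded) - 3):
        for s in range(samples_per_segment):
            t = s / samples_per_segment
            curve.append(catmull_rom_point(*padded[i:i + 4], t))
    curve.append(tuple(float(v) for v in points[-1]))
    return curve


# Four annotated points along a lane boundary, in pixel coordinates.
lane = [(0, 0), (10, 5), (20, 5), (30, 0)]
curve = sample_spline(lane)
print(len(curve))  # 31
```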

Image Classification

Image classification is the process of associating the entire image with one label. For example, this type of annotation can be used to train a machine learning system to identify skin disorders such as melanoma, melanocytic nevus, actinic keratosis, benign keratosis, etc.

Video Tracking

Video tracking consists of drawing 2D or 3D bounding boxes around the objects of interest in every video frame.

Examples of objects of interest include vessels, ships, sailboats, pedestrians, vehicles, street lights, and traffic signs, among others.

Video tracking can also be done through semantic segmentation of the video frames.
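
Because consecutive frames usually differ only slightly, annotation tools often let annotators draw boxes on selected keyframes and propose boxes for the frames in between, which the annotator then reviews and corrects. The sketch below shows plain linear interpolation between two keyframes. It is an illustrative assumption and not a description of Sigma’s tools; the function name and the (x_min, y_min, x_max, y_max) box layout are hypothetical.

```python
def interpolate_box(box_a, box_b, frame_a, frame_b, frame):
    """Linearly interpolate a box between two keyframes.

    box_a, box_b: (x_min, y_min, x_max, y_max) at frame_a and frame_b
    frame:        a frame index with frame_a <= frame <= frame_b
    """
    if frame_a >= frame_b or not frame_a <= frame <= frame_b:
        raise ValueError("frame must lie between two distinct, ordered keyframes")
    t = (frame - frame_a) / (frame_b - frame_a)
    return tuple(a + t * (b - a) for a, b in zip(box_a, box_b))


# A vehicle annotated at frames 0 and 10, moving 20 pixels to the right.
start = (0, 0, 10, 10)
end = (20, 0, 30, 10)
track = {f: interpolate_box(start, end, 0, 10, f) for f in range(11)}
print(track[5])  # (10.0, 0.0, 20.0, 10.0)
```

Linear interpolation only approximates real motion, so every proposed box still has to be checked against the actual frame.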


Thermal Image Annotation

Our specialized annotators interpret thermal images to estimate and annotate properties of the subjects they show, such as temperature.

The image above shows an annotation example for the agriculture sector, in which the temperature, the body condition score and general health status of cows and calves were annotated.

Optical Character Recognition

Optical character recognition (OCR) is the process of converting text within images into machine-encoded text.

Sigma has the tools and expertise to extract text from images and create data sets that can later be used to train and adapt OCR systems to specific use cases such as traffic sign interpretation, license plate scanning, serial number reading, and scanning of documents and ID documents.

Medical Image Annotation

Sigma has a large database of vetted annotators with a wide range of specializations and skills, including radiologists, dermatologists, ophthalmologists, pathologists, and biologists.

This allows Sigma to produce high-quality annotations for the healthcare sector.

Quality Assessment

Sigma’s quality assessment team reviews the annotated data and reports the types of errors found (occasional, systematic, and misinterpretation errors), together with the most appropriate quality metrics. The metrics used include intersection over union (IoU), mean intersection over union (mIoU), precision (P), recall (R), and average precision (AP).
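
For bounding boxes, IoU is the area shared by two boxes divided by the area they cover together. Precision is the fraction of annotated boxes that match a reference box, and recall is the fraction of reference boxes that were found. The sketch below is a minimal illustration of these standard definitions, not Sigma’s review procedure: the 0.5 matching threshold, the greedy one-to-one matching, and the function names are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(annotated, reference, threshold=0.5):
    """Greedily match annotated boxes to reference boxes, one-to-one, by IoU."""
    matched = set()
    true_positives = 0
    for box in annotated:
        best_index, best_score = None, threshold
        for j, ref in enumerate(reference):
            if j in matched:
                continue  # each reference box can be matched only once
            score = iou(box, ref)
            if score >= best_score:
                best_index, best_score = j, score
        if best_index is not None:
            matched.add(best_index)
            true_positives += 1
    precision = true_positives / len(annotated) if annotated else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


annotated = [(0, 0, 10, 10), (50, 50, 60, 60)]  # second box matches nothing
reference = [(1, 1, 10, 10)]
print(iou(annotated[0], reference[0]))         # 0.81
print(precision_recall(annotated, reference))  # (0.5, 1.0)
```

Here one of the two annotated boxes matches the single reference box, so precision is 0.5 and recall is 1.0. mIoU averages the IoU over categories, and AP summarizes precision across recall levels; both build on the same quantities.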