Why You Need Better Training Data

Machine Learning (ML) algorithms, much like children, “learn” from examples. The “data examples” that ML algorithms use to “learn” are prepared through the Data Annotation process, which combines technology, procedures, tools, annotation guidelines, and quality assessment methodologies.

The quality of the “data examples” is of the utmost importance. Just as care is taken over the quality of books and teaching materials so that children learn correctly, we must ensure that the examples artificial intelligence algorithms learn from have the necessary quality.

Can you imagine what would happen if a children’s book had “annotation errors”? Consider the example on the right. It has “just” two annotation errors: baby and crib…

The figure below illustrates what could happen if children had learnt from this book.

Artificial Intelligence (AI) based systems are only as good as the data they have been trained on. AI cannot be expected to perform correctly if its source of knowledge (i.e., the data examples, or training data) does not have the required quality.

Poor-quality training data increases the error rate of AI-based systems and can even cause them to behave in a discriminatory way.

For this reason, Sigma delivers very high-quality data. This is achieved through a unique way of defining and analyzing data quality.

But there are many other reasons why Sigma is the best choice:


  1. Experience and Service,
  2. Security and Privacy,
  3. Speed and Scalability,
  4. No Crowd-Sourcing,
  5. Flexibility,
  6. Ethics, and
  7. Pricing