Sigma’s Data Collection Services
Train your Artificial Intelligence models with high-quality data.
Sigma is ISO27001 certified and 100% GDPR compliant.
Sigma has the experience, knowledge and tools to design datasets and/or collect data.
What is Data Collection?
Data collection is the process of gathering data that, once cleaned and annotated, can be used to train and evaluate machine learning and deep learning algorithms.
Artificial Intelligence is only as good as the data it has been trained on. Data collection is therefore a key part of the development of any AI-based system. Several points need to be taken into account when planning a data collection project, including:
- Domain coverage: The collected data has to accurately represent the task domain the resulting AI is going to work on. Both inaccurate and missing data can degrade the result. Here, inaccurate data means data unrelated to the task domain, while missing data means domain data that is absent from the dataset.
- Balance: The data collection has to be designed so that no part of the data is under-represented and the AI algorithm works as expected across the whole application domain.
- User coverage: All types of users have to be equally represented to avoid biases related to gender, age, race, politics, religion, etc.
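The balance and user coverage points above can be checked with a simple count per attribute value. The following is a minimal sketch, not Sigma's tooling: the function name `representation_report`, the `age_group` attribute, the sample records and the `tolerance` threshold are all assumptions made for illustration.

```python
from collections import Counter


def representation_report(records, attribute, expected_values, tolerance=0.5):
    """Return the share of each expected value and the under-represented ones.

    A value is flagged when its share of the records falls below
    `tolerance` times the uniform share (1 / number of expected values).
    The threshold is an illustrative assumption, not an industry standard.
    """
    counts = Counter(record.get(attribute) for record in records)
    total = len(records)
    uniform_share = 1.0 / len(expected_values)

    shares = {}
    flagged = []
    for value in expected_values:
        share = counts[value] / total if total else 0.0
        shares[value] = share
        if share < tolerance * uniform_share:
            flagged.append(value)
    return shares, flagged


# Hypothetical metadata for ten collected samples.
samples = (
    [{"age_group": "18-30"}] * 6
    + [{"age_group": "31-50"}] * 3
    + [{"age_group": "51+"}] * 1
)

shares, flagged = representation_report(
    samples, "age_group", ["18-30", "31-50", "51+"]
)
print(shares)
print(flagged)
```

In this sketch, a group that is listed in `expected_values` but never appears in the records gets a share of zero and is flagged as well, which covers the missing-data case as a special instance of imbalance.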
Sigma has the experience, knowledge and tools to design datasets and/or collect data, whether text, audio, image, video or biometric data, while preserving security and privacy.
Data Collection Services
Whether it is general, domain-specific or conversational text data, Sigma has extensive experience in designing and creating text datasets for a wide variety of use cases, including handwritten text data collection.
Sigma collects monologues and dialogues at any sampling rate (44.1 kHz, 16 kHz, 8 kHz), recording quality (studio, home/office, noisy), encoding and format.
Whether the monologues or dialogues are scripted, domain-specific or spontaneous, Sigma has the experience and resources to design and/or collect the voice dataset.
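As an illustration of how the audio specifications above might be verified on delivered recordings, here is a minimal sketch using Python's standard `wave` module. It only handles uncompressed WAV files; the helper names `describe_wav` and `make_silent_wav` and the `ALLOWED_SAMPLE_RATES` set are assumptions made for this example and do not describe Sigma's actual pipeline.

```python
import io
import wave

# Sampling rates named in the text above, in Hz (assumed acceptance list).
ALLOWED_SAMPLE_RATES = {8000, 16000, 44100}


def describe_wav(stream):
    """Read basic format properties from a WAV file or file-like object."""
    with wave.open(stream, "rb") as wav:
        rate = wav.getframerate()
        return {
            "sample_rate_hz": rate,
            "channels": wav.getnchannels(),
            "bit_depth": wav.getsampwidth() * 8,
            "duration_s": wav.getnframes() / rate,
            "rate_allowed": rate in ALLOWED_SAMPLE_RATES,
        }


def make_silent_wav(sample_rate, seconds=1):
    """Build a mono, 16-bit silent WAV in memory, for demonstration only."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * sample_rate * seconds)
    buffer.seek(0)
    return buffer


info = describe_wav(make_silent_wav(16000))
print(info)
```

Compressed formats would need a different reader; the same kind of check applies to whatever properties a dataset specification fixes, such as channel count or bit depth.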
Sigma collects images and videos in any format and domain. This ranges from images and videos captured with regular cameras to those captured with thermal cameras and drones.