Addressing data challenges with AI-powered solutions


Companies collect terabytes of data every day, but most of its potential remains untapped. Making sense of raw, unstructured information is one of the most pressing data challenges for organizations. For those using data to train their own generative AI models, the challenge is even more complex: they need not only large volumes of data, but data of the highest quality.

With this in mind, Sigma Cognition, a leader in AI-powered solutions and Sigma AI’s sister company, recently completed two research projects: SigmaOnTopic and HADA. Here is how each project can help companies extract greater value from their data.


SigmaOnTopic: Unlocking the power of unstructured data

What if you could have a search engine for your company’s internal data?

That’s the idea behind SigmaOnTopic. This semantic search tool helps organizations explore their internal knowledge bases and retrieve precise, relevant information.

Imagine you need to troubleshoot a recurring machine issue. Instead of spending hours flipping through manuals, past maintenance logs, or training videos, you could ask a question in natural language and get the information you need right away. 

SigmaOnTopic can sift through large amounts of unstructured data in multiple formats. This includes text hidden within images or spoken words in a video recording. “By converting all the available information to text using AI tools, organizations’ databases can be significantly enriched and it can even be possible to select video fragments that might be relevant for users with a high level of precision,” explains Pierre Plaza, Director of Research and Development Programs at Sigma Cognition.
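To make the idea concrete, here is a minimal sketch of searching content once it has all been converted to text. It is not SigmaOnTopic's implementation: it assumes the text from OCR and transcription is already available, it uses plain TF-IDF keyword matching as a simple stand-in for true semantic search, and the `Segment` type, file names, and timestamps are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Segment:
    source: str      # hypothetical file name
    start_s: float   # start time in seconds (0.0 for non-video sources)
    text: str        # text from OCR, transcription, or the document itself


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(segments):
    """Return (idf weights, one TF-IDF vector per segment)."""
    token_lists = [tokenize(s.text) for s in segments]
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    n = len(segments)
    idf = {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = [
        {t: count * idf[t] for t, count in Counter(tokens).items()}
        for tokens in token_lists
    ]
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, segments, idf, vectors, top_k=3):
    """Rank segments against a natural-language query, best match first."""
    counts = Counter(t for t in tokenize(query) if t in idf)
    q_vec = {t: c * idf[t] for t, c in counts.items()}
    scored = [(cosine(q_vec, v), s) for s, v in zip(segments, vectors)]
    scored = [(score, s) for score, s in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical content, already converted to text.
SEGMENTS = [
    Segment("pump_manual.pdf", 0.0,
            "Replace the pump seal if the motor overheats or leaks oil"),
    Segment("training_video.mp4", 95.0,
            "To reset the conveyor belt alarm press the red button twice"),
    Segment("maintenance_log.txt", 0.0,
            "Invoice 4417 paid for replacement bearings"),
]

idf, vectors = build_index(SEGMENTS)
results = search("how do I reset the conveyor alarm", SEGMENTS, idf, vectors)
for score, segment in results:
    print(f"{score:.2f}  {segment.source} @ {segment.start_s:.0f}s")
```

Because each segment keeps its source and start time, a match in a transcript can point the user straight to the relevant video fragment. A production system would likely replace the TF-IDF vectors with learned text embeddings so that paraphrased questions also match.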

This project has received funds from Next Generation EU, within the framework of the Recovery, Transformation, and Resilience Plan of the Spanish economy. This was made possible through a call for proposals organized by Red.Es, an entity attached to the Ministry for Digital Transformation and Public Service through the Secretary of State for Digitalization and Artificial Intelligence.

SigmaOnTopic is being tested by a select group of users within the digital content platform at the Polytechnic University of Cartagena (UPCT). This educational platform is a valuable resource for many students at Spanish universities. To make the platform searchable with SigmaOnTopic, all of its information, including audiovisual content, has been converted into text.

Beyond academia, this tool could be applied in other industries, particularly those with limited technical resources. By enabling companies to quickly and easily locate invoices, serial numbers, user manuals, and any other document type, it could unlock the hidden value within internal databases that have largely remained unexplored.

HADA: Speeding up the data annotation process

Data annotation is an essential step in the machine learning process. It involves labeling large volumes of data so that machine learning models can be trained to recognize patterns and make predictions.

The quality of an AI solution hinges on the quality of the data used to train it.

Achieving this quality often requires a strategic blend of human expertise and technological tools. However, not all companies have the capacity or the resources to handle this process, which can be complex and time-consuming.  

Currently, data annotation consumes 80% of an AI project’s time. “This highlights the need for advanced tools that streamline and enhance this process, while still valuing the irreplaceable human contribution,” explains Ester Sancho Lozano, Data Scientist participating in the project.

Motivated by this data challenge, Sigma Cognition developed a series of Advanced Data Annotation Tools (HADA). These AI tools accelerate and improve data annotation for voice, text, and images by automating tasks and helping ensure data quality at each stage of the process.

For example, a deep learning algorithm identifies key frames in security camera footage, clustering relevant segments. This speeds up annotation and reduces viewing time by providing quick access to important video clips.
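As a rough illustration of how footage can be reduced to a few clips worth annotating, the sketch below flags frames that differ sharply from the previous one and groups them into segments. It is not the HADA algorithm: simple frame differencing stands in for the deep learning model, frames are assumed to be flat lists of grayscale pixel values, and the `threshold` and `max_gap` parameters are hypothetical.

```python
def frame_change(prev, curr):
    """Mean absolute pixel difference between two equally sized grayscale frames."""
    return sum(abs(a - b) for a, b in zip(prev, curr)) / len(curr)


def find_active_segments(frames, threshold=10.0, max_gap=1):
    """Group frames whose change score exceeds `threshold` into (start, end) index ranges.

    Active frames separated by at most `max_gap` quiet frames are merged
    into one segment, so an annotator reviews a few clips instead of many.
    """
    active = [
        i for i in range(1, len(frames))
        if frame_change(frames[i - 1], frames[i]) > threshold
    ]
    segments = []
    for i in active:
        if segments and i - segments[-1][1] <= max_gap + 1:
            segments[-1][1] = i          # extend the current segment
        else:
            segments.append([i, i])      # start a new segment
    return [tuple(seg) for seg in segments]


# Toy footage: nine 4-pixel frames; something enters the scene at frame 2
# and leaves at frame 7.
frames = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [50, 50, 0, 0],
    [50, 50, 50, 50],
    [50, 50, 50, 50],
    [50, 50, 50, 50],
    [50, 50, 50, 50],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

clips = find_active_segments(frames)
print(clips)  # [(2, 3), (7, 7)]
```

An annotator would then jump directly to frames 2 to 3 and frame 7 rather than watching the whole recording. A learned model would replace `frame_change` with a score that reflects what is relevant in the scene, not just how many pixels changed.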

Implementing these tools in conversation annotation projects has led to: 

  • A 20% improvement in the performance of data annotation teams, thanks to preprocessing techniques.
  • An improvement of more than 8% in pre-annotation accuracy.
  • A reduction of up to 8% in human errors during the quality control stage.

Like SigmaOnTopic, the HADA project also received funding from Next Generation EU. In this case, the project’s development was possible with the collaboration of specialized groups from the Polytechnic University of Madrid (UPM) and Carlos III University.

Unlocking data value

Both HADA and SigmaOnTopic are Sigma research efforts aimed at improving collaboration and solving some of AI’s most significant challenges. By enabling companies to make sense of complex information and enhancing the data annotation process with advanced AI-powered tools, Sigma helps businesses realize the full potential of their data and boost efficiency.

Learn more about Sigma’s data annotation services and how we can help you solve even the most complex challenges. 

 
