Designing secure facilities for sensitive data annotation

User data is sensitive — not only does it often fall under data protection regulations like GDPR, it’s also a valuable asset to a company. When a company in the consumer hardware space needed a subset of their user data annotated to iterate on an AI algorithm within their product, they needed absolute compliance and control over access to the data. Sigma provided not only the GDPR-compliant annotation, but also designed, implemented and operated the secure facilities where over 100 annotators worked.

10

Weeks from request to design and implementation of secure facility

100+

Annotators working from the facility; no one outside the project could enter

24/7

Manned building entrance security and video monitoring

Challenge

  • Provide a secure annotation facility for a team of over 100 annotators working in 8 languages
  • Source and train 100+ security-conscious annotators for 8 different languages
  • Design and implement secure procedures for annotation
  • Manage 24/7 physical security operations within the facility
  • Ensure all procedures conform 100% to European data privacy regulations

Solution

  • Consulted the client on secure annotation facilities and procedures
  • Designed, implemented and began operation of facilities in 4 months
  • Provided 24/7 manned security and video monitoring with multiple measures to restrict and control access to the area — on top of cybersecurity procedures
  • Designed 100% GDPR-compliant procedures for annotating client’s user data
  • Hired and vetted 100+ annotators in 8 languages with rigorous training and testing on security protocols

Project story

AI and machine learning algorithms are built into an increasing number of products — all of which need training data to learn to solve the problems they’re designed for. For many applications, that means collecting and processing user data. 

Data annotation often deals directly with highly personal user data like voice recordings, texts with names, ages or addresses, or search queries. This is necessary to build and improve the product itself, but processing this kind of sensitive data requires strict adherence to both the company’s terms of use or privacy agreement and local data privacy regulations like the GDPR.

User data is also an extremely valuable resource. First-party data collected while someone uses a product, like behavioral data, chat content or search queries, is unique, takes considerable time to collect, and is extremely useful for understanding customers better. It can be a significant competitive advantage, especially when it’s used to train an AI, for example to interact better with users and continuously improve the product itself.

All of these were reasons for a major consumer hardware manufacturer to ask Sigma to design and implement secure facilities and procedures and provide annotations of their users’ data exclusively onsite.

Dedicated annotators provide better security

Sigma was already the client’s annotation provider of choice for this project because of their focus on quality, quick ramp-up times, and flexibility in handling shifting project requirements. Another aspect of Sigma’s approach was crucial for this particularly sensitive project: their policy of vetting, training and hiring annotators directly, and never crowdsourcing. 

Apart from raising data quality concerns, crowdsourced annotation is often done remotely, which greatly complicates data security. Even with the best cybersecurity measures in place, there’s no way to control physical access to the data for remote teams: crowdsourced annotators working from personal machines could, for example, easily screenshot data or accidentally show sensitive data to someone else in the room. Having annotators on the team who are familiar with working with sensitive data, and who can work exclusively from secure facilities, allows Sigma to implement a number of physical security measures that provide optimal control over access to client data.

Treating client data with care and GDPR compliance

The annotators would handle personal data that fell under GDPR restrictions, meaning data that’s unique to the individual user and could theoretically be used to identify that person and their intentions. As a 100% GDPR-compliant organization, Sigma understood what it took to assure the highest level of data privacy throughout the entire data annotation process.

To maintain GDPR compliance, Sigma has internal experts in data protection, GDPR, and cyber and physical security. The company also undergoes annual third-party reviews and audits, and obtains external expert advice from one of the top 5 global consultancy firms. All employees receive annual training on GDPR and other security protocols, so that they’re not only compliant, but also understand why measures are implemented and can promote their correct application within the organization. Beyond GDPR, Sigma is also ISO 27001 and SOC 2 Type II certified.

Designing physical security

In only 10 weeks, the Sigma team designed and rolled out a dedicated secure facility with secure procedures, covering client consultation, design and planning, and the start of operation. In just 6 more weeks, they had a team of over 100 annotators in all 8 languages hired, trained and working from the new facility. The security protocols Sigma implemented exceeded the client’s original specifications, and included several additional measures that Sigma recommended to the client based on their experience with sensitive projects.

Only annotators and project managers working on the project were able to enter the building. Access to the facility was protected in multiple ways. 24/7 manned security at the building entrance assured that only authorized personnel could proceed to the secure area. Identity was confirmed with two-factor authentication that combined employee ID badges and biometrics. The secure area and office entrance were under constant video monitoring, and the recordings were stored locally for the maximum legal duration. The building was completely locked down outside of office hours.

Sigma designed additional physical security measures to prevent data from leaving the facility. Annotators could not bring any personal items or devices into the secure area, and were provided lockers in front of the entrance. Security guards used metal detectors at the entrance to confirm that no objects had been overlooked. Similarly, security checked annotators on exit to confirm that nothing was removed from the secure area. All emergency exit doors were alarmed.

Security mindset from start to finish

Respect for client confidentiality and a security mindset towards personal data extended into every aspect of the project’s implementation, from employee onboarding to data delivery. Sigma incorporates a number of agreements into its employee hiring procedures as standard, including a code of ethics, acceptable use policies and non-disclosure agreements. For this project, and all others that involve confidential or personal data, teams carried out all of the mandatory training courses within the secure zone. Annotators were tested and reviewed on their understanding of and adherence to the security protocols, and reminders were posted around the facility.

No user data, project data, guidelines, or even annotator communications were accessible outside of the facility. Cybersecurity protocols included restricting internet access and using proprietary chat tools that only functioned from the local machines in the facility. These measures were on top of periodic penetration tests and external security audits that Sigma undergoes as a matter of course. 

After an initial pilot phase, the client doubled the volume of work and number of annotators — and continued to work with Sigma for multiple new secure annotation projects.
