Red teaming goes global
Red teaming, or intentionally probing AI models for weaknesses, has long been a key practice in AI safety. But most efforts focus on English, text-based interactions. Sigma AI decided to take things further. In their latest study, the researchers pushed top models to their limits, examining how they behave in different languages, cultures, and formats, from text generation to image creation and interpretation.
Their approach was both deep and diverse. Models like GPT-4.0, Gemini 2.5 Pro, and DALL·E 3 were tested not only in English but also in Spanish and Japanese. The researchers created localized prompts addressing harm categories with the most serious societal implications: bias and discrimination, cybersecurity, crime and violence, and exploitation.
Different strengths, different weaknesses
The study’s central finding is deceptively simple: no single model performs best in every scenario. Each system showed unique strengths and vulnerabilities depending on the task and language. A model that handled English text safely might falter when asked to generate images from Japanese prompts.
This inconsistency matters. It underscores how cultural and linguistic context can shape what “safe” or “appropriate” output means. What passes as harmless in one culture might be sensitive or even harmful in another. True robustness, the report argues, requires accounting for that diversity — not just scaling a single standard of safety.
The promise and limits of automated oversight
Another focus of the study was automation: Can AI systems evaluate each other’s outputs? Sigma AI’s team used what they call “AI juries” — automated systems designed to flag or filter harmful content. The results were encouraging but incomplete. These systems could reliably handle over half of all cases involving clearly non-harmful material.
But when the questions got complex — around ethics, subtle bias, or high-stakes implications — AI juries struggled. The researchers found that human review was still essential for nuanced judgment. Automation can speed up safety checks, but it can’t yet replace ethical reasoning.
Human-AI partnership as the path forward
What emerges is a picture of partnership rather than replacement. AI can handle repetitive, obvious filtering tasks efficiently. Humans bring contextual understanding, empathy, and critical judgment — the qualities that ensure fairness and accountability.
Sigma AI’s results point toward a hybrid model for AI safety: Let machines do what they do best, but keep human oversight at the heart of the process. This approach balances efficiency with responsibility, enabling faster red teaming without losing ethical depth.
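To make the hybrid idea concrete, here is a minimal sketch of how such a routing step might look. It is an illustration under stated assumptions, not Sigma AI's actual pipeline: the `Verdict` class, the `triage` function, the label names, and the unanimity threshold are all hypothetical. The sketch clears an output automatically only when every automated juror calls it safe, and sends everything else to a human reviewer.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Verdict:
    label: str        # "safe" (auto-cleared) or "needs_human" (escalated)
    agreement: float  # share of jurors backing the most common vote


def triage(votes: list[str], min_agreement: float = 1.0) -> Verdict:
    """Route one model output based on juror votes (hypothetical scheme).

    Each vote is a string such as "safe" or "harmful" produced by one
    automated juror. Only clearly non-harmful outputs are auto-cleared;
    disagreement or any other majority label goes to a human.
    """
    if not votes:
        # No automated opinion at all: a person has to look at it.
        return Verdict("needs_human", 0.0)

    label, count = Counter(votes).most_common(1)[0]
    agreement = count / len(votes)

    if label == "safe" and agreement >= min_agreement:
        return Verdict("safe", agreement)
    return Verdict("needs_human", agreement)


if __name__ == "__main__":
    print(triage(["safe", "safe", "safe"]))      # unanimous, auto-cleared
    print(triage(["safe", "harmful", "safe"]))   # split jury, escalated
    print(triage(["harmful"] * 3))               # flagged, escalated
```

The conservative default (unanimity) reflects the study's finding that automation is dependable mainly for clearly non-harmful material. Lowering `min_agreement` would clear more cases automatically, at the cost of less human review.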
Trust built on collaboration
Ultimately, the study challenges how we think about trust in AI. It’s not just about whether a model performs well — it’s about whether the ecosystem around it, including human oversight, ensures its reliability across contexts.
For everyday users, this balance may quietly shape how we interact with AI tools. When we sense that human judgment still guides these systems, our trust deepens. When that layer is missing, confidence falters.
Sigma AI’s multilingual red teaming work reminds us that safety isn’t static. It’s a moving target shaped by culture, language, and collaboration between humans and machines.
Talk to an expert at Sigma AI to learn more about our latest research and see how our multilingual red teaming framework is shaping the next generation of safe, responsible AI.
