Model Leaderboard

Below is the leaderboard of all models on the site, ordered by ELO.

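Main Stats

| Rank | Model | Company | Country | ELO | Resilience | # Tests | # Jailbreaks |
|---|---|---|---|---|---|---|---|
| 🥇 | gpt-oss-20b | OpenAI | 🇺🇸 | 958 | 98% | 2436 | 51 |
| 🥈 | gpt-oss-120b | OpenAI | 🇺🇸 | 935 | 98% | 2394 | 41 |
| 🥉 | kimi-k2.5 (new) | Moonshot AI | 🇨🇳 | 891 | 97% | 1161 | 34 |
| 4 | qwen3-235b-a22b-instruct-2507 | Alibaba | 🇨🇳 | 849 | 91% | 2487 | 234 |
| 5 | qwen3-32b | Alibaba | 🇨🇳 | 779 | 83% | 2466 | 429 |
| 6 | qwen3-8b (new) | Alibaba | 🇨🇳 | 778 | 79% | 1326 | 278 |
| 7 | kimi-k2-instruct-0905 | Moonshot AI | 🇨🇳 | 725 | 75% | 2550 | 631 |
| 8 | mistral-small-3.2-24b-instruct-2506 | Mistral | 🇫🇷 | 671 | 62% | 2367 | 897 |
| 9 | mistral-nemo-instruct-2407 | Mistral / Nvidia | 🇫🇷 | 610 | 57% | 2394 | 1026 |

Resilience by Request Type

| Model | Violent Crimes | Non-Violent Crimes | Sex Crimes | Child Exploitation | Defamation | Specialized Advice | Privacy | Intellectual Property | Indiscriminate Weapons | Hate | Self-Harm | Sexual Content | Elections | Code Interpreter Abuse |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| gpt-oss-20b | 98% | 100% | 94% | 98% | 100% | 100% | 98% | 98% | 99% | 97% | 97% | 99% | 95% | 99% |
| gpt-oss-120b | 99% | 98% | 98% | 97% | 99% | 99% | 98% | 98% | 98% | 99% | 97% | 99% | 97% | 97% |
| kimi-k2.5 | 94% | 98% | 97% | 99% | 99% | 97% | 96% | 99% | 100% | 100% | 96% | 100% | 90% | 97% |
| qwen3-235b-a22b-instruct-2507 | 88% | 93% | 88% | 94% | 91% | 98% | 90% | 89% | 93% | 94% | 95% | 96% | 72% | 92% |
| qwen3-32b | 84% | 82% | 74% | 86% | 80% | 91% | 72% | 83% | 93% | 86% | 93% | 90% | 63% | 81% |
| qwen3-8b | 83% | 87% | 63% | 68% | 78% | 90% | 60% | 82% | 89% | 80% | 89% | 97% | 68% | 70% |
| kimi-k2-instruct-0905 | 64% | 80% | 65% | 79% | 77% | 84% | 66% | 69% | 84% | 84% | 80% | 84% | 61% | 77% |
| mistral-small-3.2-24b-instruct-2506 | 49% | 69% | 52% | 68% | 56% | 76% | 51% | 63% | 77% | 76% | 71% | 59% | 41% | 64% |
| mistral-nemo-instruct-2407 | 48% | 54% | 51% | 71% | 56% | 56% | 51% | 52% | 53% | 83% | 74% | 46% | 55% | 46% |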

Note: All statistics on this page are as judged by Qwen3-32B. Qwen3-32B is not a perfect judge, so these figures are a close approximation of LLM jailbreak resilience rather than an exact measurement.
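
The page does not say how the Resilience column is derived. The published figures are, however, consistent with it being the share of tests that did not end in a jailbreak, rounded to a whole percent. The sketch below checks that reading against a few rows of the leaderboard. The formula is an assumption inferred from the numbers, and the names `resilience_pct` and `rows` are illustrative only.

```python
def resilience_pct(tests: int, jailbreaks: int) -> int:
    """Share of tests without a jailbreak, as a whole percent.

    Assumed formula (not stated on the page):
    100 * (tests - jailbreaks) / tests, rounded to the nearest integer.
    """
    if tests <= 0:
        raise ValueError("tests must be positive")
    if not 0 <= jailbreaks <= tests:
        raise ValueError("jailbreaks must be between 0 and tests")
    return round(100 * (tests - jailbreaks) / tests)


# (# Tests, # Jailbreaks) copied from the leaderboard rows.
rows = {
    "gpt-oss-20b": (2436, 51),
    "qwen3-8b": (1326, 278),
    "mistral-nemo-instruct-2407": (2394, 1026),
}

for model, (tests, jailbreaks) in rows.items():
    print(f"{model}: {resilience_pct(tests, jailbreaks)}%")
```

For these three rows the computed values match the Resilience column (98%, 79% and 57%). That supports the assumed formula without confirming that the site computes it this way.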