Model Leaderboard

Below is the leaderboard of all models on the site, ranked by ELO rating, highest first.

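Main Stats

| Rank | Model | Company | Country | ELO | Resilience | # Tests | # Jailbreaks |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 🥇 | gpt-oss-120b | OpenAI | 🇺🇸 | 1009 | 98% | 1923 | 38 |
| 🥈 | gpt-oss-20b | OpenAI | 🇺🇸 | 999 | 98% | 1935 | 48 |
| 🥉 | qwen3-235b-a22b-instruct-2507 | Alibaba | 🇨🇳 | 902 | 91% | 1872 | 161 |
| 4 | kimi-k2.5 (new!) | Moonshot AI | 🇨🇳 | 864 | 97% | 615 | 18 |
| 5 | qwen3-32b | Alibaba | 🇨🇳 | 808 | 84% | 1938 | 319 |
| 6 | qwen3-8b (new!) | Alibaba | 🇨🇳 | 770 | 78% | 762 | 170 |
| 7 | kimi-k2-instruct-0905 | Moonshot AI | 🇨🇳 | 654 | 75% | 2049 | 515 |
| 8 | mistral-small-3.2-24b-instruct-2506 | Mistral | 🇫🇷 | 619 | 62% | 1800 | 690 |
| 9 | mistral-nemo-instruct-2407 | Mistral / Nvidia | 🇫🇷 | 570 | 58% | 1845 | 784 |

Resilience by Request Type

| Rank | Model | Violent Crimes | Non-Violent Crimes | Sex Crimes | Child Exploitation | Defamation | Specialized Advice | Privacy | Intellectual Property | Indiscriminate Weapons | Hate | Self-Harm | Sexual Content | Elections | Code Interpreter Abuse |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 🥇 | gpt-oss-120b | 99% | 98% | 98% | 98% | 99% | 99% | 98% | 99% | 98% | 98% | 96% | 99% | 97% | 96% |
| 🥈 | gpt-oss-20b | 98% | 100% | 94% | 97% | 100% | 100% | 97% | 97% | 99% | 97% | 96% | 99% | 93% | 99% |
| 🥉 | qwen3-235b-a22b-instruct-2507 | 87% | 95% | 90% | 94% | 92% | 98% | 89% | 92% | 93% | 94% | 96% | 97% | 73% | 93% |
| 4 | kimi-k2.5 | 92% | 98% | 100% | 100% | 100% | 95% | 97% | 98% | 100% | 100% | 95% | 100% | 90% | 95% |
| 5 | qwen3-32b | 84% | 82% | 77% | 89% | 82% | 91% | 73% | 85% | 95% | 88% | 95% | 92% | 61% | 80% |
| 6 | qwen3-8b | 81% | 97% | 55% | 63% | 75% | 90% | 60% | 82% | 90% | 81% | 88% | 98% | 62% | 60% |
| 7 | kimi-k2-instruct-0905 | 63% | 80% | 65% | 80% | 75% | 83% | 66% | 70% | 83% | 88% | 80% | 85% | 61% | 74% |
| 8 | mistral-small-3.2-24b-instruct-2506 | 47% | 71% | 55% | 67% | 58% | 74% | 51% | 60% | 75% | 79% | 74% | 53% | 40% | 63% |
| 9 | mistral-nemo-instruct-2407 | 48% | 55% | 53% | 75% | 59% | 53% | 52% | 49% | 50% | 82% | 73% | 46% | 58% | 47% |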
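
The page does not say how the overall Resilience figure is computed. The published values do, however, appear consistent with the share of tests that did not end in a jailbreak, rounded to the nearest percent. The sketch below checks that reading against the # Tests and # Jailbreaks columns. The formula and the helper name `resilience_pct` are assumptions made for illustration, not the site's documented method.

```python
# Assumption: resilience = (tests - jailbreaks) / tests, as a percentage.
# This is inferred from the table; it is not a formula stated by the site.

# (model, # Tests, # Jailbreaks, published Resilience %), copied from the table.
ROWS = [
    ("gpt-oss-120b", 1923, 38, 98),
    ("gpt-oss-20b", 1935, 48, 98),
    ("qwen3-235b-a22b-instruct-2507", 1872, 161, 91),
    ("kimi-k2.5", 615, 18, 97),
    ("qwen3-32b", 1938, 319, 84),
    ("qwen3-8b", 762, 170, 78),
    ("kimi-k2-instruct-0905", 2049, 515, 75),
    ("mistral-small-3.2-24b-instruct-2506", 1800, 690, 62),
    ("mistral-nemo-instruct-2407", 1845, 784, 58),
]


def resilience_pct(tests: int, jailbreaks: int) -> float:
    """Percentage of tests that did not result in a jailbreak (hypothetical helper)."""
    if tests <= 0:
        raise ValueError("tests must be positive")
    return 100.0 * (tests - jailbreaks) / tests


# Print the unrounded value next to the published, rounded one.
for model, tests, jailbreaks, published in ROWS:
    computed = resilience_pct(tests, jailbreaks)
    print(f"{model:38s} {computed:6.2f}%  (published: {published}%)")
```

Under this assumption, models that share a rounded score can still be told apart: gpt-oss-120b and gpt-oss-20b both show 98%, but the former has 38 jailbreaks in 1923 tests and the latter 48 in 1935. The per-category percentages cannot be checked this way, because the page does not list per-category test counts.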

Note: All statistics on this page are as judged by Qwen3-32B. Qwen3-32B is not a perfect judge, so these figures are a close approximation of LLM jailbreak resilience rather than an exact measurement.