Model Leaderboard
The leaderboard below lists every model on the site, ranked by Elo rating from highest to lowest.
The columns from Rank through # Jailbreaks are the main stats; the columns from Violent Crimes onward show resilience by request type.

| Rank | Model | Company | Country | Elo | Resilience | # Tests | # Jailbreaks | Violent Crimes | Non-Violent Crimes | Sex Crimes | Child Exploitation | Defamation | Specialized Advice | Privacy | Intellectual Property | Indiscriminate Weapons | Hate | Self-Harm | Sexual Content | Elections | Code Interpreter Abuse |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 🥇 | gpt-oss-20b | OpenAI | 🇺🇸 | 911 | 98% | 1098 | 21 | 97% | 100% | 95% | 97% | 100% | 100% | 99% | 98% | 100% | 97% | 95% | 100% | 95% | 100% |
| 🥈 | gpt-oss-120b | OpenAI | 🇺🇸 | 896 | 99% | 1029 | 13 | 99% | 99% | 100% | 98% | 100% | 100% | 99% | 99% | 99% | 99% | 96% | 99% | 100% | 97% |
| 🥉 | qwen3-235b-a22b-instruct-2507 | Alibaba | 🇨🇳 | 877 | 93% | 1056 | 74 | 85% | 92% | 96% | 97% | 96% | 99% | 94% | 96% | 93% | 94% | 95% | 95% | 80% | 94% |
| 4 | qwen3-32b | Alibaba | 🇨🇳 | 821 | 84% | 1062 | 165 | 82% | 80% | 85% | 90% | 85% | 92% | 79% | 78% | 93% | 93% | 92% | 87% | 68% | 78% |
| 5 | kimi-k2-instruct-0905 | Moonshot AI | 🇨🇳 | 780 | 79% | 1167 | 241 | 68% | 83% | 72% | 83% | 84% | 88% | 70% | 74% | 89% | 89% | 84% | 83% | 69% | 74% |
| 6 | mistral-small-3.2-24b-instruct-2506 | Mistral | 🇫🇷 | 664 | 63% | 984 | 368 | 51% | 71% | 61% | 72% | 57% | 71% | 52% | 62% | 79% | 78% | 74% | 50% | 40% | 63% |
| 7 | mistral-nemo-instruct-2407 | Mistral / Nvidia | 🇫🇷 | 650 | 62% | 1053 | 404 | 57% | 56% | 60% | 77% | 64% | 57% | 64% | 57% | 57% | 78% | 73% | 42% | 65% | 50% |
Note: All statistics on this page are as judged by Qwen3-32B. Qwen3-32B is not a perfect judge, so these figures are a close approximation of LLM jailbreak resilience rather than an exact measurement.
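
The Resilience figures in the table look consistent with the share of tests that did not end in a jailbreak, rounded to a whole percent. The page does not state this definition, so treat it as an assumption inferred from the # Tests and # Jailbreaks columns. The short Python sketch below checks that assumption against the published rows; the function name `resilience_pct` is hypothetical and chosen only for illustration.

```python
# (model, tests, jailbreaks, published resilience %), copied from the leaderboard table.
ROWS = [
    ("gpt-oss-20b", 1098, 21, 98),
    ("gpt-oss-120b", 1029, 13, 99),
    ("qwen3-235b-a22b-instruct-2507", 1056, 74, 93),
    ("qwen3-32b", 1062, 165, 84),
    ("kimi-k2-instruct-0905", 1167, 241, 79),
    ("mistral-small-3.2-24b-instruct-2506", 984, 368, 63),
    ("mistral-nemo-instruct-2407", 1053, 404, 62),
]


def resilience_pct(tests: int, jailbreaks: int) -> int:
    """Assumed definition (not stated on the page): percentage of tests
    that were not jailbroken, rounded to the nearest whole percent."""
    return round(100 * (tests - jailbreaks) / tests)


for model, tests, jailbreaks, published in ROWS:
    computed = resilience_pct(tests, jailbreaks)
    status = "matches" if computed == published else "differs"
    print(f"{model}: computed {computed}%, published {published}% ({status})")
```

Under this assumed definition, a model's overall resilience depends only on its jailbreak count relative to its test count. The ranking is by Elo rather than by resilience, which is why gpt-oss-20b (98%) sits above gpt-oss-120b (99%).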