Model Leaderboard

Below is the leaderboard of all models on the site, ordered by ELO rating from highest to lowest.

Main Stats									Resilience by Request Type
Rank	Model	Company	Country	ELO	Resilience	# Tests	# Jailbreaks	Violent Crimes	Non-Violent Crimes	Sex Crimes	Child Exploitation	Defamation	Specialized Advice	Privacy	Intellectual Property	Indiscriminate Weapons	Hate	Self-Harm	Sexual Content	Elections	Code Interpreter Abuse
🥇	gpt-oss-120b	OpenAI	🇺🇸	889	99%	5370	50	99%	99%	99%	99%	100%	99%	99%	99%	99%	99%	99%	99%	98%	99%
🥈	gpt-oss-20b	OpenAI	🇺🇸	873	98%	5313	93	99%	100%	96%	98%	100%	100%	98%	99%	99%	98%	98%	100%	90%	99%
🥉	kimi-k2.5 (deprecated)	Moonshot AI	🇨🇳	865	96%	1932	82	91%	97%	97%	96%	99%	96%	92%	96%	99%	98%	95%	97%	92%	96%
4	qwen3-235b-a22b-instruct-2507	Alibaba	🇨🇳	847	94%	5487	313	93%	97%	93%	96%	95%	99%	94%	94%	96%	93%	97%	96%	82%	96%
5	nemotron-3-ultra (new)	Nvidia	🇺🇸	845	100%	126	0	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%
6	gemma-4-26b-a4b (new)	Google	🇺🇸	834	98%	168	4	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	75%	93%
7	minimax-m3 (new)	Minimax	🇨🇳	825	99%	183	2	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	85%	100%
8	qwen3-32b	Alibaba	🇨🇳	791	89%	5580	602	91%	90%	84%	90%	88%	94%	83%	89%	96%	90%	96%	94%	72%	89%
9	qwen3-8b (deprecated)	Alibaba	🇨🇳	778	82%	1728	303	84%	90%	70%	73%	81%	92%	70%	86%	91%	79%	91%	97%	71%	76%
10	kimi-k2-instruct-0905 (deprecated)	Moonshot AI	🇨🇳	725	75%	2550	631	64%	80%	65%	79%	77%	84%	66%	69%	84%	84%	80%	84%	61%	77%
11	mistral-small-3.2-24b-instruct-2506	Mistral	🇫🇷	670	75%	5367	1328	66%	83%	71%	79%	72%	86%	71%	76%	87%	85%	84%	73%	48%	77%
12	mistral-nemo-instruct-2407 (deprecated)	Mistral / Nvidia	🇫🇷	644	73%	5334	1433	66%	72%	70%	79%	68%	77%	70%	72%	78%	87%	85%	69%	59%	71%

Note: All statistics on this page were scored by an LLM judge:

Qwen3-32B (until 27/07/2026)
Qwen3.6-27B (27/07/2026-present)

No LLM is a perfect judge, so these figures are a close approximation of each model's jailbreak resilience rather than exact measurements.
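
The page does not state how the Resilience column is calculated. The published figures are, however, consistent with the share of tests that did not end in a jailbreak, that is, 1 - (# Jailbreaks / # Tests). The minimal sketch below checks that assumption against three rows of the table. The function name `resilience` is hypothetical and is not part of the site.

```python
# Assumption: Resilience = 1 - (# Jailbreaks / # Tests).
# The site does not publish its formula; this is only a consistency check.

def resilience(tests: int, jailbreaks: int) -> float:
    """Return the fraction of tests that did not end in a jailbreak."""
    if tests <= 0:
        raise ValueError("tests must be a positive integer")
    if not 0 <= jailbreaks <= tests:
        raise ValueError("jailbreaks must be between 0 and tests")
    return 1 - jailbreaks / tests


# (# Tests, # Jailbreaks) copied from the table above.
rows = {
    "gpt-oss-120b": (5370, 50),
    "qwen3-32b": (5580, 602),
    "mistral-nemo-instruct-2407": (5334, 1433),
}

for model, (tests, jailbreaks) in rows.items():
    print(f"{model}: {resilience(tests, jailbreaks):.0%}")
# gpt-oss-120b: 99%
# qwen3-32b: 89%
# mistral-nemo-instruct-2407: 73%
```

The rounded results match the Resilience values listed for those three models. Models with few tests, such as the newly added ones, can shift by a whole percentage point or more from a single additional jailbreak, so their percentages are less stable than those backed by thousands of tests.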