In-The-Wild Jailbreak Prompts — real-world LLM jailbreak dataset

Released alongside a CCS 2024 paper, this dataset collects 15,140 ChatGPT prompts scraped over roughly a year from Reddit, Discord, prompt-sharing websites, and open-source datasets. Of these, 1,405 are verified jailbreak prompts. The accompanying study was the largest measurement study of in-the-wild jailbreaks at the time of its release.

Key features

1,405 real jailbreak prompts plus 13,735 benign prompts (the remainder of the 15,140) for contrastive evaluation
Sourced from four kinds of platforms, with timestamps for studying how jailbreaks evolve over time
A ready-made corpus for training or benchmarking prompt-injection and jailbreak detectors
Accompanied by analysis of prompt-sharing communities and attack effectiveness
Grounds red-team coverage in prompts that attackers actually used in the wild
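Because every prompt carries a timestamp and a jailbreak/benign label, a natural first analysis is to bucket prompts by month and watch how the jailbreak share shifts. The sketch below is a minimal illustration. It assumes the records have already been loaded into a hypothetical `PromptRecord` with `prompt`, `jailbreak`, and `created` fields. These names are assumptions, not the release's documented column names, so map them to whatever the source files actually use.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class PromptRecord:
    # Hypothetical schema: field names are illustrative assumptions,
    # not the dataset's documented columns.
    prompt: str
    jailbreak: bool
    created: date


def monthly_jailbreak_counts(records):
    """Return [((year, month), jailbreak_count, benign_count), ...] sorted by month."""
    jailbreaks = Counter()
    benign = Counter()
    for r in records:
        key = (r.created.year, r.created.month)
        if r.jailbreak:
            jailbreaks[key] += 1
        else:
            benign[key] += 1
    # Union of months seen in either class; Counter returns 0 for missing keys.
    months = sorted(set(jailbreaks) | set(benign))
    return [(m, jailbreaks[m], benign[m]) for m in months]


if __name__ == "__main__":
    # Placeholder records only; no real prompt text is reproduced here.
    sample = [
        PromptRecord("<jailbreak prompt A>", True, date(2023, 1, 5)),
        PromptRecord("<benign prompt B>", False, date(2023, 1, 9)),
        PromptRecord("<jailbreak prompt C>", True, date(2023, 2, 1)),
    ]
    for month, jb, ok in monthly_jailbreak_counts(sample):
        print(month, jb, ok)
```

Monthly buckets are an arbitrary choice. Weekly or per-platform grouping would follow the same pattern with a different key.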

Instead of synthetic attacks, this dataset gives defenders authentic adversarial inputs. That makes it a strong foundation for evaluating whether a guardrail catches the jailbreaks people actually deploy.
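Evaluating a guardrail against labeled prompts comes down to a confusion matrix. Recall on the jailbreak prompts measures what it catches. The false-positive rate on the benign prompts measures what it wrongly blocks. The sketch below assumes the detector is any callable mapping a prompt string to `True` (flagged) or `False`. The `keyword_detector` baseline is a hypothetical, deliberately naive stand-in, not a recommended guardrail or anything shipped with the dataset.

```python
from typing import Callable, Iterable, Tuple


def evaluate_guardrail(
    detector: Callable[[str], bool],
    labeled: Iterable[Tuple[str, bool]],
) -> dict:
    """Score a detector on (prompt, is_jailbreak) pairs."""
    tp = fp = tn = fn = 0
    for text, is_jailbreak in labeled:
        flagged = detector(text)
        if is_jailbreak and flagged:
            tp += 1  # jailbreak caught
        elif is_jailbreak:
            fn += 1  # jailbreak missed
        elif flagged:
            fp += 1  # benign prompt wrongly blocked
        else:
            tn += 1  # benign prompt correctly allowed
    return {
        "tp": tp, "fp": fp, "tn": tn, "fn": fn,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


def keyword_detector(text: str) -> bool:
    # Hypothetical naive baseline: flags any prompt containing "ignore".
    return "ignore" in text.lower()


if __name__ == "__main__":
    # Placeholder pairs; in practice these come from the dataset's labels.
    pairs = [
        ("<jailbreak containing IGNORE>", True),
        ("<jailbreak without the keyword>", True),
        ("<benign: please ignore the typo>", False),
        ("<benign: what's the weather?>", False),
    ]
    print(evaluate_guardrail(keyword_detector, pairs))
```

Reporting the false-positive rate alongside recall matters here. A detector that flags everything would score perfect recall on the 1,405 jailbreaks while blocking all of the benign prompts.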

Curated mirror of the open-source In-The-Wild Jailbreak Prompts, released under the MIT license. Obtain the data from the original source.
