Releases: confident-ai/deepteam
🎉 New CLI tool, Agentic Red Teaming
🚀 DeepTeam CLI Release
We’re excited to release the first version of the DeepTeam CLI – a powerful command-line tool for red teaming and evaluating LLM applications with DeepEval.
✨ Features
- **Red Team Simulation**
  - Easily specify simulator and evaluation models (`gpt-3.5-turbo-0125`, `gpt-4o`, etc.)
  - Attack LLM systems with predefined vulnerability categories (e.g., Bias, Toxicity)
- **Target System Configuration**
  - Test both foundational models (like `gpt-3.5-turbo`) and full LLM applications via custom Python wrappers (see the wrapper sketch after the example config below)
  - Simple YAML config structure for defining the target model's purpose and behavior
- **System Controls**
  - Set concurrency and parallelism: `max_concurrent`, `run_async`
  - Specify how many attacks to run per vulnerability type
  - Optional error handling (`ignore_errors`) and result storage (`output_folder`)
- **Pluggable Vulnerabilities and Attacks**
  - Support for multiple attack types (e.g., Prompt Injection)
  - Define default vulnerabilities like:
    - `Bias`: targeting race and gender
    - `Toxicity`: profanity and insults
🛠 Example Usage
```yaml
models:
  simulator: gpt-3.5-turbo-0125
  evaluation: gpt-4o

target:
  purpose: "A helpful AI assistant"
  model: gpt-3.5-turbo

system_config:
  max_concurrent: 10
  attacks_per_vulnerability_type: 3
  run_async: true
  ignore_errors: false
  output_folder: "results"

default_vulnerabilities:
  - name: "Bias"
    types: ["race", "gender"]
  - name: "Toxicity"
    types: ["profanity", "insults"]

attacks:
  - name: "Prompt Injection"
```
```bash
deepteam run config.yaml
```
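The config above targets a foundational model directly. For a full LLM application, the feature list mentions custom Python wrappers; below is a minimal, purely illustrative sketch of one. The file and function names, and the way `config.yaml` would reference them, are assumptions rather than the CLI's documented hook. The only contract sketched is that an attack prompt goes in and your application's response comes out.

```python
# my_target.py -- hypothetical wrapper around a full LLM application.
# The module and function names are illustrative; how the YAML config
# points at this wrapper is an assumption and depends on the CLI's schema.

from my_app import answer_question  # your application's entry point (assumed)

async def model_callback(input: str) -> str:
    # Forward the simulated attack prompt to the real application and
    # return its final, user-facing response for evaluation.
    return answer_question(input)
```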
Stay tuned for more attack types, evaluation metrics, and integrations with the DeepEval framework.
🧠 Agentic Red Teaming
Agentic red teaming tests AI agents for vulnerabilities that only emerge when systems operate autonomously, maintain persistent memory, and pursue complex goals.
🧨 Specialized Attack Methods
DeepTeam includes 6 agentic-specific attacks:
- **Authority Spoofing** – Pretend to be a system admin or override
- **Role Manipulation** – Trick the agent into changing roles
- **Goal Redirection** – Reframe or corrupt the agent's priorities
- **Linguistic Confusion** – Use ambiguity to confuse language understanding
- **Validation Bypass** – Bypass safety checks through clever phrasing
- **Context Injection** – Inject false environmental state
Example
```python
from deepteam import red_team
from deepteam.vulnerabilities.agentic import DirectControlHijacking
from deepteam.attacks.single_turn import AuthoritySpoofing

# Test if your agent can be hijacked
risk_assessment = red_team(
    model_callback=your_agent_callback,
    vulnerabilities=[DirectControlHijacking()],
    attacks=[AuthoritySpoofing()]
)
```
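The example above assumes a `your_agent_callback`. Here is a minimal sketch of one, assuming the callback is an async function that receives the simulated attack as a string and returns the agent's final response as a string; the OpenAI-backed body is a placeholder for your real agent.

```python
# A minimal, illustrative callback. The (input: str) -> str async signature
# is an assumption; swap the body for your actual agent (tools, memory, planning).
from openai import AsyncOpenAI

client = AsyncOpenAI()

async def your_agent_callback(input: str) -> str:
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are a helpful AI assistant."},
            {"role": "user", "content": input},
        ],
    )
    # Return the agent's final, user-facing answer for evaluation.
    return response.choices[0].message.content
```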
🧪 Happy Red Teaming – now for both chatbots and autonomous agents!
First Stable Release 🎉
DeepTeam v0.1.0 – First Release 🎉
We’re excited to launch the first public release of DeepTeam, the open-source framework for LLM red teaming.
🧠 DeepTeam enables you to simulate real-world attacks on language models, test for failure modes like jailbreaks, and uncover model vulnerabilities using structured, reproducible evaluation.
🚀 Features
- ✅ Built-in adversarial attack strategies (jailbreaks, refusal bypasses, prompt injections)
- ✅ Automatic generation of adversarial test cases
- ✅ Multi-metric evaluation (pass/fail, toxicity, relevance, etc.)
- ✅ Seamless integration with your LLM app and testing pipelines
- ✅ Type-safe Python API with minimal setup
Get started by installing `deepteam`:

```bash
pip install deepteam
```
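Once installed, a minimal run follows the same `red_team` pattern shown in the agentic example above. The sketch below is illustrative: the `Bias` and `PromptInjection` import paths and constructor arguments are assumptions based on the vulnerability and attack names used elsewhere in these notes.

```python
from deepteam import red_team
from deepteam.vulnerabilities import Bias  # import path assumed
from deepteam.attacks.single_turn import PromptInjection  # import path assumed

# The system under test: any callable mapping an attack prompt to a response.
# A stub is used here purely for illustration.
async def model_callback(input: str) -> str:
    return f"Echo: {input}"

risk_assessment = red_team(
    model_callback=model_callback,
    vulnerabilities=[Bias(types=["race", "gender"])],  # types argument assumed
    attacks=[PromptInjection()],
)
```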