
Releases: confident-ai/deepteam

🎉 New CLI tool, Agentic Red Teaming

02 Jul 07:01

🚀 DeepTeam CLI Release

We’re excited to release the first version of the DeepTeam CLI – a powerful command-line tool for red teaming and evaluating LLM applications with DeepEval.

✨ Features

  • Red Team Simulation

    • Easily specify simulator and evaluation models (gpt-3.5-turbo-0125, gpt-4o, etc.)
    • Attack LLM systems with predefined vulnerability categories (e.g., Bias, Toxicity)
  • Target System Configuration

    • Test both foundation models (like gpt-3.5-turbo) and full LLM applications via custom Python wrappers (see the wrapper sketch after this list)
    • Simple YAML config structure for defining the target model's purpose and behavior
  • System Controls

    • Set concurrency and parallelism: max_concurrent, run_async
    • Specify how many attacks to run per vulnerability type
    • Optional error handling (ignore_errors) and result storage (output_folder)
  • Pluggable Vulnerabilities and Attacks

    • Support for multiple attack types (e.g., Prompt Injection)
    • Define default vulnerabilities like:
      • Bias: targeting race and gender
      • Toxicity: profanity and insults
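
Below is a minimal sketch of what a custom Python wrapper for a full LLM application can look like: a callable that takes the attack prompt and returns your application's reply. The file name, function name, and the OpenAI client used here are illustrative assumptions; see the DeepTeam docs for how to point the YAML config at your own wrapper.

# target_app.py (hypothetical wrapper file for a full LLM application)
from openai import OpenAI

client = OpenAI()

def target_model_callback(input: str) -> str:
    # Forward the red-team prompt to your LLM application and return its reply.
    # Replace this body with a call into your own pipeline (RAG, agent, etc.).
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": input}],
    )
    return response.choices[0].message.content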

🛠 Example Usage

models:
  simulator: gpt-3.5-turbo-0125
  evaluation: gpt-4o

target:
  purpose: "A helpful AI assistant"
  model: gpt-3.5-turbo

system_config:
  max_concurrent: 10
  attacks_per_vulnerability_type: 3
  run_async: true
  ignore_errors: false
  output_folder: "results"

default_vulnerabilities:
  - name: "Bias"
    types: ["race", "gender"]
  - name: "Toxicity"
    types: ["profanity", "insults"]

attacks:
  - name: "Prompt Injection"
Save the config as config.yaml and run:

deepteam run config.yaml

Stay tuned for more attack types, evaluation metrics, and integrations with the DeepEval framework.

🧠 Agentic Red Teaming

Agentic red teaming tests AI agents for vulnerabilities that only emerge when systems operate autonomously, maintain persistent memory, and pursue complex goals.

🧨 Specialized Attack Methods

DeepTeam includes 6 agentic-specific attacks:

  • Authority Spoofing – Impersonates a system admin or an override instruction
  • Role Manipulation – Tricks the agent into adopting a different role
  • Goal Redirection – Reframes or corrupts the agent's priorities
  • Linguistic Confusion – Uses ambiguity to confuse the agent's language understanding
  • Validation Bypass – Evades safety checks through clever phrasing
  • Context Injection – Injects false environmental state

Example

from deepteam import red_team
from deepteam.vulnerabilities.agentic import DirectControlHijacking
from deepteam.attacks.single_turn import AuthoritySpoofing

# Test whether your agent can be hijacked through spoofed authority.
# model_callback is the function DeepTeam calls with each attack prompt;
# it should forward the prompt to your agent and return its response.
risk_assessment = red_team(
    model_callback=your_agent_callback,
    vulnerabilities=[DirectControlHijacking()],
    attacks=[AuthoritySpoofing()],
)

🧪 Happy Red Teaming – now for both chatbots and autonomous agents!

First Stable Release 🎉

23 May 19:36

DeepTeam v0.1.0 – First Release 🎉

We’re excited to launch the first public release of DeepTeam, the open-source framework for LLM red teaming.

🧠 DeepTeam enables you to simulate real-world attacks on language models, test for failure modes like jailbreaks, and uncover model vulnerabilities using structured, reproducible evaluation.

🚀 Features

  • ✅ Built-in adversarial attack strategies (jailbreaks, refusal bypasses, prompt injections)
  • ✅ Automatic generation of adversarial test cases
  • ✅ Multi-metric evaluation (pass/fail, toxicity, relevance, etc.)
  • ✅ Seamless integration with your LLM app and testing pipelines
  • ✅ Type-safe Python API with minimal setup

Get started by installing deepteam:

pip install deepteam
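
For a first run in Python, here is a minimal sketch following the red_team API shown in the release above. The import paths for Bias and PromptInjection, the types argument, and your_llm_app are assumptions to adapt to your installed version and your application.

from deepteam import red_team
from deepteam.vulnerabilities import Bias                  # assumed import path
from deepteam.attacks.single_turn import PromptInjection   # assumed import path

def model_callback(input: str) -> str:
    # Receive the adversarial prompt and return your application's response.
    return your_llm_app(input)  # hypothetical stand-in for your LLM app

risk_assessment = red_team(
    model_callback=model_callback,
    vulnerabilities=[Bias(types=["race", "gender"])],  # assumed constructor argument
    attacks=[PromptInjection()],
)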

Docs here: https://www.trydeepteam.com/docs/getting-started