How to Pentest AI Chatbots & LLMs
AI chatbots and LLMs are deployed everywhere — customer support, code generation, healthcare, finance — yet their attack surface is poorly understood. Traditional web application pentesting misses an entire class of vulnerabilities unique to language models. AI security testing is an emerging discipline with its own techniques, tools, and risk frameworks.
OWASP Top 10 for LLM Applications
The OWASP Top 10 for LLM Applications defines the most critical security risks in LLM-powered systems. These are the vulnerabilities every AI security assessment should cover:
Prompt Injection
Manipulating LLM behavior through crafted inputs that override system instructions or inject malicious context.
Insecure Output Handling
Failing to validate or sanitize LLM outputs before passing them to downstream systems, enabling XSS, SSRF, or code execution.
Training Data Poisoning
Corrupting training data to introduce backdoors, biases, or vulnerabilities into the model's behavior.
Model Denial of Service
Crafting inputs that consume excessive resources, causing degraded performance or complete service unavailability.
Supply Chain Vulnerabilities
Exploiting weaknesses in third-party components, pre-trained models, plugins, or training data pipelines.
Sensitive Information Disclosure
Extracting confidential data from models through carefully crafted prompts — PII, API keys, system prompts, or training data.
Insecure Plugin Design
Exploiting plugins that grant LLMs access to external systems without proper input validation or access controls.
Excessive Agency
LLMs with too many permissions or autonomy performing unintended actions — file access, API calls, or system commands.
Overreliance
Trusting LLM outputs without verification, leading to misinformation, fabricated code, or incorrect security decisions.
Model Theft
Unauthorized extraction of model weights, parameters, or architecture through API queries or side-channel attacks.
Security Testing Techniques
Pentesting LLMs requires techniques that don't exist in traditional security testing. These are the core methods used to assess AI systems:
Prompt Injection
Direct & IndirectDirect injection overrides system instructions with user-supplied prompts. Indirect injection hides malicious instructions in external data sources the LLM processes — documents, web pages, or database records.
Jailbreaking
Guardrail BypassBypassing safety guardrails through role-play scenarios (DAN prompts), hypothetical framing, encoding tricks (ROT13, base64), multi-step escalation, or acrostic patterns that evade content filters.
Data Exfiltration
Information ExtractionExtracting training data, system prompts, or sensitive information through summarization attacks, language switching, role-play, or carefully structured queries that bypass output filters.
Model Inversion
Reverse EngineeringReconstructing training inputs by analyzing model outputs and confidence scores. Repeated queries can reveal private data used during training, including PII and proprietary information.
Adversarial Inputs
EvasionCrafting inputs specifically designed to cause misclassification, bypass safety checks, or trigger unexpected behavior. Includes token manipulation, Unicode tricks, and embedding-space attacks.
zeScanner's LLM Pentest Agent
zeScanner includes a dedicated LLM Security Agent that automates AI red teaming. It auto-detects LLM-powered endpoints (OpenAI, Ollama, Anthropic, LiteLLM, vLLM) and runs 36 security probes across 6 categories:
System Prompt Extraction
Prompt Injection
Information Disclosure
Output Manipulation
Excessive Agency
Jailbreak
How It Works
# Standalone LLM security scan
$ zescanner scan --llm-target https://api.example.com/chat
# Auto-detects LLM API type (OpenAI-compatible, Ollama, etc.)
# Runs 36 probes across 6 categories
# Scores: pass / fail / partial with confidence levels
# Maps findings to OWASP LLM Top 10 (LLM01-LLM07)
# Or as part of a full web scan (auto-detects LLM endpoints)
$ zescanner scan --target example.com --profile full
Each probe returns a pass, fail, or partial result with a confidence score. Results are mapped to the relevant OWASP LLM risk category and integrated into the overall security report. The agent uses two tools: llm_probe for conversational attacks and curl_probe for API-level testing.
OWASP Coverage
The LLM Security Agent currently covers 5 of the OWASP Top 10 for LLM Applications:
- LLM01 — Prompt Injection — Direct injection, delimiter confusion, role override, and jailbreak techniques.
- LLM02 — Insecure Output Handling — Tests for XSS, SQL injection, and shell command generation in outputs.
- LLM05 — Supply Chain — Probes for output manipulation that could compromise downstream systems.
- LLM06 — Sensitive Information Disclosure — API key extraction, environment variable leaks, training data exposure.
- LLM07 — Insecure Plugin Design — System prompt extraction, tool abuse, and excessive agency testing.
Why AI Security Testing Matters Now
Every organization deploying LLMs is exposed to risks that traditional security tools cannot detect. A web application firewall won't catch a prompt injection. A vulnerability scanner won't test for jailbreaks. As AI becomes embedded in critical business processes, dedicated LLM security testing is no longer optional — it's a fundamental part of the security assessment lifecycle.
Related Questions
Test the security of your AI deployments