How to Pentest AI Chatbots & LLMs

AI chatbots and LLMs are deployed everywhere — customer support, code generation, healthcare, finance — yet their attack surface is poorly understood. Traditional web application pentesting misses an entire class of vulnerabilities unique to language models. AI security testing is an emerging discipline with its own techniques, tools, and risk frameworks.

OWASP Top 10 for LLM Applications

The OWASP Top 10 for LLM Applications defines the most critical security risks in LLM-powered systems. These are the vulnerabilities every AI security assessment should cover:

LLM01

Prompt Injection

Manipulating LLM behavior through crafted inputs that override system instructions or inject malicious context.

LLM02

Insecure Output Handling

Failing to validate or sanitize LLM outputs before passing them to downstream systems, enabling XSS, SSRF, or code execution.

LLM03

Training Data Poisoning

Corrupting training data to introduce backdoors, biases, or vulnerabilities into the model's behavior.

LLM04

Model Denial of Service

Crafting inputs that consume excessive resources, causing degraded performance or complete service unavailability.

LLM05

Supply Chain Vulnerabilities

Exploiting weaknesses in third-party components, pre-trained models, plugins, or training data pipelines.

LLM06

Sensitive Information Disclosure

Extracting confidential data from models through carefully crafted prompts — PII, API keys, system prompts, or training data.

LLM07

Insecure Plugin Design

Exploiting plugins that grant LLMs access to external systems without proper input validation or access controls.

LLM08

Excessive Agency

LLMs with too many permissions or autonomy performing unintended actions — file access, API calls, or system commands.

LLM09

Overreliance

Trusting LLM outputs without verification, leading to misinformation, fabricated code, or incorrect security decisions.

LLM10

Model Theft

Unauthorized extraction of model weights, parameters, or architecture through API queries or side-channel attacks.

Security Testing Techniques

Pentesting LLMs requires techniques that don't exist in traditional security testing. These are the core methods used to assess AI systems:

💉

Prompt Injection

Direct & Indirect

Direct injection overrides system instructions with user-supplied prompts. Indirect injection hides malicious instructions in external data sources the LLM processes — documents, web pages, or database records.

🔓

Jailbreaking

Guardrail Bypass

Bypassing safety guardrails through role-play scenarios (DAN prompts), hypothetical framing, encoding tricks (ROT13, base64), multi-step escalation, or acrostic patterns that evade content filters.

📤

Data Exfiltration

Information Extraction

Extracting training data, system prompts, or sensitive information through summarization attacks, language switching, role-play, or carefully structured queries that bypass output filters.

🔄

Model Inversion

Reverse Engineering

Reconstructing training inputs by analyzing model outputs and confidence scores. Repeated queries can reveal private data used during training, including PII and proprietary information.

🎯

Adversarial Inputs

Evasion

Crafting inputs specifically designed to cause misclassification, bypass safety checks, or trigger unexpected behavior. Includes token manipulation, Unicode tricks, and embedding-space attacks.

zeScanner's LLM Pentest Agent

zeScanner includes a dedicated LLM Security Agent that automates AI red teaming. It auto-detects LLM-powered endpoints (OpenAI, Ollama, Anthropic, LiteLLM, vLLM) and runs 36 security probes across 6 categories:

System Prompt Extraction

LLM07 6 probes
Role-play extractionLanguage switchingBase64 encoding tricksSummarization attacksInstruction reflectionContext window overflow

Prompt Injection

LLM01 6 probes
Delimiter confusionRole overrideXML/JSON injectionContext escapeInstruction hierarchy bypassMulti-turn manipulation

Information Disclosure

LLM02 5 probes
API key leak probingEnvironment variable extractionTraining data extractionInfrastructure detail leaksInternal URL discovery

Output Manipulation

LLM05 5 probes
XSS payload generationSQL injection via outputMarkdown injectionShell command injectionSSRF through output links

Excessive Agency

LLM06 5 probes
URL fetch requestsFile read attemptsCommand execution probesTool abuse escalationPlugin chaining attacks

Jailbreak

LLM01 9 probes
DAN promptsHypothetical framingROT13 encodingMulti-step escalationAcrostic patternsRole-play persona swapToken smugglingInstruction nestingFew-shot manipulation

How It Works

# Standalone LLM security scan

$ zescanner scan --llm-target https://api.example.com/chat

# Auto-detects LLM API type (OpenAI-compatible, Ollama, etc.)
# Runs 36 probes across 6 categories
# Scores: pass / fail / partial with confidence levels
# Maps findings to OWASP LLM Top 10 (LLM01-LLM07)

# Or as part of a full web scan (auto-detects LLM endpoints)

$ zescanner scan --target example.com --profile full

Each probe returns a pass, fail, or partial result with a confidence score. Results are mapped to the relevant OWASP LLM risk category and integrated into the overall security report. The agent uses two tools: llm_probe for conversational attacks and curl_probe for API-level testing.

OWASP Coverage

The LLM Security Agent currently covers 5 of the OWASP Top 10 for LLM Applications:

  • LLM01 — Prompt Injection — Direct injection, delimiter confusion, role override, and jailbreak techniques.
  • LLM02 — Insecure Output Handling — Tests for XSS, SQL injection, and shell command generation in outputs.
  • LLM05 — Supply Chain — Probes for output manipulation that could compromise downstream systems.
  • LLM06 — Sensitive Information Disclosure — API key extraction, environment variable leaks, training data exposure.
  • LLM07 — Insecure Plugin Design — System prompt extraction, tool abuse, and excessive agency testing.

Why AI Security Testing Matters Now

Every organization deploying LLMs is exposed to risks that traditional security tools cannot detect. A web application firewall won't catch a prompt injection. A vulnerability scanner won't test for jailbreaks. As AI becomes embedded in critical business processes, dedicated LLM security testing is no longer optional — it's a fundamental part of the security assessment lifecycle.

Related Questions

Test the security of your AI deployments