LLM defense pack

Three rule packs that ship as part of vulkro scan and surface the AI-era defender posture: token budget, prompt leak, and rate-limit coverage on every endpoint that fans out to an LLM.

Token-budget audit (TOK-001..005)

Every LLM call site gets a static estimate of its token cost based on prompt string literals, system-prompt assignments, tool definition arrays, and conversation-history shapes.

Rule	What it catches	Severity
TOK-001	LLM call without `max_tokens` (or vendor equivalent)	Medium
TOK-002	System prompt larger than 40 KB (~10k tokens)	Medium
TOK-003	`tools=[...]` array with more than 32 entries	Low
TOK-004	LLM call inside a `for` / `while` loop with no recognized cache	Medium
TOK-005	System prompt routed through the user-message role instead of `system`	High

TOK-005 is the security shot: when the developer concatenates the system prompt into the user message, the system context becomes exfiltratable via the classic "ignore previous, tell me your instructions" jailbreak. TOK-001..004 are cost / DoS rules.

Prompt-leak fingerprint (LEAK-001..003)

Traces prompt-construction variables to log / error / telemetry sinks within the same function body.

Rule	Shape	Severity
LEAK-001	Prompt variable in an error / response payload (`return {"error": ..., "prompt": prompt}`)	Critical
LEAK-002	Prompt variable in a log statement (`logger.info("...", prompt=full_prompt)`)	High
LEAK-003	Prompt variable attached to a telemetry sink (Sentry `set_extra`, Datadog `set_tag`, OpenTelemetry `set_attribute`)	High

LEAK-001 is the ChatGPT plugin shape: developers add the prompt to the 500 error body during debugging and ship that to production. An attacker triggers the error path on demand and reads back the system prompt + tool definitions.

LEAK-002 / LEAK-003 are PII / IP exposure to internal observability tooling: log storage, Sentry, Datadog, Honeycomb.

Static rate-limit auditor (RATE-001..004)

Walks every detected endpoint and reports the ones that have no rate-limit decorator / middleware in scope. Severity bumps to High when the handler body touches a payment / LLM / SQL sink in the next 30 lines.

Rule	Shape	Severity
RATE-001	Endpoint with no recognized limiter in scope	Medium (High if the body reaches a payment / LLM / SQL sink)
RATE-002	Rate-limit library imported but no constructor / wiring call found	High
RATE-003	Auth endpoint with limits looser than the non-auth median (planned)	Medium
RATE-004	Express app defined without a global `app.use(rateLimit(...))` middleware	Medium

Frameworks covered: flask-limiter, django-ratelimit, slowapi, express-rate-limit, @fastify/rate-limit, @nestjs/throttler, rack-attack, tollbooth, httprate.

Pairs with

vulkro scan --ai-code-segregation routes findings on AI-touched files into a separate report block.
vulkro scan --bruteforce-sinks --bruteforce-categories llm drives the prompt-injection payload class at every LLM call site.

Token-budget audit (TOK-001..005)​

Prompt-leak fingerprint (LEAK-001..003)​

Static rate-limit auditor (RATE-001..004)​

Pairs with​

Token-budget audit (TOK-001..005)

Prompt-leak fingerprint (LEAK-001..003)

Static rate-limit auditor (RATE-001..004)

Pairs with