LLM and AI response security

As Salesforce orgs wire Einstein and large-language-model responses into the UI, into Apex sinks, and into agent tooling, the model's output becomes attacker-influenced data that has to be treated like any other untrusted input. This surface covers three things: an LLM or Einstein response reaching a dangerous sink without escaping (SF-LLM-SANITIZE-001), a deterministic confidence calibration that re-ranks findings without ever calling a model (SF-AI-TRIAGE-001 and SF-AI-TRIAGE-002), and the scanning of MCP server and tool definitions wired into the org (SF-MCP-000 through SF-MCP-003).

A note on how the AI-triage findings work: they are deterministic. The calibration is rule-driven and runs offline on your machine. Nothing here sends your code, your findings, or your prompts to a model. The word "AI" in the finding IDs refers to the surface being analyzed (LLM output, agent tooling), not to the analysis method.

SF-LLM-SANITIZE-001: unescaped LLM / Einstein output reaching a sink

What triggers it

A value that originates from an LLM or Einstein response flows to a sink that renders or executes it, with no escaping or sanitization on the path. The source side recognizes the shapes an Apex or Lightning surface uses to read model output: an Einstein / prompt-template invocation result, a connect-API generation response, or an @AuraEnabled method returning model text to a component. The sink side is any rendering or execution surface that treats its input as markup or code: a Visualforce expression with escape="false", a manual-DOM write in LWC or Aura (innerHTML and siblings), or an Apex string that builds a dynamic query or dynamic page.

The taint engine connects the two. The finding emits when model output reaches such a sink and no recognized sanitizer sits on the path.

Why it matters

An LLM response is not trusted content. A prompt-injection payload, a poisoned grounding source, or simply a model that echoes user-supplied text can put attacker-controlled markup or script into the response. If that response is rendered without escaping, it becomes stored or reflected XSS in the org's UI, with the model as the injection vector. The same value reaching a dynamic-SOQL sink becomes an injection into the query.

How to fix it

Escape model output for the sink it lands in. HTML-encode it before rendering (String.escapeHtml4, or let Visualforce escape by default rather than setting escape="false"). Use bind variables rather than string concatenation when the value feeds a dynamic query. Prefer framework-safe rendering over manual DOM writes in LWC and Aura. Recognized sanitizer markers on the path clear the finding: String.escapeHtml4, String.stripHtml, a strict allowlist via Pattern.matches, or a length-bounded extraction. See the OWASP XSS Prevention Cheat Sheet and CWE-79 for the general pattern.
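As a language-neutral illustration of escaping for the sink, the sketch below uses Python's standard library rather than Apex. Here html.escape stands in for String.escapeHtml4, and a parameterized sqlite3 query stands in for a bind variable in a dynamic query. The function names, the table, and the payload are hypothetical.

```python
import html
import sqlite3

def render_model_text(model_text: str) -> str:
    """HTML-encode model output before placing it in markup."""
    return "<p>" + html.escape(model_text, quote=True) + "</p>"

def find_accounts(conn: sqlite3.Connection, model_text: str) -> list:
    """Pass model output as a bound parameter, never by concatenation."""
    cur = conn.execute("SELECT name FROM account WHERE name = ?", (model_text,))
    return cur.fetchall()

# Hypothetical model response carrying both markup and a query-breaking quote.
payload = "<img src=x onerror=alert(1)>' OR '1'='1"
rendered = render_model_text(payload)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT)")
conn.execute("INSERT INTO account VALUES ('Acme')")
rows = find_accounts(conn, payload)
```

The encoded string renders as inert text, and the bound parameter is compared as a literal value, so the quote in the payload cannot change the shape of the query.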

SF-AI-TRIAGE-001 and SF-AI-TRIAGE-002: deterministic confidence calibration

These two findings do not detect a new vulnerability class. They re-rank the findings the rest of the scan already produced, so a reviewer reads the most trustworthy signal first. The calibration is deterministic: it applies fixed, documented rules to the corroborating evidence around each finding. There is no model call.

SF-AI-TRIAGE-001: likely-false-positive cluster

What triggers it

A cluster of findings whose corroborating signals point toward low exploitability: the suspected source is not clearly attacker-reachable, a sanitizer is present but not in the exact shape the base rule recognizes, or the surrounding code carries markers (test annotations, audit-role suppressions) that lower the real-world risk. The calibration groups these and tags the cluster as likely false positive.
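A minimal sketch of what a deterministic rule of this kind could look like, written in Python for illustration. The signal names and the "at least two signals agree" threshold are assumptions; the actual calibration rules are not reproduced here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Signals:
    source_attacker_reachable: bool  # is the source clearly attacker-reachable?
    unrecognized_sanitizer: bool     # sanitizer present, but not in the recognized shape
    risk_lowering_markers: int       # e.g. test annotations, audited suppressions

def likely_false_positive(s: Signals) -> bool:
    """Fixed rule (assumed threshold): tag when two or more signals agree."""
    votes = 0
    if not s.source_attacker_reachable:
        votes += 1
    if s.unrecognized_sanitizer:
        votes += 1
    if s.risk_lowering_markers > 0:
        votes += 1
    return votes >= 2
```

Because the rule is a pure function of the recorded signals, the same scan input always yields the same tag, which is what makes the calibration reproducible offline.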

Why it matters

A scan that buries three real criticals under forty low-confidence advisories gets ignored. Tagging the likely-false-positive cluster lets a reviewer defer it without deleting it: the findings stay in the output, but they sort below the corroborated ones.

How to read it

Treat SF-AI-TRIAGE-001 as "review these last, and consider a documented suppression if the calibration is right." It is advisory and does not gate the scan's exit code on its own.

SF-AI-TRIAGE-002: fix-first corroborated critical

What triggers it

A critical or high finding that multiple independent signals corroborate: the source is clearly attacker-reachable, no sanitizer sits on the path, and the sink has real blast radius (record exposure, query injection, system-mode execution). The calibration promotes it to the top of the queue.
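The three corroborating conditions above can be sketched as a single predicate. This is an illustrative Python sketch, not the scanner's implementation: the Finding fields and the sink labels are hypothetical names chosen to mirror the text.

```python
from dataclasses import dataclass

# Hypothetical labels for sinks with real blast radius.
HIGH_IMPACT_SINKS = {"record_exposure", "query_injection", "system_mode_execution"}

@dataclass(frozen=True)
class Finding:
    rule_id: str
    severity: str                    # "critical", "high", "medium", ...
    source_attacker_reachable: bool
    sanitizer_on_path: bool
    sink_kind: str

def is_fix_first(f: Finding) -> bool:
    """Promote only when every independent signal corroborates the finding."""
    return (
        f.severity in {"critical", "high"}
        and f.source_attacker_reachable
        and not f.sanitizer_on_path
        and f.sink_kind in HIGH_IMPACT_SINKS
    )
```

Requiring all conditions at once keeps the promoted list short: a high-severity finding with a sanitizer on the path, for example, stays at its base rank.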

Why it matters

These are the findings to fix before shipping. Surfacing them first shortens the time from scan to fix on the issues that actually matter, which is the whole point of triage.

How to read it

SF-AI-TRIAGE-002 is the "do this first" list. Each promoted finding still carries its own base finding ID and fix guidance; the triage tag only changes its rank.
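One way to picture "the tag only changes the rank" is a stable sort keyed on the triage tag. The sketch below assumes findings are plain dictionaries with an optional "triage" field; that shape, and the SF-EXAMPLE IDs, are hypothetical.

```python
# Lower number sorts earlier; None means the finding carries no triage tag.
TRIAGE_RANK = {"SF-AI-TRIAGE-002": 0, None: 1, "SF-AI-TRIAGE-001": 2}

def order_for_review(findings: list[dict]) -> list[dict]:
    """Stable sort: findings keep their scan order within each rank."""
    return sorted(findings, key=lambda f: TRIAGE_RANK[f.get("triage")])

findings = [
    {"id": "SF-EXAMPLE-A", "triage": "SF-AI-TRIAGE-001"},
    {"id": "SF-EXAMPLE-B", "triage": None},
    {"id": "SF-LLM-SANITIZE-001", "triage": "SF-AI-TRIAGE-002"},
]
ordered = [f["id"] for f in order_for_review(findings)]
```

Nothing is dropped and no base ID is rewritten: the likely-false-positive finding is still in the list, it just sorts last.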

SF-MCP-000 through SF-MCP-003: MCP server and tool-definition scanning

Model Context Protocol servers and their tool definitions are configuration that grants a model the ability to act. When an org wires an MCP server into an agent surface, that server's definition becomes part of the attack surface, and Vulkro scans it.

SF-MCP-000: MCP server inventory

What triggers it

The presence of an MCP server definition wired into the org's agent surface. This is informational: it is the governance baseline that answers "which MCP servers can the org's agents reach, and what tools do they expose?" It emits at informational severity so the inventory lives in the same output as the rest of the scan.
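To make the inventory concrete, the sketch below lists servers and their tools from a definition file. The "mcpServers" layout follows a convention common in MCP client configurations, and the "tools" field, the server names, and the URL are assumptions for illustration; the definitions in a given org may be shaped differently.

```python
import json

# Hypothetical definition shape; real MCP definitions in an org may differ.
CONFIG = json.loads("""
{
  "mcpServers": {
    "files": {
      "command": "npx",
      "args": ["-y", "example-fs-server@1.4.2", "/srv/exports"],
      "tools": ["read_file", "list_dir"]
    },
    "crm": {
      "url": "https://mcp.example.com/crm",
      "tools": ["lookup_account"]
    }
  }
}
""")

def inventory(config: dict) -> list[tuple[str, list[str]]]:
    """Return (server name, sorted tool names) pairs, sorted by server name."""
    servers = config.get("mcpServers", {})
    return [(name, sorted(d.get("tools", []))) for name, d in sorted(servers.items())]
```

Sorting both levels keeps the inventory stable between scans, so a diff of two runs shows only real additions and removals.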

Why it matters

You cannot reason about MCP risk without first knowing which servers and tools are in scope. The inventory is the starting point.

SF-MCP-001: untrusted or unpinned MCP server source

What triggers it

An MCP server definition that resolves to an unpinned or mutable source: a launch reference that floats to "latest" rather than a pinned version, or an upstream reference that can be repointed after review. Either way, the version the org actually runs can change without a review.

Why it matters

An unpinned server is a supply-chain hole: whoever controls the upstream can change the tools the agent can call after the org has reviewed them. Pin the server to an immutable version so the reviewed definition is the one that runs.

How to fix it

Pin the server reference to an exact, immutable version. Re-review on every deliberate bump rather than letting it float.
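A minimal sketch of a pinning check, assuming the launch reference is an npm-style package spec such as name@1.4.2. The function name and the exact-version pattern are assumptions; other launch mechanisms would need their own notion of an immutable reference.

```python
import re

EXACT_VERSION = re.compile(r"^\d+\.\d+\.\d+$")

def is_pinned(package_spec: str) -> bool:
    """True only for 'name@X.Y.Z'; bare names, 'latest', and ranges are unpinned."""
    name, sep, version = package_spec.rpartition("@")
    if not sep or not name:  # no version part, or only a leading scope '@'
        return False
    return bool(EXACT_VERSION.match(version))
```

For example, is_pinned("example-server@1.4.2") holds, while "example-server", "example-server@latest", and "example-server@^1.4.0" are all treated as floating.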

SF-MCP-002: overbroad tool permission or filesystem scope

What triggers it

A tool definition that grants more reach than its purpose needs: a filesystem root broader than the task requires, a network scope with no host restriction, or a tool that exposes a destructive action without a guard.

Why it matters

An agent that can be prompted into calling a tool inherits that tool's reach. An overbroad filesystem root or an unrestricted network scope turns a prompt injection into real-world access. Scope each tool to the minimum it needs.

How to fix it

Narrow each tool's filesystem root, host allowlist, and action set to the minimum the tool's job requires. Treat the tool's granted scope as the blast radius of a successful prompt injection.
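The three overbroad shapes can be sketched as checks over a tool definition. The field names (filesystem_root, network, allowed_hosts, destructive, requires_confirmation) and the list of broad roots are hypothetical; they illustrate the idea rather than any real definition schema.

```python
from pathlib import PurePosixPath

# Assumed examples of roots broader than a single task is likely to need.
BROAD_ROOTS = {"/", "/home", "/etc", "/var"}

def scope_problems(tool: dict) -> list[str]:
    """Return a description of each overbroad grant found in one tool definition."""
    problems = []
    root = tool.get("filesystem_root")
    if root is not None and str(PurePosixPath(root)) in BROAD_ROOTS:
        problems.append("filesystem root too broad")
    hosts = tool.get("allowed_hosts")
    if tool.get("network") and (not hosts or "*" in hosts):
        problems.append("network scope has no host restriction")
    if tool.get("destructive") and not tool.get("requires_confirmation"):
        problems.append("destructive action without a guard")
    return problems

broad = {"filesystem_root": "/", "network": True, "allowed_hosts": [], "destructive": True}
narrow = {
    "filesystem_root": "/srv/exports/reports",
    "network": True,
    "allowed_hosts": ["api.example.com"],
    "destructive": True,
    "requires_confirmation": True,
}
```

Here scope_problems(broad) reports all three issues and scope_problems(narrow) reports none, which mirrors the fix: the same tool passes once its root, hosts, and actions are narrowed.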

SF-MCP-003: inline secret or cleartext endpoint in an MCP definition

What triggers it

An MCP server or tool definition that carries a secret inline (an API key or token embedded in the definition) or points at a cleartext (http://) endpoint.

Why it matters

A secret in a definition is a secret in source: it leaks wherever the definition travels, and it cannot be rotated without a redeploy. A cleartext endpoint exposes the tool's traffic, including whatever the agent sends it, on the wire.

How to fix it

Move the secret out of the definition into a managed credential store and reference it indirectly. Use an https:// endpoint with a valid certificate chain. See CWE-798 for the hardcoded-credential class.
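A rough sketch of both checks, assuming a definition with an "env" map and a "url" field. The key-name pattern, the ${NAME} indirection syntax, and the sample values are assumptions made for the example, not a description of how any particular credential store is referenced.

```python
import re
from urllib.parse import urlparse

SECRET_KEY_NAME = re.compile(r"(api[_-]?key|token|secret|password)", re.IGNORECASE)
INDIRECT_REF = re.compile(r"^\$\{[A-Za-z_][A-Za-z0-9_]*\}$")  # e.g. ${CRM_API_KEY}

def definition_problems(definition: dict) -> list[str]:
    """Flag literal secrets in env values and http:// endpoints."""
    problems = []
    for key, value in definition.get("env", {}).items():
        if SECRET_KEY_NAME.search(key) and value and not INDIRECT_REF.match(value):
            problems.append(f"inline secret in env var {key}")
    url = definition.get("url")
    if url and urlparse(url).scheme == "http":
        problems.append("cleartext endpoint")
    return problems

bad = {"url": "http://mcp.example.com/crm", "env": {"CRM_API_KEY": "example-not-a-real-key"}}
good = {"url": "https://mcp.example.com/crm", "env": {"CRM_API_KEY": "${CRM_API_KEY}"}}
```

The first definition trips both checks; the second carries only a reference to the secret and an https:// endpoint, so it produces no problems.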

Tuning

  • SF-LLM-SANITIZE-001 is a taint finding: it emits only when the engine connects a recognized model-output source to a recognized sink with no sanitizer on the path. A custom sanitizer the rule does not recognize will produce a finding the calibration may then route into the SF-AI-TRIAGE-001 likely-false-positive cluster. Document a genuine audited exception inline with a dated suppression so it expires.
  • The SF-AI-TRIAGE findings are advisory re-rankings and do not gate the exit code on their own. The underlying base findings keep their own severity and gating.
  • SF-MCP-000 is informational inventory and does not count toward the exit-code gate. SF-MCP-001 through SF-MCP-003 are gating findings.
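The gating rules in this list can be summarized in a small sketch. It simplifies in two ways that are assumptions: every finding outside the non-gating set is treated as gating (in practice base findings keep their own severity and gating), and the exit codes 0 and 1 are illustrative values.

```python
# Advisory and informational IDs named in the tuning notes above.
NON_GATING = {"SF-AI-TRIAGE-001", "SF-AI-TRIAGE-002", "SF-MCP-000"}

def exit_code(finding_ids: list[str]) -> int:
    """Return 1 if any gating finding is present, else 0 (values are assumed)."""
    return 1 if any(i not in NON_GATING for i in finding_ids) else 0
```

Under these assumptions, a scan that reports only SF-MCP-000 and triage tags exits cleanly, while adding SF-MCP-001 to the same output fails the gate.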