LLM and AI response security
As Salesforce orgs wire Einstein and large-language-model responses into the
UI, into Apex sinks, and into agent tooling, the model's output becomes
attacker-influenced data that has to be treated like any other untrusted
input. This surface covers three things: an LLM or Einstein response reaching
a dangerous sink without escaping (SF-LLM-SANITIZE-001), a deterministic
confidence calibration that re-ranks findings without ever calling a model
(SF-AI-TRIAGE-001 and SF-AI-TRIAGE-002), and the scanning of MCP server
and tool definitions wired into the org (SF-MCP-000 through SF-MCP-003).
A note on how the AI-triage findings work: they are deterministic. The calibration is rule-driven and runs offline on your machine. Nothing here sends your code, your findings, or your prompts to a model. The word "AI" in the finding IDs refers to the surface being analyzed (LLM output, agent tooling), not to the analysis method.
SF-LLM-SANITIZE-001: unescaped LLM / Einstein output reaching a sink
What triggers it
A value that originates from an LLM or Einstein response flows to a sink that
renders or executes it, with no escaping or sanitization on the path. The
source side recognizes the shapes an Apex or Lightning surface uses to read
model output: an Einstein / prompt-template invocation result, a connect-API
generation response, or an @AuraEnabled method returning model text to a
component. The sink side is any rendering or execution surface that treats
its input as markup or code: a Visualforce expression with escape="false",
a manual-DOM write in LWC or Aura (innerHTML and siblings), or an Apex
string that builds a dynamic query or dynamic page.
The taint engine connects the two. The finding emits when model output reaches such a sink and no recognized sanitizer sits on the path.
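The emit condition can be pictured with a small Python sketch. It is an illustration of the rule as described here, not the engine's implementation: the function name, the list-of-call-names representation of a path, and the example call names are all hypothetical. The sanitizer markers are the ones this section lists in its fix guidance.

```python
# Sanitizer markers named in this section's fix guidance.
RECOGNIZED_SANITIZERS = {"String.escapeHtml4", "String.stripHtml", "Pattern.matches"}


def emits_finding(path: list[str]) -> bool:
    """Decide whether a source-to-sink path would produce the finding.

    `path` is a hypothetical ordered list of call names between a
    model-output source and a dangerous sink. The finding emits only
    when no recognized sanitizer appears anywhere on that path.
    """
    return not any(step in RECOGNIZED_SANITIZERS for step in path)
```

Under this sketch, a path that passes through a custom helper the rule does not know about still emits, because only the recognized markers clear it.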
Why it matters
An LLM response is not trusted content. A prompt-injection payload, a poisoned grounding source, or simply a model that echoes user-supplied text can put attacker-controlled markup or script into the response. If that response is rendered without escaping, it becomes stored or reflected XSS in the org's UI, with the model as the injection vector. The same value reaching a dynamic-SOQL sink becomes an injection into the query.
How to fix it
Escape model output for the sink it lands in: HTML-encode before rendering
(String.escapeHtml4, or let Visualforce escape by default rather than
setting escape="false"), use bind variables rather than string concatenation
when the value feeds a dynamic query, and prefer framework-safe rendering over
manual DOM writes in LWC and Aura. Recognized sanitizer markers on the path
clear the finding: String.escapeHtml4, String.stripHtml, a strict allowlist
via Pattern.matches, or a length-bounded extraction. See the OWASP XSS
Prevention Cheat Sheet and CWE-79 for the general pattern.
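As a language-neutral illustration of escaping for the sink, the following Python sketch HTML-encodes a model response before placing it into markup, and applies a strict allowlist before a value is used anywhere that expects an identifier. The function names and the allowlist pattern are hypothetical choices for this example. In Apex, the corresponding markers named above are String.escapeHtml4 and Pattern.matches.

```python
import html
import re

# Hypothetical allowlist: short alphanumeric/underscore tokens only.
SAFE_TOKEN = re.compile(r"[A-Za-z0-9_]{1,40}")


def render_model_text(model_text: str) -> str:
    """HTML-encode model output so it is shown as text, never parsed as markup."""
    return "<p>" + html.escape(model_text, quote=True) + "</p>"


def allowlisted_token(model_text: str) -> str:
    """Return the value only if the entire string matches the strict allowlist."""
    if SAFE_TOKEN.fullmatch(model_text) is None:
        raise ValueError("model output rejected by allowlist")
    return model_text
```

The allowlist rejects rather than repairs: a value that does not match is refused outright instead of being stripped into something that might still be dangerous.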
SF-AI-TRIAGE-001 and SF-AI-TRIAGE-002: deterministic confidence calibration
These two findings do not detect a new vulnerability class. They re-rank the findings the rest of the scan already produced, so a reviewer reads the most trustworthy signal first. The calibration is deterministic: it applies fixed, documented rules to the corroborating evidence around each finding. There is no model call.
SF-AI-TRIAGE-001: likely-false-positive cluster
What triggers it
A cluster of findings whose corroborating signals point toward low exploitability: the suspected source is not clearly attacker-reachable, a sanitizer is present but not in the exact shape the base rule recognizes, or the surrounding code carries markers (test annotations, audit-role suppressions) that lower the real-world risk. The calibration groups these and tags the cluster as likely false positive.
Why it matters
A scan that buries three real criticals under forty low-confidence advisories gets ignored. Tagging the likely-false-positive cluster lets a reviewer defer it without deleting it: the findings stay in the output, but they sort below the corroborated ones.
How to read it
Treat SF-AI-TRIAGE-001 as "review these last, and consider a documented
suppression if the calibration is right." It is advisory and does not gate
the scan's exit code on its own.
SF-AI-TRIAGE-002: fix-first corroborated critical
What triggers it
A critical or high finding that multiple independent signals corroborate: the source is clearly attacker-reachable, no sanitizer sits on the path, and the sink has real blast radius (record exposure, query injection, system-mode execution). The calibration promotes it to the top of the queue.
Why it matters
These are the findings to fix before shipping. Surfacing them first shortens the time from scan to fix on the issues that actually matter, which is the whole point of triage.
How to read it
SF-AI-TRIAGE-002 is the "do this first" list. Each promoted finding still
carries its own base finding ID and fix guidance; the triage tag only changes
its rank.
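The two triage tags can be sketched as fixed rules over the signals described above. This is a minimal Python sketch under stated assumptions: the Finding fields, the function names, and the precedence (a fully corroborated critical is never demoted by a low-risk marker) are hypothetical and are not taken from the tool's documented rule set.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Finding:
    rule_id: str                   # base finding ID, left unchanged by triage
    severity: str                  # "critical", "high", "medium", or "low"
    source_reachable: bool         # source is clearly attacker-reachable
    sanitizer: str                 # "none" or "unrecognized"
    high_blast_sink: bool          # record exposure, query injection, system mode
    low_risk_marker: bool = False  # test annotation or audit-role suppression


def triage_tag(f: Finding) -> Optional[str]:
    """Apply fixed rules to a finding's signals; no model is called."""
    corroborated = (
        f.severity in ("critical", "high")
        and f.source_reachable
        and f.sanitizer == "none"
        and f.high_blast_sink
    )
    if corroborated:
        return "SF-AI-TRIAGE-002"
    if (not f.source_reachable) or f.sanitizer == "unrecognized" or f.low_risk_marker:
        return "SF-AI-TRIAGE-001"
    return None


# Fix-first findings sort to the top, likely false positives to the bottom.
_RANK = {"SF-AI-TRIAGE-002": 0, None: 1, "SF-AI-TRIAGE-001": 2}


def review_order(findings: list[Finding]) -> list[Finding]:
    """Re-rank without dropping anything; sorted() keeps ties in input order."""
    return sorted(findings, key=lambda f: _RANK[triage_tag(f)])
```

Because the same signals always produce the same tag, two runs over the same scan output give the same order, which is what makes the calibration reviewable.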
SF-MCP-000 through SF-MCP-003: MCP server and tool-definition scanning
Model Context Protocol servers and their tool definitions are configuration that grants a model the ability to act. When an org wires an MCP server into an agent surface, that server's definition becomes part of the attack surface, and Vulkro scans it.
SF-MCP-000: MCP server inventory
What triggers it
The presence of an MCP server definition wired into the org's agent surface. This is informational: it is the governance baseline that answers "which MCP servers can the org's agents reach, and what tools do they expose?" It emits at informational severity so the inventory lives in the same output as the rest of the scan.
Why it matters
You cannot reason about MCP risk without first knowing which servers and tools are in scope. The inventory is the starting point.
SF-MCP-001: untrusted or unpinned MCP server source
What triggers it
An MCP server definition that resolves to an unpinned or mutable source: a launch reference that floats to "latest" rather than a pinned version, or an upstream reference that can be re-pointed. In either case, the version the org actually runs can change without a review.
Why it matters
An unpinned server is a supply-chain hole: whoever controls the upstream can change the tools the agent can call after the org has reviewed them. Pin the server to an immutable version so the reviewed definition is the one that runs.
How to fix it
Pin the server reference to an exact, immutable version. Re-review on every deliberate bump rather than letting it float.
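A pinned-versus-floating check might look like the following Python sketch. It assumes the launch reference is a package-style string of the form name@version, optionally with a leading @scope/, and that "pinned" means an exact three-part version. Both the function name and that format are assumptions for illustration, since the definition format is not specified here.

```python
import re

# Exact version only: name@1.2.3 or @scope/name@1.2.3.
# Ranges, tags such as "latest", and bare names do not match.
PINNED_REF = re.compile(r"(@[\w.-]+/)?[\w.-]+@\d+\.\d+\.\d+")


def is_pinned(launch_ref: str) -> bool:
    """Return True only when the reference names one exact version."""
    return PINNED_REF.fullmatch(launch_ref) is not None
```

For example, a hypothetical reference example-mcp@1.4.2 passes, while example-mcp@latest, example-mcp@^1.4.0, and a bare example-mcp do not.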
SF-MCP-002: overbroad tool permission or filesystem scope
What triggers it
A tool definition that grants more reach than its purpose needs: a filesystem root broader than the task requires, a network scope with no host restriction, or a tool that exposes a destructive action without a guard.
Why it matters
An agent that can be prompted into calling a tool inherits that tool's reach. An overbroad filesystem root or an unrestricted network scope turns a prompt injection into real-world access. Scope each tool to the minimum it needs.
How to fix it
Narrow each tool's filesystem root, host allowlist, and action set to the minimum the tool's job requires. Treat the tool's granted scope as the blast radius of a successful prompt injection.
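The three overbroad shapes can be checked mechanically once a tool's scope is written down. The Python sketch below uses a hypothetical dictionary layout (filesystem_root, network, allowed_hosts, destructive, requires_confirmation); none of these field names come from a real MCP schema, and the set of "broad" roots only covers the obvious cases.

```python
# Roots that are almost always wider than one tool needs (an assumption).
BROAD_ROOTS = {"/", "~", "C:\\"}


def overbroad_reasons(tool: dict) -> list[str]:
    """List the ways a hypothetical tool definition exceeds a minimal scope."""
    reasons = []
    if tool.get("filesystem_root") in BROAD_ROOTS:
        reasons.append("broad filesystem root")
    if tool.get("network", False):
        hosts = tool.get("allowed_hosts") or []
        # An empty allowlist or a wildcard both mean "any host".
        if not hosts or "*" in hosts:
            reasons.append("no host restriction")
    if tool.get("destructive", False) and not tool.get("requires_confirmation", False):
        reasons.append("unguarded destructive action")
    return reasons
```

An empty result means none of the obvious overbroad shapes were found; it does not prove the scope is the minimum the tool's job requires, which still takes a human review.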
SF-MCP-003: inline secret or cleartext endpoint in an MCP definition
What triggers it
An MCP server or tool definition that carries a secret inline (an API key or
token embedded in the definition) or points at a cleartext (http://)
endpoint.
Why it matters
A secret in a definition is a secret in source: it leaks wherever the definition travels, and it cannot be rotated without a redeploy. A cleartext endpoint exposes the tool's traffic, including whatever the agent sends it, on the wire.
How to fix it
Move the secret out of the definition into a managed credential store and
reference it indirectly. Use an https:// endpoint with a valid certificate
chain. See CWE-798 for the
hardcoded-credential class.
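Both conditions are simple to test for in a definition. The following Python sketch assumes a hypothetical definition shape with a url string and an env mapping, treats variable names containing KEY, TOKEN, SECRET, or PASSWORD as secret-bearing, and treats a ${NAME} placeholder as an indirect reference. All of these conventions are assumptions for illustration rather than the scanner's actual heuristics.

```python
import re

SECRET_NAME = re.compile(r"(KEY|TOKEN|SECRET|PASSWORD)", re.IGNORECASE)
# Hypothetical indirect-reference syntax: ${SOME_NAME}
INDIRECT_REF = re.compile(r"\$\{[A-Za-z_][A-Za-z0-9_]*\}")


def definition_issues(definition: dict) -> list[str]:
    """Report cleartext endpoints and inline secrets in a definition."""
    issues = []
    url = definition.get("url", "")
    if url.lower().startswith("http://"):
        issues.append("cleartext endpoint")
    for name, value in definition.get("env", {}).items():
        # A secret-looking variable must hold a reference, not a literal value.
        if SECRET_NAME.search(name) and not INDIRECT_REF.fullmatch(value):
            issues.append(f"inline secret in {name}")
    return issues
```

A name-based heuristic like this misses secrets stored under neutral names, so it complements rather than replaces a dedicated secret scanner.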
Tuning
- SF-LLM-SANITIZE-001 is a taint finding: it emits only when the engine connects a recognized model-output source to a recognized sink with no sanitizer on the path. A custom sanitizer the rule does not recognize will produce a finding the calibration may then route into the SF-AI-TRIAGE-001 likely-false-positive cluster. Document a genuine audited exception inline with a dated suppression so it expires.
- The SF-AI-TRIAGE findings are advisory re-rankings and do not gate the exit code on their own. The underlying base findings keep their own severity and gating.
- SF-MCP-000 is informational inventory and does not count toward the exit-code gate. SF-MCP-001 through SF-MCP-003 are gating findings.
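The gating rules stated above can be summarized in a short Python sketch. The function names and the use of exit code 1 for a gated scan are assumptions. The sketch covers only the finding IDs whose gating behavior this section states, and it does not model other base findings, which keep their own severity and gating.

```python
ADVISORY_PREFIX = "SF-AI-TRIAGE-"   # advisory re-rankings, never gate
INFORMATIONAL = {"SF-MCP-000"}      # inventory only, never gates
GATING = {"SF-MCP-001", "SF-MCP-002", "SF-MCP-003"}


def gates_exit_code(finding_id: str) -> bool:
    """True when this finding ID, on its own, fails the scan in this sketch."""
    if finding_id.startswith(ADVISORY_PREFIX) or finding_id in INFORMATIONAL:
        return False
    return finding_id in GATING


def exit_code(finding_ids: list[str]) -> int:
    """Assumed convention: 1 when any gating finding is present, else 0."""
    return 1 if any(gates_exit_code(i) for i in finding_ids) else 0
```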