Taint analysis
Vulkro runs three taint passes on every vulkro scan:
- Intra-procedural - within a single function.
- Inter-procedural, single-module - across functions in one file.
- Cross-file, call-graph-driven - follows the project's call graph, depth limited to 4.
A finding is emitted when tainted data - anything that flows from a source the engine considers user-controlled - reaches a sink the engine considers dangerous, without passing through a known sanitiser.
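The rule above can be sketched as a toy straight-line tracker. This is a minimal illustration, not Vulkro's implementation: the statement encoding, the `find_flows` helper, and the `render_html` sink name are all hypothetical, and the real engine works on tree-sitter syntax trees rather than tuples.

```python
# Toy model of "source reaches sink without a sanitiser".
# All names below are assumptions made for this sketch only.
SOURCES = {"request.form"}
SANITISERS = {"html.escape"}
SINKS = {"render_html"}  # hypothetical sink name


def find_flows(statements):
    """Each statement is (target, func, args); target is None for a bare call."""
    tainted = set()
    findings = []
    for target, func, args in statements:
        tainted_args = [a for a in args if a in tainted]
        # A finding needs a sink call with at least one tainted argument.
        if func in SINKS and tainted_args:
            findings.append((func, tainted_args))
        if func in SOURCES:
            result_tainted = True
        elif func in SANITISERS:
            result_tainted = False  # a known sanitiser clears the taint
        else:
            result_tainted = bool(tainted_args)  # anything else passes taint through
        if target is not None:
            if result_tainted:
                tainted.add(target)
            else:
                tainted.discard(target)
    return findings


unsafe = [
    ("name", "request.form", ()),
    ("page", "concat", ("header", "name")),
    (None, "render_html", ("page",)),
]
safe = [
    ("name", "request.form", ()),
    ("clean", "html.escape", ("name",)),
    ("page", "concat", ("header", "clean")),
    (None, "render_html", ("page",)),
]
```

In this model `find_flows(unsafe)` reports the `render_html` call, while `find_flows(safe)` reports nothing because the value passes through `html.escape` first.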
Language coverage
Dataflow taint (the three passes above) is not uniform across languages. This is the honest matrix:
| Language | Dataflow taint | Framework sources modelled |
|---|---|---|
| JavaScript / TypeScript | Yes | Express, Fastify, Koa, NestJS, Hapi |
| Python | Yes | Flask, Django, FastAPI, Starlette, aiohttp, Tornado, Litestar, DRF |
| Go | Yes | net/http, Gin, Echo, Chi, Fiber, Gorilla mux |
Ruby, Java/Kotlin, C#, and PHP were removed from the general scanner. Salesforce (Apex and the rest of the platform surfaces) ships in the separate Vulkro for Salesforce product.
What the taint engine does not do
Stating the limits plainly is more useful than implying breadth the engine doesn't have:
- No type inference. Sources and sinks are matched by surface syntax, so a variable named `db` is treated as a database client. tree-sitter gives the engine syntax, not resolved types.
- No taint through container element writes. `obj.x = tainted`, `arr[0] = tainted`, and `Object.assign(target, tainted)` do not propagate. Reads of a deep member/subscript chain down to a tainted root are tracked (Python and JS/TS).
- Loop-carried taint is off by default. Loop bodies are walked once; enable the worklist engine with `VULKRO_TAINT_CFG=1` (JS/TS and Python only).
- Cross-file depth is bounded at 4 hops, and the cross-file pass is skipped on very large repos (over ~6000 modules) to keep wall-clock bounded.
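To see why walking a loop body once can miss a flow, here is a toy model of loop-carried taint. It is a sketch under stated assumptions: the `(target, operands)` encoding and both helper functions are hypothetical, and a plain fixed-point iteration stands in for the worklist engine, whose actual algorithm is not described here.

```python
# Loop body being modelled (taint enters on the second statement,
# but is consumed by the first statement on the NEXT iteration):
#
#     for item in items:
#         out = prev            # uses the value from the previous iteration
#         prev = read_input()   # SOURCE
#
BODY = [("out", ["prev"]), ("prev", ["SOURCE"])]


def propagate_once(body, tainted):
    """Walk the body a single time, in order, and return the new tainted set."""
    tainted = set(tainted)
    for target, operands in body:
        if any(op == "SOURCE" or op in tainted for op in operands):
            tainted.add(target)
    return tainted


def propagate_to_fixpoint(body, tainted):
    """Re-walk the body until the tainted set stops growing."""
    current = set(tainted)
    while True:
        updated = propagate_once(body, current)
        if updated == current:
            return current
        current = updated


single_pass = propagate_once(BODY, set())
fixpoint = propagate_to_fixpoint(BODY, set())
```

The single pass marks only `prev` as tainted; the fixed-point walk also marks `out`, because the taint assigned at the end of one iteration is read at the start of the next. The iteration terminates because the set only ever grows and the number of variables is finite.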
Sources, sinks, sanitisers
Sources include:
- `req.body`, `req.query`, `req.params`, `req.headers` (Express / Next.js)
- `request.json()`, `request.form()`, `request.query_params` (FastAPI)
- `flask.request.*`, Django `request.GET / POST`
- Function parameters of route handlers
- Reads from `os.environ` / `process.env` (configurable; off by default to keep the noise sane)
Sinks include:
- `db.query("..." + tainted)`, `cursor.execute(...)` (SQL injection)
- `child_process.exec`, `subprocess.run(shell=True)` (command injection)
- `eval`, `Function(...)`, `pickle.loads`, `yaml.load(..., Loader=yaml.Loader)`
- `requests.get(tainted)`, `fetch(tainted)` (SSRF)
- `redirect(tainted)` (open redirect)
- Template-as-string with user data (SSTI / XSS)
- `__proto__` writes (prototype pollution)
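Deserialisers sit on the sink list alongside `eval` because loading attacker-supplied bytes can invoke a callable of the attacker's choosing. The sketch below uses only the standard library; the `Payload` class is a hypothetical stand-in for a crafted input, and the expression it evaluates is deliberately harmless.

```python
import pickle


class Payload:
    """Stand-in for attacker-crafted input (hypothetical, for illustration)."""

    def __reduce__(self):
        # pickle records "call eval with this argument" in the byte stream;
        # the call happens when the stream is loaded, not when it is dumped.
        return (eval, ("1 + 1",))


blob = pickle.dumps(Payload())  # what an attacker would send
result = pickle.loads(blob)     # the sink: runs eval("1 + 1") during load
```

Here `result` is `2`, the value of the evaluated expression, not a `Payload` instance. A real payload would substitute a harmful callable and argument.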
Sanitisers include:
- Parameterised queries (`?` / `$1` / `:name`)
- `html.escape` / `bleach.clean`
- Allowlist-shaped enum / Pydantic / Zod schemas
- `urllib.parse.quote` for URLs
- Constant-time comparison wrappers
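As an illustration of why a parameterised query counts as a sanitiser while string concatenation is a sink, here is a small sketch using the standard-library `sqlite3` module. The table, the helper names, and the payload are made up for the example.

```python
import sqlite3


def make_db():
    """Build a throwaway in-memory database with two rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE logs (id INTEGER, owner TEXT)")
    conn.executemany("INSERT INTO logs VALUES (?, ?)", [(1, "alice"), (2, "bob")])
    return conn


def find_logs_unsafe(conn, owner):
    # Concatenation: the user value becomes part of the SQL text itself.
    sql = "SELECT id FROM logs WHERE owner = '" + owner + "'"
    return conn.execute(sql).fetchall()


def find_logs_parameterised(conn, owner):
    # Placeholder: the driver binds the value, so it is only ever data.
    return conn.execute("SELECT id FROM logs WHERE owner = ?", (owner,)).fetchall()


conn = make_db()
payload = "nobody' OR '1'='1"
leaked = find_logs_unsafe(conn, payload)          # matches every row
bound = find_logs_parameterised(conn, payload)    # matches no row
```

With the concatenated query the payload rewrites the `WHERE` clause and both rows come back; with the `?` placeholder the same string is compared literally against `owner` and nothing matches.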
Cross-file propagation
A typical case the cross-file pass catches:
```python
# routes/admin.py
@router.post("/admin/export")
def export(req):
    data = req.json()  # SOURCE
    return run_export(data["filter"])

# services/exports.py
def run_export(filter_expr: str):
    return db.execute(f"SELECT * FROM logs WHERE {filter_expr}")  # SINK
```
The intra-procedural pass alone would miss this - req.json() and the
SQL sink are in different files. The cross-file pass traces the call
from export to run_export and propagates the tainted argument.
The depth limit is 4 calls; deeper paths are skipped to keep wall-clock time bounded on large repos.
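A depth bound of this kind can be pictured as a breadth-first walk over the call graph that stops expanding after a fixed number of hops. The sketch below is an assumption-laden illustration, not Vulkro's traversal: the graph, the function names beyond `export` and `run_export`, and the choice to count hops from the handler are all hypothetical.

```python
from collections import deque

MAX_DEPTH = 4  # the documented cross-file limit


def reachable_within(call_graph, entry, max_depth=MAX_DEPTH):
    """Return {function: hops} for everything reachable in at most max_depth calls."""
    hops = {entry: 0}
    queue = deque([entry])
    while queue:
        fn = queue.popleft()
        if hops[fn] == max_depth:
            continue  # do not expand callees beyond the bound
        for callee in call_graph.get(fn, ()):
            if callee not in hops:
                hops[callee] = hops[fn] + 1
                queue.append(callee)
    return hops


# Hypothetical chain: only export and run_export come from the example above.
call_graph = {
    "export": ["run_export"],
    "run_export": ["build_query"],
    "build_query": ["format_clause"],
    "format_clause": ["execute_sql"],
    "execute_sql": ["driver_send"],
}
visited = reachable_within(call_graph, "export")
```

In this toy graph `execute_sql` is reached at exactly 4 hops, while `driver_send` at 5 hops is never visited, so a sink located there would not be reported.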
Escape hatch
If a regression in cross-file taint blocks your build, set:
VULKRO_DISABLE_INTERPROC_TAINT=1 vulkro scan .
This skips only the cross-file pass; the intra-procedural and single-module passes still run.
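If scans are launched from a Python build script, the environment toggles mentioned on this page can be assembled in one place. This is only a sketch: the `scan_env` helper and its parameter names are hypothetical, and the only facts taken from this page are the two variable names and the `vulkro scan .` command.

```python
import os


def scan_env(disable_cross_file=False, loop_carried=False, base=None):
    """Return an environment mapping for a scan (hypothetical helper)."""
    env = dict(os.environ if base is None else base)
    if disable_cross_file:
        # Skip only the cross-file pass, as described above.
        env["VULKRO_DISABLE_INTERPROC_TAINT"] = "1"
    if loop_carried:
        # Opt in to the worklist engine (JS/TS and Python only).
        env["VULKRO_TAINT_CFG"] = "1"
    return env


# Usage, not executed here:
#   import subprocess
#   subprocess.run(["vulkro", "scan", "."], env=scan_env(disable_cross_file=True))
```

Keeping the toggle in a helper makes it easy to remove the escape hatch again once the regression is fixed.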