with one click
pysa-json-models
// Use when reading, writing, or debugging Pysa JSON model output (.models files). Use when working with taint models that describe sources, sinks, TITO, sanitizers, or issues in JSON format.
// Use when reading, writing, or debugging Pysa JSON model output (.models files). Use when working with taint models that describe sources, sinks, TITO, sanitizers, or issues in JSON format.
| name | pysa-json-models |
| description | Use when reading, writing, or debugging Pysa JSON model output (.models files). Use when working with taint models that describe sources, sinks, TITO, sanitizers, or issues in JSON format. |
| oncalls | ["pysa"] |
Pysa outputs analysis results as newline-delimited JSON (NDJSON). Each line is a self-contained JSON object with "kind" and "data" fields. There are two kinds: "model" (a callable's taint behavior) and "issue" (a detected vulnerability).
Every JSON object is wrapped as:
{"kind": "model", "data": { ... }}
{"kind": "issue", "data": { ... }}
Models and issues are separate top-level objects. Issues are never nested inside models.
{
"callable": "module.function_name",
"filename": "module.py",
"callable_line": 42,
"sources": [ ... ],
"sinks": [ ... ],
"tito": [ ... ],
"parameter_sources": [ ... ],
"global_sanitizer": { ... },
"parameters_sanitizer": { ... },
"sanitizers": [ ... ],
"modes": [ "Obscure", ... ]
}
All fields except callable are optional. Empty fields are omitted entirely — never include empty arrays [] or objects {}.
Note: "sources" represents generations (taint produced by the function). This is distinct from "parameter_sources" which represents sources on parameters.
"module.function_name""module.ClassName.method_name""Obj{module.ClassName.attribute_name}"Available modes: Obscure, SkipObscure, SkipAnalysis, SkipOverrides, AnalyzeAllOverrides, Entrypoint, IgnoreDecorator, SkipModelBroadening, InferSelfTito, InferArgumentTito.
Each of sources, sinks, tito, and parameter_sources is a list of port entries:
{
"port": "<port>",
"taint": [ <trace_element>, ... ]
}
| Port | Meaning |
|---|---|
result | Return value |
result[field] | Return value with access path |
formal(name, position=N) | Named parameter at position N |
formal(name, position=N, positional_only) | Positional-only parameter |
formal(name, position=N)[field][subfield] | Parameter with access path |
formal(*args, position=N) | Variadic positional args |
formal(**kwargs) | Variadic keyword args |
formal(**kwargs, excluded=[x]) | Kwargs excluding named params |
formal($global, position=0) | Global/attribute model port |
Key rule: position is always included in formal() ports (e.g., formal(x, position=0), not formal(x)).
For sources/parameter_sources, the port indicates where taint is produced (typically result or result[field]).
For sinks, the port indicates where taint is consumed (typically formal(...) with optional access path).
For tito, the port indicates the input side (always formal(...)). The output side is encoded inside the taint entry via return_paths.
Each element in a "taint" list is a trace element keyed by its call info. There are four call info variants:
{
"kinds": [ { "kind": "Test" } ],
"declaration": null
}
This is the leaf of the trace — the taint is directly declared by the user, not propagated from another function.
{
"kinds": [ { "kind": "Test", "length": 0 } ],
"origin": { "line": 16, "start": 15, "end": 20 }
}
Represents a call to a function with a user-declared model. The origin location is where the call happens. "call_site" may also be present (e.g., "call_site": "16:4-16:21").
{
"kinds": [ { "kind": "Test", "length": 2 } ],
"call": {
"position": { "line": 20, "start": 8, "end": 22 },
"resolves_to": [ "module.callee_function" ],
"port": "result[a]"
}
}
Represents a hop: taint flows from callee_function at the given port. resolves_to is a list (can have multiple callees for overrides). "call_site" may also appear inside the "call" object.
{
"kinds": [ { "return_paths": { "": 0 }, "kind": "LocalReturn" } ],
"tito": {}
}
Used in "tito" port entries. The "tito" call info is always {} (empty object, never null).
Each "kinds" list entry describes a taint kind with optional metadata:
{
"kind": "Test",
"length": 2,
"leaves": [ { "name": "pysa._test_source", "port": "leaf:return" } ],
"features": [ { "always-via": "special_source" } ],
"local_features": [ { "always-via": "special_sink" } ],
"return_paths": { "": 0 }
}
| Field | Description |
|---|---|
kind | Source/sink kind name (e.g., Test, UserControlled, SQL, RemoteCodeExecution). For TITO: LocalReturn |
length | Trace length (hops to closest leaf). Omitted when 0 |
leaves | Set of leaf declarations this taint originates from. Each leaf: {"name": "qualified.name", "port": "leaf:return"} |
features | Propagated breadcrumbs from callees |
local_features | Breadcrumbs introduced in this function |
return_paths | (TITO only) Maps access paths to collapse depths, e.g. {"": 0} (identity), {"[field]": 3} (into field) |
leaf:return — source on return valueleaf:arg — sink on argumentleaf:arg[field] — sink on argument fieldproducer:N:formal(M) — cross-repository produceranchor:formal(M) — cross-repository anchorLocal features and tito positions can appear at the trace element level (outside kinds):
{
"kinds": [ ... ],
"local_features": [ { "has": "first-index" }, { "first-index": "<numeric>" } ],
"tito_positions": [ { "line": 71, "start": 34, "end": 37 } ],
"origin": { ... }
}
local_features (trace-level): Breadcrumbs introduced at this particular call site within the current function. These describe what happens to the data at this point in the flow — for example, first-index records the first dictionary key or list index used to access tainted data. Unlike kind-level local_features (which are specific to a taint kind), trace-level local features apply across all kinds in the trace element.
tito_positions: Locations in the current function where taint was propagated via taint-in-taint-out. Each entry is a source position (line, start, end) pointing to an expression where tainted data flowed through a TITO function call. This helps trace how data moved through the function before reaching a sink or being returned.
Features use a prefix to indicate whether they appear on all paths (always-) or some paths (no prefix):
{ "always-via": "special_source" }
{ "via": "tito" }
{ "always-type": "bool" }
{ "has": "first-index" }
{ "first-index": "<numeric>" }
{ "first-index": "arg" }
{ "has": "first-field" }
{ "first-field": "attribute_name" }
"global_sanitizer": {
"sources": "All",
"sinks": [ "SQL" ],
"tito": { "sources": [ "UserControlled" ], "sinks": "All" }
}
Each of sources, sinks, tito is optional. Values are either "All" or a list of kind names. The tito sub-object has its own sources/sinks.
Same format as global sanitizer, under key "parameters_sanitizer".
"sanitizers": [
{
"port": "formal(x, position=0)",
"sources": "All",
"sinks": [ "TestSink" ]
}
]
Issues are separate top-level objects, not nested inside models:
{
"kind": "issue",
"data": {
"callable": "module.function_name",
"callable_line": 19,
"code": 5002,
"line": 21,
"start": 15,
"end": 16,
"filename": "module.py",
"message": "Data from [Test] source(s) may reach [Test] sink(s)",
"traces": [
{
"name": "forward",
"roots": [ <trace_elements> ]
},
{
"name": "backward",
"roots": [ <trace_elements> ]
}
],
"features": [ { "always-via": "special_source" } ],
"sink_handle": {
"kind": "Call",
"callee": "module.sink_function",
"index": 0,
"parameter": "formal(arg)"
},
"master_handle": "module.function:5002:0:Call|module.sink|0|formal(arg):md5hash"
}
}
Trace elements in roots use the same format as model trace elements (kinds/call/origin/declaration).
| Variant | Fields |
|---|---|
Call | callee, index, parameter |
Global | callee, index |
Return | (none) |
LiteralStringSink | sink |
ConditionalTestSink | sink |
StringFormat | callee, index, parameter_index |
parameter in sink handle uses the callee's parameter name (e.g., formal(arg) for _test_sink(arg)), not the caller's argument variable name.
| Mistake | Correct |
|---|---|
"formal(x)" | "formal(x, position=0)" — always include position |
"declaration": {} or "decl": null | "declaration": null — exact key and value matter |
"tito": null (call info) | "tito": {} — empty object, not null |
TITO port = "result" | TITO port = "formal(...)" — port is the input side |
Issues inside model "errors" | Issues are separate {"kind": "issue"} objects |
Including "sources": [] | Omit empty fields entirely |
Leaves: {"kind": "Test", "name": "..."} | Leaves: {"name": "pysa._test_source", "port": "leaf:return"} |
| sink_handle parameter = caller's variable | sink_handle parameter = callee's parameter name |
Given:
# example.py
from pysa import _test_source, _test_sink
def get_source():
return _test_source()
def flow():
x = get_source()
_test_sink(x)
_test_source (user-declared source):{
"kind": "model",
"data": {
"callable": "pysa._test_source",
"filename": "pysa/__init__.pyi",
"callable_line": 5,
"sources": [
{
"port": "result",
"taint": [
{
"kinds": [
{
"features": [ { "always-via": "special_source" } ],
"kind": "Test"
}
],
"declaration": null
}
]
}
],
"modes": [ "Obscure" ]
}
}
get_source (inferred source via call):{
"kind": "model",
"data": {
"callable": "example.get_source",
"filename": "example.py",
"callable_line": 4,
"sources": [
{
"port": "result",
"taint": [
{
"kinds": [
{
"local_features": [ { "always-via": "special_source" } ],
"leaves": [ { "name": "pysa._test_source", "port": "leaf:return" } ],
"kind": "Test"
}
],
"origin": { "line": 5, "start": 11, "end": 25 }
}
]
}
]
}
}
flow:{
"kind": "issue",
"data": {
"callable": "example.flow",
"callable_line": 7,
"code": 5002,
"line": 9,
"start": 15,
"end": 16,
"filename": "example.py",
"message": "Data from [Test] source(s) may reach [Test] sink(s)",
"traces": [
{
"name": "forward",
"roots": [
{
"kinds": [
{
"features": [ { "always-via": "special_source" } ],
"leaves": [ { "name": "pysa._test_source", "port": "leaf:return" } ],
"length": 1,
"kind": "Test"
}
],
"call": {
"position": { "line": 8, "start": 8, "end": 20 },
"resolves_to": [ "example.get_source" ],
"port": "result"
}
}
]
},
{
"name": "backward",
"roots": [
{
"kinds": [
{
"local_features": [ { "always-via": "special_sink" } ],
"leaves": [ { "name": "pysa._test_sink", "port": "leaf:arg" } ],
"kind": "Test"
}
],
"origin": { "line": 9, "start": 15, "end": 16 }
}
]
}
],
"features": [
{ "always-via": "special_source" },
{ "always-via": "special_sink" }
],
"sink_handle": {
"kind": "Call",
"callee": "pysa._test_sink",
"index": 0,
"parameter": "formal(arg)"
},
"master_handle": "example.flow:5002:0:Call|pysa._test_sink|0|formal(arg):abcdef1234567890"
}
}
source/interprocedural_analyses/taint/model.mlsource/interprocedural_analyses/taint/domains.mlsource/interprocedural_analyses/taint/features.mlsource/interprocedural_analyses/taint/issue.mlsource/interprocedural_analyses/taint/sanitize.mlUse when you need to understand the Pyre or Pysa codebase architecture, find where specific functionality is implemented, or navigate the source directory structure. Use when working on OCaml code, debugging type checking or taint analysis issues, or exploring the codebase.
Use when creating commits or diffs to apply the correct title prefix, reviewer, and test plan based on the area of the codebase being changed.
Use when debugging a Pysa false negative (missing taint issue), comparing two Pysa output directories, or finding where taint flow is lost.
Use when running, debugging, updating, or creating Pysa end-to-end integration tests. Use when taint analysis tests fail, when expected output files need updating, or when working with .models, .cg, .hofcg, .overrides files under `source/interprocedural_analyses/taint/test/integration`.