gitnexus-taint-analysis

Stars: 42,655

Forks: 4,808

Last updated: 13 June 2026 at 06:04

Use when working on, reviewing, or extending GitNexus's CFG/taint/PDG subsystem (the `--pdg` layers), or when reasoning about source→sink data-flow findings. Examples: "How does taint analysis work here?", "Why didn't explain find this flow?", "Add a new sink/source", "Review the interprocedural taint code".

Source: abhigyanpatwari/GitNexus (GitHub repository)

CFG & Taint Analysis with GitNexus

Expert knowledge for the opt-in --pdg program-analysis subsystem: control-flow graphs, reaching definitions, and intra- + inter-procedural taint. Read this before touching gitnexus/src/core/ingestion/cfg/** or gitnexus/src/core/ingestion/taint/**, or when explaining a finding.

When to Use

"How does the taint engine work / why is this flow (not) reported?"
Adding a source, sink, or sanitizer to the model.
Extending or reviewing the CFG / reaching-defs / taint / summary code.
Understanding the explain MCP tool's findings (intra- vs inter-procedural).
Debugging a false positive or false negative in --pdg output.

The layered substrate (build order)

Taint runs on the graph, not beside it. Each layer is opt-in behind --pdg, and a default analyze run (flag off) must stay byte-identical: the golden parity gate is the hard floor for every change here.

L1  CFG            per-function basic blocks + control-flow edges   (M1 #2081)
L2  REACHING_DEF   GEN/KILL def→use data dependence (pure solver)   (M2 #2082)
L3  Taint (intra)  source→sink over RD facts, minus sanitizers      (M3 #2083)
L4  Taint (inter)  per-function summaries composed over CALLS       (M4 #2084)

Worker-built, main-thread-solved. The parse worker builds each function's CFG + harvests def/use + call-site facts onto ParsedFile.cfgSideChannel (plain, structured-clone-safe data — never AST nodes). The main thread runs the pure solvers. NEVER re-parse on the main thread (re-introduces the #1983 OOM).
In-phase emit (KTD1). L1–L4-harvest all run INSIDE the scope-resolution pdg window (scope-resolution/pipeline/run.ts, gated input.pdg === true), because the disk-backed ParsedFile store is cleared when that phase ends — a standalone post-mro phase would read empty data. The cross-function fixpoint (L4) is the exception: it runs in its OWN registered phase (taintSummaries) AFTER scope-resolution, because it needs the COMPLETE call graph, and consumes small plain summary data threaded out via ScopeResolutionOutput.
Pure-solver contract. computeReachingDefs, computeTaintFlows, harvestFunctionSummary, and solveInterprocTaint are pure and deterministic (no graph, no I/O, no logger; sorted outputs). Snapshot tests and content-derived edge ids depend on it.
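
The L2 layer can be illustrated with a minimal sketch of a GEN/KILL reaching-definitions worklist solver. The `Block`, `Def`, and `RdFact` shapes and the `reachingDefs` name are hypothetical, chosen for illustration; they are not the types behind `computeReachingDefs`. The sketch only mirrors the contract described above: plain data in, no I/O, sorted output.

```typescript
// Hypothetical shapes for illustration only; not the GitNexus types.
interface Def { defId: string; variable: string }
interface Block { id: number; defs: Def[]; successors: number[] }
interface RdFact { in: string[]; out: string[] }

// Pure worklist solver: OUT[b] = GEN[b] ∪ (IN[b] − KILL[b]), IN[b] = ∪ OUT[pred].
function reachingDefs(blocks: Block[]): Map<number, RdFact> {
  const varOf = new Map<string, string>();       // defId -> variable it defines
  const preds = new Map<number, number[]>();     // block id -> predecessor ids
  const succs = new Map<number, number[]>();     // block id -> successor ids
  const gen = new Map<number, Set<string>>();    // last def of each variable in the block
  const killed = new Map<number, Set<string>>(); // variables the block redefines
  const inSets = new Map<number, Set<string>>();
  const outSets = new Map<number, Set<string>>();

  for (const b of blocks) {
    preds.set(b.id, []);
    succs.set(b.id, b.successors);
    const last = new Map<string, string>();
    for (const d of b.defs) {
      varOf.set(d.defId, d.variable);
      last.set(d.variable, d.defId); // a later def of the same variable wins
    }
    const g = new Set<string>();
    const k = new Set<string>();
    const initialOut = new Set<string>();
    last.forEach((defId, variable) => { g.add(defId); k.add(variable); initialOut.add(defId); });
    gen.set(b.id, g);
    killed.set(b.id, k);
    inSets.set(b.id, new Set<string>());
    outSets.set(b.id, initialOut); // start from GEN; sets only grow afterwards
  }
  for (const b of blocks) {
    for (const s of b.successors) {
      const p = preds.get(s);
      if (p) p.push(b.id);
    }
  }

  const worklist = blocks.map((b) => b.id);
  while (worklist.length > 0) {
    const id = worklist.shift() as number;
    const inSet = new Set<string>();
    for (const p of preds.get(id) ?? []) {
      (outSets.get(p) ?? new Set<string>()).forEach((d) => inSet.add(d));
    }
    const out = new Set<string>();
    (gen.get(id) ?? new Set<string>()).forEach((d) => out.add(d));
    const kill = killed.get(id) ?? new Set<string>();
    inSet.forEach((d) => {
      if (!kill.has(varOf.get(d) ?? "")) out.add(d);
    });
    inSets.set(id, inSet);
    const prevSize = (outSets.get(id) ?? new Set<string>()).size;
    outSets.set(id, out);
    // OUT is monotone, so a size change is equivalent to a content change.
    if (out.size !== prevSize) {
      for (const s of succs.get(id) ?? []) {
        if (preds.has(s) && worklist.indexOf(s) === -1) worklist.push(s);
      }
    }
  }

  const sorted = (s: Set<string>): string[] => Array.from(s).sort();
  const result = new Map<number, RdFact>();
  for (const b of blocks) {
    result.set(b.id, {
      in: sorted(inSets.get(b.id) ?? new Set<string>()),
      out: sorted(outSets.get(b.id) ?? new Set<string>()),
    });
  }
  return result;
}

// Diamond CFG: block 0 defines x, block 1 redefines x, block 2 defines y, both join at 3.
const facts = reachingDefs([
  { id: 0, defs: [{ defId: "d1", variable: "x" }], successors: [1, 2] },
  { id: 1, defs: [{ defId: "d2", variable: "x" }], successors: [3] },
  { id: 2, defs: [{ defId: "d3", variable: "y" }], successors: [3] },
  { id: 3, defs: [], successors: [] },
]);
console.log(facts.get(3)); // in: d1 (via block 2), d2 (via block 1), d3
```

Because the inputs are plain records rather than AST nodes, a structure like this could cross a worker boundary by structured clone, which is the property the worker-built, main-thread-solved split relies on.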

Intra-procedural taint (L3)

Forward reachability over RD facts from matched sources to matched sinks, killed by sanitizers. Key design points worth internalizing:

Occurrence-tagged sites. A flat per-arg binding set cannot tell exec(escape(x)) (safe) from exec(x) (finding); the harvest records nested call structure (SiteRecord.parent/via-tags) so sanitizer interposition is precise.
Kind-set sanitizer model. A taint carries a set of neutralized SinkKinds; a sink fires unless its kind is in the set. So escape(req.body) suppresses res.send (xss) but STILL fires db.query (sql) — a kind-blind kill would be a suppressed live injection (the forbidden FN direction). path.basename(t) neutralizes path-traversal only, not command-injection.
Statement-level finding identity. NOT block-pair (block conflation drops distinct findings; exec(req.body, req.query) is two findings).
Persisted as TAINTED edges (BasicBlock→BasicBlock); the path rides the reason column via the shared versioned codec (taint/path-codec.ts).
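
The kind-set sanitizer model can be sketched as follows. This is a minimal illustration under assumptions: the `Taint` shape, the `SANITIZERS` and `SINKS` tables, and the two helper functions are hypothetical, and the mapping of `exec` to command-injection is assumed for the example rather than taken from the real model.

```typescript
type SinkKind = "xss" | "sql" | "command-injection" | "path-traversal";

// A taint value carries the set of sink kinds that have been neutralized so far.
interface Taint { source: string; neutralized: Set<SinkKind> }

// Hypothetical model tables; the real entries live in the language model.
const SANITIZERS: Record<string, SinkKind[]> = {
  escape: ["xss"],
  "path.basename": ["path-traversal"],
};
const SINKS: Record<string, SinkKind> = {
  "res.send": "xss",
  "db.query": "sql",
  exec: "command-injection", // assumed kind, for illustration
};

// Passing through a sanitizer adds its kinds; it never clears the taint itself.
function applySanitizer(taint: Taint, sanitizer: string): Taint {
  const next = new Set<SinkKind>();
  taint.neutralized.forEach((k) => next.add(k));
  for (const k of SANITIZERS[sanitizer] ?? []) next.add(k);
  return { source: taint.source, neutralized: next };
}

// A sink fires unless its kind is in the neutralized set.
function sinkFires(taint: Taint, sink: string): boolean {
  const kind = SINKS[sink];
  return kind !== undefined && !taint.neutralized.has(kind);
}

const raw: Taint = { source: "req.body", neutralized: new Set<SinkKind>() };
const escaped = applySanitizer(raw, "escape");
console.log(sinkFires(escaped, "res.send")); // false: xss is neutralized
console.log(sinkFires(escaped, "db.query")); // true: sql is still live
```

A kind-blind design would return no taint at all from `applySanitizer`, which is exactly the suppressed `db.query` injection the text warns against.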

Interprocedural taint (L4) — the functional/summary method

This is the production approach (Sharir-Pnueli 1981; the same shape as Meta's Pysa and Mariana Trench, and FB Infer), NOT full IFDS tabulation. Each function is reduced to a compact summary, and summaries are composed over the already-resolved CALLS graph.

Summary shape (taint/summary-model.ts, whole-parameter granularity):

Edge	Meaning	Analogue
`param→return`	a param flows to the return value	TITO — reserved (the floor already covers its recall; precision pass deferred)
`param→callee-arg`	a param flows into arg j of a call (carries the path's neutralized sink kinds)	TITO into callee
`param→sink`	a param reaches a modelled sink	partial/triggered sink
`source→return`	the function generates+returns a source	generative — composed via the caller's `callResults`
`source→callee-arg`	a generated source flows into a call	fixpoint SEED
`callResults`	a user-function call's result flows to a sink/return/callee-arg in the caller	composes with callee `source→return`

The fixpoint (taint/interproc-solver.ts): the unit is (function, parameter, source). Seed from source→callee-arg, propagate via param→callee-arg, fire a finding when a tainted param meets param→sink.

Cycle-safe by monotonicity. The tainted-set is monotone over a finite lattice (fn × param × source), so the worklist converges — a recursive call just re-proposes an already-visited entry. SCC condensation would only refine processing order; correctness/termination don't require it.
Source-discriminated state (load-bearing). Key the state by the SOURCE too. Keying only by (fn, param) collapses multi-source flows: a sink param tainted by source A is marked visited, so a later flow from source B is dropped before firing. This is the recurring multi-source bug class. (It bit M3; it bit M4 U9 too.)
Name-based call join. Match a summary's call-arg edge to a CALLS edge by CALLEE NAME, not call-site line — line-base parity (CFG 1-based vs reference site) is fragile; the callee identity is exact and context-insensitivity taints the callee's param identically at every call site.
Persisted as TAINT_PATH edges (Function→Function), with the function-level hop chain in reason via the same codec; confidence is lower than the intra-procedural 1.0.

Context-insensitivity is the accepted trade-off at this tier: one summary per function, return/call-site merging accepted (security-conservative). Expect some FP from merging; the bigger FN sources are unmodeled features (below).
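
The fixpoint described above can be sketched as a small worklist over (function, parameter, source) entries. This is a simplified illustration, not the code in taint/interproc-solver.ts: the `Summary` and `Finding` shapes and the `solveInterproc` name are hypothetical, neutralized sink kinds and `callResults` are omitted, and argument index j of a call is assumed to map directly to parameter j of the callee.

```typescript
// Hypothetical, simplified summary shape (three of the edge kinds from the table).
interface Summary {
  fn: string;
  sourceToCalleeArg: { source: string; callee: string; arg: number }[];
  paramToCalleeArg: { param: number; callee: string; arg: number }[];
  paramToSink: { param: number; sinkKind: string }[];
}
interface Finding { source: string; fn: string; param: number; sinkKind: string }

function solveInterproc(summaries: Summary[]): Finding[] {
  // Name-based join: callees are looked up by function name.
  const byName = new Map<string, Summary>();
  for (const s of summaries) byName.set(s.fn, s);

  const visited = new Set<string>();
  const worklist: { fn: string; param: number; source: string }[] = [];
  const propose = (fn: string, param: number, source: string): void => {
    // The state key includes the SOURCE, so two sources reaching the same
    // (fn, param) are tracked as two separate entries.
    const key = JSON.stringify([fn, param, source]);
    if (visited.has(key)) return; // recursion just re-proposes a visited entry
    visited.add(key);
    worklist.push({ fn, param, source });
  };

  // Seed from source→callee-arg edges.
  for (const s of summaries) {
    for (const e of s.sourceToCalleeArg) propose(e.callee, e.arg, e.source);
  }

  const findings: Finding[] = [];
  while (worklist.length > 0) {
    const cur = worklist.pop();
    if (!cur) break;
    const summary = byName.get(cur.fn);
    if (!summary) continue; // unmodelled callee: nothing to compose
    for (const e of summary.paramToSink) {
      if (e.param === cur.param) {
        findings.push({ source: cur.source, fn: cur.fn, param: cur.param, sinkKind: e.sinkKind });
      }
    }
    for (const e of summary.paramToCalleeArg) {
      if (e.param === cur.param) propose(e.callee, e.arg, cur.source);
    }
  }

  // Sorted output keeps the solver deterministic.
  const keyOf = (f: Finding): string => JSON.stringify([f.source, f.fn, f.param, f.sinkKind]);
  return findings.sort((a, b) => (keyOf(a) < keyOf(b) ? -1 : keyOf(a) > keyOf(b) ? 1 : 0));
}
```

With two handlers that pass different sources into the same helper, this sketch reports one finding per source. If the visited key were reduced to (fn, param), the second source would be dropped at the helper, which is the multi-source bug described above.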

Known false-negative classes (documented, deferred)

The largest is closures/callbacks (arr.forEach(() => sink(y))) — taint into a callback is dropped without per-library models (true of CodeQL's JS libs too). Also deferred: field/property flows (obj.x = taint; sink(obj.y)), field-sensitive access paths, guard-style sanitizers, implicit/control-dependence flows, promise/async-await threading, and destructured/rest params before a tainted simple param (the summary port index is the binding ordinal, not the formal arg position — needs a formal-param index threaded from the worker BindingEntry). The interprocedural join is also context-insensitive: when one caller invokes two distinct same-named callees, a flow into one over-attributes to both (sound — over-report, never a missed flow). Absence of a finding is NOT proof of safety.

GitNexus-specific gotchas

Function↔CFG join. FunctionCfg.functionStartLine is 1-based; Function/Method node startLine is 0-based, so join on functionStartLine - 1 === startLine. Function nodes have no column, so same-line functions ({a:()=>x(), b:()=>y()}) are ambiguous: drop them (the summary driver counts them as unresolved) rather than cross-wire.
No rel-property index (S1). Kuzu has no secondary index on relationship properties, and unanchored [:TAINTED*]/[:TAINT_PATH*] queries explode. TAINT_PATH is therefore MATERIALIZED + anchored at analyze time, never traversed live; explain reads it source-anchored + LIMIT-guarded.
explain is the only discovery surface. TAINTED/TAINT_PATH are deliberately OUT of VALID_RELATION_TYPES (impact's allow-list) and the web schema (pinned in security.test.ts). explain enumerates both layers (cross-function findings carry interprocedural: true).
One shared codec. Both the emit path and explain import taint/path-codec.ts. Two hand-rolled copies of a wire format drift — never fork it. New metadata extends the format WITHIN the version when writer + reader ship together.
Cache versioning. A worker-harvest shape change bumps the parse-cache pdg NAMESPACE (pdg:N), NOT SCHEMA_BUMP (which cold-invalidates every user). Persisted-graph/config changes ride RepoMeta.pdg's key-union mismatch → full writeback. Model content rides taintModelVersion.
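
The Function↔CFG join from the first gotcha can be sketched like this. The `FunctionNode` and `JoinResult` shapes, the `blockCount` field, and the `joinCfgs` name are hypothetical; only the 1-based versus 0-based offset and the drop-on-ambiguity rule come from the text above.

```typescript
// Hypothetical minimal shapes for illustration.
interface FunctionNode { id: string; startLine: number }                // 0-based line
interface FunctionCfg { functionStartLine: number; blockCount: number } // 1-based line
interface JoinResult { joined: Map<string, FunctionCfg>; unresolved: number }

function joinCfgs(nodes: FunctionNode[], cfgs: FunctionCfg[]): JoinResult {
  const nodesByLine = new Map<number, FunctionNode[]>();
  for (const n of nodes) {
    const bucket = nodesByLine.get(n.startLine) ?? [];
    bucket.push(n);
    nodesByLine.set(n.startLine, bucket);
  }

  const cfgsByLine = new Map<number, FunctionCfg[]>();
  for (const c of cfgs) {
    const line = c.functionStartLine - 1; // convert 1-based to the node's 0-based line
    const bucket = cfgsByLine.get(line) ?? [];
    bucket.push(c);
    cfgsByLine.set(line, bucket);
  }

  const joined = new Map<string, FunctionCfg>();
  let unresolved = 0;
  cfgsByLine.forEach((bucket, line) => {
    const candidates = nodesByLine.get(line) ?? [];
    const node = candidates[0];
    const cfg = bucket[0];
    if (bucket.length === 1 && candidates.length === 1 && node && cfg) {
      joined.set(node.id, cfg);
    } else {
      // No column to disambiguate same-line functions: count and drop.
      unresolved += bucket.length;
    }
  });
  return { joined, unresolved };
}

const result = joinCfgs(
  [{ id: "fn:a", startLine: 4 }, { id: "fn:b", startLine: 9 }, { id: "fn:c", startLine: 9 }],
  [
    { functionStartLine: 5, blockCount: 3 },
    { functionStartLine: 10, blockCount: 1 },
    { functionStartLine: 10, blockCount: 1 },
  ],
);
console.log(result.joined.has("fn:a"), result.unresolved); // true 2
```

Dropping the ambiguous pair loses two summaries but cannot attach one function's CFG to another function's node, which would silently corrupt every finding built on that join.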

Adding a source / sink / sanitizer

Edit the language model in taint/typescript-model.ts (registered via the explicit registerBuiltinTaintModels seam, keyed by SupportedLanguages). The spec is hashable data (no functions). A sanitizer's neutralizes lists the EXACT sink kinds it defends — never a blanket kill. Add a fixture + assert the finding (or its absence) in test/unit/taint/ (real-source harness: test/helpers/ts-cfg-harness.ts); the end-to-end proof is test/integration/cfg/.
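
To make "hashable data (no functions)" concrete, here is a minimal sketch of a model spec as plain data with a canonical fingerprint and a guard against blanket sanitizers. Every name here (`TaintModel`, `SanitizerSpec`, `SinkSpec`, `validateModel`, `fingerprint`) is hypothetical, and the shape is an assumption; consult taint/typescript-model.ts for the actual spec format.

```typescript
type SinkKind = "xss" | "sql" | "command-injection" | "path-traversal";
const KNOWN_KINDS: SinkKind[] = ["xss", "sql", "command-injection", "path-traversal"];

// Hypothetical spec shape: data only, so it can be serialized and fingerprinted.
interface SinkSpec { callee: string; kind: SinkKind }
interface SanitizerSpec { callee: string; neutralizes: SinkKind[] }
interface TaintModel { sources: string[]; sinks: SinkSpec[]; sanitizers: SanitizerSpec[] }

// Reject sanitizers that defend nothing or that try to kill every kind at once.
function validateModel(model: TaintModel): string[] {
  const errors: string[] = [];
  for (const s of model.sanitizers) {
    if (s.neutralizes.length === 0) errors.push(`${s.callee}: neutralizes is empty`);
    if (KNOWN_KINDS.every((k) => s.neutralizes.indexOf(k) !== -1)) {
      errors.push(`${s.callee}: blanket kill is not allowed`);
    }
  }
  return errors;
}

// Canonical serialization: entry order in the spec does not change the result.
function fingerprint(model: TaintModel): string {
  const sinks = model.sinks.map((s) => `${s.callee}:${s.kind}`).sort();
  const sanitizers = model.sanitizers
    .map((s) => `${s.callee}:${s.neutralizes.slice().sort().join(",")}`)
    .sort();
  return JSON.stringify({ sources: model.sources.slice().sort(), sinks, sanitizers });
}

const model: TaintModel = {
  sources: ["req.body", "req.query"],
  sinks: [{ callee: "res.send", kind: "xss" }, { callee: "db.query", kind: "sql" }],
  sanitizers: [{ callee: "escape", neutralizes: ["xss"] }],
};
const extended: TaintModel = {
  ...model,
  sanitizers: [...model.sanitizers, { callee: "path.basename", neutralizes: ["path-traversal"] }],
};
console.log(validateModel(extended).length);              // 0
console.log(fingerprint(model) !== fingerprint(extended)); // true
```

A content-derived fingerprint of this kind is one way a model change could be detected, in the spirit of the taintModelVersion mentioned under cache versioning; whether the real version is derived this way is not stated in this document.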

Validation checklist for any `--pdg` change

1. tsc clean (schema additions are exhaustiveness-checked; watch the
   api.ts getNodeQuery runtime read-path if a node label is added).
2. Targeted vitest by directory (test/unit/taint, test/unit/cfg,
   test/integration/cfg) — verify by ISOLATION, not full-suite exit
   (known load-flakes). `node scripts/build.js` before worker/integration runs.
3. Flag-off golden byte-identical (pipeline-graph-golden.test.ts).
4. bench/cfg/measure.mjs --check (no fingerprint drift / budget regression).
5. detect_changes() before commit; impact({direction:'upstream'}) before
   editing shared symbols (KnowledgeGraph, RepoMeta, RelationshipType, codec).

Prior art (for deeper design questions)

Sharir & Pnueli 1981 (functional approach); Reps-Horwitz-Sagiv IFDS (POPL 1995); FlowDroid/StubDroid (access-path summaries); Pysa & Mariana Trench (TITO / propagations, parallel SCC fixpoint); CodeQL Models-as-Data (the richest port notation, incl. callback ports); Infer (content-keyed incremental summaries).
