en un clic
dfd
// DFD with trust boundaries + data classifications (Mermaid + optional Threat Dragon JSON). Source-of-truth for /threat-model.
// DFD with trust boundaries + data classifications (Mermaid + optional Threat Dragon JSON). Source-of-truth for /threat-model.
Six-axis Feature Inventory (routes / models / jobs / tests / UI / docs) — a "what we must preserve" spec for greenfield rewrites.
Per-feature Mermaid flowchart — routes / models / jobs / screens. Consumes /extract-features inventory.
Onboard an external repo via a structured handover assessment + harnessability scoring across 5 codebase dimensions.
Self-contained HTML user-journey map (boxes/arrows with per-page modals) — preview between PRD and tech-design.
Interview-driven initiative → milestones → tasks with dependency-aware sequencing. Walks the operator from initiative-level goal through per-milestone Socratic interview, computes a topo-sorted recommended sequence, and optionally files each milestone as a Feature-shape ticket with `blocks` / `blocked by` cross-refs.
Extract a business process from registered repos via 7-axis code scan + gap-targeted interview, then emit lint-clean BPMN 2.0.
| name | dfd |
| description | DFD with trust boundaries + data classifications (Mermaid + optional Threat Dragon JSON). Source-of-truth for /threat-model. |
| argument-hint | [project-name | . | --scope-all] [--format=mermaid|dragon|all] |
| allowed-tools | Bash, Read, Grep, Glob, Write |
Reads a codebase (or a portfolio of codebases for system-wide DFDs) and produces a Data Flow Diagram showing external actors, processes, data stores, data flows, trust boundaries, and per-element data classifications. The DFD is the input to STRIDE threat modelling and to GDPR cross-border / DPA-coverage analysis.
This skill is the canonical DFD producer in the apexyard family. /threat-model and /compliance-check consume the DFD it writes instead of regenerating their own — see AgDR-0026 for the design rationale.
| Skill | Role |
|---|---|
/dfd (this skill) | Produces the DFD — six-axis discovery + classifications + cross-repo trace |
/threat-model | Consumes the DFD — STRIDE walk over each trust-boundary crossing |
/compliance-check | Consumes the DFD's classifications — cross-border transfers + DPA coverage |
/c4 | Static topology (different shape — system + containers, no data-flow semantics) |
/process (#256) | Dynamic control flow — BPMN, anchor-scoped multi-repo trace (shares the _lib-multi-repo-trace.sh helper) |
Read the registry path via portfolio_registry, the per-project docs dir via portfolio_projects_dir, and the workspace dir via portfolio_workspace_dir from .claude/hooks/_lib-portfolio-paths.sh. Cross-repo discovery uses the shared _lib-multi-repo-trace.sh helper. Source both at the top of any bash block:
source "$(git rev-parse --show-toplevel)/.claude/hooks/_lib-read-config.sh"
source "$(git rev-parse --show-toplevel)/.claude/hooks/_lib-portfolio-paths.sh"
source "$(git rev-parse --show-toplevel)/.claude/hooks/_lib-multi-repo-trace.sh"
projects_dir=$(portfolio_projects_dir)
Defaults match today's single-fork layout. Adopters in split-portfolio mode override the portfolio.* keys in .claude/project-config.json — the helper resolves whichever mode they're in. See docs/multi-project.md.
/dfd # interactive — asks for scope (single service or system-wide)
/dfd billing-api # registered project — single-service DFD
/dfd . # treat cwd as the project root
/dfd --scope-all # walk the whole registry — system-wide DFD with per-service sub-models
/dfd billing-api --format=dragon # also emit Threat Dragon v2 JSON
/dfd billing-api --format=all # Mermaid + Threat Dragon + (future) PlantUML
Re-running on the same scope OFFERS (default-no) to overwrite — same UX as /extract-features and /c4. Existing edits are preserved unless the operator explicitly accepts.
projects/<project-name>/architecture/dfd.md ← Mermaid markdown (primary, source of truth)
projects/<project-name>/architecture/dfd.json ← Threat Dragon v2 JSON (on --format=dragon)
projects/<project-name>/architecture/dfd-source.yaml ← discovery report — supports "what changed since last run"
For --scope-all:
docs/architecture/system-dfd.md ← composed system-wide DFD
docs/architecture/system-dfd-source.yaml ← per-service discovery reports concatenated
. → use cwd as the target.workspace/<name>/. If the workspace clone doesn't exist, stop and ask the operator to clone first (do not auto-clone — same convention as /handover and /extract-features).workspace/<name>/ → use that project.--scope-all → walk every registered project; each becomes a trust-boundary box in the composed system DFD.Discovery is anchored on the target and reachability-bounded. No files written until the operator approves the candidate model.
| Axis | What | How |
|---|---|---|
| 1. External actors | Users, auth providers, third-party SaaS, admin consoles | HTTP route scope (anonymous / authenticated / admin); SDK imports (stripe, @sendgrid/mail, @anthropic-ai/sdk); auth provider imports (@auth0, @clerk/, Cognito); webhook signature handlers |
| 2. Processes | HTTP handlers, queue consumers, scheduled jobs, message-broker subscribers, gRPC methods | Framework signatures (Express / Fastify / NestJS / FastAPI / Flask / Django / Rails / Gin / Spring / Lambda); job-framework signatures (BullMQ, Celery, Sidekiq, Active Job, Temporal); cron / EventBridge schedules |
| 3. Data stores | RDBMSes, document stores, caches, object storage, file systems, data warehouses, search indexes | Schema files (schema.prisma, models.py, *.rb Active Record, *.sql migrations); SDK imports (@aws-sdk/client-*, ioredis, mongoose); connection-string env vars |
| 4. Data flows | What crosses what — payload + arrow direction | Reachability — handler reads/writes which store, calls which external API, enqueues to which worker. LSP-aware path uses call-graph; grep fallback uses imports + usage co-location |
| 5. Trust boundaries | Network / auth / org / classification boundaries | Inferred from code + env config: public ↔ backend (HTTP scope), backend ↔ data (DB credentials gating), user ↔ admin (route prefix), us ↔ third-party (external SDKs). Note: IaC reading is out of scope for v1 |
| 6. Data classifications | PII, PCI, secrets, internal, public — per detected element | Three pathways: code annotations (@PII, // CLASSIFIED: <label>), env-var heuristics (*_SECRET, *_TOKEN), schema-column heuristics (email, phone, card_number). Optional explicit registry at docs/data-classification.{md,yaml} overrides heuristics |
The grep-fallback signatures live in .claude/skills/dfd/discover.sh (axes 1–5) and .claude/skills/dfd/classify.sh (axis 6). When LSP is enabled (ENABLE_LSP_TOOL=1 + per-language plugin per docs/getting-started.md), the skill prefers semantic-index queries over grep for axes 1, 2, 3, and 4. Discovery is identical in shape; LSP just makes it cheaper.
For --scope-all: run axes 1–6 against each registered project, then in step 3 compose by treating every cross-service flow (detected via _lib-multi-repo-trace.sh) as a trust-boundary crossing.
Render the discovery output in a compact table grouped by axis:
For billing-api:
External actors:
[Human] Authenticated customer — POST /api/charges
[Human] Admin — /admin/* routes
[Ext SaaS] Stripe — payment processing (charges, customers)
[Ext SaaS] SendGrid — transactional email
[Ext] Auth0 — authentication provider
Processes:
HTTP API (12 routes)
Webhook handler (Stripe callbacks)
Background workers (2 BullMQ queues: email, reconciliation)
Scheduled jobs (1: nightly reconciliation)
Data stores:
Postgres (via Prisma) — primary data store
Redis — session cache + queue backend
S3 — invoice PDF storage
Inferred trust boundaries:
Public Internet ↔ Backend (HTTP scope)
Backend ↔ Data Stores (DB credentials)
User ↔ Admin (privilege escalation on /admin/*)
Us ↔ Stripe (org boundary, signed webhooks)
Us ↔ SendGrid (org boundary)
Data classifications (detected):
user.email [PII] via schema column — prisma/schema.prisma:42
user.phone_number [PII] via schema column — prisma/schema.prisma:44
charge.card_number [PCI] via schema column — prisma/schema.prisma:118
STRIPE_SECRET_KEY [secrets] via env-var heuristic — .env.example:3
SENDGRID_API_KEY [secrets] via env-var heuristic — .env.example:5
user.profile @PII [pii] via annotation — src/models/user.ts:12
(d) describe in more detail
(e) edit the model (add/remove element, change classification, reword trust boundary)
(a) accept and generate the diagram
(q) quit without writing
>
Ask the operator ONLY where the code is silent:
user.identifier — is this an email, a UUID, or something else? Classification?"messaging queue — is the consumer in this repo, or in another registered service?" (only when cross-repo traversal is ambiguous)Don't ask questions whose answer is already in the discovery report.
Resolve the template:
template=$(portfolio_resolve_template architecture/dfd.md)
Single-fork adopters with no override fall through to templates/architecture/dfd.md. Adopters who want a customised shape drop their version at <private_repo>/custom-templates/architecture/dfd.md (same convention as /c4).
Build the in-memory DFD model (actors / processes / stores / flows / boundaries / classifications) from the approved candidate. Pipe through generate-mermaid.sh to assemble the markdown file:
# Helper preamble — REQUIRED in every bash block that writes to a
# projects/<name>/ path. The Path resolution section above shows the
# pattern at file scope, but Claude executes each ```bash``` block as
# a separate shell invocation, so the variable assignment from the top
# of the file doesn't carry into this block. See me2resh/apexyard#443.
source "$(git rev-parse --show-toplevel)/.claude/hooks/_lib-read-config.sh"
source "$(git rev-parse --show-toplevel)/.claude/hooks/_lib-portfolio-paths.sh"
projects_dir=$(portfolio_projects_dir)
discovery_yaml="/tmp/dfd-discovery-${PROJECT}.yaml"
classifications_yaml="/tmp/dfd-classifications-${PROJECT}.yaml"
bash .claude/skills/dfd/discover.sh "$target_dir" "$scope_hint" > "$discovery_yaml"
bash .claude/skills/dfd/classify.sh "$target_dir" > "$classifications_yaml"
bash .claude/skills/dfd/generate-mermaid.sh "$PROJECT" "$discovery_yaml" "$classifications_yaml" \
> "${projects_dir}/${PROJECT}/architecture/dfd.md"
# Persist the source-of-truth combined report for re-run diffs
cat "$discovery_yaml" "$classifications_yaml" > "${projects_dir}/${PROJECT}/architecture/dfd-source.yaml"
The generator replaces placeholders in the template skeleton with real actors / processes / stores / flows from the in-memory model. Every cross-boundary arrow MUST carry a payload label.
--format=dragon or --format=all)Build a JSON document with the in-memory model and pipe through generate-dragon.sh:
# Helper preamble — REQUIRED per block. See me2resh/apexyard#443.
source "$(git rev-parse --show-toplevel)/.claude/hooks/_lib-read-config.sh"
source "$(git rev-parse --show-toplevel)/.claude/hooks/_lib-portfolio-paths.sh"
projects_dir=$(portfolio_projects_dir)
echo "$model_json" \
| bash .claude/skills/dfd/generate-dragon.sh \
> "${projects_dir}/${PROJECT}/architecture/dfd.json"
generate-dragon.sh is a pure function over the in-memory model — /threat-model --format=dragon (#255) will import the same serialiser when it lands. If #255 ships first with the serialiser inside /threat-model, switch this skill to import from there during code review.
Schema reference: OWASP Threat Dragon repository (see td.vue/src/service/migration/schema/ for the published v2 JSON schema).
--scope-all)After every dfd.md write, run lint.sh against the output file. The lint wraps the shared _lib-mermaid-lint.sh — extracts the ```mermaid flowchart block and validates via mmdc so broken trust-boundary diagrams are caught at write time, not when a human opens the file on GitHub. Graceful-degrades when Node / npx is unavailable (exit 3, advisory only).
SKILL_DIR="$(dirname "$(realpath "$0")")"
"$SKILL_DIR/lint.sh" "$dfd_out" || lint_rc=$?
Exit 1 (parse error) → print the lint output and ask the operator whether to regenerate the diagram block, fix by hand, or re-run with --skip-lint. Exit 3 (Node missing) → one-line warning, proceed.
For each registered project, run discovery → render per-service sub-model. Compose by:
mrt_resolve_target from _lib-multi-repo-trace.sh) becomes an arrow crossing the boundarymrt_is_third_party) render as external entities with their own trust-boundary markerOutput lands at docs/architecture/system-dfd.md (framework-wide view).
If projects/<name>/architecture/dfd.md already exists, prompt:
projects/<name>/architecture/dfd.md already exists (last written {date}).
Overwrite? [y/N]
Default N. On N: write to projects/<name>/architecture/dfd-{YYYY-MM-DD}.md so the operator can diff old vs new. On y: overwrite, BUT first diff the new dfd-source.yaml against the previous one and surface a "what changed" summary:
DFD drift since last run:
+ new data store: docdb_dynamo (./src/billing/audit-log.ts:14)
+ new external service: ext_twilio (./src/notifications/sms.ts:8)
+ new flow: http_api → ext_twilio
+ new classification: phone_number [pii] (./prisma/schema.prisma:44)
These changes may warrant a fresh /threat-model run — newly-introduced data flows
are exactly the surface security review should focus on.
architecture/README.mdIf projects/<name>/architecture/README.md exists, ensure it lists the DFD alongside any C4 diagrams. If it doesn't exist, create it with a minimal index:
# {project-name} — Architecture
| Diagram | Source | Last generated |
|---------|--------|----------------|
| [System Context (C4 L1)](./context.md) | `/c4 {project} --level=1` | YYYY-MM-DD |
| [Container (C4 L2)](./container.md) | `/c4 {project} --level=2` | YYYY-MM-DD |
| [Data Flow Diagram](./dfd.md) | `/dfd {project}` | YYYY-MM-DD |
✓ {project}: DFD written
Mermaid: projects/{project}/architecture/dfd.md
Source: projects/{project}/architecture/dfd-source.yaml
{Dragon JSON: projects/{project}/architecture/dfd.json}
Actors: {N actors} Processes: {N} Stores: {N} Flows: {N} Boundaries: {N}
Classifications: {N PII} / {N PCI} / {N secrets} / {N other}
Source of truth: /threat-model and /compliance-check now read from this file.
Re-run /dfd {project} after architecture changes.
Run /dfd billing-api against a small Express + Prisma + BullMQ service that integrates with Stripe + SendGrid:
Discovery picks up:
@auth0 import → auth0 external actorstripe, @sendgrid/mail imports → stripe + sendgrid external actors + us ↔ third-party boundaryprisma/schema.prisma → postgres data store + backend ↔ data boundaryioredis import → redis cachebullmq new Worker( → background-worker processcron.schedule( → scheduled-job processClassifications detect:
user.email (Prisma column) → PIIuser.phone_number (Prisma column) → PIIcard_number (mentioned in Stripe-flow code) → PCISTRIPE_SECRET_KEY (env var) → secretsSENDGRID_API_KEY (env var) → secretsOperator review confirms the model, classifies one ambiguous user.identifier as PII (UUID, but used as a join key), and overrides the trust boundary between worker and Stripe to also include the email worker (single "outbound integrations" boundary).
Output: projects/billing-api/architecture/dfd.md with the Mermaid diagram, trust-boundaries table, classifications table, and full provenance YAML. Subsequent /threat-model billing-api reads from this file and produces the STRIDE catalogue against the boundary crossings — no DFD regeneration.
projects/<name>/architecture/.--scope-all.docs/data-classification.{md,yaml} exists, its labels override heuristic labels for any field it covers.External Service actor must trace to a concrete SDK import, API key env var, or webhook handler. No guessing.dfd.md ends with Generated by /dfd on YYYY-MM-DD so re-runners know the file is regenerable.package.json, no pyproject.toml, no Gemfile, no schema → stop with an error rather than producing an empty DFD./threat-model and /compliance-check snapshot this DFD into their own outputs at audit time — they read the file once and inline the Mermaid + trust-boundaries + classifications sections into the audit artefact. So editing this file does not invalidate previously-written audits; their snapshots are frozen in time. Re-run the audit to refresh against the current DFD.| Trigger | Use /dfd? |
|---|---|
| Preparing for a security review | YES — produces the DFD /threat-model needs |
| Preparing for GDPR / ePrivacy launch check | YES — produces the classifications /compliance-check needs |
Adopting a new project via /handover | OFFER after /c4 — sequence: /handover → /c4 (static topology) → /dfd (data flows) → /threat-model (security analysis) |
| Major architecture change (new DB, new third party) | YES — re-run; the drift summary surfaces the delta |
| Pure UI / styling work | NO — no data-flow change, DFD stable |
| Spike / throwaway POC | NO — the surface isn't stable enough to be worth modelling |
/dfd informs threat modelling and compliance, doesn't substitute for actual security scans--format=plantuml-dfd) — listed in the AC as v2 follow-updfd.md → re-parsed back into discovery model/dfd <subscope> runs if needed)/dfd for security review. The DFD is the input to STRIDE, not the threat model itself. Always pair with /threat-model for the actual security analysis./dfd --scope-all on every PR. It's a one-off baseline + on-significant-change refresh. Single-service /dfd is the per-PR cadence.Part of ApexYard — multi-project SDLC framework for Claude Code · MIT.