generate-data-lineage

Stars: 0 · Forks: 0 · Updated: April 14, 2026, 00:03

Assembles a data flow narrative from MODULE_MANIFEST.md and BEHAVIORAL_CONTRACTS.md context files, answering the explainability question: "What does the system do with [data type] for [user journey]?"

Use before a compliance or security review, when a dark-code-audit flags "Explainability: Partial", when onboarding a new engineer who needs to understand data flows, or when preparing for GDPR, EU AI Act, or SOC 2 review. Reads context layers across the codebase, interviews for gaps, and writes docs/data-lineage/YYYY-MM-DD-<name>.md with a confidence rating.

Invoke as:

/generate-data-lineage (all PII-touching flows in the codebase)
/generate-data-lineage --journey user-signup (specific user journey)
/generate-data-lineage --module path/to/mod (flows for a specific module)
/generate-data-lineage --type payment (specific data type)


Source: az9713/dark-code-skills (GitHub repository by az9713)



SKILL.md


/generate-data-lineage

Assembles a data flow narrative from context layers — answering the question any compliance reviewer, security engineer, or new hire will eventually ask: "What does the system actually do with this data?"

What this is and what it isn't

This skill produces a structural lineage based on documented behavior. It describes what the system should do based on MODULE_MANIFEST.md and BEHAVIORAL_CONTRACTS.md. It is not a runtime trace — it does not tell you what happened to a specific customer's data on a specific date. For incident response requiring that level of precision, you need application-level audit logs in addition to this document.

For most purposes (compliance prep, engineer onboarding, GDPR ROPA prep, security review), the structural lineage is sufficient. A LOW-confidence lineage document that makes gaps explicit is more useful than no document at all.

Arguments

- (none): traces all PII-touching data flows across the whole codebase
- --journey <name>: traces a specific user journey (e.g., "user-signup", "payment", "account-deletion")
- --module <path>: traces data flows for a specific module only
- --type <category>: traces a specific data type (e.g., "payment", "PII", "credentials", "health")
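
As an illustration only, the flags above could be modeled with Python's `argparse`. The skill itself is invoked as a slash command, so the function names `parse_lineage_args` and `describe_scope` and the `data_type` attribute are hypothetical, and the sketch does not assume anything about whether the flags may be combined.

```python
import argparse


def parse_lineage_args(argv):
    """Parse the three documented flags; all are optional."""
    parser = argparse.ArgumentParser(prog="generate-data-lineage")
    parser.add_argument("--journey", metavar="NAME",
                        help='specific user journey, e.g. "user-signup"')
    parser.add_argument("--module", metavar="PATH",
                        help="trace data flows for one module only")
    # Stored as data_type (hypothetical name) to avoid shadowing the builtin `type`.
    parser.add_argument("--type", dest="data_type", metavar="CATEGORY",
                        help='specific data type, e.g. "payment"')
    return parser.parse_args(argv)


def describe_scope(args):
    """Summarize the requested scope; no flags means every PII-touching flow."""
    parts = []
    if args.journey:
        parts.append(f"journey={args.journey}")
    if args.module:
        parts.append(f"module={args.module}")
    if args.data_type:
        parts.append(f"type={args.data_type}")
    return ", ".join(parts) or "all PII-touching flows"


if __name__ == "__main__":
    print(describe_scope(parse_lineage_args(["--journey", "user-signup"])))
```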

Phase 1: Discover data-touching modules

Use Glob to find all MODULE_MANIFEST.md files in the repo. For each, look for:

- Data classification fields mentioning: PII, personal data, credentials, payment data, health data, financial data, user identity, email, name, address, phone number
- Shared resource entries (databases, cache, queues) associated with user data
- Data flows sections showing user-sourced input

Build a candidate list of data-touching modules.

If --module or --type was specified, filter to the relevant subset. If --journey was specified, note it — you'll use it to filter which modules are relevant (a "user signup" journey involves the auth module, profile module, and notification module, but probably not the analytics aggregation module).
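
The discovery step can be sketched roughly as follows. This is a coarse first pass under stated assumptions: it substring-matches the keywords against the whole manifest rather than only the data classification fields (the manifest layout is not specified here), so broad terms such as "name" will over-match. `DATA_KEYWORDS` and `find_data_touching_modules` are hypothetical names.

```python
from pathlib import Path

# Keywords taken from the data classification list above, lowercased.
DATA_KEYWORDS = (
    "pii", "personal data", "credentials", "payment data", "health data",
    "financial data", "user identity", "email", "name", "address",
    "phone number",
)


def find_data_touching_modules(repo_root):
    """Map each module directory to the keywords found in its manifest."""
    root = Path(repo_root)
    candidates = {}
    for manifest in sorted(root.rglob("MODULE_MANIFEST.md")):
        text = manifest.read_text(encoding="utf-8", errors="replace").lower()
        hits = [kw for kw in DATA_KEYWORDS if kw in text]
        if hits:
            module = manifest.parent.relative_to(root).as_posix()
            candidates[module] = hits
    return candidates


if __name__ == "__main__":
    for module, hits in find_data_touching_modules(".").items():
        print(f"  - {module}/ mentions: {', '.join(hits)}")
```

A human or the agent would still need to review each hit, since a keyword match only suggests that a module may touch the data type.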

Report the candidate list before proceeding:

Data-touching modules found: [N]
  - auth/ — handles credentials, session tokens
  - profile/ — handles PII (name, email, address)
  - billing/ — handles payment data
  - notifications/ — handles email addresses

Modules without context files (gaps):
  - recommendations/ — appears to handle user_id but has no MODULE_MANIFEST.md

Proceeding with documented modules. Gaps will be noted in Open Questions.

Phase 2: Read behavioral contracts for each module

For each data-touching module, read BEHAVIORAL_CONTRACTS.md. For each interface that handles the target data type, extract:

| Field | Source |
|-------|--------|
| Interface name and signature | BEHAVIORAL_CONTRACTS.md interface section |
| Transformations applied | Side effects section: what changes about the data |
| Where data is written | Side effects: DB tables, cache keys, queues |
| What leaves the service | External calls, events emitted, responses returned |
| Data sensitivity | Data classification field |
| Retention/expiry | Retention constraints in MODULE_MANIFEST.md |

Order the modules by data flow: entry point → processing → storage → egress. For a user signup journey: API handler receives data → validation module transforms it → profile module stores it → notification module sends it to external email service.
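
One way to picture the ordering is a small record per extracted interface, sorted by flow stage. The `FlowStep` type, its field names, and the four stage labels are assumptions made for this sketch; in practice, assigning a stage to each module is a judgment made while reading the contracts.

```python
from dataclasses import dataclass

# Stage labels mirror the ordering described above.
STAGE_ORDER = {"entry": 0, "processing": 1, "storage": 2, "egress": 3}


@dataclass
class FlowStep:
    module: str
    stage: str                 # one of the STAGE_ORDER keys
    interface: str = ""        # interface name and signature
    written_to: str = ""       # DB table, cache key, or queue
    leaves_service: str = ""   # external call, event, or response


def order_flow(steps):
    """Sort steps entry -> processing -> storage -> egress (stable sort)."""
    return sorted(steps, key=lambda step: STAGE_ORDER[step.stage])


if __name__ == "__main__":
    signup = [
        FlowStep("notifications", "egress", leaves_service="external email service"),
        FlowStep("profile", "storage"),
        FlowStep("api", "entry"),
        FlowStep("validation", "processing"),
    ]
    for step in order_flow(signup):
        print(step.stage, step.module)
```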

Phase 3: Interview for gaps

For each module with a data classification but incomplete contracts, ask targeted questions. Batch them — don't ask per-field per-module:

I need to fill in some gaps for these modules. For each:

auth/ — I can see it stores session tokens but the retention policy is not documented.
  - How long are sessions retained? What triggers expiry?
  - Does any external service receive the session token (e.g., an analytics or logging service)?

recommendations/ — This module appears to use user_id but has no context files.
  - What personal data does this module process?
  - Where is it stored and who else can read it?
  - Is there a deletion or expiry path?

If the user can't answer, mark the gap in Open Questions and proceed with LOW confidence.
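
The batching idea, one message covering every module instead of one prompt per field, can be sketched as below. The function name and the `(module, observation, questions)` tuple shape are hypothetical.

```python
def batch_gap_questions(gaps):
    """Render all per-module questions as one message.

    `gaps` is a list of (module, observation, questions) tuples.
    """
    lines = ["I need to fill in some gaps for these modules. For each:", ""]
    for module, observation, questions in gaps:
        lines.append(f"{module}: {observation}")
        lines.extend(f"  - {question}" for question in questions)
        lines.append("")
    return "\n".join(lines).rstrip()


if __name__ == "__main__":
    print(batch_gap_questions([
        ("auth/", "retention policy is not documented",
         ["How long are sessions retained? What triggers expiry?"]),
        ("recommendations/", "uses user_id but has no context files",
         ["What personal data does this module process?",
          "Is there a deletion or expiry path?"]),
    ]))
```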

Phase 4: Determine confidence level

Before writing:

- HIGH: all modules in the flow have complete MODULE_MANIFEST.md and BEHAVIORAL_CONTRACTS.md; no interview gaps
- MEDIUM: some modules have partial context; gaps are documented in Open Questions but the main flow is clear
- LOW: significant gaps; one or more key modules in the flow are missing context files
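
The three levels can be read as a simple decision rule. The sketch below is a simplification: it assumes each module has already been judged "complete", "partial", or "missing", and it treats any missing module as a key module. The function name and status labels are hypothetical.

```python
def rate_confidence(module_status, unresolved_gaps):
    """Return "HIGH", "MEDIUM", or "LOW" for a flow.

    module_status maps module path -> "complete" | "partial" | "missing".
    unresolved_gaps counts interview questions the user could not answer.
    """
    statuses = set(module_status.values())
    if "missing" in statuses:
        # Simplification: every missing module is treated as key to the flow.
        return "LOW"
    if statuses <= {"complete"} and unresolved_gaps == 0:
        return "HIGH"
    return "MEDIUM"


if __name__ == "__main__":
    print(rate_confidence({"auth/": "complete", "profile/": "partial"}, 1))
```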

Phase 5: Write the lineage document

Write to docs/data-lineage/YYYY-MM-DD-<name>.md. Derive <name> from the argument provided (journey name, data type, or module name). Create the directory if it doesn't exist.
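
A minimal sketch of deriving the output path and creating the directory. The slug rule (lowercase, non-alphanumeric runs collapsed to hyphens) is an assumption, since only the `YYYY-MM-DD-<name>.md` pattern is specified; `lineage_path` is a hypothetical helper.

```python
import re
from datetime import date
from pathlib import Path


def lineage_path(name, today=None, root="."):
    """Build docs/data-lineage/YYYY-MM-DD-<name>.md, creating the directory."""
    # Assumed slug rule: "path/to/mod" becomes "path-to-mod".
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    day = (today or date.today()).isoformat()
    out_dir = Path(root) / "docs" / "data-lineage"
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir / f"{day}-{slug}.md"


if __name__ == "__main__":
    print(lineage_path("user-signup"))
```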

Use this structure:

# Data Lineage: [Journey/Type/Module]

**Scope:** [what data types and user journey this covers]  
**Last updated:** [today's date]  
**Confidence:** HIGH / MEDIUM / LOW  
**Confidence note:** [brief explanation — e.g., "recommendations/ module lacks context files"]

> This document describes the system's *documented* behavior based on context layer files.
> It is not a runtime trace. For incident response requiring evidence of what actually happened
> to specific data, application-level audit logs are required in addition to this document.

---

## Entry Points

[Where this data type enters the system — which interfaces, which modules, from which callers]

| Entry point | Module | Interface | Data received |
|-------------|--------|-----------|---------------|

---

## Flow Narrative

[Step-by-step trace through the system]

### Step 1: [Module Name] — [what happens here]

**Interface:** `[interface name]`  
**Input:** [what data arrives]  
**Transformation:** [what changes about the data, if anything]  
**Output:** [what leaves this module and where it goes]  
**Written to:** [DB table, cache key, queue — or "nothing persisted at this step"]

### Step 2: ...

[Continue for each module in the flow]

---

## Storage Locations

Where this data is at rest.

| Location | Module | Data stored | Retention period | Who can read |
|----------|--------|-------------|-----------------|--------------|

---

## Egress Points

Where data leaves the system.

| Destination | Module | Interface | Data sent | Mechanism |
|-------------|--------|-----------|-----------|-----------|

[If none: "No documented egress points for this data type."]

---

## Deletion / Expiry Path

How data is eventually removed from each storage location.

[If unknown for any location: note it in Open Questions]

---

## Open Questions

Fields or modules where context was incomplete. These are investigation targets before this
lineage document can be relied upon for compliance purposes.

| Module | Gap | Impact |
|--------|-----|--------|

---

## Modules With Missing Context Files

The following modules appear to process this data type but have no context files.
Their behavior is not reflected in this lineage document.

[List — or "None. All data-touching modules have context files." for HIGH confidence]

**To fill these gaps:** Run `/context-layer-generator` on each module listed, then re-run
`/generate-data-lineage` to update this document.

Phase 6: Update MODULE_MANIFEST.md data flow sections

If the lineage process reveals that any existing MODULE_MANIFEST.md data flow sections are incomplete or inaccurate (e.g., a downstream consumer not listed, a data type not mentioned), update those files and note the changes.

After writing

Report:

- Journey/type/module documented
- Confidence level and reason
- Number of open questions
- Number of modules missing context files
- Path to the generated document
- Whether any MODULE_MANIFEST.md files were updated

If confidence is LOW, suggest the priority order for running /context-layer-generator — start with the module that handles the most sensitive data or sits at the entry point of the flow.
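
One possible reading of that priority rule, sketched in Python: sort by sensitivity first and use entry-point position as the tie-breaker. The numeric sensitivity score, the `MissingModule` type, and the tie-breaking order are all assumptions; the rule itself does not say which criterion wins.

```python
from typing import NamedTuple


class MissingModule(NamedTuple):
    path: str
    sensitivity: int       # hypothetical score, higher = more sensitive
    is_entry_point: bool


def prioritize(modules):
    """Most sensitive first; entry points break ties; then path for stability."""
    return sorted(
        modules,
        key=lambda m: (-m.sensitivity, not m.is_entry_point, m.path),
    )


if __name__ == "__main__":
    todo = [
        MissingModule("recommendations/", 1, False),
        MissingModule("billing/", 3, False),
        MissingModule("api/", 1, True),
    ]
    for module in prioritize(todo):
        print(f"/context-layer-generator {module.path}")
```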

More from this repository

comprehension-gate

az9713/dark-code-skills

Runs a seven-dimension comprehension review on a code change before it ships: credential exposure, cross-service side effects, blast radius, state/persistence mismatch (the Kiro pattern — AI treating persistent infrastructure as ephemeral), token TTL management, implicit assumptions, and whether the change would be explainable by the person shipping it. Produces a COMPREHENSION_ARTIFACT.md with a findings table and a CLEAR / REVIEW REQUIRED / HOLD verdict. Use this skill before merging any AI-generated code, before any change that touches shared resources (Redis, shared databases, message queues), before changes to auth flows or token handling, when reviewing code for dark code risk, or any time you hear "check blast radius", "review for comprehension", "is this safe to ship", "comprehension gate", "pre-merge review", or "will this cause an incident". This skill catches system-level failures that linters, type checkers, and unit tests cannot detect.

2026-04-14 · 0

context-layer-generator

az9713/dark-code-skills

Generates three context layer artifacts for a code module: MODULE_MANIFEST.md (structural map — where things connect), BEHAVIORAL_CONTRACTS.md (semantic contracts — what each interface guarantees), and DECISION_LOG.md (philosophical record — why decisions were made, with explicit warnings about what breaks if reversed). Use this skill whenever working on a module that lacks documentation, when the original author has left, before an AI agent modifies an unfamiliar module, when documenting a module after onboarding, when a codebase audit flagged missing context layers, or any time you hear phrases like "document this module", "make this self-describing", "build context layers", "preserve knowledge before the author leaves", or "what does this module do". This skill is especially important for AI-generated code that was never explained by anyone.

2026-04-14 · 0

dark-code-audit

az9713/dark-code-skills

Audits a codebase for dark code risk: code that was generated, passed automated checks, and shipped without anyone understanding it. Produces a structured audit report with a hotspot map, comprehension debt scorecard (spec coverage %, context layer coverage %, review depth), ownership gap analysis, top failure scenarios, and a prioritized action plan. Use this skill before a security review, compliance review, or major refactor; when new engineers join and the codebase feels opaque; after a period of high AI-assisted development velocity; quarterly as a health check; or any time you hear "audit for dark code", "comprehension debt", "dark code risk", "what do we not understand about this codebase", "knowledge gap analysis", "who owns what", or "we've been shipping AI code really fast lately". This skill does not recommend "add more monitoring" — it identifies where human comprehension is missing and prescribes structural fixes.

2026-04-14 · 0

dark-code-suite-init

az9713/dark-code-skills

Sets up a project to use the full dark code prevention suite in one step: creates the .claude/comprehension/ directory for comprehension artifacts, adds a ## Dark Code Prevention section to CLAUDE.md (or creates CLAUDE.md if missing), creates docs/dark-code-audit/ for audit reports, and runs an initial dark-code-audit to baseline the project's current comprehension debt. Use this skill when starting to use the dark code suite on a new project, when onboarding a codebase to dark code prevention practices, or any time you hear "set up dark code prevention", "initialize the dark code suite", "add comprehension gate to this project", or "how do I start with dark code practices here".

2026-04-14 · 0

generate-eu-ai-act-system-card

az9713/dark-code-skills

Generates a per-service EU AI Act system card documenting AI tool usage, risk classification, human oversight mechanisms, and limitations. Use for any service where AI tools contribute to code generation, decision support, or automated processing — especially before the August 2026 EU AI Act deadline. Use when dark-code-audit flags AI-heavy services, when preparing a compliance package for a regulator or enterprise customer, or when the organization needs to document its AI practices. Reads MODULE_MANIFEST.md and BEHAVIORAL_CONTRACTS.md, conducts a structured interview, and writes docs/compliance/eu-ai-act-system-card-<service>-YYYY-MM-DD.md. Invoke as /generate-eu-ai-act-system-card path/to/service or with --risk-level limited|general|high.

2026-04-14 · 0

generate-gdpr-ropa

az9713/dark-code-skills

Generates a draft GDPR Article 30 Record of Processing Activities (ROPA) from MODULE_MANIFEST.md and BEHAVIORAL_CONTRACTS.md context files. Use when preparing for a GDPR audit, when a dark-code-audit flags PII-handling services with incomplete documentation, or when building a compliance package. Reads context layers across the codebase, groups them into logical processing activities, auto-populates what it can, and interviews the user for fields that require human judgment (legal basis, purpose, international transfers). Writes docs/compliance/gdpr-ropa-YYYY-MM-DD.md. Invoke as /generate-gdpr-ropa or /generate-gdpr-ropa --module path/to/module for a single module entry.

2026-04-14 · 0
