---
name: aps-doc-core
description: Core documentation generation patterns and framework for Treasure Data pipeline layers. Provides shared templates, quality validation, testing framework, and Confluence integration used by all layer-specific documentation skills.
---
Core documentation generation framework providing shared patterns, templates, and utilities used by all APS layer-specific documentation skills (ingestion, hist-union, staging, id-unification, golden).
Use this skill when you need the shared templates, quality validation, testing framework, or Confluence integration that all layer-specific documentation skills build on.
Note: For layer-specific documentation, use the specialized skills:

- `aps-doc-skills:ingestion` for ingestion layers
- `aps-doc-skills:hist-union` for hist-union workflows
- `aps-doc-skills:staging` for staging transformations
- `aps-doc-skills:id-unification` for ID unification
- `aps-doc-skills:golden` for golden layers

**WITHOUT codebase access = NO documentation. Period.**
If no codebase access is provided, respond with:

I cannot create technical documentation without codebase access.

Required:
- Directory path to the code
- Access to the relevant files (.dig, .sql, .yml)

Without access, I cannot extract real configurations, SQL, or workflow logic.

Provide a path, e.g.: "Code is in /path/to/layer/"
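A minimal pre-flight sketch of this gate, in Python (the path and extension list mirror the requirements above; the helper name is illustrative):

```python
# Sketch: verify codebase access exists before any documentation work begins.
from pathlib import Path

REQUIRED_EXTENSIONS = {".dig", ".sql", ".yml"}

def check_codebase_access(layer_path: str) -> list[str]:
    """Return the relevant files under layer_path, or raise if access is missing."""
    root = Path(layer_path)
    if not root.is_dir():
        raise FileNotFoundError(
            f"No codebase access: {layer_path} is not a readable directory"
        )
    files = [str(p) for p in root.rglob("*") if p.suffix in REQUIRED_EXTENSIONS]
    if not files:
        raise FileNotFoundError(f"No .dig/.sql/.yml files found under {layer_path}")
    return files

# Example: files = check_codebase_access("/path/to/layer/")
```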
Before proceeding, confirm access. All documentation MUST contain real data from the codebase: NO generic templates, only production-ready, codebase-driven documentation.
All documentation follows a template-driven approach, executed in three phases:

**Phase 1: Template extraction**
1. Fetch existing Confluence page (if provided)
2. Extract structure:
- Section headings hierarchy
- Content organization patterns
- Tables and formatting styles
- Code block conventions
3. Identify required sections
4. Map sections to codebase elements
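A sketch of step 2, assuming the existing page body has already been fetched as a Markdown string (the function name is illustrative):

```python
# Sketch: extract the section-heading hierarchy from an existing page's body.
import re

def extract_heading_hierarchy(markdown_body: str) -> list[tuple[int, str]]:
    """Return (level, title) pairs for every ATX heading, in document order."""
    headings = []
    for line in markdown_body.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*\S)", line)
        if match:
            headings.append((len(match.group(1)), match.group(2)))
    return headings

# Example:
# extract_heading_hierarchy("# Layer\n## Overview\n### Key Characteristics")
# -> [(1, 'Layer'), (2, 'Overview'), (3, 'Key Characteristics')]
```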
**Phase 2: Codebase analysis**

1. Locate relevant files:
- Workflow files (.dig)
- Configuration files (.yml)
- SQL/transformation files (.sql)
- README and any other documentation files
2. Extract metadata:
- Table schemas (columns, types, nullability)
- Data lineage (source → destination)
- Dependencies (what depends on what)
- Configuration parameters
3. Analyze patterns:
- Processing logic (incremental, full, batch)
- Error handling strategies
- Performance optimizations
- Security patterns (PII, auth)
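A sketch of the inventory side of this phase (the `+task:` pattern for extracting Digdag task names is an assumption; verify against the actual workflow syntax):

```python
# Sketch: phase-2 inventory pass over a layer directory.
from pathlib import Path
import re

def analyze_layer(layer_path: str) -> dict:
    root = Path(layer_path)
    inventory = {
        "workflows": sorted(str(p) for p in root.rglob("*.dig")),
        "configs": sorted(str(p) for p in root.rglob("*.yml")),
        "sql": sorted(str(p) for p in root.rglob("*.sql")),
    }
    # Pull top-level task names out of each workflow as lightweight metadata.
    tasks = {}
    for wf in inventory["workflows"]:
        text = Path(wf).read_text()
        tasks[wf] = re.findall(r"^\+(\w+):", text, flags=re.MULTILINE)
    inventory["tasks"] = tasks
    return inventory
```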
**Phase 3: Document generation**

1. Create outline matching the template
2. Populate sections with codebase data:
- Use actual file names and paths
- Include real configuration examples
- Show actual SQL transformations
- Document real table/column names
3. Add visual elements:
- Mermaid diagrams (flow, ERD, dependencies)
- Tables (configuration, mappings, metrics)
- Code blocks (with syntax highlighting)
4. Validate quality (60+ checks)
5. Test code examples (execute SQL, validate YAML)
6. Publish to Confluence
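A reduced sketch of steps 4-5, showing two representative checks from the larger suite (placeholder scan and YAML syntax; the real validation runs 60+ checks):

```python
# Sketch: minimal quality validation before publishing.
import re
import yaml  # pip install pyyaml

PLACEHOLDER = re.compile(r"your_|example_|placeholder|TODO|TBD")

def validate_document(markdown_body: str, config_paths: list[str]) -> list[str]:
    """Return a list of failures; an empty list means the checks passed."""
    failures = []
    if PLACEHOLDER.search(markdown_body):
        failures.append("document still contains placeholder text")
    for path in config_paths:
        try:
            with open(path) as fh:
                yaml.safe_load(fh)
        except yaml.YAMLError as exc:
            failures.append(f"{path}: invalid YAML ({exc})")
    return failures
```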
Use this structure as the base template for all layer documentation:
# {Layer Name}
## Overview
Brief introduction explaining purpose and key characteristics.
### Key Characteristics
* **Engine**: Processing engine (Presto/Trino, Hive, etc.)
* **Architecture**: Processing approach (loop-based, parallel, etc.)
* **Processing Mode**: Incremental/Full/Batch
* **Location**: File system path
---
## Architecture Overview
### Directory Structure
```
layer_directory/
├── main_workflow.dig
├── config/
│   └── configuration.yml
├── sql/ or queries/
│   └── transformation.sql
└── README.md
```
### Core Components
Detailed description of each component.
---
## Processing Flow
### Initial Load (if applicable)
Step-by-step description of first-time processing.
### Incremental Load
Step-by-step description of ongoing processing.
---
## Configuration
Complete configuration reference with examples.
---
## Monitoring and Troubleshooting
### Monitoring Queries
Executable SQL queries for checking status.
### Common Issues
Issue descriptions with solutions.
---
## Best Practices
Numbered list of recommendations.
---
## Summary
Key takeaways and benefits.
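A minimal sketch of instantiating this skeleton, assuming the section list above (real content is always populated from codebase data, never defaults):

```python
# Sketch: render the base template skeleton for a concrete layer.
SECTIONS = [
    "Overview", "Architecture Overview", "Processing Flow",
    "Configuration", "Monitoring and Troubleshooting",
    "Best Practices", "Summary",
]

def render_skeleton(layer_name: str) -> str:
    parts = [f"# {layer_name}"]
    for section in SECTIONS:
        parts.append(f"\n## {section}\n")
    return "\n".join(parts)

# Example: print(render_skeleton("Staging Layer"))
```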
Generate Mermaid diagrams to visualize architecture:
```mermaid
graph LR
    A[Source] -->|Process| B[Destination]
    B -->|Transform| C[Output]
```

```mermaid
graph TD
    Start[Start] --> Task1[Task 1]
    Task1 --> Parallel{Parallel?}
    Parallel -->|Yes| Task2A[Task 2A]
    Parallel -->|Yes| Task2B[Task 2B]
    Task2A --> End[End]
    Task2B --> End
```

```mermaid
erDiagram
    TABLE_A ||--o{ TABLE_B : "has"
    TABLE_B ||--|| TABLE_C : "references"
```

```mermaid
graph TB
    A[Source A] --> D[Target D]
    B[Source B] --> D
    C[Source C] --> E[Target E]
    D --> F[Final]
    E --> F
```
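Rather than hand-writing diagrams, they can be generated from extracted lineage so they stay in sync with the code. A sketch, assuming lineage has been collected as (source, label, destination) triples:

```python
# Sketch: emit a Mermaid flow diagram from extracted lineage edges.
def lineage_to_mermaid(edges: list[tuple[str, str, str]]) -> str:
    """edges are (source, label, destination) triples."""
    lines = ["graph LR"]
    ids = {}
    for src, label, dst in edges:
        for node in (src, dst):
            ids.setdefault(node, f"N{len(ids)}")
        lines.append(f"    {ids[src]}[{src}] -->|{label}| {ids[dst]}[{dst}]")
    return "\n".join(lines)

# Example:
# print(lineage_to_mermaid([("Source", "Process", "Destination"),
#                           ("Destination", "Transform", "Output")]))
```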
Extract and document schemas:
```sql
-- Get schema
DESCRIBE {database}.{table};
SHOW COLUMNS FROM {database}.{table};
```
Document in table format:
| Column | Type | Nullable | Description | Source | Transformation |
|---|---|---|---|---|---|
| id | BIGINT | NO | Primary key | source.id | CAST(id AS BIGINT) |
| email | VARCHAR | YES | Email address | source.email | LOWER(TRIM(email)) |
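A sketch of producing this table from extracted schema rows (the (name, type, nullable) row shape is an assumption; the description, source, and transformation cells are filled from codebase analysis):

```python
# Sketch: turn DESCRIBE-style output into the schema table above.
def schema_rows_to_markdown(rows: list[tuple[str, str, bool]]) -> str:
    lines = [
        "| Column | Type | Nullable | Description | Source | Transformation |",
        "|---|---|---|---|---|---|",
    ]
    for name, dtype, nullable in rows:
        null = "YES" if nullable else "NO"
        # Remaining cells are populated from codebase analysis, not defaults.
        lines.append(f"| {name} | {dtype} | {null} |  |  |  |")
    return "\n".join(lines)

# Example:
# schema_rows_to_markdown([("id", "BIGINT", False), ("email", "VARCHAR", True)])
```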
Profile row counts and freshness:

```sql
SELECT
  '{table}' as table_name,
  COUNT(*) as total_rows,
  COUNT(DISTINCT primary_key) as unique_records,
  MIN(time) as earliest_record,
  MAX(time) as latest_record
FROM {database}.{table};
```
Document column-level lineage end to end:

```
column_name:
  Source: source_system.source_table.source_column
  → Raw: raw_db.raw_table.column (as-is)
  → Staging: stg_db.stg_table.column_std (UPPER(TRIM(column)))
  → Unified: unif_db.unif_table.column (from staging)
  → Golden: gld_db.gld_table.column (SCD Type 2)
```
Before publishing, validate:
```sql
-- Test monitoring queries
SELECT * FROM {database}.{log_table}
WHERE source = '{source}'
ORDER BY time DESC
LIMIT 10;

-- Test schema queries
DESCRIBE {database}.{table};

-- Test volume queries
SELECT COUNT(*) FROM {database}.{table};
```

```bash
# Validate YAML syntax
python3 -c "import yaml; yaml.safe_load(open('config.yml'))"

# Check for placeholders
grep -r "your_\|example_\|placeholder" config.yml
```

```sql
-- Verify table exists
SHOW TABLES IN {database} LIKE '{table}';

-- Verify columns exist
SELECT column_name FROM information_schema.columns
WHERE table_schema = '{database}'
  AND table_name = '{table}';
```
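A sketch of running the existence checks and collecting failures. `run_query` is a hypothetical stand-in, not a real client API; wire it to whatever query interface is available:

```python
# Sketch: run the pre-publish existence checks and aggregate results.
def run_query(sql: str) -> list:
    # Hypothetical -- replace with an actual query client.
    raise NotImplementedError

def verify_table(database: str, table: str) -> list[str]:
    failures = []
    if not run_query(f"SHOW TABLES IN {database} LIKE '{table}'"):
        failures.append(f"table {database}.{table} does not exist")
    cols = run_query(
        "SELECT column_name FROM information_schema.columns "
        f"WHERE table_schema = '{database}' AND table_name = '{table}'"
    )
    if not cols:
        failures.append(f"no columns found for {database}.{table}")
    return failures
```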
Create a new page:

```
Tool: mcp__atlassian__createConfluencePage
Parameters:
  cloudId: "https://treasure-data.atlassian.net"
  spaceId: "{numeric space ID}"
  title: "Clear, descriptive title"
  body: "Complete Markdown content"
  parentId: "{parent page ID}"        # optional, for hierarchy
```

Update an existing page:

```
Tool: mcp__atlassian__updateConfluencePage
Parameters:
  cloudId: "https://treasure-data.atlassian.net"
  pageId: "{existing page ID}"
  body: "Updated Markdown content"
  title: "New title"                          # optional
  versionMessage: "Description of changes"    # optional
```
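Since the two tools share most parameters, the payload can be assembled in one place with a single create-vs-update branch. A sketch (the helper is illustrative, not part of the MCP tool API):

```python
# Sketch: build (tool_name, parameters) for the two Confluence tools above.
def confluence_payload(body_markdown: str, *, title: str | None = None,
                       page_id: str | None = None, space_id: str | None = None,
                       parent_id: str | None = None) -> tuple[str, dict]:
    cloud_id = "https://treasure-data.atlassian.net"
    if page_id:  # existing page -> update
        params = {"cloudId": cloud_id, "pageId": page_id, "body": body_markdown}
        if title:
            params["title"] = title
        return "mcp__atlassian__updateConfluencePage", params
    # New page -> create; title and spaceId are required here.
    params = {"cloudId": cloud_id, "spaceId": space_id,
              "title": title, "body": body_markdown}
    if parent_id:
        params["parentId"] = parent_id
    return "mcp__atlassian__createConfluencePage", params
```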
For complex layers with multiple components:
```markdown
## Components

1. [**Component 1**](https://treasure-data.atlassian.net/wiki/spaces/.../pages/.../Component+1)
   - Description
2. [**Component 2**](https://treasure-data.atlassian.net/wiki/spaces/.../pages/.../Component+2)
   - Description
```
The same hub-and-child pattern applies to layers with multiple workflows or tables.
Document performance characteristics:
| Metric | Value | Benchmark |
|---|---|---|
| Avg Processing Time | 15 min | < 30 min SLA |
| Peak Memory Usage | 8 GB | 12 GB limit |
| Avg Rows/Day | 2.5M | Growing 10% monthly |
Document PII and compliance:
| Table | PII Columns | Protection | Retention | Access |
|---|---|---|---|---|
| table_a | email, phone | SHA256 | 7 years | Restricted |
| table_b | ip_address | Anonymization | 90 days | Internal |
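A sketch of the SHA256 protection listed for table_a, e.g. for keeping documentation examples free of raw PII (the normalization and salt handling are assumptions; follow the project's actual key-management policy):

```python
# Sketch: deterministic SHA256 pseudonymization of a PII value.
import hashlib

def pseudonymize(value: str, salt: str = "") -> str:
    """Return a SHA256 digest of a normalized PII value."""
    normalized = value.strip().lower()
    return hashlib.sha256((salt + normalized).encode("utf-8")).hexdigest()

# Example: pseudonymize("User@Example.com")  # same input -> same digest
```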
Track documentation changes:
| Version | Date | Changed By | Changes | Impact |
|---|---|---|---|---|
| v2.1 | 2025-11-27 | Claude | Added 3 tables | Low |
| v2.0 | 2025-11-15 | Team | Migrated engine | High |
Common problems and their solutions:

- **Workflow files not found**: list the directory contents and search recursively:

```bash
ls
find . -name "*.dig"
```
The APS Documentation Core provides the shared templates, quality validation, testing framework, and Confluence integration used by all layer-specific skills to ensure consistent, high-quality documentation across all Treasure Data pipeline layers.