| name | pipeline |
| description | This skill should be used when designing or reviewing data pipelines: ETL patterns, orchestration, and performance optimization for data workflows. Use for requests such as "design a data pipeline", "review this ETL", "optimize data processing", "how should I orchestrate this", or "pipeline architecture". |
| phase_relevance | ["design","build"] |
| archetype_relevance | ["*"] |
# Pipeline Engineering Skill

Design, review, and optimize data pipelines and ETL workflows.
## Quick Start

### Design a Pipeline

```
/wicked-garden:data:pipeline design \
  --source "postgres://sales_db" \
  --target "s3://data-lake/sales" \
  --frequency daily
```

Generates: architecture diagram, ETL logic, orchestration config, monitoring plan.

### Review an Existing Pipeline

```
/wicked-garden:data:pipeline review path/to/pipeline/
```

Analyzes: code quality, error handling, performance, maintainability.
## Pipeline Patterns

### Batch ETL

- Use when: regular scheduled loads, historical processing
- Pattern: Extract → Transform → Validate → Load
- Tools: Airflow, Dagster, Prefect
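The Extract → Transform → Validate → Load flow can be sketched in plain Python. This is a minimal illustration, not part of the skill's API: the function names are made up, and in-memory lists stand in for the source database and target warehouse.

```python
def extract(rows):
    """Pull source records (an in-memory list stands in for a DB query)."""
    return list(rows)

def transform(records):
    """Normalize types: cast amounts to rounded floats."""
    return [{"id": r["id"], "amount": round(float(r["amount"]), 2)} for r in records]

def validate(records):
    """Fail fast before loading: reject negative amounts."""
    bad = [r for r in records if r["amount"] < 0]
    if bad:
        raise ValueError(f"{len(bad)} invalid records")
    return records

def load(records, sink):
    """Append validated records to the target (a list stands in for the warehouse)."""
    sink.extend(records)
    return len(records)

sink = []
raw = [{"id": 1, "amount": "19.991"}, {"id": 2, "amount": "5"}]
loaded = load(validate(transform(extract(raw))), sink)
```

The key design point is that validation sits between transform and load, so a bad batch fails before anything touches the target. In Airflow, Dagster, or Prefect each function would become its own task/op/step with the orchestrator handling retries between them.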
### Streaming Pipeline

- Use when: real-time processing, event-driven workloads
- Pattern: Consume → Transform → Sink
- Tools: Kafka, Flink, Spark Streaming
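The Consume → Transform → Sink shape can be sketched without any streaming infrastructure; here a Python generator stands in for a Kafka consumer and a small class stands in for the downstream sink. The event schema and the bot-filtering rule are invented for illustration.

```python
def consume(events):
    """Stand-in for a Kafka/Flink consumer: yields events as they arrive."""
    yield from events

def transform(event):
    """Per-event transform: drop bot traffic, keep only the fields we need."""
    if event.get("bot"):
        return None
    return {"user": event["user"], "clicks": event["clicks"]}

class Sink:
    """Stand-in for the downstream sink (topic, table, or object store)."""
    def __init__(self):
        self.records = []
    def write(self, record):
        self.records.append(record)

sink = Sink()
events = [
    {"user": "a", "clicks": 3},
    {"user": "crawler", "clicks": 99, "bot": True},
    {"user": "b", "clicks": 1},
]
for event in consume(events):
    record = transform(event)
    if record is not None:
        sink.write(record)
```

In a real deployment the loop body is what you hand to the framework; the consume/sink edges are where delivery guarantees (at-least-once vs. exactly-once) get decided.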
### Incremental Processing

- Use when: large datasets where only changes need processing
- Pattern: Watermark tracking + merge/upsert
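Watermark tracking plus merge/upsert reduces to a few lines once the moving parts are named. This sketch uses dicts for state and target; in practice the watermark lives in a metadata store and the upsert is a `MERGE` against the warehouse.

```python
def incremental_run(source_rows, state, target):
    """Process only rows newer than the stored watermark, then upsert by key."""
    watermark = state.get("watermark", 0)
    new_rows = [r for r in source_rows if r["updated_at"] > watermark]
    for r in new_rows:
        target[r["id"]] = r                   # merge/upsert: last write per key wins
    if new_rows:
        state["watermark"] = max(r["updated_at"] for r in new_rows)
    return len(new_rows)

state, target = {}, {}
rows = [{"id": 1, "updated_at": 10}, {"id": 2, "updated_at": 20}]
first = incremental_run(rows, state, target)    # 2 new rows, watermark advances to 20
second = incremental_run(rows, state, target)   # 0 rows: rerunning is safe
```

Advancing the watermark only after a successful upsert is what makes reruns safe: a crashed run leaves the watermark where it was, so the next run re-processes the same window idempotently.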
## Pipeline Design Checklist

- Architecture
- Data Quality
- Error Handling
- Performance
- Monitoring
- Operations
## Common Issues

| Issue | Symptoms | Solution |
|---|---|---|
| Fails halfway | Partial data, inconsistent state | Staging + commit pattern |
| Duplicates | Same data loaded multiple times | Watermarks + idempotency |
| Slow processing | Misses SLA | Profile and optimize bottlenecks |
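The staging + commit fix for the "fails halfway" row can be shown in miniature. A dict stands in for the warehouse and a `None` record simulates a mid-batch failure; in SQL this is loading into a staging table and then swapping or renaming inside a transaction.

```python
def load_with_staging(records, warehouse):
    """Build the full batch in staging, then commit with a single atomic swap."""
    staging = []
    for r in records:
        if r is None:                     # simulated mid-batch failure
            raise ValueError("bad record mid-batch; target untouched")
        staging.append(r)
    warehouse["events"] = staging         # atomic commit: one reference swap

warehouse = {"events": [{"id": 0}]}       # last successful load
try:
    load_with_staging([{"id": 1}, None, {"id": 2}], warehouse)
except ValueError:
    pass                                  # failed run: readers still see [{"id": 0}]

load_with_staging([{"id": 1}, {"id": 2}], warehouse)   # clean rerun commits fully
```

Because the target is only touched by the final swap, a failure leaves the previous load intact and the whole run can simply be retried.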
## Integration

- wicked-brain:search: find pipeline code with `wicked-brain:search "dag|pipeline"` (FTS5 over indexed code)
- Native tasks: track pipeline issues via TaskCreate with `metadata.event_type="task"`
- wicked-brain:memory: recall pipeline patterns
## Best Practices

- Idempotency: same input → same output, so pipelines are safely rerunnable
- Observability: log row counts, track duration, emit metrics, alert on anomalies
- Testing: unit test transforms, integration test the full pipeline, exercise error scenarios
- Documentation: clear lineage, versioned schemas, an operations runbook
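The observability practice above has a cheap minimum viable form: wrap each step so it logs rows in, rows out, and duration on every run. A sketch using only the standard library (the step name and dedupe step are illustrative):

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

def observed(step_name, fn, records):
    """Run one pipeline step, logging row counts and wall-clock duration."""
    start = time.monotonic()
    out = fn(records)
    log.info("%s: %d rows in, %d rows out, %.3fs",
             step_name, len(records), len(out), time.monotonic() - start)
    return out

# Example step: dedupe by id (keeps the last record seen per key).
rows = observed("dedupe",
                lambda rs: list({r["id"]: r for r in rs}.values()),
                [{"id": 1}, {"id": 1}, {"id": 2}])
```

Row counts in vs. out are often the single most useful signal: a step that silently drops 40% of its input shows up immediately, long before a downstream SLA alert fires.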
## External Integration Discovery

Pipeline engineering can leverage available integrations by capability:

| Capability | Discovery Patterns | Provides |
|---|---|---|
| Warehouses | snowflake, databricks, bigquery | Query execution, schema access |
| ETL | airbyte, fivetran, dbt | Pipeline status, model metadata |
| Observability | monte-carlo, datadog | Data quality metrics |

Discover available integrations via capability detection. Fall back to wicked-garden:data:analyze for local file analysis via DuckDB.
## Reference

For detailed patterns: