data-pipeline

Name: Data Pipeline
Author: agulli

stars:242

forks:51

updated:May 13, 2026 at 10:10

SKILL.md

name	data-pipeline
description	Design, build, or debug data processing pipelines. Use when asked to process a dataset, transform data, build an ETL pipeline, schedule batch jobs, or fix data quality issues.
license	MIT
compatibility	Requires python 3.10+

Overview

Data pipelines tend to fail silently, and the bad data they let through corrupts downstream systems. Every pipeline must be observable, idempotent, and validated at the boundary.

Process

Define the contract. Before writing any transformation code, specify:
- Input schema: What fields, types, and constraints does the data arrive with?
- Output schema: What fields, types, and constraints must the output satisfy?
- Volume: How many records per run and per day?
- Frequency: One-time, scheduled, or event-driven?
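One way to pin the contract down is to write it as data before any transformation exists. The sketch below is a minimal standard-library illustration, not part of this skill: `FieldSpec`, `PipelineContract`, `violations`, the output fields, and the volume figure are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Frequency(Enum):
    ONE_TIME = "one-time"
    SCHEDULED = "scheduled"
    EVENT_DRIVEN = "event-driven"


@dataclass(frozen=True)
class FieldSpec:
    py_type: type          # expected Python type of the field
    required: bool = True  # optional fields may be absent or None


@dataclass(frozen=True)
class PipelineContract:
    input_schema: dict[str, FieldSpec]
    output_schema: dict[str, FieldSpec]
    expected_records_per_run: int
    frequency: Frequency


def violations(schema: dict[str, FieldSpec], record: dict) -> list[str]:
    """Return human-readable contract violations for one record."""
    problems = []
    for name, spec in schema.items():
        value = record.get(name)
        if value is None:
            if spec.required:
                problems.append(f"{name}: missing required field")
        elif not isinstance(value, spec.py_type):
            problems.append(
                f"{name}: expected {spec.py_type.__name__}, got {type(value).__name__}"
            )
    return problems


# Hypothetical contract: input fields mirror an event record, output fields are invented.
CONTRACT = PipelineContract(
    input_schema={
        "user_id": FieldSpec(int),
        "event_type": FieldSpec(str),
        "timestamp": FieldSpec(str),
        "value": FieldSpec(float, required=False),
    },
    output_schema={
        "event_id": FieldSpec(str),
        "user_id": FieldSpec(int),
        "event_type": FieldSpec(str),
    },
    expected_records_per_run=100_000,  # illustrative figure only
    frequency=Frequency.SCHEDULED,
)
```

Calling `violations(CONTRACT.output_schema, record)` on each output record gives a cheap check that the output side of the contract holds, independent of how the transformation is implemented.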

Validate at the boundary. The first thing any pipeline stage does is validate its input:

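import logging

from pydantic import BaseModel, ValidationError

logger = logging.getLogger(__name__)

class InputRecord(BaseModel):
    user_id: int
    event_type: str
    timestamp: str  # ISO 8601 expected; str alone does not enforce the format
    value: float | None = None

def log_invalid_records(invalid: list[dict]) -> None:
    # Placeholder: send rejects to a log or a dead-letter store
    for item in invalid:
        logger.warning("Invalid record %r: %s", item["record"], item["error"])

def transform(records: list[dict]) -> list[dict]:
    # Placeholder: the pipeline's actual transformation goes here
    return records

def process(raw_records: list[dict]) -> list[dict]:
    valid, invalid = [], []
    for r in raw_records:
        try:
            # model_validate also rejects non-dict input with a ValidationError
            valid.append(InputRecord.model_validate(r).model_dump())
        except ValidationError as e:
            invalid.append({"record": r, "error": str(e)})
    if invalid:
        log_invalid_records(invalid)  # Never silently drop
    return transform(valid)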

Make it idempotent. Running the pipeline twice on the same input must leave the destination in the same state as running it once. Use upserts, not inserts. Derive deterministic IDs from input content rather than relying on auto-increment.
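A minimal sketch of the idempotency step, assuming a SQLite destination and a hypothetical `events` table. The helper names `deterministic_id` and `load` are illustrative, and the `ON CONFLICT ... DO UPDATE` clause assumes SQLite 3.24 or newer.

```python
import hashlib
import json
import sqlite3


def deterministic_id(record: dict) -> str:
    """Derive a stable ID from the record's content (key order does not matter)."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def load(conn: sqlite3.Connection, records: list[dict]) -> None:
    """Upsert records so that re-running the load leaves the table unchanged."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "event_id TEXT PRIMARY KEY, user_id INTEGER, event_type TEXT, value REAL)"
    )
    rows = [
        (deterministic_id(r), r["user_id"], r["event_type"], r.get("value"))
        for r in records
    ]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO events (event_id, user_id, event_type, value) "
            "VALUES (?, ?, ?, ?) "
            "ON CONFLICT(event_id) DO UPDATE SET "
            "user_id = excluded.user_id, "
            "event_type = excluded.event_type, "
            "value = excluded.value",
            rows,
        )


# Hypothetical batch, loaded twice to show that the second run adds nothing.
conn = sqlite3.connect(":memory:")
batch = [{"user_id": 1, "event_type": "click", "value": 0.5}]
load(conn, batch)
load(conn, batch)
row_count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

Hashing the whole record means any change in content produces a new ID. If the data has a natural key (for example user plus timestamp), hashing only those fields may be the better choice, since a corrected record would then overwrite the old row instead of adding a new one.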
Log progress at meaningful checkpoints. After every major stage (extract, validate, transform, load), log the record count and any failures.
Test with a sample. Before running on the full dataset, run on 100 records. Confirm the output schema and the record count, and check that no records were silently dropped.
Run on the full dataset. Monitor progress. On completion, report: records in, records out, records failed, and time elapsed.
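The checkpoint logging, sample run, and final report described in the steps above can be sketched together as follows. This is an illustration under stated assumptions: `RunReport` and `run` are hypothetical names, the validate, transform, and load callables are supplied by the caller, and the transform is assumed to be one-to-one so that records in equals records out plus records failed.

```python
import logging
import time
from dataclasses import dataclass

logger = logging.getLogger("pipeline")


@dataclass
class RunReport:
    records_in: int
    records_out: int
    records_failed: int
    seconds_elapsed: float

    def reconciles(self) -> bool:
        """True when every input record is accounted for as output or failure."""
        return self.records_in == self.records_out + self.records_failed


def run(records, validate, transform, load, sample_size=None):
    """Run validate -> transform -> load, logging counts after each stage."""
    start = time.monotonic()
    if sample_size is not None:
        records = records[:sample_size]  # sample run: first N records only
    logger.info("extract: %d records", len(records))

    valid, failed = [], []
    for r in records:
        (valid if validate(r) else failed).append(r)
    logger.info("validate: %d valid, %d failed", len(valid), len(failed))

    transformed = [transform(r) for r in valid]
    logger.info("transform: %d records", len(transformed))

    load(transformed)
    logger.info("load: %d records", len(transformed))

    report = RunReport(
        len(records), len(transformed), len(failed), time.monotonic() - start
    )
    logger.info("done: %s", report)
    return report


def is_valid(record: dict) -> bool:
    # Hypothetical rule for the example: user_id must be present
    return record["user_id"] is not None


def discard(rows: list) -> None:
    # Stand-in for a real destination writer
    pass


logging.basicConfig(level=logging.INFO)
records = [{"user_id": i} for i in range(250)] + [{"user_id": None}]
sample = run(records, is_valid, dict, discard, sample_size=100)
full = run(records, is_valid, dict, discard)
```

Checking `report.reconciles()` after both the sample run and the full run is a direct test for silently dropped records: if the counts do not add up, some stage lost data without reporting it.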

Rationalizations

Excuse	Rebuttal
"I'll add validation later"	Invalid data corrupts your database. Validate at the boundary now.
"Logging slows the pipeline down"	A pipeline that fails without logs requires a full rerun to debug. Log it.
"It worked on the sample"	Test samples are not representative. Always run a full-dataset dry run before writing to the destination.

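The full-dataset dry run mentioned in the last rebuttal could look like the sketch below: every record goes through the transformation, but nothing reaches the destination unless the flag is turned off. The function `run_pipeline`, its `dry_run` parameter, and the example transform are hypothetical, chosen only to illustrate the idea.

```python
from typing import Callable, Iterable


def run_pipeline(
    records: Iterable[dict],
    transform: Callable[[dict], dict],
    write: Callable[[list[dict]], None],
    dry_run: bool = True,
) -> dict:
    """Transform every record; call `write` only when dry_run is False."""
    out, failed = [], []
    for r in records:
        try:
            out.append(transform(r))
        except Exception as exc:  # broad on purpose: a dry run should surface every failure
            failed.append({"record": r, "error": repr(exc)})
    if not dry_run:
        write(out)
    return {
        "records_out": len(out),
        "records_failed": len(failed),
        "written": not dry_run,
    }


def to_cents(record: dict) -> dict:
    # Example transform: fails with TypeError when value is None
    return {"user_id": record["user_id"], "cents": round(record["value"] * 100)}


destination: list[dict] = []
data = [{"user_id": 1, "value": 1.5}, {"user_id": 2, "value": None}]
summary = run_pipeline(data, to_cents, destination.extend)  # dry run by default
```

Defaulting `dry_run` to `True` makes the safe behaviour the one that happens by accident. Writing to the destination then requires an explicit `dry_run=False`.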
Verification

Input and output schemas are defined before any code is written
Invalid records are logged, not silently dropped
Pipeline was tested on a 100-record sample before full run
Final report includes: records in, records out, records failed, and time elapsed

related-skills.json

same repository

api-design.md

from "agulli/atlas-agents"

Design or review REST and GraphQL API interfaces. Use when asked to design an API, review endpoint structure, define request/response schemas, or improve API ergonomics.

2026-05-13, stars: 242

code-review.md

from "agulli/atlas-agents"

Perform a structured security and quality audit on source code. Use when asked to review code, audit a pull request, check for vulnerabilities, or assess code quality.

2026-05-13, stars: 242

database-migration.md

from "agulli/atlas-agents"

Safely run database schema migrations. Use when asked to update database schema, add columns, create tables, run alembic, or apply Django migrations.

2026-05-13, stars: 242

dependency-audit.md

from "agulli/atlas-agents"

Audit project dependencies for vulnerabilities, license issues, and bloat. Use when asked to check dependencies, audit packages, find vulnerable libraries, or reduce bundle size.

2026-05-13, stars: 242

deploy-checklist.md

from "agulli/atlas-agents"

Execute a structured deployment to staging or production. Use when asked to deploy, ship, release, push to production, or promote to staging.

2026-05-13, stars: 242

documentation-writer.md

from "agulli/atlas-agents"

Write or update technical documentation for code, APIs, or systems. Use when asked to document a module, write a README, generate API docs, or update existing documentation.

2026-05-13, stars: 242

package.json

"author": "agulli"

"repository": "agulli/atlas-agents"

$ useful --forSOC

Data Scientists | Computer and Mathematical Occupations | 15-2051 | L4
