Exécutez n'importe quel Skill dans Manus
en un clic

Exécutez n'importe quel Skill dans Manus en un clic

csv-data-pipeline

Étoiles0

Forks0

Mis à jour17 février 2026 à 11:25

Cleans, validates, and transforms CSV files for analysis. Use when user says "clean this CSV", "fix my data", "prepare this spreadsheet for analysis", "validate this dataset", or uploads a .csv file that needs processing.

Installation

Installer avec Codex ou Claude Copiez ce prompt, collez-le dans Codex, Claude ou un autre assistant, puis laissez-le vérifier la page du skill et l'installer pour vous.

Exécuter dans Manus

Source

digi4care

digi4care/opencode-mastery

Ouvrir le dépôt GitHub Voir les dépôts du créateur

Téléchargement

Exécuter dans Manus

Métiers associésSOC

Basé sur la classification professionnelle SOC

Développeurs de logicielsProfessions informatiques et mathématiques·SOC 15-1252

Explorateur de fichiers

3 fichiers

SKILL.md

readonly

name	csv-data-pipeline
description	Cleans, validates, and transforms CSV files for analysis. Use when user says "clean this CSV", "fix my data", "prepare this spreadsheet for analysis", "validate this dataset", or uploads a .csv file that needs processing.

CSV Data Pipeline

Instructions

Step 1: Inspect the Data

When the user provides a CSV file:

Read the first 10 rows to understand structure
Count total rows and columns
Identify column types (text, number, date, boolean, email, URL)
Report findings to the user:

"Your file has {rows} rows and {cols} columns.
Here's what I found:
- {col_name}: {type} ({null_count} missing values)
- ...
{issues_count} potential issues detected."

Step 2: Identify Issues

Check for these common problems:

Issue	Detection	Auto-fix?
Missing values	Null/empty cells	Ask user
Duplicate rows	Exact row matches	Yes
Inconsistent dates	Multiple formats in same column	Yes — standardize to YYYY-MM-DD
Leading/trailing whitespace	Strip and compare	Yes
Mixed case in categories	"Active" vs "active" vs "ACTIVE"	Yes — normalize to lowercase
Outliers	Values > 3 standard deviations from mean	Flag, don't auto-fix

Present issues to the user and ask which to fix.

Step 3: Apply Fixes

For each approved fix:

python scripts/validate_csv.py --input {filename} --checks all

If the script finds issues:

Apply automatic fixes (whitespace, duplicates, date normalization)
For ambiguous fixes (missing values, outliers), ask the user: "Column 'revenue' has 12 missing values. Should I: (a) drop those rows, (b) fill with column median, (c) fill with 0, or (d) leave as-is?"

Step 4: Export Clean Data

Save the cleaned file with a descriptive name:

{original_name}_cleaned.csv — processed data
{original_name}_report.txt — summary of changes made

Tell the user what changed:

"Done! Here's what I fixed:
- Removed 23 duplicate rows
- Standardized 4 date formats to YYYY-MM-DD
- Trimmed whitespace in 'name' and 'email' columns
- Flagged 3 outliers in 'revenue' (not changed)
Clean file: data_cleaned.csv"

Examples

Example 1: Messy Customer List

User: "Clean this customer spreadsheet"

Found: 500 rows, columns [name, email, signup_date, plan, revenue] Issues:

12 duplicate emails
signup_date has "01/15/2025", "2025-01-15", "Jan 15, 2025" formats
3 rows with missing email
"Pro", "pro", "PRO" in plan column

Actions: Remove dupes, normalize dates, flag missing emails, lowercase plans.

Example 2: Large Dataset Validation

User: "Validate this CSV before I import to our database"

Run full validation suite. Report schema compatibility issues. Flag any values that would violate database constraints (nulls in required fields, strings in numeric columns, dates out of range).

Troubleshooting

"File too large to process" Cause: CSV exceeds available memory. Solution: Process in chunks of 10,000 rows. Ask user if sampling is acceptable for initial analysis.

"Encoding error on read" Cause: File isn't UTF-8. Solution: Try latin-1 encoding. If that fails, ask user what encoding the file uses.

"Column count mismatch between rows" Cause: Unescaped commas or newlines in data. Solution: Try reading with different quoting settings. Show the problematic rows to the user.

Plus depuis ce dépôt

même dépôt

opencode-memory

digi4care/opencode-mastery

Persistent memory system with .memory.md, compaction, and cross-project sync

2026-02-210

ace-analysis

digi4care/opencode-mastery

Analyzes sessions using ACE framework to generate prompt improvement suggestions focused on durable patterns.

2026-02-210

database-architect

digi4care/opencode-mastery

Database technology selection and architecture - choose between SQL vs NoSQL, PostgreSQL vs MySQL vs MongoDB, scaling strategies, migration planning. Use for database DECISIONS. For PostgreSQL implementation use postgresql skill.

2026-02-210

flow-analysis

digi4care/opencode-mastery

Analyze feature flows for completeness, preconditions, and branch coverage using reusable flow-analyzer tools.

2026-02-210

gsd-validation

digi4care/opencode-mastery

Automated post-phase validation workflow for GSD phases with deterministic gates and HITL controls.

2026-02-210

advanced-skill

digi4care/opencode-mastery

Full workflow orchestration with config-driven routing and phase-based execution

2026-02-210

name	csv-data-pipeline
description	Cleans, validates, and transforms CSV files for analysis. Use when user says "clean this CSV", "fix my data", "prepare this spreadsheet for analysis", "validate this dataset", or uploads a .csv file that needs processing.

CSV Data Pipeline

Instructions

Step 1: Inspect the Data

When the user provides a CSV file:

Read the first 10 rows to understand structure
Count total rows and columns
Identify column types (text, number, date, boolean, email, URL)
Report findings to the user:

"Your file has {rows} rows and {cols} columns.
Here's what I found:
- {col_name}: {type} ({null_count} missing values)
- ...
{issues_count} potential issues detected."

Step 2: Identify Issues

Check for these common problems:

Issue	Detection	Auto-fix?
Missing values	Null/empty cells	Ask user
Duplicate rows	Exact row matches	Yes
Inconsistent dates	Multiple formats in same column	Yes — standardize to YYYY-MM-DD
Leading/trailing whitespace	Strip and compare	Yes
Mixed case in categories	"Active" vs "active" vs "ACTIVE"	Yes — normalize to lowercase
Outliers	Values > 3 standard deviations from mean	Flag, don't auto-fix

Present issues to the user and ask which to fix.

Step 3: Apply Fixes

For each approved fix:

python scripts/validate_csv.py --input {filename} --checks all

If the script finds issues:

Apply automatic fixes (whitespace, duplicates, date normalization)
For ambiguous fixes (missing values, outliers), ask the user: "Column 'revenue' has 12 missing values. Should I: (a) drop those rows, (b) fill with column median, (c) fill with 0, or (d) leave as-is?"

Step 4: Export Clean Data

Save the cleaned file with a descriptive name:

{original_name}_cleaned.csv — processed data
{original_name}_report.txt — summary of changes made

Tell the user what changed:

"Done! Here's what I fixed:
- Removed 23 duplicate rows
- Standardized 4 date formats to YYYY-MM-DD
- Trimmed whitespace in 'name' and 'email' columns
- Flagged 3 outliers in 'revenue' (not changed)
Clean file: data_cleaned.csv"

Examples

Example 1: Messy Customer List

User: "Clean this customer spreadsheet"

Found: 500 rows, columns [name, email, signup_date, plan, revenue] Issues:

12 duplicate emails
signup_date has "01/15/2025", "2025-01-15", "Jan 15, 2025" formats
3 rows with missing email
"Pro", "pro", "PRO" in plan column

Actions: Remove dupes, normalize dates, flag missing emails, lowercase plans.

Example 2: Large Dataset Validation

User: "Validate this CSV before I import to our database"

Run full validation suite. Report schema compatibility issues. Flag any values that would violate database constraints (nulls in required fields, strings in numeric columns, dates out of range).

Troubleshooting

"File too large to process" Cause: CSV exceeds available memory. Solution: Process in chunks of 10,000 rows. Ask user if sampling is acceptable for initial analysis.

"Encoding error on read" Cause: File isn't UTF-8. Solution: Try latin-1 encoding. If that fails, ask user what encoding the file uses.

"Column count mismatch between rows" Cause: Unescaped commas or newlines in data. Solution: Try reading with different quoting settings. Show the problematic rows to the user.