Run any Skill in Manus with one click

$pwd:

csv-pipeline

Name: Csv Pipeline
Author: AIDotNet

// Process, transform, analyze, and report on CSV and JSON data files. Use when the user needs to filter rows, join datasets, compute aggregates, convert formats, deduplicate, or generate summary reports from tabular data. Works with any CSV, TSV, or JSON Lines file.

Run Skill in Manus

$ git log --oneline --stat

stars:462

forks:91

updated:March 20, 2026 at 09:48

File Explorer

2 files

SKILL.md

readonly

name	csv-pipeline
description	Process, transform, analyze, and report on CSV and JSON data files. Use when the user needs to filter rows, join datasets, compute aggregates, convert formats, deduplicate, or generate summary reports from tabular data. Works with any CSV, TSV, or JSON Lines file.
compatibility	Requires Python 3. No external dependencies beyond Python standard library (csv, json modules).

CSV Data Pipeline

Process tabular data (CSV, TSV, JSON, JSON Lines) using standard command-line tools and Python. No external dependencies required beyond Python 3.

When to use this skill

User provides a CSV/TSV/JSON file and asks to analyze, transform, or report on it
Joining, filtering, grouping, or aggregating tabular data
Converting between formats (CSV to JSON, JSON to CSV, etc.)
Deduplicating, sorting, or cleaning messy data
Generating summary statistics or reports
ETL workflows: extract from one format, transform, load into another

Scripts overview

Script	Purpose	Dependencies
`csv_tool.py`	All-in-one CSV/JSON processing tool	Python 3 (stdlib only)

Steps

1. Ensure Python 3 is available

python --version

CRITICAL — Error Recovery: If any script below fails with an error, check that Python 3 is available and re-run the exact same command.

2. Inspect a data file

python scripts/csv_tool.py inspect "DATA_FILE"

Shows row count, column names, and non-empty value counts per column.

3. Filter rows

python scripts/csv_tool.py filter "DATA_FILE" --column COLUMN_NAME --op OPERATOR --value VALUE --output "OUTPUT_FILE"

Operators: eq, neq, gt, gte, lt, lte, contains, startswith, endswith

4. Sort data

python scripts/csv_tool.py sort "DATA_FILE" --column COLUMN_NAME --order asc --output "OUTPUT_FILE"

Options: --numeric for numeric sorting, --order desc for descending.

5. Deduplicate

python scripts/csv_tool.py dedup "DATA_FILE" --columns "col1,col2" --output "OUTPUT_FILE"

Remove duplicates by specified columns (or all columns if omitted).

6. Aggregate / Group By

python scripts/csv_tool.py aggregate "DATA_FILE" --group-by COLUMN --agg-column VALUE_COL --func sum --output "OUTPUT_FILE"

Functions: sum, avg, count, min, max

7. Join two datasets

python scripts/csv_tool.py join "LEFT_FILE" "RIGHT_FILE" --on KEY_COLUMN --how inner --output "OUTPUT_FILE"

Join types: inner, left

8. Convert formats

python scripts/csv_tool.py convert "DATA_FILE" --to json --output "OUTPUT_FILE"

Supported conversions: csv, json, jsonl (JSON Lines), tsv

9. Generate summary report

python scripts/csv_tool.py report "DATA_FILE" --group-by CATEGORY_COL --value-column VALUE_COL --output "report.md"

Generates a Markdown summary table with count, sum, avg, min, max per group.

10. Clean data

python scripts/csv_tool.py clean "DATA_FILE" --output "CLEAN_FILE"

Strips whitespace, normalizes empty values (N/A, null, None → empty), normalizes booleans.

Decision guide

Quick look → inspect to understand the data
Filter/sort/dedup → use the corresponding subcommand
Summarize → aggregate for raw data, report for Markdown output
Combine files → join two datasets on a shared key
Change format → convert between CSV/JSON/TSV

Edge cases

Large files (100MB+): The tool processes data in streaming fashion where possible
Encoding issues: Files are read as UTF-8 by default. For BOM files, use UTF-8-SIG
Quoted fields: Python's csv module handles RFC 4180 quoting automatically
Mixed types: Numeric operations attempt float conversion, falling back to 0

Scripts

csv_tool.py — All-in-one CSV/JSON data processing tool

related-skills.json

same repository

docx.md

from "AIDotNet/OpenCowork"

Comprehensive document creation, editing, and analysis with support for tracked changes, comments, formatting preservation, and text extraction. When GLM needs to work with professional documents (.docx files) for: (1) Creating new documents, (2) Modifying or editing content, (3) Working with tracked changes, (4) Adding comments, or any other document tasks

2026-05-12462

email-drafter.md

from "AIDotNet/OpenCowork"

Generate professional email drafts using Python templates. Use when the user needs to compose business emails, follow-ups, introductions, meeting requests, or other professional correspondence. Supports multiple tones, languages, and email types with structured output.

2026-03-20462

excel-processor.md

from "AIDotNet/OpenCowork"

Read, write, analyze, and format Excel spreadsheets (.xlsx). Use when the user needs to create Excel files, extract data from spreadsheets, apply formulas, format cells, or generate Excel reports from data. Supports multiple sheets, charts, and conditional formatting.

2026-03-20462

image-ocr.md

from "AIDotNet/OpenCowork"

Extract text from images using Python OCR. Use when the user wants to read text from screenshots, photos of documents, scanned pages, or any image containing text. Supports PNG, JPEG, TIFF, BMP, and WebP formats.

2026-03-20462

pdf.md

from "AIDotNet/OpenCowork"

Comprehensive PDF manipulation toolkit for extracting text and tables, creating new PDFs, merging/splitting documents, and handling forms. When GLM needs to fill in a PDF form or programmatically process, generate, or analyze PDF documents at scale.

2026-03-20462

web-scraper.md

from "AIDotNet/OpenCowork"

Scrape web pages, search the internet, and extract structured content using Python. Use when the user wants to fetch a webpage, search for information online, extract links, or crawl JavaScript-rendered dynamic pages.

2026-03-20462

package.json

"author": "AIDotNet"

"repository": "AIDotNet/OpenCowork"

View GitHub Repository View Creator Repositories

$ install --global

$ download --local

Run Skill in Manus

$ useful --forSOC

Data ScientistsComputer and Mathematical Occupations15-2051L4

name	csv-pipeline
description	Process, transform, analyze, and report on CSV and JSON data files. Use when the user needs to filter rows, join datasets, compute aggregates, convert formats, deduplicate, or generate summary reports from tabular data. Works with any CSV, TSV, or JSON Lines file.
compatibility	Requires Python 3. No external dependencies beyond Python standard library (csv, json modules).

CSV Data Pipeline

Process tabular data (CSV, TSV, JSON, JSON Lines) using standard command-line tools and Python. No external dependencies required beyond Python 3.

When to use this skill

User provides a CSV/TSV/JSON file and asks to analyze, transform, or report on it
Joining, filtering, grouping, or aggregating tabular data
Converting between formats (CSV to JSON, JSON to CSV, etc.)
Deduplicating, sorting, or cleaning messy data
Generating summary statistics or reports
ETL workflows: extract from one format, transform, load into another

Scripts overview

Script	Purpose	Dependencies
`csv_tool.py`	All-in-one CSV/JSON processing tool	Python 3 (stdlib only)

Steps

1. Ensure Python 3 is available

python --version

CRITICAL — Error Recovery: If any script below fails with an error, check that Python 3 is available and re-run the exact same command.

2. Inspect a data file

python scripts/csv_tool.py inspect "DATA_FILE"

Shows row count, column names, and non-empty value counts per column.

3. Filter rows

python scripts/csv_tool.py filter "DATA_FILE" --column COLUMN_NAME --op OPERATOR --value VALUE --output "OUTPUT_FILE"

Operators: eq, neq, gt, gte, lt, lte, contains, startswith, endswith

4. Sort data

python scripts/csv_tool.py sort "DATA_FILE" --column COLUMN_NAME --order asc --output "OUTPUT_FILE"

Options: --numeric for numeric sorting, --order desc for descending.

5. Deduplicate

python scripts/csv_tool.py dedup "DATA_FILE" --columns "col1,col2" --output "OUTPUT_FILE"

Remove duplicates by specified columns (or all columns if omitted).

6. Aggregate / Group By

python scripts/csv_tool.py aggregate "DATA_FILE" --group-by COLUMN --agg-column VALUE_COL --func sum --output "OUTPUT_FILE"

Functions: sum, avg, count, min, max

7. Join two datasets

python scripts/csv_tool.py join "LEFT_FILE" "RIGHT_FILE" --on KEY_COLUMN --how inner --output "OUTPUT_FILE"

Join types: inner, left

8. Convert formats

python scripts/csv_tool.py convert "DATA_FILE" --to json --output "OUTPUT_FILE"

Supported conversions: csv, json, jsonl (JSON Lines), tsv

9. Generate summary report

python scripts/csv_tool.py report "DATA_FILE" --group-by CATEGORY_COL --value-column VALUE_COL --output "report.md"

Generates a Markdown summary table with count, sum, avg, min, max per group.

10. Clean data

python scripts/csv_tool.py clean "DATA_FILE" --output "CLEAN_FILE"

Strips whitespace, normalizes empty values (N/A, null, None → empty), normalizes booleans.

Decision guide

Quick look → inspect to understand the data
Filter/sort/dedup → use the corresponding subcommand
Summarize → aggregate for raw data, report for Markdown output
Combine files → join two datasets on a shared key
Change format → convert between CSV/JSON/TSV

Edge cases

Large files (100MB+): The tool processes data in streaming fashion where possible
Encoding issues: Files are read as UTF-8 by default. For BOM files, use UTF-8-SIG
Quoted fields: Python's csv module handles RFC 4180 quoting automatically
Mixed types: Numeric operations attempt float conversion, falling back to 0

Scripts

csv_tool.py — All-in-one CSV/JSON data processing tool

csv-pipeline

CSV Data Pipeline

When to use this skill

Scripts overview

Steps

1. Ensure Python 3 is available

2. Inspect a data file

3. Filter rows

4. Sort data

5. Deduplicate

6. Aggregate / Group By

7. Join two datasets

8. Convert formats

9. Generate summary report

10. Clean data

Decision guide

Edge cases

Scripts

More from this repository

More from this repository

CSV Data Pipeline

When to use this skill

Scripts overview

Steps

1. Ensure Python 3 is available

2. Inspect a data file

3. Filter rows

4. Sort data

5. Deduplicate

6. Aggregate / Group By

7. Join two datasets

8. Convert formats

9. Generate summary report

10. Clean data

Decision guide

Edge cases

Scripts