haipipe-task-for-raw

Updated June 24, 2026 at 16:24

Raw extraction task-folder build specialist. Scaffolds {NN}_<name>/ task-folders under R-series task-groups that extract source tables from Databricks as single parquet files, then process locally with Python. Called by /haipipe-task orchestrator when task-type=raw. Direct invocation works for scoped scaffolding. Cross-references /haipipe-data-raw.

Source

jluo41

jluo41/Tools


SKILL.md


name	haipipe-task-for-raw
description	Raw extraction task-folder build specialist. Scaffolds {NN}_<name>/ task-folders under R-series task-groups that extract source tables from Databricks as single parquet files, then process locally with Python. Called by /haipipe-task orchestrator when task-type=raw. Direct invocation works for scoped scaffolding. Cross-references /haipipe-data-raw.
argument-hint	[project_id] [group] [task-name]
allowed-tools	Bash, Read, Write, Edit, Grep, Glob, Skill
metadata	{"version":"1.0.0","last_updated":"2026-06-10","summary":"Raw extraction task-folder build specialist (Databricks → parquet → local Python).","changelog":["1.0.0 (2026-06-10): initial version — extract-wide-process-local doctrine."]}

Skill: haipipe-task-for-raw

Scaffolds a raw extraction task-folder — a runnable example that extracts source tables from a Databricks catalog as wide parquet files, then optionally processes them locally with Python (pandas). Heavy outputs land in _WorkSpace/0-RawStore/<cohort>/; the task-folder keeps scripts, configs, and convert-only notebooks.

Invocation modes (see ../../haipipe-task/ref/invocation-modes.md): interactive (a human steers; missing fields get ASKed) OR headless (a full spec → run silently, no ASK). haipipe-task-creator-agent calls this skill headless during fan-out, then authors the <TASK>.py body. Always end with the structured return block (status / task_folder / run_name / files).

Position in the series

/haipipe-task-for-data            data-pipeline (Stages 1-4)
/haipipe-task-for-raw         ◀── you are here (Stage 0 — raw extraction)
/haipipe-task-for-algo            algo-dev demo
/haipipe-task-for-training        model training
/haipipe-task-for-eval            model evaluation
/haipipe-task-for-display         paper figure / table
/haipipe-task-for-individual      individual-centric query
/haipipe-task-for-agent           LLM agent call
/haipipe-task-for-inference       inference profiling

What this scaffolds

tasks/R{NN}_<cohort_name>/                   ← group (R-series)
└── {NN}_stage{S}_{description}/             ← task-folder this scaffold creates
    ├── {NN}_stage{S}_{description}.py       source + # %% cells (SQL strings in Python)
    ├── configs/
    │   └── <run_name>.yaml                  seeded from ref/config-seed.yaml
    ├── runs/
    │   └── <run_name>.sh                    from ref/run-databricks-sh-template.sh
    ├── results/                             runtime.yaml + light artifacts
    └── notebooks/                           .ipynb for Databricks upload (convert-only)

Group letter default: R (raw extraction). Heavy outputs land in: _WorkSpace/0-RawStore/<cohort>/.

Extract-Wide-Process-Local Doctrine

This is the core philosophy. Every raw extraction task MUST follow it:

1. One SQL query per source table → one large parquet file. Keep SQL simple: SELECT columns FROM single_table WHERE filters. Avoid complex JOINs in SQL. If you need joins, extract both tables as separate parquet files and join in Python.
2. Save parquet to Databricks catalog volume. Path pattern: /Volumes/<catalog>/<schema>/<volume>/<cohort>/<table>.parquet
3. Download/sync parquet to local _WorkSpace/0-RawStore/<cohort>/. One parquet file per source table. No partitioned directories.
4. Process with Python (pandas), NOT Spark. Local reads, local transforms, local output. Spark is for extraction only (because the data lives in Databricks). Once the parquet is local, everything is pandas.
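
As a minimal sketch of the path pattern and the single-table query shape described above, the helpers below build the volume path, the local path, and a plain SELECT. The function names and the example catalog, schema, volume, and column names are hypothetical; the skill does not define them.

```python
from pathlib import PurePosixPath
from typing import List, Optional


def volume_parquet_path(catalog: str, schema: str, volume: str, cohort: str, table: str) -> str:
    """Follow the /Volumes/<catalog>/<schema>/<volume>/<cohort>/<table>.parquet pattern."""
    return str(PurePosixPath("/Volumes", catalog, schema, volume, cohort, f"{table}.parquet"))


def local_parquet_path(workspace_root: str, cohort: str, table: str) -> str:
    """One parquet file per source table under <workspace_root>/0-RawStore/<cohort>/."""
    return str(PurePosixPath(workspace_root, "0-RawStore", cohort, f"{table}.parquet"))


def build_extract_sql(table: str, columns: List[str], where: Optional[str] = None) -> str:
    """Single-table SELECT with an optional filter; deliberately no JOIN support."""
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if where:
        sql += f" WHERE {where}"
    return sql


# Hypothetical names, for illustration only.
remote = volume_parquet_path("main", "raw", "exports", "prediabetes", "encounters")
local = local_parquet_path("_WorkSpace", "prediabetes", "encounters")
query = build_extract_sql("main.raw.encounters", ["patient_id", "visit_date"], "visit_date >= '2020-01-01'")
```

Keeping the query builder free of JOIN support mirrors the doctrine: any combination of tables is deferred to the local pandas stage.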

Execution model — Databricks notebooks

Unlike other task-types that use papermill for local execution, raw extraction tasks run on Databricks. The run script only converts the .py to .ipynb — it does NOT execute locally.

Workflow:

1. runs/<RUN>.sh converts .py → .ipynb and writes runtime.yaml
2. User uploads .ipynb to Databricks workspace (or uses dbx CLI)
3. User runs the notebook on a Databricks cluster
4. Extracted parquet files land in the catalog volume
5. User syncs parquet to local _WorkSpace/0-RawStore/<cohort>/

The run-script template is ref/run-databricks-sh-template.sh — convert-only, no papermill execute.
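
To illustrate what "convert-only" means, here is a rough sketch of turning a `# %%` script into a minimal nbformat-4 notebook without executing any cell. It is not the contents of ref/run-databricks-sh-template.sh, which are not shown here and may use a dedicated converter; the function name `py_percent_to_notebook` is hypothetical, and every cell is treated as a code cell for simplicity.

```python
import json


def py_percent_to_notebook(source: str) -> dict:
    """Split a '# %%' script into code cells of a minimal nbformat-4 notebook."""
    cells, current = [], []
    for line in source.splitlines():
        if line.startswith("# %%"):
            # A cell marker closes the previous cell (if it has any content).
            if any(l.strip() for l in current):
                cells.append("\n".join(current).strip("\n"))
            current = []
        else:
            current.append(line)
    if any(l.strip() for l in current):
        cells.append("\n".join(current).strip("\n"))
    return {
        "cells": [
            {"cell_type": "code", "execution_count": None,
             "metadata": {}, "outputs": [], "source": text}
            for text in cells
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


script = '# %%\ntable = "encounters"\n\n# %%\nprint(table)\n'
notebook = py_percent_to_notebook(script)
notebook_json = json.dumps(notebook, indent=1)  # text that would be written to notebooks/<name>.ipynb
```

Nothing in the sketch runs the cells, so `outputs` stays empty and `execution_count` stays null, which matches a notebook that has only been converted.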

Stage naming within a cohort group

Each cohort's extraction pipeline is organized as numbered stages:

R01_prediabetes/
├── 01_stage1_extract_tables/     ← SQL extraction (runs on Databricks)
├── 02_stage2_process/            ← Python processing (runs locally)
└── sbatch/

Convention:

stage1 = extract SQL tables → parquet (Databricks)
stage2 = read parquet, clean/transform with pandas (local)
stage3+ = optional further processing stages

Stage numbering is cohort-specific. Different cohorts may have different numbers of stages depending on complexity.
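
A stage2 task might look roughly like the sketch below, which performs in pandas the join that is kept out of the extraction SQL. The table and column names are invented for illustration, and the in-memory frames stand in for files that a real stage2 would load with `pd.read_parquet` from _WorkSpace/0-RawStore/<cohort>/.

```python
import pandas as pd

# Stand-ins for two tables extracted in stage1 (hypothetical columns).
patients = pd.DataFrame({"patient_id": [1, 2, 3], "birth_year": [1960, 1975, 1988]})
labs = pd.DataFrame({"patient_id": [1, 1, 3], "a1c": [5.9, 6.1, 5.7]})

# The join happens locally; validate= raises if patient_id is not unique in patients.
joined = labs.merge(patients, on="patient_id", how="left", validate="many_to_one")
```

Using `validate="many_to_one"` is one way to catch unexpected duplicate keys early, a check that would otherwise be hidden inside a SQL JOIN.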

Cross-reference to pipeline skill

/haipipe-data-raw owns the understanding of raw cohort data — the datapoint-timeline lifecycle documentation. After extraction, suggest /haipipe-data-raw understand <cohort> to document what was extracted, then /haipipe-data-source to wrap into Stage 1.

Commands

/haipipe-task-for-raw                              ASK project / group / name
/haipipe-task-for-raw <project> <group> <name>     scaffold direct

Scaffold flow

See fn/scaffold.md for the detailed step-by-step. Summary:

1. Identify project + task-group.
2. Collect metadata (NN, name, stage number, _meta block).
3. Create skeleton (.py, configs/, runs/, results/, notebooks/).
4. Seed config from ref/config-seed.yaml.
5. Copy run-script from ref/run-databricks-sh-template.sh.
6. Suggest next via cross-skill link.
7. Emit return contract.
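
The skeleton, config, and run-script steps can be sketched as follows. This is an illustration under assumptions, not the procedure in fn/scaffold.md: the function names are hypothetical, NN is assumed to be a zero-padded two-digit number (as in 01_stage1_extract_tables), and the config and run script are created as empty placeholders rather than copied from the ref/ templates.

```python
from pathlib import Path
import tempfile


def task_folder_name(nn: int, stage: int, description: str) -> str:
    """Format the {NN}_stage{S}_{description} task-folder name."""
    return f"{nn:02d}_stage{stage}_{description}"


def create_skeleton(group_dir: Path, nn: int, stage: int, description: str, run_name: str) -> Path:
    """Create the task-folder layout with empty placeholder files."""
    name = task_folder_name(nn, stage, description)
    task_dir = group_dir / name
    for sub in ("configs", "runs", "results", "notebooks"):
        (task_dir / sub).mkdir(parents=True, exist_ok=True)
    (task_dir / f"{name}.py").touch()
    # Placeholders: the real flow seeds/copies these from the ref/ templates.
    (task_dir / "configs" / f"{run_name}.yaml").touch()
    (task_dir / "runs" / f"{run_name}.sh").touch()
    return task_dir


with tempfile.TemporaryDirectory() as tmp:
    group = Path(tmp) / "tasks" / "R01_prediabetes"
    task_dir = create_skeleton(group, 1, 1, "extract_tables", "demo_run")
    created = sorted(p.relative_to(task_dir).as_posix() for p in task_dir.rglob("*"))
```

Note that the sketch creates no README.md, in line with the MUST NOT rules.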

Return contract

status:    ok | blocked | failed
summary:   2-3 sentences on what was scaffolded
artifacts: [paths created]
next:      /haipipe-data-raw understand <cohort>  OR  run on Databricks
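
One possible way to emit this block programmatically is sketched below. The helper `build_return_block` is hypothetical; it only renders the four fields listed above and rejects any status outside ok, blocked, or failed.

```python
ALLOWED_STATUS = {"ok", "blocked", "failed"}


def build_return_block(status: str, summary: str, artifacts: list, next_step: str) -> str:
    """Render the four contract fields as a plain-text block."""
    if status not in ALLOWED_STATUS:
        raise ValueError(f"status must be one of {sorted(ALLOWED_STATUS)}, got {status!r}")
    return "\n".join([
        f"status:    {status}",
        f"summary:   {summary}",
        f"artifacts: [{', '.join(artifacts)}]",
        f"next:      {next_step}",
    ])


block = build_return_block(
    "ok",
    "Scaffolded the stage1 extraction task-folder.",
    ["tasks/R01_prediabetes/01_stage1_extract_tables/"],
    "/haipipe-data-raw understand prediabetes",
)
```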

MUST NOT

- Place heavy artifacts (.parquet, .csv > 1 MB) in results/. Heavy outputs land in _WorkSpace/0-RawStore/<cohort>/.
- Write complex multi-table JOINs in SQL; extract tables separately and join in Python downstream.
- Use Spark for local processing; once data is local, use pandas only.
- Skip the _meta: block.
- Create README.md.
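
The first rule lends itself to an automated check. The sketch below is one hedged reading of it: any .parquet file counts as heavy, and a .csv counts as heavy above 1 MB (taken here as 1,000,000 bytes). Both the interpretation and the helper name `find_heavy_artifacts` are assumptions, not part of the skill.

```python
from pathlib import Path
import tempfile

CSV_LIMIT_BYTES = 1_000_000  # assumed meaning of "1 MB"


def find_heavy_artifacts(results_dir: Path) -> list:
    """Return names of files in results/ that should live in 0-RawStore instead."""
    heavy = []
    for path in sorted(results_dir.rglob("*")):
        if not path.is_file():
            continue
        if path.suffix == ".parquet":
            heavy.append(path.name)
        elif path.suffix == ".csv" and path.stat().st_size > CSV_LIMIT_BYTES:
            heavy.append(path.name)
    return heavy


with tempfile.TemporaryDirectory() as tmp:
    results = Path(tmp) / "results"
    results.mkdir()
    (results / "runtime.yaml").write_text("status: ok\n")
    (results / "small.csv").write_text("a,b\n1,2\n")
    (results / "big.csv").write_bytes(b"x" * (CSV_LIMIT_BYTES + 1))
    (results / "encounters.parquet").write_bytes(b"")
    flagged = find_heavy_artifacts(results)
```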

First-run gate

runs/<RUN>.sh does NOT execute the notebook; it only converts it. The code-review gate inherited from the base template pattern is still present, but skip_review: true is the default for initial scaffolding, because the notebook is reviewed manually before it is uploaded to Databricks.
