Run any Skill in Manus in one click

$pwd:

dataset-validate

Name: Dataset Validate
Author: tsingyuai

// Use this when the project needs a dedicated data-quality review before model review. Checks data reality, split correctness, label health, leakage risk, shape consistency, and mock-data disclosure.

$ git log --oneline --stat

stars: 532

forks: 50

updated: 3 April 2026 at 07:59

SKILL.md

name	dataset-validate
description	Use this when the project needs a dedicated data-quality review before model review. Checks data reality, split correctness, label health, leakage risk, shape consistency, and mock-data disclosure.
metadata	{"openclaw":{"emoji":"🗂️","requires":{"bins":["python3","uv"]}}}

Dataset Validate

Don't ask permission. Just do it.

Use this skill before or alongside model implementation review when data quality needs to be checked separately from model quality.

Outputs go to the workspace root.

Use This When

plan_res.md already exists
the project is about to implement or has just implemented a model
data quality, split quality, or label integrity is still uncertain

Do Not Use This When

the project has no concrete plan yet
there is no dataset or data-loading path to inspect

Required Inputs

plan_res.md
project/ if a data pipeline already exists
survey_res.md when it defines dataset or protocol expectations

If plan_res.md is missing, stop and say: "Run /research-plan first to complete the implementation plan."
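
The input check above can be sketched in Python as follows. This is a minimal illustration, not part of the skill itself: the function name `collect_inputs` and the returned dictionary keys are hypothetical, and it assumes all inputs live directly under the workspace root.

```python
from pathlib import Path

# Message quoted from the skill text above.
MISSING_PLAN_MESSAGE = "Run /research-plan first to complete the implementation plan."


def collect_inputs(workspace: Path) -> dict:
    """Locate the skill's inputs; stop if plan_res.md is missing.

    Hypothetical helper: optional inputs are returned as None when absent.
    """
    plan = workspace / "plan_res.md"
    if not plan.is_file():
        # The plan is the only hard requirement.
        raise FileNotFoundError(MISSING_PLAN_MESSAGE)

    survey = workspace / "survey_res.md"
    data_dir = workspace / "project" / "data"
    return {
        "plan": plan,
        "survey": survey if survey.is_file() else None,
        "data_dir": data_dir if data_dir.is_dir() else None,
    }
```

Raising an exception keeps the "stop" behaviour explicit; a caller can catch it and show the message to the user.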

Required Output

data_validation.md

Workflow

Step 1: Read the Data Contract

Read:

plan_res.md
survey_res.md if present
current data-loading code under project/data/ if present

Extract:

expected dataset name
source
split structure
label or target format
expected shapes
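
One way to hold the extracted contract is a small record whose fields mirror the five items above. The class name `DataContract`, its field names, and the `missing()` helper are assumptions made for illustration; the skill does not prescribe a data structure.

```python
from __future__ import annotations

from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class DataContract:
    """Hypothetical container for what plan_res.md says about the data."""

    dataset_name: Optional[str] = None
    source: Optional[str] = None
    split_structure: Optional[str] = None
    label_format: Optional[str] = None
    expected_shapes: Optional[str] = None

    def missing(self) -> list[str]:
        """Names of contract items that could not be extracted."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]
```

Listing the unfilled items makes gaps in the plan visible before any data is inspected.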

Step 2: Audit Data Reality

Check:

whether dataset files actually exist
whether the data is real or mock
whether mock usage is clearly declared
whether row count / sample count is plausible
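
The reality checks above might look like the sketch below. It rests on several assumptions that are not in the skill: the data is stored as CSV files with a header row and no embedded newlines, "mock" or "synthetic" in a file name is treated as a hint of mock data, and the `min_rows` threshold is an arbitrary placeholder.

```python
from pathlib import Path


def audit_reality(data_dir: Path, min_rows: int = 100) -> dict:
    """Hypothetical reality audit over CSV files under data_dir."""
    files = sorted(data_dir.rglob("*.csv")) if data_dir.is_dir() else []

    row_count = 0
    for path in files:
        with path.open(newline="") as handle:
            # Count lines and subtract one for the assumed header row.
            row_count += max(sum(1 for _ in handle) - 1, 0)

    # A file-name hint only; a missing hint does not prove the data is real.
    looks_mock = any(
        "mock" in p.name.lower() or "synthetic" in p.name.lower() for p in files
    )
    return {
        "files_exist": bool(files),
        "file_count": len(files),
        "row_count": row_count,
        "row_count_plausible": row_count >= min_rows,
        "looks_mock": looks_mock,
    }
```

When `files_exist` is false, the result should be reported as "not verified" rather than as real data.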

Step 3: Audit Data Integrity

Check:

train / val / test split existence and separation
label distribution or target sanity
shape / dtype consistency
obvious leakage risks
preprocessing consistency with plan_res.md

If code exists, run lightweight inspection commands under the project environment to verify counts and sample structure.
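
A lightweight inspection could resemble the sketch below. It assumes each sample has a stable identifier and that labels are available as a flat sequence; the function names `split_overlap` and `label_shares` are hypothetical. Identifier overlap catches only the most obvious form of leakage.

```python
from __future__ import annotations

from collections import Counter
from typing import Hashable, Iterable


def split_overlap(splits: dict[str, list[str]]) -> dict[tuple[str, str], int]:
    """Count sample IDs shared by each pair of splits (0 means separated)."""
    names = sorted(splits)
    overlaps = {}
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            shared = set(splits[first]) & set(splits[second])
            overlaps[(first, second)] = len(shared)
    return overlaps


def label_shares(labels: Iterable[Hashable]) -> dict:
    """Fraction of samples per label, for spotting missing or rare classes."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}
```

Any non-zero overlap between train and val or test is worth recording under leakage risk in the report.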

Step 4: Write `data_validation.md`

Use references/data-validation-template.md.

The report must include:

dataset identity
data reality check
split integrity
label / target health
leakage risk
mock-data disclosure
verdict: PASS, NEEDS_REVISION, or BLOCKED
exact next step
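
The required report fields can be captured in a small writer like the one below. The layout here is a placeholder only: the real structure should come from references/data-validation-template.md, whose contents are not shown on this page. `ValidationReport` and `write_report` are hypothetical names.

```python
from dataclasses import dataclass
from pathlib import Path

# The three verdicts named in the skill.
VERDICTS = ("PASS", "NEEDS_REVISION", "BLOCKED")


@dataclass
class ValidationReport:
    """Hypothetical record with one field per required report item."""

    dataset_identity: str
    data_reality: str
    split_integrity: str
    label_health: str
    leakage_risk: str
    mock_disclosure: str
    verdict: str
    next_step: str

    def to_markdown(self) -> str:
        if self.verdict not in VERDICTS:
            raise ValueError(f"verdict must be one of {VERDICTS}, got {self.verdict!r}")
        sections = [
            ("Dataset identity", self.dataset_identity),
            ("Data reality check", self.data_reality),
            ("Split integrity", self.split_integrity),
            ("Label / target health", self.label_health),
            ("Leakage risk", self.leakage_risk),
            ("Mock-data disclosure", self.mock_disclosure),
            ("Verdict", self.verdict),
            ("Exact next step", self.next_step),
        ]
        lines = ["# Data Validation", ""]
        for title, body in sections:
            lines += [f"## {title}", "", body, ""]
        return "\n".join(lines)


def write_report(report: ValidationReport, workspace: Path) -> Path:
    """Write data_validation.md to the workspace root and return its path."""
    path = workspace / "data_validation.md"
    path.write_text(report.to_markdown(), encoding="utf-8")
    return path
```

Rejecting unknown verdict strings keeps the report limited to the three allowed outcomes.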

Rules

Keep data quality separate from model quality.
Never infer that data is real if the files or loading path are missing.
If mock data is used, call it out explicitly.
If data leakage is plausible, treat it as blocking until clarified.
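
As a rough illustration, the rules can be turned into a verdict function. Only the leakage branch follows directly from the text above; mapping missing files or undeclared mock data to NEEDS_REVISION is an assumption of this sketch, and `decide_verdict` is a hypothetical name.

```python
def decide_verdict(
    *,
    files_found: bool,
    leakage_plausible: bool,
    mock_used: bool,
    mock_declared: bool,
) -> str:
    """Hypothetical mapping from audit findings to a verdict string."""
    if leakage_plausible:
        # From the rules: plausible leakage is blocking until clarified.
        return "BLOCKED"
    if not files_found:
        # Assumption: data that cannot be verified as real needs revision.
        return "NEEDS_REVISION"
    if mock_used and not mock_declared:
        # Assumption: undeclared mock data needs revision.
        return "NEEDS_REVISION"
    return "PASS"
```

Note that no argument describes model performance, which keeps the verdict about data quality only.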

related-skills.json

same repository

algorithm-selection.md

from "tsingyuai/scientify"

Use this when the user needs to choose between multiple ML routes after survey but before committing to implementation. Compares candidate approaches, selects one, records rejected routes, and keeps a fallback.

2026-04-03 · 532 stars

baseline-runner.md

from "tsingyuai/scientify"

Use this when the project needs real baseline results before or alongside the main model. Runs classical or literature-aligned baselines under the same protocol and writes a reproducible baseline summary.

2026-04-03 · 532 stars

artifact-review.md

from "tsingyuai/scientify"

Use this when the user wants a draft paper, figure bundle, README, release page, or experiment artifact reviewed before sharing. Checks evidence binding, claim scope, captions, layout clarity, and release readiness.

2026-04-03 · 532 stars

figure-standardize.md

from "tsingyuai/scientify"

Use this when the user wants to improve chart quality, standardize plotting style, regenerate release figures, or add captions/protocol notes. Normalizes fonts, colors, legends, units, and scope notes across Scientify figures.

2026-04-03 · 532 stars

release-layout.md

from "tsingyuai/scientify"

Use this when the user wants to improve README, docs pages, or microsites so a new reader can understand what the project is, how to use it, what artifacts exist, and what the scope boundaries are within one screen.

2026-04-03 · 532 stars

research-experiment.md

from "tsingyuai/scientify"

[Read when prompt contains /research-experiment]

2026-04-03 · 532 stars

package.json

"author": "tsingyuai"

"repository": "tsingyuai/scientify"


$ useful --forSOC

Data Scientists · Computer and Mathematical Occupations · SOC 15-2051 · L4
