research-baseline-builder

Updated: June 19, 2026, 08:29

Translate scientific questions into clear data inputs, outputs, and baseline workflows. Use when Codex needs to help researchers clarify what data goes in, what target or result should come out, how samples/labels/features are defined, and how to solve the resulting data problem through visualization, preprocessing, baseline design, training, evaluation, and reporting.

Source

TashanGKD

TashanGKD/tashanwork

SKILL.md

name	research-baseline-builder
description	Translate scientific questions into clear data inputs, outputs, and baseline workflows. Use when Codex needs to help researchers clarify what data goes in, what target or result should come out, how samples/labels/features are defined, and how to solve the resulting data problem through visualization, preprocessing, baseline design, training, evaluation, and reporting.

Research Baseline Builder

Overview

Use this skill to help a researcher turn a scientific question into a data problem with explicit inputs, outputs, and a baseline SOP. The goal is a defensible first experiment, not a model leaderboard.

Assume the user is a scientist or domain researcher. Do not over-explain their field. Help them make the data contract, baseline path, and evaluation boundary explicit.

The working shape is:

Goal + Data + Data description
-> data-task recommendation
-> framework selection
-> visualization / preprocessing / baseline execution
-> interpretation report
-> check against the original scientific goal

Core Rule

Do not start with models. First ask what goes in, what should come out, and what one sample means.

Always separate:

scientific question: what the researcher wants to know;
input: raw data, features, conditions, interventions, time points, images, text, spectra, tables, or sequences;
output: label, measurement, effect, ranking, cluster, forecast, mechanism claim, or report;
sample unit: what one row/sample/image/event/patient/material/paper represents;
success criterion: what result would answer the scientific question;
baseline: the simplest credible way to solve that data problem.
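The six items above can be held in one small record so that gaps are visible before any modeling starts. The following is a minimal sketch, not part of the skill itself: the `DataContract` class, its `missing_items` method, and the example values are all hypothetical names and data chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class DataContract:
    """Hypothetical container for the six items the Core Rule separates."""

    scientific_question: str
    inputs: list[str]
    output: str
    sample_unit: str
    success_criterion: str
    baseline: str

    def missing_items(self) -> list[str]:
        """Return the names of contract items that are still empty."""
        items = {
            "scientific_question": self.scientific_question,
            "inputs": self.inputs,
            "output": self.output,
            "sample_unit": self.sample_unit,
            "success_criterion": self.success_criterion,
            "baseline": self.baseline,
        }
        return [name for name, value in items.items() if not value]


# Illustrative values only; the field names are assumptions, not real data.
contract = DataContract(
    scientific_question="Does annealing temperature change film conductivity?",
    inputs=["annealing_temperature_c", "film_thickness_nm"],
    output="conductivity_s_per_m",
    sample_unit="",  # not yet agreed: one film, or one measurement on a film?
    success_criterion="",
    baseline="linear regression",
)
print(contract.missing_items())
```

Anything returned by `missing_items()` is a question to put to the researcher before choosing a data task.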

Collaboration Style

Treat domain claims as hypotheses to operationalize, not as material to lecture back.
Ask only for missing information that changes the data task, split, metric, or baseline.
Use the scientist's terms when they are clear; translate them into data roles beside the original wording.
Be direct about unidentifiable causal claims, missing labels, leakage, weak ground truth, and small-sample limits.
Avoid beginner tutorials unless the user asks. Give a research workflow, not a course note.

Workflow

1. Restate the scientific question in one sentence.
2. Clarify the input-output contract:
   - research goal;
   - input data;
   - data description or field meaning;
   - expected output;
   - sample unit;
   - label/outcome/effect;
   - available features;
   - grouping/time/batch/source fields;
   - missing fields and assumptions.
3. Read references/problem-to-data-routing.md and choose the data-task family.
4. Read references/framework-selection.md and recommend the lightest sufficient framework.
5. Create a workspace with scripts/init_research_baseline_workspace.py when files are useful.
6. Write the SOP outputs in order:
   - problem_definition.md
   - data_schema.csv
   - eda_plan.md
   - preprocess_plan.md
   - baseline_plan.md
   - train_eval_plan.md
   - baseline_report.md
7. Only generate code after the input, output, sample unit, split rule, metric, and leakage risks are clear.
8. After results exist, read references/goal-check.md and check whether the output actually answers the original scientific goal. If not, decompose the task, revise the data question, or stop with the missing evidence.

If no data file is available, do not invent columns or write runnable training code. Produce the input-output contract, expected schema, baseline SOP, and the checks needed once data is provided.
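The contents of scripts/init_research_baseline_workspace.py are not shown here, so the following is only a rough sketch of what a workspace initializer might do, assuming it creates the seven SOP files as empty placeholders. The function name `init_workspace` and the directory name are hypothetical; only the file names come from the workflow above.

```python
import tempfile
from pathlib import Path

# File names are taken from the SOP output list; the order is the writing order.
SOP_FILES = [
    "problem_definition.md",
    "data_schema.csv",
    "eda_plan.md",
    "preprocess_plan.md",
    "baseline_plan.md",
    "train_eval_plan.md",
    "baseline_report.md",
]


def init_workspace(root: Path) -> list[Path]:
    """Create empty SOP placeholders under root; never overwrite existing files."""
    root.mkdir(parents=True, exist_ok=True)
    created = []
    for name in SOP_FILES:
        path = root / name
        if not path.exists():
            path.touch()
            created.append(path)
    return created


# Demonstrate in a temporary directory so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp) / "baseline_workspace"  # hypothetical directory name
    first = init_workspace(workspace)
    second = init_workspace(workspace)  # second call finds every file in place
    print(len(first), len(second))
```

Skipping files that already exist keeps a rerun from wiping plans the researcher has already filled in.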

Baseline Ladder

After the input-output contract is clear, prefer this order:

sanity baseline: majority class, mean/median, last value, random, simple rule;
interpretable baseline: linear/logistic regression, Cox model, ARIMA, TF-IDF + linear model, simple statistical test;
strong classical baseline: RandomForest, XGBoost/LightGBM, SVM, mixed effects, propensity/matching, difference-in-differences;
neural or foundation baseline only when data size, modality, and evaluation justify it.

If the scientific question is causal, do not turn it into plain prediction without warning. Ask for intervention/exposure, outcome, confounders, timing, and identification assumptions.
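The first rung of the ladder sets the floor that every later model has to beat. Below is a minimal standard-library sketch of two sanity baselines, majority class for classification and training mean for regression. The function names and toy values are made up for illustration.

```python
from collections import Counter
from statistics import mean


def majority_class_baseline(train_labels):
    """Return a predictor that always outputs the most frequent training label."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    return lambda n: [most_common_label] * n


def mean_baseline(train_targets):
    """Return a predictor that always outputs the training mean."""
    center = mean(train_targets)
    return lambda n: [center] * n


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between truth and prediction."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy classification data, invented for this example.
train_labels = ["control", "control", "control", "treated"]
test_labels = ["control", "treated", "control", "control"]
classifier = majority_class_baseline(train_labels)
floor_accuracy = accuracy(test_labels, classifier(len(test_labels)))

# Toy regression data, invented for this example.
train_y = [1.0, 2.0, 3.0, 6.0]
test_y = [2.0, 4.0]
regressor = mean_baseline(train_y)
floor_mae = mean_absolute_error(test_y, regressor(len(test_y)))

print(floor_accuracy, floor_mae)
```

If an interpretable or classical model cannot clearly beat these floors on the same split, that is worth reporting before moving up the ladder.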

Standard SOP

Use this spine after the scientific question has been translated:

Data visualization: label distribution, missingness, feature distributions, group/time/batch balance, target leakage checks.
Data preprocessing: cleaning, units, outliers, missing values, normalization, encoding, duplicate handling, train/val/test split.
Model building: baseline ladder, feature set, assumptions, implementation package.
Model training: split protocol, seeds, cross-validation, hyperparameter boundary, logging.
Model evaluation: primary metric, secondary metrics, uncertainty, subgroup performance, error slices, calibration when relevant.
Interpretation: translate figures, metrics, and errors back to the scientific question.
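The first SOP step can begin with two numeric checks before any plots are drawn: the label distribution and the per-column missingness. The sketch below assumes rows are plain dictionaries with `None` marking a missing value. The column names and values are hypothetical.

```python
from collections import Counter


def label_distribution(rows, label_key):
    """Return the share of each label among rows that have a label."""
    counts = Counter(
        row[label_key] for row in rows if row.get(label_key) is not None
    )
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def missing_fraction(rows, columns):
    """Return the fraction of rows where each column is missing (None)."""
    return {
        col: sum(row.get(col) is None for row in rows) / len(rows)
        for col in columns
    }


# Toy rows, invented for illustration; None marks a missing measurement.
rows = [
    {"ph": 7.1, "temp_c": 25.0, "label": "stable"},
    {"ph": None, "temp_c": 30.0, "label": "stable"},
    {"ph": 6.8, "temp_c": None, "label": "unstable"},
    {"ph": None, "temp_c": 28.0, "label": "stable"},
]

labels = label_distribution(rows, "label")
missing = missing_fraction(rows, ["ph", "temp_c", "label"])
print(labels)
print(missing)
```

A skewed label share points toward imbalance-aware metrics, and a column that is mostly missing needs a decision in the preprocessing plan before it enters the feature set.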

Guardrails

Do not recommend random split when samples share patient, material, paper, lab, batch, site, time window, or source identity.
Do not use post-outcome variables as features.
Do not use row IDs, object IDs, filenames, source IDs, or database keys as predictive features unless the scientific question explicitly justifies them.
Do not optimize only accuracy under imbalance.
Do not call correlation an effect.
Do not hide unavailable labels, small sample size, weak ground truth, or annotation noise.
Do not overbuild. A clear baseline beats an impressive but uncheckable model.
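The first guardrail can be made concrete with a group-aware split, in which every sample that shares an identity (patient, batch, site, and so on) lands on the same side. This is a minimal standard-library sketch: the function name, the `patient_id` field, and the toy rows are assumptions for illustration. Libraries such as scikit-learn provide tested equivalents, for example `GroupShuffleSplit`.

```python
import random


def group_train_test_split(rows, group_key, test_fraction=0.25, seed=0):
    """Split rows so each group is entirely in train or entirely in test.

    test_fraction is a fraction of groups, not of rows.
    """
    groups = sorted({row[group_key] for row in rows})
    rng = random.Random(seed)  # local generator keeps the split reproducible
    rng.shuffle(groups)
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train = [row for row in rows if row[group_key] not in test_groups]
    test = [row for row in rows if row[group_key] in test_groups]
    return train, test


# Toy data: four hypothetical patients with two measurements each.
rows = [
    {"patient_id": pid, "visit": visit, "value": 10 * pid + visit}
    for pid in (1, 2, 3, 4)
    for visit in (1, 2)
]

train, test = group_train_test_split(rows, "patient_id")
train_ids = {row["patient_id"] for row in train}
test_ids = {row["patient_id"] for row in test}
print(sorted(train_ids), sorted(test_ids))
```

A plain random split over these eight rows would usually place visits from the same patient on both sides, which lets a model score well by recognizing the patient rather than the signal.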

Output Contract

Report:

scientific question;
input-output contract;
recommended data task and framework;
data-task family and why;
dataset schema and missing fields;
split rule and leakage risks;
baseline ladder;
preprocessing and visualization checklist;
training/evaluation plan;
minimum files or scripts generated;
what must be confirmed before stronger modeling.
