run-pipeline
Run the full data science pipeline: validate raw data, preprocess, engineer features, train model, and evaluate. Use this when you want to execute the end-to-end ML pipeline or re-run it after data or code changes.
Load the latest model checkpoint, run evaluation on the test set, and generate a metrics report with confusion matrix. Use this after training to assess model performance or to re-evaluate a specific checkpoint.
Generate a comprehensive summary report of the latest experiment including metrics, plots, and comparison with baseline. Use this after training and evaluation to create a shareable experiment summary.
Run API integration tests against the running backend, verify endpoints return expected responses and status codes. Use after deploying a preview or starting the dev server.
Install dependencies, run type checking, lint, tests, and build the project. Use after making code changes to verify nothing is broken.
Build Docker images and launch a local preview environment with docker-compose. Use to test the full stack locally before merging.
Build the Xcode project and run the full test suite. Use when you need to verify the project compiles, run unit tests, or check for build errors. Reports pass/fail results with detailed error output.
| Field | Value |
|---|---|
| name | run-pipeline |
| description | Run the full data science pipeline: validate raw data, preprocess, engineer features, train model, and evaluate. Use this when you want to execute the end-to-end ML pipeline or re-run it after data or code changes. |
| user-invocable | true |
| context | fork |
| allowed-tools | Bash, Read, Grep |
| argument-hint | [config-file] e.g. configs/experiment.yaml |
You are executing the full data science pipeline for this project. Run each stage sequentially, verifying success before proceeding to the next stage. Stop immediately if any stage fails and report the error clearly.
Current branch: !`git branch --show-current`
Data directory contents: !`ls data/ 2>/dev/null || echo "No data/ directory found"`
Available configs: !`ls configs/*.yaml 2>/dev/null || ls configs/*.toml 2>/dev/null || echo "No config files found"`
Python environment: !`which python3 && python3 --version 2>/dev/null || echo "Python not found"`
Recent changes: !`git diff --stat HEAD~3 2>/dev/null || echo "No recent commits"`
If the user provided a config file as an argument, use it: $ARGUMENTS
Otherwise, look for the default config at configs/experiment.yaml or configs/experiment.toml. In the commands below, $CONFIG_FILE stands for whichever config path was selected.
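The config lookup order can be sketched in Python as follows. This sketch rests on a few assumptions. `resolve_config` is a hypothetical helper. The `arguments` parameter stands in for the raw $ARGUMENTS string, and only its first whitespace-separated token is treated as the config path.

```python
from __future__ import annotations

from pathlib import Path

# Default config locations named in the skill, checked in this order.
DEFAULT_CONFIGS = ("configs/experiment.yaml", "configs/experiment.toml")


def resolve_config(arguments: str, root: Path = Path(".")) -> Path:
    """Hypothetical helper: pick the path to use as $CONFIG_FILE.

    Assumption: the first whitespace-separated token of $ARGUMENTS is the
    config path; an empty argument string means "use the defaults".
    """
    parts = arguments.split()
    if parts:
        candidate = root / parts[0]
        if not candidate.is_file():
            raise FileNotFoundError(f"Config file not found: {candidate}")
        return candidate
    # No argument given: fall back to the first default that exists.
    for default in DEFAULT_CONFIGS:
        candidate = root / default
        if candidate.is_file():
            return candidate
    raise FileNotFoundError("No config argument given and no default config found")
```

An explicit argument that points to a missing file raises an error rather than silently falling back. This matches the skill's instruction to stop and report errors clearly.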
Execute each stage in order. After each stage, check for errors and verify outputs exist before proceeding.
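The run-in-order, stop-on-first-failure rule can be sketched as a small runner. `run_stages` is a hypothetical name. Each stage is assumed to be a `(name, argv)` pair, and success is assumed to mean exit code 0. In practice, each argv would be one of the stage commands in this skill.

```python
from __future__ import annotations

import subprocess
import sys
from typing import Sequence


def run_stages(stages: Sequence[tuple[str, list[str]]]) -> list[str]:
    """Hypothetical runner: execute (name, argv) stages in order.

    Stops at the first stage that exits non-zero (or cannot start) and
    returns the names of the stages that completed successfully.
    """
    completed: list[str] = []
    for name, argv in stages:
        try:
            result = subprocess.run(argv, capture_output=True, text=True)
        except FileNotFoundError as exc:
            # The executable itself is missing; treat it as a failed stage.
            print(f"Stage '{name}' could not start: {exc}", file=sys.stderr)
            break
        if result.returncode != 0:
            # Report the failing stage with its stderr, then stop.
            print(f"Stage '{name}' failed (exit {result.returncode}):", file=sys.stderr)
            print(result.stderr, file=sys.stderr)
            break
        completed.append(name)
    return completed
```

For example, a first entry could be `("validate", ["python3", "-m", "src.data.validate", "--data-dir", "data/raw/"])`. The per-stage output checks would still run after each successful stage.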
Verify the Python environment is ready:
python3 -c "import torch; import pandas; import numpy; print(f'PyTorch {torch.__version__}, pandas {pandas.__version__}, NumPy {numpy.__version__}')"
If imports fail, report which packages are missing and suggest running pip install -r requirements.txt.
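The one-line import check fails on the first missing package. The following sketch instead reports every missing package at once, using `importlib.util.find_spec`. `missing_packages` is a hypothetical helper. The package list mirrors the three modules imported in the check above.

```python
from __future__ import annotations

import importlib.util
from typing import Sequence

# Top-level modules the pipeline's environment check imports.
REQUIRED = ("torch", "pandas", "numpy")


def missing_packages(names: Sequence[str] = REQUIRED) -> list[str]:
    """Return the required top-level modules that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
        print("Try: pip install -r requirements.txt")
    else:
        print("All required packages found.")
```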
Run data validation on the raw data:
python3 -m src.data.validate --data-dir data/raw/
If the validation script does not exist, look for alternative patterns:
- python3 src/data/validate.py
- python3 -m pytest tests/test_data/ -v --tb=short
- Look for other validation scripts in src/data/ and report their status
Verify: validation passes with no critical errors. Log any warnings.
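The fallback logic can be sketched as "run the first candidate whose file exists". The candidate table and `first_available` are hypothetical. They pair a path that must exist with the command to run, using the primary command and the pytest alternative for the validation stage.

```python
from __future__ import annotations

from pathlib import Path
from typing import Optional, Sequence

# Hypothetical candidate table for the validation stage:
# (path that must exist, command to run if it does), in priority order.
VALIDATION_CANDIDATES = [
    ("src/data/validate.py", ["python3", "-m", "src.data.validate", "--data-dir", "data/raw/"]),
    ("tests/test_data", ["python3", "-m", "pytest", "tests/test_data/", "-v", "--tb=short"]),
]


def first_available(
    candidates: Sequence[tuple[str, list[str]]], root: Path = Path(".")
) -> Optional[list[str]]:
    """Return the command of the first candidate whose path exists, else None."""
    for required_path, argv in candidates:
        if (root / required_path).exists():
            return argv
    return None
```

A `None` result means no known pattern was found. In that case, the skill asks you to look in src/data/ manually and report what is there.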
Run the preprocessing pipeline:
python3 -m src.data.preprocess --config $CONFIG_FILE
Alternative patterns:
- python3 src/data/preprocess.py --config $CONFIG_FILE
- dvc repro preprocess (if a DVC pipeline is configured)
Verify: processed data files exist in data/processed/ (check for .parquet or .csv files).
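The "processed data files exist" check can be sketched with `pathlib`. `processed_outputs` is a hypothetical helper. It assumes outputs may sit in subdirectories of data/processed/, so it searches recursively.

```python
from __future__ import annotations

from pathlib import Path

PROCESSED_SUFFIXES = {".parquet", ".csv"}


def processed_outputs(processed_dir: Path = Path("data/processed")) -> list[Path]:
    """Return .parquet/.csv files under processed_dir, searched recursively."""
    if not processed_dir.is_dir():
        return []
    return sorted(
        p for p in processed_dir.rglob("*")
        if p.is_file() and p.suffix.lower() in PROCESSED_SUFFIXES
    )
```

An empty list means the preprocessing stage should be treated as failed, even if the command exited with status 0.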
Run feature engineering:
python3 -m src.features.build_features --config $CONFIG_FILE
Alternative patterns:
- python3 src/features/build_features.py
- dvc repro features
Verify: feature files exist in data/features/ with expected columns.
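The "expected columns" check might look like the following for CSV feature files. This sketch has two assumptions. `missing_feature_columns` is a hypothetical helper, and the expected column set would come from your project's config. For Parquet files, you would read the columns with a library such as pandas (`pandas.read_parquet(path).columns`) instead of the `csv` module.

```python
from __future__ import annotations

import csv
from pathlib import Path


def missing_feature_columns(csv_path: Path, expected: set[str]) -> set[str]:
    """Return expected column names absent from the CSV header row.

    An empty file has no header, so every expected column is reported missing.
    """
    with csv_path.open(newline="") as f:
        header = next(csv.reader(f), [])
    return expected - set(header)
```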
Run model training:
python3 -m src.models.training.trainer --config $CONFIG_FILE
Alternative patterns:
- python3 src/models/train.py --config $CONFIG_FILE
- python3 train.py --config $CONFIG_FILE
Monitor output for:
Verify: model checkpoint exists in checkpoints/ directory.
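The checkpoint check can also identify which checkpoint the evaluation stage should load. This sketch makes several assumptions. `latest_checkpoint` is a hypothetical helper. Checkpoints are assumed to use the `.pt` extension, as `best_model.pt` in the evaluation stage suggests. "Latest" is taken to mean most recently modified.

```python
from __future__ import annotations

from pathlib import Path
from typing import Optional


def latest_checkpoint(
    checkpoint_dir: Path = Path("checkpoints"), pattern: str = "*.pt"
) -> Optional[Path]:
    """Return the most recently modified checkpoint file, or None if none exist."""
    if not checkpoint_dir.is_dir():
        return None
    candidates = [p for p in checkpoint_dir.glob(pattern) if p.is_file()]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)
```

If your training code always writes `best_model.pt`, a simple existence check on that one file is enough. The modification-time rule only matters when several checkpoints accumulate.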
Run model evaluation on the test set:
python3 -m src.models.evaluation.evaluate --checkpoint checkpoints/best_model.pt --config $CONFIG_FILE
Alternative patterns:
- python3 src/evaluation/evaluate.py
- python3 evaluate.py --checkpoint checkpoints/best_model.pt
Verify: a metrics JSON file exists in reports/ or experiments/.
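Locating and loading the metrics file for the final summary could be sketched as follows. `find_metrics` is a hypothetical helper. It assumes the newest `*.json` file under reports/ or experiments/ is the metrics file. If your evaluation script uses a fixed name, narrow the `pattern` argument accordingly.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Optional, Sequence


def find_metrics(
    roots: Sequence[Path] = (Path("reports"), Path("experiments")),
    pattern: str = "*.json",
) -> Optional[tuple[Path, dict]]:
    """Return (path, parsed JSON) for the newest matching file, or None."""
    files = [
        p
        for root in roots
        if root.is_dir()
        for p in root.rglob(pattern)
        if p.is_file()
    ]
    if not files:
        return None
    newest = max(files, key=lambda p: p.stat().st_mtime)
    with newest.open() as f:
        return newest, json.load(f)
```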
After all stages complete, produce a summary: