generate-report
Generate a comprehensive summary report of the latest experiment including metrics, plots, and comparison with baseline. Use this after training and evaluation to create a shareable experiment summary.
| Field | Value |
|-------|-------|
| name | generate-report |
| description | Generate a comprehensive summary report of the latest experiment including metrics, plots, and comparison with baseline. Use this after training and evaluation to create a shareable experiment summary. |
| user-invocable | true |
| context | fork |
| allowed-tools | Bash, Read, Grep, Write |
| argument-hint | [experiment-name] e.g. 'transformer-v2-lr-sweep' |
You are generating a comprehensive experiment report for this data science project. Your goal is to gather all available metrics, plots, and configuration details from the latest experiment and produce a clear, well-structured report that can be shared with the team.
- Current branch: !`git branch --show-current`
- Git commit: !`git rev-parse --short HEAD 2>/dev/null || echo "unknown"`
- Recent experiment logs: !`ls -lt reports/*.json experiments/*.json 2>/dev/null | head -5 | grep . || echo "No experiment logs found"`
- Available plots: !`ls reports/figures/*.png reports/figures/*.svg 2>/dev/null | head -10 | grep . || echo "No plots found"`
- Checkpoints: !`ls -lt checkpoints/*.pt checkpoints/*.pth 2>/dev/null | head -3 | grep . || echo "No checkpoints"`
- Config used: !`ls configs/*.yaml configs/*.toml 2>/dev/null | head -3 | grep . || echo "No configs"`
If the user provided an experiment name: $ARGUMENTS
Otherwise, derive one from the branch name or the latest config file, or fall back to the current date.
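The fallback order above can be sketched as a small function. This is only an illustration: `derive_experiment_name` and its parameters are hypothetical names, and treating `main`/`master` as uninformative branch names is an assumption not stated in the skill.

```python
import re
from datetime import date
from typing import Optional


def derive_experiment_name(
    user_arg: str = "",
    branch: str = "",
    config_path: str = "",
    today: Optional[date] = None,
) -> str:
    """Pick an experiment name: user argument, branch, config file, then date."""
    # 1. An explicit name from the user always wins.
    if user_arg.strip():
        return user_arg.strip()
    # 2. Use the branch name unless it is a generic default branch
    #    (assumption: these names say nothing about the experiment).
    if branch and branch not in {"main", "master", "HEAD"}:
        # Replace characters that are awkward in file names, such as "/".
        cleaned = re.sub(r"[^A-Za-z0-9._-]+", "-", branch).strip("-")
        if cleaned:
            return cleaned
    # 3. Use the config file name without its directory or extension.
    if config_path:
        stem = config_path.rsplit("/", 1)[-1].rsplit(".", 1)[0]
        if stem:
            return stem
    # 4. Last resort: a date-stamped name.
    return f"experiment-{(today or date.today()).isoformat()}"
```

For example, `derive_experiment_name("", "feature/lr-sweep")` yields a name usable in file paths, while an empty argument on `main` falls through to the config file or the date.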
Collect all available information about the latest experiment:
- The newest metrics file in `reports/` or `experiments/`
- The newest config file in `configs/`

```bash
# Find and read latest metrics
METRICS_FILE=$(ls -t reports/*.json experiments/*.json 2>/dev/null | head -1)
if [ -n "$METRICS_FILE" ]; then
    echo "=== Latest Metrics ==="
    cat "$METRICS_FILE"
fi

# Find config used
CONFIG_FILE=$(ls -t configs/*.yaml configs/*.toml 2>/dev/null | head -1)
if [ -n "$CONFIG_FILE" ]; then
    echo "=== Configuration ==="
    cat "$CONFIG_FILE"
fi
```
Look for baseline metrics to compare against:

- `reports/baseline_metrics.json` or `experiments/baseline.json`
- Earlier metrics files in git history: `git log --oneline --all -- reports/*.json`

If plots do not already exist, generate them:
```bash
python3 -c "
from pathlib import Path

# Check whether a visualization script exists
viz_script = Path('src/evaluation/visualize.py')
if viz_script.exists():
    print('Visualization script found')
else:
    print('No visualization script found -- will generate basic plots')
"
```
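Key visualizations to include: the training curves and the confusion matrix, each of which has its own section in the report template.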
Generate the report as a Markdown file at reports/experiment_report.md:
```markdown
# Experiment Report: [Experiment Name]

**Date:** [current date]
**Branch:** [git branch]
**Commit:** [git commit hash]
**Author:** [generated by /generate-report skill]

---

## Executive Summary

[2-3 sentences: what was the experiment, what was the key result, and is it better than baseline?]

## Experiment Configuration

| Parameter | Value |
|-----------|-------|
| Model architecture | [from config] |
| Learning rate | [from config] |
| Batch size | [from config] |
| Epochs | [from config] |
| Optimizer | [from config] |
| Scheduler | [from config] |
| Random seed | [from config] |
| Dataset version | [from config or DVC] |

## Dataset Summary

| Split | Samples | Features | Classes |
|-------|---------|----------|---------|
| Train | [count] | [count] | [count or N/A] |
| Validation | [count] | [count] | [count or N/A] |
| Test | [count] | [count] | [count or N/A] |

## Results

### Final Metrics

| Metric | Value |
|--------|-------|
| [metric 1] | [value] |
| [metric 2] | [value] |
| ... | ... |

### Comparison with Baseline

| Metric | Baseline | Current | Delta | Improvement? |
|--------|----------|---------|-------|--------------|
| [metric 1] | [value] | [value] | [+/- value] | [Yes/No] |
| ... | ... | ... | ... | ... |

### Training Curves

[Embed training curve plots from reports/figures/]

### Confusion Matrix

[Embed confusion matrix plot from reports/figures/]

## Analysis

### Key Findings

- [Finding 1: most important result]
- [Finding 2: notable pattern or observation]
- [Finding 3: any concerning behavior]

### Error Analysis

- [What types of errors does the model make?]
- [Are errors concentrated in specific classes or data subsets?]

### Comparison with Previous Experiments

- [How does this compare to previous runs?]
- [What changed and what impact did it have?]

## Recommendations

### Next Steps

1. [Actionable recommendation 1]
2. [Actionable recommendation 2]
3. [Actionable recommendation 3]

### Potential Improvements

- [Idea for model improvement]
- [Idea for data improvement]
- [Idea for training procedure improvement]

## Artifacts

| Artifact | Path |
|----------|------|
| Best checkpoint | checkpoints/best_model.pt |
| Metrics JSON | reports/metrics.json |
| Config file | configs/experiment.yaml |
| Training logs | experiments/[run-id]/ |
| Figures | reports/figures/ |

---

*Report generated automatically by the /generate-report skill.*
```
After writing the report, tell the user the path to the generated report file.
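The "Comparison with Baseline" table in the template can be filled in mechanically once both metrics files are loaded. The sketch below is one possible approach, not part of the skill itself: `comparison_table` is a hypothetical helper, it assumes both metrics files are flat mappings from metric name to number, and the `lower_is_better` default is an assumption about which metrics count as improved when they decrease.

```python
from typing import Dict, Iterable


def comparison_table(
    baseline: Dict[str, float],
    current: Dict[str, float],
    lower_is_better: Iterable[str] = ("loss",),
) -> str:
    """Render the baseline comparison as a Markdown table."""
    lower = set(lower_is_better)
    lines = [
        "| Metric | Baseline | Current | Delta | Improvement? |",
        "|--------|----------|---------|-------|--------------|",
    ]
    # Only metrics present in both runs can be compared.
    for name in sorted(baseline.keys() & current.keys()):
        delta = current[name] - baseline[name]
        # For error-style metrics a negative delta is the improvement.
        improved = delta < 0 if name in lower else delta > 0
        lines.append(
            f"| {name} | {baseline[name]:.4f} | {current[name]:.4f} "
            f"| {delta:+.4f} | {'Yes' if improved else 'No'} |"
        )
    return "\n".join(lines)
```

Metrics that appear in only one of the two files are skipped rather than guessed at, so the table may be shorter than the "Final Metrics" table.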