Use this skill whenever asked to run Inspect AI evaluations, analyze .eval or .json log files, read trajectory data, or work with the Inspect AI Python API. Triggers include mentions of "inspect eval", ".eval files", "eval logs", "trajectory", "EvalLog", "samples", "scores", or references to the inspect_ai Python package. Also use when asked to analyze agent behavior from evaluation runs, extract scores/metrics, retry failed evals, or compare evaluation results programmatically.
Use this skill when asked to fine-tune models on Together AI, create LoRA or DPO training jobs, format training data for Together, upload files to Together, use the Together inference API, set up serverless or dedicated endpoints, run batch inference, or use the Together CLI. Triggers include mentions of "Together AI", "together fine-tune", "together API", "LoRA training on Together", "DPO training", "serverless LoRA", "together batch", or references to the together Python package.
Analyze Weights & Biases runs programmatically. Use when asked to "analyze loss curves", "compare W&B runs", "find best checkpoint", "plot training metrics", "query wandb", or "download wandb artifacts". Covers the Python API for querying runs, analyzing loss patterns, comparing hyperparameters, and working with artifacts.