Skip to main content
Run any Skill in Manus
with one click

agent-platform-eval-flywheel

Measure and improve the quality of AI models and agents on Google Cloud using the Eval Quality Flywheel methodology. Use when evaluating an agent or model, building an eval dataset, picking or writing evaluation metrics, analyzing failures, comparing results before and after a fix, or when guidance is needed on Agent Platform eval methodology — including dataset schema, LLM-as-judge scoring, and common failure causes. For fine-tuning, use agent-platform-tuning. For deployment, use agent-platform-deploy.

Overview

Measure and improve the quality of AI models and agents on Google Cloud using the Eval Quality Flywheel methodology. Use when evaluating an agent or model, building an eval dataset, picking or writing evaluation metrics, analyzing failures, comparing results before and after a fix, or when guidance is needed on Agent Platform eval methodology — including dataset schema, LLM-as-judge scoring, and common failure causes. For fine-tuning, use agent-platform-tuning. For deployment, use agent-platform-deploy.

Install command
npx skills add https://github.com/google/skills --skill agent-platform-eval-flywheel

Copy and paste this command into Claude Code to install the skill

Stars11,105
Forks886
UpdatedJune 2, 2026 at 17:06
File Explorer
10 files
SKILL.md
readonly