ai-evaluation

Use when evaluating AI/LLM systems — benchmark design, automated evaluation pipelines, human evaluation protocols, A/B testing, hallucination detection, factuality checking, bias testing, safety evaluation (red teaming), latency/cost metrics, eval datasets, regression testing for prompts, and model comparison frameworks.

Stars: 2
Forks: 0
Updated: March 8, 2026 at 08:12
SKILL.md