evaluating-llms

Evaluate LLM systems using automated metrics, LLM-as-judge, and benchmarks. Use when testing prompt quality, validating RAG pipelines, measuring safety (hallucinations, bias), or comparing models for production deployment.
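As a rough illustration of what an "automated metric" can look like, here is a minimal sketch of exact match and token-level F1 between a model answer and a reference answer. The function names (`normalize`, `exact_match`, `token_f1`) and the whitespace tokenizer are assumptions made for this example; they are not taken from the skill itself, which may use different metrics or libraries.

```python
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase and split on whitespace (a deliberately simple, assumed tokenizer)."""
    return text.lower().split()


def exact_match(prediction: str, reference: str) -> bool:
    """True when both strings contain the same tokens in the same order."""
    return normalize(prediction) == normalize(reference)


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall, counting repeated tokens."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Two empty answers count as a match; one empty answer scores zero.
        return float(pred == ref)
    # Multiset intersection: each shared token counts at most as often
    # as it appears in both the prediction and the reference.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("Paris", "paris"))
    print(token_f1("the capital is Paris", "Paris"))
```

Overlap metrics like these are cheap and deterministic, but they only reward surface similarity, which is one reason a description like the one above also lists LLM-as-judge and benchmarks alongside them.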

Install command
npx skills add https://github.com/ancoleman/ai-design-components --skill evaluating-llms

Copy and paste this command into Claude Code to install the skill

Stars: 371
Forks: 55
Updated: December 9, 2025 at 21:02