strands-evals

// Use when authoring evaluations with strands-agents-evals. Activates on tasks involving Case/Experiment construction, picking evaluators (Output, Trajectory, Helpfulness, Faithfulness, Coherence, Conciseness, ResponseRelevance, Harmfulness, Refusal, Stereotyping, InstructionFollowing, GoalSuccessRate, ToolSelection/ParameterAccuracy, Multimodal*), trace-based evaluation with mappers (CloudWatch, OpenSearch, OpenInference, LangChain OTel, Strands in-memory), simulators (ActorSimulator, ToolSimulator), failure detection and root-cause analysis (detect_failures, analyze_root_cause, diagnose_session), or auto test-case generation (ExperimentGenerator). Trigger phrases include "evaluate this agent", "score the trajectory", "simulate a user", "diagnose the session", "generate test cases", "LLM-as-a-Judge". Skip for general LLM eval theory unrelated to this package.

stars:131
forks:35
updated: May 29, 2026 19:29
SKILL.md