| Skill | Occupation category | Description | Updated |
| --- | --- | --- | --- |
| evalyn-calibrate | Software Quality Assurance Analysts and Testers | Use when LLM judges need calibration, evaluation metrics seem misaligned with expectations, or annotation and judge tuning is needed | 2026-04-07 |
| evalyn-eval | Software Quality Assurance Analysts and Testers | Use when building evaluation datasets, selecting metrics, or running evaluations on an LLM agent project with evalyn | 2026-04-07 |
| evalyn | Software Quality Assurance Analysts and Testers | Use to evaluate an LLM agent with evalyn. Orchestrates the full pipeline: install, instrument, trace, build dataset, suggest metrics, run eval, analyze, calibrate. | 2026-04-07 |
| evalyn-setup | Software Quality Assurance Analysts and Testers | Use when setting up evalyn evaluation for an LLM agent project, instrumenting agent code, or adding the evalyn decorator | 2026-03-22 |
| evalyn-analyze | Software Quality Assurance Analysts and Testers | Use when analyzing evalyn evaluation results, investigating failures, comparing runs, or understanding agent performance | 2026-03-08 |