evaluating-code-models

Stars: 9,996
Forks: 745
Updated: December 14, 2025 at 00:38

Evaluates code generation models on HumanEval, MBPP, MultiPL-E, and 15+ other benchmarks using pass@k metrics. Use it when benchmarking code models, comparing coding abilities, testing multi-language support, or measuring code generation quality. It is industry-standard tooling from the BigCode Project and is used by Hugging Face leaderboards.
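The description above names pass@k without defining it. As a minimal sketch, assuming the widely used unbiased estimator introduced with HumanEval (generate n samples per problem, count the c that pass the tests, and estimate the chance that at least one of k drawn samples passes), the per-problem computation might look like the following. The function name `pass_at_k` is illustrative and is not taken from the skill's own files.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: total samples generated for the problem
    c: number of those samples that passed the unit tests
    k: sample budget being evaluated (1 <= k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k failing samples: every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # 1 - P(all k drawn samples fail) = 1 - C(n - c, k) / C(n, k)
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(pass_at_k(n=10, c=3, k=1))
    print(pass_at_k(n=10, c=3, k=5))
```

A benchmark-level score is then typically the mean of this value over all problems in the benchmark.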

Installation

Install with Codex or Claude: copy the install prompt, paste it into Codex, Claude, or another assistant, and let it review the skill page and install it for you.

File Explorer
4 files
SKILL.md