$pwd:

evaluating-llms-harness

Name: Evaluating LLMs Harness
Author: OpenRaiser

// Evaluates LLMs across 60+ academic benchmarks (MMLU, HumanEval, GSM8K, TruthfulQA, HellaSwag). Use when benchmarking model quality, comparing models, reporting academic results, or tracking training progress. Industry standard used by EleutherAI, HuggingFace, and major labs. Supports HuggingFace, vLLM, APIs.

$ git log --oneline --stat

stars: 1,403

forks: 100

updated: 2026-05-07 02:44

File explorer

5 files

SKILL.md

readonly

related-skills.json

Same repository

peft-fine-tuning.md

from "OpenRaiser/NanoResearch"

Parameter-efficient fine-tuning for LLMs using LoRA, QLoRA, and 25+ methods. Use when fine-tuning large models (7B-70B) with limited GPU memory, when you need to train <1% of parameters with minimal accuracy loss, or for multi-adapter serving. HuggingFace's official library integrated with transformers ecosystem.

2026-05-07 · 1.4k

unsloth.md

from "OpenRaiser/NanoResearch"

Expert guidance for fast fine-tuning with Unsloth - 2-5x faster training, 50-80% less memory, LoRA/QLoRA optimization

2026-05-07 · 1.4k

nanoresearch-experiment.md

from "OpenRaiser/NanoResearch"

Generate a Python code skeleton from an experiment blueprint

2026-03-05 · 1.4k

package.json

"author": "OpenRaiser"

"repository": "OpenRaiser/NanoResearch"

$ useful --forSOC

Data Scientist · Computer and Mathematical Occupations · 15-2051 · L4

name	evaluating-llms-harness
description	Evaluates LLMs across 60+ academic benchmarks (MMLU, HumanEval, GSM8K, TruthfulQA, HellaSwag). Use when benchmarking model quality, comparing models, reporting academic results, or tracking training progress. Industry standard used by EleutherAI, HuggingFace, and major labs. Supports HuggingFace, vLLM, APIs.
version	1.0.0
author	Orchestra Research
license	MIT
tags	["Evaluation","LM Evaluation Harness","Benchmarking","MMLU","HumanEval","GSM8K","EleutherAI","Model Quality","Academic Benchmarks","Industry Standard"]
dependencies	["lm-eval","transformers","vllm"]

lm-evaluation-harness - LLM Benchmarking

Quick start

Common workflows

Workflow 1: Standard benchmark evaluation

Workflow 2: Track training progress

Workflow 3: Compare multiple models

Workflow 4: Evaluate with vLLM (faster inference)

When to use vs alternatives

Common issues

Advanced topics

Hardware requirements

Resources
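
The outline above lists a standard benchmark workflow and a vLLM-backed workflow, but the skill body itself is not reproduced on this page. As a rough illustration only, the sketch below assembles an `lm_eval` command line (the CLI installed by the `lm-eval` dependency) for either backend. The helper name `build_lm_eval_command`, the model ID `EleutherAI/pythia-160m`, and the `results/` path are placeholders chosen for this example, not values taken from the skill. The helper only builds the argument list and does not run an evaluation.

```python
import shlex


def build_lm_eval_command(model_name, tasks, backend="hf",
                          num_fewshot=None, batch_size="auto",
                          output_path=None):
    """Assemble an argument list for the lm_eval CLI (hypothetical helper).

    backend: "hf" for HuggingFace transformers, "vllm" for vLLM inference.
    """
    if backend not in ("hf", "vllm"):
        raise ValueError(f"unsupported backend: {backend!r}")
    if not tasks:
        raise ValueError("at least one task name is required")

    cmd = [
        "lm_eval",
        "--model", backend,
        "--model_args", f"pretrained={model_name}",
        "--tasks", ",".join(tasks),        # comma-separated task names
        "--batch_size", str(batch_size),
    ]
    # Optional flags are appended only when the caller sets them.
    if num_fewshot is not None:
        cmd += ["--num_fewshot", str(num_fewshot)]
    if output_path is not None:
        cmd += ["--output_path", output_path]
    return cmd


if __name__ == "__main__":
    # Placeholder model and tasks; substitute the model under evaluation.
    cmd = build_lm_eval_command(
        "EleutherAI/pythia-160m",
        ["hellaswag", "gsm8k"],
        num_fewshot=5,
        output_path="results/",
    )
    print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would launch the evaluation, assuming `lm-eval` (and `vllm`, for that backend) is installed. Keeping the command as a list avoids shell-quoting problems when a model path contains spaces.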
