umbrela-verify

Name: Umbrela Verify
Author: castorini

Description: Use when validating umbrela judge outputs. Checks label range (0–3), qid/docid completeness, result_status consistency, backend metadata, and JSONL integrity. Wraps `umbrela validate` plus custom assertions. Use after running judge or evaluate to verify output correctness.

stars: 56

forks: 8

updated: 17 March 2026, 22:15

Umbrela Verify

Validates umbrela judgment outputs for label correctness, completeness, and consistency.

When to Use

After umbrela judge — verify judgment output integrity
After umbrela evaluate — verify modified qrel completeness
Before using judgments for downstream analysis or comparison
When comparing judgments across backends or models

What It Checks

JSONL Integrity

Every line is valid JSON
No trailing commas, no truncated records
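The two integrity rules above can be sketched in a few lines of Python. This is a minimal illustration, not the contents of scripts/verify.sh: the function name `check_jsonl_lines` is hypothetical, and skipping blank lines is an assumption rather than documented behaviour.

```python
import json

def check_jsonl_lines(lines):
    """Return (records, errors): parsed objects and (line_number, message) pairs."""
    records, errors = [], []
    for lineno, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # assumption: blank lines are skipped rather than flagged
        try:
            obj = json.loads(text)
        except json.JSONDecodeError as exc:
            # Trailing commas and truncated records both end up here
            errors.append((lineno, exc.msg))
            continue
        if not isinstance(obj, dict):
            errors.append((lineno, "record is not a JSON object"))
            continue
        records.append(obj)
    return records, errors

sample = [
    '{"query": "q", "judgment": 2}',
    '{"query": "q", "judgment": 2,}',  # trailing comma
    '{"query": "q", "judgm',           # truncated record
]
records, errors = check_jsonl_lines(sample)
print(len(records), [lineno for lineno, _ in errors])
```

Python's `json` module rejects trailing commas by default, so no extra check is needed for that case.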

Judge Output

Every record has model, query, passage, judgment, and result_status
judgment is an integer in the range 0–3
result_status is 0 or 1 (1 = successfully parsed, 0 = fallback)
No duplicate query-passage pairs
All records use the same model (consistency check)
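As a rough sketch of the judge-output checks listed above, assuming the records have already been parsed into dictionaries. The helper names `check_judge_records` and `_is_int` are hypothetical, and the real script may report problems differently.

```python
import json

REQUIRED_FIELDS = ("model", "query", "passage", "judgment", "result_status")

def _is_int(value):
    # bool is a subclass of int in Python, so rule it out explicitly
    return type(value) is int

def check_judge_records(records):
    """Return a list of human-readable problems found in judge records."""
    problems = []
    seen_pairs = set()
    models = set()
    for index, record in enumerate(records, start=1):
        missing = [name for name in REQUIRED_FIELDS if name not in record]
        if missing:
            problems.append(f"record {index}: missing {', '.join(missing)}")
            continue
        judgment = record["judgment"]
        if not (_is_int(judgment) and 0 <= judgment <= 3):
            problems.append(f"record {index}: judgment {judgment!r} is not an integer in 0-3")
        status = record["result_status"]
        if not (_is_int(status) and status in (0, 1)):
            problems.append(f"record {index}: result_status {status!r} is not 0 or 1")
        # Serialise the pair so the key is hashable whatever the field types are
        pair = json.dumps([record["query"], record["passage"]], sort_keys=True)
        if pair in seen_pairs:
            problems.append(f"record {index}: duplicate query-passage pair")
        seen_pairs.add(pair)
        models.add(json.dumps(record["model"], sort_keys=True))
    if len(models) > 1:
        problems.append(f"{len(models)} distinct model values found; expected 1")
    return problems

records = [
    {"model": "m1", "query": "q1", "passage": "p1", "judgment": 2, "result_status": 1},
    {"model": "m1", "query": "q1", "passage": "p1", "judgment": 5, "result_status": 1},
    {"model": "m2", "query": "q2", "passage": "p2", "judgment": 0},
]
for problem in check_judge_records(records):
    print(problem)
```

Records with missing fields are reported once and skipped, so they do not also trigger range or duplicate errors.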

Parse Success Rate

Reports the fraction of records with result_status == 1
Warns if parse failure rate exceeds 10%
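The rate and the 10% threshold might be computed along these lines. The function name `parse_success_rate` and its return shape are illustrative assumptions; only the threshold and the meaning of result_status come from the description above.

```python
def parse_success_rate(records, max_failure_rate=0.10):
    """Return (success_rate, warn) for a list of judge records.

    success_rate is None when there are no records.
    warn is True only when the failure rate strictly exceeds the threshold.
    """
    total = len(records)
    if total == 0:
        return None, False
    parsed = sum(1 for record in records if record.get("result_status") == 1)
    failed = total - parsed
    # Compare counts rather than ratios to limit floating-point surprises
    warn = failed > max_failure_rate * total
    return parsed / total, warn

# Hypothetical sample: 8 parsed records and 2 fallbacks
sample = [{"result_status": 1}] * 8 + [{"result_status": 0}] * 2
rate, warn = parse_success_rate(sample)
print(f"parse success rate: {rate:.0%}, warn: {warn}")
```

A failure rate of exactly 10% does not warn in this sketch, matching the "exceeds 10%" wording.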

Modified Qrel (evaluate output)

File exists in modified_qrels/ directory
Standard TREC qrel format (qid Q0 docid label)
Labels are integers in expected range
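A minimal sketch of the qrel line checks, assuming whitespace-separated columns. The function name `check_qrel_lines` is hypothetical, and the default 0–3 label range is taken from the label range stated for judgments, so adjust it if your qrels use a different scale.

```python
def check_qrel_lines(lines, min_label=0, max_label=3):
    """Return (line_number, message) pairs for malformed qrel lines."""
    problems = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored
        parts = line.split()
        if len(parts) != 4:
            problems.append((lineno, f"expected 4 fields, got {len(parts)}"))
            continue
        # The second column is not validated here; it is carried through as-is
        qid, _iteration, docid, label = parts
        try:
            value = int(label)
        except ValueError:
            problems.append((lineno, f"label {label!r} is not an integer"))
            continue
        if not min_label <= value <= max_label:
            problems.append((lineno, f"label {value} outside {min_label}-{max_label}"))
    return problems

# Hypothetical qids and docids, for illustration only
sample = [
    "q1 Q0 doc1 2",
    "q1 Q0 doc2",
    "q2 Q0 doc3 high",
    "q2 Q0 doc4 7",
]
for lineno, message in check_qrel_lines(sample):
    print(lineno, message)
```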

Usage

Run the verification script:

bash .claude/skills/umbrela-verify/scripts/verify.sh <artifact-path> [judge|qrel]

Or use the built-in validator first:

umbrela validate judge --input-file pairs.jsonl
umbrela validate evaluate --qrel dl19-passage --result-file run.trec

Verification Script

See scripts/verify.sh for the runnable verification wrapper.

Gotchas

umbrela validate checks input contracts. The verify script checks output artifacts.
result_status == 0 means the LLM response couldn't be parsed into a 0–3 label, so the judgment falls back to 0. A high rate of result_status == 0 suggests prompt issues or model incompatibility.
The prediction field contains raw LLM text. The label is extracted by 39 regex patterns in common_utils.py. Inspect the raw predictions if the judgment distribution looks anomalous.
Ensemble evaluate produces one qrel file — constituent backend outputs are not saved individually.
Modified qrel naming: {qrel}_{model}_{judge_cat}{few_shot}_{num_sample}.txt — verify the filename matches expectations.
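To check the filename gotcha, one option is to rebuild the expected name from the documented template and look for it under modified_qrels/. This is only a sketch: both function names are hypothetical, and the sample values for model, judge_cat, few_shot, and num_sample are placeholders, since their real formats are not specified here.

```python
from pathlib import Path

def expected_qrel_name(qrel, model, judge_cat, few_shot, num_sample):
    """Fill the documented template {qrel}_{model}_{judge_cat}{few_shot}_{num_sample}.txt."""
    return f"{qrel}_{model}_{judge_cat}{few_shot}_{num_sample}.txt"

def qrel_file_exists(root, name):
    """Check for the file inside the modified_qrels/ directory under root."""
    return (Path(root) / "modified_qrels" / name).is_file()

# Placeholder values, not real umbrela settings
name = expected_qrel_name("dl19-passage", "example-model", "cat", "fs", 1)
print(name)
print(qrel_file_exists(".", name))
```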

related-skills.json

Same repository

umbrela-eval.md

from "castorini/umbrela"

Use when analyzing umbrela evaluation results — comparing nDCG@10 scores across backends, interpreting confusion matrices, computing kappa agreement, and comparing modified qrels against human judgments. Use after running evaluate to interpret results.

2026-03-25 (56 stars)

umbrela-install.md

from "castorini/umbrela"

Set up an umbrela development environment — checks Python 3.11+, installs via uv or pip with cloud extras, and verifies with doctor. Use when someone is onboarding, setting up a fresh clone, or troubleshooting their environment.

2026-03-18 (56 stars)

umbrela-quickstart.md

from "castorini/umbrela"

Use when working with umbrela CLI commands (judge, evaluate), backend selection (gpt, gemini, hf, os, ensemble), qrel handling (dl19-passage, dl20-passage, etc.), relevance labels (0–3), or introspection (doctor, describe, schema, validate). Covers all entry points, flags, and evaluation workflows.

2026-03-17 (56 stars)

package.json

"author": "castorini"

"repository": "castorini/umbrela"
