
investigate-dataset

Stars: 551

Forks: 358

Updated: March 17, 2026, 00:53

Investigate datasets from HuggingFace, CSV, or JSON files to understand their structure, fields, and data quality. Trigger whenever you need to explore or inspect a dataset yourself without using pre-written scripts.


Source

UKGovernmentBEIS

UKGovernmentBEIS/inspect_evals


Investigate Dataset

This workflow helps you explore and understand datasets used in evaluations. It covers HuggingFace datasets, CSV files, and JSON/JSONL files.

Key Concepts

For detailed information on Inspect's dataset types (datasets.Dataset vs inspect_ai.dataset.Dataset), the hf_dataset() pipeline, caching behaviour, and test utilities, see references/inspect-dataset-patterns.md.

Common Patterns in Evals

Evals typically define:

DATASET_PATH: HuggingFace repo path (e.g., "qiaojin/PubMedQA")
DATASET_REVISION: Optional git revision/tag for reproducibility
record_to_sample(): Function converting raw records to Sample objects

Prerequisites

Access to the evaluation code to find dataset configuration
Python environment with datasets, pandas, and inspect_ai installed
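A quick stdlib check that the three packages are importable from the interpreter you will use (for example via uv run python). The helper name missing_packages is illustrative; it relies only on importlib.util.find_spec, which returns None for a top-level module that cannot be found.

```python
import importlib.util

# Packages this workflow expects to be importable.
REQUIRED = ("datasets", "pandas", "inspect_ai")


def missing_packages(names=REQUIRED):
    """Return the names from `names` that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("missing:", missing or "none")
```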

Steps

1. Identify the Dataset Source

Look for these patterns in the evaluation code:

# HuggingFace dataset
DATASET_PATH = "org/dataset-name"
DATASET_REVISION = "v1.0"  # optional
hf_dataset(path=DATASET_PATH, name="subset", split="train", ...)

# CSV dataset
csv_dataset("path/to/file.csv", ...)
load_csv_dataset("https://example.com/file.csv", eval_name="myeval", ...)

# JSON/JSONL dataset
json_dataset("path/to/file.json", ...)
load_json_dataset("https://example.com/file.jsonl", eval_name="myeval", ...)
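If the eval spans several files, a small regex scan can list where these patterns occur. This is a rough stdlib sketch, not a parser: the regexes assume the call shapes shown above, and for hf_dataset they capture the first argument expression (a string literal or a constant name such as DATASET_PATH) rather than resolving it.

```python
import re
from pathlib import Path

# One regex per pattern above; each captures the string literal or argument expression.
PATTERNS = {
    "DATASET_PATH": re.compile(r'^\s*DATASET_PATH\s*=\s*["\']([^"\']+)["\']', re.M),
    "DATASET_REVISION": re.compile(r'^\s*DATASET_REVISION\s*=\s*["\']([^"\']+)["\']', re.M),
    "hf_dataset": re.compile(r"\bhf_dataset\(\s*(?:path\s*=\s*)?([^,)\s]+)"),
    "csv": re.compile(r'\b(?:load_)?csv_dataset\(\s*["\']([^"\']+)["\']'),
    "json": re.compile(r'\b(?:load_)?json_dataset\(\s*["\']([^"\']+)["\']'),
}


def find_dataset_sources(source: str) -> dict[str, list[str]]:
    """Return every capture for each pattern found in one Python source string."""
    found = {}
    for name, pattern in PATTERNS.items():
        matches = pattern.findall(source)
        if matches:
            found[name] = matches
    return found


def scan_eval_dir(eval_dir: str) -> dict[str, dict[str, list[str]]]:
    """Scan every .py file under eval_dir and report only the files with matches."""
    results = {}
    for path in Path(eval_dir).rglob("*.py"):
        found = find_dataset_sources(path.read_text(encoding="utf-8"))
        if found:
            results[str(path)] = found
    return results
```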

2. Load the Raw Dataset

For investigation, load the raw data directly (not through Inspect's sample_fields transformation). Use standard datasets.load_dataset() for HuggingFace, pd.read_csv() for CSV, or pd.read_json() for JSON/JSONL. For gated datasets, ensure HF_TOKEN is set or run huggingface-cli login.
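A minimal sketch of the raw loaders. The helper names load_raw_hf and load_raw_table are illustrative; inside them are the standard datasets.load_dataset call (with its name, split, and revision arguments) and the pandas readers named above. Note that JSONL needs lines=True, since each line is a separate JSON object.

```python
import io

import pandas as pd


def load_raw_hf(path, name=None, split="train", revision=None):
    """Load one HuggingFace split directly, bypassing Inspect's sample_fields."""
    from datasets import load_dataset  # imported lazily so the pandas path works alone

    return load_dataset(path, name=name, split=split, revision=revision)


def load_raw_table(source, kind):
    """Load a CSV, JSON array, or JSONL source (path, URL, or file-like) into pandas."""
    if kind == "csv":
        return pd.read_csv(source)
    if kind == "json":
        return pd.read_json(source)
    if kind == "jsonl":
        # JSONL holds one object per line, so lines=True is required.
        return pd.read_json(source, lines=True)
    raise ValueError(f"unknown kind: {kind!r}")


# Usage with small in-memory files:
csv_df = load_raw_table(io.StringIO("id,question\n1,What?\n2,Why?\n"), "csv")
jsonl_df = load_raw_table(io.StringIO('{"id": 1, "q": "a"}\n{"id": 2, "q": "b"}\n'), "jsonl")
```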

3. Explore Structure and Quality

Use standard pandas/datasets methods to explore:

Schema: ds.features (HF) or df.dtypes (pandas)
Shape: len(ds), ds.column_names (HF) or df.info(), df.columns (pandas)
Sample data: ds[:3] (HF) or df.head() (pandas)
Missing values: Check for None, empty strings, empty lists
Duplicates: Check ID uniqueness if an ID field exists
Value distributions: value_counts() for categorical columns, length stats for text fields
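The checks above can be collected into one pandas pass. This is a sketch: the function names, the definition of "empty", and the choice of min/mean/max for length stats are assumptions. For a HF dataset, call ds.to_pandas() first.

```python
import numpy as np
import pandas as pd


def is_empty(value):
    """True for None/NaN, blank strings, and empty containers."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip() == ""
    # Sequence columns may arrive as lists or, after to_pandas(), as numpy arrays.
    if isinstance(value, (list, tuple, dict, np.ndarray)):
        return len(value) == 0
    try:
        return bool(pd.isna(value))
    except (TypeError, ValueError):
        return False


def profile(df, id_col=None, categorical=(), text=()):
    """Collect schema, missing-value, distribution, and duplicate-ID checks for df."""
    report = {
        "shape": df.shape,
        "dtypes": df.dtypes.astype(str).to_dict(),
        "empty_counts": {col: int(df[col].map(is_empty).sum()) for col in df.columns},
        "value_counts": {col: df[col].value_counts(dropna=False).to_dict() for col in categorical},
        "length_stats": {
            col: df[col].dropna().astype(str).str.len().describe()[["min", "mean", "max"]].to_dict()
            for col in text
        },
    }
    if id_col is not None:
        ids = df[id_col]
        # keep=False marks every copy of a repeated ID, not just the later ones.
        report["duplicate_ids"] = sorted(ids[ids.duplicated(keep=False)].unique().tolist())
    return report
```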

For converting an Inspect Dataset (which has no .to_pandas()) to a DataFrame, see references/inspect-dataset-patterns.md.
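One possible approach, which may differ from the one in the reference file: an Inspect Dataset can be iterated, and each Sample exposes id, input, target, choices, and metadata, so a row-flattening loop is enough. The meta_ column prefix is an arbitrary choice, and chat-message inputs are stored as their repr.

```python
import pandas as pd


def samples_to_dataframe(samples):
    """Flatten an iterable of Inspect Samples into one DataFrame row per sample."""
    rows = []
    for sample in samples:
        row = {
            "id": sample.id,
            # input is either a plain string or a list of chat messages
            "input": sample.input if isinstance(sample.input, str) else repr(sample.input),
            "target": sample.target,
            "choices": sample.choices,
        }
        # Prefix metadata keys so they cannot collide with the core columns.
        for key, value in (sample.metadata or {}).items():
            row[f"meta_{key}"] = value
        rows.append(row)
    return pd.DataFrame(rows)
```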

4. Understand the Sample Conversion

Look at the record_to_sample function to understand how raw data maps to Inspect samples. Key questions:

Which fields become input? Are they combined/formatted?
What is the target format? (letter, text, JSON, etc.)
Are there choices for multiple choice?
What goes into metadata?
Are any records filtered out?
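To answer these questions empirically, you can run the eval's record_to_sample over the raw records and tally the results. This sketch assumes two things: that a record is filtered out when record_to_sample returns an empty list (Inspect accepts either a single Sample or a list), and that multiple-choice targets are letters. The function name audit_conversion is hypothetical.

```python
import string
from collections import Counter


def audit_conversion(records, record_to_sample):
    """Apply record_to_sample to every raw record and summarise the output."""
    records = list(records)
    produced, dropped = [], 0
    for record in records:
        result = record_to_sample(record)
        samples = result if isinstance(result, list) else [result]
        if not samples:
            dropped += 1  # an empty list yields no samples for this record
        produced.extend(samples)

    invalid_targets = []
    for sample in produced:
        if sample.choices:
            # Assumed convention: targets are letters A, B, ... within the choice range.
            valid = set(string.ascii_uppercase[: len(sample.choices)])
            targets = sample.target if isinstance(sample.target, list) else [sample.target]
            if not set(targets) <= valid:
                invalid_targets.append(sample.id)

    return {
        "records": len(records),
        "samples": len(produced),
        "dropped_records": dropped,
        "input_types": Counter(type(s.input).__name__ for s in produced),
        "invalid_choice_targets": invalid_targets,
    }
```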

5. Test the Inspect Loading Pipeline

See references/inspect-dataset-patterns.md for the pattern to load through Inspect's hf_dataset() and verify sample conversion works correctly.
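As a rough complement to the reference pattern, a sketch that loads a few samples through inspect_ai.dataset.hf_dataset (using its split, name, revision, sample_fields, and limit arguments) and then sanity-checks them. Evals in this repo may import the hf_dataset wrapper from inspect_evals.utils.huggingface instead; use whichever the eval uses. The helper names and the specific checks are illustrative, and an empty target can be legitimate for some scorers.

```python
from collections import Counter


def load_through_inspect(path, split, record_to_sample, name=None, revision=None, limit=None):
    """Load via Inspect so the eval's sample conversion runs exactly as in the eval."""
    from inspect_ai.dataset import hf_dataset  # imported lazily; needs network access

    return hf_dataset(path, split=split, name=name, revision=revision,
                      sample_fields=record_to_sample, limit=limit)


def check_samples(samples, expected_count=None):
    """Return a list of human-readable problems found in converted samples."""
    samples = list(samples)
    problems = []
    if expected_count is not None and len(samples) != expected_count:
        problems.append(f"expected {expected_count} samples, got {len(samples)}")
    id_counts = Counter(s.id for s in samples if s.id is not None)
    dupes = sorted(str(i) for i, n in id_counts.items() if n > 1)
    if dupes:
        problems.append(f"duplicate ids: {dupes}")
    if any(s.id is None for s in samples):
        problems.append("some samples have no id")
    empty_input = sum(1 for s in samples if not s.input)
    if empty_input:
        problems.append(f"samples with empty input: {empty_input}")
    empty_target = sum(1 for s in samples if not s.target)
    if empty_target:
        problems.append(f"samples with empty target: {empty_target}")
    return problems
```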

Quick Reference Commands

# View HF dataset info without downloading the data files
uv run python -c "from datasets import load_dataset_builder; b = load_dataset_builder('org/name'); print(b.info)"

# List available configs/subsets
uv run python -c "from datasets import get_dataset_config_names; print(get_dataset_config_names('org/name'))"

# List available splits (reads metadata only; add config_name='subset' for multi-config datasets)
uv run python -c "from datasets import get_dataset_split_names; print(get_dataset_split_names('org/name'))"

Caching and Troubleshooting

For cache locations (HuggingFace native, Inspect AI, Inspect Evals), force re-download commands, and test utilities, see references/inspect-dataset-patterns.md.

Gated dataset: Run huggingface-cli login or set HF_TOKEN
Rate limited: The hf_dataset wrapper in inspect_evals.utils.huggingface has built-in retry with backoff
Large dataset: Use streaming=True or split="train[:1000]" for sampling
Missing revision: Check the dataset's "Files and versions" tab on HuggingFace
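The streaming and split-slicing options above apply to HuggingFace datasets. For a large local CSV or JSONL file, a stdlib sketch that reads only the first few records avoids loading the whole file. The helper names peek_jsonl and peek_csv are illustrative.

```python
import csv
import json
from itertools import islice


def peek_jsonl(path, n=5):
    """Parse only the first n non-blank lines of a JSONL file."""
    with open(path, encoding="utf-8") as f:
        lines = (line for line in f if line.strip())
        return [json.loads(line) for line in islice(lines, n)]


def peek_csv(path, n=5):
    """Read only the first n data rows of a CSV file as dicts keyed by header."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(islice(csv.DictReader(f), n))
```

With pandas, pd.read_csv(path, nrows=n) and pd.read_json(path, lines=True, nrows=n) give a similar partial read.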

Other Skills in This Repository

ci-maintenance-workflow

UKGovernmentBEIS/inspect_evals

CI and GitHub Actions maintenance workflows — fix a failing test from a CI URL, fix a failing smoke test, add @pytest.mark.slow markers to slow tests, or review a PR against agent-checkable standards. Use when user asks to fix a failing test, fix a smoke test, mark slow tests, or review a PR. Trigger when the user asks you to run the "Write a PR For A Failing Test", "Fix A Failing Smoke Test", "Mark Slow Tests", or "Review PR According to Agent-Checkable Standards" workflow.

2026-06-19 · 551 stars

prepare-submission-workflow

UKGovernmentBEIS/inspect_evals

Prepare an evaluation for PR submission as an entry to the register. Use when user asks to prepare an eval for submission or finalize a PR. Trigger when the user asks you to run the "Prepare Evaluation For Submission" workflow.

2026-06-11 · 551 stars

eval-validity-review

UKGovernmentBEIS/inspect_evals

Review a single evaluation's validity — whether its claims hold up, whether its name is accurate, whether samples can be both succeeded and failed at, and whether scoring measures ground truth. Use when user asks to check validity of an eval, or as part of the Master Checklist workflow. Do NOT use for code quality or test coverage (use eval-quality-workflow or ensure-test-coverage instead).

2026-06-07 · 551 stars

code-quality-fix-all

UKGovernmentBEIS/inspect_evals

Fix code quality issues identified in a code quality review stored in agent_artefacts/code_quality/<topic>/. Systematically addresses issues found by the code-quality-review-all skill for ANY code quality topic, with validation and testing at each step. Use when user asks to fix issues from a code quality review, or asks to fix issues from agent_artefacts/code_quality/<topic>.

2026-06-04 · 551 stars

eval-report-workflow

UKGovernmentBEIS/inspect_evals

Create an evaluation report for a README by selecting models, estimating costs, running evaluations, and formatting results tables. Use when user asks to make/create/generate an evaluation report. Trigger when the user asks you to run the "Make An Evaluation Report" workflow.

2026-05-24 · 551 stars

create-eval

UKGovernmentBEIS/inspect_evals

Redirect to the inspect-evals-template for creating new evaluations. New evals are no longer created in this repository — they live in standalone repos. Use when user asks to create/implement/build a new evaluation.

2026-05-04 · 551 stars
