一键在 Manus 中运行任何 Skill

dataset-management

星标0

分支0

更新时间2026年2月6日 17:26

Create, manage, and curate Freeplay datasets (prompt datasets and agent datasets). Always confirm with the user before any write operations. Use when the user wants to create a new dataset, add test cases, update datasets, manage dataset content, import test data from CSV or JSONL, create golden sets, or build evaluation datasets. Do NOT use for running tests (use run-test) or analyzing test results (use test-run-analysis).

安装

用 Codex 或 Claude 帮你安装复制这段 Prompt，粘贴到 Codex、Claude 或其他助手里，让它检查 Skill 页面并帮你完成安装。

在 Manus 中运行

来源

freeplayai

freeplayai/freeplay-skills

打开 GitHub 仓库查看创作者相关仓库

下载

在 Manus 中运行

Freeplay Dataset Management

IMPORTANT: Safety Guidelines

Confirmation Required: Always ask for user confirmation before performing any write operations (creating datasets, adding test cases, updating datasets, updating test cases).

No Deletion Operations: Do NOT perform any deletion operations (deleting datasets or test cases). If the user requests deletion, inform them that deletion must be done manually through the Freeplay UI or API directly. This skill does not support deletion actions to prevent accidental data loss.

Manage test datasets for evaluating prompts and agents in Freeplay.

Critical: Two Dataset Types

Freeplay has two distinct dataset types with different API endpoints:

Prompt Datasets - Test individual prompt templates → See prompt-datasets.md
Agent Datasets - Test complete agent workflows → See agent-datasets.md

ALWAYS confirm which type the user needs. The APIs are not interchangeable.

Quick decision guide:

User mentions "prompt", "template", or "component" → Prompt Dataset
User mentions "agent", "workflow", or "end-to-end" → Agent Dataset

Quick Start

Setup (required for all operations)

import requests
import os
from scripts.secrets import SecretString

api_key = SecretString(os.environ.get("FREEPLAY_API_KEY"))
headers = {
    "Authorization": f"Bearer {api_key.get()}",
    "Content-Type": "application/json"
}
project_id = "<project-id>"  # See note below on how to resolve this
base = f"{os.environ.get('FREEPLAY_BASE_URL', 'https://app.freeplay.ai')}/api/v2/projects/{project_id}"

All examples assume this setup. Required environment variables:

FREEPLAY_API_KEY - Your Freeplay API key
FREEPLAY_BASE_URL - API URL (default: https://app.freeplay.ai)

Project ID can come from:

User specification
MCP list_projects() tool to discover available projects

Create a Prompt Dataset

response = requests.post(
    f"{base}/prompt-datasets",
    headers=headers,
    json={
        "name": "my-golden-set",
        "description": "Golden test cases",
        "input_names": ["question", "context"]
    }
)
dataset_id = response.json()["id"]  # Status: 201 Created

Create an Agent Dataset

response = requests.post(
    f"{base}/agent-datasets",
    headers=headers,
    json={
        "name": "my-agent-tests",
        "description": "Agent workflow test cases"
    }
)
dataset_id = response.json()["id"]  # Status: 201 Created

Add Test Cases (both types)

# For prompt datasets: /prompt-datasets/{id}/test-cases/bulk
# For agent datasets: /agent-datasets/{id}/test-cases/bulk

test_cases = [
    {
        "inputs": {"question": "What is your refund policy?"},
        "output": "Expected response...",
        "metadata": {"category": "refunds"}
    }
]

response = requests.post(
    f"{base}/prompt-datasets/{dataset_id}/test-cases/bulk",
    headers=headers,
    json={"data": test_cases}  # Key is "data", not "test_cases"
)
# Status: 201 Created, max 100 items per request

Common Operations

Import from CSV/JSONL

Use the utility script for bulk imports:

python scripts/import_testcases.py \
  --file test_cases.csv \
  --dataset-id ds_abc123 \
  --type prompt  # or "agent"

Handles batching automatically (100 per request). See script help for full options:

python scripts/import_testcases.py --help

CSV format:

Columns starting with inputs. become input fields
output column becomes expected output
Other columns become metadata

Example:

inputs.question,inputs.context,output,category,priority
"What is...","User context","Expected...","refunds","high"

List Datasets

# Prompt datasets
response = requests.get(f"{base}/prompt-datasets", headers=headers)
datasets = response.json().get('data', [])

# Agent datasets
response = requests.get(f"{base}/agent-datasets", headers=headers)
datasets = response.json().get('data', [])

Get Test Cases

# Replace prompt-datasets with agent-datasets for agent datasets
response = requests.get(
    f"{base}/prompt-datasets/{dataset_id}/test-cases",
    headers=headers
)
test_cases = response.json().get('data', [])

Update Test Case

response = requests.patch(
    f"{base}/prompt-datasets/{dataset_id}/test-cases/{test_case_id}",
    headers=headers,
    json={
        "inputs": {"question": "Updated question"},
        "output": "Updated output",
        "metadata": {"updated": True}
    }
)
# Status: 200 OK

Workflow Patterns

For complex operations, use the checklist pattern:

Dataset Creation with Verification:
- [ ] Step 1: Create dataset (verify 201 status)
- [ ] Step 2: Add test cases (verify 201 status)
- [ ] Step 3: Retrieve test cases to confirm count
- [ ] Step 4: Verify test case structure

See examples.md for complete workflows with verification steps.

Reference Documentation

Complete API references:

prompt-datasets.md - All prompt dataset operations
agent-datasets.md - All agent dataset operations
examples.md - Complete workflow examples with verification

Quick links by task:

Creating datasets: Prompt | Agent
Adding test cases: Prompt | Agent
Test case structure: Prompt | Agent
Updating operations: Prompt | Agent

Utility Scripts

scripts/import_testcases.py - Import test cases from CSV or JSONL files

Handles batching automatically (100 per request)
Supports both prompt and agent datasets
Provides progress reporting and error handling

scripts/batch_operations.py - Reusable batch operation functions

Use in custom scripts for programmatic batch operations
Functions: batch_create_test_cases()

Common Errors

Error	Cause	Solution
404 Path Not Found	Wrong endpoint path or dataset type	Verify `/api/v2/` base and correct type (`prompt-datasets` or `agent-datasets`)
Bulk limit exceeded	>100 items in single request	Use `scripts/import_testcases.py` or batch manually
Unauthorized	Invalid or missing API key	Check `FREEPLAY_API_KEY` environment variable
Incompatible test case	Input keys don't match prompt variables	Verify `inputs` keys match prompt template variables

Best Practices

Verify operations: Always check status codes and counts after operations
Use descriptive names: Dataset names should indicate purpose (e.g., "refunds-edge-cases")
Include metadata: Tag test cases for filtering (category, priority, source)
Provide expected outputs: Required for meaningful evaluations
Batch large operations: Use scripts for >10 items
Organize by scenario: Create focused datasets rather than one large dataset
Match input names: For prompt datasets, ensure inputs match template variables

Related Skills

After creating datasets:

Run tests using the run-test skill
Analyze results using the test-run-analysis skill
Check deployment status using the get_deployed_prompt_versions MCP tool

dataset-management

同仓库更多 Skills

同仓库更多 Skills

Freeplay Dataset Management

IMPORTANT: Safety Guidelines

Critical: Two Dataset Types

Quick Start

Setup (required for all operations)

Create a Prompt Dataset

Create an Agent Dataset

Add Test Cases (both types)

Common Operations

Import from CSV/JSONL

List Datasets

Get Test Cases

Update Test Case

Workflow Patterns

Reference Documentation

Utility Scripts

Common Errors

Best Practices

Related Skills

Freeplay Documentation

Freeplay Dataset Management

IMPORTANT: Safety Guidelines

Critical: Two Dataset Types

Quick Start

Setup (required for all operations)

Create a Prompt Dataset

Create an Agent Dataset

Add Test Cases (both types)

Common Operations

Import from CSV/JSONL

List Datasets

Get Test Cases

Update Test Case

Workflow Patterns

Reference Documentation

Utility Scripts

Common Errors

Best Practices

Related Skills

Freeplay Documentation