| name | dataset-management |
| description | Create, manage, and curate Freeplay datasets (prompt datasets and agent datasets). Always confirm with the user before any write operations. Use when the user wants to create a new dataset, add test cases, update datasets, manage dataset content, import test data from CSV or JSONL, create golden sets, or build evaluation datasets. Do NOT use for running tests (use run-test) or analyzing test results (use test-run-analysis). |
Freeplay Dataset Management
IMPORTANT: Safety Guidelines
Confirmation Required: Always ask for user confirmation before performing any write operations (creating datasets, adding test cases, updating datasets, updating test cases).
No Deletion Operations: Do NOT perform any deletion operations (deleting datasets or test cases). If the user requests deletion, inform them that deletion must be done manually through the Freeplay UI or API directly. This skill does not support deletion actions to prevent accidental data loss.
Manage test datasets for evaluating prompts and agents in Freeplay.
Critical: Two Dataset Types
Freeplay has two distinct dataset types with different API endpoints:
- Prompt Datasets - Test individual prompt templates → See prompt-datasets.md
- Agent Datasets - Test complete agent workflows → See agent-datasets.md
ALWAYS confirm which type the user needs. The APIs are not interchangeable.
Quick decision guide:
- User mentions "prompt", "template", or "component" → Prompt Dataset
- User mentions "agent", "workflow", or "end-to-end" → Agent Dataset
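The decision guide above could be sketched as a small keyword check. This is only an illustration: `guess_dataset_type` is a hypothetical helper, not part of Freeplay or this skill's scripts, and an ambiguous or empty match still means asking the user.

```python
from typing import Optional

# Keywords taken from the quick decision guide; the helper itself is hypothetical.
PROMPT_KEYWORDS = ("prompt", "template", "component")
AGENT_KEYWORDS = ("agent", "workflow", "end-to-end")


def guess_dataset_type(request_text: str) -> Optional[str]:
    """Return "prompt", "agent", or None when the request is ambiguous."""
    text = request_text.lower()
    mentions_prompt = any(word in text for word in PROMPT_KEYWORDS)
    mentions_agent = any(word in text for word in AGENT_KEYWORDS)
    if mentions_prompt == mentions_agent:
        # Both or neither matched: do not guess, confirm with the user.
        return None
    return "prompt" if mentions_prompt else "agent"
```

A `None` result is deliberate: when a request mentions both kinds of keyword or neither, the helper declines to choose, so the type is confirmed with the user.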
Quick Start
Setup (required for all operations)
import os
import requests
from scripts.secrets import SecretString
# FREEPLAY_API_KEY must be set in the environment
api_key = SecretString(os.environ.get("FREEPLAY_API_KEY"))
headers = {
    "Authorization": f"Bearer {api_key.get()}",
    "Content-Type": "application/json",
}
project_id = "<project-id>"
# FREEPLAY_BASE_URL is optional and falls back to https://app.freeplay.ai
base = f"{os.environ.get('FREEPLAY_BASE_URL', 'https://app.freeplay.ai')}/api/v2/projects/{project_id}"
All examples assume this setup. Environment variables: FREEPLAY_API_KEY (required) and FREEPLAY_BASE_URL (optional, defaults to https://app.freeplay.ai).
Project ID can come from:
- User specification
- MCP list_projects() tool to discover available projects
Create a Prompt Dataset
response = requests.post(
    f"{base}/prompt-datasets",
    headers=headers,
    json={
        "name": "my-golden-set",
        "description": "Golden test cases",
        "input_names": ["question", "context"],
    },
)
response.raise_for_status()  # stop here if the dataset was not created
dataset_id = response.json()["id"]
Create an Agent Dataset
response = requests.post(
    f"{base}/agent-datasets",
    headers=headers,
    json={
        "name": "my-agent-tests",
        "description": "Agent workflow test cases",
    },
)
response.raise_for_status()  # stop here if the dataset was not created
dataset_id = response.json()["id"]
Add Test Cases (both types)
test_cases = [
    {
        "inputs": {"question": "What is your refund policy?"},
        "output": "Expected response...",
        "metadata": {"category": "refunds"},
    }
]
# Prompt dataset endpoint shown; for agent datasets, see agent-datasets.md
response = requests.post(
    f"{base}/prompt-datasets/{dataset_id}/test-cases/bulk",
    headers=headers,
    json={"data": test_cases},
)
Common Operations
Import from CSV/JSONL
Use the utility script for bulk imports:
python scripts/import_testcases.py \
    --file test_cases.csv \
    --dataset-id ds_abc123 \
    --type prompt
Handles batching automatically (100 per request). See script help for full options:
python scripts/import_testcases.py --help
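When the script is not an option, the 100-per-request batching can be sketched roughly as follows. `chunked` is a hypothetical helper written for this illustration; it is not the implementation inside scripts/import_testcases.py.

```python
from typing import Iterator, List, Sequence

BULK_LIMIT = 100  # per-request limit stated in this document


def chunked(items: Sequence[dict], size: int = BULK_LIMIT) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Each batch would then be sent as one bulk request, for example:
# for batch in chunked(test_cases):
#     requests.post(f"{base}/prompt-datasets/{dataset_id}/test-cases/bulk",
#                   headers=headers, json={"data": batch})
```

With 250 test cases this yields batches of 100, 100, and 50, so each request stays within the limit.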
CSV format:
- Columns starting with inputs. become input fields
- output column becomes expected output
- Other columns become metadata
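As a rough sketch of the column rules above, one CSV row might map to a test case like this. `row_to_test_case` is a hypothetical helper for illustration only; the real import script may treat empty cells or value types differently.

```python
import csv
import io


def row_to_test_case(row: dict) -> dict:
    """Map one CSV row to a test case using the column rules above."""
    test_case = {"inputs": {}, "metadata": {}}
    for column, value in row.items():
        if column.startswith("inputs."):
            # "inputs.question" -> inputs["question"]
            test_case["inputs"][column[len("inputs."):]] = value
        elif column == "output":
            test_case["output"] = value
        else:
            test_case["metadata"][column] = value
    return test_case


# Sample data uses the same header layout as the CSV example in this section.
sample = io.StringIO(
    "inputs.question,inputs.context,output,category,priority\n"
    '"What is...","User context","Expected...","refunds","high"\n'
)
test_cases = [row_to_test_case(row) for row in csv.DictReader(sample)]
```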
Example:
inputs.question,inputs.context,output,category,priority
"What is...","User context","Expected...","refunds","high"
List Datasets
# Prompt datasets
response = requests.get(f"{base}/prompt-datasets", headers=headers)
prompt_datasets = response.json().get("data", [])
# Agent datasets
response = requests.get(f"{base}/agent-datasets", headers=headers)
agent_datasets = response.json().get("data", [])
Get Test Cases
response = requests.get(
    f"{base}/prompt-datasets/{dataset_id}/test-cases",
    headers=headers,
)
test_cases = response.json().get("data", [])
Update Test Case
response = requests.patch(
    f"{base}/prompt-datasets/{dataset_id}/test-cases/{test_case_id}",
    headers=headers,
    json={
        "inputs": {"question": "Updated question"},
        "output": "Updated output",
        "metadata": {"updated": True},
    },
)
Workflow Patterns
For complex operations, use the checklist pattern:
Dataset Creation with Verification:
- [ ] Step 1: Create dataset (verify 201 status)
- [ ] Step 2: Add test cases (verify 201 status)
- [ ] Step 3: Retrieve test cases to confirm count
- [ ] Step 4: Verify test case structure
See examples.md for complete workflows with verification steps.
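Steps 3 and 4 of the checklist could be approximated with a local check like the one below. It assumes test cases come back with the same `inputs`, `output`, and `metadata` fields used in the examples in this document; `verify_test_cases` is a hypothetical helper, and the actual response shape should be confirmed against the API reference.

```python
from typing import List


def verify_test_cases(retrieved: List[dict], expected_count: int) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if len(retrieved) != expected_count:
        problems.append(f"expected {expected_count} test cases, found {len(retrieved)}")
    for index, case in enumerate(retrieved):
        inputs = case.get("inputs")
        if not isinstance(inputs, dict) or not inputs:
            problems.append(f"test case {index}: 'inputs' is missing or empty")
        if "output" not in case:
            problems.append(f"test case {index}: 'output' is missing")
    return problems
```

Returning a list of problems rather than raising on the first one lets the full set of discrepancies be reported to the user in a single pass.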
Reference Documentation
Complete API references:
- prompt-datasets.md - Prompt dataset API
- agent-datasets.md - Agent dataset API
Utility Scripts
scripts/import_testcases.py - Import test cases from CSV or JSONL files
- Handles batching automatically (100 per request)
- Supports both prompt and agent datasets
- Provides progress reporting and error handling
scripts/batch_operations.py - Reusable batch operation functions
- Use in custom scripts for programmatic batch operations
- Functions: batch_create_test_cases()
Common Errors
| Error | Cause | Solution |
|---|---|---|
| 404 Path Not Found | Wrong endpoint path or dataset type | Verify /api/v2/ base and correct type (prompt-datasets or agent-datasets) |
| Bulk limit exceeded | >100 items in single request | Use scripts/import_testcases.py or batch manually |
| Unauthorized | Invalid or missing API key | Check FREEPLAY_API_KEY environment variable |
| Incompatible test case | Input keys don't match prompt variables | Verify inputs keys match prompt template variables |
Best Practices
- Verify operations: Always check status codes and counts after operations
- Use descriptive names: Dataset names should indicate purpose (e.g., "refunds-edge-cases")
- Include metadata: Tag test cases for filtering (category, priority, source)
- Provide expected outputs: Required for meaningful evaluations
- Batch large operations: Use scripts for >10 items
- Organize by scenario: Create focused datasets rather than one large dataset
- Match input names: For prompt datasets, ensure inputs match template variables
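The last point can be checked before upload with a simple set comparison. This sketch assumes the dataset's `input_names` list (as in the create example) reflects the prompt template variables; `check_input_names` is a hypothetical helper, not part of the skill's scripts.

```python
from typing import Dict, Iterable, Set


def check_input_names(test_case: dict, input_names: Iterable[str]) -> Dict[str, Set[str]]:
    """Compare a test case's input keys with the expected input names."""
    expected = set(input_names)
    actual = set(test_case.get("inputs", {}))
    return {
        "missing": expected - actual,     # expected but not provided
        "unexpected": actual - expected,  # provided but not expected
    }
```

Running this over every test case before a bulk upload is one way to catch the "Incompatible test case" error locally instead of after the request.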
Related Skills
After creating datasets:
- Run tests using the run-test skill
- Analyze results using the test-run-analysis skill
- Check deployment status using the get_deployed_prompt_versions MCP tool
Freeplay Documentation