Exécutez n'importe quel Skill dans Manus
en un clic

Exécutez n'importe quel Skill dans Manus en un clic

run-test

Étoiles0

Forks0

Mis à jour6 février 2026 à 17:26

Execute Freeplay test runs to evaluate prompt templates and agents against datasets. Always confirm with the user before running — test runs make API calls to LLM providers and may incur costs. Use when the user wants to run a test, execute tests against a dataset, kick off evaluations, or trigger a test run. Do NOT use for analyzing existing test results (use test-run-analysis) or managing datasets (use dataset-management).

Installation

Installer avec Codex ou Claude Copiez ce prompt, collez-le dans Codex, Claude ou un autre assistant, puis laissez-le vérifier la page du skill et l'installer pour vous.

Exécuter dans Manus

Source

freeplayai

freeplayai/freeplay-skills

Ouvrir le dépôt GitHub Voir les dépôts du créateur

Téléchargement

Exécuter dans Manus

Métiers associésSOC

Basé sur la classification professionnelle SOC

Analystes en assurance qualité des logiciels et testeursProfessions informatiques et mathématiques·SOC 15-1253

SKILL.md

readonly

name

run-test

description

Run Freeplay Test

IMPORTANT: Confirmation Required

Always ask for user confirmation before executing any test run. Test runs make API calls to Freeplay and LLM providers, which may incur costs. Never execute a test without explicit user consent.

This skill helps users execute Freeplay test runs via the SDK/API to evaluate their AI features against datasets.

Note: This skill is for SDK/API-based test runs only, not UI-based testing from the Freeplay dashboard.

When to use this skill

"Run a test for my prompt"
"Execute the test run for customer support"
"Test my agent against the golden dataset"
"Run evaluation tests"
"Kick off a test run"
"I want to test my prompt changes"

Workflow

Step 1: Identify the Test Run Code Path

Search the codebase for existing test run implementation. Look for:

# Search patterns (run these to find test run implementations)
grep -r "test_runs.create" --include="*.py" .
grep -r "fp_client.test_runs" --include="*.py" .
grep -r "freeplay.*test" --include="*.py" .
grep -r "TestRun" --include="*.py" .

The key API call to look for is:

Python SDK:

test_run = fp_client.test_runs.create(
    project_id=project_id,
    testlist="Dataset Name",
    name="Test Run Name"
)

TypeScript SDK:

const testRun = await fpClient.testRuns.create({
    projectId: projectId,
    testlist: "Dataset Name",
    name: "Test Run Name"
});

Step 2: If No Test Run Code Exists

If the search returns no results, inform the user:

I couldn't find an existing test run script in your codebase. To run Freeplay test runs, you'll need to set up a test runner script first.

Documentation to get started:

Test Runs Overview

Component-Level Testing - For testing individual prompts

End-to-End Testing - For testing complete workflows/agents

Would you like me to help you understand what's needed to set up a test runner?

Do NOT attempt to write the test runner for them. This requires understanding their specific:

Pipeline architecture
Dataset structure
Evaluation criteria
Environment configuration

Step 3: If Test Run Code Exists

Once you've found the test run implementation:

Identify the entry point - Find the script/function that initiates test runs
Check for required environment variables:
- FREEPLAY_API_KEY
- FREEPLAY_BASE_URL (default: https://app.freeplay.ai)
- OPENAI_API_KEY (or other LLM provider keys)

NOTE Project ID can be provided by user or discovered via list_projects()

Determine how to run it:

# Common patterns
python run_tests.py
python -m tests.run_freeplay_tests
npm run test:freeplay
pytest tests/test_freeplay.py

Ask the user for any required parameters:
- Dataset/testlist name
- Test run name
- Prompt template to test
- Environment (if applicable)

Step 4: Execute with Consent

Before running, confirm with the user:

I found the test runner at [path]. This will:

Run tests against the [dataset] dataset

Test the [prompt/agent]

Make API calls to Freeplay and your LLM provider

Ready to execute?

Only proceed with explicit consent.

Step 5: Handle Results

On Success:

Report the test run completed
Provide the test run ID for reviewing results
Share any immediate metrics if available in output
Always suggest using the test-run-analysis skill to analyze results and suggest improvements to the prompt or agent based on evaluation metrics

On Failure:

Capture the error output
Identify common issues:
- Missing environment variables
- Invalid API keys
- Dataset not found
- Network/connectivity issues
- Rate limiting
Help debug with the user

Example Test Run Script Structure

For reference, a typical component-level test run script looks like:

from freeplay import Freeplay, RecordPayload
from openai import OpenAI
import os
from scripts.secrets import SecretString

# Initialize clients (use SecretString to prevent accidental logging)
api_key = SecretString(os.environ.get("FREEPLAY_API_KEY"))
fp_client = Freeplay(
    api_key=api_key.get(),
    api_base=os.environ.get("FREEPLAY_BASE_URL", "https://app.freeplay.ai")
)
openai_client = OpenAI()

project_id = "<project-id>"  # Provided by user or discovered via list_projects()

# Create test run
test_run = fp_client.test_runs.create(
    project_id=project_id,
    testlist="Golden Set",
    name="My Test Run"
)

# Get prompt template
template_prompt = fp_client.prompts.get(
    project_id=project_id,
    template_name="my-prompt",
    environment="latest"
)

# Process each test case
for test_case in test_run.test_cases:
    formatted_prompt = template_prompt.bind(test_case.variables).format()

    # Call LLM
    response = openai_client.chat.completions.create(
        model=formatted_prompt.prompt_info.model,
        messages=formatted_prompt.llm_prompt,
        **formatted_prompt.prompt_info.model_parameters
    )

    # Record results back to Freeplay
    fp_client.recordings.create(RecordPayload(
        # ... recording configuration
    ))

print(f"Test run completed: {test_run.id}")

Environment Variables

Required variables that must be set:

FREEPLAY_API_KEY - Freeplay API key
FREEPLAY_BASE_URL - Freeplay API URL (default: https://app.freeplay.ai)
LLM provider keys (e.g., OPENAI_API_KEY, ANTHROPIC_API_KEY)

Project ID can come from:

User specification
MCP list_projects() tool to discover available projects

Tips

Always search the codebase first before assuming no test runner exists
Check common locations: scripts/, tests/, tools/, root directory
Look for files named: run_test*.py, test_runner.py, freeplay_test*.py
Check package.json scripts or pyproject.toml for test commands
Environment variables may be in .env, .env.local, or CI configuration

Plus depuis ce dépôt

même dépôt

freeplay-onboarding

freeplayai/freeplay-skills

Complete guided onboarding to Freeplay — analyzes codebase, migrates prompts, and integrates logging. Use whenever the user says 'onboard to Freeplay', 'set up Freeplay', 'get started with Freeplay', 'integrate Freeplay', or wants end-to-end setup.

2026-02-130

freeplay-plan

freeplayai/freeplay-skills

Analyze a codebase to map its LLM implementation for Freeplay migration. Examines prompts, model configs, tool use, sessions, and agent patterns. Use when user wants to understand their LLM usage, inventory prompts, plan a migration, audit LLM patterns, or before integrating with Freeplay — even if they just say 'analyze my code' or 'what LLM stuff do I have'.

2026-02-130

prompt-migration

freeplayai/freeplay-skills

Migrate prompts from code into Freeplay for version control, A/B testing, and centralized management. Use whenever the user wants to move prompts, manage templates, set up prompt management, or mentions 'migrate prompts', 'prompt versioning', 'prompt templates', or wants to organize their prompts.

2026-02-130

record-to-freeplay

freeplayai/freeplay-skills

Integrate Freeplay logging, tracing, and observability into an LLM application. Use whenever the user mentions recording LLM calls, adding observability, capturing traces, logging to Freeplay, monitoring LLM usage, or wants a dashboard — even if they don't say 'Freeplay'. Also use after prompt migration, or for observability-only setups.

2026-02-130

health-check

freeplayai/freeplay-skills

Assess the health, completeness, and production readiness of a Freeplay project across the data flywheel. Use when the user asks about project status, wants to know if their project is ready for production, asks what's missing in their Freeplay setup, wants a project health check, or asks "what should I set up next?" Also use when the user is first connecting a project to Freeplay.

2026-02-120

test-run-analysis

freeplayai/freeplay-skills

Analyze Freeplay test run results to surface insights, evaluation metrics, and test case details. Use when the user wants to review test run performance, compare two test runs, understand evaluation scores, or identify failing test cases. Do NOT use for executing new test runs (use run-test) or managing datasets (use dataset-management).

2026-02-060

name

run-test

description

Run Freeplay Test

IMPORTANT: Confirmation Required

Always ask for user confirmation before executing any test run. Test runs make API calls to Freeplay and LLM providers, which may incur costs. Never execute a test without explicit user consent.

This skill helps users execute Freeplay test runs via the SDK/API to evaluate their AI features against datasets.

Note: This skill is for SDK/API-based test runs only, not UI-based testing from the Freeplay dashboard.

When to use this skill

"Run a test for my prompt"
"Execute the test run for customer support"
"Test my agent against the golden dataset"
"Run evaluation tests"
"Kick off a test run"
"I want to test my prompt changes"

Workflow

Step 1: Identify the Test Run Code Path

Search the codebase for existing test run implementation. Look for:

# Search patterns (run these to find test run implementations)
grep -r "test_runs.create" --include="*.py" .
grep -r "fp_client.test_runs" --include="*.py" .
grep -r "freeplay.*test" --include="*.py" .
grep -r "TestRun" --include="*.py" .

The key API call to look for is:

Python SDK:

test_run = fp_client.test_runs.create(
    project_id=project_id,
    testlist="Dataset Name",
    name="Test Run Name"
)

TypeScript SDK:

const testRun = await fpClient.testRuns.create({
    projectId: projectId,
    testlist: "Dataset Name",
    name: "Test Run Name"
});

Step 2: If No Test Run Code Exists

If the search returns no results, inform the user:

I couldn't find an existing test run script in your codebase. To run Freeplay test runs, you'll need to set up a test runner script first.

Documentation to get started:

Test Runs Overview

Component-Level Testing - For testing individual prompts

End-to-End Testing - For testing complete workflows/agents

Would you like me to help you understand what's needed to set up a test runner?

Do NOT attempt to write the test runner for them. This requires understanding their specific:

Pipeline architecture
Dataset structure
Evaluation criteria
Environment configuration

Step 3: If Test Run Code Exists

Once you've found the test run implementation:

Identify the entry point - Find the script/function that initiates test runs
Check for required environment variables:
- FREEPLAY_API_KEY
- FREEPLAY_BASE_URL (default: https://app.freeplay.ai)
- OPENAI_API_KEY (or other LLM provider keys)

NOTE Project ID can be provided by user or discovered via list_projects()

Determine how to run it:

# Common patterns
python run_tests.py
python -m tests.run_freeplay_tests
npm run test:freeplay
pytest tests/test_freeplay.py

Ask the user for any required parameters:
- Dataset/testlist name
- Test run name
- Prompt template to test
- Environment (if applicable)

Step 4: Execute with Consent

Before running, confirm with the user:

I found the test runner at [path]. This will:

Run tests against the [dataset] dataset

Test the [prompt/agent]

Make API calls to Freeplay and your LLM provider

Ready to execute?

Only proceed with explicit consent.

Step 5: Handle Results

On Success:

Report the test run completed
Provide the test run ID for reviewing results
Share any immediate metrics if available in output
Always suggest using the test-run-analysis skill to analyze results and suggest improvements to the prompt or agent based on evaluation metrics

On Failure:

Capture the error output
Identify common issues:
- Missing environment variables
- Invalid API keys
- Dataset not found
- Network/connectivity issues
- Rate limiting
Help debug with the user

Example Test Run Script Structure

For reference, a typical component-level test run script looks like:

from freeplay import Freeplay, RecordPayload
from openai import OpenAI
import os
from scripts.secrets import SecretString

# Initialize clients (use SecretString to prevent accidental logging)
api_key = SecretString(os.environ.get("FREEPLAY_API_KEY"))
fp_client = Freeplay(
    api_key=api_key.get(),
    api_base=os.environ.get("FREEPLAY_BASE_URL", "https://app.freeplay.ai")
)
openai_client = OpenAI()

project_id = "<project-id>"  # Provided by user or discovered via list_projects()

# Create test run
test_run = fp_client.test_runs.create(
    project_id=project_id,
    testlist="Golden Set",
    name="My Test Run"
)

# Get prompt template
template_prompt = fp_client.prompts.get(
    project_id=project_id,
    template_name="my-prompt",
    environment="latest"
)

# Process each test case
for test_case in test_run.test_cases:
    formatted_prompt = template_prompt.bind(test_case.variables).format()

    # Call LLM
    response = openai_client.chat.completions.create(
        model=formatted_prompt.prompt_info.model,
        messages=formatted_prompt.llm_prompt,
        **formatted_prompt.prompt_info.model_parameters
    )

    # Record results back to Freeplay
    fp_client.recordings.create(RecordPayload(
        # ... recording configuration
    ))

print(f"Test run completed: {test_run.id}")

Environment Variables

Required variables that must be set:

FREEPLAY_API_KEY - Freeplay API key
FREEPLAY_BASE_URL - Freeplay API URL (default: https://app.freeplay.ai)
LLM provider keys (e.g., OPENAI_API_KEY, ANTHROPIC_API_KEY)

Project ID can come from:

User specification
MCP list_projects() tool to discover available projects

Tips

Always search the codebase first before assuming no test runner exists
Check common locations: scripts/, tests/, tools/, root directory
Look for files named: run_test*.py, test_runner.py, freeplay_test*.py
Check package.json scripts or pyproject.toml for test commands
Environment variables may be in .env, .env.local, or CI configuration