Run any Skill in Manus with one click

$pwd:

adaline-evaluators

Name: Adaline Evaluators
Author: adaline

// Create and manage evaluators in Adaline to score prompt outputs. Use when setting up LLM-as-a-judge, JavaScript, text-matcher, cost, latency, or response-length evaluators.

Run Skill in Manus

$ git log --oneline --stat

stars:2

forks:0

updated:May 21, 2026 at 20:01

File Explorer

2 files

SKILL.md

readonly

name	adaline-evaluators
description	Create and manage evaluators in Adaline to score prompt outputs. Use when setting up LLM-as-a-judge, JavaScript, text-matcher, cost, latency, or response-length evaluators.

Adaline Evaluators

Concepts

Evaluators define how Adaline scores prompt outputs. Each evaluator is attached to a prompt and dataset. Evaluation runs use an evaluator to produce per-row grades, scores, reasons, and aggregate metrics.

Key terms:

Evaluator — configured scoring rule attached to a prompt
Dataset — rows used by the evaluator when creating evaluation runs
Config — discriminated object by type
Status — active or archived

Configuration

Set these environment variables when credentials are available:

ADALINE_API_KEY — workspace API key from Admin > API Keys
ADALINE_PROJECT_ID — project ID
ADALINE_PROMPT_ID — prompt to attach evaluators to
ADALINE_DATASET_ID — dataset for evaluator runs

Base URL: https://api.adaline.ai/v2

Evaluator Types

LLM-as-a-Judge

{
  "type": "llm-as-a-judge",
  "value": "Pass only if the response answers the question and all factual claims are grounded in the reference answer."
}

JavaScript

{
  "type": "javascript",
  "value": "const parsed = JSON.parse(output); return parsed.answer ? 'pass' : 'fail';"
}

Text Matcher

{
  "type": "text-matcher",
  "value": {
    "operator": "contains-all",
    "value": ["Summary", "Recommendation"]
  }
}

Operators: regex, equals, starts-with, ends-with, contains-all, contains-any, not-contains-any.

Cost, Latency, Response Length

These evaluator types use a comparison rule directly as value:

{
  "type": "latency",
  "value": {
    "value": 2000,
    "unit": "ms",
    "operator": "less"
  }
}

Operators: less, greater, equals.

Quick Triage

Symptom	Fix
API rejects `group`	Remove it; current config is discriminated by `type` only
Text matcher rejected	Use `value.operator` and `value.value`, not `pattern` / `patterns`
Cost/latency rejected	Use comparison rule directly as `config.value`, not nested under `threshold`
List response lacks config	List returns lightweight summaries; call GET evaluator for full config
Evaluation needs multiple evaluators	Create one evaluation run per `evaluatorId`

Creating an Evaluator

curl -X POST "https://api.adaline.ai/v2/prompts/$ADALINE_PROMPT_ID/evaluators" \
  -H "Authorization: Bearer $ADALINE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "projectId": "project_abc123",
    "title": "Factual accuracy",
    "datasetId": "dataset_abc123",
    "config": {
      "type": "llm-as-a-judge",
      "value": "Pass only if the answer is accurate and cites the provided context."
    }
  }'

SDK Usage

await adaline.prompts.evaluators.list({ promptId, limit: 20 });
await adaline.prompts.evaluators.create({ promptId, evaluator });
await adaline.prompts.evaluators.get({ promptId, evaluatorId });
await adaline.prompts.evaluators.update({ promptId, evaluatorId, evaluator: { title: 'Updated' } });
await adaline.prompts.evaluators.delete({ promptId, evaluatorId });

await adaline.prompts.evaluators.list(prompt_id=prompt_id, limit=20)
await adaline.prompts.evaluators.create(prompt_id=prompt_id, evaluator=evaluator)
await adaline.prompts.evaluators.get(prompt_id=prompt_id, evaluator_id=evaluator_id)
await adaline.prompts.evaluators.update(prompt_id=prompt_id, evaluator_id=evaluator_id, evaluator=patch)
await adaline.prompts.evaluators.delete(prompt_id=prompt_id, evaluator_id=evaluator_id)

Best Practices

Use LLM-as-a-judge for qualitative criteria and write explicit pass/fail rubrics.
Use JavaScript/Text Matcher for deterministic structural checks.
Use cost/latency/response-length evaluators for performance budgets.
Keep evaluator titles descriptive because list responses are lightweight.
Attach evaluators to stable datasets so runs are comparable over time.

References

See references/api.md for the full REST contract with all config shapes.

related-skills.json

same repository

adaline-datasets.md

from "adaline/skills"

Create and manage evaluation datasets in Adaline. Use when building test cases, adding dataset columns/rows, importing data, or triggering dynamic columns.

2026-05-212

adaline-deployments.md

from "adaline/skills"

Fetch deployed prompt snapshots from Adaline at runtime. Use when integrating prompt deployments, environment-based latest lookups, prompt caching, or pinned deployment IDs.

2026-05-212

adaline-evaluations.md

from "adaline/skills"

Run and manage evaluations in Adaline to test prompt quality at scale. Use when creating evaluation runs, polling status, analyzing results, or cancelling runs.

2026-05-212

adaline-integration.md

from "adaline/skills"

High-level guide for integrating your AI application with Adaline. Use when starting a new Adaline integration, choosing between API/SDK approaches, or planning which Adaline features to adopt.

2026-05-212

adaline-logs.md

from "adaline/skills"

Send traces and spans to Adaline for AI agent observability. Use when instrumenting LLM calls, tools, retrieval, embeddings, guardrails, or custom operations.

2026-05-212

adaline-prompts.md

from "adaline/skills"

Create and manage prompts in Adaline via the v2 API or SDK clients. Use when programmatically creating prompts, updating prompt drafts, listing prompts, or reading prompt/playground data.

2026-05-212

package.json

"author": "adaline"

"repository": "adaline/skills"

View GitHub Repository View Creator Repositories

$ install --global

$ download --local

Run Skill in Manus

$ useful --forSOC

Software DevelopersComputer and Mathematical Occupations15-1252L4

name	adaline-evaluators
description	Create and manage evaluators in Adaline to score prompt outputs. Use when setting up LLM-as-a-judge, JavaScript, text-matcher, cost, latency, or response-length evaluators.

Adaline Evaluators

Concepts

Key terms:

Evaluator — configured scoring rule attached to a prompt
Dataset — rows used by the evaluator when creating evaluation runs
Config — discriminated object by type
Status — active or archived

Configuration

Set these environment variables when credentials are available:

ADALINE_API_KEY — workspace API key from Admin > API Keys
ADALINE_PROJECT_ID — project ID
ADALINE_PROMPT_ID — prompt to attach evaluators to
ADALINE_DATASET_ID — dataset for evaluator runs

Base URL: https://api.adaline.ai/v2

Evaluator Types

LLM-as-a-Judge

{
  "type": "llm-as-a-judge",
  "value": "Pass only if the response answers the question and all factual claims are grounded in the reference answer."
}

JavaScript

{
  "type": "javascript",
  "value": "const parsed = JSON.parse(output); return parsed.answer ? 'pass' : 'fail';"
}

Text Matcher

{
  "type": "text-matcher",
  "value": {
    "operator": "contains-all",
    "value": ["Summary", "Recommendation"]
  }
}

Operators: regex, equals, starts-with, ends-with, contains-all, contains-any, not-contains-any.

Cost, Latency, Response Length

These evaluator types use a comparison rule directly as value:

{
  "type": "latency",
  "value": {
    "value": 2000,
    "unit": "ms",
    "operator": "less"
  }
}

Operators: less, greater, equals.

Quick Triage

Symptom	Fix
API rejects `group`	Remove it; current config is discriminated by `type` only
Text matcher rejected	Use `value.operator` and `value.value`, not `pattern` / `patterns`
Cost/latency rejected	Use comparison rule directly as `config.value`, not nested under `threshold`
List response lacks config	List returns lightweight summaries; call GET evaluator for full config
Evaluation needs multiple evaluators	Create one evaluation run per `evaluatorId`

Creating an Evaluator

curl -X POST "https://api.adaline.ai/v2/prompts/$ADALINE_PROMPT_ID/evaluators" \
  -H "Authorization: Bearer $ADALINE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "projectId": "project_abc123",
    "title": "Factual accuracy",
    "datasetId": "dataset_abc123",
    "config": {
      "type": "llm-as-a-judge",
      "value": "Pass only if the answer is accurate and cites the provided context."
    }
  }'

SDK Usage

await adaline.prompts.evaluators.list({ promptId, limit: 20 });
await adaline.prompts.evaluators.create({ promptId, evaluator });
await adaline.prompts.evaluators.get({ promptId, evaluatorId });
await adaline.prompts.evaluators.update({ promptId, evaluatorId, evaluator: { title: 'Updated' } });
await adaline.prompts.evaluators.delete({ promptId, evaluatorId });

await adaline.prompts.evaluators.list(prompt_id=prompt_id, limit=20)
await adaline.prompts.evaluators.create(prompt_id=prompt_id, evaluator=evaluator)
await adaline.prompts.evaluators.get(prompt_id=prompt_id, evaluator_id=evaluator_id)
await adaline.prompts.evaluators.update(prompt_id=prompt_id, evaluator_id=evaluator_id, evaluator=patch)
await adaline.prompts.evaluators.delete(prompt_id=prompt_id, evaluator_id=evaluator_id)

Best Practices

Use LLM-as-a-judge for qualitative criteria and write explicit pass/fail rubrics.
Use JavaScript/Text Matcher for deterministic structural checks.
Use cost/latency/response-length evaluators for performance budgets.
Keep evaluator titles descriptive because list responses are lightweight.
Attach evaluators to stable datasets so runs are comparable over time.

References

See references/api.md for the full REST contract with all config shapes.

adaline-evaluators

Adaline Evaluators

Concepts

Configuration

Evaluator Types

LLM-as-a-Judge

JavaScript

Text Matcher

Cost, Latency, Response Length

Quick Triage

Creating an Evaluator

SDK Usage

Best Practices

References

More from this repository

More from this repository

Adaline Evaluators

Concepts

Configuration

Evaluator Types

LLM-as-a-Judge

JavaScript

Text Matcher

Cost, Latency, Response Length

Quick Triage

Creating an Evaluator

SDK Usage

Best Practices

References