adaline-evaluations
| Field | Value |
|---|---|
| name | adaline-evaluations |
| description | Run and manage evaluations in Adaline to test prompt quality at scale. Use when creating evaluation runs, polling status, analyzing results, or cancelling runs. |
Evaluations run a prompt against a dataset and score each row with one evaluator. They are asynchronous: create a run, poll its status, then read paginated results.
Key terms:
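- `runId`: identifier of an evaluation run, returned when the run is created
- `evaluatorId`: the evaluator that scores each row
- Grade: `pass`, `fail`, or `unknown`
- Status: `queued -> running -> completed`; a run can instead end as `failed`, or move through `cancelling` to `cancelled`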
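The status lifecycle above suggests a simple polling loop. The following is a minimal sketch, not the SDK's own helper: `wait_for_run` and its `fetch_status` argument are hypothetical names, and the choice of `completed`, `failed`, and `cancelled` as terminal statuses is an assumption read off the lifecycle listed here.

```python
import time
from typing import Callable

# Assumption: these are the terminal statuses in the lifecycle above.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_run(
    fetch_status: Callable[[], str],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_status() until a terminal status or the timeout is reached.

    fetch_status is a hypothetical caller-supplied function that returns the
    run's current status string, for example by calling the status endpoint.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still {status!r} after {timeout_s}s")
        sleep(interval_s)
```

Passing the status fetcher in as a function keeps the loop independent of whether the status comes from curl output, the REST API, or an SDK client.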
Set these environment variables when credentials are available:
- `ADALINE_API_KEY`: workspace API key from Admin > API Keys
- `ADALINE_PROMPT_ID`: prompt to evaluate
- `ADALINE_EVALUATOR_ID`: evaluator to run
- `ADALINE_DATASET_ID`: optional dataset override

Base URL: `https://api.adaline.ai/v2`
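As an illustration of how a script might consume these variables, here is a small sketch that loads them and builds the request headers used by the curl examples. `AdalineConfig`, `load_config`, and `auth_headers` are hypothetical helper names introduced for this example; they are not part of any Adaline SDK.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

BASE_URL = "https://api.adaline.ai/v2"


@dataclass(frozen=True)
class AdalineConfig:
    api_key: str
    prompt_id: str
    evaluator_id: str
    dataset_id: Optional[str] = None  # optional dataset override


def load_config(env: Mapping[str, str] = os.environ) -> AdalineConfig:
    """Read the variables listed above, failing fast if a required one is missing."""
    required = ["ADALINE_API_KEY", "ADALINE_PROMPT_ID", "ADALINE_EVALUATOR_ID"]
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return AdalineConfig(
        api_key=env["ADALINE_API_KEY"],
        prompt_id=env["ADALINE_PROMPT_ID"],
        evaluator_id=env["ADALINE_EVALUATOR_ID"],
        dataset_id=env.get("ADALINE_DATASET_ID") or None,
    )


def auth_headers(config: AdalineConfig) -> dict:
    """Bearer token plus JSON content type, as in the curl examples."""
    return {
        "Authorization": f"Bearer {config.api_key}",
        "Content-Type": "application/json",
    }
```

Taking the environment as a mapping argument makes the loader easy to exercise with a plain dict in tests.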
| Symptom | First Fix |
|---|---|
| Create body rejected | Use singular `evaluatorId`, not the old plural evaluator field |
| Follow-up GET returns 404 | Use response `runId` as the `{evaluationId}` path parameter |
| Results missing row data | Add `expand=row` on the results endpoint |
| Pagination skips results | Use `pagination.nextCursor`, not page numbers |
| Python example returns coroutine | Await SDK methods inside an asyncio event loop |
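The pagination row in the table can be illustrated with a cursor-following loop. This is a sketch under stated assumptions: only `pagination.nextCursor` comes from the text above, while `iter_results`, `fetch_page`, and the `data` key holding each page's items are hypothetical. How the cursor is passed back to the endpoint is left to `fetch_page`, since the request parameter name is not given here.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]


def iter_results(
    fetch_page: Callable[[Optional[str]], Page],
    items_key: str = "data",  # assumption: key holding the page's result items
) -> Iterator[Any]:
    """Yield every result by following pagination.nextCursor until it is absent."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield from page.get(items_key, [])
        cursor = (page.get("pagination") or {}).get("nextCursor")
        if not cursor:
            return
```

Because the loop always requests the next page with the cursor returned by the previous one, it cannot skip or repeat rows the way page-number arithmetic can when results change between requests.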
```bash
# Create an evaluation run for the prompt
curl -X POST "https://api.adaline.ai/v2/prompts/$ADALINE_PROMPT_ID/evaluations" \
  -H "Authorization: Bearer $ADALINE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "evaluatorId": "evaluator_abc123",
    "datasetId": "dataset_abc123"
  }'
```
The response returns `runId`. Use that value as `evaluationId` in status, results, and cancel calls.
```bash
# Get run status ($RUN_ID holds the runId from the create response)
curl "https://api.adaline.ai/v2/prompts/$ADALINE_PROMPT_ID/evaluations/$RUN_ID" \
  -H "Authorization: Bearer $ADALINE_API_KEY"

# List failing results with row data, up to 50 per page
curl "https://api.adaline.ai/v2/prompts/$ADALINE_PROMPT_ID/evaluations/$RUN_ID/results?grade=fail&expand=row&limit=50" \
  -H "Authorization: Bearer $ADALINE_API_KEY"

# Cancel the run
curl -X POST "https://api.adaline.ai/v2/prompts/$ADALINE_PROMPT_ID/evaluations/$RUN_ID/cancel" \
  -H "Authorization: Bearer $ADALINE_API_KEY"
```
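To make the `runId` to `evaluationId` mapping concrete, the sketch below builds the same three follow-up URLs in Python. `evaluation_url` is a hypothetical helper, and the `create_response` dict is a made-up stand-in in which only the `runId` field is taken from the text above.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.adaline.ai/v2"


def evaluation_url(prompt_id: str, run_id: str, suffix: str = "", **query: object) -> str:
    """Build a follow-up URL, using the runId from the create response as evaluationId."""
    url = f"{BASE_URL}/prompts/{quote(prompt_id, safe='')}/evaluations/{quote(run_id, safe='')}"
    if suffix:
        url += "/" + suffix
    if query:
        url += "?" + urlencode(query)
    return url


# Hypothetical create response; only the runId field name comes from the docs.
create_response = {"runId": "run_abc123"}
run_id = create_response["runId"]

status_url = evaluation_url("prompt_abc123", run_id)
results_url = evaluation_url(
    "prompt_abc123", run_id, "results", grade="fail", expand="row", limit=50
)
cancel_url = evaluation_url("prompt_abc123", run_id, "cancel")
```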
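```typescript
// Assumes `adaline` is an initialized SDK client and that promptId,
// evaluatorId, and datasetId are already defined.

// Create the evaluation run
const run = await adaline.prompts.evaluations.create({
  promptId,
  evaluation: { evaluatorId, datasetId },
});

// Check run status, passing runId as evaluationId
const status = await adaline.prompts.evaluations.get({
  promptId,
  evaluationId: run.runId,
});

// List failing results with row data
const results = await adaline.prompts.evaluations.results.list({
  promptId,
  evaluationId: run.runId,
  grade: 'fail',
  expand: 'row',
});
```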
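```python
# Run these calls inside an async function; assumes `adaline` is an
# initialized SDK client and the ID variables are already defined.

# Create the evaluation run
run = await adaline.prompts.evaluations.create(
    prompt_id=prompt_id,
    evaluation=CreateEvaluationRequest(evaluator_id=evaluator_id, dataset_id=dataset_id),
)

# Check run status, passing run_id as evaluation_id
status = await adaline.prompts.evaluations.get(
    prompt_id=prompt_id,
    evaluation_id=run.run_id,
)

# List failing results with row data
results = await adaline.prompts.evaluations.results.list(
    prompt_id=prompt_id,
    evaluation_id=run.run_id,
    grade="fail",
    expand="row",
)
```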
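Since the Python SDK methods are awaited, they need an event loop to run. The sketch below shows the general pattern with `asyncio.run`. `FakeEvaluations` is a stand-in written for this example and is not the real Adaline client; substitute the actual SDK calls inside `main`.

```python
import asyncio


class FakeEvaluations:
    """Stand-in for an async SDK client; NOT the real Adaline SDK."""

    async def create(self, prompt_id: str, evaluator_id: str) -> dict:
        await asyncio.sleep(0)  # simulate network I/O
        return {"run_id": "run_abc123"}


async def main() -> dict:
    evaluations = FakeEvaluations()
    # Without `await`, this call would only produce a coroutine object.
    run = await evaluations.create(
        prompt_id="prompt_abc123", evaluator_id="evaluator_abc123"
    )
    return run


# asyncio.run creates the event loop, runs main() to completion, and closes the loop.
result = asyncio.run(main())
print(result["run_id"])
```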
- Store `runId` in CI or job metadata so later steps can poll and fetch results.
- Review failing rows with `grade=fail&expand=row`.

See `references/api.md` for request/response schemas and curl examples.
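One way to carry `runId` between CI steps is a small JSON file saved as a job artifact. This is a sketch only: the file layout and the `save_run_id` and `load_run_id` helpers are hypothetical, and a CI system's native output variables would serve equally well.

```python
import json
from pathlib import Path


def save_run_id(path: Path, prompt_id: str, run_id: str) -> None:
    """Write the identifiers a later step needs to poll and fetch results."""
    path.write_text(json.dumps({"promptId": prompt_id, "runId": run_id}))


def load_run_id(path: Path) -> str:
    """Read back the runId written by an earlier step."""
    return json.loads(path.read_text())["runId"]
```

A create step would call `save_run_id(Path("adaline_run.json"), prompt_id, run_id)`, and a later step would call `load_run_id` on the same file before polling.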