| name | agentmark |
| description | Build, debug, and ship AgentMark prompts, datasets, experiments, and evals. TRIGGER when: working with `.prompt.mdx` files, `agentmark.json`, `agentmark.client.ts`, `agentmark_client.py`, an `agentmark/` or `.agentmark/` directory, or imports from `@agentmark-ai/*`; user says any of "set up AgentMark in this project", "wire AgentMark into my app", "integrate AgentMark", "run an AgentMark experiment", "create an AgentMark prompt", "deploy an AgentMark prompt", or runs `npm create agentmark` / `npx create-agentmark`; user runs or asks about `agentmark <cmd>` / `npx agentmark <cmd>` (`dev`, `run-prompt`, `run-experiment`, `build`, `generate-types`, `generate-schema`, `link`, `login`, `logout`, `pull-models`); user asks about driving AgentMark Cloud programmatically (the `agentmark-mcp` server, MCP tools, headless app provisioning, git-based deploys); user mentions AgentMark or asks about prompt versioning, dataset experiments, prompt evaluations, prompt deployments, or trace observability in an AgentMark project. SKIP: provider-neutral prompt code with no AgentMark markers; LangChain / LlamaIndex / raw OpenAI / Anthropic SDK code; questions about prompt engineering or LLM observability in general with no AgentMark context; questions about competing platforms (Langfuse, LangSmith, Phoenix, Braintrust, Traceloop). |
| license | AGPL-3.0-or-later |
AgentMark
AgentMark helps teams build reliable AI agents. This skill teaches you how to author .prompt.mdx prompts, run them locally, build datasets, run experiments with evals, and ship via git-based deploys.
Before you start
Scan the target file or project for AgentMark markers. Any of the following counts:
- `.prompt.mdx` files anywhere in the tree
- `agentmark.json` in the project root
- `agentmark.client.ts` (TypeScript) or `agentmark_client.py` (Python)
- An `.agentmark/` directory
- Imports from `@agentmark-ai/*` packages
If none are present, stop and tell the user that this skill applies to AgentMark projects. Ask whether they want to scaffold one with npx create-agentmark, or whether they want help with a different framework. Do not infer AgentMark conventions onto non-AgentMark code.
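The marker scan can be sketched as a small pure function. This is an illustrative sketch, not part of AgentMark: `findAgentMarkMarkers` and the `ProjectFile` shape are hypothetical names, and the caller is assumed to supply project-relative paths with `/` separators, plus file contents wherever an import check is wanted.

```typescript
// Hypothetical helper: scans a file listing for the AgentMark markers listed above.
type ProjectFile = { path: string; content?: string };

function findAgentMarkMarkers(files: ProjectFile[]): string[] {
  const found = new Set<string>();
  for (const { path, content } of files) {
    const base = path.slice(path.lastIndexOf("/") + 1);
    if (base.endsWith(".prompt.mdx")) found.add(".prompt.mdx file");
    // Only a root-level agentmark.json counts as a marker.
    if (path === "agentmark.json") found.add("agentmark.json in project root");
    if (base === "agentmark.client.ts" || base === "agentmark_client.py") {
      found.add("client entry point");
    }
    if (/(^|\/)\.agentmark\//.test(path)) found.add(".agentmark/ directory");
    // Matches both `import ... from "@agentmark-ai/x"` and `require("@agentmark-ai/x")`.
    if (content !== undefined && /["']@agentmark-ai\//.test(content)) {
      found.add("@agentmark-ai/* import");
    }
  }
  return Array.from(found);
}
```

An empty result is the signal to stop and ask the user rather than proceed with AgentMark conventions.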
How to find current information
Your training data is out of date. Before answering anything specific about AgentMark APIs, CLI flags, prompt syntax, or docs content:
- CLI surface: run `npx agentmark <command> --help`. This is the canonical source for command flags, arguments, and behavior. Do not infer flags from memory.
- Docs navigation: fetch `https://docs.agentmark.co/llms.txt` for a complete page index. Use it to find the right doc page before WebFetching content.
- Specific doc pages: append `.md` to any `docs.agentmark.co` URL and WebFetch it. Every doc page is served as both HTML and Markdown.
Never encode API surface or CLI flags from memory. Always verify against --help output, llms.txt, or fetched docs.
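The `.md` rule for doc pages can be sketched as a string helper. This is a minimal sketch under stated assumptions: `toMarkdownUrl` is a hypothetical name, dropping the query string and fragment is a choice made here rather than documented behavior, and the page path in the usage note is a placeholder, not a real page.

```typescript
const DOCS_ORIGIN = "https://docs.agentmark.co";

// Hypothetical helper: turns a docs page URL into its Markdown variant.
function toMarkdownUrl(pageUrl: string): string {
  if (pageUrl !== DOCS_ORIGIN && !pageUrl.startsWith(DOCS_ORIGIN + "/")) {
    throw new Error(`Not a docs.agentmark.co URL: ${pageUrl}`);
  }
  // Assumption: query strings and fragments are not needed for the fetch.
  const cut = pageUrl.search(/[?#]/);
  const base = cut === -1 ? pageUrl : pageUrl.slice(0, cut);
  const trimmed = base.replace(/\/+$/, "");
  // No page path at all: fall back to the page index.
  if (trimmed === DOCS_ORIGIN) return `${DOCS_ORIGIN}/llms.txt`;
  // Leave URLs that already point at a text resource untouched.
  return /\.(md|txt)$/.test(trimmed) ? trimmed : `${trimmed}.md`;
}
```

For example, `toMarkdownUrl("https://docs.agentmark.co/some/page/")` yields `https://docs.agentmark.co/some/page.md`, where `/some/page` stands in for a path taken from `llms.txt`.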
Runtime model
AgentMark splits into two surfaces. Keep them straight or you will go looking for endpoints that do not exist.
- Gateway / Cloud API (`api.agentmark.co`): observability and config. Stores prompts, traces, scores, datasets, deployments. It does not execute prompts. There is no `POST /v1/prompts/{name}/run` endpoint. If you can't find an "execute" action on a resource, that's expected. The CLI is intentionally curated (`dev`, `login`, `logout`, `link`, `run-prompt`, `run-experiment`, `build`, `pull-models`, `generate-types`, `generate-schema`) and stays narrow on purpose. For programmatic access to the full Cloud API surface (provisioning apps, listing experiments, managing deployments, querying traces), run the `agentmark-mcp` MCP server locally and let your IDE agent call its tools. See workflows/headless-with-mcp.md. For bespoke automation that doesn't have an MCP client (e.g. a Python script in CI), the gateway speaks plain REST: call it with the session bearer from `~/.agentmark/auth.json` or an `AGENTMARK_API_KEY`.
- SDK (`@agentmark-ai/sdk`, `@agentmark-ai/loader-api`): execution. Customer code uses the SDK to load a deployed prompt template and call the LLM provider directly. Traces auto-forward to the gateway when `AGENTMARK_API_KEY` + `AGENTMARK_APP_ID` are set.
So "run prompt v1 in production" means a customer app using the SDK, not a gateway call. For the headless flow (commit → push → poll → consume), see workflows/deploying.md.
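Because trace forwarding depends on both environment variables being set, a preflight check can catch a half-configured app before it ships. This is a minimal sketch: `missingTraceEnv` is a hypothetical helper, not an SDK function, and treating a blank value as missing is an assumption.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical preflight: returns the names of unset trace-forwarding variables.
function missingTraceEnv(env: Env): string[] {
  const required = ["AGENTMARK_API_KEY", "AGENTMARK_APP_ID"];
  // Assumption: an empty or whitespace-only value counts as unset.
  return required.filter((name) => (env[name] ?? "").trim() === "");
}
```

In a Node app you would pass `process.env` and log a warning when the returned list is non-empty.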
Project anatomy
my-project/
├── agentmark.json            # Project config (required)
├── agentmark/                # Prompts directory (path set by agentmarkPath in config)
│   ├── greeting.prompt.mdx
│   └── qa-bot/
│       ├── prompt.prompt.mdx
│       └── data.jsonl        # Dataset
├── .agentmark/               # Auto-generated config (gitignored)
│   └── dev-config.json       # Local dev state, linked app metadata
├── agentmark.client.ts       # TS dev server entry point (optional)
└── .env                      # API keys, loaded automatically by the CLI
- `agentmark.json` minimum: `{"agentmarkPath": ".", "version": "2.0.0", "mdxVersion": "1.0"}`. Use `"."` for the canonical layout, not `"/"`.
- Prompts are `.prompt.mdx` files. YAML frontmatter has `name`, a generation-type config block (`text_config`, `object_config`, `image_config`, or `speech_config`) that contains `model_name`, and an optional `test_settings` block for dataset + evals. Body is TemplateDX (JSX-like tags in markdown). Do not put `model_name` or `evals` at the top level; they live inside their config blocks.
- Datasets are `.jsonl` files. The `datasetName` used in API path parameters is the file path without the `.jsonl` extension, URL-encoded.
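The frontmatter placement rules above can be expressed as a small check over already-parsed frontmatter. This is a sketch, not an AgentMark validator: `checkFrontmatter` is a hypothetical name, YAML parsing is assumed to happen elsewhere, and only the rules stated above are checked.

```typescript
const CONFIG_BLOCKS = ["text_config", "object_config", "image_config", "speech_config"];

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Hypothetical check: returns a list of problems, empty when the layout looks right.
function checkFrontmatter(fm: Record<string, unknown>): string[] {
  const problems: string[] = [];
  if (typeof fm["name"] !== "string" || fm["name"] === "") {
    problems.push("missing name");
  }
  const present = CONFIG_BLOCKS.filter((key) => key in fm);
  if (present.length === 0) {
    problems.push("no generation-type config block");
  }
  for (const key of present) {
    const block = fm[key];
    if (!isRecord(block) || typeof block["model_name"] !== "string") {
      problems.push(`${key} must contain model_name`);
    }
  }
  // Top-level placement is the mistake called out above.
  if ("model_name" in fm) problems.push("model_name is at the top level");
  if ("evals" in fm) problems.push("evals is at the top level");
  return problems;
}
```

The authoritative schema is whatever the current docs and `generate-schema` output say; treat this only as a quick lint for the most common misplacement.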
Common workflows
Conventions that catch agents out
- `run-prompt` ≠ `run-experiment`. `run-prompt <file>` executes a single prompt with the `--props` you pass. `run-experiment <file>` executes the prompt against every row in its linked dataset and runs evals. Do not use one for the other.
- `agentmark dev` runs a fully local server. Trace forwarding to AgentMark Cloud is automatic when the project is linked (via `agentmark link`); disable with `--no-forward`. There is no `--remote` flag on `dev`; it was removed in 0.13.0 along with the `@agentmark-ai/connect` WebSocket package. If you see `--remote` on `dev` in older content, ignore it.
- Deployment is git-based. Connect a git provider (GitHub or GitLab) to your AgentMark Cloud app, then push to the configured branch. AgentMark builds and deploys automatically. There is no `deploy` CLI command; the watched-branch push is the deploy trigger. See workflows/deploying.md.
- The CLI is for humans; the MCP server is for agents. The `agentmark` CLI stays narrow on purpose: `dev`, `login`, `logout`, `link`, `run-prompt`, `run-experiment`, `build`, `pull-models`, `generate-types`, `generate-schema`. When an agent needs to drive the Cloud API programmatically (create apps, mint API keys, connect a git provider, list deployments, query traces, …) it runs the `agentmark-mcp` MCP server and uses its tools. Tools are auto-generated from `api.agentmark.co/v1/openapi.json`, so the agent surface stays in lock-step with the gateway. See workflows/headless-with-mcp.md.
- Dataset name encoding differs by endpoint. For the `?name=X` query filter on `GET /v1/datasets`, pass the leaf name without the `.jsonl` extension (exact match). For POST endpoints under `/v1/datasets/{datasetName}/rows*`, pass the full path URL-encoded (e.g. `agentmark%2Fqa-bot%2Fdata`), still without the extension.
- Never send `tenant_id` in a write body; it is derived from your credential. Every Cloud API write (provision an app, mint an API key, append a dataset row, create an annotation queue or alert) scopes the new row to the tenant behind your `AGENTMARK_API_KEY` / session bearer. The database silently overwrites any `tenant_id` you pass with that tenant (no error is raised), so a row you tried to file under another tenant simply lands under your own. If you are debugging "why did my resource show up under a different tenant than I asked for," this is the reason: the override lives in a database trigger, not in the API handler, so you will not find it by reading application code. Omit the field entirely.
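The last two conventions lend themselves to small request-shaping helpers. The following is a sketch under stated assumptions: all three function names are hypothetical, dataset file paths are assumed to use `/` separators, and nothing here calls the gateway.

```typescript
// Path-parameter form: full path, no .jsonl extension, URL-encoded.
function datasetPathParam(filePath: string): string {
  return encodeURIComponent(filePath.replace(/\.jsonl$/, ""));
}

// Query-filter form: leaf name only, no .jsonl extension.
function datasetNameFilter(filePath: string): string {
  const withoutExt = filePath.replace(/\.jsonl$/, "");
  return withoutExt.slice(withoutExt.lastIndexOf("/") + 1);
}

// Drops tenant_id from a write body so the credential alone decides the tenant.
function withoutTenantId(body: Record<string, unknown>): Record<string, unknown> {
  const copy: Record<string, unknown> = { ...body };
  delete copy["tenant_id"];
  return copy;
}
```

For `agentmark/qa-bot/data.jsonl`, `datasetPathParam` gives `agentmark%2Fqa-bot%2Fdata` and `datasetNameFilter` gives `data`. Stripping `tenant_id` client-side changes nothing on the server, but it keeps request logs from suggesting a tenant that was never honored.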
Reference material
All reference/*.md files are auto-generated from upstream sources on every release. They are the most reliable encoded facts in this skill. Hand-authored workflow files can drift; these cannot, because re-running the generators is part of the pre-push gate.
To check whether the skill has drifted from reality (CLI command list, docs MCP availability, internal markdown links), run the bundled smoke check:
node skills/agentmark/smoke.mjs
Exits 0 when the skill is current; non-zero with one `FAIL: …` line per drift finding. Run it before publishing the skill, or whenever a workflow seems off.
When this skill does not apply
- Generic prompt engineering with no AgentMark project context: answer directly, do not push AgentMark conventions.
- Provider-neutral LLM code, raw `@anthropic-ai/sdk` or OpenAI SDK calls: do not introduce AgentMark imports.
- Questions about competing platforms (Langfuse, LangSmith, Phoenix, Braintrust, Traceloop): answer the comparison question directly. If the user then decides on AgentMark, return to this skill.
If you cannot find documentation to support an AgentMark-specific answer, say so explicitly and link the user to https://docs.agentmark.co/llms.txt so they can find the relevant page.