| name | run-examples |
| description | Picks and runs the examples under examples/ that exercise the code changed on the current branch / PR / diff, as a phase-two integration check after a code review. Use when the user says "run the relevant examples", "verify with examples", "test the changes with examples", "/run-examples", or asks to execute matching examples for a PR/branch. Always proposes the candidate list (with reasons + skip notes) and waits for approval before executing. Handles missing API keys via examples/.env.local and brings up the right Docker services when needed.
|
Run Examples
You help the maintainer of the Symfony AI monorepo run the subset of
examples/ scripts that actually exercise the code on the current branch / PR.
This is a phase-two check that runs after a code review (e.g. via
/pr-review) once the maintainer is comfortable with the change itself and
wants to see the examples still light up green.
Two non-negotiable behaviors:
- Propose first, then execute. Always present the candidate examples and
wait for approval before running anything. The point of this skill is to
give the maintainer a thinking step, not to autorun.
- Don't pretend skipped examples passed. If an example can't run because a
required key is missing from
examples/.env.local, list it as skipped
with the reason. Don't quietly drop it.
1. Determine the changeset
Same approach as pr-review:
- PR number given (e.g.
/run-examples 1234): gh pr checkout 1234
if not already on the branch, then derive the diff against the PR's
baseRefName (gh pr view 1234 --json baseRefName).
- No argument: review the current branch against the merge-base with
main
(fall back to master).
- Explicit base ref given (e.g. the maintainer says "base is
develop" or
"compare against <sha>"): use that ref as the base instead of detecting
one. This also covers replaying a historical change from a worktree.
Get the changed files with git diff --name-only <base>..HEAD. Read the actual
diff (git diff <base>..HEAD) only if you need the contents to make better
mapping decisions — for picking which examples to run, the file list is
usually enough.
2. Map changes → relevant examples
The mapping is mostly mechanical because the monorepo has strong conventions.
Apply these in order; a single change can match multiple categories.
2a. Bridge changes (most common, most direct)
src/<component>/src/Bridge/<Vendor>/... → examples that import that namespace.
The reliable way to find them is to grep examples/ for the namespace prefix
of the changed Bridge:
git diff --name-only <base>..HEAD | grep '^src/.*/src/Bridge/'
# for each Bridge directory, grep examples/ for its namespace
grep -rl "Symfony\\\\AI\\\\<Component>\\\\Bridge\\\\<Vendor>" examples/
This catches both the "obvious" matches (Bridge OpenAi/Embeddings/ → uses in
examples/openai/embeddings.php) and the cross-cutting ones (a Platform
OpenAI change is also exercised by examples/rag/postgres.php because it uses
OpenAI for embeddings).
Common shortcuts that hold:
src/platform/src/Bridge/<Vendor>/ → examples/<vendor>/*.php plus any
examples/rag/, examples/store/, examples/indexer/, examples/retriever/,
examples/chat/, examples/memory/ script that uses that vendor for
embeddings or chat.
src/store/src/Bridge/<Backend>/ → examples/store/<backend>.php,
examples/rag/<backend>.php (and variants like <backend>-openai.php),
plus indexer/retriever/memory scripts that import the bridge.
src/chat/src/Bridge/<Backend>/ → examples/chat/persistent-chat-<backend>.php.
src/agent/src/Bridge/<Tool>/ → examples/toolbox/<tool>.php and any
example that uses that tool.
When in doubt, trust the grep result over the path heuristic.
Express runs at folder granularity when you can. If most/all candidates
live in one subdirectory (typical for a Bridge change touching only
<vendor>/ scripts), propose running that subdirectory via the runner —
./runner <vendor> — rather than enumerating each script. The runner is the
intended unit; it parallelises, captures output, and reports a summary table.
Use ./runner --filter=<pattern> when you want a name-based subset
(e.g. --filter=toolcall to hit toolcall variants across vendors). Reserve
per-script enumeration for when the candidates are scattered across
directories or you genuinely want only a handful from a much larger set.
2b. Component-level changes (Platform, Agent, Store, Chat core)
Changes outside any Bridge folder (e.g. src/platform/src/Result/,
src/agent/src/Toolbox/, src/store/src/Document/) tend to affect many
examples. Don't enumerate hundreds of scripts. Instead:
- Identify which capabilities the change touches (streaming, structured
output, tool calling, message bag, vectorizer, ...).
- Pick a small representative set of examples per affected capability —
ideally one per capability, two if the capability has meaningfully
different shapes (e.g. tool-call sync vs. tool-call streaming). The
maintainer can always ask to broaden; reining in is harder.
- Prefer Docker-free examples. Unless the change is specifically about a
store / persistence backend, pick representatives that don't need
docker compose up. Persistent-chat → prefer cache / doctrine-dbal /
session / static over redis / mongodb / meilisearch. RAG → prefer
in-memory.php over postgres.php. Skip the Docker spin-up cost when it
doesn't add coverage for the diff.
- If the change is broad and uncontroversial (refactor, type tightening,
internal rename), suggest running
./runner against one or two vendor
subdirectories the maintainer is most familiar with rather than hand-picking.
When uncertain, propose fewer examples and explicitly say so — the maintainer
can ask for more.
2c. When nothing in examples/ covers the diff
Some diffs are simply not within scope of this skill. The clear cases:
- Bundle-only diffs (
src/ai-bundle/, src/mcp-bundle/) — examples/
scripts instantiate factories directly and never boot the Symfony kernel,
so they don't exercise bundle wiring at all.
- Fixtures, docs, infra, CI — nothing to run.
- Tests-only diffs — the change is the test; running examples adds no
signal.
In these cases the output is short and final:
- State that nothing in
examples/ applies, with a one-line reason.
- Optionally add a "Gap check" line only if the diff adds a new public
capability that plausibly should have a demonstrating example (a new
bridge, a new content type, a new public API surface) — but not for
internal refactors, tests, or wiring fixes.
- Stop.
Do not suggest running phpunit, the demo app, CI workflows, or any other
tool — those have their own coverage and are out of this skill's scope.
Do not propose a "smoke test of a random vendor anyway" as a fallback —
if the diff doesn't touch that vendor's code, running it gives no signal.
Do not add a "Recommendation" section pointing at non-examples/
tooling. Saying "nothing applies" with a reason is the complete answer —
trust the maintainer to know what their other tools are.
3. Filter by runnability
For every candidate example produced in step 2, decide whether it can actually
run on this machine right now. Each example reaches this state by passing
through:
3a. Required env vars
examples/bootstrap.php defines env(string $var) that prints an error and
exit(1) if the var is empty. So the gate is: every env('FOO') call inside
the example file must have FOO non-empty in examples/.env.local (or, as a
fallback, in the shell environment / examples/.env defaults).
To check:
grep -oE "env\('[^']+'\)" path/to/example.php | sort -u
Then look up each var in examples/.env.local (parse it: KEY=VALUE, ignore
blank lines and # comments, treat empty VALUE as missing). examples/.env
holds defaults for infrastructure env vars (hosts, ports for local Docker
services); .env.local is where the maintainer keeps API keys. A var is
"available" if it's filled in either file.
3b. Docker-backed services
Some examples talk to a service from examples/compose.yaml rather than (or
in addition to) a third-party API. Detection heuristic: if the example
references any of these env vars, it needs Docker:
| Env var(s) used | Compose service name |
|---|
MARIADB_URI | mariadb |
POSTGRES_URI | postgres |
MONGODB_URI | mongodb |
REDIS_HOST | redis |
MEILISEARCH_HOST | meilisearch |
QDRANT_HOST | qdrant |
WEAVIATE_HOST | weaviate |
MILVUS_HOST | milvus (+ etcd, minio) |
ELASTICSEARCH_ENDPOINT | elasticsearch |
OPENSEARCH_ENDPOINT | opensearch |
CHROMADB_HOST / CHROMADB_PORT | chromadb |
TYPESENSE_HOST | typesense |
NEO4J_HOST / NEO4J_PASSWORD | neo4j |
SURREALDB_HOST | surrealdb |
MANTICORESEARCH_HOST | manticore |
CLICKHOUSE_HOST | clickhouse |
POGOCACHE_HOST | pogocache |
PINECONE_HOST (when set to 127.0.0.1/local) | pinecone |
LITELLM_HOST_URL | litellm (+ litellm-db) |
Examples that hit a remote API (OPENAI_API_KEY, ANTHROPIC_API_KEY, GEMINI_API_KEY,
AZURE_*, AWS_*, BEDROCK, OPENROUTER_KEY, ...) do not need Docker —
they only need the key.
The local model runners (OLLAMA_HOST_URL, LMSTUDIO_HOST_URL,
DOCKER_MODEL_RUNNER_HOST_URL) point at a separately running local server
that the maintainer manages outside compose.yaml. The defaults in
examples/.env already point them at sensible local URLs, so treat them
like any other env-gated example: if the host var is set, the example is
runnable. If the local server isn't actually up, the example will exit
early and the runner will report it as Skipped — that's fine. Don't
preemptively mark these as unrunnable just because they need a local
runner; the maintainer often has one running.
3c. Runnability outcome
For each candidate, classify into one of:
- runnable — all required env vars are set; any Docker services it needs
are listed for confirmation.
- skipped (missing keys) — list which env vars are empty so the maintainer
can decide whether to fill them in.
- needs Docker — runnable in principle but requires
docker compose up -d --wait <services>
first. Group these so we can start everything in one command.
4. Propose the plan
Output a structured proposal. Don't run anything yet. Express the run as
runner invocations when possible — ./runner <subdir> or ./runner --filter=<pattern> — and only fall back to per-script enumeration when the
candidates don't sit naturally inside one folder.
## Proposed example runs for <branch / PR #>
Changeset: <one-line summary — e.g., "Adds streaming support to
src/platform/src/Bridge/OpenAi/Embeddings/">
### Will run
`cd examples && ./runner openai` — all 17 OpenAI scripts, exercises the
modified `OpenAi\Embeddings\ResultConverter`
If multiple subdirs are involved, list each on its own line. If the right
unit is "a few specific scripts", list those instead (with a one-line "why"
each).
### Docker services to start first
`docker compose up -d --wait postgres` — needed by `rag/postgres.php` only
(Omit the Docker section entirely if no service is required.)
### Skipped (M)
- `examples/anthropic/*` — `ANTHROPIC_API_KEY` is empty in `.env.local`
Reply with "go" to run, or tell me which to keep / drop.
Keep it tight. Be selective rather than maximalist — propose the minimum
that meaningfully exercises the diff, not the maximum that touches the
namespace. If a folder has 14 scripts and only 2 are relevant to the change,
list those 2 — don't propose ./runner <vendor> just because you can. The
runner shortcut is for when running the whole folder is genuinely the right
unit (e.g. a Bridge-internal change that could affect any of its examples).
5. Execute (after approval)
Once the maintainer says go (or edits the list):
-
Start Docker services if any are needed:
docker compose up -d --wait <service> <service> ...
Run this from examples/. --wait blocks until healthchecks pass.
Don't blanket-up the entire compose.yaml — only the services we need.
If services fail to come up, stop and surface the error; don't try to
run examples against a half-broken stack.
-
Run the examples via ./runner by default — it parallelises and
reports a Finished / Failed / Skipped summary table:
cd examples && ./runner <subdir> # whole subdir, e.g. ./runner openai
cd examples && ./runner <a> <b> <c> # multiple subdirs
cd examples && ./runner --filter=<pattern> # by name pattern, e.g. --filter=toolcall
"Skipped" rows mean the example exited early — usually a missing env
var; cross-reference with the step-3 skip list. Only fall back to a
bare php <path> invocation when you specifically need the verbose
output of one example (e.g. to debug a failure with -vvv).
-
Report results. Group by outcome:
- Failures: show the example path, exit code, and a short snippet of the
error output (stderr if available, otherwise stdout). Don't paste
hundreds of lines — quote the first few lines that show the actual
error.
- Successes: just count them.
- Skipped: pass through what was already known.
If anything failed, suggest re-running with -vvv for full HTTP logs:
php <path> -vvv.
-
Don't tear down Docker. Leave services up — the maintainer probably
wants to iterate. Only suggest docker compose down if they ask.
Principles
- Be selective, not maximalist. Better to propose 5 highly-relevant
examples than 50 vaguely-related ones. "It imports the namespace" alone
isn't a reason to run something — the diff has to plausibly affect what
the example exercises. Reining in is harder than expanding.
- Prefer the runner over per-script lists when the unit of work is
"a folder" or "a name pattern".
./runner openai reads better than 17
bullet points. Only enumerate scripts when the candidates are scattered
or genuinely a small subset.
- Prefer Docker-free representatives unless the diff is specifically
about a store / persistence backend. Don't make the maintainer spin up
containers just to verify a change in core platform/agent code.
- Explain your reasoning per example. A one-line "why" next to each
candidate (which file/namespace it ties to) lets the maintainer veto
quickly.
- Surface skips, don't hide them. If half the candidates can't run,
that's information — say so up front so the maintainer can decide whether
to fill in keys before running.
- Don't auto-fix env vars. Never write to
.env.local or suggest
values for missing API keys. Just report what's missing.
- Stay in scope. This skill is only about
examples/. If the diff
isn't covered by any example, say so and stop — don't pivot to phpunit,
demo runs, or other tools that have their own CI coverage.