replay-railway-preview

End-to-end validate a `text.ara.so/backend` (website-agent) fix against a real Railway preview environment BEFORE merging to main. Pushes the current branch, deploys the SHA to the long-lived `preview` Railway env, confirms the new `git_sha` is live via `/healthz`, then sends a focused signed webhook replay (single-turn or short multi-turn) to a reserved 555-01XX test phone, and polls Braintrust for the root span's `outcome` / rounds / tools / reply. Use whenever you change anything in `backend/src/` (tool handlers, system prompt, builder loop, etc.) and want production-fidelity proof — not just unit tests — before opening the PR. Invoked as `/replay-railway-preview` or `/replay railway preview`.

Install command

npx skills add https://github.com/Aradotso/ara.engineer --skill replay-railway-preview

Copy and paste this command into Claude Code to install the skill.

Source

Aradotso/ara.engineer

Updated: April 26, 2026, 00:56

Other skills in this repository

secrets

Aradotso/ara.engineer

Ara's secrets convention — all runtime credentials live in Infisical (project "Ara-passwords"), one folder per active GitHub repo (`/ara-engineer`, `/text-ara-so`). The old megarepo lives at `/legacy-ara-megarepo` tagged `legacy` — reference only. Never ask the user to paste keys; never commit .env; never build a custom vault.

2026-04-26

align

Aradotso/ara.engineer

Align the website-agent (system prompt + routing) with successful on-Ara website building; analyze Braintrust traces for derailment (wrong phase, local-dev tutoring, missing deploy, paywall/connect confusion), patch `text.ara.so/backend` (primarily `system-prompt.ts`), then verify in a tight loop—`bt` / Braintrust evals / `bun run e2e` (replay + live)—and report a before vs after fit table. Invoked as `/align`, `/align <trace url>`, `/align users …` (pair with `/trace` to gather traces). Companion to `/trace`.

2026-04-25

trace

Aradotso/ara.engineer

Ara agent-trace debugging — inspect Braintrust traces for the website-agent (TS/Bun, Cerebras, Vercel AI SDK v6). Invoked as `/trace recent`, `/trace turn <turn_id>`, `/trace convo <chat_id>`, `/trace user <phone>`, `/trace top users` (aggregate: who messaged most in a time window — one latest-turn link per sender, table below), `/trace tool <name>`, `/trace span <id>`, `/trace <url>`, `/trace test` (run canonical e2e via `text.ara.so/backend/scripts/e2e.ts` (`bun run e2e --target=local --scenarios=<name>`)), `/trace score`, `/trace online`, or `/trace grow`. **Use `/align`** to align system prompt and gates to traces and report before/after tables (`skills/align/SKILL.md`). **When the user asks for users’ traces / links:** give **one permalink per user** — the **latest** `webhook.inbound` root only. **When they ask for top / busiest users over a time range:** use the **top users table** format (Msgs, User, single review link) — no per-thread “all turns” column unless they ask.

2026-04-25

Aradotso/ara.engineer

Deeply research a technology already in the stack. Pulls latest official docs, changelogs, release notes, production recipes, and pitfalls via heavy web search — then maps findings onto the exact usage pattern in this repo to surface underused features and confirm best practices. Use whenever someone asks "are we using X right?", "what's the best way to use X for Y?", or "what did X ship recently that we should adopt?"

2026-04-24

braintrust

Aradotso/ara.engineer

Braintrust `bt` CLI for Ara — every agent-facing service emits spans to the `Ara` project (org `Aradotso`). Use for inspecting logs/traces/prompts/evals, authenticating, SQL over spans, and wrapping new code with tracing. For the Ara-specific span shape and debug recipes (`/trace recent`, `/trace turn`, `/trace user`, etc.) see the companion `/trace` skill.

2026-04-24

aracli

Aradotso/ara.engineer

Meta-guide to the aracli CLI (formerly `ae`) and its skill-sharing pipeline. Covers how aracli ships Claude Code skills to every teammate automatically — first-run bootstrap, live symlinks, PreToolUse/Skill hook, SessionStart hook. Use whenever you need to author a new skill, edit an existing skill, or reason about why a skill change did (or didn't) reach another user.

2026-04-23

Useful for (SOC)

Network and Computer Systems Administrators, Computer and Mathematical Occupations, 15-1244, L4

name	replay-railway-preview
version	1.0.0
description	End-to-end validate a `text.ara.so/backend` (website-agent) fix against a real Railway preview environment BEFORE merging to main. Pushes the current branch, deploys the SHA to the long-lived `preview` Railway env, confirms the new `git_sha` is live via `/healthz`, then sends a focused signed webhook replay (single-turn or short multi-turn) to a reserved 555-01XX test phone, and polls Braintrust for the root span's `outcome` / rounds / tools / reply. Use whenever you change anything in `backend/src/` (tool handlers, system prompt, builder loop, etc.) and want production-fidelity proof — not just unit tests — before opening the PR. Invoked as `/replay-railway-preview` or `/replay railway preview`.
allowed-tools	["Bash","Read","Write","Edit"]

/replay-railway-preview — production-fidelity validation for website-agent fixes

Unit tests prove the guard / handler / parser does what its inputs say. They do not prove the live model on Cerebras follows your new system prompt, or that a real signed webhook flows through Hono → builder → Blaxel sandbox → Braintrust the way you expect. Before merging anything non-trivial to text.ara.so/backend/src/, run this skill: it deploys your branch's SHA to the long-lived preview Railway env (cloned from prod, real Supabase / real BT / real Blaxel workspace, preview-* sandbox prefix so it cannot collide with prod sandboxes) and replays a representative user turn against it.

The setup ledger (already provisioned — DO NOT recreate)

The preview env is permanent infrastructure, not ephemeral per-PR. Reuse it.

Railway project   text-ara-so       234f1a04-3e7b-463a-a07e-6232b7102420
Railway service   website-agent     102b93f7-8b90-465f-9b28-98782c298841
Preview env       preview           2d16e49e-af45-4806-b860-23d4b47937d0
Production env    production        88249e7b-f653-4d9d-83c0-407f9cb156a4

Preview URL    https://website-agent-preview-bdd4.up.railway.app
Prod URL       https://website-agent-production-9ace.up.railway.app

Preview env is a clone of prod + these overrides (already set, don't touch):

SANDBOX_NAME_PREFIX=preview     ← so preview Blaxel sandboxes never collide with prod
WARM_POOL_PREFIX=preview-pool
ARA_ENVIRONMENT=preview
PORT=3000

Supabase URL, BT API key, Cerebras key, etc. are the same as prod by design — preview should fail the same way prod does.

Recipe (about 5 min wall-clock for a single-turn replay)

1. Push your branch

git push -u origin "$(git branch --show-current)"

2. Deploy the SHA to the preview env

This step requires the Railway MCP. If it is deferred, load it via ToolSearch first:

ToolSearch query="railway"  max_results=30

Then deploy:

railway_deploy_from_commit
  serviceId=102b93f7-8b90-465f-9b28-98782c298841
  environmentId=2d16e49e-af45-4806-b860-23d4b47937d0
  commitSha=<your full or short SHA>

Railpack builds backend/ (root_directory is already set on the service). Typical build is 60–120s.

3. Confirm the new SHA is live

/healthz is the source of truth. The Railway MCP get_deployment / list_deployments endpoints have stale GraphQL fragments and frequently 400 — don't waste time there.

curl -sS https://website-agent-preview-bdd4.up.railway.app/healthz \
  | jq '{status,git_sha,bt_enabled,db_configured,verify_mode,model}'

Repeat this until git_sha equals the first 7 characters of your commit SHA. Do not start the replay before it matches; otherwise you are testing whatever was deployed before.
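The wait loop can also be scripted. The sketch below is one possible shape, not part of the skill's template: `shaMatches`, `waitForSha`, the 10-second interval, and the 30-attempt budget are all hypothetical, and the health fetcher is injected so the loop can be tested without a network call.

```typescript
// Only git_sha is read here; the other /healthz fields are ignored by this sketch.
interface Healthz {
  git_sha?: string;
}

// True when the live SHA and the pushed commit agree on their first 7 characters.
function shaMatches(liveSha: string | undefined, commitSha: string): boolean {
  if (!liveSha || commitSha.length < 7) return false;
  return liveSha.slice(0, 7).toLowerCase() === commitSha.slice(0, 7).toLowerCase();
}

// Poll until /healthz reports the expected SHA or the attempt budget runs out.
// `getHealthz` is a hypothetical injected fetcher.
async function waitForSha(
  getHealthz: () => Promise<Healthz>,
  commitSha: string,
  opts: { intervalMs?: number; maxAttempts?: number } = {},
): Promise<boolean> {
  const intervalMs = opts.intervalMs ?? 10_000; // assumed interval
  const maxAttempts = opts.maxAttempts ?? 30; // assumed budget (about 5 min)
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      const health = await getHealthz();
      if (shaMatches(health.git_sha, commitSha)) return true;
    } catch {
      // The service may be restarting mid-deploy; treat errors as "not live yet".
    }
    if (attempt < maxAttempts - 1) {
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  return false;
}
```

A caller might pass `() => fetch(URL_BASE + "/healthz").then((r) => r.json())` as `getHealthz` and abort the replay when `waitForSha` resolves to `false`.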

4. Run the replay

Use scripts/replay-template.ts from this skill as a starting point: copy it into text.ara.so/backend/scripts/replay-<bug-id>.ts and swap in the prompt sequence that exercises your fix.

Two designs, in order of preference:

Single-turn replay (fastest, ~30–60s): one webhook whose prompt alone exercises the fix. Use this whenever you can — even bugs that manifested mid-conversation in the original incident are usually reproducible from a single prompt with enough context inlined.
Multi-turn replay (~3–5 min): only when the bug genuinely needs a prior site/state. Webhooks for the same chat_id queue sequentially behind the per-conversation lock — turn N+1 does not start until turn N's builder finishes. Plan wait times accordingly.

Run from the text.ara.so worktree:

cd ~/github/text.ara.so/backend
URL_BASE=https://website-agent-preview-bdd4.up.railway.app \
  bun run scripts/replay-<bug-id>.ts

The template polls Braintrust every 10s for the root webhook.inbound span and exits as soon as outcome is non-null. That gives you the honest turn duration; no guessing.
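The polling behaviour described above can be sketched roughly as follows. This is not the template's actual code: `fetchRootSpan` is a hypothetical injected function standing in for whatever Braintrust query the template uses, and the 10-minute timeout is an assumption.

```typescript
// Minimal view of the root span; only `outcome` is taken from the description above.
interface RootSpan {
  outcome: string | null;
}

// A turn counts as finished once the root span exists and `outcome` is set.
function isFinished<T extends RootSpan>(span: T | null): span is T {
  return span !== null && span.outcome !== null;
}

// Poll until the turn finishes or the deadline passes.
// Returns the span plus the observed wall-clock duration, or null on timeout.
async function pollUntilOutcome<T extends RootSpan>(
  fetchRootSpan: () => Promise<T | null>, // hypothetical: looks up the webhook.inbound root span
  intervalMs = 10_000, // the text above says every 10s
  timeoutMs = 10 * 60_000, // assumed upper bound
): Promise<{ span: T; elapsedS: number } | null> {
  const startedAt = Date.now();
  while (Date.now() - startedAt < timeoutMs) {
    const span = await fetchRootSpan();
    if (isFinished(span)) {
      return { span, elapsedS: (Date.now() - startedAt) / 1000 };
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return null;
}
```

Polling on `outcome` rather than sleeping for a fixed time is what avoids the `outcome=null` reads listed under common pitfalls.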

5. Read the result

The script prints a one-screen summary:

=== RESULT ===
root_span_id: <uuid>
outcome:      ok | hallucinated_update | max_rounds | error | …
rounds:       <int>     ← llm_calls
tool_calls:   <int>
duration_s:   <float>
reply:        "<first 600 chars of the agent's reply>"

BT permalink:
  https://www.braintrust.dev/app/Aradotso/p/Ara/logs?r=<root>&s=<root>

Compare to the pre-fix incident's metrics. A real fix should show a clear delta — usually order-of-magnitude on rounds / tools / duration. If the reply text doesn't match the user-visible behaviour you wanted, the fix isn't done — open the BT permalink and walk the tool sequence to see why.
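To make the before/after comparison explicit, a small helper along these lines may be handy. It is a sketch only: `TurnMetrics` and `compareRuns` are hypothetical names, and the field names merely mirror the summary printed above.

```typescript
// Metrics mirrored from the one-screen summary (rounds, tool_calls, duration_s).
interface TurnMetrics {
  rounds: number;
  toolCalls: number;
  durationS: number;
}

interface MetricDelta {
  before: number;
  after: number;
  improvement: number; // before / after; 10 means an order-of-magnitude drop
}

// Compute per-metric improvement factors between the pre-fix incident and the replay.
function compareRuns(
  before: TurnMetrics,
  after: TurnMetrics,
): Record<keyof TurnMetrics, MetricDelta> {
  const delta = (b: number, a: number): MetricDelta => ({
    before: b,
    after: a,
    // Guard against division by zero when the replay used no rounds or tools.
    improvement: a === 0 ? (b === 0 ? 1 : Infinity) : b / a,
  });
  return {
    rounds: delta(before.rounds, after.rounds),
    toolCalls: delta(before.toolCalls, after.toolCalls),
    durationS: delta(before.durationS, after.durationS),
  };
}
```

For example, with made-up numbers, an incident at 20 rounds and a replay at 2 rounds yields an `improvement` of 10 for `rounds`.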

6. Wipe the test user

The script intentionally does NOT wipe automatically — the original incident-debugging session of this skill clobbered turn 2 by wiping mid-builder. Wipe by hand AFTER you've seen the result:

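# URL_BASE must be exported so the inner bash can see it (assumption: it is not stored in Infisical).
export URL_BASE=https://website-agent-preview-bdd4.up.railway.app
infisical run --projectId=6d518288-7854-49d2-aa42-8ffd285dafa1 \
  --env=prod --path=/text-ara-so --recursive -- \
  bash -c 'curl -sS -X POST "$URL_BASE/admin/wipe" \
    -H "x-admin-token: $ADMIN_WIPE_TOKEN" \
    -H "Content-Type: application/json" \
    -d "{\"phone_number\":\"+15550100104\"}"'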

(Or use whatever Infisical wrapper you've got — secrets skill covers the pattern. Never paste ADMIN_WIPE_TOKEN or LINQ_SIGNING_SECRET into chat or commit them.)
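If you would rather wipe from a script than from curl, the same request can be built in TypeScript. This is a sketch under assumptions: `buildWipeRequest` is a hypothetical helper, and the check that refuses anything outside the reserved test range is an extra safety guard that is not part of the documented endpoint.

```typescript
interface WipeRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Build the POST /admin/wipe request shown in the curl command above.
// Throws if the phone is not in the reserved +155501001XX test range (guard is an assumption).
function buildWipeRequest(urlBase: string, adminToken: string, phoneNumber: string): WipeRequest {
  if (!/^\+155501001\d{2}$/.test(phoneNumber)) {
    throw new Error(`refusing to wipe non-test phone: ${phoneNumber}`);
  }
  return {
    url: `${urlBase.replace(/\/+$/, "")}/admin/wipe`,
    init: {
      method: "POST",
      headers: {
        "x-admin-token": adminToken,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ phone_number: phoneNumber }),
    },
  };
}
```

The result can be passed to `fetch(req.url, req.init)` inside a process started via `infisical run`, so the token still never appears in chat or in the repository.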

7. Do NOT tear the preview env down

It's shared infrastructure. The whole workflow assumes it's there. If you genuinely need a clean rebuild, ping Adi first.

Conventions

Test phones — 555-01XX (reserved fiction range). Pick a free one in the 100–199 block. +15550100100..104 are already burned by prior replays' BT history; use +15550100110+ for new bugs. If Linq ever does try to deliver an outbound, it can't reach a real device.
Chat IDs — r<run-id>-<bug-tag> so they're greppable in BT. The template uses Math.floor(Date.now()/1000).toString(36) as the run id — short and unique enough.
Don't run two replays against the same chat_id concurrently. The per-conversation lock will queue the second behind the first; if your script wipes mid-flight you'll lose the second turn's BT span. Use a fresh chat_id per run.
Secrets come from Infisical. LINQ_SIGNING_SECRET and ADMIN_WIPE_TOKEN live in Infisical project Ara-passwords, folder /text-ara-so. The template falls back to env vars; export them via infisical run rather than hardcoding.
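The phone and chat-ID conventions above can be captured in two small helpers. These are illustrative only: `testPhone` and `makeChatId` are hypothetical names, and restricting new runs to slots 110 through 199 is this sketch's reading of "use +15550100110+ for new bugs".

```typescript
// Build a reserved test phone such as +15550100110 from a slot in the 100-199 block.
// Slots below 110 are rejected here because the lowest ones are already used by prior replays.
function testPhone(slot: number): string {
  if (!Number.isInteger(slot) || slot < 110 || slot > 199) {
    throw new Error(`slot must be an integer in 110..199, got ${slot}`);
  }
  return `+1555010${String(slot).padStart(4, "0")}`;
}

// Build a chat ID of the form r<run-id>-<bug-tag>, where the run id is the
// current Unix time in seconds rendered in base 36 (as the template does).
function makeChatId(bugTag: string, nowMs: number = Date.now()): string {
  const runId = Math.floor(nowMs / 1000).toString(36);
  return `r${runId}-${bugTag}`;
}
```

Calling `makeChatId` once per run gives each replay a fresh chat_id, which keeps concurrent runs from queuing behind the same per-conversation lock.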

Common pitfalls (learned in production)

Symptom	Cause	Fix
`outcome=null`, all metrics 0 in BT	Builder hadn't finished when you queried — fixed-sleep too short	Use polling (template does this); don't fixed-sleep on long turns
Turn 2 of multi-turn is empty	Wipe ran before per-chat lock released turn 2	Don't wipe in the script; wipe by hand after verifying
`git_sha` on /healthz is your old commit	Deploy still building, or build failed	Check `railway_get_build_logs deploymentId=<id>` (one of the few Railway MCP calls that works); don't replay until /healthz matches
BT preview-length truncates JSON parse	Default `--preview-length` is small; some `output` blobs exceed it	Pass `--preview-length 5000` or higher when fetching root spans
`railway_get_deployment` 400s with GraphQL field error	Stale schema fragment in MCP	Use `/healthz` for status; use `railway_get_build_logs` / `railway_get_deploy_logs` for what's running
Two preview domains both work	`website-agent-preview.up.railway.app` and `…-bdd4.up.railway.app` are aliases on the same service instance	Either is fine; the `-bdd4` suffix is in the original setup ledger so prefer it for greppability

Cross-refs

/trace — pull and inspect BT spans by chat_id, phone, turn, etc. Use this to read the replay results in detail.
/secrets — Infisical convention for ADMIN_WIPE_TOKEN / LINQ_SIGNING_SECRET.
/test — local-only smoke tests that DON'T need the preview env. Run that first; only use this skill for production-fidelity proof.

Quick reference

# 1. push
git push -u origin "$(git branch --show-current)"

# 2. deploy SHA  (load Railway MCP via ToolSearch first)
#    railway_deploy_from_commit serviceId=102b93f7-… envId=2d16e49e-… commitSha=<sha>

# 3. wait for /healthz to report your sha
curl -sS https://website-agent-preview-bdd4.up.railway.app/healthz | jq -r .git_sha

# 4. run replay (copy scripts/replay-template.ts → text.ara.so/backend/scripts/replay-<bug>.ts)
cd ~/github/text.ara.so/backend
URL_BASE=https://website-agent-preview-bdd4.up.railway.app \
  bun run scripts/replay-<bug>.ts

# 5. read BT permalink from script output, compare to pre-fix trace

# 6. wipe (manually, with secrets sourced via Infisical)