codex-e2e-test
// Run PR-grade real Codex E2E validation through claude-tap, including resume turns, multiple tool calls, optional image input, viewer verification, and screenshot evidence.
| name | codex-e2e-test |
| description | Run PR-grade real Codex E2E validation through claude-tap, including resume turns, multiple tool calls, optional image input, viewer verification, and screenshot evidence. |
| tags | testing, e2e, codex, responses-api |
Run real end-to-end validation that starts claude-tap from local source,
connects to the real Codex CLI via OAuth, captures OpenAI Responses API traces,
and produces viewer screenshots suitable for PR evidence.
Use this skill for every PR that changes capture, proxying, viewer rendering, session/dashboard behavior, client launch logic, trace ordering, content blocks, tools, token usage, or screenshot/demo assets. If a PR cannot run this flow, state why in the PR and cover the same risk with another real client trace.
- codex CLI installed (npm install -g @openai/codex) and authenticated via OAuth
- uv sync --extra dev
- uv run playwright install chromium

Verify OAuth works:
codex exec "say hello" --dangerously-bypass-approvals-and-sandbox
If it fails with token errors, re-authenticate:
codex auth login
Codex uses the OpenAI Responses API (/v1/responses) instead of the Anthropic Messages API.
With OAuth authentication, the upstream is https://chatgpt.com/backend-api/codex,
not https://api.openai.com.
The proxy must be told the correct target with --tap-target.
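The target choice described above can be captured in a small helper. This is only a sketch: the function name `pick_tap_target` and the auth-mode labels `"oauth"` and `"api_key"` are hypothetical and not part of claude-tap. Only the two URLs come from the text, and using `https://api.openai.com` for API-key auth is an assumption.

```python
# Hypothetical helper: the names and auth-mode labels are illustrative only.
CHATGPT_CODEX_TARGET = "https://chatgpt.com/backend-api/codex"
OPENAI_API_TARGET = "https://api.openai.com"  # assumed target for API-key auth


def pick_tap_target(auth_mode: str) -> str:
    """Return the upstream URL to pass to --tap-target for an auth mode."""
    if auth_mode == "oauth":
        return CHATGPT_CODEX_TARGET
    if auth_mode == "api_key":
        return OPENAI_API_TARGET
    raise ValueError(f"unknown auth mode: {auth_mode!r}")


print(pick_tap_target("oauth"))
```

Raising on unknown modes keeps a typo from silently falling back to the wrong upstream.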
Prefer the resume + multimodal flow below for PR evidence. The simple commands are only smoke tests for checking local setup.
claude-tap --tap-client codex \
--tap-target https://chatgpt.com/backend-api/codex \
--tap-output-dir /tmp/codex-e2e \
--tap-no-open --tap-no-update-check \
-- exec "say hello" \
--dangerously-bypass-approvals-and-sandbox
Use a task that requires shell tool use — this forces the agent to make multiple Responses API calls (models lookup + actual responses):
claude-tap --tap-client codex \
--tap-target https://chatgpt.com/backend-api/codex \
--tap-output-dir /tmp/codex-e2e \
--tap-no-open --tap-no-update-check \
-- exec "Read pyproject.toml and tell me the project name and version" \
--dangerously-bypass-approvals-and-sandbox
Expected: 4+ API calls (2x GET /v1/models + 2x POST /v1/responses).
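One way to check the expected call counts is to tally method and path pairs from the .jsonl trace. The sketch below assumes each trace line is a JSON object with `request.method` and `request.path` fields. That schema is an assumption made for illustration, so adjust the field names to the real trace format. It runs on synthetic data rather than a real trace.

```python
import json
from collections import Counter


def count_calls(jsonl_text: str) -> Counter:
    """Count (method, path) pairs across trace records.

    ASSUMPTION: every non-empty line is a JSON object shaped like
    {"request": {"method": ..., "path": ...}}.
    """
    counts: Counter = Counter()
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue
        request = json.loads(line).get("request", {})
        counts[(request.get("method"), request.get("path"))] += 1
    return counts


# Synthetic sample mirroring the expected 2x GET + 2x POST pattern.
sample = "\n".join(
    json.dumps({"request": {"method": method, "path": path}})
    for method, path in [
        ("GET", "/v1/models"),
        ("POST", "/v1/responses"),
        ("GET", "/v1/models"),
        ("POST", "/v1/responses"),
    ]
)
counts = count_calls(sample)
print(counts[("POST", "/v1/responses")])
```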
Use this flow for viewer changes that affect message rendering, content block boundaries, tool call ordering, images, or copy/select behavior. It creates a real Codex session, resumes it at least once, forces multiple shell tool calls per user turn, and attaches an actual image so the trace includes multimodal content. This is the default PR evidence flow.
mkdir -p /tmp/claude-tap-real-codex-workspace
printf 'project = "claude-tap-real-codex-e2e"\n' \
> /tmp/claude-tap-real-codex-workspace/project.toml
Create or copy a small valid PNG into the workspace. If you need a deterministic local image, generate it with Python's standard library:
python3 - <<'PY'
from pathlib import Path
import struct
import zlib

def chunk(kind: bytes, data: bytes) -> bytes:
    # A PNG chunk is: length, type, data, CRC32 over type + data.
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", zlib.crc32(kind + data) & 0xFFFFFFFF)

width = height = 8
# Each scanline starts with filter byte 0, followed by RGBA pixels.
rows = b"".join(b"\x00" + (b"\x2f\x80\xed\xff" * width) for _ in range(height))
png = (
    b"\x89PNG\r\n\x1a\n"
    + chunk(b"IHDR", struct.pack(">IIBBBBB", width, height, 8, 6, 0, 0, 0))
    + chunk(b"IDAT", zlib.compress(rows))
    + chunk(b"IEND", b"")
)
Path("/tmp/claude-tap-real-codex-workspace/input.png").write_bytes(png)
PY
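Before attaching the generated file, it can help to confirm it is a structurally valid PNG. The following sketch walks the chunks, checks each CRC, and returns the dimensions from IHDR using only the standard library. The function name `read_png_size` is hypothetical, and the check covers chunk structure only, not pixel data.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_size(data: bytes) -> tuple[int, int]:
    """Walk the PNG chunks, verify each CRC, and return (width, height)."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("missing PNG signature")
    offset = len(PNG_SIGNATURE)
    size = None
    while offset < len(data):
        # Chunk layout: 4-byte length, 4-byte type, data, 4-byte CRC.
        (length,) = struct.unpack(">I", data[offset:offset + 4])
        kind = data[offset + 4:offset + 8]
        body = data[offset + 8:offset + 8 + length]
        (crc,) = struct.unpack(">I", data[offset + 8 + length:offset + 12 + length])
        if zlib.crc32(kind + body) & 0xFFFFFFFF != crc:
            raise ValueError(f"bad CRC in {kind!r} chunk")
        if kind == b"IHDR":
            # IHDR begins with big-endian width and height.
            size = struct.unpack(">II", body[:8])
        if kind == b"IEND":
            break
        offset += 12 + length
    if size is None:
        raise ValueError("no IHDR chunk found")
    return size
```

For the file written above, `read_png_size(Path("/tmp/claude-tap-real-codex-workspace/input.png").read_bytes())` should report an 8 by 8 image.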
Run from the repository, but point Codex at the isolated workspace with -C.
The prompt should explicitly require several tool calls so the viewer has
enough messages, tool calls, and response blocks to inspect.
uv run claude-tap --tap-client codex \
--tap-target https://chatgpt.com/backend-api/codex \
--tap-output-dir /tmp/claude-tap-real-codex-traces \
--tap-no-open --tap-no-update-check \
-- exec -C /tmp/claude-tap-real-codex-workspace \
--image /tmp/claude-tap-real-codex-workspace/input.png \
--dangerously-bypass-approvals-and-sandbox \
"Inspect this workspace and the attached image. Use shell tools to run pwd, list files, inspect project.toml, inspect input.png, then write codex_e2e_report.txt with your findings. Keep all writes inside this workspace."
Use the session id printed by the first Codex run when possible. Avoid relying
on --last on busy maintainer machines because it can resume an unrelated
recent Codex session.
uv run claude-tap --tap-client codex \
--tap-target https://chatgpt.com/backend-api/codex \
--tap-output-dir /tmp/claude-tap-real-codex-traces \
--tap-no-open --tap-no-update-check \
-- exec resume <SESSION_ID_FROM_FIRST_RUN> \
--image /tmp/claude-tap-real-codex-workspace/input.png \
--dangerously-bypass-approvals-and-sandbox \
"Continue the same investigation in /tmp/claude-tap-real-codex-workspace. Use shell tools to read /tmp/claude-tap-real-codex-workspace/codex_e2e_report.txt, compute the byte size of /tmp/claude-tap-real-codex-workspace/input.png, and write /tmp/claude-tap-real-codex-workspace/codex_e2e_followup.txt. Then summarize what changed since the previous turn."
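When scripting the resume turn, the command above can be assembled as an argument list so the session id is passed explicitly. This is a sketch: `build_resume_cmd` is a hypothetical helper and the session id shown is a placeholder. The flags are copied from the command above.

```python
import shlex


def build_resume_cmd(session_id: str, image: str, prompt: str, output_dir: str) -> list[str]:
    """Build the claude-tap argv for resuming a specific Codex session."""
    # Require a real id so the run never falls back to --last.
    if not session_id or session_id.startswith("-"):
        raise ValueError("pass the session id printed by the first run")
    return [
        "uv", "run", "claude-tap",
        "--tap-client", "codex",
        "--tap-target", "https://chatgpt.com/backend-api/codex",
        "--tap-output-dir", output_dir,
        "--tap-no-open", "--tap-no-update-check",
        "--",
        "exec", "resume", session_id,
        "--image", image,
        "--dangerously-bypass-approvals-and-sandbox",
        prompt,
    ]


cmd = build_resume_cmd(
    "SESSION_ID_PLACEHOLDER",  # placeholder, not a real session id
    "/tmp/claude-tap-real-codex-workspace/input.png",
    "Continue the same investigation.",
    "/tmp/claude-tap-real-codex-traces",
)
print(shlex.join(cmd))
```

The list can be handed to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with long prompts.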
Take screenshots at multiple scroll positions, including a deeper position in
the same detail pane. Store them under .agents/evidence/pr/<topic>/ and use
raw.githubusercontent.com links in the PR body.
mkdir -p .agents/evidence/pr/codex-real-e2e
uv run python - <<'PY'
from pathlib import Path

from playwright.sync_api import sync_playwright

html_files = sorted(Path("/tmp/claude-tap-real-codex-traces").rglob("trace_*.html"))
if not html_files:
    raise SystemExit("No viewer HTML found in /tmp/claude-tap-real-codex-traces")
html = html_files[-1]
out_dir = Path(".agents/evidence/pr/codex-real-e2e")

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page(viewport={"width": 1760, "height": 1100}, device_scale_factor=1)
    page.goto(f"file://{html}", wait_until="domcontentloaded", timeout=10000)
    page.wait_for_selector(".sidebar-item", timeout=5000)
    page.evaluate("document.documentElement.setAttribute('data-theme', 'light')")

    # Select the last Responses call so resume context is visible.
    page.evaluate(
        """() => {
            const items = Array.from(document.querySelectorAll('.sidebar-item'));
            const responses = items.filter(item => item.textContent.includes('/v1/responses'));
            (responses.at(-1) || items.at(-1))?.click();
        }"""
    )
    page.wait_for_timeout(300)

    # Open System Prompt, Messages, and Response; collapse every other section.
    page.evaluate(
        """() => {
            for (const section of document.querySelectorAll('#detail .section')) {
                const title = section.querySelector('.title')?.textContent || '';
                const body = section.querySelector('.section-body');
                const header = section.querySelector('.section-header');
                if (!body || !header) continue;
                const shouldOpen = ['System Prompt', 'Messages', 'Response'].includes(title);
                const isOpen = body.classList.contains('open');
                if (shouldOpen !== isOpen) header.click();
            }
        }"""
    )
    page.wait_for_timeout(200)

    def shot(name: str, scroll_top: int) -> None:
        # Scroll the detail pane to a fixed offset, then capture the viewport.
        page.evaluate("y => { const d = document.querySelector('#detail'); if (d) d.scrollTop = y; }", scroll_top)
        page.wait_for_timeout(200)
        page.screenshot(path=str(out_dir / name), full_page=False)

    shot("codex-real-top.png", 0)
    shot("codex-real-mid.png", 700)
    shot("codex-real-deep.png", 1400)

    # If the detail pane renders an image, center it and capture one more shot.
    image_count = page.evaluate(
        """() => {
            const img = document.querySelector('#detail img');
            if (!img) return 0;
            img.scrollIntoView({ block: 'center', inline: 'nearest' });
            return document.querySelectorAll('#detail img').length;
        }"""
    )
    if image_count:
        page.wait_for_timeout(200)
        page.screenshot(path=str(out_dir / "codex-real-image.png"), full_page=False)

    browser.close()
PY
Validate screenshots:
uv run python scripts/check_screenshots.py .agents/evidence/pr/codex-real-e2e
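The raw.githubusercontent.com links for the PR body follow the pattern `https://raw.githubusercontent.com/<owner>/<repo>/<ref>/<path>`. A minimal sketch for building them is shown here. The helper name `raw_github_url` and the owner, repo, and ref values in the example are hypothetical placeholders.

```python
from pathlib import PurePosixPath
from urllib.parse import quote


def raw_github_url(owner: str, repo: str, ref: str, path: str) -> str:
    """Build a raw.githubusercontent.com URL for a file committed at ref."""
    clean = PurePosixPath(path).as_posix().lstrip("/")
    # quote() leaves "/" intact by default, so nested paths and refs survive.
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{quote(ref)}/{quote(clean)}"


# Placeholder owner/repo/ref; substitute the real repository and commit.
print(
    raw_github_url(
        "example-owner",
        "example-repo",
        "0123abc",
        ".agents/evidence/pr/codex-real-e2e/codex-real-top.png",
    )
)
```

Pinning `ref` to a commit SHA rather than a branch name keeps the evidence links stable if the branch is later rebased or deleted.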
- .jsonl traces and generated .html viewers from claude-tap.
- POST /v1/responses entries with status 200 and non-zero token usage.
- --image attachment.

from playwright.sync_api import sync_playwright
import glob
import time

# glob order is not guaranteed, so sort and take the last viewer by name.
html = sorted(glob.glob("/tmp/codex-e2e/trace_*.html"))[-1]

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page(viewport={"width": 1440, "height": 1000})
    page.goto(f"file://{html}")
    page.wait_for_load_state("networkidle")
    time.sleep(1)

    # Select a Responses call (data-idx matches trace line index)
    page.click('.sidebar-item[data-idx="1"]')
    time.sleep(0.5)

    # Collapse System Prompt, keep Messages open
    page.evaluate("""() => {
        const h = document.querySelectorAll('.section-header')[1];
        const next = h.nextElementSibling;
        if (next && getComputedStyle(next).display !== 'none') h.click();
    }""")

    # Scroll to Messages section
    page.evaluate("""() => {
        document.querySelectorAll('.section-header')[2]
            .scrollIntoView({behavior: 'instant', block: 'start'});
    }""")
    time.sleep(0.3)

    page.screenshot(path="/tmp/codex-e2e/messages.png")
    browser.close()
- .jsonl has ≥2 POST /v1/responses entries
- trace_*.html)
- user message text (verifies #41 fix)

| Symptom | Cause | Fix |
|---|---|---|
| WebSocket 502 then HTTP 401 | Default target api.openai.com rejects ChatGPT OAuth tokens | Use --tap-target https://chatgpt.com/backend-api/codex |
| Missing scopes: api.responses.write | API key lacks Responses API access | Use OAuth (codex auth login) instead of OPENAI_API_KEY |
| Only 1 API call | Simple prompt completed in one round | Use a task requiring tool use (file reads, shell commands) |
| OPENAI_BASE_URL is deprecated warning | Codex v0.115+ prefers config.toml | Harmless; the proxy still works via the env var |
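The troubleshooting table can also be applied mechanically to captured output. The sketch below matches substrings taken from the Symptom column against a log string. The helper `diagnose` is hypothetical, and the exact wording of real Codex or claude-tap error output is an assumption, so the patterns may need adjusting.

```python
import re

# Patterns come from the Symptom column; real log wording may differ (assumption).
KNOWN_ISSUES = [
    (
        re.compile(r"Missing scopes: api\.responses\.write"),
        "Use OAuth (codex auth login) instead of OPENAI_API_KEY",
    ),
    (
        re.compile(r"OPENAI_BASE_URL is deprecated"),
        "Harmless; the proxy still works via the env var",
    ),
    (
        re.compile(r"\b401\b"),
        "Use --tap-target https://chatgpt.com/backend-api/codex",
    ),
]


def diagnose(output: str) -> list[str]:
    """Return the suggested fixes whose symptom pattern appears in output."""
    return [fix for pattern, fix in KNOWN_ISSUES if pattern.search(output)]


print(diagnose("WebSocket 502, then HTTP 401 Unauthorized"))
```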
- A codex exec session also calls GET /v1/models for model discovery.
- The --dangerously-bypass-approvals-and-sandbox flag is required for non-interactive exec.