ai-rpg-acceptance-verification

Name: Ai Rpg Acceptance Verification
Author: Zenoffice-co-ltd

Description: Use when the task is to validate release readiness, acceptance, publish readiness, smoke checks, or end-to-end evidence for this repository. Do not use for isolated feature implementation unless the task explicitly asks for verification or release evidence.

Updated: May 6, 2026 at 10:51

Repository: Zenoffice-co-ltd/AI_RPG

AI RPG Acceptance Verification

Use this skill when the job is to prove that the repo is shippable.

Canonical Sources

README.md
docs/OPERATIONS.md
docs/DELIVERY_STATUS.md

Default Workflow

Start from the narrowest preflight or targeted command that matches the task.
If release readiness is the goal, finish with pnpm verify:acceptance.
If the canonical acceptance command fails, identify whether the blocker is:
- missing runtime input
- vendor readiness
- local app startup
- actual product regression
For a blocker outside the touched scenario or package, rerun the narrow targeted command (up to 3 times, per the Vendor Judge Flake Retry Policy below) to distinguish a deterministic failure from a one-off vendor judge result.
When a legacy scenario fails during a new-scenario task, compare the relevant generated scenario/assets and live test definition before calling it a regression. If needed, use a temporary clean worktree at the pre-task baseline to establish causality.
Record concrete evidence, not just that scripts exist.
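One way to keep evidence concrete is to capture each run as a typed record instead of a prose note. The sketch below is a hypothetical shape, not part of the repo: `AcceptanceEvidence`, its fields, `formatEvidence`, and `validateEvidence` are illustrative names. The four `BlockerKind` values mirror the blocker categories listed above.

```typescript
// Hypothetical evidence record; field names are illustrative assumptions.
type BlockerKind =
  | "missing_runtime_input"
  | "vendor_readiness"
  | "local_app_startup"
  | "product_regression";

interface AcceptanceEvidence {
  command: string;       // exact command that was run
  exitCode: number;      // process exit code observed
  startedAt: string;     // ISO-8601 timestamp of the run
  logPath: string;       // where the captured output was saved
  blocker?: BlockerKind; // set only when the run failed
  attempts: number;      // how many times the command was run
}

// Renders one evidence line for a PR comment or docs/OPERATIONS.md.
function formatEvidence(e: AcceptanceEvidence): string {
  const status = e.exitCode === 0 ? "PASS" : `FAIL (${e.blocker ?? "unclassified"})`;
  return `${e.startedAt} ${e.command} -> ${status}, exit=${e.exitCode}, attempts=${e.attempts}, log=${e.logPath}`;
}

// Lists what is missing before a record counts as concrete evidence.
function validateEvidence(e: AcceptanceEvidence): string[] {
  const problems: string[] = [];
  if (e.command.trim() === "") problems.push("command is empty");
  if (e.logPath.trim() === "") problems.push("logPath is empty");
  if (e.attempts < 1) problems.push("attempts must be >= 1");
  if (e.exitCode !== 0 && e.blocker === undefined) {
    problems.push("failed run has no blocker classification");
  }
  return problems;
}
```

The point of requiring `blocker` on failures is that a failed run cannot be recorded until someone has decided which of the four categories it belongs to.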

Grok Voice v2.1 PR58+ Release DoD

Use this subsection when validating, deploying, or closing follow-up work for Grok Voice v2.1 on the Adecco manufacturer scenario.

Completion is not "PR merged" alone. Treat the release as done only after:

PR is merged to main and the merge commit is known.
App Hosting backend adecco-roleplay is deployed to project adecco-mendan.
Production smoke passes against the hosted URL and reports the expected promptVersion, guardrailVersion, model, voice, and VAD values.
Scenario E2E full regression passes: corepack pnpm exec tsx scripts/grok-voice-v21-scenario-e2e.ts --rounds 2 --critical-rounds 3.
New numeric/condition correction cases case19 through case24 pass in either the full run or a focused run.
Results are recorded on the PR as a follow-up comment, including the E2E evidence directory under out/grok_voice_v21_e2e/<timestamp>/.
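The completion criteria above work as a checklist in which every item must be true. Below is a minimal sketch of that checklist, assuming a hypothetical `GrokReleaseDoD` record filled in by hand or by tooling. The field names are illustrative. The case IDs and the evidence directory prefix come from the list above.

```typescript
// Hypothetical checklist; each field mirrors one DoD item above.
interface GrokReleaseDoD {
  mergeCommit: string | null;          // merge commit on main
  appHostingDeployed: boolean;         // adecco-roleplay deployed to adecco-mendan
  prodSmokePassed: boolean;            // hosted smoke reported expected versions, model, voice, VAD
  scenarioE2EPassed: boolean;          // --rounds 2 --critical-rounds 3 full regression
  correctionCasesPassed: Set<string>;  // which of case19..case24 passed (full or focused run)
  prCommentEvidenceDir: string | null; // evidence directory cited in the PR follow-up comment
}

const REQUIRED_CASES = ["case19", "case20", "case21", "case22", "case23", "case24"];

// Returns the DoD items that are still open; an empty list means the release is done.
function outstandingItems(d: GrokReleaseDoD): string[] {
  const pending: string[] = [];
  if (!d.mergeCommit) pending.push("merge commit on main unknown");
  if (!d.appHostingDeployed) pending.push("App Hosting not deployed");
  if (!d.prodSmokePassed) pending.push("production smoke not passing");
  if (!d.scenarioE2EPassed) pending.push("scenario E2E full regression not passing");
  const missing = REQUIRED_CASES.filter((c) => !d.correctionCasesPassed.has(c));
  if (missing.length > 0) pending.push(`correction cases not passed: ${missing.join(", ")}`);
  if (!d.prCommentEvidenceDir?.startsWith("out/grok_voice_v21_e2e/")) {
    pending.push("PR follow-up comment lacks the E2E evidence directory");
  }
  return pending;
}
```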

Grok Voice v2.1 scope locks:

VAD A/B testing is out of scope unless explicitly requested.
Do not change threshold, silence_duration_ms, or prefix_padding_ms.
Do not change model, voice, or scenario facts.
Do not relax existing PR57 E2E expectations to hide regressions.
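The scope locks can be checked mechanically by diffing the locked fields between the baseline and candidate session configs. The `LockedSessionConfig` shape below is an assumption for illustration. The field names `threshold`, `silence_duration_ms`, and `prefix_padding_ms` come from the list above. Nesting them under a `turn_detection` object is a guess modeled on common realtime session configs. Scenario facts are not modeled here.

```typescript
// Assumed config shape; only the locked fields are modeled.
interface LockedSessionConfig {
  model: string;
  voice: string;
  turn_detection: {
    threshold: number;
    silence_duration_ms: number;
    prefix_padding_ms: number;
  };
}

// Lists every locked field whose value differs between baseline and candidate.
function lockedFieldChanges(base: LockedSessionConfig, next: LockedSessionConfig): string[] {
  const changes: string[] = [];
  if (base.model !== next.model) changes.push(`model: ${base.model} -> ${next.model}`);
  if (base.voice !== next.voice) changes.push(`voice: ${base.voice} -> ${next.voice}`);
  const vadKeys = ["threshold", "silence_duration_ms", "prefix_padding_ms"] as const;
  for (const k of vadKeys) {
    const before = base.turn_detection[k];
    const after = next.turn_detection[k];
    if (before !== after) changes.push(`turn_detection.${k}: ${before} -> ${after}`);
  }
  return changes;
}
```

A non-empty result on a v2.1 PR means a lock was touched and needs an explicit request behind it.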

Voice E2E gate:

scripts/grok-voice-v21-voice-e2e.ts --limit 5 is currently a harness gate: it passes when the script executes, saves its evidence, and reports a clear pass/fail.
Do not claim it is a 5/5 quality PASS gate unless the task explicitly promotes it to one. Report STT drift separately from harness breakage. Examples of STT drift are 施工日 (construction date) transcribed as 施工費 (construction cost), or 単価 (unit price) transcribed as 短歌 (tanka).
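To keep harness status separate from transcript quality, each case can be sorted into one of three buckets. The sketch assumes a hypothetical per-case `VoiceCaseResult` shape and a simple substring check for the expected phrase. The real harness output may differ.

```typescript
// Hypothetical per-case result from a voice E2E run.
interface VoiceCaseResult {
  id: string;
  ranToCompletion: boolean; // the harness executed the case end to end
  evidenceSaved: boolean;   // artifacts were written
  expectedText: string;     // key phrase the case expects
  transcribedText: string;  // what STT actually produced
}

type Verdict = "harness_failure" | "stt_drift" | "pass";

function classify(r: VoiceCaseResult): Verdict {
  if (!r.ranToCompletion || !r.evidenceSaved) return "harness_failure";
  return r.transcribedText.includes(r.expectedText) ? "pass" : "stt_drift";
}

// The harness gate passes when no case broke the harness, even if some drifted;
// drifted case IDs are returned so they can be reported on their own.
function summarize(results: VoiceCaseResult[]) {
  const verdicts = results.map((r) => ({ id: r.id, verdict: classify(r) }));
  const harnessGatePassed = verdicts.every((v) => v.verdict !== "harness_failure");
  const sttDrift = verdicts.filter((v) => v.verdict === "stt_drift").map((v) => v.id);
  return { harnessGatePassed, sttDrift };
}
```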

xAI realtime failures:

A 429 can be caused by an insufficient account balance or by transient rate limiting. After a top-up, wait a short interval and rerun a focused case before retrying the full regression.
If a run fails only from 429, do not call it a product regression.
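The rule above says that a run whose only failures are 429s is limited by rate or balance, not regressed. Below is a minimal sketch, assuming each failed case exposes the status code it hit. The `FailedCase` shape is hypothetical. Any other failure is marked for triage rather than labeled a regression outright.

```typescript
// Hypothetical failure record; httpStatus is null when the failure was not an HTTP error.
interface FailedCase {
  id: string;
  httpStatus: number | null;
}

type RunVerdict = "pass" | "rate_limited_only" | "needs_triage";

function classifyRun(failures: FailedCase[]): RunVerdict {
  if (failures.length === 0) return "pass";
  // Only when every failure is a 429 can the run be set aside as non-product.
  return failures.every((f) => f.httpStatus === 429) ? "rate_limited_only" : "needs_triage";
}
```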

Windows smoke note:

corepack pnpm exec tsx scripts/grok-voice-v21-prod-smoke.mjs can print PASS and then exit non-zero on Windows due to a Node handle assertion. Re-run the .mjs directly with node scripts/grok-voice-v21-prod-smoke.mjs and use the direct node exit code as the smoke result.
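If the smoke is driven from another script, the same rule can be applied by spawning `node` directly and trusting only its exit status. Below is a sketch using Node's `child_process.spawnSync`. `runNodeScript` is a hypothetical helper, not something the repo provides.

```typescript
import { spawnSync } from "node:child_process";

interface SmokeResult {
  passed: boolean;       // true only when the direct node exit code is 0
  status: number | null; // raw exit code (null if the process was killed by a signal)
  stdout: string;        // captured output, kept as evidence only
}

// Runs a script with the current Node binary and judges it by exit code alone,
// ignoring any PASS text it may have printed.
function runNodeScript(scriptPath: string, args: string[] = []): SmokeResult {
  const result = spawnSync(process.execPath, [scriptPath, ...args], { encoding: "utf8" });
  if (result.error) throw result.error; // e.g. the binary could not be spawned
  return { passed: result.status === 0, status: result.status, stdout: result.stdout };
}
```

From the repo root, `runNodeScript("scripts/grok-voice-v21-prod-smoke.mjs").passed` would then be the smoke result to record.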

Grok Voice audio-fix closure gate:

Before calling a Grok Voice audio PR merge-ready, check its active review threads through the GitHub GraphQL API and resolve every non-outdated P1/P0 thread. A green browser smoke does not override an unresolved race-condition review thread.
For locked-response audio fixes, include browser WebAudio evidence from the production route, not only API responses. Minimum evidence is greeting.playback.completed, locked_response.playback.completed, turn.completed with lockedResponse=true, audioBytes > 0, error=null, and audio.queue.flushed absent except for barge_in or locked_response_preempt_realtime.
For voice locked-response races, unit coverage must prove that late response.created, audio delta, and response.done events arriving after deterministic TTS are cancelled or discarded and do not emit a second turn.completed or no_audio metric.
After the final code commit, redeploy App Hosting and rerun at least: node scripts/grok-voice-v21-prod-smoke.mjs, one production browser locked-response smoke, and node scripts/grok-voice-v21-prod-logs.mjs --session <sessionId> for that browser session.
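The minimum browser evidence listed above can be checked against a captured event log. The `BrowserEvent` shape below is an assumption. In particular, the sketch assumes that audio.queue.flushed events carry their cause in a `reason` field. The event type names and the two allowed flush causes come from the text.

```typescript
// Assumed shape of one captured browser event; adjust to the real telemetry.
interface BrowserEvent {
  type: string;
  lockedResponse?: boolean;
  audioBytes?: number;
  error?: string | null;
  reason?: string;
}

const ALLOWED_FLUSH_REASONS = new Set(["barge_in", "locked_response_preempt_realtime"]);

// Returns the evidence requirements that the captured events do not satisfy.
function missingLockedResponseEvidence(events: BrowserEvent[]): string[] {
  const missing: string[] = [];
  const has = (t: string) => events.some((e) => e.type === t);
  if (!has("greeting.playback.completed")) missing.push("greeting.playback.completed");
  if (!has("locked_response.playback.completed")) missing.push("locked_response.playback.completed");
  const lockedTurns = events.filter((e) => e.type === "turn.completed" && e.lockedResponse === true);
  if (lockedTurns.length === 0) {
    missing.push("turn.completed with lockedResponse=true");
  } else if (!lockedTurns.some((e) => (e.audioBytes ?? 0) > 0 && e.error === null)) {
    missing.push("locked turn with audioBytes > 0 and error=null");
  }
  const badFlushes = events.filter(
    (e) => e.type === "audio.queue.flushed" && !ALLOWED_FLUSH_REASONS.has(e.reason ?? ""),
  );
  if (badFlushes.length > 0) missing.push(`unexpected audio.queue.flushed x${badFlushes.length}`);
  return missing;
}
```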

Representative Commands

pnpm verify:acceptance -- --preflight
pnpm bootstrap:vendors
pnpm smoke:eleven
pnpm smoke:liveavatar
pnpm verify:acceptance

Orb UI Evidence

For Adecco Orb web UI changes, prefer targeted evidence before broader gates:

pnpm --filter @top-performer/web exec eslint components/roleplay lib/roleplay --ext .ts,.tsx --ignore-pattern '**/*.test.ts' --ignore-pattern '**/*.test.tsx' --no-error-on-unmatched-pattern
pnpm --filter @top-performer/web test:e2e
pnpm --filter @top-performer/web test:visual
pnpm --filter @top-performer/web build

Use /demo/adecco-orb?fakeLive=1 to prove event-driven transcript behavior without external voice network calls.
Use /demo/adecco-orb?mock=1&visualTest=1 only for deterministic visual regression.
Record live browser and microphone smoke evidence in docs/qa.md; if the live smoke was not run, report 実装済み・live未検証 (implemented, not live-verified).
When root lint/typecheck fails from unrelated repo-wide blockers, capture the exact blocker and keep targeted evidence for the touched Orb files.

Lint Baseline Lock (added 2026-04-26)

pnpm lint has a 162-error baseline rooted in pre-existing files (accountingArtifacts.ts, benchmarkRenderer.ts, compileAccountingScenario.ts, phase34.ts, voiceProfiles.ts). The baseline is captured at docs/lint-baseline.json with per-file error counts and rule-id breakdown.

Rule for any PR:

All files modified or created by the PR must produce zero new lint errors. Confirm by filtering pnpm lint output to those file paths.
Per-file error counts in docs/lint-baseline.json must NOT increase. If a refactor reduces the count, lower the number in the same PR — do not silently grow the baseline.
Files NOT listed in docs/lint-baseline.json are at zero and must remain at zero. Adding a new file that triggers any lint error blocks release.
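The per-file rule amounts to comparing current counts with the baseline, treating unlisted files as zero. The sketch assumes both sides are already parsed into maps from file path to error count. It does not reproduce the real docs/lint-baseline.json layout, which also carries a rule-id breakdown. `checkAgainstBaseline` is a hypothetical helper.

```typescript
type ErrorCounts = Record<string, number>; // file path -> lint error count

interface FileDelta {
  file: string;
  baseline: number;
  current: number;
}

interface BaselineCheck {
  increased: FileDelta[]; // any entry here blocks release
  reduced: FileDelta[];   // lower these numbers in the baseline in the same PR
  total: number;          // current total error count
}

function checkAgainstBaseline(baseline: ErrorCounts, current: ErrorCounts): BaselineCheck {
  const increased: FileDelta[] = [];
  const reduced: FileDelta[] = [];
  const files = Array.from(new Set([...Object.keys(baseline), ...Object.keys(current)]));
  for (const file of files) {
    const base = baseline[file] ?? 0; // files not in the baseline must stay at zero
    const now = current[file] ?? 0;
    if (now > base) increased.push({ file, baseline: base, current: now });
    else if (now < base) reduced.push({ file, baseline: base, current: now });
  }
  const total = Object.values(current).reduce((sum, n) => sum + n, 0);
  return { increased, reduced, total };
}
```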

Reporting template for the PR:

Lint baseline check:
  - Total errors: 162 (unchanged from baseline)
  - New files (zero errors): <list>
  - Modified files (zero new errors): <list>
  - Files with reduced error count: <list> (baseline updated)
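The template can be filled from the same data. Below is a hypothetical formatter. It assumes the caller already knows the new, modified, and reduced file lists, for example from `git diff --name-status` and the baseline comparison. Empty lists are printed as "none".

```typescript
interface LintReportInput {
  totalErrors: number;
  baselineTotal: number; // 162 when this policy was written
  newFiles: string[];
  modifiedFiles: string[];
  reducedFiles: string[];
}

function formatLintReport(r: LintReportInput): string {
  const delta =
    r.totalErrors === r.baselineTotal
      ? "unchanged from baseline"
      : `${r.totalErrors < r.baselineTotal ? "down" : "UP"} from baseline ${r.baselineTotal}`;
  const list = (xs: string[]) => (xs.length > 0 ? xs.join(", ") : "none");
  const reducedNote = r.reducedFiles.length > 0 ? " (baseline updated)" : "";
  return [
    "Lint baseline check:",
    `  - Total errors: ${r.totalErrors} (${delta})`,
    `  - New files (zero errors): ${list(r.newFiles)}`,
    `  - Modified files (zero new errors): ${list(r.modifiedFiles)}`,
    `  - Files with reduced error count: ${list(r.reducedFiles)}${reducedNote}`,
  ].join("\n");
}
```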

DoD G §6.2 Legacy Acceptance Exception

When pnpm verify:acceptance (full) fails ONLY on the legacy staffing_order_hearing_busy_manager_medium::no-coaching ConvAI judge, the failure is treated as a documented baseline blocker (vendor judge flake observed since 2026-04-19) and is out of scope for any new-scenario PR.

The exception applies only when ALL of the following hold:

1. The new scenario's own pnpm publish:scenario PASSED (vendor smoke 8/8 if using the split, else 100% pass).
2. The new scenario's snapshot has passed=true and binding != null.
3. The new scenario's voice mirror (if any) is verified equal to its source profile.
4. The new scenario's post-publish SAP/ERP/AP grep is clean.
5. pnpm smoke:eleven passes for the new scenario (up to 3 retries within a single operator session are allowed for vendor flake; do not silently retry forever).
6. The verify:acceptance failure stack trace shows the exact legacy scenario name staffing_order_hearing_busy_manager_medium::no-coaching and nothing else from the new scenario.
7. The PR description and docs/OPERATIONS.md Latest execution log explicitly cite the exception.

If any of (1)–(6) fail, the exception does NOT apply and the PR must hold release until the legacy blocker is resolved or the new scenario's own gate is fixed.
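Because every condition must hold, the exception check reduces to a conjunction plus an exact-name match on the failing tests. Below is a sketch with a hypothetical `ExceptionInputs` record. The legacy test name is the one quoted above. A scenario without a voice mirror is modeled as `null` so that it does not block.

```typescript
const LEGACY_EXCEPTION_TEST = "staffing_order_hearing_busy_manager_medium::no-coaching";

// Hypothetical inputs; each field mirrors one condition above, in order.
interface ExceptionInputs {
  publishScenarioPassed: boolean;      // new scenario's pnpm publish:scenario
  snapshotPassed: boolean;             // snapshot passed=true
  snapshotBindingPresent: boolean;     // snapshot binding != null
  voiceMirrorVerified: boolean | null; // null when the scenario has no voice mirror
  postPublishGrepClean: boolean;       // post-publish SAP/ERP/AP grep
  smokeElevenPassed: boolean;          // within the allowed retries
  acceptanceFailures: string[];        // failing test names from verify:acceptance
  exceptionCited: boolean;             // PR description and OPERATIONS.md execution log
}

// Returns the unmet conditions; an empty array means the exception may be applied.
function exceptionBlockers(x: ExceptionInputs): string[] {
  const blockers: string[] = [];
  if (!x.publishScenarioPassed) blockers.push("publish:scenario did not pass");
  if (!x.snapshotPassed || !x.snapshotBindingPresent) blockers.push("snapshot not passed or binding missing");
  if (x.voiceMirrorVerified === false) blockers.push("voice mirror differs from source profile");
  if (!x.postPublishGrepClean) blockers.push("post-publish SAP/ERP/AP grep not clean");
  if (!x.smokeElevenPassed) blockers.push("smoke:eleven not passing");
  const onlyLegacy =
    x.acceptanceFailures.length > 0 &&
    x.acceptanceFailures.every((testName) => testName === LEGACY_EXCEPTION_TEST);
  if (!onlyLegacy) blockers.push("verify:acceptance failures are not only the legacy no-coaching test");
  if (!x.exceptionCited) blockers.push("exception not cited in PR description and OPERATIONS.md");
  return blockers;
}
```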

Vendor Judge Flake Retry Policy

For ConvAI / smoke:eleven failures that look like vendor judge variance (different test failing across runs of the same prompt):

Retry up to 3 times within a single operator session before treating it as a deterministic failure.
If all 3 retries fail with the same single test name, treat it as a deterministic failure on that test.
If retries fail with different test names, treat it as vendor judge non-determinism and escalate to ai-rpg-convai-vendor-smoke-split for redesign.
Never retry-loop more than 3 times in CI — the failure must be addressed in code or in test design, not by retry.
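The retry policy can be written as a decision function over the failing test names from each attempt. The sketch assumes that each attempt reports the tests that failed, with an empty list meaning the attempt passed. It reads "up to 3" as three attempts in total. Two readings are this sketch's own interpretation rather than the policy's: a pass on a later attempt counts as a pass, and an identical multi-test failure on every attempt goes to manual triage.

```typescript
const MAX_ATTEMPTS = 3;

type RetryDecision =
  | { kind: "passed"; attempt: number }
  | { kind: "retry" }
  | { kind: "deterministic_failure"; test: string }
  | { kind: "vendor_nondeterminism"; tests: string[] } // escalate to ai-rpg-convai-vendor-smoke-split
  | { kind: "needs_triage"; tests: string[] };

// attempts[i] holds the names of tests that failed on attempt i + 1 (empty = passed).
function decide(attempts: string[][]): RetryDecision {
  const recent = attempts.slice(0, MAX_ATTEMPTS); // never look past the cap
  const passedAt = recent.findIndex((failed) => failed.length === 0);
  if (passedAt >= 0) return { kind: "passed", attempt: passedAt + 1 };
  if (recent.length < MAX_ATTEMPTS) return { kind: "retry" };
  const names = Array.from(new Set(([] as string[]).concat(...recent)));
  // The same single test failed every time: deterministic failure on that test.
  if (names.length === 1) return { kind: "deterministic_failure", test: names[0] };
  // Failure sets differ across attempts: vendor judge non-determinism.
  const signatures = new Set(recent.map((failed) => [...failed].sort().join("|")));
  if (signatures.size > 1) return { kind: "vendor_nondeterminism", tests: names };
  // Identical multi-test failure every time: not covered by the policy above.
  return { kind: "needs_triage", tests: names };
}
```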

Guardrails

Do not claim acceptance is done unless the canonical gate passed or you explicitly document the remaining blocker.
For publish-facing work, include the exact scenario or profile that was exercised.
If a local server is involved, prefer a fresh process over reusing stale output.
If verify:acceptance remains blocked, add or update docs/OPERATIONS.md Known issues / Follow-up Backlog with status, scope, owner placeholder, and acceptance criteria.
Do not invoke DoD G §6.2 for any failure that is NOT scoped to staffing_order_hearing_busy_manager_medium::no-coaching. Other legacy failures must be triaged separately — they are not pre-approved for exception.
