Name: Woo Ai Smoke
Author: woocommerce

updated:May 22, 2026 at 14:07

"repository": "woocommerce/woocommerce-android"

name: woo-ai-smoke
description: Run the Android AI Assistant headless smoke regression harness without launching UI.

Woo AI Smoke

Default Live Command

The live suite mirrors the iOS /woo-ai-smoke scenario list from woocommerce/woocommerce-ios#17016, which has 25 scripted scenarios. The Android Robolectric test produces the trace artifacts and enforces the deterministic baseline gate.

If ~/.woo-ai-smoke/store.env does not exist, create it with these keys and stop so the developer can fill it in outside the repo:

WOO_SITE_URL=
WOO_WPCOM_USERNAME=
WOO_WPCOM_PASSWORD=
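
The create-and-stop step above could be sketched as follows. This is a minimal sketch, not part of the harness: the function name `ensure_store_env` and its optional path argument are hypothetical, and the restrictive umask is an added precaution that the source does not require.

```shell
# Hypothetical helper: create the credentials template when it is missing.
# Returns 0 if the file already exists, 1 if a blank template was just
# created (the caller should stop), and 2 if the template could not be written.
ensure_store_env() {
  env_file="${1:-$HOME/.woo-ai-smoke/store.env}"
  if [ -f "$env_file" ]; then
    return 0
  fi
  mkdir -p "$(dirname "$env_file")" || return 2
  # The subshell keeps the tightened umask from leaking into the caller.
  (
    umask 077
    printf '%s\n' 'WOO_SITE_URL=' 'WOO_WPCOM_USERNAME=' 'WOO_WPCOM_PASSWORD=' > "$env_file"
  ) || return 2
  echo "Created a blank store.env template; fill it in outside the repo and re-run." >&2
  return 1
}
```

A caller might run `ensure_store_env || exit 1` before loading the keys, so the run stops as soon as the blank template is created.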

The target store must be Jetpack-connected and connected to the same WordPress.com account used by WOO_WPCOM_USERNAME.

WOO_WPCOM_PASSWORD may be a WordPress.com Application Password when the account requires 2FA. The smoke harness does not implement an interactive 2FA challenge.

Live chat routes through the WPCOM wrapper endpoint /wpcom/v2/woo-mobile-ai/chat/completions with a WordPress.com OAuth bearer. Store tools still target WOO_SITE_URL through the WPCOM REST / Jetpack-connected path.

Never print the file contents, expanded env, WPCOM username, WPCOM password/Application Passwords, WPCOM bearer tokens, cookies, or raw credential config.

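# Export only the three expected keys; any other line in the file is ignored.
# The "|| [ -n "$key" ]" guard also handles a final line with no trailing newline.
while IFS='=' read -r key value || [ -n "$key" ]; do
  case "$key" in
    WOO_SITE_URL|WOO_WPCOM_USERNAME|WOO_WPCOM_PASSWORD) export "$key=$value" ;;
  esac
done < "$HOME/.woo-ai-smoke/store.env"
./gradlew -PwooAiSmokeRunLive=true :libs:ai-assistant:feature:testDebugUnitTest \
    --tests "*.WooAiSmokeLiveRobolectricTest"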
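
Before invoking Gradle, a preflight check can confirm that the three keys were loaded while respecting the rule against printing credentials. The sketch below is an assumption about how such a check might look; `check_smoke_env` is a hypothetical name and the harness may already perform its own validation.

```shell
# Hypothetical preflight: report which required keys are unset or empty.
# Only key names are printed, never their values.
check_smoke_env() {
  missing=0
  for key in WOO_SITE_URL WOO_WPCOM_USERNAME WOO_WPCOM_PASSWORD; do
    # Indirect lookup; safe because $key only ever holds the fixed names above.
    eval "value=\${$key-}"
    if [ -z "$value" ]; then
      echo "missing: $key" >&2
      missing=1
    fi
  done
  unset value
  return "$missing"
}
```

Because the function echoes key names only, its output is safe to include in a final response.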

Optional focused/debug controls:

WOO_AI_SMOKE_SCENARIO_ID=orders_with_email WOO_AI_SMOKE_SAMPLES=3 \
  ./gradlew -PwooAiSmokeRunLive=true :libs:ai-assistant:feature:testDebugUnitTest \
    --tests "*.WooAiSmokeLiveRobolectricTest"

WOO_AI_SMOKE_SCENARIO_ID supports a comma-separated list for the check test entrypoint. Approval uses the separate WooAiSmokeLiveRobolectricApprovalTest entrypoint, must run the full suite, and rejects scenario filters.

WOO_AI_SMOKE_SAMPLES supports 1..3. In check runs, primary scenario status and JUnit failure use sample 1. Baseline comparison also uses sample 1 unless the checked-in baseline contains an approved sampleExpectation or knownFailure:

sampleExpectation checks compare the sampled classification and requested sample count.
Approved knownFailure checks compare every failing sample's failed hard-check set against knownFailure.expectedFailedHardChecks.

Approved FLAKY is a sampled-run tolerance for acceptable scenario-specific variability:

Sampled FLAKY remains non-blocking only while global guards still pass.
Sampled PASS asks for a baseline refresh.
Single-sample FAIL is blocking because one sample cannot prove flakiness.
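
The two controls can be sanity-checked before a run. The helpers below are a minimal sketch with hypothetical names (`valid_sample_count`, `split_scenario_ids`); the harness performs its own validation, and `spanish` is assumed here to be a valid scenario ID alongside `orders_with_email`.

```shell
# Hypothetical pre-checks; these do not replace the harness's own validation.

# Succeeds only for the documented sample counts 1, 2, or 3.
valid_sample_count() {
  case "$1" in
    1|2|3) return 0 ;;
    *)     return 1 ;;
  esac
}

# Prints one scenario ID per line from a comma-separated filter.
split_scenario_ids() {
  printf '%s\n' "$1" | tr ',' '\n'
}
```

For example, `split_scenario_ids "orders_with_email,spanish"` prints the two IDs on separate lines, which makes it easy to review a filter before passing it to a check run.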

Artifacts are written to:

libs/ai-assistant/feature/build/outputs/woo-ai-smoke/live/latest

After the run, always read run.json, turns.jsonl, and baseline-comparison.json from that directory and include a scenario recap plus an iOS-style Rubric table in the final response. The recap must show every scenario, the run result, sampled classification when present, and the comparison against the checked-in baseline. Do not paste raw turns.jsonl, credentials, WPCOM bearer tokens, cookies, or expanded environment values.

KNOWN_FAILURE in the baseline column is an accepted, explicitly documented live failure; include it in the recap instead of converting it to PASS. KNOWN_FAILURE_FIXED is non-blocking but means the baseline exception should be removed after review. Any REGRESSION, NEW, or MISSING status still needs triage.
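
The status handling above can be summarized as a small lookup. This is a sketch only: `baseline_action` is a hypothetical helper, and the wording of each action is a paraphrase of the rules above rather than harness output.

```shell
# Hypothetical mapping from a baseline-comparison status to the reviewer action.
baseline_action() {
  case "$1" in
    KNOWN_FAILURE)          echo "accepted: report as KNOWN_FAILURE, do not convert to PASS" ;;
    KNOWN_FAILURE_FIXED)    echo "non-blocking: remove the baseline exception after review" ;;
    REGRESSION|NEW|MISSING) echo "triage" ;;
    *)                      echo "no special handling described" ;;
  esac
}
```

The status values could be read from the `status` field of each `scenarioStatuses` entry in baseline-comparison.json and passed to the function one at a time.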

Every scenario also has global guards for no FAILED outcome, no turn errors, and non-blank assistant text. Empty/error outputs should never be treated as passing just because negative checks passed.
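
As an illustration of the three guards, the sketch below checks one turn. The function name and its argument order (outcome, error message, assistant text) are hypothetical; the real guards run inside the Kotlin harness, and the artifact field names for errors and assistant text are not given in this document.

```shell
# Hypothetical check for a single turn.
# Arguments: $1 outcome, $2 error message (empty if none), $3 assistant text.
turn_passes_global_guards() {
  [ "$1" != "FAILED" ] || return 1   # guard 1: no FAILED outcome
  [ -z "$2" ] || return 1            # guard 2: no turn error
  case "$3" in
    *[![:space:]]*) return 0 ;;      # guard 3: at least one non-whitespace character
    *)              return 1 ;;      # blank assistant text never passes
  esac
}
```

The third guard is what prevents an empty reply from passing merely because every negative check found nothing to object to.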

Use this helper when the artifact files exist:

RUN_DIR="libs/ai-assistant/feature/build/outputs/woo-ai-smoke/live/latest"
jq -r --slurpfile comparison "$RUN_DIR/baseline-comparison.json" '
  def tool_summary($scenario):
    [
      $scenario.result.turns[]
      | .toolCalls[]
      | "\(.name)(\(.resultKind))"
    ] | if length == 0 then "none" else join(", ") end;
  def outcomes($scenario):
    [$scenario.result.turns[].outcome] | unique | join("/");
  def sampled($scenario):
    if $scenario.sampleSummary == null then "n/a"
    else "\($scenario.sampleSummary.classification) (PASS=\($scenario.sampleSummary.passCount) FAIL=\($scenario.sampleSummary.failCount))"
    end;
  ($comparison[0].scenarioStatuses
    | map({ key: .scenarioId, value: { status: .status, message: .message } })
    | from_entries) as $baseline
  | "| Scenario | Category | Result | Sampled | Baseline | Outcome | Tools |",
    "| --- | --- | --- | --- | --- | --- | --- |",
    (.scenarios[] |
      ($baseline[.scenarioId] // { status: "MISSING", message: "No baseline comparison." }) as $b
      | "| \(.scenarioId) | \(.category) | \(.status) | \(sampled(.)) | \($b.status): \($b.message) | \(outcomes(.)) | \(tool_summary(.)) |"
    )
' "$RUN_DIR/run.json"

If the Gradle command fails before artifacts are written, say that no scenario recap is available and include the failure reason instead.
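
One way to decide between the recap and the failure message is to test for the three artifact files first. This is a minimal sketch; `artifacts_ready` is a hypothetical helper, and the default directory is the artifact path given earlier.

```shell
# Hypothetical guard: succeed only when all three artifact files exist.
artifacts_ready() {
  run_dir="${1:-libs/ai-assistant/feature/build/outputs/woo-ai-smoke/live/latest}"
  for name in run.json turns.jsonl baseline-comparison.json; do
    if [ ! -f "$run_dir/$name" ]; then
      echo "missing artifact: $name" >&2
      return 1
    fi
  done
  return 0
}
```

If `artifacts_ready` fails, report that no scenario recap is available and include the Gradle failure reason; otherwise run the jq helper.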

Live Baseline Approval

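# Export only the three expected keys; any other line in the file is ignored.
# The "|| [ -n "$key" ]" guard also handles a final line with no trailing newline.
while IFS='=' read -r key value || [ -n "$key" ]; do
  case "$key" in
    WOO_SITE_URL|WOO_WPCOM_USERNAME|WOO_WPCOM_PASSWORD) export "$key=$value" ;;
  esac
done < "$HOME/.woo-ai-smoke/store.env"
# Sampled approval: three samples per scenario, full suite, no scenario filter.
WOO_AI_SMOKE_SAMPLES=3 \
  ./gradlew -PwooAiSmokeRunLive=true :libs:ai-assistant:feature:testDebugUnitTest \
    --tests "*.WooAiSmokeLiveRobolectricApprovalTest"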

Approval mode accepts WOO_AI_SMOKE_SAMPLES=1..3 and still rejects scenario filters. A sampled approval writes a sampleExpectation for all-pass and mixed pass/fail scenarios:

All-pass samples approve PASS.
Mixed pass/fail samples approve FLAKY.
All-fail samples are rejected unless an existing knownFailure is being preserved because every failing sample has the same expected failed hard-check set.

Preserved known-failure approvals do not write sampleExpectation. Approved FLAKY is separate from knownFailure; it does not accept failed outcomes, turn errors, or blank assistant responses.
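
The all-pass, mixed, and all-fail cases can be expressed in terms of the passCount and failCount fields that sampleSummary exposes. The function below is a sketch of that classification, assuming it depends only on those two counts; `classify_samples` is a hypothetical name, and the harness's actual logic may consider more than this.

```shell
# Hypothetical classifier. Arguments: $1 pass count, $2 fail count.
classify_samples() {
  pass_count="$1"
  fail_count="$2"
  if [ "$pass_count" -gt 0 ] && [ "$fail_count" -eq 0 ]; then
    echo PASS       # every sample passed
  elif [ "$pass_count" -eq 0 ] && [ "$fail_count" -gt 0 ]; then
    echo FAIL       # every sample failed
  elif [ "$pass_count" -gt 0 ] && [ "$fail_count" -gt 0 ]; then
    echo FLAKY      # mixed pass/fail
  else
    echo "no samples" >&2
    return 1
  fi
}
```

With a single sample the FLAKY branch is unreachable, which is consistent with the rule that one sample cannot prove flakiness.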

If live auth fails with a 2FA-required message, tell the operator to use a WordPress.com Application Password as WOO_WPCOM_PASSWORD. If site resolution fails, verify the target store is connected to the same WordPress.com account and is Jetpack-connected.

After reviewer inspection:

cp \
  libs/ai-assistant/feature/build/outputs/woo-ai-smoke/live/latest/approved-live-baseline.json \
  libs/ai-assistant/feature/src/testDebug/resources/woo-ai-smoke/live-baseline.json

After an approval run, print the same scenario recap table from libs/ai-assistant/feature/build/outputs/woo-ai-smoke/live/latest. Also state whether approved-live-baseline.json was produced. Approval can preserve an existing knownFailure entry only when every failing sample still has the same expected failed hard-check set, but new failures must not be added by hand without a reason and expected failed hard checks. If a scenario is intentionally flaky, approve it with sampled approval so the checked-in baseline records the FLAKY sample expectation instead of hiding it as a known failure.
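
Reporting whether approved-live-baseline.json was produced and copying it can be combined in one step. The sketch below assumes the reviewer has already inspected the file; `promote_baseline` is a hypothetical helper, and its default paths are the ones used in the cp command above.

```shell
# Hypothetical helper: state whether the approved baseline exists, then copy it.
# Run only after reviewer inspection.
promote_baseline() {
  src="${1:-libs/ai-assistant/feature/build/outputs/woo-ai-smoke/live/latest/approved-live-baseline.json}"
  dst="${2:-libs/ai-assistant/feature/src/testDebug/resources/woo-ai-smoke/live-baseline.json}"
  if [ ! -f "$src" ]; then
    echo "approved-live-baseline.json was not produced" >&2
    return 1
  fi
  cp "$src" "$dst" && echo "approved-live-baseline.json was produced and copied"
}
```

Checking for the file first avoids a bare cp error when the approval run rejected the samples and wrote no baseline.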

Support/Unit Coverage

./gradlew :libs:ai-assistant:feature:testDebugUnitTest --tests "*.WooAiSmokeDeterministicSupportTest"

Deterministic support tests validate harness wiring only. They are not accepted primary smoke evidence and must not be used to approve the live baseline. They do not use a deterministic baseline; fake-chat/fake-tool failures fail directly.

Rubric

Do not make Gradle, CI, or the Kotlin baseline comparison depend on a model judge. After artifacts exist, the final response must include a separate Rubric section based only on redacted artifacts: run.json, turns.jsonl, and baseline-comparison.json.

The report must clearly separate:

Deterministic gate: scenario status, failed hard checks, and baseline comparison. This is the merge-blocking result.
Rubric: iOS-style 0/1/2 scoring from traces and scenario intent. These scores are reviewer guidance and are not the Kotlin/JUnit gate.

Score each scenario, or each turn when a scenario has materially different turn outcomes, using:

2: correct / well-grounded / appropriate / recovered or no recovery needed.
1: partially correct or minor issue that reviewers should inspect.
0: incorrect, unsupported, wrong tool/safety behavior, or failed recovery.

Use these dimensions:

Correct: answers the merchant's request and follows scenario-specific requirements.
Grounded: user-facing claims are supported by tool traces or explicit tool failures.
Tools: tool choices, safety behavior, and write-confirmation handling fit the scenario.
Recovery: handles errors, declines, empty results, and clarification needs appropriately.

The Rubric table must include deterministic status in the same row so reviewers get one cohesive report:

| Scenario | Gate | Sampled | Baseline | Correct | Grounded | Tools | Recovery | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| orders_with_email | PASS | n/a | PASS | 2 | 2 | 2 | 2 | Email appears only when supported by the orders tool result. |

Gate is the primary scenario status from run.json. Sampled is PASS, FAIL, FLAKY, or n/a from sampleSummary when present. Baseline is the scenario status from baseline-comparison.json. Notes should be short and should call out failed hard checks, baseline regressions, sampled flakiness, tool mismatch, unsupported claims, or recovery concerns.
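
A row in this layout can be produced with a small formatter that also rejects scores outside 0/1/2. This is a sketch; `rubric_row` is a hypothetical helper, and the scores themselves still come from reviewing the traces, not from the script.

```shell
# Hypothetical formatter for one Rubric row.
# Arguments: scenario gate sampled baseline correct grounded tools recovery notes
rubric_row() {
  for score in "$5" "$6" "$7" "$8"; do
    case "$score" in
      0|1|2) ;;                                        # allowed rubric scores
      *) echo "invalid score: $score" >&2; return 1 ;;
    esac
  done
  printf '| %s | %s | %s | %s | %s | %s | %s | %s | %s |\n' \
    "$1" "$2" "$3" "$4" "$5" "$6" "$7" "$8" "$9"
}
```

Calling `rubric_row orders_with_email PASS n/a PASS 2 2 2 2 "Email appears only when supported by the orders tool result."` reproduces the example row above.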

For the spanish scenario, Android keeps the same hard-check floor as iOS: turn 1 contains pedido|pedidos, and turn 2 contains ayer. The rubric must flag user-facing English or mixed-language replies as a Correct issue because the scenario expects Spanish throughout. Do not add deterministic negative English substring checks for this; full-language review belongs in the rubric.
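
The hard-check floor for this scenario amounts to two substring tests. The sketch below shows the idea outside the harness; `spanish_floor_passes` is a hypothetical name, and case-insensitive matching is an assumption, since the source does not say how the Kotlin checks treat case.

```shell
# Hypothetical floor check.
# Arguments: $1 assistant text of turn 1, $2 assistant text of turn 2.
spanish_floor_passes() {
  # Turn 1 must mention pedido or pedidos (case-insensitive by assumption).
  printf '%s\n' "$1" | grep -Eiq 'pedido|pedidos' || return 1
  # Turn 2 must mention ayer.
  printf '%s\n' "$2" | grep -iq 'ayer' || return 1
}
```

A mostly English reply that happens to contain these words would still pass this floor, which is why full-language review is left to the rubric.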
