e2e-diagnose-and-fix
// Analyze a failing E2E test, determine root cause, and fix it using Playwright Test Agents and RHDH project conventions
| name | e2e-diagnose-and-fix |
| description | Analyze a failing E2E test, determine root cause, and fix it using Playwright Test Agents and RHDH project conventions |
Analyze the root cause of a failing E2E test and implement a fix following RHDH project conventions.
Use this skill after reproducing a failure (via e2e-reproduce-failure) when you have confirmed the test fails and need to determine the root cause and implement a fix.
If the fix branch is based on a release branch (e.g., release-1.9), check whether the failing test was already fixed on main before proceeding with the healer:
git fetch upstream main
git log --oneline upstream/main -- <path-to-failing-spec-file> | head -10
If there are recent commits touching the failing spec file or its page objects, inspect them:
git log --oneline upstream/main -p -- <path-to-failing-spec-file> | head -100
If a fix exists on main, always cherry-pick it — this takes priority over running the healer:
git cherry-pick <commit-sha>
If the cherry-pick has conflicts, resolve them manually using the main commit as the source of truth and adapting to the release branch's code. Do not abandon the cherry-pick in favor of the healer — the fix on main is the authoritative solution.
After a successful cherry-pick (with or without conflict resolution), proceed to e2e-verify-fix. Only proceed to the healer below if no relevant fix exists on main, or if the cherry-picked fix doesn't resolve the issue on the release branch.
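The branching above can be summarized as a small decision function. This is only a sketch of the rules as stated in this section; the `FixState` type, its field names, and `nextStep` are hypothetical and not part of any RHDH tooling.

```typescript
// Hypothetical model of the "check main first" rule; all names are illustrative.
type NextStep = "cherry-pick" | "e2e-verify-fix" | "healer";

interface FixState {
  onReleaseBranch: boolean;          // fix branch is based on e.g. release-1.9
  fixExistsOnMain: boolean;          // a relevant commit was found on upstream/main
  cherryPickApplied: boolean;        // cherry-pick completed, conflicts resolved if any
  cherryPickResolvesIssue?: boolean; // unknown until the fix has been verified
}

function nextStep(state: FixState): NextStep {
  // No release branch or no fix on main: nothing to cherry-pick, use the healer.
  if (!state.onReleaseBranch || !state.fixExistsOnMain) return "healer";
  // A fix on main always takes priority over the healer.
  if (!state.cherryPickApplied) return "cherry-pick";
  // Fall back to the healer only if the cherry-picked fix did not help.
  if (state.cherryPickResolvesIssue === false) return "healer";
  return "e2e-verify-fix";
}
```

Note that a conflicting cherry-pick is not a separate outcome here: conflicts are resolved manually and the flow continues as a normal cherry-pick.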
The Playwright healer agent MUST be used for ALL test failures, regardless of failure category. Do not attempt manual diagnosis without first running the healer. The healer can run the test, debug it step-by-step, inspect the live UI, generate correct locators, and edit the code — often resolving the issue end-to-end without manual intervention.
Note: The Playwright healer agent is currently supported in OpenCode and Claude Code only. In Cursor or other tools without Playwright agent support, skip the healer initialization and proceed directly to the "Failure Pattern Recognition" section below. Use manual diagnosis with direct test execution (yarn playwright test ...) and headed/debug mode (--headed, --debug) for live UI inspection.
If not already initialized in this session, initialize the healer agent in e2e-tests/:
cd e2e-tests
# For OpenCode
npx playwright init-agents --loop=opencode
# For Claude Code
npx playwright init-agents --loop=claude
See https://playwright.dev/docs/test-agents for the full list of supported tools and options. The generated files are local tooling — do NOT commit them.
The healer agent needs a .env file in e2e-tests/ with all required environment variables (BASE_URL, K8S_CLUSTER_TOKEN, vault secrets, etc.). Generate it by passing the --env flag to local-test-setup.sh:
cd e2e-tests
source local-test-setup.sh <showcase|rbac> --env
The .env file is gitignored — never commit it. To regenerate (e.g. after token expiry), re-run the command above.
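Before invoking the healer, it can help to confirm that the variables it depends on are actually set. The helper below is a minimal sketch of such a pre-flight check; `missingEnvVars` is a hypothetical name, and the list of required variables is an assumption limited to the two named in this section (the full set, including vault secrets, is not listed here).

```typescript
// Hypothetical pre-flight check; not part of local-test-setup.sh or the RHDH repo.
function missingEnvVars(
  env: Record<string, string | undefined>,
  required: string[],
): string[] {
  // A variable counts as missing when it is unset or contains only whitespace.
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Example with a literal object; the URL is a placeholder.
const exampleMissing = missingEnvVars(
  { BASE_URL: "https://rhdh.example.com", K8S_CLUSTER_TOKEN: "" },
  ["BASE_URL", "K8S_CLUSTER_TOKEN"],
);
// exampleMissing is ["K8S_CLUSTER_TOKEN"], which would suggest re-running the setup script.
```

In a Node script the first argument would typically be `process.env` after the .env file has been loaded.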
Invoke the healer agent via the Task tool with subagent_type: general:
Task: "You are the Playwright Test Healer agent. Run the failing test, debug it, inspect the UI, and fix the code.
Working directory: <path>/e2e-tests
Test: <spec-file> --project=any-test -g '<test-name>'
Run command: set -a && source .env && set +a && npx playwright test <spec-file> --project=any-test --retries=0 --workers=1 -g '<test-name>'"
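When the run command is assembled programmatically, the test name needs care: -g takes a regular expression, and the value is passed through a shell. The following sketch shows one way to build the command safely. `buildRunCommand`, `escapeRegExp`, and `shellQuote` are hypothetical helpers, and the spec path in the example is a placeholder.

```typescript
// Hypothetical helpers for composing the healer's run command; names are illustrative.

// Escape regex metacharacters so the test name matches literally under -g.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Wrap a value in single quotes for a POSIX shell, escaping embedded single quotes.
function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

function buildRunCommand(specFile: string, testName: string): string {
  return [
    "set -a && source .env && set +a &&", // export every variable from .env
    "npx playwright test",
    shellQuote(specFile),
    "--project=any-test --retries=0 --workers=1", // single attempt, single worker
    "-g",
    shellQuote(escapeRegExp(testName)),
  ].join(" ");
}

// Placeholder spec path and test name, for illustration only.
const exampleCommand = buildRunCommand("example.spec.ts", "it's visible (beta)");
```

Disabling retries and limiting the run to one worker, as in the original command, keeps the failure deterministic and the output easy to read.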
The healer will autonomously:
After the healer has run, supplement with manual investigation only for:
Symptoms: Error: locator.click: Error: strict mode violation or Timeout waiting for selector or element not found errors.
Cause: The UI has changed and selectors no longer match.
Fix approach:
Use the healer agent (@playwright-test-healer). It will replay the test, inspect the current UI via page snapshots, generate updated locators, and edit the code automatically.
Symptoms: Test passes sometimes, fails sometimes. Errors like Timeout 10000ms exceeded or assertions failing on stale data.
Cause: Test acts before the UI is ready, or waits are insufficient.
Fix approach:
Replace page.waitForTimeout() with proper waits: expect(locator).toBeVisible(), page.waitForLoadState()
Use expect().toPass() with retry intervals for inherently async checks:
await expect(async () => {
  const text = await page.locator('.count').textContent();
  expect(Number(text)).toBeGreaterThan(0);
}).toPass({ intervals: [1000, 2000, 5000], timeout: 30_000 });
Use the Common.waitForLoad() utility before interacting with the page after navigation
Symptoms: expect(received).toBe(expected) with clearly different values.
Cause: The expected value has changed due to a product change, data change, or environment difference.
Fix approach:
test.fixme() (see the "Decision: Product Bug vs Test Issue" section below)
Symptoms: Test fails because expected entities, users, or resources don't exist.
Cause: Test data assumptions no longer hold (GitHub repos deleted, Keycloak users changed, catalog entities removed).
Fix approach:
e2e-tests/playwright/support/test-data/ or e2e-tests/playwright/data/
beforeAll/beforeEach and cleans up in afterAll/afterEach
APIHelper for programmatic setup (GitHub API, Backstage catalog API)
Symptoms: Test passes on OCP but fails on GKE/AKS/EKS, or vice versa.
Cause: Platform differences (Routes vs Ingress, different auth, different network policies).
Fix approach:
import { skipIfJobName, skipIfIsOpenShift } from '../utils/helper';
// Skip on GKE
skipIfJobName(constants.GKE_JOBS);
// Skip on non-OpenShift
skipIfIsOpenShift('false');
process.env.IS_OPENSHIFT, process.env.CONTAINER_PLATFORM
Symptoms: RHDH itself is broken (500 errors, missing plugins, wrong behavior).
Cause: ConfigMap or Helm values are incorrect for this test scenario.
Fix approach:
.ci/pipelines/resources/config_map/app-config-rhdh.yaml and app-config-rhdh-rbac.yaml
.ci/pipelines/value_files/
.ci/pipelines/resources/config_map/dynamic-plugins-config.yaml
rhdh-operator and rhdh-chart repos for configuration reference (use Sourcebot, Context7, gh search code, or a local clone, whichever is available)
The Playwright Test Agents are initialized via npx playwright init-agents --loop=opencode (see initialization section above). This creates an MCP server and agent definitions in e2e-tests/opencode.json.
The healer agent is the primary and mandatory tool for fixing failing tests. It has access to:
test_run: Run tests and identify failures
test_debug: Step through failing tests with the Playwright Inspector
browser_snapshot: Capture accessibility snapshots of the live UI
browser_console_messages: Read browser console logs
browser_network_requests: Monitor network requests
browser_generate_locator: Generate correct locators from the live UI
edit/write: Edit test code directly
The healer autonomously cycles through: run → debug → inspect → fix → re-run until the test passes.
Use @playwright-test-planner when you need to understand a complex user flow before fixing a test. It explores the app and maps out the interaction patterns.
Use @playwright-test-generator when a test needs major rework and you need to generate new test steps from a plan.
Every fix must follow Playwright best practices. Before writing or modifying test code, consult these resources in order:
Project rules (always available locally):
playwright-locators rule: locator priority, anti-patterns, assertions, Page Objects, DataGrid handling
ci-e2e-testing rule: test structure, component annotations, project configuration, CI scripts
Official Playwright docs (fetch via Context7 if available, otherwise use web):
Prefer getByRole(), getByLabel(), getByPlaceholder() over CSS/XPath selectors. Never use MUI class names (.MuiButton-label, .MuiDataGrid-*).
Use assertion-based waiting (expect(locator).toBeVisible()); never use manual waitForSelector() or waitForTimeout().
Every *.spec.ts file must have a component annotation in test.beforeAll.
Return Locator objects from page classes, not raw strings or elements.
force: true: if a click requires force, the locator or timing is wrong, so fix the root cause.
waitForNetworkIdle(): use proper load-state waits or assertion-based waiting instead.
When the issue is in RHDH deployment/config rather than test code, search the relevant repos using whichever tool is available. Try them in this order and use the first one that works:
gh search code: e.g. gh search code '<pattern>' --repo redhat-developer/rhdh-operator
redhat-developer/rhdh-operator)
install-rhdh-catalog-source.sh)
redhat-developer/rhdh-chart)
test.fixme() is a last resort. You must be absolutely certain the failure is a product bug before marking a test this way. Follow this checklist:
test.fixme(): do not decide unilaterally
Only after all of the above confirm a product bug:
File a Jira ticket in the RHDHBUGS project (or update the existing ticket) documenting the product regression
Mark the test with test.fixme(), preceded by a // TODO: comment linking to the Jira ticket:
// TODO: https://redhat.atlassian.net/browse/RHDHBUGS-XXXX
test.fixme('Button no longer visible after version upgrade');
e2e-submit-and-review with the test.fixme() change