تشغيل أي مهارة في Manus بنقرة واحدة

$pwd:

e2e-failure-analyzer

Name: E2e Failure Analyzer
Author: posit-dev

// Analyze e2e test failures from a GitHub Actions run. Provide a run ID or URL to download reports, extract traces/screenshots/logs, identify root causes, and get suggested actions. Works with both posit-dev/positron and posit-dev/positron-builds repos.

تشغيل في Manus

$ git log --oneline --stat

stars:٤٬١٤٥

forks:١٥٩

updated:١٩ مايو ٢٠٢٦ في ٢٢:٥٣

مستكشف الملفات

8 ملفات

SKILL.md

readonly

related-skills.json

نفس المستودع

pr-test-checker.md

from "posit-dev/positron"

Grade whether a Positron PR has adequate test coverage. Evaluates new tests in the PR, checks existing coverage for changed source, and suggests concrete additions when coverage is insufficient. Used by the pr-test-checker GitHub Action.

2026-05-274.1k

author-vitest-tests.md

from "posit-dev/positron"

Use when writing, generating, or adding Vitest tests for Positron source code in src/vs/. Load this skill when analyzing a branch or PR for test coverage gaps, when the user asks to write tests using createTestContainer(), or when testing React components with RTL.

2026-05-194.1k

review-vitest-tests.md

from "posit-dev/positron"

Use when reviewing Vitest test files for quality, maintainability, and adherence to Positron's testing patterns. Load this skill when test files need a quality check against the builder pattern, RTL patterns, and project conventions. Not for e2e or Playwright tests.

2026-05-194.1k

positron-pull-vscode-server.md

from "posit-dev/positron"

Use when pulling changes from rstudio/vscode-server into Positron — fetching upstream, triaging commits, and applying diffs

2026-05-124.1k

positron-data-grid-pattern.md

from "posit-dev/positron"

Use when building any list, table, grid, or virtualized scrolling UI in Positron, or whenever you encounter `DataGridInstance`, `PositronDataGrid`, or files importing them. Critical for avoiding a common architectural mistake (wrapping the grid in a React component that mediates props into the instance via useEffect).

2026-05-064.1k

positron-pr-helper.md

from "posit-dev/positron"

Generates well-structured PR bodies with dynamically fetched e2e test tags

2026-05-054.1k

package.json

"author": "posit-dev"

"repository": "posit-dev/positron"

فتح مستودع GitHub عرض مستودعات المنشئ

$ install --global

$ download --local

تشغيل في Manus

$ useful --forSOC

محللو ضمان جودة البرمجيات والمختبرونمهن الحاسوب والرياضيات15-1253L4

name	e2e-failure-analyzer
description	Analyze e2e test failures from a GitHub Actions run. Provide a run ID or URL to download reports, extract traces/screenshots/logs, identify root causes, and get suggested actions. Works with both posit-dev/positron and posit-dev/positron-builds repos.
disable-model-invocation	true

E2E Failure Analyzer

Analyzes Playwright e2e test failures from a GitHub Actions run using JSON reports, trace files, screenshots, and test logs to identify root causes and suggest next actions.

When to Use

A CI run has failed and you want to understand why
Triaging e2e test failures from Test: Merge to branch, Test: Full Suite, or Positron Build: Daily Release
Investigating flaky tests from a specific run

Prerequisites

GitHub CLI (gh) authenticated
@playwright/test available via npx (for merging blob reports, positron repo only)

Helper Scripts

Scripts live alongside this skill in scripts/. Use the base directory path shown above when the skill loads (the "Base directory for this skill: ..." line) as $SKILL_DIR. Scripts require Node.js and are cross-platform (Windows via Git Bash, macOS, Linux). Scripts that extract from zip files require unzip to be available in PATH (included in Git Bash on Windows).

Consolidated scripts (preferred -- fewer tool calls)

e2e-gather-run-info.js - Gathers all run metadata, failed jobs, artifacts, non-e2e job log excerpts, and commit info in one call. Replaces multiple gh api invocations.
e2e-process-project.js - Path A. Processes a merged blob report project end-to-end: extracts failures, scans blobs, extracts/parses traces, extracts screenshots and error-context. Replaces multiple script + unzip invocations.
e2e-process-s3.js - Path B. Processes a CloudFront-hosted Playwright HTML report end-to-end: fetches index.html, decodes the embedded base64 report.zip, downloads trace + error-context attachments from S3, parses traces, and extracts screencast frames. Produces the same JSON shape as e2e-process-project.js so the downstream analyzer treats both paths identically.

Standalone scripts (used by consolidated scripts internally, or for ad-hoc debugging)

e2e-extract-failures.js - Extracts failures from a merged Playwright JSON report
e2e-parse-trace.js - Parses a trace.trace file into an action timeline with errors and last screenshot hash
e2e-inspect-blobs.js - Scans blob report zips to find failed test IDs and their trace/log resource hashes
e2e-query-history.js - Queries the e2e-test-insights API for historical test health data (requires E2E_INSIGHTS_API_KEY env var)

Input

Run ID or URL from either repo:

https://github.com/posit-dev/positron/actions/runs/23610137774
https://github.com/posit-dev/positron-builds/actions/runs/23938334846

Step 1: Gather Run Info (single script call)

The consolidated e2e-gather-run-info.js script handles everything: run metadata, failed jobs, blob report artifacts, non-e2e job log excerpts, and commit info.

node "$SKILL_DIR/scripts/e2e-gather-run-info.js" <RUN_URL>

Output JSON contains:

repo, runId - parsed from URL
run - metadata (name, conclusion, html_url, head_sha, branch)
failedJobs - array of {id, name, isE2e} for all failed jobs
nonE2eJobLogs - map of job ID to failure log excerpts (for non-e2e jobs)
artifacts - sorted list of blob report artifact names
projects - unique project names extracted from artifacts (e.g., e2e-chromium, e2e-windows)
commit - {message, author, files} for the head commit

Use projects to determine what to process:

If projects list is non-empty -> use Path A (positron repo flow) for each project
If empty -> use Path B (positron-builds flow)

The two repos have different data access patterns:

posit-dev/positron: Uses sharded blob reports uploaded as GitHub artifacts. Requires downloading and merging.
posit-dev/positron-builds: Non-sharded single-job runs. HTML reports uploaded to S3 at CloudFront. No blob report artifacts.

Path A: posit-dev/positron (Sharded Blob Reports)

A1+A2: Download, Merge, and Process (single script call)

The e2e-process-project.js script handles everything in one call: downloads blob report artifacts, copies shards into a merged directory, runs npx playwright merge-reports, then extracts failures, scans blobs, extracts/parses traces, and extracts screenshots. Use --cleanup to remove intermediate download/merge artifacts automatically.

For each project from Step 1, run:

node "$SKILL_DIR/scripts/e2e-process-project.js" \
  --download --run-id <RUN_ID> --repo <REPO> --project <PROJECT> \
  --output-dir /tmp/e2e-analysis-<PROJECT> --cleanup

If there are multiple projects, run them sequentially (each call uses npx internally).

Fallback: If blob reports were already downloaded and merged (e.g., for debugging), you can skip --download and pass the directories directly:

node "$SKILL_DIR/scripts/e2e-process-project.js" \
  /tmp/blob-merged-<PROJECT> /tmp/report-<PROJECT>.json \
  --output-dir /tmp/e2e-analysis-<PROJECT>

Output JSON contains:

outputDir - path where screenshots and error-context files were saved
failures - array of final failures (tests that failed all retries) with title, file, tags, suite, project, errors
failedTests - array of all failed test attempts (including those that passed on retry) with testId, title, file, status, blob
testDetails - array of per-test objects, each containing:
- testId, title, file, status, blob, attemptCount
- attempts - array of per-attempt objects with:
  - trace - parsed trace data: timeline (human-readable string), errors (array), screenshotShas (array of {sha1, timestamp} in chronological order), lastScreenshotSha1 (legacy: same as last entry of screenshotShas)
  - screenshotPaths - chronological array of paths to extracted screenshot JPEGs (view with Read tool); the last entry is the failure-state frame, earlier entries show the moments before it
  - screenshotPath - legacy alias pointing to the last entry of screenshotPaths
  - errorContextPath - path to extracted page snapshot markdown (view with Read tool if needed)
- logHashes - array of {resourceHash, blob} for logs (extract manually if needed)

IMPORTANT: View screenshots using the screenshotPaths arrays with the Read tool. You MUST Read all screenshots in a single message with multiple parallel Read tool calls -- this results in only one approval prompt instead of one per screenshot. View all attempts and all frames per attempt; comparing across retries reveals whether a failure is consistent or intermittent, and comparing the trailing frames within an attempt often shows where the test went wrong before the visible error. Screenshots are the most revealing evidence for diagnosing failures. Default frame count per attempt is 3 (configurable via --screenshots N on e2e-process-project.js).

View error context with the Read tool using errorContextPath paths if the screenshot and trace timeline are insufficient for diagnosis.

Path B: posit-dev/positron-builds (S3 HTML Reports)

Process the HTML report (single script call)

The e2e-process-s3.js script handles everything in one call: fetches the report's index.html, decodes the embedded base64 report.zip, walks failures + per-file detail JSONs, downloads trace and error-context attachments from S3, parses traces, and extracts trailing screencast frames.

For each failed e2e job from Step 1, resolve the job's REPORT_DIR from its logs (the workflow logs both an unresolved template line containing literal ${IDENTIFIER} / ${OS_SUFFIX} and the expanded value -- ignore the template), then run:

node "$SKILL_DIR/scripts/e2e-process-s3.js" \
  --report-url https://d38p2avprg8il3.cloudfront.net/<REPORT_DIR>/ \
  --output-dir /tmp/e2e-analysis-<JOB_LABEL> \
  --cleanup

For interactive / ad-hoc use, you can call the script directly with any CloudFront-hosted Playwright HTML report URL -- no run ID required.

Output JSON is identical to Path A's e2e-process-project.js (see the field list above), so the same screenshot-reading and analysis flow applies. The blob field is the report directory name (last path segment of the S3 URL) rather than a zip filename, since Path B has no blob zips.

IMPORTANT: View screenshots the same way as Path A -- Read all screenshotPaths arrays in a single message with multiple parallel Read tool calls, and view errorContextPath files when the screenshot and trace timeline aren't enough.

Step 6: Query Historical Test Health (optional)

If the E2E_INSIGHTS_API_KEY environment variable is set, query the e2e-test-insights dashboard for historical failure data. This step is optional -- if the API is unavailable, skip it and proceed with analysis.

The repo identifier for the API is always positron for both posit-dev/positron and posit-dev/positron-builds. Both repos run the same tests (positron-builds uses positron as a submodule) and test results are stored under the positron repo ID in the dashboard.

Option 1: Query by workflow run ID (preferred)

If the GitHub run ID is available, use --run-id to get history for all tests that failed or flaked in this run:

node "$SKILL_DIR/scripts/e2e-query-history.js" --repo positron --run-id <RUN_ID> --lookback-days 14 --branch <BRANCH>

The branch is important -- a test may be rare_flake on main but known_flaky on a release branch. Get the branch from the run metadata (Step 1) or the onProject event in blob reports. Common branches: main, release/YYYY.MM.

Option 2: Query by test keys

If the run isn't in the dashboard yet, construct test keys manually from extracted failures:

node "$SKILL_DIR/scripts/e2e-query-history.js" --repo positron \
  --test-keys "testName1|||specPath1,testName2|||specPath2" --lookback-days 14

Using the history in analysis

The response includes per-test data. Use it to enhance the analysis:

insight.type: "new" = first-time failure (likely regression), "recurring" / "known_flaky" = known pattern, "rare_flake" = infrequent
history.pass_rate: Low pass rate = known flaky test, 100% pass rate before this run = regression
failure_patterns: Compare today's error message against historical patterns -- same pattern = recurring, new pattern = potential regression even for known-flaky tests
insight.first_failure_sha / insight.timing_value: When the failures started -- useful for bisecting

Interpreting `environment_breakdown` -- look across environments

The environment_breakdown array is often more informative than the aggregate history stats. Always check per-environment pass rates before concluding a test is "flaky":

0% pass rate on one environment, 100% on others = deterministic regression on that platform, NOT flaky. Example: a test failing on all chromium runs but passing on all electron runs is a chromium-specific bug, even if the aggregate pass rate is 58%.
Low pass rate across all environments = genuinely flaky
Low pass rate on one environment only = platform-specific flakiness (e.g., "worse on win/electron")

When the breakdown reveals an environment-specific pattern, call it out explicitly:

"History: 100% failure on chromium (0/4 passed), 100% pass on electron (6/6) -- deterministic regression on chromium, not flaky"
"History: known flaky across all platforms, worst on win/electron (88% pass rate)"

History line format

Include a History line in each failure's analysis, e.g.:

"History: failed 4/18 runs (22%) over last 14 days, same error pattern -- known flaky"
"History: passed 15/15 runs over last 14 days -- new regression"
"History: 0% pass rate on chromium (10/10 failed since Apr 02), 100% on electron -- deterministic platform regression"
"History: no data available (API unreachable)"

Step 7: Analyze and Present Results

Using all the data gathered (failures, trace actions, screenshots, logs), analyze the failures. For each failure (or group of related failures), determine:

Root cause category - one of: flaky test, infrastructure issue, product regression, test environment issue, timeout, test logic bug
Brief explanation - 1-2 sentences on what likely went wrong, referencing specific evidence from traces/screenshots/logs
Suggested action - what a developer should do next

When analyzing, consider:

Multiple tests failing in the same file/suite likely share a root cause
timedOut status often indicates flakiness or infrastructure slowness
Errors mentioning "locator" or "expect" timeouts are usually test/product issues
Errors during app startup (e.g., waiting for workbench) are usually infrastructure
Check if the failing test has :soft-fail tag (known flaky)
The screenshot at failure is often the most revealing piece of evidence

Check the triggering commit

For tests that failed all retries (not just flaky), inspect the head commit using the commit field from Step 1's e2e-gather-run-info.js output (already includes message, author, and changed files).

Compare the changed files against:

The failing test file itself and its page objects/helpers -- changes here could introduce a test logic bug
Product source code exercised by the test -- changes to the feature under test could be a real product regression that the test correctly caught
Shared infrastructure (startup, layout, rendering) -- changes here could alter timing or behavior enough to surface a latent flaky test

If the commit touched relevant files, read the diff and assess causality. A commit that modifies notebook cell rendering is a plausible cause for a notebook cell-count assertion failure, even if the test has flaked before. Conversely, a commit that only changes R interpreter code is unlikely to cause a Python plot test failure.

Include a Commit line in the detailed analysis when the commit is relevant, e.g.:

"Commit: modified notebookCellList.ts (notebook cell rendering) -- plausible cause"
"Commit: no files related to this test's feature area -- unlikely cause"

Additional repo context

Also use context from the repo when helpful:

Read the failing test file to understand what it does
Check git log for recent changes to the test or related product code beyond the head commit
Search for related issues

Key log files to check:

window1/renderer.log - Main window renderer process logs
window1/exthost/exthost.log - Extension host logs
window1/exthost/positron.positron-supervisor/Python Kernel.log - Python kernel logs
window1/exthost/positron.positron-r/R Language Pack.log - R runtime logs
e2e-test-runner.log - Test runner output
main.log - Electron main process logs

For each failure, include the platform (OS and project/browser) where it occurred. This information comes from:

Path A: The project name (e.g., e2e-windows, e2e-electron, e2e-chromium) and the workflow name
Path B: The job name (e.g., "electron (macOS)", "electron (ubuntu)") and Playwright project in the test output (e.g., [e2e-macOS-ci])

When multiple projects/platforms are analyzed in a single run, note which platforms each failure occurred on and whether the same test passed on other platforms.

Present the analysis in a summary table that includes columns for: test name, platform, root cause category, and severity. In the severity column, clearly distinguish tests that failed all retries (hard failures) from tests that passed on retry (flaky). This distinction comes from comparing failures (final failures after all retries) vs failedTests (all attempts including those that recovered). Then provide detailed analysis for each failure below the table.

Include non-e2e job failures (unit tests, integration tests, build failures) in the summary table as well, with the job name as the test name and a brief description of the failure extracted from the job logs.

Offer to:

Open the relevant test files
Search for related recent changes
Create GitHub issues

Cleanup

Path A and Path B: If you used --cleanup with e2e-process-project.js / e2e-process-s3.js, the intermediate download/unzip dirs are already removed. Only the --output-dir remains (screenshots and error-context). Remove it with exact paths (no globs):

rm -rf /tmp/e2e-analysis-<PROJECT_OR_JOB_LABEL>

e2e-failure-analyzer

المزيد من هذا المستودع

المزيد من هذا المستودع

E2E Failure Analyzer

When to Use

Prerequisites

Helper Scripts

Consolidated scripts (preferred -- fewer tool calls)

Standalone scripts (used by consolidated scripts internally, or for ad-hoc debugging)

Input

Step 1: Gather Run Info (single script call)

Path A: posit-dev/positron (Sharded Blob Reports)

A1+A2: Download, Merge, and Process (single script call)

Path B: posit-dev/positron-builds (S3 HTML Reports)

Process the HTML report (single script call)

Step 6: Query Historical Test Health (optional)

Option 1: Query by workflow run ID (preferred)

Option 2: Query by test keys

Using the history in analysis

Interpreting environment_breakdown -- look across environments

History line format

Step 7: Analyze and Present Results

Check the triggering commit

Additional repo context

Cleanup

E2E Failure Analyzer

When to Use

Prerequisites

Helper Scripts

Consolidated scripts (preferred -- fewer tool calls)

Standalone scripts (used by consolidated scripts internally, or for ad-hoc debugging)

Input

Step 1: Gather Run Info (single script call)

Path A: posit-dev/positron (Sharded Blob Reports)

A1+A2: Download, Merge, and Process (single script call)

Path B: posit-dev/positron-builds (S3 HTML Reports)

Process the HTML report (single script call)

Step 6: Query Historical Test Health (optional)

Option 1: Query by workflow run ID (preferred)

Option 2: Query by test keys

Using the history in analysis

Interpreting environment_breakdown -- look across environments

History line format

Step 7: Analyze and Present Results

Check the triggering commit

Additional repo context

Cleanup

Interpreting `environment_breakdown` -- look across environments

Interpreting `environment_breakdown` -- look across environments