job-failure-analyzer

Stars: 39

Forks: 26

Updated: May 26, 2026 at 11:17

Extract and analyze failure events from AAP jobs to classify errors and reconstruct failure timelines. Use when: - "Job #X failed", "Why did the execution fail?" - "Analyze the failed job", "What went wrong?" - "Show me the failure details" NOT for: host fact correlation (use host-fact-inspector) or resolution recommendations (use resolution-advisor).

Source

RHEcosystemAppEng

RHEcosystemAppEng/agentic-collections


SKILL.md


name	job-failure-analyzer
description	Extract and analyze failure events from AAP jobs to classify errors and reconstruct failure timelines. Use when: - "Job #X failed", "Why did the execution fail?" - "Analyze the failed job", "What went wrong?" - "Show me the failure details" NOT for: host fact correlation (use host-fact-inspector) or resolution recommendations (use resolution-advisor).
model	inherit
color	yellow
license	Apache-2.0
allowed-tools	jobs_retrieve jobs_job_events_list jobs_job_host_summaries_list jobs_stdout_retrieve

Job Failure Analyzer

Prerequisites

Required MCP Servers:

aap-mcp-job-management - Job details, events, host summaries, stdout

Verification: Run the aap-mcp-validator skill with aap-mcp-job-management before proceeding.

When to Use This Skill

Use this skill when:

User reports a failed job and wants analysis
As the first step in the forensic-troubleshooter workflow
User asks to understand what went wrong with a job

Do NOT use when:

User wants to execute a job (use execution-risk-analyzer + governed-job-launcher)
User wants host fact correlation (use host-fact-inspector after this skill)
User wants resolution recommendations (use resolution-advisor after this skill)

Workflow

Step 1: Consult Troubleshooting Documentation

CRITICAL: Document consultation MUST happen BEFORE any MCP tool invocations.

Document Consultation (REQUIRED - Execute FIRST):

Action: Read job-troubleshooting.md using the Read tool to understand event extraction, failure patterns, host summary interpretation, and root cause classification
Output to user: "I consulted job-troubleshooting.md which references Red Hat's AAP 2.6 Troubleshooting Guide for failure analysis patterns."

Step 2: Retrieve Job Status

MCP Tool: jobs_retrieve (from aap-mcp-job-management)
Parameters:

id: "<job_id>"

Extract: status, failed, job_type, elapsed, launch_type.

Per job-troubleshooting.md, the status determines the analysis path:

failed → Analyze events for runner_on_failed
error → Platform-level failure (check capacity, EE, credentials)
canceled → Check timeout or manual cancellation
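The status-to-path mapping above can be sketched as a small lookup. This is a minimal illustration, not part of the skill: the `analysis_path` helper is hypothetical, and it assumes the `jobs_retrieve` response is available as a dict with a `status` key.

```python
# Hypothetical helper: maps a job status to the analysis path listed above.
# Assumes the jobs_retrieve response has been parsed into a dict.
ANALYSIS_PATHS = {
    "failed": "Analyze events for runner_on_failed",
    "error": "Platform-level failure (check capacity, EE, credentials)",
    "canceled": "Check timeout or manual cancellation",
}


def analysis_path(job: dict) -> str:
    """Return the analysis path for a job, or a fallback for other statuses."""
    status = job.get("status", "")
    fallback = f"No failure analysis path defined for status '{status}'"
    return ANALYSIS_PATHS.get(status, fallback)
```

Statuses outside the three listed ones (for example a successful job) fall through to the fallback message instead of raising an error.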

Step 3: Extract Failure Events

MCP Tool: jobs_job_events_list (from aap-mcp-job-management)
Parameters:

id: "<job_id>"
page_size: 100

Filter for failure-related events:

runner_on_failed -- task failures (PRIMARY)
runner_on_unreachable -- host connectivity failures (PRIMARY)
playbook_on_stats -- final summary

From each failure event, extract:

host: which host failed
task: which task failed
event_data.res.msg: error message
event_data.task_action: Ansible module
counter: sequence number for timeline
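The filtering and field extraction described above might look like the following sketch. It assumes each event from `jobs_job_events_list` is a dict whose event type lives under an `event` key; that key name, and the `extract_failures` helper itself, are assumptions for illustration rather than something the skill defines.

```python
from typing import Any

# The two event types marked PRIMARY above.
PRIMARY_EVENTS = {"runner_on_failed", "runner_on_unreachable"}


def extract_failures(events: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only primary failure events and pull out the fields listed above."""
    failures = []
    for ev in events:
        if ev.get("event") not in PRIMARY_EVENTS:  # "event" key is an assumption
            continue
        data = ev.get("event_data") or {}
        res = data.get("res") or {}
        failures.append({
            "counter": ev.get("counter"),       # sequence number for timeline
            "event": ev.get("event"),
            "host": ev.get("host"),             # which host failed
            "task": ev.get("task"),             # which task failed
            "msg": res.get("msg"),              # event_data.res.msg
            "module": data.get("task_action"),  # event_data.task_action
        })
    return failures
```

Missing nested fields resolve to `None` rather than raising, since unreachable-host events may not carry the same `res` payload as task failures.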

Step 4: Retrieve Host Summaries

MCP Tool: jobs_job_host_summaries_list (from aap-mcp-job-management)
Parameters:

id: "<job_id>"

Map each host's ok, changed, failures, dark, skipped counts.
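As a rough sketch, the per-host mapping could be built like this. The `host_name` key and the `map_host_counts` helper are assumptions for illustration; the actual shape of the `jobs_job_host_summaries_list` response may differ.

```python
# The five counters named above.
COUNT_FIELDS = ("ok", "changed", "failures", "dark", "skipped")


def map_host_counts(summaries: list[dict]) -> dict[str, dict[str, int]]:
    """Map each host name to its counters, treating missing values as 0."""
    return {
        s["host_name"]: {f: int(s.get(f) or 0) for f in COUNT_FIELDS}  # "host_name" is assumed
        for s in summaries
    }
```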

Step 5: Classify the Failure

Apply the classification from job-troubleshooting.md:

| dark > 0 | failures > 0 | Classification |
|---|---|---|
| Yes | No | Platform issue: Host connectivity |
| No | Yes | Code/Config issue: Task failure |
| Yes | Yes | Mixed: Both connectivity and task issues |
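The classification table translates directly into a small function. This is a sketch under one assumption: the table does not cover the case where both counts are zero, so the hypothetical `classify` helper returns a placeholder label there instead of guessing.

```python
def classify(dark: int, failures: int) -> str:
    """Apply the dark/failures classification table."""
    if dark > 0 and failures > 0:
        return "Mixed: Both connectivity and task issues"
    if dark > 0:
        return "Platform issue: Host connectivity"
    if failures > 0:
        return "Code/Config issue: Task failure"
    # Not covered by the table; placeholder label is an assumption.
    return "Unclassified"
```

For example, a job with `failures=1` and `dark=0` is classified as a task failure.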

For each runner_on_failed event, match against the failure patterns in job-troubleshooting.md:

Pattern 1: Host Unreachable
Pattern 2: Module Failure (Package Operations)
Pattern 3: Privilege Escalation Timeout
Pattern 4: Service Start Failure
Pattern 5: Template Rendering Error
Pattern 6: Execution Environment Issue

Step 6: Reconstruct Failure Timeline

Sort events by counter and produce a chronological failure narrative:

First failure event (root cause candidate)
Subsequent failures (cascade effects)
Final stats (scope of impact)
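One possible way to reconstruct the timeline is shown below. It assumes events are dicts with `event` and `counter` keys (the `event` key name is an assumption), and the `build_timeline` helper is hypothetical.

```python
FAILURE_EVENTS = {"runner_on_failed", "runner_on_unreachable"}


def build_timeline(events: list[dict]) -> dict:
    """Sort events by counter and split them into the three parts above."""
    ordered = sorted(events, key=lambda e: e["counter"])
    failures = [e for e in ordered if e["event"] in FAILURE_EVENTS]
    stats = [e for e in ordered if e["event"] == "playbook_on_stats"]
    return {
        "root_cause_candidate": failures[0] if failures else None,  # first failure
        "cascade": failures[1:],                                    # subsequent failures
        "final_stats": stats[-1] if stats else None,                # scope of impact
    }
```

The first failure is only a root cause candidate: sorting by `counter` establishes order, not causality, so later steps still need to confirm it.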

Step 7: Generate Failure Analysis Report

Output format (per job-troubleshooting.md template):

## Job Failure Analysis: Job #[job_id]

**Job Status**: [status]
**Elapsed Time**: [elapsed]s
**Launch Type**: [launch_type]

### Failure Timeline

1. [counter] - Task "[task_name]" on host "[hostname]": [event_type]
   Error: "[error_message]"
   Module: [module_name]
2. [subsequent failure events]

### Host Summary

| Host | OK | Changed | Failed | Unreachable |
|---|---|---|---|---|
| [host1] | [ok] | [changed] | [failures] | [dark] |

### Preliminary Classification

**Type**: [Platform / Code / Configuration] Issue
**Pattern Match**: [Pattern name from failure patterns reference]
**Evidence**: Per Red Hat's Troubleshooting Guide: "[relevant guidance from job-troubleshooting.md]"
**Root Cause Candidate**: [first failure event analysis]

### Next Steps

- Host fact correlation recommended: [yes/no, with affected hostnames]
- Resolution advisory recommended: [yes/no, with error pattern]
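To illustrate how the template's Failure Timeline and Host Summary sections could be filled in, here is a minimal rendering sketch. Both helpers and the input dict shapes (`counter`, `task`, `host`, `event`, `msg`, `module` per failure; `ok`, `changed`, `failures`, `dark` per host) are assumptions for illustration.

```python
def render_timeline(failures: list[dict]) -> str:
    """Render numbered timeline entries in the template's layout."""
    lines = []
    for i, f in enumerate(failures, start=1):
        lines.append(f'{i}. {f["counter"]} - Task "{f["task"]}" on host "{f["host"]}": {f["event"]}')
        lines.append(f'   Error: "{f["msg"]}"')
        lines.append(f'   Module: {f["module"]}')
    return "\n".join(lines)


def render_host_summary(counts: dict[str, dict[str, int]]) -> str:
    """Render the Host Summary table, one row per host."""
    rows = [
        "| Host | OK | Changed | Failed | Unreachable |",
        "|---|---|---|---|---|",
    ]
    for host, c in counts.items():
        rows.append(f"| {host} | {c['ok']} | {c['changed']} | {c['failures']} | {c['dark']} |")
    return "\n".join(rows)
```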

Dependencies

Required MCP Servers

aap-mcp-job-management - Job data and events

Required MCP Tools

jobs_retrieve (from job-management) - Job status
jobs_job_events_list (from job-management) - Event stream
jobs_job_host_summaries_list (from job-management) - Per-host summary
jobs_stdout_retrieve (from job-management) - Full stdout (supplementary)

Related Skills

aap-mcp-validator - Prerequisite validation
host-fact-inspector - Next step: correlate with host facts
resolution-advisor - Next step: get resolution recommendations
execution-summary - Audit trail

Reference Documentation

job-troubleshooting.md - Event parsing and failure patterns

Example Usage

User: "Job #4451 failed halfway through. Analyze the logs."

Agent:

Reads job-troubleshooting.md
Reports: "I consulted job-troubleshooting.md which references Red Hat's AAP 2.6 Troubleshooting Guide."
Retrieves job #4451 → status: failed
Extracts events → finds runner_on_failed on task "Install security package" with ansible.builtin.dnf, msg: "No package matching 'nonexistent-package'"
Retrieves host summaries → 1 host with failures=1
Classifies: Code Error (Pattern 2: Module Failure - Package Operations)
Reports structured analysis with timeline, classification, and next steps
