cxas-protocol-robust-extraction

Name: Cxas Protocol Robust Extraction
Author: GoogleCloudPlatform

// A robust methodology for LLM-based requirements gathering and high-fidelity artifact generation. Employs 'Divide, Conquer, and Verify' tactics using specialized subagents, iterative exhaustion loops, and batched execution to ensure zero data loss.

stars: 43

forks: 42

updated: May 21, 2026 at 00:17

SKILL.md

name	cxas-protocol-robust-extraction
description	A robust methodology for LLM-based requirements gathering and high-fidelity artifact generation. Employs 'Divide, Conquer, and Verify' tactics using specialized subagents, iterative exhaustion loops, and batched execution to ensure zero data loss.

Robust Extraction Protocol

This protocol defines the standard operating procedure for extracting exhaustive requirements (such as subintents, Critical User Journeys (CUJs), or logic rules) from large, complex, or fragmented customer artifacts.

It guards against the common LLM pitfalls of "context drift" and "truncation" by enforcing a strict "Divide, Conquer, and Verify" methodology.

General Principles & Anti-Hallucination Guardrails

To ensure 100% coverage and prevent data loss due to tool limits or implicit filtering, follow these principles across all phases:

Quantify the Scope: Before spawning any subagents or starting extraction, determine the exact total count of target items (e.g., files, directories, or database rows). Record this number as your "Success Target." You must verify that the total number of items processed equals this target before proceeding to consolidation.
Coverage over Curation: Default to 100% extraction coverage. Never assume the user only wants the "top" or "most interesting" items unless explicitly instructed to apply a quality filter. A standard or repetitive item is still data that must be reported.
Circumvent Tool Caps: Be aware that search and listing tools often have display limits (e.g., capped at 50 or 1000 results). If the expected scale (from the Quantify step) exceeds the tool's limit, you must partition the work (e.g., by alphabet or ID range) to ensure no items are hidden by the tool's cap.
Maintain Traceability: For every extracted requirement or item, record the source file or location it was extracted from. This allows for easy verification and provides context when reviewing the consolidated results.
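
The principles above can be illustrated with a minimal sketch. It assumes a hypothetical listing tool, `capped_list`, that silently returns at most `cap` results; the function names, the prefix-based partitioning, and the sample corpus are all illustrative assumptions rather than part of the protocol itself.

```python
from typing import Dict, List


def capped_list(items: List[str], prefix: str, cap: int) -> List[str]:
    """Hypothetical stand-in for a listing tool that silently returns at most `cap` results."""
    matches = [name for name in sorted(items) if name.startswith(prefix)]
    return matches[:cap]


def list_partitioned(items: List[str], prefixes: List[str], cap: int) -> Dict[str, str]:
    """List one partition at a time and record which partition each item came from."""
    found: Dict[str, str] = {}
    for prefix in prefixes:
        page = capped_list(items, prefix, cap)
        if len(page) >= cap:
            # A full page may be hiding further items, so the partition is too coarse.
            raise RuntimeError(f"partition {prefix!r} reached the cap of {cap}; split it further")
        for name in page:
            found[name] = f"prefix:{prefix}"  # traceability: item -> where it was found
    return found


# Illustrative corpus of 12 items: a00.txt .. a03.txt, b00.txt .. b03.txt, c00.txt .. c03.txt
corpus = [f"{letter}{n:02d}.txt" for letter in "abc" for n in range(4)]
success_target = len(corpus)  # in practice this comes from an independent count

naive = capped_list(corpus, "", cap=5)                    # a single call is truncated
found = list_partitioned(corpus, ["a", "b", "c"], cap=5)  # partitioned calls are not

if len(found) != success_target:
    raise RuntimeError(f"expected {success_target} items, found {len(found)}")
```

Treating a full page as a failure is a deliberately conservative choice in this sketch: a partition that returns exactly `cap` items cannot be distinguished from a truncated one, so it is split further rather than trusted.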

Core Directives

When tasked with comprehensive extraction or generation from a large corpus, you MUST follow this four-phase methodology:

Phase 1: Parallel Expert Discovery

Never use a single generalist agent or a single prompt to read all files.

Categorize the input artifacts (e.g., Code/ADK, Diagrams, Test Cases).
Spawn specialized expert subagents (e.g., cxas-ingestor-adk) in parallel, providing each with only the context relevant to its expertise.
Consolidate their initial findings into a centralized list.
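
As a rough sketch of the three steps above, the following categorizes artifacts, runs one expert per category in parallel, and consolidates the results. The suffix-to-category mapping and the `run_expert` function are hypothetical placeholders; a real orchestrator would replace `run_expert` with an actual subagent call.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import PurePath
from typing import Dict, List, Tuple

# Hypothetical mapping from file suffix to artifact category.
CATEGORY_BY_SUFFIX = {".py": "code", ".png": "diagram", ".csv": "test_cases"}


def categorize(paths: List[str]) -> Dict[str, List[str]]:
    """Group artifact paths by category so each expert sees only its own files."""
    groups: Dict[str, List[str]] = {}
    for path in paths:
        category = CATEGORY_BY_SUFFIX.get(PurePath(path).suffix, "other")
        groups.setdefault(category, []).append(path)
    return groups


def run_expert(category: str, files: List[str]) -> List[Tuple[str, str]]:
    """Placeholder for a subagent call; returns (finding, source file) pairs."""
    return [(f"{category}-finding-from-{PurePath(f).stem}", f) for f in files]


def discover(paths: List[str]) -> Dict[str, str]:
    """Run one expert per category in parallel and merge findings into one list."""
    groups = categorize(paths)
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(run_expert, cat, files) for cat, files in groups.items()]
        results = [future.result() for future in futures]
    consolidated: Dict[str, str] = {}
    for batch in results:
        for finding, source in batch:
            consolidated.setdefault(finding, source)  # keep the first recorded source
    return consolidated


findings = discover(["agent.py", "flow.png", "cases.csv", "tools.py"])
```

Keying the consolidated list by finding and storing the source file alongside it keeps the traceability requirement intact through consolidation.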

Phase 2: The Iterative Exhaustion Loop

LLMs often miss items in a single pass of a large document. You must force them to iterate.

Provide the current consolidated list of findings back to the expert subagents.
Ask a direct question: "Based on your artifacts, are there ANY MORE items missing from this list? If yes, list them. If no, reply EXACTLY 'NO'."
The Loop Rule: You MUST continue this loop, updating the consolidated list each time, until ALL expert subagents unanimously reply with "NO".
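
The loop can be sketched as follows. The experts here are simulated by `make_fake_expert`, a hypothetical helper that reveals only a few missing items per pass; the `max_rounds` safety cap is also an assumption added for the sketch, since the protocol itself only states that the loop continues until every expert replies "NO".

```python
from typing import Callable, Dict, List, Set

Expert = Callable[[Set[str]], str]


def make_fake_expert(known: List[str], per_pass: int) -> Expert:
    """Simulated expert that reports at most `per_pass` missing items per pass."""
    def expert(current: Set[str]) -> str:
        missing = [item for item in known if item not in current][:per_pass]
        return "\n".join(missing) if missing else "NO"
    return expert


def exhaust(experts: Dict[str, Expert], initial: Set[str], max_rounds: int = 20) -> Set[str]:
    """Re-ask every expert until all of them reply exactly 'NO' in the same round."""
    consolidated = set(initial)
    for _ in range(max_rounds):
        replies = {name: ask(consolidated) for name, ask in experts.items()}
        if all(reply.strip() == "NO" for reply in replies.values()):
            return consolidated
        for reply in replies.values():
            if reply.strip() != "NO":
                consolidated.update(line.strip() for line in reply.splitlines() if line.strip())
    raise RuntimeError("experts did not converge within max_rounds")


experts = {
    "adk": make_fake_expert(["a", "b", "c"], per_pass=1),
    "diagrams": make_fake_expert(["c", "d"], per_pass=1),
}
final = exhaust(experts, initial={"a"})
```

The unanimity check uses the replies from a single round, so an expert that answered "NO" earlier is asked again whenever another expert adds items afterward.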

Phase 3: Logical Clustering

Once the exhaustive list is finalized (e.g., 100+ subintents), organize it.

Group the granular findings into high-level logical categories (Parent CUJs).
Verify with the experts that the parent categories encompass all findings.
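
Before asking the experts to confirm the grouping, a mechanical coverage check can catch findings that were dropped or double-assigned during clustering. The sketch below is one possible check; the function name, the report keys, and the sample data are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, List, Set


def check_clusters(findings: Set[str], clusters: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Compare the exhaustive list against the items assigned to parent categories."""
    assigned = [item for items in clusters.values() for item in items]
    counts = Counter(assigned)
    return {
        "unassigned": sorted(findings - set(assigned)),  # findings lost during clustering
        "unknown": sorted(set(assigned) - findings),     # items that were never extracted
        "duplicated": sorted(item for item, n in counts.items() if n > 1),
    }


# Illustrative data: one finding is missing from the clusters, one is assigned twice.
all_findings = {"reset_password", "update_email", "cancel_order", "track_order"}
parent_cujs = {
    "Account Management": ["reset_password", "update_email"],
    "Orders": ["cancel_order", "reset_password"],
}
report = check_clusters(all_findings, parent_cujs)
```

An empty list under every key means each finding belongs to exactly one parent category. Whether a finding may legitimately sit under two parents is a design decision the protocol does not settle; this sketch merely flags it.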

Phase 4: Batched Execution & Verification

Never ask an LLM to generate 100+ complex artifacts (like conversational transcripts) in a single prompt. The model is likely to hallucinate or truncate the output.

Batching: Divide the exhaustive list into small, manageable batches (e.g., 10 batches of 10 items). Write these batches to temporary files.
Delegated Execution: Spawn a new subagent for each batch. Instruct each subagent to process only its assigned batch and write its output to a specific file.
The Verification Gate: As the orchestrator, you MUST verify the output of each subagent. Did it generate an output for every single item in its batch?
  If YES: Accept the batch.
  If NO: Discard the output and respawn the subagent for that specific batch with stronger steering instructions.
Consolidation: Only when all batches pass the Verification Gate, merge them into the final, exhaustive deliverable.
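
A minimal sketch of batching with a verification gate follows. For brevity it keeps batches and outputs in memory instead of writing them to temporary files, and it simulates the subagent with `flaky_worker`, a hypothetical function that drops the last item of a batch on its first attempt. The `max_attempts` limit is likewise an assumption of this sketch.

```python
from typing import Callable, Dict, List

Worker = Callable[[List[str], int], Dict[str, str]]


def make_batches(items: List[str], size: int) -> List[List[str]]:
    """Split the exhaustive list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def missing_items(batch: List[str], output: Dict[str, str]) -> List[str]:
    """Verification gate: every item in the batch needs a non-empty output."""
    return [item for item in batch if not output.get(item, "").strip()]


def flaky_worker(batch: List[str], attempt: int) -> Dict[str, str]:
    """Simulated subagent that truncates its output on the first attempt."""
    items = batch[:-1] if attempt == 1 else batch
    return {item: f"transcript for {item}" for item in items}


def run_with_gate(items: List[str], size: int, worker: Worker, max_attempts: int = 3) -> Dict[str, str]:
    """Process each batch, discarding and retrying any batch that fails the gate."""
    final: Dict[str, str] = {}
    for index, batch in enumerate(make_batches(items, size)):
        for attempt in range(1, max_attempts + 1):
            output = worker(batch, attempt)
            if not missing_items(batch, output):
                # Accept only the items that belong to this batch.
                final.update({item: output[item] for item in batch})
                break
            # Gate failed: the whole output is discarded and the batch is retried.
        else:
            raise RuntimeError(f"batch {index} failed verification {max_attempts} times")
    return final


items = [f"subintent-{n:03d}" for n in range(25)]
deliverable = run_with_gate(items, size=10, worker=flaky_worker)
```

In a real orchestrator the `attempt` number would be the natural place to switch to stronger steering instructions on a retry.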

related-skills.json

same repository

cxas-agent-foundry.md

from "GoogleCloudPlatform/cxas-scrapi"

End-to-end GECX/CXAS/CES conversational agent lifecycle -- build agents from requirements (PRD-to-agent), create and run evals (goldens, simulations, tool tests, callback tests), debug failures, and iterate to production quality. Use this skill whenever the user mentions GECX, CXAS, CES, SCRAPI, conversational agents, voice agents, audio agents, agent evals, pushing/pulling/linting agents, or agent instructions/callbacks/tools on the Google Customer Engagement Suite platform.

2026-05-29 · 43

cxas-dfcx-migration.md

from "GoogleCloudPlatform/cxas-scrapi"

Migrate Dialogflow CX (DFCX) agents to CXAS (Customer Experience Agent Studio) agents. Use this skill when the user mentions DFCX migration, migrating agents, converting DFCX to CXAS, porting agents, agent migration, or post-migration optimization/consolidation. Four independently runnable scripts: migrate.py (1:1), stage_1.py (variable dedup + consolidation), stage_2.py (instruction state machines + tool mocks + lint + report), stage_3.py (rewires consolidated topology from source dep graph; only needed when stage_1 ran consolidation). State persists between scripts via <target>_ir.json so each can run / re-run / resume independently.

2026-05-29 · 43

cxas-loss-analysis.md

from "GoogleCloudPlatform/cxas-scrapi"

Retrieves non-contained CCAI Insights conversations (losses), uses agent intelligence to cluster them into common failure patterns, and generates a markdown report. Use when you need to analyze failure patterns and build targeted regression/evaluation reports.

2026-05-26 · 43

cxas-cuj-report-generator.md

from "GoogleCloudPlatform/cxas-scrapi"

Automates the ingestion of customer requirement documents such as diagrams, BRDs, code etc., synthesizes high-fidelity natural transcripts, and compiles them into highly interactive, responsive Critical User Journey (CUJ) reports.

2026-05-21 · 43

cxas-protocol-two-phase-ingestion.md

from "GoogleCloudPlatform/cxas-scrapi"

A two-phase protocol for extracting structure and generating transcripts from customer artifacts.

2026-05-21 · 43

task-coverage-protocol.md

from "GoogleCloudPlatform/cxas-scrapi"

Enforces task coverage and prevents drift via a deterministic checklist tool with output verification.

2026-05-21 · 43

package.json

"author": "GoogleCloudPlatform"

"repository": "GoogleCloudPlatform/cxas-scrapi"

$ useful --forSOC

Project Management Specialists · Business and Financial Operations Occupations · 13-1082 · L4
