$pwd:

a2rag-adaptive-agentic-graph

Name: A2rag Adaptive Agentic Graph
Author: ndpvt-web

// Build adaptive, cost-aware Graph-RAG pipelines that route queries through escalating retrieval stages (local -> bridge -> global) with triple-check verification and provenance map-back. Use when: 'build a graph RAG pipeline', 'implement adaptive retrieval for knowledge graphs', 'cost-aware multi-hop question answering', 'add evidence verification to RAG', 'handle mixed-difficulty queries efficiently', 'graph retrieval with source text grounding'.

$ git log --oneline --stat

stars:4

forks:0

updated:February 13, 2026 at 13:35

SKILL.md

readonly

related-skills.json

same repository

adaptbpe-general-purpose-specialized.md

from "ndpvt-web/arxiv-claude-skills"

Adapt general-purpose BPE tokenizers into domain- or language-specialized tokenizers using the AdaptBPE post-training strategy. Replaces low-utility tokens with high-frequency domain-specific tokens to improve tokenization efficiency without retraining from scratch. Trigger phrases: "adapt tokenizer to domain", "specialize BPE for medical text", "optimize tokenizer for French", "reduce token fertility for code", "adapt vocabulary for legal documents", "domain-specific tokenizer"

2026-02-134

addressing-explainability-generative-ai.md

from "ndpvt-web/arxiv-claude-skills"

Explain generative AI outputs using the gSMILE perturbation-based attribution framework. Builds local surrogate models from controlled input perturbations and Wasserstein distance to produce token-level or word-level importance scores for LLM and diffusion model outputs. Triggers: 'explain why the model generated this', 'token attribution for prompt', 'which words in my prompt matter most', 'interpret generative model output', 'build explainability for my LLM pipeline', 'debug prompt influence on generation'

2026-02-134

agent-based-software-artifact-evaluation.md

from "ndpvt-web/arxiv-claude-skills"

Automatically evaluate software research artifacts (code repositories with READMEs) by constructing dependency-aware command graphs, building containerized environments, and executing instructions with structured error recovery. Use when asked to: 'evaluate this artifact', 'reproduce this paper's results', 'run this repo's README instructions', 'check if this artifact builds and runs', 'automate artifact evaluation', 'verify research reproducibility'.

2026-02-134

agentcgroup-understanding-controlling-os.md

from "ndpvt-web/arxiv-claude-skills"

Design and implement OS-level resource controls for sandboxed AI agents using hierarchical cgroups, eBPF enforcement, and tool-call-level resource management. Use when: 'set up cgroups for AI agent containers', 'control memory for coding agents', 'isolate tool-call resources with eBPF', 'manage multi-tenant agent resource limits', 'prevent OOM kills in agent sandboxes', 'configure agent resource policies with cgroup v2'.

2026-02-134

ai-agent-systems-supply.md

from "ndpvt-web/arxiv-claude-skills"

Build LLM-based multi-agent systems for supply chain inventory management using structured decision prompts and memory-retrieval (AIM-RM). Implements the beer game multi-echelon supply chain simulation with per-stage agents that use stepwise ordering prompts, safety-stock calculations, and Euclidean-distance memory retrieval of similar historical episodes. Use when asked to: "build a supply chain agent", "implement inventory management with LLMs", "create a beer game simulation with AI agents", "multi-agent ordering system", "AIM-RM memory retrieval agent", "supply chain decision prompt design".

2026-02-134

alertguardian-intelligent-alert-life-cycle.md

from "ndpvt-web/arxiv-claude-skills"

Build intelligent alert lifecycle management systems for cloud infrastructure using graph-based denoising, RAG-powered summarization, and multi-agent rule refinement. Trigger phrases: - "reduce alert fatigue in our monitoring system" - "deduplicate and correlate alerts" - "summarize alerts for on-call engineers" - "refine our alerting rules automatically" - "build an alert denoising pipeline" - "too many alerts, help me triage"

2026-02-134

package.json

"author": "ndpvt-web"

"repository": "ndpvt-web/arxiv-claude-skills"

$ useful --forSOC

Software Developers, Computer and Mathematical Occupations, 15-1252, L4

A2RAG: Adaptive Agentic Graph Retrieval

This skill enables Claude to design and implement Graph Retrieval-Augmented Generation systems that adaptively route queries through escalating retrieval stages based on difficulty, verify evidence with a triple-check mechanism, and ground answers back to source text via provenance mapping. The core insight from the A2RAG paper is that ~55% of queries can be answered with cheap local graph lookups, ~27% need bridge discovery, and only ~15% require expensive global diffusion -- so a progressive escalation pipeline cuts token consumption and latency by ~50% while improving recall by about 10 points over flat retrieval baselines.

When to Use

When the user wants to build a RAG system over a knowledge graph and needs cost-efficient multi-hop reasoning
When implementing question-answering pipelines that must handle a mix of easy single-hop and hard multi-hop queries
When a graph-based retrieval system suffers from extraction loss (knowledge graph triples miss fine-grained qualifiers like dates, numbers, or exceptions that exist in source text)
When the user asks for evidence verification or answer grounding in a retrieval pipeline
When building an agentic retrieval loop that should know when to stop retrieving and when to escalate
When optimizing an existing Graph-RAG system for cost (token budget, API calls, latency)

Key Technique

A2RAG decouples adaptive control from agentic retrieval. The adaptive controller acts as the outer loop: it gates whether retrieval is needed at all (via summarized-KB similarity scoring), orchestrates answer generation, and enforces a triple-check verification before accepting any answer. The three checks are: (1) evidence relevance -- do retrieved passages actually address the query, (2) answer grounding -- are claims derivable from the evidence, and (3) query resolution -- does the answer adequately address the question. Only when all three pass does the system return a result. On failure, the controller identifies which check failed and rewrites the query accordingly (sharpening entities for relevance failures, requesting stricter grounding, or adding missing constraints for resolution failures), up to a bounded retry limit.
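
The gating step of the outer loop can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the query and the document summaries are already embedded as plain lists of floats, and the names `cosine` and `gate` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def gate(query_vec, summary_vecs, tau_g=0.4):
    """Return (should_retrieve, best_score).

    Retrieval proceeds only if the best summary similarity reaches tau_g;
    otherwise the caller should abstain. An empty summary index abstains.
    """
    best = max((cosine(query_vec, s) for s in summary_vecs), default=0.0)
    return best >= tau_g, best
```

Returning the score alongside the decision makes it easier to log near-threshold queries when tuning `tau_g`.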

The agentic retriever inside this loop maintains a stateful evidence accumulator and escalates through three stages. Stage 1 (Local) extracts entity mentions from the query, aligns them to knowledge graph nodes via hybrid lexical-semantic matching, and collects 1-hop neighbor triples -- this resolves the majority of queries. Stage 2 (Bridge) discovers bridge entities that connect two or more query entity seeds within K hops, extracting the shortest connecting paths -- this handles multi-hop reasoning. Stage 3 (Global Fallback) runs degree-normalized Personalized PageRank from seed nodes, selects top-L nodes by PPR score, and critically performs provenance map-back: mapping each node back to its source text chunks to recover fine-grained qualifiers lost during graph construction. Escalation is monotonic (Local -> Bridge -> Global), with sufficiency checks gating each transition.
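
One way to picture the Stage 3 ranking is a small power-iteration Personalized PageRank whose restart distribution favors low-degree seeds. The sketch below is an assumption-laden illustration: the graph is a plain adjacency dict (treated as already symmetric), the function name and the dangling-node handling are choices made here, and a production system would more likely use a graph library.

```python
def degree_normalized_ppr(adj, seeds, alpha=0.15, iters=50, top_l=20):
    """Rank nodes by PPR with restart weights p0(u) proportional to 1/deg(u).

    adj:   {node: iterable of neighbor nodes}
    seeds: seed nodes; seeds missing from adj are ignored
    alpha: teleport (restart) probability
    Returns up to top_l (node, score) pairs, highest score first.
    """
    seeds = [s for s in seeds if s in adj]
    if not seeds:
        return []
    # Inverse-degree personalization; isolated seeds are treated as degree 1.
    inv = {s: 1.0 / max(len(adj[s]), 1) for s in seeds}
    z = sum(inv.values())
    p0 = {s: w / z for s, w in inv.items()}

    rank = dict(p0)
    for _ in range(iters):
        nxt = {u: alpha * p for u, p in p0.items()}
        for u, mass in rank.items():
            nbrs = list(adj.get(u, ()))
            if not nbrs:
                # Dangling node: send its walk mass back to the seeds.
                for s, p in p0.items():
                    nxt[s] = nxt.get(s, 0.0) + (1 - alpha) * mass * p
                continue
            share = (1 - alpha) * mass / len(nbrs)
            for v in nbrs:
                nxt[v] = nxt.get(v, 0.0) + share
        rank = nxt
    return sorted(rank.items(), key=lambda kv: (-kv[1], kv[0]))[:top_l]
```

The selected nodes would then be passed through the provenance map to fetch source chunks, rather than being used as evidence directly.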

Step-by-Step Workflow

1. Build the knowledge graph index. Parse the corpus into documents, extract entities and relations (via NER + relation extraction or LLM-based extraction), and construct a graph G = (V, E). Store an offline provenance map pi: V -> 2^D linking each node back to its source text chunks. Precompute document summaries for gating.
2. Implement the gating check. For each incoming query, compute dense embedding similarity against precomputed document summaries. If max(similarity_scores) < tau_g (configurable threshold, typically 0.3-0.5), return "Abstain" immediately -- the corpus likely cannot answer this query.
3. Implement Stage 1: Local Evidence Collection. Extract entity mentions from the query using NER or phrase extraction. Align them to graph nodes using a hybrid score (edit distance + embedding cosine similarity). Collect 1-hop neighbors of aligned nodes, optionally filtered by relation seed constraints. Package triples as evidence.
4. Implement Stage 2: Bridge Discovery. For multi-hop queries where local evidence is insufficient, construct an augmented graph with inverse edges. Find bridge candidates -- nodes reachable from 2+ query entity seeds within K hops (K >= 2). Extract shortest paths connecting bridges to seeds. Cap path count and hop length to control evidence size.
5. Implement Stage 3: Global Fallback with PPR and Provenance Map-Back. When bridge discovery fails, run degree-normalized Personalized PageRank from seed nodes (lower-degree seeds get higher personalization weight to avoid hub bias). Select top-L nodes by PPR score. Map each selected node back to source text chunks via the provenance map pi(v). This recovers fine-grained qualifiers (dates, numbers, exceptions) that graph triples lost.
6. Implement the triple-check verification. After generating an answer from retrieved evidence, run three validators: (a) V_rel(q, E) -- are the passages relevant to the query? (b) V_grd(a, E) -- is the answer grounded in the evidence? (c) V_ans(q, a) -- does the answer resolve the question? Use NLI models or prompted LLM classifiers. Accept only when all three pass.
7. Implement failure-aware query rewriting. When verification fails, identify the first violated check. Rewrite the query accordingly: sharpen entity/relation expressions for relevance failures, request stricter evidence for grounding failures, add missing constraints for resolution failures. Retry up to I_max iterations (2-3).
8. Wire the outer adaptive control loop. Connect gating -> agentic retrieval (with escalation) -> answer generation -> triple-check -> conditional rewrite/retry. Track token consumption per stage for cost monitoring.
9. Add cost instrumentation. Log which retrieval stage resolved each query. Monitor the stage distribution (target: ~55% local, ~27% bridge, ~15% global). Alert if global fallback usage exceeds 20%, indicating potential graph quality issues.
10. Test with mixed-difficulty query sets. Evaluate on both single-hop and multi-hop questions. Verify that easy queries terminate at Stage 1, multi-hop queries use Stage 2, and only adversarial/incomplete-graph cases hit Stage 3.
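
The bridge discovery step in the workflow above can be sketched with a hop-bounded breadth-first search from each seed. This is an illustrative sketch only: it assumes the adjacency dict already includes inverse edges, it excludes seeds themselves from the bridge candidates, and the function names and the `max_bridges` cap are hypothetical.

```python
from collections import deque


def bfs_within(adj, source, k_hops):
    """Return {node: parent} for every node within k_hops of source."""
    parent = {source: None}
    depth = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if depth[u] == k_hops:
            continue  # hop budget spent on this branch
        for v in adj.get(u, ()):
            if v not in parent:
                parent[v] = u
                depth[v] = depth[u] + 1
                queue.append(v)
    return parent


def path_to(parent, node):
    """Rebuild the shortest path seed -> ... -> node from BFS parent links."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def discover_bridges(adj, seeds, k_hops=2, max_bridges=10):
    """Find non-seed nodes reachable from 2+ seeds within k_hops.

    Returns a list of (bridge_node, {seed: path}) sorted by total path length,
    capped at max_bridges to bound evidence size.
    """
    trees = {s: bfs_within(adj, s, k_hops) for s in seeds}
    reached = {}
    for seed, parent in trees.items():
        for node in parent:
            if node not in trees:  # seeds are not bridge candidates
                reached.setdefault(node, {})[seed] = path_to(parent, node)
    hits = [(n, paths) for n, paths in reached.items() if len(paths) >= 2]
    hits.sort(key=lambda item: (sum(len(p) for p in item[1].values()), item[0]))
    return hits[:max_bridges]
```

Sorting by total path length keeps the tightest connections first, so the cap drops the most distant bridges.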

Concrete Examples

Example 1: Building an A2RAG pipeline for a documentation QA system

User: "I have a knowledge graph built from our product documentation. I want to build a QA system that handles both simple factual questions and complex multi-hop questions efficiently."

Approach:

Index the documentation corpus with entity extraction and build provenance map linking KG nodes to source paragraphs
Implement the three-stage retriever with escalation gating
Add triple-check verification before returning answers

from __future__ import annotations  # Evidence is only referenced in an annotation


class A2RAGPipeline:
    # Helper methods (_gate, _generate, _triple_check, _rewrite, _local_collect, ...)
    # and the Evidence container are left to the implementer.
    def __init__(self, kg, corpus, summaries, tau_g=0.4, i_max=2, alpha=0.15):
        self.kg = kg                # Knowledge graph with adjacency
        self.corpus = corpus        # Source text chunks indexed by doc ID
        self.summaries = summaries  # Precomputed doc summary embeddings
        self.tau_g = tau_g          # Gating threshold
        self.i_max = i_max          # Max rewrite retries
        self.alpha = alpha          # PPR teleport probability
        self.provenance = kg.provenance_map  # node -> source chunk IDs

    def answer(self, query: str) -> dict:
        # Step 1: Gating
        if self._gate(query) < self.tau_g:
            return {"answer": None, "status": "abstain"}

        q = query
        for i in range(self.i_max + 1):
            # Step 2: Agentic retrieval with escalation
            evidence = self._agentic_retrieve(q)

            # Step 3: Generate answer
            answer = self._generate(q, evidence)

            # Step 4: Triple-check verification
            checks = self._triple_check(q, answer, evidence)
            if all(checks.values()):
                return {"answer": answer, "evidence": evidence, "status": "verified",
                        "stage": evidence.source_stage, "retries": i}

            # Step 5: Failure-aware rewrite, skipped once the retry budget is spent
            if i < self.i_max:
                failed = next(k for k, v in checks.items() if not v)
                q = self._rewrite(q, answer, evidence, failure_type=failed)

        # Retry budget exhausted: return the last attempt, flagged as unverified
        return {"answer": answer, "evidence": evidence, "status": "unverified",
                "stage": evidence.source_stage, "retries": self.i_max}

    def _agentic_retrieve(self, query: str) -> Evidence:
        seeds = self._extract_and_align_entities(query)
        rel_seeds = self._extract_relation_seeds(query)

        # Stage 1: Local (1-hop neighbors of aligned entities)
        evidence = self._local_collect(seeds, rel_seeds)
        if self._evidence_sufficient(query, evidence):
            evidence.source_stage = "local"
            return evidence

        # Stage 2: Bridge discovery (K-hop connecting paths)
        bridge_evidence = self._bridge_discover(seeds, k_hops=2)
        evidence.merge(bridge_evidence)
        if self._evidence_sufficient(query, evidence):
            evidence.source_stage = "bridge"
            return evidence

        # Stage 3: Global PPR + provenance map-back
        ppr_nodes = self._degree_normalized_ppr(seeds, alpha=self.alpha, top_l=20)
        source_chunks = self._provenance_mapback(ppr_nodes)
        evidence.merge(source_chunks)
        evidence.source_stage = "global"
        return evidence
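
Example 1 relies on an `Evidence` object with a `merge` method and a `source_stage` attribute but does not define it. A minimal sketch of such a container is shown here; the field names `triples` and `passages` and the `is_empty` helper are assumptions made for illustration, and it presumes that every retrieval helper (including the map-back step) returns an `Evidence` instance.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    """Stateful evidence accumulator shared across retrieval stages."""

    triples: list = field(default_factory=list)   # (head, relation, tail) tuples
    passages: list = field(default_factory=list)  # source text chunks
    source_stage: str = "none"                    # "local", "bridge", or "global"

    def merge(self, other: "Evidence") -> None:
        """Add items from another Evidence, keeping order and skipping duplicates."""
        for triple in other.triples:
            if triple not in self.triples:
                self.triples.append(triple)
        for passage in other.passages:
            if passage not in self.passages:
                self.passages.append(passage)

    def is_empty(self) -> bool:
        return not self.triples and not self.passages
```

Keeping triples and passages separate lets the answer generator prefer source text when it is available, in line with the map-back guidance.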

Example 2: Adding provenance map-back to an existing Graph-RAG system

User: "Our graph RAG often gives wrong answers because the KG triples don't capture specific dates and numbers from the source documents. How do I fix this?"

Approach:

This is the extraction loss problem -- graph abstraction drops fine-grained qualifiers
Implement a provenance map linking each KG node to its source text chunks
After graph traversal, map selected nodes back to source text for answer generation

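class ProvenanceMapBack:
    def __init__(self, kg, corpus):
        # Build offline provenance map: node_id -> set of chunk_ids.
        # Substring matching is a simple heuristic; each chunk is lowercased
        # once here rather than once per node.
        lowered = {cid: chunk.text.lower() for cid, chunk in corpus.items()}
        self.prov_map = {}
        for node_id, node in kg.nodes.items():
            label = node.label.strip().lower()
            # An empty label would match every chunk, so it gets no provenance.
            self.prov_map[node_id] = (
                {cid for cid, text in lowered.items() if label in text}
                if label else set()
            )

    def mapback(self, selected_nodes: list[str], corpus) -> list[str]:
        """Given graph-selected nodes, retrieve original source text."""
        chunk_ids = set()
        for node_id in selected_nodes:
            chunk_ids.update(self.prov_map.get(node_id, set()))
        # Sort so the context order is deterministic across runs.
        return [corpus[cid].text for cid in sorted(chunk_ids)]

# Usage: After PPR or bridge discovery selects graph nodes,
# ground the answer in source text instead of just triples.
# kg, corpus, seeds, q, ppr_select, and llm are assumed to be defined elsewhere.
provenance = ProvenanceMapBack(kg, corpus)  # built once, at index time
graph_nodes = ppr_select(seeds, top_l=20)
source_texts = provenance.mapback(graph_nodes, corpus)
answer = llm.generate(query=q, context=source_texts)  # Grounded in full text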
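
Label substring matching can over-match short labels and miss aliases. An alternative, sketched here under the assumption that the extraction step emits each triple together with the ID of the chunk it came from, is to record provenance at extraction time. The function name and the 4-tuple layout are hypothetical.

```python
from collections import defaultdict


def build_provenance_map(extracted_triples):
    """Build node -> set of chunk IDs from extraction output.

    extracted_triples: iterable of (head, relation, tail, chunk_id), where
    chunk_id identifies the source chunk the triple was extracted from.
    """
    prov = defaultdict(set)
    for head, _relation, tail, chunk_id in extracted_triples:
        prov[head].add(chunk_id)
        prov[tail].add(chunk_id)
    return dict(prov)
```

Because every node enters the graph through at least one extracted triple, this construction leaves no node without a source chunk.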

Example 3: Implementing triple-check verification for answer quality

User: "I want to add verification to my RAG pipeline so it doesn't return hallucinated answers."

Approach:

Implement three independent checks: relevance, grounding, resolution
Only accept answers that pass all three
On failure, rewrite the query targeting the specific failure mode

class TripleCheck:
    def __init__(self, nli_model, llm):
        self.nli = nli_model
        self.llm = llm

    def verify(self, query: str, answer: str, evidence: list[str]) -> dict:
        context = "\n".join(evidence)
        return {
            "relevance": self._check_relevance(query, context),
            "grounding": self._check_grounding(answer, context),
            "resolution": self._check_resolution(query, answer),
        }

    def _check_relevance(self, query, context) -> bool:
        """Do the retrieved passages actually address the query?"""
        return self.nli.entails(premise=context,
                                hypothesis=f"This text is relevant to: {query}")

    def _check_grounding(self, answer, context) -> bool:
        """Is every claim in the answer supported by the evidence?"""
        return self.nli.entails(premise=context, hypothesis=answer)

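    def _check_resolution(self, query, answer) -> bool:
        """Does the answer adequately resolve the question?"""
        prompt = f"Does this answer fully resolve the question?\nQ: {query}\nA: {answer}\nRespond YES or NO."
        # Match the leading token only: a substring test would read
        # "No, it overlooks the eyes" as a pass because "EYES" contains "YES".
        return self.llm.generate(prompt).strip().upper().startswith("YES")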

    def rewrite_on_failure(self, query, answer, evidence, failure_type) -> str:
        strategies = {
            "relevance": f"Rephrase to be more specific about entities and relations: {query}",
            "grounding": f"Find stronger evidence for: {query}. Previous answer '{answer}' was not grounded.",
            "resolution": f"Add missing constraints. Original: {query}. Incomplete answer: {answer}",
        }
        return self.llm.generate(f"Rewrite this query. Strategy: {strategies[failure_type]}")
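
When binary checks prove too strict, a soft-scoring variant is one option. The sketch below assumes each validator can return a score in [0, 1]; the threshold values and the function name are hypothetical and would need tuning on real data.

```python
CHECK_ORDER = ("relevance", "grounding", "resolution")
# Hypothetical defaults, not values from the paper.
DEFAULT_THRESHOLDS = {"relevance": 0.5, "grounding": 0.7, "resolution": 0.5}


def soft_triple_check(scores, thresholds=None):
    """Compare per-check scores against thresholds.

    scores: {"relevance": float, "grounding": float, "resolution": float}
    Returns the pass/fail decision, the first failing check (in CHECK_ORDER,
    or None), and each check's margin above its threshold.
    """
    limits = {**DEFAULT_THRESHOLDS, **(thresholds or {})}
    margins = {name: scores[name] - limits[name] for name in CHECK_ORDER}
    failed = next((n for n in CHECK_ORDER if margins[n] < 0), None)
    return {"passed": failed is None, "failed": failed, "margins": margins}
```

The margins show how close a rejected answer came to passing, which helps decide whether a retry is worth its cost.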

Best Practices

Do: Build the provenance map (node -> source chunks) at index time, not query time. This is an offline operation and critical for the map-back stage to be fast.
Do: Set the gating threshold tau_g conservatively (0.3-0.4). False negatives (abstaining on answerable queries) are worse than letting a few irrelevant queries through to the retriever.
Do: Use degree-normalized personalization for PPR. Giving lower-degree (rarer) entity seeds higher weight prevents hub nodes from dominating results. The formula is p0(u) = deg(u)^{-1} / Z, where Z is the sum of deg(v)^{-1} over all seed nodes v, so that the seed weights sum to 1.
Do: Track per-query stage resolution statistics. If more than 20% of queries hit Stage 3 (global), your knowledge graph likely has coverage gaps that should be addressed upstream.
Avoid: Running all three retrieval stages unconditionally. The entire point of A2RAG is that most queries terminate early. Skipping the sufficiency check between stages negates the cost savings.
Avoid: Using only graph triples for answer generation. Triples lose qualifiers (dates, quantities, exceptions). Always map back to source text for the final answer generation step, especially in Stage 3.
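
The per-query stage tracking recommended above can be as simple as a counter with a ratio check. This is a minimal sketch; the class name and method names are hypothetical, and the 20% default mirrors the alert level mentioned in this document.

```python
from collections import Counter


class StageMonitor:
    """Track which retrieval stage resolved each query."""

    def __init__(self, global_alert_ratio=0.20):
        self.counts = Counter()
        self.global_alert_ratio = global_alert_ratio

    def record(self, stage):
        """stage is expected to be "local", "bridge", or "global"."""
        self.counts[stage] += 1

    def distribution(self):
        """Return the fraction of queries resolved at each stage."""
        total = sum(self.counts.values())
        if total == 0:
            return {}
        return {stage: n / total for stage, n in self.counts.items()}

    def global_overused(self):
        """True when the global-stage share exceeds the alert ratio."""
        return self.distribution().get("global", 0.0) > self.global_alert_ratio
```

Calling `record(result["stage"])` after each answered query is enough to feed it.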

Error Handling

Failure Mode	Symptom	Resolution
Gating too aggressive	Many answerable queries return "Abstain"	Lower `tau_g` threshold; check summary embedding quality
Entity alignment misses	Seeds don't match KG nodes, Stage 1 returns empty evidence	Improve hybrid matching; add alias tables; lower alignment threshold
Bridge discovery timeout	Stage 2 hangs on dense subgraphs	Cap path count and hop budget K; set explicit traversal limits
PPR convergence issues	Stage 3 returns low-quality nodes	Increase iteration count for fixed-point computation; verify graph connectivity
Triple-check too strict	All answers fail verification, retry budget exhausted	Relax individual check thresholds; consider soft scoring instead of binary
Provenance map gaps	Map-back returns empty for some nodes	Audit extraction pipeline; ensure all entities have source chunk links
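
For the entity alignment row, a hybrid lexical-semantic matcher with an alias table might look like the sketch below. Several things are assumptions: `difflib.SequenceMatcher` stands in for a true edit-distance similarity, embeddings are plain lists of floats, and the weights, threshold, and function names are hypothetical.

```python
import math
from difflib import SequenceMatcher


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def align_entity(mention, mention_vec, nodes, aliases=None, w_lex=0.5, threshold=0.6):
    """Align a query mention to the best-scoring graph node.

    nodes:   {node_id: (label, embedding)}
    aliases: optional {lowercased alias: canonical label}
    Returns (node_id, score), or None if no node reaches the threshold.
    """
    aliases = aliases or {}
    canonical = aliases.get(mention.lower(), mention).lower()
    best = None
    for node_id, (label, vec) in nodes.items():
        lexical = SequenceMatcher(None, canonical, label.lower()).ratio()
        semantic = cosine(mention_vec, vec)
        score = w_lex * lexical + (1 - w_lex) * semantic
        if best is None or score > best[1]:
            best = (node_id, score)
    if best is None or best[1] < threshold:
        return None
    return best
```

Lowering `threshold` trades precision for recall, so it is worth logging rejected mentions before relaxing it.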

Limitations

Requires a pre-built knowledge graph. A2RAG assumes an existing KG with reasonable entity/relation extraction quality. If the KG is very sparse or noisy, the local and bridge stages degrade significantly and most queries fall through to the expensive global stage.
Not suited for single-document QA. The graph-based retrieval adds overhead that is only justified when the corpus has relational structure across multiple documents.
Triple-check adds latency for hard queries. While easy queries are faster, the worst-case path (3 retrieval stages x 3 retry iterations x 3 verification checks) is more expensive than a flat retrieval baseline. The savings come from the aggregate workload, not individual hard queries.
Entity alignment quality is a bottleneck. If the NER/entity linking step produces poor seed alignments, all three retrieval stages suffer. This is the most important component to get right.
PPR assumes graph connectivity. Disconnected graph components cannot be bridged. Ensure the KG construction pipeline produces a reasonably connected graph.

Reference

Paper: A2RAG: Adaptive Agentic Graph Retrieval for Cost-Aware and Reliable Reasoning (Liu et al., 2026). Look for: the three-stage escalation policy (Section 3.2), the triple-check verification formulation (Section 3.1), degree-normalized PPR with provenance map-back (Section 3.2.3), and the cost-vs-recall ablation tables (Section 4.3) showing that ~55% of queries resolve at Stage 1.
