document-grounding
Convert a raw document into a structured grounding note for downstream research and summarization.
| name | document-grounding |
|---|---|
| description | Convert a raw document into a structured grounding note for downstream research and summarization. |
Convert a raw document into a structured grounding note.
This skill is for document grounding, not a narrative recap. It should produce a stable intermediate note that is easy to read and easy for downstream skills to use.
Use this skill when:
Do not use this skill when:
A single document file.
Supported first-stage formats:
- .pdf
- .docx
- .md
- .txt

The document may contain:
For each input document, create one bundle directory:
data/grounded_notes/<type>-<doc_id>_<timestamp>/
where <type> is the file extension (e.g. pdf, docx, md, txt), <doc_id> is the sanitized filename without the extension, and <timestamp> is the execution time in Beijing time (UTC+8), formatted as YYYYMMDDHHMMSS.
For example:
data/grounded_notes/pdf-paper_name_20260410153022/
data/grounded_notes/docx-notes_001_20260410153100/
data/grounded_notes/md-project_readme_20260410153215/
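A minimal Python sketch of this naming rule, assuming the execution time is supplied as a timezone-aware datetime. It is not taken from ground_document.py: the function names (sanitize_doc_id, make_ground_id, bundle_dir) are hypothetical, and so is the sanitization rule (each run of characters outside A-Z, a-z, 0-9, _ and - becomes one underscore), because the text above only says the filename is sanitized.

```python
import re
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Optional

# Beijing time is UTC+8 and has no daylight saving time, so a fixed offset is enough.
BEIJING = timezone(timedelta(hours=8), "Asia/Shanghai")
SUPPORTED_TYPES = {"pdf", "docx", "md", "txt"}


def sanitize_doc_id(stem: str) -> str:
    # ASSUMED rule: replace each run of characters outside [A-Za-z0-9_-] with "_".
    cleaned = re.sub(r"[^A-Za-z0-9_-]+", "_", stem).strip("_")
    return cleaned or "document"  # hypothetical fallback for names with no usable characters


def make_ground_id(input_path: str, now: Optional[datetime] = None) -> str:
    path = Path(input_path)
    doc_type = path.suffix.lstrip(".").lower()  # file extension without the dot
    if doc_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported format: {path.suffix!r}")
    # Convert the (aware) execution time to Beijing time before formatting it.
    moment = (now or datetime.now(timezone.utc)).astimezone(BEIJING)
    return f"{doc_type}-{sanitize_doc_id(path.stem)}_{moment:%Y%m%d%H%M%S}"


def bundle_dir(input_path: str, now: Optional[datetime] = None,
               root: str = "data/grounded_notes") -> Path:
    # The bundle directory name and the ground ID are the same string.
    return Path(root) / make_ground_id(input_path, now)


if __name__ == "__main__":
    when = datetime(2026, 4, 10, 7, 30, 22, tzinfo=timezone.utc)
    print(make_ground_id("paper name.pdf", when))  # pdf-paper_name_20260410153022
```

A fixed UTC+8 offset avoids depending on installed time-zone data; that shortcut is only safe because Beijing time never shifts for daylight saving.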
Inside that bundle, the expected outputs are:
<bundle_dir>/
├─ ground_id.txt # Ground ID for this unit (reused by all downstream stages)
├─ extracted.md
├─ extracted_meta.json
├─ asset_index.json
├─ grounded.md
└─ assets/
├─ tables/
├─ figures/
└─ formulas/
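The skeleton of this layout can be created as below. This is a sketch under two assumptions the source does not state: ground_id.txt holds the bundle directory name followed by a newline, and the helper names init_bundle and missing_outputs are hypothetical. The remaining files are produced later by the extraction and grounding steps.

```python
from pathlib import Path

ASSET_SUBDIRS = ("tables", "figures", "formulas")
EXPECTED_FILES = ("ground_id.txt", "extracted.md", "extracted_meta.json",
                  "asset_index.json", "grounded.md")


def init_bundle(bundle_dir: Path) -> str:
    """Create assets/tables, assets/figures, assets/formulas and write ground_id.txt."""
    for sub in ASSET_SUBDIRS:
        (bundle_dir / "assets" / sub).mkdir(parents=True, exist_ok=True)
    # ASSUMPTION: the ground ID is simply the bundle directory name.
    ground_id = bundle_dir.name
    (bundle_dir / "ground_id.txt").write_text(ground_id + "\n", encoding="utf-8")
    return ground_id


def missing_outputs(bundle_dir: Path) -> list:
    """List the expected top-level files that do not exist yet, in layout order."""
    return [name for name in EXPECTED_FILES if not (bundle_dir / name).is_file()]
```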
The <ground_id> (the bundle directory name, e.g. pdf-paper_name_20260410153022) is the single stable identifier for the entire pipeline: all downstream directories (lit_inputs, lit_results, report_inputs, review_outputs, reports, final_outputs) reuse this same <ground_id>.
ground_document.py is responsible for building the extraction bundle:
- extracted.md
- extracted_meta.json
- asset_index.json
- assets/...
- grounded.md

grounded.md must be a real grounding note.
It must not remain a placeholder scaffold.
1. Run the extraction through the run.sh entrypoint.
2. Read extracted.md.
3. Read extracted_meta.json.
4. Read asset_index.json.
5. If extracted.md contains AssetRef blocks, inspect the referenced files in assets/... before writing grounded.md.
6. Write grounded.md as a real structured grounding note.

extracted.md may contain blocks such as:
[AssetRef]
type: figure
id: figure_001
path: assets/figures/figure_001.png
instruction: Inspect this asset before writing grounded.md if it is relevant to the document's claims, comparisons, or conclusions.
[/AssetRef]
and
[AssetRef]
type: table
id: table_001
path_md: assets/tables/table_001.md
path_csv: assets/tables/table_001.csv
instruction: Inspect this table before writing grounded.md if it contains key evidence, comparisons, or numerical results.
[/AssetRef]
These are not decorative markers.
They indicate that important evidence may exist outside the plain extracted text.
If an AssetRef appears relevant, the agent must inspect the referenced asset before final grounding.
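Because AssetRef blocks use a simple key: value layout, collecting the files to inspect can be sketched with a regular expression. The field names come from the two examples above; treating every field as a single line and splitting on the first colon are assumptions, and parse_asset_refs and asset_paths are hypothetical helpers. The parser only lists candidates; deciding whether an asset is relevant remains the agent's job.

```python
import re

# Matches one [AssetRef] ... [/AssetRef] block, including across newlines.
ASSET_REF_BLOCK = re.compile(r"\[AssetRef\](.*?)\[/AssetRef\]", re.DOTALL)
PATH_KEYS = ("path", "path_md", "path_csv")


def parse_asset_refs(extracted_md: str) -> list:
    """Return one dict of fields per AssetRef block, in document order."""
    refs = []
    for block in ASSET_REF_BLOCK.finditer(extracted_md):
        fields = {}
        for line in block.group(1).splitlines():
            # ASSUMPTION: one "key: value" pair per line; split on the first colon only.
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        refs.append(fields)
    return refs


def asset_paths(ref: dict) -> list:
    """Bundle-relative files to inspect for one reference (tables list both .md and .csv)."""
    return [ref[key] for key in PATH_KEYS if key in ref]
```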
Return markdown with exactly these sections.
# Document Grounding
## 1. Main Topic / Purpose
[2–4 sentence statement of the document’s main topic and purpose.]
## 2. Main Points
- [One bullet per major sub-topic, contribution, or argument]
## 3. Key Findings / Claims
- [Only items explicitly stated or strongly supported by the document]
## 4. Constraints / Risks
- [Only constraints, limitations, caveats, or risks explicitly stated or strongly supported by the document]
## 4a. Important Non-Textual Elements
- [Key tables, figures, diagrams, formulas, or code blocks that materially affect interpretation]
- [Mention how they affect the reading of the document if relevant]
## 5. Unresolved Issues
- [Preserve uncertainty, unanswered questions, incomplete evidence, or limitations that remain open]
## 6. Suggested Next Steps
- [Concrete follow-up directions grounded in the document]
- [Do not invent owners, deadlines, or commitments]
## 7. Search Keywords
### Problem Keywords
- ...
### Method / Solution Keywords
- ...
### Domain / Constraint Keywords
- ...
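
To check that a finished note keeps exactly this structure, its heading lines can be compared with the template in order. This is a minimal sketch with a hypothetical function name; it treats any line starting with # as a heading, so a # line inside a fenced code block in the note would be reported as an unexpected heading.

```python
EXPECTED_HEADINGS = [
    "# Document Grounding",
    "## 1. Main Topic / Purpose",
    "## 2. Main Points",
    "## 3. Key Findings / Claims",
    "## 4. Constraints / Risks",
    "## 4a. Important Non-Textual Elements",
    "## 5. Unresolved Issues",
    "## 6. Suggested Next Steps",
    "## 7. Search Keywords",
    "### Problem Keywords",
    "### Method / Solution Keywords",
    "### Domain / Constraint Keywords",
]


def heading_problems(markdown: str) -> list:
    """Return an empty list if the headings match the template exactly and in order."""
    # Simplification: every line starting with "#" counts as a heading (code fences are ignored).
    found = [line.rstrip() for line in markdown.splitlines() if line.startswith("#")]
    if found == EXPECTED_HEADINGS:
        return []
    problems = [f"missing heading: {h}" for h in EXPECTED_HEADINGS if h not in found]
    problems += [f"unexpected heading: {h}" for h in found if h not in EXPECTED_HEADINGS]
    return problems or ["headings are out of order or duplicated"]
```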
In Key Findings / Claims, only include an item if it is explicitly stated or strongly supported by the document. If it is merely hinted, proposed, or speculative, do not present it as a confirmed finding.
In Constraints / Risks, only include constraints, limitations, or risks that are explicitly stated or strongly evidenced. Do not infer hidden constraints from weak hints.
In Important Non-Textual Elements, only mention tables, figures, formulas, diagrams, or code blocks that materially affect interpretation.
Do not list every asset mechanically.
If an asset was referenced in extracted.md through an AssetRef block and appears relevant, inspect it before deciding whether to include it.
In Suggested Next Steps, only include follow-up actions or directions that are explicitly discussed or strongly implied by the document. Do not invent action owners, deadlines, or commitments.
In Search Keywords, use specific noun phrases that will be useful for later search. Avoid generic terms such as:
Unclear — multiple loosely related topics appear in the document.

The task is not complete if grounded.md is missing, empty, or still contains placeholder scaffold text.
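The completion rule can be approximated in Python. What counts as placeholder scaffold text is not defined here, so this sketch assumes it looks like the section template: a whole line in square brackets (optionally bulleted) or a bare ... bullet. Passing the check does not make the note a good grounding note; it only rules out the three failure cases named above. incomplete_reason is a hypothetical helper.

```python
import re
from pathlib import Path
from typing import Optional

# ASSUMED placeholder patterns, modeled on the section template: a whole line in
# square brackets (optionally after "- "), or a bare "..." / "- ..." line.
PLACEHOLDER_LINE = re.compile(r"^\s*(?:-\s*)?(?:\[[^\]]*\]|\.\.\.)\s*$")


def incomplete_reason(bundle_dir: Path) -> Optional[str]:
    """Return why grounded.md is not finished, or None if it passes these checks."""
    note = bundle_dir / "grounded.md"
    if not note.is_file():
        return "grounded.md is missing"
    text = note.read_text(encoding="utf-8")
    if not text.strip():
        return "grounded.md is empty"
    for lineno, line in enumerate(text.splitlines(), start=1):
        if PLACEHOLDER_LINE.match(line):
            return f"placeholder text on line {lineno}: {line.strip()}"
    return None
```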
The agent must overwrite or newly create grounded.md with a real grounding note based on:
- extracted.md
- extracted_meta.json
- asset_index.json
- assets/...

/document-grounding
When running the extraction script, ensure the Docling model path is set:
export DOCLING_ARTIFACTS_PATH="${PROJECT_ROOT}/models/docling"
Supported model path locations (checked in order):
1. ${PROJECT_ROOT}/models/docling
2. ${PROJECT_ROOT}/models/docling-project/docling-models
3. /root/.cache/docling/models
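
The lookup order can be sketched as follows. The function names are hypothetical, the project root is passed in explicitly because the source does not say how run.sh determines PROJECT_ROOT, and this sketch always overwrites any DOCLING_ARTIFACTS_PATH already in the environment, which the real script may not do.

```python
import os
from pathlib import Path
from typing import Optional


def docling_candidates(project_root: str) -> list:
    """The three supported locations, in the order they are checked."""
    root = Path(project_root)
    return [
        root / "models" / "docling",
        root / "models" / "docling-project" / "docling-models",
        Path("/root/.cache/docling/models"),
    ]


def export_docling_artifacts_path(project_root: str) -> Optional[Path]:
    """Set DOCLING_ARTIFACTS_PATH to the first existing candidate; return it, or None."""
    for candidate in docling_candidates(project_root):
        if candidate.is_dir():
            # ASSUMPTION: overwrite unconditionally, even if the variable is already set.
            os.environ["DOCLING_ARTIFACTS_PATH"] = str(candidate)
            return candidate
    return None
```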