research-docs
Parse documents with LiteParse and answer questions with visual citations. Generates an HTML report with the answer, source page images, and bounding box highlights on cited text.
| name | research-docs |
| description | Parse documents with LiteParse and answer questions with visual citations. Generates an HTML report with the answer, source page images, and bounding box highlights on cited text. |
| argument-hint | [data_directory] [question] |
| disable-model-invocation | true |
| allowed-tools | Bash(python *) |
| compatibility | Requires Python 3.9+, `pip install liteparse`, Node 18+, `npm i -g @llamaindex/liteparse` |
Parse documents with LiteParse, answer a question using the parsed text, and generate an HTML report with source citations highlighted on page images.
`$ARGUMENTS` should contain:

- Data directory (`$0`): Path to the data directory containing documents
- Question: The question to answer about those documents

If either is missing, ask the user to provide them.
IMPORTANT: Always use the bundled Python script below for parsing. Do NOT call `lit` or `liteparse` CLI commands directly; use only `generate_report.py`.
Run the bundled parse script to extract text and bounding boxes from all supported files:
```bash
python "${CLAUDE_SKILL_DIR}/scripts/generate_report.py" \
  --skill-dir "${CLAUDE_SKILL_DIR}" \
  --dir "$0" \
  --parse-only \
  --output /tmp/research_docs_parsed.json
```
This discovers and parses all supported files in the directory. The output is a JSON file with parsed text and bounding box coordinates for each page.
If the directory has more than 50 files and the user's question targets a specific document not in the first 50, re-run with a narrower `--dir` pointing to a subdirectory, or ask the user which files to focus on.
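To decide whether a narrower `--dir` is needed, it can help to count files per subdirectory first. The following is a minimal sketch, not part of the bundled script: `count_files_by_subdir` and `needs_narrowing` are hypothetical helpers, and the sketch assumes every regular file counts toward the limit of 50, which may differ from what `generate_report.py` treats as a supported file.

```python
from collections import Counter
from pathlib import Path

FILE_LIMIT = 50  # threshold mentioned in the text above


def count_files_by_subdir(data_dir):
    """Count regular files under data_dir, grouped by top-level subdirectory.

    Assumption: all regular files are counted, regardless of extension.
    """
    root = Path(data_dir)
    counts = Counter()
    for path in root.rglob("*"):
        if path.is_file():
            rel = path.relative_to(root)
            # Files sitting directly in data_dir are grouped under ".".
            key = rel.parts[0] if len(rel.parts) > 1 else "."
            counts[key] += 1
    return counts


def needs_narrowing(counts, limit=FILE_LIMIT):
    """Return True when the total file count exceeds the limit."""
    return sum(counts.values()) > limit
```

If `needs_narrowing` returns True, the per-subdirectory counts suggest which subdirectory to pass as `--dir`, or which options to offer the user.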
Read `/tmp/research_docs_parsed.json` using the Read tool. Focus on:

- `name` and `type`
- `text` field (skip raw `textItems`; those are for bounding box rendering)
- `text` field
- `summary` object for total counts

Build a mental model of all document content before answering.
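For orientation, the fields listed above could be walked roughly as follows. This is only a sketch under stated assumptions: the field names `name`, `type`, `text`, `textItems`, and `pageNum` come from this document, but the nesting (a top-level `documents` list and a per-document `pages` list) is a guess at the layout, and `iter_page_text` and `load_parsed` are hypothetical helpers. Check the actual JSON before relying on it.

```python
import json


def load_parsed(path="/tmp/research_docs_parsed.json"):
    """Load the parse-only output written by the bundled script."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def iter_page_text(parsed):
    """Yield (name, type, page_number, text) tuples, ignoring textItems.

    Assumed layout: parsed["documents"] is a list; paginated documents
    carry a "pages" list, other documents carry a top-level "text".
    """
    for doc in parsed.get("documents", []):  # "documents" key is an assumption
        name, doc_type = doc.get("name"), doc.get("type")
        pages = doc.get("pages")  # "pages" key is an assumption
        if pages:
            for page in pages:
                yield name, doc_type, page.get("pageNum"), page.get("text", "")
        else:
            # Documents without pages are reported with page number 0.
            yield name, doc_type, 0, doc.get("text", "")
```

Skipping `textItems` keeps the context small, since those entries only matter for bounding box rendering.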
Using the parsed text as context, answer the user's question. Write your response as a JSON file:
```bash
cat > /tmp/research_docs_answer.json << 'ANSWER_EOF'
{
  "question": "<the user's question>",
  "answer": "<your answer in markdown with [N] citation markers>",
  "citations": [
    {
      "file": "<filename e.g. report.pdf>",
      "page": <1-indexed page number>,
      "quote": "<exact verbatim substring from the parsed text>",
      "relevance": "<explanation of what this value/quote means and how it supports the answer>"
    }
  ]
}
ANSWER_EOF
```
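Before generating the report, the answer file can be sanity-checked against the rules in this section. The sketch below is illustrative and not part of the bundled script: `check_answer` is a hypothetical helper, and `page_text` is an optional mapping from `(file, page)` to parsed page text that the caller has to build from the parsed JSON. It checks that every `[N]` marker points at an existing citation, that every citation is referenced, that quotes stay under 40 characters, and, when `page_text` is supplied, that each quote appears verbatim.

```python
import re

MAX_QUOTE_CHARS = 40  # quotes are meant to stay under this length


def check_answer(answer, page_text=None):
    """Return a list of human-readable problems found in an answer dict."""
    problems = []
    citations = answer.get("citations", [])
    # Collect the numbers used in [N] markers inside the answer text.
    markers = {int(n) for n in re.findall(r"\[(\d+)\]", answer.get("answer", ""))}

    for n in sorted(markers):
        if not 1 <= n <= len(citations):
            problems.append(f"marker [{n}] has no matching citation")

    for i, citation in enumerate(citations, start=1):
        if i not in markers:
            problems.append(f"citation {i} is never referenced")
        quote = citation.get("quote", "")
        if len(quote) >= MAX_QUOTE_CHARS:
            problems.append(f"citation {i} quote is {len(quote)} characters")
        if page_text is not None:
            text = page_text.get((citation.get("file"), citation.get("page")), "")
            # Exact substring match, mirroring the exact string matching
            # used for bounding box lookup.
            if quote not in text:
                problems.append(f"citation {i} quote not found verbatim")
    return problems
```

An empty list means none of these checks failed; it does not guarantee that the report script will find a bounding box for every quote.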
Critical rules for the answer:
- Use `[1]`, `[2]`, etc. in your answer text, corresponding to the 1-indexed position in the `citations` array
- Example: "Reserve Bank credit totaled **$6,613,609 million** [1], with securities held outright at $6,375,679 million [2]."

Critical rules for what to cite:

- Cite the specific value, such as `1,200,000` from the text, not just the heading "Revenue". You can cite both the value and the label if they're on the same page.
- The `relevance` field should explain the "so what": do not just restate the label, but explain what this value means in context and how it supports your answer. E.g., instead of "Total revenue figure" write "Total revenue for Q3 2025, representing a 12% year-over-year increase that supports the growth trend discussed above."

Critical rules for quote format:

- `quote` MUST be copied character-for-character from the parsed text. It is used for bounding box lookup via exact string matching. Do NOT paraphrase, reword, clean up, or fix typos.
- Prefer a single value like `$2,769.23` or a short contiguous phrase like `Securities held outright` (under 40 characters). Shorter quotes match bounding boxes much more reliably.
- In tables, quote a single cell (e.g., `$16,615.38`), NOT a run of values spanning multiple columns (e.g., `$2,769.23 80.00 $2,769.23 $16,615.38`)
- `page` is 1-indexed (matches LiteParse `pageNum`)
- `file` is just the filename (not the full path)
- For documents that have no pages, set `page` to 0

Run the bundled script in generate mode to produce the visual report:
```bash
python "${CLAUDE_SKILL_DIR}/scripts/generate_report.py" \
  --skill-dir "${CLAUDE_SKILL_DIR}" \
  --dir "$0" \
  --answer-file /tmp/research_docs_answer.json \
  --output research_docs_output/
```
This will:
Tell the user: