| name | document-analysis |
| description | Read and analyze documents — PDF, DOCX, Markdown, HTML, CSV, XLSX, JSON, YAML. Provides read_document tool with no output truncation and page-range support for PDFs. Use when the user shares a document or asks to explain, summarize, or extract information from files. |
| metadata | {"author":"agenticops","version":"1.0","domain":"operations"} |
| tools | ["agenticops.tools.file_tools.read_document"] |
Document Analysis Skill
Overview
When this skill is activated, the read_document tool is dynamically registered on the agent:
| Tool | Purpose | Key Args |
|---|---|---|
| read_document | Read full document content (no truncation) | path, pages |
Unlike read_local_file, which truncates at 4K characters for operational safety, read_document
returns the complete content so that you can read and explain the whole document.
Supported Formats
| Format | Library | Notes |
|---|---|---|
| PDF | pymupdf or pypdf | Page-range support (pages="1-5") |
| DOCX | python-docx | Full paragraph extraction |
| Markdown | built-in | Full content, no truncation |
| HTML | built-in | Full content, no truncation |
| CSV | built-in | Full content, no truncation |
| JSON/YAML | built-in | Full content, no truncation |
| XLSX | openpyxl | Multi-sheet, all rows |
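The table above can be read as a lookup from file extension to format. The following is a minimal sketch of that idea, not the actual dispatch inside read_document (which this skill does not show). The names `FORMAT_BY_SUFFIX` and `detect_format` are hypothetical, and the extension list (for example `.htm`, `.yml`, `.markdown`) is an assumption.

```python
from pathlib import Path
from typing import Optional

# Hypothetical lookup built from the Supported Formats table.
# The real read_document tool may detect formats differently.
FORMAT_BY_SUFFIX = {
    ".pdf": "PDF",
    ".docx": "DOCX",
    ".md": "Markdown",
    ".markdown": "Markdown",
    ".html": "HTML",
    ".htm": "HTML",
    ".csv": "CSV",
    ".json": "JSON",
    ".yaml": "YAML",
    ".yml": "YAML",
    ".xlsx": "XLSX",
}


def detect_format(path: str) -> Optional[str]:
    """Return the format label for a path, or None if it is not in the table."""
    # Compare case-insensitively so "REPORT.PDF" matches ".pdf".
    return FORMAT_BY_SUFFIX.get(Path(path).suffix.lower())
```

A `None` result is a reasonable cue to tell the user the format is not listed as supported before attempting a read.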
Quick Decision Trees
User Shares a Document
User provides a document (via @path or upload)
|
+-- Already injected as context?
|   +-- Yes (attached via @path or web upload) → analyze directly
|   +-- No (user mentions a file path) → read_document(path="...")
|
+-- Document too large?
|   +-- PDF: use pages="1-5" to read in chunks
|   +-- Other: summarize what you can, note truncation
|
+-- What does the user want?
    +-- "Explain this" → structured summary
    +-- "Summarize" → executive summary (key points + conclusions)
    +-- "Find X in this" → targeted extraction
    +-- "Compare with Y" → side-by-side analysis
Analyzing PDF Reports
PDF document received
|
+-- Large (>10 pages)?
|   +-- Start with read_document(path, pages="1-3") for overview
|   +-- Then read specific sections as needed
|
+-- What type of document?
|   +-- Architecture/design doc → focus on components, data flow, decisions
|   +-- Incident report → focus on timeline, root cause, remediation
|   +-- Compliance/audit → focus on findings, risk level, recommendations
|   +-- Cost report → focus on top spenders, trends, anomalies
|   +-- Runbook/SOP → focus on steps, prerequisites, rollback
|
+-- Output format?
    +-- Brief: 3-5 bullet points
    +-- Detailed: section-by-section breakdown
    +-- Actionable: extract TODOs and next steps
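Reading a large PDF in chunks means producing a sequence of `pages` values. The sketch below generates them, assuming the total page count is already known (for example from an earlier read) and that `pages` accepts the "1-5" and "5" forms shown in this skill. The helper name `page_ranges` is hypothetical.

```python
from typing import Iterator


def page_ranges(total_pages: int, chunk_size: int = 5) -> Iterator[str]:
    """Yield 1-based page-range strings such as "1-5", "6-10", "11"."""
    if total_pages < 0 or chunk_size < 1:
        raise ValueError("total_pages must be >= 0 and chunk_size >= 1")
    for start in range(1, total_pages + 1, chunk_size):
        # Clamp the final chunk to the last page of the document.
        end = min(start + chunk_size - 1, total_pages)
        # A single-page chunk uses the bare page number form.
        yield str(start) if start == end else f"{start}-{end}"
```

Each yielded string could then be passed as the `pages` argument, one call per chunk, stopping early once the needed section has been found.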
Analyzing Spreadsheets
CSV or XLSX received
|
+-- Understand structure first
|   +-- Column headers, row count, data types
|
+-- What does the user want?
    +-- "What's in this?" → schema + sample rows + summary stats
    +-- "Find anomalies" → look for outliers, missing data, spikes
    +-- "Trends" → time-series patterns if date column exists
    +-- "Top N" → sort/rank by a metric column
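The "schema + sample rows + summary stats" answer can be illustrated with the standard library alone. This is a sketch that assumes the CSV content is already available as a string (for instance, text returned by read_document). The function name `profile_csv` and the shape of its result are hypothetical.

```python
import csv
import io
import statistics


def profile_csv(text: str, sample_size: int = 3) -> dict:
    """Summarize CSV text: headers, row count, sample rows, numeric stats."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    headers = reader.fieldnames or []

    numeric_stats = {}
    for col in headers:
        values = []
        for row in rows:
            try:
                values.append(float(row[col]))
            except (TypeError, ValueError):
                pass  # missing or non-numeric cell
        # Treat a column as numeric only when every row parsed as a number.
        if rows and len(values) == len(rows):
            numeric_stats[col] = {
                "min": min(values),
                "max": max(values),
                "mean": statistics.mean(values),
            }

    return {
        "headers": headers,
        "row_count": len(rows),
        "sample": rows[:sample_size],
        "numeric_stats": numeric_stats,
    }
```

Columns with any missing or non-numeric cell are left out of `numeric_stats`, which also makes them easy to flag when the user asks about missing data.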
Analysis Workflow
Step 1: Read the Document
read_document(path="/path/to/report.pdf")
# or with page range for large PDFs:
read_document(path="/path/to/report.pdf", pages="1-5")
Step 2: Identify Structure
- Document type (report, spec, spreadsheet, log dump)
- Key sections / headings
- Tables, figures, or data present
Step 3: Analyze Based on User Intent
- Explain: Walk through each section, clarify technical terms
- Summarize: Extract key findings, conclusions, action items
- Extract: Pull specific data points the user asked about
- Compare: Side-by-side with another document or known state
Step 4: Present Findings
- Lead with the answer, not the process
- Use structured format (headers, bullets, tables)
- Quote specific passages when relevant
- Note any limitations (scanned PDF with no text, truncated content)
Tool Reference Quick Card
| Example | Description |
|---|---|
| read_document(path="report.pdf") | Read entire PDF |
| read_document(path="report.pdf", pages="1-3") | Read pages 1-3 only |
| read_document(path="report.pdf", pages="5") | Read page 5 only |
| read_document(path="spec.docx") | Read Word document |
| read_document(path="data.csv") | Read CSV file |
| read_document(path="metrics.xlsx") | Read Excel workbook |
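To make the two `pages` forms in the card concrete, here is one way such a value might be interpreted. It assumes pages are 1-based and ranges are inclusive, which matches the descriptions above but is not confirmed by the tool's source. `parse_pages` is a hypothetical helper, and the real tool may accept other forms.

```python
from typing import Tuple


def parse_pages(spec: str) -> Tuple[int, int]:
    """Parse "1-3" or "5" into an inclusive, 1-based (first, last) pair."""
    first, sep, last = spec.strip().partition("-")
    start = int(first)
    # A bare number such as "5" means a single page.
    end = int(last) if sep else start
    if start < 1 or end < start:
        raise ValueError(f"invalid page range: {spec!r}")
    return start, end
```

Validating the value before calling the tool gives a clearer error message than a failed read on a malformed range.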
Edge Cases
- Scanned PDFs (image-only): text extraction returns no text; inform the user that OCR is needed
- Password-protected PDFs: the read will fail; ask the user to provide an unprotected copy
- Very large spreadsheets: output is truncated to 6000 characters; suggest filtering or specifying columns of interest
- Mixed-content PDFs (text + images): only text is extracted; note that charts and diagrams are not visible
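The scanned-PDF and mixed-content cases can be turned into caveats for the user with a small check. This sketch assumes the extracted text is available per page as a list of strings. The return shape of read_document is not specified here, so that input format and the name `extraction_notes` are both hypothetical.

```python
from typing import List


def extraction_notes(page_texts: List[str]) -> List[str]:
    """Return user-facing caveats for a list of per-page extracted text."""
    notes = []
    # 1-based numbers of pages whose extracted text is empty or whitespace.
    empty = [i for i, text in enumerate(page_texts, start=1) if not text.strip()]
    if page_texts and len(empty) == len(page_texts):
        notes.append("No text could be extracted; the PDF may be scanned and need OCR.")
    elif empty:
        pages = ", ".join(str(i) for i in empty)
        notes.append(f"Pages with no extractable text (possibly images): {pages}.")
    return notes
```

Any returned notes belong alongside the findings, in line with the guidance to state limitations when presenting results.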