| name | cli-hub-matrix-knowledge-research |
| description | Capability-based multi-tool matrix for research, note-taking, document authoring, and publishing. Agents compose providers (CLI-Anything harnesses, public CLIs, Python libs, native binaries, cloud APIs) per capability. |
Knowledge / Office / Research Matrix (S2 โ v2 capability-based)
Scenario S2. Strongest coverage in the current ecosystem: search, reference management, notes, synthesis, documents, and diagrams all have first-party harnesses; Python/native fallbacks fill the rest.
Schema: docs/cli-matrix/matrix_registry.schema.md. Matrix plan: docs/cli-matrix/cli-matrix-plan.md.
Install
cli-hub matrix install knowledge-research
cli-hub matrix info knowledge-research
cli-hub matrix preflight knowledge-research --json
Provider selection constraints
- Use preflight as an availability report, not as a provider selector.
- Treat provider order as documentation order only.
- Choose from user requirements, output quality bar, offline needs, credential state, install cost, and provider notes.
- Escalate to paid or metered APIs only when credentials are already present or the user explicitly consents.
Offline context? Filter to offline: true providers.
Preflight
Run cli-hub matrix preflight knowledge-research --json first. Use the manual block below for extra probes or older cli-hub versions.
cli-hub list --json
python - <<'PY'
import importlib.util
for m in ("pyzotero","bibtexparser","trafilatura","readability","httpx","playwright",
"python_docx","openpyxl","pptx","reportlab","pypdf","pdfplumber","weasyprint",
"langchain","llama_index","haystack"):
print(m, importlib.util.find_spec(m) is not None)
PY
for b in pandoc qpdf pdftk exiftool latexmk tex dot mermaid plantuml hugo mkdocs jekyll; do
command -v "$b" >/dev/null && echo "$b: yes" || echo "$b: no"
done
for e in OPENAI_API_KEY ANTHROPIC_API_KEY GOOGLE_API_KEY PERPLEXITY_API_KEY TAVILY_API_KEY \
SERPAPI_API_KEY BRAVE_API_KEY FIRECRAWL_API_KEY \
NOTION_API_KEY SEMANTIC_SCHOLAR_API_KEY WORDPRESS_TOKEN GHOST_ADMIN_KEY; do
[ -n "${!e}" ] && echo "$e: set" || echo "$e: unset"
done
Suggest-to-user template
To enable <capability> via <provider>, please set <ENV_VAR>.
Cost: <cost notes>
Quality: <quality tier>
Reply 'skip' to fall back to <next provider>.
Example: To enable live Notion sync via the Notion API, please set NOTION_API_KEY. Cost: free tier suffices for most users. Quality: sota (round-trips block/page IDs). Reply 'skip' to fall back to local Obsidian vault + manual paste.
Capabilities
research.search โ web / literature search
| Provider | Kind | Requires | Cost | Quality | Offline |
|---|
cli-anything-exa | harness-cli | EXA_API_KEY | metered | sota | no |
cli-anything-browser / safari / clibrowser | harness-cli | browser installed | free | high | no |
duckduckgo-search | python | pkg | free | good | no |
googlesearch-python | python | pkg | free | good | no |
| Perplexity API | api | PERPLEXITY_API_KEY | paid | sota | no |
| Tavily | api | TAVILY_API_KEY | metered | high | no |
| Brave Search | api | BRAVE_API_KEY | metered | good | no |
| Google Custom Search | api | GOOGLE_API_KEY | metered | high | no |
research.retrieve โ fetch article text / scrape pages
| Provider | Kind | Requires | Cost | Quality | Offline |
|---|
cli-anything-browser / safari / clibrowser | harness-cli | harness installed | free | high | no |
trafilatura | python | pkg | free | high | yes |
readability-lxml | python | pkg | free | good | yes |
httpx + manual parse | python | pkg | free | basic | yes |
playwright | python | pkg + browsers | free | sota | no |
| Firecrawl | api | FIRECRAWL_API_KEY | metered | sota | no |
| ScrapingBee | api | SCRAPINGBEE_API_KEY | paid | high | no |
research.reference โ citations, BibTeX, references
| Provider | Kind | Requires | Cost | Quality | Offline |
|---|
cli-anything-zotero | harness-cli | Zotero installed | free | sota | yes |
pyzotero | python | pkg + Zotero key | free | high | no |
habanero (Crossref) | python | pkg | free | high | no |
bibtexparser | python | pkg | free | good | yes |
| Crossref REST | api | none | free | high | no |
| Semantic Scholar | api | SEMANTIC_SCHOLAR_API_KEY (optional) | free | high | no |
knowledge.note โ note-taking / PKM
| Provider | Kind | Requires | Cost | Quality | Offline |
|---|
cli-anything-obsidian | harness-cli | Obsidian installed | free | sota | yes |
cli-anything-mubu | harness-cli | Mubu installed | free | high | no |
obsidian-cli | public-cli | bin + vault path | free | high | yes |
raw markdown + git | native | git | free | good | yes |
mdformat | python | pkg | free | good | yes |
| Notion API | api | NOTION_API_KEY | free | sota | no |
| Roam / Logseq sync | โ | โ | โ | โ | โ |
knowledge.outline โ outlining
| Provider | Kind | Requires | Cost | Quality | Offline |
|---|
cli-anything-mubu | harness-cli | Mubu installed | free | high | no |
OPML via lxml | python | pkg | free | good | yes |
YAML outlines + yq | native | binary | free | good | yes |
knowledge.synthesize โ multi-doc synthesis / summarization
| Provider | Kind | Requires | Cost | Quality | Offline |
|---|
cli-anything-notebooklm | harness-cli | harness installed | metered | sota | no |
cli-anything-intelwatch | harness-cli | harness installed | free-metered | high | partial |
langchain / llama_index + local LLM | python | pkg + ollama | free | high | yes |
haystack | python | pkg | free | good | partial |
| Claude / OpenAI / Gemini (long context) | api | model key | metered | sota | no |
document.author โ write docs (DOCX, XLSX, PPTX, PDF)
| Provider | Kind | Requires | Cost | Quality | Offline |
|---|
cli-anything-libreoffice | harness-cli | LibreOffice installed | free | high | yes |
cli-anything-anygen | harness-cli | harness installed | free | high | yes |
python-docx / openpyxl / python-pptx / reportlab | python | pkg | free | good | yes |
| Google Docs API | api | OAuth token | metered | sota | no |
| Microsoft Graph | api | Azure app + token | metered | sota | no |
document.format โ convert / reformat
| Provider | Kind | Requires | Cost | Quality | Offline |
|---|
pandoc | native | binary | free | sota | yes |
cli-anything-libreoffice (headless convert) | harness-cli | LibreOffice | free | high | yes |
docx2pdf | python | pkg + Word/LO | free | good | yes |
weasyprint | python | pkg | free | high | yes |
| CloudConvert | api | CLOUDCONVERT_API_KEY | metered | sota | no |
document.pdf โ manipulate PDFs (split, merge, OCR, extract)
| Provider | Kind | Requires | Cost | Quality | Offline |
|---|
qpdf / pdftk | native | binary | free | sota | yes |
pypdf | python | pkg | free | high | yes |
pdfplumber | python | pkg | free | high | yes |
pdf2image | python | pkg + poppler | free | good | yes |
ocrmypdf | native | binary + tesseract | free | high | yes |
| Adobe PDF Services | api | Adobe creds | paid | sota | no |
diagram.create โ diagrams, flowcharts, architecture
| Provider | Kind | Requires | Cost | Quality | Offline |
|---|
cli-anything-drawio | harness-cli | drawio installed | free | sota | yes |
cli-anything-mermaid | harness-cli | mermaid-cli | free | high | yes |
graphviz (dot) | native | binary | free | high | yes |
plantuml | native | binary | free | high | yes |
publish.web โ publish articles / blog posts
| Provider | Kind | Requires | Cost | Quality | Offline |
|---|
contentful | public-cli | bin + space token | metered | high | no |
sanity | public-cli | bin + project token | metered | high | no |
hugo / mkdocs / jekyll / pelican | native/python | binary/pkg | free | high | yes |
| WordPress REST | api | app password | free | high | no |
| Ghost Admin | api | GHOST_ADMIN_KEY | free | high | no |
| Medium (legacy) | api | token | free | good | no |
publish.latex โ build PDFs from LaTeX
| Provider | Kind | Requires | Cost | Quality | Offline |
|---|
latexmk + texlive | native | binaries | free | sota | yes |
pylatex | python | pkg + texlive | free | high | yes |
| Overleaf API | api | OVERLEAF_API_KEY | paid | sota | no |
| Tectonic | native | binary | free | high | yes |
Recipes
-
literature-review โ survey a topic, extract key papers, synthesize.
Uses: research.search, research.retrieve, research.reference, knowledge.synthesize, document.author.
-
meeting-to-doc โ turn a recorded meeting into a structured document.
Uses: text.transcribe (via S1 matrix), knowledge.synthesize, document.author, document.format.
-
blog-post โ research โ write โ publish.
Uses: research.search, research.retrieve, knowledge.synthesize, document.author, publish.web.
-
paper-draft โ research + citations + LaTeX.
Uses: research.reference, knowledge.synthesize, document.author, publish.latex.
-
competitive-brief โ fetch competitor pages, extract positioning, produce a briefing deck.
Uses: research.retrieve, knowledge.synthesize, document.author (PPTX).
-
pdf-book-digest โ ingest a PDF book, OCR if needed, produce chapter summaries + quote file.
Uses: document.pdf, knowledge.synthesize, knowledge.note.
Known gaps
- Notion โ Notion API exists as an escalation, but there is no harness for local-first Notion workflows. Recurs in S11 (team comms).
- Google Docs / Microsoft 365 live editing โ API path only.
- LaTeX harness โ
latexmk + Tectonic cover the native path, but no agent-native wrapper yet.
- Translation โ no first-party CLI; Python fallback and APIs only.
Agent guidance
- Run preflight once per session. Many capabilities have strong Python or native paths that bypass the need to install anything.
- Prefer local for sensitive docs (legal, health, internal): use Obsidian + local LLM +
whisper rather than cloud APIs.
- Citations are load-bearing โ for academic work always route through
research.reference so bibliographies stay consistent.
- Chain
document.format at the end, not the middle: author in the richest source format (markdown or LaTeX) and convert last.
- Workspace discipline โ one directory per project; keep
refs.bib, source markdown, figures, and output PDFs colocated.