| name | tavily-extract |
| description | Extract clean markdown or text content from specific URLs via the Tavily CLI. Use this skill when the user has one or more URLs and wants their content, says "extract", "grab the content from", "pull the text from", "get the page at", "read this webpage", or needs clean text from web pages. Handles JavaScript-rendered pages, returns LLM-optimized markdown, and supports query-focused chunking for targeted extraction. Can process up to 20 URLs in a single call.
|
| allowed-tools | Bash(tvly *) |
tavily extract
Extract clean markdown or text content from one or more URLs.
Before running any command
If tvly is not found on PATH, install it first:
curl -fsSL https://cli.tavily.com/install.sh | bash && tvly login
Do not skip this step or fall back to other tools.
See tavily-cli for alternative install methods and auth options.
When to use
- You have a specific URL and want its content
- You need text from JavaScript-rendered pages
- Step 2 in the workflow: search → extract → map → crawl → research
Quick start
tvly extract "https://example.com/article" --json
tvly extract "https://example.com/page1" "https://example.com/page2" --json
tvly extract "https://example.com/docs" --query "authentication API" --chunks-per-source 3 --json
tvly extract "https://app.example.com" --extract-depth advanced --json
tvly extract "https://example.com/article" -o article.md
Options
| Option | Description |
|---|
--query | Rerank chunks by relevance to this query |
--chunks-per-source | Chunks per URL (1-5, requires --query) |
--extract-depth | basic (default) or advanced (for JS pages) |
--format | markdown (default) or text |
--include-images | Include image URLs |
--timeout | Max wait time (1-60 seconds) |
-o, --output | Save output to file |
--json | Structured JSON output |
Extract depth
| Depth | When to use |
|---|
basic | Simple pages, fast — try this first |
advanced | JS-rendered SPAs, dynamic content, tables |
Tips
- Max 20 URLs per request — batch larger lists into multiple calls.
- Use
--query + --chunks-per-source to get only relevant content instead of full pages.
- Try
basic first, fall back to advanced if content is missing.
- Set
--timeout for slow pages (up to 60s).
- If search results already contain the content you need (via
--include-raw-content), skip the extract step.
See also