deep-crawl
// Recursively crawl websites using headless Chrome. Triggers: crawl, scrape website, 爬取, crawl site, deep crawl, website content.
| name | deep-crawl |
| description | Recursively crawl websites using headless Chrome. Triggers: crawl, scrape website, 爬取, crawl site, deep crawl, website content. |
The deep_crawl tool recursively crawls a website using a headless Chrome browser via the Chrome DevTools Protocol (CDP). It renders JavaScript, follows same-origin links via BFS, extracts text content from each page, and saves results to disk. This is ideal for crawling JS-rendered SPAs, documentation sites, and any site that requires a full browser environment.
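The breadth-first, same-origin traversal described above can be sketched roughly as follows. This is an illustration only, not the tool's actual implementation: `bfs_crawl`, `same_origin`, and the `get_links` callback (which stands in for the headless browser) are hypothetical names, and treating the seed as depth 0 with `max_depth` as an inclusive limit is an assumption.

```python
from collections import deque
from urllib.parse import urljoin, urlsplit


def same_origin(a: str, b: str) -> bool:
    """True when both URLs share scheme, host, and port."""
    pa, pb = urlsplit(a), urlsplit(b)
    return (pa.scheme, pa.hostname, pa.port) == (pb.scheme, pb.hostname, pb.port)


def bfs_crawl(seed, get_links, max_depth=3, max_pages=50):
    """Visit pages breadth-first; get_links(url) returns the hrefs found on a page."""
    visited = []                 # (url, depth) in crawl order
    seen = {seed}                # URLs already queued, to avoid revisits
    queue = deque([(seed, 0)])
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append((url, depth))
        if depth >= max_depth:   # assumed: links on the deepest level are not followed
            continue
        for href in get_links(url):
            target = urljoin(url, href).split("#", 1)[0]  # resolve and drop fragment
            if target not in seen and same_origin(seed, target):
                seen.add(target)
                queue.append((target, depth + 1))
    return visited


# Toy in-memory link graph used instead of a real browser.
SITE = {
    "https://docs.example.com/guide/": ["intro", "https://other.example.org/x", "/guide/setup"],
    "https://docs.example.com/guide/intro": ["/guide/setup", "/guide/"],
    "https://docs.example.com/guide/setup": ["/guide/advanced"],
    "https://docs.example.com/guide/advanced": [],
}

pages = bfs_crawl("https://docs.example.com/guide/", lambda u: SITE.get(u, []), max_depth=2)
for url, depth in pages:
    print(depth, url)
```

A queue (rather than recursion) keeps pages ordered by depth, so the page limit cuts off the deepest pages first.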
Chrome binary: /Applications/Google Chrome.app/Contents/MacOS/Google Chrome (macOS), or google-chrome, google-chrome-stable, or chromium-browser.

Call the deep_crawl tool with a starting URL. The crawler will follow same-origin links up to the specified depth and page limits.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| url | string | yes | -- | The seed URL to start crawling from |
| max_depth | integer | no | 3 | Maximum link-following depth (1-10) |
| max_pages | integer | no | 50 | Maximum number of pages to crawl (1-200) |
| path_prefix | string | no | -- | Only follow links whose path starts with this prefix |
{
"url": "https://docs.example.com/guide/",
"max_depth": 3,
"max_pages": 30,
"path_prefix": "/guide/"
}
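The defaults and ranges from the parameter table can be checked before the tool is called. The sketch below is a hypothetical client-side helper (`build_args` is not part of the tool); it only mirrors the documented limits and says nothing about how the tool itself validates input.

```python
import json


def build_args(url, max_depth=3, max_pages=50, path_prefix=None):
    """Assemble deep_crawl arguments, enforcing the documented ranges."""
    if not isinstance(url, str) or not url:
        raise ValueError("url is required")
    if not 1 <= max_depth <= 10:
        raise ValueError("max_depth must be between 1 and 10")
    if not 1 <= max_pages <= 200:
        raise ValueError("max_pages must be between 1 and 200")
    args = {"url": url, "max_depth": max_depth, "max_pages": max_pages}
    if path_prefix is not None:  # optional, so omitted when unset
        args["path_prefix"] = path_prefix
    return args


args = build_args("https://docs.example.com/guide/", max_pages=30, path_prefix="/guide/")
print(json.dumps(args, indent=2))
```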
The tool returns a JSON object on stdout:
{
"output": "# Deep Crawl: https://docs.example.com/guide/\nCrawled 12 pages ...\n\n## Sitemap\n1. [depth=0] https://docs.example.com/guide/ (OK)\n...",
"success": true
}
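A caller may want to turn the sitemap lines in `output` into structured data. The following sketch assumes every sitemap entry follows the `N. [depth=D] URL (STATUS)` shape shown in the example; `parse_result` is a hypothetical helper, and the second sitemap entry in the sample is made up for illustration.

```python
import json
import re

# Matches lines such as: "1. [depth=0] https://docs.example.com/guide/ (OK)"
SITEMAP_LINE = re.compile(r"^(\d+)\. \[depth=(\d+)\] (\S+) \((.+)\)$")


def parse_result(stdout_text):
    """Return (success, sitemap entries) from the tool's JSON output."""
    result = json.loads(stdout_text)
    entries = []
    for line in result["output"].splitlines():
        match = SITEMAP_LINE.match(line)
        if match:
            entries.append({
                "index": int(match.group(1)),
                "depth": int(match.group(2)),
                "url": match.group(3),
                "status": match.group(4),
            })
    return result["success"], entries


# Sample payload built to resemble the documented response.
sample = json.dumps({
    "output": (
        "# Deep Crawl: https://docs.example.com/guide/\n"
        "Crawled 2 pages\n\n"
        "## Sitemap\n"
        "1. [depth=0] https://docs.example.com/guide/ (OK)\n"
        "2. [depth=1] https://docs.example.com/guide/intro (OK)"
    ),
    "success": True,
})

ok, sitemap = parse_result(sample)
print(ok, [entry["url"] for entry in sitemap])
```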
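The output field contains a markdown report: a heading with the seed URL, the number of pages crawled, and a sitemap listing each URL with its depth and status.

Results are saved to a research directory named crawl-<hostname>/ under the current working directory. Each page is saved as a numbered markdown (.md) file (e.g., 000_index.md, 001_docs_install.md).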
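One plausible way to derive such file names from a crawl index and a URL is sketched below. The exact scheme the tool uses is not documented here, so `page_filename` and `output_dir` are hypothetical and only reproduce the pattern of the two example names.

```python
import re
from urllib.parse import urlsplit


def output_dir(seed_url):
    """Directory name in the documented crawl-<hostname> form."""
    return f"crawl-{urlsplit(seed_url).hostname}"


def page_filename(index, url):
    """Zero-padded index plus a slug of the URL path (assumed scheme)."""
    path = urlsplit(url).path.strip("/")
    # Collapse anything that is not a letter or digit into underscores.
    slug = re.sub(r"[^A-Za-z0-9]+", "_", path).strip("_") or "index"
    return f"{index:03d}_{slug}.md"


print(output_dir("https://docs.example.com/guide/"))
print(page_filename(0, "https://example.com/"))
print(page_filename(1, "https://example.com/docs/install"))
```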
Only http:// and https:// URLs are allowed.
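The scheme restriction can be checked up front with the standard library. This is a minimal sketch; `is_allowed_url` is a hypothetical helper, and additionally requiring a hostname is an assumption rather than documented behavior.

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def is_allowed_url(url):
    """Accept only http(s) URLs that also carry a hostname."""
    parts = urlsplit(url)  # urlsplit lowercases the scheme for us
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.hostname)


for candidate in (
    "https://docs.example.com/guide/",
    "file:///etc/passwd",
    "javascript:alert(1)",
    "docs.example.com/guide/",
):
    print(candidate, is_allowed_url(candidate))
```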