deep-crawl
// Recursively crawl websites using headless Chrome. Triggers: crawl, scrape website, 爬取, crawl site, deep crawl, website content.
| name | deep-crawl |
| description | Recursively crawl websites using headless Chrome. Triggers: crawl, scrape website, 爬取, crawl site, deep crawl, website content. |
The deep_crawl tool recursively crawls a website using a headless Chrome browser via the Chrome DevTools Protocol (CDP). It renders JavaScript, follows same-origin links via BFS, extracts text content from each page, and saves results to disk. This is ideal for crawling JS-rendered SPAs, documentation sites, and any site that requires a full browser environment.
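The traversal described above can be sketched in Python. This is a minimal illustration, not the tool's implementation: `get_links` is a hypothetical callback standing in for link extraction from the rendered page, and the exact meaning of the depth limit is an assumption.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin, urlsplit


def same_origin(a: str, b: str) -> bool:
    """True when both URLs share scheme, host, and port."""
    pa, pb = urlsplit(a), urlsplit(b)
    return (pa.scheme, pa.hostname, pa.port) == (pb.scheme, pb.hostname, pb.port)


def bfs_crawl(seed, get_links, max_depth=3, max_pages=50):
    """Breadth-first traversal; returns (url, depth) pairs in visit order.

    get_links(url) is a hypothetical callback returning the hrefs on a page.
    """
    seen = {seed}
    queue = deque([(seed, 0)])
    visited = []
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append((url, depth))
        if depth >= max_depth:
            continue  # assumed semantics: no links are followed past max_depth
        for href in get_links(url):
            # Resolve relative links and drop #fragments before de-duplicating.
            target, _fragment = urldefrag(urljoin(url, href))
            if target not in seen and same_origin(seed, target):
                seen.add(target)
                queue.append((target, depth + 1))
    return visited


# Illustrative in-memory "site"; a real crawl would read links from the browser.
SITE = {
    "https://docs.example.com/guide/": ["intro", "/guide/setup", "https://other.example.org/"],
    "https://docs.example.com/guide/intro": ["setup#top"],
    "https://docs.example.com/guide/setup": [],
}
pages = bfs_crawl("https://docs.example.com/guide/", lambda u: SITE.get(u, []))
```

The `seen` set is filled when a link is queued rather than when it is visited, so a page linked from several places is queued only once.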
The tool needs a Chrome or Chromium executable:

- /Applications/Google Chrome.app/Contents/MacOS/Google Chrome (the macOS application path)
- google-chrome, google-chrome-stable, or chromium-browser

Call the deep_crawl tool with a starting URL. The crawler will follow same-origin links up to the specified depth and page limits.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| url | string | yes | -- | The seed URL to start crawling from |
| max_depth | integer | no | 3 | Maximum link-following depth (1-10) |
| max_pages | integer | no | 50 | Maximum number of pages to crawl (1-200) |
| path_prefix | string | no | -- | Only follow links whose path starts with this prefix |
{
"url": "https://docs.example.com/guide/",
"max_depth": 3,
"max_pages": 30,
"path_prefix": "/guide/"
}
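A caller may want to check arguments against the documented defaults and ranges before invoking the tool. The sketch below is one way to do that. The `normalize_args` helper is hypothetical, and raising on out-of-range values is an assumption, since the source does not say whether the tool rejects or clamps them.

```python
# Defaults and inclusive ranges taken from the parameter table.
LIMITS = {
    "max_depth": (3, 1, 10),
    "max_pages": (50, 1, 200),
}


def normalize_args(args: dict) -> dict:
    """Return a copy of args with defaults applied; raise ValueError on bad input."""
    url = args.get("url")
    if not isinstance(url, str) or not url:
        raise ValueError("url is required and must be a non-empty string")
    out = {"url": url}
    for name, (default, low, high) in LIMITS.items():
        value = args.get(name, default)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{name} must be an integer")
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
        out[name] = value
    if "path_prefix" in args:
        if not isinstance(args["path_prefix"], str):
            raise ValueError("path_prefix must be a string")
        out["path_prefix"] = args["path_prefix"]
    return out


request = normalize_args({"url": "https://docs.example.com/guide/", "max_pages": 30})
```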
The tool returns a JSON object on stdout:
{
"output": "# Deep Crawl: https://docs.example.com/guide/\nCrawled 12 pages ...\n\n## Sitemap\n1. [depth=0] https://docs.example.com/guide/ (OK)\n...",
"success": true
}
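A consumer can decode this object and pull the sitemap entries out of the report text. The sketch below assumes every sitemap line follows the pattern of the one line shown in the example (`N. [depth=D] URL (STATUS)`); the sample payload is illustrative, and the shape of a failure response is not documented, so the error branch is a guess.

```python
import json
import re

# Pattern inferred from the single sitemap line in the example output.
SITEMAP_LINE = re.compile(r"^\d+\. \[depth=(\d+)\] (\S+) \((.+)\)$")


def parse_result(raw: str):
    """Return (depth, url, status) tuples from the tool's stdout JSON."""
    result = json.loads(raw)
    if not result.get("success"):
        # Assumption: a failed call still carries a message in "output".
        raise RuntimeError(result.get("output", "deep_crawl failed"))
    entries = []
    for line in result["output"].splitlines():
        match = SITEMAP_LINE.match(line)
        if match:
            entries.append((int(match.group(1)), match.group(2), match.group(3)))
    return entries


# Illustrative payload modelled on the example above.
sample = json.dumps({
    "output": (
        "# Deep Crawl: https://docs.example.com/guide/\n"
        "Crawled 2 pages\n\n"
        "## Sitemap\n"
        "1. [depth=0] https://docs.example.com/guide/ (OK)\n"
        "2. [depth=1] https://docs.example.com/guide/intro (OK)"
    ),
    "success": True,
})
entries = parse_result(sample)
```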
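The output field contains a Markdown report, including a sitemap of the crawled pages (with depth and status for each) and a reference to the saved .md files.

Results are saved to a research directory named crawl-<hostname>/ under the current working directory. Each page is saved as a numbered markdown file (e.g., 000_index.md, 001_docs_install.md).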
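The naming scheme can be approximated as follows. This is a guess that reproduces the two example file names, not the tool's actual rule: `page_filename` and `output_dir` are hypothetical helpers, and the slug logic (path separators to underscores, empty path to `index`) is assumed.

```python
import re
from pathlib import Path
from urllib.parse import urlsplit


def output_dir(seed_url: str, cwd: Path = Path(".")) -> Path:
    """Directory named crawl-<hostname> under the working directory."""
    return cwd / f"crawl-{urlsplit(seed_url).hostname}"


def page_filename(index: int, url: str) -> str:
    """Zero-padded crawl index plus a slug derived from the URL path (assumed)."""
    path = urlsplit(url).path.strip("/")
    slug = re.sub(r"[^A-Za-z0-9]+", "_", path).strip("_") or "index"
    return f"{index:03d}_{slug}.md"


first = page_filename(0, "https://example.com/")
second = page_filename(1, "https://example.com/docs/install")
target = output_dir("https://docs.example.com/guide/")
```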
Only http:// and https:// URLs are allowed.
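The scheme rule is easy to check on the caller's side before a crawl is requested. The helper below is a hypothetical pre-check built on the standard library; how the tool itself reports a rejected URL is not documented.

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def is_allowed_url(url: str) -> bool:
    """True for absolute http(s) URLs with a host; urlsplit lowercases the scheme."""
    parts = urlsplit(url)
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.netloc)
```

Requiring a non-empty host also rejects inputs such as `https://`, which carry an allowed scheme but no site to crawl.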