deep-crawl

Name: Deep Crawl
Author: ZhangHanDong

Recursively crawl websites using headless Chrome. Triggers: crawl, scrape website, 爬取, crawl site, deep crawl, website content.

stars: 1

forks: 1

updated: April 4, 2026 at 23:15

SKILL.md

name	deep-crawl
description	Recursively crawl websites using headless Chrome. Triggers: crawl, scrape website, 爬取, crawl site, deep crawl, website content.

deep_crawl

Overview

The deep_crawl tool recursively crawls a website using a headless Chrome browser driven through the Chrome DevTools Protocol (CDP). It renders JavaScript, follows same-origin links in breadth-first order (BFS), extracts the text content of each page, and saves the results to disk. This makes it a good fit for JavaScript-rendered single-page applications (SPAs), documentation sites, and any other site that needs a full browser environment to render.
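
The breadth-first traversal with depth and page limits described above can be sketched roughly as follows. This is an illustration only, not the tool's actual implementation: `bfs_crawl`, `get_links`, and the in-memory `SITE` graph are hypothetical names, and treating the seed as depth 0 with links followed only while `depth < max_depth` is an assumption based on the `[depth=0]` entry in the sample output.

```python
from collections import deque
from typing import Callable, Iterable


def bfs_crawl(
    seed: str,
    get_links: Callable[[str], Iterable[str]],
    max_depth: int = 3,
    max_pages: int = 50,
) -> list[tuple[str, int]]:
    """Visit pages breadth-first and return (url, depth) pairs in visit order."""
    visited = {seed}
    queue = deque([(seed, 0)])
    order: list[tuple[str, int]] = []

    while queue and len(order) < max_pages:
        url, depth = queue.popleft()
        order.append((url, depth))
        if depth >= max_depth:
            continue  # assumed rule: do not follow links past the depth limit
        for link in get_links(url):
            if link not in visited:
                visited.add(link)
                queue.append((link, depth + 1))
    return order


# A tiny in-memory "site" standing in for real pages (hypothetical data).
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/c"],
    "/b": ["/c", "/"],
    "/c": ["/d"],
    "/d": [],
}

pages = bfs_crawl("/", lambda u: SITE.get(u, []), max_depth=2, max_pages=10)
for url, depth in pages:
    print(f"[depth={depth}] {url}")
```

Because the queue is first-in, first-out, every page at depth 1 is visited before any page at depth 2, which is what keeps a page budget from being spent on one deep branch.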

Requirements

Google Chrome or Chromium must be installed and either available on PATH or present at a standard system location:
- macOS: /Applications/Google Chrome.app/Contents/MacOS/Google Chrome
- Linux: google-chrome, google-chrome-stable, or chromium-browser
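
To check this requirement before running a crawl, a lookup along the following lines may help. It is a sketch using only the names and path listed above; `find_chrome` is a hypothetical helper, and the search order (PATH first, then the macOS application path) is an assumption rather than the tool's documented behavior.

```python
import os
import shutil
from typing import Optional

# Executable names and path taken from the Requirements list.
PATH_NAMES = ["google-chrome", "google-chrome-stable", "chromium-browser"]
MACOS_PATH = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"


def find_chrome() -> Optional[str]:
    """Return the path of a Chrome/Chromium executable, or None if not found."""
    for name in PATH_NAMES:
        found = shutil.which(name)  # searches the directories on PATH
        if found:
            return found
    if os.path.isfile(MACOS_PATH) and os.access(MACOS_PATH, os.X_OK):
        return MACOS_PATH
    return None


chrome = find_chrome()
print(chrome or "Chrome/Chromium not found")
```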

Usage

Call the deep_crawl tool with a starting URL. The crawler will follow same-origin links up to the specified depth and page limits.

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `url` | string | yes | -- | The seed URL to start crawling from |
| `max_depth` | integer | no | 3 | Maximum link-following depth (1-10) |
| `max_pages` | integer | no | 50 | Maximum number of pages to crawl (1-200) |
| `path_prefix` | string | no | -- | Only follow links whose path starts with this prefix |

Example

{
  "url": "https://docs.example.com/guide/",
  "max_depth": 3,
  "max_pages": 30,
  "path_prefix": "/guide/"
}
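
A caller may want to validate arguments against the documented ranges before invoking the tool. The sketch below does that and reproduces the example above; `build_args` is a hypothetical helper, not part of the skill, and whether the tool itself rejects or clamps out-of-range values is not stated in this document.

```python
import json
from typing import Any, Optional
from urllib.parse import urlparse


def build_args(
    url: str,
    max_depth: int = 3,
    max_pages: int = 50,
    path_prefix: Optional[str] = None,
) -> dict[str, Any]:
    """Build a deep_crawl argument object, enforcing the documented limits."""
    if urlparse(url).scheme not in ("http", "https"):
        raise ValueError("url must use http:// or https://")
    if not 1 <= max_depth <= 10:
        raise ValueError("max_depth must be between 1 and 10")
    if not 1 <= max_pages <= 200:
        raise ValueError("max_pages must be between 1 and 200")

    args: dict[str, Any] = {"url": url, "max_depth": max_depth, "max_pages": max_pages}
    if path_prefix is not None:
        args["path_prefix"] = path_prefix  # optional, so only sent when set
    return args


args = build_args(
    "https://docs.example.com/guide/", max_depth=3, max_pages=30, path_prefix="/guide/"
)
print(json.dumps(args, indent=2))
```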

Output

The tool returns a JSON object on stdout:

{
  "output": "# Deep Crawl: https://docs.example.com/guide/\nCrawled 12 pages ...\n\n## Sitemap\n1. [depth=0] https://docs.example.com/guide/ (OK)\n...",
  "success": true
}

The output field contains:

- A sitemap listing all crawled pages with their depth and status
- A content preview (first ~2000 characters) for each page
- The directory path where full page contents are saved as .md files
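
The sitemap entries can be pulled out of the `output` string with a small parser. The sketch below assumes every sitemap line follows the shape of the one line shown in the example (`N. [depth=D] URL (STATUS)`); the full output format is not specified here, so the pattern and the `parse_result` helper are illustrative guesses.

```python
import json
import re

# Assumed line shape, based on the single sitemap line in the example output.
SITEMAP_LINE = re.compile(r"^\d+\.\s+\[depth=(\d+)\]\s+(\S+)\s+\((.+)\)$", re.MULTILINE)


def parse_result(stdout_text: str) -> list[tuple[int, str, str]]:
    """Return (depth, url, status) for each sitemap line in a deep_crawl result."""
    result = json.loads(stdout_text)
    if not result.get("success"):
        raise RuntimeError("deep_crawl reported failure")
    return [
        (int(depth), url, status)
        for depth, url, status in SITEMAP_LINE.findall(result["output"])
    ]


# The sample result from this document, serialized the way the tool would print it.
sample = json.dumps({
    "output": (
        "# Deep Crawl: https://docs.example.com/guide/\n"
        "Crawled 12 pages ...\n\n"
        "## Sitemap\n"
        "1. [depth=0] https://docs.example.com/guide/ (OK)\n"
        "..."
    ),
    "success": True,
})

entries = parse_result(sample)
print(entries)
```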

Results are saved to a research directory named crawl-<hostname>/ under the current working directory. Each page is saved as a numbered markdown file (e.g., 000_index.md, 001_docs_install.md).

Behavior Details

- Only http:// and https:// URLs are allowed
- Only same-origin links are followed (no cross-domain crawling)
- The crawler uses stealth techniques to avoid bot detection (custom user-agent, webdriver flag removal)
- Pages that appear empty or bot-blocked are retried with longer wait times
- URL fragments are stripped and trailing slashes normalized to avoid duplicate visits
- Private/internal IP addresses are blocked (SSRF protection)
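
Several of these rules (scheme check, same-origin filter, URL normalization, private-address blocking, and the `path_prefix` filter) amount to a decision about whether a discovered link should be queued. The sketch below shows one way such a filter could look. All function names are hypothetical, the exact normalization the tool applies is not documented here, and this version only checks IP literals; a real SSRF guard would also need to check the addresses a hostname resolves to.

```python
import ipaddress
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize(url: str) -> str:
    """Drop the fragment and any trailing slash so duplicates compare equal."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))


def origin(url: str) -> tuple:
    """Return (scheme, host, port), filling in the scheme's default port."""
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname, parts.port or DEFAULT_PORTS.get(parts.scheme))


def is_private_literal(host: str) -> bool:
    """True if host is an IP literal in a private, loopback, or link-local range."""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # a hostname, not an IP literal (DNS checks are out of scope)
    return ip.is_private or ip.is_loopback or ip.is_link_local


def should_follow(seed: str, link: str, path_prefix: Optional[str] = None) -> bool:
    """Apply the scheme, SSRF, same-origin, and path-prefix rules to one link."""
    parts = urlsplit(link)
    if parts.scheme not in ("http", "https"):
        return False
    if parts.hostname is None or is_private_literal(parts.hostname):
        return False
    if origin(link) != origin(seed):
        return False
    if path_prefix and not parts.path.startswith(path_prefix):
        return False
    return True


seed = "https://docs.example.com/guide/"
print(normalize("https://docs.example.com/guide/#intro"))
print(should_follow(seed, "https://docs.example.com/guide/install", "/guide/"))
print(should_follow(seed, "https://other.example.com/guide/install", "/guide/"))
```

Normalizing before adding a URL to the visited set is what keeps `/guide`, `/guide/`, and `/guide/#intro` from being crawled as three separate pages.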

related-skills.json

same repository

voice.md

from "ZhangHanDong/octos"

OminiX ASR (speech-to-text), preset-voice TTS with emotion/speed control, and model management via Qwen3 models on Apple Silicon. For voice cloning and custom voice profiles, use mofa-fm. Triggers: voice, transcribe audio, text to speech, speak this, read aloud, model management, download model, 语音识别, 语音合成, 模型管理.

2026-04-01

deep-search.md

from "ZhangHanDong/octos"

Deep multi-round web research with parallel fetching. Triggers: deep search, research, 深度搜索, 调研, investigate, deep research.

2026-04-01

pipeline-guard.md

from "ZhangHanDong/octos"

Validates and optimizes run_pipeline DOT graphs with model selection from QoS catalog

2026-03-27

account-manager.md

from "ZhangHanDong/octos"

Manage sub-accounts under the current profile. Triggers: create account, 创建账号, sub account, manage account, list accounts, 子账号.

2026-03-16

send-email.md

from "ZhangHanDong/octos"

Send emails via SMTP or Feishu/Lark Mail. Triggers: send email, 发邮件, email to, 发送邮件, mail, send mail.

2026-03-16

weather.md

from "ZhangHanDong/octos"

Get current weather for any city worldwide. Triggers: weather, forecast, temperature, 天气, 气温, how cold, how hot, is it raining, wind.

2026-03-16

package.json

"author": "ZhangHanDong"

"repository": "ZhangHanDong/octos"
