
deep-crawl

Name: Deep Crawl
Author: octos-org

// Recursively crawl websites using headless Chrome. Triggers: crawl, scrape website, 爬取, crawl site, deep crawl, website content.


stars:964

forks:66

updated: May 26, 2026 at 03:32

SKILL.md

name	deep-crawl
description	Recursively crawl websites using headless Chrome. Triggers: crawl, scrape website, 爬取, crawl site, deep crawl, website content.

deep_crawl

Overview

The deep_crawl tool recursively crawls a website using a headless Chrome browser via the Chrome DevTools Protocol (CDP). It renders JavaScript, follows same-origin links via BFS, extracts text content from each page, and saves results to disk. This is ideal for crawling JS-rendered SPAs, documentation sites, and any site that requires a full browser environment.
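
The traversal described above can be sketched as a breadth-first search with depth and page limits. This is a minimal illustration, not the tool's implementation: `bfs_crawl`, `same_origin`, and the `get_links` callback are hypothetical names, and the real crawler obtains links from pages rendered in Chrome rather than from a callback. Treating the seed as depth 0 follows the sample sitemap; the exact meaning of `max_depth` in the tool is an assumption.

```python
from collections import deque
from urllib.parse import urlsplit


def same_origin(a, b):
    """True when both URLs share scheme and host:port."""
    pa, pb = urlsplit(a), urlsplit(b)
    return (pa.scheme, pa.netloc) == (pb.scheme, pb.netloc)


def bfs_crawl(seed, get_links, max_depth=3, max_pages=50):
    """Visit pages breadth-first and return [(url, depth)] in visit order.

    get_links(url) is a hypothetical callback returning the links found
    on a page; links off the seed's origin are ignored.
    """
    visited = {seed}
    queue = deque([(seed, 0)])
    order = []
    while queue and len(order) < max_pages:
        url, depth = queue.popleft()
        order.append((url, depth))
        if depth >= max_depth:
            continue  # do not expand links beyond the depth limit
        for link in get_links(url):
            if link not in visited and same_origin(seed, link):
                visited.add(link)
                queue.append((link, depth + 1))
    return order


# Toy link graph standing in for real pages.
GRAPH = {
    "https://docs.example.com/": [
        "https://docs.example.com/a",
        "https://other.example.com/x",  # cross-origin, skipped
        "https://docs.example.com/b",
    ],
    "https://docs.example.com/a": ["https://docs.example.com/c"],
}

if __name__ == "__main__":
    for url, depth in bfs_crawl("https://docs.example.com/",
                                lambda u: GRAPH.get(u, []), max_depth=2):
        print(depth, url)
```

Because the queue is first-in, first-out, all pages at depth 1 are visited before any page at depth 2, so a tight `max_pages` budget is spent on pages closest to the seed.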

Requirements

Google Chrome or Chromium must be installed and available in PATH, or at a standard system location.
- macOS: /Applications/Google Chrome.app/Contents/MacOS/Google Chrome
- Linux: google-chrome, google-chrome-stable, or chromium-browser
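
To check the requirement before running a crawl, a lookup along the following lines may help. It is a sketch only: `find_chrome` is a hypothetical helper, and the order in which the tool itself probes these locations is not documented here, so the order below is an assumption.

```python
import os
import shutil

# Locations taken from the requirements list above.
MACOS_CHROME = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
LINUX_NAMES = ("google-chrome", "google-chrome-stable", "chromium-browser")


def find_chrome(which=shutil.which, exists=os.path.isfile):
    """Return a path to a Chrome/Chromium binary, or None if none is found.

    The which/exists parameters exist only so the lookup can be tested
    without touching the real filesystem.
    """
    for name in LINUX_NAMES:
        path = which(name)  # searches PATH
        if path:
            return path
    if exists(MACOS_CHROME):
        return MACOS_CHROME
    return None


if __name__ == "__main__":
    print(find_chrome() or "Chrome/Chromium not found")
```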

Usage

Call the deep_crawl tool with a starting URL. The crawler will follow same-origin links up to the specified depth and page limits.

Parameters

Parameter	Type	Required	Default	Description
`url`	string	yes	--	The seed URL to start crawling from
`max_depth`	integer	no	3	Maximum link-following depth (1-10)
`max_pages`	integer	no	50	Maximum number of pages to crawl (1-200)
`path_prefix`	string	no	--	Only follow links whose path starts with this prefix
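
A caller may want to validate arguments against the table above before invoking the tool. The sketch below applies the documented defaults and ranges; `validate_params` is a hypothetical helper, and whether the tool itself rejects or clamps out-of-range values is not stated, so rejecting them here is an assumption.

```python
# (name, default, minimum, maximum) from the parameter table.
LIMITS = (("max_depth", 3, 1, 10), ("max_pages", 50, 1, 200))


def validate_params(params):
    """Return a copy of params with defaults applied, or raise ValueError."""
    url = params.get("url")
    if not isinstance(url, str) or not url:
        raise ValueError("url is required and must be a non-empty string")
    result = {"url": url}
    for name, default, low, high in LIMITS:
        value = params.get(name, default)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{name} must be an integer")
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
        result[name] = value
    prefix = params.get("path_prefix")
    if prefix is not None:
        if not isinstance(prefix, str):
            raise ValueError("path_prefix must be a string")
        result["path_prefix"] = prefix
    return result


if __name__ == "__main__":
    print(validate_params({"url": "https://docs.example.com/guide/"}))
```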

Example

{
  "url": "https://docs.example.com/guide/",
  "max_depth": 3,
  "max_pages": 30,
  "path_prefix": "/guide/"
}

Output

The tool returns a JSON object on stdout:

{
  "output": "# Deep Crawl: https://docs.example.com/guide/\nCrawled 12 pages ...\n\n## Sitemap\n1. [depth=0] https://docs.example.com/guide/ (OK)\n...",
  "success": true
}

The output field contains:

- A sitemap listing all crawled pages with their depth and status
- A content preview (first ~2000 characters) for each page
- The directory path where full page contents are saved as .md files
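
The sitemap portion of the result can be pulled back out of the `output` string. The sketch below assumes every sitemap line follows the shape of the single sample line shown above (`1. [depth=0] <url> (OK)`); that pattern is inferred from the example, not from a specification, and `parse_sitemap` is a hypothetical helper.

```python
import json
import re

# Pattern inferred from the sample line "1. [depth=0] <url> (OK)".
SITEMAP_LINE = re.compile(r"^\d+\.\s+\[depth=(\d+)\]\s+(\S+)\s+\((.*)\)$")


def parse_sitemap(raw):
    """Parse the tool's stdout JSON and return the sitemap entries."""
    result = json.loads(raw)
    if not result.get("success"):
        raise RuntimeError("deep_crawl reported failure")
    entries = []
    for line in result["output"].splitlines():
        match = SITEMAP_LINE.match(line.strip())
        if match:
            entries.append({
                "depth": int(match.group(1)),
                "url": match.group(2),
                "status": match.group(3),
            })
    return entries


if __name__ == "__main__":
    sample = json.dumps({
        "output": "# Deep Crawl: https://docs.example.com/guide/\n"
                  "Crawled 12 pages ...\n\n## Sitemap\n"
                  "1. [depth=0] https://docs.example.com/guide/ (OK)\n...",
        "success": True,
    })
    print(parse_sitemap(sample))
```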

Results are saved to a research directory named crawl-<hostname>/ under the current working directory. Each page is saved as a numbered markdown file (e.g., 000_index.md, 001_docs_install.md).
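
For scripts that need to predict where a page will land, the naming scheme can be approximated as follows. This is a guess that merely reproduces the two example filenames above; the tool's actual slug rules (for query strings, long paths, or ports in the hostname) are unknown, and `crawl_dir` and `page_filename` are hypothetical helpers.

```python
import re
from urllib.parse import urlsplit


def crawl_dir(url):
    """Directory name of the form crawl-<hostname> for a seed URL."""
    return f"crawl-{urlsplit(url).hostname}"


def page_filename(index, url):
    """Numbered markdown filename derived from the URL path (assumed scheme)."""
    path = urlsplit(url).path.strip("/")
    # Collapse every run of non-alphanumeric characters into one underscore.
    slug = re.sub(r"[^A-Za-z0-9]+", "_", path).strip("_") or "index"
    return f"{index:03d}_{slug}.md"


if __name__ == "__main__":
    print(crawl_dir("https://docs.example.com/guide/"))
    print(page_filename(0, "https://example.com/"))
    print(page_filename(1, "https://example.com/docs/install"))
```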

Behavior Details

- Only http:// and https:// URLs are allowed
- Only same-origin links are followed (no cross-domain crawling)
- The crawler uses stealth techniques to avoid bot detection (custom user-agent, webdriver flag removal)
- Pages that appear empty or bot-blocked are retried with longer wait times
- URL fragments are stripped and trailing slashes normalized to avoid duplicate visits
- Private/internal IP addresses are blocked (SSRF protection)
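
Three of the rules above (scheme filtering, URL normalization, and private-address blocking) can be illustrated with the standard library. This is a sketch under stated assumptions, not the tool's code: the function names are hypothetical, the choice to drop (rather than add) trailing slashes is a guess, and a production SSRF check would also resolve hostnames through DNS before deciding, which this example does not do.

```python
import ipaddress
from urllib.parse import urlsplit, urlunsplit


def is_allowed_scheme(url):
    """Accept only http:// and https:// URLs."""
    return urlsplit(url).scheme in ("http", "https")


def normalize_url(url):
    """Strip the fragment and the trailing slash so duplicates compare equal."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"  # keep "/" for the site root
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, parts.query, ""))


def is_private_host(host):
    """True for localhost and for literal private, loopback, or link-local IPs."""
    if host.lower() == "localhost":
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # a hostname, not an IP literal; DNS is not checked here
    return ip.is_private or ip.is_loopback or ip.is_link_local


if __name__ == "__main__":
    print(normalize_url("https://docs.example.com/guide/#intro"))
    print(is_private_host("10.0.0.5"), is_private_host("8.8.8.8"))
```

With this normalization, `https://docs.example.com/guide/#intro` and `https://docs.example.com/guide` map to the same key, so a visited set keyed on the normalized URL fetches the page only once.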

related-skills.json

same repository

harness-starter-audio.md

from "octos-org/octos"

Harnessed audio-artifact starter. Synthesizes a minimal WAV file under audio/ and relies on the workspace contract to deliver it.

2026-05-27, stars: 964

harness-starter-coding.md

from "octos-org/octos"

Harnessed coding-assistant starter. Produces a unified-diff artifact and a file-list preview under patches/.

2026-05-27, stars: 964

harness-starter-generic.md

from "octos-org/octos"

Minimal harnessed single-artifact starter. Use as a template for a custom app that produces one deliverable.

2026-05-27, stars: 964

harness-starter-report.md

from "octos-org/octos"

Harnessed report-generator starter. Writes a markdown artifact under reports/ and relies on the workspace contract to deliver it.

2026-05-27, stars: 964

voice.md

from "octos-org/octos"

OminiX ASR (speech-to-text), preset-voice TTS with emotion/speed control, and model management via Qwen3 models on Apple Silicon. For voice cloning and custom voice profiles, use mofa-fm. Triggers: voice, transcribe audio, text to speech, speak this, read aloud, model management, download model, 语音识别, 语音合成, 模型管理.

2026-05-27, stars: 964

deep-search.md

from "octos-org/octos"

Deep multi-round web research with parallel fetching. Triggers: deep search, research, 深度搜索, 调研, investigate, deep research.

2026-05-22, stars: 964

package.json

"author": "octos-org"

"repository": "octos-org/octos"

