crawler

Universal web crawling and content extraction skill.

Install command

npx skills add https://github.com/MarshallEriksen-Neura/Deeting --skill crawler

Copy and paste this command into Claude Code to install the skill.

Source

MarshallEriksen-Neura/Deeting

Stars: 56

Forks: 5

Updated: March 25, 2026 at 09:52

File explorer

7 files

SKILL.md

name	crawler
description	Universal web crawling and content extraction skill.

Scout Crawler

Overview

The Scout Crawler is a versatile tool for extracting structured information from the web. It can fetch individual pages, render JavaScript-heavy sites, or recursively crawl an entire domain to build a knowledge base.

Core Tools

fetch_web_content: Fetches the clean, readable content of a single URL and returns it in Markdown format. Useful for quick information retrieval from articles or documentation.
crawl_website: Automatically follows links from a starting URL to a specified depth and page count. Ideal for comprehensive domain analysis or preparing data for RAG (Retrieval-Augmented Generation).
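The depth and page-count limits described for crawl_website can be illustrated with a minimal breadth-first sketch. This is not the skill's implementation: the function `bounded_crawl`, the `get_links` callback, and the in-memory `SITE` map are hypothetical names introduced here, and the parameter names `max_depth` and `max_pages` are borrowed from the guidelines in this document.

```python
from collections import deque
from typing import Callable, Iterable


def bounded_crawl(
    start_url: str,
    get_links: Callable[[str], Iterable[str]],
    max_depth: int = 2,
    max_pages: int = 10,
) -> list[str]:
    """Visit pages breadth-first, stopping at max_depth hops or max_pages pages."""
    visited: list[str] = []
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # page is recorded, but its links are not followed
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


# Hypothetical in-memory site standing in for real HTTP requests.
SITE = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/a/1", "https://example.com/"],
    "https://example.com/b": ["https://example.com/b/1"],
    "https://example.com/a/1": ["https://example.com/a/1/x"],
}


def site_links(url: str) -> list[str]:
    """Return the outgoing links of a page in the in-memory site."""
    return SITE.get(url, [])


if __name__ == "__main__":
    for page in bounded_crawl("https://example.com/", site_links, max_depth=1):
        print(page)
```

Both limits act independently: a low max_depth keeps the crawl near the starting page, while max_pages caps the total work even when each page links to many others.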

Usage Guidelines

Rule 1: Use JS Mode: Always keep js_mode=true for modern web applications (SPAs), such as React or Vue sites, to ensure content is fully rendered.
Rule 2: Respect robots.txt: Servers may block excessive crawling. Use moderate max_depth and max_pages values for broad crawls.
Rule 3: URL Precision: Ensure URLs include the scheme (e.g., https://).
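Rules 2 and 3 can be checked before a crawl starts. The sketch below uses only the Python standard library and does not call the skill itself. The helper names `has_http_scheme` and `is_allowed`, the user agent string `scout-crawler`, and the sample robots.txt content are assumptions made for illustration.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def has_http_scheme(url: str) -> bool:
    """Return True when the URL carries an http(s) scheme and a host."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against robots.txt rules supplied as text (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content for the example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    print(has_http_scheme("example.com/docs"))
    print(has_http_scheme("https://example.com/docs"))
    print(is_allowed(ROBOTS_TXT, "scout-crawler", "https://example.com/private/data"))
```

In a real crawl, the robots.txt text would be fetched from the target site first; passing it in as a string keeps the example offline and easy to test.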

Anti-Patterns

Avoid Deep Crawls on Large Sites: Do NOT set a high max_depth on massive domains (like Wikipedia or Amazon); doing so wastes resources and may get the crawler blocked.
Do NOT use for API endpoints: This tool is designed for human-readable web pages, not for raw JSON/XML APIs.
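One way to catch the second anti-pattern early is to inspect a response's Content-Type header before handing the URL to the crawler. The following is a rough heuristic under stated assumptions, not part of the skill: the function name `is_api_response` and the two media-type sets are hypothetical, and a real deployment may need a longer list.

```python
# Media types that usually indicate a machine-readable API payload (assumed list).
API_MEDIA_TYPES = {"application/json", "application/xml", "text/xml"}

# Media types that indicate a human-readable page, even with an XML suffix.
PAGE_MEDIA_TYPES = {"text/html", "application/xhtml+xml"}


def is_api_response(content_type: str) -> bool:
    """Guess from a Content-Type header value whether the body is raw JSON/XML."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in PAGE_MEDIA_TYPES:
        return False
    # Structured-syntax suffixes cover types like application/ld+json.
    return media_type in API_MEDIA_TYPES or media_type.endswith(("+json", "+xml"))


if __name__ == "__main__":
    for header in ("text/html; charset=UTF-8", "application/json; charset=utf-8"):
        print(header, is_api_response(header))
```

A URL flagged this way is better served by a plain HTTP client that parses the JSON or XML directly.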

More from this repository

same repository

provider-doc-ingestion

MarshallEriksen-Neura/Deeting

Provider documentation acquisition and evidence-first candidate drafting for desktop-local provider onboarding.

2026-05-20 · 56 stars

provider-registry

MarshallEriksen-Neura/Deeting

Expert plugin for discovering AI Providers and writing local provider presets.

2026-05-20 · 56 stars

monitor

MarshallEriksen-Neura/Deeting

Proactive monitoring and alerting system.

2026-04-17 · 56 stars

skill-manager

MarshallEriksen-Neura/Deeting

Guidance for installing skills into shared directories that Deeting scans and can refresh directly.

2026-03-16 · 56 stars

hello-deeting

MarshallEriksen-Neura/Deeting

A simple local Deeting plugin that greets the user and emits a UI card.

2026-03-14 · 56 stars

Useful for (SOC)

Software Developers · Computer and Mathematical Occupations · 15-1252 · L4
