mcp-scraping

Expose the scraping toolkit as MCP tools so any MCP-compatible AI agent can autonomously discover, configure, and execute web scrapes. Use this to connect Claude Desktop, Claude Code, or any MCP client to the scraping engine.

Install command

npx skills add https://github.com/heldernoid/scrapping --skill mcp-scraping

Copy and paste this command into Claude Code to install the skill.

Source: heldernoid/scrapping

Stars: 0

Forks: 0

Updated: April 11, 2026 at 15:43

SKILL.md

What This Skill Does

Starts an MCP server that exposes 6 scraping tools over STDIO. Any MCP-compatible agent (Claude Desktop, Claude Code, Cursor, etc.) can then call these tools to scrape websites without leaving the conversation.

Exposed Tools

Tool	What it does
`fetch_page_structure(url)`	Fetches a URL and returns a compact HTML structure summary
`generate_scraper_config(url)`	Uses an LLM to generate a scraper config from the page structure
`test_selector(url, selector)`	Tests a CSS selector and returns matching text
`run_scrape(config_json, max_results)`	Executes a scrape with the given config JSON string
`load_config(config_name)`	Loads a pre-built demo config by name
`list_configs()`	Lists all available pre-built configs
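
On the wire, an MCP client invokes each of these tools with a JSON-RPC 2.0 `tools/call` request. The sketch below shows roughly what such a request looks like for `test_selector`; the helper name `tool_call_request` and the example selector are hypothetical, and most clients build this message for you.

```python
import json


def tool_call_request(request_id: int, name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 "tools/call" request for one of the tools above."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


if __name__ == "__main__":
    request = tool_call_request(
        2,
        "test_selector",
        # The selector is a made-up example, not taken from the demo sites.
        {"url": "http://localhost:8001/products", "selector": ".product-card h2"},
    )
    print(json.dumps(request, indent=2))
```

The argument keys (`url`, `selector`) mirror the parameter names in the table; check agents/mcp_server.py if a call is rejected.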

Preconditions

Python environment active (uv sync or pip install -r requirements.txt)
LLM backend configured in .env (needed for generate_scraper_config)
For demo sites: cd demo-sites && docker compose up -d

Steps

Start the MCP server:

python agents/mcp_server.py

Or using the MCP dev inspector:

mcp dev agents/mcp_server.py

Connect from Claude Desktop - add to claude_desktop_config.json:

{
  "mcpServers": {
    "scraper": {
      "command": "python",
      "args": ["/path/to/scrapping/agents/mcp_server.py"]
    }
  }
}
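
Because the client, not your shell, decides the working directory when it launches the server, a relative script path may not resolve. A minimal sketch that builds the entry above with an absolute path; `build_server_entry` and its `repo_root` argument are hypothetical names, and the printed fragment still has to be merged into claude_desktop_config.json by hand.

```python
import json
import sys
from pathlib import Path


def build_server_entry(repo_root: str, python_exe: str = sys.executable) -> dict:
    """Build the "mcpServers" fragment for a checkout at repo_root (hypothetical helper)."""
    # Resolve to an absolute path so the entry does not depend on the client's cwd.
    script = Path(repo_root).expanduser().resolve() / "agents" / "mcp_server.py"
    return {
        "mcpServers": {
            "scraper": {
                "command": python_exe,
                "args": [str(script)],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_server_entry("/path/to/scrapping"), indent=2))
```

Defaulting `command` to `sys.executable` is a design choice in this sketch: it pins the server to the interpreter of the environment where the dependencies were installed, which a bare `python` may not do.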

Connect from Claude Code:

claude mcp add scraper python /path/to/scrapping/agents/mcp_server.py

Typical Agent Workflow

Once connected, an agent can scrape a site in 3 tool calls:

1. fetch_page_structure("http://localhost:8001/products")
   -> Returns HTML structure summary

2. generate_scraper_config("http://localhost:8001/products")
   -> Returns JSON config string

3. run_scrape(config_json=<config from step 2>, max_results=20)
   -> Returns list of extracted records as JSON string
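
The three calls above can be sketched as a small driver function. It assumes a `call_tool(name, arguments)` callable that returns the tool's string result; that callable is a stand-in for whatever your MCP client exposes and is not part of this repository. The sketch also assumes a successful `run_scrape` result decodes to a list, as described in step 3.

```python
import json
from typing import Callable

# Hypothetical adapter type: call_tool(name, arguments) -> tool result as a string.
ToolCaller = Callable[[str, dict], str]


def scrape_site(call_tool: ToolCaller, url: str, max_results: int = 20) -> list:
    """Run the three-call workflow and return the decoded records."""
    # Step 1: inspect the page. An agent would read this summary; the sketch
    # only issues the call and does not use the result further.
    call_tool("fetch_page_structure", {"url": url})

    # Step 2: have the server generate a config, returned as a JSON string.
    config_json = call_tool("generate_scraper_config", {"url": url})

    # Step 3: run the scrape and decode the JSON string into Python objects.
    raw = call_tool(
        "run_scrape", {"config_json": config_json, "max_results": max_results}
    )
    return json.loads(raw)
```

Injecting `call_tool` keeps the workflow independent of any particular MCP client library and makes it easy to exercise with a fake in tests.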

Pre-built Config Names

Pass these to load_config():

Name	Site
`shopsphere-ssr`	Product marketplace, server-side rendered
`shopsphere-csr`	Product marketplace, client-side rendered
`jobhive-ssr`	Job board, server-side rendered
`jobhive-csr`	Job board, client-side rendered

Example:

load_config("jobhive-ssr")
-> Returns the full config JSON string ready for run_scrape
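
A client can guard against typos in the config name before calling the tool. The sketch below is one way to do that, assuming the same hypothetical `call_tool(name, arguments)` adapter as a stand-in for your MCP client and assuming the returned config decodes to a JSON object; `list_configs()` remains the authoritative source of valid names.

```python
import json
from typing import Callable

# The four names from the table above.
KNOWN_CONFIGS = ("shopsphere-ssr", "shopsphere-csr", "jobhive-ssr", "jobhive-csr")


def load_demo_config(call_tool: Callable[[str, dict], str], name: str) -> dict:
    """Fetch a pre-built config by name and decode it (hypothetical helper)."""
    if name not in KNOWN_CONFIGS:
        raise ValueError(f"unknown config {name!r}; expected one of {KNOWN_CONFIGS}")
    # load_config returns a JSON string, so decode it before inspecting it.
    return json.loads(call_tool("load_config", {"config_name": name}))
```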

Adapting for Your Own Agent

Add new tools to the MCP server by editing agents/mcp_server.py:

import json

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("scraper")

@mcp.tool()
async def my_custom_tool(url: str, my_param: str) -> str:
    # your logic here
    return json.dumps({"result": "..."})

Or embed the server as a subprocess and connect your own MCP client to its STDIO pipes:

import subprocess
proc = subprocess.Popen(
    ["python", "/path/to/scrapping/agents/mcp_server.py"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE
)
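
Over STDIO, MCP messages are JSON-RPC 2.0 objects sent one per line. The helpers below sketch that framing and the first `initialize` request a client sends; the function names, the `clientInfo` values, and the protocol version string are assumptions for illustration, so check which protocol revision the server actually supports.

```python
import json


def encode_message(message: dict) -> bytes:
    """Serialize one JSON-RPC message as a single newline-terminated line."""
    # json.dumps escapes newlines inside strings, so the payload stays on one line.
    return (json.dumps(message) + "\n").encode("utf-8")


def decode_message(line: bytes) -> dict:
    """Parse one line read from the server's stdout."""
    return json.loads(line.decode("utf-8"))


def initialize_request(request_id: int = 1) -> dict:
    """Build the handshake request a client sends first."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2024-11-05",  # assumed revision
            "capabilities": {},
            "clientInfo": {"name": "my-agent", "version": "0.1.0"},  # placeholder
        },
    }
```

With the `proc` handle from the snippet above, the handshake would start with `proc.stdin.write(encode_message(initialize_request()))`, then `proc.stdin.flush()`, then `decode_message(proc.stdout.readline())`. In practice an existing MCP client library is likely a better choice than hand-rolled framing, since it also handles the follow-up `notifications/initialized` message and error cases.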

For calling the underlying scraping functions directly (without MCP), see skills/integrate/SKILL.md.

Notes

All tools return JSON strings (delivered to the client as MCP text content), so parse each result before using it
run_scrape accepts either a JSON string or a dict
The server runs over STDIO - it does not bind to a network port
See agents/mcp_server.py for full tool implementations
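
Accepting either a JSON string or a dict, as `run_scrape` is described as doing, usually comes down to a small normalization step. The sketch below is a hypothetical illustration of that pattern, not the code in agents/mcp_server.py.

```python
import json
from typing import Union


def normalize_config(config: Union[str, dict]) -> dict:
    """Return the config as a dict, decoding it first if it is a JSON string."""
    if isinstance(config, dict):
        return config
    if isinstance(config, str):
        parsed = json.loads(config)
        if not isinstance(parsed, dict):
            raise ValueError("config JSON must decode to an object")
        return parsed
    raise TypeError(f"config must be a str or dict, got {type(config).__name__}")
```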

More from this repository

autonomous-scrape

heldernoid/scrapping

Run the full AI-driven scraping loop - the LLM generates a config, executes it, validates results, and repairs the config if extraction fails. Use this when you want to scrape a site with zero human configuration.

2026-04-11

generate-config

heldernoid/scrapping

Use an LLM to automatically generate a scraper config from a URL. Use this when you have a target URL and a list of fields to extract but no existing config. The LLM inspects the page structure and writes the JSON config for you.

2026-04-11

integrate

heldernoid/scrapping

Embed this repo's scraping primitives into your own autonomous agent. Use this when you want to call the scraping engine programmatically - not via CLI - and drive the generate/execute/repair loop from your own code with your own LLM.

2026-04-11

scrape-url

heldernoid/scrapping

Scrape structured data from a URL using a declarative JSON config. Use this when you have a scraper config (or can load one) and want to extract data from a website into structured JSON records.

2026-04-11
