Name: Fetch4ai
Author: Xueheng-Li

// MUST USE THIS SKILL when the user asks or an agent needs to "fetch web content", "crawl a page", "use crawl4ai", "extract content from URL", "fetch with filtering", "get clean markdown from webpage", "research with content filtering", or needs to fetch web pages with customizable noise removal for LLM processing.


Stars: 43

Forks: 12

Updated: January 13, 2026, 14:15


Repository: Xueheng-Li/sysu-awesome-cc


name	fetch4ai
description	MUST USE THIS SKILL when the user asks or an agent needs to "fetch web content", "crawl a page", "use crawl4ai", "extract content from URL", "fetch with filtering", "get clean markdown from webpage", "research with content filtering", or needs to fetch web pages with customizable noise removal for LLM processing.
version	0.1.0
allowed-tools	Read, Write, Bash

fetch4ai Skill

Fetch web content using crawl4ai with customizable filtering strategies. Produces clean, LLM-ready markdown with noise removed.

Can be used as:

Standalone CLI tool - Simple command-line web fetching with clean output
web-research backend - Fetching layer for research workflows

Prerequisites

Ensure crawl4ai is installed:

pip install -U crawl4ai
crawl4ai-setup  # First-time setup for Playwright

Standalone Quick Use

For simple fetching when you just want clean markdown:

# Simplest: fetch URL, get markdown output
python ~/.claude/skills/fetch4ai/scripts/fetch4ai.py \
  --url "https://example.com/article" \
  --format markdown

# With timeout control (default: 30s)
python ~/.claude/skills/fetch4ai/scripts/fetch4ai.py \
  --url "https://slow-site.com/page" \
  --format md \
  --timeout 60

# Save directly to file
python ~/.claude/skills/fetch4ai/scripts/fetch4ai.py \
  --url "https://example.com" \
  --format markdown \
  -o content.md

Quiet Mode

Suppress crawl4ai status messages for clean piping:

# Clean output for piping to other tools
python ~/.claude/skills/fetch4ai/scripts/fetch4ai.py \
  --url "https://example.com" \
  --format md \
  --quiet

# Short form
python ~/.claude/skills/fetch4ai/scripts/fetch4ai.py \
  --url "https://example.com" -q --format md
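When the script is driven from Python rather than a shell, the commands above can be assembled programmatically. The following is a minimal sketch, not part of the skill itself: the helper names `build_command` and `fetch_markdown` are hypothetical, the script path is the one used in the examples above, and only flags documented in this file are used. It assumes that `--format md` prints the markdown to stdout.

```python
import os
import subprocess
import sys

# Path taken from the examples above; adjust if the skill lives elsewhere.
SCRIPT = os.path.expanduser("~/.claude/skills/fetch4ai/scripts/fetch4ai.py")


def build_command(url, fmt="md", timeout=None, quiet=True, output=None):
    """Assemble the argv list using only flags documented in this file."""
    cmd = [sys.executable, SCRIPT, "--url", url, "--format", fmt]
    if timeout is not None:
        cmd += ["--timeout", str(timeout)]
    if quiet:
        cmd.append("--quiet")
    if output is not None:
        cmd += ["-o", output]
    return cmd


def fetch_markdown(url, timeout=30):
    """Run the script and return whatever it wrote to stdout.

    Assumption: with --format md and --quiet, stdout holds only the markdown.
    """
    proc = subprocess.run(
        build_command(url, timeout=timeout),
        capture_output=True,
        text=True,
        timeout=timeout + 10,  # extra slack for process startup (arbitrary)
    )
    return proc.stdout
```

Passing the arguments as a list avoids shell quoting problems with URLs that contain `&` or `?`.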

Shell Alias (Optional)

Add to your ~/.zshrc or ~/.bashrc:

alias fetch4ai='python ~/.claude/skills/fetch4ai/scripts/fetch4ai.py'

# Then use simply:
# fetch4ai --url "https://example.com" --format md -q

Quick Start

Basic Fetch (Pruning Filter - Default)

python ~/.claude/skills/fetch4ai/scripts/fetch4ai.py \
  --url "https://example.com/article" \
  --strategy pruning

Query-Focused Fetch (BM25)

python ~/.claude/skills/fetch4ai/scripts/fetch4ai.py \
  --url "https://example.com/article" \
  --strategy bm25 \
  --query "machine learning applications"

Clean Article Extraction (Tag Exclusion)

python ~/.claude/skills/fetch4ai/scripts/fetch4ai.py \
  --url "https://example.com/article" \
  --strategy tags \
  --excluded-tags "nav,footer,aside,header"

Filtering Strategies

Strategy 1: Pruning (Default)

Automatically removes low-quality content by scoring text density, link density, and tag importance.

When to use:

General content extraction from any webpage
Articles, blog posts, documentation
Cases without a specific search query

Parameters:

--threshold (0.0-1.0, default 0.48): Higher = stricter filtering
--min-words (default 5): Minimum words per content block

Example:

python ~/.claude/skills/fetch4ai/scripts/fetch4ai.py \
  --url "https://en.wikipedia.org/wiki/Artificial_intelligence" \
  --strategy pruning \
  --threshold 0.5
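To give a feel for what scoring by text density and link density means, here is a toy sketch. It is not crawl4ai's scoring: the `Block` type, the `toy_score` formula, and the way the two densities are combined are all invented for illustration. Only the `threshold` and `min_words` parameters mirror the flags described above.

```python
from dataclasses import dataclass


@dataclass
class Block:
    text: str        # visible text of the block
    link_chars: int  # how many of those characters sit inside links
    html_chars: int  # length of the block's raw HTML, tags included


def toy_score(block):
    """Illustrative score in [0, 1]: dense text scores high, link-heavy text low."""
    if block.html_chars == 0 or not block.text:
        return 0.0
    text_density = min(len(block.text) / block.html_chars, 1.0)
    link_density = min(block.link_chars / len(block.text), 1.0)
    return text_density * (1.0 - link_density)


def prune(blocks, threshold=0.48, min_words=5):
    """Keep blocks with enough words whose score reaches the threshold."""
    return [
        b for b in blocks
        if len(b.text.split()) >= min_words and toy_score(b) >= threshold
    ]
```

In this sketch a navigation bar made entirely of links scores 0 and is dropped, while a paragraph of plain prose survives. Raising `threshold` removes more borderline blocks, which matches the "higher = stricter" behavior noted above.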

Strategy 2: BM25 (Query-Relevant)

Uses the BM25 ranking algorithm to extract only content relevant to your search query.

When to use:

Focused research on specific topics
Extracting relevant sections from long pages
Targeted extraction with known search terms

Parameters:

--query (required): Search terms for relevance scoring
--bm25-threshold (default 1.2): Minimum relevance score

Example:

python ~/.claude/skills/fetch4ai/scripts/fetch4ai.py \
  --url "https://docs.python.org/3/tutorial/" \
  --strategy bm25 \
  --query "list comprehension syntax"
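For readers unfamiliar with BM25, the sketch below scores text blocks against a query with the textbook Okapi BM25 formula and keeps the blocks at or above a threshold. It is a generic illustration, not the code crawl4ai or this script runs: the tokenizer, the `k1` and `b` values, and the function names are assumptions, so the scores it produces are not directly comparable to `--bm25-threshold`.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(blocks, query, k1=1.5, b=0.75):
    """Return one Okapi BM25 score per block for the given query."""
    docs = [tokenize(block) for block in blocks]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs if n_docs else 0.0
    # Document frequency: in how many blocks does each term appear?
    df = Counter(term for d in docs for term in set(d))
    query_terms = set(tokenize(query))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def filter_by_query(blocks, query, threshold):
    """Keep only the blocks whose BM25 score reaches the threshold."""
    scores = bm25_scores(blocks, query)
    return [blk for blk, s in zip(blocks, scores) if s >= threshold]
```

Blocks that share no term with the query score exactly 0, which is why query-focused filtering tends to drop boilerplate such as newsletter prompts and cookie notices.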

Strategy 3: Tag Exclusion

Removes specific HTML elements and filters by word count.

When to use:

Clean article extraction
Removing navigation, footers, sidebars
Pages with predictable noise elements

Parameters:

--excluded-tags (comma-separated): Tags to remove
--word-count-threshold (default 10): Minimum words per block

Common tag presets:

Article: nav,footer,header,aside
Minimal: nav,footer
Aggressive: nav,footer,header,aside,advertisement,script,style

Example:

python ~/.claude/skills/fetch4ai/scripts/fetch4ai.py \
  --url "https://example.com/blog/post" \
  --strategy tags \
  --excluded-tags "nav,footer,aside,header,advertisement" \
  --word-count-threshold 15
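The idea behind tag exclusion can be sketched with Python's standard `html.parser`. This is an illustration of the technique only, not how the script or crawl4ai implements it: the `TagExcluder` class, its `BLOCK_TAGS` set, and `extract_blocks` are hypothetical names, and the sketch assumes excluded tags are container elements with a closing tag.

```python
from html.parser import HTMLParser


class TagExcluder(HTMLParser):
    """Collect text blocks, skipping anything nested inside excluded tags."""

    BLOCK_TAGS = {"p", "div", "li", "h1", "h2", "h3", "section", "article"}

    def __init__(self, excluded_tags):
        super().__init__()
        self.excluded = set(excluded_tags)
        self.skip_depth = 0  # > 0 while inside an excluded element
        self.blocks = []     # finished text blocks
        self.current = []    # text pieces of the block being built

    def flush(self):
        """Close the current block, normalizing whitespace."""
        text = " ".join(" ".join(self.current).split())
        if text:
            self.blocks.append(text)
        self.current = []

    def handle_starttag(self, tag, attrs):
        if tag in self.excluded:
            self.skip_depth += 1
        elif tag in self.BLOCK_TAGS and self.skip_depth == 0:
            self.flush()

    def handle_endtag(self, tag):
        if tag in self.excluded and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag in self.BLOCK_TAGS and self.skip_depth == 0:
            self.flush()

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.current.append(data)


def extract_blocks(html, excluded_tags, word_count_threshold=10):
    """Return text blocks outside excluded tags that have enough words."""
    parser = TagExcluder(excluded_tags)
    parser.feed(html)
    parser.close()
    parser.flush()
    return [b for b in parser.blocks if len(b.split()) >= word_count_threshold]
```

The depth counter is what makes nested markup safe: a paragraph inside an excluded `nav` stays hidden until the matching closing tag is seen.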

Strategy 4: Composite (Multi-Pass)

Combines strategies for high-precision extraction: pruning runs first, then BM25 filters what remains.

When to use:

Research requiring both noise removal and relevance filtering
Long pages with scattered relevant content
Maximum precision extraction

Example:

python ~/.claude/skills/fetch4ai/scripts/fetch4ai.py \
  --url "https://example.com/research-paper" \
  --strategy composite \
  --threshold 0.4 \
  --query "experimental results methodology"
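The multi-pass idea is simply function composition over a list of content blocks. The sketch below shows the shape of such a pipeline. The two passes are deliberately crude stand-ins (a word-count filter in place of pruning and a shared-word test in place of BM25), and every name in it is hypothetical.

```python
def composite(blocks, passes):
    """Run each pass in order; a pass takes a list of blocks and returns a list."""
    for filter_pass in passes:
        blocks = filter_pass(blocks)
        if not blocks:
            break  # nothing left to filter
    return blocks


def drop_short(min_words):
    """Stand-in for the pruning pass: discard blocks with too few words."""
    return lambda blocks: [b for b in blocks if len(b.split()) >= min_words]


def keep_mentioning(query):
    """Stand-in for the BM25 pass: keep blocks sharing a word with the query."""
    terms = set(query.lower().split())
    return lambda blocks: [b for b in blocks if terms & set(b.lower().split())]


# Usage: noise removal first, relevance second.
sample = [
    "Menu",
    "The experimental results show a clear improvement over the baseline.",
    "Sign up for our newsletter to receive weekly product updates.",
]
kept = composite(sample, [drop_short(5), keep_mentioning("experimental results methodology")])
```

Running the cheaper noise-removal pass first means the relevance pass only has to score blocks that already look like real content.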

Output Format

The script returns JSON with:

{
  "success": true,
  "url": "https://example.com/article",
  "title": "Page Title",
  "content": "# Clean markdown content...",
  "links": [
    {"text": "Link Text", "href": "https://..."}
  ],
  "stats": {
    "raw_length": 45000,
    "fit_length": 12000,
    "reduction_percent": 73.3
  },
  "strategy": "pruning",
  "metadata": {
    "fetch_time": "2025-01-04T10:30:00",
    "word_count": 2500
  }
}
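A consumer of this JSON might read it as sketched below. The field names come from the sample above. The `reduction_percent` formula is an assumption inferred from the sample numbers (45000 raw, 12000 fit, 73.3 percent), and `summarize` is a hypothetical helper.

```python
import json

# Trimmed copy of the sample output above.
SAMPLE = """
{
  "success": true,
  "url": "https://example.com/article",
  "title": "Page Title",
  "content": "# Clean markdown content...",
  "stats": {"raw_length": 45000, "fit_length": 12000, "reduction_percent": 73.3},
  "strategy": "pruning"
}
"""


def reduction_percent(raw_length, fit_length):
    """Share of the raw content removed by filtering, rounded to one decimal."""
    if raw_length <= 0:
        return 0.0
    return round((raw_length - fit_length) / raw_length * 100, 1)


def summarize(raw_json):
    """Turn one result document into a single human-readable line."""
    result = json.loads(raw_json)
    if not result.get("success"):
        return f"failed: {result.get('error', 'unknown error')}"
    stats = result["stats"]
    return (
        f"{result['title']}: {stats['raw_length']} -> {stats['fit_length']} chars, "
        f"{stats['reduction_percent']}% removed by {result['strategy']}"
    )
```

Checking `success` before touching `stats` matters because error results, described under Error Handling, carry no `stats` object.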

Advanced Options

Output Format

# JSON with full metadata (default)
--format json

# Plain markdown content only (great for piping)
--format markdown
--format md

Timeout Control

# Default is 30 seconds
--timeout 60  # 60 seconds for slow pages

Include/Exclude Links and Images

# Include links (default: true)
--include-links

# Include image references
--include-images

# Exclude external links (keep only same-domain)
--exclude-external-links

Session Management (Multi-Page)

For crawling multiple pages with shared browser state:

# First page
python ~/.claude/skills/fetch4ai/scripts/fetch4ai.py \
  --url "https://example.com/page1" \
  --session-id "my_session"

# Subsequent pages (shares cookies, state)
python ~/.claude/skills/fetch4ai/scripts/fetch4ai.py \
  --url "https://example.com/page2" \
  --session-id "my_session"

Output to File

python ~/.claude/skills/fetch4ai/scripts/fetch4ai.py \
  --url "https://example.com" \
  --output result.json

Integration with web-research Skill

fetch4ai serves as the fetching layer for the web-research skill:

web-research spawns research subagents
Subagents use fetch4ai to get clean content
Content is saved to findings files
web-research synthesizes all findings

Usage in research workflow:

# In research subagent prompt:
Use fetch4ai to get content from [URL] with BM25 filtering for "[query]".
Save the filtered markdown (the content field of the output) to findings_[topic].md.

Error Handling

The script handles common errors:

Network timeouts (30s default)
Invalid URLs
JavaScript-heavy pages (Playwright handles JS)
Empty content after filtering

Errors return:

{
  "success": false,
  "url": "https://...",
  "error": "Error description",
  "error_type": "timeout|network|parsing|empty_content"
}
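A caller can branch on `error_type` to decide what to do next. The policy below is only one possible sketch: the action names, the doubling of the timeout, and the `max_timeout` cap are assumptions made for illustration, not behavior of the script.

```python
def next_action(result, timeout=30, max_timeout=120):
    """Suggest a follow-up step for one result dict.

    Returns a pair (action, timeout_to_use). The actions are hypothetical
    labels: "done", "retry", "relax_filter", or "give_up".
    """
    if result.get("success"):
        return ("done", timeout)
    error_type = result.get("error_type")
    if error_type == "timeout" and timeout * 2 <= max_timeout:
        # Slow page: try again with twice the time budget.
        return ("retry", timeout * 2)
    if error_type == "empty_content":
        # The filter removed everything: a looser threshold may help.
        return ("relax_filter", timeout)
    # Network and parsing errors, or timeouts past the cap, are not retried here.
    return ("give_up", timeout)
```

For an `empty_content` result, lowering `--threshold` or switching from `bm25` to `pruning` are natural ways to relax the filter.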

Strategy Selection Guide

Scenario	Strategy	Key Parameters
General article	`pruning`	`--threshold 0.48`
Specific topic search	`bm25`	`--query "your terms"`
Blog/news extraction	`tags`	`--excluded-tags "nav,footer,aside"`
Research paper sections	`composite`	`--threshold 0.4 --query "..."`
Documentation pages	`pruning`	`--threshold 0.3` (lower for docs)
Product listings	`tags`	`--word-count-threshold 20`
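The table above can also be kept as a lookup in code so that an agent picks flags by scenario. This is a sketch under assumptions: `GUIDE` and `args_for` are hypothetical names, and requiring a query for `composite` is inferred from its BM25 pass rather than stated in this file.

```python
# Scenario -> (strategy, extra flags), copied from the table above.
GUIDE = {
    "general article": ("pruning", ["--threshold", "0.48"]),
    "specific topic search": ("bm25", []),
    "blog/news extraction": ("tags", ["--excluded-tags", "nav,footer,aside"]),
    "research paper sections": ("composite", ["--threshold", "0.4"]),
    "documentation pages": ("pruning", ["--threshold", "0.3"]),
    "product listings": ("tags", ["--word-count-threshold", "20"]),
}

# Strategies that score against a query (assumption for "composite").
NEEDS_QUERY = {"bm25", "composite"}


def args_for(scenario, url, query=None):
    """Build the script arguments for a scenario name from the guide."""
    strategy, extra = GUIDE[scenario.lower()]
    if strategy in NEEDS_QUERY and not query:
        raise ValueError(f"strategy {strategy!r} needs a query")
    args = ["--url", url, "--strategy", strategy, *extra]
    if strategy in NEEDS_QUERY:
        args += ["--query", query]
    return args
```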

Reference Documentation

For detailed strategy comparisons and advanced patterns:

See references/filtering-strategies.md
