content-extractor

A unified hub for multi-platform content scraping and extraction, supporting Xiaoyuzhou podcasts, Douyin, WeChat Official Accounts, Bilibili, Xiaohongshu, and other platforms.

Use when:
- Extracting podcast, short-video, or article content
- Batch downloading media files
- Aggregating content from multiple platforms
- Generating document materials
- Archiving and backing up content
- Monitoring social media

Cross-references: document-hub, pdf, image-ocr, wechat-article-fetcher, twitter-scraper, md-to-wechat

Part of UniqueClub toolkit. Learn more: https://uniqueclub.ai

Install command

npx skills add https://github.com/wulaosiji/skills --skill content-extractor

Copy and paste this command into Claude Code to install the skill

Source

wulaosiji/skills

Stars: 27

Forks: 13

Updated: April 13, 2026 at 16:23

Useful for (SOC): Software Developers, Computer and Mathematical Occupations, 15-1252, L4

name

content-extractor

description

🚀 Migrated to wulaosiji/founder-skills as market-research-extractor.

This version is kept for backward compatibility. For the latest updates, use the founder-skills version.

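Content Extractor - Unified Content Extraction Hub

Supports content scraping across multiple platforms: Xiaoyuzhou podcasts, Douyin, WeChat Official Accounts, Bilibili, Xiaohongshu, and more.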

When to Use

Use This Skill When

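You need to extract content from Xiaoyuzhou, Douyin, WeChat Official Accounts, Bilibili, Xiaohongshu, or similar platforms
You want to batch download audio, video, or articles for archiving
You want to convert social media content into an editable document format
You are collecting material from multiple platforms for a report or research
You need direct download links for media files
You want to monitor and scrape publicly available social media content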

Do NOT Use This Skill If

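You need to access private or protected content (which requires a logged-in session)
The scraping frequency would be high enough to trigger a platform's anti-scraping measures
The content is subject to copyright restrictions or sits behind a paywall
You need to scrape large volumes of data in real time (rate limits apply)
The target platform requires special authentication (such as an enterprise account)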

Typical Trigger Phrases

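Chinese:

"帮我提取小宇宙播客内容" (Help me extract Xiaoyuzhou podcast content)
"下载这个抖音视频" (Download this Douyin video)
"抓取公众号文章" (Scrape a WeChat Official Account article)
"批量获取B站视频信息" (Batch fetch Bilibili video information)
"提取小红书笔记" (Extract a Xiaohongshu note)
"多平台内容汇总" (Aggregate content from multiple platforms)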

English:

"Extract podcast content"
"Download video from URL"
"Scrape WeChat article"
"Batch extract social media"
"Get media download links"
"Aggregate content from multiple platforms"

Workflow

Step 1: URL Analysis and Platform Detection

from skills.content_extractor.content_extractor import detect_platform

platform = detect_platform("https://www.xiaoyuzhoufm.com/episode/xxx")
# Returns: Platform.XIAOYUZHOU, Platform.DOUYIN, etc.
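
To make the detection step concrete, here is a minimal sketch of how a URL could be mapped to a platform by hostname. It is not the skill's actual implementation: the `Platform` enum values, the `HOST_SUFFIXES` table, and the function name `detect_platform_sketch` are all assumptions made for illustration.

```python
from enum import Enum
from typing import Optional
from urllib.parse import urlparse


class Platform(Enum):
    """Hypothetical stand-in for the skill's Platform enum."""
    XIAOYUZHOU = "xiaoyuzhou"
    DOUYIN = "douyin"
    WECHAT = "wechat"
    BILIBILI = "bilibili"
    XIAOHONGSHU = "xiaohongshu"


# Assumed hostname suffixes; the real skill may match URLs differently.
HOST_SUFFIXES = {
    "xiaoyuzhoufm.com": Platform.XIAOYUZHOU,
    "douyin.com": Platform.DOUYIN,
    "mp.weixin.qq.com": Platform.WECHAT,
    "bilibili.com": Platform.BILIBILI,
    "xiaohongshu.com": Platform.XIAOHONGSHU,
}


def detect_platform_sketch(url: str) -> Optional[Platform]:
    """Return the platform whose hostname suffix matches the URL, or None."""
    host = (urlparse(url).hostname or "").lower()
    for suffix, platform in HOST_SUFFIXES.items():
        # Match the exact host or any subdomain, but not lookalike domains.
        if host == suffix or host.endswith("." + suffix):
            return platform
    return None
```

Matching on the parsed hostname rather than on a substring of the whole URL avoids false positives when a platform name appears in a path or query string.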

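Step 2: Choose an Extraction Mode

Mode	Speed	Completeness	Use case
Fast mode (extract/extract_fast)	3-5 seconds	⭐⭐⭐	Only the title and media URLs are needed
Full mode (extract_full)	10-30 seconds	⭐⭐⭐⭐⭐	Detailed description and metadata are needed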

Step 3: Run the Extraction

from skills.content_extractor.content_extractor import extract, extract_full

# Fast mode - recommended for audio downloads
result = extract("https://www.xiaoyuzhoufm.com/episode/xxx")

# Full mode - recommended for content analysis
result = extract_full("https://mp.weixin.qq.com/s/xxx")
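
One way to combine the two modes is to try the fast path first and fall back to the full path only when the text content comes back empty. The sketch below is an assumption about how a caller might do this, not something the skill provides: `extract_with_fallback` is a hypothetical helper, and the two stub functions stand in for `extract` and `extract_full` so the example runs without the skill installed.

```python
from types import SimpleNamespace
from typing import Any, Callable


def extract_with_fallback(
    url: str,
    fast: Callable[[str], Any],
    full: Callable[[str], Any],
) -> Any:
    """Run the fast extractor; fall back to the full one if text is missing."""
    result = fast(url)
    if getattr(result, "content", ""):
        return result
    return full(url)


def fast_stub(url: str) -> SimpleNamespace:
    """Stand-in for extract(): returns a title but no text content."""
    return SimpleNamespace(title="Episode", content="")


def full_stub(url: str) -> SimpleNamespace:
    """Stand-in for extract_full(): returns a title and a description."""
    return SimpleNamespace(title="Episode", content="Full description")


result = extract_with_fallback("https://example.com/x", fast_stub, full_stub)
```

In real use the two callables would be the skill's `extract` and `extract_full`; passing them as parameters keeps the slower browser-based path off the common case.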

Step 4: Process the Result

print(result.title)           # Title
print(result.media_urls)      # Media download links
print(result.content)         # Text content
print(result.author)          # Author
print(result.metadata)        # Full metadata

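Step 5: Batch Processing (Optional)

from skills.content_extractor.content_extractor import batch_extract

# Placeholder URLs; replace them with the links you want to extract
urls = [
    "https://www.xiaoyuzhoufm.com/episode/xxx",
    "https://mp.weixin.qq.com/s/xxx",
]
results = batch_extract(urls, download_media=False)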

Guardrails

Anti-Patterns

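❌ Scraping the same platform too frequently (may trigger anti-scraping measures)
❌ Scraping paid or copyright-protected content
❌ Using downloaded content for commercial purposes
❌ Leaving extraction failures unhandled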

Limitations

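Links on some platforms are time-limited and cannot be extracted after they expire
Full mode requires a Playwright environment
Image and video downloads are not handled (only URLs are returned)
First use requires installing browser dependencies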
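
Because image and video URLs are returned rather than downloaded, a caller has to fetch the files separately. The following is a minimal sketch using only the Python standard library; `filename_from_url` and `save_media` are hypothetical helpers, not part of the skill, and some platforms may additionally require headers or cookies that this sketch does not send.

```python
import shutil
import urllib.request
from pathlib import Path, PurePosixPath
from urllib.parse import unquote, urlparse


def filename_from_url(url: str, default: str = "download.bin") -> str:
    """Derive a local file name from the URL path, ignoring the query string."""
    name = PurePosixPath(unquote(urlparse(url).path)).name
    return name or default


def save_media(url: str, save_dir: str = "./downloads", timeout: float = 30.0) -> Path:
    """Stream one media URL to disk and return the path that was written."""
    target_dir = Path(save_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename_from_url(url)
    with urllib.request.urlopen(url, timeout=timeout) as response, open(target, "wb") as out:
        # Copy in chunks so large files are never held fully in memory.
        shutil.copyfileobj(response, out)
    return target
```

Since some links are time-limited, it is safer to download soon after extraction than to store the URLs for later.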

Safety Rules

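Copyright compliance: downloaded content is for personal study only
Anti-scraping courtesy: add an appropriate delay during batch extraction
Error handling: always catch the ExtractError exception
Privacy protection: do not scrape users' private content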
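
The delay and error-handling rules can be applied together in a small wrapper. This is a sketch under stated assumptions: `polite_batch` is a hypothetical helper, the 2-second default delay is an arbitrary choice, and the local `ExtractError` class is a stand-in for the skill's exception so the example runs on its own.

```python
import time
from typing import Any, Callable, Dict, Iterable, List, Tuple


class ExtractError(Exception):
    """Stand-in for the skill's ExtractError so the sketch runs alone."""


def polite_batch(
    urls: Iterable[str],
    extract_fn: Callable[[str], Any],
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[List[Any], Dict[str, str]]:
    """Extract URLs one by one, pausing between requests and recording failures."""
    results: List[Any] = []
    failures: Dict[str, str] = {}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # pause before every request except the first
        try:
            results.append(extract_fn(url))
        except ExtractError as exc:
            failures[url] = str(exc)  # keep going; report the failure afterwards
    return results, failures
```

In real use, `extract_fn` would be the skill's `extract` and the caught exception would be the `ExtractError` imported from the skill. Returning failures separately lets one bad link fail without discarding the rest of the batch.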

Installation

pip install playwright requests
playwright install chromium

Core Features

1. Single-URL Extraction

Fast mode (recommended)

from skills.content_extractor.content_extractor import extract, extract_fast

# Fast extraction (default): fetches the audio URL directly with curl, the fastest option
result = extract("https://www.xiaoyuzhoufm.com/episode/xxx")
print(result.title)           # Episode title
print(result.media_urls)      # Audio download links
print(result.author)          # Podcast name

# Fast extraction plus audio download
result = extract_fast(
    "https://www.xiaoyuzhoufm.com/episode/xxx",
    download=True,
    save_path="./downloads"
)

Full mode (requires Playwright)

from skills.content_extractor.content_extractor import extract_full

# Full extraction: renders the page in a browser to obtain more complete information
result = extract_full("https://www.xiaoyuzhoufm.com/episode/xxx")
print(result.content)         # Detailed description
print(result.metadata)        # Full metadata

2. Batch Extraction

from skills.content_extractor.content_extractor import batch_extract

urls = [
    "https://www.xiaoyuzhoufm.com/episode/xxx",
    "https://mp.weixin.qq.com/s/xxx",
    "https://www.bilibili.com/video/xxx",
]

results = batch_extract(urls, download_media=False)
for result in results:
    print(f"{result.platform.value}: {result.title}")

3. Platform Detection

from skills.content_extractor.content_extractor import detect_platform

platform = detect_platform("https://www.xiaoyuzhoufm.com/episode/xxx")
print(platform)  # Platform.XIAOYUZHOU

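Supported Platforms

Platform	Content type	Audio	Video	Text	Images
Xiaoyuzhou	Podcasts	✅	-	✅	-
Douyin	Short videos	✅	✅	-	-
WeChat Official Accounts	Articles	-	✅	✅	✅
Bilibili	Videos	✅	✅	✅	-
Xiaohongshu	Notes	-	✅	✅	✅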

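ExtractResult Data Structure

from dataclasses import dataclass
from typing import Any, Dict, List, Optional

@dataclass
class ExtractResult:
    platform: Platform          # Platform type
    title: str                  # Title
    content: str                # Content or description
    author: Optional[str]       # Author
    publish_time: Optional[str] # Publish time
    media_urls: List[str]       # List of audio/video URLs
    images: List[str]           # List of image URLs
    metadata: Dict[str, Any]    # Metadata (including the original URL, etc.)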
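
For archiving, a result with this shape can be serialized to JSON. The sketch below mirrors the fields listed above in a standalone class; the name `ExtractResultSketch`, the default values, the `to_json` method, and the single-member `Platform` enum are all assumptions added for illustration and are not confirmed parts of the skill.

```python
import json
from dataclasses import asdict, dataclass, field
from enum import Enum
from typing import Any, Dict, List, Optional


class Platform(Enum):
    """Hypothetical stand-in for the skill's Platform enum."""
    XIAOYUZHOU = "xiaoyuzhou"


@dataclass
class ExtractResultSketch:
    """Same fields as ExtractResult, with assumed defaults for optional data."""
    platform: Platform
    title: str
    content: str = ""
    author: Optional[str] = None
    publish_time: Optional[str] = None
    media_urls: List[str] = field(default_factory=list)
    images: List[str] = field(default_factory=list)
    metadata: Dict[str, Any] = field(default_factory=dict)

    def to_json(self) -> str:
        """Serialize to JSON, storing the enum by its string value."""
        data = asdict(self)
        data["platform"] = self.platform.value
        return json.dumps(data, ensure_ascii=False)


record = ExtractResultSketch(platform=Platform.XIAOYUZHOU, title="Episode 1")
archived = record.to_json()
```

Using `default_factory` for the list and dict fields avoids sharing one mutable default between instances, and `ensure_ascii=False` keeps Chinese titles readable in the archived file.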

Options Parameters

Parameter	Type	Default	Description
`download_media`	bool	False	Whether to download audio and video
`save_path`	str	None	Path where downloads are saved
`extract_text`	bool	True	Whether to extract text content

Error Handling

from skills.content_extractor.content_extractor import ExtractError

try:
    result = extract("https://invalid-url.com")
except ExtractError as e:
    print(f"Extraction failed: {e}")
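
Since some failures are transient, catching the error can be paired with a bounded retry. This is a minimal sketch of retry with exponential backoff, not a feature of the skill: `extract_with_retry`, its parameters, and the local `ExtractError` stand-in are assumptions, and whether a given failure is worth retrying depends on the platform.

```python
import time
from typing import Any, Callable


class ExtractError(Exception):
    """Stand-in for the skill's ExtractError so the sketch runs alone."""


def extract_with_retry(
    url: str,
    extract_fn: Callable[[str], Any],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call extract_fn up to `attempts` times, doubling the wait after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return extract_fn(url)
        except ExtractError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * 2 ** attempt)
```

An invalid URL will fail on every attempt, so the final `ExtractError` still reaches the caller and should be handled as in the example above.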

Related Skills

Skill	Relationship	Use Case
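document-hub	Downstream processing	Generate Word/Excel documents from extracted content
pdf	Downstream processing	Convert content to PDF
image-ocr	Auxiliary recognition	Extract text from images
wechat-article-fetcher	Dedicated alternative	Specialized fetching of WeChat Official Account articles
twitter-scraper	Platform extension	Scrape Twitter/X content
md-to-wechat	Output conversion	Convert extracted content to WeChat Official Account format
long-form-writer	Content processing	Expand extracted material into long-form articles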

Workflow Integration Examples

Workflow 1: Podcast Extraction → Document Generation

from skills.content_extractor.content_extractor import extract
from skills.document_hub.document_hub import write

# Extract the podcast episode
result = extract("https://www.xiaoyuzhoufm.com/episode/xxx")

# Generate a Word document
content = {
    "title": result.title,
    "paragraphs": [
        "Source: Xiaoyuzhou podcast",
        f"Author: {result.author}",
        "",
        "Description:",
        result.content
    ]
}
write("podcast_notes.docx", content)

Workflow 2: Multi-Platform Content Summary → Excel

from skills.content_extractor.content_extractor import batch_extract
from skills.document_hub.document_hub import write  # needed for write() below

urls = [
    "https://www.xiaoyuzhoufm.com/episode/xxx",
    "https://mp.weixin.qq.com/s/xxx",
    "https://www.bilibili.com/video/xxx",
]

results = batch_extract(urls)

# Collect one row per result for the Excel sheet
excel_data = []
for result in results:
    excel_data.append({
        "Platform": result.platform.value,
        "Title": result.title,
        "Author": result.author,
    })

write("content_summary.xlsx", {"sheets": {"Content Summary": {"data": excel_data}}})
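
When document-hub is not available, the same rows can be written as CSV with the Python standard library. This sketch is an alternative to the Excel step, not a replacement the skill prescribes; `rows_to_csv` is a hypothetical helper and the sample row is made-up data.

```python
import csv
import io
from typing import Dict, List, Optional


def rows_to_csv(rows: List[Dict[str, Optional[str]]], fieldnames: List[str]) -> str:
    """Render a list of row dictionaries as CSV text with a header line."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)  # None values are written as empty cells
    return buffer.getvalue()


sample_rows = [{"platform": "xiaoyuzhou", "title": "Episode 1", "author": "Host"}]
csv_text = rows_to_csv(sample_rows, ["platform", "title", "author"])
```

The returned text can be saved with `open(path, "w", newline="", encoding="utf-8-sig")` so that spreadsheet applications detect the encoding of Chinese titles correctly.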

Extending New Platforms

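# Add a new extraction method to the ContentExtractor class
def _extract_new_platform(self, url: str, **options) -> ExtractResult:
    # Implement the extraction logic here
    return ExtractResult(
        platform=Platform.NEW_PLATFORM,
        title="...",
        content="...",
        author=None,          # fields without defaults must be passed explicitly
        publish_time=None,
        media_urls=[],
        images=[],
        metadata={"url": url}
    )

# Register it in __init__
self.extractors[Platform.NEW_PLATFORM] = self._extract_new_platform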
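
The registration step relies on a dictionary that maps each platform to its extraction method. The sketch below shows that dispatch pattern in isolation. The class `ExtractorRegistry`, its `extract` method, and the dictionary return value are simplifications assumed for illustration; the real `ContentExtractor` returns `ExtractResult` objects and may dispatch differently.

```python
from enum import Enum
from typing import Any, Callable, Dict


class Platform(Enum):
    """Hypothetical stand-in for the skill's Platform enum."""
    NEW_PLATFORM = "new_platform"
    UNREGISTERED = "unregistered"


class ExtractorRegistry:
    """Minimal illustration of platform-to-method dispatch."""

    def __init__(self) -> None:
        # Each platform maps to the bound method that handles it.
        self.extractors: Dict[Platform, Callable[..., Dict[str, Any]]] = {
            Platform.NEW_PLATFORM: self._extract_new_platform,
        }

    def _extract_new_platform(self, url: str, **options: Any) -> Dict[str, Any]:
        """Placeholder extractor; a real one would fetch and parse the page."""
        return {"platform": Platform.NEW_PLATFORM.value, "metadata": {"url": url}}

    def extract(self, platform: Platform, url: str, **options: Any) -> Dict[str, Any]:
        """Look up the handler for the platform and delegate to it."""
        handler = self.extractors.get(platform)
        if handler is None:
            raise ValueError(f"No extractor registered for {platform.name}")
        return handler(url, **options)


registry = ExtractorRegistry()
outcome = registry.extract(Platform.NEW_PLATFORM, "https://example.com/post/1")
```

With this layout, supporting another platform means adding one method and one dictionary entry, without touching the dispatch code.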

About UniqueClub

Part of the UniqueClub toolkit - a collection of skills for AI-powered content creation and automation.