web-to-markdown

Name: Web To Markdown
Author: rookie-ricardo

// Convert a web URL into cleaned Markdown with deterministic routing. Use when Codex needs to read article-like content from links and should apply source-aware fetch strategies: default to r.jina.ai for general pages (including X/Twitter), use defuddle.md for YouTube links, and use browser-impersonated extraction for WeChat/Zhihu/Feishu pages with Mozilla Readability cleanup.


stars: 880

forks: 85

updated: April 6, 2026, 14:48


SKILL.md


name	web-to-markdown
description	Convert a web URL into cleaned Markdown with deterministic routing. Use when Codex needs to read article-like content from links and should apply source-aware fetch strategies: default to r.jina.ai for general pages (including X/Twitter), use defuddle.md for YouTube links, and use browser-impersonated extraction for WeChat/Zhihu/Feishu pages with Mozilla Readability cleanup.

Web To Markdown

Convert URLs into usable Markdown by applying domain-aware fetching routes, then return the cleaned content directly.

Quick Workflow

Normalize and validate the input URL.
Select route:

r.jina.ai: general web + X/Twitter.
defuddle.md: YouTube transcript/content extraction.
special-site fetch (HTTP impersonation first, browser fallback): WeChat/Zhihu/Feishu.

Return the Markdown text (or JSON metadata when --json is passed).

For generic URLs (anything other than YouTube, WeChat, Zhihu, or Feishu), use this fallback chain:

try r.jina.ai first;
if it fails, fall back to a direct HTTP fetch + Readability;
if the direct fetch also fails or returns shell-like content, fall back to browser extraction.
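The fallback chain above can be sketched as an ordered list of stages tried in turn. This is a minimal sketch, not the skill's actual code: the stage functions are hypothetical async fetchers supplied by the caller, and looksLikeShell is an assumed heuristic (very little text, or an "enable JavaScript" placeholder) standing in for the skill's notion of shell-like content. For simplicity it applies the shell check to every stage, not only the direct fetch.

```javascript
// Hypothetical shell-content heuristic: very little text, or a placeholder
// that asks the reader to enable JavaScript.
function looksLikeShell(markdown) {
  const text = (markdown || "").trim();
  if (text.length < 200) return true;
  return /enable javascript|javascript is disabled/i.test(text);
}

// Try each stage in order; a thrown error or shell-like output moves on to
// the next stage. Each stage is { name, fetchMarkdown: async (url) => string }.
async function fetchWithFallback(url, stages) {
  const failures = [];
  for (const { name, fetchMarkdown } of stages) {
    try {
      const markdown = await fetchMarkdown(url);
      if (!looksLikeShell(markdown)) {
        return { source: name, markdown };
      }
      failures.push(`${name}: shell-like content`);
    } catch (err) {
      failures.push(`${name}: ${err.message}`);
    }
  }
  throw new Error(`All stages failed for ${url}: ${failures.join("; ")}`);
}
```

Returning the name of the stage that succeeded makes it easy to fill the source field of the --json output.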

Commands

Run from this skill directory (skills/web-to-markdown):

npm install
node scripts/url_to_markdown.mjs <url>

Return metadata along with the Markdown:

node scripts/url_to_markdown.mjs <url> --json

Force special-site extraction (HTTP impersonation first, browser fallback):

node scripts/fetch_special_sites.mjs <url> --json

Routing Policy

Default route: https://r.jina.ai/<url>.
YouTube (youtube.com, youtu.be): https://defuddle.md/<url>.
X/Twitter (x.com, twitter.com): https://r.jina.ai/<url>.
WeChat/Zhihu/Feishu: run scripts/fetch_special_sites.mjs.
If the input is already proxy-formatted (https://defuddle.md/https://... or https://r.jina.ai/https://...), normalize it back to the original URL and re-apply routing.
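The routing policy can be sketched as a pure function from an input URL to a route. This is a minimal sketch under stated assumptions: the WeChat/Zhihu/Feishu host list (mp.weixin.qq.com, zhihu.com, feishu.cn, larksuite.com) is assumed rather than taken from the skill's scripts, and the strategy name "special-site" is a hypothetical placeholder for whichever of special-http-fetch or special-browser-fetch-fallback ends up succeeding.

```javascript
// Proxy prefixes that are stripped before routing, per the routing policy.
const PROXY_PREFIXES = ["https://r.jina.ai/", "https://defuddle.md/"];
// Assumed host list for the special-site route (not from the skill's source).
const SPECIAL_HOSTS = ["mp.weixin.qq.com", "zhihu.com", "feishu.cn", "larksuite.com"];

// True for the domain itself or any subdomain of it.
function hostMatches(hostname, domain) {
  return hostname === domain || hostname.endsWith("." + domain);
}

function normalizeUrl(input) {
  let raw = input.trim();
  // Strip proxy prefixes (possibly nested) so routing sees the original URL.
  let stripped = true;
  while (stripped) {
    stripped = false;
    for (const prefix of PROXY_PREFIXES) {
      if (raw.startsWith(prefix)) {
        raw = raw.slice(prefix.length);
        stripped = true;
      }
    }
  }
  const url = new URL(raw); // throws TypeError on malformed input
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${url.protocol}`);
  }
  return url.href;
}

function selectRoute(input) {
  const resolvedUrl = normalizeUrl(input);
  const host = new URL(resolvedUrl).hostname.toLowerCase();
  if (["youtube.com", "youtu.be"].some((d) => hostMatches(host, d))) {
    return { strategy: "defuddle", target: "https://defuddle.md/" + resolvedUrl, resolvedUrl };
  }
  if (SPECIAL_HOSTS.some((d) => hostMatches(host, d))) {
    // Handed to the two-stage special-site extractor instead of a proxy.
    return { strategy: "special-site", target: resolvedUrl, resolvedUrl };
  }
  // Default route, which also covers x.com and twitter.com.
  return { strategy: "r-jina", target: "https://r.jina.ai/" + resolvedUrl, resolvedUrl };
}
```

For example, selectRoute("https://r.jina.ai/https://x.com/user/status/1") strips the proxy prefix and routes the bare x.com URL back through r.jina.ai.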

Special-Site Extraction Behavior

Use a two-stage strategy for WeChat/Zhihu/Feishu:

Try cuimp HTTP/TLS impersonation first, then clean the HTML with Mozilla Readability.
If stage 1 fails or returns blocked/shell content, fall back to puppeteer-extra browser impersonation.

The HTTP stage impersonates a modern Chrome TLS/HTTP profile via cuimp.
The browser stage impersonates a modern Chrome user agent and sends standard sec-ch-ua headers.
Remove known login modals and backdrop overlays (best effort).
Scroll the page to trigger lazy-loaded article blocks.
Parse the cleaned document with Mozilla Readability.
Convert the extracted HTML body to Markdown with Turndown.
Resolve the browser executable from CHROME_PATH first, then from system Chrome/Chromium/Edge paths.
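The executable lookup order described above can be sketched as follows. This is a minimal sketch, not the skill's code: the candidate paths are assumed typical install locations, and checking that CHROME_PATH actually exists before using it (rather than trusting it blindly) is a design choice made here. The file-existence check is injected so the function has no I/O of its own.

```javascript
// Assumed common install locations per platform (not the skill's own list).
const CANDIDATES = {
  darwin: [
    "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
    "/Applications/Chromium.app/Contents/MacOS/Chromium",
    "/Applications/Microsoft Edge.app/Contents/MacOS/Microsoft Edge",
  ],
  linux: [
    "/usr/bin/google-chrome",
    "/usr/bin/google-chrome-stable",
    "/usr/bin/chromium",
    "/usr/bin/chromium-browser",
    "/usr/bin/microsoft-edge",
  ],
  win32: [
    "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe",
    "C:\\Program Files (x86)\\Microsoft\\Edge\\Application\\msedge.exe",
  ],
};

// CHROME_PATH wins if it points at an existing file; otherwise the first
// existing candidate for the platform is used. Returns null if none is found.
function resolveBrowserExecutable({ env, platform, exists }) {
  if (env.CHROME_PATH && exists(env.CHROME_PATH)) return env.CHROME_PATH;
  for (const candidate of CANDIDATES[platform] || []) {
    if (exists(candidate)) return candidate;
  }
  return null;
}
```

In a real script, exists could be fs.existsSync from node:fs, with env: process.env and platform: process.platform; a non-null result can then be passed as the executablePath option to puppeteer's launch().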

If special-site extraction fails because of anti-bot checks, account-only pages, or network limits, report the failure clearly and ask the user for fallback input (for example, the raw page text).

Output Contract

By default, output Markdown only.

When --json is passed, return these fields:

source: backend source (r.jina.ai, defuddle, cuimp, browser-readability).
strategy: selected route (r-jina, defuddle, special-http-fetch, special-browser-fetch-fallback).
requestedUrl: original input.
resolvedUrl: normalized/final URL.
markdown: extracted markdown body.
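A caller consuming the --json output can check it against this contract mechanically. This is a minimal sketch, assuming every field is a required non-empty string; the allowed source and strategy values are copied from the contract, so if the scripts emit other values (for example from the generic direct-fetch fallback), the lists would need extending.

```javascript
// Field names and allowed values copied from the output contract.
const REQUIRED_FIELDS = ["source", "strategy", "requestedUrl", "resolvedUrl", "markdown"];
const ALLOWED_SOURCES = ["r.jina.ai", "defuddle", "cuimp", "browser-readability"];
const ALLOWED_STRATEGIES = ["r-jina", "defuddle", "special-http-fetch", "special-browser-fetch-fallback"];

// Returns a list of problems; an empty list means the object matches.
function validateResult(result) {
  if (result === null || typeof result !== "object") {
    return ["result is not an object"];
  }
  const problems = [];
  for (const field of REQUIRED_FIELDS) {
    // Assumption: every field must be a non-empty string.
    if (typeof result[field] !== "string" || result[field].length === 0) {
      problems.push(`missing or empty field: ${field}`);
    }
  }
  if (!ALLOWED_SOURCES.includes(result.source)) {
    problems.push(`unknown source: ${result.source}`);
  }
  if (!ALLOWED_STRATEGIES.includes(result.strategy)) {
    problems.push(`unknown strategy: ${result.strategy}`);
  }
  return problems;
}
```

For example, validateResult(JSON.parse(stdout)) after running scripts/url_to_markdown.mjs with --json.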

Resources

references/routing-and-notes.md: domain routing rules and operational caveats.
scripts/url_to_markdown.mjs: primary entrypoint.
scripts/fetch_special_sites_http.mjs: WeChat/Zhihu/Feishu HTTP impersonation fetcher (cuimp JS).
scripts/fetch_special_sites.mjs: two-stage extractor (HTTP-first, browser-fallback).

related-skills.json

Same repository

translate-polisher.md

from "rookie-ricardo/erduo-skills"

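A high-quality article translation skill that uses a four-step refined translation workflow: "analyze → first draft → review → final draft". It supports only Chinese↔English and Chinese↔Japanese translation. It triggers when the user explicitly says "翻译", "translate", "精翻", "翻訳", "翻译文章", "translate to Chinese/English/Japanese", "改成中文", "改成英文", "改成日文", "翻成中文", "翻成日文", "翻成英文", "英译中", "中译英", "中译日", "日译中", "日本語に翻訳", "中国語に翻訳", "英語に翻訳", "これを翻訳して", "put this in Chinese", "put this in English", "put this in Japanese", "convert to Chinese", "convert to English", "convert to Japanese", "帮我翻一下", "本地化", "localize", or "这篇文章翻译一下", or when the user provides a URL, file, or body text and explicitly asks for a finished draft in the target language. It is not used for requests that only ask for a summary, explanation, comprehension help, or reorganization. If the input is a URL, prefer `curl -L` against `r.jina.ai` to fetch the body as Markdown; if fetching fails or the body is incomplete, stop immediately and ask the user to supply the body text.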

2026-04-06 · 880 stars

transcript-polisher.md

from "rookie-ricardo/erduo-skills"

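Polishes speech transcripts (interviews, talks, podcasts, meetings) into more readable article paragraphs. It triggers when the user mentions "字幕精修", "transcript polish", "润色字幕", "把视频字幕整理成文章", or "访谈文字整理"; when processing interview records, optimizing transcripts, or cleaning up speech-to-text output; or when long stretches of dialogue or speech need to be turned into a readable article. It suits transcripts of single-speaker talks or multi-person conversations, requires preserving the original sentences and wording, and avoids heavy summarization. The skill should trigger even if the user only says "帮我整理一下这段文字" ("help me tidy up this text") and attaches obviously colloquial text.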

2026-03-18 · 880 stars

ak-rss-digest.md

from "rookie-ricardo/erduo-skills"

Curate a Chinese reading digest from a fixed bundle of RSS and Atom feeds, with a strong preference for AI agent thinking, frontier AI commentary, deep interviews, and non-boring high-signal essays. Use when Codex needs to pull the latest week's posts by default, or a specific day's posts when explicitly requested, summarize them, score each article on a 10-point scale, and output only the posts scoring above 7 in a concise Chinese daily-brief style.

2026-03-16 · 880 stars

gemini-watermark-remover.md

from "rookie-ricardo/erduo-skills"

Remove the visible Gemini AI watermark from images using reverse alpha blending. Use when asked to strip Gemini watermarks, batch-process Gemini images, or build/modify a CLI script that removes the bottom-right Gemini watermark without HTML or server-side components.

2026-02-02 · 880 stars

daily-news-report.md

from "rookie-ricardo/erduo-skills"

Fetches content from a preset list of URLs, filters for high-quality technical information, and generates a daily Markdown report.

2026-01-20 · 880 stars

package.json

"author": "rookie-ricardo"

"repository": "rookie-ricardo/erduo-skills"


Useful for: Software Developers (Computer and Mathematical Occupations), SOC 15-1252, L4
