
medium-fetch

Name: Medium Fetch
Author: beatai-org


stars: 4,675 · forks: 256 · updated: 2026-05-28 07:52


Repository: beatai-org/beatai


name: medium-fetch
description: Use this skill when the user wants to download / 抓取 / 下载 a Medium article (including paid member-only stories) by URL. Trigger on requests like "抓取这篇 Medium 文章 <url>", "下载 medium 文章", "fetch this Medium article", "把这个 Medium 链接拉下来", or any URL pointing to medium.com / *.medium.com / publication custom domains (levelup.gitconnected.com, towardsdatascience.com, betterprogramming.pub, uxdesign.cc, hackernoon.com, etc.). Uses a persistent real-Chrome profile for member authentication, automatically routes custom-domain URLs through Medium's cross-domain SSO bridge, and outputs Readability-cleaned Markdown + assets to RAW_DIR (caller-provided; required, no default). Does NOT translate, summarize, or decide where output goes; those are orchestrator concerns.
version: 2.0.0

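Medium Member Article Downloader

Downloads Medium articles (including those behind the paid member wall) through real Chrome with a persistent user session, and produces cleaned Markdown plus assets.

This is a leaf executor: it only executes a single download (Cloudflare bypass, SSO bridge, Readability cleanup, image localization) and does not decide where output goes. The caller must pass RAW_DIR as an environment variable (an absolute path); if it is unset, the script exits with an error. This is a hard contract with the orchestration layer.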

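When to use

The user gives a Medium article URL and asks to "download", "fetch", or "pull" it
The URL points to medium.com / *.medium.com / a well-known publication custom domain (levelup.gitconnected.com, towardsdatascience.com, betterprogramming.pub, uxdesign.cc, hackernoon.com, etc.)
Other skills / workflows will handle the output downstream (translation, ingestion)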

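When not to use

Non-Medium content (use a general-purpose HTML downloader)
The user has no paid membership and the article is member-only (the script runs, but only retrieves the paywall preview)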

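Prerequisites

Google Chrome installed on the system (the script launches it via channel: 'chrome' instead of Playwright's bundled Chromium)
Node.js ≥ 18
Dependencies installed with npm install (run once before first use)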

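Core mechanisms (why it works this way)

1. Evading Cloudflare detection

Medium uses Cloudflare to block automation. Four key points:

Use the system's real Chrome (channel: 'chrome'). Cloudflare recognizes the fingerprint of Playwright's bundled Chromium.
Run headful, with the window placed off-screen (--window-position=-3000,-3000). Pure headless mode is blocked by Cloudflare even with real Chrome, because differences in the browser's rendering pipeline are one of the signals it checks.
Use a persistent profile (chrome-profile/) to keep the Cloudflare clearance cookie and the Medium session.
Remove the --enable-automation flag and inject a script that hides navigator.webdriver.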
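The four points above map naturally onto Playwright's persistent-context API. The following is a minimal sketch assuming Playwright is installed; the helper names `buildLaunchOptions` and `launchStealthContext` are hypothetical, and fetch.js may assemble its options differently. The option names (`channel`, `headless`, `args`, `ignoreDefaultArgs`) and `addInitScript` are standard Playwright APIs.

```javascript
const path = require("node:path");

// Options for chromium.launchPersistentContext(userDataDir, options).
function buildLaunchOptions() {
  return {
    channel: "chrome",                          // system Chrome, not bundled Chromium
    headless: false,                            // headful: pure headless gets challenged
    args: ["--window-position=-3000,-3000"],    // keep the window off-screen
    ignoreDefaultArgs: ["--enable-automation"], // drop the automation flag
  };
}

// Injected into every page before the site's own scripts run.
const HIDE_WEBDRIVER =
  'Object.defineProperty(navigator, "webdriver", { get: () => undefined });';

async function launchStealthContext(profileDir) {
  // Lazy require: the option builder stays usable without Playwright installed.
  const { chromium } = require("playwright");
  const context = await chromium.launchPersistentContext(
    path.resolve(profileDir), // persistent profile keeps cookies between runs
    buildLaunchOptions(),
  );
  await context.addInitScript(HIDE_WEBDRIVER);
  return context;
}
```

Usage, assuming the profile sits next to the script: `const context = await launchStealthContext("chrome-profile");`. Reusing the same directory on every run is what preserves the Cloudflare clearance cookie and the Medium session.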

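2. Cross-domain SSO bridging (critical)

Medium publications often use custom domains (such as levelup.gitconnected.com), but your member cookie lives only on the .medium.com domain. If you open a custom-domain URL directly, Medium treats you as a logged-out visitor and returns the paywall preview or a 500 error.

Solution: on a custom domain, automatically wrap the URL in the SSO bridge:

https://medium.com/m/global-identity-2?redirectUrl=<target URL, URL-encoded>

This endpoint reads the sid cookie on .medium.com, issues a one-time token, 302-redirects to the target URL, and sets the cookie for the target domain along the redirect chain. The script adds this wrapper automatically when the hostname does not end with .medium.com.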
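A minimal sketch of the wrapping rule, with `toFetchUrl` as a hypothetical helper name. One assumption goes beyond the text: the bare host medium.com is treated like *.medium.com and left unwrapped, since it also carries the member cookie.

```javascript
const BRIDGE = "https://medium.com/m/global-identity-2?redirectUrl=";

// Return the URL to navigate to: unchanged on Medium's own domain,
// wrapped in the SSO bridge for publication custom domains.
function toFetchUrl(articleUrl) {
  const { hostname } = new URL(articleUrl);
  const onMediumDomain =
    hostname === "medium.com" || hostname.endsWith(".medium.com");
  return onMediumDomain ? articleUrl : BRIDGE + encodeURIComponent(articleUrl);
}
```

The target URL is passed through `encodeURIComponent` so that its own `?` and `&` do not leak into the bridge's query string.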

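3. Waiting for SPA rendering

Medium is a single-page application (SPA), and the article body is rendered asynchronously by JavaScript. The script uses waitUntil: 'networkidle', then waitForSelector('article'), then a 1.5 s buffer, so the DOM is stable before page.content() is called.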
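The wait sequence can be sketched as one async function over a Playwright `Page`. The 30-second selector timeout is an assumed value; the text only specifies `networkidle`, the `article` selector, and the 1.5 s buffer. Following the troubleshooting table, a selector timeout is logged and the function continues rather than failing. `renderArticle` is a hypothetical name.

```javascript
async function renderArticle(page, url, { selectorTimeoutMs = 30000, bufferMs = 1500 } = {}) {
  // 1. Load the page and wait until network activity has settled.
  await page.goto(url, { waitUntil: "networkidle" });
  // 2. Wait for the article container; a timeout is logged, not fatal.
  try {
    await page.waitForSelector("article", { timeout: selectorTimeoutMs });
  } catch (err) {
    if (!err || err.name !== "TimeoutError") throw err;
    console.warn(`<article> not found within ${selectorTimeoutMs} ms; continuing`);
  }
  // 3. Fixed buffer for late DOM updates, then snapshot the rendered HTML.
  await page.waitForTimeout(bufferMs);
  return page.content();
}
```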

Workflow

Step 1: One-time setup (first run only, or after the session expires)

```bash
cd /Users/sunfei/development/beatai/.claude/skills/medium-fetch/scripts
npm install
node login.js
```

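node login.js opens a real Chrome window. The user then completes these steps manually:

If a Cloudflare human-verification challenge appears, pass it manually
Log in with a paid Medium account (make sure https://medium.com/me/membership shows Active)
When the home feed appears, return to the terminal and press Enter; the session is saved to chrome-profile/

Signs of an expired session: the fetch output contains ⚠ 检测到付费墙提示 ("paywall prompt detected") or ⚠ Cloudflare 拦截 ("blocked by Cloudflare"). In that case, rerun node login.js. In practice, a Medium session stays valid for several months.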

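Step 2: Fetch an article (unattended)

RAW_DIR is required: the caller must pass the destination root for the .md file and assets (an absolute path) as an environment variable: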

```bash
RAW_DIR=/abs/path/to/raw node fetch.js <url>
```

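If RAW_DIR is unset, the script exits with an error. This is the leaf executor's hard contract: the script does not decide on the orchestrator's behalf where files end up.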
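A minimal sketch of the RAW_DIR check, with `requireRawDir` as a hypothetical name. Rejecting relative paths is an assumption drawn from the "absolute path" requirement; the text only guarantees that an unset RAW_DIR is an error.

```javascript
const path = require("node:path");

// Return RAW_DIR or throw; the leaf never invents a default location.
function requireRawDir(env = process.env) {
  const rawDir = env.RAW_DIR;
  if (!rawDir) {
    throw new Error("RAW_DIR is not set: the caller must pass an absolute output root");
  }
  if (!path.isAbsolute(rawDir)) {
    // Assumption: relative paths are refused rather than resolved against cwd.
    throw new Error(`RAW_DIR must be an absolute path, got: ${rawDir}`);
  }
  return rawDir;
}
```

A CLI entry point would catch the error, print `err.message`, and call `process.exit(1)`, which gives the "exits with an error" behavior described above.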

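Output (split across two roots):

Intermediate area <articles> = .claude/skills/medium-fetch/scripts/articles/ (local to the script, gitignored, and by default cleared automatically once Markdown conversion succeeds)

<articles>/<slug>/<slug>.html: the full HTML rendered by Playwright, used only as input to Readability/Turndown. As soon as the .md is written, the file is unlinked and its subdirectory removed with rmdir. It is kept only if Readability fails to extract the article body, so it can be inspected manually.

Workspace <RAW_DIR> = provided by the caller (required). material-pipeline's run.js passes .claude/skills/material-pipeline/materials/raw/; other callers choose their own.

<RAW_DIR>/<slug>/<slug>.md: the Markdown body cleaned with Readability + Turndown, beginning with a YAML frontmatter metadata block (title / author / url / fetched / lang, plus optional tags and excerpt), followed by a # <title> H1 heading and the body; image paths are rewritten to local relative paths. The site renderer (via gray-matter) recognizes the frontmatter as metadata and hides it, so it is never shown as body text, and frontmatter.title is used as the page title.
<RAW_DIR>/<slug>/assets/<NN>.<ext>: local copies of every remote image in the article (numbered in order of appearance: 01.png, 02.jpg, …)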
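The image-localization step can be sketched as a pure rewrite over the Turndown output, leaving the actual downloads to the caller. Assumptions not stated in the text: images appear as `![alt](https://...)`, the extension comes from the URL path with `png` as a fallback, and a repeated URL reuses its first file. `localizeImages` and `extensionOf` are hypothetical names.

```javascript
// Take the extension from the URL path; fall back to "png" (assumption).
function extensionOf(url) {
  const match = new URL(url).pathname.match(/\.([a-z0-9]+)$/i);
  return match ? match[1].toLowerCase() : "png";
}

// Rewrite remote image refs to ./assets/NN.ext, numbered by first appearance.
function localizeImages(markdown) {
  const fileByUrl = new Map(); // a repeated URL reuses its first local file
  const downloads = [];        // what still has to be fetched into assets/
  const rewritten = markdown.replace(
    /!\[([^\]]*)\]\((https?:\/\/[^)\s]+)\)/g,
    (_match, alt, url) => {
      if (!fileByUrl.has(url)) {
        const file = `${String(fileByUrl.size + 1).padStart(2, "0")}.${extensionOf(url)}`;
        fileByUrl.set(url, file);
        downloads.push({ url, file });
      }
      return `![${alt}](./assets/${fileByUrl.get(url)})`;
    },
  );
  return { markdown: rewritten, downloads };
}
```

Keeping the rewrite separate from the network step means the numbering is deterministic for a given article, whatever order the downloads finish in.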

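Why split them: the .md references images as ./assets/NN.ext, so it must live in the same directory as assets/ to work offline. The HTML is large (roughly several MB per article) but only an intermediate artifact; once conversion succeeds there is no reason to keep it, so it is deleted by default to save disk space.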
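The keep-or-delete rule for the intermediate HTML, as a sketch using Node's synchronous `fs` API. `cleanupIntermediate` is a hypothetical name, and the caller is assumed to know whether Readability extracted a body.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Delete <articles>/<slug>/<slug>.html and its directory after a successful
// conversion; keep both when Readability failed so they can be inspected.
function cleanupIntermediate(htmlPath, readabilityOk) {
  if (!readabilityOk) {
    return false; // keep the HTML for manual debugging
  }
  fs.unlinkSync(htmlPath);
  fs.rmdirSync(path.dirname(htmlPath)); // throws if the slug dir is not empty
  return true;
}
```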

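Each article gets its own subdirectory under <RAW_DIR>. The Markdown file and its assets are self-contained: the directory can be copied as a whole and browsed offline without broken links. Metadata is carried in the frontmatter embedded in the .md; there is no separate misc.json.

<slug> = the last URL path segment with Medium's -<hex_id> suffix removed. For example, the-4-lines-every-claude-md-needs-2717a46866f6 → the-4-lines-every-claude-md-needs.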
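A sketch of slug derivation. The exact shape of Medium's id suffix is an assumption here: a hyphen followed by 8 to 12 lowercase hex characters at the end of the last path segment (the example id `2717a46866f6` has 12). `slugFromUrl` is a hypothetical name.

```javascript
// Last non-empty path segment, minus a trailing "-<hex id>" (assumed 8-12 hex chars).
function slugFromUrl(articleUrl) {
  const segments = new URL(articleUrl).pathname.split("/").filter(Boolean);
  const last = segments[segments.length - 1] || "";
  return last.replace(/-[0-9a-f]{8,12}$/, "");
}
```

Because it works on `pathname`, query strings such as `?source=rss` and trailing slashes do not affect the result.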

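Environment variables

| Variable | Required | Meaning | Default |
| --- | --- | --- | --- |
| `RAW_DIR` | Yes | Output root for .md + assets (absolute path) | None; exits with an error if unset |
| `MEDIUM_FETCH_HOME` | No | Root for the script's internal state (chrome-profile / default articles) | The script's own directory |
| `ARTICLES_DIR` | No | Archive root for the intermediate HTML | `$MEDIUM_FETCH_HOME/articles` |
| `ALLOW_TAGS` | No | Tag allowlist (a JSON array string). When set, only fetched tags that also appear in the list are kept (case-insensitive); when unset, all tags are kept. Same pattern as `RAW_DIR`: the caller (e.g. material-pipeline) decides and passes it in, and the leaf only executes | None; no filtering |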

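RAW_DIR is the output location and therefore an orchestrator decision; MEDIUM_FETCH_HOME / ARTICLES_DIR are locations for the script's internal state, for which the leaf provides sensible defaults.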

```bash
# Standard invocation from material-pipeline
RAW_DIR=/abs/path/to/raw node fetch.js <url>

# Also relocate the internal state (rarely needed)
RAW_DIR=/abs/raw MEDIUM_FETCH_HOME=/some/where node fetch.js <url>
```
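The default resolution for the internal-state locations can be sketched as below. `resolveStateDirs` is a hypothetical name, and `scriptDir` stands in for the script's own directory (for example `__dirname` inside fetch.js).

```javascript
const path = require("node:path");

// Internal state has defaults; only RAW_DIR must come from the caller.
function resolveStateDirs(env, scriptDir) {
  const home = env.MEDIUM_FETCH_HOME || scriptDir;
  return {
    home,
    profileDir: path.join(home, "chrome-profile"),
    articlesDir: env.ARTICLES_DIR || path.join(home, "articles"),
  };
}
```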

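Output contract

Structure of <RAW_DIR>/<slug>/<slug>.md:

```markdown
---
title: <original title>
author: <author byline>
url: <final URL (after any SSO bridge redirect)>
fetched: YYYY-MM-DD
lang: en
tags:
  - <Medium tag 1>
  - <Medium tag 2>
excerpt: <Medium subtitle; written only if the author actually wrote one (the DOM contains h2.pw-subtitle-paragraph)>
---

# <original title>

<body…>
```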

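The triple-dash-delimited YAML block at the top is the frontmatter. The site renderer (via gray-matter) recognizes it as metadata and does not display it as body text; the title field also serves as the page title.
tags are the topic tags shown at the bottom of the article on Medium, scraped from the /tag/ links in the rendered DOM (these links sit in the footer area outside <article>). They are kept verbatim in English, deduplicated in original order, and not translated; the field is omitted when the article has no tags.
If the caller sets the ALLOW_TAGS environment variable, the fetched tags are intersected with the allowlist (case-insensitive) before being written; otherwise all tags are kept.
The frontmatter is followed directly by the # <title> H1 heading and the body. Images are referenced by the local relative path ./assets/<NN>.<ext> (assets/ sits next to the .md in the article's directory under <RAW_DIR>).
Metadata is embedded in the .md; no separate misc.json is generated anymore.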
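Tag handling and frontmatter assembly from the contract above, as a sketch. Assumptions: ALLOW_TAGS is passed as the raw JSON string, deduplication is exact-match (only the allowlist comparison is case-insensitive), and string values are emitted as JSON-quoted scalars, which standard YAML parsers accept. `normalizeTags` and `buildFrontmatter` are hypothetical names.

```javascript
// Dedupe in first-seen order, then intersect with ALLOW_TAGS if provided.
function normalizeTags(rawTags, allowTagsJson) {
  const tags = [];
  for (const tag of rawTags.map((t) => t.trim())) {
    if (tag && !tags.includes(tag)) tags.push(tag);
  }
  if (!allowTagsJson) return tags; // ALLOW_TAGS unset: keep everything
  const allow = new Set(JSON.parse(allowTagsJson).map((t) => String(t).toLowerCase()));
  return tags.filter((t) => allow.has(t.toLowerCase())); // case-insensitive intersection
}

// Frontmatter + H1, following the documented field order.
function buildFrontmatter({ title, author, url, fetched, lang = "en", tags = [], excerpt }) {
  const q = (s) => JSON.stringify(String(s)); // quote to survive ":" and "#" in values
  const lines = ["---", `title: ${q(title)}`, `author: ${q(author)}`, `url: ${q(url)}`,
    `fetched: ${fetched}`, `lang: ${lang}`];
  if (tags.length > 0) {
    lines.push("tags:", ...tags.map((t) => `  - ${q(t)}`)); // omitted when empty
  }
  if (excerpt) lines.push(`excerpt: ${q(excerpt)}`); // only when a subtitle exists
  lines.push("---", "", `# ${title}`, "");
  return lines.join("\n");
}
```

In a fetch script this would be called as `normalizeTags(scrapedTags, process.env.ALLOW_TAGS)`, so the leaf applies whatever allowlist the caller chose without deciding it.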

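Troubleshooting checklist

| Symptom | Cause | Fix |
| --- | --- | --- |
| `Failed to create a ProcessSingleton` | The previous Chrome process did not exit | `pkill -f "user-data-dir=.*chrome-profile" && rm -f chrome-profile/Singleton{Lock,Cookie,Socket}` |
| Title is `Medium` and the body is `500 Apologies` | The cross-domain cookie was not sent (the SSO bridge failed on the first attempt) | Rerun once; if it still fails, check whether `chrome-profile/Default/Cookies` contains `.medium.com\|sid` |
| Body under 2000 characters and containing `Member-only story` / `Create an account to read` | The member session has expired, OR the logged-in account has no paid subscription | Rerun `login.js`; while logging in, confirm that `medium.com/me/membership` shows Active |
| Body contains `Performing security verification` / `Just a moment...` | Blocked by Cloudflare: the profile has no clearance cookie | Rerun `login.js` (pass the Cloudflare challenge manually in headful mode) |
| `<article>` selector timeout | The page structure changed or the page loads slowly | The script continues; if Readability also fails, the HTML is kept at `<articles>/<slug>/<slug>.html` for debugging. If the problem persists, consider increasing the timeout |
| Chrome window flashes / steals focus | An unavoidable compromise of headful mode on macOS; the window is already pushed off-screen with `--window-position=-3000,-3000` | Expected behavior |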
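The table's symptoms can also be checked automatically after a fetch. The marker strings and the 2000-character threshold come from the table; `diagnose` and its return codes are hypothetical.

```javascript
const CLOUDFLARE_MARKERS = ["Performing security verification", "Just a moment..."];
const PAYWALL_MARKERS = ["Member-only story", "Create an account to read"];

// Map an extracted { title, body } to the failure modes in the table.
function diagnose({ title, body }) {
  const problems = [];
  if (CLOUDFLARE_MARKERS.some((m) => body.includes(m))) {
    problems.push("cloudflare"); // rerun login.js and pass the challenge
  }
  if (title === "Medium" && body.includes("500 Apologies")) {
    problems.push("sso-bridge"); // rerun; then check the .medium.com sid cookie
  }
  if (body.length < 2000 && PAYWALL_MARKERS.some((m) => body.includes(m))) {
    problems.push("paywall"); // session expired or account is not a paying member
  }
  return problems;
}
```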

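What this skill does not do

No translation: it only downloads and cleans. Translation and study notes are handled by other skills or conversations.
No deduplication: fetching the same URL again overwrites the existing <slug>/.
No batch discovery: one URL per run. Trending/topic discovery belongs to another skill.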

Usage example

User: Download this article: https://levelup.gitconnected.com/the-4-lines-every-claude-md-needs-2717a46866f6

Claude:

```text
[bash] RAW_DIR=/abs/path/to/raw node fetch.js "https://levelup.gitconnected.com/the-4-lines-every-claude-md-needs-2717a46866f6"
↪ via SSO bridge for levelup.gitconnected.com
✓ HTML: (cleaned up; md conversion succeeded)
✓ MD:   <RAW_DIR>/the-4-lines-every-claude-md-needs/the-4-lines-every-claude-md-needs.md
  Title: The 4 Lines Every CLAUDE.md Needs
  Author: Yanli Liu
  Characters: 23925
```

File list

```text
medium-fetch/
├── SKILL.md           # this document
└── scripts/
    ├── package.json   # dependency manifest
    ├── login.js       # one-time login script
    ├── fetch.js       # fetch script (CLI: RAW_DIR=/abs/path node fetch.js <url>)
    └── .gitignore     # ignores chrome-profile, articles, node_modules (articles holds intermediate HTML only during md conversion; md + assets go to RAW_DIR, which material-pipeline points at .claude/skills/material-pipeline/materials/raw; metadata is embedded in the md's top frontmatter)
```
