
substack-fetch

Name: Substack Fetch
Author: beatai-org

Use this skill when the user wants to download / 抓取 / 下载 a Substack article by URL. Trigger on requests like "抓取这篇 Substack 文章 <url>", "下载 substack 文章", "fetch this Substack article", "把这个 substack 链接拉下来", or any URL pointing to `*.substack.com/p/<slug>` or a Substack custom domain (e.g. `blog.dailydoseofds.com/p/...`, `www.oneusefulthing.org/p/...`). Plain HTTP fetch (no Chrome / no login required for free posts); outputs Readability-cleaned Markdown + assets to RAW_DIR (caller-provided; required, no default). Does NOT translate, summarize, or decide where output goes; those are orchestrator concerns.


stars:4,675

forks:256

updated: 2026-05-28 07:52


Repository: beatai-org/beatai


name	substack-fetch
description	Use this skill when the user wants to download / 抓取 / 下载 a Substack article by URL. Trigger on requests like "抓取这篇 Substack 文章 <url>", "下载 substack 文章", "fetch this Substack article", "把这个 substack 链接拉下来", or any URL pointing to `*.substack.com/p/<slug>` or a Substack custom domain (e.g. `blog.dailydoseofds.com/p/...`, `www.oneusefulthing.org/p/...`). Plain HTTP fetch (no Chrome / no login required for free posts); outputs Readability-cleaned Markdown + assets to RAW_DIR (caller-provided; required, no default). Does NOT translate, summarize, or decide where output goes — those are orchestrator concerns.
version	1.0.0

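Substack Article Downloader

Downloads public Substack articles via HTTP fetch + JSDOM + Readability and produces cleaned Markdown plus assets.

This is a leaf executor: it only executes the download of a single article (fetching the HTML, cleaning it with Readability, localizing images) and does not decide where the output goes. RAW_DIR must be passed in by the caller as an environment variable (absolute path); if it is unset, the script exits with an error. This is the same hard contract as medium-fetch.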

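When to use

The user provides a Substack article URL and asks to "download / fetch / pull" it
The URL points to *.substack.com/p/<slug>, or to a Substack custom domain (blog.dailydoseofds.com, www.oneusefulthing.org, *.substack.app, etc.)
Another skill or workflow will later translate the md or ingest it

When not to use

Substack paid-only posts (the script runs but only gets the preview paragraphs; there is no logic for handling a paid-member session)
Independent blogs that are not on Substack (use medium-fetch or a generic HTML download tool)
Scraping comments or engagement data (this skill only extracts the article body)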

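Differences from medium-fetch (why it is not reused)

| | medium-fetch | substack-fetch |
| --- | --- | --- |
| Fetch path | Playwright + real Chrome + persistent profile | Plain `fetch()` over HTTP |
| Anti-bot | Cloudflare clearance, SSO bridge | Not needed (public Substack pages are server-side rendered and are served to any reasonable User-Agent) |
| Paywall | Medium membership session required | Public pages are fetched directly; paid-only posts yield only the preview |
| Startup cost | Launching Chrome ≈ 2 s + passing Cloudflare | One HTTP request ≈ 200 ms |
| Dependencies | playwright, readability, jsdom, turndown, yaml | readability, jsdom, turndown, yaml (no playwright) |

Public Substack posts do not depend on a Medium-style paywall cookie system, and the whole site is server-side rendered: the body, <article class="typography newsletter-post post">, is already present in the initial HTML, so there is no need to wait for JS rendering. Plain HTTP is therefore enough for this skill, and there is no reason to drag Chrome in.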
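The plain-HTTP path can be sketched roughly as follows. This is a minimal illustration, not the code in fetch.js: the helper name `fetchHtml`, the User-Agent string, and the shape of the returned object are all assumptions made for the example. It relies only on the `fetch` built into Node.js ≥ 18.

```javascript
// Hypothetical helper: fetch a public Substack page as HTML over plain HTTP.
// The User-Agent value is an illustrative assumption, not the one fetch.js uses.
const DEFAULT_UA =
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 " +
  "(KHTML, like Gecko) Chrome/124.0 Safari/537.36";

// fetchImpl is injectable so the function can be exercised without network access.
async function fetchHtml(url, fetchImpl = globalThis.fetch) {
  const res = await fetchImpl(url, {
    headers: { "User-Agent": DEFAULT_UA, Accept: "text/html" },
    redirect: "follow",
  });
  if (!res.ok) {
    // Surface the Cloudflare challenge header, if present, in the error message.
    const cf = res.headers.get("cf-mitigated");
    const detail = cf ? ` (cf-mitigated: ${cf})` : "";
    throw new Error(`HTTP ${res.status}${detail} for ${url}`);
  }
  // res.url reflects the final URL after redirects.
  return { finalUrl: res.url || url, html: await res.text() };
}
```

The returned HTML would then go through JSDOM, Readability, and Turndown; that part is omitted here because it depends on the installed packages.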

Prerequisites

Node.js ≥ 18 (for the built-in fetch)
Run npm install before first use (only 4 lightweight dependencies)

Workflow

One-time setup

cd /Users/sunfei/development/beatai/.claude/skills/substack-fetch/scripts
npm install

Fetching an article

RAW_DIR is required: the caller must pass the destination root directory for md + assets as an environment variable (absolute path):

RAW_DIR=/abs/path/to/raw node fetch.js <url>

If RAW_DIR is not set, the script exits with an error.
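A check of this kind might look like the sketch below. The helper name `resolveRawDir` and the error messages are hypothetical, and rejecting a relative path (instead of resolving it against the working directory) is an assumption based on the "absolute path" requirement; fetch.js may behave differently.

```javascript
const path = require("node:path");

// Hypothetical helper: validate the RAW_DIR contract before doing any work.
function resolveRawDir(env = process.env) {
  const rawDir = env.RAW_DIR;
  if (!rawDir) {
    throw new Error("RAW_DIR is required: set it to the absolute output root");
  }
  // Assumption: a relative path is rejected rather than resolved against cwd.
  if (!path.isAbsolute(rawDir)) {
    throw new Error(`RAW_DIR must be an absolute path, got: ${rawDir}`);
  }
  return rawDir;
}
```

A CLI entry point could call this first, print the error message, and exit with a non-zero status if it throws.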

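Output:

<RAW_DIR>/<slug>/<slug>.md: the Markdown body cleaned by Readability + Turndown. A YAML frontmatter metadata block sits at the top (title / author / url / fetched / lang / published / excerpt / tags), followed by the # <title> level-1 heading and the body; image paths are already rewritten to local relative paths.

<RAW_DIR>/<slug>/assets/<NN>.<ext>: local copies of all remote images in the article (numbered in order of appearance: 01.png, 02.jpg…).

<slug> = the last segment of the URL (the <slug> in /p/<slug>). No suffix stripping is needed: Substack slugs are human-readable kebab-case, without Medium's trailing -<hex>.

Each article gets its own subdirectory under <RAW_DIR>. The md + assets are self-contained, so the whole directory can be copied and browsed offline.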
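The layout above can be expressed as a few small helpers. This is only a sketch: the function names `slugFromUrl`, `assetName`, and `outputPaths` are hypothetical, and how the file extension of each image is determined is not specified in this document, so the caller supplies it here.

```javascript
const path = require("node:path");

// Extract <slug> from a /p/<slug> URL; query strings are ignored by URL.pathname.
function slugFromUrl(articleUrl) {
  const { pathname } = new URL(articleUrl);
  const match = pathname.match(/^\/p\/([^/]+)\/?$/);
  if (!match) throw new Error(`Expected a /p/<slug> URL, got: ${articleUrl}`);
  return match[1];
}

// NN is the 1-based order of appearance, zero-padded to two digits.
function assetName(index, ext) {
  return `${String(index).padStart(2, "0")}.${ext}`;
}

// Compute the per-article directory, the md path, and the assets directory.
function outputPaths(rawDir, slug) {
  const articleDir = path.join(rawDir, slug);
  return {
    articleDir,
    mdPath: path.join(articleDir, `${slug}.md`),
    assetsDir: path.join(articleDir, "assets"),
  };
}
```

With RAW_DIR=/tmp/sub and the URL from the usage example, `mdPath` comes out as the path shown in that example's output.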

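Environment variables

| Variable | Required | Meaning | Default |
| --- | --- | --- | --- |
| `RAW_DIR` | Yes | Output root for md + assets (absolute path) | None; exits with an error if unset |
| `ALLOW_TAGS` | No | Tag allowlist (a JSON array string). When set, only the intersection with the fetched tags is kept (case-insensitive); when unset, all tags are kept | None; no filtering |

RAW_DIR is the orchestrator's decision; everything else is the leaf's own state or defaults.

# Standard invocation from material-pipeline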
RAW_DIR=/abs/path/to/raw node fetch.js <url>
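The ALLOW_TAGS behavior described in the table amounts to a case-insensitive intersection. A minimal sketch, assuming a hypothetical helper named `filterTags` and assuming that the fetched tag's original casing is what gets kept:

```javascript
// Keep only the fetched tags that appear in the ALLOW_TAGS allowlist.
// allowTagsEnv is the raw environment value, e.g. '["AI","Agents"]'.
function filterTags(tags, allowTagsEnv) {
  // Unset (or empty) means no filtering: all tags are kept.
  if (allowTagsEnv === undefined || allowTagsEnv === "") return tags;
  const allow = JSON.parse(allowTagsEnv);
  if (!Array.isArray(allow)) {
    throw new Error("ALLOW_TAGS must be a JSON array string");
  }
  const allowed = new Set(allow.map((tag) => String(tag).toLowerCase()));
  // Assumption: the fetched tag's original casing is preserved in the output.
  return tags.filter((tag) => allowed.has(tag.toLowerCase()));
}
```

For example, `ALLOW_TAGS='["ai","python"]'` applied to the fetched tags `["AI", "Agents", "Python"]` keeps `["AI", "Python"]`.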

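Output contract

Structure of <RAW_DIR>/<slug>/<slug>.md:

---
title: <original title>
author: <author byline>
url: <final URL>
fetched: YYYY-MM-DD
lang: en
published: YYYY-MM-DD            # Substack publication date, parsed from JSON-LD datePublished
tags:
  - <Substack section / topic (if any)>
excerpt: <Substack subtitle; written only when the author actually wrote one (the DOM contains h3.subtitle)>
---

# <original title>

<body…>

The YAML block wrapped in triple dashes at the top is the frontmatter. The site renderer (gray-matter) recognizes it as metadata and does not display it as body text; the title field also serves as the page title.
published comes from JSON-LD datePublished or the article:modified_time meta tag, preferring the former when it exists; the field is omitted when both are missing.
tags come preferentially from the JSON-LD articleSection / keywords arrays (if Substack provides them); most independent authors' posts have no tags, in which case the field is omitted.
The # <title> level-1 heading and the body follow immediately after the frontmatter. Images are referenced by the local relative path ./assets/<NN>.<ext> (assets/ sits in the same directory as the md file, under <RAW_DIR>/<slug>/).
Metadata is carried inline in the md; no separate misc.json is generated (same as medium-fetch).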
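The field-selection rules above (prefer datePublished, omit fields that have no value) could be sketched as below. The names `toIsoDate` and `buildMeta` and the input object shape are hypothetical. Dates are normalized in UTC here, which is an assumption; the document does not say which time zone fetch.js uses. Serializing the object to YAML is left to the yaml dependency and is not shown.

```javascript
// Normalize a date-like value to YYYY-MM-DD (UTC), or undefined if unparsable.
function toIsoDate(value) {
  if (!value) return undefined;
  const date = new Date(value);
  return Number.isNaN(date.getTime()) ? undefined : date.toISOString().slice(0, 10);
}

// Build the frontmatter object; optional fields are left out when absent.
function buildMeta({ title, author, url, jsonLd = {}, metaModifiedTime, subtitle, tags = [] }) {
  const meta = { title, author, url, fetched: toIsoDate(new Date()), lang: "en" };
  // JSON-LD datePublished wins; the meta tag is only a fallback.
  const published = toIsoDate(jsonLd.datePublished) ?? toIsoDate(metaModifiedTime);
  if (published) meta.published = published;
  if (tags.length > 0) meta.tags = tags;
  // excerpt is written only when the author actually provided a subtitle.
  if (subtitle) meta.excerpt = subtitle;
  return meta;
}
```

Leaving keys off the object when they have no value, instead of writing empty strings, keeps the frontmatter in line with the "omit the field" wording of the contract.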

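Troubleshooting checklist

| Symptom | Cause | Action |
| --- | --- | --- |
| `HTTP 403` / `cf-mitigated: challenge` | Cloudflare has temporarily blocked the IP, or access from an unusual location triggered a challenge | Switch IPs or retry after a while; if 403s persist, consider adding a Playwright path |
| Body < 1000 characters and contains `Subscribe to keep reading` or `This post is for paid subscribers` | Substack paid-only post | This skill does not handle the paywall; copy the text manually from a browser, or extend the skill with paid-session support |
| The body extracted by Readability contains button text such as "Share", "Restack", "Subscribe" | Readability fallback: the main `<article>` was captured, but the footer CTAs were merged in | The current implementation already strips the common CTA containers; if a new template shows up, add selectors to `stripChrome()` in fetch.js |
| Images missing / still showing the original remote URL | Substack CDN returned 4xx / network jitter | Retry; a single image failure degrades gracefully: the original URL is kept and the article as a whole remains readable |
| Title is `404` / body is empty | Article deleted / URL misspelled | Open the same URL in a browser to confirm |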
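The paid-only symptom in the table can be turned into a simple caller-side check. This is a heuristic sketch under stated assumptions: the helper name `looksPaywalled` is hypothetical, and the threshold and marker strings are taken directly from the table row rather than from fetch.js.

```javascript
// Marker strings quoted in the troubleshooting table.
const PAYWALL_MARKERS = [
  "Subscribe to keep reading",
  "This post is for paid subscribers",
];

// Heuristic: a short body that also contains a paywall marker is likely a preview.
function looksPaywalled(markdownBody) {
  const hasMarker = PAYWALL_MARKERS.some((marker) => markdownBody.includes(marker));
  return markdownBody.length < 1000 && hasMarker;
}
```

An orchestrator could run this on the generated md and flag the article for manual handling instead of passing a preview on to translation.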

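What this skill does not do

No translation: it only downloads and cleans. Translation and study notes are handled by other skills.
No deduplication: fetching the same URL again overwrites the previous <slug>/.
No batch discovery: one URL per run. Batch discovery by subscription feed is a job for another skill (a fetch-list.js mode, similar to medium-sub, could be added in the future).
No paid-only handling: there is no support for Substack member cookies; the public preview is retrieved, but the paywalled body paragraphs are not.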

Usage example

User: Download this article: https://blog.dailydoseofds.com/p/the-anatomy-of-an-agent-harness

Claude:
[bash] RAW_DIR=/tmp/sub node fetch.js "https://blog.dailydoseofds.com/p/the-anatomy-of-an-agent-harness"
📥 抓取 https://blog.dailydoseofds.com/p/the-anatomy-of-an-agent-harness
✓ MD:   /tmp/sub/the-anatomy-of-an-agent-harness/the-anatomy-of-an-agent-harness.md
  标题: The Anatomy of an Agent Harness
  作者: Avi Chawla
  发布: 2026-04-06
  字符数: 8731

File listing

substack-fetch/
├── SKILL.md           # This document
└── scripts/
    ├── package.json   # Dependency declarations (jsdom / readability / turndown / yaml)
    ├── fetch.js       # Fetch script (CLI: RAW_DIR=... node fetch.js <url>)
    └── .gitignore     # Ignores node_modules / package-lock.json (callers that need pinned versions should commit a lockfile themselves)
