material-pipeline

Name: Material Pipeline
Author: beatai-org

Description: Use this skill for the daily Medium article "selection → download → translate → publish" pipeline. Trigger phrases: "跑一遍今日素材" / "material pipeline" / "批量抓 medium" / "medium 流水线" / "日更素材". The optional publish step has its own triggers: "发布到 ai-insights" / "publish ai-insights" / "publish".


stars:4,675

forks:256

updated:May 29, 2026 at 03:40



Repository: beatai-org/beatai


Material Pipeline Orchestrator

Chains the daily "find topics → fetch article bodies → translate → publish" steps into a single pipeline.

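Core rule: orchestrator vs. leaf executors

A leaf performs exactly one action; the orchestrator (this skill) owns the flow, paths, scheduling, and concurrency strategy.

Any leaf that does not receive a required parameter fails immediately and exits with an error. It never fills in defaults on the orchestrator's behalf. This is the hard contract with every leaf.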
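As an illustration of the fail-fast contract, a leaf's parameter check could look roughly like the sketch below. The function name `requireAbsoluteRawDir` is hypothetical and not taken from any leaf's source; only the `RAW_DIR` variable and its "required, absolute path, no default" rule come from this document.

```javascript
// Hypothetical helper illustrating the leaf contract; not the actual leaf code.
function requireAbsoluteRawDir(env) {
  const rawDir = env.RAW_DIR;
  if (!rawDir) {
    // No fallback path: the orchestrator must decide where output goes.
    throw new Error("RAW_DIR is required (no default); the caller must set it");
  }
  // The contract asks for an absolute path. A POSIX-style check keeps the sketch simple.
  if (!rawDir.startsWith("/")) {
    throw new Error(`RAW_DIR must be an absolute path, got: ${rawDir}`);
  }
  return rawDir;
}
```

In a real leaf this would presumably be called once at startup with `process.env`, with the thrown error turned into a non-zero exit code.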

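Decisions owned by the orchestrator (this skill)

Flow wiring: the order and protocol of list → fetch → translate handoff → extract-excerpt → compress → review → publish.
Path strategy: the location of the materials/ workspace; the sub-list/<date> / raw/<slug> / output/<date> layout; and the two-level date layout <dest>/<YYYY-MM>/<DD>/ used by publish.
Fresh fetch on every run, with dedup that includes today: every run of run.js calls medium-sub for new candidates, then takes the set difference against the N sub-lists from the last LOOKBACK_DAYS days (today included). Only slugs that are "new in this run" enter the pipeline.
The sub-list is append-only: the day's sub-list is the cumulative set of every slug that has entered the pipeline today. Repeated runs on the same day merge into it, so already-processed slugs are never re-fetched or re-translated.
Per-tag, per-fetch limit: sources.medium.limit caps how many new candidates a single tag yields in a single fetch; medium-sub enforces it internally with slice(0, MAX_PER_TAG). run.js no longer caps the cumulative article count of the day's sub-list: repeated runs on the same day simply accumulate, and cross-day dedup guarantees there are no repeats.
Set-difference decisions live only in the orchestrator: stage 1 computes newlyAdded, and fetch/handoff consume it unconditionally. Leaves no longer second-guess whether a slug should be fetched, because two layers of dedup would hide orchestrator bugs.
Configuration source of truth: sources.medium.* and publish.targets.<name>.* in config.yaml.
Handoff prompt generation: run.js prints a complete prompt block at the end (including input_root / output_root / date / concurrency strategy) that the user can copy and give to Claude as-is.
Concurrency strategy: for translate, N=1 is translated in the main conversation; N>=2 spawns ceil(N/2) parallel agents.
Intermediate-artifact cleanup: after all publish targets succeed, raw and output are deleted by default and only sub-list is kept.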
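The dedup and append-only rules above can be sketched as two small set operations. This is a minimal illustration, assuming each sub-list is a plain array of slug strings; the real sub-list JSON schema is not shown here, and `computeNewlyAdded` / `mergeSubList` are hypothetical names rather than functions known to exist in run.js.

```javascript
// Sketch only: assumes sub-lists are arrays of slug strings.
function computeNewlyAdded(candidates, recentSubLists) {
  // Union of every slug seen in the lookback window (today included).
  const seen = new Set(recentSubLists.flat());
  const newlyAdded = [];
  for (const slug of candidates) {
    if (!seen.has(slug)) {
      seen.add(slug); // also drops repeats within one fetch, e.g. one article under two tags
      newlyAdded.push(slug);
    }
  }
  return newlyAdded;
}

// Append-only merge: existing entries are never removed.
function mergeSubList(todaySubList, newlyAdded) {
  return [...new Set([...todaySubList, ...newlyAdded])];
}

const recent = [["a", "b"], ["c"]]; // e.g. yesterday's and today's sub-lists
const fresh = computeNewlyAdded(["b", "d", "d", "e"], recent);
console.log(fresh);                      // [ 'd', 'e' ]
console.log(mergeSubList(["c"], fresh)); // [ 'c', 'd', 'e' ]
```

Keeping the difference in one place means later stages can consume `newlyAdded` without any checks of their own, which is the point of the "orchestrator only" rule.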

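Leaf executor contract

Each leaf accepts only the parameters its caller passes explicitly; it has no default paths of its own and makes no policy decisions.

| Leaf | Required from caller | Leaf output |
| --- | --- | --- |
| medium-sub | `--tags` / `--max-per-tag` (or `--url <X>`; CLI flags) | JSON on stdout; no file output |
| substack-sub | `--url <X>` (the only mode in v1.0.0; discovery is Phase B) | JSON on stdout (same schema as medium-sub); no file output |
| medium-fetch | `RAW_DIR` (env var, required, absolute path) + url (CLI arg) | `<RAW_DIR>/<slug>/<slug>.md` + assets/ |
| substack-fetch | `RAW_DIR` (env var, required, absolute path) + url (CLI arg) | `<RAW_DIR>/<slug>/<slug>.md` + assets/ (same schema as medium-fetch) |
| translate | Given explicitly in the prompt: `slugs` / `input_root` / `output_root` / `date` | `<output_root>/<slug>.md` + `images/<slug>/` |
| extract-excerpt | Given explicitly in the prompt: `slugs` / `target_root` | Rewrites the frontmatter of `<target_root>/<slug>.md`, inserting the `excerpt` field at its designated position (skipped if already present) |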

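Source / fetcher routing (maintained by the orchestrator in run.js; this is the only source-aware logic):

| URL pattern | sub leaf | fetch leaf |
| --- | --- | --- |
| `*.substack.com`, or a path of the form `/p/<slug>` | substack-sub | substack-fetch |
| Anything else (fallback) | medium-sub | medium-fetch |

detectSource(url) in run.js is the single source of truth; both Stage 1 (discovery) and Stage 3 (fetch) consult it. Stage 2 (dedup / sub-list merge) is source-agnostic, with identical logic for both sources.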
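A routing function matching the table could be sketched as follows. Only the name `detectSource` and the two routing rules come from this document; the return shape (`source` / `sub` / `fetch`) and the exact matching expressions are assumptions made for illustration.

```javascript
// Sketch of the routing table; the returned object shape is an assumption.
function detectSource(url) {
  const { hostname, pathname } = new URL(url); // URL is a Node.js global
  const isSubstackHost = hostname.endsWith(".substack.com");
  const isPostPath = /^\/p\/[^/]+/.test(pathname); // path of the form /p/<slug>
  if (isSubstackHost || isPostPath) {
    return { source: "substack", sub: "substack-sub", fetch: "substack-fetch" };
  }
  // Fallback: everything else is treated as Medium.
  return { source: "medium", sub: "medium-sub", fetch: "medium-fetch" };
}

console.log(detectSource("https://example.substack.com/p/some-post").fetch); // substack-fetch
console.log(detectSource("https://medium.com/@someone/some-story").fetch);   // medium-fetch
```

Because Medium is the fallback, a new source would need both a row in the table and a branch ahead of the final `return`.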

Note: publish.js lives inside this skill and acts as the sub-orchestrator for the publish step. It reads the config to decide dest and runs cleanup itself; it is not a leaf.

Workflow

Step 1: list + fetch

cd .claude/skills/material-pipeline/scripts && node run.js

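run.js will: (a) call medium-sub on every run to pull new candidates; (b) take the set difference against the sub-lists from the last LOOKBACK_DAYS days (today included; configured by lookback_days in config.yaml, default 15) to get the slugs that are "new in this run"; (c) merge those new slugs into today's sub-list (append-only); (d) unconditionally spawn medium-fetch to download every new slug, pointing it at materials/raw through the RAW_DIR environment variable; (e) print a translate handoff prompt block at the end (listing only the new slugs whose fetch succeeded).

Key guarantee: a slug already in today's sub-list is never fetched or translated again, even if raw/ has been wiped by the publish cleanup. This is the core mechanism that prevents the same batch of articles from being processed twice in one day.

Leaf "no second-guessing" contract: the stage 3 medium-fetch call layer does not apply a "skip if raw/ already exists" check. The orchestrator's set difference already guarantees that incoming slugs are not duplicates; a further check inside the leaf would overstep into policy and mask orchestrator bugs. To re-fetch a slug, delete its entry from the sub-list.

Full CLI / output contract / cross-day dedup / idempotency / troubleshooting → docs/run.md.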
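To make the lookback window in (b) concrete, the sketch below lists the sub-list files a run would consult. It assumes the window contains LOOKBACK_DAYS dates in total, counting today; the document does not settle whether today is counted inside or on top of that number. `lookbackDates` is a hypothetical helper name.

```javascript
// Sketch: enumerate the dates in the lookback window, newest first.
// Assumption: the window holds `lookbackDays` dates in total, today included.
function lookbackDates(today, lookbackDays = 15) {
  const [y, m, d] = today.split("-").map(Number);
  const dates = [];
  for (let i = 0; i < lookbackDays; i++) {
    // Date.UTC normalizes day underflow (day 0, -1, ...) into the previous month.
    const t = new Date(Date.UTC(y, m - 1, d - i));
    dates.push(t.toISOString().slice(0, 10));
  }
  return dates;
}

const subListFiles = lookbackDates("2026-05-02", 3).map(
  (date) => `materials/sub-list/${date}.json`
);
console.log(subListFiles);
// [ 'materials/sub-list/2026-05-02.json',
//   'materials/sub-list/2026-05-01.json',
//   'materials/sub-list/2026-04-30.json' ]
```

Dates with no sub-list file would simply contribute nothing to the set difference.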

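Step 2: translate handoff

Send the whole prompt block printed at the end of run.js to Claude. The prompt already contains all 4 fields the translate contract requires (slugs / input_root / output_root / date) plus a concurrency strategy hint. The translate skill is a leaf executor: it executes strictly according to these 4 parameters and does not choose paths or dates itself.

The end of the prompt appends guidance for the following steps: once translate finishes, run extract-excerpt first (step 2.4), then run node scripts/compress-images.js --date <date> followed by node scripts/review.js --date <date>, in that order. Claude should run these automatically, handing the flow over to steps 2.4 / 2.5 / 3.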
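The four required fields and the N=1 / ceil(N/2) concurrency rule could be assembled roughly as below. The prompt wording and the names `agentCount` / `buildHandoffPrompt` are invented for this sketch; the actual block printed by run.js is not reproduced in this document.

```javascript
// Sketch only: the prompt text here is illustrative, not run.js's real output.
function agentCount(n) {
  // N=1: translate in the main conversation (0 extra agents); N>=2: ceil(N/2) agents.
  return n >= 2 ? Math.ceil(n / 2) : 0;
}

function buildHandoffPrompt({ slugs, inputRoot, outputRoot, date }) {
  if (!Array.isArray(slugs) || slugs.length === 0) {
    throw new Error("slugs must be a non-empty array");
  }
  for (const [name, value] of Object.entries({ inputRoot, outputRoot, date })) {
    if (!value) throw new Error(`${name} is required`);
  }
  const agents = agentCount(slugs.length);
  return [
    `slugs: ${slugs.join(", ")}`,
    `input_root: ${inputRoot}`,
    `output_root: ${outputRoot}`,
    `date: ${date}`,
    agents === 0
      ? "concurrency: translate in the main conversation"
      : `concurrency: spawn ${agents} parallel agents`,
  ].join("\n");
}

console.log(buildHandoffPrompt({
  slugs: ["slug-a", "slug-b", "slug-c"],
  inputRoot: "materials/raw",
  outputRoot: "materials/output/2026-05-19",
  date: "2026-05-19",
}));
```

Validating the fields at generation time mirrors the translate leaf's own rule of stopping when any required field is missing.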

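Step 2.4: extract-excerpt (fill in the excerpt field)

The excerpt is the article's opening summary. It has two sources:

The author wrote a subtitle → medium-fetch / substack-fetch already wrote it at fetch time; translate passes it through and translates it into Chinese. In this case extract-excerpt sees that excerpt already exists → skip.
The author wrote no subtitle → upstream writes no excerpt. extract-excerpt picks the opening paragraph from the body and, after a semantic judgment, writes it into the excerpt field.

Once translate has written the translated article, the handoff prompt instructs Claude to invoke the extract-excerpt skill, which takes effect only for slugs that have no excerpt.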

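Call contract: slugs (the slugs in this batch) / target_root (= translate's output_root = materials/output/<date>/); max_chars / overwrite use the skill defaults (200 / false).
extract-excerpt is a leaf executor: it only does "read the translation → check whether excerpt already exists → semantically identify the first paragraph of real body text → write the frontmatter". It does not decide paths, translate, or register anything in site navigation.
Excerpt extraction is a semantic judgment performed by Claude, not by a script; see .claude/skills/extract-excerpt/SKILL.md for details.
This step touches only the frontmatter, never the body. It does not conflict with the image rewriting in step 2.5, but it runs first, before compress.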

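The excerpt written to the frontmatter (along with the tags fetched by medium-fetch and passed through by translate) is synced automatically by register_meta into the item in the site's _meta.json during step 4 publish, together with cover (the first image in the body). The normal daily pipeline needs no extra action.

Backfill for existing articles: to add excerpt / tags / cover to articles that have already been published (their items are already in _meta.json, so register_meta's duplicate check skips them and adds no new fields), fix the frontmatter first, then run node scripts/sync-meta.js once to backfill excerpt / tags (from frontmatter) / cover (first body image) into _meta.json. The script is idempotent and changes only items that differ. tags are written only when the frontmatter really has non-empty tags, so an empty array never overwrites hand-curated existing tags; cover is written only when the article has images. This is the standard action for syncing _meta.json and does not need per-run confirmation.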
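The backfill rules (idempotent, never overwrite tags with an empty array, cover only when an image exists) can be expressed as a small merge function. This is a sketch under assumptions: `backfillItem` is a hypothetical name, and the item field names used here (`excerpt`, `tags`, `cover`) may not match the real _meta.json schema exactly.

```javascript
// Sketch of the backfill merge; field names are assumed, not confirmed.
function backfillItem(item, frontmatter, firstImage) {
  const next = { ...item };
  if (frontmatter.excerpt) next.excerpt = frontmatter.excerpt;
  // Only write tags when the frontmatter really has some; [] must not
  // clobber hand-curated tags already stored on the item.
  if (Array.isArray(frontmatter.tags) && frontmatter.tags.length > 0) {
    next.tags = frontmatter.tags;
  }
  // cover comes from the first body image, if the article has one.
  if (firstImage) next.cover = firstImage;
  const changed = JSON.stringify(next) !== JSON.stringify(item);
  return { item: changed ? next : item, changed };
}

const existing = { title: "T", path: "/ai-insights/slug-a", tags: ["LLM"] };
const first = backfillItem(existing, { excerpt: "Opening text", tags: [] }, null);
const second = backfillItem(first.item, { excerpt: "Opening text", tags: [] }, null);
console.log(first.changed, second.changed); // true false
console.log(first.item.tags);               // [ 'LLM' ]
```

Reporting `changed` per item is what lets a script rewrite the file only when something actually differs.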

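Step 2.5: compress (convert images to WebP)

node scripts/compress-images.js --date <YYYY-MM-DD>     # defaults to today
node scripts/compress-images.js --date <YYYY-MM-DD> --dry-run
node scripts/compress-images.js --png-q 80 --gif-q 75   # default values; tune as needed

Converts materials/output/<date>/images/<slug>/*.{png,gif} to .webp using cwebp / gif2webp -mixed, deletes the originals, and rewrites the image references in all of that day's .md files (./images/...png|gif → .webp). jpg / jpeg / webp files are left untouched (jpg→webp is lossy→lossy and not worth it).

Depends on the system binaries cwebp / gif2webp (brew install webp). If the tools are missing, the script exits with an error at startup.

review.js is unaware of this step: the .md files it reads already reference .webp, so it renders the right images automatically.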
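The reference-rewriting half of this step can be sketched with a single regular expression. This is an illustrative version, not the code in compress-images.js; `rewriteImageRefs` is a hypothetical name, and the pattern assumes references start with `./images/` as described above.

```javascript
// Sketch: rewrite ./images/...png|gif references to .webp, leaving jpg/jpeg/webp alone.
function rewriteImageRefs(markdown) {
  // Lazy path match, then the extension, then a delimiter (or end of input)
  // so that something like "a.png.bak" is not rewritten.
  const pattern = /(\.\/images\/[^\s)"']+?)\.(png|gif)(?=[)\s"']|$)/gi;
  return markdown.replace(pattern, "$1.webp");
}

const before = "![a](./images/foo/1.png) ![b](./images/foo/2.jpg) ![c](./images/foo/3.GIF)";
console.log(rewriteImageRefs(before));
// ![a](./images/foo/1.webp) ![b](./images/foo/2.jpg) ![c](./images/foo/3.webp)
```

Rewriting references only makes sense after the corresponding file conversion has succeeded; the sketch deliberately leaves that ordering to the caller.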

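Step 3: review (manual filtering)

node scripts/review.js --date <YYYY-MM-DD>      # defaults to today
node scripts/review.js --no-publish             # only write _review.yaml; do not auto-spawn publish.js

review.js starts a local HTTP server (the port is assigned by the OS by default) and opens the browser automatically. The left side shows a card list (title / author / word count plus a checkbox per article); clicking a card renders the full translation, images included, on the right. After ticking the articles to keep, click "Submit and publish" (「提交并发布」), which:

Writes { date, publish: [slugs] } to materials/output/<date>/_review.yaml
By default also spawns node publish.js --date <date>, with stdio passed through to the terminal running review.js
Shows the final success/failure in the browser plus the last 30~50 lines of the log; on success the server exits automatically

If publish fails, the server keeps running so you can manually re-run node scripts/publish.js --date <date> (it reads the existing _review.yaml, so no re-review is needed).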

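Step 4: publish

node scripts/publish.js                          # use publish.default
node scripts/publish.js --target ai-insights     # specify a target
node scripts/publish.js --keep-intermediates     # skip cleanup

publish.js mirrors output/<date>/ to <dest>/<YYYY-MM>/<DD>/. When the target has register_meta configured (the default ai-insights target does), it also registers the day's articles in the site's _meta.json: grouped by month, automatically extracting title / author / translated / excerpt / tags (from frontmatter) + cover (the first image in the body), with item.path = <url_prefix>/<slug> (no date, unique site-wide). Registration is idempotent. After all targets succeed, it cleans up raw/<slug>/ and output/<date>/ by default (keeping sub-list/<date>.json).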
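The path rules in this paragraph can be sketched as pure string functions. The helper names (`publishDestDir`, `itemPath`, `registerItem`) and the flat `items` array are assumptions for illustration; the real _meta.json groups items by month, which this sketch leaves out.

```javascript
// Sketch of the two-level date layout and the date-free item path.
function publishDestDir(dest, date) {
  const m = /^(\d{4})-(\d{2})-(\d{2})$/.exec(date);
  if (!m) throw new Error(`date must be YYYY-MM-DD, got: ${date}`);
  const [, year, month, day] = m;
  return `${dest.replace(/\/+$/, "")}/${year}-${month}/${day}`;
}

function itemPath(urlPrefix, slug) {
  // No date component, so the path is unique site-wide per slug.
  return `${urlPrefix.replace(/\/+$/, "")}/${slug}`;
}

// Idempotent registration: an item whose path already exists is skipped.
function registerItem(items, item) {
  if (items.some((existing) => existing.path === item.path)) return items;
  return [...items, item];
}

console.log(publishDestDir("/site/public/docs/ai-insights", "2026-05-19"));
// /site/public/docs/ai-insights/2026-05/19
console.log(itemPath("/ai-insights", "slug-a")); // /ai-insights/slug-a
```

Keying registration on `path` is also why already-published items are skipped rather than updated, which is the gap sync-meta.js fills.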

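Each item in _meta.json carries every field needed to render the /ai-insights list cards (title / summary / cover / tags / path). The site cards read _meta.json directly and no longer fetch the article .md source.

On startup, publish.js reads output/<date>/_review.yaml (produced by the review step):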

date: 2026-05-19
publish: [slug-a, slug-b, ...]

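Only slugs in the publish list are published; the others skip copy + register_meta. If _review.yaml does not exist, everything is published (backward compatible, and it allows a standalone manual publish).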
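The selection rule can be sketched as a filter over the day's slugs. Here `review` stands for the already-parsed contents of _review.yaml (or `null` when the file is absent); `slugsToPublish` is a hypothetical name, and the YAML parsing itself is left out.

```javascript
// Sketch: decide which slugs to publish from the parsed _review.yaml.
// `review` is null/undefined when the file does not exist.
function slugsToPublish(allSlugs, review) {
  if (review == null) return [...allSlugs]; // backward compatible: publish everything
  const approved = new Set(review.publish ?? []);
  return allSlugs.filter((slug) => approved.has(slug));
}

const all = ["slug-a", "slug-b", "slug-c"];
console.log(slugsToPublish(all, { date: "2026-05-19", publish: ["slug-a", "slug-c"] }));
// [ 'slug-a', 'slug-c' ]
console.log(slugsToPublish(all, null));
// [ 'slug-a', 'slug-b', 'slug-c' ]
```

Note the difference between a missing file (publish all) and a present file with an empty list (publish nothing).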

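Full CLI / flow / cleanup behavior / troubleshooting → docs/publish.md.

In the normal flow, publish.js is triggered automatically by the "Submit and publish" (「提交并发布」) button in review.js. Trigger phrases for running it manually include 「发布到 ai-insights」/「publish ai-insights」/「publish」. run.js never calls publish.js automatically.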

Trigger phrase → behavior

| Phrase | Behavior | Read first |
| --- | --- | --- |
| 「跑一遍今日素材」/「material pipeline」/「日更素材」 | `node scripts/run.js` | `docs/run.md` |
| 「压缩图片 / compress images」 | `node scripts/compress-images.js --date <date>` | See step 2.5 |
| 「审稿 / review」 | `node scripts/review.js --date <date>` | See step 3 |
| 「发布到 ai-insights」/「publish ai-insights」/「publish」 | `node scripts/publish.js` | `docs/publish.md` |
| 「同步 excerpt / tags 到 _meta.json」/「回填摘要标签」 | `node scripts/sync-meta.js` | See step 2.4 |

Configuration

config.yaml is the single source of truth for subscription sources and publish targets:

sources:
  medium:
    tags: [ai, artificial-intelligence, ai-agent, llm, technology]
    limit: 5

publish:
  default: ai-insights
  targets:
    ai-insights:
      type: local-dir
      dest: "/Users/sunfei/development/beatai/public/docs/ai-insights"
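      images_repo:                                            # optional; when set, images are pushed to a separate GitHub repository (including cover thumbnails) and the SHA pinned in src/config/assetsPin.json is updated. Remove this section to fall back to the old behavior: images are cpSync'ed under dest.
        repo_key: "primary"                                   # corresponds to repos.<key> in src/config/assetsRepos.json
      register_meta:                                          # optional; when set, the day's articles are registered in the site's _meta.json
        path: ".../ai-insights/_meta.json"
        url_prefix: "/ai-insights"
        file_prefix: "/docs/ai-insights"
      sync_readme: ".../scripts/sync-ai-insights-readme.mjs"  # optional; spawned after register_meta to sync the README with the latest article list

tags:                       # optional; article tag policy
  allow_tags:               # tag allowlist: only the tags listed here are written to articles
    - Technology            # tags fetched by medium-fetch are first intersected with this list (case-insensitive)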
    - Artificial Intelligence
    - AI Agent
    - LLM
    - Machine Learning

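Field details: for sources.* see docs/run.md; for publish.targets.* (including register_meta) see docs/publish.md.

tags.allow_tags (optional): the article tag allowlist. run.js passes it to medium-fetch through the ALLOW_TAGS environment variable; of the Medium tags fetched, only those in the allowlist are kept (matching is case-insensitive and the original casing is preserved). Not configured or empty → no filtering; all Medium tags are kept. Filtering happens once, at fetch time, so tags stay consistent across raw / translation / published md / _meta.json.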
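The allowlist intersection can be sketched as a case-insensitive filter. `filterTags` is a hypothetical name, and the sketch assumes "original casing" means the casing of the tag as fetched from Medium; how the list is serialized into ALLOW_TAGS is not specified here and is left out.

```javascript
// Sketch: keep only fetched tags that appear in the allowlist (case-insensitive).
// Assumption: the casing of the fetched tag is what gets preserved.
function filterTags(fetchedTags, allowTags) {
  // Unset or empty allowlist means "no filtering".
  if (!allowTags || allowTags.length === 0) return [...fetchedTags];
  const allowed = new Set(allowTags.map((tag) => tag.toLowerCase()));
  return fetchedTags.filter((tag) => allowed.has(tag.toLowerCase()));
}

console.log(filterTags(["llm", "Python", "AI Agent"], ["LLM", "AI Agent"]));
// [ 'llm', 'AI Agent' ]
console.log(filterTags(["Python"], ["LLM"]));
// []
```

The second call shows the case where every tag is filtered out and the article ends up with an empty tag list.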

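What this skill does not do

Does not implement LLM translation: that is the translate skill's job.
Does not publish automatically: publishing must be explicitly triggered by the user, to avoid accidental pushes.
Does not generate or translate tags: tags come from the original Medium tags fetched by medium-fetch (English, as-is), pass through translate untranslated, and are then written to _meta.json by register_meta. The pipeline does no LLM tagging and no tag translation. The only processing is intersection filtering against the tags.allow_tags allowlist in config.yaml (see the configuration section); if it is not configured, everything is kept. When an article has no tags on Medium, or all of them are filtered out by the allowlist, tags is [].
Does not fetch concurrently or retry on error: this is friendly to Cloudflare, and a failure is simply picked up idempotently on the next run. A single-article fetch does have a hard timeout, however (FETCH_TIMEOUT_MS at the top of run.js, default 4 minutes): if medium-fetch hangs on one article (for example, an image CDN connection that stalls, since Node fetch has no timeout), the orchestrator kills the child process, records the slug as a fetch failure and drops it, then moves on to the next article. See the "single-article fetch hard timeout" (「单篇 fetch 硬超时」) section of docs/run.md.
Does not modify any leaf skill code: it only orchestrates and never changes the internals of medium-sub / medium-fetch / translate.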
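The "one at a time, no retry, drop failures and continue" behavior can be sketched with an injected fetch function. `fetchAll` and `fetchOne` are hypothetical names. In the real script the fetch is a child process subject to FETCH_TIMEOUT_MS; a thrown error stands in for a timeout or crash here, and Node's `child_process.spawnSync` does accept a `timeout` option that could serve that role.

```javascript
// Sketch: sequential fetch loop with no retry. `fetchOne` is injected so the
// sketch stays self-contained; it should throw on failure or timeout.
function fetchAll(entries, fetchOne) {
  const succeeded = [];
  const failed = [];
  for (const { slug, url } of entries) { // strictly one article at a time
    try {
      fetchOne(url);
      succeeded.push(slug);
    } catch (err) {
      // No retry: record the failure, drop the slug, move on.
      failed.push({ slug, reason: String(err && err.message ? err.message : err) });
    }
  }
  return { succeeded, failed };
}

const result = fetchAll(
  [
    { slug: "slug-a", url: "https://example.com/a" },
    { slug: "slug-b", url: "https://example.com/b" },
  ],
  (url) => {
    if (url.endsWith("/b")) throw new Error("timed out");
  }
);
console.log(result.succeeded); // [ 'slug-a' ]
console.log(result.failed);    // [ { slug: 'slug-b', reason: 'timed out' } ]
```

Only the `succeeded` list would feed the handoff prompt, matching the rule that it lists only slugs whose fetch succeeded.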

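File list

material-pipeline/
├── SKILL.md             # this document (core rules + workflow entry points)
├── config.yaml          # subscription sources + publish config (single source of truth)
├── docs/
│   ├── run.md           # run.js in detail (CLI / output contract / dedup / troubleshooting)
│   └── publish.md       # publish.js in detail (CLI / cleanup / extension guide)
├── materials/           # workspace for intermediate artifacts (used internally by the scripts)
│   ├── sub-list/<date>.json    # durable; source for cross-day dedup; not cleaned by publish
│   ├── raw/<slug>/             # temporary; cleaned after a successful publish
│   └── output/<date>/          # temporary; cleaned after a successful publish
│       ├── <slug>.md           # translation output
│       ├── images/<slug>/      # images for the translation output
│       └── _review.yaml        # written by review.js; read by publish.js to decide which slugs go live
└── scripts/
    ├── package.json     # dependencies: yaml / gray-matter / marked; type:module, node>=18
    ├── run.js           # main script: list + fetch + handoff
    ├── compress-images.js   # png/gif → webp inside output + rewrite .md references (needs system cwebp / gif2webp)
    ├── review.js        # local review server: tick in the UI → write _review.yaml → spawn publish.js
    ├── publish.js       # mirror output → dest + register_meta (excerpt/tags/cover) + cleanup
    ├── sync-meta.js     # backfill excerpt/tags/cover into _meta.json (for backfill; idempotent)
    └── .gitignore       # ignores node_modules

The transcoding core lives in scripts/lib/image-webp.mjs at the project root.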
