Name: X Fetch
Author: beatai-org

Use this skill when the user wants to download / 抓取 / 下载 a tweet (single or thread) from X / Twitter by URL. Trigger on requests like "抓取这条推 <url>", "下载 X 帖文", "fetch this tweet", "把这个 X 链接拉下来", or any URL pointing to `x.com/<user>/status/<id>` or `twitter.com/<user>/status/<id>`. Uses a persistent real-Chrome profile (separate from medium-fetch) and walks the thread of the OP author, outputting Markdown + assets to RAW_DIR (caller-provided; required, no default). Does NOT translate, summarize, or decide where output goes; those are orchestrator concerns. Preserves the tweet's original language verbatim.


stars:4,675

forks:256

updated: May 29, 2026, 04:50

Related skills (same repository)

material-pipeline.md

from "beatai-org/beatai"

Use this skill for the daily Medium article "selection → download → translate → publish" pipeline. Trigger phrases: "跑一遍今日素材" / "material pipeline" / "批量抓 medium" / "medium 流水线" / "日更素材". The optional publish step has its own triggers: "发布到 ai-insights" / "publish ai-insights" / "publish".

2026-05-29 · 4.7k

translate.md

from "beatai-org/beatai"

Use this skill when the user / a caller asks to translate or rewrite English Markdown articles into Chinese. Two modes: **原文翻译模式** (source-faithful translation mode, the default) preserves paragraph/list/heading 1:1 with the source; **原文重写模式** (rewrite mode) restructures along Chinese tech-blog conventions (use only when caller explicitly opts in or user says "按中文重写"). Trigger on requests like "翻译英文文章", "把英文文章翻成中文", "翻译这些 .md", "用中文重写", "按中文阅读习惯写", or a fully-spec'd handoff prompt with slugs + input_root + output_root + date (+ optional mode). This is a **leaf executor**: it owns single-article quality (faithful and fluent rendering, personal-pronoun constraints, an allowlist of terms kept in English, consistency of domain-specific proper nouns). Path/dir setup, frontmatter shape, image-ref rewriting, and post-translation self-checks are all delegated to scripts/translate-prepare.mjs + scripts/translate-verify.mjs. Caller (typically material-pipeline) decides slugs / input_root / output_root / date / mode; if any required field is missing, stop and ask. Does not register results into site navigation; does not orchestrate batches.

2026-05-29 · 4.7k

extract-excerpt.md

from "beatai-org/beatai"

Use this skill to fill the `excerpt` field in translated Chinese Markdown articles when the article had no subtitle. Trigger on requests like "提取 excerpt", "补 excerpt", "extract excerpt", "给文章补 excerpt", or a fully-spec'd handoff prompt with slugs + target_root. This is a **leaf executor**—it owns the semantic judgment of "which candidate paragraph is the real opening body text" but **does NOT decide paths**: target_root comes from the caller. If the caller didn't provide it, stop and ask. Does not translate, does not register results into site navigation, does not orchestrate batches.

2026-05-28 · 4.7k

medium-fetch.md

from "beatai-org/beatai"

Use this skill when the user wants to download / 抓取 / 下载 a Medium article (including paid member-only stories) by URL. Trigger on requests like "抓取这篇 Medium 文章 <url>", "下载 medium 文章", "fetch this Medium article", "把这个 Medium 链接拉下来", or any URL pointing to medium.com / *.medium.com / publication custom domains (levelup.gitconnected.com, towardsdatascience.com, betterprogramming.pub, uxdesign.cc, hackernoon.com, etc.). Uses a persistent real-Chrome profile for member authentication, automatically routes custom-domain URLs through Medium's cross-domain SSO bridge, and outputs Readability-cleaned Markdown + assets to RAW_DIR (caller-provided; required, no default). Does NOT translate, summarize, or decide where output goes; those are orchestrator concerns.

2026-05-28 · 4.7k

medium-sub.md

from "beatai-org/beatai"

Use this skill when the user wants to fetch Medium's "Recommended" article list for a configured set of tags. Trigger on requests like "拉取 Medium 推荐文章", "fetch medium recommended for tags", "每天抓一次 medium tag 推荐", or any request that mentions Medium tag recommended pages. Pure fetcher — emits a single JSON document to stdout. Does NOT write files, apply per-day limits, or do cross-day dedup; those concerns belong to the caller (e.g. material-pipeline). Reuses the chrome-profile from the medium-fetch skill (no separate login required).

2026-05-28 · 4.7k

substack-fetch.md

from "beatai-org/beatai"

Use this skill when the user wants to download / 抓取 / 下载 a Substack article by URL. Trigger on requests like "抓取这篇 Substack 文章 <url>", "下载 substack 文章", "fetch this Substack article", "把这个 substack 链接拉下来", or any URL pointing to `*.substack.com/p/<slug>` or a Substack custom domain (e.g. `blog.dailydoseofds.com/p/...`, `www.oneusefulthing.org/p/...`). Plain HTTP fetch (no Chrome / no login required for free posts); outputs Readability-cleaned Markdown + assets to RAW_DIR (caller-provided; required, no default). Does NOT translate, summarize, or decide where output goes — those are orchestrator concerns.

2026-05-28 · 4.7k

package.json

"author": "beatai-org"

"repository": "beatai-org/beatai"


name	x-fetch
version	1.0.0

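X / Twitter Post Downloader

Downloads X (Twitter) posts through real Chrome with a persistent user session and produces cleaned Markdown + assets. Two modes are detected automatically:

Mode	Trigger	Output
Tweet / thread	Ordinary status URL	The OP author's continuation thread, with tweets separated by `---`
X Article (long-form post)	Same URL, but the page renders with `twitterArticleReadView`	Parsed Draft.js rich text: preserves heading levels, paragraphs, bold, italics, links, ordered/unordered lists, blockquotes, and inline image positions

This is a leaf executor: it handles only the execution of a single download (anti-detection, thread stitching, article rich-text parsing, image localization) and does not decide where output goes. RAW_DIR must be passed in by the caller as an environment variable (absolute path); if it is unset, the script exits with an error. This is a hard contract with the orchestration layer, consistent with medium-fetch.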

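When to use

The user provides an X / Twitter URL and asks to "download / fetch / pull" it
The URL points to x.com/<user>/status/<id> or twitter.com/<user>/status/<id>
Another skill or workflow will later translate or ingest the Markdown

When not to use

Non-X content (use a general-purpose HTML download tool, or medium-fetch)
Protected accounts ("protected tweets") that the current chrome-profile does not follow
Video content (v1 captures only text + static images; videos are not downloaded and are marked with a placeholder in the output)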

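Prerequisites

Google Chrome installed on the system (the script launches it via channel: 'chrome' rather than Playwright's bundled Chromium, to avoid fingerprinting, for the same reason as medium-fetch)
Node.js ≥ 18
npm install already run (once, on first use)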

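Core mechanisms (and why)

1. Anti-detection

X also fingerprints automation. This skill reuses the proven approach from medium-fetch:

The system's real Chrome (channel: 'chrome')
Headful with an off-screen window (--window-position=-3000,-3000), to avoid headless detection
A persistent profile (chrome-profile/, separate from medium-fetch's so the two do not interfere)
The --enable-automation flag removed, plus an injected script that hides navigator.webdriver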
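
The four measures above could be wired together roughly as follows. This is a minimal sketch, not the skill's actual fetch.js: the function names and the default profile path are hypothetical, while the option names (channel, headless, args, ignoreDefaultArgs) and addInitScript are standard Playwright APIs.

```javascript
// Sketch only: buildLaunchOptions / openContext are hypothetical names.
function buildLaunchOptions() {
  return {
    channel: "chrome", // the system's real Chrome, not Playwright's bundled Chromium
    headless: false, // headful, to avoid headless fingerprinting
    args: ["--window-position=-3000,-3000"], // park the window off-screen
    ignoreDefaultArgs: ["--enable-automation"], // drop the automation flag
  };
}

async function openContext(profileDir = "./chrome-profile") {
  // Required lazily so the options can be inspected without Playwright installed.
  const { chromium } = require("playwright");
  const context = await chromium.launchPersistentContext(profileDir, buildLaunchOptions());
  // Runs before any page script and makes navigator.webdriver read as undefined.
  await context.addInitScript(() => {
    Object.defineProperty(navigator, "webdriver", { get: () => undefined });
  });
  return context;
}
```

Keeping the options in a plain function makes it easy to log or review exactly what Chrome is launched with before a browser is ever started.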

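2. Login is optional

A single X post is often viewable anonymously, but login is required first in the following cases:

The later part of a long thread is blocked by "Show more / Sign in to view"
The post's author is protected
X temporarily imposes a site-wide login wall on a region or IP

When the script first runs with no profile, it prompts you to run node login.js. After login, the session usually remains valid for several months.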

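3. Thread stitching (Tweet mode)

A status URL on X actually renders a "conversation view": the parent reply chain above, the target tweet, and replies below (both continuation tweets by the same author and replies from others).

This script captures only the OP author's own continuation thread:

Find the <article> matching the status id in the URL and use it as the starting point
Walk downward in DOM order; each article whose author handle matches the OP's is added to the thread; stop at the first non-OP article
This neither misses the author's own thread nor mixes in other people's replies

X renders the timeline with virtual scrolling, so the script scrolls several times and accumulates results (deduplicated by statusId) to avoid missing tweets that have been unmounted. article[data-testid="tweet"] also appears nested inside quoted tweets; the script uses the cellInnerDiv container to take only the outermost article in each cell, so quoted blocks do not pollute the result.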
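
The accumulate-then-walk logic can be illustrated without a browser. The sketch below assumes each timeline cell has already been reduced to a hypothetical { statusId, handle } record, and that scroll passes move downward so that insertion order matches DOM order; it also compares handles case-insensitively, which is an assumption rather than something the skill documents.

```javascript
// Merge the cells seen in one scroll pass into the accumulator, deduplicating by statusId.
// A Map keeps first-seen order, which matches DOM order when scrolling downward.
function mergePass(seen, cells) {
  for (const cell of cells) {
    if (!seen.has(cell.statusId)) seen.set(cell.statusId, cell);
  }
  return seen;
}

// Starting at the target status, keep consecutive cells written by the OP author
// and stop at the first cell written by someone else.
function stitchThread(cells, targetId) {
  const start = cells.findIndex((c) => c.statusId === targetId);
  if (start === -1) return [];
  const op = cells[start].handle.toLowerCase();
  const thread = [];
  for (const cell of cells.slice(start)) {
    if (cell.handle.toLowerCase() !== op) break;
    thread.push(cell);
  }
  return thread;
}

// Two overlapping scroll passes over the same conversation view.
const seen = new Map();
mergePass(seen, [
  { statusId: "1", handle: "parent" },
  { statusId: "2", handle: "op" },
  { statusId: "3", handle: "op" },
]);
mergePass(seen, [
  { statusId: "3", handle: "op" },
  { statusId: "4", handle: "op" },
  { statusId: "5", handle: "other" },
  { statusId: "6", handle: "op" },
]);
const threadIds = stitchThread([...seen.values()], "2").map((c) => c.statusId);
console.log(threadIds); // [ '2', '3', '4' ]
```

Tweet 6 is by the OP but comes after someone else's reply, so it is left out, in line with the stop-at-first-non-OP rule.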

Extensions (capturing the parent chain or other people's replies) could be added later behind a flag; they are out of scope for v1.

3b. X Article rich-text parsing (Article mode)

The DOM of an X Article (long-form post) does not follow the ordinary tweet structure; the body is rendered by the Draft.js editor. When the script detects [data-testid="twitterArticleReadView"] on the page, it switches to article mode:

Title: [data-testid="twitter-article-title"]
Body container: [data-testid="longformRichTextComponent"]
Each paragraph / heading / list item is a [data-block="true"] block, classified by its Draft.js class name:
- longform-header-one/two/three/four → H1/H2/H3/H4
- longform-unordered-list-item → - ...
- longform-ordered-list-item → 1. ...
- longform-blockquote → > ...
- everything else (longform-unstyled) → paragraph
Inline image blocks: [data-testid="tweetPhoto"] → inserted in order of appearance as ![](./assets/NN.ext)
Separator blocks ([role="separator"] inside a <section>) → ---
Inline styles: <span style="font-weight: bold"> → **bold**, <span style="font-style: italic"> → *italic*, <a href> → [text](url)
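
The class-name-to-Markdown mapping above can be sketched as a pure function. The { type, text } block shape is hypothetical (the real script reads these from the DOM), and numbering ordered items incrementally within a run is a choice made for this sketch; the table above only specifies the `1. ...` form.

```javascript
// Heading classes and their Markdown prefixes.
const HEADING_PREFIX = new Map([
  ["longform-header-one", "#"],
  ["longform-header-two", "##"],
  ["longform-header-three", "###"],
  ["longform-header-four", "####"],
]);

// Convert one block; text is assumed to have inline styles already applied.
function blockToMarkdown({ type, text }, orderedIndex = 1) {
  if (HEADING_PREFIX.has(type)) return `${HEADING_PREFIX.get(type)} ${text}`;
  if (type === "longform-unordered-list-item") return `- ${text}`;
  if (type === "longform-ordered-list-item") return `${orderedIndex}. ${text}`;
  if (type === "longform-blockquote") return `> ${text}`;
  return text; // longform-unstyled and anything unrecognized: plain paragraph
}

function blocksToMarkdown(blocks) {
  let out = "";
  let n = 0; // position within the current run of ordered-list items
  let prevType = null;
  for (const block of blocks) {
    n = block.type === "longform-ordered-list-item" ? n + 1 : 0;
    const isList = block.type.endsWith("-list-item");
    // Items of the same list stay on adjacent lines; everything else gets a blank line.
    const sep = out === "" ? "" : isList && block.type === prevType ? "\n" : "\n\n";
    out += sep + blockToMarkdown(block, n || 1);
    prevType = block.type;
  }
  return out;
}

const sample = blocksToMarkdown([
  { type: "longform-header-two", text: "Intro" },
  { type: "longform-unstyled", text: "Hello" },
  { type: "longform-ordered-list-item", text: "first" },
  { type: "longform-ordered-list-item", text: "second" },
  { type: "longform-blockquote", text: "quoted" },
]);
console.log(sample);
```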

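4. Image localization

X images live at pbs.twimg.com/media/<hash>?format=<ext>&name=<size>, where <size> is one of small / medium / large / orig. The script forces name=large (close to original quality without excessive bandwidth), numbers the images in order of appearance as 01.<ext> … and stores them in assets/; the md references them by local relative path.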
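
As an illustration of the URL rewrite and numbering, here is a small sketch built on the standard URL class. The function name, the example hash, and the fallback to "jpg" when no extension can be found are assumptions, not details taken from the script.

```javascript
// Rewrite a pbs.twimg.com media URL to request the "large" rendition and
// derive the local asset name from its 1-based position in the post.
function toLargeImage(src, index) {
  const url = new URL(src);
  url.searchParams.set("name", "large");
  // Assumption: fall back to the path's extension, then to "jpg", when format= is absent.
  const fromPath = url.pathname.match(/\.([a-z0-9]+)$/i);
  const ext = url.searchParams.get("format") || (fromPath ? fromPath[1] : "jpg");
  const file = `${String(index).padStart(2, "0")}.${ext}`;
  return { url: url.toString(), file, ref: `./assets/${file}` };
}

const image = toLargeImage("https://pbs.twimg.com/media/AbC123?format=png&name=small", 1);
console.log(image.url); // https://pbs.twimg.com/media/AbC123?format=png&name=large
console.log(image.ref); // ./assets/01.png
```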

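Videos are not downloaded for now (X's video CDN serves m3u8 streams, which need a separate pipeline and are out of scope for v1). The script marks them in the md as > 视频占位符: <video URL> ("video placeholder").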

Workflow

Step 1: one-time setup (first use only, or when the session expires)

cd /Users/sunfei/development/beatai/.claude/skills/x-fetch/scripts
npm install
node login.js

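node login.js opens a real Chrome window. The user completes the following manually:

If an automation-detection challenge appears → pass it manually
Log in with any X account (no paid plan or subscription needed)
Once the home timeline appears → the script detects the auth_token cookie and exits on its own (no need to press Enter)

Session-expiry signals: the fetch output contains ⚠ 检测到登录墙 ("login wall detected") or the number of tweets captured is 0 → rerun node login.js.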
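
The "exit once auth_token appears" behaviour could look roughly like the sketch below. It assumes a Playwright BrowserContext, whose cookies() method returns objects with name and value fields; the polling interval, the timeout, and both function names are hypothetical.

```javascript
// True once the cookie jar holds a non-empty auth_token.
function hasAuthToken(cookies) {
  return cookies.some((c) => c.name === "auth_token" && Boolean(c.value));
}

// Poll the browser context until login completes or the timeout elapses.
async function waitForLogin(context, { intervalMs = 1000, timeoutMs = 5 * 60 * 1000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (hasAuthToken(await context.cookies())) return true;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return false; // the caller decides how to report a login that never completed
}
```

Polling the cookie jar rather than watching the page avoids depending on X's home-timeline markup, which changes more often than the cookie name.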

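Step 2: fetch a single tweet or a thread (unattended)

RAW_DIR is required: the caller must pass the destination root for md + assets as an environment variable (absolute path):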

RAW_DIR=/abs/path/to/raw node fetch.js <url>

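If RAW_DIR is unset, the script exits with an error. This is the leaf executor's hard contract (same as medium-fetch): the script does not decide on the orchestrator's behalf where files land.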
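
A contract check of this kind might be written as follows. This is a sketch under assumptions: the function name and error messages are hypothetical, and rejecting relative paths is inferred from the "absolute path" requirement rather than confirmed from fetch.js.

```javascript
const path = require("node:path");

// Return the validated output root, or throw if the contract is violated.
function requireRawDir(env = process.env) {
  const rawDir = env.RAW_DIR;
  if (!rawDir) {
    throw new Error("RAW_DIR is required (absolute path); refusing to pick a default");
  }
  if (!path.isAbsolute(rawDir)) {
    throw new Error(`RAW_DIR must be an absolute path, got: ${rawDir}`);
  }
  return rawDir;
}
```

In a CLI entry point, the thrown error would typically be caught, printed to stderr, and followed by process.exit(1), so that an orchestrator sees a non-zero exit code instead of output in an unexpected location.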

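Output layout (split into two roots, fully aligned with medium-fetch):

Intermediate artifacts <articles> = .claude/skills/x-fetch/scripts/articles/ (local to the script, gitignored; v1 does not yet write html intermediates, the directory is reserved)

Workspace <RAW_DIR> = passed in by the caller (required). When material-pipeline's run.js reuses medium-fetch, it already points RAW_DIR at .claude/skills/material-pipeline/materials/raw/; x-fetch follows the same convention.

<RAW_DIR>/<slug>/<slug>.md: the md body, with YAML frontmatter at the top (title / author / url / fetched / lang, optionally tags / excerpt), followed by a level-one heading # <title> and the body. Tweets within a thread are separated by ---.
<RAW_DIR>/<slug>/assets/<NN>.<ext>: local copies of every pbs.twimg.com/media/ image in the post (numbered in order of appearance: 01.png, 02.jpg, …)

<slug> = <user>-<tweetId>, with non-alphanumeric characters in the username normalized to -. For example, https://x.com/akshay_pachaar/status/2041146899319971922 → akshay-pachaar-2041146899319971922.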
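
The slug rule can be expressed in a few lines. The sketch below is illustrative: the function name is hypothetical, accepting www. and mobile. host prefixes is an assumption, and the username's casing is left untouched because the rule above only mentions non-alphanumeric characters.

```javascript
// Derive <user>-<tweetId> from a status URL.
function slugFromUrl(statusUrl) {
  const { hostname, pathname } = new URL(statusUrl);
  if (!/^(www\.|mobile\.)?(x|twitter)\.com$/.test(hostname)) {
    throw new Error(`Not an X / Twitter URL: ${statusUrl}`);
  }
  const match = pathname.match(/^\/([^/]+)\/status\/(\d+)/);
  if (!match) throw new Error(`Not a status URL: ${statusUrl}`);
  const [, user, tweetId] = match;
  // Each non-alphanumeric character in the username becomes "-".
  return `${user.replace(/[^a-zA-Z0-9]/g, "-")}-${tweetId}`;
}

const slug = slugFromUrl("https://x.com/akshay_pachaar/status/2041146899319971922");
console.log(slug); // akshay-pachaar-2041146899319971922
```

The tweet id is kept as a string throughout; status ids exceed Number.MAX_SAFE_INTEGER, so parsing them as numbers would corrupt the slug.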

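Each tweet or thread gets its own subdirectory under <RAW_DIR>. The md + assets are self-contained: the whole directory can be copied and browsed offline without breaking links. Metadata is carried in the md's embedded frontmatter; there is no external misc.json (consistent with medium-fetch).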

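Environment variables

Variable	Required	Meaning	Default
`RAW_DIR`	Yes	Output root for md + assets (absolute path)	None; exits with an error if unset
`X_FETCH_HOME`	No	Root for the script's internal state (chrome-profile / articles)	The script's own directory
`ARTICLES_DIR`	No	Archive root for html / intermediate artifacts (unused in v1)	`$X_FETCH_HOME/articles`
`ALLOW_TAGS`	No	Tag allowlist (a JSON array string). When set, only captured hashtags that intersect with it are kept (case-insensitive); when unset, all are kept. Same pattern as `RAW_DIR`: the caller decides and passes it in	None; no filtering

RAW_DIR is the output location, which is the orchestrator's decision; X_FETCH_HOME / ARTICLES_DIR are locations for the script's internal state, for which the leaf has sensible defaults of its own.

# Standard material-pipeline invocation (same RAW_DIR as medium-fetch)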
RAW_DIR=/abs/path/to/raw node fetch.js <url>

Output contract

Structure of <RAW_DIR>/<slug>/<slug>.md:

---
title: <first sentence of the OP tweet, truncated to <=80 chars>
author: <display name> (@<handle>)
url: <original URL>
fetched: YYYY-MM-DD
lang: <detected language: zh / en / other>
tags:
  - <hashtag 1>
  - <hashtag 2>
excerpt: <full text of the OP tweet, truncated to ≤200 chars>
---

# <title>

<OP tweet body (original language preserved)>

---

<body of thread tweet 2>

---

<body of thread tweet 3>
…

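The YAML block wrapped in triple dashes at the top is the frontmatter; the site renderer (gray-matter) treats it as metadata and does not display it as body text. The title field also serves as the page title.
tags come from the hashtags that appear in the thread, deduplicated in order with original casing preserved; the field is omitted when there are no hashtags.
If the caller passes the ALLOW_TAGS environment variable, captured hashtags are first intersected with the allowlist before being written (case-insensitive); otherwise all are kept.
The frontmatter is followed immediately by the level-one heading # <title>, then the body, with the thread's tweets separated by \n\n---\n\n. Images are referenced by local relative path ./assets/<NN>.<ext>, placed right after the tweet they appear in.
Mentions (@username) are kept as [@username](https://x.com/username); hashtags are kept as [#tag](https://x.com/hashtag/tag); t.co short links resolve to their expanded target URL (provided by the X DOM).
Videos are marked with a blockquote: > 视频占位符: <video player URL> ("video placeholder").
Metadata is embedded in the md; no separate misc.json is generated.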
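
The tag rules above (dedupe in order, keep original casing, optional case-insensitive allowlist) can be sketched as two small functions. The names are hypothetical, and treating tags that differ only in case as duplicates is an assumption; the contract only says tags are deduplicated.

```javascript
// Deduplicate hashtags in first-seen order, keeping the casing of the first occurrence.
function dedupeTags(tags) {
  const seen = new Set();
  return tags.filter((tag) => {
    const key = tag.toLowerCase();
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// Apply the optional ALLOW_TAGS allowlist (a JSON array string), case-insensitively.
function filterTags(tags, allowTagsJson) {
  const unique = dedupeTags(tags);
  if (!allowTagsJson) return unique; // unset: keep everything
  const allow = JSON.parse(allowTagsJson);
  if (!Array.isArray(allow)) throw new Error("ALLOW_TAGS must be a JSON array of strings");
  const allowed = new Set(allow.map((t) => String(t).toLowerCase()));
  return unique.filter((tag) => allowed.has(tag.toLowerCase()));
}

console.log(filterTags(["AI", "Agents", "ai", "LLM"])); // [ 'AI', 'Agents', 'LLM' ]
console.log(filterTags(["AI", "Agents", "LLM"], '["ai", "llm"]')); // [ 'AI', 'LLM' ]
```

When the result is an empty array, the tags field would be omitted from the frontmatter, matching the rule for posts with no hashtags.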

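Troubleshooting checklist

Symptom	Cause	Fix
`Failed to create a ProcessSingleton`	The previous Chrome process did not exit	`pkill -f "user-data-dir=.*x-fetch/scripts/chrome-profile" && rm -f chrome-profile/Singleton{Lock,Cookie,Socket}`
Captured thread length is 0 / the title is login-prompt text	Session expired, or X temporarily put up a login wall	Rerun `node login.js`
Fewer continuation tweets than the browser shows	The DOM had not finished loading; auto-scrolling was insufficient	Increase `SCROLL_PASSES` (a constant at the top of fetch.js), or run again
`pbs.twimg.com` image returns 403	Rare; the Twitter CDN occasionally refuses requests	Rerun; if it keeps failing → fall back to `name=medium`
Mentions / hashtags rendered as plain text	The DOM parsing path missed the links	Mostly seen after X redesigns; check the link branch in `articleToMarkdown()`
Chrome window flashes / steals focus	A necessary compromise of headful mode on macOS; the window is already pushed off-screen by `--window-position=-3000,-3000`	Expected behavior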

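What this skill does not do

No translation: it only downloads and cleans, consistent with medium-fetch. Translation is handled by the translate skill.
No video / GIF streams: only static images are captured; videos are marked with a placeholder.
No deduplication: fetching the same URL again overwrites the previous <slug>/, consistent with medium-fetch.
No batch discovery: one URL per run; trending / keyword search belongs to another skill (see the medium-sub pattern).
No parent chain and no replies from others: only the OP author's continuation thread is captured.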

Usage examples

Single tweet / thread

User: Download this X post https://x.com/<user>/status/<id>

Claude:
[bash] RAW_DIR=/abs/path/raw node fetch.js "https://x.com/<user>/status/<id>"
  模式: Tweet / thread
✓ MD:   <RAW_DIR>/<user>-<id>/<user>-<id>.md
  作者: <name> (@<user>)
  thread 条数: 14
  图片: 5

X Article (long-form post)

User: Download this X Article https://x.com/akshay_pachaar/status/2041146899319971922

Claude:
[bash] RAW_DIR=/abs/path/raw node fetch.js "https://x.com/akshay_pachaar/status/2041146899319971922"
  模式: X Article (long-form)
✓ MD:   <RAW_DIR>/akshay-pachaar-2041146899319971922/akshay-pachaar-2041146899319971922.md
  标题: The Anatomy of an Agent Harness
  作者: Akshay (@akshay_pachaar)
  blocks 数: 104
  图片: 8
  字符数: 18919

File list

x-fetch/
├── SKILL.md           # this document
└── scripts/
    ├── package.json   # dependency declarations (playwright + yaml)
    ├── login.js       # one-time login script
    ├── fetch.js       # fetch script (CLI: node fetch.js <url>)
    └── .gitignore     # ignores chrome-profile, articles, node_modules
