blog-rss-wiki-graph

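Generate a wiki from a blog's RSS feed and analyze the semantic links between pages: auto-generates concept pages, extracts wikilinks, builds the link graph, and outputs CSV reports.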

Install command

npx skills add https://github.com/crazypeace/hermes-skill-blog-rss-wiki-graph --skill blog-rss-wiki-graph

Copy and paste this command into Claude Code to install the skill

Source

crazypeace/hermes-skill-blog-rss-wiki-graph

Stars: 1

Forks: 0

Updated: April 23, 2026 at 14:23

SKILL.md



name	blog-rss-wiki-graph
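description	Generate a wiki from a blog's RSS feed and analyze the semantic links between pages: auto-generates concept pages, extracts wikilinks, builds the link graph, and outputs CSV reports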
version	1.1.1
author	Hermes Agent
tags	["wiki","blog","rss","graph-analysis","semantic-links","knowledge-base"]
metadata	{"hermes":{"tags":["wiki","blog","rss","graph-analysis","semantic-links"],"category":"research","related_skills":["llm-wiki"]}}

Blog RSS Wiki Graph Analyzer

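Batch-ingests articles from a blog RSS feed, generates a semantically linked wiki concept page (with [[wikilinks]]) for each article, builds the link graph between pages, and outputs link-analysis reports as CSV.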

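Core workflow

RSS Feed → prepare_ingest.py (fetch + index) → LLM semantic analysis → create concept page (with wikilinks) → execute_ingest.py (extract links + generate reports)

Each article goes through 4 steps:

Fetch: pull the article from RSS, save the raw copy, update page_index
Analyze: the LLM reads the article and identifies semantic links to existing pages
Write: create the concept page with YAML frontmatter, a summary, and [[wikilinks]]
Update: extract the wikilinks, append them to link.csv, regenerate the reports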

Directory layout

wiki/
├── page_index.csv          # Index: index,page,pub_date,raw_url
├── link.csv                # Edge list: source,target (undirected graph)
├── link_inbound.csv        # Report: pages ranked by degree
├── link_outbound_top3.csv  # Report: each page's top 3 outbound pages by degree
├── raw/articles/           # Raw articles (immutable)
├── concepts/               # Concept pages (LLM-generated, with wikilinks)
└── scripts/
    ├── prepare_ingest.py   # Fetch RSS + update index + deduplicate slugs
    └── execute_ingest.py   # Extract wikilinks + update link graph + regenerate reports

CSV file formats

page_index.csv

index,page,pub_date,raw_url
1,hermes-python,2026-04-21,https://zelikk.blogspot.com/2026/04/hermes-python.html
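
Both scripts need to look pages up by slug and by URL. The following is a minimal sketch of loading page_index.csv for that purpose; `load_page_index` is a hypothetical helper name, not taken from the actual scripts.

```python
import csv
import io

def load_page_index(fh):
    """Read page_index.csv rows from an open text handle.

    Returns (slug -> index, set of already indexed raw_url values).
    """
    slug_to_index = {}
    indexed_urls = set()
    for row in csv.DictReader(fh):
        slug_to_index[row["page"]] = int(row["index"])
        indexed_urls.add(row["raw_url"])
    return slug_to_index, indexed_urls

# In-memory stand-in for wiki/page_index.csv, using the sample row above.
sample = io.StringIO(
    "index,page,pub_date,raw_url\n"
    "1,hermes-python,2026-04-21,"
    "https://zelikk.blogspot.com/2026/04/hermes-python.html\n"
)
slug_to_index, indexed_urls = load_page_index(sample)
```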

link.csv (undirected graph; each edge is stored with source < target, sorted and deduplicated)

source,target
1,4
2,7
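
The ordering rule for link.csv can be sketched as follows. The function names are hypothetical, and dropping self-links is an assumption: the source only states that source < target holds for every stored edge.

```python
def normalize_edge(a, b):
    """Order an undirected edge so the smaller page index comes first."""
    return (a, b) if a < b else (b, a)

def merge_edges(existing, candidates):
    """Merge candidate edges into the existing list, sorted and deduplicated."""
    merged = set(existing)
    for a, b in candidates:
        if a == b:
            continue  # assumption: a page linking to itself is not stored
        merged.add(normalize_edge(a, b))
    return sorted(merged)

# (4, 1) is the same undirected edge as (1, 4), so only (2, 7) is new.
edges = merge_edges([(1, 4)], [(4, 1), (7, 2), (3, 3)])
```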

link_inbound.csv (sorted by degree, descending)

rank,index,page,degree,neighbors
1,42,develop-with-gpt,15,"13|27|30|36|..."
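
One way to derive these rows from the edge list is sketched below. It assumes that ties in degree are broken by ascending index and that pages without any edge are left out; neither detail is stated in the source, and `inbound_report` is a hypothetical name.

```python
from collections import defaultdict

def inbound_report(edges, index_to_slug):
    """Build link_inbound.csv rows: pages ranked by degree, highest first."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    # Assumption: equal degrees are ordered by ascending page index.
    ranked = sorted(neighbors, key=lambda i: (-len(neighbors[i]), i))
    rows = []
    for rank, idx in enumerate(ranked, start=1):
        rows.append({
            "rank": rank,
            "index": idx,
            "page": index_to_slug[idx],
            "degree": len(neighbors[idx]),
            "neighbors": "|".join(str(n) for n in sorted(neighbors[idx])),
        })
    return rows

rows = inbound_report([(1, 2), (1, 3)], {1: "a", 2: "b", 3: "c"})
```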

link_outbound_top3.csv

index,page,top1_index,top1_page,top2_index,top2_page,top3_index,top3_page
1,hermes-python,13,hermes-agent-oracle-vps-ubuntu-root,4,hermes-agent-telegram-group-topic-agent,...
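
The top-3 selection could look roughly like this. The sketch treats a page's neighbors in the undirected link.csv as its outbound links, which is an assumption: the real script may instead read the wikilinks written in each concept page. `top3_neighbors` is a hypothetical name.

```python
from collections import defaultdict

def top3_neighbors(edges):
    """For each page, return up to three neighbors with the highest degree."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    degree = {i: len(n) for i, n in neighbors.items()}
    # Sort each neighbor set by degree (descending), then index (ascending).
    return {
        i: sorted(n, key=lambda j: (-degree[j], j))[:3]
        for i, n in neighbors.items()
    }

top3 = top3_neighbors([(1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (3, 4)])
```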

Concept page template

---
title: "<article title>"
created: YYYY-MM-DD
updated: YYYY-MM-DD
type: article
tags: [relevant, tags]
source_url: <original URL>
raw_source: raw/articles/<slug>.md
---

# <article title>

<1-3 sentence summary in Chinese>

## 语义关联
- [[related-page-slug-1]]
- [[related-page-slug-2]]
- [[related-page-slug-3]]
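
As an illustration, the template can be filled in with a small helper. `render_concept_page` and the sample title and tags are hypothetical; the sketch also assumes the title contains no double quotes and that created and updated start out as the same date.

```python
def render_concept_page(title, date, tags, source_url, slug, summary, related):
    """Fill the concept-page template; `related` is a list of page slugs."""
    lines = [
        "---",
        f'title: "{title}"',
        f"created: {date}",
        f"updated: {date}",
        "type: article",
        f"tags: [{', '.join(tags)}]",
        f"source_url: {source_url}",
        f"raw_source: raw/articles/{slug}.md",
        "---",
        "",
        f"# {title}",
        "",
        summary,
        "",
        "## 语义关联",
    ]
    lines += [f"- [[{s}]]" for s in related]
    return "\n".join(lines) + "\n"

page = render_concept_page(
    title="Hermes Python",  # placeholder title
    date="2026-04-21",
    tags=["hermes", "python"],  # placeholder tags
    source_url="https://zelikk.blogspot.com/2026/04/hermes-python.html",
    slug="hermes-python",
    summary="<summary>",
    related=["hermes-agent-oracle-vps-ubuntu-root",
             "hermes-agent-telegram-group-topic-agent"],
)
```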

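Key scripts

prepare_ingest.py

Fetches the latest RSS posts, detects posts that are not yet indexed by comparing URLs, saves the raw copy, and updates page_index.

Core logic (URL comparison): in Blogger RSS, start-index=1 always returns the newest post, so every new post inserted at the top shifts the feed position of all older posts. Fetching by sequential index is therefore unreliable. Instead, the script must:

Call fetch_rss(start_index=1, max_results=50) to get the list of latest posts
Compare against the raw_url column of page_index.csv and keep only the posts that are not indexed
Sort by pub_date ascending and process the earliest post on each run (this preserves chronological order)
Repeated runs drain the backlog one post at a time, until no new post remains and the script returns {"error": "No new article"}
Cold-start fallback: when the latest 50 posts are all indexed but the RSS total exceeds the indexed count, the script pages through the whole feed (50 posts per page) to find the remaining unindexed older posts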
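
The selection step in the list above can be sketched as below. The entry shape (dicts with raw_url and pub_date keys) and the name `pick_next_article` are assumptions for illustration; pub_date is assumed to be an ISO YYYY-MM-DD string, so plain string comparison orders it chronologically.

```python
def pick_next_article(feed_entries, indexed_urls):
    """Return the earliest unindexed entry, or the 'no new article' marker."""
    pending = [e for e in feed_entries if e["raw_url"] not in indexed_urls]
    if not pending:
        return {"error": "No new article"}
    # ISO dates sort correctly as strings, so min() yields the earliest post.
    return min(pending, key=lambda e: e["pub_date"])

# Feed order is newest first, as Blogger returns it.
feed = [
    {"raw_url": "u3", "pub_date": "2026-04-21"},
    {"raw_url": "u2", "pub_date": "2026-04-19"},
    {"raw_url": "u1", "pub_date": "2026-04-10"},
]
next_entry = pick_next_article(feed, indexed_urls={"u1"})
```

Because selection depends only on URLs, it gives the same result no matter how far older posts have shifted in the feed.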

Auto-initialization: if page_index.csv does not exist, an empty file containing only the header row is created automatically.
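
A minimal sketch of that initialization, assuming the header matches the page_index.csv format shown earlier. `ensure_page_index` is a hypothetical name, and creating missing parent directories is an added convenience rather than documented behavior.

```python
import csv
from pathlib import Path

HEADER = ["index", "page", "pub_date", "raw_url"]

def ensure_page_index(path):
    """Create page_index.csv with only a header row if it is missing.

    Returns True when the file was created, False when it already existed.
    """
    path = Path(path)
    if path.exists():
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as fh:
        csv.writer(fh).writerow(HEADER)
    return True
```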

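Slug deduplication (required): when a slug is already taken, a -<index>-<rand> suffix is appended, where <rand> is four random characters.

import random

# Suffix alphabet without look-alike characters (no i, l, o, 0, 1).
SUFFIX_CHARS = 'abcdefghjkmnpqrstuvwxyz23456789'

def deduplicate_slug(slug, index, existing_slugs):
    """Return slug unchanged if unused, else slug-<index>-<4 random chars>."""
    if slug not in existing_slugs:
        return slug
    while True:
        # Retry until the random suffix yields an unused candidate.
        rand = ''.join(random.choices(SUFFIX_CHARS, k=4))
        candidate = f"{slug}-{index}-{rand}"
        if candidate not in existing_slugs:
            return candidate

This logic covers two scenarios:

Blogger's default slug blog-post: when the author leaves the title empty, several posts share the same slug; the suffix keeps them from overwriting each other
Same slug, different dates: if RSS contains posts with the same slug but different dates (for example an updated version), the suffix likewise guarantees uniqueness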

execute_ingest.py

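Extracts [[wikilinks]] from the concept page, looks up each target's index in page_index, appends the edges to link.csv (skipping edges that already exist), and regenerates the reports.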
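
The extraction step might be sketched as follows. The regular expression is the one used in the broken-link check of the acceptance checklist; `edges_from_concept` is a hypothetical name, and skipping unknown slugs and self-links is an assumption.

```python
import re

# Matches [[slug]] and [[slug|label]], capturing only the slug part.
WIKILINK_RE = re.compile(r"\[\[([^\]|]+?)(?:\|[^\]]+)?\]\]")

def edges_from_concept(text, source_index, slug_to_index, existing_edges):
    """Return new (source, target) edges found in one concept page."""
    new_edges = []
    for slug in WIKILINK_RE.findall(text):
        target = slug_to_index.get(slug.strip())
        if target is None or target == source_index:
            continue  # assumption: unknown slugs and self-links are skipped
        edge = (min(source_index, target), max(source_index, target))
        if edge not in existing_edges and edge not in new_edges:
            new_edges.append(edge)
    return new_edges

text = "- [[a]]\n- [[b|label]]\n- [[missing]]\n- [[b]]\n"
new_edges = edges_from_concept(
    text, source_index=3,
    slug_to_index={"a": 1, "b": 5, "c": 3},
    existing_edges={(1, 3)},
)
```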

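Batch processing strategy

Cold start vs. incremental update

Cold start (building the wiki from scratch)

When page_index.csv does not exist, prepare_ingest.py creates an empty index file automatically. The runs then proceed as follows:

First 50 runs: each run fetches the latest 50 RSS posts and processes the one with the earliest pub_date
Run 51 and later: the script detects that the latest 50 posts are all indexed but the RSS total exceeds the indexed count, so it pages through the whole feed (50 posts per page) and continues with the remaining unindexed older posts
This continues until every post has been processed and the script returns {"error": "No new article"}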
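
The paging fallback can be illustrated like this. `fetch_page(start_index, max_results)` is a hypothetical stand-in for fetch_rss, and the entry dicts and the name `find_unindexed` are assumptions as well.

```python
def find_unindexed(fetch_page, total, indexed_urls, page_size=50):
    """Walk the whole feed page by page and collect unindexed entries.

    Returns the pending entries sorted by pub_date, earliest first.
    """
    pending = []
    start = 1  # Blogger-style feeds use 1-based start indices
    while start <= total:
        for entry in fetch_page(start, page_size):
            if entry["raw_url"] not in indexed_urls:
                pending.append(entry)
        start += page_size
    return sorted(pending, key=lambda e: e["pub_date"])

# Fake feed of 120 posts, newest first; pub_date is a sortable dummy string.
feed = [{"raw_url": f"u{i}", "pub_date": f"{i:04d}"} for i in range(120, 0, -1)]

def fake_fetch(start_index, max_results):
    return feed[start_index - 1:start_index - 1 + max_results]

already = {f"u{i}" for i in range(3, 121)}  # everything except u1 and u2
pending = find_unindexed(fake_fetch, total=len(feed), indexed_urls=already)
```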

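Incremental update (existing wiki)

Each run checks only the latest 50 RSS posts. New posts always appear at the top of the feed, so no full traversal is needed and each run is fast and cheap.

Using delegate_task

Process articles in batches of 5, in parallel, via delegate_task: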

delegate_task:
  goal: "Process 5 blog articles (next 5 RSS indices) into the wiki at ~/wiki/"
  toolsets: ["terminal", "file"]
  max_iterations: 60

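Key points for LLM semantic analysis

Read the existing_pages list (format "index: slug")
Identify semantically related pages (same topic, same tool, same problem domain)
Add 2-6 [[wikilinks]] to each concept page
Write the summary in Chinese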
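
Two of these points lend themselves to a small sketch: building the existing_pages list in the "index: slug" format, and checking that a page carries 2-6 distinct wikilinks. Both function names are hypothetical, and counting only distinct slugs is an assumption.

```python
def format_existing_pages(slug_to_index):
    """Render the page index as 'index: slug' strings, ordered by index."""
    ordered = sorted(slug_to_index.items(), key=lambda kv: kv[1])
    return [f"{idx}: {slug}" for slug, idx in ordered]

def has_valid_link_count(links, lo=2, hi=6):
    """Check that a concept page has between lo and hi distinct wikilinks."""
    unique = list(dict.fromkeys(links))  # drop repeats, keep first-seen order
    return lo <= len(unique) <= hi

existing_pages = format_existing_pages({"hermes-python": 1, "develop-with-gpt": 42})
```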

Acceptance checklist

Run the following checks when ingestion is finished:

cd wiki

# 1. File consistency
echo "index:$(tail -n +2 page_index.csv | wc -l) raw:$(ls raw/articles/*.md | wc -l) concepts:$(ls concepts/*.md | wc -l)"

# 2. Slug uniqueness
python3 -c "
import csv
from collections import Counter
slugs = [r['page'] for r in csv.DictReader(open('page_index.csv'))]
dupes = {k:v for k,v in Counter(slugs).items() if v > 1}
print(f'Duplicate slugs: {dupes if dupes else \"NONE\"}')
"

# 3. Edge deduplication
python3 -c "
import csv
edges = set()
dupes = 0
for row in csv.DictReader(open('link.csv')):
    e = tuple(sorted([row['source'], row['target']]))
    if e in edges: dupes += 1
    edges.add(e)
print(f'Edges: {len(edges)}, Duplicates: {dupes}')
"

# 4. Broken links
python3 -c "
import csv, os, re
valid = {r['page'] for r in csv.DictReader(open('page_index.csv'))}
broken = []
for f in os.listdir('concepts'):
    if not f.endswith('.md'): continue
    with open(f'concepts/{f}', encoding='utf-8') as fh:
        for l in re.findall(r'\[\[([^\]|]+?)(?:\|[^\]]+)?\]\]', fh.read()):
            if l not in valid: broken.append(f'{f[:-3]} → [[{l}]]')
print(f'Broken: {len(broken)}')
for b in broken: print(f'  {b}')
"