ndl-opensearch

NDL OpenSearch API scraper for Taiwan-related books (Publication Intel v3.1)

Install command

npx skills add https://github.com/TuiTuiKoan/Tokyo_Taiwan_Radar --skill ndl-opensearch

Copy this command and paste it into Claude Code to install the skill.

Source

TuiTuiKoan/Tokyo_Taiwan_Radar

Stars: 1

Forks: 0

Updated: 2026-06-03 17:39

SKILL.md

name	ndl_opensearch
description	NDL OpenSearch API scraper for Taiwan-related books (Publication Intel v3.1)

ndl_opensearch Scraper

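Functional description

Source name: ndl_opensearch
URL: https://ndlsearch.ndl.go.jp/api/opensearch
Data type: Taiwan-related books (mediatype=1) in the National Diet Library (NDL) holdings database

The scraper queries the NDL OpenSearch API with q=台湾, mediatype=1, cnt=100 and retrieves bibliographic records for books.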
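As a rough illustration of the query described above, the following sketch builds the request URL from the three parameters named in this section. The function name `build_query_url` and the constant `NDL_ENDPOINT` are hypothetical and do not come from the repository; only the endpoint and the parameter values are taken from the text.

```python
from urllib.parse import urlencode

# Endpoint as given in this document.
NDL_ENDPOINT = "https://ndlsearch.ndl.go.jp/api/opensearch"


def build_query_url(keyword: str = "台湾", mediatype: int = 1, cnt: int = 100) -> str:
    """Return the OpenSearch URL for one query (hypothetical helper)."""
    params = {"q": keyword, "mediatype": mediatype, "cnt": cnt}
    # urlencode percent-encodes the keyword as UTF-8.
    return f"{NDL_ENDPOINT}?{urlencode(params)}"


url = build_query_url()
```

The resulting URL can be passed to any HTTP client for a plain GET request; no client library is assumed here.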

Technical specifications

Item	Details
Protocol	HTTP GET / RSS 2.0 with Dublin Core namespaces
Namespace	`dc: http://purl.org/dc/elements/1.1/` / `dcterms: http://purl.org/dc/terms/`
Pagination	`&idx=` (1-based offset), 100 items per page, up to 500 items
`source_id` format	`ndl_{trailing digits of dc:identifier}` or `ndl_{md5(link)[:12]}`
Release date field	`dcterms:issued` → `dc:date` → `pubDate` (in priority order)
Release date format	`YYYY`, `YYYY-MM`, and `YYYY-MM-DD` are all supported
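The pagination and field-priority rows of the table can be sketched as follows. This is a minimal example, not the repository's code: `page_offsets` and `pick_issued` are hypothetical names, the sample RSS item is invented for illustration, and the sketch assumes that `idx` counts records (so pages start at 1, 101, 201, and so on).

```python
import xml.etree.ElementTree as ET

# Namespace URIs as listed in the table above.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}


def page_offsets(page_size: int = 100, max_items: int = 500) -> list:
    """1-based idx values for each page, capped at max_items records."""
    return list(range(1, max_items + 1, page_size))


def pick_issued(item: ET.Element):
    """Return the first non-empty release date string in priority order."""
    for path in ("dcterms:issued", "dc:date", "pubDate"):
        node = item.find(path, NS)
        if node is not None and node.text and node.text.strip():
            return node.text.strip()
    return None


# Invented sample item, only to exercise the priority order.
SAMPLE = (
    '<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" '
    'xmlns:dcterms="http://purl.org/dc/terms/"><channel><item>'
    "<title>Example</title><dc:date>2025</dc:date>"
    "<dcterms:issued>2025-03</dcterms:issued>"
    "</item></channel></rss>"
)
item = ET.fromstring(SAMPLE).find("channel/item")
issued = pick_issued(item)  # dcterms:issued wins over dc:date
```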

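Source handling notes

⚠️ NDL results are not sorted by release date in descending order

The default sort order of NDL OpenSearch is relevance / bibliographic ID, not descending release date. As a result, old books can appear on the first page.

Countermeasure: always apply a 180-day client-side filter (cutoff = date.today() - timedelta(days=180)). The NDL API does not provide a server-side date filter, so this step cannot be skipped.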
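The client-side filter can be sketched as below. The cutoff expression is the one given above; the function name `within_window`, the injectable `today` argument, and the choice to keep records that fall exactly on the cutoff date are assumptions made for this example.

```python
from datetime import date, timedelta
from typing import Optional

WINDOW_DAYS = 180


def within_window(release: date, today: Optional[date] = None) -> bool:
    """True if the release date is no older than the 180-day cutoff."""
    today = today or date.today()
    cutoff = today - timedelta(days=WINDOW_DAYS)
    # Inclusive boundary is an assumption; the source only defines the cutoff.
    return release >= cutoff
```

Passing `today` explicitly keeps the check deterministic in tests, for example `within_window(date(2024, 11, 1), today=date(2025, 6, 1))` evaluates to `False`.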

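About the Active view

Books whose start_date is 30 or more days in the past stay in the DB with is_active=true, but they drop out of the website's "ongoing" (開催中) filter (as designed).
A book is a single point in time, so end_date = start_date (treated as a single-day event).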

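Reason for ZERO_EVENT_OK

A day with zero Taiwan-related books inside the 180-day window is normal (it depends on the publishing cycle). The source is already registered in ZERO_EVENT_OK_SOURCES in health_check.py.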
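The internals of health_check.py are not shown in this document, so the following is only a guess at how such an allowlist might be consulted. The set contents beyond `ndl_opensearch` and the function `is_zero_result_alert` are hypothetical.

```python
# Hypothetical shape of the allowlist; only the name ZERO_EVENT_OK_SOURCES
# and the ndl_opensearch entry come from this document.
ZERO_EVENT_OK_SOURCES = {"ndl_opensearch"}


def is_zero_result_alert(source_name: str, event_count: int) -> bool:
    """Flag an empty run unless the source is allowed to return nothing."""
    return event_count == 0 and source_name not in ZERO_EVENT_OK_SOURCES
```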

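Special rules

Publication event field template: location_name / location_address / business_hours / price_info are all set to 新書購買請洽各通路 ("for new book purchases, please contact the respective sales channels"); performer holds the author, organizer is treated as the publisher, and event_form = ["publication"].
Null-byte stripping is required: apply .replace("\x00", "") to all external text.
tzinfo=timezone.utc: JST-aware datetimes are prohibited. Use datetime(y, m, d, tzinfo=timezone.utc).
name_ja_locked = True: the book title keeps the confirmed value from NDL.
organizer_type = ["government"]: treated as an NDL-registered institution rather than a publisher.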
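Taken together, the rules above might produce a record like the one in this sketch. The dictionary shape, the `name_ja` key (inferred from `name_ja_locked`), and the helper names `clean` and `build_publication_event` are assumptions; the actual event model in the repository may differ.

```python
from datetime import datetime, timezone

# Fixed text used for the four location/price fields, as stated above.
PURCHASE_NOTE = "新書購買請洽各通路"


def clean(text):
    """Strip null bytes from external text; pass None through unchanged."""
    return text.replace("\x00", "") if text else text


def build_publication_event(title, author, publisher, y, m, d):
    """Assemble one publication event dict (hypothetical record shape)."""
    start = datetime(y, m, d, tzinfo=timezone.utc)  # UTC only, never JST-aware
    return {
        "name_ja": clean(title),          # key name is an assumption
        "name_ja_locked": True,
        "performer": clean(author),
        "organizer": clean(publisher),
        "organizer_type": ["government"],
        "event_form": ["publication"],
        "location_name": PURCHASE_NOTE,
        "location_address": PURCHASE_NOTE,
        "business_hours": PURCHASE_NOTE,
        "price_info": PURCHASE_NOTE,
        "start_date": start,
        "end_date": start,                # single-day event: end == start
    }
```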

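Known issues

NDL ↔ hanmoto duplicates: the same book may appear in both sources (known, not a bug; a small number of duplicates is tolerated).
Release date with year/month only: YYYY and YYYY-MM values are normalized to UTC midnight on January 1 / the first day of the month.
dc:identifier format: URNs and URLs are mixed. A trailing run of 8 or more digits is extracted and used as the stable ID.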
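The date normalization and the stable ID rule above can be sketched like this. Both function names are hypothetical, the identifier used in the usage line is an invented value, and the sketch does not handle RFC 822 style `pubDate` strings, which would need separate parsing.

```python
import hashlib
import re
from datetime import datetime, timezone

_DATE_RE = re.compile(r"^(\d{4})(?:-(\d{1,2}))?(?:-(\d{1,2}))?$")
_TRAILING_DIGITS = re.compile(r"(\d{8,})$")


def normalize_issued(raw: str):
    """Parse YYYY, YYYY-MM or YYYY-MM-DD into a UTC-midnight datetime."""
    match = _DATE_RE.match(raw.strip())
    if not match:
        return None
    year = int(match.group(1))
    month = int(match.group(2) or 1)  # missing month -> January
    day = int(match.group(3) or 1)    # missing day -> first of the month
    try:
        return datetime(year, month, day, tzinfo=timezone.utc)
    except ValueError:  # e.g. month 13
        return None


def make_source_id(identifier, link: str) -> str:
    """Use trailing digits of dc:identifier, else a 12-char md5 of the link."""
    if identifier:
        match = _TRAILING_DIGITS.search(identifier.strip())
        if match:
            return f"ndl_{match.group(1)}"
    return "ndl_" + hashlib.md5(link.encode("utf-8")).hexdigest()[:12]


# Invented identifier, for illustration only.
sid = make_source_id("urn:example:I032012345", "https://example.invalid/book")
```

The md5 fallback keeps the ID stable across runs as long as the item link does not change.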
