| X/Twitter | x.com/*/status/*, twitter.com/* | GraphQL → FxTwitter → Syndication → oEmbed → Jina → Playwright |
| WeChat (微信公众号) | mp.weixin.qq.com/* | Playwright JS evaluate → Jina |
| Xiaohongshu (小红书) | xiaohongshu.com/explore/*, xhslink.com/* | API (xhshow) → Pinia Store injection → Jina → Playwright |
| YouTube | youtube.com/watch?v=*, youtu.be/*, youtube.com/shorts/* | InnerTube API → yt-dlp subtitles → Groq Whisper |
| GitHub | github.com/*/* | REST API (Chinese README priority + subdirectory scan) |
| LinuxDo / Discourse | linux.do/t/* | Discourse topic JSON API → CDP → Playwright in-page fetch → Jina |
| IDCFlare / Discourse | idcflare.com/t/* | Discourse topic JSON API → CDP → Playwright in-page fetch → Jina |
| Feishu/Lark (飞书) | feishu.cn/docx/*, feishu.cn/wiki/* | Open API → CDP → Playwright PageMain → Jina |
| KDocs (金山文档) | kdocs.cn/l/* | Playwright ProseMirror DOM (virtual scroll + CDP) |
| Youdao Note (有道云笔记) | share.note.youdao.com/* | JSON API → Playwright iframe → Jina |
| Zhihu (知乎) | zhihu.com/question/*/answer/*, zhuanlan.zhihu.com/p/* | API v4 → Playwright CDP/DOM → Jina |
| Bilibili (B站) | bilibili.com/video/*, b23.tv/* | API metadata + 3-tier subtitle fallback (v2 → WBI v2 → Whisper) |
| Xiaoyuzhou (小宇宙) | xiaoyuzhoufm.com/episode/* | SSR __NEXT_DATA__ + Groq Whisper transcription |
| Ximalaya (喜马拉雅) | ximalaya.com/sound/*, m.ximalaya.com/sound/* | Web Revision API + canPlay degradation + Groq Whisper |
| Telegram | t.me/* | Telethon |
| Hacker News | news.ycombinator.com/item?id=* | Firebase API v0 (item.json + top-level comments; batch lists via hn top/new/best/ask/show/jobs) |
| Medium | medium.com/*, *.medium.com/* | Jina Reader → JSON-LD articleBody → Stealth Browser; user/publication batch via RSS |
| Reddit | reddit.com/r/*/comments/*, redd.it/* | old.reddit.com .json with custom User-Agent → CDP → Stealth Playwright + saved session → Jina (includes top 50 comments; subreddit batch via reddit-sub) |
| Weibo | weibo.com/*, weibo.cn/*, m.weibo.cn/status/* | m.weibo.cn /statuses/show + container/getIndex + SSR $render_data fallback (SUB cookie optional) |
| Douyin (抖音) | douyin.com/video/*, v.douyin.com/* (short link) | CDP → Stealth Playwright + saved session → SSR RENDER_DATA → Jina (signing happens in the browser; the signature algorithm is not reverse-engineered) |
| Zsxq (知识星球) | articles.zsxq.com/id_*.html, wx.zsxq.com/group/*/topic/*, t.zsxq.com/* (short link) | HTTP cookie (articles SSR HTML / api.zsxq.com topic JSON) → CDP → Stealth Playwright → Jina (auth-walled, login required) |
| RSS | RSS/Atom feed URLs | feedparser |
| Paywalled news (300+) | NYT/WSJ/FT/Economist/Bloomberg... | JSON-LD → Googlebot/Bingbot UA → AMP → EU IP → archive.today → Google Cache → Jina |
| Any web page | Any other URL | JSON-LD pre-scan → Jina Reader fallback |
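
Each `A → B → C` entry in the table reads as an ordered fallback chain: try the first strategy, and move to the next one only if it raises or returns nothing. The sketch below shows one way such routing and fallback could be wired together. It is illustrative only: `normalize`, `pick_chain`, `fetch_with_fallback`, and the stub fetchers are hypothetical names, not this project's actual API, and the glob matching via `fnmatchcase` is an assumption about how the URL patterns might be interpreted.

```python
from fnmatch import fnmatchcase
from typing import Callable, List, Optional, Sequence, Tuple
from urllib.parse import urlsplit

# A fetcher takes a URL and returns extracted text, or None if it found nothing.
Fetcher = Callable[[str], Optional[str]]
Chain = Sequence[Tuple[str, Fetcher]]


def normalize(url: str) -> str:
    """Reduce a URL to 'host/path' so it can be compared with table patterns."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path


def pick_chain(url: str, routes: Sequence[Tuple[str, Chain]], default: Chain) -> Chain:
    """Return the chain of the first pattern that matches, else the default."""
    target = normalize(url)
    for pattern, chain in routes:
        if fnmatchcase(target, pattern):
            return chain
    return default


def fetch_with_fallback(url: str, chain: Chain) -> Tuple[str, str]:
    """Try each strategy in order; return (strategy_name, content) on first success."""
    errors: List[str] = []
    for name, fetcher in chain:
        try:
            content = fetcher(url)
        except Exception as exc:  # a failed strategy must not abort the chain
            errors.append(f"{name}: {exc}")
            continue
        if content:
            return name, content
        errors.append(f"{name}: empty result")
    raise RuntimeError(f"all strategies failed for {url}: " + "; ".join(errors))


# Stub fetchers standing in for real strategies (hypothetical, for illustration).
def blocked(url: str) -> Optional[str]:
    raise TimeoutError("request blocked")


def empty(url: str) -> Optional[str]:
    return None


def reader(url: str) -> Optional[str]:
    return f"# content of {url}"


ROUTES = [
    ("github.com/*/*", [("REST API", reader)]),
    ("linux.do/t/*", [("Discourse JSON", blocked), ("CDP", empty), ("Jina", reader)]),
]
DEFAULT_CHAIN = [("JSON-LD pre-scan", empty), ("Jina Reader", reader)]

if __name__ == "__main__":
    url = "https://linux.do/t/topic/12345"
    used, text = fetch_with_fallback(url, pick_chain(url, ROUTES, DEFAULT_CHAIN))
    print(used, text)
```

Keeping each chain as plain data (a list of name and callable pairs) means a platform's order can be changed or extended without touching the dispatch logic, and the collected error list records why every earlier strategy was skipped.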