web-article-to-siyuan
Extract one article from a supported Chinese content platform, review the Markdown with an AI agent, and upload the reviewed Markdown to SiYuan.
| name | web-article-to-siyuan |
| description | Extract one article from a supported Chinese content platform, review the Markdown with an AI agent, and upload the reviewed Markdown to SiYuan. |
Use this skill when the user wants to extract one article from a supported platform, clean it into a structured Markdown article, and upload it under a specific SiYuan parent document ID.
- mp.weixin.qq.com/s/...
- zhuanlan.zhihu.com/p/...

Extract one URL:
python -m src.cli extract URL
Read the generated Markdown file in outputs/reviewed/. The first-round crawler output is kept in outputs/raw/ and should not be edited. Each extraction also writes a manifest under outputs/manifests/ with source metadata, output paths, crawl status, and image download results. Any remote images found in the Markdown are downloaded under outputs/assets/... and rewritten to local relative links.
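The image rewriting step described above can be pictured roughly as follows. This is a minimal sketch, not the skill's actual implementation: `rewrite_image_links` and the `local_paths` mapping are hypothetical names, and it assumes the downloads have already happened and that each remote URL maps to one local relative path.

```python
from __future__ import annotations

import re

# Matches Markdown images of the form ![alt](target).
# Images that carry a title, e.g. ![alt](url "title"), are not matched.
IMAGE_PATTERN = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")


def rewrite_image_links(markdown: str, local_paths: dict[str, str]) -> str:
    """Replace remote image URLs with local relative paths when one is known."""

    def replace(match: re.Match) -> str:
        alt, target = match.group(1), match.group(2)
        local = local_paths.get(target)
        if local is None:
            # No download recorded for this URL, so leave the link untouched.
            return match.group(0)
        return f"![{alt}]({local})"

    return IMAGE_PATTERN.sub(replace, markdown)
```

Links that stay remote after this step are the ones a later validation pass would be expected to flag.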
Review and rewrite the Markdown while preserving the main content:
Save the reviewed article with this structure:
# Article Title
## AI Summary
- ...
- ...
---
## Main Article
...
Run the AI review workflow when model credentials are configured:
python -m src.cli review outputs/reviewed/ARTICLE.md
This command rewrites the Markdown, updates outputs/reviews/ARTICLE.json, runs deterministic validation, runs a pre-upload AI verification, and retries when verification feedback requires revision.
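The control flow of that command might look like the loop below. It is a sketch under stated assumptions: `rewrite`, `validate`, and `verify` are hypothetical callables standing in for the model call, the deterministic checks, and the AI verification, and the retry limit and the choice to retry on validation errors as well are assumptions rather than documented behavior.

```python
from typing import Callable, Optional


def review_with_retries(
    markdown: str,
    rewrite: Callable[[str, Optional[str]], str],
    validate: Callable[[str], list],
    verify: Callable[[str], Optional[str]],
    max_attempts: int = 3,
) -> str:
    """Rewrite, validate, and verify; retry with feedback until verification passes."""
    feedback = None
    for _ in range(max_attempts):
        # The previous round's feedback (if any) is passed back to the rewriter.
        markdown = rewrite(markdown, feedback)
        errors = validate(markdown)
        if errors:
            feedback = "; ".join(errors)
            continue
        # verify returns None when no revision is required.
        feedback = verify(markdown)
        if feedback is None:
            return markdown
    raise RuntimeError(f"review did not pass after {max_attempts} attempts: {feedback}")
```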
If reviewing manually instead, create and fill a structured review report:
python -m src.cli review-report outputs/reviewed/ARTICLE.md
Fill the generated outputs/reviews/ARTICLE.json with removed noise, preserved sections, formatting changes, image decisions, and suggested rule candidates when applicable. Set status to reviewed and fill review.summary after the Markdown has actually been rewritten.
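For a manual review, the final step of marking the report as reviewed could be scripted along these lines. The report schema is not shown here, so this sketch assumes `status` is a top-level key and `review` is an object holding `summary`; `mark_reviewed` is a hypothetical helper, not part of the CLI.

```python
import json
from pathlib import Path


def mark_reviewed(report_path: Path, summary: str) -> dict:
    """Set status to 'reviewed' and fill review.summary in a review report."""
    report = json.loads(report_path.read_text(encoding="utf-8"))
    report["status"] = "reviewed"
    # Assumed layout: {"status": ..., "review": {"summary": ...}}
    report.setdefault("review", {})["summary"] = summary
    report_path.write_text(
        json.dumps(report, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return report
```

Call it only after the Markdown has actually been rewritten, for example `mark_reviewed(Path("outputs/reviews/ARTICLE.json"), "Removed footer promotion; kept all sections.")`.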
Validate that the reviewed Markdown is ready for upload:
python -m src.cli validate outputs/reviewed/ARTICLE.md
Do not upload if validation fails. Fix the Markdown or review report first. Validation rejects missing review structure, remote or missing local images, bare URLs, incomplete review reports, and weak long-article WeChat structure.
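To make a few of those rejection rules concrete, here is a rough approximation of the structure, image, and bare-URL checks. It is not the validator shipped with the skill: `check_markdown` is a hypothetical function, the bare-URL regex is a heuristic, and the review-report and WeChat-structure checks are left out.

```python
from __future__ import annotations

import re
from pathlib import Path

REQUIRED_HEADINGS = ("## AI Summary", "## Main Article")
IMAGE = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")
# Heuristic: a URL counts as bare when it is not directly preceded by "(" or "<".
BARE_URL = re.compile(r"(?<![(<])https?://\S+")


def check_markdown(text: str, base_dir: Path) -> list[str]:
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    lines = text.splitlines()
    if not any(line.startswith("# ") for line in lines):
        problems.append("missing title heading")
    for heading in REQUIRED_HEADINGS:
        if heading not in lines:
            problems.append(f"missing section: {heading}")
    for target in IMAGE.findall(text):
        if target.startswith(("http://", "https://")):
            problems.append(f"remote image: {target}")
        elif not (base_dir / target).exists():
            problems.append(f"missing local image: {target}")
    # Strip image syntax first so remote images are not reported twice.
    for url in BARE_URL.findall(IMAGE.sub("", text)):
        problems.append(f"bare URL: {url}")
    return problems
```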
Report the review result to the user:
- validate passed.

After confirmation, upload the reviewed Markdown:
python -m src.cli upload outputs/reviewed/ARTICLE.md
| Command | Description |
|---|---|
| extract URL | Crawl one supported article URL, run first-round platform cleaning, write raw Markdown to outputs/raw/, copy a review draft to outputs/reviewed/, download referenced remote images to local assets, and write an extraction manifest to outputs/manifests/. |
| review FILE | Use the configured OpenAI or Anthropic model to rewrite one reviewed Markdown file, update the review report, validate, and run pre-upload AI verification. |
| review-report FILE | Create a draft structured review report under outputs/reviews/ for one reviewed Markdown file. |
| verify-review FILE | Run only the configured AI pre-upload verification for one reviewed Markdown file. |
| validate FILE | Check that one reviewed Markdown file has the required review structure, local image paths, Markdown links, extraction manifest, and completed review report. |
| upload FILE | Upload one reviewed Markdown file to the configured SiYuan target after validation. Local images referenced by the Markdown are uploaded to SiYuan assets first. Does not re-crawl. |
| run URL | Extract one URL, run AI review, validate, verify, and upload after the review passes. |
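When driving the CLI from a script instead of a shell, the validate-before-upload rule can be enforced with a small wrapper. This sketch assumes the commands signal failure through a non-zero exit code, which is not stated in the table; `cli_command` and `validate_then_upload` are hypothetical helpers.

```python
import subprocess
import sys
from typing import Callable, List, Sequence


def cli_command(*args: str) -> List[str]:
    """Build the argv for `python -m src.cli ...` using the current interpreter."""
    return [sys.executable, "-m", "src.cli", *args]


def validate_then_upload(
    path: str, run: Callable[[Sequence[str]], int] = subprocess.call
) -> bool:
    """Run validate and upload only if it succeeds; return True when both pass."""
    # Assumption: a non-zero exit code from validate means validation failed.
    if run(cli_command("validate", path)) != 0:
        return False
    return run(cli_command("upload", path)) == 0
```

The wrapper does not ask for confirmation, so call it only once the user has approved the reviewed Markdown.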
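from src.integrations.siyuan import SiyuanClient

# Placeholders: replace with the reviewed Markdown text and the target parent document ID.
markdown = "# Article Title\n\n..."
parent_doc_id = "PARENT_DOC_ID"

client = SiyuanClient(api_base="http://127.0.0.1:6806", token="TOKEN")
result = client.upload_markdown_under_parent("Article Title", markdown, parent_doc_id)
# The result exposes result.doc_id, result.hpath, and result.created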
- config.json; never pass API keys or tokens in shell command prefixes.
- outputs/raw/*.md
- outputs/reviewed/*.md
- outputs/manifests/*.json
- outputs/reviews/*.json
- outputs/assets/...
- ai in config.json; provider credentials and model parameters live under ai_providers.
- assets/... paths returned by SiYuan.