html-cleaner
Cleans HTML to semantic Markdown with sanitization. Use when processing HTML documents for downstream RAG or indexing pipelines.
| Field | Value |
| --- | --- |
| name | html-cleaner |
| description | Cleans HTML to semantic Markdown with sanitization. Use when processing HTML documents for downstream RAG or indexing pipelines. |
| version | 1.0.0 |
| pipeline | `{"stage_order":3,"input_contract":[{"type":"file","path":"input.html","required":true}],"output_contract":[{"type":"file","path":"output.md","format":"markdown"}]}` |
| progressive_disclosure | `{"tier1":[{"path":"SKILL.md","base":"skill_dir"},{"path":"data/normalize-rules.yaml","base":"skill_dir"}],"tier2":[{"path":"knowledge/html-sanitization.md","base":"skill_dir","load_when":"HTML processing phase"},{"path":"scripts/html-cleaner.py","base":"skill_dir","load_when":"Execution phase"},{"path":"loop/validate-output.md","base":"skill_dir","load_when":"Validation phase"}]}` |
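The `pipeline` entry declares what this stage consumes and produces. As a minimal sketch of how a runner might enforce that contract before invoking the skill, the Python below parses the JSON value and reports missing required inputs. Resolving the declared paths against a working directory is an assumption (the metadata does not say how paths are resolved), and `missing_required_inputs` and `declared_outputs` are hypothetical helpers, not part of the skill.

```python
import json
import tempfile
from pathlib import Path

# The `pipeline` metadata value from the table above, copied verbatim.
PIPELINE_JSON = (
    '{"stage_order":3,'
    '"input_contract":[{"type":"file","path":"input.html","required":true}],'
    '"output_contract":[{"type":"file","path":"output.md","format":"markdown"}]}'
)


def missing_required_inputs(pipeline: dict, workdir: Path) -> list:
    """Required file inputs that do not exist under workdir.

    Hypothetical helper: resolving contract paths against a working
    directory is an assumption, not documented runner behaviour.
    """
    return [
        entry["path"]
        for entry in pipeline.get("input_contract", [])
        if entry.get("type") == "file"
        and entry.get("required", False)
        and not (workdir / entry["path"]).is_file()
    ]


def declared_outputs(pipeline: dict) -> dict:
    """Map each declared output path to its declared format."""
    return {e["path"]: e.get("format") for e in pipeline.get("output_contract", [])}


if __name__ == "__main__":
    pipeline = json.loads(PIPELINE_JSON)
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp)
        print(missing_required_inputs(pipeline, work))  # ['input.html']
        (work / "input.html").write_text("<p>hi</p>", encoding="utf-8")
        print(missing_required_inputs(pipeline, work))  # []
    print(declared_outputs(pipeline))  # {'output.md': 'markdown'}
```

Failing fast on a missing `input.html` keeps the stage from silently producing an empty `output.md` for the next stage in the pipeline.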
Senior Sanitization Engineer specializing in HTML parsing and Markdown conversion. Removes malicious content while preserving semantic structure.
Convert HTML documents to clean semantic Markdown while sanitizing potentially malicious content. Preserves document structure (headings, lists, tables, code blocks).
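To make the goal concrete, here is a rough standard-library sketch of the conversion. The skill's own `scripts/html-cleaner.py` is not shown, so this is not its implementation, and `MarkdownSketch` and `html_to_markdown` are hypothetical names. In line with `priority_order`, which ranks `security_sanitization` first, the sketch drops `script`, `style` and `iframe` together with their contents, never writes attributes (so `on*` handlers cannot survive), and escapes `&`, `<`, `>` plus link and emphasis characters in text. It covers only headings, paragraphs, lists and `<pre>` code blocks; inline `<code>` is flattened to plain text, tables are not converted, and not every Markdown metacharacter is escaped.

```python
import html
import re
from html.parser import HTMLParser

# G1: these tags are dropped together with everything nested inside them.
DROP_WITH_CONTENT = {"script", "style", "iframe"}
HEADINGS = {f"h{i}": i for i in range(1, 7)}
PARAGRAPHS = {"p", "div"}


class MarkdownSketch(HTMLParser):
    """Minimal HTML -> Markdown converter. Attributes (including on* event
    handlers) reach handle_starttag as `attrs` but are never emitted."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out = []     # Markdown chunks, joined at the end
        self.skip = 0     # > 0 while inside script/style/iframe
        self.pre = None   # list buffering <pre> text, else None
        self.lists = []   # stack of [tag, items_seen] for ul/ol

    def _newlines(self, n: int) -> None:
        """Make the output end with at least n newlines (none at the start)."""
        if self.out:
            tail = "".join(self.out[-3:])
            have = len(tail) - len(tail.rstrip("\n"))
            if have < n:
                self.out.append("\n" * (n - have))

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip += 1
        elif self.skip or self.pre is not None:
            return
        elif tag in HEADINGS:
            self._newlines(2)
            self.out.append("#" * HEADINGS[tag] + " ")
        elif tag in PARAGRAPHS:
            self._newlines(2)
        elif tag in ("ul", "ol"):
            self._newlines(1 if self.lists else 2)
            self.lists.append([tag, 0])
        elif tag == "li" and self.lists:
            self.lists[-1][1] += 1
            kind, count = self.lists[-1]
            marker = f"{count}." if kind == "ol" else "-"
            self._newlines(1)
            self.out.append("    " * (len(self.lists) - 1) + marker + " ")
        elif tag == "pre":
            self.pre = []

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip = max(0, self.skip - 1)
        elif self.skip:
            return
        elif tag == "pre" and self.pre is not None:
            body = "".join(self.pre).strip("\n")
            # The fence must outrun every backtick run in the code, or the code
            # could close it early and smuggle raw HTML into the output.
            longest = max((len(r) for r in re.findall(r"`+", body)), default=0)
            fence = "`" * max(3, longest + 1)
            self._newlines(2)
            self.out.append(f"{fence}\n{body}\n{fence}")
            self._newlines(2)
            self.pre = None
        elif self.pre is not None:
            return
        elif tag in HEADINGS or tag in PARAGRAPHS:
            self._newlines(2)
        elif tag in ("ul", "ol") and self.lists:
            self.lists.pop()
            self._newlines(1 if self.lists else 2)

    def handle_data(self, data):
        if self.skip:
            return
        if self.pre is not None:
            self.pre.append(data)  # fenced code is rendered literally
            return
        text = re.sub(r"\s+", " ", data)
        if not self.out or self.out[-1].endswith(("\n", " ")):
            text = text.lstrip()
        if text:
            # Escape link/emphasis syntax, then &, <, > so decoded entities
            # such as &lt;script&gt; cannot turn back into raw HTML.
            text = re.sub(r"([\\`*_\[\]])", r"\\\1", text)
            self.out.append(html.escape(text, quote=False))


def html_to_markdown(source: str) -> str:
    parser = MarkdownSketch()
    parser.feed(source)
    parser.close()
    return "".join(parser.out).strip() + "\n"
```

For example, `html_to_markdown('<h1>Title</h1><script>x()</script>')` returns `'# Title\n'`. Buffering `<pre>` content lets the fence length be chosen after the code is known, so a run of backticks inside a code sample cannot terminate the block and expose raw markup.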
```yaml
priority_order:
  - security_sanitization
  - semantic_structure
  - content_fidelity
```

- `data/normalize-rules.yaml`
- `knowledge/html-sanitization.md`
- `scripts/html-cleaner.py input.html -o output.md`
- `loop/validate-output.md`

```yaml
G1_Security:
  must:
    - strip script, style, iframe tags
    - remove event handlers
  must_not:
    - preserve JavaScript execution
    - include inline event handlers
G2_Quality:
  must:
    - preserve heading hierarchy
    - convert tables to Markdown
    - preserve code blocks
```
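G2 asks for tables to come out as Markdown. CommonMark itself has no table syntax, so this sketch assumes GitHub-Flavored Markdown pipe tables (the skill does not name a dialect) and treats the first row as the header. `TableCollector` and `table_to_markdown` are hypothetical helpers meant to be fed one `<table>` fragment; `colspan`/`rowspan` are ignored and ragged rows are padded with empty cells.

```python
import html
import re
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collects the text of each <td>/<th> cell, row by row."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.rows = []    # each row is a list of cell strings
        self.cell = None  # list buffering the current cell's text, else None
        self.skip = 0     # > 0 inside script/style/iframe (G1)

    def flush(self) -> None:
        # Close the open cell; also covers omitted </td> and </th> tags.
        if self.cell is not None:
            self.rows[-1].append(" ".join("".join(self.cell).split()))
            self.cell = None

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style", "iframe"):
            self.skip += 1
        elif tag == "tr":
            self.flush()
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self.flush()
            self.cell = []

    def handle_endtag(self, tag):
        if tag in ("script", "style", "iframe"):
            self.skip = max(0, self.skip - 1)
        elif tag in ("td", "th", "tr"):
            self.flush()

    def handle_data(self, data):
        if self.cell is not None and not self.skip:
            self.cell.append(data)


def table_to_markdown(fragment: str) -> str:
    collector = TableCollector()
    collector.feed(fragment)
    collector.close()
    collector.flush()  # a final cell may still be open if end tags were omitted
    rows = [row for row in collector.rows if row]
    if not rows:
        return ""
    width = max(len(row) for row in rows)

    def fmt(cells):
        cells = cells + [""] * (width - len(cells))  # pad ragged rows
        # Escape Markdown link/emphasis syntax, raw-HTML characters, and the
        # pipe, which would otherwise split the cell.
        safe = [
            html.escape(re.sub(r"([\\`*_\[\]])", r"\\\1", c), quote=False).replace("|", "\\|")
            for c in cells
        ]
        return "| " + " | ".join(safe) + " |"

    lines = [fmt(rows[0]), "| " + " | ".join(["---"] * width) + " |"]
    lines += [fmt(row) for row in rows[1:]]
    return "\n".join(lines)
```

Escaping `|` matters more than it looks: an unescaped pipe in cell text silently shifts every following cell one column to the right.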
```yaml
output:
  format: markdown
  encoding: UTF-8
  sanitized: true
  elements_removed:
    - script
    - style
    - iframe
    - event_handlers
```
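The output contract can be checked mechanically before the file is handed downstream. The sketch below is one possible check in the spirit of `loop/validate-output.md`, whose actual contents are not shown: it confirms the bytes decode as UTF-8, scans for remnants of each `elements_removed` category with deliberately coarse regexes, and flags heading levels that skip a step (G2). `validate_output`, `heading_levels` and the patterns are assumptions. Fenced code is scanned too, so a literal `<script>` inside a code sample is reported; whether that should count as a violation is a policy choice.

```python
import re

# One coarse pattern per entry of `elements_removed` in the output contract.
# Pattern choice is an assumption; a regex scan is a safety net, not a parser.
REMOVED_PATTERNS = {
    "script": re.compile(r"<\s*script\b", re.IGNORECASE),
    "style": re.compile(r"<\s*style\b", re.IGNORECASE),
    "iframe": re.compile(r"<\s*iframe\b", re.IGNORECASE),
    "event_handlers": re.compile(r"<[^>]*\son[a-z]+\s*=", re.IGNORECASE),
}


def heading_levels(markdown: str) -> list:
    """ATX heading levels in order, skipping lines inside backtick fences."""
    levels, fence = [], ""
    for line in markdown.splitlines():
        ticks = re.match(r"`{3,}", line)
        if fence:
            # A closing fence is at least as long as the opening one.
            if ticks and len(ticks.group()) >= len(fence) and line.strip() == ticks.group():
                fence = ""
        elif ticks:
            fence = ticks.group()
        else:
            heading = re.match(r"(#{1,6}) ", line)
            if heading:
                levels.append(len(heading.group(1)))
    return levels


def validate_output(raw: bytes) -> list:
    """Return human-readable contract violations for the bytes of output.md."""
    try:
        text = raw.decode("utf-8")  # encoding: UTF-8
    except UnicodeDecodeError as exc:
        return [f"not valid UTF-8: {exc}"]
    problems = [
        f"{name} remnant found"
        for name, pattern in REMOVED_PATTERNS.items()
        if pattern.search(text)
    ]
    levels = heading_levels(text)
    for prev, cur in zip(levels, levels[1:]):
        if cur > prev + 1:  # G2: preserve heading hierarchy
            problems.append(f"heading jumps from h{prev} to h{cur}")
    return problems
```

An empty list means the file passes these checks; a non-empty list can be fed back into the validation loop as the reason for a retry.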