html-cleaner
Cleans HTML to semantic Markdown with sanitization. Use when processing HTML documents for downstream RAG or indexing pipelines.
| Field | Value |
| --- | --- |
| name | html-cleaner |
| description | Cleans HTML to semantic Markdown with sanitization. Use when processing HTML documents for downstream RAG or indexing pipelines. |
| version | 1.0.0 |
| pipeline | {"stage_order":3,"input_contract":[{"type":"file","path":"input.html","required":true}],"output_contract":[{"type":"file","path":"output.md","format":"markdown"}]} |
| progressive_disclosure | {"tier1":[{"path":"SKILL.md","base":"skill_dir"},{"path":"data/normalize-rules.yaml","base":"skill_dir"}],"tier2":[{"path":"knowledge/html-sanitization.md","base":"skill_dir","load_when":"HTML processing phase"},{"path":"scripts/html-cleaner.py","base":"skill_dir","load_when":"Execution phase"},{"path":"loop/validate-output.md","base":"skill_dir","load_when":"Validation phase"}]} |
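The `pipeline` field can be read as a simple file contract: one required input file and one Markdown output file. A minimal sketch of checking that contract before a run, assuming the JSON is used exactly as listed; the helper names `missing_inputs` and `expected_outputs` are hypothetical and not part of the skill:

```python
import json
from pathlib import Path

# The pipeline field, copied from the skill metadata.
PIPELINE = json.loads("""
{"stage_order": 3,
 "input_contract": [{"type": "file", "path": "input.html", "required": true}],
 "output_contract": [{"type": "file", "path": "output.md", "format": "markdown"}]}
""")


def missing_inputs(pipeline, workdir):
    """Return paths of required input files that are absent from workdir."""
    root = Path(workdir)
    return [
        entry["path"]
        for entry in pipeline["input_contract"]
        if entry["type"] == "file"
        and entry.get("required", False)
        and not (root / entry["path"]).is_file()
    ]


def expected_outputs(pipeline):
    """Return (path, format) pairs the stage is expected to write."""
    return [(e["path"], e.get("format")) for e in pipeline["output_contract"]]
```

Calling `missing_inputs(PIPELINE, ".")` before the execution phase gives an early, readable failure when `input.html` is not in place.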
Senior Sanitization Engineer specializing in HTML parsing and Markdown conversion. Removes malicious content while preserving semantic structure.
Convert HTML documents to clean semantic Markdown while sanitizing potentially malicious content. Preserves document structure (headings, lists, tables, code blocks).
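To make the structure-preserving conversion concrete, here is a minimal sketch built on Python's standard `html.parser`. It is not the contents of `scripts/html-cleaner.py`; the names `MarkdownEmitter` and `to_markdown` are hypothetical, and the sketch covers only headings, paragraphs, list items, and `pre` blocks. Tables are left out, and ordered lists are emitted as plain bullets.

```python
from html.parser import HTMLParser

FENCE = "`" * 3  # Markdown code fence marker
HEADINGS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}


class MarkdownEmitter(HTMLParser):
    """Collects Markdown blocks for a small subset of HTML elements."""

    def __init__(self):
        super().__init__()
        self.blocks = []   # finished Markdown blocks, in document order
        self.buf = []      # text collected for the current block
        self.prefix = ""   # "# ", "- ", ... for the current block

    def _flush(self):
        # Collapse whitespace and emit the buffered text as one block.
        text = " ".join("".join(self.buf).split())
        if text:
            self.blocks.append(self.prefix + text)
        self.buf, self.prefix = [], ""

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self._flush()
            self.prefix = "#" * HEADINGS[tag] + " "  # keeps the heading level
        elif tag == "li":
            self._flush()
            self.prefix = "- "
        elif tag in ("p", "pre"):
            self._flush()

    def handle_endtag(self, tag):
        if tag == "pre":
            # Keep code verbatim apart from leading/trailing newlines.
            code = "".join(self.buf).strip("\n")
            self.blocks.append(f"{FENCE}\n{code}\n{FENCE}")
            self.buf = []
        elif tag in HEADINGS or tag in ("p", "li"):
            self._flush()

    def handle_data(self, data):
        self.buf.append(data)


def to_markdown(html_text):
    parser = MarkdownEmitter()
    parser.feed(html_text)
    parser.close()
    parser._flush()  # emit any trailing text outside a block element
    return "\n\n".join(parser.blocks)
```

Blocks are joined with blank lines, so list items come out as a loose list; a fuller converter would group consecutive items and handle nesting.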
```yaml
priority_order:
  - security_sanitization
  - semantic_structure
  - content_fidelity
```

- `data/normalize-rules.yaml`
- `knowledge/html-sanitization.md`
- `scripts/html-cleaner.py input.html -o output.md`
- `loop/validate-output.md`

```yaml
G1_Security:
  must:
    - strip script, style, iframe tags
    - remove event handlers
  must_not:
    - preserve JavaScript execution
    - include inline event handlers
G2_Quality:
  must:
    - preserve heading hierarchy
    - convert tables to Markdown
    - preserve code blocks
output:
  format: markdown
  encoding: UTF-8
  sanitized: true
  elements_removed:
    - script
    - style
    - iframe
    - event_handlers
```
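The security guardrails (strip `script`, `style`, and `iframe`; remove event handlers) can be sketched with the standard-library `html.parser`. This is an illustration under stated assumptions, not the actual `scripts/html-cleaner.py`: the names `Sanitizer`, `sanitize`, and `BLOCKED_TAGS` are hypothetical, and "event handler" is taken to mean any attribute whose name starts with `on`.

```python
from html import escape
from html.parser import HTMLParser

# Elements dropped together with their content (assumed from the guardrails).
BLOCKED_TAGS = {"script", "style", "iframe"}


class Sanitizer(HTMLParser):
    """Re-emits HTML while dropping blocked elements and on* attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0   # > 0 while inside a blocked element
        self.removed = set()  # what was actually stripped

    def handle_starttag(self, tag, attrs):
        if tag in BLOCKED_TAGS:
            self.skip_depth += 1
            self.removed.add(tag)
            return
        if self.skip_depth:
            return
        kept = []
        for name, value in attrs:
            if name.startswith("on"):  # onclick, onload, ...
                self.removed.add("event_handlers")
                continue
            kept.append(name if value is None else f'{name}="{escape(value)}"')
        self.out.append("<" + " ".join([tag] + kept) + ">")

    def handle_startendtag(self, tag, attrs):
        # Self-closing form, e.g. <br/> or <iframe/>.
        if tag in BLOCKED_TAGS:
            self.removed.add(tag)
        else:
            self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in BLOCKED_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append(f"&#{name};")


def sanitize(html_text):
    """Return (clean_html, sorted list of removed element kinds)."""
    parser = Sanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out), sorted(parser.removed)
```

The second return value mirrors the `elements_removed` field of the output block. The sketch does not cover every vector named by `must_not` (for example, `javascript:` URLs in `href` are left untouched), so a production pipeline would more likely rely on a maintained sanitization library.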