defuddle
Extract clean article content from web pages or local HTML files. Removes clutter (ads, sidebars, nav) and returns readable content with metadata.
| Field | Value |
|---|---|
| name | defuddle |
| description | Extract clean article content from web pages or local HTML files. Removes clutter (ads, sidebars, nav) and returns readable content with metadata. |
| trigger | Use when the user wants to extract/clean web page content, strip clutter from HTML, get article text from a URL, or convert web pages to clean Markdown. Triggers include "defuddle", "extract article", "clean this page", "get content from URL", "strip clutter", "web extract". |
Extract main article content from web pages, removing ads, sidebars, navigation, and other clutter. Output clean Markdown with metadata.
Before first use, check whether defuddle is installed, and install it together with its jsdom peer dependency if it is not:
command -v defuddle >/dev/null 2>&1 || npm install -g defuddle jsdom
When the user provides a URL, follow this workflow:
Always use both the -m and -j flags to get Markdown content with full metadata:
defuddle parse "<url>" -m -j
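A minimal sketch of pulling the title, author, and wordCount fields out of that JSON for the summary shown to the user. It assumes jq is installed (jq is not a defuddle dependency), and the function name summarize_clip is hypothetical. Missing fields fall back to "Unknown" (or 0 for the word count) via jq's // alternative operator.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize defuddle's -j output for the user.
# Assumption: jq is installed. Reads the JSON document from stdin and
# prints one line each for title, author, and word count.
summarize_clip() {
  # // substitutes a default when a field is null or missing.
  jq -r '"Title: \(.title // "Unknown")",
         "Author: \(.author // "Unknown")",
         "Words: \(.wordCount // 0)"'
}
```

Usage: defuddle parse "<url>" -m -j | summarize_clip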
Show the user the title, author, and wordCount fields.
If this is the first time using defuddle in this conversation, ask the user:
"Save to which directory? (e.g. ~/Documents, ~/Desktop, or a custom path)"
Remember the user's chosen directory for subsequent uses in the same conversation.
Write the file with frontmatter + full content:
---
title: {title}
author: {author}
source: {url}
date: {published or "Unknown"}
clipped: {today's date YYYY-MM-DD}
wordCount: {wordCount}
---
# {title}
{markdown content}
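A minimal sketch of the save step in bash, assuming the field values have already been taken from the -j output. The function name write_clip and its argument order are hypothetical. It follows the template above line for line, writes "Unknown" when the published date is empty, and fills the clipped field with today's date in YYYY-MM-DD form.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write one clipped article using the frontmatter template.
# Assumed argument order: output path, title, author, source URL,
# published date (may be empty), word count, Markdown body.
write_clip() {
  local out="$1" title="$2" author="$3" url="$4"
  local published="${5:-Unknown}" words="$6" body="$7"
  local clipped
  clipped="$(date +%Y-%m-%d)"   # today's date, YYYY-MM-DD
  {
    printf '%s\n' '---'
    printf 'title: %s\n' "$title"
    printf 'author: %s\n' "$author"
    printf 'source: %s\n' "$url"
    printf 'date: %s\n' "$published"
    printf 'clipped: %s\n' "$clipped"
    printf 'wordCount: %s\n' "$words"
    printf '%s\n' '---'
    printf '# %s\n' "$title"
    printf '%s\n' "$body"
  } > "$out"
}
```

Values are written unquoted, as in the template. A title containing a colon followed by a space would make the frontmatter invalid YAML, so quoting the title may be worth adding.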
File naming: use the article title as the filename, sanitized for the filesystem, for example:
The Shape of the Essay Field.md
Tell the user the file path where it was saved.
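A minimal sketch of the file-naming step. The exact sanitization policy is an assumption here: delete the characters reserved on Windows (/ \ : * ? " < > |, where / is also the path separator on macOS and Linux), collapse runs of whitespace to one space, trim the ends, fall back to "untitled" if nothing is left, and append .md. The function name sanitize_filename is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn an article title into a safe Markdown filename.
# Assumed policy: delete / \ : * ? " < > |, collapse whitespace, trim, add .md.
sanitize_filename() {
  local title="$1"
  # Delete reserved characters (\\ is a single backslash for tr).
  title="$(printf '%s' "$title" | tr -d '/\\:*?"<>|')"
  # Turn every whitespace character into a space and squeeze repeats.
  title="$(printf '%s' "$title" | tr -s '[:space:]' ' ')"
  # Trim leading and trailing whitespace.
  title="${title#"${title%%[![:space:]]*}"}"
  title="${title%"${title##*[![:space:]]}"}"
  # Avoid an empty filename.
  [ -n "$title" ] || title="untitled"
  printf '%s.md\n' "$title"
}
```

For example, sanitize_filename "The Shape of the Essay Field" prints The Shape of the Essay Field.md. Deleting reserved characters (rather than replacing them with a space or dash) keeps titles short; either choice is reasonable.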
defuddle parse <source> [options]
Arguments:
`<source>`: a URL (https://...) or a local HTML file path
Options:
| Flag | Description |
|---|---|
| `-m, --markdown` | Convert content to Markdown |
| `-j, --json` | Output as JSON with full metadata |
| `-o, --output <file>` | Write to a file instead of stdout |
| `-p, --property <name>` | Extract a single property (title, description, domain, author, published, wordCount, content) |
| `--debug` | Verbose logging |
When using -j, the response includes:
| Field | Description |
|---|---|
| `title` | Article title |
| `author` | Author name |
| `published` | Publication date |
| `description` | Meta description |
| `content` | Extracted content (Markdown when -m is used) |
| `domain` | Source domain |
| `favicon` | Favicon URL |
| `image` | Featured image URL |
| `site` | Site name |
| `wordCount` | Word count |
| `parseTime` | Processing time in ms |

jsdom is required as a peer dependency of defuddle.