adapt
// Build a new platform adapter to extract content from an unsupported platform (Blogger, Ghost, Weebly, etc.)
| name | adapt |
| description | Build a new platform adapter to extract content from an unsupported platform (Blogger, Ghost, Weebly, etc.) |
| allowed-tools | ["Bash","Read","Write","Edit","Glob","Grep","AskUserQuestion","WebSearch"] |
Guide the process of adding extraction support for a new platform. The result is a working adapter that plugs into the existing extraction pipeline.
Check src/adapters/ first. If an adapter already exists, this skill isn't needed.

Understand how the target platform works before writing any code.
Figure out how to identify sites on this platform. Check:
- Domain patterns (e.g. .squarespace.com, .webflow.io, .wixsite.com)
- HTTP response headers (e.g. X-Squarespace-Version, X-Wix-Request-Id)

Add detection signals to src/lib/extraction/detect-platform.ts:
- URL_PATTERNS
- detectFromHttp()

Figure out how to find all pages on the site:
- Sitemaps: sitemap.xml, sitemap_index.xml. Most platforms generate these.
- Navigation links: extractNavLinks() in src/adapters/shared.ts handles this generically.
- Platform APIs or JSON endpoints (e.g. ?format=json or Shopify's /products.json).

Figure out how to get the actual content from each page:
- HTML content selectors (e.g. .post-body, article, .content, main).
- JavaScript-rendered pages: launchBrowser() from src/adapters/shared.ts.

If the platform has an admin dashboard or uses client-side API calls, use liberate_map_apis to automatically discover all API endpoints:
1. Have the user start Chrome with --remote-debugging-port=9222 and log in to their account on the target platform
2. Call liberate_map_apis with the CDP port, the site URL, and optionally a list of admin dashboard URLs to crawl

This is the fastest way to reverse-engineer a platform's API surface. The output tells you exactly which endpoints return content data, what auth is needed, and what the response shapes look like: everything you need to write the adapter's extractPage function.
You can also call liberate_probe to inspect window globals, localStorage, cookies, and platform identity fields on any page — useful for understanding what data the platform exposes client-side.
Document everything you find. This is research — take notes on endpoints, selectors, quirks.
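One way to keep the detection notes testable is to write them down as data. The sketch below is a minimal, hypothetical illustration: `PlatformSignals`, `SIGNALS`, and `detectFromSignals` are names invented here, and the real URL_PATTERNS and detectFromHttp() in src/lib/extraction/detect-platform.ts may be shaped quite differently. The host suffixes and header names are the examples given in this section.

```typescript
// Hypothetical signal table. This is not the real contents of detect-platform.ts.
interface PlatformSignals {
  platform: string;
  hostSuffixes: string[]; // e.g. ".squarespace.com"
  headerNames: string[]; // lowercase names of platform-specific response headers
}

const SIGNALS: PlatformSignals[] = [
  { platform: "squarespace", hostSuffixes: [".squarespace.com"], headerNames: ["x-squarespace-version"] },
  { platform: "webflow", hostSuffixes: [".webflow.io"], headerNames: [] },
  { platform: "wix", hostSuffixes: [".wixsite.com"], headerNames: ["x-wix-request-id"] },
];

// Returns the first platform whose host suffix or header signal matches, else null.
function detectFromSignals(url: string, headers: Record<string, string> = {}): string | null {
  const host = new URL(url).hostname.toLowerCase();
  // Header names are case-insensitive, so compare them in lowercase.
  const seenHeaders = new Set(Object.keys(headers).map((h) => h.toLowerCase()));
  for (const signal of SIGNALS) {
    if (signal.hostSuffixes.some((suffix) => host.endsWith(suffix))) return signal.platform;
    if (signal.headerNames.some((name) => seenHeaders.has(name))) return signal.platform;
  }
  return null;
}
```

Custom domains carry no host suffix, which is why the header check matters: a site on its own domain can only be recognised from what the server sends back.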
Create src/adapters/<platform>.ts following the existing pattern. Read src/adapters/webflow.ts as a reference — it's the simplest adapter.
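As a rough orientation before reading the reference adapter, the skeleton below sketches the overall shape this section describes. It is an assumption-heavy illustration, not the project's code: `PlatformAdapterSketch` stands in for the real PlatformAdapter in src/types.ts, the field types in the opts and inventory interfaces are guesses, `parseSitemapLocs` is a hypothetical helper, and Ghost with its hosted .ghost.io domain is used only as an example platform.

```typescript
// Field types are assumptions; only the field names come from this document.
interface GhostAdapterOpts extends Record<string, unknown> {
  delay?: number;
  resume?: boolean;
  dryRun?: boolean;
  verbose?: boolean;
  outputDir?: string;
}

interface GhostInventory {
  siteUrl: string;
  discoveredAt: string;
  siteMeta: { title: string; tagline: string; language: string };
  navigation: string[];
  counts: Record<string, number>;
  urls: string[];
}

// Stand-in for PlatformAdapter from src/types.ts. The real interface may differ.
interface PlatformAdapterSketch<Opts, Inventory> {
  id: string;
  detect(url: string): boolean;
  discover(url: string, opts: Opts): Promise<Inventory>;
  extract(inventory: Inventory, wxr: unknown, opts: Opts, context: unknown): Promise<unknown>;
}

// Hypothetical helper: collect every <loc> value from a sitemap document.
function parseSitemapLocs(xml: string): string[] {
  const locs: string[] = [];
  const re = /<loc>\s*([^<\s]+)\s*<\/loc>/g;
  let match: RegExpExecArray | null;
  while ((match = re.exec(xml)) !== null) {
    locs.push(match[1]);
  }
  return locs;
}

const ghostAdapter: PlatformAdapterSketch<GhostAdapterOpts, GhostInventory> = {
  id: "ghost",
  detect(url) {
    // Assumption: only hosted *.ghost.io sites are recognised by URL alone.
    return new URL(url).hostname.endsWith(".ghost.io");
  },
  async discover(url, opts) {
    const response = await fetch(new URL("/sitemap.xml", url).toString());
    // A sitemap index would list child sitemaps here; following them is left out.
    const urls = response.ok ? parseSitemapLocs(await response.text()) : [];
    if (opts.verbose) console.log(`discovered ${urls.length} URLs`);
    return {
      siteUrl: url,
      discoveredAt: new Date().toISOString(),
      siteMeta: { title: "", tagline: "", language: "" },
      navigation: [],
      counts: { total: urls.length },
      urls,
    };
  },
  async extract(inventory, wxr, opts, context) {
    // A real adapter would call runExtractionLoop() from src/adapters/shared.ts here.
    throw new Error("extract is not implemented in this sketch");
  },
};
```

Keeping `detect` and the sitemap parser free of network calls makes them easy to cover with fixture-based tests.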
Every adapter must:
- Export a <platform>Adapter object implementing PlatformAdapter from src/types.ts
- Define <Platform>AdapterOpts extending Record<string, unknown> with: delay?, resume?, dryRun?, verbose?, outputDir?
- Define <Platform>Inventory with: siteUrl, discoveredAt, siteMeta (title, tagline, language), navigation, counts, urls

Required methods:
- id: lowercase platform name (e.g. 'ghost')
- detect(url): return true if the URL belongs to this platform
- discover(url, opts): fetch sitemap + navigation, classify URLs, return inventory
- extract(inventory, wxr, opts, context): call runExtractionLoop() from src/adapters/shared.ts with an extractPage function

This is where platform-specific extraction lives. For each URL:
- Return an ExtractedPage object (defined in src/adapters/shared.ts)

Use the shared helpers from src/adapters/shared.ts:
- extractMeta(html, property): read meta tags
- extractTitle(html): read <title> tag
- extractHeading(html): read <h1> with title fallback
- extractNavLinks(html, baseUrl): parse nav links
- IMAGE_EXTENSIONS: regex for image file detection

Check during reconnaissance whether the platform has e-commerce (product pages, a store, a shop section).
Generic detection (automatic): The shared extraction loop in src/adapters/shared.ts automatically detects products via JSON-LD @type: Product on any page classified as product type. This works out of the box if:
Platform-specific detection (optional but recommended): If the platform has a richer product API or non-standard product markup, provide a custom extractProduct function to runExtractionLoop():
```typescript
const result = await runExtractionLoop({
  // ...other opts
  csvBuilder,
  extractProduct: (url: string, html: string) => {
    // Try platform-specific product extraction first.
    // Return a WooProduct, or null when no product is found on the page.
    return null;
  },
});
```
The custom extractor is called before the generic JSON-LD fallback, so it takes priority.
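To make the `(url, html)` contract concrete, here is a minimal sketch of an extractor with that shape. It reads a JSON-LD Product block, which mirrors what the generic fallback is described as doing. A platform-specific version would query the platform's own markup or API instead. `WooProductSketch` is a cut-down, hypothetical stand-in for the real WooProduct type, and `extractProductFromJsonLd` is a name invented for this example.

```typescript
// Cut-down stand-in for WooProduct. The real type lives in src/lib/import/woo-product-csv.ts.
interface WooProductSketch {
  name: string;
  description?: string;
  regularPrice?: string;
  sku?: string;
  images: string[];
}

function extractProductFromJsonLd(url: string, html: string): WooProductSketch | null {
  const scriptRe = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  let match: RegExpExecArray | null;
  while ((match = scriptRe.exec(html)) !== null) {
    let data: unknown;
    try {
      data = JSON.parse(match[1]);
    } catch {
      continue; // skip malformed JSON-LD blocks
    }
    const candidates: unknown[] = Array.isArray(data) ? data : [data];
    for (const item of candidates) {
      if (typeof item !== "object" || item === null) continue;
      const node = item as Record<string, unknown>;
      const name = node["name"];
      // Simplification: @graph wrappers and array-valued @type are not handled.
      if (node["@type"] !== "Product" || typeof name !== "string") continue;

      const offersRaw = node["offers"];
      const offer: unknown = Array.isArray(offersRaw) ? offersRaw[0] : offersRaw;
      const price =
        typeof offer === "object" && offer !== null
          ? (offer as Record<string, unknown>)["price"]
          : undefined;

      const imageRaw = node["image"];
      const imageList: unknown[] = Array.isArray(imageRaw) ? imageRaw : [imageRaw];
      const images = imageList
        .filter((src): src is string => typeof src === "string")
        .map((src) => new URL(src, url).toString()); // resolve relative image paths

      const description = node["description"];
      const sku = node["sku"];
      return {
        name,
        description: typeof description === "string" ? description : undefined,
        regularPrice: price === undefined || price === null ? undefined : String(price),
        sku: typeof sku === "string" ? sku : undefined,
        images,
      };
    }
  }
  return null; // no product found on this page
}
```

Returning `null` rather than throwing keeps non-product pages cheap to process, since most pages on a site will not carry product markup.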
What to extract for products (see WooProduct type in src/lib/import/woo-product-csv.ts):
- name (required), description, shortDescription
- regularPrice, salePrice
- sku
- images: array of image URLs
- categories, tags
- weight, length, width, height
- inStock, stock
- attributes: array of { name, values[], visible, global } for product options (size, color, etc.)
- type: 'simple', 'variable', 'grouped', 'external', or 'variation'
- parentSku: for variations, the parent product's SKU

Variable products: If the platform supports product variants (sizes, colors), generate one variable parent row plus variation child rows with parentSku linking them. See shopifyProductToWoo() in src/adapters/shopify.ts for the pattern.
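The parent-plus-children layout can be sketched as follows. This is an illustrative assumption rather than a copy of shopifyProductToWoo(): `ProductRowSketch`, `SourceVariant`, and `toVariableRows` are hypothetical names, and the `visible: true` / `global: false` values are placeholders chosen for the example.

```typescript
// Cut-down, hypothetical row shape. The real WooProduct type has more fields.
interface ProductRowSketch {
  type: "simple" | "variable" | "variation";
  name: string;
  sku: string;
  parentSku?: string;
  regularPrice?: string;
  attributes: { name: string; values: string[]; visible: boolean; global: boolean }[];
}

// Hypothetical shape for a variant as read from the source platform.
interface SourceVariant {
  sku: string;
  price: string;
  options: Record<string, string>; // e.g. { Size: "M", Color: "Red" }
}

function toVariableRows(name: string, parentSku: string, variants: SourceVariant[]): ProductRowSketch[] {
  // Collect every value seen per option so the parent row can list them all.
  const optionValues = new Map<string, string[]>();
  for (const variant of variants) {
    for (const [option, value] of Object.entries(variant.options)) {
      const seen = optionValues.get(option) ?? [];
      if (!seen.includes(value)) seen.push(value);
      optionValues.set(option, seen);
    }
  }

  const parent: ProductRowSketch = {
    type: "variable",
    name,
    sku: parentSku,
    attributes: Array.from(optionValues, ([option, values]) => ({
      name: option,
      values,
      visible: true,
      global: false,
    })),
  };

  // Each child row carries exactly one value per option and points back at the parent.
  const children = variants.map((variant): ProductRowSketch => ({
    type: "variation",
    name,
    sku: variant.sku,
    parentSku,
    regularPrice: variant.price,
    attributes: Object.entries(variant.options).map(([option, value]) => ({
      name: option,
      values: [value],
      visible: true,
      global: false,
    })),
  }));

  return [parent, ...children];
}
```

Emitting the parent row first keeps each variation after the row its parentSku refers to.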
CSV streaming: The adapter should create a WooProductCsvBuilder, call openStream(outputDir) before extraction, and closeStream() after. The shared loop calls csvBuilder.addProduct() automatically when it detects products. See the Shopify or Wix adapters for the wiring pattern.
- Import the adapter in src/mcp-server.ts and add it to the adapters array
- Import it in src/ui/discover.tsx and add it to the adapters array
- Import it in src/ui/inspect.tsx and add it to the adapters array

Create fixture files in test/fixtures/ with sample HTML and/or JSON from the platform. Sanitize any PII.
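Part of the sanitizing step can be scripted. The sketch below is a hypothetical helper, not something the repository is known to ship, and it covers only one PII category: email addresses. Names, phone numbers, postal addresses, and account IDs still need a manual pass.

```typescript
// Hypothetical fixture helper: replace anything that looks like an email address.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

function redactEmails(fixture: string, placeholder = "user@example.com"): string {
  return fixture.replace(EMAIL_PATTERN, placeholder);
}
```

Running captured HTML or JSON through a helper like this before saving it under test/fixtures/ keeps the redaction repeatable when a fixture is refreshed.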
Create test/adapters/<platform>.test.ts. Test:
Run extraction against the user's live site:
```bash
npx tsx src/cli.ts <site-url> --dry-run --verbose
```
Check the output for quality: are titles correct? Is content complete? Are media URLs captured?
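These three questions can be turned into a quick mechanical pass before a manual review. The sketch below is a hypothetical illustration: `PageSummary` is an invented shape and not the real ExtractedPage type, and the 200-character threshold is an arbitrary assumption to tune per site.

```typescript
// Invented shape for this example. The real ExtractedPage may differ.
interface PageSummary {
  url: string;
  title: string;
  content: string;
  mediaUrls: string[];
}

function findQualityIssues(pages: PageSummary[], minContentLength = 200): string[] {
  const issues: string[] = [];
  for (const page of pages) {
    if (page.title.trim() === "") {
      issues.push(`${page.url}: missing title`);
    }
    if (page.content.trim().length < minContentLength) {
      issues.push(`${page.url}: content shorter than ${minContentLength} characters`);
    }
    // Images in the content with nothing captured suggests media extraction failed.
    if (/<img\b/i.test(page.content) && page.mediaUrls.length === 0) {
      issues.push(`${page.url}: content has <img> tags but no media URLs were captured`);
    }
  }
  return issues;
}
```

A pass like this only flags obviously broken pages. Whether a title is actually correct still needs a comparison against the live site.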
- README.md
- DISCOVERIES.md documenting what you learned about the platform
- AGENTS.md if any non-obvious details are worth noting