playwright-scraper-skill
Playwright-based web scraping OpenClaw Skill with anti-bot protection. Successfully tested on complex sites like Discuss.com.hk.
| Field | Value |
|---|---|
| name | playwright-scraper-skill |
| description | Playwright-based web scraping OpenClaw Skill with anti-bot protection. Successfully tested on complex sites like Discuss.com.hk. |
| whenToUse | User needs to scrape data from websites, especially those with anti-bot protection |
| version | 1.2.0 |
| author | Simon Chan |
A Playwright-based web scraping OpenClaw Skill with anti-bot protection. Choose the best approach based on the target website's anti-bot level.
| Target Website | Anti-Bot Level | Recommended Method | Script |
|---|---|---|---|
| Regular Sites | Low | web_fetch tool | N/A (built-in) |
| Dynamic Sites | Medium | Playwright Simple | scripts/playwright-simple.js |
| Cloudflare Protected | High | Playwright Stealth ⭐ | scripts/playwright-stealth.js |
| YouTube | Special | deep-scraper | Install separately |
| Reddit | Special | reddit-scraper | Install separately |
cd playwright-scraper-skill
npm install
npx playwright install chromium
Use OpenClaw's built-in web_fetch tool:
# Invoke directly in OpenClaw
Hey, fetch me the content from https://example.com
Use Playwright Simple:
node scripts/playwright-simple.js "https://example.com"
Example output:
{
"url": "https://example.com",
"title": "Example Domain",
"content": "...",
"elapsedSeconds": "3.45"
}
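If another Node.js program consumes this output, note that `elapsedSeconds` is a string rather than a number. The following is a minimal sketch of parsing it, assuming the script writes exactly one JSON object to stdout (the source does not confirm this); `parseScrapeOutput` is a hypothetical helper name, not part of the skill.

```javascript
// Hypothetical helper: parse the JSON printed by playwright-simple.js.
// Assumes stdout holds a single JSON object shaped like the example output.
function parseScrapeOutput(stdout) {
  const raw = JSON.parse(stdout);
  return {
    url: raw.url,
    title: raw.title,
    content: raw.content,
    // elapsedSeconds arrives as a string such as "3.45"; convert it once here.
    elapsedSeconds: Number.parseFloat(raw.elapsedSeconds),
  };
}

// Sample input copied from the example output above.
const sample =
  '{"url":"https://example.com","title":"Example Domain","content":"...","elapsedSeconds":"3.45"}';
const parsed = parseScrapeOutput(sample);
console.log(parsed.title, parsed.elapsedSeconds);
```

Converting the timing field at the boundary keeps later comparisons (for example, flagging slow pages) numeric instead of lexicographic.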
Use Playwright Stealth:
node scripts/playwright-stealth.js "https://m.discuss.com.hk/#hot"
Features:
- Hides the automation flag (navigator.webdriver = false)
Use deep-scraper (install separately):
# Install deep-scraper skill
npx clawhub install deep-scraper
# Use it
cd skills/deep-scraper
node assets/youtube_handler.js "https://www.youtube.com/watch?v=VIDEO_ID"
Scripts: scripts/playwright-simple.js, scripts/playwright-stealth.js ⭐
If the site doesn't have dynamic loading, use OpenClaw's web_fetch tool; it's the fastest option.
If you need to wait for JavaScript rendering, use playwright-simple.js.
If you encounter 403 or Cloudflare challenges, use playwright-stealth.js.
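The three rules above amount to a small decision function. The sketch below is illustrative only: `chooseMethod` and its input fields (`status`, `hasCloudflareChallenge`, `needsJsRendering`) are hypothetical names, not part of the skill.

```javascript
// Hypothetical helper mirroring the three selection rules.
// Input field names are illustrative assumptions.
function chooseMethod({ status, hasCloudflareChallenge = false, needsJsRendering = false } = {}) {
  // A 403 or a Cloudflare challenge calls for the stealth script.
  if (status === 403 || hasCloudflareChallenge) {
    return 'scripts/playwright-stealth.js';
  }
  // Content that only appears after JavaScript runs needs a real browser.
  if (needsJsRendering) {
    return 'scripts/playwright-simple.js';
  }
  // Static pages: the built-in tool is the fastest option.
  return 'web_fetch';
}

console.log(chooseMethod({ status: 200 }));
console.log(chooseMethod({ status: 200, needsJsRendering: true }));
console.log(chooseMethod({ status: 403 }));
```

The blocking check comes first so that a protected site is never routed to the simple script, even when it also needs JavaScript rendering.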
All scripts support environment variables:
# Set screenshot path
SCREENSHOT_PATH=/path/to/screenshot.png node scripts/playwright-stealth.js URL
# Set wait time (milliseconds)
WAIT_TIME=10000 node scripts/playwright-simple.js URL
# Enable headful mode (show browser)
HEADLESS=false node scripts/playwright-stealth.js URL
# Save HTML
SAVE_HTML=true node scripts/playwright-stealth.js URL
# Custom User-Agent
USER_AGENT="Mozilla/5.0 ..." node scripts/playwright-stealth.js URL
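For readers writing their own script in the same style, here is a minimal sketch of reading these variables in Node.js. The variable names come from the list above; the default values and the `readConfig` helper are assumptions, and the skill's actual scripts may handle them differently.

```javascript
// Sketch of reading the documented environment variables.
// Default values are assumptions, not taken from the skill's scripts.
function readConfig(env = process.env) {
  const waitTime = Number.parseInt(env.WAIT_TIME ?? '', 10);
  return {
    screenshotPath: env.SCREENSHOT_PATH ?? null,
    // 5000 ms is an assumed fallback when WAIT_TIME is unset or not a number.
    waitTimeMs: Number.isNaN(waitTime) ? 5000 : waitTime,
    // Only the literal string "false" switches to headful mode.
    headless: env.HEADLESS !== 'false',
    saveHtml: env.SAVE_HTML === 'true',
    userAgent: env.USER_AGENT ?? null,
  };
}

console.log(readConfig({ WAIT_TIME: '10000', HEADLESS: 'false' }));
```

Passing the environment as a parameter keeps the function easy to test without mutating `process.env`.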
| Method | Speed | Anti-Bot | Success Rate (Discuss.com.hk) |
|---|---|---|---|
| web_fetch | ⚡ Fastest | ❌ None | 0% |
| Playwright Simple | 🚀 Fast | ⚠️ Low | 20% |
| Playwright Stealth | ⏱️ Medium | ✅ Medium | 100% ✅ |
| Puppeteer Stealth | ⏱️ Medium | ✅ Medium-High | ~80% |
| Crawlee (deep-scraper) | 🐢 Slow | ❌ Detected | 0% |
| Chaser (Rust) | ⏱️ Medium | ❌ Detected | 0% |
Lessons learned from our testing:
- navigator.webdriver override: essential
- addInitScript (Playwright): inject before page load
Solution: Use playwright-stealth.js
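The two techniques named in these lessons can be sketched as follows. This is not the content of playwright-stealth.js, which the source does not show; `hideWebdriver`, `applyStealth`, and `fetchTitle` are hypothetical names, and the sketch covers only the `navigator.webdriver` override, not a complete anti-bot setup.

```javascript
// Runs inside the page before any site script and overrides the automation flag.
// Must not reference outer variables: Playwright serializes it into the browser.
function hideWebdriver() {
  Object.defineProperty(navigator, 'webdriver', { get: () => false });
}

// Registers the init script on a Playwright BrowserContext (a Page also works).
async function applyStealth(context) {
  await context.addInitScript(hideWebdriver);
}

// Hypothetical end-to-end usage; needs `npm install playwright` and a Chromium install.
async function fetchTitle(url) {
  const { chromium } = require('playwright'); // loaded lazily, only when called
  const browser = await chromium.launch({ headless: true });
  try {
    const context = await browser.newContext();
    await applyStealth(context);
    const page = await context.newPage();
    await page.goto(url, { waitUntil: 'domcontentloaded' });
    return await page.title();
  } finally {
    await browser.close();
  }
}
```

Calling `fetchTitle("https://example.com")` would launch a browser; registering the script on the context rather than a single page means every page and frame opened from that context gets the override before its own scripts run.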
Solution:
- headless: false (headful mode sometimes has a higher success rate)
Solution:
- waitForTimeout
- waitUntil: 'networkidle' or 'domcontentloaded'
Best Solution: Pure Playwright + anti-bot techniques (framework-independent)
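The waiting options mentioned above can be combined in one small helper. This is a sketch under assumptions: `gotoAndSettle` is a hypothetical name, the 5000 ms default is illustrative, and `page` is expected to be a Playwright Page.

```javascript
// Hypothetical helper: navigate, then give late-loading content extra time.
// `page` is a Playwright Page; option names and defaults are assumptions.
async function gotoAndSettle(page, url, { waitUntil = 'networkidle', waitMs = 5000 } = {}) {
  // waitUntil controls when goto() resolves: 'domcontentloaded' returns early,
  // 'networkidle' waits until the network has gone quiet.
  await page.goto(url, { waitUntil });
  // A fixed extra pause for content rendered after navigation settles.
  await page.waitForTimeout(waitMs);
}
```

If `'networkidle'` never settles on a page that polls continuously, switching to `'domcontentloaded'` with a longer `waitMs` is one alternative to try.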