
web-scraping

Name: Web Scraping
Author: billy-enrizky

Extract structured data from websites, scrape page content, and collect information across multiple pages. Trigger when the user asks to: extract data from a website, scrape a page, collect information from URLs, pull content from web pages, gather data across multiple pages, or download page content.

Stars: 233
Forks: 19
Updated: May 21, 2026, 01:17

Related skills from the same repository:

accessibility-audit.md

from "billy-enrizky/openbrowser-ai"

Audit web pages for accessibility issues, WCAG compliance, and screen reader compatibility. Trigger when the user asks to: check accessibility, run an a11y audit, test WCAG compliance, check screen reader support, audit ARIA attributes, verify keyboard navigation, find accessibility issues, or check for missing alt text or labels.

Updated: 2026-05-21 · Stars: 233

e2e-testing.md

from "billy-enrizky/openbrowser-ai"

Test web applications end-to-end by simulating user interactions and verifying expected outcomes. Trigger when the user asks to: test a web app, verify a user flow, run end-to-end tests, QA a feature, check that a page works correctly, validate user journeys, or test a deployment.

Updated: 2026-05-21 · Stars: 233

file-download.md

from "billy-enrizky/openbrowser-ai"

Download files from websites, save PDFs, and read downloaded content. Trigger when the user asks to: download a file, save a PDF, export a document, fetch a file from a URL, grab a report, download and read a PDF, or save page content as a file.

Updated: 2026-05-21 · Stars: 233

form-filling.md

from "billy-enrizky/openbrowser-ai"

Fill out web forms, submit data, and handle login or registration flows. Trigger when the user asks to: fill a form, submit data on a website, log in to a site, register an account, complete a checkout, enter information into fields, or automate form submission.

Updated: 2026-05-21 · Stars: 233

page-analysis.md

from "billy-enrizky/openbrowser-ai"

Analyze web page content, structure, and layout to understand what a page contains and how it is organized. Trigger when the user asks to: analyze a page, understand page structure, inspect a website, summarize page content, examine page layout, review a web page, or describe what is on a page.

Updated: 2026-05-21 · Stars: 233

package.json

"author": "billy-enrizky"

"repository": "billy-enrizky/openbrowser-ai"


Useful for (SOC): Computer Programmers · Computer and Mathematical Occupations · 15-1251 · L4

name	web-scraping
description	Extract structured data from websites, scrape page content, and collect information across multiple pages. Trigger when the user asks to: extract data from a website, scrape a page, collect information from URLs, pull content from web pages, gather data across multiple pages, or download page content.
allowed-tools	Bash(openbrowser-ai:*) Bash(curl:*) Bash(uv:*) Bash(irm:*) Read Write

Web Scraping

Extract structured data from websites using Python code execution with browser automation functions. Handles JavaScript-rendered content, pagination, and multi-page scraping.

All code runs via openbrowser-ai -c. The daemon starts automatically and keeps variables alive across calls. All browser functions are async, so call them with await.

The CLI daemon also persists cookies and login state in ~/.config/openbrowser/profiles/daemon/storage_state.json, so authenticated sessions can be reused across later runs.
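To check which sites a saved session covers, you can read that file directly. The sketch below assumes the file follows the Playwright-style storage-state layout (a top-level `cookies` list whose entries carry a `domain` field); that layout is an assumption this skill does not document, and `cookie_domains` is a hypothetical helper name.

```python
import json
from pathlib import Path

# Location named by this skill. The JSON layout is an ASSUMPTION
# (Playwright-style: {"cookies": [{"domain": ...}, ...], "origins": [...]}).
STORAGE_STATE = Path.home() / ".config/openbrowser/profiles/daemon/storage_state.json"


def cookie_domains(path: Path = STORAGE_STATE) -> list[str]:
    """Return the sorted, de-duplicated cookie domains in a storage-state file."""
    if not path.exists():
        return []  # no saved session yet
    state = json.loads(path.read_text(encoding="utf-8"))
    # A leading dot (".example.com") marks a domain cookie; strip it for display
    domains = {c.get("domain", "").lstrip(".") for c in state.get("cookies", [])}
    domains.discard("")
    return sorted(domains)


if __name__ == "__main__":
    for domain in cookie_domains():
        print(domain)
```

If the list is empty after a login you expected to be saved, the file layout may differ from the assumed one, so inspect the raw JSON first.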

Setup

Before running, verify openbrowser-ai is installed:

openbrowser-ai --help

If not found, install:

# macOS/Linux
curl -fsSL https://raw.githubusercontent.com/billy-enrizky/openbrowser-ai/main/install.sh | sh

# Windows (PowerShell)
irm https://raw.githubusercontent.com/billy-enrizky/openbrowser-ai/main/install.ps1 | iex
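When a Python driver script orchestrates the scrape, the same presence check can run before the first call. A minimal standard-library sketch; `require_cli` is a hypothetical helper name.

```python
import shutil
import sys


def require_cli(name: str = "openbrowser-ai") -> str:
    """Return the full path of `name` on PATH, or exit with an install hint."""
    path = shutil.which(name)
    if path is None:
        sys.exit(f"{name} not found on PATH; run the install command for your OS first.")
    return path


if __name__ == "__main__":
    print(require_cli())
```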

Workflow

Step 1 -- Navigate and get content overview

openbrowser-ai -c - <<'EOF'
await navigate("https://example.com/data")

# Get browser state to see page title, URL, element count
state = await browser.get_browser_state_summary()
print(f"Title: {state.title}")
print(f"URL: {state.url}")
print(f"Elements: {len(state.dom_state.selector_map)}")
EOF

Step 2 -- Extract data with JavaScript

Use evaluate() to run JS in the browser and return structured data directly as Python objects:

openbrowser-ai -c - <<'EOF'
data = await evaluate("""
(function(){
  return Array.from(document.querySelectorAll(".product-card")).map(el => ({
    name: el.querySelector(".title")?.textContent?.trim(),
    price: el.querySelector(".price")?.textContent?.trim(),
    url: el.querySelector("a")?.href
  }))
})()
""")

import json
print(json.dumps(data, indent=2))
EOF
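Because the JS uses optional chaining, a card without a `.price` element yields `undefined` for that field, and depending on how the result is serialized back to Python the key may be missing rather than set to `None` (worth checking on your own pages). The sketch below, using the hypothetical helper `normalize_records`, gives every record the same keys so later steps can index them safely.

```python
FIELDS = ("name", "price", "url")  # the keys produced by the Step 2 snippet


def normalize_records(records, fields=FIELDS):
    """Give every record the same keys (None when absent) and drop empty ones."""
    normalized = []
    for record in records:
        row = {field: record.get(field) for field in fields}
        if any(value is not None for value in row.values()):
            normalized.append(row)  # keep only records with at least one value
    return normalized


if __name__ == "__main__":
    sample = [{"name": "Lamp", "price": "$20"}, {}, {"url": "https://example.com/x"}]
    print(normalize_records(sample))
```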

Step 3 -- Process data with Python

Use pandas, regex, or other Python tools to clean and transform extracted data:

openbrowser-ai -c - <<'EOF'
import json
import re

# `data` persists from the Step 2 call while the daemon is running
filtered = []
for item in data:
    # Pull the first number out of strings like "$1,299.00" or "USD 15"
    match = re.search(r"\d[\d,]*(?:\.\d+)?", item.get("price") or "")
    if not match:
        continue  # skip items with a missing or non-numeric price
    item["price_float"] = float(match.group().replace(",", ""))
    filtered.append(item)

# Sort by price
filtered.sort(key=lambda x: x["price_float"])
print(json.dumps(filtered, indent=2))
EOF

Or with pandas if available:

openbrowser-ai -c - <<'EOF'
import pandas as pd
df = pd.DataFrame(data)
print(df.to_string())
EOF

Step 4 -- Handle pagination

openbrowser-ai -c - <<'EOF'
results = []
page = 1
max_pages = 50  # safety cap so a broken "next" button cannot loop forever

while page <= max_pages:
    # Extract data from the current page
    page_data = await evaluate("""
    (function(){
      return Array.from(document.querySelectorAll(".item")).map(el => ({
        name: el.textContent.trim()
      }))
    })()
    """)
    results.extend(page_data)
    print(f"Page {page}: {len(page_data)} items")

    # Click the enabled "next" button if there is one; returns false otherwise.
    # Alternatively, use await click(index) with the element index reported by
    # browser.get_browser_state_summary().
    clicked = await evaluate("""
    (function(){
      const next = document.querySelector(".pagination .next:not(.disabled)");
      if (!next) return false;
      next.click();
      return true;
    })()
    """)
    if not clicked:
        break

    await wait(2)  # give the next page time to load
    page += 1

print(f"Total: {len(results)} items")
EOF
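The control flow of that loop (extract, check for a next page, advance) plus a page cap as a safeguard can be sketched without a browser. Here `fetch_page` and `go_next` are hypothetical async callables standing in for the `evaluate()` calls, and `FakeSite` is an in-memory stand-in that exists only to exercise the logic.

```python
import asyncio


async def paginate(fetch_page, go_next, max_pages=50):
    """Collect items page by page until go_next() reports no further page."""
    results = []
    for page in range(1, max_pages + 1):
        items = await fetch_page()
        results.extend(items)
        print(f"Page {page}: {len(items)} items")
        if not await go_next():  # False when there is no enabled "next" control
            break
    return results


class FakeSite:
    """In-memory stand-in for a paginated listing (illustration only)."""

    def __init__(self, pages):
        self.pages = pages
        self.current = 0

    async def fetch_page(self):
        return self.pages[self.current]

    async def go_next(self):
        if self.current + 1 >= len(self.pages):
            return False
        self.current += 1
        return True


if __name__ == "__main__":
    site = FakeSite([["a", "b"], ["c"], ["d", "e"]])
    items = asyncio.run(paginate(site.fetch_page, site.go_next))
    print(f"Total: {len(items)} items")
```

Inside an `openbrowser-ai -c` call, where top-level `await` already works, you would presumably `await paginate(...)` directly; `asyncio.run()` refuses to start while another event loop is running.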

Step 5 -- Handle infinite scroll

openbrowser-ai -c - <<'EOF'
results = []
prev_count = 0

for _ in range(20):  # Max 20 scroll attempts
    # Get current items
    count = await evaluate("""
    (function(){ return document.querySelectorAll(".item").length })()
    """)

    if count == prev_count:
        break  # No new content loaded

    prev_count = count
    await scroll(down=True, pages=3)
    await wait(1)

# Now extract all loaded items
results = await evaluate("""
(function(){
  return Array.from(document.querySelectorAll(".item")).map(el => ({
    text: el.textContent.trim()
  }))
})()
""")
print(f"Extracted {len(results)} items")
EOF
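The stop rule in that loop (keep scrolling while the item count grows, give up after a fixed number of attempts) can be isolated as below. `count_items` and `scroll_once` are hypothetical async callables standing in for the `evaluate()` count and the `scroll()`/`wait()` pair; `FakeFeed` is for illustration only.

```python
import asyncio


async def scroll_until_stable(count_items, scroll_once, max_attempts=20):
    """Scroll until the item count stops growing; return the last measured count."""
    prev_count = -1  # sentinel so an initially empty page still gets scrolled
    count = 0
    for _ in range(max_attempts):
        count = await count_items()
        if count == prev_count:
            break  # nothing new loaded since the last scroll
        prev_count = count
        await scroll_once()
    return count


class FakeFeed:
    """Feed that loads `batch` more items per scroll until `total` is reached."""

    def __init__(self, total, batch):
        self.total, self.batch, self.loaded = total, batch, 0

    async def count_items(self):
        return self.loaded

    async def scroll_once(self):
        self.loaded = min(self.total, self.loaded + self.batch)


if __name__ == "__main__":
    feed = FakeFeed(total=25, batch=10)
    print(asyncio.run(scroll_until_stable(feed.count_items, feed.scroll_once)))
```

Starting the previous count at -1 rather than 0 is a deliberate difference from the loop above: a page that shows no items until the first scroll does not stop immediately.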

Step 6 -- Multi-page scraping

openbrowser-ai -c - <<'EOF'
urls = [
    "https://example.com/page-1",
    "https://example.com/page-2",
    "https://example.com/page-3",
]

all_data = []
for url in urls:
    await navigate(url)
    await wait(1)

    page_data = await evaluate("""
    (function(){
      return document.querySelector("h1")?.textContent?.trim()
    })()
    """)
    all_data.append({"url": url, "title": page_data})
    print(f"{url}: {page_data}")

import json
print(json.dumps(all_data, indent=2))
EOF
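On longer URL lists, one failing page should not abort the whole run. A sketch of per-URL error capture with a fixed pause between requests; `scrape_one` is a hypothetical async callable that would wrap the `navigate()`/`evaluate()` pair, `fake_scrape` exists only for illustration, and the delay value is arbitrary.

```python
import asyncio


async def scrape_all(urls, scrape_one, delay=1.0):
    """Scrape each URL in turn; record failures instead of stopping."""
    results, errors = [], []
    for i, url in enumerate(urls):
        if i:
            await asyncio.sleep(delay)  # pause between page loads (rate limiting)
        try:
            results.append({"url": url, "title": await scrape_one(url)})
        except Exception as exc:  # keep going; report the failure at the end
            errors.append({"url": url, "error": repr(exc)})
    return results, errors


async def fake_scrape(url):
    """Illustrative stand-in: pretend URLs ending in "2" time out."""
    if url.endswith("2"):
        raise TimeoutError("page did not load")
    return f"Title of {url}"


if __name__ == "__main__":
    urls = ["https://example.com/page-1", "https://example.com/page-2"]
    ok, failed = asyncio.run(scrape_all(urls, fake_scrape, delay=0))
    print(ok)
    print(failed)
```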

Tips

Code is piped via stdin using heredoc (-c - <<'EOF'), so all Python syntax works without shell escaping issues.
Use evaluate() for structured DOM extraction -- it returns Python objects directly.
Use Python for post-processing: filtering, sorting, deduplication, format conversion.
For large datasets, process pages incrementally rather than loading everything into memory.
Check for rate limiting; add await wait(2) between page loads if needed.
Variables persist between -c calls while the daemon is running, so you can build up results across multiple calls.
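Deduplication and format conversion need nothing beyond the standard library. The sketch below drops repeated records by a key field and renders the rest as CSV; `dedupe` and `to_csv` are hypothetical helper names, and keying on `url` assumes URLs identify items uniquely on your target site.

```python
import csv
import io


def dedupe(records, key="url"):
    """Keep the first record for each value of `key`; records without it are kept."""
    seen = set()
    unique = []
    for record in records:
        value = record.get(key)
        if value is not None:
            if value in seen:
                continue  # duplicate of an earlier record
            seen.add(value)
        unique.append(record)
    return unique


def to_csv(records):
    """Serialize a list of dicts to CSV text, using the union of their keys."""
    fieldnames = []
    for record in records:
        for field in record:
            if field not in fieldnames:
                fieldnames.append(field)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)  # missing keys are written as empty cells
    return buffer.getvalue()


if __name__ == "__main__":
    rows = [{"url": "a", "name": "x"}, {"url": "a", "name": "dup"}, {"url": "b"}]
    print(to_csv(dedupe(rows)))
```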

Cleanup

This step is mandatory. Run it after the scrape finishes, whether you collected every page or hit a rate limit halfway through. Without it, the daemon keeps Chrome running until its 10-minute idle timeout, leaving a stale browser process, a locked profile, and (on macOS/Linux desktop) a visible window.

Stop the daemon, then verify it is gone:

openbrowser-ai daemon stop
openbrowser-ai daemon status

daemon stop closes every tab, exits Chrome, flushes saved cookies/login state to the profile, and shuts down the daemon process. daemon status should then report that the daemon is not running. If it still reports running, the daemon is wedged; force-kill it:

pkill -f 'openbrowser.*daemon' || true

Long scrapes fail often (rate limits, network drops, pagination dead-ends). Guarantee cleanup with a shell trap so a partial run never leaks a browser:

trap 'openbrowser-ai daemon stop >/dev/null 2>&1 || true' EXIT
# ... openbrowser-ai -c calls here ...
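If the scrape is driven from a Python script instead of a shell, `try`/`finally` gives the same guarantee as the trap. A sketch; `run_with_cleanup` is a hypothetical helper, and the stop command is a parameter so the logic can be exercised without the CLI installed.

```python
import subprocess

STOP_CMD = ["openbrowser-ai", "daemon", "stop"]


def run_with_cleanup(work, stop_cmd=STOP_CMD):
    """Run work(); always attempt to stop the daemon afterwards, even on error."""
    try:
        return work()
    finally:
        try:
            # check=False ignores a non-zero exit, like `|| true` in the trap
            subprocess.run(stop_cmd, check=False,
                           stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        except FileNotFoundError:
            pass  # CLI not on PATH: nothing to stop
```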

Persist scraped data to disk before calling daemon stop; in-memory variables die with the daemon. Do not rely on the idle timeout. Do not call done() as a substitute: done() only marks the task complete inside the agent loop and does not close the browser.
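One way to persist results safely is to write them to a temporary file and atomically rename it into place, so an interrupted write never leaves a truncated JSON file behind. A sketch; `save_json` and the output filename are hypothetical choices, and inside an `openbrowser-ai -c` call you would pass whichever variable holds your results (for example `all_data` from Step 6).

```python
import json
import os
import tempfile
from pathlib import Path


def save_json(data, path):
    """Write `data` as JSON to `path` atomically (temp file + os.replace)."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle, indent=2, ensure_ascii=False)
        os.replace(tmp_name, path)  # atomic when source and target share a filesystem
    except BaseException:
        os.unlink(tmp_name)  # do not leave a half-written temp file behind
        raise
    return path


if __name__ == "__main__":
    print(save_json([{"url": "https://example.com/page-1", "title": "Example"}], "scrape_results.json"))
```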
