Run any Skill in Manus with one click

$pwd:

data-feeds

Name: Data Feeds
Author: brightdata

// Extract structured data from 40+ supported platforms (Amazon, LinkedIn, Instagram, TikTok, Facebook, YouTube, Reddit, and more) via the Bright Data CLI (`bdata pipelines`). Use when the user wants clean JSON from a known platform URL rather than raw HTML. Hands off to `scrape` for unsupported URLs and to `search` when target URLs must be discovered first. Requires the Bright Data CLI; proactively guides install + login if missing.

Run Skill in Manus

$ git log --oneline --stat

stars:137

forks:22

updated:April 19, 2026 at 15:46

File Explorer

4 files

SKILL.md

readonly

related-skills.json

same repository

scraper-studio.md

from "brightdata/skills"

Build and run AI-generated Bright Data scrapers from the terminal via `bdata scraper create` and `bdata scraper run`. Use this skill whenever the user wants to generate a scraper from a natural-language description, build a custom scraper without writing code, turn a URL + plain-English description into a reusable scraper, or run an existing Bright Data collector against a URL and get the data back. Triggers on phrases like 'build me a scraper for', 'create a scraper that extracts', 'generate a scraper from a description', 'turn this URL into a scraper', 'run this scraper on', 'run my collector', 'scraper studio', `scraper create`, `scraper run`, `collector_id`, `automate_template`, or `/dca/`. Covers the AI flow (template create → trigger AI generation → poll progress), the run flow (async + poll by default, `--sync` for fast pages), and the silent auto-fallback to the batch endpoint when a URL expands past the realtime page limit. Requires the Bright Data CLI.

2026-05-27137

bright-data-best-practices.md

from "brightdata/skills"

Build production-ready Bright Data integrations with best practices baked in. Reference documentation for developers using coding assistants (Claude Code, Cursor, etc.) to implement web scraping, search, browser automation, and structured data extraction. Covers Web Unlocker API, SERP API, Web Scraper API, and Browser API (Scraping Browser).

2026-05-24137

brightdata-proxy.md

from "brightdata/skills"

Generate working code that routes HTTP requests through Bright Data proxy networks (Datacenter, ISP, Residential, Mobile) and help users decide which network and IP pool type to use (shared pool, shared IPs, or dedicated IPs). Use this skill whenever the user mentions Bright Data, brightdata.com, BD proxies, brd.superproxy.io, geo.brdtest.com, a brd-customer- proxy username, a Bright Data zone, the superproxy host, or wants to scrape or route requests through Bright Data — including questions about proxy URL format, country or session or IP or sticky-session targeting, SSL certificate setup for residential or mobile proxies, KYC verification, ignoring SSL errors, choosing between shared pool and shared IPs and dedicated IPs, or integrating Bright Data into Python requests/httpx/aiohttp, Node fetch/axios, Playwright, Puppeteer, Selenium, or Scrapy.

2026-05-24137

brightdata-sdk.md

from "brightdata/skills"

Web data extraction and discovery using the Bright Data Python SDK. Use when user asks to "scrape", "get data from", "extract", "search for", or "find" information from websites. Also use when user mentions specific platforms like Amazon, LinkedIn, Instagram, Facebook, TikTok, YouTube, Reddit, Pinterest, Zillow, Crunchbase, or DigiKey, or asks for "bulk data", "historical data", or "dataset". Covers scraping, searching, datasets, and browser automation.

2026-04-27137

agent-onboarding.md

from "brightdata/skills"

Onboard an agent to Bright Data. Use when a coding agent first encounters Bright Data — for live web work (search, scrape, structured data), for wiring Bright Data into product code, for installing the agent skill bundle, or for getting an API key. One install command sets up the CLI, agent skills, and authentication. Routes the reader to the right path: live tools, app integration, MCP, auth-only, or direct REST without any install.

2026-04-26137

seo-audit.md

from "brightdata/skills"

When the user wants to audit, review, or diagnose SEO issues on their site. Uses live web data via the Bright Data CLI for accurate detection of JS-injected schema, hreflang, canonicals, and live SERP-based ranking checks. Also use when the user mentions "SEO audit," "technical SEO," "why am I not ranking," "SEO issues," "on-page SEO," "meta tags review," "SEO health check," "my traffic dropped," "lost rankings," "not showing up in Google," "site isn't ranking," "Google update hit me," "page speed," "core web vitals," "crawl errors," or "indexing issues." Use this even if the user just says something vague like "my SEO is bad" or "help with SEO" — start with an audit. For building pages at scale to target keywords, see programmatic-seo. For implementing structured data, see schema-markup. For AI search optimization, see ai-seo.

2026-04-26137

package.json

"author": "brightdata"

"repository": "brightdata/skills"

View GitHub Repository View Creator Repositories

$ install --global

$ download --local

Run Skill in Manus

$ useful --forSOC

Data ScientistsComputer and Mathematical Occupations15-2051L4

name

data-feeds

description

Extract structured data from 40+ supported platforms (Amazon, LinkedIn, Instagram, TikTok, Facebook, YouTube, Reddit, and more) via the Bright Data CLI (`bdata pipelines`). Use when the user wants clean JSON from a known platform URL rather than raw HTML. Hands off to `scrape` for unsupported URLs and to `search` when target URLs must be discovered first. Requires the Bright Data CLI; proactively guides install + login if missing.

Bright Data — Data Feeds (Pipelines)

Extract structured data from supported platforms via bdata pipelines. One call, clean JSON, no scraping logic. For unsupported URLs, hand off to scrape. To find target URLs first, hand off to search.

Setup gate (run first)

if ! command -v bdata >/dev/null 2>&1; then
    echo "bdata CLI not installed — see bright-data-best-practices/references/cli-setup.md"
elif ! bdata zones >/dev/null 2>&1; then
    echo "bdata not authenticated — run: bdata login  (or: bdata login --device for SSH)"
fi

Halt and route to skills/bright-data-best-practices/references/cli-setup.md if either check fails.

Supported pipeline types (verified 2026-04-19)

Always verify with bdata pipelines list before hardcoding names — they change. Current 43 types:

amazon_product, amazon_product_reviews, amazon_product_search, apple_app_store, bestbuy_products, booking_hotel_listings, crunchbase_company, ebay_product, etsy_products, facebook_company_reviews, facebook_events, facebook_marketplace_listings, facebook_posts, github_repository_file, google_maps_reviews, google_play_store, google_shopping, homedepot_products, instagram_comments, instagram_posts, instagram_profiles, instagram_reels, linkedin_company_profile, linkedin_job_listings, linkedin_people_search, linkedin_person_profile, linkedin_posts, reddit_posts, reuter_news, tiktok_comments, tiktok_posts, tiktok_profiles, tiktok_shop, walmart_product, walmart_seller, x_posts, yahoo_finance_business, youtube_comments, youtube_profiles, youtube_videos, zara_products, zillow_properties_listing, zoominfo_company_profile

Naming note: inconsistent across platforms. amazon_product (singular), tiktok_profiles (plural), linkedin_person_profile (not linkedin_profile). Always copy from bdata pipelines list.

Pick your path

Situation	Action
Know the platform + have URL(s)	`bdata pipelines <type> <url>`
Don't know which pipeline fits	`bdata pipelines list` first
Pipeline takes keyword or multi-arg input	See "Keyword- and multi-arg pipelines" below
Multiple URLs on the same pipeline type	shell loop with parallelism cap (see `references/patterns.md`)
Long job (reviews, company employees, big post feeds)	raise `--timeout 1800`
URL is on an unsupported platform	stop — hand off to `scrape`
Need to find URLs first	hand off to `search`

Keyword- and multi-arg pipelines (do NOT take a single URL)

A few pipelines take non-URL or multi-positional inputs. Invoke with no args to see the exact usage line from the CLI:

Pipeline	Args
`amazon_product_search`	`<keyword> <domain_url>` — e.g., `"running shoes" https://www.amazon.com`
`linkedin_people_search`	`<url> <first_name> <last_name>` — search a company/school/URL for a named person
`facebook_company_reviews`	`<url> [num_reviews]` — optional num_reviews defaults to `10`
`google_maps_reviews`	`<url> [days_limit]` — optional days_limit defaults to `3`
`youtube_comments`	`<url> [num_comments]` — optional num_comments defaults to `10`

All other 37 pipelines take a single URL.

Action

Core commands:

# List available pipeline types (source of truth)
bdata pipelines list

# Amazon product
bdata pipelines amazon_product \
    "https://www.amazon.com/dp/B08N5WRWNW" \
    --format json --pretty -o product.json

# Amazon product reviews (slower — reviews can be hundreds)
bdata pipelines amazon_product_reviews \
    "https://www.amazon.com/dp/B08N5WRWNW" \
    --timeout 1200 -o reviews.json

# Amazon product search (keyword + domain URL)
bdata pipelines amazon_product_search \
    "noise cancelling headphones" "https://www.amazon.com" \
    --format json --pretty -o search.json

# LinkedIn person profile
bdata pipelines linkedin_person_profile \
    "https://www.linkedin.com/in/example" -o person.json

# LinkedIn company
bdata pipelines linkedin_company_profile \
    "https://www.linkedin.com/company/example" -o company.json

# LinkedIn people search (url + first + last name)
bdata pipelines linkedin_people_search \
    "https://www.linkedin.com/company/example" "Jane" "Doe" \
    -o people.json

# Instagram posts
bdata pipelines instagram_posts \
    "https://www.instagram.com/example/" -o posts.json

# Google Maps reviews (url + days_limit, default 3)
bdata pipelines google_maps_reviews \
    "https://maps.google.com/?cid=1234567890" 90 -o reviews.json

# YouTube comments (url + num_comments, default 10)
bdata pipelines youtube_comments \
    "https://www.youtube.com/watch?v=abc123" 100 -o yt-comments.json

# NDJSON for big feeds (one record per line)
bdata pipelines linkedin_posts "https://www.linkedin.com/in/example" \
    --format ndjson -o posts.ndjson

# Raise polling timeout for long jobs
bdata pipelines amazon_product_reviews "<url>" --timeout 1800 -o out.json

Full flag reference + full type table: references/flags.md.

Verification gate

JSON parses cleanly: jq . <output> returns 0 (or for --format ndjson, each line parses).

Record count matches expected. One URL usually = one record, but reviews/posts/comments pipelines return arrays sized by what the platform shows. Always check:

jq 'length' out.json                       # top-level array count
# OR
jq 'if type == "array" then length else 1 end' out.json

No top-level error:

jq -e 'if type == "object" then has("error") | not else true end' out.json \
    || { echo "pipeline reported error"; exit 1; }

No per-record error: for array results, ensure no record has an error field:

jq -e 'if type == "array" then map(has("error")) | any | not else true end' out.json \
    || echo "WARN: one or more records have error fields"

Partial failures are silent — this check is non-optional.

Core fields present for the pipeline type (examples):
- amazon_product → .title + .price (or .final_price)
- linkedin_person_profile → .name + .headline (or .position)
- instagram_posts → .caption or .description + .url or .post_id
- youtube_videos → .title + .video_id or .url
Spot-check with jq keys on the first record to learn the exact schema.
On failure: double --timeout and retry once. If still failing, bdata pipelines list to confirm the type name hasn't changed.

Red flags

Using bdata scrape on Amazon/LinkedIn/TikTok/etc. when bdata pipelines <type> returns structured fields in one call. Loses structure and costs more time.
Looping bdata pipelines for large jobs without rate-limiting — each call can trigger a long-running pipeline on the server. Cap parallelism at 2–3.
Claiming success without the record-count + per-record error check. Partial failures are silent in pipeline output.
Hardcoding pipeline type names (amazon_products with an s, linkedin_profile without _person_, etc.) — they're inconsistent across platforms. Always copy from bdata pipelines list.
Using a tight --timeout on pipelines that legitimately take 5–15 minutes (reviews, company employees, big post feeds). Default 600s is a floor for small inputs; raise for long ones.
Calling a keyword- or multi-arg pipeline (amazon_product_search, linkedin_people_search, google_maps_reviews, facebook_company_reviews, youtube_comments) with URL-only args — will fail with "Usage: ...". Always check bdata pipelines <type> error output when in doubt.
Passing a pages_to_search third arg to amazon_product_search — it's hardcoded to 1 by the CLI and extra args are ignored.

References

references/flags.md — full pipelines flags + complete table of all 43 types with input shapes.
references/patterns.md — sync timeout tuning, shell-loop batching with parallelism cap, partial-failure detection, keyword-shaped pipeline cheatsheet, legacy curl fallback, shared verification checklist.
references/examples.md — (1) single Amazon product, (2) batch LinkedIn companies, (3) long reviews job with raised timeout, (4) mixed-platform workflow calling pipelines list first, (5) keyword-shaped amazon_product_search.

data-feeds

More from this repository

Bright Data — Data Feeds (Pipelines)

Setup gate (run first)

Supported pipeline types (verified 2026-04-19)

Pick your path

Keyword- and multi-arg pipelines (do NOT take a single URL)

Action

Verification gate

Red flags

References

Bright Data — Data Feeds (Pipelines)

Setup gate (run first)

Supported pipeline types (verified 2026-04-19)

Pick your path

Keyword- and multi-arg pipelines (do NOT take a single URL)

Action

Verification gate

Red flags

References

More from this repository