bright-data

Stars: 17

Forks: 5

Updated: June 16, 2026 at 18:39

Use when "query Bright Data", "Bright Data datasets", "Bright Data Web Archive / Wayback alternative", "scrape with Web Unlocker", "FINRA BrokerCheck data", "SEC IAPD / adviserinfo data", "Investment Adviser Public Disclosure", "broker/adviser disclosure snapshots", "LinkedIn/Crunchbase/Glassdoor company or people dataset", or any use of the Bright Data API (datasets/list, Web Archive search/dump, Web Unlocker zones). Covers the verified FINRA BrokerCheck + SEC IAPD archive coverage finding.


Source

edwinhu

edwinhu/workflows


SKILL.md


name	bright-data
version	1
description	Use when "query Bright Data", "Bright Data datasets", "Bright Data Web Archive / Wayback alternative", "scrape with Web Unlocker", "FINRA BrokerCheck data", "SEC IAPD / adviserinfo data", "Investment Adviser Public Disclosure", "broker/adviser disclosure snapshots", "LinkedIn/Crunchbase/Glassdoor company or people dataset", or any use of the Bright Data API (datasets/list, Web Archive search/dump, Web Unlocker zones). Covers the verified FINRA BrokerCheck + SEC IAPD archive coverage finding.
user-invocable	false

Cost Enforcement
Auth
What Bright Data Offers
Web Archive API (verified)
Dataset Marketplace API
Pricing Model
FINRA BrokerCheck & SEC IAPD Coverage
Additional Resources

Cost Enforcement

Bright Data bills real money. Two API actions are FREE; everything else is billed.

FREE: GET /datasets/list; POST /webarchive/search + polling GET /webarchive/search/<id> (returns counts + dump_cost_usd WITHOUT charging).
PAID: POST /webarchive/dump (~$0.001/page), Web Unlocker requests (~$1.5–3 per 1k successes), dataset record purchases/triggers (per-record).

NEVER call /webarchive/dump, trigger a dataset collection, or create/use a Web Unlocker zone unless the user has explicitly approved the spend for THAT operation. Always run a free search first to get the exact dump_cost_usd and show it to the user before any dump.

The default read-only token (BRIGHTDATA_API_TOKEN) can list datasets and run archive searches but CANNOT create zones. Do not attempt zone creation with it.
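One way to make the approval rule above hard to bypass is a small guard that every paid call must pass first. The following is a minimal sketch, not part of the Bright Data API: `SpendApproval` and `require_approval` are hypothetical names, and tying an approval to a specific `search_id` and a maximum cost is an assumption about how "approved for THAT operation" could be recorded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpendApproval:
    """The user's explicit approval for ONE paid operation (hypothetical helper)."""
    operation: str       # e.g. "webarchive/dump"
    search_id: str       # the free search that produced the quoted cost
    max_cost_usd: float  # the amount the user agreed to


def require_approval(operation, search_id, dump_cost_usd, approval=None):
    """Raise PermissionError unless `approval` covers this exact operation and cost."""
    if approval is None:
        raise PermissionError(
            f"{operation}: no approval on record; show the user "
            f"dump_cost_usd=${dump_cost_usd:.2f} before proceeding"
        )
    if approval.operation != operation or approval.search_id != search_id:
        raise PermissionError(f"{operation}: approval was given for a different operation")
    if dump_cost_usd > approval.max_cost_usd:
        raise PermissionError(
            f"{operation}: quoted ${dump_cost_usd:.2f} exceeds approved "
            f"${approval.max_cost_usd:.2f}"
        )
    return True
```

Calling `require_approval("webarchive/dump", search_id, dump_cost_usd, approval)` immediately before the dump request keeps the free search, the quoted cost, and the user's consent tied to the same operation.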

Auth

All endpoints use a Bearer token. NEVER hardcode it. Read from env or a gitignored key file:

# preferred: env var
export BRIGHTDATA_API_TOKEN=...        # set in shell profile / .env (gitignored)
# fallback used during this project:
TOKEN=$(cat ~/projects/batm/scratch/brd_token.txt)   # gitignored key file

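import os
from pathlib import Path

# Prefer the env var; fall back to a gitignored key file. Never hardcode the token.
TOKEN = os.environ.get("BRIGHTDATA_API_TOKEN") or Path("~/.config/brightdata/token").expanduser().read_text().strip()
HEADERS = {"Authorization": f"Bearer {TOKEN}", "Content-Type": "application/json"}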

What Bright Data Offers

Three relevant products:

Dataset marketplace — ~1,576 pre-collected datasets (GET /datasets/list). Heavily social/company/people (LinkedIn 115M people, Instagram 620M, Crunchbase 2.3M, Glassdoor, etc.). No government/regulatory/licensing products except a "US lawyers directory" (1.4M). See references/datasets-catalog.md.
Web Archive — Bright Data's own crawl archive (a Wayback-like corpus). Searchable for free by domain/URL/date; dumps cost ~$0.001/page. This is where the FINRA/SEC coverage lives. See references/webarchive-api.md.
Web Unlocker / scraping zones — on-demand unblocked fetch of live pages (anti-bot bypass). Requires a writable token to create zones. ~$1.5–3 per 1k successful requests.

Web Archive API (verified)

Base: https://api.brightdata.com/webarchive. Async — search returns a search_id, poll until status == "done".

# 1. Launch a FREE search (returns {"search_id": "..."})
curl -s -X POST https://api.brightdata.com/webarchive/search \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -d '{"filters":{"min_date":"2015-01-01","max_date":"2026-06-10","domain_whitelist":["brokercheck.finra.org"]}}'

# 2. Poll (FREE) — when done returns files_count, dump_cost_usd, estimate_batch_count
curl -s https://api.brightdata.com/webarchive/search/<search_id> \
  -H "Authorization: Bearer $TOKEN"

Filters (body {"filters":{...}}):

Required: either max_age OR min_date+max_date (YYYY-MM-DD).
domain_whitelist — exact host match, array.
domain_like_whitelist — SQL LIKE, e.g. ["%finra%"].
url_like_whitelist — SQL LIKE on full URL (use to scope a cheap subset dump).
unique_url (bool) — count/dump distinct URLs only (dedupes repeat snapshots).
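The filter rules above can be checked locally before a search is launched. This is a minimal sketch: `build_filters` is a hypothetical helper, only the filter names listed above are used, and `max_age` is passed through untouched because its expected format is not stated here.

```python
from datetime import datetime


def build_filters(*, max_age=None, min_date=None, max_date=None,
                  domain_whitelist=None, domain_like_whitelist=None,
                  url_like_whitelist=None, unique_url=None):
    """Return a {"filters": {...}} request body, enforcing the required time window."""
    if (min_date is None) != (max_date is None):
        raise ValueError("min_date and max_date must be given together")
    has_range = min_date is not None
    if max_age is None and not has_range:
        raise ValueError("provide max_age or min_date+max_date")
    if has_range:
        # Raises ValueError if either date does not parse as year-month-day.
        datetime.strptime(min_date, "%Y-%m-%d")
        datetime.strptime(max_date, "%Y-%m-%d")

    candidates = {
        "max_age": max_age,
        "min_date": min_date,
        "max_date": max_date,
        "domain_whitelist": domain_whitelist,
        "domain_like_whitelist": domain_like_whitelist,
        "url_like_whitelist": url_like_whitelist,
        "unique_url": unique_url,
    }
    # Send only the filters the caller actually set.
    return {"filters": {k: v for k, v in candidates.items() if v is not None}}


body = build_filters(
    min_date="2015-01-01",
    max_date="2026-06-10",
    domain_whitelist=["brokercheck.finra.org"],
    unique_url=True,
)
```

Serializing `body` with `json.dumps` gives the payload for `POST /webarchive/search`.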

Searches can take 9+ minutes. Launch many in parallel, then poll every ~20s. See references/webarchive-api.md for a working parallel-poll Python harness.
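The launch-then-poll pattern described above can be sketched as follows. This is not the harness from references/webarchive-api.md; it is an independent, simplified illustration. `poll_searches` and `http_fetch_status` are hypothetical names, the 30-minute timeout is an assumed default, and only `status == "done"` is treated as terminal because no other status values are documented here.

```python
import json
import time
import urllib.request


def http_fetch_status(search_id, token):
    """GET /webarchive/search/<search_id> (free) and return the parsed JSON."""
    req = urllib.request.Request(
        f"https://api.brightdata.com/webarchive/search/{search_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)


def poll_searches(search_ids, fetch_status, interval_s=20.0, timeout_s=1800.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll every pending search each round until all report status == "done"."""
    pending = set(search_ids)
    results = {}
    deadline = clock() + timeout_s
    while pending:
        for sid in sorted(pending):
            resp = fetch_status(sid)
            if resp.get("status") == "done":
                results[sid] = resp
        pending.difference_update(results)
        if not pending:
            break
        if clock() >= deadline:
            raise TimeoutError(f"searches still pending: {sorted(pending)}")
        sleep(interval_s)  # wait before the next polling round
    return results
```

A call might look like `poll_searches(ids, lambda sid: http_fetch_status(sid, TOKEN))`. Passing `fetch_status` and `sleep` as parameters keeps the loop testable without network access or real waiting.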

dump_cost_usd ≈ files_count / 1000 confirms the ~$0.001/page dump price.

Dataset Marketplace API

curl -s https://api.brightdata.com/datasets/list -H "Authorization: Bearer $TOKEN"
# -> [{"id":"gd_...","name":"...","size":<approx record count>}, ...]

size is approximate record count. Pulling records (snapshot/trigger) is a separate paid step — not covered by the read-only token. See references/datasets-catalog.md for the categorized highlights.
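Because the catalog holds well over a thousand entries, it helps to filter the free `datasets/list` response locally. This is a minimal sketch assuming the response shape shown above (a list of objects with `id`, `name`, and `size`); `find_datasets` is a hypothetical helper, and the sample entries are invented placeholders, not real catalog rows.

```python
def find_datasets(catalog, *keywords):
    """Return entries whose name contains any keyword (case-insensitive), largest first."""
    needles = [k.lower() for k in keywords]
    hits = [
        d for d in catalog
        if any(n in str(d.get("name", "")).lower() for n in needles)
    ]
    return sorted(hits, key=lambda d: d.get("size") or 0, reverse=True)


# Placeholder rows for illustration only; ids, names, and sizes are made up.
sample_catalog = [
    {"id": "gd_example_a", "name": "Example people profiles", "size": 1000},
    {"id": "gd_example_b", "name": "Example company profiles", "size": 5000},
    {"id": "gd_example_c", "name": "Example job reviews", "size": 200},
]

matches = find_datasets(sample_catalog, "profiles")
ids = [d["id"] for d in matches]
```

An empty result for keywords such as "finra" or "adviser" against the real catalog is how the absence of a marketplace dataset can be re-checked at no cost.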

Pricing Model

| Action | Cost | Notes |
|---|---|---|
| `GET /datasets/list` | free | metadata only |
| `POST /webarchive/search` + poll | free | returns count + `dump_cost_usd` |
| `POST /webarchive/dump` | ~$0.001 / page | the paid step; confirm cost first |
| Web Unlocker request | ~$1.5–3 / 1k successes | needs writable token + zone |
| Dataset records | per-record | snapshot/trigger; varies by dataset |

FINRA BrokerCheck & SEC IAPD Coverage

Verified 2026-06-10 via free Web Archive searches. Bright Data IS a viable source for current BrokerCheck/IAPD data — via the Web Archive, not the marketplace.

No FINRA/broker/adviser/IAPD/RIA dataset exists in the marketplace.
Web Archive has a massive recent crawl:
- brokercheck.finra.org — 1,434,501 snapshots / 714,614 distinct URLs (~$715 to dump distinct).
- adviserinfo.sec.gov — 1,635,389 snapshots / 664,043 distinct URLs (~$664 to dump distinct).
- api.brokercheck.finra.org and reports.adviserinfo.sec.gov = 0 (only the HTML profile pages were captured, not the JSON API or PDF reports).
Temporal: zero pre-2024; essentially all 2025 (~1.0–1.2M each) + 2026 (~235k–633k). It's a current cross-section + start of a 2025→2026 panel, NOT a deep historical time series.

Full numbers, year brackets, and verdict in references/finra-sec-coverage.md. For deep disclosure history (pre-2024), use FINRA/SEC bulk downloads or WRDS instead (see the wrds skill, Form ADV).
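The dump costs quoted above follow from the approximate $0.001/page price applied to the distinct-URL counts. The sketch below reproduces that arithmetic using the counts stated in this section; `estimate_dump_cost_usd` is a hypothetical helper, and the `dump_cost_usd` value returned by a free search remains the authoritative figure.

```python
PRICE_PER_PAGE_USD = 0.001  # approximate dump price stated in this document


def estimate_dump_cost_usd(files_count, price_per_page=PRICE_PER_PAGE_USD):
    """Rough dump cost estimate; confirm against dump_cost_usd from a free search."""
    return files_count * price_per_page


# Counts taken from the coverage figures above.
coverage = {
    "brokercheck.finra.org": {"snapshots": 1_434_501, "distinct": 714_614},
    "adviserinfo.sec.gov": {"snapshots": 1_635_389, "distinct": 664_043},
}

for host, counts in coverage.items():
    all_cost = estimate_dump_cost_usd(counts["snapshots"])
    distinct_cost = estimate_dump_cost_usd(counts["distinct"])
    print(f"{host}: all snapshots ~${all_cost:,.0f}, distinct URLs ~${distinct_cost:,.0f}")
```

Setting `unique_url` to true roughly halves the estimated cost for both hosts, since each distinct URL was captured about twice on average.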

Additional Resources

Reference Files

references/webarchive-api.md — full Web Archive API reference, filters, the parallel-poll Python harness, url_like_whitelist subset-dump pattern, cost arithmetic.
references/datasets-catalog.md — categorized highlights of the 1,576-dataset marketplace (company/people/finance/professional), with ids and sizes; how to re-fetch the catalog.
references/finra-sec-coverage.md — verified FINRA BrokerCheck + SEC IAPD coverage: totals, distinct, year-by-year temporal spread, dump costs, and the viability verdict.


