| name | python-sdk-best-practices |
| description | Guide for writing correct Bright Data Python SDK code. Always use this skill when writing, modifying, debugging, or reviewing Python code that uses the brightdata-sdk package, imports from brightdata, or interacts with Bright Data APIs. Use when the user asks to scrape websites, search Google/Bing, access datasets, or automate browsers via Bright Data in Python. |
Bright Data Python SDK - Best Practices for Coding Agents
You are writing code that uses the brightdata-sdk Python package. Follow these rules precisely.
Installation
pip install brightdata-sdk
Critical Rules
- Always use context managers. The client MUST be used with async with (or with for sync). Forgetting this causes RuntimeError: BrightDataClient not initialized.
- Async is the default. The primary client is BrightDataClient (async). Use SyncBrightDataClient only when you cannot use async.
- Never use SyncBrightDataClient inside async functions. It raises RuntimeError. Use BrightDataClient instead.
- Token auto-loads from environment. Set the BRIGHTDATA_API_TOKEN env var or pass the token= param. Do not hardcode tokens.
- All scraper methods are awaitable. Every call on the async client must be awaited.
Authentication
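# Token read from the BRIGHTDATA_API_TOKEN environment variable
async with BrightDataClient() as client:
    ...

# Token passed explicitly (load it from config or a secret store, never a literal)
async with BrightDataClient(token="your_token") as client:
    ...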
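A pre-flight check can make a missing token fail early with a clear message. The sketch below uses only the standard library; `require_token` is a hypothetical helper name, not part of the SDK, and it only inspects the environment variable named in this guide.

```python
import os


def require_token(env_var: str = "BRIGHTDATA_API_TOKEN") -> str:
    """Return the token from the environment or raise a clear error.

    Hypothetical helper: the SDK loads the token itself, this only
    surfaces a missing variable before any client is created.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"{env_var} is not set; export it before creating the client")
    return token
```

Calling `require_token()` at program start keeps the failure close to its cause; the client can then be created with no arguments as shown above.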
Imports
from brightdata import BrightDataClient
from brightdata import SyncBrightDataClient
from brightdata import ScrapeResult, SearchResult, CrawlResult
from brightdata import ScrapeJob
from brightdata import (
BrightDataError,
ValidationError,
AuthenticationError,
APIError,
ZoneError,
NetworkError,
SSLError,
)
from brightdata import ScraperStudioJob, JobStatus
from brightdata.datasets import export
Core Patterns
Pattern 1: Web Scraping (Web Unlocker)
Scrapes any URL through Bright Data's proxy network, bypassing bot detection.
import asyncio
from brightdata import BrightDataClient

async def main():
    async with BrightDataClient() as client:
        # Minimal call
        result = await client.scrape_url("https://example.com")
        print(result.success)
        print(result.data)
        print(result.cost)

        # With explicit options
        result = await client.scrape_url(
            url="https://example.com",
            country="us",
            response_format="raw",
            method="GET",
            timeout=60,
        )

asyncio.run(main())
Async mode (non-blocking, for batch/background):
# Run inside an `async with BrightDataClient() as client:` block.
# Single URL
result = await client.scrape_url(
    url="https://example.com",
    mode="async",
    poll_interval=5,
    poll_timeout=180,
)

# List of URLs
results = await client.scrape_url(
    url=["https://example.com/1", "https://example.com/2"],
    mode="async",
    poll_timeout=180,
)
Pattern 2: Platform-Specific Scrapers
Structured data extraction from major platforms. Pattern: client.scrape.<platform>.<method>(url=...).
async with BrightDataClient() as client:
    # Amazon
    product = await client.scrape.amazon.products(url="https://amazon.com/dp/B0CRMZHDG8")
    reviews = await client.scrape.amazon.reviews(url="https://amazon.com/dp/B0CRMZHDG8")
    sellers = await client.scrape.amazon.sellers(url="https://amazon.com/dp/B0CRMZHDG8")

    # LinkedIn
    profile = await client.scrape.linkedin.profiles(url="https://linkedin.com/in/username")
    company = await client.scrape.linkedin.companies(url="https://linkedin.com/company/name")
    posts = await client.scrape.linkedin.posts(url="https://linkedin.com/posts/...")

    # Instagram
    ig_profile = await client.scrape.instagram.profiles(url="https://instagram.com/username")
    ig_posts = await client.scrape.instagram.posts(url="https://instagram.com/p/...")
    ig_comments = await client.scrape.instagram.comments(url="https://instagram.com/p/...")
    ig_reels = await client.scrape.instagram.reels(url="https://instagram.com/reel/...")

    # Facebook
    fb_posts = await client.scrape.facebook.posts_by_profile(url="https://facebook.com/user", num_of_posts=10)
    fb_group = await client.scrape.facebook.posts_by_group(url="https://facebook.com/groups/...", num_of_posts=10)
    fb_comments = await client.scrape.facebook.comments(url="https://facebook.com/post/...", num_of_comments=20)
    fb_reels = await client.scrape.facebook.reels(url="https://facebook.com/reel/...")

    # YouTube
    yt_profile = await client.scrape.youtube.profiles(url="https://youtube.com/@channel")
    yt_video = await client.scrape.youtube.videos(url="https://youtube.com/watch?v=...")
    yt_comments = await client.scrape.youtube.comments(url="https://youtube.com/watch?v=...")

    # ChatGPT
    response = await client.scrape.chatgpt.prompt(prompt="What is Python?")
    responses = await client.scrape.chatgpt.prompts(prompts=["Q1", "Q2", "Q3"])

    # TikTok and Reddit
    tt_profile = await client.scrape.tiktok.profiles(url="https://tiktok.com/@user")
    reddit_post = await client.scrape.reddit.posts(url="https://reddit.com/r/...")
All scraper methods return ScrapeResult with .success, .data, .cost, .status.
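Because every scraper call reports .success and .cost, a batch of results can be summarized without touching the network. The sketch below is illustrative only: it uses SimpleNamespace stand-ins rather than real ScrapeResult objects, the status strings are made up, and treating a missing cost as 0.0 is an assumption.

```python
from types import SimpleNamespace


def summarize(results):
    """Split results into successes and failures and total the reported cost."""
    ok = [r for r in results if r.success]
    failed = [r for r in results if not r.success]
    # Assumption: cost may be None on failure, so count it as 0.0
    total_cost = sum((r.cost or 0.0) for r in results)
    return ok, failed, total_cost


# Stand-in objects with the same attribute names as ScrapeResult (values are invented)
demo = [
    SimpleNamespace(success=True, data={"title": "A"}, cost=0.001, status="ready"),
    SimpleNamespace(success=False, data=None, cost=None, status="error"),
    SimpleNamespace(success=True, data={"title": "B"}, cost=0.002, status="ready"),
]
ok, failed, total_cost = summarize(demo)
print(len(ok), len(failed), round(total_cost, 4))
```

The same function should work on real results as long as they expose .success and .cost.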
Pattern 3: Search Discovery (keyword-based)
Find content by keyword instead of URL:
async with BrightDataClient() as client:
    results = await client.scrape.amazon.products_search(keyword="wireless headphones")
    profiles = await client.scrape.linkedin.profiles_search(keyword="data engineer", location="San Francisco")
    jobs = await client.scrape.linkedin.jobs_search(keyword="python developer", location="New York")
    companies = await client.scrape.linkedin.companies_search(keyword="AI startup")
    ig_profiles = await client.scrape.instagram.profiles_search(user_name="photography")
    ig_posts = await client.scrape.instagram.posts_search(url="https://instagram.com/user", num_of_posts=20)
    ig_reels = await client.scrape.instagram.reels_search(url="https://instagram.com/user", num_of_posts=10)
    videos = await client.scrape.youtube.videos_search(keyword="python tutorial", num_of_videos=10)
Pattern 4: SERP (Search Engine Results)
async with BrightDataClient() as client:
    # Google
    result = await client.search.google(
        query="python web scraping",
        location="United States",
        language="en",
        device="desktop",
        num_results=10,
    )
    for item in result.data:
        print(item["title"], item["link"])

    # Bing and Yandex
    result = await client.search.bing(query="python tutorial", num_results=10)
    result = await client.search.yandex(query="python", num_results=10)
SERP async mode:
result = await client.search.google(
query="python",
mode="async",
poll_interval=2,
poll_timeout=30,
)
SERP returns SearchResult with .data (list of dicts), .query, .search_engine.
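Since .data is a list of plain dicts, it can be post-processed with ordinary Python. The sketch below pulls (title, link) pairs from such a list and tolerates missing keys; the sample items are invented, and whether real SERP items can lack a title or link is an assumption.

```python
def extract_links(items):
    """Return (title, link) pairs, skipping entries without a link."""
    pairs = []
    for item in items:
        link = item.get("link")
        if not link:
            continue  # nothing to follow, so drop the entry
        pairs.append((item.get("title", ""), link))
    return pairs


# Invented sample shaped like the items printed in the example above
sample = [
    {"title": "Docs", "link": "https://example.com/docs"},
    {"title": "No link here"},
    {"link": "https://example.com/untitled"},
]
print(extract_links(sample))
```

With a real search, the call would be `extract_links(result.data)`.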
Pattern 5: Datasets API
Access 175+ pre-collected, structured datasets.
async with BrightDataClient() as client:
    # Filter a dataset, then download the resulting snapshot
    snapshot_id = await client.datasets.imdb_movies(
        filter={"name": "title", "operator": "includes", "value": "black"},
        records_limit=5,
    )
    data = await client.datasets.imdb_movies.download(snapshot_id)
    print(f"Got {len(data)} records")

    # Take a sample without a filter
    snapshot_id = await client.datasets.amazon_products.sample(records_limit=10)
    data = await client.datasets.amazon_products.download(snapshot_id)

    # Inspect the dataset's fields
    metadata = await client.datasets.imdb_movies.get_metadata()
    for name, field in metadata.fields.items():
        print(f"{name}: {field.type}")
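The filter argument in the example above is a dict with name, operator, and value keys. A small builder can keep that shape consistent across calls. `make_filter` is a hypothetical helper, not an SDK function, and "includes" is the only operator this guide shows, so no other operator names are assumed here.

```python
def make_filter(name: str, value, operator: str = "includes") -> dict:
    """Build a single-condition filter dict in the shape used in the example above."""
    if not name:
        raise ValueError("filter needs a field name")
    return {"name": name, "operator": operator, "value": value}


title_filter = make_filter("title", "black")
print(title_filter)
```

The result can be passed as `filter=title_filter` to a dataset call such as client.datasets.imdb_movies(...).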
Export to file:
from brightdata.datasets import export
export(data, "results.json")
export(data, "results.csv")
export(data, "results.jsonl")
Available datasets include: amazon_products, amazon_reviews, linkedin_profiles, linkedin_companies, linkedin_jobs, airbnb_properties, imdb_movies, google_maps_reviews, yelp_businesses, glassdoor_companies, zillow_properties, instagram_profiles, tiktok_profiles, facebook_pages_posts, reddit_posts, goodreads_books, nba_players_stats, and 150+ more.
Pattern 6: Scraper Studio (Custom Scrapers)
Run custom scrapers built in Bright Data's Scraper Studio.
async with BrightDataClient() as client:
    # Run the collector and wait for the data in one call
    data = await client.scraper_studio.run(
        collector="c_abc123",
        input={"url": "https://example.com/page"},
        timeout=180,
        poll_interval=10,
    )

    # Or trigger the job and control polling yourself
    job = await client.scraper_studio.trigger(
        collector="c_abc123",
        input={"url": "https://example.com/page"},
    )
    print(job.response_id)
    status = await job.status()
    data = await job.wait_and_fetch(timeout=120, poll_interval=10)
Pattern 7: Browser API (CDP)
Connect to Bright Data cloud browsers via Chrome DevTools Protocol.
import asyncio
from brightdata import BrightDataClient
from playwright.async_api import async_playwright

client = BrightDataClient(
    browser_username="brd-customer-hl_xxx-zone-scraping_browser1",
    browser_password="your_password",
)
# Connection URL for a cloud browser
url = client.browser.get_connect_url(country="us")

async def main():
    async with async_playwright() as pw:
        browser = await pw.chromium.connect_over_cdp(url)
        page = await browser.new_page()
        await page.goto("https://example.com")
        content = await page.content()
        await browser.close()

asyncio.run(main())
Pattern 8: Manual Trigger/Poll/Fetch
For fine-grained control over long-running scrapes:
async with BrightDataClient() as client:
    # Trigger the scrape; this returns a job handle
    job = await client.scrape.amazon.products_trigger(url="https://amazon.com/dp/B123")
    print(f"Snapshot ID: {job.snapshot_id}")

    # Check status, wait for completion, then fetch the data
    status = await job.status()
    await job.wait(timeout=180, poll_interval=10, verbose=True)
    data = await job.fetch()

    # Alternatively, wait and get a result object in one call
    result = await job.to_result(timeout=180)
    print(result.data)
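The trigger/poll/fetch cycle above can be pictured as a loop with a deadline. The sketch below runs against a fake job so that it needs no network. `FakeJob`, `wait_until_ready`, and the "running"/"ready" status strings are all hypothetical and do not describe how the SDK implements job.wait().

```python
import asyncio
import time


class FakeJob:
    """Hypothetical stand-in for a triggered job; ready after N status checks."""

    def __init__(self, ready_after: int):
        self._checks = 0
        self._ready_after = ready_after

    async def status(self) -> str:
        self._checks += 1
        return "ready" if self._checks >= self._ready_after else "running"

    async def fetch(self):
        return [{"id": 1}]


async def wait_until_ready(job, timeout: float, poll_interval: float):
    """Poll job.status() until it reports 'ready' or the timeout would be exceeded."""
    deadline = time.monotonic() + timeout
    while True:
        if await job.status() == "ready":
            return await job.fetch()
        # Give up if another sleep would overrun the deadline
        if time.monotonic() + poll_interval > deadline:
            raise TimeoutError("job did not become ready in time")
        await asyncio.sleep(poll_interval)


data = asyncio.run(wait_until_ready(FakeJob(ready_after=3), timeout=1.0, poll_interval=0.01))
print(data)
```

A longer poll_interval means fewer status calls but slower detection of completion, which is the trade-off behind the timeout and poll_interval parameters used in the pattern above.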
Pattern 9: Concurrent Batch Operations
import asyncio
from brightdata import BrightDataClient

async def main():
    async with BrightDataClient() as client:
        # Scrape several product pages concurrently
        urls = [
            "https://amazon.com/dp/B001",
            "https://amazon.com/dp/B002",
            "https://amazon.com/dp/B003",
        ]
        tasks = [client.scrape.amazon.products(url=u) for u in urls]
        results = await asyncio.gather(*tasks)
        for r in results:
            print(f"{r.url}: success={r.success}, cost=${r.cost:.4f}")

        # Run several searches concurrently
        queries = ["python", "javascript", "rust"]
        search_tasks = [client.search.google(query=q) for q in queries]
        search_results = await asyncio.gather(*search_tasks)

asyncio.run(main())
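For larger batches it may help to cap how many requests are in flight at once. The sketch below wraps asyncio.gather with an asyncio.Semaphore and uses a fake coroutine in place of a scraper call. `gather_bounded` is a hypothetical helper, and whether the SDK's own rate limiting makes this unnecessary is not something this guide states.

```python
import asyncio


async def gather_bounded(factories, limit: int):
    """Run zero-argument coroutine factories with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(factory):
        async with semaphore:
            return await factory()

    # return_exceptions=True keeps one failure from discarding the other results
    return await asyncio.gather(*(run_one(f) for f in factories), return_exceptions=True)


async def demo():
    in_flight = 0
    peak = 0

    async def fake_scrape(n):
        # Stand-in for a scraper call; tracks how many run concurrently
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)
        in_flight -= 1
        if n == 2:
            raise ValueError("simulated failure")
        return n * 10

    results = await gather_bounded([lambda n=n: fake_scrape(n) for n in range(5)], limit=2)
    return results, peak


results, peak = asyncio.run(demo())
print(results, peak)
```

With the real client, each factory would be something like `lambda u=u: client.scrape.amazon.products(url=u)`; exceptions come back as items in the result list and need an isinstance check before use.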
Pattern 10: Sync Client
For scripts, notebooks, or non-async codebases:
from brightdata import SyncBrightDataClient

with SyncBrightDataClient() as client:
    result = client.scrape_url("https://example.com")
    print(result.data)
    result = client.scrape.amazon.products(url="https://amazon.com/dp/B123")
    result = client.search.google(query="python")
    snapshot_id = client.datasets.imdb_movies(
        filter={"name": "title", "operator": "includes", "value": "black"},
        records_limit=5,
    )
    data = client.datasets.imdb_movies.download(snapshot_id)
WARNING: Never use SyncBrightDataClient inside an async def function. It will raise a RuntimeError.
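Library code that may be called from either sync or async callers can check for a running event loop before picking a client. The sketch below uses only asyncio; `in_async_context` is a hypothetical helper, and whether the SDK performs this exact check internally is an assumption.

```python
import asyncio


def in_async_context() -> bool:
    """Return True when called from code running inside an event loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # get_running_loop() raises RuntimeError when no loop is running
        return False
    return True


async def check_inside():
    return in_async_context()


print(in_async_context())           # called from plain synchronous code
print(asyncio.run(check_inside()))  # called from inside a coroutine
```

When the helper returns True, use BrightDataClient with async with; otherwise SyncBrightDataClient is the candidate.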
Result Objects Reference
All results inherit from BaseResult:
result.success
result.cost
result.error
result.elapsed_ms()
result.to_dict()
result.to_json(indent=2)
result.save_to_file("out.json")
ScrapeResult additional fields:
result.url
result.status
result.data
result.snapshot_id
result.platform
result.row_count
SearchResult additional fields:
result.query
result.data
result.search_engine
result.total_found
Error Handling
from brightdata import (
BrightDataClient,
BrightDataError,
ValidationError,
AuthenticationError,
APIError,
NetworkError,
)
async with BrightDataClient() as client:
    try:
        result = await client.scrape_url("https://example.com")
    except AuthenticationError:
        print("Invalid API token")
    except APIError as e:
        print(f"API error {e.status_code}: {e.message}")
        print(f"Response: {e.response_text}")
    except NetworkError:
        print("Network connectivity issue")
    except ValidationError:
        print("Invalid input parameters")
    except BrightDataError as e:
        print(f"Bright Data error: {e.message}")
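Transient failures are often worth retrying, while authentication and validation errors are not. The sketch below shows a generic retry with exponential backoff for any async callable. `retry_async` is a hypothetical helper, the demo uses the built-in ConnectionError as a stand-in, and treating NetworkError as safe to retry is an assumption rather than SDK guidance.

```python
import asyncio


async def retry_async(factory, retry_on, attempts: int = 3, base_delay: float = 0.5):
    """Await factory() up to `attempts` times, backing off exponentially between tries."""
    for attempt in range(attempts):
        try:
            return await factory()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts, surface the last error
            await asyncio.sleep(base_delay * (2 ** attempt))


async def demo():
    calls = 0

    async def flaky():
        # Stand-in for a scraper call that fails twice, then succeeds
        nonlocal calls
        calls += 1
        if calls < 3:
            raise ConnectionError("simulated network failure")
        return "ok"

    value = await retry_async(flaky, retry_on=(ConnectionError,), base_delay=0.01)
    return value, calls


value, calls = asyncio.run(demo())
print(value, calls)
```

With the SDK, the call might look like `await retry_async(lambda: client.scrape_url(url), retry_on=(NetworkError,))`; exceptions outside retry_on propagate immediately.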
Client Configuration
client = BrightDataClient(
    token="...",
    timeout=30,
    web_unlocker_zone="sdk_unlocker",
    serp_zone="sdk_serp",
    auto_create_zones=True,
    validate_token=False,
    rate_limit=10.0,
    rate_period=1.0,
)
Zone auto-creation: By default, the SDK creates sdk_unlocker and sdk_serp zones on first use. Set auto_create_zones=False to disable.
Zone Management
async with BrightDataClient() as client:
    zones = await client.list_zones()
    for zone in zones:
        print(f"{zone['name']}: {zone.get('type', 'unknown')}")
    await client.delete_zone("test_zone")
    is_valid = await client.test_connection()
Common Mistakes to Avoid
- Forgetting the context manager:
    # Wrong: the client is never initialized
    client = BrightDataClient()
    result = await client.scrape_url("https://example.com")
    # Correct
    async with BrightDataClient() as client:
        result = await client.scrape_url("https://example.com")
- Using sync client in async code:
    # Wrong: raises RuntimeError
    async def main():
        with SyncBrightDataClient() as client:
            result = client.scrape_url("...")
    # Correct
    async def main():
        async with BrightDataClient() as client:
            result = await client.scrape_url("...")
- Forgetting await:
    # Wrong: this is an un-awaited call, not a result
    result = client.scrape_url("https://example.com")
    # Correct
    result = await client.scrape_url("https://example.com")
- Not checking result.success:
    # Correct: check success before using the data
    result = await client.scrape_url("https://example.com")
    if result.success:
        process(result.data)
    else:
        print(f"Failed: {result.error}")
- Hardcoding API tokens:
    # Wrong: secret embedded in source code
    client = BrightDataClient(token="abc123secret")
    # Correct: token is read from BRIGHTDATA_API_TOKEN
    client = BrightDataClient()
Environment Variables
| Variable | Purpose | Default |
|---|---|---|
| BRIGHTDATA_API_TOKEN | API authentication token | Required |
| WEB_UNLOCKER_ZONE | Web Unlocker zone name | sdk_unlocker |
| SERP_ZONE | SERP zone name | sdk_serp |
| BRIGHTDATA_BROWSERAPI_USERNAME | Browser API username | None |
| BRIGHTDATA_BROWSERAPI_PASSWORD | Browser API password | None |
Quick Decision Guide
| Task | Method |
|---|---|
| Scrape any URL (HTML) | client.scrape_url(url) |
| Scrape Amazon/LinkedIn/etc. (structured) | client.scrape.<platform>.<method>(url=...) |
| Search Google/Bing/Yandex | client.search.google(query=...) |
| Find products/profiles by keyword | client.scrape.<platform>.<type>_search(keyword=...) |
| Access pre-collected datasets | client.datasets.<name>(filter=..., records_limit=...) |
| Run custom Scraper Studio scraper | client.scraper_studio.run(collector=..., input=...) |
| Automate browser (Playwright/Puppeteer) | client.browser.get_connect_url() |
| Long-running scrape with manual control | client.scrape.<platform>.<method>_trigger(url=...) then job.wait() + job.fetch() |
For the full API surface and advanced patterns, read references/api-reference.md.