| name | opencli-airbnb |
| description | Execute and troubleshoot opencli airbnb commands for Airbnb Experiences — search, detail. Use when the user mentions Airbnb, Airbnb Experiences, airbnb.com, or provides an Airbnb experience URL (pattern `/experiences/{id}`) or numeric experience ID; also use when opencli-router dispatches an airbnb task or a tours / compare workflow targets the airbnb platform. |
opencli-airbnb
- Platform: airbnb
- Owner: klook-cli core team (ryan.huang@klook.com); scrape logic is maintained here, while listing-page discovery is owned by the BD team (see docs/colleague-handoff.md)
- Strategy: BROWSER_BRIDGE (Airbnb is SSR + heavy React hydration with PerimeterX/Akamai bot protection; no public API)
- Domain: airbnb.com
- Source: src/clis/airbnb/ (search.ts, detail.ts, pricing.ts; all live)
Status: v0.3 — currency forced via URL, langs/meetup/duration captured, BD field map applied
Validated 2026-05-07 against two Japan experiences:
- experiences/4613909 (Drive Tokyo's car culture in a Nissan Skyline; host "Louis", no business name)
- experiences/232900 (Explore Kyoto in a Kimono; business "Kiwami Fujinoka")
Field-by-field state aligned with the BD comparison table (2026-05-07):
| # | Planning-CSV field | Status | Where it lands |
|---|---|---|---|
| 1 | Merchant name | ✅ | activities.supplier (business name preferred over host first name) |
| 2 | Duration | ✅ — listing-only | KlookActivity.duration_minutes from search.ts → threaded via IngestOptions.durationMinutes → activities.duration_minutes |
| 3 | POI / itinerary | N/A on airbnb | — |
| 4 | Meet up / pick up type | ✅ 'fixed' | packages.meetup_type |
| 5 | AID location / meet up points | ✅ | packages.meetup_points_json (parsed from "Where we'll meet" section) |
| 6 | Availability | ✅ | skus.available (booking-modal time-slots, 'Sold out' when all slots gone). Activity-level rollup bookable_days_30d computed in normalize.ts via computeAvailability(skus, windowDays:30) — see "Modal scroll for full 30-day window" below. |
| 7 | Confirmation type | ⚠ left null | BD said "no standardized column on Airbnb"; per Ryan, we don't infer 'instant' inside the adapter. Leave null; downstream policy can decide. |
| 8 | Cancellation type | ✅ | activities.cancellation_policy (explicit <h2>Cancellation policy</h2>, sibling-walker scoped) |
| 9 | Language | ✅ | packages.available_languages — anchored on canonical airbnb phrasing "This experience is hosted in X and Y". Other OTA-style fallbacks (Offered in, Hosted in) kept after as A/B-test guards. |
| 10 | Group type | N/A on airbnb | packages.group_size defaults to 'big'; raw_extras.group_size_raw captures any "Up to N guests" string for audit |
| 11 | Meals option | N/A on airbnb | — |
| 12 | Tickets included | 🟡 not yet extracted | Free-text in title / description per BD; backlog item |
| 13 | Price & unit type | ✅ USD per-person | skus.price_local + skus.currency (forced USD via URL param — see below) |
Other captured fields (not in BD's list):
- activities.departure_city ← detail's city chip ("Kyoto" / "Shibuya")
- activities.page_text: full body innerText, capped at 80KB, kept for downstream review per the "capture everything" (全抓) principle
- raw_extras.sections: 15-20 sections per activity with full {title, content}, 2KB cap per section
Currency: forced via URL ?currency=USD (cookie still pinnable)
As of 2026-05-07, currency is no longer manual. The adapter injects
?currency=USD and ?locale=en-US into both detail.ts and pricing.ts
URLs (when caller hasn't explicitly pinned them). Airbnb honors the URL
param and overrides the Browser Bridge cookie's default for that single
navigation.
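The injection rule above can be sketched as follows. This is a minimal illustration assuming the adapter works on absolute URLs with the WHATWG `URL` API; `withDefaultParams` is a hypothetical name, not the actual helper in detail.ts / pricing.ts (locale defaults actually go through applyDefaultLanguage in src/tours/url-locale.ts).

```typescript
// Hypothetical helper: inject Airbnb default query params unless the caller pinned them.
const DEFAULT_PARAMS: Record<string, string> = {
  currency: "USD",
  locale: "en-US",
};

function withDefaultParams(rawUrl: string): string {
  const url = new URL(rawUrl);
  for (const [key, value] of Object.entries(DEFAULT_PARAMS)) {
    // An explicit ?currency=XYZ or ?locale=... from the caller always wins.
    if (!url.searchParams.has(key)) {
      url.searchParams.set(key, value);
    }
  }
  return url.toString();
}

// Usage:
const example = withDefaultParams("https://www.airbnb.com/experiences/232900");
// example === "https://www.airbnb.com/experiences/232900?currency=USD&locale=en-US"
```

Checking `searchParams.has` before setting is what lets a caller-pinned `?currency=XYZ` pass through unchanged.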
Verified flow:
- A cold Browser Bridge on a Singapore-routed session was returning SGD prices (e.g. for Kyoto Kimono).
- Adding ?currency=USD to the navigation flips the modal to USD ($37 for the same listing, consistent with the day's SGD/USD FX rate plus Airbnb's small conversion margin).
The pricing.ts row writer falls back from a parsed symbol-only currency
($, ¥) to the URL hint, so we never store ambiguous symbols in
skus.currency. ISO codes parsed from price strings still win
if Airbnb does happen to print them.
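A sketch of that fallback order, assuming the row writer has both the raw price string and the navigated URL in hand. `resolveCurrency` and the three-uppercase-letters heuristic for spotting ISO codes are assumptions for illustration, not the pricing.ts implementation.

```typescript
// Hypothetical sketch of the pricing.ts currency rule:
// an ISO code printed in the price string wins; a bare symbol ($, ¥, €)
// is ambiguous, so fall back to the ?currency= hint from the navigated URL.
function currencyHintFromUrl(rawUrl: string): string | null {
  return new URL(rawUrl).searchParams.get("currency");
}

function resolveCurrency(priceRaw: string, navigatedUrl: string): string | null {
  // Assumed heuristic: three uppercase letters as a standalone token,
  // e.g. "TWD 2,564" or "2,564 TWD".
  const iso = priceRaw.match(/\b([A-Z]{3})\b/);
  if (iso) return iso[1];
  // Symbol-only string: never store "$" or "¥"; use the URL hint instead.
  return currencyHintFromUrl(navigatedUrl);
}
```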
Caller can still opt out: pass an explicit ?currency=XYZ in the URL
and the adapter passes it through unchanged. Useful for cross-currency
audits without touching the cookie store.
Cookie pinning is still supported for callers that want a different
default than USD without rewriting URLs — same playbook as before
(footer dropdown → cookie sync → bridge). But for the daily tours
routine, the URL-param default is the deterministic path; cookie-only
flows had been silently flipping currency between runs.
Identifiers
- Airbnb experience IDs are numeric (e.g. 1234567).
- Extracted from URLs matching /experiences/{id} via parseExperienceId in src/clis/airbnb/detail.ts. Accepts either a full URL or the bare numeric ID.
- Canonical URL: https://www.airbnb.com/experiences/{id}. Locale-prefixed variants like /zh-tw/experiences/{id} exist but are not the canonical form; the adapter does not currently rewrite them.
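A hypothetical re-implementation of the ID rule, for reference only (the real parser is parseExperienceId in src/clis/airbnb/detail.ts and may differ). Because the regex matches anywhere in the path, locale-prefixed URLs still yield the ID; `canonicalUrl` is an illustrative helper, since the adapter itself does not rewrite locale variants.

```typescript
// Hypothetical sketch, not the detail.ts implementation.
function parseExperienceIdSketch(input: string): string | null {
  const trimmed = input.trim();
  // Bare numeric ID, e.g. "232900".
  if (/^\d+$/.test(trimmed)) return trimmed;
  // Full or locale-prefixed URL, e.g. ".../zh-tw/experiences/232900?currency=USD".
  const match = trimmed.match(/\/experiences\/(\d+)/);
  return match ? match[1] : null;
}

// Illustrative only: builds the canonical form from an extracted ID.
function canonicalUrl(id: string): string {
  return `https://www.airbnb.com/experiences/${id}`;
}
```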
Commands
search
opencli airbnb search-activities "<query>" --limit <N> -f json
Browser Bridge; expect ~10–15s including hydration wait + autoscroll. Search URL pattern: https://www.airbnb.com/s/<query>/experiences. Returns { id, title, price, rating, review_count, url }[]. Price string contains a currency symbol per locale (e.g. $35, €42).
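A sketch of the search URL and the result shape above, plus the /experiences/<id> card filter described under Known failure modes. Field names come from the documented output; the field types, the function names, and percent-encoding the query as a single path segment are assumptions.

```typescript
// Documented result fields; the types here are assumed.
interface SearchResult {
  id: string;
  title: string;
  price: string; // raw string with a locale currency symbol, e.g. "$35" or "€42"
  rating: number | null;
  review_count: number | null;
  url: string;
}

// Assumption: the query is percent-encoded as one path segment.
function buildSearchUrl(query: string): string {
  return `https://www.airbnb.com/s/${encodeURIComponent(query)}/experiences`;
}

// Stays cards can leak into the grid; keep only /experiences/<id> links.
function keepExperienceCards(cards: SearchResult[]): SearchResult[] {
  return cards.filter((card) => /\/experiences\/\d+/.test(card.url));
}
```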
detail
opencli airbnb get-activity <id-or-url> -f json
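- Title from h1; description from meta[name="description"].
- Host as supplier: Airbnb experiences don't have a separate operator field, so the supplier column comes from the host card: the business name when one is listed, otherwise the host (Hosted by <Name>). See the supplier fallback chain under Quirks.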
- Single synthetic package: experiences usually offer one product (private/group is sometimes a variant axis). The detail scraper emits a single package built from the booking widget price + experience title; refine to fan into multiple packages once the variant model is observed.
- Cancellation policy is rendered as an explicit h2 on Airbnb (cleaner than KKday/Trip), so the section walker and the body-text fallback should both succeed.
pricing
opencli airbnb get-pricing-matrix <id-or-url> --days 7 -f json
SKU model decision (2026-04-27): one SKU per (experience, date), one synthesized package per experience, prices in per-person Airbnb-native units, time-of-day NOT modeled. Each day's output price (originally daily_min_price; renamed to price in the PricingRowRaw contract fix under Known failure modes) is the lowest visible per-person price across that day's time slots. Documented in detail at the top of src/clis/airbnb/pricing.ts.
v0.1 read inline calendar prices (the per-day "$35 / person" labels on each date cell); experiences that didn't surface inline daily prices returned Unknown for those dates, and clicking each cell to read time-slot prices was judged too click-heavy and too likely to trip bot detection. That approach was superseded on 2026-04-27 by the booking-modal flow (see "Pricing flow rewritten from calendar-nav to booking-modal" under Known failure modes), which reads per-slot prices from the "Select a time" modal instead of individual calendar cells.
The original scaffold was written while Browser Bridge was offline, so its selectors were unverified at the time. The booking-modal flow has since been verified end-to-end (121104, 232900, 4613909). If a run on a known experience (e.g. 232900 Kyoto Kimono) returns rows[] with empty price values, check the time-slot reader selectors first.
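The per-day rollup in the SKU model above can be sketched like this, assuming the modal reader has already produced one record per time slot with an ISO date. `TimeSlot`, `DailyRow`, and `rollUpSlots` are hypothetical names; the actual row contract is PricingRowRaw in src/tours/types.ts. Whether sold-out slots count as "visible" prices is an assumption, flagged in a comment.

```typescript
// Hypothetical shapes, not the PricingRowRaw contract.
interface TimeSlot {
  date: string; // ISO "YYYY-MM-DD", after year inference
  pricePerPerson: number | null; // null when the slot card shows no price
  soldOut: boolean;
}

interface DailyRow {
  date: string;
  price: number | null; // lowest visible per-person price that day
  available: boolean; // false when every slot that day is sold out
}

function rollUpSlots(slots: TimeSlot[]): DailyRow[] {
  // Group slots by date: one SKU per (experience, date).
  const byDate = new Map<string, TimeSlot[]>();
  for (const slot of slots) {
    const list = byDate.get(slot.date) ?? [];
    list.push(slot);
    byDate.set(slot.date, list);
  }
  const rows: DailyRow[] = [];
  byDate.forEach((daySlots, date) => {
    // Assumption: sold-out slots still count as visible prices.
    const prices = daySlots
      .map((s) => s.pricePerPerson)
      .filter((p): p is number => p !== null);
    rows.push({
      date,
      price: prices.length > 0 ? Math.min(...prices) : null,
      available: daySlots.some((s) => !s.soldOut),
    });
  });
  return rows.sort((a, b) => a.date.localeCompare(b.date));
}
```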
trending
Not supported on Airbnb. Do not offer this command.
Quirks
- PerimeterX / Akamai bot protection: this is the #1 risk. First scrape attempts may return a skeletal challenge page with no real content. If search returns 0 results on a known-valid keyword, refresh Browser Bridge cookies via opencli doctor, or open the URL in a real Chrome window with the browse skill to warm the session before retrying.
- Heavy lazy hydration: even after the initial HTML loads, Airbnb hydrates the search grid + booking widget over several seconds. The scraper waits 4–5s and autoscrolls; if you still get empty results, increase the wait.
- Currency is forced via URL param: ?currency=USD is injected by detail.ts and pricing.ts (alongside ?locale=en-US) when the caller hasn't pinned one. It overrides the Browser Bridge cookie's default for that single navigation. Symbol-only fallbacks in the pricing.ts row writer use the URL hint, so we never store ambiguous $ / ¥ in skus.currency. See the "Currency" section above.
- Duration is listing-only: airbnb does NOT surface duration anywhere on the experience detail page (verified 2026-05-07 by checking sections, JSON-LD, and page text). It only appears on the listing card as <category> · <N> hour(s). search.ts extracts it via card.innerText (textContent inlines everything and breaks the chip pattern); the smoke-test / listing pipeline then threads the value via IngestOptions.durationMinutes → activities.duration_minutes. Detail-only ingest paths (e.g. ingest-from-snapshot without a search hit) leave duration null; that's correct, not a bug.
- Languages anchor on canonical airbnb phrasing: BD-confirmed (2026-05-07) the page says "This experience is hosted in English and Japanese", NOT "Offered in" / "Hosts speak" / "Available in". The regex tries the canonical pattern first, then falls back to OTA-style phrasings as A/B-test guards. Output normalizes "X and Y" → "X, Y" so normalize.ts's split-on-comma yields the expected array.
- Confirmation type left null per BD: the comparison table (2026-05-07) says airbnb has "no standardized column" for confirmation. Most experiences are de-facto instant-book by platform design, but the page never states it explicitly. Adapter does NOT infer 'instant' to avoid populating the DB with a value that wasn't on the page; downstream policy (review or normalize layer) can decide whether to default-fill.
- URL → ID regex must include /experiences/(\d+): airbnb URLs don't fit the Klook /activity/, Trip /detail/, KKday /product/, or GYG -t{N} patterns. src/tours/{commands,scan,ingest}.ts all carry the same regex chain; they MUST include /experiences/(\d+) or the smoke-test / listing pipeline / ingest-from-search will silently skip every airbnb hit with id-not-extractable. Duplicating the lookup across 3 sites is shotgun-surgery prone; a future refactor should put parseActivityId on the adapter interface.
- City + category line is split across newlines with a stray comma: the innerText format is "<N> reviews\n<city>\n,\n · <category>". Don't expect <city> · <category> to be on one line; the regex anchors on <N> reviews\n and tolerates the comma.
- Some experiences omit the city/category line: multi-location tours and older listings don't render this header chip. Empty city/category is expected, not a failure.
- Supplier is a fallback chain: prefer the business name from "Owner of " / "Founder of "; use the host's first name only if no business is listed.
- Reviews format varies: Airbnb sometimes shows "★ 4.92 (127)" inline, sometimes "4.92 out of 5" on the booking card. The scraper tries both.
- No itinerary on most experiences: "What you'll do" is a free-text section, not a stepped timeline. The itinerary[] array is best-effort and often empty.
- Section walker uses sibling-walk, not .closest('section'): Airbnb groups several h2s under one <section> element ("Cancellation policy" + "Things to know" + "Guest requirements" + "Activity level"). The shared getSectionWalkerJs() helper walks siblings between a heading and the next heading, falling back to closest() only when the enclosing section has ≤3 headings. This gives clean, scoped cancellation policy text instead of the whole "Things to know" block.
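Several of the quirks above are plain text-matching rules. The sketch below shows one way to express them; every function name and regex is an assumption (the real logic lives in search.ts and detail.ts), and the sample strings in the tests are shaped after the formats quoted above.

```typescript
// Hypothetical text parsers mirroring the Quirks list; not the adapter code.

// Listing card innerText, e.g. "Cooking classes · 3 hours".
function parseDurationMinutes(cardText: string): number | null {
  const m = cardText.match(/·\s*(\d+(?:\.\d+)?)\s*hours?\b/i);
  return m ? Math.round(parseFloat(m[1]) * 60) : null;
}

// Canonical phrasing first, OTA-style phrasings after as A/B-test guards.
const LANGUAGE_PATTERNS: RegExp[] = [
  /This experience is hosted in ([^.\n]+)/i,
  /Offered in ([^.\n]+)/i,
  /Hosted in ([^.\n]+)/i,
];

function parseLanguages(pageText: string): string | null {
  for (const re of LANGUAGE_PATTERNS) {
    const m = pageText.match(re);
    // Normalize "X and Y" / "X, Y, and Z" to comma-separated for normalize.ts.
    if (m) return m[1].trim().replace(/,?\s+and\s+/g, ", ");
  }
  return null;
}

// Header chip: "<N> reviews\n<city>\n,\n · <category>" (comma line optional).
function parseCityCategory(text: string): { city: string; category: string } | null {
  const m = text.match(/\d[\d,]* reviews\n([^\n,]+)\n(?:,\n)?\s*·\s*([^\n]+)/);
  return m ? { city: m[1].trim(), category: m[2].trim() } : null;
}

// Business name from "Owner of" / "Founder of" wins; host name is the fallback.
function pickSupplier(pageText: string): string | null {
  const business = pageText.match(/(?:Owner|Founder) of ([^\n]+)/);
  if (business) return business[1].trim();
  const host = pageText.match(/Hosted by ([^\n]+)/);
  return host ? host[1].trim() : null;
}

// Rating appears either as "★ 4.92 (127)" or as "4.92 out of 5".
function parseRating(text: string): number | null {
  const inline = text.match(/★\s*(\d(?:\.\d+)?)\s*\(\d[\d,]*\)/);
  if (inline) return parseFloat(inline[1]);
  const outOf = text.match(/(\d(?:\.\d+)?) out of 5/);
  return outOf ? parseFloat(outOf[1]) : null;
}
```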
Fallback playbook
When opencli airbnb get-activity <id> fails or returns empty:
- Retry once. Bot-challenge pages often clear on the second hit if cookies were warmed.
- Run opencli doctor to refresh Browser Bridge cookies.
- Open the URL in browse to manually warm the session and verify the page actually renders for your geo.
- Replay a snapshot from data/snapshots/airbnb-<id>-*.json (once any have been captured) with tours ingest-from-snapshot airbnb <file>.
- Capture manually via the browse skill → save JSON matching the KlookDetail shape → ingest.
Known failure modes
- Empty title: the page hit a bot challenge; the body text contains "Press & Hold" or a captcha widget instead of real content. Response: warm cookies via opencli doctor, then retry. Do not proceed with downstream ingest.
- Title is correct but packages[] is empty: the booking widget didn't hydrate before the scrape. Response: increase page.wait in src/clis/airbnb/detail.ts or scroll past the sidebar to force a render.
- Region-locked experience: some experiences only show a price after geo-detection. Response: verify the experience is bookable from the Browser Bridge's geo; otherwise flag it via tours set-activity-review-status <id> flagged --note "region-locked".
- Search returns stays cards mixed with experiences: the URL's /experiences filter sometimes leaks. The scraper filters on the /experiences/<id> URL pattern, so non-experience cards are dropped silently. Symptom: limit=20 returns fewer than 20 results on a busy keyword.
- Field name alignment with the shared PricingRowRaw contract (fixed 2026-04-27): row fields were renamed from daily_min_price / daily_min_price_raw to price / price_raw to match src/tours/types.ts. src/tours/normalize.ts is platform-agnostic and reads only the contract fields; don't re-introduce platform-specific names.
- Pricing flow rewritten from calendar-nav to booking-modal (fixed 2026-04-27): the original scaffold assumed the airbnb stays pattern ("Check availability" → calendar), which experience pages don't follow. The real flow on experience pages: a red "Show dates" CTA ([data-testid="ExperiencesBookItController-sidebar-button"]) opens a "Select a time" modal that already contains all the data we need: date headings, time-slot cards, per-slot prices, and availability ("1 spot left" / "Sold out"). The deeper calendar grid is a secondary UI we ignore. The old OPEN_PICKER_JS / buildCalNavigateJs / READ_CAL_CELLS_JS were replaced with OPEN_BOOKING_MODAL_JS + READ_TIME_SLOTS_JS. The flow is locale-blind because it keys on the data-testid attribute (developer-set, not localized). Verified end-to-end on 121104 under a zh-TW session: 3 days × $2,564 TWD captured, full chain through the Supabase mirror.
- Pre-existing role="dialog" elements on the page (known quirk, handled 2026-04-27): experience pages render small popovers (the "免費取消" ("Free cancellation") cancellation-policy tooltip, the share modal) tagged [role="dialog"] BEFORE the booking modal opens. A naive querySelector('[role="dialog"]') after the click grabs the first one (the cancellation tooltip, with ~1 line of text). READ_TIME_SLOTS_JS defends by picking the most content-rich dialog (max innerText length), and OPEN_BOOKING_MODAL_JS verifies that a NEW dialog appeared after the click. Don't simplify back to a single querySelector.
- Modal scroll for full 30-day window (fixed 2026-05-07): airbnb's "Select a time" modal renders only the first ~3-5 days on initial paint and lazy-hydrates the rest as the user scrolls. Without scrolling, --days 30 collapsed to whatever was visible (typically 5 days), and bookable_days_30d was silently biased low. SCROLL_MODAL_TO_BOTTOM_JS runs between OPEN_BOOKING_MODAL_JS and READ_TIME_SLOTS_JS: it walks descendants to find the actual scroll container (the modal shell often has overflow:hidden while the inner list carries the scroll), then loops scrollTop = scrollHeight with 700ms waits until scrollHeight stays unchanged for 2 ticks (capped at 30 iterations). Per Ryan: "整天 sold out 算當天沒 availability, 要往下滑看完 30 天,算出 %" (a day that is sold out all day counts as no availability for that day; scroll down through all 30 days and compute the %). Verified: 232900 went from 5 → 26 captured days; 4613909 from ~7 → 30 (13 bookable + 17 sold-out across the full window).
- bookable_days_30d is the count, not the percent: activities.bookable_days_30d stores the integer count of days in the next 30 that have ≥1 available slot. computeAvailability is the single source of truth; it filters by the inferred operating cadence (active_weekdays, derived from days with ≥1 available SKU). Today's date is the lower bound and the window is 30 days long, so a SKU dated "today + 30" falls outside it. The display layer derives the % as count / 30 * 100 (or count / operating for a cadence-aware ratio) without re-running the rollup.
- Year inference: airbnb's modal date headings ("Tomorrow, 28 April", "4月28日") omit the year. pricing.ts infers it from the current local date: if the slot's month is >= today's month, use the current year; otherwise use next year. This holds for the 7-day default window but breaks for --days > ~330; flag it if you ever need long-horizon pricing.
- Default ?locale=en-US injection (fixed 2026-04-27): airbnb URLs without a locale fall back to whatever the Browser Bridge cookie negotiates (often zh-TW for our session). All three adapter entrypoints (search, detail, pricing) now inject ?locale=en-US when the caller didn't pass an explicit language hint, via applyDefaultLanguage(platform, url) in src/tours/url-locale.ts. Result: section titles, "Things to know" bodies, and supplier copy come back in English, which downstream normalize/match logic relies on. Don't strip this default; explicit language hints from the listing-driven flow still win.
- Cover image filter excludes platform assets (fixed 2026-04-27): the broad img[src*=cdn] selector used to grab /airbnb-platform-assets/icons/host-shield.svg as first_image because it's the first hydrated image on experience pages. detail.ts now prioritises img.pb-image-grid__img and filters out /airbnb-platform-assets/, /icons/, and .svg paths. If hero images stop populating, check whether airbnb renamed the grid class; log the first 5 candidate <img> srcs before tightening the filter further.
- supplier_url + section content captured for audit/dashboard (fixed 2026-04-27): detail.ts now extracts a[href*="/users/profile/"] near the host card and emits it as supplierUrl (parsers.ts passes it through to supplier_url). It also captures full section bodies (not just titles) into raw_extras_json.sections as [{title, content}] capped at 2KB each; the activities-modal "Page sections" block uses them. Rows scraped before this commit need a re-ingest to backfill; new ingests are automatic.
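The year-inference rule and the bookable_days_30d count can be sketched as pure functions. This is a minimal illustration, not pricing.ts or computeAvailability itself: it assumes ISO date strings, a half-open window [today, today + 30), and that the operating cadence is the set of weekdays with at least one bookable day in the window. All names are hypothetical.

```typescript
// Hypothetical sketch of the date logic described above.

// Modal headings like "Tomorrow, 28 April" omit the year. slotMonth is 1-12.
function inferYear(slotMonth: number, today: Date): number {
  const thisMonth = today.getMonth() + 1; // local calendar date
  return slotMonth >= thisMonth ? today.getFullYear() : today.getFullYear() + 1;
}

interface DaySku {
  date: string; // ISO "YYYY-MM-DD"
  available: boolean; // false when every slot that day is sold out
}

function addDaysIso(iso: string, days: number): string {
  const d = new Date(`${iso}T00:00:00Z`);
  d.setUTCDate(d.getUTCDate() + days);
  return d.toISOString().slice(0, 10);
}

function weekdayOf(iso: string): number {
  return new Date(`${iso}T00:00:00Z`).getUTCDay();
}

// Window is [today, today + windowDays): a SKU dated today + 30 falls outside.
function bookableDays(skus: DaySku[], todayIso: string, windowDays = 30) {
  const end = addDaysIso(todayIso, windowDays);
  const bookable = new Set(
    skus
      .filter((s) => s.available && s.date >= todayIso && s.date < end)
      .map((s) => s.date),
  );
  // Assumed cadence inference: weekdays that had at least one bookable day.
  const activeWeekdays = new Set(Array.from(bookable).map(weekdayOf));
  let operating = 0;
  for (let i = 0; i < windowDays; i++) {
    if (activeWeekdays.has(weekdayOf(addDaysIso(todayIso, i)))) operating++;
  }
  return { count: bookable.size, operating };
}
```

The display layer would then compute `count / 30 * 100`, or `count / operating` for the cadence-aware ratio.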
Touchpoints
Edit these files to change skill-observed behavior:
- src/clis/airbnb/search.ts: search URL, card selector, result shape
- src/clis/airbnb/detail.ts: parseExperienceId, h1/meta extraction, packages synthesis, cancellation policy
- src/clis/airbnb/pricing.ts: booking-modal pricing (OPEN_BOOKING_MODAL_JS, SCROLL_MODAL_TO_BOTTOM_JS, READ_TIME_SLOTS_JS), year inference, currency fallback
After any change: npm run build (the symlink at ~/.opencli/plugins/airbnb picks up dist/ automatically).
I/O Schema
Canonical reference: docs/io-schemas.md — full input args, output JSON shapes, and DB column mappings.
Airbnb-specific nuances:
- supplier = business name, else host name: parsed from "Owner of " / "Founder of " when present, otherwise from the "Hosted by " body text; there is no separate operator label like Klook's.
- order_count is empty: Airbnb does not surface an "X+ booked" counter on experience pages. Don't expect this column to populate from this platform.
- packages[] typically has 1 entry (the experience itself). "Private experience" / "Group experience" / language are variant axes stored under option_dimensions rather than fanned out into duplicate packages rows. This is the SKU model decision (2026-04-27): cross-platform SKU = (package, date), and Airbnb has one synthesized package, so the per-platform SKU count for Airbnb is exactly days_requested per experience.
- price (formerly daily_min_price) is per-person: Airbnb's native unit. Cross-platform comparison must normalize on the consumer side if any other platform reports per-group prices. Klook/Trip/GYG/KKday daily tours are also per-person, so no normalization is needed today.
- cancellation_policy (cross-platform field, Airbnb-specific extraction path): cleanly captured from the explicit "Cancellation policy" h2. Output is typically a single sentence describing the refund window (e.g. "Free cancellation up to 24 hours before the experience starts").
Writes when called via the tours pipeline:
- get-activity → activities row (supplier from the business/host fallback chain, cancellation_policy populated, single synthetic package)
- get-pricing-matrix → skus rows (one per date: skus.price_local, skus.currency, skus.available), which normalize.ts rolls up into activities.bookable_days_30d
Listing (new pipeline)
Airbnb is the platform that motivated the listing pipeline. Geo radius is
wide (place_id returns Kyoto results when searching Osaka), and
subvertical needs a tag-id mapping.
Reference data:
- data/listings/airbnb-place-ids.csv: geo → place_id + ready URL
- data/listings/airbnb-categories.csv: subvertical → Tag:N
- data/listings/geo-expansions.json: the Osaka entry covers the airbnb rubric (Sakai, Higashiosaka, Ibaraki, Minoh, Nishinomiya, Ashiya)
Reference (verbatim playbook):
- docs/listings/airbnb-skill-archive.md: colleague's Claude-driven URL build + kg_or_tags filter + relevance/location filter playbook.
Locality: search.ts:42 returns city from a DOM chip. Maps to
listing JSON locality.
Filter is mandatory: airbnb's place_id is the platform with the
highest geo-noise rate (real case: 29/66 Osaka cooking listings were in
Kyoto/Nara). Always run applyListingFilter with the correct geo expansion.
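A minimal sketch of a geo-expansion filter, assuming each listing carries the locality from search.ts's city chip and that an expansion is a base geo plus a list of allowed localities. The shapes and the `filterByGeoExpansion` name are hypothetical; the real applyListingFilter and the geo-expansions.json schema may differ. Listings with no city chip go to a separate bucket, since an empty city is expected on some experiences.

```typescript
// Hypothetical shapes, not the real listing or geo-expansion schema.
interface ListingRow {
  id: string;
  locality: string | null; // city chip from search.ts; may be empty
}

interface GeoExpansion {
  geo: string; // e.g. "Osaka"
  localities: string[]; // e.g. ["Sakai", "Higashiosaka", "Ibaraki"]
}

function filterByGeoExpansion(listings: ListingRow[], expansion: GeoExpansion) {
  const allowed = new Set(
    [expansion.geo, ...expansion.localities].map((name) => name.toLowerCase()),
  );
  const kept: ListingRow[] = [];
  const dropped: ListingRow[] = []; // e.g. Kyoto/Nara hits from an Osaka place_id
  const unknown: ListingRow[] = []; // missing city chip: route to review, don't guess
  for (const listing of listings) {
    const loc = listing.locality?.trim().toLowerCase();
    if (!loc) unknown.push(listing);
    else if (allowed.has(loc)) kept.push(listing);
    else dropped.push(listing);
  }
  return { kept, dropped, unknown };
}
```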
TODO: implement opencli airbnb search-experiences --geo X --subvertical Y
that resolves via the two CSVs, fetches the URL, and emits Listing JSON.
See docs/listings/design.md §6.