| name | opencli-listing-pipeline |
| description | Run the full listing pipeline for one (geo, subvertical, platform) — discover activity URLs, apply geo filter, ingest into the tours DB. Use when the user asks "run/refresh listing for X on Y", "discover airbnb experiences for Osaka cooking", "fill in the GYG Bangkok listings", or any phrasing that implies "find activities for this geo+subvertical+platform combination and load them". Companion to `opencli-pricing` (next step once activities are pinned) and `opencli-routine` (orchestrator that dispatches here for the listing branch). Owner: Ryan Huang.
|
opencli-listing-pipeline
Operational runbook for the listing pipeline. Picks the right execution path
per platform, drives the four CLI commands in order, and surfaces tweakable
quirks in the per-platform sections below — those quirks are deliberately
in markdown so colleagues can edit them directly without going through code.
The binding design (axes, schema, decisions already locked in) lives in
docs/listings/design.md. Read that for context, not duplication.
Pre-flight
Before any listing run, confirm:
- opencli doctor: the Browser Bridge daemon is running. Refresh the cookie
via the extension if connectivity fails.
- node dist/cli.js tours preflight-locale: the Browser Bridge cookie is
en-US. Adapters break silently against a zh-TW DOM.
- OPENROUTER_API_KEY is set in .env*: the resolver only consults it on a
cache miss, but a missing key fails on the first cold geo / subvertical.
Decide which path to take
| Platform | Path | Notes |
|---|---|---|
| airbnb | A — tours listing-run | Adapter wired; URL builder + scrape + filter + ingest in one command. |
| klook | B — manual chain | Destination URL builder ready; search-by-url adapter not yet shipped. |
| getyourguide | B — manual chain | URL slug carries city; recon a small search-by-url first. |
| trip | B + keyword fallback | No destination-URL pattern documented yet. |
| kkday | B + keyword fallback | Same. |
When the structured path is unavailable, fall back to C — the
colleague's Claude-driven listing skill in docs/listings/<platform>-skill-archive.md. That path produces raw cards manually inside a Claude
session and feeds them into tours listing-finalize (step 3 below). Cron
cannot run path C — schedule a human reminder instead.
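The table and the fallback rule above can be sketched as a simple lookup. This is an illustration only: `ListingPath`, `PATH_BY_PLATFORM`, and `choosePath` are hypothetical names that do not exist in the CLI, and routing an unlisted platform to path C is an assumption drawn from the fallback paragraph.

```typescript
// Execution paths from the table above. "B+keyword" is the manual chain
// driven by keyword search; "C" is the Claude-driven archive skill.
type ListingPath = "A" | "B" | "B+keyword" | "C";

// Hypothetical lookup mirroring the table rows.
const PATH_BY_PLATFORM = new Map<string, ListingPath>([
  ["airbnb", "A"],
  ["klook", "B"],
  ["getyourguide", "B"],
  ["trip", "B+keyword"],
  ["kkday", "B+keyword"],
]);

// Assumption: any platform without a table row falls back to path C.
function choosePath(platform: string): ListingPath {
  return PATH_BY_PLATFORM.get(platform.trim().toLowerCase()) ?? "C";
}
```

Because cron cannot run path C, a scheduler built on a helper like this would need to raise a human reminder whenever it returns "C".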
Path A — airbnb one-shot
node dist/cli.js tours listing-run \
--platform airbnb \
--geo "Osaka" \
--subvertical "cooking class" \
--limit 100
Runs the full chain: resolve → build URL → scrape → applyListingFilter →
write Listing JSON → ingestFromListing. Read the final
kept / kept_unchecked / dropped_geo line; if dropped_geo is high
relative to total, jump to Geo filter sanity check below.
Path B — manual chain (4 commands)
Use when no search-by-url adapter exists for the platform yet.
Step 1 (prepare; note the filter signature in its output):
node dist/cli.js tours listing-prepare \
--platform <p> --geo "<geo>" --subvertical "<subv>"
Step 2 (scrape cards):
opencli <p> search-activities "<keyword>" --limit 100 -f json > /tmp/cards.json
Step 3 (finalize):
node dist/cli.js tours listing-finalize \
--cards-file /tmp/cards.json \
--platform <p> --geo "<geo>" --subvertical "<subv>" \
--filter-signature "<from step 1>" \
--total-in-filter <if known>
Step 4 (ingest):
node dist/cli.js tours ingest-listing --file <path-from-step-3>
After step 4 on a first-time (geo, platform) pair:
node dist/cli.js tours pin --top 5 --platform <p> --poi "<geo>"
Pricing then refreshes the pinned items via opencli-pricing (separate skill).
Per-platform tweakable quirks
These knobs drift over time. Edit them inline in this skill — no TS change
needed unless the change is structural (DOM extractor logic, filter math).
airbnb
- Geo radius noise is real. Airbnb place_id returns Kyoto / Nara results when
searching Osaka. The title-fallback in listing-filter.ts handles this,
but keep the accept list in data/listings/geo-expansions.json well curated
for the geo, and keep reject_examples populated: those drive the title-based
drops.
- --expect-keyword asserts that document.title contains the geo. The default
is the geo name verbatim. If a destination's airbnb title differs (e.g.
"Tokyo, Japan" vs "Tokyo"), pass a substring that matches both.
- Subvertical → tag mapping lives in
data/listings/subvertical-mappings.json. If the LLM picked the wrong
tag, edit the file directly: set "source": "manual" and write the tag
IDs you want. Resolver trusts manual entries forever.
- Card extractor is buildSearchEvaluate() in src/clis/airbnb/search.ts
(shared between search-activities and search-by-url). Empty city field on
destination listing pages is expected; title-fallback covers that.
- Plateau scroll: 30 passes max, stops after 2 passes with no count
growth. Increase SCROLL_PASSES in src/clis/airbnb/search-by-url.ts
if a destination has > 200 cards (Tokyo / Bangkok scale).
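The plateau-scroll rule in the last bullet can be sketched as a small loop. This is not the code in src/clis/airbnb/search-by-url.ts: `scrollUntilPlateau` and its `scrollOnce` callback are hypothetical, the callback is synchronous for brevity (a real adapter would await each scroll), and treating the 2 no-growth passes as consecutive is an assumption.

```typescript
interface PlateauResult {
  count: number;  // highest card count observed
  passes: number; // scroll passes actually performed
}

// scrollOnce is a hypothetical callback: it scrolls once and returns the
// number of cards visible afterwards.
function scrollUntilPlateau(
  scrollOnce: () => number,
  maxPasses: number = 30,   // mirrors the documented 30-pass ceiling
  plateauPasses: number = 2 // assumed: consecutive passes with no growth
): PlateauResult {
  let count = 0;
  let stalled = 0;
  let passes = 0;
  while (passes < maxPasses && stalled < plateauPasses) {
    const next = scrollOnce();
    passes += 1;
    if (next > count) {
      count = next; // growth resets the stall counter
      stalled = 0;
    } else {
      stalled += 1;
    }
  }
  return { count, passes };
}
```

Under these assumptions, a destination that is still growing at pass 30 gets cut off by the ceiling rather than by the plateau, which is the case where raising SCROLL_PASSES helps.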
klook
- Destination URL pattern:
https://www.klook.com/en-AU/destination/c{N}-{slug}/{cat}-things-to-do/?main_category={N}.
The two {N} placeholders are different values: the first is the city_id, the
second the main_category. Mappings live in data/listings/klook-destinations.csv
(geo → city_id + slug) and data/listings/klook-categories.csv
(subvertical → main_category).
- Stale city IDs: large IDs (e.g. c702414 for Koh Samui) coexist
with early sequential IDs (c4 Bangkok). Verify with document.title
after navigation if a new ID looks unfamiliar.
- Locality comes from the public-API cityName field, already
populated by src/clis/klook/search.ts. No DOM scraping required.
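To make the destination URL pattern concrete, here is a minimal sketch of a builder. It is not the project's URL builder: `KlookDestination`, `KlookCategory`, and `buildKlookDestinationUrl` are hypothetical, the field names are guesses at the CSV columns, and sourcing the {cat} slug from the category mapping is an assumption.

```typescript
// Hypothetical row shapes; the real CSV column names may differ.
interface KlookDestination {
  cityId: number; // the c{N} part, e.g. 4 for Bangkok
  slug: string;   // the {slug} part
}

interface KlookCategory {
  catSlug: string;      // the {cat} part (assumed to come from the category mapping)
  mainCategory: number; // the main_category query value
}

// Fills the documented pattern:
// /en-AU/destination/c{N}-{slug}/{cat}-things-to-do/?main_category={N}
function buildKlookDestinationUrl(dest: KlookDestination, cat: KlookCategory): string {
  const path =
    `/en-AU/destination/c${dest.cityId}-${encodeURIComponent(dest.slug)}` +
    `/${encodeURIComponent(cat.catSlug)}-things-to-do/`;
  return `https://www.klook.com${path}?main_category=${cat.mainCategory}`;
}
```

With made-up inputs `{ cityId: 4, slug: "bangkok" }` and `{ catSlug: "example-cat", mainCategory: 99 }`, this yields `https://www.klook.com/en-AU/destination/c4-bangkok/example-cat-things-to-do/?main_category=99`. Only the c4 Bangkok ID comes from this runbook; the slug and category values are placeholders.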
getyourguide
- City lives in the URL slug: /{city}-l{LID}/ and /{slug}-t{ID}/.
Adapter TODO: derive locality from the URL when extracting cards
(currently empty in src/clis/getyourguide/search.ts:41). Until that ships,
GYG cards default to kept_unchecked and the geo filter degrades to "include all".
- Stale LIDs are a real hazard: the colleague archive flagged
l1391 → Hungary, l1399 → France. When verifying a new LID, check that
document.title names the expected country or place (a Thai one for Bangkok,
for example) before trusting the value.
- Show-more pagination: max 5 clicks per JS call (CDP timeout at 45s).
This recipe lives in docs/listings/gyg-skill-archive.md; port it to the
adapter when shipping search-by-url.
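The adapter TODO in the first bullet (deriving locality from the URL) could look roughly like the sketch below. It is not shipped code: `localityFromGygUrl` and `GygLocality` are hypothetical, and the regular expression assumes the city segment always has the /{city}-l{LID}/ shape described above.

```typescript
interface GygLocality {
  citySlug: string; // the {city} part of /{city}-l{LID}/
  lid: number;      // the numeric {LID}
}

// Matches a path segment of the form /{city}-l{LID} followed by /, ?, # or
// the end of the string.
const GYG_CITY_SEGMENT = /\/([a-z0-9-]+)-l(\d+)(?=[\/?#]|$)/i;

// Returns null when the URL has no city segment (e.g. an activity-only URL).
function localityFromGygUrl(rawUrl: string): GygLocality | null {
  const match = GYG_CITY_SEGMENT.exec(rawUrl);
  if (match === null) return null;
  const [, citySlug = "", lid = ""] = match;
  return { citySlug: citySlug.toLowerCase(), lid: Number(lid) };
}
```

Because stale LIDs exist, a value parsed this way still needs the document.title check from the second bullet. The slug only says which city the URL claims to be.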
trip
- No destination URL pattern documented yet. Recon needed. Today the
pipeline runs only via keyword search through opencli trip search-activities.
- Locality is empty in src/clis/trip/search.ts:37; same fix path as
GYG (extract from URL or DOM).
kkday
- No destination URL pattern documented yet. Same as trip — keyword
fallback only.
- Locality is populated from the DOM selectors
[class*="location"], [class*="city"] in src/clis/kkday/search.ts:34.
Reasonably reliable.
Geo filter sanity check
If dropped_geo is high (> 30% of total), inspect:
- The accept list for that geo in data/listings/geo-expansions.json.
Missing common neighbouring cities is the most frequent cause.
- The reject_examples list: title-fallback uses these to drop cards. Too
narrow and out-of-region cards slip through (false positives); too broad and
in-region cards get dropped (false negatives).
- The actual cards: read the Listing JSON file under
data/listings/listing-runs/ and grep "filter_status":"dropped_geo".
Read the titles to confirm they really are out-of-region.
To force a re-fill of an expansion entry, delete that geo's key from
data/listings/geo-expansions.json and re-run; the LLM fills it in again.
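The 30% check and the title review above can be sketched as one helper over the cards of a Listing JSON file. The card shape is an assumption: only `filter_status` and `title` are read, and `summarizeGeoDrops`, `ListingCard`, and `GeoDropSummary` are hypothetical names, not part of src/tours.

```typescript
// Assumed minimal card shape; real Listing JSON cards carry more fields.
interface ListingCard {
  title?: string;
  filter_status?: string; // "kept" | "kept_unchecked" | "dropped_geo"
}

interface GeoDropSummary {
  total: number;
  droppedGeo: number;
  ratio: number;           // droppedGeo / total, 0 for an empty run
  suspicious: boolean;     // true when ratio exceeds the threshold
  droppedTitles: string[]; // titles to eyeball for false drops
}

function summarizeGeoDrops(cards: ListingCard[], threshold: number = 0.3): GeoDropSummary {
  const dropped = cards.filter((card) => card.filter_status === "dropped_geo");
  const ratio = cards.length === 0 ? 0 : dropped.length / cards.length;
  return {
    total: cards.length,
    droppedGeo: dropped.length,
    ratio,
    suspicious: ratio > threshold,
    droppedTitles: dropped.map((card) => card.title ?? "(untitled)"),
  };
}
```

If `suspicious` is true, reading `droppedTitles` is the quickest way to tell a too-narrow accept list from genuinely out-of-region results.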
Reporting
After every run, surface to the user:
- total discovered + kept (passed filter) + dropped_geo (filter caught)
- new_unique (actually new to the DB)
- LLM cache hit/miss per resolver (visible in tours listing-prepare
output as (manual) / (llm-cache) / (csv-seed))
- Failures from ingestFromListing per canonical_url
- Path to the Listing JSON file for audit
Troubleshooting
"Bot challenge" / "verify you" in adapter output
→ opencli doctor, refresh the Browser Bridge cookie via the extension,
retry once.
LLM resolver hangs or errors out
→ Check OPENROUTER_API_KEY in .env*. Cache misses on cold geos cost
one chat completion each.
Adapter returns 0 cards
→ DOM probably drifted. Run opencli browser open <url>, then
opencli browser find --css 'a[href*="/experiences/"]' --limit 1 (airbnb)
or the equivalent platform-specific selector. Compare against
buildSearchEvaluate() in src/clis/<p>/search.ts. If the selector needs
an update, that is a TS change: escalate to opencli-<platform>.
tours listing-finalize rejects the cards file
→ Card shape is wrong. Each card needs at minimum canonical_url. Schema
is ListingActivitySchema in src/tours/listing.ts (passthrough — extra
fields are tolerated).
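A quick pre-check before calling tours listing-finalize can catch the missing canonical_url case early. This is a sketch, not ListingActivitySchema itself: `findCardsMissingUrl` is hypothetical, and it assumes the cards file holds a JSON array of card objects.

```typescript
// Returns the indexes of cards without a non-empty string canonical_url.
// Extra fields are ignored, in line with the passthrough schema.
function findCardsMissingUrl(parsed: unknown): number[] {
  if (!Array.isArray(parsed)) {
    // Assumption: the cards file is a top-level JSON array.
    throw new TypeError("expected a JSON array of cards");
  }
  const bad: number[] = [];
  parsed.forEach((card: unknown, index: number) => {
    const url = (card as { canonical_url?: unknown } | null)?.canonical_url;
    if (typeof url !== "string" || url.trim() === "") {
      bad.push(index);
    }
  });
  return bad;
}
```

Feeding it the result of `JSON.parse` on the contents of /tmp/cards.json and getting an empty array back means every card clears the minimum bar described above.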
All cards labelled dropped_geo
→ Geo expansion is too narrow, or the listing source labels localities
differently. See Geo filter sanity check.
Listing run succeeds but coverage_runs doesn't reflect subvertical
→ Confirm the --subvertical flag was passed; the column is nullable but
should be populated for new runs.
Companion skills
- opencli-routine dispatches here when its decision matrix lands on
the listing branch.
- opencli-pricing runs after tours pin to start daily price refresh.
- opencli-<platform> holds platform-specific adapter quirks at the TS
level; escalate there if the issue is below this skill's layer.
- opencli-router is the entry-point dispatcher when the user hasn't
named a platform.
Owner: Ryan Huang