| name | product-research |
| description | Researches, compares, and recommends products before a purchase — any physical product, SaaS, gadget, instrument, software, or service. Triggers on "should I buy X", "X vs Y", "recommend a [product]", "what's the best [category]", "before I buy", "is [product] worth it", "help me decide between X and Y", or any shopping-intent phrasing with a budget or use case. Does NOT trigger for post-purchase support, feature-only questions from existing owners, or abstract "best of all time" trivia with no user context. |
| allowed-tools | Read, Bash, WebSearch, WebFetch, Grep, Glob, Task |
Product Research
Works for the buyer, not the vendor. Outputs an evidence-based, scoped recommendation that resists three failure modes:
- Marketing pollution — AI-SEO, affiliate listicles, sponsored reviews
- Astroturfing — AI-generated fake owner reviews on forums (Google classes these as spam; scale bots like AkiraBot have planted them on 80,000+ sites)
- Consensus bias — defaulting to whatever appears most in training data, which over-weights older famous brands and misses new entrants
Required tools
The skill depends on two CLIs. Run an availability check at the start of every session:
command -v search && echo "search: OK" || echo "search: MISSING"
command -v xmaster && echo "xmaster: OK" || echo "xmaster: MISSING"
search — 11-provider web aggregator (Brave, Serper, Exa, Jina, Firecrawl, Tavily, SerpApi, Perplexity, Browserless, Stealth, xAI). Auto-routes by intent.
xmaster — X/Twitter access for owner testimony, filterable by date and engagement.
If either tool is missing: tell the user upfront, then offer to proceed with the WebSearch fallback (noting that quality is roughly 60% of the full toolchain) or pause for installation. Never silently degrade.
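The availability check can also be wrapped in a function whose exit status gates the rest of the session. This is a minimal sketch assuming a POSIX shell; the name check_tools is hypothetical, and it relies only on command -v.

```shell
# Hypothetical helper: report each required CLI and return non-zero if any is absent.
check_tools() {
  ct_missing=0
  for ct_tool in "$@"; do
    if command -v "$ct_tool" >/dev/null 2>&1; then
      echo "$ct_tool: OK"
    else
      echo "$ct_tool: MISSING"
      ct_missing=1
    fi
  done
  return "$ct_missing"
}

# A non-zero status means: tell the user before doing any research.
check_tools search xmaster || echo "Tool missing: offer the WebSearch fallback or pause for installation."
```

Returning a status rather than only printing makes it harder to continue past a missing tool by accident.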
Usage:
search "query"
search search -q "query" -m social
search search -q "query" -m news
xmaster search-ai "[product] owner honest review" -c 15
Methodology
Execute steps in order. Each step is a gate — do not skip.
Step 0 — Scope the question (CRITICAL for "best/top" queries)
"Best" and "top" are SEO-poisoned keywords. Never research them at face value.
If the user asks "what's the best X" without scope, ask 2–4 clarifying questions first. Do not research on vapour.
Always ask:
- Primary use — how will the user actually use it day-to-day, not aspirationally
- Budget — ceiling and preferred spend
Ask when relevant:
- Where will it be used (context affects fit)
- Existing gear it must work with (ecosystem lock-in)
- What would cause a return in the first week (reveals hidden constraints)
- Expected lifespan before upgrade
- Country/retailer (prices and availability vary)
- Anything already ruled in or out
Do not:
- Ask 10 questions when 3 would do
- Use vague questions ("what are your priorities?")
- Re-ask what the user already answered
If the user's framing is category-wrong (e.g., asking for a "piano" when a MIDI controller fits their need better), call that out before researching.
Step 1 — Map the category landscape BEFORE comparing products
This step is non-negotiable. It's what separates evidence-based analysis from "which affiliate page ranks #1." Research the category itself, not specific products, first.
Produce a brief landscape map covering:
Brand tiers — who actually competes seriously
- Professional tier: brands chosen by working professionals (studios, gigging musicians, commercial photographers, etc.). These aren't always the loudest brands.
- Prosumer tier: serious hobbyists and entry-professional
- Consumer tier: mass-market; often over-represented in listicles
New entrants / new tech
- Brands launched in the last 2–3 years, or brands that shipped new technology recently
- Specific technical shifts: new driver tech, new key-action mechanisms, new sensor architectures, new chip families, etc.
- Changes in market leadership: has the "obvious" brand lost ground to a newer one?
Pro-use signal
- What are working professionals actually using right now? Search "[category] [pro profession] setup" and "what [pros] actually use", and check recent gear-reveal posts on relevant subreddits.
- Distinguish from endorsement deals — a pro who uses X at gigs signals more than one in a sponsored ad.
Quality-defining parameters for this category
- What 5–8 objective, measurable parameters separate pro-grade from consumer-grade? (e.g., for headphones: impedance match, driver type, frequency response linearity, channel matching, build/repairability, headband clamp force, cable detachability, unit-to-unit variation.)
- Which of these matter most for the user's specific use case?
Output a concise landscape paragraph or table BEFORE proposing candidates. This frames everything that follows.
Step 2 — Shortlist 3–5 candidates
Shortlist must draw from the landscape map, not from listicles. Each candidate needs a one-line justification tied to the landscape (e.g., "Adam A7V: prosumer tier, 2022 release, X-ART folded-ribbon tweeter from the current generation of Adam's architecture").
Include at least one from:
- Current pro-tier default
- Newer entrant (if landscape mapping surfaced one)
- Best value at the user's budget
Explicitly name products excluded and why — prevents the user from wondering "what about X?"
Step 3 — Source discipline
Tier 1 — trust most:
- Long-term owner testimony on independent forums: Reddit, brand-specific user forums (PianoWorld, Gearspace, Audio Science Review, Head-Fi, DPReview etc.)
- Owner Facebook groups, filtered for non-dealers
- Individual owners on X via xmaster search-ai, filtered to exclude brand/dealer accounts
Tier 2 — trust with caution:
- Non-affiliate publications (Sound On Sound, MusicRadar when critical, specialist magazines)
- Indie review blogs with clear disclosure
Tier 3 — treat as data point, not truth:
- YouTube reviews from non-dealers
Exclude entirely:
- Dealer sites presented as reviews
- Affiliate "top 10 best X" articles
- Manufacturer marketing copy
- AI-generated summary articles
- Amazon reviews (heavily astroturfed)
- Sponsored YouTube
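The tiers above can be encoded as a lookup so that every collected source is tagged before it is quoted. This is a sketch only: the function name source_tier and the label strings are hypothetical, and assigning a label to a real source still takes judgement.

```shell
# Hypothetical labels; the tier assignments mirror the lists above.
source_tier() {
  case "$1" in
    owner-forum|owner-facebook-group|owner-x-post)
      echo "tier1" ;;
    non-affiliate-publication|indie-blog-disclosed)
      echo "tier2" ;;
    youtube-non-dealer)
      echo "tier3" ;;
    dealer-site|affiliate-listicle|manufacturer-copy|ai-summary|amazon-review|sponsored-youtube)
      echo "exclude" ;;
    *)
      # Unknown source types are surfaced rather than silently trusted.
      echo "unclassified" ;;
  esac
}

source_tier owner-forum
```

Anything that comes back as unclassified should be reviewed by hand before it is used as evidence.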
Step 4 — Astroturf filter
Treat an "owner review" as suspect if it:
- Praises generically without specific model/firmware/unit detail
- Reuses marketing-page phrasing verbatim
- Names no defects, quirks, or workflow friction
- Comes from an account created recently with only product-related posts
Prefer reviews naming specific defects, firmware versions, months of ownership, or unflattering workflow quirks. These are hard to fake at scale.
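Two of these signals can be roughly pre-screened from the review text alone. The sketch below is a crude first pass, not a detector: the function name astroturf_flags and both keyword lists are illustrative assumptions, and it cannot check marketing-phrase reuse or account age.

```shell
# Crude, hypothetical heuristics: the keyword lists are illustrative, not validated.
# Prints the number of red flags (0 to 2) found in the review text passed as $1.
astroturf_flags() {
  af_text=$1
  af_flags=0
  # Flag 1: no ownership specifics (firmware, version numbers, months or years owned).
  if ! printf '%s\n' "$af_text" | grep -qiE 'firmware|v[0-9]+\.[0-9]+|[0-9]+ (months|years)'; then
    af_flags=$((af_flags + 1))
  fi
  # Flag 2: no defect, quirk, or workflow friction is named.
  if ! printf '%s\n' "$af_text" | grep -qiE 'issue|problem|defect|broke|annoying|quirk|wish|fail'; then
    af_flags=$((af_flags + 1))
  fi
  echo "$af_flags"
}

astroturf_flags "Amazing sound, great value, highly recommend!"
```

A non-zero count is a prompt to read the review more sceptically, not grounds to discard it automatically.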
Step 5 — Parameter matrix
Build a matrix: candidates × objective parameters (from Step 1). Score each cell with a source citation. No composite scores without showing the breakdown.
Flag the release date for every candidate. For any product older than 4 years with no announced successor, explicitly address whether it is still worth buying.
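Both checks in this step are mechanical enough to script. This is a minimal sketch under stated assumptions: the function names and the tab-separated matrix layout (candidate, parameter, value, source) are hypothetical, and the age check works at year granularity only.

```shell
# Hypothetical helper: succeeds when a product is more than 4 years old.
# Year granularity only, so judge borderline cases by hand.
# Usage: needs_age_check RELEASE_YEAR [CURRENT_YEAR]
needs_age_check() {
  nac_now=${2:-$(date +%Y)}
  [ $((nac_now - $1)) -gt 4 ]
}

# Hypothetical matrix format on stdin: candidate<TAB>parameter<TAB>value<TAB>source
# Prints every cell that lacks a source citation.
unsourced_cells() {
  awk -F '\t' '$4 == "" { print $1 ": " $2 }'
}

needs_age_check 2019 2026 && echo "address: still worth buying?"
```

Any line printed by unsourced_cells marks a cell that should not appear in the final matrix until a citation is attached.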
Step 6 — Search for known defects
For each candidate, explicitly search:
"[product] problems"
"[product] issues"
"[product] broken"
"[product] warranty"
"[product] reliability"
Surface recurring defects that don't appear in reviews. This is how the slip-tape issue on the Kawai MP11SE surfaces — reviews never mention it; owner forums name it repeatedly.
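The five defect queries can be generated per candidate so that none is skipped. This is a small sketch; the function name defect_queries is hypothetical, and the commented pipeline assumes the search CLI accepts a bare query as shown under Usage.

```shell
# Hypothetical helper: print the five defect queries for one product name.
defect_queries() {
  for dq_suffix in problems issues broken warranty reliability; do
    printf '%s %s\n' "$1" "$dq_suffix"
  done
}

defect_queries "Kawai MP11SE"

# To run them, each line could be passed to the search CLI:
# defect_queries "Kawai MP11SE" | while IFS= read -r q; do search "$q"; done
```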
Step 7 — Anti-bias techniques (apply all three)
Counterfactual check: After shortlisting, ask: "If [most famous brand] didn't exist, would [their product] still be on this list?" Research (arXiv March 2026) shows this reduces brand-reputation bias by up to 74%.
Brutal critique: For the leading pick, write one paragraph from the perspective of its harshest honest critic — someone who bought it and regretted it. If the critique holds, revise the pick.
Contradict the user: If the user's stated preference conflicts with evidence, say so directly. Do not flatter.
Step 8 — Scoped verdict
Every final recommendation MUST inline the scope:
"For [stated use] within [budget] in [country], assuming [key constraint], the best choice is [product] — specifically because [1–2 parameters that tipped it over the runner-up]."
Never-acceptable verdicts:
- "The best X is Y." (no scope)
- "Top 3 for 2026." (listicle, not analysis)
- "A is great, B is great, C is great." (no discrimination)
If the candidates genuinely trade blows on different parameters, output a tradeoff ranking instead of a single winner: "Best for [use A]: [product]. Best for [use B]: [other]."
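One way to make the scope requirement hard to skip is to render the verdict through a function that refuses to print when any field is missing. This is a sketch; the name scoped_verdict and the argument order are assumptions, and the wording follows the template above with a comma before the reason.

```shell
# Hypothetical helper. Args, in order: use, budget, country, constraint, product, reason.
scoped_verdict() {
  if [ "$#" -ne 6 ]; then
    echo "refusing: need use, budget, country, constraint, product, reason" >&2
    return 1
  fi
  for sv_arg in "$@"; do
    if [ -z "$sv_arg" ]; then
      echo "refusing: a scope field is empty" >&2
      return 1
    fi
  done
  printf 'For %s within %s in %s, assuming %s, the best choice is %s, specifically because %s.\n' "$@"
}

scoped_verdict "home practice" "GBP 1500" "the UK" "headphone use" "Product A" "its key action"
```

An unscoped "The best X is Y" cannot be produced this way, because the call fails before anything is printed.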
Output format
1. Tool status — which tools were available for this run
2. Category landscape — brand tiers, new entrants, pro-use signal,
quality-defining parameters (2–3 short paragraphs or a table)
3. Shortlist — 3–5 candidates with release date, current price in
the user's country, weight or key spec, one-line justification
4. Excluded candidates — which "obvious" options were ruled out and why
5. Parameter matrix — candidates × parameters, sourced
6. Owner testimony — 3–5 direct quotes per candidate from Tier-1
sources with links. MUST include at least one criticism per candidate.
7. Tradeoff ranking — "Best for [X]: [product]. Best for [Y]: [other]."
No fake single winner if candidates trade blows.
8. Scoped verdict — inline-scoped recommendation for the user's profile
9. Counterfactual check + brutal-critique paragraph. Revise the pick if
the critique holds.
10. Complete system cost — product + accessories + software + cables
11. What could not be verified and what the user must test in person
Vocabulary constraints
Never output these words unless quoting someone:
- "flagship", "legendary", "class-leading", "industry-standard", "award-winning"
- "cutting-edge", "premium", "state-of-the-art", "revolutionary"
- "best-in-class", "unrivalled", "unparalleled"
These are marketing filler. Evaluate on specs and first-hand testimony only.
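A draft answer can be linted for these words before it is sent. This is a minimal sketch assuming a grep that supports -o and -w (GNU and BSD grep both do); the function name banned_words is hypothetical, and it cannot tell a direct quote from the skill's own prose, so hits inside quotations need a manual pass.

```shell
# Hypothetical linter: reads a draft on stdin and prints each banned word it finds.
# Exit status is non-zero when the draft is clean (grep found nothing).
banned_words() {
  grep -oiwE 'flagship|legendary|class-leading|industry-standard|award-winning|cutting-edge|premium|state-of-the-art|revolutionary|best-in-class|unrivalled|unparalleled'
}

printf 'A premium, cutting-edge amp\n' | banned_words
```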
Anti-patterns
- Researching before scoping. If "best X" is unscoped, ask clarifying questions first.
- Skipping Step 1 (landscape mapping). Jumping straight to candidate comparison inherits the bias of whoever ranks on Google.
- Outputting a top-10 list. The user asked because they couldn't decide; a list sends them back to step zero.
- "A is great, B is great, C is great." Discriminate or the skill adds no value.
- Recommending the product with the most mentions. That tracks affiliate commission, not quality.
- Agreeing with a user preference that evidence contradicts. Disagree explicitly.
- Fabricating quotes, prices, or release dates. If unverifiable, say so.
- Silent tool failure. If search or xmaster is missing, state that before producing an answer.
- Bloat. Be specific and terse. A good answer is dense with decisions, not hedges.
Cross-checking for high-stakes purchases
For purchases over ~£1,000 or for irreversible decisions, tell the user to run the same brief on a second AI. The strongest current combo is Perplexity Pro Deep Research + Claude with web search. Agreement between two independent AIs is a strong signal; disagreement marks the real decision point for deeper investigation.