| name | tikhub-xiaohongshu-search |
| description | Lightweight TikHub Xiaohongshu image-search workflow. Prioritizes single-request usage with curl or minimal Python, saves raw API JSON by default, and includes a small stdlib post-processor for CSV and simplified JSON. Use when the user wants Xiaohongshu keyword image search, page-based pagination, or structured note/image metadata from TikHub without a heavy wrapper. |
TikHub Xiaohongshu Search
What this skill gives you
This skill is optimized for the common case: one keyword search request.
It provides:
- Minimal request patterns
  - curl for quickest validation
  - tiny httpx example for people who prefer Python
- Raw JSON saving
  - save the full TikHub response after each request
  - useful for audit, replay, and later post-processing
- One optional post-processor
  - postprocess_xiaohongshu_raw.py
  - reads one raw file or a directory of raw files
  - writes xiaohongshu_search_summary.csv and xiaohongshu_search_summary.json
- Optional pagination guidance
  - enough information for later page turning
  - intentionally brief, not the main path
This skill does not import TikHub-Multi-Functional-Downloader or any other project package.
API key requirement
This skill intentionally does not contain any API key.
Use one of these:
- environment variable: TIKHUB_API_KEY
- ask the user to provide an API key explicitly
If the key is missing, stop and ask for it instead of hardcoding one into scripts.
Install
pip install httpx
Post-processor: no extra packages.
API (for reference)
- Image search: GET https://api.tikhub.io/api/v1/xiaohongshu/app_v2/search_images?keyword=...&page=1&source=explore_feed
- Header: Authorization: Bearer <API_KEY>
Notes from real requests
- In curl, Chinese keywords must be URL-encoded. Putting the raw characters 壁纸 directly into the query returned HTTP 400, while the percent-encoded form %E5%A3%81%E7%BA%B8 succeeded.
- A working minimal first-page query string was: keyword=%E5%A3%81%E7%BA%B8&page=1&source=explore_feed
- The first-page response returns pagination context: search_id, search_session_id, word_request_id, and next_page
- Search results are in: data.data.items
- Useful nested sections in each item include: image_info, note_info, share_info, and user_info
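The encoding note above can be checked without touching the API. The following is a minimal sketch using only the standard library; `build_search_url` is a hypothetical helper name introduced here for illustration, not part of TikHub or this skill.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.tikhub.io/api/v1/xiaohongshu/app_v2/search_images"


def build_search_url(keyword: str, page: int = 1, source: str = "explore_feed") -> str:
    """Return a search URL whose query string is fully percent-encoded.

    Hypothetical helper: it only builds the string that would be passed to curl.
    """
    # quote_via=quote percent-encodes non-ASCII characters as UTF-8 bytes
    # (and would encode spaces as %20 rather than "+").
    query = urlencode({"keyword": keyword, "page": page, "source": source}, quote_via=quote)
    return f"{BASE_URL}?{query}"


if __name__ == "__main__":
    print(build_search_url("壁纸"))
```

Running it prints a URL ending in keyword=%E5%A3%81%E7%BA%B8&page=1&source=explore_feed, which matches the working query string noted above and can be pasted into a curl command.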
Preferred path: single request
1. Quickest: curl
First page:
curl --location --request GET "https://api.tikhub.io/api/v1/xiaohongshu/app_v2/search_images?keyword=%E5%A3%81%E7%BA%B8&page=1&source=explore_feed" \
--header "Authorization: Bearer $TIKHUB_API_KEY"
Another keyword example:
curl --location --request GET "https://api.tikhub.io/api/v1/xiaohongshu/app_v2/search_images?keyword=%E6%B2%BB%E6%84%88%E7%B3%BB&page=1&source=explore_feed" \
--header "Authorization: Bearer $TIKHUB_API_KEY"
2. Preferred Python pattern: tiny httpx
If the user wants Python, prefer a small request snippet, not a framework.
Search and save raw JSON:
import json
import os
import urllib.parse

import httpx

# Read the key from the environment; never hardcode it.
api_key = os.getenv("TIKHUB_API_KEY", "").strip()
if not api_key:
    raise SystemExit("Missing TIKHUB_API_KEY")

keyword = "壁纸"
url = "https://api.tikhub.io/api/v1/xiaohongshu/app_v2/search_images"
params = {
    "keyword": keyword,
    "page": 1,
    "source": "explore_feed",
}
headers = {"Authorization": f"Bearer {api_key}", "Accept": "*/*"}

# httpx URL-encodes the params, so the raw keyword can be passed as-is.
with httpx.Client(timeout=30.0, follow_redirects=True) as client:
    raw = client.get(url, params=params, headers=headers).json()

# Save the full raw response before doing anything else with it.
safe_keyword = urllib.parse.quote(keyword, safe="")
with open(f"xiaohongshu_search_{safe_keyword}.json", "w", encoding="utf-8") as f:
    json.dump(raw, f, ensure_ascii=False, indent=2)

# Print a few fields from the first five items for quick inspection.
items = raw.get("data", {}).get("data", {}).get("items", [])
for item in items[:5]:
    note = item.get("note_info", {})
    share = item.get("share_info", {})
    user = item.get("user_info", {})
    print(note.get("title", ""))
    print(share.get("link", ""))
    print(user.get("nickname", ""))
Save raw JSON by default
For this workflow, the recommended default is:
- request the API
- save the full raw JSON immediately
- print only a few useful fields for quick inspection
- optionally run the post-processor later
Suggested file naming:
- first page raw: search_<keyword>_page1_<request_id>.json
- next page raw: search_<keyword>_page2_<request_id>.json
If request_id is unavailable, hash the keyword plus page number.
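One way to apply this naming scheme is sketched below. `raw_filename` is a hypothetical helper; percent-encoding the keyword (as the httpx snippet does) and truncating a SHA-256 digest to 12 characters are assumptions made for illustration, not requirements of the API.

```python
import hashlib
from typing import Optional
from urllib.parse import quote


def raw_filename(keyword: str, page: int, request_id: Optional[str] = None) -> str:
    """Build search_<keyword>_page<N>_<suffix>.json for one raw response.

    Hypothetical helper. The suffix is the request_id when present,
    otherwise a short hash of the keyword plus page number.
    """
    safe_keyword = quote(keyword, safe="")  # keeps the file name ASCII-only
    if request_id:
        suffix = str(request_id)
    else:
        digest = hashlib.sha256(f"{keyword}:{page}".encode("utf-8")).hexdigest()
        suffix = digest[:12]  # assumed length; any stable prefix would work
    return f"search_{safe_keyword}_page{page}_{suffix}.json"
```

Because the fallback hash depends only on the keyword and page, re-running the same request without a request_id overwrites the earlier file instead of creating a duplicate.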
Pagination
Only care about this if the user wants page 2 or beyond.
From the first response, keep these fields:
- search_id
- search_session_id
- word_request_id
- next_page
Then use them in the next request:
curl --location --request GET "https://api.tikhub.io/api/v1/xiaohongshu/app_v2/search_images?keyword=%E5%A3%81%E7%BA%B8&page=2&search_id=<search_id>&search_session_id=<search_session_id>&word_request_id=<word_request_id>&source=explore_feed" \
--header "Authorization: Bearer $TIKHUB_API_KEY"
If the endpoint behavior changes, trust the latest response fields over assumptions.
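For Python users, the same step can be sketched as a function that turns one saved response into the query parameters for the next request. `next_page_params` is a hypothetical helper. It assumes the pagination fields sit under data.data (where the post-processor in this document reads them) and that next_page holds the page number to request next; verify both against a real response.

```python
from typing import Any, Dict

PAGINATION_FIELDS = ("search_id", "search_session_id", "word_request_id")


def next_page_params(raw: Dict[str, Any], keyword: str, source: str = "explore_feed") -> Dict[str, Any]:
    """Derive the next request's params from the previous raw response.

    Hypothetical helper; field locations are assumptions (see lead-in).
    """
    inner = (raw.get("data") or {}).get("data") or {}
    next_page = inner.get("next_page")
    if not next_page:
        raise ValueError("response has no next_page; nothing more to fetch")
    params: Dict[str, Any] = {"keyword": keyword, "page": next_page, "source": source}
    # Copy only the pagination fields that are actually present.
    for field in PAGINATION_FIELDS:
        value = inner.get(field)
        if value:
            params[field] = value
    return params
```

The returned dict can be passed as `params=` to `client.get(...)` in the httpx snippet, which handles the URL encoding.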
Post-process raw JSON
Save as postprocess_xiaohongshu_raw.py (stdlib only).
Input:
- one raw search JSON file
- or a directory containing multiple raw JSON files
Output:
- xiaohongshu_search_summary.csv
- xiaohongshu_search_summary.json
from __future__ import annotations

import argparse
import csv
import json
import os
import sys
from glob import glob
from typing import Any, Dict, List


def collect_inputs(path: str) -> List[str]:
    # Accept either a single file or a directory of *.json files.
    if os.path.isfile(path):
        return [path]
    if os.path.isdir(path):
        return sorted(glob(os.path.join(path, "*.json")))
    raise FileNotFoundError(path)


def as_list(value: Any) -> List[dict]:
    return value if isinstance(value, list) else []


def flatten_for_csv(row: Dict[str, Any]) -> Dict[str, Any]:
    # CSV cells must be scalars: None becomes "", nested values become JSON strings.
    out: Dict[str, Any] = {}
    for k, v in row.items():
        if v is None:
            out[k] = ""
        elif isinstance(v, (dict, list)):
            out[k] = json.dumps(v, ensure_ascii=False)
        else:
            out[k] = v
    return out


def simplify_raw(raw: dict, source_file: str) -> Dict[str, Any]:
    # One summary record per raw file; "top_*" fields describe the first item only.
    outer = raw.get("data") or {}
    inner = outer.get("data") or {}
    items = as_list(inner.get("items"))
    first = items[0] if items else {}
    note = first.get("note_info") or {}
    share = first.get("share_info") or {}
    user = first.get("user_info") or {}
    image = first.get("image_info") or {}
    return {
        "source_file": os.path.basename(source_file),
        "request_id": raw.get("request_id"),
        "api_code": raw.get("code"),
        "router": raw.get("router"),
        "keyword": (raw.get("params") or {}).get("keyword", ""),
        "page": inner.get("page"),
        "next_page": inner.get("next_page"),
        "search_id": inner.get("search_id", ""),
        "search_session_id": inner.get("search_session_id", ""),
        "word_request_id": inner.get("word_request_id", ""),
        "item_count": len(items),
        "top_note_id": note.get("note_id", ""),
        "top_title": note.get("title", ""),
        "top_desc": note.get("desc", ""),
        "top_liked_count": note.get("liked_count"),
        "top_collected_count": note.get("collected_count"),
        "top_comments_count": note.get("comments_count"),
        "top_share_link": share.get("link", ""),
        "top_user_nickname": user.get("nickname", ""),
        "top_user_id": user.get("user_id", ""),
        "top_image_url": image.get("url", ""),
        "top_image_original": image.get("original", ""),
    }


def main() -> int:
    ap = argparse.ArgumentParser(description="Raw TikHub Xiaohongshu JSON -> CSV + simplified JSON")
    ap.add_argument("--input", "-i", required=True, help="One .json file or a directory of .json")
    ap.add_argument("--out-dir", "-o", default=".", help="Output directory (default: current working directory)")
    args = ap.parse_args()

    try:
        files = collect_inputs(args.input)
    except FileNotFoundError as e:
        print("Input not found:", e, file=sys.stderr)
        return 2
    if not files:
        print("No JSON files found.", file=sys.stderr)
        return 2

    out_dir = os.path.abspath(args.out_dir)
    os.makedirs(out_dir, exist_ok=True)
    csv_path = os.path.join(out_dir, "xiaohongshu_search_summary.csv")
    json_path = os.path.join(out_dir, "xiaohongshu_search_summary.json")

    rows: List[Dict[str, Any]] = []
    for fp in files:
        try:
            with open(fp, "r", encoding="utf-8") as f:
                raw = json.load(f)
        except Exception as ex:
            # Keep a row for unreadable files so failures show up in the summary.
            rows.append({"source_file": os.path.basename(fp), "error": f"json load: {ex}"})
            continue
        rows.append(simplify_raw(raw, fp))

    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(
            {
                "generated_from": os.path.abspath(args.input),
                "record_count": len(rows),
                "records": rows,
            },
            f,
            ensure_ascii=False,
            indent=2,
        )

    # Use the union of all keys so error rows and normal rows share one header.
    flat = [flatten_for_csv(r) for r in rows]
    fieldnames = sorted({k for row in flat for k in row.keys()})
    with open(csv_path, "w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        for row in flat:
            writer.writerow({k: row.get(k, "") for k in fieldnames})

    print("Wrote:", csv_path)
    print("Wrote:", json_path)
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
Commands:
python postprocess_xiaohongshu_raw.py --input ./xiaohongshu_raw
python postprocess_xiaohongshu_raw.py --input ./search_%E5%A3%81%E7%BA%B8_page1.json --out-dir .
Optional: multiple pages or multiple keywords
Only use this when the user clearly needs:
- multiple keywords
- page 2+
- bulk result collection
Keep the batching layer thin:
- accept a list of keywords
- request page 1 first
- store the returned pagination fields
- fetch more pages only if needed
- save one raw JSON per request
- reuse postprocess_xiaohongshu_raw.py afterward
Recommended limits:
- start sequentially or with max_workers=2 to 3
- reduce concurrency if you hit 429
- avoid assuming pagination tokens are reusable across different keywords
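A thin batching layer along these lines might look like the sketch below. `fetch_first_pages` and the injected `fetch` callable are hypothetical; `fetch` stands in for whatever single-request function you already have (for example, the httpx snippet wrapped in a function that takes a keyword and returns the raw JSON dict).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List


def fetch_first_pages(
    keywords: List[str],
    fetch: Callable[[str], Dict[str, Any]],
    max_workers: int = 2,
) -> Dict[str, Dict[str, Any]]:
    """Request page 1 for each keyword with a small worker pool.

    Hypothetical helper: `fetch` performs one request and returns raw JSON.
    Results are keyed by keyword so pagination fields never mix across keywords.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input keywords.
        results = list(pool.map(fetch, keywords))
    return dict(zip(keywords, results))
```

Setting max_workers=1 makes the run effectively sequential, which is a reasonable fallback after a 429.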
Do not lead with a big wrapper if the task is only one keyword search.
End-to-end workflow
- Provide TIKHUB_API_KEY.
- Make a single image-search request with curl or a tiny httpx snippet.
- Save the full raw response JSON.
- Inspect a few important fields directly.
- If needed, run postprocess_xiaohongshu_raw.py on one file or a directory of raw files.
- Only then expand to page 2+ or multiple keywords.
Troubleshooting
- 401/403: invalid API key or missing Xiaohongshu scopes.
- 400 with Chinese keyword in curl: URL-encode the keyword.
- No items: keyword too narrow, source changed, or upstream result shape changed.
- 429: rate limit; retry later or reduce concurrency.
- Page 2 fails: confirm you passed the latest search_id, search_session_id, and word_request_id from the prior response.
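For the 429 case, "retry later" can be sketched as a small backoff loop. `get_with_retry` is a hypothetical helper, and the attempt count and delays are arbitrary assumptions rather than documented TikHub limits. `send` is any zero-argument callable returning `(status_code, body)`, for example a wrapper around `client.get(...)`.

```python
import time
from typing import Any, Callable, Tuple


def get_with_retry(
    send: Callable[[], Tuple[int, Any]],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, Any]:
    """Call send() and retry with exponential backoff while it returns 429.

    Hypothetical helper. Any other status (including 400/401/403) is
    returned immediately, since retrying will not fix it.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ... by default
    return status, body
```

The `sleep` parameter exists only so the delay can be replaced in tests; in normal use the default `time.sleep` applies.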
What this skill does not cover
- note detail endpoints
- note comment crawling
- downloading all images from every note as a batch export
- non-search Xiaohongshu workflows