vs-search-tuning
// Use when a user asks an agent to evaluate or tune text search similarity for an existing Viking AI Search application and dataset.
| Field | Value |
| --- | --- |
| name | vs-search-tuning |
| description | Use when a user asks an agent to evaluate or tune text search similarity for an existing Viking AI Search application and dataset. |
| category | search |
| applies_to | codex, agents, external-agent |
| requires_cli | >=0.1.0 |
| keywords | search tuning, search evaluation, llm judge, ndcg, query generation, similarity tuning |
| commands | llm login, llm import-env, llm status, search tune llm-check, search tune validate, search tune query-generate, search tune plan, search tune run, search tune report, search tune compare, search tune apply, app status, doctor |
Use this skill when the user wants an external agent to evaluate and tune text search similarity for an existing AI Search application and dataset.
This first version is for text-query similarity only. It fixes mode=UserDefined and tunes the user-defined recall strategy, recall weights, keyword match ratio, and max retrieved count. It does not tune rerank, personalization, hotness, boost/bury, sort rules, serving controls, or business operating rules.
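To make the tuning space concrete, here is a minimal sketch of how a candidate grid over these parameters could be enumerated. The grid values and the rule that `text_weight` is the complement of `dense_weight` are assumptions for illustration only; the CLI chooses its own candidates, and `user_defined_recall_mode` is left out because its allowed values are not listed here.

```python
from itertools import product

# Hypothetical grid values, chosen only for illustration.
DENSE_WEIGHTS = [0.3, 0.5, 0.7]
KEYWORD_MATCH_PERCENTS = [50, 75]
MAX_RETRIEVED_NUMS = [100, 200]


def candidate_strategies():
    """Enumerate one parameter dict per candidate strategy."""
    candidates = []
    for dense, match_pct, max_num in product(
        DENSE_WEIGHTS, KEYWORD_MATCH_PERCENTS, MAX_RETRIEVED_NUMS
    ):
        candidates.append({
            "mode": "UserDefined",  # fixed in this version of the skill
            "dense_weight": dense,
            # Assumption: the two recall weights are complementary.
            "text_weight": round(1.0 - dense, 2),
            "query_keyword_match_percent": match_pct,
            "max_retrieved_num": max_num,
        })
    return candidates


if __name__ == "__main__":
    grid = candidate_strategies()
    print(len(grid), "candidates; first:", grid[0])
```

Every candidate costs one search request per query, so the grid size directly drives the request and label estimates that `search tune plan` reports.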
Prerequisites:

- An `application-id` is available.
- A `dataset-id` is preferred; if omitted, the CLI can try to infer a unique search dataset from the application.
- Authentication has been checked with `vs auth status`.
- LLM access is configured through `vs llm login`, `vs llm import-env`, or `VIKING_LLM_BASE_URL` / `VIKING_LLM_API_KEY` / `VIKING_LLM_MODEL` when generating queries or using LLM relevance labels.
- Queries that carry `sourceItemIds` can be evaluated with `--label-source source-item` for a fast first-pass silver-label run without LLM relevance judging.

Commands:

- `llm login` / `llm import-env` / `llm status`: configure and verify OpenAI-compatible LLM credentials without exposing API keys in chat or plain config.
- `search tune llm-check`: verify CLI-managed LLM configuration.
- `search tune validate`: validate a query set locally before planning or running; reports schema issues, duplicate ids/text, `sourceItemIds` coverage, query type skew, and a label-source recommendation.
- `search tune query-generate`: generate a reusable synthetic query set from paged dataset samples with batched concurrent LLM calls when the user has no query set.
- `search tune plan`: show query source, candidate strategies, estimated requests/labels, parameter coverage, source-item coverage, warnings, and suggested first-pass size before running.
- `search tune run`: generate or load queries, run candidate search strategies, label top results, compute metrics, and write artifacts; supports `--label-source llm|source-item|auto`, `--llm-retries`, `--max-label-failure-rate`, and `--verbose`; use `--resume-run-id <run-id>` to continue an interrupted run.
- `search tune report`: read a previous tuning report.
- `search tune compare`: compare completed tuning runs with `--run-ids`, or compare existing scenes online with `--scene-ids --queries` using source-item silver labels.
- `search tune apply`: create a new candidate search scene from a completed tuning report recommendation.
- `app status` / `doctor`: verify app and local environment readiness.

Workflow:

If the user already has a query set, pass it with `--queries <file>`.
If not, say the CLI will generate synthetic queries from dataset samples and that those queries should be reviewed.

Check authentication, the local environment, and the LLM configuration:

- `vs auth status --json`
- `vs doctor --json`
- `vs search tune llm-check --json`

If the LLM check fails and LLM query generation or LLM judging is needed, configure the LLM first:

- `vs llm login`
- `vs llm import-env`
- `vs llm status --json`

Check application readiness:

- `vs app status --application-id <id> --json`

The tuning scope fixes `mode=UserDefined` and covers `user_defined_recall_mode`, `dense_weight`, `text_weight`, `query_keyword_match_percent`, and `max_retrieved_num`.

If the user has no query set, generate one:

- `vs search tune query-generate --application-id <id> --dataset-id <dataset> --query-count 100 --sample-size 200 --query-batch-size 10 --llm-concurrency 100 --timeout-ms 120000 --json`

Show the returned `sampleQueries`, `typeCounts`, `requestedQueryCount`, `actualQueryCount`, `shortfall`, and `warnings` to the user. If `ok=false`, do not continue to plan or run; retry with a larger timeout or sample size, or ask for a real query set. Use the returned `queryFile` only after the user accepts the query set for first-pass tuning.

Validate the query set:

- `vs search tune validate --queries <file> --json`

Summarize `ok`, `validQueryCount`, `duplicateIdCount`, `sourceItemQueryCoverage`, `labelSourceRecommendation`, and any blocking problems. If `ok=false`, fix or regenerate the query set before continuing.

Show the plan before running:

- `vs search tune plan --application-id <id> --dataset-id <dataset> --queries <file> --profile similarity-only --json`

For generated queries, `<file>` is the `queryFile` returned by `query-generate`.

Summarize the estimated search requests, max pointwise LLM judgements, source-item coverage, suggested first-pass size, warnings, and parameter coverage.

Run the tuning:

- When the queries include `sourceItemIds`: `vs search tune run --application-id <id> --dataset-id <dataset> --queries <file> --profile similarity-only --label-source source-item --search-concurrency 18 --timeout-ms 120000 --json`
- With LLM labels: `vs search tune run --application-id <id> --dataset-id <dataset> --queries <file> --profile similarity-only --label-source llm --search-concurrency 18 --llm-concurrency 100 --llm-retries 1 --max-label-failure-rate 0.01 --timeout-ms 120000`

For generated queries, `<file>` is the `queryFile` returned by `query-generate`.

Use the command form above for first-pass tuning unless the user explicitly asks for a different evaluation scope. Search requests default to 18-way concurrency, and LLM judgements default to 100-way concurrency. LLM judging runs as a worker pool, so completed labels are checkpointed while slower LLM requests continue in their own worker slots.

The run writes these artifacts:

- `run-state.json`: current status, completed searches, labels, and resume metadata
- `partial-metrics.json`: partial metrics from completed query/strategy pairs
- `performance-summary.json`: elapsed time, search/LLM wall time, average and percentile latency, throughput, cache hits, label failures, and configured concurrency
- `rankings.jsonl`, `labels-used.jsonl`, and `label-failures.jsonl`: completed rankings, labels used by the run, and tolerated/diagnostic label failures

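The worker-pool behavior described above (finished labels are checkpointed while slower requests keep running) can be sketched as follows. This is an illustration of the general pattern and not the CLI's implementation: `judge`, the record fields, and the checkpoint file name are all hypothetical.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def judge(pair):
    """Stand-in for an LLM relevance call (hypothetical); returns a label record."""
    query, item_id = pair
    return {"query": query, "itemId": item_id, "label": len(item_id) % 2}


def label_with_checkpoints(pairs, checkpoint_path, max_workers=4):
    """Label pairs concurrently, appending each finished label to a JSONL file."""
    path = Path(checkpoint_path)
    done = set()
    if path.exists():
        # Resume: skip pairs that already have a checkpointed label.
        for line in path.read_text().splitlines():
            rec = json.loads(line)
            done.add((rec["query"], rec["itemId"]))
    pending = [p for p in pairs if tuple(p) not in done]
    with ThreadPoolExecutor(max_workers=max_workers) as pool, path.open("a") as out:
        futures = [pool.submit(judge, p) for p in pending]
        for fut in as_completed(futures):
            # Written in completion order, so a slow call never blocks a fast one.
            out.write(json.dumps(fut.result()) + "\n")
            out.flush()
    return len(pending)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ckpt = Path(tmp) / "labels.jsonl"  # hypothetical file name
        pairs = [("red shoes", "item-1"), ("red shoes", "item-22")]
        print(label_with_checkpoints(pairs, ckpt))  # labels both pairs
        print(label_with_checkpoints(pairs, ckpt))  # nothing left to label
```

Because each label is persisted as soon as it completes, a second call with the same checkpoint file only processes the pairs that are still missing, which is the same idea that makes resuming an interrupted run cheap.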
If the process is interrupted, resume with `vs search tune run --application-id <id> --resume-run-id <run-id>`.

Read the report:

- `vs search tune report --run-id <run-id> --json`

Compare completed runs or existing scenes:

- `vs search tune compare --run-ids <run_a,run_b> --json`
- `vs search tune compare --application-id <id> --dataset-id <dataset> --scene-ids <scene_a,scene_b> --queries <file> --json`

For scene compare, every query must include `sourceItemIds`; otherwise use `search tune run` with LLM labels and compare completed run IDs.

Preview the candidate scene with a dry run:

- `vs search tune apply --application-id <id> --run-id <run-id> --dry-run --json`

Explain `unappliedRequestParams`; request-only params such as `query_keyword_match_percent` are not persisted in scene config.

Create the candidate scene:

- `vs search tune apply --application-id <id> --run-id <run-id> --confirm-create-scene --json`

Rules:

- Work only from this skill, the `vs` CLI surface (`--help`, command output, and observed runtime behavior), and explicit user-provided information.
- Do not plan or run until `search tune validate` has checked the accepted query file, unless the user explicitly asks to skip validation.
- Do not start a run until `search tune plan` has been shown and summarized.
- Do not let `search tune run` auto-generate queries during agent-led tuning. If the user has no query set, run `search tune query-generate`, show query samples, and then pass the generated `queryFile` to plan and run.
- Do not continue when `query-generate` returns `ok=false`; inspect warnings and retry generation before asking the user.
- Do not start LLM query generation or LLM judging until `search tune llm-check` succeeds. A `--label-source source-item` run may skip LLM judging only when the query file already contains usable `sourceItemIds`.
- For LLM credentials, ask the user to run `vs llm login` in a real terminal, or ask the user to set `VIKING_LLM_BASE_URL`, `VIKING_LLM_API_KEY`, and `VIKING_LLM_MODEL` in that terminal and then run `vs llm import-env`.
- `search tune apply` creates a new candidate scene only; it does not switch the default entrance.
- Prefer `--resume-run-id` over starting a duplicate run with the same query set and strategy space.
- Run `search tune apply` only after a completed report and explicit user approval.
- Do not recommend a strategy until a `search tune run` report exists. If the report used `--label-source source-item`, call it a fast source-item silver-label recommendation and explain that LLM or human labels can be used for higher-confidence validation.
- Leave `.viking/search-tuning` artifacts in place unless the user explicitly asks.
- If `search tune run` generates queries automatically, tell the user the query set is synthetic and should be reviewed for high-risk usage.
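The ordering constraints above can be read as a small gate that decides which command an agent may run next. The sketch below is one possible encoding, not part of the CLI: the `state` dict and all of its keys are hypothetical bookkeeping that an agent would maintain from command output and user replies.

```python
def next_step(state):
    """Return the next command an agent may run, given hypothetical session state."""
    if not state.get("query_file"):
        # No accepted query set yet (missing, rejected, or generation returned ok=false).
        return "search tune query-generate"
    if not (state.get("validate_ok") or state.get("user_skipped_validation")):
        return "search tune validate"
    if not state.get("plan_shown"):
        return "search tune plan"
    if not state.get("report_ready"):
        return "search tune run"
    if not state.get("user_approved_apply"):
        # Keep presenting the report until the user explicitly approves.
        return "search tune report"
    return "search tune apply"


if __name__ == "__main__":
    session = {"query_file": "queries.json", "validate_ok": True}
    print(next_step(session))  # the plan has not been shown yet
```

Keeping the gate as a single ordered function makes it hard to skip a step by accident: `search tune apply` is only reachable once every earlier condition, including explicit user approval, holds.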