anserini-cli
Run Anserini command-line and REST workflows from either a built fatjar or an Anserini source checkout. Use for PrebuiltIndexRegistry, TopicsRegistry, ad hoc search, interactive search, output formats, and RestServer examples.
| name | anserini-cli |
| description | Run Anserini command-line and REST workflows from either a built fatjar or an Anserini source checkout. Use for PrebuiltIndexRegistry, TopicsRegistry, ad hoc search, interactive search, output formats, and RestServer examples. |
| metadata | {"version":"v0.2.0"} |
Use this skill when Anserini is already available through either a resolved
fatjar or a source checkout. This skill covers command usage, not environment
setup or builds. If no usable fatjar or checkout is present, use
$install-anserini-fatjar or $install-anserini-dev-env first.
Do not run commands that trigger large prebuilt-index downloads unless the user explicitly asks for retrieval experiments or index downloads.
Examples below use the fatjar form:
java -cp "$ANSERINI_JAR" <main-class> <args>
From an Anserini source checkout, replace java -cp "$ANSERINI_JAR" with
bin/run.sh:
bin/run.sh <main-class> <args>
Keep commands pinned to the same jar or checkout unless the user asks to change versions.
For a fatjar workflow, confirm ANSERINI_JAR is set and points to an existing
jar:
test -n "$ANSERINI_JAR"
test -f "$ANSERINI_JAR"
For a checkout workflow, confirm bin/run.sh is available:
test -x bin/run.sh
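The two checks above can be folded into one helper that picks the launcher form for the current shell. This is a minimal sketch, not part of Anserini: the function names anserini_launcher and anserini_run are hypothetical, and the only commands assumed are the two launcher forms shown earlier.

```shell
# Hypothetical helper: report which launcher form is usable right now.
# Prints "fatjar" or "checkout"; returns non-zero if neither is available.
anserini_launcher() {
  if [ -n "${ANSERINI_JAR:-}" ] && [ -f "$ANSERINI_JAR" ]; then
    echo "fatjar"
  elif [ -x bin/run.sh ]; then
    echo "checkout"
  else
    echo "none" >&2
    return 1
  fi
}

# Hypothetical wrapper: run <main-class> <args> with whichever form applies.
anserini_run() {
  case "$(anserini_launcher 2>/dev/null)" in
    fatjar)   java -cp "$ANSERINI_JAR" "$@" ;;
    checkout) bin/run.sh "$@" ;;
    *)        echo "No usable fatjar or checkout found." >&2; return 1 ;;
  esac
}
```

With this in place, anserini_run io.anserini.cli.PrebuiltIndexRegistry --help would use the jar when ANSERINI_JAR is valid and fall back to bin/run.sh otherwise.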
A useful functional smoke test is:
java -cp "$ANSERINI_JAR" io.anserini.search.SearchCollection \
-threads 1 \
-index cacm \
-topics cacm \
-output run.cacm.bm25.txt \
-hits 1000 \
-bm25
This command may download the small CACM prebuilt index and topics on first use.
To inspect prebuilt indexes exposed by io.anserini.cli.PrebuiltIndexRegistry,
run:
java -cp "$ANSERINI_JAR" io.anserini.cli.PrebuiltIndexRegistry --list
--list emits JSON in current jars, so prefer --filter and jq instead of
grepping raw output. If jq is not available, ask the user whether it should be
installed before relying on jq examples.
msmarco-v1-passage is a common choice and should be called out when users ask
about available prebuilt indexes or MS MARCO passage retrieval setup.
Recommended lookup for the standard MS MARCO V1 passage inverted index:
java -cp "$ANSERINI_JAR" \
io.anserini.cli.PrebuiltIndexRegistry \
--list --filter '^msmarco-v1-passage$' \
| jq '.[0] | {name, type, description, filename}'
Useful variants:
java -cp "$ANSERINI_JAR" io.anserini.cli.PrebuiltIndexRegistry --help
java -cp "$ANSERINI_JAR" io.anserini.cli.PrebuiltIndexRegistry --list --filter 'msmarco.*passage' | jq '.[].name'
java -cp "$ANSERINI_JAR" io.anserini.cli.PrebuiltIndexRegistry --type flat --list
java -cp "$ANSERINI_JAR" io.anserini.cli.PrebuiltIndexRegistry --type inverted --list
java -cp "$ANSERINI_JAR" io.anserini.cli.PrebuiltIndexRegistry --type impact --list
java -cp "$ANSERINI_JAR" io.anserini.cli.PrebuiltIndexRegistry --type hnsw --list
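The jq guidance and the --filter lookups can be combined into a small helper. The sketch below assumes, as the lookup above does, that --list returns a JSON array of objects with a name field. The function name list_index_names is hypothetical.

```shell
# Hypothetical helper: print one prebuilt index name per line for a filter regex.
# Refuses to run without jq rather than falling back to grepping raw JSON.
list_index_names() {
  if ! command -v jq >/dev/null 2>&1; then
    echo "jq not found; ask the user before installing it." >&2
    return 1
  fi
  java -cp "$ANSERINI_JAR" io.anserini.cli.PrebuiltIndexRegistry \
    --list --filter "$1" | jq -r '.[].name'
}
```

For example, list_index_names 'msmarco.*passage' would print the matching names without downloading any index, assuming the JSON shape described above.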
To inspect topics exposed by io.anserini.cli.TopicsRegistry, run:
java -cp "$ANSERINI_JAR" io.anserini.cli.TopicsRegistry --list
--list emits JSON in current jars, so prefer --filter and jq to locate the
exact symbol. If jq is not available, ask the user whether it should be
installed before relying on jq examples.
To print all topics for a specific set, run:
java -cp "$ANSERINI_JAR" io.anserini.cli.TopicsRegistry --get <set>
For the standard MS MARCO V1 passage queries that pair with the
msmarco-v1-passage prebuilt index, use msmarco-v1-passage.dev.
Recommended lookup:
java -cp "$ANSERINI_JAR" \
io.anserini.cli.TopicsRegistry \
--list --filter '^msmarco(-v1)?-passage(\.dev|-dev)$' \
| jq '.'
Use --list first to discover the exact set name, then --get to inspect its
contents.
Use io.anserini.cli.Search for ad hoc retrieval against either a local Lucene
index path or a prebuilt index name.
Example using the popular msmarco-v1-passage prebuilt index:
java -cp "$ANSERINI_JAR" io.anserini.cli.Search --index msmarco-v1-passage --query "what is a lobster roll" --hits 10
Interactive mode:
java -cp "$ANSERINI_JAR" io.anserini.cli.Search --index msmarco-v1-passage --interactive
Useful output variants:
java -cp "$ANSERINI_JAR" io.anserini.cli.Search --index msmarco-v1-passage --query "what is a lobster roll" --json
java -cp "$ANSERINI_JAR" io.anserini.cli.Search --index msmarco-v1-passage --query "what is a lobster roll" --trec
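To run several ad hoc queries in a row without interactive mode, the single-query form can be wrapped in a loop. This is a sketch under two assumptions: that --hits and --trec can be combined in one invocation, and that one query per input line is a convenient format. The function name search_each is hypothetical.

```shell
# Hypothetical helper: read one query per line from stdin and run each through
# io.anserini.cli.Search, printing TREC-style output for every query.
search_each() {
  index="$1"
  while IFS= read -r query; do
    # Skip blank lines in the query list.
    [ -n "$query" ] || continue
    # </dev/null keeps java from consuming the remaining queries on stdin.
    java -cp "$ANSERINI_JAR" io.anserini.cli.Search \
      --index "$index" --query "$query" --hits 10 --trec </dev/null
  done
}
```

A call such as search_each msmarco-v1-passage < queries.txt would work for a hypothetical queries.txt file. Using a prebuilt index name here can trigger the large download that the caution near the top warns about, so a local index path is the safer argument for quick checks.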
Use io.anserini.cli.GetDocument to fetch the stored raw document for a
collection docid from either a local Lucene index path or a prebuilt index name.
Example using the popular msmarco-v1-passage prebuilt index:
java -cp "$ANSERINI_JAR" io.anserini.cli.GetDocument --index msmarco-v1-passage --docid 2161721
Interactive mode reads docids from stdin:
java -cp "$ANSERINI_JAR" io.anserini.cli.GetDocument --index msmarco-v1-passage --interactive
This command prints the document's stored raw field. It reports an error when the docid is not found or when the index does not store raw documents.
Use io.anserini.search.SearchCollection for batch retrieval over a topic set.
It writes TREC run files and supports retrieval-model flags such as -bm25,
-rm3, -rocchio, -hits, and -threads. Use io.anserini.cli.Search instead
for single-query or interactive inspection.
Canonical CACM example using a prebuilt index and built-in topic symbol:
java -cp "$ANSERINI_JAR" io.anserini.search.SearchCollection \
-index cacm \
-topics cacm \
-output run.cacm.bm25.txt \
-hits 1000 \
-bm25
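Before evaluating, it can help to confirm that the run file exists and looks like a TREC run. The sketch below relies only on the standard six-column TREC run layout (topic, Q0, docid, rank, score, tag). The function name check_run_file is hypothetical.

```shell
# Hypothetical helper: succeed only if the run file is non-empty and every
# line has exactly six whitespace-separated fields.
check_run_file() {
  [ -s "$1" ] || { echo "missing or empty run file: $1" >&2; return 1; }
  awk 'NF != 6 { bad = 1 } END { if (bad) exit 1 }' "$1"
}
```

For the CACM example this would be invoked as check_run_file run.cacm.bm25.txt.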
Evaluate the CACM run with Anserini's Java trec_eval wrapper:
java -cp "$ANSERINI_JAR" io.anserini.eval.TrecEval \
-c \
-m map \
-m P.30 \
cacm \
run.cacm.bm25.txt
Expected scores are MAP 0.3123 and P30 0.1942.
To verify them mechanically:
java -cp "$ANSERINI_JAR" io.anserini.eval.TrecEval \
-c \
-m map \
-m P.30 \
cacm \
run.cacm.bm25.txt | tee eval.cacm.bm25.txt
grep -q $'map\tall\t0.3123' eval.cacm.bm25.txt
grep -q $'P_30\tall\t0.1942' eval.cacm.bm25.txt
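The grep patterns above match only if the metric name, all, and the score are separated by single tabs. If the evaluator pads the metric name with spaces, as classic trec_eval output does, a field-based comparison is more forgiving. This is a sketch, assuming three whitespace-separated columns per line. The function name check_metric is hypothetical.

```shell
# Hypothetical helper: check_metric <eval-file> <metric> <expected>
# Succeeds when a line "<metric> all <expected>" exists, ignoring padding.
check_metric() {
  awk -v m="$2" -v want="$3" '
    $1 == m && $2 == "all" { found = 1; ok = ($3 == want) }
    END { if (found && ok) exit 0; exit 1 }
  ' "$1"
}
```

Applied to the file written by tee above, the two checks would be check_metric eval.cacm.bm25.txt map 0.3123 and check_metric eval.cacm.bm25.txt P_30 0.1942.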
Use io.anserini.api.RestServer to expose search and document lookup over HTTP.
Fatjar invocation:
java -cp "$ANSERINI_JAR" io.anserini.api.RestServer --port 8081
Sample requests against the popular msmarco-v1-passage index:
curl "http://localhost:8081/v1/msmarco-v1-passage/search?query=what%20is%20anserini&hits=5"
curl "http://localhost:8081/v1/msmarco-v1-passage/doc/2161721"
This REST workflow is most useful when users want to query the same prebuilt
indexes exposed by the CLI, especially msmarco-v1-passage.
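Hand-encoding queries as in the sample request is error-prone, so curl can build the query string instead. The sketch below uses curl's -G and --data-urlencode options against the same /v1/<index>/search path shown above. The function name rest_search and its default of 10 hits are assumptions for illustration.

```shell
# Hypothetical helper: rest_search <port> <index> <query> [hits]
# -G sends the encoded parameters as a GET query string.
rest_search() {
  port="$1"
  index="$2"
  query="$3"
  hits="${4:-10}"
  curl -sS -G "http://localhost:${port}/v1/${index}/search" \
    --data-urlencode "query=${query}" \
    --data-urlencode "hits=${hits}"
}
```

With the server from the fatjar invocation running, rest_search 8081 msmarco-v1-passage "what is anserini" 5 would issue the same request as the first sample above.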
If no usable fatjar is present, use $install-anserini-fatjar to download a
released Maven Central fatjar, or $install-anserini-dev-env if the user needs a
jar built from the source checkout.
bin/run.sh: use $install-anserini-dev-env from an Anserini checkout.
ClassNotFoundException: confirm the jar or checkout was built from the
expected Anserini version.
RestServer reports "Port already in use" for unused ports in a sandboxed
Codex session: local socket binding may be blocked by sandbox permissions.
Rerun the server command with escalation, and use an available high local port
if the documented port is occupied.