| name | asta-documents |
| description | Local document metadata index for files used by Asta skills and tools. Use this skill when the user asks to store a document "in Asta" or retrieve "from Asta". Use it when the user references an "Asta document" or anything with an `asta://` URI. |
| allowed-tools | Bash(asta documents *) Read(*) TaskOutput Write(.asta/*) |
Asta Documents Management
This skill manages document metadata for research papers, documentation, and other resources using the asta documents CLI.
What it does: Track document metadata (URLs, summaries, tags) in a local index. Think of it as a smart bookmark manager with powerful search capabilities.
Default Index Location: .asta/documents/index.yaml (relative to current working directory). The --index-path flag is optional when using the default location; it's only needed for custom index locations or remote indexes.
Automatic Indexing of .asta Documents
IMPORTANT: When other Asta skills (like literature research) write documents to .asta/ (in the current working directory), you should automatically index them in the document store. This ensures all Asta-generated documents are tracked and searchable.
Workflow:
- After any Asta skill writes files to .asta/ (e.g., literature reports, paper collections)
- Scan the directory for new documents
- For each document, add it to the index with appropriate metadata:
  - name: Extract from filename or document title
  - url: Use a file:// URL pointing to the local path (use absolute paths for file:// URLs)
  - summary: Extract from document content or use a brief description
  - tags: Add relevant tags (e.g., "asta-generated", "literature-report", etc.)
  - mime-type: Detect from file extension (e.g., "text/markdown", "application/pdf")
Example:
REPORT_PATH="$(pwd)/.asta/literature/report/2024-01-15-ml.md"
asta documents add "file://${REPORT_PATH}" \
--name="Literature Report: Machine Learning" \
--summary="Comprehensive report on machine learning papers from 2023-2024" \
--tags="asta-generated,literature-report,ml" \
--mime-type="text/markdown"
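The url and mime-type fields described above can be derived mechanically. The following is a minimal sketch, not part of the asta CLI: guess_mime_type and to_file_url are hypothetical helper names, and the extension table is an assumption covering only a few common types.

```shell
# Hypothetical helper: map a file extension to a value for --mime-type.
# Matching is case-sensitive; unknown extensions fall back to a generic type.
guess_mime_type() {
  case "$1" in
    *.md|*.markdown) echo "text/markdown" ;;
    *.pdf)           echo "application/pdf" ;;
    *.json)          echo "application/json" ;;
    *.html|*.htm)    echo "text/html" ;;
    *.txt)           echo "text/plain" ;;
    *)               echo "application/octet-stream" ;;
  esac
}

# Hypothetical helper: turn a possibly relative path into an absolute file:// URL.
# Special characters in the path are not percent-encoded here.
to_file_url() {
  case "$1" in
    /*) echo "file://$1" ;;
    *)  echo "file://$(pwd)/$1" ;;
  esac
}

guess_mime_type ".asta/literature/report/2024-01-15-ml.md"
to_file_url "/tmp/report.md"
```

The two results can then be passed to asta documents add as the URL argument and the --mime-type value.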
Installation
This skill requires the asta CLI:
PLUGIN_VERSION=0.17.1
if [ "$(asta --version 2>/dev/null | grep -oE '[0-9]+\.[0-9]+\.[0-9]+')" != "$PLUGIN_VERSION" ]; then
  uv tool install --force git+https://github.com/allenai/asta-plugins.git@v$PLUGIN_VERSION
fi
Prerequisites: Python 3.11+ and uv package manager
Quick Command Reference
Add the --json flag to any command for machine-readable output.
All commands use .asta/documents/index.yaml by default (add --index-path <file> for custom locations).
# List documents (optionally filtered by tags)
asta documents list
asta documents list --tags="ai,research"
# Search a single field
asta documents search --summary="query"
asta documents search --name="title words"
asta documents search --tags="ai,nlp"
asta documents search --extra=".year > 2020"
# Combine fields: intersection by default, --union for any match
asta documents search --summary="transformers" --tags="ai"
asta documents search --summary="transformers" --name="BERT" --union
# Add a document
asta documents add <url> \
  --name="Title" \
  --summary="Description" \
  --tags="tag1,tag2" \
  --extra='{"author": "Smith et al", "year": 2024, "venue": "NeurIPS"}'
# Get metadata for one document
asta documents get <uuid>
# Update metadata
asta documents update <uuid> \
  --name="New Title" \
  --tags="new,tags"
# Fetch document content to a file
asta documents fetch <uuid> -o /tmp/document.pdf
# Add or remove tags
asta documents add-tags <uuid> --tags="new,tags"
asta documents remove-tags <uuid> --tags="old,tags"
# Inspect and clean the cache
asta documents cache list
asta documents cache stats
asta documents cache clean --days 7
asta documents show
Always use the command line interface for all operations to ensure proper index management and caching.
Avoid direct read/write operations on the index file.
Working with Remote Indexes (asta:// URLs)
Asta documents can reference remote indexes using the asta:// URL scheme. This allows sharing document collections hosted on the web.
URL Format:
asta://{url-encoded-index-url}/{uuid}
Where:
{url-encoded-index-url} is the URL-encoded URL to the remote index.yaml file
{uuid} is the 10-character document identifier
Example:
# Actual index URL: https://example.com/research/index.yaml
# Asta URL: asta://https%3A%2F%2Fexample.com%2Fresearch%2Findex.yaml/6MNxGbWGRC
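Going the other way, an asta:// URL can be assembled from an index URL and a UUID by percent-encoding the index URL. This is a minimal sketch under the format described above; make_asta_url is a hypothetical helper name, not an asta command, and it relies on python3 being available.

```shell
# Hypothetical helper: build an asta:// URL from an index URL and a document UUID.
make_asta_url() {
  index_url=$1
  uuid=$2
  # safe="" also encodes ":" and "/", so the index URL becomes a single path segment
  encoded=$(python3 -c 'import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1], safe=""))' "$index_url")
  echo "asta://${encoded}/${uuid}"
}

make_asta_url "https://example.com/research/index.yaml" "6MNxGbWGRC"
```

For the index URL and UUID used in the example above, this reproduces the same asta:// URL.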
Workflow:
When you encounter an asta:// URL, follow these steps:
- Parse the URL to extract the encoded index URL and document UUID
- URL-decode the index URL
- Download the remote index to a local temporary file
- Access documents using the --index-path parameter
Example:
ASTA_URL="asta://https%3A%2F%2Fexample.com%2Fresearch%2Findex.yaml/6MNxGbWGRC"
# Split into the encoded index URL and the document UUID
ENCODED_INDEX_URL=$(echo "$ASTA_URL" | sed 's|^asta://||' | sed 's|/[^/]*$||')
UUID=$(echo "$ASTA_URL" | sed 's|.*/||')
# Decode via argv so quotes in the URL cannot break the Python snippet
INDEX_URL=$(python3 -c 'import sys, urllib.parse; print(urllib.parse.unquote(sys.argv[1]))' "$ENCODED_INDEX_URL")
# -f makes curl exit non-zero on HTTP errors instead of saving the error page
curl -fsS -o /tmp/remote-index.yaml "$INDEX_URL"
asta documents get "$UUID" --index-path /tmp/remote-index.yaml
asta documents fetch "$UUID" --index-path /tmp/remote-index.yaml -o /tmp/document.pdf
Common Operations with Remote Indexes:
asta documents search --summary="query" --index-path "$TEMP_INDEX"
asta documents list --index-path "$TEMP_INDEX"
asta documents get "$UUID" --index-path "$TEMP_INDEX"
asta documents search --summary="transformers" --index-path "$TEMP_INDEX" --show-scores
asta documents fetch "$UUID" --index-path "$TEMP_INDEX" -o result.pdf
Important Notes:
- The --index-path parameter works with all read commands (list, search, get, fetch)
- Remote indexes accessed this way are read-only (no add/update/remove operations)
- Downloaded indexes can be cached locally to avoid repeated downloads
- The index URL portion is URL-encoded and must be decoded before use
- The decoded URL supports: http://, https://, file://, s3://, gs://
- Always validate the index file exists and is valid YAML before using it
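The last note above can be sketched as a small check run before any --index-path command. This is an illustration only: validate_index is a hypothetical helper name, and the YAML parse step assumes PyYAML is importable from python3 (it is skipped when it is not).

```shell
# Hypothetical helper: return 0 only if the file looks like a usable index.
validate_index() {
  index_file=$1
  # The file must exist and be non-empty
  if [ ! -s "$index_file" ]; then
    echo "Index missing or empty: $index_file" >&2
    return 1
  fi
  # Parse as YAML when PyYAML is available (assumption: it may not be installed)
  if python3 -c 'import yaml' 2>/dev/null; then
    python3 -c 'import sys, yaml; yaml.safe_load(open(sys.argv[1]))' "$index_file" 2>/dev/null || {
      echo "Index is not valid YAML: $index_file" >&2
      return 1
    }
  fi
  return 0
}
```

A typical use would be validate_index "$TEMP_INDEX" || exit 1 right after the download step.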
Complete Example Workflow:
ASTA_URL="asta://https%3A%2F%2Fai.example.org%2Fpapers%2Findex.yaml/AbC123XyZ9"
# Split into the encoded index URL and the document UUID
ENCODED_INDEX_URL=$(echo "$ASTA_URL" | sed 's|^asta://||' | sed 's|/[^/]*$||')
UUID=$(echo "$ASTA_URL" | sed 's|.*/||')
# Decode via argv so quotes in the URL cannot break the Python snippet
INDEX_URL=$(python3 -c 'import sys, urllib.parse; print(urllib.parse.unquote(sys.argv[1]))' "$ENCODED_INDEX_URL")
TEMP_INDEX="/tmp/asta-index-$(date +%s).yaml"
# -f makes curl fail on HTTP errors; -s on the file also rejects an empty download
if ! curl -fsS -o "$TEMP_INDEX" "$INDEX_URL" || [ ! -s "$TEMP_INDEX" ]; then
  echo "Failed to download index from $INDEX_URL" >&2
  exit 1
fi
asta documents get "$UUID" --index-path "$TEMP_INDEX"
asta documents fetch "$UUID" --index-path "$TEMP_INDEX" -o /tmp/paper.pdf
Fetch Document Content
The index stores metadata only. The content of a document is retrievable via its URL. The fetch command retrieves the content and caches it locally for future use.
Fetch to file (with automatic caching):
asta documents fetch <uuid> -o /tmp/document.pdf
Supported URL Protocols
The system supports multiple protocols for document URLs:
Local and Web:
http:// and https:// - Web URLs (uses curl)
file:// - Local file system (uses curl)
Cloud Storage:
s3:// - Amazon S3 (requires AWS CLI)
gs:// - Google Cloud Storage (requires gsutil)
Cloud Storage Setup:
For S3:
# Install with Homebrew...
brew install awscli
# ...or with pip
pip install awscli
# Configure credentials
aws configure
For GCS:
brew install --cask google-cloud-sdk
gcloud auth login
Examples:
asta documents add s3://my-bucket/papers/research.pdf \
--name="Research Paper" \
--summary="ML research findings" \
--tags="ml,research"
asta documents add gs://my-bucket/docs/spec.pdf \
--name="Technical Spec" \
--summary="System specifications" \
--tags="docs"
asta documents fetch <uuid> -o local-copy.pdf
Cache Management
List cached items:
asta documents cache list
Show cache statistics:
asta documents cache stats
Clean old cache entries:
asta documents cache clean --days 14
Clear entire cache:
# Prompts for confirmation
asta documents cache clear
# Skips the confirmation prompt
asta documents cache clear -y
Show specific item details:
asta documents cache info <hash>
Common Workflows
Workflow 1: Index Asta-Generated Documents
ls -la .asta/
REPORT_PATH="$(pwd)/.asta/literature/report/literature-report.md"
asta documents add "file://${REPORT_PATH}" \
--name="Literature Report: Transformers" \
--summary="Research findings on transformer architectures" \
--tags="asta-generated,literature-report,transformers" \
--mime-type="text/markdown"
asta documents search --tags="asta-generated"
Workflow 2: Add and Organize Papers
asta documents add https://arxiv.org/pdf/1706.03762.pdf \
--name="Attention Is All You Need" \
--summary="Seminal paper introducing Transformer architecture" \
--tags="ai,research,nlp,transformers" \
--mime-type="application/pdf" \
--extra='{"author": "Vaswani et al", "year": 2017, "venue": "NeurIPS"}'
asta documents search --tags="transformers"
Workflow 3: Search and Fetch
asta documents search --summary="transformer architecture" --show-scores
asta documents get 6MNxGbWGRC
asta documents fetch 6MNxGbWGRC -o /tmp/paper.pdf -q
Workflow 4: Search with JSON Processing
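RESULTS=$(asta documents search --summary="query" --json)
# UUID of the top result, or empty when nothing matched
UUID=$(echo "$RESULTS" | python3 -c "import sys,json; results=json.load(sys.stdin); print(results[0]['result']['uuid'] if results else '')")
# Skip the fetch when the search returned no results
[ -n "$UUID" ] && asta documents fetch "$UUID" -o result.pdf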
Workflow 5: Bulk Tag Management
DOCS=$(asta documents list --tags="old-tag" --json)
for uuid in $(echo "$DOCS" | python3 -c "import sys,json; print('\\n'.join([d['uuid'] for d in json.load(sys.stdin)]))"); do
  asta documents remove-tags "$uuid" --tags="old-tag"
  asta documents add-tags "$uuid" --tags="new-tag"
done
Workflow 6: Update Multiple Fields
asta documents get 6MNxGbWGRC
asta documents update 6MNxGbWGRC \
--name="Updated Title" \
--summary="Updated summary with more details" \
--tags="updated,revised,2025"
Workflow 7: Cache Maintenance
asta documents cache stats
asta documents cache list
asta documents cache clean --days 7
asta documents cache stats
Field-Specific Search
Asta uses different search strategies optimized for each document field. You can search single fields or combine multiple fields with intersection/union modes.
Single Field Search
--summary (Summary search):
- Uses best available method automatically:
  - Hybrid (BM25 + semantic embeddings) → best quality
  - BM25 (keyword relevance ranking) → fast indexed
  - FTS5 (full-text search) → fallback
  - Simple (substring matching) → always available
- Optimized for natural language queries
- Understands semantic meaning
- Produces relevance scores for ranking
- Example:
asta documents search --summary="papers about transformers"
--name (Name search):
- Simple case-insensitive word matching
- Splits query into words, matches any word in name
- Score = (matched words / total query words)
- Fast, no indexing needed
- Produces match scores for ranking
- Example:
asta documents search --name="Attention"
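The name score described above can be illustrated with a short sketch. It assumes whole-word, case-insensitive matching on whitespace-separated words; the CLI's actual tokenization may differ, and name_score is a hypothetical function, not an asta command.

```shell
# Sketch of the --name score: matched query words / total query words.
# Assumption: words are whitespace-separated and compared case-insensitively.
name_score() {
  awk -v q="$1" -v n="$2" 'BEGIN {
    nq = split(tolower(q), qw, " ")   # query words
    nn = split(tolower(n), nw, " ")   # words in the document name
    for (i = 1; i <= nn; i++) in_name[nw[i]] = 1
    matched = 0
    for (i = 1; i <= nq; i++) if (qw[i] in in_name) matched++
    if (nq == 0) { printf "0.00\n" } else { printf "%.2f\n", matched / nq }
  }'
}

name_score "attention transformers" "Attention Is All You Need"
```

Here one of the two query words ("attention") appears in the name, so the sketch prints 0.50.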
--tags (Tag search):
- Comma-separated tag matching
- Case-insensitive
- Acts as a filter (no meaningful relevance scores)
- Finds documents with any matching tags
- Example:
asta documents search --tags="ai,nlp"
--extra (Extra metadata search):
- JSONPath-like query syntax
- Supported operators: >, >=, <, <=, ==, contains
- Numeric and string comparisons
- Acts as a filter (no meaningful relevance scores)
- Examples:
asta documents search --extra=".year > 2020"
asta documents search --extra=".author contains Smith"
asta documents search --extra=".venue == NeurIPS"
Multi-Field Search
Combine multiple field queries to create powerful filtered searches:
Intersection mode (default):
- Returns documents matching ALL specified field queries
- Example:
asta documents search --summary="transformers" --tags="ai"
- Only returns documents where the summary matches "transformers" AND the tags include "ai"
Union mode (--union flag):
- Returns documents matching ANY specified field query
- Example:
asta documents search --summary="transformers" --name="BERT" --union
- Returns documents where the summary matches "transformers" OR the name matches "BERT"
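The difference between the two modes is plain set intersection versus set union over the documents each field query matches. The sketch below only illustrates those semantics with standard tools; the UUID lists are made-up inputs (Zz9yX8wV7u is hypothetical), and how the CLI combines results internally is not specified here.

```shell
# Hypothetical inputs: UUIDs matched by each field query, one per line, sorted.
summary_hits=$(mktemp)
tag_hits=$(mktemp)
printf '%s\n' 6MNxGbWGRC AbC123XyZ9 | sort > "$summary_hits"
printf '%s\n' AbC123XyZ9 Zz9yX8wV7u | sort > "$tag_hits"

# Intersection (default mode): UUIDs present in both lists
intersection=$(comm -12 "$summary_hits" "$tag_hits")
# Union (--union): UUIDs present in either list, de-duplicated
union=$(sort -u "$summary_hits" "$tag_hits")

echo "intersection:"
printf '%s\n' "$intersection"
echo "union:"
printf '%s\n' "$union"
rm -f "$summary_hits" "$tag_hits"
```

Only AbC123XyZ9 is in both lists, so it is the sole intersection result, while the union contains all three UUIDs.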
Hierarchical Scoring:
Results are sorted using a priority hierarchy where tags/extra act as filters:
- Summary score (highest priority) - if --summary present
  - Uses semantic/hybrid search relevance
  - Best for natural language queries
- Name score (medium priority) - if --name present
  - Uses word-matching score
  - Used when no summary query
- Created timestamp (lowest priority) - if only --tags or --extra
  - Sorts by creation time (newest first)
  - Only used when no summary/name queries
Examples:
asta documents search --summary="machine learning" --tags="ai"
asta documents search --name="Python" --tags="programming"
asta documents search --tags="research"
asta documents search --summary="transformers" --name="Attention" --extra=".year > 2015"
Output Formats
Human-readable (default):
- Formatted tables and lists
- Color-coded (if terminal supports)
- Progress messages
JSON (--json flag):
- Machine-readable
- All fields included
- For scripting and integration
Verbose (-v flag for list):
- Shows all metadata fields
- Includes extra metadata
- Full URIs and timestamps
Best Practices
- Auto-index Asta documents: Always index documents written to .asta/ by other skills (uses .asta/documents/index.yaml by default)
- Use descriptive summaries: They're indexed for search
- Tag consistently: Establish a tagging scheme (e.g., "asta-generated" for auto-indexed docs)
- Use extra metadata: Store author, year, venue for papers
- Let fetch handle caching: Don't manually check cache
- Use JSON for scripting: More reliable than parsing text
- Use quiet mode in scripts: -q suppresses progress messages
- Use absolute paths for file:// URLs: Convert relative paths with $(pwd)/ to ensure correct resolution
Troubleshooting
"asta-documents: command not found"
- The command should auto-install on first use
- Verify installation: uv tool list | grep asta
- Add to PATH: export PATH="$HOME/.local/bin:$PATH"
- Manual install (same repository and version as the Installation section): uv tool install --force git+https://github.com/allenai/asta-plugins.git@v0.17.1
"Document not found"
- Verify UUID: asta documents list --json | grep <partial-uuid>
- Check namespace: UUIDs are namespace-specific
- Ensure there is an index file at .asta/documents/index.yaml
"Fetch failed"
- Check URL is accessible: curl -I <url>
- Try force refresh: --force
- Check network connection
"Search returns no results"
- Try simpler query terms
- Search by name or tags for exact matching:
asta documents search --name="keyword"
asta documents search --tags="tag"
- Check if documents exist: asta documents list
- Try union mode if using multiple fields: --union
"Cache is large"
- Check size: asta documents cache stats
- Clean old entries: asta documents cache clean --days 7
- Clear if needed: asta documents cache clear -y
Updating
Update the asta CLI (same repository as in the Installation section):
uv tool install --reinstall git+https://github.com/allenai/asta-plugins.git