| name | asta-documents |
| description | Local document metadata index for scientific documents |
Use this skill when the user asks to store a document "in Asta" or retrieve "from Asta". Use it when the
user references an "Asta document" or anything with an asta:// URI.
This skill provides complete document management functionality for tracking research papers, documentation, and resources using the asta-documents CLI.
What it does: Track document metadata (URLs, summaries, tags) in a local index. Think of it as a smart bookmark manager with powerful search capabilities.
# Install globally using uv
uv tool install git+https://github.com/allenai/asta-resource-repo.git
Prerequisites: Python 3.10+ and uv package manager
Verify installation with asta-documents --help
Add --json flag to any command for machine-readable output.
# List documents
asta-documents list
asta-documents list --tags="ai,research"
# Search documents (by field)
asta-documents search --summary="query"
asta-documents search --name="title words"
asta-documents search --tags="ai,nlp"
asta-documents search --extra=".year > 2020"
# Multi-field search (intersection - matches ALL)
asta-documents search --summary="transformers" --tags="ai"
# Multi-field search (union - matches ANY)
asta-documents search --summary="transformers" --name="BERT" --union
# Add document
asta-documents add <url> --name="Title" --summary="Description" --tags="tag1,tag2" --extra='{"author": "Smith et al", "year": 2024, "venue": "NeurIPS"}'
# Get document metadata
asta-documents get <uuid>
# Update document
asta-documents update <uuid> --name="New Title" --tags="new,tags"
# Fetch document content
asta-documents fetch <uuid> -o /tmp/document.pdf
# Manage tags
asta-documents add-tags <uuid> --tags="new,tags"
asta-documents remove-tags <uuid> --tags="old,tags"
# Cache management
asta-documents cache list
asta-documents cache stats
asta-documents cache clean --days 7
# Summary information (document counts)
asta-documents show
Always use the command line interface for all operations to ensure proper index management and caching. Avoid direct read/write operations on the index file.
Asta documents can reference remote indexes using the asta:// URL scheme. This allows sharing document collections hosted on the web.
URL Format:
asta://{url-encoded-index-url}/{uuid}
Where:
{url-encoded-index-url} is the URL-encoded URL of the remote index.yaml file
{uuid} is the 10-character document identifier
Example:
# Actual index URL: https://example.com/research/index.yaml
# Asta URL: asta://https%3A%2F%2Fexample.com%2Fresearch%2Findex.yaml/6MNxGbWGRC
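The format above can be split with plain shell parameter expansion. The following is a minimal sketch, not part of the asta-documents CLI: parse_asta_url and the variables ASTA_ENCODED_INDEX_URL and ASTA_UUID are hypothetical names chosen for illustration. It rejects input that lacks the asta:// prefix or whose final path segment is not 10 characters long.

```shell
# parse_asta_url is a hypothetical helper, not provided by asta-documents.
# On success it sets ASTA_ENCODED_INDEX_URL and ASTA_UUID (global variables).
parse_asta_url() {
  url=$1
  case "$url" in
    asta://*/*) ;;
    *) echo "not an asta:// URL: $url" >&2; return 1 ;;
  esac
  rest=${url#asta://}                 # drop the scheme prefix
  ASTA_ENCODED_INDEX_URL=${rest%/*}   # everything before the last slash
  ASTA_UUID=${rest##*/}               # everything after the last slash
  if [ -z "$ASTA_ENCODED_INDEX_URL" ]; then
    echo "missing index URL in: $url" >&2
    return 1
  fi
  # The format description says the UUID is 10 characters long.
  if [ "${#ASTA_UUID}" -ne 10 ]; then
    echo "unexpected UUID length: $ASTA_UUID" >&2
    return 1
  fi
}

parse_asta_url "asta://https%3A%2F%2Fexample.com%2Fresearch%2Findex.yaml/6MNxGbWGRC" &&
  printf '%s\n%s\n' "$ASTA_ENCODED_INDEX_URL" "$ASTA_UUID"
```

The encoded index URL still has to be URL-decoded before it can be downloaded.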
Workflow:
When you encounter an asta:// URL, follow these steps:
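Parse the URL to extract the encoded index URL and the UUID
URL-decode the index URL
Download the remote index to a temporary directory
Run commands against the downloaded index using the --root parameter
Example: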
# Given an asta:// URL
ASTA_URL="asta://https%3A%2F%2Fexample.com%2Fresearch%2Findex.yaml/6MNxGbWGRC"
# 1. Parse the URL components (extract encoded index URL and UUID)
ENCODED_INDEX_URL=$(echo "$ASTA_URL" | sed 's|^asta://||' | sed 's|/[^/]*$||')
UUID=$(echo "$ASTA_URL" | sed 's|.*/||')
# 2. URL-decode the index URL
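# (the value is passed as an argument so quotes in the URL cannot break the Python code)
INDEX_URL=$(python3 -c "import sys, urllib.parse; print(urllib.parse.unquote(sys.argv[1]))" "$ENCODED_INDEX_URL")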
# 3. Download the remote index (-f makes curl fail on HTTP errors instead of saving an error page)
mkdir -p /tmp/asta-remote
curl -fsS -o /tmp/asta-remote/index.yaml "$INDEX_URL"
# 4. Get document metadata using --root
asta-documents get "$UUID" --root /tmp/asta-remote
# 5. Fetch document content
asta-documents fetch "$UUID" --root /tmp/asta-remote -o /tmp/document.pdf
Common Operations with Remote Indexes:
# After downloading and decoding the index URL (see examples above)
# Assume TEMP_ROOT points to the directory containing the downloaded index.yaml
# Search remote index
asta-documents search --summary="query" --root "$TEMP_ROOT"
# List all documents in remote index
asta-documents list --root "$TEMP_ROOT"
# Get metadata for specific document
asta-documents get "$UUID" --root "$TEMP_ROOT"
# Search and fetch from remote index
asta-documents search --summary="transformers" --root "$TEMP_ROOT" --show-scores
asta-documents fetch "$UUID" --root "$TEMP_ROOT" -o result.pdf
Important Notes:
The --root parameter works with all read commands (list, search, get, fetch)
--root takes the directory that contains the downloaded index.yaml file
Complete Example Workflow:
# User provides: asta://https%3A%2F%2Fai.example.org%2Fpapers%2Findex.yaml/AbC123XyZ9
# Step 1: Extract components
ASTA_URL="asta://https%3A%2F%2Fai.example.org%2Fpapers%2Findex.yaml/AbC123XyZ9"
ENCODED_INDEX_URL=$(echo "$ASTA_URL" | sed 's|^asta://||' | sed 's|/[^/]*$||')
UUID=$(echo "$ASTA_URL" | sed 's|.*/||')
# Step 2: URL-decode the index URL
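# (the value is passed as an argument so quotes in the URL cannot break the Python code)
INDEX_URL=$(python3 -c "import sys, urllib.parse; print(urllib.parse.unquote(sys.argv[1]))" "$ENCODED_INDEX_URL")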
# Result: https://ai.example.org/papers/index.yaml
# Step 3: Download index to temp location
TEMP_ROOT="/tmp/asta-remote-$(date +%s)"
mkdir -p "$TEMP_ROOT"
curl -fsS -o "$TEMP_ROOT/index.yaml" "$INDEX_URL"
# Step 4: Verify download succeeded (-f makes curl fail on HTTP errors; -s tests for a non-empty file)
if [ ! -s "$TEMP_ROOT/index.yaml" ]; then
  echo "Failed to download index from $INDEX_URL" >&2
  exit 1
fi
# Step 5: Access the document
asta-documents get "$UUID" --root "$TEMP_ROOT"
asta-documents fetch "$UUID" --root "$TEMP_ROOT" -o /tmp/paper.pdf
# Step 6: Read the content
# Read(/tmp/paper.pdf)
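The example above names its temporary directory with date +%s, which can collide when two runs start within the same second, and it leaves the directory behind afterwards. A minimal alternative sketch, assuming mktemp is available (it is on typical Linux and macOS systems), creates a unique directory and removes it when the shell exits:

```shell
# Create a unique temporary root for the downloaded index.
TEMP_ROOT=$(mktemp -d "${TMPDIR:-/tmp}/asta-remote.XXXXXX") || exit 1

# Remove the directory, including any downloaded index.yaml, when this shell exits.
trap 'rm -rf "$TEMP_ROOT"' EXIT

echo "Using temporary root: $TEMP_ROOT"
```

Because the directory is deleted on exit, fetch output should be written somewhere outside "$TEMP_ROOT" if it needs to persist.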
The index stores metadata only. The content of a document is retrievable via its URL. The fetch command retrieves the content and caches it locally for future use.
Fetch to file (with automatic caching):
asta-documents fetch <uuid> -o /tmp/document.pdf
The system supports multiple protocols for document URLs:
Local and Web:
http:// and https:// - Web URLs (uses curl)
file:// - Local file system (uses curl)
Cloud Storage:
s3:// - Amazon S3 (requires AWS CLI)
gs:// - Google Cloud Storage (requires gsutil)
Cloud Storage Setup:
For S3:
# Install AWS CLI
brew install awscli # macOS
pip install awscli # or via pip
# Configure credentials
aws configure
# Or use: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_PROFILE
For GCS:
# Install Google Cloud SDK
brew install --cask google-cloud-sdk # macOS
# Authenticate
gcloud auth login
# Or use: GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
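Since each URL scheme relies on a different external program, a preflight check can catch a missing tool before a fetch fails. The sketch below maps a URL to the program named in the protocol list above and checks that it is on PATH. required_tool and check_tool are hypothetical helper names, not part of the asta-documents CLI; aws and gsutil are the standard executable names for the AWS CLI and gsutil.

```shell
# required_tool prints the external program this document associates with a URL scheme.
required_tool() {
  case "$1" in
    http://*|https://*|file://*) echo curl ;;
    s3://*) echo aws ;;
    gs://*) echo gsutil ;;
    *) echo "scheme not listed in this document: $1" >&2; return 1 ;;
  esac
}

# check_tool reports whether the required program is available on PATH.
check_tool() {
  tool=$(required_tool "$1") || return 1
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: missing" >&2
    return 1
  fi
}

required_tool "s3://my-bucket/papers/research.pdf"
```

Calling check_tool with a document URL before add or fetch turns a missing dependency into an explicit message.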
Examples:
# Add document from S3
asta-documents add s3://my-bucket/papers/research.pdf \
--name="Research Paper" \
--summary="ML research findings" \
--tags="ml,research"
# Add document from GCS
asta-documents add gs://my-bucket/docs/spec.pdf \
--name="Technical Spec" \
--summary="System specifications" \
--tags="docs"
# Fetch works the same for all protocols
asta-documents fetch <uuid> -o local-copy.pdf
List cached items:
asta-documents cache list
Show cache statistics:
asta-documents cache stats
Clean old cache entries:
# Remove items older than N days
asta-documents cache clean --days 14
Clear entire cache:
asta-documents cache clear
asta-documents cache clear -y # Skip confirmation
Show specific item details:
asta-documents cache info <hash>
# Add research paper
asta-documents add https://arxiv.org/pdf/1706.03762.pdf \
--name="Attention Is All You Need" \
--summary="Seminal paper introducing Transformer architecture" \
--tags="ai,research,nlp,transformers" \
--mime-type="application/pdf" \
--extra='{"author": "Vaswani et al", "year": 2017, "venue": "NeurIPS"}'
# Search papers by tag
asta-documents search --tags="transformers"
# Search for relevant documents
asta-documents search --summary="transformer architecture" --show-scores
# Get metadata for top result (using UUID from search results)
asta-documents get 6MNxGbWGRC
# Fetch content
asta-documents fetch 6MNxGbWGRC -o /tmp/paper.pdf -q
# Read with PDF support
# Read(/tmp/paper.pdf)
# Search and extract UUIDs
RESULTS=$(asta-documents search --summary="query" --json)
# Get first UUID (example with Python)
UUID=$(echo "$RESULTS" | python3 -c "import sys,json; results=json.load(sys.stdin); print(results[0]['result']['uuid'] if results else '')")
# Fetch that document
asta-documents fetch "$UUID" -o result.pdf
# List documents with old tag
DOCS=$(asta-documents list --tags="old-tag" --json)
# For each, remove old tag and add new
for uuid in $(echo "$DOCS" | python3 -c "import sys,json; print('\\n'.join([d['uuid'] for d in json.load(sys.stdin)]))"); do
asta-documents remove-tags "$uuid" --tags="old-tag"
asta-documents add-tags "$uuid" --tags="new-tag"
done
# Get current metadata (using UUID)
asta-documents get 6MNxGbWGRC
# Update multiple fields
asta-documents update 6MNxGbWGRC \
--name="Updated Title" \
--summary="Updated summary with more details" \
--tags="updated,revised,2025"
# Check cache usage
asta-documents cache stats
# List what's cached
asta-documents cache list
# Remove old entries if cache is large
asta-documents cache clean --days 7
# Verify cache reduction
asta-documents cache stats
Asta uses different search strategies optimized for each document field. You can search single fields or combine multiple fields with intersection/union modes.
--summary (Summary search):
asta-documents search --summary="papers about transformers"
--name (Name search):
asta-documents search --name="Attention"
--tags (Tag search):
asta-documents search --tags="ai,nlp"
--extra (Extra metadata search):
Operators: >, >=, <, <=, ==, contains
asta-documents search --extra=".year > 2020"
asta-documents search --extra=".author contains Smith"
asta-documents search --extra=".venue == NeurIPS"
Combine multiple field queries to create powerful filtered searches:
Intersection mode (default):
asta-documents search --summary="transformers" --tags="ai"
Union mode (--union flag):
asta-documents search --summary="transformers" --name="BERT" --union
Hierarchical Scoring:
Results are sorted using a priority hierarchy where tags/extra act as filters:
Summary score (highest priority) - if --summary present
Name score (medium priority) - if --name present
Created timestamp (lowest priority) - if only --tags or --extra
Examples:
# Summary + tags: Sorted by summary relevance (tags filter)
asta-documents search --summary="machine learning" --tags="ai"
# Name + tags: Sorted by name word-match (tags filter)
asta-documents search --name="Python" --tags="programming"
# Tags only: Sorted by creation timestamp
asta-documents search --tags="research"
# Three fields: Summary ranks, name and extra filter
asta-documents search --summary="transformers" --name="Attention" --extra=".year > 2015"
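The priority hierarchy can be summarized as a small decision function. This is only an illustration of the rule as stated above, not the tool's actual implementation; sort_key is a hypothetical helper that inspects the flags of a search invocation and prints which field would rank the results. How --union affects ranking is not described here, so the sketch ignores it.

```shell
# sort_key prints the field that ranks results for the given search flags:
# summary if --summary is present, else name if --name is present, else created.
sort_key() {
  key=created
  for arg in "$@"; do
    case "$arg" in
      --summary|--summary=*) echo summary; return 0 ;;  # highest priority wins immediately
      --name|--name=*) key=name ;;                      # medium priority, unless --summary appears
    esac
  done
  echo "$key"
}

sort_key --summary="machine learning" --tags="ai"
sort_key --name="Python" --tags="programming"
sort_key --tags="research"
```

The three calls mirror the first three examples above and print summary, name, and created respectively.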
Human-readable (default):
JSON (--json flag):
Verbose (-v flag for list):
-q suppresses progress messages
"asta-documents: command not found"
uv tool list | grep asta
export PATH="$HOME/.local/bin:$PATH"
uv tool install --reinstall git+https://github.com/allenai/asta-resource-repo.git
"Document not found"
asta-documents list --json | grep <partial-uri>
.asta/documents/index.yaml
"Fetch failed"
curl -I <url>
--force
"Search returns no results"
asta-documents search --name="keyword"
asta-documents search --tags="tag"
asta-documents list
--union
"Cache is large"
asta-documents cache stats
asta-documents cache clean --days 7
asta-documents cache clear -y
Update the CLI tool:
uv tool install --reinstall git+https://github.com/allenai/asta-resource-repo.git
Update the skill:
curl -o ~/.claude/skills/asta-documents.md https://raw.githubusercontent.com/allenai/asta-resource-repo/main/skills/asta-documents.md