| name | ai-tools |
| description | Command-line tools that delegate analysis tasks to AI models. Includes image description, screenshot comparison, smart cropping around people, token counting, essay generation from text, boolean condition evaluation, context gathering, and Android UI interaction via popper. Use for describing images, comparing UI states, cropping photos around faces, counting tokens, generating reports, evaluating conditions, gathering context for analysis, automating Android apps, testing Wear OS, or any task requiring AI inference. Triggers: ai analysis, describe image, compare screenshots, smart crop, crop around people, face crop, count tokens, token count, generate essay, evaluate condition, alt text, image description, UI comparison, visual diff, satisfies condition, boolean evaluation, gemini, context, gather context, research topic, android ui, adb, uiautomator2, popper, automate app, test wear os.
|
| compatibility | Requires curl, jq, and uv. Image tools also need base64 and magick (ImageMagick). Needs GEMINI_API_KEY environment variable and network access to generativelanguage.googleapis.com.
|
AI Tools
Important: Use Scripts First
ALWAYS prefer the scripts in scripts/ over raw curl API calls. Scripts
are located in the scripts/ subdirectory of this skill's folder. They provide
features that raw commands do not:
- Proper image encoding (WebP conversion, alpha removal)
- Appropriate model selection for each task
- Structured output handling (boolean responses via exit codes)
- Meaningful exit codes for shell integration
When to read the script source: If a script doesn't do exactly what you
need, or fails due to missing dependencies, read the script source. The scripts
encode Gemini API best practices (image ordering, structured output schemas,
model selection) that may not be obvious, so use them as a reference when building
similar functionality.
Quick Start
Environment: Set GEMINI_API_KEY before running any commands.
Dependencies: curl, jq, uv (all tools); base64, magick (image
tools only)
scripts/context gemini-api | scripts/emerson "Explain the key features"
scripts/screenshot-describe screenshot.png
scripts/screenshot-compare before.png after.png
scripts/photo-smart-crop photo.jpg cropped.jpg
scripts/photo-has-people photo.jpg
scripts/emerson "Summarize the key changes" < documentation.md
echo "Hello world" | scripts/satisfies "is a greeting"
cat document.md | scripts/token-count
scripts/popper "start an exercise"
Script Overview
context
Gathers authoritative, up-to-date context for deep research on various technical
topics (e.g., gemini-api, mcp, home-assistant). Run with --list to see
all available topics. This script should be your first tool for gathering
background knowledge or the latest documentation for an unfamiliar domain.
Warning: Output can be very large. Do not read output directly into your
conversation history. Pipe to emerson for analysis, or redirect to a file to
search/read locally.
scripts/context TOPIC
Options: --list (list available topics)
Exit codes: 0 success, 1 error, 127 missing dependency
Examples:
scripts/context --list
scripts/context gemini-api > gemini-context.xml
scripts/context gemini-cli | scripts/emerson "How do commands work?"
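Since the output can be very large, one option is to write it to a file and print only its size. The following is a minimal sketch, not part of the skill: `save_context` and the `SCRIPTS_DIR` variable are hypothetical names introduced for illustration.

```shell
# SCRIPTS_DIR is an assumed variable pointing at this skill's scripts/ folder.
SCRIPTS_DIR="${SCRIPTS_DIR:-scripts}"

# Hypothetical helper: save a topic's context to a file and print only its size,
# so the full output never reaches the conversation.
save_context() {
  # Usage: save_context TOPIC OUTFILE
  "$SCRIPTS_DIR/context" "$1" > "$2" || return $?
  # wc -c counts bytes; tr strips the padding some wc implementations add.
  bytes=$(wc -c < "$2" | tr -d ' ')
  echo "saved $bytes bytes to $2"
}
```

Usage: `save_context gemini-api gemini-context.xml`, then search the file locally or pass it to emerson on stdin.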
screenshot-describe
Generate concise alt-text for an image. Optimized for UI captures.
scripts/screenshot-describe IMAGE [PROMPT]
Exit codes: 0 success, 1 error, 127 missing dependency
screenshot-compare
Compare two images for visual differences. Identifies layout shifts, color
changes, padding, and text updates.
scripts/screenshot-compare IMAGE1 IMAGE2 [PROMPT]
Exit codes: 0 differences found, 1 error, 2 images identical, 127 missing
dependency
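Because exit code 0 means differences were found and 2 means the images are identical, a plain `if` cannot tell "identical" apart from "error". The following is a minimal sketch of dispatching on the documented codes. `classify_compare` and `SCRIPTS_DIR` are hypothetical names, not part of the skill.

```shell
# SCRIPTS_DIR is an assumed variable pointing at this skill's scripts/ folder.
SCRIPTS_DIR="${SCRIPTS_DIR:-scripts}"

# Hypothetical helper: map screenshot-compare's documented exit codes to labels.
classify_compare() {
  # Usage: classify_compare IMAGE1 IMAGE2
  # The script's own report is discarded so only the label is printed.
  rc=0
  "$SCRIPTS_DIR/screenshot-compare" "$1" "$2" >/dev/null || rc=$?
  case $rc in
    0)   echo "different" ;;
    2)   echo "identical" ;;
    127) echo "missing-dependency" ;;
    *)   echo "error" ;;
  esac
}
```

Usage: `[ "$(classify_compare before.png after.png)" = "identical" ] && echo "No visual change"`.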
photo-smart-crop
Smart crop images around detected people with a specified aspect ratio.
Prioritizes faces, expands for headroom, enforces aspect ratio.
scripts/photo-smart-crop [--ratio W:H] INPUT OUTPUT
Options: --ratio W:H (default 5:3)
Exit codes: 0 success, 1 error (no people found, API error), 2 rate limited,
127 missing dependency
Examples:
scripts/photo-smart-crop family.jpg family-cropped.jpg
scripts/photo-smart-crop --ratio 16:9 portrait.jpg thumbnail.jpg
scripts/photo-smart-crop --ratio 1:1 headshot.png avatar.png
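Exit code 2 (rate limited) is the one failure worth retrying. The following is a minimal sketch of a retry loop with a doubling delay. `crop_with_retry` and `SCRIPTS_DIR` are hypothetical names, and the attempt count and delays are illustrative assumptions rather than values recommended by the skill.

```shell
# SCRIPTS_DIR is an assumed variable pointing at this skill's scripts/ folder.
SCRIPTS_DIR="${SCRIPTS_DIR:-scripts}"

# Hypothetical helper: retry photo-smart-crop while it reports exit code 2.
crop_with_retry() {
  # Usage: crop_with_retry INPUT OUTPUT [MAX_ATTEMPTS] [INITIAL_DELAY_SECONDS]
  input=$1
  output=$2
  max_attempts=${3:-4}
  delay=${4:-2}
  attempt=1
  while :; do
    rc=0
    "$SCRIPTS_DIR/photo-smart-crop" "$input" "$output" || rc=$?
    # Anything other than "rate limited" is final: success or a real error.
    [ "$rc" -eq 2 ] || return "$rc"
    # Give up once the attempt budget is spent, still reporting exit code 2.
    [ "$attempt" -lt "$max_attempts" ] || return 2
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```

Usage: `crop_with_retry family.jpg family-cropped.jpg`. Exit code 1 (no people found, API error) is returned immediately without a retry.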
photo-has-people
Detect if people feature prominently in a photo. Returns boolean via exit code.
scripts/photo-has-people IMAGE
Options: -q, --quiet (suppress output)
Exit codes: 0 true (has people), 1 false (no people), 127 missing dependency
Examples:
if scripts/photo-has-people photo.jpg; then
  echo "Found people"
fi
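The boolean exit code can gate other tools. The following is a minimal sketch that crops only when people are detected and treats any other exit code as a failure rather than as "no people". `crop_if_people` and `SCRIPTS_DIR` are hypothetical names.

```shell
# SCRIPTS_DIR is an assumed variable pointing at this skill's scripts/ folder.
SCRIPTS_DIR="${SCRIPTS_DIR:-scripts}"

# Hypothetical helper: run photo-smart-crop only when photo-has-people says yes.
crop_if_people() {
  # Usage: crop_if_people INPUT OUTPUT
  rc=0
  "$SCRIPTS_DIR/photo-has-people" --quiet "$1" || rc=$?
  case $rc in
    0) "$SCRIPTS_DIR/photo-smart-crop" "$1" "$2" ;;
    1) echo "skipped (no people): $1" ;;
    *) echo "photo-has-people failed with exit code $rc" >&2
       return "$rc" ;;
  esac
}
```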
emerson
Generate essay-length (~3000 words) analysis from text input. Produces
authoritative, footnoted Markdown. Can be combined with context to provide
rich background material.
scripts/emerson "PROMPT" < input.txt
Exit codes: 0 success, 1 error, 127 missing dependency
pascal
Ask a question and get a short, paragraph-style response (wrapped to 80
columns). Optimized for quick answers.
scripts/pascal "QUESTION"
Input: Optional context via stdin
Exit codes: 0 success, 1 error, 127 missing dependency
Examples:
scripts/pascal "What is the capital of Peru?"
cat article.md | scripts/pascal "Summarize this article"
scripts/pascal "Explain this code" < script.sh
satisfies
Evaluate whether input text satisfies a condition. Returns boolean via exit
code.
echo "text" | scripts/satisfies [-v|--verbose] "CONDITION"
Options: -v, --verbose (output "true" or "false" to stderr)
Exit codes: 0 true (satisfies), 1 false (does not satisfy), 127 missing
dependency
Examples:
cat file.txt | scripts/satisfies "mentions Elvis" && echo "Found it"
cat response.json | scripts/satisfies "is valid JSON with an 'id' field"
if cat log.txt | scripts/satisfies "contains error messages"; then
  echo "Errors detected"
fi
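An `if` or `&&` treats every non-zero exit code as false, so a missing dependency (127) looks the same as an unmet condition. The following is a minimal sketch that keeps the two cases apart. `check_condition` and `SCRIPTS_DIR` are hypothetical names.

```shell
# SCRIPTS_DIR is an assumed variable pointing at this skill's scripts/ folder.
SCRIPTS_DIR="${SCRIPTS_DIR:-scripts}"

# Hypothetical helper: print "true" or "false", or fail when satisfies
# itself could not produce an answer.
check_condition() {
  # Usage: some-command | check_condition "CONDITION"
  # stdin is passed straight through to satisfies.
  rc=0
  "$SCRIPTS_DIR/satisfies" "$1" || rc=$?
  case $rc in
    0) echo "true" ;;
    1) echo "false" ;;
    *) echo "satisfies could not run (exit code $rc)" >&2
       return "$rc" ;;
  esac
}
```

Usage: `cat log.txt | check_condition "contains error messages"`.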
token-count
Count tokens in text using the Gemini API.
cat file.txt | scripts/token-count
Exit codes: 0 success, 1 error, 127 missing dependency
popper
Interact with Android UIs using an AI agent powered by uiautomator2 and
Gemini. This allows semantic control of the device by providing a goal in
natural language.
scripts/popper "GOAL"
Options: --app-only (restrict the agent to the current application)
Environment: ANDROID_SERIAL (optional, target specific device)
Exit codes: 0 success (task completed), 1 error (task failed)
Examples:
scripts/popper "accept all permissions"
scripts/popper --app-only "start a running exercise"
env ANDROID_SERIAL=12345 scripts/popper "open settings"
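When several devices are attached, the same goal can be run on each one by setting ANDROID_SERIAL per invocation. The following is a minimal sketch. `popper_on_devices` and `SCRIPTS_DIR` are hypothetical names, and the serials are supplied by the caller.

```shell
# SCRIPTS_DIR is an assumed variable pointing at this skill's scripts/ folder.
SCRIPTS_DIR="${SCRIPTS_DIR:-scripts}"

# Hypothetical helper: run one goal on each listed device serial, in order.
popper_on_devices() {
  # Usage: popper_on_devices "GOAL" SERIAL...
  goal=$1
  shift
  failed=0
  for serial in "$@"; do
    # ANDROID_SERIAL is set only for this single popper invocation.
    if ANDROID_SERIAL="$serial" "$SCRIPTS_DIR/popper" "$goal"; then
      echo "$serial: completed"
    else
      echo "$serial: failed"
      failed=1
    fi
  done
  # Returns 1 if the task failed on any device, 0 otherwise.
  return "$failed"
}
```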
Image Encoding Notes
- Images converted to lossless WebP for consistent encoding
- Alpha channel removed (-alpha off) so transparency-only differences are
ignored
- Base64: use -w 0 (Linux) or -b 0 (macOS) for single-line output
- Single-image prompts: image before text (Gemini best practice)
- Multi-image comparison: text before images (Gemini best practice)
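For reference when building similar functionality, the encoding steps above can be sketched as follows. This is an illustration, not the scripts' actual implementation: `b64_oneline` and `encode_image` are hypothetical names, the exact ImageMagick flags the scripts use are an assumption, and newline stripping with `tr` is swapped in for the platform-specific `-w 0` and `-b 0` flags.

```shell
# Hypothetical helper: base64-encode stdin as a single line on both Linux and
# macOS by stripping newlines instead of choosing between -w 0 and -b 0.
b64_oneline() {
  base64 | tr -d '\n'
}

# Hypothetical helper: convert an image to lossless WebP with the alpha
# channel turned off, then print it as single-line base64.
# Assumes ImageMagick 7 (the magick command) built with WebP support.
encode_image() {
  # Usage: encode_image INPUT
  magick "$1" -alpha off -define webp:lossless=true webp:- | b64_oneline
}
```

Usage: `encode_image screenshot.png > screenshot.b64`.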
Safety Notes
- Scripts require network access to the Gemini API
- GEMINI_API_KEY must be set in the environment
- API calls may incur usage costs
- Large images increase request size and latency
- Scripts do not store or log input data
Reference Material