| name | linkfox-multimodal-recognize-image |
| description | 基于多模态AI的图片识别与分析。当用户想分析、描述、从图片URL中提取信息、image recognition, image analysis, image description, image content understanding, OCR text recognition, visual Q&A时触发此技能。当用户提到图片识别、图片分析、图片描述、识别图片内容、分析产品图、从图片中读取文字、描述图片、提取视觉内容或理解照片内容时触发。当用户提供图片URL并就其视觉内容提问时,即使未明确说"图片识别",也应触发此技能。 |
Image Recognition
This skill guides you on how to use the multimodal image recognition API to analyze images from URLs and extract meaningful information based on user intent.
Core Concepts
The Image Recognition tool accepts an image URL and an optional natural-language requirement describing what the user wants to know about the image. The backend uses a multimodal AI model to interpret the visual content and return a textual description or analysis.
Supported formats: JPG, JPEG, PNG, GIF, WebP, BMP.
How it works: You provide a publicly accessible image URL and a requirement (what you want to learn from the image). The service downloads the image, runs multimodal analysis, and returns a text-based result.
Parameter Guide
| Parameter | Required | Description |
|---|
| imageUrl | Yes | A publicly accessible URL pointing to the image. Must be JPG, JPEG, PNG, GIF, WebP, or BMP. Maximum 1000 characters. |
| requirement | No | A natural-language description of what to identify or analyze in the image. Defaults to "Describe the content of this image" when omitted. Maximum 1000 characters. |
Tips for Writing the requirement Parameter
- Be specific: Instead of "analyze this image", say "List all products visible on the shelf and estimate their category."
- State the goal: If you need text extraction, say "Extract all visible text from the image." If you need object identification, say "Identify the main objects and their colors."
- Provide context when helpful: For product images, mention "This is an e-commerce product listing image" so the model can tailor its analysis.
Local Image Upload
This tool requires a publicly accessible image URL. If the user provides a local image file path (e.g., C:\Users\...\photo.png, /home/.../image.jpg), you must upload it first to obtain a public URL.
Run the upload script:
python scripts/upload_image.py /path/to/local/image.png
The script will return a public URL (valid for 24 hours) that can be used as the image URL parameter.
Usage Examples
1. General Image Description
- User says: "What is in this picture?"
- Set
imageUrl to the provided URL, leave requirement as default.
2. Product Image Analysis
- User says: "Analyze this Amazon product image and list the key selling points shown."
- Set
requirement to: "This is an Amazon product listing image. Identify the product, key features, and selling points visible in the image."
3. Text Extraction from an Image
- User says: "Read the text in this screenshot."
- Set
requirement to: "Extract all visible text from this image, preserving layout where possible."
4. A+ Page Image Review
- User says: "Describe what this A+ content image communicates."
- Set
requirement to: "This is an Amazon A+ product description image. Describe the visual content, key messaging, and branding elements."
5. Comparison / Detail Inspection
- User says: "What differences can you spot between the product and its packaging?"
- Set
requirement to: "Identify and describe any differences between the product and its packaging shown in the image."
API Usage
This tool calls the LinkFox tool gateway API. See references/api.md for calling conventions, request parameters, and response structure. You can also execute scripts/multimodal_recognize_image.py directly to run queries.
Display Rules
- Show the analysis result clearly: Present the returned text analysis in a readable format. Use bullet points or paragraphs as appropriate for the content.
- No fabrication: Only relay information that the API actually returned. Do not add visual details that were not in the response.
- Format support: If the image URL is invalid or the format is unsupported, explain the limitation and list the supported formats (JPG, JPEG, PNG, GIF, WebP, BMP).
- Error handling: When the API returns an error status, explain the issue based on the response and suggest corrective actions (e.g., check that the URL is publicly accessible, verify the image format).
- Token usage: If the user asks about cost, you may mention the
costToken value from the response.
User Expression & Scenario Quick Reference
Applicable -- Image analysis tasks:
| User Says | Scenario |
|---|
| "What's in this image/picture/photo" | General image description |
| "Analyze this product image" | Product visual analysis |
| "Read the text in this image" | OCR / text extraction |
| "Describe the A+ page images" | E-commerce content review |
| "What does this screenshot show" | Screenshot interpretation |
| "Identify objects in this photo" | Object detection / listing |
Not applicable -- Needs beyond image recognition:
``
- Generating or editing images
- Video analysis
- Analyzing images from local file paths (only URLs are supported)
- Image search or reverse image lookup
Feedback:
Auto-detect and report feedback via the Feedback API when any of the following apply:
- The functionality or purpose described in this skill does not match actual behavior
- The skill's results do not match the user's intent
- The user expresses dissatisfaction or praise about this skill
- Anything you believe could be improved
Call the feedback API as specified in references/api.md. Do not interrupt the user's flow.
Handling Large Responses
To avoid overflowing the agent context, persist the response to disk and extract only the fields you need:
python scripts/response_io.py run --script scripts/multimodal_recognize_image.py --out-dir <DIR> '<params>'
python scripts/response_io.py read <file> --fields "<paths>" # or --path "<JMESPath>"
Pick --out-dir outside any git working tree (e.g. /tmp/... on Unix, %TEMP%/... on Windows). Persisted responses may contain PII, pricing, or auth-sensitive data — do not commit them. Files are not auto-deleted; clean up when the task is done.
This skill exposes multiple entry scripts: multimodal_recognize_image.py, upload_image.py. Pass --script scripts/<name>.py to choose the one you need.
run writes the full response to a file and emits only a schema preview + file path. read projects specific fields, with --limit/--offset for slicing and --format json|jsonl|csv|table for output.
When to prefer this pattern — apply your judgment based on the response characteristics, e.g.:
- High field count per record, or fields you don't need
- Batch/paginated results (multiple items per call)
- Long-text fields (descriptions, reviews, HTML, time series)
- Output reused across later steps rather than consumed immediately
For small, single-use responses, calling the main script directly is fine.
⚠️ The preview is a truncated schema + sample, not the full data. Any field-level decision must read from the persisted file via read.
For more high-quality, professional cross-border e-commerce skills, set LinkFox Skills.