| name | multimodal-medical-imaging |
| description | Analyzes medical images (X-ray, MRI, CT) using multimodal LLMs to identify anomalies and generate reports. |
Multimodal Medical Imaging Analysis
The Multimodal Medical Imaging Analysis Skill leverages state-of-the-art Vision-Language Models (VLMs) like Gemini 1.5 Pro and GPT-4o to interpret medical imagery alongside clinical text.
When to Use This Skill
- When you need a preliminary screening of medical images.
- When correlating visual findings with textual clinical notes.
- To generate structured reports (DICOM-SR-like) from raw images.
Core Capabilities
- Anomaly Detection: Identify potential pathologies in X-rays, CTs, etc.
- Report Generation: Draft radiology reports in standard formats.
- VQA (Visual Question Answering): Answer specific questions about an image (e.g., "Is there a fracture in the left femur?").
Workflow
- Input: Provide an image file path (JPG, PNG) and a specific clinical question or "generate report" instruction.
- Analyze: The agent sends the image and prompt to the VLM.
- Output: Returns a JSON object with findings, confidence scores, and reasoning.
Example Usage
User: "Analyze this chest X-ray for pneumonia."
Agent Action:
python3 Skills/Clinical/Medical_Imaging/Multimodal_Analysis/multimodal_agent.py \
--image "/path/to/cxr.jpg" \
--prompt "Check for signs of pneumonia and consolidation."