gemini-multimodal
This skill should be used when the user asks to analyze a video, process images, transcribe audio, read or summarize a PDF, extract text from a screenshot, convert a diagram to code, or perform any other visual analysis. It is relevant when the user says things like "transcribe this audio file," "what's in this video," or "turn this diagram into code."