Discover, compare, and run AI models using Replicate's API. Use this skill whenever the task involves AI-generated media — images (text-to-image, style transfer, editing, upscaling, background removal), video, audio, or any other ML model output. Requires a REPLICATE_API_TOKEN — ask the user for it if not already set.
Use for distinctive production-grade websites, landing pages, and interactive web experiences with strong design and QA discipline.
Transcribe any audio or video file to text using Whisper (Groq or OpenAI). Use when the agent receives voice messages, audio files, video messages, or any media with speech. Triggers on: 'transcribe', 'what does this say', 'voice message', 'speech to text', 'audio', any file path ending in .ogg .mp3 .mp4 .wav .webm .m4a .flac .oga .oga
Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction.
Control desktop applications on the user's machine using agent-click. Use when you need to click buttons, type text, read screens, scroll, drag files, move/resize windows, open/quit apps, interact with UI elements, or automate desktop workflows. Triggers: 'click on', 'open app', 'type into', 'scroll down', 'drag file', 'take screenshot', 'read the screen', 'interact with UI', 'desktop automation', 'computer use', 'agent-click'. Built on agent-click (https://github.com/kortix-ai/agent-click, https://www.agent-click.dev/) — an open-source computer use CLI by Kortix. Right now only works on macOS.
Create, manage, validate, preview, and export HTML presentation slides (1920x1080). Load this skill when you need to build a slide deck, export to PDF/PPTX, or preview slides in a browser.
Interact with the user's local machine via Agent Tunnel. Use when you need to read/write files on their computer, run shell commands locally, take screenshots, click, type, control the mouse/keyboard, manage windows and apps, read/write the clipboard, or use the accessibility tree to interact with UI elements. Triggers: 'access my machine', 'local file', 'on my computer', 'my desktop', 'take a screenshot of my screen', 'run this on my machine', 'click on', 'open an app', 'accessibility tree'.
Deep research agent skill. Use when the user needs thorough, scientific, truth-seeking research on any topic -- investigating claims, finding primary sources, synthesizing evidence, producing cited reports. Triggers on: 'research this', 'investigate', 'deep dive', 'find sources', 'what does the evidence say', 'literature review', 'fact check', 'analyze the research on', any request requiring multi-source investigation with citations.