
Name: Read Media
Author: massgen


Updated: March 22, 2026, 00:49

SKILL.md


name	read-media
description	Analyze media files (images, video, audio) using AI vision and audio models with a critical-first lens. Use for output-first verification — render your deliverable, then read_media to see what it actually looks like. Distinguishes fundamental issues from surface-level fixes.

Read Media — Critical-First Analysis

Analyze media files with read_media, a tool provided by the media-tools MCP server. It is the primary tool for output-first verification: experiencing your work as a user would, rather than only reading the code.

Critical-First Philosophy

The vision model is instructed to be a critical reviewer by default:

"You are reviewing this work critically. Be honest about what you see — name specific problems, explain their impact, and distinguish between issues that need a fundamental rethink vs issues that are easy fixes. Don't sugarcoat."

When no prompt is provided, the default is:

"Analyze this {media_type}. What works, what's broken, and what would a demanding user complain about? Be specific and critical."

Responses include a foundation_sound assessment that flags whether the approach itself needs rethinking or incremental fixes are sufficient. When foundation_sound is false, treat it as a signal to consider TRANSFORMATIVE changes rather than more polish.
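A minimal sketch of acting on this flag. It assumes the read_media result can be read as a dict with a boolean `foundation_sound` key; the exact response shape is an assumption, and `next_step` is a hypothetical helper name:

```python
from typing import Any, Mapping


def next_step(result: Mapping[str, Any]) -> str:
    """Map a read_media review to a follow-up strategy (hypothetical helper).

    Assumes `result` exposes a boolean "foundation_sound" key. Only an explicit
    False triggers a rethink; a missing key defaults to incremental fixes.
    """
    if result.get("foundation_sound", True) is False:
        # The approach itself is flagged: rethink structure, layout, or concept.
        return "transform"
    # Foundation is sound: fix the specific issues the review named.
    return "polish"
```

The design choice is to escalate only on an explicit `false`, so a polish loop never silently turns into a rewrite because a field was absent.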

You don't need to add "be critical" to your prompts — it's the default. Focus your prompt on what to look at.

Quick Start

```python
# Analyze a screenshot
read_media(prompt="What flaws or layout issues do you see?", file_paths=["screenshot.png"])

# Compare before and after
read_media(prompt="Compare these two versions. What improved and what regressed?",
           file_paths=["v1.png", "v2.png"])

# Check a video
read_media(prompt="Is the animation smooth? Any glitches?", file_paths=["recording.mp4"])

# Verify audio output
read_media(prompt="Is the speech clear and natural?", file_paths=["output.mp3"])
```

Parameters

Parameter	Description	Example
`prompt`	What to analyze — focus on what to look at	`"Evaluate the visual hierarchy and spacing"`
`file_paths`	List of file paths to analyze together	`["before.png", "after.png"]`
`continue_from`	conversation_id from a previous call for follow-ups	`"conv_abc123"`
`max_concurrent`	Max parallel analyses (default 4)	`4`
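One way to keep calls consistent is to mirror the parameter table as a typed structure. This is a hypothetical sketch, not the media-tools API: the class name `ReadMediaRequest` and the rule that a request needs either `file_paths` or `continue_from` are assumptions; the optional prompt and the default of 4 come from this page.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ReadMediaRequest:
    """Hypothetical typed mirror of the read_media parameters table."""

    prompt: Optional[str] = None  # None: the tool falls back to its critical default prompt
    file_paths: List[str] = field(default_factory=list)  # analyzed together in one call
    continue_from: Optional[str] = None  # conversation_id from an earlier call
    max_concurrent: int = 4  # documented default

    def validate(self) -> None:
        # Assumption: a request needs new media or a conversation to continue.
        if not self.file_paths and self.continue_from is None:
            raise ValueError("provide file_paths or continue_from")
        if self.max_concurrent < 1:
            raise ValueError("max_concurrent must be at least 1")
```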

Before/After Comparison

All files are sent to the model in a SINGLE call — the model sees them side-by-side. This is critical for comparison: if you send files in separate calls, the model cannot compare them.

```python
# CORRECT: both files in one call, so the model sees both
read_media(prompt="Compare before and after. What improved, what regressed?",
           file_paths=["v1.png", "v2.png"])

# WRONG: separate calls; the model has no memory of the first
read_media(prompt="Analyze this", file_paths=["v1.png"])
read_media(prompt="Now compare to previous", file_paths=["v2.png"])  # can't compare!
```

For follow-up questions on the same media, use continue_from:

```python
# First call
result = read_media(prompt="Evaluate the design", file_paths=["page.png"])
# result contains a conversation_id (shown below as the placeholder "conv_abc123")

# Follow-up: the model still has the image from the first call
read_media(prompt="Now focus on the typography. Is the hierarchy clear?",
           continue_from="conv_abc123")
```

Supported Formats and Backends

Images

Formats: png, jpg, jpeg, gif, webp, bmp

Backend	Default Model	API Key
Google Gemini (priority 1)	`gemini-3.1-pro-preview`	`GOOGLE_API_KEY` / `GEMINI_API_KEY`
OpenAI (priority 2)	`gpt-5.4`	`OPENAI_API_KEY`
Claude (priority 3)	`claude-sonnet-4-5-20250929`	`ANTHROPIC_API_KEY`
Grok (priority 4)	`grok-4.20-0309-reasoning`	`XAI_API_KEY`
OpenRouter (priority 5)	`openai/gpt-5.2`	`OPENROUTER_API_KEY`

Video

Formats: mp4, mov, avi, mkv, webm, gif, flv, wmv

Backend	Default Model	Method	API Key
Google Gemini (priority 1)	`gemini-3.1-pro-preview`	Native video (no frame extraction)	`GOOGLE_API_KEY`
OpenAI (priority 2)	`gpt-5.4`	Frame extraction (8 frames default)	`OPENAI_API_KEY`
Claude (priority 3)	`claude-sonnet-4-5-20250929`	Frame extraction	`ANTHROPIC_API_KEY`

Audio

Formats: mp3, wav, m4a, ogg, flac, aac

Backend	Default Model	Mode	API Key
Google Gemini (priority 1)	`gemini-3.1-pro-preview`	Native audio understanding	`GOOGLE_API_KEY`
OpenAI (priority 2)	`gpt-4o-audio-preview`	Rich analysis (tone, emotion, pacing)	`OPENAI_API_KEY`

The backend is selected automatically: the first backend in priority order whose API key is available is used.
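A minimal sketch of that selection rule, using the extensions, default models, and API keys from the tables above. The function names `media_type` and `pick_backend` are hypothetical, and resolving gif (listed under both images and video) as an image is an assumption:

```python
import os
from pathlib import Path
from typing import Dict, List, Mapping, Optional, Tuple

# Extensions copied from the format lists above.
FORMATS: Dict[str, set] = {
    "image": {"png", "jpg", "jpeg", "gif", "webp", "bmp"},
    "video": {"mp4", "mov", "avi", "mkv", "webm", "gif", "flv", "wmv"},
    "audio": {"mp3", "wav", "m4a", "ogg", "flac", "aac"},
}

# (backend, default model, API key env vars), highest priority first.
Backend = Tuple[str, str, Tuple[str, ...]]
BACKENDS: Dict[str, List[Backend]] = {
    "image": [
        ("gemini", "gemini-3.1-pro-preview", ("GOOGLE_API_KEY", "GEMINI_API_KEY")),
        ("openai", "gpt-5.4", ("OPENAI_API_KEY",)),
        ("claude", "claude-sonnet-4-5-20250929", ("ANTHROPIC_API_KEY",)),
        ("grok", "grok-4.20-0309-reasoning", ("XAI_API_KEY",)),
        ("openrouter", "openai/gpt-5.2", ("OPENROUTER_API_KEY",)),
    ],
    "video": [
        ("gemini", "gemini-3.1-pro-preview", ("GOOGLE_API_KEY",)),
        ("openai", "gpt-5.4", ("OPENAI_API_KEY",)),
        ("claude", "claude-sonnet-4-5-20250929", ("ANTHROPIC_API_KEY",)),
    ],
    "audio": [
        ("gemini", "gemini-3.1-pro-preview", ("GOOGLE_API_KEY",)),
        ("openai", "gpt-4o-audio-preview", ("OPENAI_API_KEY",)),
    ],
}


def media_type(path: str) -> str:
    """Classify a file by extension; checking "image" first sends gif there."""
    ext = Path(path).suffix.lower().lstrip(".")
    for kind in ("image", "video", "audio"):
        if ext in FORMATS[kind]:
            return kind
    raise ValueError(f"unsupported format: {path}")


def pick_backend(path: str, env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Return (backend, model) for the first backend whose API key is set."""
    env = os.environ if env is None else env
    kind = media_type(path)
    for name, model, keys in BACKENDS[kind]:
        if any(env.get(key) for key in keys):
            return name, model
    raise RuntimeError(f"no API key available for {kind} analysis")
```

For example, with only `OPENAI_API_KEY` set, `pick_backend("shot.png")` returns `("openai", "gpt-5.4")`, while a video file with only `XAI_API_KEY` set has no eligible backend.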

Verification by Deliverable Type

Match evidence to how the output is experienced. Classify by what happens when a user opens it, not by file extension:

What does it do?	Shallow (incomplete)	Full check (required)
Stays still (image, PDF, document)	File generates	Render and view every page/section with read_media
Moves (animation, video)	Single frame	Record video, review full motion sequence
Responds to input (website, app)	Screenshot looks good	Use it — click buttons, navigate, test states; screenshot each state
Produces output (script, API)	Runs without error	Test with varied inputs, capture output
Makes sound (audio, TTS)	File exists	Listen via read_media; don't just check that the file exists

Coverage Check Before Diagnosis

Before concluding something is broken:

Capture the full artifact — all pages, sections, states (not just one viewport)
Scroll through long pages and multi-state flows
Check for capture artifacts: blank regions may be timing/iframe/canvas issues
If the code suggests more output than was captured, fix the capture first
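When a full-page capture is unreliable (lazy-loaded sections, sticky headers, canvas content), one option is to take viewport-sized screenshots while scrolling. A minimal sketch of computing scroll positions that cover the whole page; the helper name `scroll_offsets` and its `overlap` parameter are assumptions, not part of read_media:

```python
from typing import List


def scroll_offsets(page_height: int, viewport_height: int, overlap: int = 0) -> List[int]:
    """Y offsets for viewport captures that cover every pixel of the page.

    Consecutive captures advance by (viewport_height - overlap); the last
    capture is aligned to the bottom of the page so nothing is skipped.
    """
    if viewport_height <= 0 or not 0 <= overlap < viewport_height:
        raise ValueError("need viewport_height > 0 and 0 <= overlap < viewport_height")
    if page_height <= viewport_height:
        return [0]
    step = viewport_height - overlap
    offsets = list(range(0, page_height - viewport_height, step))
    offsets.append(page_height - viewport_height)  # bottom-aligned final capture
    return offsets
```

With Playwright for Python, for example, each offset could be applied via `page.evaluate("y => window.scrollTo(0, y)", y)` before `page.screenshot(path=...)`; all resulting files then go to read_media in one call.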

Prompts by Domain

Focus on what to look at — the critical lens is automatic:

Domain	Prompt
Website/landing page	`"Assess visual hierarchy, spacing, typography, and responsive layout. What feels generic?"`
Generated image	`"Does this match the brief? What's off about composition, color, or detail?"`
Chart/diagram	`"Is the data clearly communicated? What's misleading or hard to read?"`
Video/animation	`"Is the motion smooth and intentional? Any artifacts, jumps, or timing issues?"`
Audio/TTS	`"Is the speech clear, natural, and well-paced? Any distortion or pronunciation errors?"`
Before/after	`"What improved? What regressed? Be specific per dimension."`
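A small sketch that keeps these prompts in one place and makes the documented fallback explicit. The domain keys and the `prompt_for` name are hypothetical; the prompt texts and the default template are copied from this page. Omitting `prompt` entirely lets the tool apply the same default on its own.

```python
from typing import Optional

# Domain prompts copied from the table above; the critical lens is added by the tool.
DOMAIN_PROMPTS = {
    "website": "Assess visual hierarchy, spacing, typography, and responsive layout. What feels generic?",
    "generated_image": "Does this match the brief? What's off about composition, color, or detail?",
    "chart": "Is the data clearly communicated? What's misleading or hard to read?",
    "video": "Is the motion smooth and intentional? Any artifacts, jumps, or timing issues?",
    "audio": "Is the speech clear, natural, and well-paced? Any distortion or pronunciation errors?",
    "before_after": "What improved? What regressed? Be specific per dimension.",
}

# The documented default used when no prompt is provided.
DEFAULT_PROMPT = ("Analyze this {media_type}. What works, what's broken, and what would "
                  "a demanding user complain about? Be specific and critical.")


def prompt_for(domain: Optional[str], media_type: str) -> str:
    """Return the domain prompt, or the default template filled with media_type."""
    if domain in DOMAIN_PROMPTS:
        return DOMAIN_PROMPTS[domain]
    return DEFAULT_PROMPT.format(media_type=media_type)
```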

Capturing Screenshots for Verification

```shell
# One-time setup: download the browser that the Playwright CLI drives (Chromium by default)
npx -y playwright@latest install chromium

# Playwright (recommended)
npx -y playwright@latest screenshot http://localhost:8765 screenshot.png --full-page

# Multiple viewports
npx -y playwright@latest screenshot http://localhost:8765 desktop.png --viewport-size=1440,900
npx -y playwright@latest screenshot http://localhost:8765 mobile.png --viewport-size=375,812
```

Then: read_media(prompt="Evaluate at desktop and mobile sizes", file_paths=["desktop.png", "mobile.png"])
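To keep every viewport in a single read_media call, it can help to build the capture commands and the file list together. A minimal sketch; `capture_plan` is a hypothetical helper that only reproduces the Playwright CLI commands shown above:

```python
from typing import Dict, List, Tuple


def capture_plan(url: str,
                 viewports: Dict[str, Tuple[int, int]]) -> Tuple[List[List[str]], List[str]]:
    """Build one Playwright CLI screenshot command per viewport, plus the output files."""
    commands: List[List[str]] = []
    paths: List[str] = []
    for name, (width, height) in viewports.items():
        path = f"{name}.png"
        commands.append(["npx", "-y", "playwright@latest", "screenshot",
                         url, path, f"--viewport-size={width},{height}"])
        paths.append(path)
    return commands, paths
```

Each command can be run with `subprocess.run(cmd, check=True)`; passing the returned `paths` to one read_media call lets the model compare the viewports side by side.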

API Key Check

At session start, availability status is written to .massgen-quality/environment.json. Check capabilities.has_vision in that file to confirm read_media will work before calling it.
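A minimal sketch of that check. It assumes the file holds a JSON object with a nested `{"capabilities": {"has_vision": true}}` entry, which is how `capabilities.has_vision` reads but is not confirmed here; the `has_vision` function name is hypothetical:

```python
import json
from pathlib import Path


def has_vision(root: str = ".") -> bool:
    """Report whether .massgen-quality/environment.json says vision is available.

    A missing or unreadable file, malformed JSON, or a missing key all count
    as "not available", so callers skip read_media instead of failing later.
    """
    env_file = Path(root) / ".massgen-quality" / "environment.json"
    try:
        data = json.loads(env_file.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return False
    caps = data.get("capabilities") if isinstance(data, dict) else None
    return isinstance(caps, dict) and caps.get("has_vision") is True
```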

Related skills (same repository)

evaluate.md

from "massgen/massgen-refinery"

Quick one-shot evaluation of work against quality criteria. Generates criteria, runs the round-evaluator, and presents the verdict. Lighter than /refine — does not auto-improve, just evaluates. Use for pre-PR quality checks or getting a critical assessment.


massgen-run.md

from "massgen/massgen-refinery"

Launch a MassGen multi-agent run. Multiple LLM backends (codex, gemini, claude, grok) collaborate on a task via voting and consensus. Runs in Docker containers by default.


refine.md

from "massgen/massgen-refinery"

Run a MassGen-style quality refinement loop. Generates eval criteria, produces/improves answers with subagent help, regression-guards before submission, then evaluates with round-evaluator and trace-analyzer in parallel. Invoke with /refine.


team-massgen-run.md

from "massgen/massgen-refinery"

Launch parallel MassGen step mode processes — the lead manages all background tasks directly, tracks answers/votes, detects consensus, and synthesizes the result.


package.json

"author": "massgen"

"repository": "massgen/massgen-refinery"

