multimedia-backend-integrator
// Reference guide for adding new media generation backends to MassGen's unified generate_media tool.
| name | multimedia-backend-integrator |
| description | Reference guide for adding new media generation backends to MassGen's unified generate_media tool. |
Reference guide for adding new media generation backends to MassGen's unified generate_media tool.
_base.py -- Registration: API keys, default models, priority lists
_selector.py -- Auto-selection logic: picks best backend by key + priority
_image.py -- Image backends: OpenAI, Google (Gemini/Imagen), Grok, OpenRouter
_video.py -- Video backends: Grok, Google Veo, OpenAI Sora
_audio.py -- Audio backends: ElevenLabs, OpenAI TTS
generate_media.py -- Entry point: routing, validation, batch mode, image-to-image
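As a rough illustration of how the registration data in _base.py and the auto-selection in _selector.py could fit together, here is a minimal sketch. The three registry names and has_api_key come from this guide; the concrete entries, the MediaType members, the environment variable names, the optional env parameter, and the select_backend helper are assumptions made for illustration, not MassGen's actual code.

```python
import os
from enum import Enum


class MediaType(Enum):  # hypothetical stand-in for MassGen's media type enum
    IMAGE = "image"
    VIDEO = "video"
    AUDIO = "audio"


# Backend name -> environment variable(s) that may hold its API key (assumed names).
BACKEND_API_KEYS = {
    "openai": ["OPENAI_API_KEY"],
    "new_backend": ["NEW_BACKEND_API_KEY"],
}

# Backend name -> default model per supported media type (placeholder model names).
DEFAULT_MODELS = {
    "openai": {MediaType.IMAGE: "example-image-model"},
    "new_backend": {MediaType.IMAGE: "new-backend-image-v1"},
}

# Media type -> backends in order of preference.
BACKEND_PRIORITY = {
    MediaType.IMAGE: ["openai", "new_backend"],
}


def has_api_key(backend, env=None):
    """Return True if any env var registered for the backend is set and non-empty."""
    env = os.environ if env is None else env
    return any(env.get(name) for name in BACKEND_API_KEYS.get(backend, []))


def select_backend(media_type, env=None):
    """Pick the first backend in priority order that has an API key available."""
    for backend in BACKEND_PRIORITY.get(media_type, []):
        if has_api_key(backend, env):
            return backend
    return None
```

Under these assumptions, the position of "new_backend" in the priority list decides whether it wins when several keys are present, which is why the insertion position matters.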
In _base.py:
- BACKEND_API_KEYS: map backend name to env var(s)
- DEFAULT_MODELS: map backend name to {MediaType: model_name} for each supported type
- BACKEND_PRIORITY: insert at correct position per media type

In _image.py / _video.py / _audio.py:
- Add the import for the SDK at module top
- Implement _generate_{media}_{backend}(config) -> GenerationResult
- Map config.* fields to SDK parameters
- Write the output to config.output_path
- Return a GenerationResult with metadata
- Add elif backend == "new_backend": in the media type's generate_{media}() function

In generate_media.py:
- Update the selected_backend not in (...) check in _generate_single_with_input_images

Auto-selection:
- Add elif has_api_key("new_backend"): in the auto-selection chain

Documentation:
- TOOL.md: Add env var to frontmatter, backend to tables, keywords
- generate_media.py docstring: Update backend_type list and Supported Backends

Each backend that supports iterative editing needs a continuation mechanism:
| Backend | Store Type | Key Format | What's Stored | How Continuation Works |
|---|---|---|---|---|
| OpenAI | Stateless (server-side) | response.id | Nothing locally | Pass previous_response_id to next call |
| Gemini | _GeminiChatStore (in-memory) | gemini_chat_{uuid12} | (client, chat) tuples | Reuse chat object for send_message(); client kept alive to prevent HTTP connection GC |
| Grok | _GrokImageStore (in-memory) | grok_img_{uuid12} | Base64 strings | Pass stored base64 as image_url data URI |
import uuid
from collections import OrderedDict
from typing import Any


class _NewBackendStore:
    def __init__(self, max_items: int = 50):
        self._store: OrderedDict[str, Any] = OrderedDict()
        self._max = max_items

    def save(self, data: Any) -> str:
        store_id = f"prefix_{uuid.uuid4().hex[:12]}"
        if len(self._store) >= self._max:
            # Evict the oldest entry (insertion order; get() does not refresh recency)
            self._store.popitem(last=False)
        self._store[store_id] = data
        return store_id

    def get(self, store_id: str) -> Any | None:
        return self._store.get(store_id)


_store = _NewBackendStore()
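The template above evicts in insertion order, because get() never changes an entry's position. If least-recently-used behavior is wanted, one possible variant refreshes recency on read with OrderedDict.move_to_end. The following is a sketch only; the class name _LRUStore and the "prefix_" key prefix are placeholders rather than names from MassGen.

```python
import uuid
from collections import OrderedDict
from typing import Any, Optional


class _LRUStore:
    """Bounded in-memory store that evicts the least recently used entry."""

    def __init__(self, max_items: int = 50):
        self._store: OrderedDict[str, Any] = OrderedDict()
        self._max = max_items

    def save(self, data: Any) -> str:
        store_id = f"prefix_{uuid.uuid4().hex[:12]}"
        if len(self._store) >= self._max:
            self._store.popitem(last=False)  # drop the least recently used entry
        self._store[store_id] = data
        return store_id

    def get(self, store_id: str) -> Optional[Any]:
        if store_id not in self._store:
            return None
        self._store.move_to_end(store_id)  # mark as most recently used
        return self._store[store_id]


store = _LRUStore(max_items=2)
first = store.save("a")
second = store.save("b")
store.get(first)         # touching "a" makes "b" the eviction candidate
third = store.save("c")  # the store is full, so "b" is evicted
```

Whether recency should be refreshed on read depends on how continuation IDs are used; for short editing sessions the simpler insertion-order eviction may be sufficient.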
- Wrap synchronous SDK calls with asyncio.to_thread() if needed
- duration or default treats 0 as falsy; use if duration is not None
- The _generate_single_with_input_images function has a backend allowlist

| File | Purpose |
|---|---|
| massgen/tool/_multimodal_tools/generation/_base.py | API keys, default models, priorities |
| massgen/tool/_multimodal_tools/generation/_selector.py | Backend auto-selection logic |
| massgen/tool/_multimodal_tools/generation/_image.py | Image generation backends |
| massgen/tool/_multimodal_tools/generation/_video.py | Video generation backends |
| massgen/tool/_multimodal_tools/generation/_audio.py | Audio generation backends |
| massgen/tool/_multimodal_tools/generation/generate_media.py | Entry point and routing |
| massgen/tool/_multimodal_tools/TOOL.md | User-facing documentation |
| massgen/tests/test_grok_multimedia_generation.py | Reference: Grok backend tests |
| massgen/tests/test_grok_multimedia_backend_selection.py | Reference: Grok selection tests |
| massgen/tests/test_multimodal_image_backend_selection.py | Reference: image selection tests |
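Two of the pitfalls noted in this guide, the `duration or default` fallback and synchronous SDK calls inside async code, can be illustrated with a small sketch. DEFAULT_DURATION, resolve_duration, blocking_sdk_call, and generate are hypothetical names used only for this example; they are not part of MassGen or of any provider SDK.

```python
import asyncio
import time
from typing import Optional

DEFAULT_DURATION = 5  # hypothetical default, in seconds


def resolve_duration(duration: Optional[int]) -> int:
    """Fall back to the default only when no duration was given at all."""
    # `duration or DEFAULT_DURATION` would silently turn an explicit 0 into 5.
    return duration if duration is not None else DEFAULT_DURATION


def blocking_sdk_call(prompt: str) -> str:
    """Stand-in for a synchronous SDK call that would block the event loop."""
    time.sleep(0.01)
    return f"generated:{prompt}"


async def generate(prompt: str) -> str:
    # Run the blocking call in a worker thread so other coroutines keep running.
    return await asyncio.to_thread(blocking_sdk_call, prompt)


result = asyncio.run(generate("sunset"))
```

asyncio.to_thread is available from Python 3.9; on older versions, loop.run_in_executor serves the same purpose.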