Generate game audio (music, voices, sound effects) via ElevenLabs.
npx skills add https://github.com/zeveck/audiogen --skill audiogen
Copy and paste this command into Claude Code to install the skill.
| name | audiogen |
| description | Generate game audio (music, voices, sound effects) via ElevenLabs. |
| disable-model-invocation | false |
| allowed-tools | Bash(node */generate.cjs *) |
| argument-hint | <music|voice|sfx|voices> <description> [--voice-id ID] [--length-ms MS] [--duration SEC] [--output PATH] |
Generate background music, character voice lines, and sound effects for games
via the ElevenLabs REST API. One CLI (generate.cjs), three modalities,
zero npm deps. Usable both as a Claude Code skill and as a standalone shell
CLI (e.g. from CI).
- A Node.js version that provides process.loadEnvFile. The CLI asserts
  this at entry and fails with a clear message on older Node.
- ELEVENLABS_API_KEY: get one at
  https://elevenlabs.io/app/settings/api-keys. Either export it in your
  shell, or add ELEVENLABS_API_KEY=... to a .env file at the project
  root. The CLI walks up from cwd and from the script directory looking
  for a .env, loading the first one found. Shell-exported values take
  precedence over .env values (Node's default; not an audiogen
  choice).
- Free-tier accounts are restricted to output_format=mp3_44100_64; see
  Edge Cases. Free-tier output requires attribution; paid-tier output
  comes with perpetual commercial rights.

# Background music: 30 seconds of orchestral tension
/audiogen music tense orchestral battle cue --length-ms 30000
# Voice line — must specify a voice
/audiogen voice "Halt, traveller! What brings you here?" --voice-id Rachel
# Sound effect — 2 seconds
/audiogen sfx heavy wooden door slam with reverb --duration 2
# Voice catalog browse
/audiogen voices --gender female --language en
All outputs default to assets/audio/{music,voice,sfx}/<slug>.mp3 in the
current working directory.
If the user invokes /audiogen with nothing after it, ask what they want
to generate before calling the CLI. Don't guess — the three modalities
are priced and shaped very differently. A good clarification: "Music
(background track), a voice line, or a sound effect? And how long should
it be?"
The first positional word is always a subcommand:
/audiogen music|voice|sfx|voices <rest...>
When the user's phrasing makes intent obvious, pick the subcommand yourself:
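- music: background tracks
- voice: spoken lines (TTS)
- sfx: sound effects
- voices: browse the voice catalog

When the phrasing is ambiguous (e.g. "a scream", which could be voice or sfx), ask.
voices is the catalog-browse command, not a generator.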
music: background tracks
/audiogen music <prompt...> [--length-ms MS] [--seed N]
                            [--force-instrumental]
                            [--output-format FMT] [--output PATH]
Generates one music clip via POST /v1/music.
- --length-ms: 3000–600000 (3 s to 10 min). Default 30000 (30 s).
- --seed: integer seed for reproducibility.
- --force-instrumental: suppress vocals.
- --output-format: mp3_*, pcm_*, opus_*, ulaw_8000, alaw_8000. Default
  mp3_44100_128. WAV formats (wav_*) are rejected because the music
  endpoint does not accept them.

Loop caveat. There is no native "seamless loop" flag for music. For
looping game BGM, loop in the engine / HTML, or build a
composition_plan (not exposed in this CLI). See Loop Caveat below.
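The music constraints above can be checked before any request is sent. The sketch below is a hypothetical validateMusicArgs helper, not code taken from generate.cjs: the function name, its return shape, and the integer requirement on --length-ms are assumptions, while the limits are copied from the flag list.

```javascript
// Hypothetical pre-flight check for `audiogen music` arguments.
// Limits come from the flag list above; the helper is an illustration only.
const MUSIC_MIN_MS = 3000;
const MUSIC_MAX_MS = 600000;
const MUSIC_FORMAT_PREFIXES = ["mp3_", "pcm_", "opus_"];
const MUSIC_EXACT_FORMATS = ["ulaw_8000", "alaw_8000"];

function validateMusicArgs({ lengthMs = 30000, outputFormat = "mp3_44100_128" } = {}) {
  const errors = [];
  // Assumption: the length must be a whole number of milliseconds.
  if (!Number.isInteger(lengthMs) || lengthMs < MUSIC_MIN_MS || lengthMs > MUSIC_MAX_MS) {
    errors.push(`--length-ms must be an integer in ${MUSIC_MIN_MS}-${MUSIC_MAX_MS}`);
  }
  // wav_* matches neither list, so it is reported as unsupported.
  const formatOk =
    MUSIC_EXACT_FORMATS.includes(outputFormat) ||
    MUSIC_FORMAT_PREFIXES.some((prefix) => outputFormat.startsWith(prefix));
  if (!formatOk) {
    errors.push(`--output-format ${outputFormat} is not accepted for music`);
  }
  return errors;
}

console.log(validateMusicArgs({ lengthMs: 45000 }));
console.log(validateMusicArgs({ lengthMs: 1000, outputFormat: "wav_44100" }));
```

Returning a list of messages rather than throwing on the first problem lets a caller report every bad flag in one pass.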
voice: TTS (text-to-speech)
/audiogen voice <text...> --voice-id <NAME_OR_ID>
                          [--language-code CODE]
                          [--stability N] [--similarity-boost N]
                          [--style N] [--speed N]
                          [--model-id ID] [--seed N]
                          [--output-format FMT] [--output PATH]
Generates one TTS clip via POST /v1/text-to-speech/{voice_id}.
- --voice-id: required. Accepts either an exact voice name from
  the local cache (case-insensitive), or a 20-char alphanumeric voice
  ID. If the name isn't cached, fall back to audiogen voices <query>
  to discover one.
- --model-id: default eleven_multilingual_v2. Others:
  eleven_flash_v2_5 (fast/cheap), eleven_turbo_v2_5, eleven_v3
  (expressive; best for emotional lines).
- --stability, --similarity-boost, --style, --speed: all
  optional voice_settings fields. Ranges 0–1 except --speed (0.5–2).
- --output-format: includes wav_* in addition to the usual set
  (WAV is supported for TTS but not for music or sfx).

sfx: sound effects
/audiogen sfx <prompt...> [--duration N] [--loop]
                          [--prompt-influence N]
                          [--output-format FMT] [--output PATH]
Generates one sound effect via POST /v1/sound-generation.
- --duration: 0.5 to 30 seconds. Optional; omit to let the API
  auto-pick a duration. The 30-second ceiling is a hard API limit; for
  longer ambience, generate a loopable clip and repeat in-engine.
- --loop: native API flag (requires the eleven_text_to_sound_v2
  model, which is the default). Produces a seamlessly loopable
  waveform.
- --prompt-influence: 0–1. Default 0.3. Higher = closer adherence
  to the prompt; lower = more creative latitude.
- --output-format: mp3_*, pcm_*, opus_*, ulaw_*, alaw_*.
  WAV rejected, same as music.

voices: browse the voice catalog
/audiogen voices [query...] [--accent STR] [--gender male|female]
                            [--language CODE] [--category STR]
                            [--limit N] [--page-size N]
                            [--json] [--refresh]
Lists voices from GET /v2/voices with client-side filters. Results
are cached at .audiogen-voices.json in the project root with a
24-hour TTL; stale caches are refreshed automatically. Use
--refresh to force a re-fetch.
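How staleness is decided is not spelled out here. One simple way to implement a 24-hour TTL is to compare the cache file's modification time with the current time. The sketch below assumes that approach; the helper name isCacheFresh and its options are hypothetical.

```javascript
const fs = require("node:fs");

const CACHE_TTL_MS = 24 * 60 * 60 * 1000; // 24-hour TTL from the text above

// Assumption: freshness is judged by file mtime. The real CLI may instead
// store a timestamp inside the JSON file.
function isCacheFresh(cachePath, { now = Date.now(), refresh = false } = {}) {
  if (refresh) return false; // --refresh always forces a re-fetch
  let stat;
  try {
    stat = fs.statSync(cachePath);
  } catch {
    return false; // no cache file yet
  }
  return now - stat.mtimeMs < CACHE_TTL_MS;
}

console.log(isCacheFresh(".audiogen-voices.json"));
```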
- query (optional positional): substring match on voice name,
  description, and labels.
- --accent: substring match on accent labels (e.g. british,
  american).
- --gender: male or female.
- --language: ISO code (en, ja, es, …).
- --category: e.g. premade, cloned, professional.
- --limit: cap the number of rows in the table view (default 50).
  --json ignores this.
- --json: emit raw JSON for scripting; table view otherwise.

The cache file is gitignored. It is not a source of truth (the
ElevenLabs catalog is), but it is the canonical resolver for --voice-id <name>.
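The name-or-ID rule for --voice-id can be sketched as a small resolver. This is an illustration under stated assumptions, not the CLI's actual code: the function name resolveVoiceId is hypothetical, the cache is assumed to be an array of { voice_id, name } objects, and the sample IDs are made up.

```javascript
// Hypothetical resolver for --voice-id. `cachedVoices` is assumed to be an
// array of { voice_id, name } objects; the real cache layout may differ.
const VOICE_ID_PATTERN = /^[A-Za-z0-9]{20}$/;

function resolveVoiceId(input, cachedVoices) {
  if (VOICE_ID_PATTERN.test(input)) return input; // looks like an ID: use as-is
  const wanted = input.toLowerCase();
  const matches = cachedVoices.filter((v) => v.name.toLowerCase() === wanted);
  if (matches.length === 1) return matches[0].voice_id;
  if (matches.length === 0) {
    throw new Error(`voice not found: ${input}`);
  }
  // Two or more voices share the name: list the candidates so the caller
  // can retry with an explicit ID.
  const ids = matches.map((v) => v.voice_id).join(", ");
  throw new Error(`duplicate voice name "${input}"; candidates: ${ids}`);
}

// Made-up sample data for the demonstration.
const sampleCache = [
  { voice_id: "A".repeat(20), name: "Rachel" },
  { voice_id: "B".repeat(20), name: "Adam" },
  { voice_id: "C".repeat(20), name: "Adam" },
];
console.log(resolveVoiceId("rachel", sampleCache));
```

One consequence of testing the ID pattern first is that a voice whose name happens to be exactly 20 alphanumeric characters could never be selected by name.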
Music and SFX prompts are free-form natural language. Write as if you were briefing a human foley artist or composer.
For quick starting points, see reference.md — it contains ~20
music presets, ~15 voice archetypes, and ~25 SFX presets. Presets are
templates, not literal prompts: expand them with any additional cues
from the user's phrasing.
- If the user names a voice or gives a voice ID, pass it to
  --voice-id directly. 20-char alphanumeric strings are
  treated as IDs; anything else is matched against the cache by
  name (case-insensitive).
- If the user only describes the voice they want, run
  /audiogen voices <query> first, possibly with --gender, --accent,
  or --language. Pick a candidate by name, then call voice with that
  name.
- Pass --refresh if the user says "new voices were added" or you hit
  a "voice not found" error.
- See reference.md for archetype-to-search-query guidance (e.g.
  "Arcane Scholar → voices wise old male english").

Default layout in the current working directory:
assets/audio/
music/ — background tracks
voice/ — TTS lines
sfx/ — sound effects
- Filenames are a slug of the prompt (reduced to [a-z0-9] characters
  separated by -, first 40 chars). Non-Latin prompts fall back to
  audio-YYYYMMDD-HHMMSS.
- Auto-versioning: if assets/audio/music/foo.mp3 exists, the next
  call becomes foo-v2.mp3, then -v3, up to -v999. Pass --force to
  overwrite in place.
- --output PATH overrides everything:
  - If PATH is an existing directory (or ends with /), the slug is
    written inside it.
  - Otherwise PATH is the literal output filename (auto-versioning
    still applies unless --force).

Filenames are a collision-avoidance mechanism, not a version
manifest. If the user wants to track iterations ("go back to v1",
"that was better than v2"), use --history-parent <id> against
.audiogen-history.jsonl — see Regeneration & Iteration.
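The slug and auto-versioning rules above can be sketched in a few lines. The exact slug algorithm is an assumption (lowercase, keep [a-z0-9] runs, join with "-", cap at 40 characters), and the helper names slugify and nextFreePath are hypothetical; only the observable behaviour (foo.mp3, foo-v2.mp3, up to -v999, timestamp fallback) comes from the text.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Assumed slug rule: lowercase, keep [a-z0-9] runs, join with "-", cap at 40.
function slugify(prompt, now = new Date()) {
  const slug = (prompt.toLowerCase().match(/[a-z0-9]+/g) || [])
    .join("-")
    .slice(0, 40)
    .replace(/-+$/, ""); // drop a hyphen left dangling by the cut
  if (slug) return slug;
  // Nothing usable left (e.g. a non-Latin prompt): fall back to local time.
  const pad = (n) => String(n).padStart(2, "0");
  return (
    `audio-${now.getFullYear()}${pad(now.getMonth() + 1)}${pad(now.getDate())}` +
    `-${pad(now.getHours())}${pad(now.getMinutes())}${pad(now.getSeconds())}`
  );
}

// Returns the first free path among foo.mp3, foo-v2.mp3, ..., foo-v999.mp3.
// `exists` is injectable so the logic can be exercised without touching disk.
function nextFreePath(dir, slug, ext = ".mp3", exists = fs.existsSync) {
  const first = path.join(dir, slug + ext);
  if (!exists(first)) return first;
  for (let v = 2; v <= 999; v++) {
    const candidate = path.join(dir, `${slug}-v${v}${ext}`);
    if (!exists(candidate)) return candidate;
  }
  throw new Error(`no free filename for ${slug} up to -v999`);
}

console.log(slugify("Heavy wooden door slam, with reverb!"));
console.log(nextFreePath(path.join("assets", "audio", "sfx"), "door-slam"));
```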
The CLI prints every failure as audiogen: <message> on stderr and
exits 1. Common patterns:
- An API error that mentions output_format in the detail: typically a
  free-tier restriction. The CLI appends the hint "Free-tier accounts
  are restricted to mp3_44100_64. Try --output-format mp3_44100_64."
- empty audio response: a 200 with zero bytes; the CLI deletes
  the empty file and reports. Refine the prompt and retry.
- voice not found / duplicate name: the cache has no match, or
  two cached voices share a name. The error lists the candidate IDs;
  pass one directly via --voice-id <ID>.

Every error block includes the HTTP status and, when available, the
xi-request-id header — include that when filing issues with
ElevenLabs support.
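The error conventions above (an audiogen: prefix, the HTTP status, the xi-request-id when present, and the free-tier hint) could be assembled roughly as follows. This is a sketch: the function names, the field names on the failure object, the exact message layout, and the 403 status in the demonstration are all assumptions made for illustration.

```javascript
// Hypothetical formatter for a failed API call. Field names on the argument
// object are assumptions made for this sketch.
const FREE_TIER_HINT =
  "Free-tier accounts are restricted to mp3_44100_64. Try --output-format mp3_44100_64.";

function formatApiError({ status, detail, requestId }) {
  let message = `audiogen: HTTP ${status}: ${detail}`;
  if (requestId) message += ` (xi-request-id: ${requestId})`;
  // A detail that mentions output_format is typically a free-tier restriction.
  if (/output_format/.test(detail)) message += `\n${FREE_TIER_HINT}`;
  return message;
}

// Print to stderr and exit 1, as described above.
function fail(failure) {
  console.error(formatApiError(failure));
  process.exit(1);
}

// Demonstration with made-up values; `fail` is deliberately not called here.
console.log(
  formatApiError({ status: 403, detail: "output_format not allowed", requestId: "abc123" })
);
```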
Every successful generation appends one JSON line to
.audiogen-history.jsonl in the working directory. Schema:
{ts, id, parent_id?, type, prompt, model_id, output_path,
output_format, request_body, request_id?}
- --history-id ID: tag a generation with an explicit id (otherwise
  the CLI auto-generates one from the output path).
- --history-parent ID: mark this generation as a derivative of a
  prior record. Use for "make that goblin voice gruffer", "same track
  with more brass", etc.

History-file write failures are logged to stderr but never abort the generation: you got the audio, and the bookkeeping is best-effort.
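A best-effort JSONL append along the lines described above might look like this. The field names follow the schema in the text; the function name appendHistory, the rule for deriving a default id from the output file's basename, and the wording of the warning are assumptions.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Appends one JSON line per generation. Field names follow the schema above;
// the default-id rule (basename of output_path) is an assumption.
function appendHistory(record, historyFile = ".audiogen-history.jsonl") {
  const derivedId = path.basename(record.output_path, path.extname(record.output_path));
  const entry = { ...record, ts: new Date().toISOString(), id: record.id || derivedId };
  try {
    fs.appendFileSync(historyFile, JSON.stringify(entry) + "\n");
  } catch (err) {
    // Best-effort bookkeeping: report, but never fail the generation.
    console.error(`audiogen: could not write history: ${err.message}`);
  }
  return entry;
}

// Demonstration in a temp file so the working directory stays untouched.
const demoFile = path.join(os.tmpdir(), "audiogen-history-demo.jsonl");
const first = appendHistory(
  { type: "voice", prompt: "goblin taunt", output_path: "assets/audio/voice/goblin.mp3" },
  demoFile
);
appendHistory(
  {
    type: "voice",
    prompt: "goblin taunt, gruffer",
    output_path: "assets/audio/voice/goblin-v2.mp3",
    parent_id: first.id, // what --history-parent would record
  },
  demoFile
);
```

Because each record is a single line, following a chain of parent_id values only requires reading the file line by line and parsing each one.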
Loop caveat, by modality:
- Music: there is no native loop flag. Loop in the engine, or in HTML
  with <audio loop>. For a natively seamless
  loop, you need to craft a composition_plan (not exposed in this
  CLI; use the raw ElevenLabs API or their web UI).
- SFX: pass --loop on the default eleven_text_to_sound_v2
  model. It produces a clip with matching start/end so it can repeat
  seamlessly in-engine.

The live pricing page at https://elevenlabs.io/pricing/api is authoritative; the approximate API-tier rates recorded here date from April 2026 and may lag.
Consult reference.md for a detailed cost table and worked examples.
When in doubt, quote from the live page, not from this file.
Avoid the following:
- Don't pass wav_* formats to music or sfx; the API rejects
  them. TTS accepts WAV.
- Don't call sfx without a text prompt; the endpoint will
  return 400.
- Don't request an sfx --duration outside [0.5, 30]; the 30 s ceiling
  is a hard API limit.
- Treat filename suffixes (-v2, -v3, …) as collision avoidance,
  not a version manifest. Use --history-parent for genuine iteration
  threading.
- Don't edit .audiogen-history.jsonl by hand; it is append-only.

Edge cases:
- Free-tier error mentioning output_format: the CLI appends a hint
  suggesting --output-format mp3_44100_64. Follow it.
- Seamless music loops: there is no CLI flag; build a
  composition_plan out-of-band.
- Duplicate voice names: if two cached voices are both named Adam,
  --voice-id Adam will fail with a disambiguation error listing both
  IDs; retry with an ID.
- Empty audio: the CLI reports "empty audio response; refine prompt
  or retry.". Treat as transient; refine and retry.
- Non-Latin prompts: the slug keeps only [a-z0-9]
  characters; if nothing is left, the filename falls back to
  audio-YYYYMMDD-HHMMSS (local time zone).
- Stale voice cache: pass --refresh if a
  newly-added voice is not found.
- .env precedence: shell-exported ELEVENLABS_API_KEY wins
  over any .env value (Node's default). Unset the shell variable if
  you want the .env value to take effect.

The CLI lives at .claude/skills/audiogen/generate.cjs. It is
dual-shape: as a skill backend (invoked via this SKILL.md), and as a
standalone CLI. Run it directly from any shell for scripting, CI, or
manual experimentation:
node .claude/skills/audiogen/generate.cjs music "chiptune boss" \
--length-ms 45000 --output-format mp3_44100_128
All flags work identically in both modes. Run --help for the
authoritative flag surface; this file is prose, generate.cjs --help
is canon.
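When the CLI runs standalone (for example in CI), the .env lookup described earlier decides where the API key comes from. The sketch below shows one way such a walk-up could work. The helper names findEnvFile and loadFirstEnv are hypothetical, and the search order (cwd first, then the script directory) is an assumption; process.loadEnvFile is Node's built-in loader.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Walk from `startDir` towards the filesystem root; return the first .env found.
function findEnvFile(startDir) {
  let dir = path.resolve(startDir);
  while (true) {
    const candidate = path.join(dir, ".env");
    if (fs.existsSync(candidate)) return candidate;
    const parent = path.dirname(dir);
    if (parent === dir) return null; // reached the root without a match
    dir = parent;
  }
}

// Assumed search order: cwd first, then the script directory.
function loadFirstEnv(startDirs = [process.cwd(), __dirname]) {
  for (const start of startDirs) {
    const envPath = findEnvFile(start);
    if (envPath) {
      // Per the precedence note above, variables already exported in the
      // shell are expected to keep their values after this call.
      process.loadEnvFile(envPath);
      return envPath;
    }
  }
  return null;
}

console.log(findEnvFile(process.cwd()));
```

In CI it is usually simpler to export ELEVENLABS_API_KEY as a secret environment variable, in which case no .env file is needed at all.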