ワンクリックで
Generate game audio (music, voices, sound effects) via ElevenLabs.
npx skills add https://github.com/zeveck/audiogen --skill audiogenこのコマンドをClaude Codeにコピー&ペーストしてスキルをインストール
Generate game audio (music, voices, sound effects) via ElevenLabs.
npx skills add https://github.com/zeveck/audiogen --skill audiogenこのコマンドをClaude Codeにコピー&ペーストしてスキルをインストール
Safe commit workflow with optional scope hint. Inventories all changes, classifies related vs. unrelated files, traces dependencies, protects other agents' work, and optionally pushes or lands worktree commits. Usage: /commit [pr] [scope] [push|land]
Lightweight task dispatcher for ad-hoc work: documentation, examples, refactoring, content updates. Supports scheduling with every/now/next/stop. Usage: /do <description> [worktree] [push] [pr] [every SCHEDULE] [now] | stop | next.
Draft a high-quality plan through iterative adversarial review. Multiple rounds of research, drafting, review, devil's advocate, and refinement until the plan converges. Output is a plan file ready for /run-plan. Usage: /draft-plan [output FILE] [rounds N] <description...>
Orchestrate a batch bug-fixing sprint. Supports scheduling with every/now/next/stop. Use sync to update trackers and verify/close already-fixed issues. Use plan to draft plans for skipped issues. Usage: /fix-issues N [focus] [auto] [every SCHEDULE] [now] | sync | plan [auto] | stop | next.
Refine an in-progress plan by reviewing remaining phases against completed work. Dispatches adversarial reviewer and devil's advocate agents to find stale references, invalidated assumptions, and specification gaps — then refines remaining phases until convergence. Completed phases are NEVER modified. Appends a Drift Log and Plan Review section. Usage: /refine-plan <plan-file> [rounds N]
Full pipeline: decompose a broad goal into sub-plans, draft each with adversarial review, then execute all of them autonomously. One command, walk away. Usage: /research-and-go <description>
| name | audiogen |
| description | Generate game audio (music, voices, sound effects) via ElevenLabs. |
| disable-model-invocation | false |
| allowed-tools | Bash(node */generate.cjs *) |
| argument-hint | <music|voice|sfx|voices> <description> [--voice-id ID] [--length-ms MS] [--duration SEC] [--output PATH] |
Generate background music, character voice lines, and sound effects for games
via the ElevenLabs REST API. One CLI (generate.cjs), three modalities,
zero npm deps. Usable both as a Claude Code skill and as a standalone shell
CLI (e.g. from CI).
process.loadEnvFile.
The CLI asserts this at entry and fails with a clear message on older
Node.ELEVENLABS_API_KEY — get one at
https://elevenlabs.io/app/settings/api-keys. Either export it in your
shell, or add ELEVENLABS_API_KEY=... to a .env file at the project
root. The CLI walks up from cwd and from the script directory looking
for a .env, loading the first one found. Shell-exported values take
precedence over .env values (Node's default; not an audiogen
choice).output_format=mp3_44100_64; see Edge
Cases. Free-tier output requires attribution; paid-tier output comes
with perpetual commercial rights.# Background music — 30 seconds of orchestral tension
/audiogen music tense orchestral battle cue --length-ms 30000
# Voice line — must specify a voice
/audiogen voice "Halt, traveller! What brings you here?" --voice-id Rachel
# Sound effect — 2 seconds
/audiogen sfx heavy wooden door slam with reverb --duration 2
# Voice catalog browse
/audiogen voices --gender female --language en
All outputs default to assets/audio/{music,voice,sfx}/<slug>.mp3 in the
current working directory.
If the user invokes /audiogen with nothing after it, ask what they want
to generate before calling the CLI. Don't guess — the three modalities
are priced and shaped very differently. A good clarification: "Music
(background track), a voice line, or a sound effect? And how long should
it be?"
The first positional word is always a subcommand:
/audiogen music|voice|sfx|voices <rest...>
When the user's phrasing makes intent obvious, pick the subcommand yourself:
musicvoicesfxvoicesWhen the phrasing is ambiguous (e.g. "a scream" — voice or sfx?), ask.
voices is the catalog-browse command, not a generator.
music — background tracks/audiogen music <prompt...> [--length-ms MS] [--seed N]
[--force-instrumental]
[--output-format FMT] [--output PATH]
Generates one music clip via POST /v1/music.
--length-ms — 3000–600000. Default 30000 (30 s). Max 5 minutes.--seed — integer seed for reproducibility.--force-instrumental — suppress vocals.--output-format — mp3_*, pcm_*, opus_*, ulaw_8000,
alaw_8000. Default mp3_44100_128. WAV formats (wav_*) are
rejected — the music endpoint does not accept them.Loop caveat. There is no native "seamless loop" flag for music. For
looping game BGM, loop in the engine / HTML, or build a
composition_plan (not exposed in this CLI). See Loop Caveat below.
voice — TTS (text-to-speech)/audiogen voice <text...> --voice-id <NAME_OR_ID>
[--language-code CODE]
[--stability N] [--similarity-boost N]
[--style N] [--speed N]
[--model-id ID] [--seed N]
[--output-format FMT] [--output PATH]
Generates one TTS clip via POST /v1/text-to-speech/{voice_id}.
--voice-id — required. Accepts either an exact voice name from
the local cache (case-insensitive), or a 20-char alphanumeric voice
ID. If the name isn't cached, fall back to audiogen voices <query>
to discover one.--model-id — default eleven_multilingual_v2. Others:
eleven_flash_v2_5 (fast/cheap), eleven_turbo_v2_5, eleven_v3
(expressive — best for emotional lines).--stability, --similarity-boost, --style, --speed — all
optional voice_settings fields. Ranges 0–1 except --speed (0.5–2).--output-format — includes wav_* in addition to the usual set
(WAV is supported for TTS but not for music or sfx).sfx — sound effects/audiogen sfx <prompt...> [--duration N] [--loop]
[--prompt-influence N]
[--output-format FMT] [--output PATH]
Generates one sound effect via POST /v1/sound-generation.
--duration — 0.5 to 30 seconds. Optional — omit to let the API
auto-pick a duration. The 30-second ceiling is a hard API limit; for
longer ambience, generate a loopable clip and repeat in-engine.--loop — native API flag (requires the eleven_text_to_sound_v2
model, which is the default). Produces a seamlessly loopable
waveform.--prompt-influence — 0–1. Default 0.3. Higher = closer adherence
to the prompt; lower = more creative latitude.--output-format — mp3_*, pcm_*, opus_*, ulaw_*, alaw_*.
WAV rejected, same as music.voices — browse the voice catalog/audiogen voices [query...] [--accent STR] [--gender male|female]
[--language CODE] [--category STR]
[--limit N] [--page-size N]
[--json] [--refresh]
Lists voices from GET /v2/voices with client-side filters. Results
are cached at .audiogen-voices.json in the project root with a
24-hour TTL; stale caches are refreshed automatically. Use
--refresh to force a re-fetch.
query (optional positional) — substring match on voice name,
description, and labels.--accent — substring match on accent labels (e.g. british,
american).--gender — male or female.--language — ISO code (en, ja, es, …).--category — e.g. premade, cloned, professional.--limit — cap the number of rows in the table view (default 50).
--json ignores this.--json — emit raw JSON for scripting; table view otherwise.The cache file is gitignored. It is not a source of truth — the
ElevenLabs catalog is — but is the canonical resolver for --voice-id <name>.
Music and SFX prompts are free-form natural language. Write like you are briefing a human foley artist or composer:
For quick starting points, see reference.md — it contains ~20
music presets, ~15 voice archetypes, and ~25 SFX presets. Presets are
templates, not literal prompts: expand them with any additional cues
from the user's phrasing.
--voice-id directly. 20-char alphanumeric strings are
treated as IDs; anything else is matched against the cache by
name (case-insensitive)./audiogen voices <query> first
— possibly with --gender, --accent, or --language. Pick a
candidate by name, then call voice with that name.--refresh if
the user says "new voices were added" or you hit a "voice not
found" error.reference.md for archetype-to-search-query guidance (e.g.
"Arcane Scholar → voices wise old male english").Default layout in the current working directory:
assets/audio/
music/ — background tracks
voice/ — TTS lines
sfx/ — sound effects
-, first 40 chars). Non-Latin prompts fall back to
audio-YYYYMMDD-HHMMSS.assets/audio/music/foo.mp3
exists, the next call becomes foo-v2.mp3, then -v3, up to
-v999. Pass --force to overwrite in place.--output PATH overrides everything:
PATH is an existing directory (or ends with /), the slug is
written inside it.PATH is the literal output filename (auto-versioning
still applies unless --force).Filenames are a collision-avoidance mechanism, not a version
manifest. If the user wants to track iterations ("go back to v1",
"that was better than v2"), use --history-parent <id> against
.audiogen-history.jsonl — see Regeneration & Iteration.
The CLI prints every failure as audiogen: <message> on stderr and
exits 1. Common patterns:
output_format in the detail — typically a free-tier
restriction. The CLI appends the hint "Free-tier accounts are
restricted to mp3_44100_64. Try --output-format mp3_44100_64."empty audio response — a 200 with zero bytes; the CLI deletes
the empty file and reports. Refine the prompt and retry.voice not found / duplicate name — the cache has no match, or
two cached voices share a name. The error lists the candidate IDs;
pass one directly via --voice-id <ID>.Every error block includes the HTTP status and, when available, the
xi-request-id header — include that when filing issues with
ElevenLabs support.
Every successful generation appends one JSON line to
.audiogen-history.jsonl in the working directory. Schema:
{ts, id, parent_id?, type, prompt, model_id, output_path,
output_format, request_body, request_id?}
--history-id ID — tag a generation with an explicit id (otherwise
the CLI auto-generates one from the output path).--history-parent ID — mark this generation as a derivative of a
prior record. Use for "make that goblin voice gruffer", "same track
with more brass", etc.History-file write failures are logged to stderr but never abort the generation — you got the audio, the bookkeeping is best-effort.
<audio loop>. For a natively seamless
loop, you need to craft a composition_plan (not exposed in this
CLI — use the raw ElevenLabs API or their web UI).--loop on the default eleven_text_to_sound_v2
model. Produces a clip with matching start/end so it can repeat
seamlessly in-engine.The live pricing page at https://elevenlabs.io/pricing/api is authoritative — the numbers below may lag. Approximate API-tier rates as of April 2026:
Consult reference.md for a detailed cost table and worked examples.
When in doubt, quote from the live page, not from this file.
wav_* formats to music or sfx — the API rejects
them. TTS accepts WAV.sfx without a text prompt — the endpoint will
return 400.[0.5, 30] — the 30 s ceiling is a
hard API limit.-v2, -v3, …) as collision avoidance,
not a version manifest. Use --history-parent for genuine iteration
threading..audiogen-history.jsonl by hand — it is append-only.output_format — the CLI appends a hint
suggesting --output-format mp3_44100_64. Follow it.composition_plan out-of-band.--voice-id Adam will fail with a
disambiguation error listing both IDs; retry with an ID."empty audio response; refine prompt or retry.". Treat as
transient; refine and retry.[a-z0-9]
characters; if nothing is left, the filename falls back to
audio-YYYYMMDD-HHMMSS (local time zone).--refresh if a
newly-added voice is not found..env precedence — shell-exported ELEVENLABS_API_KEY wins
over any .env value (Node's default). Unset shell variable if
you want the .env value to take effect.The CLI lives at .claude/skills/audiogen/generate.cjs. It is
dual-shape: as a skill backend (invoked via this SKILL.md), and as a
standalone CLI. Run it directly from any shell for scripting, CI, or
manual experimentation:
node .claude/skills/audiogen/generate.cjs music "chiptune boss" \
--length-ms 45000 --output-format mp3_44100_128
All flags work identically in both modes. Run --help for the
authoritative flag surface; this file is prose, generate.cjs --help
is canon.