audio-memory
Store and manage voice samples for TTS cloning — portable, version-controlled audio references
| name | audio-memory |
| description | Store and manage voice samples for TTS cloning — portable, version-controlled audio references |
| tier | standard |
| applyTo | `**/*voice*,**/*audio*memory*,**/*clone*voice*` |
| $schema | ../SKILL-SCHEMA.json |
Domain: AI Audio / Voice Cloning
Version: 1.0.0
Last Updated: 2026-04-15
Author: Alex (Master Alex)
Related: text-to-speech (generation), visual-memory (photos)
Store voice samples for TTS voice cloning in a portable, version-controlled format. Unlike visual memory, which embeds images inline as base64, audio memory keeps each sample as a separate file and describes it in JSON metadata, because audio is too large to embed sensibly.
| Spec | Value |
|---|---|
| Duration | 5-15 seconds of clear speech |
| Format | WAV (preferred) or MP3 |
| Sample rate | 16kHz+ (22kHz+ recommended) |
| Content | Natural speech, varied intonation |
| Background | No music, no background noise |
| File size | ~100KB-500KB per sample |
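The duration and sample-rate limits above can be checked before a sample is committed. The following is a minimal sketch, assuming an uncompressed PCM WAV file; `inspectWav` and `checkSample` are hypothetical helper names, not part of this skill.

```javascript
// Sketch only: parses the RIFF/WAVE header of a PCM WAV held in a Uint8Array
// (a Node Buffer works too) and reports sample rate, channels, and duration.
function inspectWav(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const tag = (offset) =>
    String.fromCharCode(...bytes.subarray(offset, offset + 4));
  if (bytes.byteLength < 12 || tag(0) !== "RIFF" || tag(8) !== "WAVE") {
    throw new Error("Not a RIFF/WAVE file");
  }
  let sampleRate, channels, byteRate, dataBytes;
  // Each chunk is a 4-byte id, a 4-byte little-endian size, then the payload.
  let offset = 12;
  while (offset + 8 <= bytes.byteLength) {
    const id = tag(offset);
    const size = view.getUint32(offset + 4, true);
    if (id === "fmt ") {
      channels = view.getUint16(offset + 10, true);
      sampleRate = view.getUint32(offset + 12, true);
      byteRate = view.getUint32(offset + 16, true);
    } else if (id === "data") {
      dataBytes = size;
    }
    offset += 8 + size + (size % 2); // chunks are padded to an even length
  }
  if (!sampleRate || !byteRate || dataBytes === undefined) {
    throw new Error("Missing fmt or data chunk");
  }
  return { sampleRate, channels, durationSeconds: dataBytes / byteRate };
}

// Compares the parsed values with the limits in the spec table above.
function checkSample({ sampleRate, durationSeconds }) {
  const problems = [];
  if (durationSeconds < 5 || durationSeconds > 15) {
    problems.push("duration outside 5-15 s");
  }
  if (sampleRate < 16000) {
    problems.push("sample rate below 16 kHz");
  }
  return problems;
}
```

In Node, the input could come from `readFileSync("voices/<name>-sample.wav")`. An empty array from `checkSample` means the sample meets the duration and sample-rate rows; the content and background-noise rows still need a listening check.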
.github/skills/<skill-name>/audio-memory/
├── index.json # Metadata registry
├── voices/
│ ├── alex-sample.wav # Voice sample files
│ ├── narrator-sample.wav
│ └── ...
└── README.md # Usage notes (optional)
{
  "version": "1.0",
  "updated": "2026-04-15",
  "voices": {
    "alex": {
      "description": "Natural conversational voice, warm and friendly",
      "audioFile": "voices/alex-sample.wav",
      "duration": "10s",
      "sampleRate": "22050",
      "language": "en-US",
      "preferredModel": "chatterbox-turbo",
      "notes": "Best for narration and documentation reads"
    },
    "narrator": {
      "description": "Professional narration voice",
      "audioFile": "voices/narrator-sample.wav",
      "duration": "12s",
      "sampleRate": "44100",
      "language": "en-US",
      "preferredModel": "qwen/qwen3-tts"
    }
  }
}
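A registry like this is easy to break by hand, so a small shape check may help. The sketch below is illustrative only: `validateIndex` is a hypothetical helper, and the rules it applies (paths under `voices/`, durations like `"10s"`, numeric-string sample rates) are inferred from the example above rather than taken from a published schema.

```javascript
// Sketch only: returns a list of human-readable problems found in a parsed
// index.json object. An empty list means the entries match the example shape.
function validateIndex(index) {
  if (typeof index?.voices !== "object" || index.voices === null) {
    return ["index.voices must be an object"];
  }
  const errors = [];
  for (const [name, voice] of Object.entries(index.voices)) {
    if (
      typeof voice.audioFile !== "string" ||
      !voice.audioFile.startsWith("voices/")
    ) {
      errors.push(`${name}: audioFile should be a path under voices/`);
    }
    if (!/^\d+s$/.test(voice.duration ?? "")) {
      errors.push(`${name}: duration should look like "10s"`);
    }
    if (!/^\d+$/.test(voice.sampleRate ?? "")) {
      errors.push(`${name}: sampleRate should be a numeric string`);
    }
  }
  return errors;
}
```

The input would typically be the result of `JSON.parse` on the contents of `index.json`.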
| Model | Replicate ID | Voice Cloning | Cost |
|---|---|---|---|
| Chatterbox Turbo | resemble-ai/chatterbox-turbo | ✅ 5s sample | $0.025/1k chars |
| Qwen TTS | qwen/qwen3-tts | ✅ 3 modes | $0.02/1k chars |
| MiniMax Speech | minimax/speech-2.8-turbo | ❌ Presets | $0.06/1k tokens |
Note: MiniMax doesn't support cloning but has 40+ voice presets.
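For budgeting, the per-character rates in the table translate into a simple estimate. This is a rough sketch using only the two cloning-capable rows; `estimateCostUsd` is a hypothetical helper, the rates are copied from the table and may change, and counting `text.length` as billable characters is an assumption.

```javascript
// Rates copied from the model table above (USD per 1,000 characters).
// MiniMax is omitted: it is priced per token and does not support cloning.
const USD_PER_1K_CHARS = new Map([
  ["resemble-ai/chatterbox-turbo", 0.025],
  ["qwen/qwen3-tts", 0.02],
]);

// Sketch only: estimates the cost of synthesizing `text` with a given model.
function estimateCostUsd(modelId, text) {
  const rate = USD_PER_1K_CHARS.get(modelId);
  if (rate === undefined) {
    throw new Error(`No per-character rate known for "${modelId}"`);
  }
  return (text.length / 1000) * rate;
}
```

A 2,000-character script on Chatterbox Turbo comes out at roughly $0.05 under these assumptions.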
"Hello, I'm [Name]. Today I want to share some thoughts about technology and how it shapes our daily lives. The key is finding balance — embracing innovation while staying grounded in what matters most."
# Recommended: Use Audacity, Voice Memos (macOS), or Windows Voice Recorder
# Export as WAV, 22kHz or 44.1kHz, mono
# Create directory structure
New-Item -ItemType Directory -Path ".github/skills/<skill>/audio-memory/voices" -Force
# Copy voice sample
Copy-Item "my-recording.wav" ".github/skills/<skill>/audio-memory/voices/<name>-sample.wav"
{
  "voices": {
    "<name>": {
      "description": "Brief description of the voice character",
      "audioFile": "voices/<name>-sample.wav",
      "duration": "10s",
      "sampleRate": "22050",
      "language": "en-US",
      "preferredModel": "chatterbox-turbo"
    }
  }
}
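Registering an entry by hand works, but the step can also be scripted. The sketch below assumes the index has already been parsed into an object; `registerVoice` is a hypothetical helper, and refusing to overwrite an existing name is a design choice of this example, not a rule of the skill.

```javascript
// Sketch only: returns a new index object with the voice added and the
// top-level "updated" date refreshed. The input object is not mutated.
function registerVoice(
  index,
  name,
  entry,
  today = new Date().toISOString().slice(0, 10)
) {
  if (!entry || typeof entry.audioFile !== "string") {
    throw new Error("entry.audioFile is required");
  }
  const voices = index.voices ?? {};
  if (Object.prototype.hasOwnProperty.call(voices, name)) {
    throw new Error(`Voice "${name}" is already registered`);
  }
  return {
    ...index,
    updated: today,
    voices: { ...voices, [name]: entry },
  };
}
```

The result can be written back with `JSON.stringify(updated, null, 2)` so the diff stays readable under version control.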
import { readFileSync } from "fs";
import Replicate from "replicate";

// The client reads REPLICATE_API_TOKEN from the environment by default
const replicate = new Replicate();

const output = await replicate.run("resemble-ai/chatterbox-turbo", {
  input: {
    text: "Testing the voice clone. This should sound like the reference sample.",
    // Pass the reference sample as a Buffer
    audio_prompt: readFileSync("voices/<name>-sample.wav"),
  },
});

console.log("Generated audio:", output);
import { readFileSync } from "fs";
import Replicate from "replicate";

// Load audio memory
const audioMemory = JSON.parse(
  readFileSync(".github/skills/<skill>/audio-memory/index.json", "utf8")
);
const voice = audioMemory.voices["alex"];

// Generate speech with cloned voice
const replicate = new Replicate();
const output = await replicate.run("resemble-ai/chatterbox-turbo", {
  input: {
    text: "Content to speak in the cloned voice",
    audio_prompt: readFileSync(
      `.github/skills/<skill>/audio-memory/${voice.audioFile}`
    ),
  },
});
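// Alternative: Qwen TTS in voice_clone mode.
// Reuses `replicate`, `voice`, and `readFileSync` from the previous snippet.
const qwenOutput = await replicate.run("qwen/qwen3-tts", {
  input: {
    text: "Content to speak",
    tts_mode: "voice_clone",
    audio_input: readFileSync(
      `.github/skills/<skill>/audio-memory/${voice.audioFile}`
    ),
  },
});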
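The two calls above differ only in model ID and input field names, so the choice can be driven by each voice's `preferredModel`. The following sketch shows one way to do that; `buildCloneRequest` and `MODEL_IDS` are hypothetical names, and the input fields (`audio_prompt`, `tts_mode`, `audio_input`) are copied from the snippets above rather than verified against the current model schemas.

```javascript
// Maps the short and full names used in index.json to Replicate model IDs.
const MODEL_IDS = new Map([
  ["chatterbox-turbo", "resemble-ai/chatterbox-turbo"],
  ["resemble-ai/chatterbox-turbo", "resemble-ai/chatterbox-turbo"],
  ["qwen/qwen3-tts", "qwen/qwen3-tts"],
]);

// Sketch only: builds the { model, input } pair for replicate.run() from a
// voice entry, the text to speak, and the already-loaded reference audio.
function buildCloneRequest(voice, text, audio) {
  const model = MODEL_IDS.get(voice.preferredModel);
  if (!model) {
    throw new Error(`No cloning model known for "${voice.preferredModel}"`);
  }
  if (model === "qwen/qwen3-tts") {
    return {
      model,
      input: { text, tts_mode: "voice_clone", audio_input: audio },
    };
  }
  return { model, input: { text, audio_prompt: audio } };
}
```

With a loaded `voice` entry and its sample read into a Buffer, the result can be passed on as `replicate.run(request.model, { input: request.input })`.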
| Element | Recommendation |
|---|---|
| Sample duration | 10s optimal (5s minimum, 15s maximum) |
| Varied speech | Include questions, statements, exclamations |
| Distinct voice | Clear enunciation, consistent microphone setup |
| File format | WAV preferred (lossless), MP3 acceptable |
| Sample rate | 22kHz+ (44.1kHz for premium) |
| Without Audio Memory | With Audio Memory |
|---|---|
| External folder required | Version-controlled with code |
| Breaks on different machines | Works anywhere |
| Manual path management | Structured JSON metadata |
| No documentation | Self-describing with index.json |
| Ad-hoc organization | Consistent skill-scoped storage |
This skill stores voice samples. Use the text-to-speech skill for:
Workflow: