audio-memory
Store and manage voice samples for TTS cloning — portable, version-controlled audio references
| name | audio-memory |
| description | Store and manage voice samples for TTS cloning — portable, version-controlled audio references |
| tier | standard |
| applyTo | **/*voice*,**/*audio*memory*,**/*clone*voice* |
| $schema | ../SKILL-SCHEMA.json |
Domain: AI Audio / Voice Cloning
Version: 1.0.0
Last Updated: 2026-04-15
Author: Alex (Master Alex)
Related: text-to-speech (generation), visual-memory (photos)
Store voice samples for TTS voice cloning in a portable, version-controlled format. Unlike visual-memory, which embeds photos inline as base64, this skill keeps each sample as a separate audio file described by JSON metadata, because audio is too large to embed sensibly.
| Spec | Value |
|---|---|
| Duration | 5-15 seconds of clear speech |
| Format | WAV (preferred) or MP3 |
| Sample rate | 16kHz+ (22kHz+ recommended) |
| Content | Natural speech, varied intonation |
| Background | No music, no background noise |
| File size | ~100KB-500KB per sample |
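The file-size row can be sanity-checked against the duration and sample-rate rows. The sketch below assumes uncompressed 16-bit mono PCM with a 44-byte WAV header; bit depth and channel count are assumptions, since the table does not state them, and `estimateWavBytes` is a hypothetical helper name.

```javascript
// Rough WAV size estimate. Assumes uncompressed PCM with a 44-byte header;
// 16-bit depth and a single channel are assumptions, not part of the spec table.
function estimateWavBytes(seconds, sampleRate, bitsPerSample = 16, channels = 1) {
  const headerBytes = 44;
  const bytesPerSecond = sampleRate * channels * (bitsPerSample / 8);
  return headerBytes + Math.round(seconds * bytesPerSecond);
}

const tenSecondsAt22k = estimateWavBytes(10, 22050); // 441044 bytes, about 441 KB
const fiveSecondsAt16k = estimateWavBytes(5, 16000); // 160044 bytes, about 160 KB
console.log(tenSecondsAt22k, fiveSecondsAt16k);
```

Under the same assumptions a 10-second sample at 44.1 kHz comes to roughly 882 KB, so higher-rate or stereo recordings can exceed the range in the table.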
.github/skills/<skill-name>/audio-memory/
├── index.json # Metadata registry
├── voices/
│ ├── alex-sample.wav # Voice sample files
│ ├── narrator-sample.wav
│ └── ...
└── README.md # Usage notes (optional)
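Because every sample lives under the same skill-scoped folder, the repository-relative path of a voice file can be derived from the skill name and the relative file path recorded in the registry. This is a minimal sketch: `voicePath` and the skill name `my-skill` are hypothetical, and rejecting absolute paths and `..` segments is an added safety assumption rather than something the skill specifies.

```javascript
// Builds the repository-relative path for a voice file under the layout above.
// Rejecting absolute paths and ".." segments is an added safety assumption.
function voicePath(skillName, audioFile) {
  const segments = audioFile.split("/");
  if (audioFile.startsWith("/") || segments.includes("..")) {
    throw new Error(`audioFile must stay inside audio-memory: ${audioFile}`);
  }
  return `.github/skills/${skillName}/audio-memory/${audioFile}`;
}

// "my-skill" is a placeholder skill name used only for illustration.
console.log(voicePath("my-skill", "voices/alex-sample.wav"));
```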
{
"version": "1.0",
"updated": "2026-04-15",
"voices": {
"alex": {
"description": "Natural conversational voice, warm and friendly",
"audioFile": "voices/alex-sample.wav",
"duration": "10s",
"sampleRate": "22050",
"language": "en-US",
"preferredModel": "chatterbox-turbo",
"notes": "Best for narration and documentation reads"
},
"narrator": {
"description": "Professional narration voice",
"audioFile": "voices/narrator-sample.wav",
"duration": "12s",
"sampleRate": "44100",
"language": "en-US",
"preferredModel": "qwen/qwen3-tts"
}
}
}
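A registry like this is easy to break by hand, so a small structural check can help before committing. The sketch below works on an already-parsed index object. The list of required fields is an assumption inferred from the two example entries (`notes` is treated as optional), and `findIndexProblems` is a hypothetical helper name.

```javascript
// Sketch of a structural check for a parsed index.json.
// The set of required fields is an assumption inferred from the example entries.
const REQUIRED_FIELDS = [
  "description", "audioFile", "duration", "sampleRate", "language", "preferredModel",
];

function findIndexProblems(index) {
  if (typeof index !== "object" || index === null ||
      typeof index.voices !== "object" || index.voices === null) {
    return ['index has no "voices" object'];
  }
  const problems = [];
  for (const [name, voice] of Object.entries(index.voices)) {
    const entry = voice ?? {};
    for (const field of REQUIRED_FIELDS) {
      if (typeof entry[field] !== "string" || entry[field].length === 0) {
        problems.push(`${name}: missing or empty "${field}"`);
      }
    }
  }
  return problems;
}

// Illustrative data: "broken" is a deliberately incomplete, made-up entry.
const sample = {
  version: "1.0",
  voices: {
    alex: {
      description: "Natural conversational voice, warm and friendly",
      audioFile: "voices/alex-sample.wav",
      duration: "10s",
      sampleRate: "22050",
      language: "en-US",
      preferredModel: "chatterbox-turbo",
    },
    broken: { description: "No file listed" },
  },
};
console.log(findIndexProblems(sample));
```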
| Model | Replicate ID | Voice Cloning | Cost |
|---|---|---|---|
| Chatterbox Turbo | resemble-ai/chatterbox-turbo | ✅ 5s sample | $0.025/1k chars |
| Qwen TTS | qwen/qwen3-tts | ✅ 3 modes | $0.02/1k chars |
| MiniMax Speech | minimax/speech-2.8-turbo | ❌ Presets | $0.06/1k tokens |
Note: MiniMax doesn't support cloning but has 40+ voice presets.
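To compare the two cloning-capable models for a given script, the per-character prices in the table can be turned into a quick estimate. This is a sketch only: the prices are copied from the table and may change upstream, `estimateCostUsd` is a hypothetical helper, and `text.length` counts UTF-16 code units, which only approximates billed characters. MiniMax is left out because it is priced per token and does not support cloning.

```javascript
// Per-1,000-character prices copied from the table above; they may change upstream.
const PRICE_PER_1K_CHARS = {
  "resemble-ai/chatterbox-turbo": 0.025,
  "qwen/qwen3-tts": 0.02,
};

function estimateCostUsd(modelId, text) {
  const price = PRICE_PER_1K_CHARS[modelId];
  if (price === undefined) {
    throw new Error(`No per-character price known for ${modelId}`);
  }
  // text.length is an approximation of the billed character count.
  return (text.length / 1000) * price;
}

const script = "a".repeat(4000); // stand-in for a 4,000-character script
console.log(estimateCostUsd("resemble-ai/chatterbox-turbo", script).toFixed(3)); // "0.100"
console.log(estimateCostUsd("qwen/qwen3-tts", script).toFixed(3)); // "0.080"
```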
"Hello, I'm [Name]. Today I want to share some thoughts about technology and how it shapes our daily lives. The key is finding balance — embracing innovation while staying grounded in what matters most."
# Recommended: Use Audacity, Voice Memos (macOS), or Windows Voice Recorder
# Export as WAV, 22kHz or 44.1kHz, mono
# Create directory structure
New-Item -ItemType Directory -Path ".github/skills/<skill>/audio-memory/voices" -Force
# Copy voice sample
Copy-Item "my-recording.wav" ".github/skills/<skill>/audio-memory/voices/<name>-sample.wav"
{
"voices": {
"<name>": {
"description": "Brief description of the voice character",
"audioFile": "voices/<name>-sample.wav",
"duration": "10s",
"sampleRate": "22050",
"language": "en-US",
"preferredModel": "chatterbox-turbo"
}
}
}
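Registering the entry can also be scripted instead of editing index.json by hand. The following is a minimal sketch on a parsed index object; `addVoice`, the voice name `demo`, and the date passed in are hypothetical, and refusing to overwrite an existing name is a design assumption rather than a rule from the skill. Reading and writing the file is left out so the merge logic stays easy to test.

```javascript
// Returns a new index object with the voice added; the input is not mutated.
// Refusing to overwrite an existing name is a design assumption.
function addVoice(index, name, entry, today) {
  const voices = index.voices ?? {};
  if (Object.prototype.hasOwnProperty.call(voices, name)) {
    throw new Error(`Voice "${name}" is already registered`);
  }
  return { ...index, updated: today, voices: { ...voices, [name]: entry } };
}

// Illustrative values only: "demo" and the date are placeholders.
const before = { version: "1.0", updated: "2026-04-15", voices: {} };
const after = addVoice(before, "demo", {
  description: "Brief description of the voice character",
  audioFile: "voices/demo-sample.wav",
  duration: "10s",
  sampleRate: "22050",
  language: "en-US",
  preferredModel: "chatterbox-turbo",
}, "2026-04-16");
console.log(JSON.stringify(after, null, 2));
```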
import fs from "fs";
import Replicate from "replicate";
const replicate = new Replicate();
const output = await replicate.run("resemble-ai/chatterbox-turbo", {
input: {
text: "Testing the voice clone. This should sound like the reference sample.",
audio_prompt: fs.readFileSync("voices/<name>-sample.wav"),
},
});
console.log("Generated audio:", output);
import { readFileSync } from "fs";
import Replicate from "replicate";
// Load audio memory
const audioMemory = JSON.parse(
readFileSync(".github/skills/<skill>/audio-memory/index.json", "utf8")
);
const voice = audioMemory.voices["alex"];
// Generate speech with cloned voice
const replicate = new Replicate();
const output = await replicate.run("resemble-ai/chatterbox-turbo", {
input: {
text: "Content to speak in the cloned voice",
audio_prompt: readFileSync(
`.github/skills/<skill>/audio-memory/${voice.audioFile}`
),
},
});
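// Alternative: the same request with Qwen TTS in voice-clone mode (reuses replicate and voice from above)
const qwenOutput = await replicate.run("qwen/qwen3-tts", {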
input: {
text: "Content to speak",
tts_mode: "voice_clone",
audio_input: readFileSync(
`.github/skills/<skill>/audio-memory/${voice.audioFile}`
),
},
});
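The two calls above differ only in the model ID and the input field names, so the choice can be driven by the `preferredModel` stored in index.json. The sketch below is one way to do that under stated assumptions: `buildCloneRequest` is a hypothetical helper, the input field names are taken from the snippets above rather than verified against the model schemas, and the alias table exists only because the example registry uses the short name `chatterbox-turbo` for one voice and a full Replicate ID for the other.

```javascript
// Maps the preferredModel values seen in index.json to Replicate IDs and input shapes.
// Field names come from the snippets above; the short-name alias is an assumption.
const MODEL_ALIASES = { "chatterbox-turbo": "resemble-ai/chatterbox-turbo" };

function buildCloneRequest(voice, text, audio) {
  const model = MODEL_ALIASES[voice.preferredModel] ?? voice.preferredModel;
  switch (model) {
    case "resemble-ai/chatterbox-turbo":
      return { model, input: { text, audio_prompt: audio } };
    case "qwen/qwen3-tts":
      return { model, input: { text, tts_mode: "voice_clone", audio_input: audio } };
    default:
      throw new Error(`No voice-cloning request shape known for ${model}`);
  }
}

// The audio argument would normally be the Buffer read from the sample file;
// a placeholder string is used here so the sketch runs without any files.
const request = buildCloneRequest({ preferredModel: "chatterbox-turbo" }, "Hello", "<audio bytes>");
console.log(request.model, Object.keys(request.input));
```

With a real client, the result would be passed on as `replicate.run(request.model, { input: request.input })`.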
| Element | Recommendation |
|---|---|
| Sample duration | 10s optimal (5s minimum, 15s maximum) |
| Varied speech | Include questions, statements, exclamations |
| Distinct voice | Clear enunciation, consistent microphone setup |
| File format | WAV preferred (lossless), MP3 acceptable |
| Sample rate | 22kHz+ (44.1kHz for premium) |
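The measurable rows of this table can be checked automatically before a sample is added. This is a sketch with assumptions: `checkSample` is a hypothetical helper, the caller is expected to supply duration, sample rate, and format (for example from a recording tool), and "22kHz" is interpreted as 22,050 Hz to match the `sampleRate` values used in index.json.

```javascript
// Returns a list of warnings for a sample described by its basic properties.
// Treating "22kHz" as 22050 Hz is an assumption based on the index.json examples.
function checkSample({ durationSeconds, sampleRateHz, format }) {
  const warnings = [];
  if (durationSeconds < 5 || durationSeconds > 15) {
    warnings.push("duration outside 5-15 s");
  }
  if (sampleRateHz < 22050) {
    warnings.push("sample rate below 22.05 kHz");
  }
  if (!["wav", "mp3"].includes(format.toLowerCase())) {
    warnings.push("format is not WAV or MP3");
  }
  return warnings;
}

console.log(checkSample({ durationSeconds: 10, sampleRateHz: 22050, format: "wav" })); // []
console.log(checkSample({ durationSeconds: 3, sampleRateHz: 8000, format: "ogg" }));
```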
| Without Audio Memory | With Audio Memory |
|---|---|
| External folder required | Version-controlled with code |
| Breaks on different machines | Works anywhere |
| Manual path management | Structured JSON metadata |
| No documentation | Self-describing with index.json |
| Ad-hoc organization | Consistent skill-scoped storage |
This skill only stores voice samples. Use the text-to-speech skill for speech generation.
Workflow: