ワンクリックでManusで任意のスキルを実行

始める

audio-memory

スター5

フォーク3

更新日2026年4月15日 17:25

Store and manage voice samples for TTS cloning — portable, version-controlled audio references

インストール

Codex または Claude でインストールこの Prompt をコピーして Codex、Claude、または他のアシスタントに貼り付けると、Skill ページを確認してインストールできます。

Manusで実行

ソース

fabioc-aloha

fabioc-aloha/Alex_Plug_In

GitHub リポジトリを開く Creator のリポジトリを見る

ダウンロード

Manusで実行

Audio Memory Skill

Domain: AI Audio / Voice Cloning
Version: 1.0.0
Last Updated: 2026-04-15
Author: Alex (Master Alex)
Related: text-to-speech (generation), visual-memory (photos)

Overview

Store voice samples for TTS voice cloning in a portable, version-controlled format. Unlike visual memory (base64 inline), audio files are stored as files with JSON metadata — audio is too large to embed sensibly.

Voice Sample Specifications

Spec	Value
Duration	5-15 seconds of clear speech
Format	WAV (preferred) or MP3
Sample rate	16kHz+ (22kHz+ recommended)
Content	Natural speech, varied intonation
Background	No music, no background noise
File size	~100KB-500KB per sample

Storage Structure

.github/skills/<skill-name>/audio-memory/
├── index.json              # Metadata registry
├── voices/
│   ├── alex-sample.wav     # Voice sample files
│   ├── narrator-sample.wav
│   └── ...
└── README.md               # Usage notes (optional)

index.json Schema

{
  "version": "1.0",
  "updated": "2026-04-15",
  "voices": {
    "alex": {
      "description": "Natural conversational voice, warm and friendly",
      "audioFile": "voices/alex-sample.wav",
      "duration": "10s",
      "sampleRate": "22050",
      "language": "en-US",
      "preferredModel": "chatterbox-turbo",
      "notes": "Best for narration and documentation reads"
    },
    "narrator": {
      "description": "Professional narration voice",
      "audioFile": "voices/narrator-sample.wav",
      "duration": "12s",
      "sampleRate": "44100",
      "language": "en-US",
      "preferredModel": "qwen/qwen3-tts"
    }
  }
}

Compatible TTS Models

Model	Replicate ID	Voice Cloning	Cost
Chatterbox Turbo	`resemble-ai/chatterbox-turbo`	✅ 5s sample	$0.025/1k chars
Qwen TTS	`qwen/qwen3-tts`	✅ 3 modes	$0.02/1k chars
MiniMax Speech	`minimax/speech-2.8-turbo`	❌ Presets	$0.06/1k tokens

Note: MiniMax doesn't support cloning but has 40+ voice presets.

Recording Voice Samples

Requirements

Duration: 5-15 seconds (longer = better quality cloning)
Content: Natural speech with varied intonation (not monotone reading)
Quality: Clear audio, no background noise, no music
Format: WAV 16kHz+ or MP3

Recording Tips

Use a quiet room with minimal echo
Speak naturally — include some pauses, varied pitch
Avoid reading monotonously — conversational tone works best
Keep microphone at consistent distance (~6-12 inches)
Include a variety of sounds (different vowels, consonants)

Example Recording Script

"Hello, I'm [Name]. Today I want to share some thoughts about technology and how it shapes our daily lives. The key is finding balance — embracing innovation while staying grounded in what matters most."

Adding a Voice Sample

Step 1: Record the Sample

# Recommended: Use Audacity, Voice Memos (macOS), or Windows Voice Recorder
# Export as WAV, 22kHz or 44.1kHz, mono

Step 2: Place in Audio Memory

# Create directory structure
New-Item -ItemType Directory -Path ".github/skills/<skill>/audio-memory/voices" -Force

# Copy voice sample
Copy-Item "my-recording.wav" ".github/skills/<skill>/audio-memory/voices/<name>-sample.wav"

Step 3: Update index.json

{
  "voices": {
    "<name>": {
      "description": "Brief description of the voice character",
      "audioFile": "voices/<name>-sample.wav",
      "duration": "10s",
      "sampleRate": "22050",
      "language": "en-US",
      "preferredModel": "chatterbox-turbo"
    }
  }
}

Step 4: Test the Clone

import Replicate from "replicate";

const replicate = new Replicate();

const output = await replicate.run("resemble-ai/chatterbox-turbo", {
  input: {
    text: "Testing the voice clone. This should sound like the reference sample.",
    audio_prompt: fs.readFileSync("voices/<name>-sample.wav"),
  },
});

console.log("Generated audio:", output);

Using Audio Memory in Generation

With Chatterbox Turbo

import { readFileSync } from "fs";
import Replicate from "replicate";

// Load audio memory
const audioMemory = JSON.parse(
  readFileSync(".github/skills/<skill>/audio-memory/index.json", "utf8")
);
const voice = audioMemory.voices["alex"];

// Generate speech with cloned voice
const replicate = new Replicate();
const output = await replicate.run("resemble-ai/chatterbox-turbo", {
  input: {
    text: "Content to speak in the cloned voice",
    audio_prompt: readFileSync(
      `.github/skills/<skill>/audio-memory/${voice.audioFile}`
    ),
  },
});

With Qwen TTS (Clone Mode)

const output = await replicate.run("qwen/qwen3-tts", {
  input: {
    text: "Content to speak",
    tts_mode: "voice_clone",
    audio_input: readFileSync(
      `.github/skills/<skill>/audio-memory/${voice.audioFile}`
    ),
  },
});

Quality Guidelines

Element	Recommendation
Sample duration	10s optimal (5s minimum, 15s maximum)
Varied speech	Include questions, statements, exclamations
Distinct voice	Clear enunciation, consistent microphone setup
File format	WAV preferred (lossless), MP3 acceptable
Sample rate	22kHz+ (44.1kHz for premium)

Benefits vs External Storage

Without Audio Memory	With Audio Memory
External folder required	Version-controlled with code
Breaks on different machines	Works anywhere
Manual path management	Structured JSON metadata
No documentation	Self-describing with index.json
Ad-hoc organization	Consistent skill-scoped storage

Integration with text-to-speech Skill

This skill stores voice samples. Use the text-to-speech skill for:

Generating speech from text
Model selection (MiniMax, Chatterbox, Qwen)
Emotion control
Voice design from descriptions (no sample needed)

Workflow:

audio-memory: Store and manage voice samples
text-to-speech: Generate speech using those samples

name	audio-memory
description	Store and manage voice samples for TTS cloning — portable, version-controlled audio references
tier	standard
applyTo	*/voice,/audiomemory,*/clonevoice
$schema	../SKILL-SCHEMA.json

name	audio-memory
description	Store and manage voice samples for TTS cloning — portable, version-controlled audio references
tier	standard
applyTo	*/voice,/audiomemory,*/clonevoice
$schema	../SKILL-SCHEMA.json

audio-memory

このリポジトリの他の Skills

このリポジトリの他の Skills

Audio Memory Skill

Overview

Voice Sample Specifications

Storage Structure

index.json Schema

Compatible TTS Models

Recording Voice Samples

Requirements

Recording Tips

Example Recording Script

Adding a Voice Sample

Step 1: Record the Sample

Step 2: Place in Audio Memory

Step 3: Update index.json

Step 4: Test the Clone

Using Audio Memory in Generation

With Chatterbox Turbo

With Qwen TTS (Clone Mode)

Quality Guidelines

Benefits vs External Storage

Integration with text-to-speech Skill

Audio Memory Skill

Overview

Voice Sample Specifications

Storage Structure

index.json Schema

Compatible TTS Models

Recording Voice Samples

Requirements

Recording Tips

Example Recording Script

Adding a Voice Sample

Step 1: Record the Sample

Step 2: Place in Audio Memory

Step 3: Update index.json

Step 4: Test the Clone

Using Audio Memory in Generation

With Chatterbox Turbo

With Qwen TTS (Clone Mode)

Quality Guidelines

Benefits vs External Storage

Integration with text-to-speech Skill