voice-note

Stars: 526

Forks: 19

Updated: 2026-06-13 14:21

Convert voice messages to text (STT) and text to voice (TTS). Supports Whisper local model and Edge-TTS.

Source

fuyuxiang

fuyuxiang/echo-agent

Voice Note

Speech-to-Text (STT) and Text-to-Speech (TTS) capabilities.

Speech → Text

Option A: OpenAI Whisper API (cloud)

from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable
with open("audio.ogg", "rb") as f:
    transcript = client.audio.transcriptions.create(model="whisper-1", file=f)
print(transcript.text)
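The Whisper API rejects uploads larger than 25 MB, so a local pre-flight check can fail fast before any network call. A minimal sketch, assuming the extension list below (as given in OpenAI's audio API reference at the time of writing; re-check it) and reading "25 MB" conservatively as 25,000,000 bytes. `check_whisper_upload` is a hypothetical helper, not part of the `openai` package.

```python
import os

# Assumption: limits as listed in OpenAI's audio API docs; re-check before relying on them.
# "25 MB" is read conservatively as 25,000,000 bytes.
MAX_UPLOAD_BYTES = 25_000_000
ALLOWED_EXTENSIONS = {
    ".flac", ".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".ogg", ".wav", ".webm",
}


def check_whisper_upload(path):
    """Hypothetical pre-flight check: return the file size in bytes or raise ValueError."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported audio format: {ext or '(none)'}")
    size = os.path.getsize(path)
    if size > MAX_UPLOAD_BYTES:
        raise ValueError(f"{size} bytes exceeds the {MAX_UPLOAD_BYTES}-byte upload limit")
    return size
```

Call `check_whisper_upload("audio.ogg")` before opening the file. Oversized recordings can be split or re-encoded at a lower bitrate first.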

Option B: faster-whisper (local, no API key)

pip install faster-whisper

from faster_whisper import WhisperModel

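model = WhisperModel("base", compute_type="int8")  # tiny/base/small/medium/large-v3; int8 suits CPU inference
# language="zh" forces Chinese; omit it to let the model detect the language
segments, info = model.transcribe("audio.ogg", language="zh")
# `segments` is a lazy generator: decoding happens while it is consumed, and only once
text = " ".join(s.text.strip() for s in segments)
print(f"[{info.language}] {text}")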
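Each faster-whisper segment also carries `start` and `end` times in seconds, so the same output can be turned into subtitles. A minimal sketch that formats any iterable of objects exposing `start`, `end`, and `text` as SRT; `srt_timestamp` and `to_srt` are hypothetical helpers.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Build SRT text from objects exposing .start, .end and .text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    # SRT separates cue blocks with a blank line
    return "\n".join(blocks)
```

Because the generator returned by `model.transcribe` can be consumed only once, call `list(segments)` first if you need both the joined text and the subtitles, then write `to_srt(...)` to a `.srt` file with `encoding="utf-8"`.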

Text → Speech

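Edge-TTS (free and needs no API key, but it calls Microsoft Edge's online speech service, so network access is required; high-quality Chinese voices)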

pip install edge-tts

# CLI
edge-tts --voice zh-CN-XiaoxiaoNeural --text "你好世界" --write-media output.mp3

# List voices
edge-tts --list-voices | grep zh-CN
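The same listing is available from Python: `edge_tts.list_voices()` is a coroutine that returns one dict per voice, with keys such as `ShortName`, `Locale`, and `Gender`. A minimal filtering sketch; `pick_voices` is a hypothetical helper.

```python
def pick_voices(voices, locale="zh-CN", gender=None):
    """Return sorted ShortName values for voices matching a locale and optional gender."""
    names = []
    for voice in voices:
        if voice.get("Locale") != locale:
            continue
        if gender is not None and voice.get("Gender") != gender:
            continue
        names.append(voice["ShortName"])
    return sorted(names)
```

Usage (requires network access): `pick_voices(asyncio.run(edge_tts.list_voices()), gender="Female")`.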

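import asyncio

import edge_tts

async def speak(text, voice="zh-CN-XiaoxiaoNeural", output="output.mp3"):
    # Request speech from the Edge TTS service and write the audio to an MP3 file
    communicate = edge_tts.Communicate(text, voice)
    await communicate.save(output)

asyncio.run(speak("今天天气不错，适合出门"))  # "Nice weather today, good for going out"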

Chinese Voice Options

Voice	Style
zh-CN-XiaoxiaoNeural	Female; lively, natural
zh-CN-YunxiNeural	Male; warm, gentle
zh-CN-YunyangNeural	Male; news-anchor style
zh-CN-XiaoyiNeural	Female; soft, tender
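When an agent has to choose a voice from context, the table can be kept as a plain mapping. A minimal sketch; the voice names come from the table, while the use-case keys and the fallback to Xiaoxiao are illustrative assumptions.

```python
# Voice names from the table; the use-case keys are hypothetical labels
VOICES_BY_USE = {
    "default": "zh-CN-XiaoxiaoNeural",  # female, lively and natural
    "calm_male": "zh-CN-YunxiNeural",  # male, warm and gentle
    "news": "zh-CN-YunyangNeural",  # male, news-anchor style
    "soft_female": "zh-CN-XiaoyiNeural",  # female, soft and tender
}


def voice_for(use_case):
    """Return the voice for a use case, falling back to the default voice."""
    return VOICES_BY_USE.get(use_case, VOICES_BY_USE["default"])
```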

Script

python3 scripts/voice_process.py transcribe audio.ogg --model base --language zh --output transcript.txt
python3 scripts/voice_process.py summarize meeting.mp3 --model small

Note: this script has no TTS commands; speak and voices are provided by the separate tts-voice skill.
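To call the script from other Python code, build the argument list and pass it to `subprocess.run`. A minimal sketch that uses only the `transcribe` subcommand and flags shown above; it assumes it runs from the skill root (so `scripts/voice_process.py` resolves) and that `--language` and `--output` are optional. `build_transcribe_cmd` and `transcribe_file` are hypothetical helpers.

```python
import subprocess
import sys


def build_transcribe_cmd(audio, model="base", language=None, output=None):
    """Assemble argv for `voice_process.py transcribe` using the documented flags."""
    # sys.executable reuses the current interpreter instead of whatever `python3` resolves to
    cmd = [sys.executable, "scripts/voice_process.py", "transcribe", audio, "--model", model]
    if language:
        cmd += ["--language", language]
    if output:
        cmd += ["--output", output]
    return cmd


def transcribe_file(audio, **kwargs):
    """Run the script and return its stdout; raises CalledProcessError on failure."""
    result = subprocess.run(
        build_transcribe_cmd(audio, **kwargs), check=True, capture_output=True, text=True
    )
    return result.stdout
```

Keeping command construction separate from execution makes the argument list easy to log or inspect before anything runs.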

Audio Format Notes

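Input formats: ogg, mp3, wav, m4a, webm. The OpenAI Whisper API accepts all of these (uploads are capped at 25 MB); faster-whisper decodes input with PyAV, which bundles the FFmpeg libraries, so no system FFmpeg is needed.
Output: mp3 (Edge-TTS default)
Convert: ffmpeg -i input.ogg output.mp3 (requires FFmpeg on PATH)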
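The conversion can also be scripted. A minimal sketch, assuming `ffmpeg` is on `PATH`; `-y`, `-ar` (sample rate), and `-ac` (channel count) are standard ffmpeg options. Downmixing to 16 kHz mono is shown only because that is the rate Whisper models process internally, not because either tool requires it. `build_convert_cmd` and `convert_audio` are hypothetical helpers.

```python
import shutil
import subprocess


def build_convert_cmd(src, dst, sample_rate=None, channels=None):
    """Assemble an ffmpeg command; the output format is inferred from dst's extension."""
    cmd = ["ffmpeg", "-y", "-i", src]  # -y: overwrite dst without prompting
    if sample_rate:
        cmd += ["-ar", str(sample_rate)]
    if channels:
        cmd += ["-ac", str(channels)]
    cmd.append(dst)  # output options must come before the output path
    return cmd


def convert_audio(src, dst, **kwargs):
    """Run ffmpeg; raises if ffmpeg is missing or the conversion fails."""
    if shutil.which("ffmpeg") is None:
        raise FileNotFoundError("ffmpeg not found on PATH")
    subprocess.run(build_convert_cmd(src, dst, **kwargs), check=True, capture_output=True)
```

Usage: `convert_audio("input.ogg", "input_16k.wav", sample_rate=16000, channels=1)`.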

Other Skills in This Repository

ppt-author

fuyuxiang/echo-agent

Create and edit PowerPoint (.pptx) presentations programmatically. Requires python-pptx.

2026-06-22 · 526 stars

excel-author

fuyuxiang/echo-agent

Create and edit Excel (.xlsx) workbooks with openpyxl. Supports formulas, charts, formatting, and data analysis.

2026-06-13 · 526 stars

image-gen

fuyuxiang/echo-agent

Generate images via DALL-E, Stable Diffusion, or free alternatives. Supports multi-channel delivery.

2026-06-13 · 526 stars

meme-gen

fuyuxiang/echo-agent

Generate meme images with text overlays using Pillow. Pick templates or create custom image macros.

2026-06-13 · 526 stars

code-runner

fuyuxiang/echo-agent

Execute Python code snippets in a sandboxed environment. Supports data analysis, visualization, and quick scripts.

2026-06-13 · 526 stars

github-ops

fuyuxiang/echo-agent

GitHub CLI for issues, PRs, code search, CI logs, releases, and API queries. Requires gh CLI and auth.

2026-06-13 · 526 stars

name	voice-note
description	Convert voice messages to text (STT) and text to voice (TTS). Supports Whisper local model and Edge-TTS.
version	1.0.0
metadata	{"echo":{"tags":["Voice","STT","TTS","Whisper","Audio","Media"]}}
