voice

Name: Voice
Author: octos-org

OminiX ASR (speech-to-text), preset-voice TTS with emotion/speed control, and model management via Qwen3 models on Apple Silicon. For voice cloning and custom voice profiles, use mofa-fm. Triggers: voice, transcribe audio, text to speech, speak this, read aloud, model management, download model, 语音识别, 语音合成, 模型管理.


stars: 964

forks: 66

updated: 2026-05-27 22:05

SKILL.md

name	voice
description	OminiX ASR (speech-to-text), preset-voice TTS with emotion/speed control, and model management via Qwen3 models on Apple Silicon. For voice cloning and custom voice profiles, use mofa-fm. Triggers: voice, transcribe audio, text to speech, speak this, read aloud, model management, download model, 语音识别, 语音合成, 模型管理.

OminiX ASR / TTS / Model Management

On-device speech-to-text, preset-voice text-to-speech with emotion control, and model lifecycle management, backed by Qwen3 ASR/TTS models running on the local ominix-api server (Apple Silicon).

Boundary — when NOT to use this skill

Voice cloning / custom voice profiles → use mofa-fm (fm_tts, fm_voice_save, fm_voice_list, fm_voice_delete). This skill is preset-voice only.
Emotion prompts in fallback mode → not supported. When ominix-api is unreachable, voice_synthesize falls back to the macOS built-in say command, which picks a system voice automatically from the text language and ignores the prompt parameter.
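
The two boundary rules above can be read as a small routing decision. The following is a minimal sketch only: `route_tts` and its three boolean inputs are hypothetical names introduced for illustration and are not part of the skill.

```python
from typing import Optional, Tuple


def route_tts(custom_voice: bool, api_reachable: bool,
              has_prompt: bool) -> Tuple[str, Optional[str]]:
    """Return (skill_name, warning) for a text-to-speech request.

    Hypothetical helper: it only encodes the two boundary rules stated above.
    """
    if custom_voice:
        # Cloned or custom voice profiles belong to mofa-fm, not this skill.
        return ("mofa-fm", None)
    if has_prompt and not api_reachable:
        # The macOS `say` fallback ignores the prompt parameter.
        return ("voice", "emotion prompt will be ignored in fallback mode")
    return ("voice", None)
```

How the caller learns whether ominix-api is reachable is left open here; the sketch simply takes that as an input.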

Tools

Tool	Purpose
`voice_transcribe`	ASR — WAV/OGG/MP3/FLAC/M4A → text
`voice_synthesize`	Preset-voice TTS with optional emotion + speed
`list_models`	List loaded + catalog models on the local ominix-api
`download_model`	Pull a catalog model to local disk
`load_model`	Load a downloaded model into GPU memory
`unload_model`	Free a loaded model from GPU memory

Quick recipes

Transcribe a voice message

{"audio_path": "voice.ogg", "language": "Chinese"}

Synthesize plain speech

{"text": "Hello world", "language": "english", "speaker": "ryan"}

Synthesize with emotion

{"text": "我太开心了！", "speaker": "vivian", "prompt": "用兴奋激动的语气说话，充满热情和活力"}

After voice_synthesize returns a file path, deliver the audio with send_file.
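
The two-step flow can be sketched as below. Everything around the tool names is assumed: `call_tool` stands in for whatever mechanism the host agent uses to invoke tools, the return value of voice_synthesize is taken to be a plain path string, and the `path` argument name for send_file is a guess.

```python
from typing import Any, Callable, Dict

# Hypothetical signature: call_tool(tool_name, arguments) -> tool result.
ToolCaller = Callable[[str, Dict[str, Any]], Any]


def speak_and_send(call_tool: ToolCaller, args: Dict[str, Any]) -> str:
    """Synthesize speech, then deliver the resulting file."""
    audio_path = call_tool("voice_synthesize", args)
    if not isinstance(audio_path, str) or not audio_path:
        raise RuntimeError("voice_synthesize did not return a file path")
    # "path" is an assumed argument name for send_file.
    call_tool("send_file", {"path": audio_path})
    return audio_path
```

Passing the caller in as a parameter keeps the sketch independent of any particular agent runtime and makes it easy to exercise with a stub.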

Anti-patterns

Calling voice_synthesize with a prompt while ominix-api is down — the fallback (say) silently drops the emotion. Use list_models to confirm Qwen3-TTS is loaded before relying on emotion control.
Passing a non-preset speaker name (e.g. a cloned voice id) — this skill only handles preset voices; route the call to mofa-fm instead.
Skipping download_model + load_model after a fresh install — the catalog model is not loaded until you load it explicitly.
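
A pre-flight check along the lines of the first and third anti-patterns might look like the sketch below. It rests on several assumptions that the source does not confirm: `call_tool` is a hypothetical tool invoker, list_models is assumed to return a dict with a `loaded` list of model names, and the model tools are assumed to take a `model` argument.

```python
from typing import Any, Callable, Dict, List

# Hypothetical signature: call_tool(tool_name, arguments) -> tool result.
ToolCaller = Callable[[str, Dict[str, Any]], Any]


def ensure_model_ready(call_tool: ToolCaller, model_id: str) -> List[str]:
    """Make sure model_id is loaded; return the tools that were run."""
    listing = call_tool("list_models", {})
    loaded = listing.get("loaded", [])  # assumed response shape
    if model_id in loaded:
        return []
    steps = []
    # Download first, then load: a catalog model is not usable until loaded.
    for tool in ("download_model", "load_model"):
        call_tool(tool, {"model": model_id})  # "model" is an assumed name
        steps.append(tool)
    return steps
```

The sketch does not distinguish a model that is already downloaded but not loaded; it runs both steps in that case. Calling `ensure_model_ready(call_tool, "Qwen3-TTS")` before a synthesis request that carries a prompt is one way to avoid relying on the fallback.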

related-skills.json

Same repository

harness-starter-audio.md

from "octos-org/octos"

Harnessed audio-artifact starter. Synthesizes a minimal WAV file under audio/ and relies on the workspace contract to deliver it.

2026-05-27, stars: 964

harness-starter-coding.md

from "octos-org/octos"

Harnessed coding-assistant starter. Produces a unified-diff artifact and a file-list preview under patches/.

2026-05-27, stars: 964

harness-starter-generic.md

from "octos-org/octos"

Minimal harnessed single-artifact starter. Use as a template for a custom app that produces one deliverable.

2026-05-27, stars: 964

harness-starter-report.md

from "octos-org/octos"

Harnessed report-generator starter. Writes a markdown artifact under reports/ and relies on the workspace contract to deliver it.

2026-05-27, stars: 964

deep-crawl.md

from "octos-org/octos"

Recursively crawl websites using headless Chrome. Triggers: crawl, scrape website, 爬取, crawl site, deep crawl, website content.

2026-05-26, stars: 964

deep-search.md

from "octos-org/octos"

Deep multi-round web research with parallel fetching. Triggers: deep search, research, 深度搜索, 调研, investigate, deep research.

2026-05-22, stars: 964

package.json

"author": "octos-org"

"repository": "octos-org/octos"


Useful for (SOC): Software Developers, Computer and Mathematical Occupations, 15-1252, L4
