
omnivoice

Local TTS, voice cloning, voice design, and video dubbing via the OmniVoice Studio MCP server (open-source ElevenLabs alternative; nothing leaves the machine, runs on MPS/CUDA/CPU). Use when: (1) generating speech from text in any of 646 languages, (2) cloning a voice from a 3-second reference clip, (3) designing a voice by gender/age/accent/pitch/style, (4) dubbing a video into another language, (5) listing voice profiles or personality presets, (6) producing narration where privacy, cost, or absent API keys matter, (7) non-English narration where Edge TTS/kokoro fall short, (8) batch audio for blog posts or content pipelines. Triggers: 'omnivoice', 'voice clone', 'clone this voice', 'tts', 'narrate', 'generate speech', 'voice synthesis', 'dub video', 'voice design', 'local tts', 'multilingual voice', 'narrate this post', 'elevenlabs alternative'.


Stars: 2

Forks: 0

Updated: June 4, 2026, 17:09

Source: broomva/skills (by broomva)



OmniVoice

Overview

Generate audio locally via the OmniVoice Studio MCP server. Tools: generate_speech, list_voices, list_personalities, list_languages, check_health. Resources: voice://{id}, history://recent.

Prerequisites — Backend Must Be Running

The MCP tools all hit $OMNIVOICE_API_URL (default http://localhost:3900). If the backend is down, every tool returns a connection error. Install + boot:

git clone https://github.com/debpalash/OmniVoice-Studio.git "$OMNIVOICE_HOME"
cd "$OMNIVOICE_HOME"
uv sync
VIRTUAL_ENV="$(pwd)/.venv" uv pip install 'mcp[cli]'

Then:

scripts/check-health.sh        # exit 0 if up
scripts/start-backend.sh       # boot in background (MPS/CUDA auto-detected)
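For scripted pipelines, the same probe can be run from Python before any MCP call. This is a minimal sketch under a few assumptions. The /health path comes from scripts/check-health.sh. Treating HTTP 200 as "up" is an assumption. The helper names api_url, is_backend_up and wait_for_backend are hypothetical.

```python
import os
import time
import urllib.error
import urllib.request
from typing import Optional

DEFAULT_URL = "http://localhost:3900"  # documented default for $OMNIVOICE_API_URL


def api_url() -> str:
    """Base URL the MCP tools use: $OMNIVOICE_API_URL if set, else the default."""
    return os.environ.get("OMNIVOICE_API_URL", DEFAULT_URL).rstrip("/")


def is_backend_up(base_url: Optional[str] = None, timeout: float = 2.0) -> bool:
    """Hypothetical helper: True if GET <base>/health answers with HTTP 200 (assumed healthy signal)."""
    url = (base_url or api_url()).rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


def wait_for_backend(base_url: Optional[str] = None, deadline_s: float = 60.0,
                     interval_s: float = 1.0) -> bool:
    """Poll /health until it succeeds or deadline_s elapses (e.g. right after start-backend.sh)."""
    end = time.monotonic() + deadline_s
    while True:
        if is_backend_up(base_url, timeout=interval_s):
            return True
        if time.monotonic() >= end:
            return False
        time.sleep(interval_s)
```

A generous deadline is useful on the first boot, while the model may still be loading.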

The first synthesis call lazily downloads the k2-fsa/OmniVoice model (~2.4 GB) from Hugging Face. Later boots reuse the cached copy.
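To predict whether that first call will stall on a 2.4 GB download, you can look for the model in the Hugging Face cache. This sketch makes two assumptions. First, the backend fetches through huggingface_hub's default cache. Second, that cache uses the standard models--org--name folder layout. The env-var precedence is simplified (the legacy HUGGINGFACE_HUB_CACHE is ignored). hf_hub_cache and model_cached are hypothetical names.

```python
import os
from pathlib import Path
from typing import Optional

REPO_ID = "k2-fsa/OmniVoice"


def hf_hub_cache() -> Path:
    """Hub cache dir: $HF_HUB_CACHE, else $HF_HOME/hub, else $XDG_CACHE_HOME (or ~/.cache)/huggingface/hub."""
    explicit = os.environ.get("HF_HUB_CACHE")
    if explicit:
        return Path(explicit).expanduser()
    xdg = os.environ.get("XDG_CACHE_HOME", os.path.join(os.path.expanduser("~"), ".cache"))
    hf_home = os.environ.get("HF_HOME", os.path.join(xdg, "huggingface"))
    return Path(hf_home).expanduser() / "hub"


def model_cached(repo_id: str = REPO_ID, cache_dir: Optional[Path] = None) -> bool:
    """True if at least one snapshot folder for repo_id exists.

    A snapshot folder only shows a download started; it does not prove
    every file finished downloading.
    """
    repo_dir = (cache_dir or hf_hub_cache()) / ("models--" + repo_id.replace("/", "--"))
    snapshots = repo_dir / "snapshots"
    return snapshots.is_dir() and any(snapshots.iterdir())
```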

Task Index — Pick the Right Tool

Task	Tool	Notes
Verify backend is up	`check_health`	Returns JSON with `"status":"ok"` and the detected `device` (MPS, CUDA, or CPU)
Text → audio with a saved voice	`generate_speech(text, profile_id)`	Returns base64 WAV. `profile_id="demo0001"` is the bundled demo voice
Text → audio without a clone (voice design)	`generate_speech(text, instruct="…")`	Omit `profile_id`; pass an `instruct` like `"warm middle-aged female narrator, calm pace"`
Multilingual narration	`generate_speech(text, language="es")`	Any ISO 639 code or `"Auto"`
List existing voices	`list_voices`	Returns id, name, type, personality
List personality presets	`list_personalities`	Returns narrator / casual / news-anchor / etc. with their `instruct` strings
List supported languages	`list_languages`	646 total; returns 20 popular + the full count
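When an agent or script assembles these calls, a small helper can keep the table's two voice modes apart. This is a sketch only. speech_args is hypothetical, and the argument names come from the table. Treating profile_id and instruct as mutually exclusive is a conservative assumption, not documented server behaviour.

```python
from typing import Any, Dict, Optional


def speech_args(text: str, profile_id: Optional[str] = None, instruct: Optional[str] = None,
                language: str = "Auto", steps: int = 16) -> Dict[str, Any]:
    """Build a generate_speech argument dict for one row of the task index (hypothetical helper)."""
    if not text.strip():
        raise ValueError("text must not be empty")
    if profile_id and instruct:
        # Assumption: saved voice and voice design are alternatives, so refuse to mix them.
        raise ValueError("use profile_id (saved voice) or instruct (voice design), not both")
    args: Dict[str, Any] = {"text": text, "language": language, "steps": steps}
    if profile_id:
        args["profile_id"] = profile_id  # saved or cloned voice
    elif instruct:
        args["instruct"] = instruct      # voice design from a text description
    # With neither, the voice is left to the server's default (not documented here).
    return args
```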

For non-trivial decisions (which engine to use, when to pick OmniVoice over kokoro / Edge TTS / ElevenLabs), see references/engines-comparison.md.

For MCP wiring details, backend lifecycle, troubleshooting, and a clean teardown, see references/mcp-setup.md.

Common Workflows

1. One-shot narration with the demo voice

# As called through the MCP client (your agent will do this for you):
result = generate_speech(
    text="Hello — this is OmniVoice generating speech locally.",
    profile_id="demo0001",
    language="English",
    steps=16,                   # 8 = fast/draft · 16 = balanced · 32 = quality
)
# result is JSON with audio_id, generation_time_s, audio_duration_s, format, wav_base64

Benchmark: 4.2 s of audio in ~24 s server-side on Apple Silicon MPS at 16 diffusion steps.
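That benchmark works out to a real-time factor of about 24 / 4.2 ≈ 5.7 seconds of compute per second of audio. The sketch below computes it from the result fields listed above and projects a longer job. The linear projection is an assumption: long scripts may be chunked or batched differently. The helper names are hypothetical.

```python
def real_time_factor(result: dict) -> float:
    """Seconds of generation per second of audio (> 1 means slower than real time)."""
    audio_s = result["audio_duration_s"]
    if audio_s <= 0:
        raise ValueError("audio_duration_s must be positive")
    return result["generation_time_s"] / audio_s


def estimate_generation_s(target_audio_s: float, rtf: float) -> float:
    """Naive projection: assumes generation time grows linearly with output length."""
    return target_audio_s * rtf


bench = {"generation_time_s": 24.0, "audio_duration_s": 4.2}  # the MPS benchmark above
rtf = real_time_factor(bench)  # about 5.7
print(f"RTF {rtf:.2f}; a 5-minute narration would take roughly "
      f"{estimate_generation_s(300, rtf) / 60:.0f} min at 16 steps")
```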

2. Save the WAV to disk and play

Tool returns base64 PCM WAV (16-bit, mono, 24 kHz). Decode + write:

import base64, json
from pathlib import Path
payload = json.loads(result_text)            # result_text: the JSON string the tool returns
Path("out.wav").write_bytes(base64.b64decode(payload["wav_base64"]))   # opens, writes and closes the file

On macOS: afplay out.wav. Convert to MP3 with ffmpeg -i out.wav -codec:a libmp3lame -b:a 128k out.mp3.
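Because the output format is fixed, a cheap header check catches truncated or mis-decoded payloads before they reach ffmpeg. This sketch uses only the standard library wave module. decode_wav is a hypothetical helper, and the strict 24 kHz check assumes the format stated above does not change.

```python
import base64
import io
import json
import wave
from typing import Tuple


def decode_wav(result_text: str, expected_rate: int = 24000) -> Tuple[bytes, float]:
    """Decode wav_base64 from a generate_speech result and sanity-check its header.

    Expects 16-bit mono PCM at expected_rate; returns the WAV bytes and the
    duration in seconds computed from the header.
    """
    payload = json.loads(result_text)
    wav_bytes = base64.b64decode(payload["wav_base64"])
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        channels, width = w.getnchannels(), w.getsampwidth()
        rate, frames = w.getframerate(), w.getnframes()
    if channels != 1:
        raise ValueError(f"expected mono audio, got {channels} channels")
    if width != 2:
        raise ValueError(f"expected 16-bit samples, got {8 * width}-bit")
    if rate != expected_rate:
        raise ValueError(f"expected {expected_rate} Hz, got {rate} Hz")
    return wav_bytes, frames / rate
```

Usage: wav_bytes, seconds = decode_wav(result_text), then Path("out.wav").write_bytes(wav_bytes).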

3. Voice clone — end-to-end recipe

Cloning needs a 3-10 second reference clip the model will use as a speaker embedding. The MCP server does NOT expose profile creation — it only reads existing profiles. Two paths to create one:

Path A — bundled helper (macOS, recommended for fresh clones):

scripts/record-reference.sh ~/Downloads/my-ref.wav 12 1
# args: output_path raw_duration_sec mic_index
# Default mic_index=1 (MacBook built-in); list devices via:
#   ffmpeg -f avfoundation -list_devices true -i ""

The script plays an audible countdown plus start/stop cues (macOS say and /System/Library/Sounds/Ping.aiff) so the user knows when to speak. Terminal stdout is buffered, so text prompts like "speak now" arrive too late. It records a longer raw window and trims it to ~10 seconds of speech via silenceremove + atrim. It then plays the clip back for verification and prints the next-step curl command.

Path B — manual:

# 1. Record (mono, 24 kHz native — matches model's internal rate)
ffmpeg -f avfoundation -i ":1" -t 12 -ac 1 -ar 24000 raw.wav

# 2. Trim leading silence + take first 10 sec of speech
ffmpeg -i raw.wav \
  -af "silenceremove=start_periods=1:start_silence=0.05:start_threshold=-40dB,atrim=end=10" \
  -ac 1 -ar 24000 ref.wav

# 3. Verify
ffmpeg -i ref.wav -af volumedetect -f null - 2>&1 | grep volume   # max should be > -20 dB
afplay ref.wav

POST to /profiles (multipart/form-data — required fields: name, ref_audio):

curl -X POST http://127.0.0.1:3900/profiles \
  -F "name=carlos-clone" \
  -F "ref_audio=@ref.wav" \
  -F "ref_text=The exact text spoken in the clip" \
  -F "language=English" \
  | python3 -m json.tool
# returns { "id": "abc12345", "name": "carlos-clone" }
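Pipelines without curl can send the same request from Python. This is a stdlib-only sketch. encode_multipart and create_profile are hypothetical helpers. It assumes /profiles accepts the standard multipart encoding that curl -F produces and replies with JSON like the example above.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path
from typing import Dict, Tuple


def encode_multipart(fields: Dict[str, str], files: Dict[str, Path]) -> Tuple[bytes, str]:
    """Build a multipart/form-data body plus its Content-Type header value."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
                     f"{value}\r\n".encode())
    for name, path in files.items():
        ctype = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        header = (f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"; '
                  f'filename="{path.name}"\r\nContent-Type: {ctype}\r\n\r\n').encode()
        parts.append(header + path.read_bytes() + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def create_profile(ref_audio: Path, name: str, ref_text: str = "", language: str = "",
                   base_url: str = "http://127.0.0.1:3900") -> dict:
    """POST /profiles with the curl example's fields; only name and ref_audio are required."""
    fields = {"name": name}
    if ref_text:
        fields["ref_text"] = ref_text
    if language:
        fields["language"] = language
    body, ctype = encode_multipart(fields, {"ref_audio": ref_audio})
    req = urllib.request.Request(f"{base_url}/profiles", data=body, method="POST",
                                 headers={"Content-Type": ctype})
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())
```

Usage: create_profile(Path("ref.wav"), "carlos-clone", ref_text="The exact text spoken in the clip", language="English")["id"].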

Once created, pass profile_id to generate_speech (via MCP) or directly via POST /generate. Profiles persist in SQLite + reference-audio files at ~/Library/Application Support/OmniVoice/voices/<id>.<ext> (the backend preserves the uploaded extension — .wav if you uploaded a WAV, .mp3 if MP3, etc.). State persists across backend restarts.
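The stored extension follows the upload, so a lookup by id has to match any extension. This is useful, for example, to back up or inspect a clone. The sketch below is hypothetical (find_reference_audio is not part of OmniVoice). It only knows the macOS path given above; other platforms may store voices elsewhere.

```python
import glob
from pathlib import Path
from typing import Optional

# macOS location from the text above; other platforms are not covered here.
VOICES_DIR = Path("~/Library/Application Support/OmniVoice/voices").expanduser()


def find_reference_audio(profile_id: str, voices_dir: Path = VOICES_DIR) -> Optional[Path]:
    """Return <voices_dir>/<profile_id>.<ext> for whatever extension was uploaded, or None."""
    if not voices_dir.is_dir():
        return None
    # glob.escape guards against ids containing *, ? or [ characters.
    matches = sorted(p for p in voices_dir.glob(f"{glob.escape(profile_id)}.*") if p.is_file())
    return matches[0] if matches else None
```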

Reference clip tips that materially affect quality:

Factor	Why it matters
Single speaker	Mixed speakers blur the embedding
Clean speech, no music/noise	Model embeds the noise too
Natural prosody (avoid pangrams)	Diffusion samples replicate prosody, not just timbre
3-10 sec is the sweet spot	< 3 s lacks information; > 10 s adds compute without quality gain
Match `ref_text` to what's spoken	Improves alignment, especially on noisy refs
`language` correct	Wrong language → cross-lingual transfer artifacts
Loudness peak ≥ -15 dB	Quiet refs work but normalize poorly
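Several of these factors can be checked mechanically before uploading. The rough sketch below works on 16-bit PCM WAV files, which is what the ffmpeg commands above produce. check_reference is hypothetical, and the thresholds come from the table and the 24 kHz note in Path B. Speaker count, background noise, and prosody still need a human listen.

```python
import math
import struct
import wave
from pathlib import Path
from typing import List


def check_reference(path: Path, min_s: float = 3.0, max_s: float = 10.0,
                    min_peak_db: float = -15.0, native_rate: int = 24000) -> List[str]:
    """Return warnings for a 16-bit PCM WAV reference clip; an empty list means no issues found."""
    with wave.open(str(path), "rb") as w:
        channels, width = w.getnchannels(), w.getsampwidth()
        rate, n_frames = w.getframerate(), w.getnframes()
        raw = w.readframes(n_frames)
    if width != 2:
        return [f"expected 16-bit PCM, got {8 * width}-bit; re-export with ffmpeg first"]
    warnings = []
    if channels != 1:
        warnings.append(f"{channels} channels; record mono (-ac 1)")
    if rate != native_rate:
        warnings.append(f"{rate} Hz; {native_rate} Hz matches the model's internal rate")
    duration = n_frames / rate
    if not (min_s <= duration <= max_s):
        warnings.append(f"{duration:.1f} s is outside the {min_s:g}-{max_s:g} s sweet spot")
    # Peak level in dBFS over all little-endian int16 samples.
    samples = struct.unpack(f"<{len(raw) // 2}h", raw)
    peak = max((abs(s) for s in samples), default=0)
    peak_db = 20 * math.log10(peak / 32768) if peak else float("-inf")
    if peak_db < min_peak_db:
        warnings.append(f"peak {peak_db:.1f} dBFS is below {min_peak_db:g} dB; normalize or re-record")
    return warnings
```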

4. Voice design (no reference clip)

Skip profile_id; provide an instruct string describing the desired voice:

generate_speech(
    text="Welcome to the future of agentic systems.",
    instruct="warm middle-aged female narrator, calm authoritative pace, documentary style",
)

Get pre-made instructs via list_personalities and copy the one matching the brief (narrator, casual, news-anchor, etc.).
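Copying a preset can be automated once you know the tool's output shape. This text does not document that shape. The sketch therefore assumes list_personalities returns a JSON array of objects with "name" and "instruct" keys; adjust the keys to what your server actually returns. pick_instruct is hypothetical.

```python
import json
from typing import Optional


def pick_instruct(personalities_json: str, wanted: str) -> Optional[str]:
    """Return the instruct string for a preset name (case-insensitive), or None if absent.

    ASSUMPTION: the input is a JSON array of {"name": ..., "instruct": ...} objects.
    """
    for preset in json.loads(personalities_json):
        if str(preset.get("name", "")).lower() == wanted.lower():
            return preset.get("instruct")
    return None
```

The returned string can then be passed as instruct to generate_speech, with profile_id omitted as above.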

5. Video dubbing (web UI only)

The MCP server does not expose the dubbing endpoint. The full transcribe → translate → re-voice → mux pipeline lives behind the desktop UI (bun run desktop in $OMNIVOICE_HOME) and the /dub/* REST routes. When the user asks to dub a video, point them to the UI; surface this skill only for the synthesis primitives above.

When NOT to use OmniVoice

Fast English-only narration on weak hardware → kokoro-tts is ~10× smaller and 2× realtime on CPU (see references/engines-comparison.md)
Lowest-friction one-off TTS → Edge TTS needs no install or backend
Highest possible quality regardless of cost → ElevenLabs still wins on English narration polish; OmniVoice ties or wins on multilingual + cloning
Real-time streaming dictation → use the OmniVoice desktop widget (⌘+⇧+Space), not the MCP server
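The bullets above can be read as an ordered checklist. This sketch encodes them that way. The Job fields, the rule order, and pick_engine are illustrative assumptions; references/engines-comparison.md remains the actual decision tree.

```python
from dataclasses import dataclass


@dataclass
class Job:
    english_only: bool
    needs_clone: bool = False
    weak_hardware: bool = False
    one_off: bool = False
    max_quality_any_cost: bool = False
    realtime_dictation: bool = False


def pick_engine(job: Job) -> str:
    """Apply the 'When NOT to use OmniVoice' bullets in order; fall through to OmniVoice."""
    if job.realtime_dictation:
        return "OmniVoice desktop widget"
    if job.max_quality_any_cost and job.english_only and not job.needs_clone:
        return "ElevenLabs"   # English narration polish; OmniVoice ties or wins on cloning
    if job.one_off and not job.needs_clone:
        return "Edge TTS"     # no install or backend
    if job.english_only and job.weak_hardware and not job.needs_clone:
        return "kokoro-tts"   # small model, faster than real time on CPU
    return "OmniVoice (MCP)"
```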

Resources

references/engines-comparison.md — Decision tree across OmniVoice / kokoro / Voicebox / Edge TTS / ElevenLabs / cloud APIs
references/mcp-setup.md — MCP wiring, backend lifecycle, env vars, troubleshooting
scripts/check-health.sh — curl /health, exit 0/1
scripts/start-backend.sh — Start uvicorn on 127.0.0.1:3900 with health probe
scripts/stop-backend.sh — Clean shutdown via kill -TERM on the bound PID
scripts/record-reference.sh — macOS-only: record + trim + verify a reference clip for cloning, with audible cues (say + system beeps) that bypass terminal output buffering

Backend Swagger / OpenAPI: http://127.0.0.1:3900/docs (when backend is up).

Upstream: github.com/debpalash/OmniVoice-Studio — FSL-1.1-ALv2 (free for personal/internal/non-commercial; auto-converts to Apache-2.0 two years after each release).