
tts

Stars: 8

Forks: 2

Updated: April 27, 2026 at 06:38

Synthesize speech from text and play it through the macOS speakers. Talks to a local LocalKin Service Audio Server (default :8001 / Kokoro). When `record` is running with audio=true, the spoken audio is captured into the recording — use this in place of `shell say` for high-quality multilingual narration in demo videos. Set the TTS_ENDPOINT env var to point at a different server.


Source

LocalKinAI

LocalKinAI/kinclaw



SKILL.md


name	tts
description	Synthesize speech from text and play it through the macOS speakers. Talks to a local LocalKin Service Audio Server (default :8001 / Kokoro). When `record` is running with audio=true, the spoken audio is captured into the recording — use this in place of `shell say` for high-quality multilingual narration in demo videos. Set the TTS_ENDPOINT env var to point at a different server.
command	["sh","-c","T=\"$1\"\nS=\"$2\"\nW=\"$3\"\n# Kernel strips unsubstituted {{vars}} to \"\", so empty == \"param\n# not passed\". Don't add `[ \"$X\" = \"{{name}}\" ]` sentinels — those\n# self-defeat when the caller passes a real value.\n# Default to a Chinese female voice when the text contains CJK\n# characters, otherwise let the server pick. The server silently\n# falls back to English-only Kokoro on missing speaker, which\n# mispronounces Chinese as the literal phrase \"Chinese letter\".\nif [ -z \"$S\" ] && printf '%s' \"$T\" | LC_ALL=C grep -q '[^[:print:][:space:]]'; then\n S=\"${TTS_DEFAULT_ZH_SPEAKER:-zf_xiaoxiao}\"\nfi\nif [ -n \"$S\" ]; then\n PAYLOAD=$(jq -nc --arg t \"$T\" --arg s \"$S\" '{text:$t,speaker:$s}')\nelse\n PAYLOAD=$(jq -nc --arg t \"$T\" '{text:$t}')\nfi\nOUT=$(mktemp -t kinclaw-tts).wav\nHTTP=$(printf '%s' \"$PAYLOAD\" \\\n | curl -sS -X POST \"${TTS_ENDPOINT:-http://localhost:8001}/synthesize\" \\\n -H 'Content-Type: application/json' \\\n --data-binary @- \\\n -o \"$OUT\" -w '%{http_code}')\nif [ \"$HTTP\" != \"200\" ]; then\n echo \"tts: server returned HTTP $HTTP\" >&2\n cat \"$OUT\" >&2\n rm -f \"$OUT\"\n exit 1\nfi\n# wait=false (default): play in background, return immediately so\n# the agent can continue acting while audio is still narrating.\n# During `record` this gives parallel narration + action without\n# the recording capturing dead air.\n# wait=true: block until afplay finishes — use only when the next\n# action visually depends on what was just said.\nif [ \"$W\" = \"true\" ]; then\n afplay \"$OUT\" || exit $?\n printf 'spoken: %s\\nspeaker: %s\\nmode: blocking\\npath: %s\\n' \"$T\" \"${S:-<server default>}\" \"$OUT\"\nelse\n ( afplay \"$OUT\" >/dev/null 2>&1 ) &\n printf 'spoken: %s\\nspeaker: %s\\nmode: background pid=%d\\npath: %s\\n' \"$T\" \"${S:-<server default>}\" \"$!\" \"$OUT\"\nfi\n","_"]
args	["{{text}}","{{speaker}}","{{wait}}"]
schema	{"text":{"type":"string","description":"Text to speak. Required. Captured into video by `record` when audio=true.","required":true},"speaker":{"type":"string","description":"Kokoro speaker id. The server's field name is `speaker`, not `voice` — passing the wrong key is silently ignored and falls back to the English model, which mispronounces Chinese as \"chinese letter\". Examples:\n- Chinese female: `zf_xiaoxiao` (default for CJK text), `zf_xiaobei`, `zf_xiaoni`\n- Chinese male: `zm_yunxi`, `zm_yunjian`\n- English female: `af_bella`, `af_sarah`\n- English male: `am_adam`, `am_michael`\nOmit to let the skill auto-pick: CJK text gets `zf_xiaoxiao`, ASCII gets the server default.\n","required":false},"wait":{"type":"string","description":"\"true\" or \"false\" (default false). Default false plays in the background and returns immediately, so the agent keeps acting while the narration plays — recommended during `record` to avoid burning recording time on dead air. Pass \"true\" only when the next action visually depends on what was just said (rare).\n","required":false}}
timeout	120

tts — speech synthesis via LocalKin Service Audio (Kokoro)

Wraps the LocalKin Service Audio API at :8001/synthesize and plays the returned WAV through the macOS default output (afplay).
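The speaker auto-pick is easy to miss inside the escaped one-line command, so here is a minimal standalone sketch of it. The function name `pick_speaker` is hypothetical and is not part of the skill. The detection test and the `TTS_DEFAULT_ZH_SPEAKER` fallback are copied from the command.

```shell
# pick_speaker TEXT [SPEAKER]
# Hypothetical helper: prints the speaker id the skill would send,
# or an empty line when the server default should apply.
pick_speaker() {
  text=$1
  speaker=${2:-}
  # In the C locale every byte >= 0x80 is neither printable nor
  # whitespace, so this matches any non-ASCII text, CJK included.
  if [ -z "$speaker" ] && printf '%s' "$text" | LC_ALL=C grep -q '[^[:print:][:space:]]'; then
    speaker=${TTS_DEFAULT_ZH_SPEAKER:-zf_xiaoxiao}
  fi
  printf '%s\n' "$speaker"
}
```

Because the check is byte-based, it fires on any non-ASCII input and not only on CJK: accented Latin text such as "café" would also get the Chinese default unless `speaker` is passed explicitly.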

Why this is a SKILL.md and not a native skill

It's three lines of curl + afplay. Pushing it into pkg/skill/ would violate the "thin kernel + fat skill" thesis and make it harder for users to fork. As an external SKILL.md it's also a forge template: the next HTTP service that needs wrapping can be modeled on this file.

How `record` captures the narration

record action=start audio=true enables ScreenCaptureKit's system-audio tap. afplay writes to the default output device, which the tap captures. End result: the spoken text shows up on the video's audio track without any extra plumbing.

Examples

tts text="接下来我会打开计算器" speaker="zf_xiaoxiao"
tts text="Now I'll open Safari and search for KinClaw"

Override the endpoint

TTS_ENDPOINT=http://otherbox:8001 kinclaw -soul souls/pilot.soul.md
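As a rough illustration of how the override resolves, the sketch below reproduces the URL expansion used in the command. The function name `tts_url` is hypothetical. Only the `${TTS_ENDPOINT:-http://localhost:8001}/synthesize` expression comes from the skill.

```shell
# tts_url
# Hypothetical helper: prints the URL the skill would POST to.
tts_url() {
  # ${VAR:-default} falls back when TTS_ENDPOINT is unset OR empty.
  printf '%s/synthesize\n' "${TTS_ENDPOINT:-http://localhost:8001}"
}
```

Since `/synthesize` is appended verbatim, set `TTS_ENDPOINT` to the base URL without a trailing slash; otherwise the request path would start with a double slash.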

Failure modes

tts: server returned HTTP 000 — server isn't running on the configured port.
tts: server returned HTTP 4xx/5xx — server is up but rejected the request; the body is echoed to stderr.
afplay: ... — playback failed (no audio device, sandboxed environment).
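The status codes above could be triaged with a small helper such as the one below. This is only a sketch: the name `explain_tts_status` and the hint wording are assumptions, and the skill itself just treats anything other than 200 as a failure.

```shell
# explain_tts_status CODE
# Hypothetical helper: maps the HTTP code reported by curl's
# -w '%{http_code}' to a short diagnostic hint.
explain_tts_status() {
  case $1 in
    200) echo "ok: audio returned" ;;
    # curl prints 000 when it received no HTTP response at all.
    000) echo "no response: is the server running at the configured endpoint?" ;;
    4[0-9][0-9]|5[0-9][0-9]) echo "rejected: server is up, see the body echoed to stderr" ;;
    *) echo "unexpected: HTTP $1" ;;
  esac
}
```

For example, `explain_tts_status 000` points at a server that is down or an endpoint that is wrong, while `explain_tts_status 422` points at the request itself.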


