| name | listen-later |
| description | Convert an article, newsletter, or document into a short Kokoro-narrated audio read-up and upload it as a private episode to the user's "📥 Listen Later" Spotify show. ONLY trigger on explicit phrases like "listen later", "read-up", "save as audio for the commute", "add to my listen-later feed". Do NOT trigger on generic summarize, TTS, save-to-spotify, podcast, or cover-art requests — those route to the `save-to-spotify` skill. |
Listen Later
Opinionated pipeline: arbitrary text → Kokoro af_heart audio → private episode in the 📥 Listen Later Spotify show.
For voice cloning, multilingual, custom cover art, or full podcast production, stop and use save-to-spotify directly — this skill is intentionally rigid.
Defaults (do not ask unless user overrides)
| |
|---|
| Voice | Kokoro af_heart, 1.0× (kokoro on PATH) |
| Length | Mode-dependent — see below |
| Show | 📥 Listen Later (must already exist; resolve URI via save-to-spotify --json shows) |
| Cover | Reuse show cover for the episode (no per-episode art) |
| Timeline | Chapters only — no images, no link companions |
| Polling | Off by default. Only poll when the user explicitly asks to wait until ready. |
| Chapter rule | Every chapter ≥30s (Spotify rejects too many short ones). Consolidate adjacent segments into chapters after rendering. First chapter MUST start at 0. |
Mode selection (infer from the user's words, do not ask)
-
Verbatim mode — default when the user says "read-up", "save to listen later", "add to my queue", or just pastes text. Narrate the full text, lightly cleaned for TTS (strip markdown / hashtags / emojis / URLs, expand abbreviations, em dash → hyphen). Do NOT cut content. Length follows the source: ~150 wpm → a 1500-word article becomes ~10 minutes.
-
Summarize mode — only when the user says "summarize", "TL;DR", "short version", or "key points". Pick target length from source complexity:
| Source | Target |
|---|
| Tweet thread / short blog post (<800 words) | 1–2 min |
| Newsletter / medium article (800–3000 words) | 3–5 min |
| Long-form essay / report / paper (>3000 words) | 5–8 min |
| Dense technical / multi-topic source | bias to the longer end |
Within the target, write 6–12 declarative segments preserving the source's structure 1:1.
Segment count rule of thumb: ~30–60 seconds of speech per segment.
Interview
Do not ask for confirmation. Derive the episode title from the source title, article <title>, or URL slug. If the user supplies a title, use it.
Flow
- Preflight: run one shell block that creates the work directory, checks
save-to-spotify --json auth status, checks which kokoro, extracts the page, and resolves the 📥 Listen Later show URI from save-to-spotify --json shows. Do not inspect files one at a time unless something fails.
- Script: write
segments.json with 6–10 declarative segments, links stripped, abbreviations expanded for TTS, em dashes → hyphens. Include chapter titles in the same file.
- Render: render all segments in one shell block. Prefer parallel Kokoro jobs when there is more than one segment, capped at CPU count:
printf '%s\0' seg_*.txt | xargs -0 -P "$(sysctl -n hw.ncpu 2>/dev/null || nproc)" -I{} sh -c 'kokoro -t "$(cat "$1")" -o "${1%.txt}.wav"' sh {}.
- Silence: generate WAV silence once with ffmpeg: 300 ms between segments and 600 ms as outer pad.
- Concat: concat WAVs with the ffmpeg concat demuxer, then encode and normalize in a single final MP3 command when possible:
ffmpeg -f concat -safe 0 -i concat.txt -af loudnorm -ar 44100 -ac 1 -b:a 192k episode.mp3.
- Durations: use
ffprobe on WAV/MP3 files in one shell block and write durations.json.
- Chapters: cursor walks the actual durations. Force first chapter to
start_time_ms: 0. Merge adjacent segments until every chapter is ≥30s — typically ends up at 3–5 chapters for a 3-min episode.
- Description: short HTML — one intro paragraph,
<ul> of M:SS — Title, source link if supplied.
- Upload:
save-to-spotify --json episodes create --show-id <SHOW_URI> --title "<T>" --file episode.mp3 --image show_cover.jpg --summary "<HTML>".
- Timeline:
save-to-spotify --json timeline set --episode-id <EP_ID> --from-file timeline.json.
- Poll: skip by default. If the user explicitly asks to wait until ready, run
save-to-spotify --json episodes status <EP_ID> until readiness == READY.
Working directory
Use /tmp/listen-later/<slug>/ (segments, silences, episode.mp3, timeline.json, description.html). Write incrementally — if step N fails, prior steps are preserved.
Errors to watch
first chapter must start at 0 ms → set items[0].chapter.start_time_ms = 0.
too many short chapters → merge adjacent segment chapters until ≥30s each.
- Missing
📥 Listen Later show → ask the user to create it once via save-to-spotify shows create --title "📥 Listen Later" ..., do NOT auto-create silently.