| name | podcast |
| description | Create podcasts from topics, URLs, or text. Triggers on: "做播客", "podcast", "播客", "录一期节目", "chat about", "discuss", "debate", "dialogue", "make a podcast about". |
| metadata | {"openclaw":{"emoji":"🎙️","requires":{"bin":["listenhub"]},"primaryBin":"listenhub"}} |
When to Use
- User wants to create a podcast episode on any topic
- User provides a URL or text and wants it turned into a podcast discussion
- User asks for a "debate", "dialogue", or "discussion" format
- User says "podcast", "播客", or "录一期节目"
When NOT to Use
- User wants text-to-speech reading (use /speech)
- User wants an explainer video with visuals (use /explainer)
- User wants to generate an image (use /image-gen)
- User only wants to extract content from a URL without generating audio (use /content-parser)
Purpose
Generate podcast episodes with 1-2 AI speakers discussing a topic. Supports quick overviews, deep analysis, and debate formats. Input can be a topic description, URL(s), or text. Output is a full audio episode with transcript.
Hard Constraints
- Always check CLI auth following shared/cli-authentication.md
- Follow shared/cli-patterns.md for command execution and error handling
- Never hardcode speaker IDs in API calls; use the built-in defaults from shared/speaker-selection.md as a fallback only, and fetch from the speakers API when the user wants to change voice
- Never fabricate CLI commands or parameters
- Always read config following shared/config-pattern.md before any interaction
- Always follow shared/speaker-selection.md for speaker selection (text table + free-text input)
- Never save files to ~/Downloads/ or .listenhub/; save artifacts to the current working directory with friendly topic-based names (see shared/config-pattern.md § Artifact Naming)
Use the AskUserQuestion tool for every multiple-choice step — do NOT print options as plain text. Ask one question at a time. Wait for the user's answer before proceeding to the next step. After all parameters are collected, summarize the choices and ask the user to confirm. Do NOT call any generation API until the user has explicitly confirmed.
Step -1: CLI Auth Check
Follow shared/cli-authentication.md § Auth Check. If the CLI is not installed or the user is not logged in, auto-install and auto-login — never ask the user to run commands manually.
Then follow shared/cli-authentication.md § Auth Mode Detection to determine AUTH_MODE and set:
if [ "$AUTH_MODE" = "openapi" ]; then
CMD_PREFIX="listenhub openapi podcast"
else
CMD_PREFIX="listenhub podcast"
fi
All subsequent CLI calls use $CMD_PREFIX instead of hardcoded listenhub podcast.
Step 0: Config Setup
Follow shared/config-pattern.md Step 0 (Zero-Question Boot).
If the config file doesn't exist, silently create it with defaults and proceed:
mkdir -p ".listenhub/podcast"
echo '{"outputMode":"inline","language":null,"defaultMode":"quick","defaultSpeakers":{}}' > ".listenhub/podcast/config.json"
CONFIG_PATH=".listenhub/podcast/config.json"
CONFIG=$(cat "$CONFIG_PATH")
Do NOT ask any setup questions. Proceed directly to the Interaction Flow.
If the config file exists, read it silently and proceed:
CONFIG_PATH=".listenhub/podcast/config.json"
[ ! -f "$CONFIG_PATH" ] && CONFIG_PATH="$HOME/.listenhub/podcast/config.json"
CONFIG=$(cat "$CONFIG_PATH")
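The two branches above can be folded into one helper. This is a minimal sketch, not the implementation defined in shared/config-pattern.md: the function name load_podcast_config is hypothetical, and it assumes the project-level file is created only when neither the project-level nor the home-level config exists.

```shell
# Hypothetical helper: resolve the podcast config, creating defaults if needed.
# Assumption: project-level config wins over the one under $HOME.
load_podcast_config() {
  local project=".listenhub/podcast/config.json"
  local global="$HOME/.listenhub/podcast/config.json"
  local defaults='{"outputMode":"inline","language":null,"defaultMode":"quick","defaultSpeakers":{}}'

  if [ -f "$project" ]; then
    CONFIG_PATH="$project"
  elif [ -f "$global" ]; then
    CONFIG_PATH="$global"
  else
    # Neither file exists: create the project-level config silently.
    mkdir -p "$(dirname "$project")"
    printf '%s\n' "$defaults" > "$project"
    CONFIG_PATH="$project"
  fi

  # Expose the raw JSON to later steps, as the snippets above do.
  CONFIG=$(cat "$CONFIG_PATH")
}
```

Calling load_podcast_config once at the start of a session leaves CONFIG_PATH and CONFIG set in the same way as the snippets above.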
Setup Flow (user-initiated reconfigure only)
Only run when the user explicitly asks to reconfigure. Display current settings:
当前配置 (podcast):
输出方式:{inline / download / both}
语言偏好:{zh / en / 未设置}
默认模式:{quick / deep / debate / 未设置}
默认主播:{speakerName(s) / 使用内置默认}
Then ask these questions in order and save:
- outputMode: Follow shared/output-mode.md § Setup Flow Question.
- Language (optional): "默认语言?"
- "中文 (zh)"
- "English (en)"
- "每次手动选择" → keep null
- Mode (optional): "默认播客模式?"
- "Quick — 简短概述"
- "Deep — 深度分析"
- "Debate — 辩论对话"
- "每次手动选择" → keep null
After collecting answers, save immediately:
NEW_CONFIG=$(echo "$CONFIG" | jq --arg m "$OUTPUT_MODE" '. + {"outputMode": $m}')
if [ "$LANGUAGE" != "null" ]; then
NEW_CONFIG=$(echo "$NEW_CONFIG" | jq --arg lang "$LANGUAGE" '. + {"language": $lang}')
fi
if [ "$MODE" != "null" ]; then
NEW_CONFIG=$(echo "$NEW_CONFIG" | jq --arg mode "$MODE" '. + {"defaultMode": $mode}')
fi
echo "$NEW_CONFIG" > "$CONFIG_PATH"
CONFIG=$(cat "$CONFIG_PATH")
Interaction Flow
Step 1: Topic + Reference Materials
Ask topic and optional reference materials together in a single question using AskUserQuestion with two sub-questions, or a single free-text prompt:
What topic would you like to turn into a podcast? If you have reference materials (URLs or text), include them here too.
Accept: topic description, URL(s), pasted text, or any combination.
Examples of valid input:
Step 2: Mode
Default: "quick". Skip this question and use the default unless one of the following applies:
- config.defaultMode is set to something else → use that value silently
- User explicitly mentioned a mode keyword in Step 1 (e.g. "deep dive", "debate", "in depth") → infer mode from intent
Only ask this question if the user's intent is ambiguous AND no default is configured. In most cases, just use "quick".
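The mode rules above can be sketched as a small shell function. The name infer_mode is hypothetical, the keyword list is limited to the examples given in this step, and the precedence (explicit keyword first, then config.defaultMode, then "quick") is an assumption, since the step does not state which rule wins when both apply.

```shell
# Hypothetical helper: infer_mode TEXT [CONFIG_DEFAULT]
# Assumed precedence: keyword in the user's text > config default > "quick".
infer_mode() {
  local text config_default="${2:-}"
  # Lowercase the input so keyword matching is case-insensitive.
  text=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')

  case "$text" in
    *debate*)
      echo "debate" ;;
    *"deep dive"*|*"in depth"*)
      echo "deep" ;;
    *)
      # No keyword found: fall back to the configured default, if any.
      if [ -n "$config_default" ] && [ "$config_default" != "null" ]; then
        echo "$config_default"
      else
        echo "quick"
      fi
      ;;
  esac
}
```

For example, `infer_mode "A deep dive into Rust" "quick"` selects deep, while a plain topic with no configured default falls through to quick.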
Step 3: Language
Default: match the user's interaction language. Detect from the language the user used in Step 1:
- If the user wrote in Chinese → zh
- If the user wrote in English → en
- If config.language is set → use that value
Never ask this question. Always infer silently. Show in the confirmation summary so the user can override if needed.
Step 4: Speaker Count
Default: 2 speakers (dialogue) — the most common and engaging format.
Skip this question. Debate mode requires 2 speakers. For quick/deep, default to 2 speakers as well.
Only use 1 speaker if the user explicitly requests a monologue or solo format.
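As a rough sketch, the speaker-count rule reduces to a single check. The function name speaker_count and its yes/no second argument are hypothetical, and treating a solo request in debate mode as still requiring 2 speakers is an assumption drawn from "Debate mode requires 2 speakers".

```shell
# Hypothetical helper: speaker_count MODE [SOLO_REQUESTED]
# SOLO_REQUESTED is "yes" only when the user explicitly asked for a monologue.
speaker_count() {
  local mode="$1" solo="${2:-no}"
  if [ "$mode" != "debate" ] && [ "$solo" = "yes" ]; then
    echo 1
  else
    # Dialogue is the default; debate always needs two voices.
    echo 2
  fi
}
```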
Step 5: Speaker Selection
Follow shared/speaker-selection.md:
- If config.defaultSpeakers.{language} is set → use saved speakers silently
- If not set → use built-in defaults from shared/speaker-selection.md (no question asked)
- Show the speaker(s) in the confirmation summary — user can change from there if desired
- Only show the full speaker list if the user explicitly asks to change voices
For 2-speaker mode (dialogue/debate): use Primary + Secondary defaults for the language.
Step 6: Confirm & Generate
Summarize all choices:
Ready to generate podcast:
Topic: {topic}
Mode: {mode}
Language: {language}
Speakers: {speaker name(s)}
References: {yes/no + brief description}
Proceed?
Wait for explicit confirmation before calling any CLI command. The user can adjust any parameter here before confirming.
Workflow
Generation
- Submit (background): Run the CLI command with run_in_background: true and timeout: 360000:
$CMD_PREFIX create \
--query "{topic}" \
--source-url "{url}" \
--source-text "{text}" \
--mode {quick|deep|debate} \
--lang {en|zh|ja} \
--speaker "{name}" \
--speaker "{name2}" \
--json
Flag notes:
- --query: the topic or question to discuss
- --source-url: repeatable, one per URL reference
- --source-text: repeatable, one per text block reference
- --mode: one of quick, deep, debate
- --lang: language code
- --speaker: repeatable (max 2); use speaker display names
- --speaker-id: alternative to --speaker; use speaker IDs instead of names
- Omit --source-url / --source-text if the user provided no references
The CLI handles polling internally and returns the final result when generation completes.
- Tell the user the task is submitted and that they will be notified when it finishes.
- When notified of completion, present the result:
Parse the CLI JSON output to extract fields: audioUrl, subtitlesUrl, audioDuration, credits.
Read OUTPUT_MODE from config. Follow shared/output-mode.md for behavior.
inline or both: Display audioUrl as a clickable link.
Present:
播客已生成!
在线收听:{audioUrl}
字幕:{subtitlesUrl}(如有)
时长:{audioDuration / 1000}s
消耗积分:{credits}
download or both: Also download the file. Generate a topic slug following shared/config-pattern.md § Artifact Naming.
SLUG="{topic-slug}"
NAME="${SLUG}-podcast.mp3"
BASE="${NAME%.*}"; EXT="${NAME##*.}"; i=2
while [ -e "$NAME" ]; do NAME="${BASE}-${i}.${EXT}"; i=$((i+1)); done
curl -sS -o "$NAME" "{audioUrl}"
Present:
已保存到当前目录:
{NAME}
- Offer to show the transcript or provide the download URL on request.
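For the Submit step, one way to handle the repeatable and optional flags is to assemble the arguments in a bash array, so that topics and text blocks containing spaces stay intact and the reference flags are simply left out when there are no references. This is a sketch only: build_create_args and the variable names TOPIC, MODE, LANG_CODE, URLS, TEXTS and SPEAKERS are hypothetical, and only the flags listed in the flag notes are used.

```shell
# Hypothetical helper: fills the ARGS array from these illustrative variables:
#   TOPIC, MODE, LANG_CODE (strings); URLS, TEXTS, SPEAKERS (arrays).
build_create_args() {
  local item
  # The flag notes allow at most two --speaker values.
  [ "${#SPEAKERS[@]}" -le 2 ] || return 1

  ARGS=(create --query "$TOPIC")
  # The ${arr[@]+...} form expands to nothing for an empty array,
  # so no --source-url / --source-text flag is emitted without references.
  for item in ${URLS[@]+"${URLS[@]}"}; do ARGS+=(--source-url "$item"); done
  for item in ${TEXTS[@]+"${TEXTS[@]}"}; do ARGS+=(--source-text "$item"); done
  ARGS+=(--mode "$MODE" --lang "$LANG_CODE")
  for item in ${SPEAKERS[@]+"${SPEAKERS[@]}"}; do ARGS+=(--speaker "$item"); done
  ARGS+=(--json)
}

# Usage (not executed here):
#   build_create_args && $CMD_PREFIX "${ARGS[@]}"
```

Keeping the arguments in an array avoids re-quoting user-supplied text by hand, which is where pasted reference text with quotes or newlines would otherwise break the command line.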
After Successful Generation
Update config with the choices made this session:
NEW_CONFIG=$(echo "$CONFIG" | jq \
--arg lang "{language}" \
--arg mode "{mode}" \
--argjson speakers '{"{language}": ["{speakerId}"]}' \
'. + {"language": $lang, "defaultMode": $mode, "defaultSpeakers": (.defaultSpeakers + $speakers)}')
echo "$NEW_CONFIG" > "$CONFIG_PATH"
API Reference
- Speaker list: shared/cli-speakers.md
- Speaker selection guide: shared/speaker-selection.md
- CLI patterns: shared/cli-patterns.md
- CLI authentication: shared/cli-authentication.md
- Config pattern: shared/config-pattern.md
Composability
- Invokes: speakers API (for speaker selection)
- Invoked by: content-planner (Phase 3)
Example
User: "Make a podcast about the latest AI developments"
Agent workflow:
- Detect: podcast request, topic = "latest AI developments", no references
- Infer: mode = "quick" (default), language = "en" (user wrote in English), 2 speakers (default)
- Show confirmation summary → user confirms
$CMD_PREFIX create \
--query "The latest AI developments" \
--mode quick \
--lang en \
--speaker "Mars" \
--speaker "Mia" \
--json
Wait for the CLI to return the result, then present it with the title and listen link.