tts-synthesis

Synthesize TTS audio from course transcripts with pluggable engines (OpenAI gpt-4o-mini-tts default, edge-tts fallback). Use when turning a class transcript.txt into per-slide MP3s plus a concatenated full.mp3, handling [slide N] / [pause:ms] markers, retries, and per-slide speaker_affect overlays. Triggered after coherence-review passes and before/within asset-build, or as a standalone step. Also invoke for "오디오 생성" (generate audio), "TTS 합성" (TTS synthesis), "강의 음성 만들어" (make the lecture audio), "voice 변경" (change voice), "speed 조정" (adjust speed).

Install command

npx skills add https://github.com/tobyilee/course-builder --skill tts-synthesis

Copy and paste this command into Claude Code to install the skill

Source

tobyilee/course-builder

Stars: 17

Forks: 5

Updated: May 3, 2026 at 23:19

SKILL.md

name: tts-synthesis

TTS Synthesis — Pluggable Audio Pipeline

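Synthesizes transcript.txt into per-slide MP3s plus full.mp3. The default is OpenAI gpt-4o-mini-tts with the nova voice at speed 1.3. edge-tts is the free alternative that needs no API key (it still calls an online service, hence its rate limits).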

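Why this skill exists

This was originally a Non-Goal of the harness and was brought into scope in Phase 7. TTS synthesis entangles (1) transcript marker parsing, (2) engine-specific API calls, (3) pause silence insertion plus concatenation, and (4) retries, so it is hard to reuse unless it is scripted.

Running

Standard invocation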

python3 scripts/synthesize-tts.py \
  course/sections/<sec>/classes/<cls>/transcript.txt \
  course/sections/<sec>/classes/<cls>/audio

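The command above uses the defaults (openai + nova + speed 1.3 + the default friendly Korean instructions).

Wrapper (guarantees pipefail)

bash .claude/skills/tts-synthesis/scripts/run.sh \
  <transcript.txt> <out_dir> [extra flags]

The wrapper is responsible for set -o pipefail, for propagating the real exit code through tee, and for loading .env.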

Switching engines

# Switch to edge-tts (free, no API key; rate-limited)
--engine edge --voice ko-KR-SunHiNeural

# Other OpenAI voice/model
--voice shimmer
--model tts-1-hd   # higher quality, 2x cost
--speed 1.0        # original speed

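Input marker spec

[slide N] (alone on its line, N=1..): the boundary used to split the audio into segments during TTS post-processing
[pause:Xms]: inserts silence (typically 300 to 800 ms). Not affected by the speed parameter: the pause keeps its written duration, a preference the user confirmed
[emph]...[/emph]: becomes <emphasis> when converting to SSML (for OpenAI, replaced by instructions)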
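As an illustration of the marker spec above, here is a minimal parsing sketch. It is not the code in scripts/synthesize-tts.py; the `Slide` class, the `parse_transcript` function, and the exact regular expressions are assumptions made for this example.

```python
import re
from dataclasses import dataclass, field

# Assumed marker shapes, based on the spec above.
SLIDE_RE = re.compile(r"^\[slide (\d+)\]$")
PAUSE_RE = re.compile(r"\[pause:(\d+)ms\]")
EMPH_RE = re.compile(r"\[/?emph\]")


@dataclass
class Slide:
    number: int
    # Ordered items: ("text", str) or ("pause", milliseconds as int).
    segments: list = field(default_factory=list)


def parse_transcript(text: str) -> list:
    """Split a transcript into slides, each holding text and pause segments."""
    slides = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        cue = SLIDE_RE.match(line)
        if cue:
            slides.append(Slide(number=int(cue.group(1))))
            continue
        if not slides:
            raise ValueError("text found before the first [slide N] cue")
        # re.split with one capture group yields: text, ms, text, ms, ...
        for i, part in enumerate(PAUSE_RE.split(line)):
            if i % 2 == 1:
                slides[-1].segments.append(("pause", int(part)))
            else:
                # This sketch simply drops [emph] tags instead of mapping them.
                spoken = EMPH_RE.sub("", part).strip()
                if spoken:
                    slides[-1].segments.append(("text", spoken))
    return slides
```

Keeping pauses as separate segments means the silence can be generated independently of the TTS call, which is one way to keep pause length unaffected by the speed parameter.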

Output structure

<out_dir>/
  slide_01.mp3  ~  slide_NN.mp3   # one per slide
  full.mp3                          # full concatenation

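Agent/orchestrator integration

This skill is the single-responsibility skill of the tts-synthesizer agent. It is integrated as step 4 of Phase 5 in the course-builder orchestrator, and asset-builder invokes it by automatic delegation from inside build-bundle.sh: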

Phase 5 (build):
  step 1 gate (asset-builder)
  step 2 Marp HTML (asset-builder)
  step 3 Marp PNG (asset-builder, skippable via SKIP_PLAYER)
  step 4 TTS synthesis ← tts-synthesizer (this skill), per-class
  step 5 manifest (asset-builder)
  step 6 player HTML (asset-builder, skippable via SKIP_PLAYER)
  step 7 SSML validation (asset-builder)
  step 8 bundle.zip (asset-builder)

Automatic invocation conditions (all must hold):

coherence_report.overall == "pass": the gate has passed
OPENAI_API_KEY is set (otherwise falls back to edge-tts)
SKIP_TTS != 1
audio/full.mp3 does not exist, or FORCE_TTS=1
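The conditions above can be sketched as a small gate function. This is only an illustration: `should_run_tts` and `pick_engine` are hypothetical names, and the real check lives in build-bundle.sh, which may differ. A missing OPENAI_API_KEY is treated here as an engine choice rather than a blocker, since the list describes a fallback.

```python
from pathlib import Path
from typing import Mapping


def should_run_tts(coherence_overall: str, audio_dir: Path, env: Mapping[str, str]) -> bool:
    """Return True when the automatic TTS step should run for one class."""
    if coherence_overall != "pass":
        return False  # the coherence gate has not passed
    if env.get("SKIP_TTS") == "1":
        return False  # explicitly skipped
    already_built = (audio_dir / "full.mp3").exists()
    return (not already_built) or env.get("FORCE_TTS") == "1"


def pick_engine(env: Mapping[str, str]) -> str:
    """Use OpenAI when a key is present, otherwise fall back to edge-tts."""
    return "openai" if env.get("OPENAI_API_KEY") else "edge"
```

In a real run the `env` argument would typically be `os.environ`; passing it in explicitly keeps the function easy to test.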

Standalone invocation (user-triggered, outside the build) is also supported: invoke the tts-synthesizer agent directly.

Quality rules

Retry

Transient failures (edge-tts rate limits, OpenAI 429/5xx) are recovered automatically by retrying up to 4 times with exponential backoff.
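A minimal sketch of the retry rule follows, assuming "4 times" means four retries after the first attempt. `TransientError` and `with_retries` are hypothetical names, and the base delay and jitter are illustrative values not taken from the script.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures (429, 5xx, rate limits)."""


def with_retries(call: Callable[[], T], retries: int = 4, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(), retrying transient failures with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return call()
        except TransientError:
            if attempt == retries:
                raise  # out of retries: surface the last failure
            delay = base_delay * (2 ** attempt)  # 1s, 2s, 4s, 8s by default
            # Small random jitter (an addition in this sketch) spreads out retries.
            sleep(delay + random.uniform(0, 0.1 * delay))
    raise AssertionError("unreachable")
```

The caller is expected to translate engine-specific errors into `TransientError` so that non-retryable failures, such as an invalid API key, fail immediately.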

Idempotency

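Re-running with the same transcript and the same parameters yields output file sizes that match within ±2% (TTS itself varies slightly). The default strategy is to delete the whole audio/ directory and re-run.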
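One way to check the ±2% expectation is to record file sizes before deleting audio/ and compare them after the re-run. The helper names below are hypothetical and are not part of the skill's scripts.

```python
from pathlib import Path


def snapshot_sizes(out_dir: Path) -> dict:
    """Map each MP3 file name in out_dir to its size in bytes."""
    return {p.name: p.stat().st_size for p in out_dir.glob("*.mp3")}


def within_tolerance(old_size: int, new_size: int, tolerance: float = 0.02) -> bool:
    """True when new_size differs from old_size by at most the given fraction."""
    if old_size == 0:
        return new_size == 0
    return abs(new_size - old_size) <= tolerance * old_size


def drifted_files(before: dict, after: dict, tolerance: float = 0.02) -> list:
    """Names that are missing on either side or whose size drifted too far."""
    drifted = []
    for name in sorted(before.keys() | after.keys()):
        old, new = before.get(name), after.get(name)
        if old is None or new is None or not within_tolerance(old, new, tolerance):
            drifted.append(name)
    return drifted
```

Usage would be: call `snapshot_sizes` on audio/, delete the directory, re-run the synthesis, call `snapshot_sizes` again, and inspect `drifted_files(before, after)`.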

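Speaker affect injection (Phase 7 extension)

Passing --beats <beats.json> overlays each slide's beat speaker_affect (hook=curiosity, teach=calm, example=excitement, practice=questioning, recap=confidence) onto the default instructions, so that each slide is synthesized with a different nuance.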
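The overlay can be pictured with the sketch below. The beats.json schema is not given in this document, so the `{"slide": ..., "beat": ...}` shape, the function names, and the wording appended to the instructions are all assumptions for illustration.

```python
import json
from typing import Optional

# Affect per beat type, as listed above.
AFFECT_BY_BEAT = {
    "hook": "curiosity",
    "teach": "calm",
    "example": "excitement",
    "practice": "questioning",
    "recap": "confidence",
}


def instructions_for_slide(base: str, beat_type: Optional[str]) -> str:
    """Append the beat's affect to the base instructions; unknown beats keep the base."""
    affect = AFFECT_BY_BEAT.get(beat_type or "")
    if affect is None:
        return base
    return f"{base} For this slide, convey {affect}."


def load_slide_instructions(beats_json: str, base: str) -> dict:
    """ASSUMPTION: beats.json is a list of {"slide": int, "beat": str} objects."""
    beats = json.loads(beats_json)
    return {b["slide"]: instructions_for_slide(base, b.get("beat")) for b in beats}
```

Falling back to the base instructions for an unknown or missing beat keeps synthesis working when beats.json is incomplete.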

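Checklist

transcript.txt contains [slide N] cues, numbered consecutively from 1
OPENAI_API_KEY is loaded (engine=openai)
out_dir exists or can be created
After synthesis, the number of slide_NN.mp3 files == the number of [slide N] cues in the transcript
full.mp3 duration ≈ Σ(slide_NN.mp3) (±1%)
Check whether the retry log contains warnings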
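Some of the checklist items above can be automated. The sketch below covers the cue-order and file-count checks only; the duration check is left out because it needs an audio library. The function names are hypothetical, and the two-digit `slide_NN.mp3` naming is taken from the output structure.

```python
import re
from pathlib import Path

CUE_RE = re.compile(r"^\[slide (\d+)\][ \t]*$", flags=re.MULTILINE)


def count_slide_cues(transcript_text: str) -> int:
    """Return the number of [slide N] cues, requiring 1, 2, 3, ... in order."""
    numbers = [int(n) for n in CUE_RE.findall(transcript_text)]
    if not numbers:
        raise ValueError("no [slide N] cues found")
    if numbers != list(range(1, len(numbers) + 1)):
        raise ValueError(f"cues are not consecutive from 1: {numbers}")
    return len(numbers)


def check_outputs(out_dir: Path, expected_slides: int) -> list:
    """Return a list of problems; an empty list means the file checks pass."""
    problems = []
    found = {p.name for p in out_dir.glob("slide_*.mp3")}
    wanted = {f"slide_{i:02d}.mp3" for i in range(1, expected_slides + 1)}
    for name in sorted(wanted - found):
        problems.append(f"missing {name}")
    for name in sorted(found - wanted):
        problems.append(f"unexpected {name}")
    if not (out_dir / "full.mp3").exists():
        problems.append("missing full.mp3")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every failed item in a single pass.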

References

Original script: scripts/synthesize-tts.py (canonical; called by this skill)
Measured char/sec rate: skills/script-writing/SKILL.md §"길이 계산" (length calculation)
The pause-preservation policy was confirmed by the user (memory/feedback_tts_config.md)

More from this repository

same repository

asset-build

tobyilee/course-builder

Final build step — render all Marp slides, validate SSML, synthesize the course manifest, and package bundle.zip. Use when coherence review has passed and the course is ready for deployment. Triggered by the asset-builder agent.

2026-05-03 · 17 stars

course-builder

tobyilee/course-builder

Orchestrate end-to-end online course generation from a topic. Invoke when user asks to "create a course", "design a curriculum", "build online lectures", "강의 만들어" (make a course), "코스 설계" (design a course), "커리큘럼 짜" (draft a curriculum), "섹션별 퀴즈 포함 강의" (course with per-section quizzes), "TTS 스크립트 강의" (course with TTS scripts), or any request to produce structured learning material with slides + notes + scripts + quizzes. Also invoke for follow-ups: "재실행" (re-run), "섹션만 다시" (redo only a section), "업데이트" (update), "이전 결과 개선" (improve the previous result), "코스 수정" (revise the course). Do NOT invoke for single-artifact requests (just a slide, just a quiz); those should call the specific authoring skill directly.

2026-05-03 · 17 stars

script-writing

tobyilee/course-builder

Write TTS-ready transcripts with [slide N] cues, [pause:ms] markers, and optional SSML. Convert raw code/URLs/long numbers to speakable forms. Use when turning slides+beats into a narrator script. Triggered by the script-writer agent.

2026-04-24 · 17 stars

slide-authoring

tobyilee/course-builder

Author slide decks in Marp Markdown and render to HTML. Use when turning class beats into 4-7 slides per class, applying Marp frontmatter/layout rules, preventing overflow, or rendering via the marp CLI. Triggered by the slide-author agent.

2026-04-24 · 17 stars

coherence-review

tobyilee/course-builder

Cross-artifact QA for the course pipeline. Verify LO coverage, Bloom balance, slide↔script cue alignment, note↔slide consistency, tone, quiz factual validity, and transcript speakability. Use when reviewing generated course assets before build. Triggered by the coherence-reviewer agent.

2026-04-23 · 17 stars

class-planning

tobyilee/course-builder

Plan a single class as a beat sheet (hook → teach → example → practice → recap) used jointly by slide, note, and script authors. Use when turning a class spec into beats, allocating duration across beats, or assigning visual/speaker cues. Triggered by the class-planner agent.

2026-04-22 · 17 stars
