Run any Skill in Manus with one click

transcription-reader

Efficiently read, parse, and analyze transcription and subtitle files (STJ, VTT, SRT, ASS, SSA). Use this skill whenever the user uploads or references a transcription file, subtitle file, or meeting recording transcript. Trigger on file extensions like .stj, .stjson, .vtt, .srt, .ass, .ssa, or when the user mentions transcripts, captions, subtitles, meeting notes from recordings, or wants to summarize a recording, find who said what, extract action items, or convert between transcription formats.

Run Skill in Manus

Overview

Install command

npx skills add https://github.com/yaniv-golan/skill-packager-skill --skill transcription-reader

Copy and paste this command into Claude Code to install the skill

Source

yaniv-golan/skill-packager-skill

Stars0

Forks0

UpdatedApril 7, 2026 at 14:52

File Explorer

4 files

SKILL.md

readonly

More from this repository

same repository

skill-packager

yaniv-golan/skill-packager-skill

Package AI agent skills into deployment formats — .zip, .skill, Claude plugin, Claude marketplace, Cursor plugin, Cursor marketplace, ChatGPT/Manus zip, Codex CLI, NanoClaw marketplace, OpenClaw/ClawHub, Agent Skills standard (.agents/), or a universal repo with all formats + CI/CD. Use when the user says "package this skill", "deploy my skill", "create a plugin from this skill", "make this work on Cursor/ChatGPT/Codex/NanoClaw/OpenClaw", "set up a repo for my skill", or wants to distribute a SKILL.md to any platform.

2026-04-270

proof-engine

yaniv-golan/skill-packager-skill

Create formal, verifiable proofs of claims with machine-checkable reasoning. Use when asked to prove, verify, fact-check, or rigorously establish whether a claim is true or false — mathematical, empirical, or mixed. Trigger phrases: "is it really true", "can you prove", "verify this", "fact-check this", "prove it", "show me the logic". Do NOT use for opinions, essays, or questions with no verifiable answer.

2026-04-070

proof-engine

yaniv-golan/skill-packager-skill

2026-04-070

pretext

yaniv-golan/skill-packager-skill

Help developers use @chenglou/pretext, the 15KB TypeScript text measurement library that computes exact text metrics without DOM reflows. Use when user mentions pretext, @chenglou/pretext, text measurement, text layout without DOM, text reflow, text around obstacles, auto-fit font size, layoutNextLine, or asks to measure text height/width in JavaScript without triggering browser reflow. Covers API usage, integration patterns, creative demos (ASCII art, obstacle-aware flow, masonry), and critical gotchas. Do NOT use for CSS-only text layout questions or general typography.

2026-04-070

youtube-downloader

yaniv-golan/skill-packager-skill

Download YouTube videos with interactive resolution and subtitle selection using yt-dlp. Use when the user wants to download a YouTube video, save a video from YouTube, get video subtitles/captions, or mentions youtube download, yt-dlp, video resolution, or SRT subtitles from YouTube.

2026-04-070

pretext

yaniv-golan/skill-packager-skill

2026-04-070

Source

yaniv-golan

yaniv-golan/skill-packager-skill

View GitHub Repository View Creator Repositories

Install command

Download

Run Skill in Manus

Useful forSOC

Proofreaders and Copy MarkersOffice and Administrative Support Occupations43-9081L4

name	transcription-reader
description	Efficiently read, parse, and analyze transcription and subtitle files (STJ, VTT, SRT, ASS, SSA). Use this skill whenever the user uploads or references a transcription file, subtitle file, or meeting recording transcript. Trigger on file extensions like .stj, .stjson, .vtt, .srt, .ass, .ssa, or when the user mentions transcripts, captions, subtitles, meeting notes from recordings, or wants to summarize a recording, find who said what, extract action items, or convert between transcription formats.
license	MIT
metadata	{"author":"Yaniv Golan","version":"0.4.0","compatibility":"Requires Python 3. Optional packages: stjlib (STJ), webvtt-py (VTT), pysubs2 (SRT/ASS/SSA)."}

Transcription Reader

Transcription files contain timing metadata that inflates token usage without adding analytical value. This skill helps you decide the fastest path to get the content into context and work with it.

Quick Decision Tree

Identify the format from the file extension:
- .stj, .stjson, .stj.json → STJ (Standard Transcription JSON)
- .vtt → WebVTT
- .srt → SubRip
- .ass, .ssa → Advanced SubStation Alpha / SubStation Alpha
- .json (with "stj" root key) → STJ
If STJ → run the extraction script immediately. Do not read the raw JSON. STJ is deeply nested with speaker ID maps, word-level timing arrays, and metadata — raw JSON is unreadable and wastes tokens. No exceptions, regardless of file size. If stjlib is not installed, install it first (pip install stjlib).
```
# Basic extraction
python3 scripts/extract_transcript.py INPUT_FILE
# For summarization (recommended — produces compact speaker blocks)
python3 scripts/extract_transcript.py INPUT_FILE --merge-speakers
# Quick overview
python3 scripts/extract_transcript.py INPUT_FILE --stats
```
For other formats, check the file size to decide your approach:

Small files (under ~200 lines / ~10KB): Just read the file directly. The overhead of running the extraction script isn't worth it — you can mentally skip the timestamps as you read. For SRT and VTT, the text is easy to parse visually. For ASS/SSA, look for Dialogue: lines.

Large files (over ~200 lines): Use the extraction script. This is where the savings matter — a 2400-line Zoom VTT becomes ~1400 lines of clean text (41% reduction, ~7K tokens saved). STJ files typically see 80-96% reduction depending on content.
```
python3 scripts/extract_transcript.py INPUT_FILE [--format FORMAT] [--output OUTPUT_FILE]
```
Read the extracted text (or the raw file if small) into context.
Analyze based on the user's request (summarize, search, extract action items, etc.).

When Timing Data Matters

Sometimes the user needs timestamps — for example, "at what point did they discuss budgets?" or "create a clip list." In these cases, use the --keep-timestamps flag:

python3 scripts/extract_transcript.py INPUT_FILE --keep-timestamps

This produces a format like:

[00:05:12] Alice: We need to revisit the budget for Q3.
[00:05:28] Bob: Agreed, let me pull up the numbers.

Format-Specific Notes

STJ (Standard Transcription JSON)

Always use the extraction script for STJ, regardless of file size. Raw STJ is deeply nested JSON with speaker ID maps, word-level timing arrays, confidence scores, and metadata — it's unreadable as raw text and extremely token-heavy.

Key features available in STJ:

Speaker names (not just IDs) — mapped from the speakers array
Language per segment — useful for multilingual recordings
Confidence scores — can filter out low-confidence segments with --min-confidence 0.8
Word-level timing — available when word_timing_mode is "complete" or "partial"

Diarization-only STJ files: Some STJ files contain only speaker timing without transcribed text (the text field is empty). These are produced by diarization-only tools that identify who spoke when, but don't transcribe the words. The extraction script detects this automatically and emits a warning. All post-processing flags work on these files — use --merge-speakers to collapse consecutive same-speaker segments (e.g., 954 segments → ~200 merged blocks), --time-range to answer "who was talking between minute 20-30?", and --output-format jsonl for structured output. Use --stats or --list-speakers for speaker timing summaries. If you encounter a diarization-only file, let the user know it contains speaker timing but no transcribed words — they would need to run a separate transcription tool (e.g., Whisper) on the original audio to get the actual text.

For advanced STJ work (building on top of the STJ data model, custom queries, programmatic access), read references/stj-format.md for the full specification and stjlib API reference.

WebVTT (.vtt)

WebVTT files may contain speaker labels in two forms: voice tags (<v Speaker Name>text</v>) or inline labels (Speaker Name: text). Zoom recordings use the inline form. The extraction script handles both patterns.

For small VTT files, the text is readable directly — captions are separated by blank lines with a timestamp line above each one. Just skip the timestamp lines mentally.

SRT (.srt)

SRT is the simplest format — numbered blocks with a timestamp line and text. Speaker information is not part of the format spec but is sometimes embedded in the text (e.g., SPEAKER 1: Hello or [ALICE] Hello).

For small SRT files, direct reading works fine — the structure is very regular and easy to scan.

ASS/SSA (.ass, .ssa)

ASS/SSA files have a large header (styles, metadata) followed by Dialogue: lines. The Name field in each dialogue event contains the speaker. The extraction script strips the header and style metadata, which can be 50%+ of the file.

For small ASS files, you can read directly but skip everything above the [Events] section.

Installation

The extraction script requires these Python packages:

pip install stjlib webvtt-py pysubs2

If any package is missing, the script will tell you which one to install — it gracefully handles missing optional dependencies and only requires the library for the format being processed.

Common Analysis Patterns

After getting the transcript text into context, here are effective approaches for common tasks:

Summarize a meeting: Use --merge-speakers for the best summarization input — it produces compact speaker blocks instead of one-line-per-cue output. Identify the main topics discussed, key decisions made, and action items. Structure the summary with a brief overview followed by topic-by-topic detail.

Find what was said about a topic: Search for relevant keywords. Include surrounding context (a few lines before and after) to capture the full discussion.

List action items: Look for commitment language ("I'll do", "let's", "we need to", "action item", "TODO", "by Friday", "deadline") and extract who committed to what.

Who said what: The extracted transcript has speaker labels. Filter or group by speaker name to show each person's contributions. For large files, use --speakers-only NAME to extract just one speaker's segments.

Speaker analytics: Use --stats for duration, word counts, and speaker segment counts. Use --list-speakers for a quick overview of who's in the file.

Script Reference

The bundled scripts/extract_transcript.py supports these options:

Usage: extract_transcript.py INPUT_FILE [options]

Options:
  --format FORMAT        Force format (stj, vtt, srt, ass, ssa). Auto-detected from extension if omitted.
  --output FILE          Write to file instead of stdout.
  --output-format FMT    Output as 'text' (default) or 'jsonl' (one JSON object per line).
  --keep-timestamps      Include timestamps in output.
  --merge-speakers       Merge consecutive segments from the same speaker into single blocks.
  --time-range RANGE     Extract only a time range (e.g., 10:00-20:00 or 1:05:00-1:30:00).
  --min-confidence N     Skip segments below this confidence (STJ only, default: 0.0).
  --speakers-only NAME   Filter to segments from a specific speaker.
  --language LANG        Filter to segments in a specific language (STJ only).
  --list-speakers        Just list speakers found in the file, don't extract text.
  --stats                Show transcript statistics (duration, speaker counts, word counts).

Token-saving tips: Use --merge-speakers to consolidate short segments (a 954-segment file becomes ~200 merged blocks). Combine with --time-range to extract just the section you need. For maximum token efficiency with structured data, use --merge-speakers --output-format jsonl.