local-audio-transcriber

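A local audio-to-text tool. Use it when the user sends an existing recording, audio file, or video file and wants the speech converted directly into text, a verbatim meeting transcript, an interview transcript, SRT/VTT subtitles, or Markdown notes. On Apple Silicon it prefers MLX on the Apple GPU with whisper-large-v3-turbo-q4. Transcription runs locally; the skill is not meant for ad hoc live recording, and it does not call cloud speech-recognition services by default.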

Install command

npx skills add https://github.com/chujianyun/skills --skill local-audio-transcriber

Copy and paste this command into Claude Code to install the skill

Source: chujianyun/skills
Stars: 621
Forks: 89
Updated: June 3, 2026 at 14:44

Local Audio Transcriber

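Converts local recordings, audio files, or video files supplied by the user into text. The core goal: once the user sends audio, return the transcript directly, and also save reusable transcript files locally.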

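Scope

Suitable: existing files such as .m4a, .mp3, .wav, .aac, .flac, .ogg, .opus, .mp4, .mov, and .mkv
Suitable: meeting recordings, interview recordings, course audio, podcast clips, extracting text from video, subtitle generation
Not suitable: ad hoc live recording, microphone capture, or real-time dictation, unless the user explicitly asks for it
Processing is local by default; audio is not uploaded to the cloud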

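Workflow

Confirm that the user has provided an accessible local audio/video file path or attachment.
First determine the machine type and the available engines: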

python3 - <<'PY'
import platform, sys
print(sys.platform, platform.machine())
try:
    import mlx.core as mx
    print("mlx", mx.default_device())
except Exception as e:
    print(type(e).__name__ + ": " + str(e))
PY

On Apple Silicon (M1/M2/M3/M4), install and use MLX first; it runs locally on the Apple GPU and unified memory:

python3.13 -m venv /tmp/local-audio-transcriber-mlx
/tmp/local-audio-transcriber-mlx/bin/python -m pip install -U pip mlx-whisper

On non-Apple-Silicon machines, CUDA machines, or when MLX is unavailable, fall back to faster-whisper:

python3 -m pip install -U faster-whisper

Run the transcription script. For Chinese recordings, pass --language zh; if the language is uncertain, omit the language argument:

/tmp/local-audio-transcriber-mlx/bin/python {skill_dir}/scripts/transcribe.py "input.m4a" --language zh --formats txt,md,json --print-text

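Send the transcript text directly to the user. When the text is very long, deliver the local file path first, and paste the opening portion along with any key notes.
If the user asks for subtitles, add --formats srt,vtt.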
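
The engine choice in the workflow above can be sketched as a small pure function. This is only an illustration: `pick_engine` and `mlx_uses_gpu` are hypothetical names, not part of the skill's `transcribe.py`, and the GPU check simply looks for "gpu" in the printed default device, matching the `Device(gpu, 0)` output the skill expects.

```python
import platform
import sys

# Defaults quoted from the skill text; the helper names are illustrative.
MLX_MODEL = "mlx-community/whisper-large-v3-turbo-q4"
FASTER_WHISPER_MODEL = "small"


def pick_engine(system: str, machine: str, mlx_on_gpu: bool) -> tuple:
    """Return (engine, model), preferring MLX on Apple Silicon."""
    is_apple_silicon = system == "darwin" and machine == "arm64"
    if is_apple_silicon and mlx_on_gpu:
        return "mlx", MLX_MODEL
    # Anything else, including Apple Silicon without a working MLX GPU device.
    return "faster-whisper", FASTER_WHISPER_MODEL


def mlx_uses_gpu() -> bool:
    """True when MLX imports and its default device prints as a GPU device."""
    try:
        import mlx.core as mx
    except Exception:
        return False
    return "gpu" in str(mx.default_device())


if __name__ == "__main__":
    print(pick_engine(sys.platform, platform.machine(), mlx_uses_gpu()))
```

Keeping the decision separate from the probing makes the fallback rule easy to check without an Apple machine at hand.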

Common commands

# Long Chinese recording on Apple Silicon: prefer MLX + Apple GPU + turbo-q4
/tmp/local-audio-transcriber-mlx/bin/python {skill_dir}/scripts/transcribe.py "recording.m4a" --language zh --formats txt,md,json,srt,vtt --print-text

# Specify MLX and the model explicitly
/tmp/local-audio-transcriber-mlx/bin/python {skill_dir}/scripts/transcribe.py "recording.m4a" --engine mlx --model mlx-community/whisper-large-v3-turbo-q4 --language zh --formats txt,md,json,srt,vtt --print-text

# Non-Apple-Silicon / CUDA / CPU: use faster-whisper
python3 {skill_dir}/scripts/transcribe.py "recording.m4a" --engine faster-whisper --model small --language zh --formats txt,md --print-text

# Detect the language automatically and generate subtitles
python3 {skill_dir}/scripts/transcribe.py "video.mp4" --formats txt,srt,vtt --print-text

# Batch-transcribe multiple files into a given directory
python3 {skill_dir}/scripts/transcribe.py *.m4a --language zh --output-dir ./transcripts --formats txt,md,json

# faster-whisper with a larger model: higher quality but slower
python3 {skill_dir}/scripts/transcribe.py "meeting.m4a" --engine faster-whisper --model medium --language zh --formats txt,md --print-text
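
For readers wondering what the srt output in the commands above amounts to, here is a minimal sketch of turning timed segments into SRT cues. It assumes each segment is a dict with `start`, `end` (seconds), and `text`, the shape Whisper-style results commonly use; how `transcribe.py` actually writes subtitles is not shown in the source, and both function names are hypothetical.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used by SRT."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: list) -> str:
    """Render segments (dicts with start, end, text) as numbered SRT cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)


if __name__ == "__main__":
    print(to_srt([
        {"start": 0.0, "end": 2.5, "text": " hello "},
        {"start": 2.5, "end": 5.0, "text": "world"},
    ]))
```

WebVTT differs mainly in starting with a WEBVTT header line and using a period instead of a comma before the milliseconds.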

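Parameter trade-offs

Apple Silicon default engine: mlx; default model: mlx-community/whisper-large-v3-turbo-q4
Non-Apple-Silicon default engine: faster-whisper; default model: small
Long Chinese recordings, multi-speaker livestreams, or many proper nouns: prefer whisper-large-v3-turbo-q4
whisper-large-v3-turbo-q4 is a quantized model that suits unified-memory machines such as a 16 GB M1; the first run downloads the model, and later runs use the local cache
OpenAI Whisper with --device mps can be extremely slow on M1; for the local Apple GPU route, prefer MLX over PyTorch MPS
condition_on_previous_text is off by default for long recordings, to keep Whisper from falling into repetitive hallucination loops on spoken Chinese
Add --condition-on-previous-text only when cross-segment consistency is clearly needed and there is no repetition risk
faster-whisper with long meetings, noisy environments, or multi-speaker interviews: --model medium is an option
CPU environments: int8 by default, which uses less memory
CUDA environments: the script tries to use float16 automatically
Spoken recordings: VAD silence filtering is on by default; if segmentation looks wrong, use --no-vad-filter
If the MLX small model produces obvious wrong words, upgrade to whisper-large-v3-turbo-q4 first rather than relying on heavy after-the-fact correction alone
An initial prompt can help with proper nouns, but if repetition appears when it is combined with previous-text conditioning, turn off --condition-on-previous-text immediately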
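
The trade-offs above map fairly directly onto keyword arguments of the two libraries. The sketch below is an assumption about how such a mapping could look, not a copy of `transcribe.py`: `decode_options`, `compute_type_for`, `run_faster_whisper`, and `run_mlx` are hypothetical helpers. It forwards `vad_filter` only to faster-whisper, whose `transcribe` accepts it; how the script applies VAD under MLX is not stated in the source.

```python
from typing import Optional


def decode_options(
    engine: str,
    language: Optional[str] = None,
    condition_on_previous_text: bool = False,
    vad_filter: bool = True,
    initial_prompt: Optional[str] = None,
) -> dict:
    """Build keyword arguments for a transcribe() call."""
    options = {"condition_on_previous_text": condition_on_previous_text}
    if language:
        options["language"] = language  # left out means auto-detect
    if initial_prompt:
        options["initial_prompt"] = initial_prompt
    if engine == "faster-whisper":
        options["vad_filter"] = vad_filter
    return options


def compute_type_for(device: str) -> str:
    """float16 on CUDA, int8 elsewhere, mirroring the defaults listed above."""
    return "float16" if device == "cuda" else "int8"


def run_faster_whisper(path: str, model: str = "small", device: str = "cpu", **options):
    from faster_whisper import WhisperModel  # imported lazily; optional dependency

    whisper = WhisperModel(model, device=device, compute_type=compute_type_for(device))
    segments, _info = whisper.transcribe(path, **options)
    return [{"start": s.start, "end": s.end, "text": s.text} for s in segments]


def run_mlx(path: str, model: str = "mlx-community/whisper-large-v3-turbo-q4", **options):
    import mlx_whisper  # imported lazily; Apple Silicon only

    result = mlx_whisper.transcribe(path, path_or_hf_repo=model, **options)
    return result["segments"]
```

A call such as `run_mlx("recording.m4a", **decode_options("mlx", language="zh"))` would then transcribe with previous-text conditioning switched off.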

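Output requirements

When the user explicitly asks to "transcribe" or "convert to text", the final reply should contain the transcript body, not just a file path.
When the user asks to turn the recording into meeting minutes, an article, or subtitles, finish the transcription first, then continue processing toward the user's goal.
For privacy-sensitive recordings, state only that processing was local and where the output is saved; do not repeat unrelated sensitive information.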
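
One way to reconcile "include the transcript body" with "deliver the file path first for long text" is a simple length cutoff. The sketch below is hypothetical: `build_reply` is not part of the skill, and both thresholds are assumed values, since the skill text gives no numbers.

```python
from pathlib import Path

MAX_INLINE_CHARS = 4000  # assumed cutoff for pasting the full transcript
EXCERPT_CHARS = 800      # assumed length of the opening excerpt


def build_reply(transcript: str, saved_path: Path) -> str:
    """Return the full text when short, otherwise the path plus an excerpt."""
    if len(transcript) <= MAX_INLINE_CHARS:
        return f"{transcript}\n\nSaved to: {saved_path}"
    excerpt = transcript[:EXCERPT_CHARS].rstrip()
    return (
        f"Full transcript saved to: {saved_path}\n\n"
        f"Opening excerpt:\n{excerpt}..."
    )


if __name__ == "__main__":
    print(build_reply("A short meeting transcript.", Path("transcripts/meeting.txt")))
```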

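Troubleshooting

ModuleNotFoundError (faster_whisper): run python3 -m pip install -U faster-whisper
ModuleNotFoundError (mlx_whisper): inside the virtual environment, run /tmp/local-audio-transcriber-mlx/bin/python -m pip install -U mlx-whisper
macOS system Python reports externally-managed-environment: do not add --break-system-packages; use python3.13 -m venv /tmp/local-audio-transcriber-mlx instead
The MLX default device is not Device(gpu, 0): MLX/Metal is not working; fall back to faster-whisper or check the system environment
Audio cannot be decoded: install or update ffmpeg, or ask the user to convert the file to .m4a/.mp3/.wav
Transcription is slow: switch to --model small or --model base
Transcription is slow on Apple Silicon: confirm you are not running OpenAI Whisper with --device mps; prefer this script with --engine mlx
Chinese is detected as another language: add --language zh
Subtitle timing is off: try --model medium, or disable VAD with --no-vad-filter
The same sentence repeats endlessly: rerun and make sure --condition-on-previous-text is not enabled; switch to whisper-large-v3-turbo-q4 if necessary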
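
The repetition failure in the last item can be caught automatically before a transcript is sent out. The check below is a hypothetical helper, not something the skill ships; it flags a run of identical consecutive segment texts, and the default of four repeats is an assumed threshold.

```python
def has_repetition_loop(texts: list, min_repeats: int = 4) -> bool:
    """True when the same non-empty text appears min_repeats times in a row."""
    run = 0
    previous = None
    for text in texts:
        current = text.strip()
        if current and current == previous:
            run += 1
        else:
            run = 1  # a new (or empty) text starts a fresh run
        previous = current
        if current and run >= min_repeats:
            return True
    return False


if __name__ == "__main__":
    looping = ["Good morning.", "Thank you.", "Thank you.", "Thank you.", "Thank you."]
    print(has_repetition_loop(looping))
```

If the check fires, rerunning without --condition-on-previous-text is the first remedy the list above suggests.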
