ワンクリックでManusで任意のスキルを実行

始める

win-whisper

スター5

フォーク1

更新日2026年2月9日 15:14

Record audio and transcribe speech to text using Whisper — runs fully local, optional Intel GPU/NPU acceleration.

インストール

Codex または Claude でインストールこの Prompt をコピーして Codex、Claude、または他のアシスタントに貼り付けると、Skill ページを確認してインストールできます。

Manusで実行

ソース

IrisGoLab

IrisGoLab/PCClaw

GitHub リポジトリを開く Creator のリポジトリを見る

ダウンロード

Manusで実行

win-whisper

Record audio and transcribe speech to text using OpenAI Whisper models. Runs completely local — no API keys, no network required after setup.

Uses whisper.cpp v1.8.3+ for fast inference. Supports Intel GPU acceleration (12x boost), OpenVINO for NPU, and Voice Activity Detection.

Supported audio formats: WAV, MP3, FLAC, OGG.

Setup (One-Time)

1. Download whisper-cli

powershell.exe -NoProfile -Command "
$whisperDir = \"$env:USERPROFILE\.pcclaw\whisper\"
New-Item -ItemType Directory -Force $whisperDir | Out-Null

# Download whisper.cpp pre-built binary (OpenBLAS accelerated, ~16MB)
$zipUrl = 'https://github.com/ggml-org/whisper.cpp/releases/download/v1.8.3/whisper-blas-bin-x64.zip'
$zipPath = \"$env:TEMP\whisper-blas-bin-x64.zip\"
Write-Host 'Downloading whisper-cli...' -ForegroundColor Cyan
Invoke-WebRequest -Uri $zipUrl -OutFile $zipPath
Expand-Archive $zipPath -DestinationPath $whisperDir -Force
Remove-Item $zipPath

# Binary extracts to Release/ subdirectory — add that to PATH
$binDir = \"$whisperDir\Release\"
$userPath = [Environment]::GetEnvironmentVariable('Path', 'User')
if ($userPath -notlike \"*$binDir*\") {
    [Environment]::SetEnvironmentVariable('Path', \"$userPath;$binDir\", 'User')
    $env:Path += \";$binDir\"
    Write-Host 'Added to PATH.' -ForegroundColor Green
}

Write-Host 'Done. whisper-cli installed.' -ForegroundColor Green
"

2. Download a Model

powershell.exe -NoProfile -Command "
$whisperDir = \"$env:USERPROFILE\.pcclaw\whisper\"
$model = 'base'

$modelUrl = \"https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-$model.bin\"
$modelPath = \"$whisperDir\ggml-$model.bin\"
Write-Host \"Downloading ggml-$model.bin...\" -ForegroundColor Cyan
Invoke-WebRequest -Uri $modelUrl -OutFile $modelPath
Write-Host \"Model saved to $modelPath\" -ForegroundColor Green
"

Change $model to tiny, small, medium, or large-v3-turbo. See Model Guide.

3. Verify

powershell.exe -NoProfile -Command "whisper-cli --help 2>&1 | Select-Object -First 3"

Record Audio from Microphone

Timed Recording (N seconds)

powershell.exe -NoProfile -Command "
Add-Type -MemberDefinition '[DllImport(\"winmm.dll\", CharSet=CharSet.Unicode)] public static extern int mciSendStringW(string cmd, System.Text.StringBuilder ret, int retLen, IntPtr cb);' -Name WinMM -Namespace PCClaw

$outPath = \"$env:TEMP\pcclaw-recording.wav\"
$seconds = DURATION

[PCClaw.WinMM]::mciSendStringW('open new Type waveaudio Alias mic', $null, 0, [IntPtr]::Zero) | Out-Null
[PCClaw.WinMM]::mciSendStringW('set mic bitspersample 16 channels 1 samplespersec 22050', $null, 0, [IntPtr]::Zero) | Out-Null
[PCClaw.WinMM]::mciSendStringW('record mic', $null, 0, [IntPtr]::Zero) | Out-Null

Write-Host \"Recording for $seconds seconds...\" -ForegroundColor Yellow
Start-Sleep -Seconds $seconds

[PCClaw.WinMM]::mciSendStringW('stop mic', $null, 0, [IntPtr]::Zero) | Out-Null
[PCClaw.WinMM]::mciSendStringW(\"save mic $outPath\", $null, 0, [IntPtr]::Zero) | Out-Null
[PCClaw.WinMM]::mciSendStringW('close mic', $null, 0, [IntPtr]::Zero) | Out-Null

Write-Host \"Saved to $outPath\" -ForegroundColor Green
"

Replace DURATION with seconds to record (e.g., 5, 10, 30).

Interactive Recording (Press Enter to Stop)

powershell.exe -NoProfile -Command "
Add-Type -MemberDefinition '[DllImport(\"winmm.dll\", CharSet=CharSet.Unicode)] public static extern int mciSendStringW(string cmd, System.Text.StringBuilder ret, int retLen, IntPtr cb);' -Name WinMM -Namespace PCClaw

$outPath = \"$env:TEMP\pcclaw-recording.wav\"

[PCClaw.WinMM]::mciSendStringW('open new Type waveaudio Alias mic', $null, 0, [IntPtr]::Zero) | Out-Null
[PCClaw.WinMM]::mciSendStringW('set mic bitspersample 16 channels 1 samplespersec 22050', $null, 0, [IntPtr]::Zero) | Out-Null
[PCClaw.WinMM]::mciSendStringW('record mic', $null, 0, [IntPtr]::Zero) | Out-Null

Write-Host 'Recording... Press Enter to stop.' -ForegroundColor Yellow
Read-Host

[PCClaw.WinMM]::mciSendStringW('stop mic', $null, 0, [IntPtr]::Zero) | Out-Null
[PCClaw.WinMM]::mciSendStringW(\"save mic $outPath\", $null, 0, [IntPtr]::Zero) | Out-Null
[PCClaw.WinMM]::mciSendStringW('close mic', $null, 0, [IntPtr]::Zero) | Out-Null

Write-Host \"Saved to $outPath\" -ForegroundColor Green
"

Transcribe Audio File

Basic Transcription

powershell.exe -NoProfile -Command "
$model = \"$env:USERPROFILE\.pcclaw\whisper\ggml-base.bin\"
whisper-cli -m $model -f 'AUDIO_PATH' -nt -np 2>&1
"

Replace AUDIO_PATH with the path to an audio file (WAV, MP3, FLAC, or OGG). -nt = no timestamps, -np = no progress (clean output).

With Language Specification

powershell.exe -NoProfile -Command "
$model = \"$env:USERPROFILE\.pcclaw\whisper\ggml-base.bin\"
whisper-cli -m $model -f 'AUDIO_PATH' -l LANG_CODE -nt -np 2>&1
"

Replace LANG_CODE with: en, zh, ja, ko, de, fr, es, or auto (auto-detect).

With Timestamps

powershell.exe -NoProfile -Command "
$model = \"$env:USERPROFILE\.pcclaw\whisper\ggml-base.bin\"
whisper-cli -m $model -f 'AUDIO_PATH' -l auto -np 2>&1
"

Output: [00:00:00.000 --> 00:00:05.000] Transcribed text here

Translate to English

Transcribe and translate any language to English:

powershell.exe -NoProfile -Command "
$model = \"$env:USERPROFILE\.pcclaw\whisper\ggml-base.bin\"
whisper-cli -m $model -f 'AUDIO_PATH' -l auto -tr -nt -np 2>&1
"

Output as SRT Subtitles

powershell.exe -NoProfile -Command "
$model = \"$env:USERPROFILE\.pcclaw\whisper\ggml-base.bin\"
whisper-cli -m $model -f 'AUDIO_PATH' -l auto -osrt -of 'OUTPUT_PATH' -np 2>&1
"

Replace OUTPUT_PATH with desired path (without extension). Produces OUTPUT_PATH.srt.

With Voice Activity Detection (VAD)

Skip silence and only transcribe speech segments:

powershell.exe -NoProfile -Command "
$model = \"$env:USERPROFILE\.pcclaw\whisper\ggml-base.bin\"
$vadModel = \"$env:USERPROFILE\.pcclaw\whisper\silero-vad.onnx\"
whisper-cli -m $model -f 'AUDIO_PATH' -l auto --vad -vm $vadModel -nt -np 2>&1
"

Download the VAD model (one-time, ~2MB):

powershell.exe -NoProfile -Command "
Invoke-WebRequest -Uri 'https://github.com/ggml-org/whisper.cpp/raw/master/models/silero-vad.onnx' -OutFile \"$env:USERPROFILE\.pcclaw\whisper\silero-vad.onnx\"
"

Record and Transcribe (Combined)

Record from microphone then immediately transcribe:

powershell.exe -NoProfile -Command "
Add-Type -MemberDefinition '[DllImport(\"winmm.dll\", CharSet=CharSet.Unicode)] public static extern int mciSendStringW(string cmd, System.Text.StringBuilder ret, int retLen, IntPtr cb);' -Name WinMM -Namespace PCClaw

$wavPath = \"$env:TEMP\pcclaw-recording.wav\"
$model = \"$env:USERPROFILE\.pcclaw\whisper\ggml-base.bin\"
$seconds = DURATION

# Record
[PCClaw.WinMM]::mciSendStringW('open new Type waveaudio Alias mic', $null, 0, [IntPtr]::Zero) | Out-Null
[PCClaw.WinMM]::mciSendStringW('set mic bitspersample 16 channels 1 samplespersec 22050', $null, 0, [IntPtr]::Zero) | Out-Null
[PCClaw.WinMM]::mciSendStringW('record mic', $null, 0, [IntPtr]::Zero) | Out-Null
Write-Host \"Recording for $seconds seconds...\" -ForegroundColor Yellow
Start-Sleep -Seconds $seconds
[PCClaw.WinMM]::mciSendStringW('stop mic', $null, 0, [IntPtr]::Zero) | Out-Null
[PCClaw.WinMM]::mciSendStringW(\"save mic $wavPath\", $null, 0, [IntPtr]::Zero) | Out-Null
[PCClaw.WinMM]::mciSendStringW('close mic', $null, 0, [IntPtr]::Zero) | Out-Null

# Transcribe
Write-Host 'Transcribing...' -ForegroundColor Cyan
whisper-cli -m $model -f $wavPath -l auto -nt -np 2>&1
"

Replace DURATION with seconds to record.

Real-Time Streaming Transcription

whisper-stream.exe (included in the download) captures from the microphone and transcribes in real-time:

powershell.exe -NoProfile -Command "
$model = \"$env:USERPROFILE\.pcclaw\whisper\ggml-base.bin\"
whisper-stream -m $model -l auto --step 500 --length 5000 2>&1
"

Press Ctrl+C to stop. Requires SDL2.dll (included in the download).

Model Guide

Model	Disk	RAM	Speed	Quality	Best For
`tiny`	75 MB	390 MB	Fastest	Basic	Quick notes, commands
`base`	148 MB	500 MB	Fast	Good	General use (recommended)
`small`	488 MB	1.0 GB	Medium	Better	Meetings, interviews
`medium`	1.5 GB	2.6 GB	Slow	Great	High accuracy needs
`large-v3-turbo`	1.6 GB	3.0 GB	Slow	Best	Professional transcription

For English-only, append .en (e.g., base.en) for better accuracy.

Download any model:

powershell.exe -NoProfile -Command "
$model = 'MODEL_NAME'
$dir = \"$env:USERPROFILE\.pcclaw\whisper\"
Invoke-WebRequest -Uri \"https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-$model.bin\" -OutFile \"$dir\ggml-$model.bin\"
Write-Host \"Downloaded ggml-$model.bin\" -ForegroundColor Green
"

GPU / NPU Acceleration

Intel GPU (Arc / Iris Xe) — via OpenVINO

For Intel GPU/NPU acceleration, use the --ov-e-device flag with an OpenVINO-enabled build:

# GPU acceleration
whisper-cli -m MODEL -f AUDIO --ov-e-device GPU

# NPU acceleration (Intel AIPC)
whisper-cli -m MODEL -f AUDIO --ov-e-device NPU

This requires building whisper.cpp from source with OpenVINO:

# Requires: cmake, Visual Studio Build Tools, OpenVINO toolkit
cmake -B build -DWHISPER_OPENVINO=1
cmake --build build --config Release

Pre-converted OpenVINO encoder models are available at huggingface.co/Intel/whisper.cpp-openvino-models.

Disable GPU

If you encounter GPU issues, force CPU-only:

whisper-cli -m MODEL -f AUDIO --no-gpu

Recommended Workflows

Voice memo → Text

1. win-whisper: record audio
2. win-whisper: transcribe the recording
3. Agent processes/summarizes the text

Voice command → Action

1. win-whisper: record 5 seconds
2. win-whisper: transcribe
3. Agent interprets the command
4. win-ui-auto: execute the action

Meeting notes

1. win-whisper: record with medium/large model
2. Agent summarizes key points and action items
3. ms-todo or google-tasks: create tasks

Multilingual transcription

1. win-whisper: transcribe with -l auto
2. win-whisper: translate to English with -tr
3. Agent processes both original and translation

Notes

Requires one-time setup: download whisper-cli (~16 MB) and a model (75 MB – 1.6 GB).
Audio recording uses Windows Multimedia API (winmm.dll) — zero external dependencies.
whisper.cpp v1.8.3 includes 12x performance boost on Intel integrated/discrete GPUs.
GPU and Flash Attention are enabled by default. Use --no-gpu to force CPU-only.
Supported audio formats: WAV, MP3, FLAC, OGG (built-in decoder).
Recording format: 22050 Hz, 16-bit, mono. If your mic doesn't support this, remove the set line to record at default format — whisper-cli handles resampling.
Thread count: -t N (default 4). Use -t 8 for faster inference on many-core CPUs.
The download includes whisper-stream.exe for real-time mic transcription via SDL2.
For Intel NPU: build with OpenVINO (-DWHISPER_OPENVINO=1), then use --ov-e-device NPU.
Combine with win-screenshot + win-ocr for a complete See + Hear + Act agent loop.

name	win-whisper
description	Record audio and transcribe speech to text using Whisper — runs fully local, optional Intel GPU/NPU acceleration.
metadata	{"openclaw":{"emoji":"🎙️","os":["win32"],"requires":{"bins":["whisper-cli"]}}}

name	win-whisper
description	Record audio and transcribe speech to text using Whisper — runs fully local, optional Intel GPU/NPU acceleration.
metadata	{"openclaw":{"emoji":"🎙️","os":["win32"],"requires":{"bins":["whisper-cli"]}}}

win-whisper

このリポジトリの他の Skills

このリポジトリの他の Skills

win-whisper

Setup (One-Time)

1. Download whisper-cli

2. Download a Model

3. Verify

Record Audio from Microphone

Timed Recording (N seconds)

Interactive Recording (Press Enter to Stop)

Transcribe Audio File

Basic Transcription

With Language Specification

With Timestamps

Translate to English

Output as SRT Subtitles

With Voice Activity Detection (VAD)

Record and Transcribe (Combined)

Real-Time Streaming Transcription

Model Guide

GPU / NPU Acceleration

Intel GPU (Arc / Iris Xe) — via OpenVINO

Disable GPU

Recommended Workflows

Voice memo → Text

Voice command → Action

Meeting notes

Multilingual transcription

Notes

win-whisper

Setup (One-Time)

1. Download whisper-cli

2. Download a Model

3. Verify

Record Audio from Microphone

Timed Recording (N seconds)

Interactive Recording (Press Enter to Stop)

Transcribe Audio File

Basic Transcription

With Language Specification

With Timestamps

Translate to English

Output as SRT Subtitles

With Voice Activity Detection (VAD)

Record and Transcribe (Combined)

Real-Time Streaming Transcription

Model Guide

GPU / NPU Acceleration

Intel GPU (Arc / Iris Xe) — via OpenVINO

Disable GPU

Recommended Workflows

Voice memo → Text

Voice command → Action

Meeting notes

Multilingual transcription

Notes