win-whisper
Record audio and transcribe speech to text using Whisper — runs fully local, optional Intel GPU/NPU acceleration.
| name | win-whisper |
| description | Record audio and transcribe speech to text using Whisper — runs fully local, optional Intel GPU/NPU acceleration. |
| metadata | {"openclaw":{"emoji":"🎙️","os":["win32"],"requires":{"bins":["whisper-cli"]}}} |
Record audio and transcribe speech to text using OpenAI Whisper models. Runs completely local — no API keys, no network required after setup.
Uses whisper.cpp v1.8.3+ for fast inference. Supports Intel GPU acceleration (12x boost), OpenVINO for NPU, and Voice Activity Detection.
Supported audio formats: WAV, MP3, FLAC, OGG.
powershell.exe -NoProfile -Command "
$whisperDir = \"$env:USERPROFILE\.pcclaw\whisper\"
New-Item -ItemType Directory -Force $whisperDir | Out-Null
# Download whisper.cpp pre-built binary (OpenBLAS accelerated, ~16MB)
$zipUrl = 'https://github.com/ggml-org/whisper.cpp/releases/download/v1.8.3/whisper-blas-bin-x64.zip'
$zipPath = \"$env:TEMP\whisper-blas-bin-x64.zip\"
Write-Host 'Downloading whisper-cli...' -ForegroundColor Cyan
Invoke-WebRequest -Uri $zipUrl -OutFile $zipPath
Expand-Archive $zipPath -DestinationPath $whisperDir -Force
Remove-Item $zipPath
# Binary extracts to Release/ subdirectory — add that to PATH
$binDir = \"$whisperDir\Release\"
$userPath = [Environment]::GetEnvironmentVariable('Path', 'User')
if ($userPath -notlike \"*$binDir*\") {
[Environment]::SetEnvironmentVariable('Path', \"$userPath;$binDir\", 'User')
$env:Path += \";$binDir\"
Write-Host 'Added to PATH.' -ForegroundColor Green
}
Write-Host 'Done. whisper-cli installed.' -ForegroundColor Green
"
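The PATH check above uses a wildcard substring match, which can misfire when another entry merely contains the same text or when the path holds wildcard characters such as brackets. A minimal sketch of an exact-entry comparison follows; `Add-PathEntry` is a hypothetical helper name, not part of win-whisper, and it only computes the new string without writing it to the environment. It is written as plain PowerShell (for a session or a .ps1 file) rather than in the one-line wrapper form used above.

```powershell
# Hypothetical helper (not part of win-whisper): returns the PATH-style string
# with $Entry appended, unless an equivalent entry is already present.
function Add-PathEntry {
    param(
        [string]$PathValue,   # current PATH-style string, entries separated by ';'
        [string]$Entry        # directory to add
    )
    # Normalize for comparison: drop empty entries and trailing backslashes.
    $existing = @($PathValue -split ';' |
        Where-Object { $_ } |
        ForEach-Object { $_.TrimEnd('\') })

    # -contains compares whole entries and is case-insensitive for strings.
    if ($existing -contains $Entry.TrimEnd('\')) {
        return $PathValue
    }
    if ([string]::IsNullOrEmpty($PathValue)) {
        return $Entry
    }
    return $PathValue.TrimEnd(';') + ';' + $Entry
}

# Appends, because no existing entry matches:
Add-PathEntry -PathValue 'C:\Tools;C:\Other' -Entry 'C:\Whisper\Release'

# Returns the input unchanged, because the entry exists (different case, trailing backslash):
Add-PathEntry -PathValue 'C:\Tools;c:\whisper\release\' -Entry 'C:\Whisper\Release'
```

The result could then be passed to [Environment]::SetEnvironmentVariable as in the install command.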
powershell.exe -NoProfile -Command "
$whisperDir = \"$env:USERPROFILE\.pcclaw\whisper\"
$model = 'base'
$modelUrl = \"https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-$model.bin\"
$modelPath = \"$whisperDir\ggml-$model.bin\"
Write-Host \"Downloading ggml-$model.bin...\" -ForegroundColor Cyan
Invoke-WebRequest -Uri $modelUrl -OutFile $modelPath
Write-Host \"Model saved to $modelPath\" -ForegroundColor Green
"
Change $model to tiny, small, medium, or large-v3-turbo. See Model Guide.
powershell.exe -NoProfile -Command "whisper-cli --help 2>&1 | Select-Object -First 3"
powershell.exe -NoProfile -Command "
Add-Type -MemberDefinition '[DllImport(\"winmm.dll\", CharSet=CharSet.Unicode)] public static extern int mciSendStringW(string cmd, System.Text.StringBuilder ret, int retLen, IntPtr cb);' -Name WinMM -Namespace PCClaw
$outPath = \"$env:TEMP\pcclaw-recording.wav\"
$seconds = DURATION
[PCClaw.WinMM]::mciSendStringW('open new Type waveaudio Alias mic', $null, 0, [IntPtr]::Zero) | Out-Null
[PCClaw.WinMM]::mciSendStringW('set mic bitspersample 16 channels 1 samplespersec 22050', $null, 0, [IntPtr]::Zero) | Out-Null
[PCClaw.WinMM]::mciSendStringW('record mic', $null, 0, [IntPtr]::Zero) | Out-Null
Write-Host \"Recording for $seconds seconds...\" -ForegroundColor Yellow
Start-Sleep -Seconds $seconds
[PCClaw.WinMM]::mciSendStringW('stop mic', $null, 0, [IntPtr]::Zero) | Out-Null
[PCClaw.WinMM]::mciSendStringW(\"save mic $outPath\", $null, 0, [IntPtr]::Zero) | Out-Null
[PCClaw.WinMM]::mciSendStringW('close mic', $null, 0, [IntPtr]::Zero) | Out-Null
Write-Host \"Saved to $outPath\" -ForegroundColor Green
"
Replace DURATION with seconds to record (e.g., 5, 10, 30).
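When choosing a duration, it can help to know how large the recording will be. The recording format set above (16-bit, mono, 22050 samples per second) works out to 44,100 bytes of audio per second. The sketch below does that arithmetic; `Get-WavSizeBytes` is a hypothetical helper, and the 44-byte header is an assumption based on the canonical PCM WAV layout, so the file MCI actually writes may differ slightly.

```powershell
# Hypothetical helper: estimates the size of an uncompressed PCM WAV file.
function Get-WavSizeBytes {
    param(
        [int]$Seconds,
        [int]$SampleRate = 22050,     # samplespersec from the 'set mic' command
        [int]$BitsPerSample = 16,     # bitspersample
        [int]$Channels = 1            # channels
    )
    # Bytes of audio per second = rate * channels * bytes per sample.
    $bytesPerSecond = $SampleRate * $Channels * ($BitsPerSample / 8)
    # Assumption: a canonical 44-byte WAV header precedes the samples.
    return [long](44 + $bytesPerSecond * $Seconds)
}

Get-WavSizeBytes -Seconds 10    # 441044 bytes, roughly 0.44 MB
Get-WavSizeBytes -Seconds 30    # 1323044 bytes, roughly 1.3 MB
```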
powershell.exe -NoProfile -Command "
Add-Type -MemberDefinition '[DllImport(\"winmm.dll\", CharSet=CharSet.Unicode)] public static extern int mciSendStringW(string cmd, System.Text.StringBuilder ret, int retLen, IntPtr cb);' -Name WinMM -Namespace PCClaw
$outPath = \"$env:TEMP\pcclaw-recording.wav\"
[PCClaw.WinMM]::mciSendStringW('open new Type waveaudio Alias mic', $null, 0, [IntPtr]::Zero) | Out-Null
[PCClaw.WinMM]::mciSendStringW('set mic bitspersample 16 channels 1 samplespersec 22050', $null, 0, [IntPtr]::Zero) | Out-Null
[PCClaw.WinMM]::mciSendStringW('record mic', $null, 0, [IntPtr]::Zero) | Out-Null
Write-Host 'Recording... Press Enter to stop.' -ForegroundColor Yellow
Read-Host
[PCClaw.WinMM]::mciSendStringW('stop mic', $null, 0, [IntPtr]::Zero) | Out-Null
[PCClaw.WinMM]::mciSendStringW(\"save mic $outPath\", $null, 0, [IntPtr]::Zero) | Out-Null
[PCClaw.WinMM]::mciSendStringW('close mic', $null, 0, [IntPtr]::Zero) | Out-Null
Write-Host \"Saved to $outPath\" -ForegroundColor Green
"
powershell.exe -NoProfile -Command "
$model = \"$env:USERPROFILE\.pcclaw\whisper\ggml-base.bin\"
whisper-cli -m $model -f 'AUDIO_PATH' -nt -np 2>&1
"
Replace AUDIO_PATH with the path to an audio file (WAV, MP3, FLAC, or OGG).
-nt = no timestamps, -np = no prints (suppresses everything except the transcript, for clean output).
powershell.exe -NoProfile -Command "
$model = \"$env:USERPROFILE\.pcclaw\whisper\ggml-base.bin\"
whisper-cli -m $model -f 'AUDIO_PATH' -l LANG_CODE -nt -np 2>&1
"
Replace LANG_CODE with: en, zh, ja, ko, de, fr, es, or auto (auto-detect).
powershell.exe -NoProfile -Command "
$model = \"$env:USERPROFILE\.pcclaw\whisper\ggml-base.bin\"
whisper-cli -m $model -f 'AUDIO_PATH' -l auto -np 2>&1
"
Output: [00:00:00.000 --> 00:00:05.000] Transcribed text here
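An agent that needs the timing of each segment can split lines of this shape into structured objects. The following is a minimal sketch that assumes every segment line matches the sample format shown above; `ConvertFrom-WhisperLine` is a hypothetical helper name, and lines that do not match the pattern are skipped.

```powershell
# Hypothetical helper: turns '[start --> end] text' lines into objects.
function ConvertFrom-WhisperLine {
    param([string[]]$Lines)

    $pattern = '^\s*\[(\d{2}:\d{2}:\d{2}\.\d{3}) --> (\d{2}:\d{2}:\d{2}\.\d{3})\]\s*(.*)$'
    $format  = 'hh\:mm\:ss\.fff'
    $culture = [System.Globalization.CultureInfo]::InvariantCulture

    foreach ($line in $Lines) {
        if ($line -match $pattern) {
            [pscustomobject]@{
                Start = [timespan]::ParseExact($Matches[1], $format, $culture)
                End   = [timespan]::ParseExact($Matches[2], $format, $culture)
                Text  = $Matches[3].Trim()
            }
        }
    }
}

$sample = @(
    '[00:00:00.000 --> 00:00:05.000]   Transcribed text here',
    'a line that is not a segment'
)
$segments = @(ConvertFrom-WhisperLine -Lines $sample)
$segments[0].End.TotalSeconds   # 5
$segments[0].Text               # Transcribed text here
```

In practice the input would be the captured output of the whisper-cli command above, for example by assigning it to a variable before calling the helper.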
Transcribe and translate any language to English:
powershell.exe -NoProfile -Command "
$model = \"$env:USERPROFILE\.pcclaw\whisper\ggml-base.bin\"
whisper-cli -m $model -f 'AUDIO_PATH' -l auto -tr -nt -np 2>&1
"
powershell.exe -NoProfile -Command "
$model = \"$env:USERPROFILE\.pcclaw\whisper\ggml-base.bin\"
whisper-cli -m $model -f 'AUDIO_PATH' -l auto -osrt -of 'OUTPUT_PATH' -np 2>&1
"
Replace OUTPUT_PATH with desired path (without extension). Produces OUTPUT_PATH.srt.
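The transcription commands above differ only in which flags they pass. The sketch below collects those variants in one place by building the argument list from options; `New-WhisperArgs` and its parameter names are hypothetical, and it uses only the flags already shown in this document (-m, -f, -l, -tr, -nt, -osrt, -of, -np).

```powershell
# Hypothetical helper: builds a whisper-cli argument list from options.
function New-WhisperArgs {
    param(
        [string]$ModelPath,
        [string]$AudioPath,
        [string]$Language = 'auto',
        [switch]$Translate,          # adds -tr (translate to English)
        [switch]$Timestamps,         # keeps timestamps by omitting -nt
        [string]$SrtOutputBase       # output path without extension; adds -osrt -of
    )
    $argList = @('-m', $ModelPath, '-f', $AudioPath, '-l', $Language)
    if ($Translate) { $argList += '-tr' }
    if ($SrtOutputBase) {
        # Subtitle export: matches the SRT command above, which does not pass -nt.
        $argList += @('-osrt', '-of', $SrtOutputBase)
    } elseif (-not $Timestamps) {
        $argList += '-nt'
    }
    $argList += '-np'
    return $argList
}

$cliArgs = New-WhisperArgs -ModelPath 'ggml-base.bin' -AudioPath 'note.wav' -Translate
$cliArgs -join ' '   # -m ggml-base.bin -f note.wav -l auto -tr -nt -np

# To run it (requires whisper-cli on PATH):
# & whisper-cli @cliArgs 2>&1
```

Passing the arguments as an array avoids quoting problems when a path contains spaces.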
Skip silence and only transcribe speech segments:
powershell.exe -NoProfile -Command "
$model = \"$env:USERPROFILE\.pcclaw\whisper\ggml-base.bin\"
$vadModel = \"$env:USERPROFILE\.pcclaw\whisper\ggml-silero-v5.1.2.bin\"
whisper-cli -m $model -f 'AUDIO_PATH' -l auto --vad -vm $vadModel -nt -np 2>&1
"
Download the VAD model (one-time). whisper.cpp loads VAD models in ggml format rather than ONNX:
powershell.exe -NoProfile -Command "
Invoke-WebRequest -Uri 'https://huggingface.co/ggml-org/whisper-vad/resolve/main/ggml-silero-v5.1.2.bin' -OutFile \"$env:USERPROFILE\.pcclaw\whisper\ggml-silero-v5.1.2.bin\"
"
Record from microphone then immediately transcribe:
powershell.exe -NoProfile -Command "
Add-Type -MemberDefinition '[DllImport(\"winmm.dll\", CharSet=CharSet.Unicode)] public static extern int mciSendStringW(string cmd, System.Text.StringBuilder ret, int retLen, IntPtr cb);' -Name WinMM -Namespace PCClaw
$wavPath = \"$env:TEMP\pcclaw-recording.wav\"
$model = \"$env:USERPROFILE\.pcclaw\whisper\ggml-base.bin\"
$seconds = DURATION
# Record
[PCClaw.WinMM]::mciSendStringW('open new Type waveaudio Alias mic', $null, 0, [IntPtr]::Zero) | Out-Null
[PCClaw.WinMM]::mciSendStringW('set mic bitspersample 16 channels 1 samplespersec 22050', $null, 0, [IntPtr]::Zero) | Out-Null
[PCClaw.WinMM]::mciSendStringW('record mic', $null, 0, [IntPtr]::Zero) | Out-Null
Write-Host \"Recording for $seconds seconds...\" -ForegroundColor Yellow
Start-Sleep -Seconds $seconds
[PCClaw.WinMM]::mciSendStringW('stop mic', $null, 0, [IntPtr]::Zero) | Out-Null
[PCClaw.WinMM]::mciSendStringW(\"save mic $wavPath\", $null, 0, [IntPtr]::Zero) | Out-Null
[PCClaw.WinMM]::mciSendStringW('close mic', $null, 0, [IntPtr]::Zero) | Out-Null
# Transcribe
Write-Host 'Transcribing...' -ForegroundColor Cyan
whisper-cli -m $model -f $wavPath -l auto -nt -np 2>&1
"
Replace DURATION with seconds to record.
whisper-stream.exe (included in the download) captures audio from the microphone and transcribes it in real time:
powershell.exe -NoProfile -Command "
$model = \"$env:USERPROFILE\.pcclaw\whisper\ggml-base.bin\"
whisper-stream -m $model -l auto --step 500 --length 5000 2>&1
"
Press Ctrl+C to stop. Requires SDL2.dll (included in the download).
| Model | Disk | RAM | Speed | Quality | Best For |
|---|---|---|---|---|---|
| tiny | 75 MB | 390 MB | Fastest | Basic | Quick notes, commands |
| base | 148 MB | 500 MB | Fast | Good | General use (recommended) |
| small | 488 MB | 1.0 GB | Medium | Better | Meetings, interviews |
| medium | 1.5 GB | 2.6 GB | Slow | Great | High accuracy needs |
| large-v3-turbo | 1.6 GB | 3.0 GB | Slow | Best | Professional transcription |
For English-only audio, append .en to the model name (e.g., base.en) for better accuracy. The .en variants exist for tiny, base, small, and medium.
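The RAM column in the model guide can drive a simple selection rule: pick the largest model whose listed RAM fits the memory available. The sketch below is one such heuristic, not a recommendation from the whisper.cpp project; `Select-WhisperModel` is a hypothetical helper, and the GB figures are rounded to 1 GB = 1000 MB for simplicity.

```powershell
# Hypothetical helper: picks the largest model from the guide that fits in RAM.
function Select-WhisperModel {
    param([int]$AvailableRamMB)

    # RAM figures copied from the model guide (GB values rounded to MB).
    $models = @(
        [pscustomobject]@{ Name = 'tiny';           RamMB = 390  },
        [pscustomobject]@{ Name = 'base';           RamMB = 500  },
        [pscustomobject]@{ Name = 'small';          RamMB = 1000 },
        [pscustomobject]@{ Name = 'medium';         RamMB = 2600 },
        [pscustomobject]@{ Name = 'large-v3-turbo'; RamMB = 3000 }
    )

    $fit = @($models |
        Where-Object { $_.RamMB -le $AvailableRamMB } |
        Sort-Object RamMB -Descending)

    if ($fit.Count -eq 0) { return $null }   # nothing fits
    return $fit[0].Name
}

Select-WhisperModel -AvailableRamMB 2000   # small
Select-WhisperModel -AvailableRamMB 4000   # large-v3-turbo
```

The returned name can be used as the $model value in the download command.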
Download any model:
powershell.exe -NoProfile -Command "
$model = 'MODEL_NAME'
$dir = \"$env:USERPROFILE\.pcclaw\whisper\"
Invoke-WebRequest -Uri \"https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-$model.bin\" -OutFile \"$dir\ggml-$model.bin\"
Write-Host \"Downloaded ggml-$model.bin\" -ForegroundColor Green
"
For Intel GPU/NPU acceleration, use the --ov-e-device flag with an OpenVINO-enabled build:
# GPU acceleration
whisper-cli -m MODEL -f AUDIO --ov-e-device GPU
# NPU acceleration (Intel AIPC)
whisper-cli -m MODEL -f AUDIO --ov-e-device NPU
This requires building whisper.cpp from source with OpenVINO:
# Requires: cmake, Visual Studio Build Tools, OpenVINO toolkit
cmake -B build -DWHISPER_OPENVINO=1
cmake --build build --config Release
Pre-converted OpenVINO encoder models are available at huggingface.co/Intel/whisper.cpp-openvino-models.
If you encounter GPU issues, force CPU-only:
whisper-cli -m MODEL -f AUDIO --no-gpu
Voice note:
1. win-whisper: record audio
2. win-whisper: transcribe the recording
3. Agent processes/summarizes the text

Voice command:
1. win-whisper: record 5 seconds
2. win-whisper: transcribe
3. Agent interprets the command
4. win-ui-auto: execute the action

Meeting transcription:
1. win-whisper: record, then transcribe with the medium or large model
2. Agent summarizes key points and action items
3. ms-todo or google-tasks: create tasks

Multilingual transcription:
1. win-whisper: transcribe with -l auto
2. win-whisper: translate to English with -tr
3. Agent processes both original and translation
- Use --no-gpu to force CPU-only.
- Remove the set line to record at the default format; whisper-cli handles resampling.
- Set the thread count with -t N (default 4). Use -t 8 for faster inference on many-core CPUs.
- Use whisper-stream.exe for real-time mic transcription via SDL2.
- For NPU acceleration, build with OpenVINO (-DWHISPER_OPENVINO=1), then use --ov-e-device NPU.
- Combine with win-screenshot + win-ocr for a complete See + Hear + Act agent loop.