| name | audio-voice-recovery |
| description | Audio forensics and voice recovery guidelines for CSI-level audio analysis. This skill should be used when recovering voice from low-quality or low-volume audio, enhancing degraded recordings, performing forensic audio analysis, or transcribing difficult audio. Triggers on tasks involving audio enhancement, noise reduction, voice isolation, forensic authentication, or audio transcription. |
Forensic Audio Research: Audio Voice Recovery Best Practices
Comprehensive audio forensics and voice recovery guide providing CSI-level capabilities for recovering voice from low-quality, low-volume, or damaged audio recordings. Contains 45 rules across 8 categories, prioritized by impact to guide audio enhancement, forensic analysis, and transcription workflows.
When to Apply
Reference these guidelines when:
- Recovering voice from noisy or low-quality recordings
- Enhancing audio for transcription or legal evidence
- Performing forensic audio authentication
- Analyzing recordings for tampering or splices
- Building automated audio processing pipelines
- Transcribing difficult or degraded speech
Rule Categories by Priority
| Priority | Category | Impact | Prefix | Rules |
|---|---|---|---|---|
| 1 | Signal Preservation & Analysis | CRITICAL | signal- | 5 |
| 2 | Noise Profiling & Estimation | CRITICAL | noise- | 5 |
| 3 | Spectral Processing | HIGH | spectral- | 6 |
| 4 | Voice Isolation & Enhancement | HIGH | voice- | 7 |
| 5 | Temporal Processing | MEDIUM-HIGH | temporal- | 5 |
| 6 | Transcription & Recognition | MEDIUM | transcribe- | 5 |
| 7 | Forensic Authentication | MEDIUM | forensic- | 5 |
| 8 | Tool Integration & Automation | LOW-MEDIUM | tool- | 7 |
Essential Tools
| Tool | Purpose | Install |
|---|---|---|
| FFmpeg | Format conversion, filtering | `brew install ffmpeg` |
| SoX | Noise profiling, effects | `brew install sox` |
| Whisper | Speech transcription | `pip install openai-whisper` |
| librosa | Python audio analysis | `pip install librosa` |
| noisereduce | ML noise reduction | `pip install noisereduce` |
| Audacity | Visual editing | `brew install audacity` |
Workflow Scripts (Recommended)
Use the bundled scripts to generate objective baselines, create a workflow plan, and verify results.
- `scripts/preflight_audio.py` - Generate a forensic preflight report (JSON or Markdown).
- `scripts/plan_from_preflight.py` - Create a workflow plan template from the preflight report.
- `scripts/compare_audio.py` - Compare objective metrics between baseline and processed audio.
Example usage:
```sh
python3 skills/.experimental/audio-voice-recovery/scripts/preflight_audio.py evidence.wav --out preflight.json
python3 skills/.experimental/audio-voice-recovery/scripts/plan_from_preflight.py --preflight preflight.json --out plan.md
python3 skills/.experimental/audio-voice-recovery/scripts/compare_audio.py \
  --before evidence.wav \
  --after enhanced.wav \
  --format md \
  --out comparison.md
```
Forensic Preflight Workflow (Do This Before Any Changes)
Align preflight with SWGDE Best Practices for the Enhancement of Digital Audio (20-a-001) and SWGDE Best Practices for Forensic Audio (08-a-001).
Establish an objective baseline state and plan the workflow so processing does not introduce clipping, artifacts, or false "done" confidence.
Use scripts/preflight_audio.py to capture baseline metrics and preserve the report with the case file.
Capture and record before processing:
- Record evidence identity and integrity: path, filename, file size, SHA-256 checksum, source, format/container, codec
- Record signal integrity: sample rate, bit depth, channels, duration
- Measure baseline loudness and levels: LUFS/LKFS, true peak, peak, RMS, dynamic range, DC offset
- Detect clipping and document clipped-sample percentage, peak headroom, exact time ranges
- Identify noise profile: stationary vs non-stationary, dominant noise bands, SNR estimate
- Locate the region of interest (ROI) and document time ranges and changes over time
- Inspect spectral content and estimate speech-band energy and intelligibility risk
- Scan for temporal defects: dropouts, discontinuities, splices, drift
- Evaluate channel correlation and phase anomalies (if stereo)
- Extract and preserve metadata: timestamps, device/model tags, embedded notes
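Several of the baseline level checks above (peak, RMS, DC offset, clipped-sample percentage) can be computed with the standard library alone. A minimal sketch for 16-bit mono PCM; the function name and the clip threshold are illustrative assumptions, not part of the bundled `preflight_audio.py`:

```python
import array
import math
import wave

def baseline_metrics(path, clip_threshold=32767):
    """Return peak/RMS (dBFS), DC offset, and clipped-sample percentage.

    Sketch only: assumes 16-bit mono PCM WAV. Samples at or beyond
    clip_threshold are counted as clipped.
    """
    with wave.open(path, "rb") as wf:
        assert wf.getsampwidth() == 2, "sketch assumes 16-bit PCM"
        frames = wf.readframes(wf.getnframes())
    samples = array.array("h", frames)
    n = len(samples)
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / n)
    return {
        "peak_dbfs": 20 * math.log10(peak / 32768) if peak else float("-inf"),
        "rms_dbfs": 20 * math.log10(rms / 32768) if rms else float("-inf"),
        "dc_offset": sum(samples) / n,  # non-zero mean indicates DC bias
        "clipped_pct": 100.0 * sum(1 for s in samples if abs(s) >= clip_threshold) / n,
    }
```

A full-scale sine, for example, should report roughly -3 dBFS RMS and a non-zero clipped percentage, which is exactly the kind of red flag that should pause processing per the clipping rule below.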
Procedure:
- Prepare a forensic working copy, verify hashes, and preserve the original untouched.
- Locate ROI and target signal; document exact time ranges and changes across the recording.
- Assess challenges to intelligibility and signal quality; map challenges to mitigation strategies.
- Identify required processing and plan a workflow order that avoids unwanted artifacts. Generate a plan draft with `scripts/plan_from_preflight.py` and complete it with case-specific decisions.
- Measure baseline loudness and true peak per ITU-R BS.1770 / EBU R 128 and record peak/RMS/DC offset.
- Detect clipping and dropouts; if clipping is present, declip first or pause and document limitations.
- Inspect spectral content and noise type; collect representative noise profile segments and estimate SNR.
- If stereo, evaluate channel correlation and phase; document anomalies.
- Create a baseline listening log (multiple devices) and define success criteria for intelligibility and listenability.
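The SNR estimate called for in the steps above can be roughed out by splitting frames on energy: treat the quietest frames as the noise floor and the loudest as target signal. A stdlib sketch, assuming 16-bit mono PCM; the 20%/5% percentile split is an illustrative guess, not a calibrated voice-activity detector:

```python
import array
import math
import wave

def estimate_snr(path, frame_ms=20):
    """Crude SNR estimate in dB: loud-frame RMS vs quiet-frame RMS."""
    with wave.open(path, "rb") as wf:
        sr = wf.getframerate()
        samples = array.array("h", wf.readframes(wf.getnframes()))
    frame_len = max(1, sr * frame_ms // 1000)
    rms_per_frame = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms_per_frame.append(math.sqrt(sum(s * s for s in frame) / frame_len))
    rms_per_frame.sort()
    # Assumption: quietest 20% of frames approximate the noise floor,
    # loudest 5% approximate the target signal.
    noise = rms_per_frame[: max(1, len(rms_per_frame) // 5)]
    signal = rms_per_frame[-max(1, len(rms_per_frame) // 20):]
    noise_rms = sum(noise) / len(noise)
    signal_rms = sum(signal) / len(signal)
    if noise_rms == 0:
        return float("inf")
    return 20 * math.log10(signal_rms / noise_rms)
```

This only holds for stationary noise with clear quiet gaps; for non-stationary noise, record the limitation in the preflight report rather than trust the number.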
Failure-pattern guardrails:
- Do not process until every preflight field is captured.
- Document every process, setting, software version, and time segment to enable repeatability.
- Compare each processed output to the unprocessed input and assess progress toward intelligibility and listenability.
- Avoid over-processing; review removed signal (filter residue) to avoid removing target signal components.
- Keep intermediate files uncompressed and preserve sample rate/bit depth when moving between tools.
- Perform a final review against the original; if unsatisfactory, revise or stop and report limitations.
- If the request is not achievable, communicate limitations and do not declare completion.
- Require objective metrics and A/B listening before declaring completion.
- Do not rely solely on objective metrics; corroborate with critical listening.
- Take listening breaks to avoid ear fatigue during extended reviews.
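Reviewing the removed signal, as the guardrails require, can be as simple as subtracting the processed audio from the unprocessed input and auditioning the difference: if the residue contains speech, the chain is eating target signal. A sketch assuming both files are 16-bit mono PCM, equal length, and time-aligned (run it on the intermediate before any loudness normalization, since a gain- or duration-changing stage defeats the subtraction):

```python
import array
import wave

def write_residue(original_path, processed_path, residue_path):
    """Write (original - processed) so the removed material can be auditioned.

    Sketch only: assumes 16-bit mono PCM inputs that are time-aligned and
    equal length. Differences are clamped to the int16 range.
    """
    def read(path):
        with wave.open(path, "rb") as wf:
            return wf.getparams(), array.array("h", wf.readframes(wf.getnframes()))
    params, orig = read(original_path)
    _, proc = read(processed_path)
    n = min(len(orig), len(proc))
    residue = array.array(
        "h", (max(-32768, min(32767, orig[i] - proc[i])) for i in range(n))
    )
    with wave.open(residue_path, "wb") as wf:
        wf.setparams(params._replace(nframes=n))
        wf.writeframes(residue.tobytes())
```

Listen to the residue file on the same devices used for the baseline listening log; it should sound like noise only.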
Quick Enhancement Pipeline
```sh
# 1. Capture the objective baseline before touching anything.
python3 skills/.experimental/audio-voice-recovery/scripts/preflight_audio.py evidence.wav --out preflight.json

# 2. Preserve the original: work only on a copy, and record its hash.
cp evidence.wav working.wav
sha256sum evidence.wav > evidence.sha256

# 3. Enhance: high-pass rumble removal, declick, FFT denoise,
#    presence boost around 2.5 kHz, then EBU R 128 loudness normalization.
ffmpeg -i working.wav -af "\
highpass=f=80,\
adeclick=w=55:o=75,\
afftdn=nr=12:nf=-30:nt=w,\
equalizer=f=2500:t=q:w=1:g=3,\
loudnorm=I=-16:TP=-1.5:LRA=11\
" enhanced.wav

# 4. Transcribe the enhanced copy.
whisper enhanced.wav --model large-v3 --language en

# 5. Verify the original evidence file is still untouched.
sha256sum -c evidence.sha256

# 6. Compare objective metrics before vs. after.
python3 skills/.experimental/audio-voice-recovery/scripts/compare_audio.py \
  --before evidence.wav \
  --after enhanced.wav \
  --format md \
  --out comparison.md
```
How to Use
Read individual reference files for detailed explanations and code examples:
Reference Files