shiny-speech
// Generate code using Shiny.Speech for cross-platform speech-to-text, text-to-speech, audio capture, and audio playback with pluggable cloud providers
| name | shiny-speech |
| description | Generate code using Shiny.Speech for cross-platform speech-to-text, text-to-speech, audio capture, and audio playback with pluggable cloud providers |
| auto_invoke | true |
| triggers | ["speech to text","text to speech","speech recognition","voice recognition","tts","stt","speak","dictation","transcribe","synthesize speech","audio capture","audio playback","microphone","ISpeechToTextService","ITextToSpeechService","IAudioSource","IAudioPlayer","ISpeechToTextProvider","ITextToSpeechProvider","SpeechRecognitionResult","SpeechRecognitionOptions","SpeechRecognitionError","TextToSpeechOptions","VoiceInfo","AccessState","ResultReceived","KeywordHeard","StatementAfterKeyword","WaitListenForKeywords","ListenForKeywords","ListenUntilSilence","SpeakAsync","GetVoicesAsync","StartCaptureAsync","StopCaptureAsync","AddSpeechServices","AddSpeechToText","AddTextToSpeech","AddAudioSource","AddAudioPlayer","AddCloudSpeechToText","AddCloudTextToSpeech","AddAzureSpeech","AddElevenLabsTextToSpeech","AddElevenLabsSpeechToText","AddElevenLabsSpeech","ElevenLabsSpeechToTextProvider","Scribe","scribe_v1","AzureSpeechConfig","ElevenLabsConfig","CloudSpeechToText","CloudTextToSpeech","Shiny.Speech","Shiny.Speech.Cloud","Shiny.Speech.Azure","Shiny.Speech.ElevenLabs","PipeStream","IsListening","IsSpeaking","AudioLevelChanged","IsPlayerAnalysisSupported","VU meter","audio level","wake word","keyword detection","hey siri","voice activation","blazor speech","blazor wasm speech","browser speech","webassembly speech","web speech api","BrowserSpeechToTextService","BrowserTextToSpeechService","BrowserAudioPlayer","BrowserAudioSource","OperatingSystem.IsBrowser"] |
You are an expert in Shiny Speech, a library that provides cross-platform speech-to-text, text-to-speech, audio capture, and audio playback for .NET MAUI and Blazor WebAssembly with pluggable cloud providers.
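Invoke this skill when the user wants to add speech-to-text, text-to-speech, audio capture, or audio playback to a .NET MAUI or Blazor WebAssembly app with Shiny.Speech.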
GitHub: https://github.com/shinyorg/speech
NuGet Packages:
- Shiny.Speech: Core library with platform-native STT, TTS, audio capture, and playback
- Shiny.Speech.Cloud: Cloud provider abstractions
- Shiny.Speech.Azure: Azure AI Speech provider
- Shiny.Speech.ElevenLabs: ElevenLabs provider (Scribe STT and TTS)
Namespace: Shiny.Speech
Shiny Speech provides:
- Speech-to-text via ISpeechToTextService (iOS, Android, Windows, Browser/WASM)
- Event-based results: ResultReceived, KeywordHeard, and Error events allow multiple subscribers
- A Start/Stop model: Start() to begin listening, Stop() to end; Start() throws if already listening
- Keyword detection: set Keywords in SpeechRecognitionOptions and subscribe to KeywordHeard
- Text-to-speech via ITextToSpeechService (iOS, Android, Windows, Browser/WASM)
- Audio capture via IAudioSource (raw PCM 16kHz, 16-bit, mono; all platforms including browser)
- Audio playback via IAudioPlayer (MP3 format; browser uses HTML5 Audio via base64 data URL)
- Pluggable cloud providers via ISpeechToTextProvider and ITextToSpeechProvider
- Convenience methods: ListenUntilSilence, StatementAfterKeyword, WaitListenForKeywords, ListenForKeywords
- Permission handling via AccessState and RequestAccess()
- Audio level metering: the AudioLevelChanged event on ITextToSpeechService and IAudioPlayer emits a normalized 0.0–1.0 RMS level during playback; IsPlayerAnalysisSupported reports per-platform availability
For platform-native speech only:
dotnet add package Shiny.Speech
For Azure AI Speech (cloud STT + TTS):
dotnet add package Shiny.Speech
dotnet add package Shiny.Speech.Azure
For ElevenLabs (cloud STT via Scribe + TTS):
dotnet add package Shiny.Speech
dotnet add package Shiny.Speech.ElevenLabs
Platform-native speech services:
builder.Services.AddSpeechServices(); // Registers STT, TTS, AudioSource, AudioPlayer
// On Browser/WASM: auto-detected via OperatingSystem.IsBrowser()
Or register individually:
builder.Services.AddSpeechToText(); // ISpeechToTextService only
builder.Services.AddTextToSpeech(); // ITextToSpeechService only
builder.Services.AddAudioSource(); // IAudioSource only
builder.Services.AddAudioPlayer(); // IAudioPlayer only
Azure AI Speech (replaces platform-native with cloud):
builder.Services.AddAzureSpeech("your-subscription-key", "eastus");
// Automatically registers IAudioSource and IAudioPlayer for platform audio I/O
Or with config object and selective services:
builder.Services.AddAzureSpeech(
    new AzureSpeechConfig { SubscriptionKey = "key", Region = "eastus" },
    speechToText: true,
    textToSpeech: true
);
ElevenLabs (replaces platform-native STT/TTS with cloud — Scribe + TTS):
// Register both STT (Scribe) and TTS at once
builder.Services.AddElevenLabsSpeech("your-api-key");
// Or selectively
builder.Services.AddElevenLabsSpeechToText("your-api-key"); // Scribe STT only
builder.Services.AddElevenLabsTextToSpeech("your-api-key"); // TTS only
// Auto-registers IAudioSource and/or IAudioPlayer for platform audio I/O as needed
// With a config object — overrides default Scribe model / TTS model / voice
builder.Services.AddElevenLabsSpeech(new ElevenLabsConfig
{
    ApiKey = "your-api-key",
    SpeechToTextModel = "scribe_v1",
    TextToSpeechModel = "eleven_multilingual_v2",
    DefaultVoiceId = "21m00Tcm4TlvDq8ikWAM"
});
ElevenLabs Scribe is request/response, not streaming: results are yielded as a single final SpeechRecognitionResult when the user calls Stop() (the captured audio is buffered, wrapped in a WAV container, and posted to /v1/speech-to-text). For continuous partial results, use Azure instead.
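To illustrate what "wrapped in a WAV container" involves, here is a minimal sketch that prepends a canonical 44-byte RIFF/WAVE header to raw PCM. `WavContainer` is a hypothetical helper written for this example; it is not part of Shiny.Speech and may differ from how the ElevenLabs provider builds its request body. The defaults assume the 16kHz, 16-bit, mono format that IAudioSource produces.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper for illustration only; not part of Shiny.Speech.
public static class WavContainer
{
    // Wraps raw little-endian PCM in a canonical 44-byte RIFF/WAVE header.
    public static byte[] Wrap(
        byte[] pcm,
        int sampleRate = 16000,
        short bitsPerSample = 16,
        short channels = 1)
    {
        short blockAlign = (short)(channels * bitsPerSample / 8);
        int byteRate = sampleRate * blockAlign;

        using var ms = new MemoryStream(44 + pcm.Length);
        using var w = new BinaryWriter(ms, Encoding.ASCII, leaveOpen: true);

        w.Write(Encoding.ASCII.GetBytes("RIFF"));
        w.Write(36 + pcm.Length);   // RIFF chunk size = total file size - 8
        w.Write(Encoding.ASCII.GetBytes("WAVE"));
        w.Write(Encoding.ASCII.GetBytes("fmt "));
        w.Write(16);                // fmt chunk size for plain PCM
        w.Write((short)1);          // audio format 1 = uncompressed PCM
        w.Write(channels);
        w.Write(sampleRate);
        w.Write(byteRate);
        w.Write(blockAlign);
        w.Write(bitsPerSample);
        w.Write(Encoding.ASCII.GetBytes("data"));
        w.Write(pcm.Length);        // data chunk size in bytes
        w.Write(pcm);
        w.Flush();

        return ms.ToArray();
    }
}
```

Calling `WavContainer.Wrap(capturedPcm)` on a buffered capture yields bytes that most audio tools and HTTP speech APIs accept as a `.wav` file.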
Android — Add to AndroidManifest.xml:
<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-permission android:name="android.permission.MODIFY_AUDIO_SETTINGS" />
MODIFY_AUDIO_SETTINGS is required for the TTS audio-level Visualizer and for the native STT beep suppression.
iOS — Add to Info.plist:
<key>NSSpeechRecognitionUsageDescription</key>
<string>This app uses speech recognition</string>
<key>NSMicrophoneUsageDescription</key>
<string>This app uses the microphone for speech recognition</string>
Browser (Blazor WebAssembly) — No manifest changes needed. The browser prompts the user for microphone access automatically. Include the JS interop module in index.html:
<script src="shiny-speech.js"></script>
Note: IAudioSource captures raw PCM audio in the browser using the Web Audio API (getUserMedia + ScriptProcessorNode), downsampled to 16kHz 16-bit mono, the same format as other platforms.
Always check permissions before using STT. The service uses a Start/Stop model with events.
using System.Globalization;
using Shiny.Speech;

// Speech-to-text with the Start/Stop model and events
public class MyViewModel(ISpeechToTextService stt)
{
    async Task StartListening()
    {
        var access = await stt.RequestAccess();
        if (access != AccessState.Available)
            return;

        // Subscribe to events (multiple subscribers allowed)
        stt.ResultReceived += (s, result) =>
        {
            // result.Text: recognized text
            // result.IsFinal: true when segment is finalized
            // result.Confidence: optional confidence score (0-1)
        };
        stt.KeywordHeard += (s, keyword) =>
        {
            // keyword: the matched keyword string
        };
        stt.Error += (s, error) =>
        {
            // error.Message: error description
            // error.Exception: optional exception
        };

        // Start listening (throws InvalidOperationException if already listening)
        await stt.Start(new SpeechRecognitionOptions
        {
            Culture = CultureInfo.GetCultureInfo("en-US"),
            SilenceTimeout = TimeSpan.FromSeconds(3),
            PreferOnDevice = true,
            Keywords = ["Yes", "No", "Maybe"] // optional keyword detection
        });
    }

    async Task StopListening()
    {
        await stt.Stop(); // no-op if not listening
    }
}
// Convenience methods: each call handles Start/Stop and event wiring for you
public class MyViewModel(ISpeechToTextService stt)
{
    async Task SimpleDictation(CancellationToken ct)
    {
        // Listen until silence: starts, waits for the first final result, then stops
        var text = await stt.ListenUntilSilence(
            new SpeechRecognitionOptions
            {
                Culture = CultureInfo.GetCultureInfo("en-US"),
                SilenceTimeout = TimeSpan.FromSeconds(3)
            },
            ct
        );
    }

    async Task WakeWordActivation(CancellationToken ct)
    {
        // "Hey Computer, do something" → returns "do something"
        // Waits for the keyword, then captures the next final statement
        var command = await stt.StatementAfterKeyword(
            ["Hey Computer"],
            cancellationToken: ct
        );
    }

    async Task WaitForAnswer(CancellationToken ct)
    {
        // Wait for one of the listed keywords (with optional timeout)
        var answer = await stt.WaitListenForKeywords(
            ["Yes", "No", "Maybe"],
            timeout: TimeSpan.FromSeconds(30),
            cancellationToken: ct
        );
        // Returns the matched keyword, or null on timeout
    }

    async Task ContinuousKeywords(CancellationToken ct)
    {
        // Stream keywords continuously as IAsyncEnumerable
        await foreach (var keyword in stt.ListenForKeywords(
            ["Up", "Down", "Left", "Right"],
            cancellationToken: ct))
        {
            Console.WriteLine($"Direction: {keyword}");
        }
    }
}
// Text-to-speech
public class MyViewModel(ITextToSpeechService tts)
{
    async Task Speak()
    {
        // Simple speech
        await tts.SpeakAsync("Hello, world!");

        // With options
        await tts.SpeakAsync("Hello, world!", new TextToSpeechOptions
        {
            SpeechRate = 1.2f,
            Pitch = 1.0f,
            Volume = 0.8f,
            Culture = CultureInfo.GetCultureInfo("en-US")
        });

        // List available voices (FirstOrDefault returns null if none match)
        var voices = await tts.GetVoicesAsync();
        var voice = voices.FirstOrDefault(v => v.Name.Contains("Neural"));

        // Speak with a specific voice
        await tts.SpeakAsync("Hello!", new TextToSpeechOptions { Voice = voice });

        // Stop speaking
        if (tts.IsSpeaking)
            await tts.StopAsync();
    }
}
// Audio capture
public class MyViewModel(IAudioSource audioSource)
{
    async Task CaptureAudio(CancellationToken ct)
    {
        // Returns raw PCM stream (16kHz, 16-bit, mono)
        await using var stream = await audioSource.StartCaptureAsync(ct);

        // Read audio data from stream...
        // Stream remains open until StopCaptureAsync is called
        await audioSource.StopCaptureAsync();
    }
}
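Because the capture stream is raw 16kHz, 16-bit, mono PCM, a byte count maps directly to a duration (32,000 bytes per second), and a signal level can be computed from the samples. The sketch below shows both calculations. `PcmMath` is a hypothetical helper for this example, not a Shiny.Speech type, and it assumes little-endian sample order.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper for illustration only; not part of Shiny.Speech.
public static class PcmMath
{
    public const int SampleRate = 16000;   // samples per second
    public const int BytesPerSample = 2;   // 16-bit mono

    // Duration represented by a byte count of 16kHz, 16-bit, mono PCM.
    public static TimeSpan Duration(long byteCount) =>
        TimeSpan.FromSeconds((double)byteCount / (SampleRate * BytesPerSample));

    // Normalized RMS level (0.0 to 1.0) of little-endian 16-bit samples.
    // A trailing odd byte, if any, is ignored.
    public static double Rms(ReadOnlySpan<byte> pcm)
    {
        int samples = pcm.Length / BytesPerSample;
        if (samples == 0)
            return 0;

        double sumSquares = 0;
        for (int i = 0; i < samples; i++)
        {
            short raw = BinaryPrimitives.ReadInt16LittleEndian(
                pcm.Slice(i * BytesPerSample, BytesPerSample));
            double s = raw / 32768.0;      // scale to the -1.0..1.0 range
            sumSquares += s * s;
        }
        return Math.Sqrt(sumSquares / samples);
    }
}
```

In a read loop over the capture stream, each chunk could be passed as `PcmMath.Rms(buffer.AsSpan(0, bytesRead))` to drive a simple input-level indicator.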
// Audio playback
public class MyViewModel(IAudioPlayer audioPlayer)
{
    async Task PlayAudio(Stream mp3Stream, CancellationToken ct)
    {
        // Play MP3 format audio
        await audioPlayer.PlayAsync(mp3Stream, ct);

        // Check playback state
        if (audioPlayer.IsPlaying)
            await audioPlayer.StopAsync();
    }
}
Subscribe to AudioLevelChanged on ITextToSpeechService (native + cloud TTS) or IAudioPlayer (generic audio playback). Each emitted value is a normalized RMS level in 0.0–1.0. Always gate UI on IsPlayerAnalysisSupported — it is false on Windows native TTS and Browser.
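using CommunityToolkit.Mvvm.ComponentModel;

public partial class TtsViewModel : ObservableObject
{
    readonly ITextToSpeechService tts;

    [ObservableProperty] double audioLevel; // bind to ProgressBar.Progress

    public bool IsVuSupported => tts.IsPlayerAnalysisSupported;

    // A regular constructor is used here: combining a primary constructor with a
    // second constructor of the same signature chained via this(tts) does not compile.
    public TtsViewModel(ITextToSpeechService tts)
    {
        this.tts = tts;
        tts.AudioLevelChanged += (_, level) =>
            MainThread.BeginInvokeOnMainThread(() => AudioLevel = level);
    }
}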
Platform behaviour:
| Surface | iOS / macOS | Android | Windows | Browser |
|---|---|---|---|---|
| Native TTS (ITextToSpeechService) | ✅ AVAudioEngine + player-node tap | ✅ OnAudioAvailable PCM RMS | ❌ | ❌ |
| Cloud TTS (CloudTextToSpeech) | ✅ forwarded from IAudioPlayer | ✅ forwarded from IAudioPlayer | ❌ | ❌ |
| Generic playback (IAudioPlayer) | ✅ AVAudioPlayer.MeteringEnabled | ✅ Visualizer on session | ❌ | ❌ |
Apple native TTS plays through AVAudioEngine + AVAudioPlayerNode so a tap on the player node can compute RMS. The engine is created lazily on first speak and kept warm — first utterance adds ~50–150 ms; subsequent utterances are indistinguishable. Reset AudioLevel to 0 on speak completion / StopAsync so the meter drains.
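Raw RMS values can jump sharply between updates, so a meter bound directly to them may flicker. One optional approach is a peak-hold with decay: the displayed value rises immediately and falls gradually. The sketch below is a hypothetical `LevelMeter` helper, not part of Shiny.Speech; the default decay factor of 0.8 is an arbitrary starting point to tune by eye.

```csharp
using System;

// Hypothetical helper for illustration only; not part of Shiny.Speech.
public sealed class LevelMeter
{
    readonly double decay;

    // Current smoothed level in the 0.0 to 1.0 range.
    public double Value { get; private set; }

    // decay: fraction of the previous value kept when the input drops (0 <= decay < 1).
    public LevelMeter(double decay = 0.8)
    {
        if (decay < 0 || decay >= 1)
            throw new ArgumentOutOfRangeException(nameof(decay));
        this.decay = decay;
    }

    // Feeds a normalized level; the meter rises instantly and falls gradually.
    public double Push(double level)
    {
        level = Math.Clamp(level, 0.0, 1.0);
        Value = Math.Max(level, Value * decay);
        return Value;
    }

    // Call when speech completes or StopAsync is called so the meter drains.
    public void Reset() => Value = 0;
}
```

In an AudioLevelChanged handler, the bound property could be set to `meter.Push(level)` instead of the raw level, with `meter.Reset()` called once playback ends.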
Implement ISpeechToTextProvider and/or ITextToSpeechProvider:
using System.Runtime.CompilerServices;
using Shiny.Speech;

public class MyCloudSttProvider : ISpeechToTextProvider
{
    // Required: surface non-fatal errors (e.g. a transient network blip between
    // chunked requests in continuous mode) without aborting the IAsyncEnumerable.
    // CloudSpeechToText subscribes to this and forwards to ISpeechToTextService.Error.
    public event EventHandler<SpeechRecognitionError>? Error;

    public async IAsyncEnumerable<SpeechRecognitionResult> RecognizeAsync(
        Stream audioStream,
        SpeechRecognitionOptions? options = null,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        SpeechRecognitionResult? result = null;
        try
        {
            // Send audioStream to your cloud API (await the request here)
            result = new SpeechRecognitionResult("Hello", IsFinal: true, Confidence: 0.95f);
        }
        catch (HttpRequestException ex)
        {
            // Non-fatal: signal the error and let the session keep running.
            // Throwing instead would terminate the enumerator and end the session.
            Error?.Invoke(this, new SpeechRecognitionError(ex.Message, ex));
        }

        // C# does not allow yield return inside a try block that has a catch clause,
        // so results are yielded after the try/catch as they arrive.
        if (result is { } recognized)
            yield return recognized;
    }
}
// Register in DI (IAudioSource is auto-registered)
builder.Services.AddCloudSpeechToText<MyCloudSttProvider>();
- Always call RequestAccess() before STT operations
- Use Start() to begin, Stop() to end; Start() throws if already listening
- Subscribe to events before Start() to avoid missing results
- Unsubscribe from events after Stop() to avoid leaks
- Prefer the convenience methods: ListenUntilSilence, StatementAfterKeyword, WaitListenForKeywords, ListenForKeywords handle Start/Stop/event wiring for you
- Dispose audio services: IAudioSource and IAudioPlayer implement IAsyncDisposable
- ListenUntilSilence: For simple dictation scenarios
- StatementAfterKeyword: For "Hey Siri" style wake word activation
- WaitListenForKeywords: For yes/no/choice scenarios
- ListenForKeywords: For continuous keyword detection as an async stream
- Cloud providers auto-register IAudioSource and IAudioPlayer as needed via TryAdd, so manual registration is no longer required
- AccessState: Check for NotSupported, Denied, and Restricted states
- IsListening/IsSpeaking/IsPlaying: Check state before starting new listening/speech/playback
- PreferOnDevice: Set to true for offline-capable STT when available
- Browser: AddSpeechServices() uses OperatingSystem.IsBrowser() at runtime to register browser implementations; no conditional code needed in your app
- Browser: IAudioSource captures raw PCM via the Web Audio API (getUserMedia + ScriptProcessorNode), downsampled to 16kHz 16-bit mono
- Browser: Include shiny-speech.js in index.html for speech services to work
- CarPlay: On iOS the audio session uses PlayAndRecord with AllowBluetooth / AllowBluetoothA2dp / DefaultToSpeaker, so when CarPlay is active iOS automatically routes audio through the car's microphone and speakers; no CarPlay-specific code needed
- Check IsPlayerAnalysisSupported before showing meter UI; events do not fire on platforms where metering isn't available (Windows native TTS, Browser)
- Marshal AudioLevelChanged to the UI thread: the event fires from the audio render / synthesizer thread; use MainThread.BeginInvokeOnMainThread in MAUI or equivalent in Blazor before mutating bound properties
- Reset AudioLevel back to 0 after SpeakAsync returns or StopAsync is called so the meter drains visually
For detailed API documentation, see:
- reference/api-reference.md - Full API surface, interfaces, records, and configuration
Package install commands:
dotnet add package Shiny.Speech # Core platform-native speech services
dotnet add package Shiny.Speech.Cloud # Cloud provider abstractions (included by Azure/ElevenLabs)
dotnet add package Shiny.Speech.Azure # Azure AI Speech provider
dotnet add package Shiny.Speech.ElevenLabs # ElevenLabs provider (Scribe STT + TTS)