aliyun-qwen-asr-realtime

Name: Aliyun Qwen Asr Realtime
Author: cinience

Use when low-latency realtime speech recognition is needed with Alibaba Cloud Model Studio Qwen ASR Realtime models, including streaming microphone input, live captions, or duplex voice agents.

updated:April 27, 2026 at 22:35

SKILL.md

name	aliyun-qwen-asr-realtime
description	Use when low-latency realtime speech recognition is needed with Alibaba Cloud Model Studio Qwen ASR Realtime models, including streaming microphone input, live captions, or duplex voice agents.
version	1.0.0

Category: provider

Model Studio Qwen ASR Realtime

Validation

mkdir -p output/aliyun-qwen-asr-realtime
python -m py_compile skills/ai/audio/aliyun-qwen-asr-realtime/scripts/prepare_realtime_asr_request.py && echo "py_compile_ok" > output/aliyun-qwen-asr-realtime/validate.txt

Pass criteria: command exits 0 and output/aliyun-qwen-asr-realtime/validate.txt is generated.
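
The same check can be run from Python when a shell is not convenient. This is a minimal sketch, not part of the skill: `validate_script` is a hypothetical helper name, and it assumes the marker file should be written only when compilation succeeds, mirroring the `&&` in the shell command above.

```python
import py_compile
from pathlib import Path


def validate_script(script: Path, out_dir: Path) -> bool:
    """Byte-compile `script` and write validate.txt only on success."""
    out_dir.mkdir(parents=True, exist_ok=True)
    try:
        # doraise=True turns syntax errors into PyCompileError instead of
        # printing them; a missing file surfaces as OSError.
        py_compile.compile(str(script), doraise=True)
    except (py_compile.PyCompileError, OSError):
        return False
    (out_dir / "validate.txt").write_text("py_compile_ok\n")
    return True
```

Calling it with the script path and `Path("output/aliyun-qwen-asr-realtime")` returns `True` when the pass criteria above are met.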

Output And Evidence

Save session payloads and response samples under output/aliyun-qwen-asr-realtime/.
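
One way to keep evidence files from overwriting each other is to write each payload to its own JSON file. The sketch below is illustrative only: `save_evidence` and the `<kind>-<timestamp>.json` naming scheme are assumptions, not conventions defined by the skill.

```python
import json
import time
from pathlib import Path

OUTPUT_DIR = Path("output/aliyun-qwen-asr-realtime")


def save_evidence(kind: str, payload: dict, out_dir: Path = OUTPUT_DIR) -> Path:
    """Write one payload or response sample as pretty-printed JSON."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # Nanosecond timestamp keeps successive files distinct (assumed naming).
    path = out_dir / f"{kind}-{time.time_ns()}.json"
    # ensure_ascii=False keeps non-ASCII transcripts readable on disk.
    path.write_text(json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8")
    return path
```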

Critical model names

Use one of these exact model strings:

qwen3-asr-flash-realtime
qwen3-asr-flash-realtime-2026-02-10
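
Because the strings must match exactly, a client can reject near-misses before opening a session. A minimal sketch, assuming a hypothetical `check_model` helper that is not part of the skill:

```python
# The two model strings listed above, copied verbatim.
ALLOWED_MODELS = (
    "qwen3-asr-flash-realtime",
    "qwen3-asr-flash-realtime-2026-02-10",
)


def check_model(name: str) -> str:
    """Return `name` unchanged if it is an exact match, else raise ValueError."""
    if name not in ALLOWED_MODELS:
        raise ValueError(
            f"unknown realtime ASR model {name!r}; expected one of {ALLOWED_MODELS}"
        )
    return name
```

The comparison is deliberately case-sensitive and does no trimming, so a value such as `qwen3-asr-flash` (no `-realtime` suffix) fails early.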

Use cases

Realtime subtitles and captions
Voice-agent duplex input
Streaming speech-to-text in browser or terminal clients

Prerequisites

Set DASHSCOPE_API_KEY in your environment, or add dashscope_api_key to ~/.alibabacloud/credentials.
Realtime sessions generally require WebSocket or streaming session handling in the client.
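
The key lookup order described above can be sketched as follows. The layout of `~/.alibabacloud/credentials` is not specified here, so this example assumes an INI-style file with at least one section header (for example `[default]`); `load_api_key` is a hypothetical helper name.

```python
import configparser
import os
from pathlib import Path


def load_api_key(env=None, credentials_path=None):
    """Return the API key from the environment, else the credentials file, else None."""
    env = os.environ if env is None else env
    key = env.get("DASHSCOPE_API_KEY", "").strip()
    if key:
        return key
    path = credentials_path or Path.home() / ".alibabacloud" / "credentials"
    # ASSUMPTION: INI format. interpolation=None keeps '%' in keys literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)  # a missing file is silently ignored
    for section in parser.sections():
        value = parser.get(section, "dashscope_api_key", fallback="").strip()
        if value:
            return value
    return None
```

If the real file has no section headers, `configparser` raises `MissingSectionHeaderError`, and the parsing step would need to be adapted.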

Normalized interface (asr.realtime)

Request

model (string, optional): default qwen3-asr-flash-realtime
language_hints (array, optional)
format (string, optional): e.g. pcm, wav
sample_rate (int, optional): e.g. 16000
chunk_ms (int, optional): frame size in milliseconds
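
The request fields above map naturally onto a small data class. This is a sketch of the normalized interface, not the output of `prepare_realtime_asr_request.py`: only the `model` default comes from this document, while the defaults for `format`, `sample_rate`, and `chunk_ms` are illustrative assumptions.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class RealtimeAsrRequest:
    model: str = "qwen3-asr-flash-realtime"  # documented default
    language_hints: List[str] = field(default_factory=list)
    format: str = "pcm"        # assumed default; the document lists pcm and wav as examples
    sample_rate: int = 16000   # assumed default, taken from the example value
    chunk_ms: int = 100        # assumed default; no value is given in the document

    def to_payload(self) -> dict:
        """Return a JSON-serializable dict, omitting empty optional lists."""
        payload = asdict(self)
        if not payload["language_hints"]:
            del payload["language_hints"]
        return payload
```

For example, `RealtimeAsrRequest(language_hints=["zh"]).to_payload()` yields a dict with all five fields, while the no-argument form omits `language_hints`.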

Response

text (string): recognized transcript fragment
is_final (bool): finalization marker
usage (object, optional)
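
A client typically folds these fragments into a running transcript. The sketch below assumes that a non-final fragment replaces the previous partial hypothesis and that a final fragment closes the current segment; the actual service semantics may differ. `TranscriptAssembler` and the space separator are hypothetical choices.

```python
class TranscriptAssembler:
    """Combine streamed {text, is_final} events into one transcript string."""

    def __init__(self, separator: str = " "):
        self.separator = separator  # use "" for languages written without spaces
        self.final_segments = []
        self.partial = ""

    def feed(self, event: dict) -> str:
        text = event.get("text", "")
        if event.get("is_final", False):
            # Final result: commit the segment and clear the pending partial.
            self.final_segments.append(text)
            self.partial = ""
        else:
            # ASSUMPTION: each partial supersedes the previous one.
            self.partial = text
        return self.current()

    def current(self) -> str:
        parts = self.final_segments + ([self.partial] if self.partial else [])
        return self.separator.join(parts)
```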

Quick start

Generate a request template:

python skills/ai/audio/aliyun-qwen-asr-realtime/scripts/prepare_realtime_asr_request.py \
  --output output/aliyun-qwen-asr-realtime/request.json

Operational guidance

Prefer 16 kHz mono PCM unless your client stack requires another format.
Keep chunk_ms small enough that partial results arrive responsively; a larger chunk delays each send by at least its own duration.
If you only have recorded files, use the non-realtime skill at skills/ai/audio/aliyun-qwen-asr/ instead.
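
To make the chunk-size trade-off concrete: for raw PCM, the bytes per chunk equal sample_rate × chunk_ms / 1000 × channels × bytes per sample. The sketch below assumes 16-bit samples (2 bytes), which this document does not state; the helper names are hypothetical.

```python
from typing import Iterator


def chunk_size_bytes(sample_rate: int = 16000, chunk_ms: int = 100,
                     channels: int = 1, sample_width: int = 2) -> int:
    """Bytes of raw PCM covering `chunk_ms` milliseconds of audio."""
    frames = sample_rate * chunk_ms // 1000
    # ASSUMPTION: sample_width=2 means 16-bit samples.
    return frames * channels * sample_width


def iter_chunks(pcm: bytes, size: int) -> Iterator[bytes]:
    """Yield consecutive slices of `pcm`; the last one may be shorter."""
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]
```

Under these assumptions, 100 ms of 16 kHz mono audio is 3200 bytes and 20 ms is 640 bytes, so one second of audio splits into 10 or 50 chunks respectively.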

References

references/sources.md

package.json

"author": "cinience"

"repository": "cinience/alicloud-skills"
