audify

Name: Audify
Author: jpcaparas

// Turn a readable resource into cleaned Gemini TTS audio. Use when the user wants to audify a URL, markdown note, HTML page, DOCX, or raw text into an MP3 while stripping markup, HTML, code fences, and bare URLs before synthesis. Triggers on: '/audify', 'read this aloud', 'turn this page into audio', 'make an mp3 narration', 'text to speech this resource', or when Gemini 3.1 Flash TTS with `GEMINI_API_KEY` is the right path. Do NOT trigger for music generation, live voice chat, or binary/media sources that are not meant to be narrated.


stars:13

forks:1

updated: May 26, 2026 06:57

File explorer

19 files

SKILL.md

readonly

name	audify
description	Turn a readable resource into cleaned Gemini TTS audio. Use when the user wants to audify a URL, markdown note, HTML page, DOCX, or raw text into an MP3 while stripping markup, HTML, code fences, and bare URLs before synthesis. Triggers on: '/audify', 'read this aloud', 'turn this page into audio', 'make an mp3 narration', 'text to speech this resource', or when Gemini 3.1 Flash TTS with `GEMINI_API_KEY` is the right path. Do NOT trigger for music generation, live voice chat, or binary/media sources that are not meant to be narrated.
compatibility	Requires: `python3`, `ffmpeg`, and `GEMINI_API_KEY`; optional `jq` for raw REST debugging

audify

Clean a readable resource into narration-safe prose, synthesize it with Gemini 3.1 Flash TTS, and write a timestamped output folder that contains an MP3, the cleaned transcript, and a manifest.

Verified against the Gemini API speech generation docs updated April 15, 2026 and the Google Cloud blog post published April 16, 2026.

Decision Tree

What kind of input are you handling?

A URL, markdown file, HTML file, DOCX, plain text file, or raw text that should be listened to
- Run python3 scripts/audify.py ...
A resource that is mostly code, logs, tables, minified JSON, or other content that is not meant to be narrated
- Bail instead of forcing TTS. Explain why it is not a good narration target.
A request where voice or nuance is materially ambiguous
- Ask one short question before synthesis.
- Use: "Which voice and delivery should I use? If you do not care, I will use Kore with a clear neutral narrator style."
A request where the user does not care about style details
- Default to Kore plus a clear neutral narrator nuance.
A longer job with multiple chunks or a large cleaned transcript
- Tell the user up front that multi-chunk TTS can take a few minutes and that quiet periods are normal.
- Do not keep narrating every short poll. Prefer one expectation-setting update, then wait for chunk progress or completion.
Missing GEMINI_API_KEY, unavailable gemini-3.1-flash-tts-preview, missing ffmpeg, or exhausted read attempts
- Bail with the concrete failed prerequisite.
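
The last branch of the tree can be sketched as a small pre-flight check. This is a minimal illustration, not the code in `scripts/audify.py`: the function name `missing_prerequisites` and its parameters are hypothetical, and it covers only the environment variable and `ffmpeg` checks, not model availability or read attempts.

```python
import os
import shutil


def missing_prerequisites(env=None, which=shutil.which, output_format="mp3"):
    """Return human-readable prerequisite failures (empty list if none).

    Hypothetical helper: `env` and `which` are injectable so the check
    can be exercised without touching the real environment.
    """
    env = os.environ if env is None else env
    problems = []
    if not env.get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY is not set")
    # ffmpeg is only needed when PCM has to be converted to MP3.
    if output_format == "mp3" and which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    return problems


# Simulate a machine with neither the key nor ffmpeg.
print(missing_prerequisites(env={}, which=lambda name: None))
```

Returning a list rather than raising on the first failure lets the caller report every concrete missing prerequisite in one message.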

Quick Reference

Task	Command	Read
URL to MP3 bundle	`python3 scripts/audify.py "https://example.com/"`	`references/patterns.md`
Local file to MP3 bundle	`python3 scripts/audify.py --file templates/sample-input.md --voice Kore --nuance "Warm documentary narrator"`	`references/patterns.md`
Raw text from stdin	`cat templates/sample-input.md | python3 scripts/audify.py --stdin`	`references/patterns.md`
Clean-only suitability check	`python3 scripts/audify.py --url "https://example.com/" --check-only`	`references/patterns.md`
Live model and synthesis probe	`python3 scripts/probe_gemini_tts.py --mode all`	`references/api.md`

Reading Guide

If the user needs...	Read
Raw Gemini REST shape, supported models, voices, and failure codes	`references/api.md`
Env vars, key setup, `ffmpeg`, and prerequisite checks	`references/configuration.md`
URL/file/text workflows, output bundle layout, and question-asking rules	`references/patterns.md`
Failure handling, retries, and why audify should bail	`references/gotchas.md`

Operational Rules

Treat SKILL.md as the entry point and keep the tool choice narrow: scripts/audify.py for production runs, scripts/probe_gemini_tts.py for live verification.
Clean first, synthesize second. Strip markup, HTML, code fences, and bare URLs before TTS so the spoken transcript stays close to the readable text instead of the transport format.
Preserve human text whenever possible. Keep visible anchor text, headings, paragraph content, and inline prose; drop boilerplate that is structural rather than spoken.
Stop when the source is not narration-friendly. Do not read code dumps, logs, stack traces, raw tables, or binary blobs aloud just because they decoded as text.
Stop when read attempts are exhausted. Do not silently fall back from a bad fetch or undecodable file to a hallucinated summary.
Set runtime expectations before long runs. For multi-chunk TTS, tell the user a realistic range such as "often 2-6 minutes" and that silence between chunk completions is normal.
Do not badger the user with polling updates. After the initial expectation-setting message, only report meaningful state changes such as chunk progress, retries, or final completion.
Auto-split large transiently failing chunks before giving up. Keep the same voice, nuance, model, and output format while retrying with smaller chunk boundaries.
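
The "clean first, synthesize second" rule can be illustrated with a few regular expressions. This is a rough sketch under stated assumptions, not the cleaner shipped with the skill: the function name `clean_for_narration` and every pattern below are illustrative, and a real cleaner would need proper HTML and DOCX parsing.

```python
import html
import re

# All patterns are illustrative assumptions, not the skill's actual rules.
FENCE = re.compile(r"`{3}.*?`{3}", re.DOTALL)        # fenced code blocks
MD_LINK = re.compile(r"\[([^\]]+)\]\([^)]*\)")       # [anchor](target)
HTML_TAG = re.compile(r"<[^>]+>")                    # any HTML tag
BARE_URL = re.compile(r"https?://\S+")               # bare URLs
MD_MARKS = re.compile(r"^[ \t]{0,3}(#{1,6}|[-*+]|>)[ \t]+", re.MULTILINE)


def clean_for_narration(text):
    """Reduce markdown or HTML-ish text to plain prose for TTS."""
    text = FENCE.sub(" ", text)            # drop code fences entirely
    text = MD_LINK.sub(r"\1", text)        # keep visible anchor text
    text = HTML_TAG.sub(" ", text)         # drop tags, keep inner text
    text = html.unescape(text)             # decode entities such as &amp;
    text = BARE_URL.sub(" ", text)         # drop bare URLs
    text = MD_MARKS.sub("", text)          # strip heading, list, quote marks
    text = re.sub(r"[*`]+", "", text)      # strip emphasis and inline-code marks
    text = re.sub(r"[ \t]+", " ", text)    # collapse runs of spaces
    text = re.sub(r"\s*\n\s*", "\n", text) # collapse blank lines
    return text.strip()
```

Link targets are removed before the bare-URL pass so that anchor text survives while the URL itself is never spoken.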

Output Contract

By default scripts/audify.py creates audify-output/<timestamp>-<slug>/ under the current working directory.

The folder contains:

audio.mp3 by default
cleaned.txt with the final spoken transcript
manifest.json with source, voice, nuance, chunking, and retry metadata
runtime expectations in both the wrapper output JSON and the status stream
fallback chunk-split metadata when a large chunk had to be retried in smaller pieces
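
To make the bundle layout concrete, here is a small sketch of how such a folder could be produced. Everything specific in it is an assumption: the timestamp format, the slug rule, the manifest keys, and the helper names `slugify` and `write_bundle` are hypothetical and may differ from what `scripts/audify.py` actually writes. The audio file is omitted.

```python
import json
import re
from datetime import datetime, timezone
from pathlib import Path


def slugify(text, max_len=40):
    """Lowercase, replace non-alphanumeric runs with '-', and trim."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug[:max_len].rstrip("-") or "untitled"


def write_bundle(root, source, transcript, voice="Kore",
                 nuance="clear neutral narrator", now=None):
    """Create audify-output/<timestamp>-<slug>/ with transcript and manifest."""
    now = now or datetime.now(timezone.utc)
    # Timestamp format is an assumption made for this sketch.
    name = f"{now:%Y%m%d-%H%M%S}-{slugify(source)}"
    folder = Path(root) / "audify-output" / name
    folder.mkdir(parents=True, exist_ok=False)
    (folder / "cleaned.txt").write_text(transcript, encoding="utf-8")
    # Manifest keys are hypothetical placeholders for the metadata listed above.
    manifest = {"source": source, "voice": voice, "nuance": nuance,
                "chunks": 1, "retries": 0}
    (folder / "manifest.json").write_text(
        json.dumps(manifest, indent=2), encoding="utf-8")
    return folder
```

Using `exist_ok=False` makes a name collision fail loudly instead of silently overwriting an earlier run.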

Use --format wav when MP3 conversion is not wanted.
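
Because the API response is base64 PCM, a WAV file needs only a container header and no `ffmpeg`. The sketch below uses the standard-library `wave` module. The audio parameters (24 kHz, 16-bit, mono) are an assumption based on the Gemini speech generation docs and should be checked against `references/api.md`; `write_wav` is a hypothetical helper name.

```python
import base64
import wave


def write_wav(path, pcm_base64, sample_rate=24000, channels=1, sample_width=2):
    """Wrap base64-encoded raw PCM in a WAV container.

    Defaults assume 24 kHz, 16-bit, mono audio (an assumption to verify).
    Returns the number of audio frames written.
    """
    pcm = base64.b64decode(pcm_base64)
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)   # bytes per sample
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return len(pcm) // (channels * sample_width)
```

If the declared parameters do not match the real PCM layout, the file will still open but play at the wrong speed or as noise, so the defaults are worth confirming with a short probe.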

Gotchas

Gemini TTS returns PCM, not MP3: The Gemini API returns base64 PCM audio. This skill converts it locally with ffmpeg, so missing ffmpeg is a hard stop for the default MP3 path.
Gemini 3.1 Flash TTS can throw transient 500 errors: Google documents rare cases where the model emits text tokens instead of audio tokens. The wrapper retries transient failures with backoff.
Vague prompts can get rejected or spoken aloud: The wrapper uses an explicit "synthesize speech only" preamble and a labeled TRANSCRIPT section so instructions do not become narration.
Voice and prompt can clash: Google warns that strong speaker mismatches can sound wrong. When the user asks for a very specific persona, make sure the selected voice and nuance point in the same direction.
HTML extraction is best-effort: Blog chrome, nav text, or legal footer text can still leak through on messy pages. If the cleaned preview looks wrong, stop and ask for a narrower source.
Long silence is not the same as failure: A multi-chunk run can spend a couple of minutes inside Gemini calls and local MP3 conversion. Do not treat every quiet 30-second interval as a problem.
A single large chunk can still fail transiently: When that happens, the wrapper should split just that chunk into smaller pieces and continue with the same voice, nuance, model, and format instead of forcing a full manual rerun.
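
The retry and auto-split behaviour described in the last two gotchas can be sketched as follows. This is an illustration only: `TransientError`, `split_text`, and `synthesize_with_fallback` are hypothetical names, the backoff schedule and minimum chunk length are assumed values, and `synthesize` stands in for whatever callable performs the actual Gemini request.

```python
import time


class TransientError(Exception):
    """Stand-in for a retryable failure such as an HTTP 500."""


def split_text(text):
    """Split near the middle, preferring the last sentence end before it."""
    mid = len(text) // 2
    cut = text.rfind(". ", 0, mid)
    cut = cut + 1 if cut != -1 else mid
    return [part.strip() for part in (text[:cut], text[cut:]) if part.strip()]


def synthesize_with_fallback(chunk, synthesize, attempts=3, base_delay=1.0,
                             sleep=time.sleep, min_len=200):
    """Retry with exponential backoff, then split only the failing chunk."""
    for attempt in range(attempts):
        try:
            return [synthesize(chunk)]
        except TransientError:
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    # Retries exhausted: fall back to smaller pieces of this chunk only.
    parts = split_text(chunk)
    if len(chunk) <= min_len or len(parts) < 2:
        raise TransientError("chunk failed after retries and cannot be split")
    audio = []
    for part in parts:
        audio.extend(synthesize_with_fallback(
            part, synthesize, attempts, base_delay, sleep, min_len))
    return audio
```

The same `synthesize` callable is reused for every piece, so voice, nuance, model, and format stay unchanged across the fallback.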

Helper Scripts

scripts/audify.py is the production wrapper for URL, file, stdin, and raw text inputs.
scripts/probe_gemini_tts.py runs safe live probes against model discovery and short synthesis.
scripts/test_audify_unit.py covers cleaner, chunking, DOCX extraction, and bail heuristics.
scripts/validate.py checks structure, cross-references, and leftover template placeholders.
scripts/test_skill.py runs structural checks, unit tests, and a live smoke probe when GEMINI_API_KEY is present.

related-skills.json

Same repository

azure-devops-create-work-item.md

from "jpcaparas/skills"

Draft a local Azure DevOps work item packet from loose context, defaulting to a Scrum Product Backlog Item unless the user specifies Bug, Feature, User Story, Task, Issue, or Epic. When run inside a repo, inspect the codebase and include relevant snippets. Save `work-item.md`, `context.md`, `sources.md`, and `metadata.json`. Do NOT use for live Azure DevOps REST/CLI creation, bulk migration, wiki authoring, or status reporting.

2026-05-26, 13 stars

azure-devops-wiki-markdown.md

from "jpcaparas/skills"

Use when writing, fixing, or reviewing Azure DevOps wiki Markdown, Mermaid diagrams, `_TOC_` and `_TOSP_`, collapsible `<details>` blocks, query-table embeds, `@` mentions, work-item links, KaTeX math, HTML video embeds, code fences, or Azure DevOps surface-specific support differences across Wiki, PR, README, Widget, and Done fields. Triggers on Azure DevOps wiki, markdown guidance, Mermaid sequence/graph/timeline/ER diagrams, proposal decision trees, table-of-subpages, query-table, code fence aliases, line-break bugs, and wiki page formatting. Do NOT use for GitHub-only Markdown, generic Mermaid authoring outside Azure DevOps, or non-Azure documentation platforms.

2026-05-26, 13 stars

better-writing.md

from "jpcaparas/skills"

Modern writing system for any prose humans read: docs, essays, explainers, PRs, memos, reports, newsletters, UI copy, and landing pages. Starts with Strunk's durable clarity rules, then routes into revision passes for structure, cadence, voice, genericity cleanup, and style-family calibration across technical, analytical, editorial, reflective, and conversion writing. Triggers on: 'rewrite this', 'make this clearer', 'tighten this', 'this sounds stiff', 'fix the voice', 'improve this memo', 'improve this doc', 'sharpen this essay', 'improve this landing page copy'. Do NOT trigger for code-only work, raw research without prose, or factual verification tasks that do not need writing help.

2026-05-26, 13 stars

client-report-from-commits.md

from "jpcaparas/skills"

Turn git commits and diffs since an exact start date into a copy-pastable, non-technical client report grouped by feature. Use when the user wants a client update, accomplishment summary, weekly progress note, stakeholder recap, or high-level status report based on git history. Trigger on: client report from commits, summarize git diff since a date, weekly update from git log, non-technical accomplishments, stakeholder-ready changelog. Do NOT trigger for technical release notes, code review, or any request where the date is still ambiguous.

2026-05-26, 13 stars

eli12.md

from "jpcaparas/skills"

Surgically explain how a codebase, subsystem, feature flow, or cluster of files works in clear, accessible language. Use when the user asks 'how does this work?', wants a code walkthrough, needs architecture explained simply, or wants a digestible mental model with grounded real-world analogies. Trigger on: 'explain this codebase', 'walk me through the flow', 'how does X work', 'what happens when', 'eli12', 'explain it simply', 'make this architecture easier to understand'. Do NOT use for bug triage, code review, literal children's storytelling, or requests that primarily need code changes instead of explanation.

2026-05-26, 13 stars

google-search-ai-optimization.md

from "jpcaparas/skills"

Google-grounded SEO/GEO web development skill for optimizing sites for Google Search, AI Overviews, AI Mode, and agentic browsing without unsupported hacks. Use when building or fixing websites for Search visibility, generative AI search readiness, technical SEO, crawlability, indexability, snippets, structured data, ecommerce/local details, or browser-agent usability. Triggers on: Google AI search, AI Overviews, AI Mode, GEO, AEO, SEO implementation, crawlable content, snippets, llms.txt myths, structured data, Merchant Center, Business Profile, agent-friendly website. Do NOT use for paid ads, generic rank tracking, non-Google LLM directory tactics, or off-page link schemes.

2026-05-26, 13 stars

package.json

"author": "jpcaparas"

"repository": "jpcaparas/skills"


$ useful --forSOC

Software Developers, Computer and Mathematical Occupations, 15-1252, L4
