Name: Gem
Author: hamelsmu


stars:57

forks:5

updated: March 6, 2026, 20:42

SKILL.md


name	gem
description	Multimodal AI processing and image generation using Google Gemini. Use for analyzing PDFs, images, videos, YouTube links, and other large documents. Also generates images with Nano Banana Pro. Ideal when you need to extract information from files that require vision or multimodal understanding, or generate images from text prompts.

Gemini Multimodal Tool

Use the ai-gem CLI tool for multimodal AI processing and image generation via Google's Gemini API.

Usage

# Text queries
ai-gem "Write a haiku about Python programming"

# Analyze documents
ai-gem "Summarize this document" document.pdf

# Analyze images
ai-gem "What's in this image?" photo.jpg

# Process YouTube videos
ai-gem "Create a 5-point summary" "https://youtu.be/VIDEO_ID"

# Compare multiple files
ai-gem "Compare these files" file1.pdf file2.png

# Web search
ai-gem "Current AI news" --search

# Generate images (uses Nano Banana Pro by default)
ai-gem --image "A cute robot reading a book in a cozy library"
ai-gem --image "A landscape at sunset" --aspect-ratio 16:9
ai-gem --image "A cat wearing a hat" -o cat.png
ai-gem --image "Edit this to add sunglasses" reference.jpg

# Use alternative image model
ai-gem --image "A blue triangle" -m gemini-2.5-flash-image
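
For batch work, the documented prompt-plus-file form can be wrapped in a loop. The following is a minimal sketch, not part of the hamel package: `summarize_pdfs` and the `.summary.txt` naming are hypothetical, and it assumes `ai-gem` prints its text answer to stdout, which the source does not state.

```shell
# Hypothetical helper: run the documented "prompt + file" form once per PDF.
# Assumption: ai-gem writes its text answer to stdout.
summarize_pdfs() {
  dir="$1"
  for f in "$dir"/*.pdf; do
    [ -e "$f" ] || continue   # skip the literal pattern when no PDFs match
    ai-gem "Summarize this document" "$f" > "${f%.pdf}.summary.txt"
  done
}
```

Calling `summarize_pdfs ./reports` would then write one summary file next to each PDF in that directory.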

Image Generation Options

--image / -i: Generate an image instead of text
--output / -o: Output file path (auto-generated if omitted)
--aspect-ratio / -a: Aspect ratio (1:1, 9:16, 16:9, etc.)
--model / -m: Override model (default: nano-banana-pro-preview)
Attachments serve as reference images for editing
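
The options above can be combined in a single call. As a rough sketch, the hypothetical helper below renders one prompt at several aspect ratios, using only the `--image`, `--aspect-ratio`, and `-o` flags listed here; the `image_WxH.png` output naming is an assumption made for illustration.

```shell
# Hypothetical helper: render one prompt at each requested aspect ratio.
# Uses only the flags documented above; output file names are illustrative.
render_variants() {
  prompt="$1"
  shift
  for ratio in "$@"; do
    # Turn "16:9" into "16x9" so the ratio is safe to use in a file name.
    out="image_$(printf '%s' "$ratio" | tr ':' 'x').png"
    ai-gem --image "$prompt" --aspect-ratio "$ratio" -o "$out"
  done
}
```

For example, `render_variants "A landscape at sunset" 16:9 1:1` would issue two `ai-gem` calls, one per ratio.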

Requirements

GEMINI_API_KEY environment variable must be set
The hamel package must be installed: pip install hamel
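
Both requirements can be checked before the first call. This is a minimal preflight sketch; `check_gem_requirements` is a hypothetical name, and the check only confirms that the variable is non-empty and that an `ai-gem` executable is on `PATH`, not that the key is valid.

```shell
# Hypothetical preflight check for the two requirements listed above.
check_gem_requirements() {
  rc=0
  if [ -z "${GEMINI_API_KEY:-}" ]; then
    echo "GEMINI_API_KEY is not set" >&2
    rc=1
  fi
  if ! command -v ai-gem >/dev/null 2>&1; then
    echo "ai-gem not found on PATH; try: pip install hamel" >&2
    rc=1
  fi
  return "$rc"
}
```

A script could then start with `check_gem_requirements || exit 1` so that it fails early with a readable message.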

Supported Input Types

PDFs
Images (PNG, JPEG, GIF, WebP)
Videos (MP4, etc.)
YouTube URLs
Plain text files
Multiple files for comparison
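
When scripting around the tool, it can help to classify an argument before passing it on. The sketch below maps a path or URL to one of the input types named above. `gem_input_kind` is a hypothetical helper, and the extension list is an assumption: it covers only the formats spelled out in this list, plus `.txt` for plain text, so the "etc." video formats fall through to `unknown`.

```shell
# Hypothetical helper: classify an argument by the input types listed above.
# The extension mapping is an assumption, not taken from the ai-gem source.
gem_input_kind() {
  # Lowercase a copy for matching only; the original argument is untouched.
  lower=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$lower" in
    http://youtu.be/*|https://youtu.be/*|http://*youtube.com/*|https://*youtube.com/*)
      echo youtube ;;
    *.pdf) echo pdf ;;
    *.png|*.jpg|*.jpeg|*.gif|*.webp) echo image ;;
    *.mp4) echo video ;;
    *.txt) echo text ;;
    *) echo unknown ;;
  esac
}
```

A caller might warn or skip when the result is `unknown` rather than sending the file to the API.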

related-skills.json

Same repository

youtube.md

from "hamelsmu/hamel"

Manage your YouTube channel — upload, list, edit metadata, schedule/unschedule videos, set thumbnails, download your own private videos, get transcripts, generate AI chapter summaries, and post comments (with a Chrome-automation playbook for pinning). Use when asked to upload to YouTube, schedule a video, edit video metadata, download a private YouTube video, get a transcript, generate chapters, or post/pin a comment.

2026-04-17, stars: 57

x.md

from "hamelsmu/hamel"

Unified X (Twitter) CLI — fetch follows, diff snapshots, get likes/bookmarks, fetch latest posts, and take screenshots. Uses the official X API v2 with Bearer Token and OAuth 2.0 user-context auth.

2026-04-03, stars: 57

annotate-talk.md

from "hamelsmu/hamel"

Create annotated blog posts from technical talks with slides. Use when asked to convert a video presentation to a blog post, create written content from a talk, or annotate slides with transcript.

2026-03-06, stars: 57

kit.md

from "hamelsmu/hamel"

Fetch Kit (ConvertKit) newsletter broadcasts for writing context. Use when asked to download newsletters, get past email content for style reference, or fetch broadcasts for analysis.

2026-01-02, stars: 57

package.json

"author": "hamelsmu"

"repository": "hamelsmu/hamel"

