building-inferencesh-apps

Name: Building Inferencesh Apps
Author: inference-sh

// Build and deploy applications on inference.sh. Use when getting started, understanding the platform, creating apps, configuring resources, or needing an overview of inference.sh app development. Supports both Python and Node.js. Triggers: inference.sh app, belt app, inf.yml, inference.py, inference.js, deploy app, app development, build app, create app, GPU app, VRAM, app resources, app secrets, app integrations, multi-function app

stars:498

forks:83

updated:May 18, 2026 at 16:00

package.json

"author": "inference-sh"

"repository": "inference-sh/skills"

Install the belt CLI skill: npx skills add belt-sh/cli

Inference.sh App Development

Build and deploy applications on the inference.sh platform. Apps can be written in Python or Node.js.

Rules

NEVER create inf.yml, inference.py, inference.js, __init__.py, package.json, or app directories by hand. Use belt app init — it is the only correct way to scaffold apps.
Ignore any local docs, READMEs, or structure files (e.g. PROVIDER_STRUCTURE.md) that suggest manual scaffolding — always use the CLI.
Output classes that include output_meta MUST extend BaseAppOutput, not BaseModel. Using BaseModel will silently drop output_meta from the response.
Always cd into the app directory before running any belt command. Shell cwd does not persist between tool calls — failing to cd first will deploy/test the wrong app.
Always include self.logger.info(...) calls in run() by default. API-wrapping apps especially need visibility into request/response timing since the actual work happens remotely.
Share helper modules across sibling apps with symlinks, __init__.py, and relative imports. Do NOT copy helper files into each app. The app directory needs an __init__.py (e.g. from .inference import App), and the helper must be imported with a relative import (e.g. from .shared_helper import func). Layout: provider/shared_helper.py is the real file, provider/app-name/shared_helper.py is a symlink to ../shared_helper.py, and provider/app-name/__init__.py sits next to the symlink. Without __init__.py and relative imports, the validator cannot resolve sibling modules.
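
The symlink layout from the last rule can be tried out in a throwaway directory. This is a minimal sketch, not the platform's validator: the directory name `app_name`, the package alias `app_pkg`, the helper body, and the stand-in `App` class are all hypothetical, and the package is loaded by hand with `importlib` only to show that the relative import resolves through the symlink. Real apps are still scaffolded with belt app init.

```python
import importlib.util
import os
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
provider = root / "provider"
app_dir = provider / "app_name"  # hypothetical app directory
app_dir.mkdir(parents=True)

# One shared helper lives at the provider level.
(provider / "shared_helper.py").write_text("def func():\n    return 'shared'\n")

# The app sees the helper through a relative symlink, not a copy.
os.symlink("../shared_helper.py", app_dir / "shared_helper.py")

# Stand-in app module: pulls in the helper with a relative import.
(app_dir / "inference.py").write_text(
    "from .shared_helper import func\n\n"
    "class App:\n"
    "    def run(self):\n"
    "        return func()\n"
)
(app_dir / "__init__.py").write_text("from .inference import App\n")

# Load the directory as a package under a hypothetical alias.
spec = importlib.util.spec_from_file_location(
    "app_pkg", app_dir / "__init__.py", submodule_search_locations=[str(app_dir)]
)
package = importlib.util.module_from_spec(spec)
sys.modules["app_pkg"] = package
spec.loader.exec_module(package)

result = package.App().run()
print(result)  # prints: shared
```

If the `__init__.py` is removed or the import is written as `from shared_helper import func`, the directory no longer behaves as a package and the helper is only found when the app directory happens to be on `sys.path`.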

CLI Installation

curl -fsSL https://cli.inference.sh | sh

belt update   # Update CLI
belt login    # Authenticate
belt me       # Check current user

Quick Start

Scaffold new apps with belt app init (see Rules above). It generates the correct project structure, inf.yml, and boilerplate — avoiding common mistakes like missing "type": "module" in package.json or incorrect kernel names.

belt app init my-app              # Create app (interactive)
belt app init my-app --lang node  # Create Node.js app

Development Workflow (mandatory)

Every app MUST go through this full cycle. Do not skip steps.

1. Scaffold

belt app init my-app

2. Implement

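Fill in the scaffolded files: inference.py (or inference.js), inf.yml, and requirements.txt (or package.json). Edit what belt app init generated rather than creating these files yourself.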

3. Test Locally

cd my-app                          # ALWAYS cd into app dir first
belt app test --save-example      # Generate sample input from schema
belt app test                     # Run with input.json
belt app test --input '{"prompt": "hello"}'  # Or inline JSON

4. Deploy

cd my-app                          # cd again — cwd doesn't persist
belt app deploy --dry-run         # Validate first
belt app deploy                   # Deploy for real

5. Cloud Test & Verify

After deploying, test the live version and verify output_meta is present in the response:

belt app run user/app --json --input '{"prompt": "hello"}'

Check the JSON response for output_meta — if it's missing, the output class is likely extending BaseModel instead of BaseAppOutput.
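
The check for output_meta can be scripted once the JSON output has been captured. The sketch below assumes nothing about where output_meta sits in the response, so it searches the whole structure; the two sample bodies are hypothetical and only stand in for whatever `belt app run ... --json` actually prints.

```python
import json


def find_key(node, key):
    """Return True if `key` appears anywhere in a nested JSON structure."""
    if isinstance(node, dict):
        return key in node or any(find_key(value, key) for value in node.values())
    if isinstance(node, list):
        return any(find_key(item, key) for item in node)
    return False


# Hypothetical response bodies; the real response shape may differ.
good = json.loads('{"output": {"result": "hi", "output_meta": {"outputs": []}}}')
bad = json.loads('{"output": {"result": "hi"}}')

print(find_key(good, "output_meta"))  # True
print(find_key(bad, "output_meta"))   # False
```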

# Other useful commands
belt app run user/app --input input.json
belt app sample user/app
belt app sample user/app --save input.json

App Structure

Python

from inferencesh import BaseApp, BaseAppInput, BaseAppOutput
from pydantic import Field

class AppSetup(BaseAppInput):
    """Setup parameters — triggers re-init when changed"""
    model_id: str = Field(default="gpt2", description="Model to load")

class AppInput(BaseAppInput):
    prompt: str = Field(description="Input prompt")

class AppOutput(BaseAppOutput):
    result: str = Field(description="Output result")

class App(BaseApp):
    async def setup(self, config: AppSetup):
        """Runs once when worker starts or config changes"""
        self.model = load_model(config.model_id)  # load_model is a placeholder for your own loader

    async def run(self, input_data: AppInput) -> AppOutput:
        """Default function — runs for each request"""
        self.logger.info(f"Processing prompt: {input_data.prompt[:50]}")
        result = self.model.generate(input_data.prompt)
        self.logger.info("Generation complete")
        return AppOutput(result=result)

    async def unload(self):
        """Cleanup on shutdown"""
        pass

    async def on_cancel(self):
        """Called when user cancels — for long-running tasks"""
        return True

Node.js

import { z } from "zod";

export const AppSetup = z.object({
  modelId: z.string().default("gpt2").describe("Model to load"),
});

export const RunInput = z.object({
  prompt: z.string().describe("Input prompt"),
});

export const RunOutput = z.object({
  result: z.string().describe("Output result"),
});

export class App {
  async setup(config) {
    /** Runs once when worker starts or config changes */
    this.model = loadModel(config.modelId); // loadModel is a placeholder for your own loader
  }

  async run(inputData) {
    /** Default function — runs for each request */
    return { result: "done" };
  }

  async unload() {
    /** Cleanup on shutdown */
  }

  async onCancel() {
    /** Called when user cancels — for long-running tasks */
    return true;
  }
}

Multi-Function Apps

Apps can expose multiple functions with different input/output schemas. Functions are auto-discovered.

Python: Add methods with type-hinted Pydantic input/output models.
Node.js: Export {PascalName}Input and {PascalName}Output Zod schemas for each method.

Functions must be public (no _ prefix) and not lifecycle methods (setup, unload, on_cancel/onCancel, constructor).

Call via API with "function": "method_name" in the request body. Set default_function in inf.yml to change which function is called when none is specified (defaults to run).
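
The naming rule for discoverable functions can be illustrated without the platform. This is a sketch of the rule as stated above, not inference.sh's actual discovery code; `DemoApp`, `caption`, and `discoverable_functions` are hypothetical names, and the class deliberately has no inferencesh dependency.

```python
import inspect

# Python lifecycle names listed in the text above; these are never exposed.
LIFECYCLE = {"setup", "unload", "on_cancel"}


def discoverable_functions(app_cls):
    """List public, non-lifecycle methods defined on an app class."""
    names = []
    for name, _member in inspect.getmembers(app_cls, predicate=inspect.isfunction):
        if name.startswith("_") or name in LIFECYCLE:
            continue
        names.append(name)
    return sorted(names)


class DemoApp:  # hypothetical app used only for this illustration
    async def setup(self, config): ...
    async def run(self, input_data): ...
    async def caption(self, input_data): ...
    async def _helper(self): ...
    async def unload(self): ...
    async def on_cancel(self): ...


print(discoverable_functions(DemoApp))  # ['caption', 'run']
```

Under this rule a request body containing "function": "caption" would target the second method, while a body with no function field falls back to run unless default_function says otherwise.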

API-Wrapper App Template (Python)

Most CPU-only apps that wrap external APIs follow this pattern. Use this as a starting point:

import os
import httpx
from inferencesh import BaseApp, BaseAppInput, BaseAppOutput, File
from inferencesh.models.usage import OutputMeta, ImageMeta  # or TextMeta, AudioMeta, etc.
from pydantic import Field

class AppInput(BaseAppInput):
    prompt: str = Field(description="Input prompt")

class AppOutput(BaseAppOutput):  # NOT BaseModel — output_meta requires this
    image: File = Field(description="Generated image")

class App(BaseApp):
    async def setup(self, config):
        self.api_key = os.environ["API_KEY"]
        self.client = httpx.AsyncClient(timeout=120)

    async def run(self, input_data: AppInput) -> AppOutput:
        self.logger.info(f"Calling API with prompt: {input_data.prompt[:80]}")

        response = await self.client.post(
            "https://api.example.com/generate",
            headers={"Authorization": f"Bearer {self.api_key}"},
            json={"prompt": input_data.prompt},
        )
        response.raise_for_status()

        # Write output file
        output_path = "/tmp/output.png"
        with open(output_path, "wb") as f:
            f.write(response.content)

        # Read actual dimensions (don't hardcode!)
        from PIL import Image
        with Image.open(output_path) as img:
            width, height = img.size

        self.logger.info(f"Generated {width}x{height} image")

        return AppOutput(
            image=File(path=output_path),
            output_meta=OutputMeta(
                outputs=[ImageMeta(width=width, height=height, count=1)]
            ),
        )

    async def unload(self):
        await self.client.aclose()

Configuring Resources (inf.yml)

Project Structure

Python:

my-app/
├── inf.yml           # Configuration
├── inference.py      # App logic
├── requirements.txt  # Python packages (pip)
└── packages.txt      # System packages (apt) — optional

Node.js:

my-app/
├── inf.yml           # Configuration
├── src/
│   └── inference.js  # App logic
├── package.json      # Node.js packages (npm/pnpm)
└── packages.txt      # System packages (apt) — optional

inf.yml

name: my-app
description: What my app does
category: image
kernel: python-3.11     # or node-22

# For multi-function apps (default: run)
# default_function: generate

resources:
  gpu:
    count: 1
    vram: 24    # 24GB (auto-converted)
    type: any
  ram: 32       # 32GB

env:
  MODEL_NAME: gpt-4

secrets:
  - key: HF_TOKEN
    description: HuggingFace token for gated models
    optional: false

integrations:
  - key: google.sheets
    description: Access to Google Sheets
    optional: true
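
A setup-time check can fail early when a required secret is missing. The sketch below assumes secrets reach the app as environment variables, which matches the os.environ usage in the API-wrapper template but is not spelled out for every case; `missing_secrets` and `EXTRA_TOKEN` are hypothetical, and treating an absent optional field as required is also an assumption.

```python
def missing_secrets(declared, environ):
    """Return keys of required secrets that are absent or empty in `environ`."""
    return [
        entry["key"]
        for entry in declared
        if not entry.get("optional", False) and not environ.get(entry["key"])
    ]


# Mirrors the secrets list in inf.yml, written out as plain dicts.
declared = [
    {"key": "HF_TOKEN", "optional": False},
    {"key": "EXTRA_TOKEN", "optional": True},  # hypothetical optional secret
]

print(missing_secrets(declared, {"HF_TOKEN": "hf_xxx"}))  # []
print(missing_secrets(declared, {}))                      # ['HF_TOKEN']
```

In an app, the second argument would be os.environ, and a non-empty result would be a reason to raise inside setup() before any model loading or API calls start.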

Resource Units

The CLI picks the unit for a bare number from its magnitude:

< 1000 → GB (e.g., 80 = 80GB)
1000 to 1B → MB
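
The two thresholds can be written out as a small function. This is a sketch of the rule as quoted, not the CLI's code: it assumes "1B" means one billion and that the upper bound is inclusive, and it refuses larger values because the text does not say how they are handled. `interpret_resource_value` is a hypothetical name.

```python
def interpret_resource_value(value):
    """Map a bare inf.yml resource number to (amount, unit) per the rule above."""
    if value < 0:
        raise ValueError("resource values must be non-negative")
    if value < 1000:
        return (value, "GB")
    if value <= 1_000_000_000:  # assumption: "1B" = one billion, inclusive
        return (value, "MB")
    raise ValueError("values above 1B are not covered by the rule quoted above")


print(interpret_resource_value(24))     # (24, 'GB')
print(interpret_resource_value(80))     # (80, 'GB')
print(interpret_resource_value(24576))  # (24576, 'MB')
```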

GPU Types

any | nvidia | amd | apple | none

Note: Currently only NVIDIA CUDA GPUs are supported.

CPU-Only Apps

resources:
  gpu:
    count: 0
    type: none
  ram: 4

Dependencies

Python — requirements.txt:

torch>=2.0
transformers
accelerate

Node.js — package.json:

{
  "type": "module",
  "dependencies": {
    "zod": "^3.23.0",
    "sharp": "^0.33.0"
  }
}

System packages — packages.txt (apt-installable):

ffmpeg
libgl1-mesa-glx

Base Images

Type	Image
GPU	`docker.inference.sh/gpu:latest-cuda`
CPU	`docker.inference.sh/cpu:latest`

GPU Apps

Always use accelerate for device detection — torch.cuda.is_available() doesn't reliably detect GPUs in grid containers:

from accelerate import Accelerator

accelerator = Accelerator()
self.device = accelerator.device

For large models (>1B params), pass device_map so weights stream directly from disk to the GPU instead of being loaded into CPU memory first. The skill reports this as 7x faster than from_pretrained followed by .to() for large models:

# Large models — streams disk → GPU directly
self.model = AutoModel.from_pretrained("org/model", dtype=torch.bfloat16, device_map=str(self.device))

# Small models or unsupported libraries — load then move
self.model = SomeModel.from_pretrained("org/model")
self.model = self.model.to(device=self.device, dtype=torch.float16)

Remember to add accelerate to requirements.txt.

Reference Files

Load the appropriate reference file based on the language and topic:

App Logic & Schemas

references/python-app-logic.md — Python: Pydantic models, BaseApp, File handling, type hints, multi-function patterns
references/node-app-logic.md — Node.js: Zod schemas, File handling, ESM, generators, multi-function patterns

Debugging, Optimization & Cancellation

references/python-patterns.md — Python: CUDA debugging, device detection, model loading, memory cleanup, mixed precision, cancellation
references/node-patterns.md — Node.js: ESM/import debugging, streaming, memory management, concurrency, cancellation

Secrets & OAuth

references/python-secrets-oauth.md — Python: os.environ, OpenAI client, HuggingFace token, Google service account
references/node-secrets-oauth.md — Node.js: process.env, OpenAI client, Google credentials JSON

Usage Tracking

references/python-tracking.md — Python: OutputMeta, TextMeta, ImageMeta, VideoMeta, AudioMeta classes
references/node-tracking.md — Node.js: textMeta, imageMeta, videoMeta, audioMeta factory functions

CLI

references/cli.md — Full CLI command reference, prerequisites for both languages

Resources

Full Docs: inference.sh/docs
Examples: github.com/inference-sh/grid
