configuring-resources
Configure inf.yml for inference.sh apps. Use when setting GPU, VRAM, RAM, categories, environment variables, packages.txt, or resource requirements.
| Field | Value |
|---|---|
| name | configuring-resources |
| description | Configure inf.yml for inference.sh apps. Use when setting GPU, VRAM, RAM, categories, environment variables, packages.txt, or resource requirements. |
The inf.yml file defines app settings and resource requirements.
my-app/
├── inf.yml # Configuration
├── inference.py # App logic
├── requirements.txt # Python packages (pip)
└── packages.txt # System packages (apt) — optional
name: my-app
description: What my app does
category: image
kernel: python-3.11
resources:
  gpu:
    count: 1
    vram: 24 # 24GB (auto-converted)
    type: any
  ram: 32 # 32GB
env:
  MODEL_NAME: gpt-4
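The env block above sets MODEL_NAME. A minimal sketch of reading it from inference.py follows, assuming that keys under env reach the app as ordinary process environment variables. The helper name `get_model_name` and its fallback value are hypothetical and not part of any inference.sh API.

```python
import os


def get_model_name(default: str = "gpt-4") -> str:
    """Return MODEL_NAME from the environment, or a fallback.

    Assumption: keys under `env:` in inf.yml are exposed to the app as
    process environment variables. This helper is illustrative only.
    """
    return os.environ.get("MODEL_NAME", default)


if __name__ == "__main__":
    # Prints the configured model name, or the fallback if it is unset.
    print(get_model_name())
```

Reading the value through a single helper keeps the fallback in one place if the variable is missing during local runs.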
The CLI auto-converts human-friendly memory values (for example, 80 = 80GB).
Valid values for gpu.type: any | nvidia | amd | apple | none
Note: Currently only NVIDIA CUDA GPUs are supported.
Valid values for category: image | video | audio | text | chat | 3d | other
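The allowed GPU types and categories listed above can be checked before deploying. The following is a rough sketch, assuming the inf.yml content has already been loaded into a Python dict (for example with a YAML parser). The function `validate_config` is a hypothetical helper, not something the inference.sh CLI provides.

```python
# Allowed values copied from the lists in this document.
GPU_TYPES = {"any", "nvidia", "amd", "apple", "none"}
CATEGORIES = {"image", "video", "audio", "text", "chat", "3d", "other"}


def validate_config(config: dict) -> list[str]:
    """Return a list of problems found in an inf.yml-shaped dict.

    Hypothetical helper: only checks `category` and `resources.gpu.type`
    against the documented values. An empty list means no problems found.
    """
    problems = []

    category = config.get("category")
    if category not in CATEGORIES:
        problems.append(f"unknown category: {category!r}")

    gpu = config.get("resources", {}).get("gpu", {})
    gpu_type = gpu.get("type")
    # A missing type is not flagged here; only an unrecognized one is.
    if gpu_type is not None and gpu_type not in GPU_TYPES:
        problems.append(f"unknown gpu type: {gpu_type!r}")

    return problems


if __name__ == "__main__":
    config = {
        "category": "image",
        "resources": {"gpu": {"count": 1, "vram": 24, "type": "any"}, "ram": 32},
    }
    print(validate_config(config))  # → []
```

Returning a list of messages rather than raising on the first error lets a caller report every problem in one pass.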
For CPU-only apps:
resources:
  gpu:
    count: 0
    type: none
  ram: 4
For pip-installable dependencies (requirements.txt):
torch>=2.0
transformers
accelerate
For apt-installable dependencies (packages.txt):
ffmpeg
libgl1-mesa-glx
| Type | Image |
|---|---|
| GPU | docker.inference.sh/gpu:latest-cuda |
| CPU | docker.inference.sh/cpu:latest |
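The table above pairs each app type with a base image. As an illustration only, the sketch below maps a resources dict to one of the two images. The selection rule (GPU image when gpu.count is above zero and gpu.type is not none) is an assumption, and `pick_base_image` is a hypothetical helper; the platform's actual selection logic is not described in this document.

```python
# Image names copied from the table in this document.
GPU_IMAGE = "docker.inference.sh/gpu:latest-cuda"
CPU_IMAGE = "docker.inference.sh/cpu:latest"


def pick_base_image(resources: dict) -> str:
    """Guess which base image matches a `resources` block.

    Assumed rule (not confirmed by the source): an app counts as a GPU
    app when it requests at least one GPU and its type is not "none".
    """
    gpu = resources.get("gpu", {})
    needs_gpu = gpu.get("count", 0) > 0 and gpu.get("type", "any") != "none"
    return GPU_IMAGE if needs_gpu else CPU_IMAGE


if __name__ == "__main__":
    print(pick_base_image({"gpu": {"count": 1, "type": "any"}, "ram": 32}))
    print(pick_base_image({"gpu": {"count": 0, "type": "none"}, "ram": 4}))
```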
📖 Full docs: inference.sh/docs/extend/configuration