Name: Flox Cuda
Author: flox

// CUDA and GPU development with Flox. Use for NVIDIA CUDA setup, GPU computing, deep learning frameworks, cuDNN, and cross-platform GPU/CPU development.

updated:November 6, 2025 at 21:20

package.json

"author": "flox"

"repository": "flox/flox-agentic"

name	flox-cuda
description	CUDA and GPU development with Flox. Use for NVIDIA CUDA setup, GPU computing, deep learning frameworks, cuDNN, and cross-platform GPU/CPU development.

Flox CUDA Development Guide

Prerequisites & Authentication

Sign up for early access at https://flox.dev
Authenticate with `flox auth login`
Linux only: CUDA packages are available only for the systems ["aarch64-linux", "x86_64-linux"]
All CUDA packages are prefixed with flox-cuda/ in the catalog
No macOS support: use Metal alternatives on Darwin

Core Commands

# Search for CUDA packages
flox search cudatoolkit --all | grep flox-cuda
flox search nvcc --all | grep 12_8

# Show available versions
flox show flox-cuda/cudaPackages.cudatoolkit

# Install CUDA packages
flox install flox-cuda/cudaPackages_12_8.cuda_nvcc
flox install flox-cuda/cudaPackages.cuda_cudart

# Verify installation
nvcc --version
nvidia-smi

Package Discovery

# Search for CUDA toolkit
flox search cudatoolkit --all | grep flox-cuda

# Search for specific versions
flox search nvcc --all | grep 12_8

# Show all available versions
flox show flox-cuda/cudaPackages.cudatoolkit

# Search for CUDA libraries
flox search libcublas --all | grep flox-cuda
flox search cudnn --all | grep flox-cuda

Essential CUDA Packages

Package Pattern	Purpose	Example
`cudaPackages_X_Y.cudatoolkit`	Main CUDA Toolkit	`cudaPackages_12_8.cudatoolkit`
`cudaPackages_X_Y.cuda_nvcc`	NVIDIA CUDA compiler (nvcc)	`cudaPackages_12_8.cuda_nvcc`
`cudaPackages.cuda_cudart`	CUDA Runtime API	`cudaPackages.cuda_cudart`
`cudaPackages_X_Y.libcublas`	Linear algebra	`cudaPackages_12_8.libcublas`
`cudaPackages_X_Y.libcufft`	Fast Fourier Transform	`cudaPackages_12_8.libcufft`
`cudaPackages_X_Y.libcurand`	Random number generation	`cudaPackages_12_8.libcurand`
`cudaPackages_X_Y.cudnn_9_11`	Deep neural networks	`cudaPackages_12_8.cudnn_9_11`
`cudaPackages_X_Y.nccl`	Multi-GPU communication	`cudaPackages_12_8.nccl`
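
The naming pattern in the table can be expressed as a small helper. This is only an illustrative sketch: the function name `cuda_pkg_path` is hypothetical and not part of Flox, and it assumes the `flox-cuda/cudaPackages_X_Y.<package>` layout shown above. Confirm that a path actually exists with `flox search` or `flox show` before using it.

```python
import re
from typing import Optional


def cuda_pkg_path(package: str, version: Optional[str] = None) -> str:
    """Build a flox-cuda pkg-path (hypothetical helper, not a Flox API).

    A version such as "12.8" selects the cudaPackages_12_8 set;
    omitting it selects the unversioned cudaPackages set.
    """
    if version is None:
        return f"flox-cuda/cudaPackages.{package}"
    if not re.fullmatch(r"\d+\.\d+", version):
        raise ValueError(f"expected a version like '12.8', got {version!r}")
    major, minor = version.split(".")
    return f"flox-cuda/cudaPackages_{major}_{minor}.{package}"


print(cuda_pkg_path("cuda_nvcc", "12.8"))
print(cuda_pkg_path("cuda_cudart"))
```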

Critical: Conflict Resolution

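Several CUDA packages ship a LICENSE file at the same path, so they conflict when installed together. Set explicit priorities to choose which copy wins (a lower number means higher priority):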

[install]
cuda_nvcc.pkg-path = "flox-cuda/cudaPackages_12_8.cuda_nvcc"
cuda_nvcc.systems = ["aarch64-linux", "x86_64-linux"]
cuda_nvcc.priority = 1                    # Highest priority

cuda_cudart.pkg-path = "flox-cuda/cudaPackages.cuda_cudart"
cuda_cudart.systems = ["aarch64-linux", "x86_64-linux"]
cuda_cudart.priority = 2

cudatoolkit.pkg-path = "flox-cuda/cudaPackages_12_8.cudatoolkit"
cudatoolkit.systems = ["aarch64-linux", "x86_64-linux"]
cudatoolkit.priority = 3                  # Lower for LICENSE conflicts

gcc.pkg-path = "gcc"
gcc-unwrapped.pkg-path = "gcc-unwrapped"  # For libstdc++
gcc-unwrapped.priority = 5

CUDA Version Selection

CUDA 12.x (Current)

[install]
cuda_nvcc.pkg-path = "flox-cuda/cudaPackages_12_8.cuda_nvcc"
cuda_nvcc.priority = 1
cuda_nvcc.systems = ["aarch64-linux", "x86_64-linux"]

cudatoolkit.pkg-path = "flox-cuda/cudaPackages_12_8.cudatoolkit"
cudatoolkit.priority = 3
cudatoolkit.systems = ["aarch64-linux", "x86_64-linux"]

CUDA 11.x (Legacy Support)

[install]
cuda_nvcc.pkg-path = "flox-cuda/cudaPackages_11_8.cuda_nvcc"
cuda_nvcc.priority = 1
cuda_nvcc.systems = ["aarch64-linux", "x86_64-linux"]

cudatoolkit.pkg-path = "flox-cuda/cudaPackages_11_8.cudatoolkit"
cudatoolkit.priority = 3
cudatoolkit.systems = ["aarch64-linux", "x86_64-linux"]

Cross-Platform GPU Development

Dual CUDA/CPU packages for portability (Linux gets CUDA, macOS gets CPU fallback):

[install]
## CUDA packages (Linux only)
cuda-pytorch.pkg-path = "flox-cuda/python3Packages.torch"
cuda-pytorch.systems = ["x86_64-linux", "aarch64-linux"]
cuda-pytorch.priority = 1

## Non-CUDA packages (macOS CPU fallback)
pytorch.pkg-path = "python313Packages.pytorch"
pytorch.systems = ["x86_64-darwin", "aarch64-darwin"]
pytorch.priority = 6                     # Lower priority

GPU Detection Pattern

Dynamic CPU/GPU package installation in hooks:

setup_gpu_packages() {
  # Assumes the venv at $venv already exists (for example, created with `uv venv`)
  venv="$FLOX_ENV_CACHE/venv"

  # Install once; the marker file skips reinstallation on later activations
  if [ ! -f "$FLOX_ENV_CACHE/.deps_installed" ]; then
    # CUDA wheels only run on NVIDIA GPUs, so match NVIDIA devices only
    if lspci 2>/dev/null | grep -qi 'nvidia'; then
      echo "NVIDIA GPU detected, installing CUDA packages"
      uv pip install --python "$venv/bin/python" \
        torch torchvision --index-url https://download.pytorch.org/whl/cu129
    else
      echo "No NVIDIA GPU detected, installing CPU packages"
      uv pip install --python "$venv/bin/python" \
        torch torchvision --index-url https://download.pytorch.org/whl/cpu
    fi
    touch "$FLOX_ENV_CACHE/.deps_installed"
  fi
}
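
The selection logic in the hook can also be written as a pure function, which makes it easy to test without a GPU. This is a sketch under stated assumptions: the function name and the sample `lspci` lines are hypothetical, and it treats only NVIDIA devices as CUDA-capable because the `cu129` wheels target NVIDIA hardware.

```python
CUDA_INDEX = "https://download.pytorch.org/whl/cu129"
CPU_INDEX = "https://download.pytorch.org/whl/cpu"


def pytorch_index_url(lspci_output: str) -> str:
    """Pick a PyTorch wheel index from `lspci` text.

    Returns the CUDA index only when an NVIDIA device is listed;
    anything else (including no output at all) falls back to CPU wheels.
    """
    has_nvidia = any(
        "nvidia" in line.lower() for line in lspci_output.splitlines()
    )
    return CUDA_INDEX if has_nvidia else CPU_INDEX


# Hypothetical lspci lines, for illustration only
with_gpu = "01:00.0 3D controller: NVIDIA Corporation Device 0000"
without_gpu = "00:02.0 VGA compatible controller: Intel Corporation Device 0000"

print(pytorch_index_url(with_gpu))
print(pytorch_index_url(without_gpu))
```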

Complete CUDA Environment Examples

Basic CUDA Development

[install]
cuda_nvcc.pkg-path = "flox-cuda/cudaPackages_12_8.cuda_nvcc"
cuda_nvcc.priority = 1
cuda_nvcc.systems = ["aarch64-linux", "x86_64-linux"]

cuda_cudart.pkg-path = "flox-cuda/cudaPackages.cuda_cudart"
cuda_cudart.priority = 2
cuda_cudart.systems = ["aarch64-linux", "x86_64-linux"]

gcc.pkg-path = "gcc"
gcc-unwrapped.pkg-path = "gcc-unwrapped"
gcc-unwrapped.priority = 5

[vars]
CUDA_VERSION = "12.8"
CUDA_HOME = "$FLOX_ENV"

[hook]
on-activate = '''
echo "CUDA $CUDA_VERSION environment ready"
echo "nvcc: $(nvcc --version | grep release)"
'''

Deep Learning with PyTorch

[install]
cuda_nvcc.pkg-path = "flox-cuda/cudaPackages_12_8.cuda_nvcc"
cuda_nvcc.priority = 1
cuda_nvcc.systems = ["aarch64-linux", "x86_64-linux"]

cuda_cudart.pkg-path = "flox-cuda/cudaPackages.cuda_cudart"
cuda_cudart.priority = 2
cuda_cudart.systems = ["aarch64-linux", "x86_64-linux"]

libcublas.pkg-path = "flox-cuda/cudaPackages_12_8.libcublas"
libcublas.priority = 2
libcublas.systems = ["aarch64-linux", "x86_64-linux"]

cudnn.pkg-path = "flox-cuda/cudaPackages_12_8.cudnn_9_11"
cudnn.priority = 2
cudnn.systems = ["aarch64-linux", "x86_64-linux"]

python313Full.pkg-path = "python313Full"
uv.pkg-path = "uv"
gcc-unwrapped.pkg-path = "gcc-unwrapped"
gcc-unwrapped.priority = 5

[vars]
CUDA_VERSION = "12.8"
PYTORCH_CUDA_ALLOC_CONF = "max_split_size_mb:128"

[hook]
on-activate = '''
setup_pytorch_cuda() {
  venv="$FLOX_ENV_CACHE/venv"

  # Create the venv on first activation
  if [ ! -d "$venv" ]; then
    uv venv "$venv" --python python3
  fi

  if [ -f "$venv/bin/activate" ]; then
    source "$venv/bin/activate"
  fi

  # Install once; the marker file skips reinstallation on later activations
  if [ ! -f "$FLOX_ENV_CACHE/.deps_installed" ]; then
    uv pip install --python "$venv/bin/python" \
      torch torchvision torchaudio \
      --index-url https://download.pytorch.org/whl/cu129
    touch "$FLOX_ENV_CACHE/.deps_installed"
  fi
}

setup_pytorch_cuda
'''

TensorFlow with CUDA

[install]
cuda_nvcc.pkg-path = "flox-cuda/cudaPackages_12_8.cuda_nvcc"
cuda_nvcc.priority = 1
cuda_nvcc.systems = ["aarch64-linux", "x86_64-linux"]

cuda_cudart.pkg-path = "flox-cuda/cudaPackages.cuda_cudart"
cuda_cudart.priority = 2
cuda_cudart.systems = ["aarch64-linux", "x86_64-linux"]

cudnn.pkg-path = "flox-cuda/cudaPackages_12_8.cudnn_9_11"
cudnn.priority = 2
cudnn.systems = ["aarch64-linux", "x86_64-linux"]

python313Full.pkg-path = "python313Full"
uv.pkg-path = "uv"

[hook]
on-activate = '''
setup_tensorflow() {
  venv="$FLOX_ENV_CACHE/venv"
  [ ! -d "$venv" ] && uv venv "$venv" --python python3
  [ -f "$venv/bin/activate" ] && source "$venv/bin/activate"

  if [ ! -f "$FLOX_ENV_CACHE/.tf_installed" ]; then
    # Quote the extra so the shell does not treat [and-cuda] as a glob pattern
    uv pip install --python "$venv/bin/python" "tensorflow[and-cuda]"
    touch "$FLOX_ENV_CACHE/.tf_installed"
  fi
}

setup_tensorflow
'''

Multi-GPU Development

[install]
cuda_nvcc.pkg-path = "flox-cuda/cudaPackages_12_8.cuda_nvcc"
cuda_nvcc.priority = 1
cuda_nvcc.systems = ["aarch64-linux", "x86_64-linux"]

nccl.pkg-path = "flox-cuda/cudaPackages_12_8.nccl"
nccl.priority = 2
nccl.systems = ["aarch64-linux", "x86_64-linux"]

libcublas.pkg-path = "flox-cuda/cudaPackages_12_8.libcublas"
libcublas.priority = 2
libcublas.systems = ["aarch64-linux", "x86_64-linux"]

[vars]
CUDA_VISIBLE_DEVICES = "0,1,2,3"  # Expose GPUs 0-3 (assumes a 4-GPU machine)
NCCL_DEBUG = "INFO"

Modular CUDA Environments

Base CUDA Environment

# team/cuda-base
[install]
cuda_nvcc.pkg-path = "flox-cuda/cudaPackages_12_8.cuda_nvcc"
cuda_nvcc.priority = 1
cuda_nvcc.systems = ["aarch64-linux", "x86_64-linux"]

cuda_cudart.pkg-path = "flox-cuda/cudaPackages.cuda_cudart"
cuda_cudart.priority = 2
cuda_cudart.systems = ["aarch64-linux", "x86_64-linux"]

gcc.pkg-path = "gcc"
gcc-unwrapped.pkg-path = "gcc-unwrapped"
gcc-unwrapped.priority = 5

[vars]
CUDA_VERSION = "12.8"
CUDA_HOME = "$FLOX_ENV"

CUDA Math Libraries

# team/cuda-math
[include]
environments = [{ remote = "team/cuda-base" }]

[install]
libcublas.pkg-path = "flox-cuda/cudaPackages_12_8.libcublas"
libcublas.priority = 2
libcublas.systems = ["aarch64-linux", "x86_64-linux"]

libcufft.pkg-path = "flox-cuda/cudaPackages_12_8.libcufft"
libcufft.priority = 2
libcufft.systems = ["aarch64-linux", "x86_64-linux"]

libcurand.pkg-path = "flox-cuda/cudaPackages_12_8.libcurand"
libcurand.priority = 2
libcurand.systems = ["aarch64-linux", "x86_64-linux"]

CUDA Debugging Tools

# team/cuda-debug
[install]
cuda-gdb.pkg-path = "flox-cuda/cudaPackages_12_8.cuda_gdb"
cuda-gdb.systems = ["aarch64-linux", "x86_64-linux"]

nsight-systems.pkg-path = "flox-cuda/cudaPackages_12_8.nsight_systems"
nsight-systems.systems = ["aarch64-linux", "x86_64-linux"]

[vars]
CUDA_LAUNCH_BLOCKING = "1"  # Synchronous kernel launches for debugging

Layer for Development

# Base CUDA environment
flox activate -r team/cuda-base

# Add debugging tools when needed
flox activate -r team/cuda-base -- flox activate -r team/cuda-debug

Testing CUDA Installation

Verify CUDA Compiler

nvcc --version

Check GPU Availability

nvidia-smi

Compile Test Program

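cat > hello_cuda.cu << 'EOF'
#include <stdio.h>
#include <cuda_runtime.h>

// Kernel: runs on the GPU and prints from device code
__global__ void hello() {
    printf("Hello from GPU!\n");
}

int main() {
    hello<<<1,1>>>();  // launch 1 block of 1 thread

    // Check the launch first, then wait for the kernel to finish
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();
    }
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
EOF

nvcc hello_cuda.cu -o hello_cuda
./hello_cuda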

Test PyTorch CUDA

import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")
print(f"GPU count: {torch.cuda.device_count()}")
if torch.cuda.is_available():
    print(f"GPU name: {torch.cuda.get_device_name(0)}")

Best Practices

Always Use Priority Values

CUDA packages have predictable file conflicts, so assign explicit priorities.

Version Consistency

Use specific versions (e.g., _12_8) for reproducibility. Don't mix CUDA versions.

Modular Design

Split base CUDA, math libs, and debugging into separate environments for flexibility

Test Compilation

Verify that `nvcc hello_cuda.cu -o hello_cuda` works after setup

Platform Constraints

Always include systems = ["aarch64-linux", "x86_64-linux"]

Memory Management

Set appropriate CUDA memory allocator configs:

[vars]
PYTORCH_CUDA_ALLOC_CONF = "max_split_size_mb:128"
CUDA_LAUNCH_BLOCKING = "0"  # Async by default

Common CUDA Gotchas

CUDA Toolkit ≠ Complete Toolkit

The cudatoolkit package doesn't include all libraries. Add what you need:

libcublas for linear algebra
libcufft for FFT
cudnn for deep learning

License Conflicts

Every CUDA package may need explicit priority due to LICENSE file conflicts

No macOS Support

In Flox, CUDA packages are Linux-only. Use Metal-accelerated packages on Darwin when available

Version Mixing

Don't mix CUDA versions. Use consistent _X_Y suffixes across all CUDA packages
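
A quick consistency check over the pkg-paths in a manifest can flag mixed versions. The sketch below uses a hypothetical helper name and assumes the `cudaPackages_X_Y` naming used throughout this guide. Unversioned paths such as `cudaPackages.cuda_cudart` carry no suffix, so they are ignored here.

```python
import re


def cuda_versions_in(pkg_paths: list[str]) -> set[str]:
    """Collect the X_Y suffixes used by cudaPackages_X_Y pkg-paths."""
    pattern = re.compile(r"cudaPackages_(\d+_\d+)\.")
    return {m.group(1) for path in pkg_paths if (m := pattern.search(path))}


paths = [
    "flox-cuda/cudaPackages_12_8.cuda_nvcc",
    "flox-cuda/cudaPackages_11_8.cudatoolkit",  # mixed in by mistake
    "flox-cuda/cudaPackages.cuda_cudart",       # unversioned, ignored
]

versions = cuda_versions_in(paths)
if len(versions) > 1:
    print("Mixed CUDA versions:", sorted(versions))
```

More than one element in the returned set means the manifest mixes CUDA versions.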

Python Virtual Environments

Install CUDA Python packages (PyTorch, TensorFlow) in a venv, using wheels built for a CUDA version your driver supports

Driver Requirements

Ensure the NVIDIA driver supports your CUDA version. The header of the nvidia-smi output shows the highest CUDA version the driver supports
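
The driver check can be scripted by reading the `CUDA Version:` field from the nvidia-smi header. This is a conservative sketch with hypothetical function names, and the sample header line is made up for illustration. It only compares major and minor numbers and does not model CUDA's minor-version compatibility rules.

```python
import re
from typing import Optional, Tuple


def max_cuda_from_nvidia_smi(output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from the 'CUDA Version: X.Y' header field."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def driver_supports(output: str, toolkit: str) -> bool:
    """True if the driver's reported CUDA version is at least `toolkit`."""
    supported = max_cuda_from_nvidia_smi(output)
    wanted = tuple(int(part) for part in toolkit.split("."))
    return supported is not None and supported >= wanted


# Hypothetical header line, for illustration only
header = "| NVIDIA-SMI 570.00    Driver Version: 570.00    CUDA Version: 12.8 |"

print(driver_supports(header, "12.8"))
print(driver_supports(header, "12.9"))
```

In practice the text would come from running `nvidia-smi` (for example with `subprocess.run`) rather than from a hard-coded string.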

Troubleshooting

CUDA Not Found

# Check CUDA_HOME
echo $CUDA_HOME

# Check nvcc
which nvcc
nvcc --version

# Check library paths
echo $LD_LIBRARY_PATH

PyTorch Not Using GPU

import torch
print(torch.cuda.is_available())  # Should be True
print(torch.version.cuda)         # CUDA version the wheel was built with (e.g. 12.9 for cu129); None for CPU-only builds

# If is_available() is False or the version is None, reinstall from a CUDA wheel index
# uv pip install torch --index-url https://download.pytorch.org/whl/cu129

Compilation Errors

# Check gcc/g++ version
gcc --version
g++ --version

# Ensure gcc-unwrapped is installed
flox list | grep gcc-unwrapped

# Check include paths
echo $CPATH
echo $LIBRARY_PATH

Runtime Errors

# Check GPU visibility
echo $CUDA_VISIBLE_DEVICES

# Check for GPU
nvidia-smi

# Run with debug output
CUDA_LAUNCH_BLOCKING=1 python my_script.py

Related Skills

flox-environments - Setting up development environments
flox-sharing - Composing CUDA base with project environments
flox-containers - Containerizing CUDA environments for deployment
flox-services - Running CUDA workloads as services
