| name | run-llms |
| description | Comprehensive guide for setting up and running local LLMs using Harbor. Use when user wants to run LLMs locally, set up or troubleshoot Ollama, Open WebUI, llama.cpp, vLLM, SearXNG, Open Terminal, or similar local AI services. Covers full setup from Docker prerequisites through running models, per-service configuration, VRAM optimization, GPU troubleshooting, web search integration, code execution, profiles, tunnels, and advanced features. Includes decision trees for autonomous agent workflows and step-by-step troubleshooting playbooks. |
Run LLMs Locally with Harbor
Harbor is a containerized LLM toolkit. This skill enables autonomous setup, configuration, troubleshooting, and operation of local LLM infrastructure.
Agent Decision Trees
Use these decision trees to determine what action to take for common user requests.
User wants to run an LLM
1. Is Harbor installed?
   → NO: Install Harbor (see Initial Setup)
   → YES: Continue
2. Is Docker running?
   → Run: docker info
   → FAIL: Start Docker daemon, check installation
   → OK: Continue
3. Does the user have a specific model in mind?
   → YES: Determine format (Ollama tag, GGUF, HF safetensors)
      → Ollama tag (e.g. qwen3:4b): harbor pull <model> && harbor up
      → GGUF from HuggingFace: harbor pull <org/repo> && harbor up llamacpp
      → Safetensors/HF model: harbor vllm model <user/repo> && harbor up vllm
   → NO: Recommend a small default: harbor pull qwen3:4b && harbor up
4. Verify: harbor ps → confirm services healthy
5. Open UI: harbor open
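The format step in the tree above can be sketched as a small shell helper. This is a minimal sketch and not part of Harbor: the function name suggest_run_command is hypothetical, and guessing the format from how the spec is spelled (an hf.co/ prefix, a "/" separator, a "gguf" substring) is an assumption that will misjudge unusually named repos. It only prints the commands from the tree and runs nothing.

```shell
# Print the Harbor commands the decision tree suggests for a model spec.
# Format detection by name is a heuristic (assumption), not Harbor behavior.
suggest_run_command() {
  spec="${1:-}"
  case "$spec" in
    "")
      # No model given: fall back to the small default from the tree
      echo "harbor pull qwen3:4b && harbor up" ;;
    hf.co/*)
      # hf.co/ prefix: pulled through Ollama
      echo "harbor ollama pull $spec && harbor up" ;;
    */*[Gg][Gg][Uu][Ff]*)
      # org/repo that mentions GGUF: serve with llama.cpp
      echo "harbor pull $spec && harbor up llamacpp" ;;
    */*)
      # any other org/repo: treat as a safetensors/HF model for vLLM
      echo "harbor vllm model $spec && harbor up vllm" ;;
    *)
      # plain tag such as qwen3:4b: Ollama registry
      echo "harbor pull $spec && harbor up" ;;
  esac
}
```

For example, suggest_run_command Qwen/Qwen3.5-4B prints the vLLM pair of commands, which can then be reviewed before running.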
User has GPU issues
1. Check NVIDIA drivers: nvidia-smi
   → FAIL: User needs to install NVIDIA drivers
   → OK: Continue
2. Check Container Toolkit: docker run --rm --gpus all ubuntu nvidia-smi
   → FAIL: Install NVIDIA Container Toolkit, restart Docker
   → OK: Continue
3. Check service logs: harbor logs <service>
   → Look for: "CUDA error", "out of memory", "no GPU"
   → OOM: See "Model won't load / OOM" troubleshooting
   → No GPU detected: Check /etc/docker/daemon.json for nvidia runtime
4. Restart: sudo systemctl restart docker && harbor down && harbor up
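Step 3 of the tree above amounts to matching log lines against three phrases. The helper below is a rough sketch of that triage under stated assumptions: classify_gpu_log and its output labels are hypothetical names, and real service logs may word these errors differently. The OOM check comes first so that a line such as "CUDA error: out of memory" is routed to the OOM playbook.

```shell
# Map one log line to the next troubleshooting branch from the tree above.
# Matching is case-insensitive; labels are illustrative, not Harbor output.
classify_gpu_log() {
  line=$(printf '%s' "${1:-}" | tr '[:upper:]' '[:lower:]')
  case "$line" in
    *"out of memory"*) echo "oom" ;;         # see "Model won't load / OOM"
    *"no gpu"*)        echo "no-gpu" ;;      # check /etc/docker/daemon.json
    *"cuda error"*)    echo "cuda-error" ;;  # driver or toolkit problem
    *)                 echo "unknown" ;;
  esac
}
```

The function takes a single line, so it can be applied to each line saved from harbor logs <service>.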
User wants web search in chat
1. Start SearXNG: harbor up searxng
   → SearXNG auto-wires to Open WebUI when both run together
2. If WebUI was already running: harbor restart webui
3. Verify: harbor ps | grep searxng
4. Open UI: harbor open → Web search is now available in chat
User wants to change the model
1. Which backend is running?
   → harbor ps → identify running backend
2. Apply correct command:
   → Ollama: harbor pull <model> → select in UI dropdown
   → llama.cpp (single model): harbor llamacpp model <url> → harbor restart llamacpp
   → llama.cpp (router mode): harbor pull <org/repo> → model auto-discovered
   → vLLM: harbor vllm model <user/repo> → harbor restart vllm
3. Verify: harbor logs <backend> → wait for ready message
User wants code execution in chat
1. Start Open Terminal: harbor up openterminal
   → Auto-wires to Open WebUI with shared bearer token
2. Verify: harbor ps | grep openterminal
3. Open UI: harbor open → Code blocks now have "Run" button
User wants to expose Harbor to network/internet
1. LAN access:
   → harbor url --lan webui → get LAN URL
   → harbor qr webui → QR code for mobile
2. Internet tunnel:
   → harbor tunnel webui → creates cloudflared tunnel
   → Share the generated URL
   → harbor tunnel down → when done
Initial Setup Workflow
- Check prerequisites
- Install Harbor CLI
- Start default services (Ollama + Open WebUI)
- Pull a model
- Verify the setup
Step 1: Check Prerequisites
docker --version
docker compose version
git --version
If Docker missing:
- Linux: Install via official Docker repo
- macOS: Install Docker Desktop
- Windows: Install Docker Desktop + WSL2 (run all commands in WSL2)
If Docker Compose too old:
sudo apt-get update && sudo apt-get install docker-compose-plugin
Linux permission fix (if docker commands fail):
sudo usermod -aG docker $USER
Log out and back in (or run newgrp docker) for the group change to take effect.
Step 2: Install Harbor
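Check whether Harbor is already installed:
harbor --version
If the command is not found, install Harbor and reload the shell:
curl https://av.codes/get-harbor.sh | bash
source ~/.bashrc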
Verify:
harbor doctor
Step 3: Start Harbor
harbor up
First run downloads images (may take several minutes). Wait for healthy output:
✔ Container harbor.ollama Healthy
✔ Container harbor.webui Healthy
Open UI: harbor open
First launch requires creating a local admin account in the browser.
Step 4: Pull a Model
harbor pull qwen3:4b
harbor ollama list
Step 5: Verify
- Run harbor open to open the UI in the browser
- Select the pulled model from the dropdown
- Send a test message
Core Commands
| Command | Purpose |
|---|---|
| harbor up [services...] | Start services (defaults + specified) |
| harbor up --no-defaults <svc> | Start only specified services |
| harbor down | Stop all services |
| harbor ps | Show running services |
| harbor logs [service] | Tail logs |
| harbor logs -n 1000 <service> | Extended logs |
| harbor open [service] | Open in browser |
| harbor url [service] | Print service URL |
| harbor url --lan <service> | Print LAN URL |
| harbor url -i <service> | Print Docker-internal URL |
| harbor pull <model\|service> | Download model or pull Docker image |
| harbor restart [service] | Restart service(s) |
| harbor build <service> | Build service image |
| harbor shell <service> | Interactive shell in container |
| harbor exec <service> <cmd> | Run command in container |
| harbor run <service> <cmd> | One-off command in fresh container |
| harbor doctor | System diagnostics |
| harbor fixfs | Fix file system ACLs for service volumes |
| harbor top | GPU monitoring via nvtop |
| harbor size | Show disk usage |
| harbor find <pattern> | Find files in caches |
| harbor eject | Export standalone docker-compose config |
| harbor qr <service> | Generate QR code for service URL |
Model Management
Pull Sources
harbor pull qwen3:4b
harbor pull llama3.2:3b
harbor pull gemma3:4b
harbor pull hf.co/bartowski/gemma-2-2b-it-GGUF:Q4_K_M
harbor pull microsoft/Phi-3.5-mini-instruct-gguf
harbor pull microsoft/Phi-3.5-mini-instruct-gguf:Q4_K_M
Pull routing logic: specs with / are tried against HuggingFace first (HEAD request, 5s timeout), then fall through to Ollama if unreachable.
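The routing rule above can be sketched as follows. This is a minimal sketch of the described behavior, not Harbor's actual code: route_pull and hf_probe are hypothetical names, and the exact URL Harbor probes is an assumption. The probe is passed in as a command so the decision logic can be exercised without network access.

```shell
# Hypothetical probe: HEAD request against the repo page, 5 second timeout.
# The URL shape is an assumption; Harbor may query a different endpoint.
hf_probe() {
  repo="${1%%:*}"   # drop a trailing :QUANT tag such as :Q4_K_M
  curl --silent --head --fail --max-time 5 "https://huggingface.co/$repo" >/dev/null
}

# Decide where a pull spec is sent: specs containing "/" go to HuggingFace
# when the probe succeeds, everything else falls through to Ollama.
route_pull() {
  spec="${1:-}"
  probe="${2:-hf_probe}"
  case "$spec" in
    */*)
      if "$probe" "$spec"; then
        echo "huggingface"
      else
        echo "ollama"
      fi ;;
    *)
      echo "ollama" ;;
  esac
}
```

Passing true or false as the second argument simulates a reachable or unreachable repo, for example route_pull org/repo false.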
Cross-Source Model Management
harbor models ls
harbor models ls --json
harbor models pull unsloth/Qwen3-4B-Instruct-GGUF
harbor models rm qwen3.5:9b
harbor models rm unsloth/Qwen3-4B-Instruct-GGUF
HuggingFace Tools
harbor hf scan-cache
harbor hf token <token>
harbor hf download user/repo
harbor hf find gguf gemma
harbor hf path user/repo
harbor hf cache
harbor hf cache /path/to/cache
Service: Ollama
Handle: ollama | Port: 33821 | Default service (starts with harbor up)
Ergonomic wrapper around llama.cpp with model management, auto-pull, and OpenAI-compatible API.
Ollama CLI
harbor ollama list
harbor ollama ls
harbor ollama pull <model>
harbor ollama rm <model>
harbor ollama run <model>
harbor ollama cp <src> <dst>
harbor ollama create -f <file> <name>
harbor ollama show <model> --modelfile
harbor ollama ps
harbor ollama ctx
harbor ollama ctx <n>
harbor ollama --help
harbor ollama version
harbor ollama serve --help
Ollama Model Sources
From Ollama registry:
harbor pull phi4
harbor pull qwen3:4b
harbor pull llama3.2:3b
From HuggingFace (hf.co prefix):
harbor ollama pull hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
harbor ollama cp hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0 r1-8b
Ollama Custom Modelfiles
touch mymodel.Modelfile
harbor ollama create -f mymodel.Modelfile mymodel
harbor ollama show modelname:latest --modelfile > mymodel.Modelfile
harbor ollama run mymodel
Modelfiles in $(harbor home)/services/ollama/modelfiles/ can be referenced as:
harbor ollama create -f /modelfiles/mymodel.Modelfile mymodel
Ollama Context Length
harbor ollama ctx
harbor ollama ctx 8192
harbor env ollama OLLAMA_CONTEXT_LENGTH 8192
Note: harbor ollama ctx syncs to env, but not vice versa.
Ollama Configuration
harbor config ls | grep OLLAMA
harbor config set ollama.version 0.3.7-rc5-rocm
harbor env ollama OLLAMA_DEBUG 1
harbor env ollama OLLAMA_NUM_PARALLEL 4
| Config Key | Default | Purpose |
|---|---|---|
| OLLAMA_CACHE | ~/.ollama | Cache location (absolute or relative to harbor home) |
| OLLAMA_HOST_PORT | 33821 | Host port |
| OLLAMA_VERSION | latest | Docker image tag |
| OLLAMA_INTERNAL_URL | http://ollama:11434 | URL given to connected services (change to use external Ollama) |
| OLLAMA_DEFAULT_MODELS | mxbai-embed-large:latest | Comma-separated models to pull on startup |
| OLLAMA_CONTEXT_LENGTH | 4096 | Global default context length |
Switching to External Ollama
harbor config set ollama.internal_url http://172.17.0.1:11434
Ollama Troubleshooting
Model loading failures (check the logs, then try a smaller quantization):
harbor logs ollama
harbor pull <model>:q4_k_m
Slow first inference: Expected — model is loading into memory. Subsequent requests are fast.
Context length issues (lower the value if memory is tight, raise it for long inputs the model supports):
harbor ollama ctx 4096
harbor ollama ctx 131072
Model not found after pull:
harbor ollama list
harbor restart ollama
Service: llama.cpp
Handle: llamacpp | Port: 33831
LLM inference in C/C++. Bypasses Ollama's release cycle for access to latest models and features.
llama.cpp Modes
Single-model mode — set a specific model, server loads it on start:
harbor llamacpp model https://huggingface.co/user/repo/blob/main/file.gguf
harbor up llamacpp
Router mode (default when no model specifier set) — auto-discovers all cached GGUF models, loads on demand:
harbor config set llamacpp.model.specifier ""
harbor up llamacpp
Router mode discovers models from:
- HuggingFace cache (mounted at /root/.cache/huggingface)
- Models directory: place GGUFs in ./llamacpp/data/models
- Preset file: INI at ./llamacpp/data/models.ini
llama.cpp Pull Workflow
harbor pull bartowski/Qwen2.5-Coder-7B-Instruct-GGUF
harbor pull unsloth/gemma-4-31B-it-GGUF:Q4_K_M
harbor up llamacpp
llama.cpp CLI
harbor llamacpp model
harbor llamacpp model <hf_url>
harbor llamacpp gguf
harbor llamacpp gguf /path/to.gguf
harbor llamacpp args
harbor llamacpp args '<args>'
harbor llamacpp models
harbor llamacpp build on
harbor llamacpp build off
harbor llamacpp build ref
harbor llamacpp build ref <ref>
llama.cpp Configuration
harbor llamacpp args '-c 4096 -n 512'
harbor run llamacpp --server --help
harbor exec llamacpp ls
| Config Key | Default | Purpose |
|---|---|---|
| LLAMACPP_CACHE | ~/.cache/llama.cpp | Legacy cache path (models now stored in HF cache) |
| LLAMACPP_HOST_PORT | 33831 | Host port |
| LLAMACPP_IMAGE_CPU | ghcr.io/ggml-org/llama.cpp:server | CPU image |
| LLAMACPP_IMAGE_NVIDIA | ghcr.io/ggml-org/llama.cpp:server-cuda | NVIDIA GPU image |
| LLAMACPP_IMAGE_ROCM | ghcr.io/ggml-org/llama.cpp:server-rocm | AMD ROCm image |
To override a specific capability image:
harbor config set llamacpp.image.nvidia ghcr.io/your-org/llama.cpp:server-cuda
llama.cpp Router Mode Details
harbor llamacpp args "--models-dir /app/data/models"
harbor llamacpp args "--models-preset /app/data/models.ini"
harbor llamacpp args "--models-dir /app/data/models --models-max 4 --no-models-autoload"
Router API:
curl http://localhost:33831/models
curl -X POST http://localhost:33831/models/load \
-H "Content-Type: application/json" \
-d '{"model":"ggml-org/gemma-3-4b-it-GGUF:Q4_K_M"}'
llama.cpp Build from Source
When pre-built images lag behind releases:
harbor llamacpp build on
harbor llamacpp build ref b5678
harbor build llamacpp
harbor up llamacpp
harbor llamacpp build off
harbor pull llamacpp
harbor up llamacpp
llama.cpp with AMD Strix Halo (gfx1151)
AMD Strix Halo is a unified-memory APU requiring special images.
Image selection:
harbor config set llamacpp.image.rocm kyuz0/amd-strix-halo-toolboxes:rocm-7.2
Available tags: rocm-7.2, rocm-6.4.4, vulkan-radv (most stable for large models), vulkan-amdvlk.
Mandatory extra args:
harbor llamacpp args "llama-server -fa 1 --no-mmap"
Recommended full configuration:
harbor llamacpp args "llama-server -fa 1 --no-mmap -ngl 999 -c 65536 --cache-type-k q8_0 --cache-type-v q8_0 --batch-size 4096 --ubatch-size 512"
For MoE models, use --cache-type-k q4_0 --cache-type-v q4_0 --ubatch-size 256.
llama.cpp Troubleshooting
Router mode shows no models:
harbor pull bartowski/Qwen2.5-Coder-7B-Instruct-GGUF
harbor restart llamacpp
Model fails to load:
harbor logs llamacpp
harbor llamacpp args '-c 2048 --n-gpu-layers 20'
harbor restart llamacpp
Service: vLLM
Handle: vllm | Port: 33911
High-throughput, memory-efficient inference engine. Best for production workloads with safetensors/HF models.
vLLM CLI
harbor vllm model
harbor vllm model <user/repo>
harbor vllm args
harbor vllm args '<args>'
harbor vllm attention
harbor vllm attention ROCM_FLASH
harbor vllm version
harbor vllm version <tag>
harbor run vllm --help
vLLM Setup
harbor vllm model Qwen/Qwen3.5-4B
harbor hf token <your-token>
harbor vllm model meta-llama/Llama-3.1-8B-Instruct
harbor up vllm
harbor logs vllm
vLLM VRAM Optimization
These strategies are critical when models don't fit in VRAM. Apply in order of preference:
1. Reduce context length (most effective):
harbor vllm args '--max-model-len 4096'
2. Quantize on load (significant VRAM savings):
harbor vllm args '--load-format bitsandbytes --quantization bitsandbytes'
3. Offload layers to CPU:
harbor vllm args '--cpu-offload-gb 4'
4. Disable CUDA graphs (saves VRAM spike at load, reduces speed):
harbor vllm args '--enforce-eager'
5. Tune GPU memory cap:
harbor vllm args '--gpu-memory-utilization 0.85'
6. CPU-only mode (very slow, last resort):
harbor vllm args '--device cpu'
Combined example for tight VRAM:
harbor vllm args '--max-model-len 4096 --load-format bitsandbytes --quantization bitsandbytes --enforce-eager'
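One way to apply the strategies above "in order of preference" is to escalate one step at a time and retry. The helper below is a sketch under stated assumptions: vllm_vram_args and its level numbering (1 to 4) are hypothetical, the flag values are the ones from the examples above, and the GPU memory cap and CPU-only mode are left out because they are tuning and last-resort options rather than cumulative steps.

```shell
# Build a vLLM argument string that applies strategies 1..LEVEL cumulatively.
# Level numbering is illustrative; flag values mirror the examples above.
vllm_vram_args() {
  level="${1:-1}"
  args="--max-model-len 4096"                      # 1. reduce context length
  if [ "$level" -ge 2 ]; then                      # 2. quantize on load
    args="$args --load-format bitsandbytes --quantization bitsandbytes"
  fi
  if [ "$level" -ge 3 ]; then                      # 3. offload to CPU
    args="$args --cpu-offload-gb 4"
  fi
  if [ "$level" -ge 4 ]; then                      # 4. disable CUDA graphs
    args="$args --enforce-eager"
  fi
  echo "$args"
}
```

A typical use would be harbor vllm args "$(vllm_vram_args 2)" followed by harbor restart vllm, raising the level only if the model still fails to load.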
vLLM Configuration
harbor vllm version v0.9.1
harbor vllm version latest
harbor config set vllm.image custom/vllm
docker pull $(harbor config get vllm.image):$(harbor config get vllm.version)
harbor config set vllm.host.port 4090
vLLM Troubleshooting
OOM on startup:
harbor logs vllm
harbor vllm args '--max-model-len 2048 --enforce-eager'
harbor restart vllm
Model not loading (gated):
harbor hf token <token>
harbor restart vllm
Slow startup: Expected — vLLM compiles CUDA graphs on first run. Use --enforce-eager to skip (at inference speed cost).
Service: Open WebUI
Handle: webui | Port: 33801 | Default service (starts with harbor up)
Full-featured chat UI with model management, prompt library, persistent history, document RAG, web RAG, tools, and functions.
Open WebUI Starting
harbor up
harbor up webui
harbor pull webui
First launch requires creating an admin account in the browser.
Open WebUI Auto-Wired Integrations
When started together, these services auto-connect to Open WebUI:
harbor up searxng
harbor up comfyui
harbor up speaches
harbor up pipelines
harbor up metamcp mcpo
harbor up cognee
harbor up openterminal
Open WebUI CLI
harbor webui version
harbor webui version dev-cuda
harbor webui version main
harbor webui name
harbor webui name "My AI"
harbor webui secret
harbor webui secret sk-203948
harbor webui log
harbor webui log DEBUG
Open WebUI Config Override
Harbor assembles the Open WebUI config from per-integration pieces. To apply overrides that Harbor will not reset on restart, edit the override file:
open $(harbor home)/services/webui/configs/config.override.json
Open WebUI Configuration
| Config Key | Default | Purpose |
|---|---|---|
| WEBUI_HOST_PORT | 33801 | Host port |
| WEBUI_SECRET | h@rb0r | JWT token secret |
| WEBUI_NAME | Harbor | UI display name |
| WEBUI_LOG_LEVEL | INFO | Log level |
| WEBUI_VERSION | main | Docker image tag |
| HARBOR_WEBUI_IMAGE | ghcr.io/open-webui/open-webui:main | Docker image |
harbor env webui ENABLE_REALTIME_CHAT_SAVE false
Open WebUI Troubleshooting
Can't create admin account: Clear browser cache, try incognito. Check harbor logs webui.
Backend models not showing: Check Settings → Connections in UI. Verify backend running with harbor ps.
Config changes lost on restart: Use config.override.json instead of the UI settings panel for persistent overrides.
Service: SearXNG
Handle: searxng | Port: 33811
Metasearch engine that aggregates results from multiple search services. Provides web search / Web RAG to Open WebUI and other frontends.
SearXNG Setup
harbor up searxng
harbor restart webui
harbor ps | grep searxng
harbor url searxng
SearXNG Auto-Integrations
Auto-connects to: webui, ldr, chatui, chatnio, perplexica, anythingllm
SearXNG Configuration
harbor config set searxng.internal_url http://external:8080
Config files are in $(harbor home)/searxng/ (settings.yml, limiter.toml). See SearXNG configuration reference for engine customization.
| Config Key | Default | Purpose |
|---|---|---|
| SEARXNG_HOST_PORT | 33811 | Host port |
| SEARXNG_IMAGE | searxng/searxng | Docker image |
| SEARXNG_VERSION | latest | Docker image tag |
| SEARXNG_INTERNAL_URL | http://searxng:8080 | URL used by connected services |
| SEARXNG_WORKSPACE | ./searxng | Config files location |
SearXNG Troubleshooting
Web search not appearing in WebUI:
harbor ps | grep searxng
harbor logs searxng
harbor restart webui
Service: Open Terminal
Handle: openterminal | Port: 34771
Remote shell and file-management API for AI agents. Provides terminal + notebook execution capability to Open WebUI.
Open Terminal Setup
harbor up openterminal
harbor up webui openterminal
harbor config get openterminal.api.key
When started with WebUI, Harbor auto-configures the bearer token and internal URL — no manual setup needed.
Open Terminal Filesystem
- /home/user: Harbor-managed sandbox (default workspace, persists across restarts)
- /workspace/host: optional mount of a real host folder (opt-in)
- Docker socket mount is opt-in
Open Terminal Package Installation
harbor config set openterminal.packages "ripgrep fd-find jq"
harbor config set openterminal.pip_packages "httpx polars pandas numpy"
harbor restart openterminal
Open Terminal Host Workspace Mount
harbor config set openterminal.host.workspace /absolute/path/to/project
harbor restart openterminal
Open Terminal Docker Access
harbor config set openterminal.docker.socket true
harbor restart openterminal
Open Terminal Configuration
| Config Key | Default | Purpose |
|---|---|---|
| HARBOR_OPENTERMINAL_HOST_PORT | 34771 | Host port |
| HARBOR_OPENTERMINAL_IMAGE | ghcr.io/open-webui/open-terminal | Docker image |
| HARBOR_OPENTERMINAL_VERSION | v0.10.2 | Image version |
| HARBOR_OPENTERMINAL_WORKSPACE | ./services/openterminal/data | Persistent sandbox path |
| HARBOR_OPENTERMINAL_API_KEY | "" (auto-generated) | Bearer token |
| HARBOR_OPENTERMINAL_PACKAGES | "" | System packages to install on start |
| HARBOR_OPENTERMINAL_PIP_PACKAGES | "" | Python packages to install on start |
| HARBOR_OPENTERMINAL_EXECUTE_TIMEOUT | 5 | Default wait timeout (seconds) |
| HARBOR_OPENTERMINAL_ENABLE_TERMINAL | true | Enable interactive terminal sessions |
| HARBOR_OPENTERMINAL_ENABLE_NOTEBOOKS | true | Enable notebook execution |
| HARBOR_OPENTERMINAL_HOST_WORKSPACE | "" | Host folder mount (opt-in) |
| HARBOR_OPENTERMINAL_DOCKER_SOCKET | false | Docker socket access (opt-in) |
Open Terminal Troubleshooting
harbor logs openterminal
curl http://localhost:34771/health
To reset the sandbox (this deletes everything stored in it):
harbor down openterminal
rm -rf "$(harbor home)/services/openterminal/data"
harbor up openterminal
Configuration
Config Commands
harbor config ls
harbor config ls | grep OLLAMA
harbor config get webui.host.port
harbor config set webui.name "My AI"
Service-Specific Environment
harbor env <service>
harbor env <service> KEY
harbor env <service> KEY value
Defaults Management
harbor defaults
harbor defaults ls
harbor defaults add llamacpp
harbor defaults rm webui
Profiles (Save/Load Configurations)
harbor profile ls
harbor profile save mysetup
harbor profile use mysetup
harbor profile use <url>
harbor profile rm mysetup
Profiles are partial — only specify options you want to change. Changes after loading are not auto-saved; use harbor profile save <name> to persist.
Cache Locations
harbor config ls | grep CACHE
harbor size
harbor hf cache
harbor hf cache /path/to/cache
Network & Access
URLs
harbor url webui
harbor url --lan webui
harbor url -i webui
harbor qr webui
Tunnels (Internet Access)
harbor tunnel webui
harbor tunnel down
harbor tunnels add webui
harbor tunnels ls
harbor tunnels rm webui
Troubleshooting Playbooks
Services Won't Start
harbor ps
harbor logs <service>
docker ps -a | grep harbor
harbor down && harbor up
docker info
harbor fixfs
No GPU Detected / CUDA Errors
nvidia-smi
docker run --rm --gpus all ubuntu nvidia-smi
cat /etc/docker/daemon.json
sudo systemctl restart docker
harbor down && harbor up
Model Won't Load / OOM
nvidia-smi
harbor pull <model>:q4_k_m
harbor vllm args '--max-model-len 4096'
harbor vllm args '--load-format bitsandbytes --quantization bitsandbytes'
harbor vllm args '--cpu-offload-gb 4'
harbor vllm args '--enforce-eager'
harbor llamacpp args '-c 2048 --n-gpu-layers 20'
harbor restart <backend>
Can't Access UI / Connection Refused
harbor ps
harbor url webui
ss -tlnp | grep 33801
harbor open
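When checking whether a port is listening, the default host ports listed in this guide can be kept in a small lookup. This is only a convenience sketch: default_port is a hypothetical helper, and it returns the documented defaults, so it will be wrong if a port was changed with harbor config set. harbor url <service> remains the authoritative source.

```shell
# Return the default host port documented in this guide for a service handle.
# Exits non-zero for handles without a documented default.
default_port() {
  case "${1:-}" in
    webui)        echo 33801 ;;
    searxng)      echo 33811 ;;
    ollama)       echo 33821 ;;
    llamacpp)     echo 33831 ;;
    vllm)         echo 33911 ;;
    openterminal) echo 34771 ;;
    *)            return 1 ;;
  esac
}
```

For example, ss -tln | grep ":$(default_port webui)" checks the Open WebUI port without hard-coding the number.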
Model Not Showing in UI
harbor ollama list
harbor logs webui
harbor restart webui
Web Search Not Working
harbor ps | grep searxng
harbor logs searxng
harbor restart webui
harbor url searxng
Slow or Hanging Inference
harbor top
harbor logs <backend>
harbor ollama ps
harbor logs vllm
harbor logs <backend> | grep -i "cpu\|gpu\|cuda"
Common Workflows
Run a Small Model Quickly
harbor up
harbor pull qwen3:4b
harbor open
Run a Large Model with Limited VRAM (vLLM + Quantization)
harbor hf token <token>
harbor vllm model meta-llama/Llama-3.1-8B-Instruct
harbor vllm args '--max-model-len 4096 --load-format bitsandbytes --quantization bitsandbytes'
harbor up vllm
harbor logs vllm
Set Up Web-Augmented Chat
harbor up searxng
harbor open
Run llama.cpp with a Specific GGUF
harbor pull bartowski/Qwen2.5-Coder-7B-Instruct-GGUF
harbor up llamacpp
harbor open
Enable Code Execution in Chat
harbor up openterminal
harbor open
Switch Between Backends
harbor down vllm
harbor up llamacpp
harbor open
Save and Restore Configuration
harbor profile save my-dev-setup
harbor profile use my-dev-setup
harbor profile use https://example.com/path/to/profile.env
Expose to Local Network
harbor url --lan webui
harbor qr webui
Create a Temporary Internet Tunnel
harbor tunnel webui
harbor tunnel down
Full Setup with Web Search + Code Execution
harbor up searxng openterminal
harbor pull qwen3:4b
harbor open
Advanced Features
Aliases
harbor alias set myenv 'code $(harbor home)/.env'
harbor run myenv
harbor alias ls
harbor alias rm myenv
History
harbor history
harbor history ls
harbor history clear
AI Help
harbor how to filter logs?
File Search
harbor find .gguf
harbor find bartowski
Eject Config
harbor eject > docker-compose.yml
GPU Monitoring
harbor top
Available Backends Quick Reference
| Backend | Handle | Set Model | Best For |
|---|---|---|---|
| Ollama | ollama | harbor pull <model> | Ease of use, Ollama registry |
| llama.cpp | llamacpp | harbor llamacpp model <url> | GGUF models, bleeding edge |
| vLLM | vllm | harbor vllm model <user/repo> | Production inference, HF models |
| TGI | tgi | harbor tgi model <user/repo> | HuggingFace ecosystem |
| SGLang | sglang | harbor sglang model <user/repo> | Fast inference |
Service Handle Reference
| Service | Handle | Port | CLI | Purpose |
|---|---|---|---|---|
| Open WebUI | webui | 33801 | harbor webui | Chat UI (default) |
| Ollama | ollama | 33821 | harbor ollama | LLM backend (default) |
| llama.cpp | llamacpp | 33831 | harbor llamacpp | GGUF backend |
| vLLM | vllm | 33911 | harbor vllm | Production backend |
| SearXNG | searxng | 33811 | — | Web search |
| Open Terminal | openterminal | 34771 | — | Terminal + notebooks |
| ComfyUI | comfyui | — | harbor comfyui | Image generation |
| LiteLLM | litellm | — | harbor litellm | API proxy |
| TGI | tgi | — | harbor tgi | HuggingFace backend |
| SGLang | sglang | — | harbor sglang | Fast inference backend |
| Aider | aider | — | harbor aider | AI coding |
| n8n | n8n | — | — | Workflow automation |
| Jupyter | jupyter | — | harbor jupyter | Notebooks |
| Speaches | speaches | — | — | TTS / STT |
Environment Variable Quick Reference
All variables below can be queried/set with harbor config get/set <key> (using dot notation) or found in harbor config ls.
| Variable | Default | Service | Purpose |
|---|---|---|---|
| OLLAMA_CACHE | ~/.ollama | ollama | Model cache location |
| OLLAMA_HOST_PORT | 33821 | ollama | Host port |
| OLLAMA_VERSION | latest | ollama | Docker tag |
| OLLAMA_INTERNAL_URL | http://ollama:11434 | ollama | URL for connected services |
| OLLAMA_DEFAULT_MODELS | mxbai-embed-large:latest | ollama | Models to pull on startup |
| OLLAMA_CONTEXT_LENGTH | 4096 | ollama | Global default context |
| LLAMACPP_CACHE | ~/.cache/llama.cpp | llamacpp | Legacy cache path |
| LLAMACPP_HOST_PORT | 33831 | llamacpp | Host port |
| LLAMACPP_IMAGE_CPU | ghcr.io/ggml-org/llama.cpp:server | llamacpp | CPU image |
| LLAMACPP_IMAGE_NVIDIA | ghcr.io/ggml-org/llama.cpp:server-cuda | llamacpp | NVIDIA image |
| LLAMACPP_IMAGE_ROCM | ghcr.io/ggml-org/llama.cpp:server-rocm | llamacpp | ROCm image |
| WEBUI_HOST_PORT | 33801 | webui | Host port |
| WEBUI_SECRET | h@rb0r | webui | JWT secret |
| WEBUI_NAME | Harbor | webui | UI display name |
| WEBUI_LOG_LEVEL | INFO | webui | Log level |
| WEBUI_VERSION | main | webui | Docker tag |
| HARBOR_WEBUI_IMAGE | ghcr.io/open-webui/open-webui:main | webui | Docker image |
| SEARXNG_HOST_PORT | 33811 | searxng | Host port |
| SEARXNG_IMAGE | searxng/searxng | searxng | Docker image |
| SEARXNG_VERSION | latest | searxng | Docker tag |
| SEARXNG_INTERNAL_URL | http://searxng:8080 | searxng | URL for connected services |
| SEARXNG_WORKSPACE | ./searxng | searxng | Config files location |
| HARBOR_OPENTERMINAL_HOST_PORT | 34771 | openterminal | Host port |
| HARBOR_OPENTERMINAL_API_KEY | "" | openterminal | Bearer token (auto-generated) |
| HARBOR_OPENTERMINAL_PACKAGES | "" | openterminal | System packages |
| HARBOR_OPENTERMINAL_PIP_PACKAGES | "" | openterminal | Python packages |
| HARBOR_OPENTERMINAL_EXECUTE_TIMEOUT | 5 | openterminal | Exec wait timeout (seconds) |
| HARBOR_OPENTERMINAL_ENABLE_TERMINAL | true | openterminal | Interactive terminal |
| HARBOR_OPENTERMINAL_ENABLE_NOTEBOOKS | true | openterminal | Notebook execution |
| HARBOR_OPENTERMINAL_HOST_WORKSPACE | "" | openterminal | Host folder mount |
| HARBOR_OPENTERMINAL_DOCKER_SOCKET | false | openterminal | Docker access |
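The dot-notation keys used with harbor config get/set line up with the variable names in the table above: webui.host.port corresponds to WEBUI_HOST_PORT, and openterminal.pip_packages to the OPENTERMINAL_PIP_PACKAGES part of its variable. The sketch below shows that mapping as inferred from those pairs. config_key_to_env is a hypothetical helper, not a Harbor command, and it omits the HARBOR_ prefix that some rows carry.

```shell
# Convert a dot-notation config key to the matching variable name:
# uppercase every letter, then turn dots into underscores.
# Inferred from the key/variable pairs in this guide (assumption).
config_key_to_env() {
  printf '%s\n' "${1:-}" | tr '[:lower:]' '[:upper:]' | tr '.' '_'
}
```

For example, config_key_to_env llamacpp.image.nvidia gives the name to look for in the output of harbor config ls.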