halfstack
// Diagnose and fix Docker Compose halfstack issues — config mapping, service health, DB/Redis/etcd inspection, supergraph regeneration
| name | halfstack |
| description | Diagnose and fix Docker Compose halfstack issues — config mapping, service health, DB/Redis/etcd inspection, supergraph regeneration |
| invoke_method | user |
| auto_execute | false |
| enabled | true |
| tags | ["dev","docker","halfstack","troubleshooting"] |
Diagnose and directly fix issues with the Docker Compose halfstack development environment.
The runtime compose file is always docker-compose.halfstack.current.yml (project root).
It is generated from docker-compose.halfstack-main.yml (or halfstack-ha.yml for HA mode).
# Check all halfstack services
docker compose -f docker-compose.halfstack.current.yml ps
# Check a specific service's logs
docker compose -f docker-compose.halfstack.current.yml logs <service-name>
# Restart a specific service
docker compose -f docker-compose.halfstack.current.yml restart <service-name>
# Bring everything up
docker compose -f docker-compose.halfstack.current.yml up -d --wait
Optional services are gated behind Docker Compose profiles. By default (docker compose up -d)
only the required services start. To include optional ones, pass --profile <name>.
| Service | Image | Purpose | Profile |
|---|---|---|---|
| backendai-half-db | postgres:16.3-alpine | Main database | (required) |
| backendai-half-redis | redis:7.2-alpine | Cache / pub-sub | (required) |
| backendai-half-etcd | etcd v3.5 | Config store | (required) |
| backendai-half-apollo-router | Hive Gateway | GraphQL federation (manager has 2 GQL servers federated through this) | (required) |
| backendai-half-prometheus | Prometheus | Metrics; manager queries it for deployment autoscale rule evaluation | (required) |
| backendai-half-otel-collector | OTel Collector | Trace / metric export | telemetry, observability |
| backendai-half-loki | Loki | Log aggregation | telemetry, observability |
| backendai-half-grafana | Grafana | Dashboards | observability |
| backendai-half-tempo | Tempo | Tracing | observability |
| backendai-half-pyroscope | Pyroscope | Profiling | observability |
| backendai-half-db-exporter | postgres-exporter | Postgres metrics | observability |
| backendai-half-redis-exporter | redis_exporter | Redis metrics | observability |
| backendai-half-minio | MinIO | Object storage | storage |
Profile semantics:
- telemetry: service-level export only (otel-collector + loki). Visualisation (Grafana) and supporting backends (Tempo, Pyroscope, exporters) are typically managed centrally; this profile is a good default for dev installs that just want their logs and traces forwarded.
- observability: superset of telemetry. Brings up the full local stack including Grafana / Tempo / Pyroscope / exporters.
- storage: MinIO only.
# Required only (default)
docker compose -f docker-compose.halfstack.current.yml up -d --wait
# + telemetry export (OTel collector + Loki forwarding logs/traces to a central monitor)
docker compose -f docker-compose.halfstack.current.yml --profile telemetry up -d --wait
# + full observability stack (Grafana / Tempo / Pyroscope / exporters in addition to telemetry)
docker compose -f docker-compose.halfstack.current.yml --profile observability up -d --wait
# + object storage (MinIO)
docker compose -f docker-compose.halfstack.current.yml --profile storage up -d --wait
# Everything
docker compose -f docker-compose.halfstack.current.yml --profile observability --profile storage up -d --wait
When stopping/removing, profile flags must also be passed for those containers to be torn down:
docker compose -f docker-compose.halfstack.current.yml --profile observability --profile storage down
scripts/delete-dev.sh already passes both profiles so a clean wipe works regardless of what was enabled.
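Because the same profile flags are needed for both `up` and `down`, a small wrapper can keep them in sync. The following is a minimal sketch, not part of the repository: `profile_flags` and `halfstack` are hypothetical helper names, and the wrapper only assembles the `--profile` flags shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the Backend.AI repo).
COMPOSE_FILE="docker-compose.halfstack.current.yml"

# Turn a list of profile names into repeated "--profile <name>" flags.
profile_flags() {
  local out="" name
  for name in "$@"; do
    out="${out:+$out }--profile $name"
  done
  printf '%s' "$out"
}

# usage: halfstack "<space-separated profiles>" <compose subcommand...>
halfstack() {
  local profiles="$1"
  shift
  # Word splitting of the profile list and flags is intentional here.
  # shellcheck disable=SC2046,SC2086
  docker compose -f "$COMPOSE_FILE" $(profile_flags $profiles) "$@"
}
```

For example, `halfstack "observability storage" up -d --wait` followed later by `halfstack "observability storage" down` would start and tear down the same set of containers.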
The compose file declares a configs: section. Docker Compose reads these as files.
If a file is missing when docker compose up runs, Docker creates a directory at that path instead.
Once a directory exists where a file should be, even copying the correct file won't help — the directory must be removed first.
Step 1: Stop affected services (or all services):
docker compose -f docker-compose.halfstack.current.yml down
Step 2: Check and remove any directories that should be files:
# These MUST be regular files, not directories
for f in prometheus.yaml otel-collector-config.yaml loki-config.yaml \
         tempo-config.yaml supergraph.graphql gateway.config.ts; do
  [ -d "$f" ] && rm -rf "$f" && echo "Removed directory: $f"
done
# These MUST be directories
for d in grafana-dashboards grafana-provisioning; do
  [ -f "$d" ] && rm -f "$d" && echo "Removed file: $d"
done
Step 3: Copy config files from source (same as scripts/install-dev.sh):
# Docker Compose configs (plain copy, no transformation)
cp configs/prometheus/prometheus.yaml ./prometheus.yaml
cp configs/otel/otel-collector-config.yaml ./otel-collector-config.yaml
cp configs/loki/loki-config.yaml ./loki-config.yaml
cp configs/tempo/tempo-config.yaml ./tempo-config.yaml
cp configs/graphql/gateway.config.ts ./gateway.config.ts
# Supergraph — generated, but can be copied from last known-good
cp docs/manager/graphql-reference/supergraph.graphql ./supergraph.graphql
# Grafana (recursive directory copy)
cp -r configs/grafana/dashboards ./grafana-dashboards
cp -r configs/grafana/provisioning ./grafana-provisioning
Step 4: Ensure volume directories exist:
mkdir -p volumes/postgres-data
mkdir -p volumes/etcd-data
mkdir -p volumes/redis-data
Step 5: Bring services back up:
docker compose -f docker-compose.halfstack.current.yml up -d --wait
| File in project root | Source path | Used by service |
|---|---|---|
| prometheus.yaml | configs/prometheus/prometheus.yaml | backendai-half-prometheus |
| otel-collector-config.yaml | configs/otel/otel-collector-config.yaml | backendai-half-otel-collector |
| loki-config.yaml | configs/loki/loki-config.yaml | backendai-half-loki |
| tempo-config.yaml | configs/tempo/tempo-config.yaml | backendai-half-tempo |
| supergraph.graphql | docs/manager/graphql-reference/supergraph.graphql | backendai-half-apollo-router |
| gateway.config.ts | configs/graphql/gateway.config.ts | backendai-half-apollo-router |
| grafana-dashboards/ | configs/grafana/dashboards/ | backendai-half-grafana (volume mount) |
| grafana-provisioning/ | configs/grafana/provisioning/ | backendai-half-grafana (volume mount) |
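Before removing anything, it can help to see which entries of the mapping above have the wrong type. The following read-only check is a sketch under the assumption that the file and directory names are exactly those listed in the table; `check_halfstack_paths` is a hypothetical name, not a script shipped with the repository.

```shell
#!/usr/bin/env bash
# Hypothetical read-only check: reports problems, changes nothing.
check_halfstack_paths() {
  local root="${1:-.}" status=0 p
  # Entries that must be regular files.
  for p in prometheus.yaml otel-collector-config.yaml loki-config.yaml \
           tempo-config.yaml supergraph.graphql gateway.config.ts; do
    if [ -d "$root/$p" ]; then
      echo "WRONG TYPE (directory, expected file): $p"
      status=1
    elif [ ! -f "$root/$p" ]; then
      echo "MISSING file: $p"
      status=1
    fi
  done
  # Entries that must be directories.
  for p in grafana-dashboards grafana-provisioning; do
    if [ -f "$root/$p" ]; then
      echo "WRONG TYPE (file, expected directory): $p"
      status=1
    elif [ ! -d "$root/$p" ]; then
      echo "MISSING directory: $p"
      status=1
    fi
  done
  return "$status"
}
```

Running `check_halfstack_paths .` from the project root returns non-zero and prints one line per problem, which indicates whether the removal step is needed at all.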
If docker-compose.halfstack.current.yml doesn't exist or is outdated:
cp docker-compose.halfstack-main.yml docker-compose.halfstack.current.yml
Then apply port substitutions. Read existing component toml files to determine current ports,
or use defaults from scripts/install-dev.sh:
| Setting | Default | sed pattern |
|---|---|---|
| POSTGRES_PORT | 8101 | s/8100:5432/${POSTGRES_PORT}:5432/ |
| REDIS_PORT | 8111 | s/8110:6379/${REDIS_PORT}:6379/ |
| ETCD_PORT | 8121 | s/8120:2379/${ETCD_PORT}:2379/ |
Note: The source template has 8100/8110/8120 but install-dev.sh defaults are 8101/8111/8121.
Always check existing config files first to determine the correct port.
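The copy and the three substitutions can be combined into one step. This is a minimal sketch that assumes the template contains the literal mappings `8100:5432`, `8110:6379` and `8120:2379` as the table states; `generate_current_compose` is a hypothetical name, and how scripts/install-dev.sh actually invokes sed is not confirmed here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: render the runtime compose file from the template.
# Ports come from the environment, falling back to the install-dev.sh defaults.
generate_current_compose() {
  local src="${1:-docker-compose.halfstack-main.yml}"
  local dst="${2:-docker-compose.halfstack.current.yml}"
  local pg="${POSTGRES_PORT:-8101}"
  local redis="${REDIS_PORT:-8111}"
  local etcd="${ETCD_PORT:-8121}"
  sed -e "s/8100:5432/${pg}:5432/" \
      -e "s/8110:6379/${redis}:6379/" \
      -e "s/8120:2379/${etcd}:2379/" \
      "$src" > "$dst"
}
```

For instance, `POSTGRES_PORT=8101 REDIS_PORT=8111 ETCD_PORT=8121 generate_current_compose` would rewrite only the host side of each port mapping and leave the container ports untouched.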
The Hive Gateway serves the federated GraphQL schema from supergraph.graphql. Regenerate the supergraph after the manager's GraphQL schema changes:
# 1. Generate new schemas and supergraph
./scripts/generate-graphql-schema.sh
# 2. Copy to project root (where compose expects it)
cp docs/manager/graphql-reference/supergraph.graphql ./supergraph.graphql
cp configs/graphql/gateway.config.ts ./gateway.config.ts
# 3. Restart the gateway
docker compose -f docker-compose.halfstack.current.yml restart backendai-half-apollo-router
If manager code is broken and generate-graphql-schema.sh fails,
copy the last known-good supergraph from git:
git show main:docs/manager/graphql-reference/supergraph.graphql > ./supergraph.graphql
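The regenerate-or-fall-back logic above can be sketched as a single function. `refresh_supergraph` is a hypothetical name; the sketch assumes it is run from the project root and that a `main` branch is available locally for the fallback.

```shell
#!/usr/bin/env bash
# Hypothetical helper: try to regenerate the supergraph, otherwise fall back
# to the copy committed on main.
refresh_supergraph() {
  local generated="docs/manager/graphql-reference/supergraph.graphql"
  if ./scripts/generate-graphql-schema.sh; then
    cp "$generated" ./supergraph.graphql
    echo "supergraph regenerated from current code"
  else
    echo "generation failed; falling back to the supergraph on main" >&2
    # Write to a temporary file first so a failed git show does not
    # leave an empty supergraph.graphql behind.
    git show "main:$generated" > ./supergraph.graphql.tmp \
      && mv ./supergraph.graphql.tmp ./supergraph.graphql
  fi
}
```

After it succeeds, the gateway still needs the restart shown earlier (`docker compose -f docker-compose.halfstack.current.yml restart backendai-half-apollo-router`).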
PGCONTAINER=$(docker compose -f docker-compose.halfstack.current.yml ps -q backendai-half-db)
# Interactive psql
docker exec -it -e PGPASSWORD=develove $PGCONTAINER psql -U postgres -d backend
# Non-interactive query
docker exec -e PGPASSWORD=develove $PGCONTAINER psql -U postgres -d backend -c "SELECT version();"
# Check databases
docker exec -e PGPASSWORD=develove $PGCONTAINER psql -U postgres -tc "SELECT datname FROM pg_database;"
# Check alembic migration version (manager)
docker exec -e PGPASSWORD=develove $PGCONTAINER psql -U postgres -d backend -c "SELECT * FROM alembic_version;"
# Check alembic migration version (appproxy)
docker exec -e PGPASSWORD=develove $PGCONTAINER psql -U postgres -d appproxy -c "SELECT * FROM alembic_version;"
# List tables
docker exec -e PGPASSWORD=develove $PGCONTAINER psql -U postgres -d backend -c "\dt"
Common fix — appproxy DB missing:
docker exec -e PGPASSWORD=develove $PGCONTAINER psql -U postgres -c "CREATE DATABASE appproxy;"
docker exec -e PGPASSWORD=develove $PGCONTAINER psql -U postgres -c "CREATE ROLE appproxy WITH LOGIN PASSWORD 'develove';"
docker exec -e PGPASSWORD=develove $PGCONTAINER psql -U postgres -d appproxy -c "GRANT ALL ON SCHEMA public TO appproxy;"
./py -m alembic -c alembic-appproxy.ini upgrade head
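The fix above fails partway if the database or role already exists. A guarded variant might look like the following sketch; `pg_exec` and `ensure_appproxy_db` are hypothetical helper names, and `PGCONTAINER` is assumed to be set as shown earlier in this section.

```shell
#!/usr/bin/env bash
# Hypothetical helpers. PGCONTAINER must already hold the DB container ID.
pg_exec() {
  docker exec -e PGPASSWORD=develove "$PGCONTAINER" psql -U postgres "$@"
}

# Create the appproxy database and role only when they are missing.
ensure_appproxy_db() {
  # -t (tuples only) and -A (unaligned) make psql print a bare "1" on a match.
  if [ "$(pg_exec -tAc "SELECT 1 FROM pg_database WHERE datname = 'appproxy';")" = "1" ]; then
    echo "database appproxy already exists"
  else
    pg_exec -c "CREATE DATABASE appproxy;"
  fi
  if [ "$(pg_exec -tAc "SELECT 1 FROM pg_roles WHERE rolname = 'appproxy';")" = "1" ]; then
    echo "role appproxy already exists"
  else
    pg_exec -c "CREATE ROLE appproxy WITH LOGIN PASSWORD 'develove';"
  fi
  # Re-running GRANT is harmless, so it is not guarded.
  pg_exec -d appproxy -c "GRANT ALL ON SCHEMA public TO appproxy;"
}
```

The alembic upgrade (`./py -m alembic -c alembic-appproxy.ini upgrade head`) is left as a separate step because it runs on the host rather than in the container.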
REDIS_CONTAINER=$(docker compose -f docker-compose.halfstack.current.yml ps -q backendai-half-redis)
# Ping
docker exec $REDIS_CONTAINER redis-cli ping
# Info
docker exec $REDIS_CONTAINER redis-cli info server
docker exec $REDIS_CONTAINER redis-cli dbsize
# List keys (dev only)
docker exec $REDIS_CONTAINER redis-cli keys '*'
# Get/check specific key
docker exec $REDIS_CONTAINER redis-cli get <key>
docker exec $REDIS_CONTAINER redis-cli type <key>
# Flush all (destructive)
docker exec $REDIS_CONTAINER redis-cli flushall
ETCD_CONTAINER=$(docker compose -f docker-compose.halfstack.current.yml ps -q backendai-half-etcd)
# List all keys
docker exec $ETCD_CONTAINER etcdctl get --prefix "" --keys-only
# Get specific key
docker exec $ETCD_CONTAINER etcdctl get <key>
# Common key prefixes
docker exec $ETCD_CONTAINER etcdctl get --prefix "config/redis"
docker exec $ETCD_CONTAINER etcdctl get --prefix "volumes"
# Health check
docker exec $ETCD_CONTAINER etcdctl endpoint health
Or via Backend.AI CLI:
./backend.ai mgr etcd get --prefix ''
./backend.ai mgr etcd get config/redis/addr
./backend.ai mgr etcd put config/redis/addr "127.0.0.1:8111"
MINIO_CONTAINER=$(docker compose -f docker-compose.halfstack.current.yml ps -q backendai-half-minio)
# Health check
docker exec $MINIO_CONTAINER curl -sf http://localhost:9000/minio/health/live
# List buckets (set alias first)
docker exec $MINIO_CONTAINER mc alias set local http://localhost:9000 minioadmin minioadmin
docker exec $MINIO_CONTAINER mc ls local/
# Web console: http://127.0.0.1:9001 (minioadmin / minioadmin)
These config files live in the project root and are generated from configs/ templates.
| Config file | Source template | Key transformations |
|---|---|---|
| manager.toml | configs/manager/halfstack.toml | etcd/PG/manager port, ipc-base-path |
| alembic.ini | configs/manager/halfstack.alembic.ini | PG connection string |
| account-manager.toml | configs/account-manager/halfstack.toml | etcd/PG/service port, ipc-base-path |
| alembic-accountmgr.ini | configs/account-manager/halfstack.alembic.ini | PG connection string |
| agent.toml | configs/agent/halfstack.toml | etcd/RPC/watcher port, ipc/var/mount paths, accelerator plugins |
| storage-proxy.toml | configs/storage-proxy/halfstack.toml | etcd port, 2 secrets, volume config, MinIO creds |
| app-proxy-coordinator.toml | configs/app-proxy-coordinator/halfstack.toml | PG/Redis port, service port, 3 generated secrets |
| alembic-appproxy.ini | configs/app-proxy-coordinator/halfstack.alembic.ini | PG connection string |
| app-proxy-worker.toml | configs/app-proxy-worker/halfstack.toml | Redis port, service port, same 3 secrets as coordinator |
| webserver.conf | configs/webserver/halfstack.conf | Manager endpoint URL, Redis addr |
Values that must stay consistent across files:
- PostgreSQL port: manager.toml, alembic.ini, account-manager.toml, app-proxy-coordinator.toml, alembic-appproxy.ini
- Redis port: app-proxy-coordinator.toml, app-proxy-worker.toml, webserver.conf
- etcd port: manager.toml, agent.toml, storage-proxy.toml
- app-proxy-coordinator.toml and app-proxy-worker.toml must share identical api_secret, jwt_secret, permit_hash.secret
- The secret in the etcd volume config (dev.etcd.volumes.json) must match storage-proxy.toml's [api.manager] secret

When regenerating, read existing secret values from the current config file and reuse them.
Only generate new secrets (python -c 'import secrets; print(secrets.token_urlsafe(32))') when the config file doesn't exist at all.
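The reuse-before-generate rule can be sketched as below. `read_or_generate_secret` is a hypothetical helper, and it assumes the secret sits on a flat `key = "value"` line; a key nested under a TOML table (as permit_hash.secret may be) would need a real TOML parser instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reuse a secret from an existing config file and
# generate a fresh one only when the file does not exist at all.
read_or_generate_secret() {
  local file="$1" key="$2" value
  if [ -f "$file" ]; then
    # Assumes a flat line of the form: key = "value"
    value=$(sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*\"\([^\"]*\)\".*/\1/p" "$file" | head -n 1)
    if [ -z "$value" ]; then
      echo "error: $key not found in existing $file; not generating a new secret" >&2
      return 1
    fi
  else
    value=$(python3 -c 'import secrets; print(secrets.token_urlsafe(32))')
  fi
  printf '%s\n' "$value"
}
```

Calling it once, for example `api_secret=$(read_or_generate_secret app-proxy-coordinator.toml api_secret)`, and writing the result into both the coordinator and worker configs keeps the two files identical.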
Reference scripts/install-dev.sh lines 1016–1142 for the exact sed substitution patterns per component.
When halfstack issues are reported, follow this order:
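1. Confirm the compose file exists: ls -la docker-compose.halfstack.current.yml
2. Check service status: docker compose -f docker-compose.halfstack.current.yml ps
3. Read the logs of any failing service: docker compose ... logs <service>
4. Fix config mapping problems (a directory where a file should be): rm -rf <dir> → copy correct file → restart
5. Verify component configs: each .toml/.conf exists and its ports match compose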
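The first steps of this order lend themselves to a quick scripted pass. The following is only a sketch: `triage_halfstack` is a hypothetical name, it stops at the first hard failure, and it reports config mapping problems without fixing them.

```shell
#!/usr/bin/env bash
# Hypothetical triage helper, meant to be run from the project root.
triage_halfstack() {
  local compose="docker-compose.halfstack.current.yml" p
  # Nothing else can work without the runtime compose file.
  if [ ! -f "$compose" ]; then
    echo "$compose is missing; regenerate it from the template"
    return 1
  fi
  # Service status (requires a running Docker daemon).
  docker compose -f "$compose" ps || return 2
  # Report directories sitting where config files should be (read-only).
  for p in prometheus.yaml otel-collector-config.yaml loki-config.yaml \
           tempo-config.yaml supergraph.graphql gateway.config.ts; do
    if [ -d "$p" ]; then
      echo "$p is a directory but must be a file"
    fi
  done
  return 0
}
```

Logs and component config checks stay manual here, since which service or config to inspect depends on what the status output shows.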