GPU Cluster Operations Notes
Migrating GPUStack from v0.7.1 to v2.1.1
Upgrading without breaking the shared NFS model cache configuration #
This is a record of migrating a GPUStack environment that had been installed with the legacy installation script to the Docker-based v2 series.
We kept the configuration in which the management node doubles as the NFS server, migrated the 2 GPU worker nodes in stages,
and sorted out model cache sharing, worker reconnection, and backend/CUDA consistency checks at deployment time.
Ubuntu / Docker / NVIDIA GPU / NFS
Target: GPUStack v0.7.1 → v2.1.1
Assumptions for this article #
The production environment consists of 1 management node and 2 GPU workers, with the model cache
consolidated in /models and shared over NFS. Sensitive information such as actual hostnames,
private IPs, and worker tokens has been omitted or abstracted to a level suitable for publication.
The original configuration was simple: “the management node hosts both GPUStack Server and NFS, and the GPU worker nodes
reference a common /models”. The advantage is that model files are downloaded only once, which reduces disk usage and
network transfer when workers are added. We decided to keep this approach when migrating to the v2 series
and only reorganized operations to match the current deployment model.
Approach before and after upgrade #
Before #
Operating GPUStack deployed with the legacy installation script as a systemd service.
Transition #
Migrate the management node first to Docker-based v2.1.1, then gradually switch GPU workers.
After #
Management node is dedicated to Server + NFS, inference consolidated on 2 GPU workers. /models continues to be shared.
Management Node
GPUStack Server + NFS #
Runs the Docker-based GPUStack Server and holds the actual model cache files.
- Operate Server only
- Disable embedded worker
- Provide /models as an NFS export
GPU Worker A / B
GPUStack Worker #
Mount NFS-shared /models and start inference containers.
- Docker + NVIDIA runtime
- Unify --cache-dir /models
- Specify worker name and IP explicitly
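Because every worker depends on /models being the NFS share rather than an empty local directory, it can help to check the mount table before starting any container. The helper below is only a sketch: the function name is_nfs_mount is made up for illustration, and it assumes a Linux host that exposes /proc/mounts.

```shell
#!/usr/bin/env bash
# Succeed (exit 0) if mount_point is listed as an NFS mount in the mounts table.
# mounts_file defaults to /proc/mounts; each line is: device mountpoint fstype options ...
is_nfs_mount() {
  local mount_point="$1"
  local mounts_file="${2:-/proc/mounts}"
  awk -v mp="$mount_point" '
    $2 == mp && ($3 == "nfs" || $3 == "nfs4") { found = 1 }
    END { exit !found }
  ' "$mounts_file"
}
```

On a worker, running `is_nfs_mount /models || echo "/models is not an NFS mount"` before `docker run` would flag a missing mount early.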
Key points to understand before upgrading to v2 series #
1. Migration assumes Docker #
Environments deployed with the legacy installation script or via pip need to
move to a Docker-based setup, which is the official migration path to the v2 series.
2. Management DB also changes #
The default database was SQLite in the v0.7 series and earlier; v2.0 and later use an embedded PostgreSQL instead.
In other words, this is not a simple binary swap but an operation that involves data migration.
3. Clarify embedded worker #
In the legacy configuration, the management node also appeared as a worker. With this migration,
we switched to a policy where the management node is Server-only
and only the GPU workers remain in the Workers list.
What paid off doing first #
Before touching the Server, we took a backup of the legacy data directory.
In the v2 series, the new components read the legacy data dir, so both the backup and the permission adjustment matter.
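A backup is only useful if the archive can be read back, so one option is to wrap the tar step in a small helper that lists the archive afterwards and records a checksum. This is a sketch under assumptions: backup_and_verify is a hypothetical name, and it relies on GNU tar and sha256sum being available.

```shell
#!/usr/bin/env bash
# Create a tar backup of a directory and confirm the archive is readable.
# Arguments: source directory, destination archive path.
backup_and_verify() {
  local src_dir="$1" archive="$2"
  mkdir -p "$(dirname "$archive")" || return 1
  # -p preserves permissions; -C stores paths relative to the parent directory.
  tar -C "$(dirname "$src_dir")" -cpf "$archive" "$(basename "$src_dir")" || return 1
  # Listing the archive makes tar read it back from start to end.
  tar -tf "$archive" > /dev/null || return 1
  # Keep a checksum next to the archive so later copies can be compared.
  sha256sum "$archive" > "$archive.sha256"
}
```

From a root shell, with the service already stopped, this could be called as `backup_and_verify /var/lib/gpustack /root/backup/gpustack-server.tar`.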
Upgrade procedure implemented #
Identify legacy data dir and back up first #
We checked the legacy systemd unit to determine which data dirs the Server and the Worker each use.
Then we stopped the service, took a tar backup, and adjusted permissions, in that order.
sudo systemctl cat gpustack | sed -n '/ExecStart/p'
sudo systemctl stop gpustack
sudo systemctl disable gpustack
sudo mkdir -p /root/backup
sudo tar -C / -cpf /root/backup/gpustack-server.tar var/lib/gpustack
Migrate the management node to v2.1.1 first #
Since the management node was assumed not to use a GPU, we omitted --runtime nvidia from the Server startup command.
If the flag is left in place, startup fails with “unknown or invalid runtime name: nvidia”
on servers where the NVIDIA runtime is not registered in Docker.
sudo docker run -d --name gpustack \
--restart=unless-stopped \
--privileged \
--network=host \
--env GPUSTACK_DATA_MIGRATION=true \
--volume /var/run/docker.sock:/var/run/docker.sock \
--volume /var/lib/gpustack:/var/lib/gpustack \
--volume /models:/models \
gpustack/gpustack:v2.1.1 \
--disable-worker \
--cache-dir /models
We explicitly specified --disable-worker because we wanted to cleanly separate the management node's role
from its old embedded worker. The theme of this migration was “the management node focuses on management”.
Gradually switch GPU workers to Docker-based worker #
The workers kept their existing data dir and were reconnected as Docker-based workers.
To keep the shared cache path identical, we bind-mounted /models:/models into the container as well
and appended --cache-dir /models to the startup arguments.
sudo docker run -d --name gpustack-worker \
--restart=unless-stopped \
--privileged \
--network=host \
--volume /var/lib/gpustack-data:/var/lib/gpustack \
--volume /var/run/docker.sock:/var/run/docker.sock \
--volume /models:/models \
--runtime nvidia \
gpustack/gpustack:v2.1.1 \
--server-url http://<SERVER_IP> \
--token <JOIN_TOKEN> \
--worker-name <WORKER_NAME> \
--worker-ip <WORKER_IP> \
--cache-dir /models
The key point is not to treat the GUI “Add Worker” as a mere “registration task”.
In reality, the GUI is just an entry point for obtaining the token and the command to run;
a worker becomes Ready only after that command has actually been executed on the worker node.
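Since the worker container is started with --restart=unless-stopped, a failing worker can sit in a restart loop without anyone noticing. A quick node-side check is to look at the container state that Docker reports. The sketch below uses a hypothetical helper, classify_container_state, and only interprets Docker's view of the container; the Workers list in GPUStack remains the reference for Ready status.

```shell
#!/usr/bin/env bash
# Classify the two fields printed by:
#   docker inspect --format '{{.State.Status}} {{.RestartCount}}' gpustack-worker
# Prints "ok", "unstable", or "down".
classify_container_state() {
  local status="$1" restarts="${2:-0}"
  if [ "$status" = "running" ] && [ "$restarts" -eq 0 ]; then
    echo "ok"          # running and never restarted by the restart policy
  elif [ "$status" = "running" ] || [ "$status" = "restarting" ]; then
    echo "unstable"    # alive, but Docker has had to restart it
  else
    echo "down"        # exited, created, dead, ...
  fi
}
```

On a worker node this could be called as `classify_container_state $(sudo docker inspect --format '{{.State.Status}} {{.RestartCount}}' gpustack-worker)`, followed by `sudo docker logs gpustack-worker` if the result is not "ok".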
Verify NVIDIA Container Toolkit #
GPU workers require --runtime nvidia. If Docker does not recognize the runtime,
the NVIDIA Container Toolkit configuration has probably not been applied. Follow the official procedure
to configure the Docker runtime with nvidia-ctk, then restart Docker.
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Clean up the Workers list to “only nodes that actually perform inference” #
In the legacy environment, the management node remained registered as a worker and showed up as Not Ready in the list, which caused confusion.
Once the 2 GPU workers became Ready, we removed the stale worker entries left over from the management node.
This makes each node's role obvious to the operations team as well.
Pain points we hit this time #
Management node: unknown or invalid runtime name: nvidia #
We were starting the container with --runtime nvidia even though the Server node had no GPU
or had no NVIDIA runtime installed. For a Server-only management node,
simply removing the flag was the better choice.
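This error can be anticipated before running `docker run` by asking Docker which runtimes it has registered. The sketch below assumes the JSON printed by `docker info --format '{{json .Runtimes}}'` is piped in on stdin; the function name runtime_registered is made up for illustration.

```shell
#!/usr/bin/env bash
# Read the JSON object printed by
#   docker info --format '{{json .Runtimes}}'
# on stdin and succeed only if a runtime with the given name appears as a key.
runtime_registered() {
  local name="$1"
  # -F treats the pattern as a fixed string, so dots in runtime names are literal.
  grep -qF "\"${name}\":"
}
```

For example, `sudo docker info --format '{{json .Runtimes}}' | runtime_registered nvidia` returns non-zero on a node where --runtime nvidia would fail, which is the cue either to drop the flag (Server-only node) or to configure the NVIDIA Container Toolkit (GPU worker).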
Worker is Ready but model is Pending #
Worker status and whether a model can be deployed are separate questions. Immediately after a v2 migration in particular,
a deployment can stay Pending because of a carried-over backend version or a CUDA generation mismatch.
The migration is not finished when a worker becomes Ready; it includes deploying 1 model to verify.
Lessons from deployment #
This time the migration also included OS updates on the worker side, which brought the CUDA-related packages to the 12.9 generation.
In that state, we had to review the built-in backend version and the deployment settings on the GPUStack side.
After migration, it is safest to check Inference Backends together with the backend/version specified
in each Deployment.
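To know which CUDA generation a worker ended up on after OS updates, one option is to read it from the nvidia-smi banner. The two helpers below are a sketch with hypothetical names. Note that the "CUDA Version" shown by nvidia-smi is the highest version the installed driver supports, not necessarily the toolkit version inside a backend image, so treat it as an upper bound when choosing a backend version.

```shell
#!/usr/bin/env bash
# Extract the "CUDA Version" value from nvidia-smi's banner, read on stdin.
# Prints nothing if no such field is present.
cuda_version_from_smi() {
  sed -n 's/.*CUDA Version: \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Print the major part of a dotted version string, e.g. 12.9 -> 12.
cuda_major() {
  printf '%s\n' "${1%%.*}"
}
```

On a worker, `nvidia-smi | cuda_version_from_smi` gives the value to compare against the backend version chosen in the Deployment.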
Checklist #
- Understand legacy environment data dirs and systemd definitions first
- Back up Server before starting migration
- Bring up the management node first, GPU workers last
- If using an NFS shared cache, unify /models across host / container / worker
- If a node is Server-only, disable the embedded worker and simplify the Workers list
- After worker connection, always deploy 1 model to verify backend/CUDA consistency
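The path-unification item can be spot-checked from Docker's own view of a running container. The sketch below assumes the mount list is produced with `docker inspect` and piped in on stdin; same_path_bind is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Read lines of "source destination" on stdin, as printed by:
#   docker inspect --format '{{range .Mounts}}{{println .Source .Destination}}{{end}}' <container>
# Succeed only if the given path is mounted at the same path inside the container.
same_path_bind() {
  local path="$1"
  # -x matches the whole line, -F treats the pattern as a fixed string.
  grep -qxF "${path} ${path}"
}
```

For example, `sudo docker inspect --format '{{range .Mounts}}{{println .Source .Destination}}{{end}}' gpustack-worker | same_path_bind /models` fails if /models was bound from a different host directory.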
Summary #
What made the biggest difference in this upgrade was not the version bump itself
but “clarifying node roles and making what appears on screen easier to understand”.
With a clear separation (the management node is Server + NFS, inference runs only on the GPU workers),
subsequent operational decisions became much easier.
If you are likewise on the v0.7 series or earlier and want to move to the v2 series while keeping a shared cache configuration,
we strongly recommend 3 points: Server first, Workers next, and full verification through deployment.
Do not stop at “it started”; confirming that a model actually starts makes the migration far more reliable.
Official references used #
- GPUStack Migration Guide
- GPUStack CLI Reference: gpustack start
- GPUStack Release Notes
- NVIDIA Container Toolkit Install Guide
This article is a practical memo based on a specific internal environment. When applying it to real operations,
always re-verify the combination of GPUs in use, drivers, CUDA generation, and backend version against the latest official information.