Building a GPUStack cluster across Bestnet Cloud and GPU-VPS over a 10G private network #

This guide separates the roles: Bestnet Cloud serves as the control plane and GPU-VPS nodes serve as inference workers,
with model files centralized in a shared NFS cache.
It walks through the whole process as a how-to, from client portal preparation
through implementation on Ubuntu to operational verification.

Based on an initial build we actually performed on Ubuntu 24 with the GPUStack 0.6.x series.

10G private link
1 model download
2+ GPU workers
1 shared model storage location

Scope

Scope of this article #

This article covers an initial build pattern: launch AI-SRV on Bestnet Cloud to host both GPUStack Server and NFS,
then deploy multiple GPU workers on the GPU-VPS side.
Because the main themes are role separation and shared cache design, public IP addresses and internal naming conventions have been replaced with example values.

The core of the design is to place the control plane, which does not need a GPU, and the model storage on the general-purpose cloud side,
and to treat GPU-VPS nodes purely as inference execution nodes.
That way, adding workers does not require duplicate storage investment,
and the model files themselves are managed only in the shared area on the Bestnet Cloud side.

VRRP / Keepalived not used
Model registration via GUI only
GPUStack Server + NFS integrated into AI-SRV
All workers share NFS mount point /models

Note: This article is the “Initial Build Edition”. Migration to GPUStack 2.x is covered in a separate article;
here we focus on the build pattern for the 0.6.x series.

Knowledge Mapping

Mapping Bestnet Cloud knowledge base articles to the build sequence #

The Bestnet Cloud knowledge base breaks client portal operations down into detailed individual tasks.
For a GPU cluster build, deciding beforehand which article to use at which stage reduces rework.
This article arranges the portal-side preparations in the following order.

1. Portal Protection: decide the approach to MFA and allowed IPs first
2. SSH Key Registration: set up management keys before OS deployment
3. Template Creation: prepare AI-SRV and GPU-VPS on Ubuntu 24
4. Private Network: connect all nodes to the 10G segment
5. Firewall / NAT: narrow down the public scope and communication direction
6. Snapshot: keep a restore point at the initial state

Portal

Client Portal Security Settings #

Securing the management interface itself first ensures that subsequent server creation and permission management are secure.

Access

SSH Key Registration and Management #

Rather than manually inserting keys after server creation, determine key operations before template deployment.

Provision

Server Creation from Template #

Standardizing AI-SRV and GPU workers on the same base OS prevents variation in subsequent command procedures.

Network

Private Network #

By predetermining internal routes for NFS and cluster communication, you can minimize public ports.

Protect

Firewall Configuration Changes #

Allow only the AI-SRV UI, NFS, and GPU worker SSH, and expose no other public surface.

Snapshot Management #

Creating restore points after OS initialization, after GPU driver installation, and before app installation makes testing easier.

Scale

Cloud Resource Upgrade #

When capacity becomes insufficient, expanding the shared model area on the AI-SRV side first provides better cost efficiency.

Virtual Router NAT #

If you want GPU workers to be private-only, you can also choose a configuration where only their outbound traffic goes through NAT.

Architecture

Architecture approach #

The roles are simple. AI-SRV on the Bestnet Cloud side handles the Web UI, API, and NFS sharing,
while nodes on the GPU-VPS side run only the GPUStack Worker and inference.
Keeping internal communication inside the 10G private network
moves model distribution, NFS, and cluster connections all to the private side.

BESTNET-CLOUD

control-01 #

AI-SRV integrating GPUStack Server + NFS

Provides Web UI / API
Export /srv/gpustack_models via NFS
Bind mount locally to /models
Consolidate actual model storage in one location

⇄

10G Private Network
NFS / UI / Worker Join / internal communication

⇄

GPU-VPS

gpu-worker-01 / 02 #

Worker group sharing /models, handling inference only

Launch GPUStack Worker
Standardize on --cache-dir /models
No worker holds large model storage
To scale out, add nodes using the same pattern

Deploy model via GUI → First worker fetches the model only once → Save to shared NFS → Other workers read from the same /models

Concentrate storage on AI-SRV side #

Capacity planning centers on AI-SRV. Size the shared area generously, accounting for the models themselves, temporary download files, and future model replacements.

Keep GPU-VPS lean #

Design the GPU worker side focusing on OS, GPU driver, GPUStack Worker, and minimal log area, without duplicating the model itself.

Concentrate communication on private side #

When UI access, worker join, NFS, and model reuse all happen on the private network, the publicly exposed surface stays small.

Design Worksheet

Items to decide beforehand #

Role	Location	Hostname (example)	Private IP (example)	Main responsibilities
Server + NFS	Bestnet Cloud	control-01	10.10.0.10	GPUStack UI / API, NFS, model storage
GPU Worker #1	GPU-VPS	gpu-worker-01	10.10.0.21	Inference execution, shared `/models` usage
GPU Worker #2	GPU-VPS	gpu-worker-02	10.10.0.22	Inference execution, shared `/models` usage

Example values for public distribution. Adjust to your own naming rules and private subnet in production.

Design checkpoints #

Which nodes to give public IPs
Whether to make workers private-only
Whether to use a NAT router or give each worker its own outbound path
Initial capacity and expansion plan for shared model area

Storage estimation approach #

Total size of active models
Allowance for old and new models to temporarily coexist during switching
Temporary file space like .part
Logs, operational files, and future worker addition buffer
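
The items above can be turned into a rough sizing calculation. The following is a minimal sketch, not a sizing rule from our build: the helper name `estimate_models_gib` is hypothetical, and the formula assumes that one extra copy of the largest model covers the old/new coexistence window, that a second copy covers an in-flight `.part` file, and that a percentage buffer covers logs, operational files, and future growth.

```shell
#!/bin/sh
# Rough capacity estimate for the shared model area, in GiB.
# The inputs and the formula itself are illustrative assumptions.
#   $1: total size of active models
#   $2: size of the largest single model
#   $3: buffer in percent for logs, operational files, and future growth
estimate_models_gib() {
  active=$1
  largest=$2
  buffer_pct=$3
  # One extra copy of the largest model for old/new coexistence during a
  # switch, plus one more for a partially downloaded (.part) file.
  base=$(( active + 2 * largest ))
  echo $(( base * (100 + buffer_pct) / 100 ))
}

# Example: 120 GiB of active models, largest model 40 GiB, 20% buffer.
estimate_models_gib 120 40 20
```

With these example inputs the sketch prints 240, meaning roughly 240 GiB for the shared area. Treat the result as a starting point and adjust the assumptions to your own model lineup.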

Phase 1

Solidify client portal preparations first #

Protect the client portal itself #

Before server creation, determine the management protection policy. Especially for team operations,
clarifying MFA and allowed IP handling first prevents later confusion about “who can access the management interface from where”.

Recommendation: Enable 2-factor authentication on the management portal and, if possible, restrict management source IPs.

Register SSH keys before creating servers #

Rather than inserting keys individually after AI-SRV and GPU worker creation, preparing keys beforehand in the client portal SSH key management
allows immediate management access after template deployment.

Organize team operational public keys
Keep shared emergency keys separate from individual keys
Don’t post key fingerprints in published articles

Create AI-SRV on Bestnet Cloud from template #

Create AI-SRV from an Ubuntu 24 series template; it hosts both GPUStack Server and NFS.
Since this node also serves as model storage, provision a sufficiently large storage area separate from the OS disk.

Role is GPUStack Server + NFS
Connect to private network
Consider public side only if UI exposure is necessary
Allocate more capacity for model storage on AI-SRV than on workers

Create workers on GPU-VPS in necessary quantities #

Since workers are for inference, select GPU-VPS plans based on GPU type, VRAM, and future model size.
On the other hand, since the model itself is kept on the NFS side, you can easily reduce worker storage to the minimum needed for OS and execution.

Standardize on Ubuntu 24 series
Create with GPU driver / CUDA available
Connect to private network
Consider private-only configuration if no public IP is needed

Place all nodes on 10G private network #

NFS and GPUStack internal communication flow over the 10G private link connecting Bestnet Cloud and GPU-VPS.
This allows model retrieval and worker join to remain on the private side.

Place AI-SRV and all workers in the same private segment
Assign fixed IPs to stabilize subsequent NFS export and systemd settings
If workers need outbound communication, choose NAT router method or temporary public egress

If you want workers to be private-only, using Bestnet Cloud’s virtual router / NAT pattern only for OS updates and external fetches is easier to manage.

Determine initial firewall rules #

A policy of “allow only minimum necessary inbound to AI-SRV, and give workers almost no public inbound” is sufficient.
In the portal firewall screen, first check existing rules, then add only necessary ports.

Target	Port / Protocol	Source consideration	Purpose
AI-SRV	`22/tcp`	Restrict to management source IPs only	SSH
AI-SRV	`80/tcp`	Management network or private side	GPUStack Web UI / API
AI-SRV	`2049/tcp`	Private subnet only	NFS v4
AI-SRV	`111/tcp,111/udp`	Private subnet only when needed	For configurations using rpcbind
GPU Worker	`22/tcp`	Restrict to management source IPs only	SSH

Assuming NFS v4, traffic can be consolidated on 2049/tcp, though some environments also use rpcbind. Always restrict the allowed source to the private subnet.
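
The portal firewall is the primary control in this build. If you also want a host-level filter on AI-SRV as a second layer, the AI-SRV rows of the table could be expressed with ufw roughly as follows. This is a sketch under assumptions: ufw is not part of the procedure we performed, `print_ai_srv_ufw_rules` is a hypothetical helper, 203.0.113.0/24 is a documentation placeholder for your management source range, and port 80 is opened to the private side only. The function only prints commands so they can be reviewed before anything is run.

```shell
#!/bin/sh
# Print (not execute) ufw commands mirroring the AI-SRV rows of the table.
#   $1: management source CIDR (placeholder value in the example below)
#   $2: private subnet CIDR
print_ai_srv_ufw_rules() {
  mgmt_cidr=$1
  private_cidr=$2
  echo "ufw default deny incoming"
  echo "ufw allow from $mgmt_cidr to any port 22 proto tcp"    # SSH
  echo "ufw allow from $private_cidr to any port 80 proto tcp"  # Web UI / API
  echo "ufw allow from $private_cidr to any port 2049 proto tcp" # NFS v4
}

print_ai_srv_ufw_rules 203.0.113.0/24 10.10.0.0/24
```

Review the printed lines, run them with sudo, and confirm the SSH rule is in place before enabling ufw so that the management session is not cut off.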

Take initial snapshot #

After OS initial creation, key injection, private network connection, and firewall setup are complete,
take snapshots of AI-SRV and GPU workers to make it easier to restore if middleware installation fails later.

Phase 2

Create OS baseline #

On both the server and the workers, first align time synchronization and the NFS base packages. Because the GPU workers will have the GPU driver / CUDA installed afterward,
completing OS updates and any required reboots beforehand keeps the later steps stable.

# All nodes
sudo apt update
sudo apt install -y nfs-common

# For NFS server (AI-SRV only)
sudo apt install -y nfs-kernel-server

Items to standardize on AI-SRV #

Time synchronization
NFS server
Model storage mount design
Initial GPUStack Server deployment

Items to standardize on GPU workers #

Time synchronization
NFS client
GPU driver / CUDA
GPUStack Worker and /models mount

Phase 3

Build GPUStack Server + NFS on AI-SRV #

Install GPUStack Server #

In the initial build example, we deploy GPUStack Server on AI-SRV so that the management UI can be logged in to.
After noting the initial admin password, store it in a secure vault; never include it in published material.

curl -sfL https://get.gpustack.ai | sh -s -
cat /var/lib/gpustack/initial_admin_password

# Web UI (example)
# http://10.10.0.10/

Create NFS shared directory #

Place the actual model files in /srv/gpustack_models.
Export this to the private subnet only, and also bind mount it as /models on AI-SRV itself.
This keeps the path seen by the server identical to the path seen by the workers.

sudo mkdir -p /srv/gpustack_models

echo "/srv/gpustack_models 10.10.0.0/24(rw,sync,no_subtree_check,no_root_squash)" | sudo tee -a /etc/exports

sudo exportfs -ra

# Bind mount the same directory on the server itself
sudo mkdir -p /models
sudo mount --bind /srv/gpustack_models /models
echo "/srv/gpustack_models /models none bind 0 0" | sudo tee -a /etc/fstab

Key point: Always limit export scope to private subnet. It is critical not to open NFS to the public side.
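
To guard against an export that is accidentally wider than intended, the export line can be checked mechanically. The following is a minimal sketch, assuming the single-client export format used above; `export_is_restricted` is a hypothetical helper, not a tool shipped with NFS or GPUStack.

```shell
#!/bin/sh
# Return 0 only if the exports file has a line for the directory and every
# client on that line is exactly the expected CIDR.
#   $1: exports file, $2: exported directory, $3: expected client CIDR
export_is_restricted() {
  awk -v dir="$2" -v cidr="$3" '
    $1 == dir {
      found = 1
      for (i = 2; i <= NF; i++) {
        client = $i
        sub(/\(.*$/, "", client)   # strip the "(options)" suffix
        if (client != cidr) bad = 1
      }
    }
    END { exit ((found && !bad) ? 0 : 1) }
  ' "$1"
}

# Usage on AI-SRV:
#   export_is_restricted /etc/exports /srv/gpustack_models 10.10.0.0/24 \
#     && echo "export is limited to the private subnet"
#   sudo exportfs -v   # shows what the kernel is actually exporting
```

A wildcard client such as `*` or an extra subnet on the same line makes the check fail, which is the case worth catching before opening the firewall.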

Issue worker join token #

Create a worker join token from the GPUStack GUI. In published material, write it as <WORKER_JOIN_TOKEN>
and never post the actual value. Limiting GUI login sources to the private network side is also more secure.

Phase 4

Mount shared cache on GPU-VPS side and have workers join #

First mount `/models` via NFS #

Each worker mounts the shared area exported by AI-SRV at the same path /models.
This makes the cache path for all workers consistent, preventing duplicate model storage.

sudo mkdir -p /models
echo "10.10.0.10:/srv/gpustack_models /models nfs defaults 0 0" | sudo tee -a /etc/fstab
sudo mount -a

After mounting, verify with mount | grep /models that the intended shared location is visible.
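
The same check can be scripted so that it also confirms the source and filesystem type, not just that something is mounted at the path. This is a sketch, assuming `findmnt` from util-linux is available (it is on standard Ubuntu installs); `is_expected_nfs_mount` is a hypothetical helper name.

```shell
#!/bin/sh
# Return 0 if the mount point is an NFS mount of the expected source.
#   $1: expected source, e.g. 10.10.0.10:/srv/gpustack_models
#   $2: mount point (defaults to /models)
is_expected_nfs_mount() {
  expected_src=$1
  mount_point=${2:-/models}
  # -r: raw, single-space-separated output; -n: no header line
  actual=$(findmnt -rn -o SOURCE,FSTYPE "$mount_point" 2>/dev/null) || return 1
  case "$actual" in
    "$expected_src nfs"*) return 0 ;;   # matches both "nfs" and "nfs4"
    *) return 1 ;;
  esac
}

# Usage on each worker:
#   is_expected_nfs_mount 10.10.0.10:/srv/gpustack_models /models \
#     && echo "/models is the shared cache" \
#     || echo "/models is NOT the expected NFS mount"
```

Running this before starting the worker avoids the failure mode where /models is just an empty local directory and each worker silently downloads its own copy.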

Install GPUStack Worker #

Use the token issued on the AI-SRV side to have each GPU-VPS join the cluster as a worker.

curl -sfL https://get.gpustack.ai | sh -s - \
  --server-url http://10.10.0.10 \
  --token <WORKER_JOIN_TOKEN>

Pin `--cache-dir /models` in the worker service #

In a shared cache configuration, this setting is the most important one. Add
--cache-dir /models to the GPUStack Worker systemd definition and also pin the data-dir.

[Service]
ExecStart=/root/.local/bin/gpustack start \
  --server-url http://10.10.0.10 \
  --token <WORKER_JOIN_TOKEN> \
  --cache-dir /models \
  --data-dir /var/lib/gpustack-data

sudo systemctl daemon-reload
sudo systemctl enable gpustack
# Restart so that an already-running worker picks up the new ExecStart
sudo systemctl restart gpustack

# If "Using cache dir: /models" appears, OK
journalctl -u gpustack -f

Verify workers are Ready in GPUStack GUI #

In the GUI Workers screen, verify all GPU-VPS nodes are Ready.
If they remain Not Ready, troubleshoot in order: GPU driver / CUDA, NFS mount, worker logs.
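
The troubleshooting order above can be sketched as a small helper that runs checks in sequence and reports the first one that fails. The helper name `first_failing_check` and the specific check commands in the usage comment are assumptions for illustration; substitute whatever checks fit your environment.

```shell
#!/bin/sh
# Run "name" "command" pairs in order. Print the name of the first failing
# check and return 1, or print "all-ok" and return 0 if every check passes.
first_failing_check() {
  while [ "$#" -ge 2 ]; do
    name=$1
    cmd=$2
    shift 2
    if ! sh -c "$cmd" >/dev/null 2>&1; then
      echo "$name"
      return 1
    fi
  done
  echo "all-ok"
}

# Usage on a worker that stays Not Ready (same order as in the text):
#   first_failing_check \
#     "gpu-driver"     "nvidia-smi" \
#     "nfs-mount"      "findmnt -t nfs,nfs4 /models" \
#     "worker-service" "systemctl is-active --quiet gpustack"
# If "worker-service" is reported, or everything passes but the node is
# still Not Ready, read the logs: journalctl -u gpustack -n 100
```

Stopping at the first failure keeps the diagnosis focused: a missing NFS mount, for example, is worth fixing before spending time in the worker logs.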

Phase 5

Model deployment and operational verification #

The completion criterion is more than workers becoming Ready.
Deploy a model from the GUI with 2 or more replicas, then verify that only the first node downloads it
and that the second and later nodes reuse the same shared file.
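
One way to spot-check this from a shell is to watch for in-flight download fragments in the shared cache while the replicas start. This is a sketch that assumes temporary downloads use a `.part` suffix, as mentioned in the storage estimation notes; `list_partial_downloads` is a hypothetical helper name.

```shell
#!/bin/sh
# List in-flight download fragments (*.part) under the shared cache.
#   $1: cache directory (defaults to /models)
list_partial_downloads() {
  cache_dir=${1:-/models}
  find "$cache_dir" -type f -name '*.part' 2>/dev/null
}

# Usage on any node while the replicas are starting:
#   list_partial_downloads /models   # fragments appear during the first fetch
#   du -sh /models                   # total should not grow again for replica 2
```

If fragments for the same model appear a second time, or the total size of /models keeps growing when the second replica starts, the workers are probably not sharing the same cache path.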

Deploy from Catalog #

Set replicas to 2 or more so that multiple workers are used.

First worker fetches #

If the model is not yet saved in /models, the first assigned worker fetches it.

2nd node reads shared file #

If the same model exists in the shared cache, it is loaded without re-downloading.
