Building a GPUStack Cluster: Connecting Bestnet Cloud × GPU-VPS over a 10G Private Network


7 min read

How-to
BESTNET-CLOUD
GPU-VPS
GPUStack
10G Private Network
NFS Shared Cache


This guide separates the roles: Bestnet Cloud hosts the control plane, GPU-VPS nodes serve as inference workers,
and a shared NFS cache keeps all model files in one place.
It walks through client portal preparation, the build on Ubuntu, and operational verification as a how-to.

Reference build: an initial build we actually performed on Ubuntu 24 with the GPUStack 0.6.x series
  • 10G private link
  • Model downloaded only once
  • 2+ GPU workers
  • 1 shared model storage location
Scope

Scope of this article

This article covers the initial build pattern: launch AI-SRV on Bestnet Cloud to host both GPUStack Server and NFS,
then deploy multiple GPU workers on the GPU-VPS side.
Because the focus is role separation and shared cache design, public IP addresses and internal naming conventions have been replaced with example values.

The core of the design is to put the control plane, which needs no GPU, and the model storage on the general-purpose cloud side,
and to treat GPU-VPS purely as inference nodes.
Adding workers then requires no duplicate storage investment,
and the model files themselves are kept only in the shared area on the Bestnet Cloud side.

  • No VRRP / Keepalived
  • Models are registered through the GUI only
  • GPUStack Server and NFS are consolidated on AI-SRV
  • All workers share the NFS mount point /models
Note: This article is the initial build edition. Migration to GPUStack 2.x is covered in a separate article;
here we lay out the build pattern for the 0.6.x series for clarity.
Knowledge Mapping

Mapping Bestnet Cloud knowledge to the actual build sequence

The Bestnet Cloud knowledge base breaks client portal tasks down in detail.
When building a GPU cluster, deciding beforehand which article to apply at which step reduces rework.
This article orders the portal-side preparations as follows.

1. Portal protection: decide the MFA and allowed-IP policy first
2. SSH key registration: set up management keys before OS deployment
3. Template creation: prepare AI-SRV and GPU-VPS on Ubuntu 24
4. Private network: connect all nodes to the 10G segment
5. Firewall / NAT: narrow down the public scope and communication direction
6. Snapshot: keep a restore point of the initial state

Portal

Client Portal Security Settings

Securing the management interface first protects everything that follows, from server creation to permission management.

Access

SSH Key Registration and Management

Decide how keys are managed before deploying templates, rather than inserting keys by hand after each server is created.

Provision

Server Creation from Template

Standardizing AI-SRV and the GPU workers on the same base OS keeps the later command procedures identical across nodes.

Network

Private Network

Planning internal routes for NFS and cluster traffic in advance keeps public ports to a minimum.

Protect

Firewall Configuration Changes

Allow only the AI-SRV UI, NFS on the private subnet, and management SSH, and expose no other public surface.

Snapshot Management

Creating restore points after OS initialization, after GPU driver installation, and before application installation makes it easy to roll back during testing.

Scale

Cloud Resource Upgrade

When capacity runs short, expanding the shared model area on AI-SRV first is the most cost-effective option, because the workers hold no model copies.

Virtual Router NAT

If you want GPU workers to be private-only, you can route only their outbound traffic through NAT.

Architecture

Architecture approach

The roles are simple. AI-SRV on Bestnet Cloud handles the Web UI, API, and NFS sharing,
while the GPU-VPS nodes run only the GPUStack Worker and inference.
Keeping internal traffic on the 10G private network
moves model distribution, NFS, and cluster connections entirely to the private side.

BESTNET-CLOUD

control-01

AI-SRV integrating GPUStack Server + NFS

  • Provides the Web UI / API
  • Exports /srv/gpustack_models via NFS
  • Bind mounts it locally to /models
  • Keeps the actual model files in one location

GPU-VPS

gpu-worker-01 / 02

Worker group that shares /models and handles inference only

  • Runs GPUStack Worker
  • Uses --cache-dir /models on every worker
  • Holds no large model storage of its own
  • Scales out by adding nodes with the same pattern

Model flow: deploy the model via the GUI → the first worker fetches it only once → it is saved to the shared NFS → the other workers read it from the same /models

Concentrate storage on the AI-SRV side

Capacity planning centers on AI-SRV. Size the shared area generously, allowing for the models themselves, temporary downloads, and future model replacements.

Keep GPU-VPS lean

Size the GPU workers for the OS, GPU driver, GPUStack Worker, and a minimal log area only; they never hold their own copy of the models.

Concentrate communication on the private side

When UI access, worker joins, NFS, and model reuse all happen on the private network, public exposure stays small.

Design Worksheet

Items to decide beforehand

Role | Location | Hostname (example) | Private IP (example) | Main responsibilities
Server + NFS | Bestnet Cloud | control-01 | 10.10.0.10 | GPUStack UI / API, NFS, model storage
GPU Worker #1 | GPU-VPS | gpu-worker-01 | 10.10.0.21 | Inference, shared /models usage
GPU Worker #2 | GPU-VPS | gpu-worker-02 | 10.10.0.22 | Inference, shared /models usage

These are example values for publication. In production, adjust them to your own naming rules and private subnet.
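To keep the fixed private IPs from the worksheet consistent on every node, one option is to register them in /etc/hosts. The following is a minimal sketch that assumes the example hostnames and IPs from the table above. print_hosts_entries is a hypothetical helper name, not part of Bestnet Cloud or GPUStack.

```shell
#!/bin/sh
# Hypothetical helper: print /etc/hosts entries for the example worksheet.
# The hostnames and private IPs are the article's example values only.
print_hosts_entries() {
    # printf reuses its format for each remaining pair of arguments
    printf '%s\t%s\n' \
        10.10.0.10 control-01 \
        10.10.0.21 gpu-worker-01 \
        10.10.0.22 gpu-worker-02
}

# Usage on each node (run once; re-running appends duplicate lines):
#   print_hosts_entries | sudo tee -a /etc/hosts
```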

Design checkpoints

  • Which nodes get public IPs
  • Whether workers are private-only
  • Whether outbound traffic goes through a NAT router or each node gets its own egress
  • Initial capacity and expansion plan for the shared model area

Storage estimation approach

  • Total size of the active models
  • Headroom for old and new models to coexist temporarily during a switch
  • Space for temporary files such as *.part
  • Logs, operational files, and a buffer for future worker additions
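The checklist above can be turned into a rough number. This is a minimal sizing sketch that assumes sizes in GB. It counts the largest model twice: once for old and new models coexisting during a switch, and once for in-progress *.part files. estimate_storage_gb and the buffer percentage are hypothetical, not a Bestnet Cloud or GPUStack rule.

```shell
#!/bin/sh
# Hypothetical sizing helper for the shared model area (all values in GB).
#   $1 = total size of models kept active
#   $2 = size of the largest single model
#   $3 = extra buffer in percent (logs, operational files, future workers)
# The largest model is counted twice: once for old/new coexistence during
# a switch, once for in-progress temporary files such as *.part.
estimate_storage_gb() {
    local active="$1" largest="$2" buffer_pct="$3"
    local base=$((active + 2 * largest))
    # Integer ceiling of base * (100 + buffer_pct) / 100
    echo $(( (base * (100 + buffer_pct) + 99) / 100 ))
}
```

For example, 120 GB of active models whose largest is 40 GB, with a 20% buffer, gives estimate_storage_gb 120 40 20 → 240 (GB). Counting the largest model twice is deliberately conservative, in line with sizing the shared area generously.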
Phase 1

Complete the client portal preparations first

01

Protect the client portal itself

Before creating any servers, decide how the management portal will be protected. Especially for team operations,
settling MFA and allowed-IP handling first prevents later confusion about who can access the management interface, and from where.

Recommendation: enable two-factor authentication on the management portal and, if possible, restrict management source IPs.
02

Register SSH keys before creating servers

Register keys beforehand under SSH key management in the client portal, rather than inserting them one by one after AI-SRV and the GPU workers are created.
This gives you management access as soon as the template is deployed.

  • Organize the team's operational public keys
  • Do not mix shared keys and individual keys, even in emergencies
  • Do not post key fingerprints in published articles
03

Create AI-SRV on Bestnet Cloud from a template

Create AI-SRV, which hosts both GPUStack Server and NFS, from an Ubuntu 24 series template.
Because this node also serves as model storage, give it ample storage separate from the OS disk.

  • Role: GPUStack Server + NFS
  • Connect it to the private network
  • Consider a public side only if the UI must be exposed
  • Allocate more model storage capacity to AI-SRV than to the workers
04

Create as many workers on GPU-VPS as you need

Because workers only run inference, choose GPU-VPS plans by GPU type, VRAM, and the model sizes you expect to run.
Since the models themselves live on NFS, worker storage can be kept to the minimum needed for the OS and runtime.

  • Standardize on the Ubuntu 24 series
  • Make sure the GPU driver / CUDA is available
  • Connect to the private network
  • Consider a private-only configuration if no public IP is needed
05

Place all nodes on the 10G private network

NFS and GPUStack internal traffic flow over the 10G private link connecting Bestnet Cloud and GPU-VPS,
so model retrieval and worker joins stay on the private side.

  • Put AI-SRV and all workers in the same private segment
  • Assign fixed IPs so that the NFS export and systemd settings that follow stay stable
  • If workers need outbound access, use a NAT router or temporary public egress

If you want the workers to be private-only, routing just OS updates and external downloads through Bestnet Cloud's virtual router / NAT pattern is easier to manage.
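If the portal does not already assign the fixed private IP, you can set it in netplan on Ubuntu 24. This is a minimal sketch that assumes the private NIC is named ens4 (hypothetical; check with ip -br link) and the /24 subnet from the worksheet. The helper name netplan_private_yaml and the file name 60-private.yaml are also hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print a netplan config giving the private NIC a fixed IP.
#   $1 = private NIC name (ens4 is an assumption; check with: ip -br link)
#   $2 = address with prefix length, e.g. 10.10.0.21/24
# No gateway or DNS is set: this NIC carries only private cluster traffic.
netplan_private_yaml() {
    cat <<EOF
network:
  version: 2
  ethernets:
    $1:
      dhcp4: false
      addresses:
        - $2
EOF
}

# Usage on gpu-worker-01 (netplan try rolls back unless you confirm):
#   netplan_private_yaml ens4 10.10.0.21/24 | sudo tee /etc/netplan/60-private.yaml
#   sudo chmod 600 /etc/netplan/60-private.yaml
#   sudo netplan try
```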

06

Set initial firewall rules

The policy is simple: allow only the minimum necessary inbound traffic to AI-SRV, and give workers almost no public inbound access.
In the portal firewall screen, check the existing rules first, then add only the ports you need.

Target | Port / Protocol | Allowed source | Purpose
AI-SRV | 22/tcp | Management source IPs only | SSH
AI-SRV | 80/tcp | Management network or private side | GPUStack Web UI / API
AI-SRV | 2049/tcp | Private subnet only | NFS v4
AI-SRV | 111/tcp, 111/udp | Private subnet only, when needed | rpcbind (for configurations that use it)
GPU Worker | 22/tcp | Management source IPs only | SSH

Assuming NFS v4, NFS traffic can be consolidated on 2049/tcp, though some environments still use rpcbind. Always restrict the NFS ports to the private subnet.
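The portal firewall stays the primary control. If you also want each host to enforce the same policy, the following minimal sketch prints equivalent ufw commands for review. Using ufw as the host firewall, the helper name print_ufw_rules, and the 203.0.113.0/24 management CIDR (a documentation placeholder) are all assumptions. The sketch allows the whole private subnet because GPUStack server-to-worker traffic uses ports beyond those in the table. Check your version's documentation before narrowing that rule.

```shell
#!/bin/sh
# Hypothetical helper: print host-level ufw commands that mirror the
# portal policy above. It only prints; review the output before running it.
#   $1 = role: ai-srv or worker
#   $2 = management source CIDR (203.0.113.0/24 is a documentation placeholder)
#   $3 = private subnet CIDR, e.g. 10.10.0.0/24
print_ufw_rules() {
    local role="$1" mgmt="$2" priv="$3"
    echo "ufw default deny incoming"
    echo "ufw default allow outgoing"
    # SSH only from management source IPs (all roles)
    echo "ufw allow from $mgmt to any port 22 proto tcp"
    # Whole private subnet: NFS (2049/tcp), UI/API, and GPUStack internal ports
    echo "ufw allow from $priv"
    if [ "$role" = "ai-srv" ]; then
        # Web UI / API from the management network as well
        echo "ufw allow from $mgmt to any port 80 proto tcp"
    fi
}

# Usage on AI-SRV (the SSH rule is applied before enabling, keeping your session):
#   print_ufw_rules ai-srv 203.0.113.0/24 10.10.0.0/24 | sudo sh -x
#   sudo ufw --force enable
```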

07

Take an initial snapshot

Once OS creation, key injection, private network connection, and firewall setup are complete,
snapshot AI-SRV and the GPU workers so you can restore them easily if a later middleware installation fails.

Phase 2

Build the OS baseline

On both the server and the workers, start by setting up time synchronization and the NFS packages. Because the GPU workers will get the GPU driver / CUDA next,
finishing OS updates and any required reboot beforehand keeps the system stable.

# All nodes: update the OS, install the NFS client, enable NTP sync
sudo apt update
sudo apt upgrade -y
sudo apt install -y nfs-common
sudo timedatectl set-ntp true

# NFS server (AI-SRV only)
sudo apt install -y nfs-kernel-server

# All nodes: reboot if the upgrade requires it (e.g. a new kernel)
[ -f /var/run/reboot-required ] && sudo reboot

Items to standardize on AI-SRV

  • Time synchronization
  • NFS server
  • Model storage mount design
  • Initial GPUStack Server deployment

Items to standardize on GPU workers

  • Time synchronization
  • NFS client
  • GPU driver / CUDA
  • GPUStack Worker and /models mount
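Part of the lists above can be checked mechanically. This is a minimal sketch covering the items shared by all nodes (time sync and the NFS client). It assumes Ubuntu's timedatectl and dpkg-query. baseline_check is a hypothetical helper, and GPU-specific checks are left out.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the OS baseline on any node.
# Prints one OK/NG line per item; returns non-zero if anything is missing.
baseline_check() {
    local rc=0
    # Time sync: NTPSynchronized=yes once timesyncd (or another NTP client) is in sync
    if [ "$(timedatectl show -p NTPSynchronized --value 2>/dev/null)" = "yes" ]; then
        echo "OK  time synchronized"
    else
        echo "NG  time not synchronized (try: sudo timedatectl set-ntp true)"
        rc=1
    fi
    # NFS client tools come from the nfs-common package
    if dpkg-query -W -f='${Status}' nfs-common 2>/dev/null | grep -q 'ok installed'; then
        echo "OK  nfs-common installed"
    else
        echo "NG  nfs-common missing (sudo apt install -y nfs-common)"
        rc=1
    fi
    return "$rc"
}

# Usage: run on every node after the baseline commands
#   baseline_check
```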
Phase 3

Build GPUStack Server + NFS on AI-SRV

08

Install GPUStack Server

In this initial build, we deploy GPUStack Server on AI-SRV so that you can log in to the management UI.
After retrieving the initial admin password, store it in a secure vault and never publish it.

# Install GPUStack Server with the installer script
curl -sfL https://get.gpustack.ai | sh -s -
# Initial admin password (read it as root)
sudo cat /var/lib/gpustack/initial_admin_password

# Web UI (example)
# http://10.10.0.10/
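The UI may take a short while to answer after installation. This is a minimal sketch that polls it with curl, assuming the example URL. wait_for_ui and its retry defaults are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: poll the GPUStack Web UI until it answers over HTTP.
#   $1 = URL (e.g. http://10.10.0.10/), $2 = attempts, $3 = seconds between tries
wait_for_ui() {
    local url="$1" tries="${2:-30}" delay="${3:-5}" i=1
    while [ "$i" -le "$tries" ]; do
        # -f fails on HTTP errors (4xx/5xx); -sS is silent except for errors
        if curl -fsS --max-time 5 -o /dev/null "$url"; then
            echo "UI reachable: $url"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "UI not reachable after $tries attempts; check: sudo journalctl -u gpustack" >&2
    return 1
}

# Usage on AI-SRV or a management host on the private side:
#   wait_for_ui http://10.10.0.10/ 30 5
```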
09

Create the NFS shared directory

Place the actual model files in /srv/gpustack_models.
Export this directory to the private subnet only, and bind mount it as /models on AI-SRV itself.
This way the server and the workers see the models under the same path.

sudo mkdir -p /srv/gpustack_models

# Export to the private subnet only
echo "/srv/gpustack_models 10.10.0.0/24(rw,sync,no_subtree_check,no_root_squash)" | \
  sudo tee -a /etc/exports
sudo exportfs -ra

# Bind mount on the server itself as well
sudo mkdir -p /models
sudo mount --bind /srv/gpustack_models /models
echo "/srv/gpustack_models /models none bind 0 0" | sudo tee -a /etc/fstab
Key point: always limit the export to the private subnet, and never expose NFS to the public side. This matters even more here because no_root_squash lets root on the workers write to the share as root.
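Running the tee -a lines above a second time appends duplicate entries to /etc/exports and /etc/fstab. This is a minimal sketch of an idempotent alternative, assuming it runs as root. append_line_once is a hypothetical helper name. Afterwards, exportfs -v lists the active exports so you can confirm that only the private subnet appears.

```shell
#!/bin/sh
# Hypothetical helper: append a line to a config file only if that exact
# line is not already there, so re-running the setup stays idempotent.
#   $1 = exact line, $2 = file (needs root for /etc/exports and /etc/fstab)
append_line_once() {
    # -x: whole-line match, -F: fixed string (no regex), -q: quiet
    if ! grep -qxF -- "$1" "$2" 2>/dev/null; then
        printf '%s\n' "$1" >> "$2"
    fi
}

# Usage in a root shell (sudo -i) on AI-SRV:
#   append_line_once "/srv/gpustack_models 10.10.0.0/24(rw,sync,no_subtree_check,no_root_squash)" /etc/exports
#   append_line_once "/srv/gpustack_models /models none bind 0 0" /etc/fstab
#   exportfs -ra && exportfs -v
```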
10

Issue a worker join token

Create a worker join token in the GPUStack GUI. In published material, write it as <WORKER_JOIN_TOKEN>
and never post the actual value. Restricting GUI logins to sources on the private network side is also more secure.

Phase 4

Mount the shared cache on the GPU-VPS side and join the workers

11

Mount /models via NFS first

Each worker mounts the share exported by AI-SRV at the same path, /models.
With the same cache path on every worker, no model is ever stored twice.

sudo mkdir -p /models
echo "10.10.0.10:/srv/gpustack_models /models nfs defaults 0 0" | sudo tee -a /etc/fstab
# Let systemd pick up the edited fstab, then mount
sudo systemctl daemon-reload
sudo mount -a

After mounting, run mount | grep /models to confirm that the intended shared location is visible.
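mount | grep /models shows the mount. However, a failed mount can leave an empty local /models directory that a worker could then download into locally. This is a minimal sketch of a stricter check with findmnt, assuming the example export. check_models_mount is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical check: confirm /models is the NFS export from AI-SRV and not
# an empty local directory left behind by a failed mount.
#   $1 = expected source, e.g. 10.10.0.10:/srv/gpustack_models
check_models_mount() {
    local src fstype
    src=$(findmnt -n -o SOURCE --mountpoint /models 2>/dev/null)
    fstype=$(findmnt -n -o FSTYPE --mountpoint /models 2>/dev/null)
    case "$fstype" in
        nfs|nfs4) ;;
        *) echo "NG  /models is not an NFS mount (fstype: ${fstype:-none})" >&2
           return 1 ;;
    esac
    if [ "$src" != "$1" ]; then
        echo "NG  /models comes from $src, expected $1" >&2
        return 1
    fi
    echo "OK  /models -> $src ($fstype)"
}

# Usage on each worker:
#   check_models_mount 10.10.0.10:/srv/gpustack_models
```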

12

Install GPUStack Worker

Use the token issued on the AI-SRV side to have each GPU-VPS join the cluster as a worker.

curl -sfL https://get.gpustack.ai | sh -s - \
  --server-url http://10.10.0.10 \
  --token <WORKER_JOIN_TOKEN>
13

Pin --cache-dir /models in the worker service

This is the most important setting in a shared-cache configuration. Add
--cache-dir /models to the GPUStack Worker systemd definition, and pin --data-dir as well.

[Service]
ExecStart=/root/.local/bin/gpustack start \
  --server-url http://10.10.0.10 \
  --token <WORKER_JOIN_TOKEN> \
  --cache-dir /models \
  --data-dir /var/lib/gpustack-data

# Reload, then restart: enable --now does not restart a service that is already running
sudo systemctl daemon-reload
sudo systemctl enable gpustack
sudo systemctl restart gpustack
# "Using cache dir: /models" in the log means the setting took effect
journalctl -u gpustack -f
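Editing the installed unit file works, but rerunning the installer may rewrite that file (an assumption worth checking on your version). A systemd drop-in keeps the options separate. Two points here are standard systemd behavior: a drop-in must reset ExecStart= with an empty assignment before setting a new one, and RequiresMountsFor= makes the service wait for the /models mount. The helper name gpustack_worker_dropin and the drop-in file name are hypothetical. The binary path and options are the example values above.

```shell
#!/bin/sh
# Hypothetical helper: print a systemd drop-in pinning the worker options.
#   $1 = server URL, $2 = join token, $3 = cache dir, $4 = data dir
gpustack_worker_dropin() {
    cat <<EOF
[Unit]
# Start only after the NFS-backed cache dir is mounted
RequiresMountsFor=$3

[Service]
# The empty ExecStart= clears the command from the installed unit first
ExecStart=
ExecStart=/root/.local/bin/gpustack start --server-url $1 --token $2 --cache-dir $3 --data-dir $4
EOF
}

# Usage on each worker (keep the real token out of docs and shared logs):
#   sudo mkdir -p /etc/systemd/system/gpustack.service.d
#   gpustack_worker_dropin http://10.10.0.10 '<WORKER_JOIN_TOKEN>' /models /var/lib/gpustack-data \
#     | sudo tee /etc/systemd/system/gpustack.service.d/10-shared-cache.conf
#   sudo systemctl daemon-reload && sudo systemctl restart gpustack
```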
14

Verify that the workers are Ready in the GPUStack GUI

On the Workers screen of the GUI, confirm that every GPU-VPS node is Ready.
If a node stays Not Ready, check in this order: GPU driver / CUDA, NFS mount, then the worker logs.
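The troubleshooting order above can be scripted. This is a minimal sketch that assumes NVIDIA GPUs (nvidia-smi) and the gpustack systemd unit name used earlier. triage_worker is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical triage for a worker stuck in Not Ready, following the order
# GPU driver / CUDA -> NFS mount -> worker logs. Stops at the first failure.
# Assumes NVIDIA GPUs, so the first step relies on nvidia-smi.
triage_worker() {
    echo "== 1. GPU driver / CUDA"
    if ! nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader; then
        echo "NG  nvidia-smi failed: check the GPU driver / CUDA installation" >&2
        return 1
    fi
    echo "== 2. NFS mount"
    if ! findmnt --mountpoint /models; then
        echo "NG  /models is not mounted: check /etc/fstab and the export on AI-SRV" >&2
        return 1
    fi
    echo "== 3. Worker logs (last 50 lines)"
    journalctl -u gpustack -n 50 --no-pager
}

# Usage on the Not Ready worker (journalctl may need sudo for full logs):
#   triage_worker
```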

Phase 5

Model deployment and operational verification

Workers becoming Ready is not the finish line.
Deploy a model from the GUI with 2 or more replicas, and confirm that only the first node downloads it
while the second and later nodes reuse the same shared file.

1

Deploy from the Catalog

Set replicas to 2 or more so that multiple workers are used.

2

The first worker fetches the model

If the model is not yet saved in /models, the first assigned worker downloads it.

3

The second node reads the shared file

If the same model already exists in the shared cache, it is loaded without being downloaded again.
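To confirm that only one download happened, compare what each worker sees in the shared cache. This is a minimal sketch that assumes the /models path from this article. models_report is a hypothetical helper. Because every worker mounts the same export, identical sizes are expected. The GUI and the worker logs remain the primary evidence of which node did the download.

```shell
#!/bin/sh
# Hypothetical check for this phase: run on each worker and compare.
# All nodes should show the same size for the shared dir, and no *.part
# files should remain once the deployment has finished loading.
#   $1 = shared cache dir (default /models)
models_report() {
    local dir="${1:-/models}" parts
    echo "host:  $(uname -n)"
    echo "size:  $(du -sh "$dir" 2>/dev/null | cut -f1)"
    parts=$(find "$dir" -name '*.part' 2>/dev/null | wc -l | tr -d ' ')
    echo "parts: $parts"
}

# Usage: models_report /models   (on gpu-worker-01, then on gpu-worker-02)
```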

Updated on June 9, 2026
