Bestnet Cloud × GPU-VPS over a 10G Private Network:
Steps to Build a GPUStack Cluster


How-to
BESTNET-CLOUD
GPU-VPS
GPUStack
10G Private Network
NFS Shared Cache


This guide separates Bestnet Cloud, acting as the control plane, from GPU-VPS, acting as inference workers,
and centralizes model files in a shared NFS cache.
It walks through everything from client portal preparation
to implementation on Ubuntu and operational verification.

An initial build we actually performed on Ubuntu 24 with the GPUStack 0.6.x series

10G private link
Models downloaded once
2+ GPU workers
1 shared model storage location

Scope

Scope of this article #

This article covers the initial build pattern: launch AI-SRV on Bestnet Cloud to host both GPUStack Server and NFS,
then deploy multiple GPU workers on GPU-VPS.
Because the main themes are role separation and shared cache design, public IP addresses and internal naming conventions have been replaced with example values.

The core of the design is to place the control plane, which needs no GPU, and the model storage on the general-purpose cloud,
and to treat GPU-VPS nodes purely as inference execution nodes.
Adding workers then requires no duplicate storage investment,
and the model files themselves are managed only in the shared area on Bestnet Cloud.

VRRP / Keepalived not used
Model registration via GUI only
GPUStack Server + NFS integrated into AI-SRV
All workers share NFS mount point /models

Note: This article is the “Initial Build Edition”. Migration to GPUStack 2.x is covered in a separate article;
for clarity, this one documents the build pattern for the 0.6.x series.

Knowledge Mapping

Map the Bestnet Cloud knowledge base to the actual build sequence #

The Bestnet Cloud knowledge base breaks client portal tasks down in detail.
When building a GPU cluster, deciding beforehand which article to apply at which step reduces rework.
This article organizes the portal-side preparations in the following order.

1. Portal protection: decide the approach to MFA and allowed IPs first
2. SSH key registration: set up management keys before OS deployment
3. Template creation: prepare AI-SRV and GPU-VPS on Ubuntu 24
4. Private network: connect all nodes to the 10G segment
5. Firewall / NAT: narrow down the public scope and communication direction
6. Snapshot: keep a restore point of the initial state

Portal

Client Portal Security Settings #

Securing the management interface first protects every later step, from server creation to permission management.

Access

SSH Key Registration and Management #

Decide how keys are managed before deploying templates, rather than inserting them manually after servers are created.

Provision

Server Creation from Template #

Standardizing AI-SRV and GPU workers on the same base OS prevents variation in subsequent command procedures.

Network

Private Network #

Planning internal routes for NFS and cluster traffic in advance lets you minimize public ports.

Protect

Firewall Configuration Changes #

Open only the AI-SRV UI, NFS, and GPU worker SSH, following a policy of exposing no unnecessary public surface.

Snapshot Management #

Creating restore points after OS initialization, after GPU driver installation, and before app installation makes testing easier.

Scale

Cloud Resource Upgrade #

When capacity runs short, expanding the shared model area on AI-SRV first is more cost-efficient than adding storage to every worker.

Virtual Router NAT #

To make GPU workers private-only, you can route only their outbound traffic through NAT.

Architecture

Architecture approach #

The roles are simple. AI-SRV on Bestnet Cloud handles the Web UI, API, and NFS sharing,
while the GPU-VPS nodes run only GPUStack Worker and inference.
Keeping internal traffic on the 10G private network
moves model distribution, NFS, and cluster connections entirely to the private side.

BESTNET-CLOUD

control-01 #

AI-SRV integrating GPUStack Server + NFS

Provides Web UI / API
Export /srv/gpustack_models via NFS
Bind mount locally to /models
Consolidate model actual storage to one location

⇄ 10G Private Network (NFS / UI / worker join / internal communication) ⇄

GPU-VPS

gpu-worker-01 / 02 #

Worker group sharing /models, handling inference only

Launch GPUStack Worker
Standardize --cache-dir /models
Each worker does not hold large model storage
On horizontal expansion, add nodes with same pattern

Deploy model via GUI → First worker fetches the model only once → Saved to shared NFS → Other workers read from the same /models

Concentrate storage on AI-SRV side #

Capacity planning centers on AI-SRV. Size the shared area generously, allowing for the models themselves, temporary downloads, and future model replacements.

Keep GPU-VPS lean #

Limit each GPU worker to the OS, GPU driver, GPUStack Worker, and a minimal log area, without duplicating the models themselves.

Concentrate communication on private side #

When the UI, worker join, NFS, and model reuse all stay on the private network, public exposure stays small.

Design Worksheet

Items to decide beforehand #

Role	Location	Hostname (example)	Private IP (example)	Main responsibilities
Server + NFS	Bestnet Cloud	control-01	10.10.0.10	GPUStack UI / API, NFS, model storage
GPU Worker #1	GPU-VPS	gpu-worker-01	10.10.0.21	Inference execution, shared `/models` usage
GPU Worker #2	GPU-VPS	gpu-worker-02	10.10.0.22	Inference execution, shared `/models` usage

These are example values for publication. Adjust them to your own naming rules and private subnet in production.
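Once the worksheet is filled in, a short shell sketch can turn it into /etc/hosts entries and reject any address that falls outside the private segment. It assumes a /24 like the example 10.10.0.0/24; `in_private_24` and `render_hosts` are hypothetical helpers, not part of GPUStack or the Bestnet Cloud portal.

```shell
#!/bin/sh
# Sketch (hypothetical helpers): worksheet rows -> /etc/hosts lines,
# refusing any IP outside the private /24 given as its first three octets.

# Succeeds if IPv4 address $1 lies in "$2.0/24" (e.g. $2 = 10.10.0).
in_private_24() {
  local ip="$1" prefix="$2" host
  case "$ip" in
    "$prefix".*) host="${ip#"$prefix".}" ;;
    *) return 1 ;;
  esac
  # The host part must be a plain number (no extra dots) in 1..254
  case "$host" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$host" -ge 1 ] && [ "$host" -le 254 ]
}

# Reads "hostname ip" pairs on stdin and prints "ip<TAB>hostname" lines.
render_hosts() {
  local prefix="$1" name ip
  while read -r name ip; do
    [ -n "$name" ] || continue
    if ! in_private_24 "$ip" "$prefix"; then
      echo "error: $name ($ip) is outside $prefix.0/24" >&2
      return 1
    fi
    printf '%s\t%s\n' "$ip" "$name"
  done
}

# Example, run once on each node:
#   render_hosts 10.10.0 <<'EOF' | sudo tee -a /etc/hosts
#   control-01    10.10.0.10
#   gpu-worker-01 10.10.0.21
#   gpu-worker-02 10.10.0.22
#   EOF
```

Failing on the first out-of-range row is deliberate: a typo in the worksheet is cheaper to catch here than after NFS exports and systemd units already reference the wrong address.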

Design checkpoints #

Which nodes to give public IPs
Whether to make workers private-only
Whether to use a NAT router or give each node its own outbound access
Initial capacity and expansion plan for shared model area

Storage estimation approach #

Total size of active models
Allowance for old and new models to temporarily coexist during switching
Temporary file space, such as .part files
A buffer for logs, operational files, and future worker additions
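The four items above can be combined into a quick estimate. This is a sketch under stated assumptions: inputs are whole GB, the switching allowance is roughly the largest model you expect to replace, and the buffer percentage is a number you choose; none of these ratios come from GPUStack, and `estimate_shared_gb` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: rough size of the shared model area on AI-SRV (whole GB).
#   $1 total size of active models
#   $2 switching allowance (old and new copies coexisting)
#   $3 temporary download space, e.g. for .part files
#   $4 buffer percent for logs, operational files and future workers
estimate_shared_gb() {
  local active="$1" switching="$2" temp="$3" buffer_pct="$4"
  local base=$((active + switching + temp))
  # Integer math; the buffer is rounded up to a whole GB
  echo $((base + (base * buffer_pct + 99) / 100))
}
```

For example, with hypothetical values of 150 GB of active models, a 70 GB switching allowance, 70 GB of temporary space, and a 20% buffer, `estimate_shared_gb 150 70 70 20` prints 348.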

Phase 1

Complete client portal preparations first #

Protect the client portal itself #

Before creating servers, decide how the management portal will be protected. In team operations especially,
settling MFA and allowed IPs first prevents later confusion about who can reach the management interface, and from where.

Recommendation: Enable 2-factor authentication on the management portal and, if possible, restrict management source IPs.

Register SSH keys before creating servers #

Instead of inserting keys into AI-SRV and each GPU worker after creation, register them beforehand under SSH key management in the client portal
so that management access is available as soon as the template is deployed.

Organize the team's operational public keys
Keep shared keys and individual keys separate so they are not confused in an emergency
Don’t post key fingerprints in published articles

Create AI-SRV on Bestnet Cloud from template #

Create AI-SRV, which hosts both GPUStack Server and NFS, from an Ubuntu 24 series template.
Because this node also stores the models, provision ample storage separate from the OS disk.

Role is GPUStack Server + NFS
Connect to private network
Consider public side only if UI exposure is necessary
Allocate more capacity for model storage on AI-SRV than on workers

Create workers on GPU-VPS in necessary quantities #

Because workers handle inference, choose GPU-VPS plans by GPU type, VRAM, and the size of the models you expect to run.
Since the models themselves live on NFS, worker storage can be kept to the minimum needed for the OS and runtime.

Standardize on Ubuntu 24 series
Create with GPU driver / CUDA available
Connect to private network
Consider private-only configuration if no public IP is needed
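To compare plans on VRAM, a common back-of-envelope rule is that model weights take about parameters × bits per parameter ÷ 8 bytes, so a 7B model in 16-bit precision needs about 14 GB for weights alone. KV cache and runtime overhead come on top; the default 20% margin below is an assumption to tune, not a GPUStack figure, and both functions are hypothetical helpers.

```shell
#!/bin/sh
# Sketch: VRAM needed for weights only, in GB.
# $1 = parameters in billions, $2 = bits per parameter (16, 8, 4, ...)
weights_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

# Prints "fits" if weights plus a margin fit in the GPU's VRAM, else "too-small".
# $3 = VRAM in GB, $4 = margin percent for KV cache/overhead (assumed 20)
fits_vram() {
  awk -v p="$1" -v b="$2" -v vram="$3" -v margin="${4:-20}" 'BEGIN {
    need = p * b / 8 * (1 + margin / 100)
    if (need <= vram) print "fits"; else print "too-small"
  }'
}
```

For instance, `fits_vram 70 4 24` prints too-small: a 4-bit 70B model needs about 35 GB for its weights alone, before any margin.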

Place all nodes on 10G private network #

NFS and GPUStack internal communication flow over the 10G private link connecting Bestnet Cloud and GPU-VPS.
This allows model retrieval and worker join to remain on the private side.

Place AI-SRV and all workers in the same private segment
Assign fixed IPs to stabilize subsequent NFS export and systemd settings
If workers need outbound communication, choose either the NAT router method or temporary public egress

To keep workers private-only, routing only OS updates and external downloads through Bestnet Cloud’s virtual router / NAT is easier to manage.
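On Ubuntu 24, a fixed private IP is typically pinned in netplan. The sketch below only renders the file; the interface name (for example `ens4`), the file name, and leaving the default route off this interface are assumptions to check on each node (`ip -br addr` shows the actual NIC names, and the template may already configure the interface). If workers reach the internet through the virtual router / NAT, the default route would also need to point at that router.

```shell
#!/bin/sh
# Sketch: render a netplan file that pins a node's private IP.
# $1 = interface (assumption, e.g. ens4), $2 = address, $3 = prefix length
render_private_netplan() {
  local iface="$1" addr="$2" prefix_len="${3:-24}"
  cat <<EOF
network:
  version: 2
  ethernets:
    $iface:
      dhcp4: false
      addresses:
        - $addr/$prefix_len
EOF
}

# Example on gpu-worker-01 (file name is hypothetical):
#   render_private_netplan ens4 10.10.0.21 | sudo tee /etc/netplan/60-private.yaml
#   sudo chmod 600 /etc/netplan/60-private.yaml
#   sudo netplan try   # reverts automatically unless you confirm
```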

Determine initial firewall rules #

A policy of allowing only the minimum necessary inbound traffic to AI-SRV and almost no public inbound traffic to workers is sufficient.
In the portal's firewall screen, review the existing rules first, then add only the ports you need.

Target	Port / Protocol	Allowed source	Purpose
AI-SRV	`22/tcp`	Restrict to management source IPs only	SSH
AI-SRV	`80/tcp`	Management network or private side	GPUStack Web UI / API
AI-SRV	`2049/tcp`	Private subnet only	NFS v4
AI-SRV	`111/tcp,111/udp`	Private subnet only when needed	For configurations using rpcbind
GPU Worker	`22/tcp`	Restrict to management source IPs only	SSH

Assuming NFS v4, NFS traffic can be consolidated on 2049/tcp, though some environments still need rpcbind (111). Always limit the allowed source to the private subnet.
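The portal firewall is the primary control here. As an optional second layer on AI-SRV, the same policy can be mirrored in ufw, which ships with Ubuntu. The hypothetical function below only prints commands so they can be reviewed before running; the management CIDR (203.0.113.0/24, a documentation range) and allowing the UI from both the management network and the private subnet are assumptions. Make sure the SSH rule covers the address you are connected from before enabling.

```shell
#!/bin/sh
# Sketch: print (not apply) ufw rules mirroring the AI-SRV rows of the table.
# $1 = management source CIDR, $2 = private subnet, $3 = "yes" to add rpcbind
ai_srv_ufw_rules() {
  local mgmt="$1" private="$2" rpcbind="${3:-no}"
  echo "ufw default deny incoming"
  echo "ufw default allow outgoing"
  echo "ufw allow from $mgmt to any port 22 proto tcp"
  # Web UI / API: management network for admins, private subnet for workers
  echo "ufw allow from $mgmt to any port 80 proto tcp"
  echo "ufw allow from $private to any port 80 proto tcp"
  echo "ufw allow from $private to any port 2049 proto tcp"
  if [ "$rpcbind" = yes ]; then
    # No proto given, so ufw allows both tcp and udp
    echo "ufw allow from $private to any port 111"
  fi
  echo "ufw --force enable"
}

# Review first, then apply:
#   ai_srv_ufw_rules 203.0.113.0/24 10.10.0.0/24
#   ai_srv_ufw_rules 203.0.113.0/24 10.10.0.0/24 | sudo sh
```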

Take initial snapshot #

Once initial OS creation, key injection, private network connection, and firewall setup are complete,
take snapshots of AI-SRV and the GPU workers so you can easily restore them if a later middleware installation fails.

Phase 2

Create OS baseline #

On both the server and the workers, first set up time synchronization and the NFS packages. Because the GPU workers will get the GPU driver / CUDA next,
finishing OS updates and any required reboot beforehand keeps the system stable.

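# All nodes: apply OS updates, enable time sync, install the NFS client
sudo apt update
sudo apt upgrade -y
sudo timedatectl set-ntp true
sudo apt install -y nfs-common

# NFS server (AI-SRV only)
sudo apt install -y nfs-kernel-server

# Reboot now if the updates require it
[ -f /var/run/reboot-required ] && sudo reboot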

Items to standardize on AI-SRV #

Time synchronization
NFS server
Model storage mount design
Initial GPUStack Server deployment

Items to standardize on GPU workers #

Time synchronization
NFS client
GPU driver / CUDA
GPUStack Worker and /models mount

Phase 3

Build GPUStack Server + NFS on AI-SRV #

Install GPUStack Server #

In this initial build, GPUStack Server is deployed on AI-SRV so you can log in to the management UI.
After reading the initial admin password, store it in a secure vault and never publish it.

curl -sfL https://get.gpustack.ai | sh -s -
# Root privileges may be needed to read this file
sudo cat /var/lib/gpustack/initial_admin_password

# Web UI (example)
# http://10.10.0.10/
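The Web UI may not answer the moment the installer returns. A generic retry loop can wait for it before you read the password; `retry` is a hypothetical helper, and the URL is this article's example address.

```shell
#!/bin/sh
# Sketch: run a command until it succeeds or the attempts run out.
# $1 = max attempts, $2 = seconds between attempts, rest = the command
retry() {
  local attempts="$1" delay="$2" n=1
  shift 2
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "gave up after $n attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
  return 0
}

# Example on AI-SRV: wait up to about 5 minutes for the Web UI, then read the password
#   retry 60 5 curl -fsS -o /dev/null http://10.10.0.10/ &&
#     sudo cat /var/lib/gpustack/initial_admin_password
```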

Create NFS shared directory #

Place the actual model files in /srv/gpustack_models.
Export this directory to the private subnet only, and also bind mount it as /models on AI-SRV itself.
This way the server and the workers see the models at the same path.

sudo mkdir -p /srv/gpustack_models

# Export to the private subnet only
echo "/srv/gpustack_models 10.10.0.0/24(rw,sync,no_subtree_check,no_root_squash)" | \
  sudo tee -a /etc/exports

sudo systemctl enable --now nfs-kernel-server
sudo exportfs -ra

# Bind mount on the server itself as well
sudo mkdir -p /models
sudo mount --bind /srv/gpustack_models /models
echo "/srv/gpustack_models /models none bind 0 0" | sudo tee -a /etc/fstab

Key point: always limit the export scope to the private subnet. NFS must never be exposed on the public side.
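To make the private-subnet-only rule hard to break by accident, the export entry can be generated by a helper that refuses anything outside the RFC 1918 ranges. This is a sketch: `private_cidr` and `export_line` are hypothetical names, the check looks at the network prefix only, and the export options mirror the example line above, including `no_root_squash`, which lets root on any host in that subnet act as root on the share.

```shell
#!/bin/sh
# Sketch: generate the /etc/exports entry, refusing non-private scopes.

# Succeeds for CIDRs starting in 10/8, 172.16/12 or 192.168/16
# (a prefix check, not full CIDR validation).
private_cidr() {
  case "$1" in
    10.*/*|192.168.*/*) return 0 ;;
    172.1[6-9].*/*|172.2[0-9].*/*|172.3[01].*/*) return 0 ;;
    *) return 1 ;;
  esac
}

# $1 = exported directory, $2 = client CIDR; options mirror the article's line
export_line() {
  local dir="$1" cidr="$2"
  if ! private_cidr "$cidr"; then
    echo "refusing non-private export scope: $cidr" >&2
    return 1
  fi
  printf '%s %s(rw,sync,no_subtree_check,no_root_squash)\n' "$dir" "$cidr"
}

# Example on AI-SRV (appends only if the exact line is not already present):
#   line=$(export_line /srv/gpustack_models 10.10.0.0/24) &&
#     { grep -qxF "$line" /etc/exports || echo "$line" | sudo tee -a /etc/exports; }
#   sudo exportfs -ra && sudo exportfs -v
```

`exportfs -v` then shows what the kernel is actually exporting, which is the value worth checking rather than the file contents.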

Issue worker join token #

Create a worker join token in the GPUStack GUI. In published material, write it as <WORKER_JOIN_TOKEN>
and never post the real value. Restricting GUI logins to sources on the private network is also more secure.

Phase 4

Mount shared cache on GPU-VPS side and have workers join #

First mount `/models` via NFS #

Each worker mounts the shared area exported by AI-SRV at the same path /models.
This makes the cache path for all workers consistent, preventing duplicate model storage.

sudo mkdir -p /models
echo "10.10.0.10:/srv/gpustack_models /models nfs defaults 0 0" | sudo tee -a /etc/fstab
sudo mount -a

After mounting, verify with mount | grep /models that the intended shared location is visible.
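A stricter check than `mount | grep /models` is to ask `findmnt` for the filesystem type and source of exactly that mount point, which fails if /models is just an empty local directory. The expected source is this article's example export and the function names are hypothetical; on AI-SRV itself /models is a bind mount, so this check is meant for the workers.

```shell
#!/bin/sh
# Sketch: confirm /models on a worker is the expected NFS export.

# $1 = "FSTYPE SOURCE" (single-space separated, as printed by findmnt -r)
# $2 = expected source, e.g. 10.10.0.10:/srv/gpustack_models
is_expected_mount() {
  local fstype="${1%% *}" src="${1#* }"
  case "$fstype" in
    nfs|nfs4) ;;
    *) return 1 ;;
  esac
  [ "$src" = "$2" ]
}

check_models_mount() {
  local expected="${1:-10.10.0.10:/srv/gpustack_models}" info
  # --mountpoint matches only a filesystem mounted exactly on /models
  if ! info=$(findmnt -rn -o FSTYPE,SOURCE --mountpoint /models); then
    echo "/models is not a mount point" >&2
    return 1
  fi
  is_expected_mount "$info" "$expected"
}

# Example on a worker:
#   check_models_mount && echo "/models OK"
```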

Install GPUStack Worker #

Use the token issued on the AI-SRV side to have each GPU-VPS join the cluster as a worker.

curl -sfL https://get.gpustack.ai | sh -s - \
  --server-url http://10.10.0.10 \
  --token <WORKER_JOIN_TOKEN>

Fix `--cache-dir /models` in worker service #

In a shared cache configuration, this is the most important setting. Add
--cache-dir /models to the GPUStack Worker systemd unit and pin --data-dir as well.

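# Edit the unit created by the installer and change ExecStart as follows
sudo systemctl edit --full gpustack
[Service]
ExecStart=/root/.local/bin/gpustack start \
  --server-url http://10.10.0.10 \
  --token <WORKER_JOIN_TOKEN> \
  --cache-dir /models \
  --data-dir /var/lib/gpustack-data

sudo systemctl daemon-reload
sudo systemctl enable gpustack
# Restart so the already running service picks up the new flags
sudo systemctl restart gpustack
# If "Using cache dir: /models" appears, OK
journalctl -u gpustack -f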
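An alternative to editing the installer's unit directly is a systemd drop-in, which stays separate from the installer's unit file and keeps the join token out of the unit itself in a root-only EnvironmentFile; systemd substitutes `${WORKER_TOKEN}` in ExecStart from that file. This is a sketch: the paths under /etc/gpustack, the variable name `WORKER_TOKEN`, and the helper name are assumptions, while the `gpustack start` flags and binary path are taken from the unit shown above.

```shell
#!/bin/sh
# Sketch: write the worker override as a drop-in plus a root-only env file.
# $1 = root prefix ("" for the real system), $2 = token, $3 = server URL
write_worker_override() {
  local root="$1" token="$2" server_url="${3:-http://10.10.0.10}"
  local envdir="$root/etc/gpustack"
  local dropdir="$root/etc/systemd/system/gpustack.service.d"
  mkdir -p "$envdir" "$dropdir" || return 1
  # umask covers a new file, chmod covers an existing one
  ( umask 077; printf 'WORKER_TOKEN=%s\n' "$token" > "$envdir/worker.env" ) || return 1
  chmod 600 "$envdir/worker.env" || return 1
  # The unit always references the real path, regardless of the prefix
  cat > "$dropdir/override.conf" <<EOF
[Unit]
RequiresMountsFor=/models

[Service]
EnvironmentFile=/etc/gpustack/worker.env
# An empty ExecStart= clears the installer's command before redefining it
ExecStart=
ExecStart=/root/.local/bin/gpustack start --server-url $server_url --token \${WORKER_TOKEN} --cache-dir /models --data-dir /var/lib/gpustack-data
EOF
}

# On each worker, as root:
#   write_worker_override "" '<WORKER_JOIN_TOKEN>'
#   systemctl daemon-reload && systemctl restart gpustack
```

`RequiresMountsFor=/models` also makes systemd order the worker after the NFS mount at boot, so the worker does not start against an empty local /models.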

Verify workers are Ready in GPUStack GUI #

In the GUI Workers screen, verify all GPU-VPS nodes are Ready.
If they remain Not Ready, troubleshoot in order: GPU driver / CUDA, NFS mount, worker logs.
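The troubleshooting order above can be scripted so that each worker reports the first failing layer. This sketch assumes `nvidia-smi` is installed along with the GPU driver and uses `systemctl is-active` for the service check before falling back to the worker logs; `check_in_order` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: run checks in order and stop at the first failure.
# Input lines on stdin: "label|shell command"
check_in_order() {
  local label cmd
  while IFS='|' read -r label cmd; do
    [ -n "$label" ] || continue
    # </dev/null keeps the check from consuming the remaining input lines
    if sh -c "$cmd" </dev/null >/dev/null 2>&1; then
      echo "OK   $label"
    else
      echo "FAIL $label"
      return 1
    fi
  done
}

# Example on a worker; show recent logs if any layer fails:
#   check_in_order <<'EOF' || sudo journalctl -u gpustack -n 100 --no-pager
#   GPU driver / CUDA|nvidia-smi
#   NFS mount|findmnt --mountpoint /models
#   Worker service|systemctl is-active --quiet gpustack
#   EOF
```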

Phase 5

Model deployment and operational verification #

Workers becoming Ready is not, by itself, the completion criterion.
Deploy a model from the GUI with 2 or more replicas, and verify that only the first node downloads it
while the 2nd and later nodes reuse the same shared file.

Deploy from Catalog #

Set replicas to 2 or more so that multiple workers are used.

First worker fetches #

If the model is not yet in /models, the first assigned worker fetches it.

2nd node reads shared file #

If the same model already exists in the shared cache, it is loaded without being downloaded again.
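While the first replica is fetching, the shared area on AI-SRV should show a single in-progress download, and starting the second replica should not add another. This sketch lists in-progress files by suffix; it assumes the .part suffix mentioned in the storage estimate, other downloaders may use a different suffix (so it is a parameter), and no particular directory layout under the cache is assumed. Both functions are hypothetical helpers.

```shell
#!/bin/sh
# Sketch: list in-progress downloads in the shared area on AI-SRV.
# $1 = directory (default: the NFS export), $2 = suffix (default: .part)
partial_downloads() {
  find "${1:-/srv/gpustack_models}" -type f -name "*${2:-.part}" 2>/dev/null
}

# Succeeds once no file with the suffix remains under the directory.
downloads_settled() {
  [ -z "$(partial_downloads "$@")" ]
}

# Example while the first replica is starting:
#   partial_downloads                  # expect at most one model being fetched
#   downloads_settled && echo "download finished"
```

A complementary check is to run `du -sh /srv/gpustack_models` before and after the second replica starts: with the shared cache working, usage should not grow by another copy of the model.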

Updated on June 9, 2026
