Tech Blog / Infrastructure Build Guide
Connecting Bestnet Cloud × GPU-VPS over a 10G private network: GPUStack cluster implementation guide #
This how-to separates Bestnet Cloud as the control plane from GPU-VPS as the inference workers, and centralizes model files in a shared NFS cache.
It walks through client portal preparation, implementation on Ubuntu, and operational verification.
Scope of this article #
This article covers the initial build pattern: launch AI-SRV on Bestnet Cloud to host both GPUStack Server and NFS,
then deploy multiple GPU workers on the GPU-VPS side.
Because role separation and the shared cache design are the main themes, public IP addresses and internal naming conventions have been replaced with example values.
The core of the design is to place the control plane, which does not require a GPU, and the model storage on the general-purpose cloud side,
and to treat GPU-VPS purely as inference execution nodes.
This way, adding workers does not require duplicate storage investment,
and the model files are managed only in the shared area on the Bestnet Cloud side.
- Model registration via GUI only
- GPUStack Server + NFS integrated into AI-SRV
- All workers share the NFS mount point /models
This article organizes the build pattern for the 0.6.x series.
Map the Bestnet Cloud knowledge base to the actual build sequence #
The Bestnet Cloud knowledge base breaks client portal operations down into detailed, separate tasks.
When building a GPU cluster, deciding beforehand which knowledge base article to use at which step therefore reduces rework.
This article arranges the portal-side preparations in the following order.
- Portal Protection: Decide the approach to MFA and allowed IPs first
- SSH Key Registration: Set up management keys before OS deployment
- Template Creation: Prepare AI-SRV and GPU-VPS on Ubuntu 24
- Private Network: Connect all nodes to the 10G segment
- Firewall / NAT: Narrow down the public scope and communication direction
- Snapshot: Keep a restore point at the initial state
Portal
Client Portal Security Settings #
Securing the management interface first puts subsequent server creation and permission management on a safe footing.
Access
SSH Key Registration and Management #
Rather than manually inserting keys after server creation, determine key operations before template deployment.
Provision
Server Creation from Template #
Standardizing AI-SRV and GPU workers on the same base OS prevents variation in subsequent command procedures.
Network
Private Network #
By predetermining internal routes for NFS and cluster communication, you can minimize public ports.
Protect
Firewall Configuration Changes #
Allow only the AI-SRV UI, NFS, and GPU worker SSH, and do not expose any other surface publicly.
Snapshot Management #
Creating restore points after OS initialization, after GPU driver installation, and before app installation makes testing easier.
Scale
Cloud Resource Upgrade #
When capacity becomes insufficient, expanding the shared model area on the AI-SRV side first provides better cost efficiency.
Virtual Router NAT #
If you want GPU workers to be private-only, you can choose a configuration where only their outbound communication goes through NAT.
Architecture approach #
The roles are simple. The AI-SRV on the Bestnet Cloud side handles the Web UI, API, and NFS sharing,
while nodes on the GPU-VPS side run only GPUStack Worker and inference.
Keeping internal communication on the 10G private network
moves model distribution, NFS, and cluster connections to the private side.
BESTNET-CLOUD
control-01 #
AI-SRV integrating GPUStack Server + NFS
- Provides Web UI / API
- Export /srv/gpustack_models via NFS
- Bind mount locally to /models
- Consolidate model storage in one location
NFS / UI / Worker Join / internal communication
GPU-VPS
gpu-worker-01 / 02 #
Worker group sharing /models, handling inference only
- Launch GPUStack Worker
- Standardize on --cache-dir /models
- Each worker does not hold large model storage
- On horizontal expansion, add nodes with same pattern
→ First worker fetches the model only once
→ Other workers read from the same /models
Concentrate storage on AI-SRV side #
Capacity planning centers on AI-SRV. Size the shared area generously to account for the model files, temporary downloads, and future model replacements.
Keep GPU-VPS lean #
Design the GPU worker side focusing on OS, GPU driver, GPUStack Worker, and minimal log area, without duplicating the model itself.
Concentrate communication on private side #
When the UI, worker join, NFS, and model reuse all stay on the private network, the publicly exposed surface stays small.
Items to decide beforehand #
| Role | Location | Hostname (example) | Private IP (example) | Main responsibilities |
|---|---|---|---|---|
| Server + NFS | Bestnet Cloud | control-01 | 10.10.0.10 | GPUStack UI / API, NFS, model storage |
| GPU Worker #1 | GPU-VPS | gpu-worker-01 | 10.10.0.21 | Inference execution, shared /models usage |
| GPU Worker #2 | GPU-VPS | gpu-worker-02 | 10.10.0.22 | Inference execution, shared /models usage |
Example values for public distribution. Adjust to your own naming rules and private subnet in production.
Design checkpoints #
- Which nodes receive public IPs
- Whether to make workers private-only
- Whether to use a NAT router or give each worker its own outbound access
- Initial capacity and expansion plan for the shared model area
Storage estimation approach #
- Total size of active models
- Allowance for old and new models to temporarily coexist during switching
- Temporary file space for files such as .part
- Logs, operational files, and a buffer for future worker additions
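The estimation items above reduce to simple arithmetic. The following is a minimal sketch in shell: the function name estimate_shared_gib, the idea of expressing the last item as a percentage buffer, and every sample number are illustrative assumptions, not values from Bestnet Cloud or GPUStack.

```shell
# Hypothetical helper: estimate the shared model area on AI-SRV in whole GiB.
# Arguments (all integers, all illustrative):
#   $1 total size of active models (GiB)
#   $2 allowance for old and new models coexisting during a switch (GiB)
#   $3 temporary file space such as .part downloads (GiB)
#   $4 buffer for logs, operational files, and future growth (percent)
estimate_shared_gib() {
  local active="$1" coexist="$2" temp="$3" buffer_pct="$4"
  local base=$(( active + coexist + temp ))
  # Apply the percentage buffer and round up to the next whole GiB.
  echo $(( (base * (100 + buffer_pct) + 99) / 100 ))
}

# Example with made-up sizes: 200 GiB active, 80 GiB coexistence,
# 40 GiB temporary space, 25% buffer.
estimate_shared_gib 200 80 40 25   # prints 400
```

Treat the result as a starting point for the AI-SRV data disk and revisit it whenever the set of active models changes.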
Solidify client portal preparations first #
Protect the client portal itself #
Before creating servers, decide how the management interface will be protected. For team operations in particular,
settling MFA and allowed IP handling first prevents later confusion about “who can access the management interface from where”.
Register SSH keys before creating servers #
Rather than inserting keys individually after AI-SRV and the GPU workers are created, register keys beforehand in the client portal's SSH key management
so that management access is available immediately after template deployment.
- Organize team operational public keys
- Don’t mix shared keys and individual keys in emergency situations
- Don’t post key fingerprints in published articles
Create AI-SRV on Bestnet Cloud from template #
Use Ubuntu 24 series template for AI-SRV, hosting both GPUStack Server and NFS.
Since this node also serves as model storage, design with sufficient storage area separate from the OS disk.
- Role is GPUStack Server + NFS
- Connect to private network
- Consider public side only if UI exposure is necessary
- Allocate more capacity for model storage on AI-SRV than on workers
Create workers on GPU-VPS in necessary quantities #
Since workers are for inference, select GPU-VPS plans based on GPU type, VRAM, and future model size.
On the other hand, since the model itself is kept on the NFS side, you can easily reduce worker storage to the minimum needed for OS and execution.
- Standardize on Ubuntu 24 series
- Create with GPU driver / CUDA available
- Connect to private network
- Consider private-only configuration if no public IP is needed
Place all nodes on 10G private network #
NFS and GPUStack internal communication flow over the 10G private link connecting Bestnet Cloud and GPU-VPS.
This allows model retrieval and worker join to remain on the private side.
- Accommodate AI-SRV and all workers in the same private segment
- Assign fixed IPs to stabilize subsequent NFS export and systemd settings
- If workers need outbound communication, choose NAT router method or temporary public egress
If you want workers to be private-only, using Bestnet Cloud’s virtual router / NAT pattern solely for OS updates and external fetching is easier to manage.
Determine initial firewall rules #
A policy of “allow only minimum necessary inbound to AI-SRV, and give workers almost no public inbound” is sufficient.
In the portal firewall screen, first check existing rules, then add only necessary ports.
| Target | Port / Protocol | Source consideration | Purpose |
|---|---|---|---|
| AI-SRV | 22/tcp | Restrict to management source IPs only | SSH |
| AI-SRV | 80/tcp | Management network or private side | GPUStack Web UI / API |
| AI-SRV | 2049/tcp | Private subnet only | NFS v4 |
| AI-SRV | 111/tcp,111/udp | Private subnet only when needed | For configurations using rpcbind |
| GPU Worker | 22/tcp | Restrict to management source IPs only | SSH |
With NFS v4 assumed, traffic can usually be consolidated to 2049/tcp, though some environments also use rpcbind. Always restrict the NFS source to the private subnet.
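The table above targets the portal firewall. If you also want matching rules on the host itself, the sketch below prints equivalent ufw commands for AI-SRV without executing them. It assumes ufw is the host firewall, which this article does not state; the function name print_ai_srv_ufw_rules and the management IP 203.0.113.10 are hypothetical.

```shell
# Hypothetical helper: print (not execute) host-level ufw rules that mirror
# the AI-SRV rows of the firewall table. Assumes ufw is the host firewall.
#   $1 management source IP (example value)
#   $2 private subnet in CIDR form (example value)
print_ai_srv_ufw_rules() {
  local admin_ip="$1" private_cidr="$2"
  echo "ufw default deny incoming"
  echo "ufw allow from ${admin_ip} to any port 22 proto tcp"
  echo "ufw allow from ${private_cidr} to any port 80 proto tcp"
  echo "ufw allow from ${private_cidr} to any port 2049 proto tcp"
}

# Review the printed lines before running any of them with sudo.
print_ai_srv_ufw_rules 203.0.113.10 10.10.0.0/24
```

Printing rather than applying the rules keeps the sketch safe to run anywhere and lets you compare the output with the portal rules first.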
Take initial snapshot #
After OS initial creation, key injection, private network connection, and firewall setup are complete,
take snapshots of AI-SRV and GPU workers to make it easier to restore if middleware installation fails later.
Create OS baseline #
On both the server and the workers, first align time synchronization and the NFS base packages. Because the GPU driver and CUDA will be installed on GPU workers afterwards,
completing OS updates and any required reboot beforehand keeps the system stable.
# All nodes
sudo apt update
sudo apt install -y nfs-common
# For NFS server (AI-SRV only)
sudo apt install -y nfs-kernel-server
Build GPUStack Server + NFS on AI-SRV #
Install GPUStack Server #
In the initial build example, we deploy GPUStack Server on AI-SRV to enable login to the management UI.
Note the initial admin password, move it to a secure vault, and do not post it in any article.
curl -sfL https://get.gpustack.ai | sh -s -
cat /var/lib/gpustack/initial_admin_password
# Web UI (example)
# http://10.10.0.10/
Create NFS shared directory #
Place the actual model files in /srv/gpustack_models.
Export this to the private subnet only, and also bind mount it as /models on AI-SRV itself.
This keeps the path seen by the server identical to the path seen by workers.
sudo mkdir -p /srv/gpustack_models
echo "/srv/gpustack_models 10.10.0.0/24(rw,sync,no_subtree_check,no_root_squash)" | sudo tee -a /etc/exports
sudo exportfs -ra
# Server itself also bind mount
sudo mkdir -p /models
sudo mount --bind /srv/gpustack_models /models
echo "/srv/gpustack_models /models none bind 0 0" | sudo tee -a /etc/fstab
Issue worker join token #
Create a worker join token from the GPUStack GUI. In public articles, write it as <WORKER_JOIN_TOKEN>
and do not post the actual value. Limiting GUI login sources to the private network side is also more secure.
Mount shared cache on GPU-VPS side and have workers join #
First mount /models via NFS #
Each worker mounts the shared area exported by AI-SRV at the same path /models.
This makes the cache path for all workers consistent, preventing duplicate model storage.
sudo mkdir -p /models
echo "10.10.0.10:/srv/gpustack_models /models nfs defaults 0 0" | sudo tee -a /etc/fstab
sudo mount -a
After mounting, run the following and verify that the intended shared location is visible.
mount | grep /models
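The same check can be scripted so it is repeatable on every worker. This is a minimal sketch that reads a mounts table in /proc/mounts format; the function name models_mount_ok is hypothetical, and the source and mount point are the example values used in this article.

```shell
# Hypothetical helper: succeed only if the mounts table contains the expected
# NFS source at the expected mount point.
#   $1 expected source, e.g. 10.10.0.10:/srv/gpustack_models
#   $2 mount point, e.g. /models
#   $3 mounts table to read (defaults to /proc/mounts)
models_mount_ok() {
  local source="$1" target="$2" table="${3:-/proc/mounts}"
  awk -v src="$source" -v tgt="$target" '
    # Field 1 is the source, field 2 the mount point, field 3 the fs type.
    $1 == src && $2 == tgt && $3 ~ /^nfs/ { found = 1 }
    END { exit (found ? 0 : 1) }
  ' "$table"
}

# Usage on a worker (example values from this article):
#   models_mount_ok 10.10.0.10:/srv/gpustack_models /models \
#     && echo "shared cache mounted" || echo "shared cache missing"
```

Matching on the filesystem type as well as the path catches the case where /models exists as a plain local directory, which would silently cause each worker to download its own copy.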
Install GPUStack Worker #
Use the token issued on the AI-SRV side to have each GPU-VPS join the cluster as a worker.
curl -sfL https://get.gpustack.ai | sh -s - --server-url http://10.10.0.10 --token <WORKER_JOIN_TOKEN>
Pin --cache-dir /models in the worker service #
In a shared cache configuration, this setting matters most. Add --cache-dir /models to the GPUStack Worker systemd definition, and pin --data-dir as well.
[Service]
ExecStart=/root/.local/bin/gpustack start --server-url http://10.10.0.10 --token <WORKER_JOIN_TOKEN> --cache-dir /models --data-dir /var/lib/gpustack-data
sudo systemctl daemon-reload
sudo systemctl enable --now gpustack
# If "Using cache dir: /models" appears, OK
journalctl -u gpustack -f
Verify workers are Ready in GPUStack GUI #
In the GUI Workers screen, verify all GPU-VPS nodes are Ready.
If they remain Not Ready, troubleshoot in order: GPU driver / CUDA, NFS mount, worker logs.
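The troubleshooting order above can be run as an ordered checklist that stops at the first failure. This is a minimal sketch: the run_checks helper and its "name::command" argument format are hypothetical, and the commands in the usage comment assume the gpustack service name and /models mount point used in this article.

```shell
# Hypothetical helper: run "name::command" pairs in order and stop at the
# first command that fails, so the earliest broken layer is reported.
run_checks() {
  local entry name cmd
  for entry in "$@"; do
    name="${entry%%::*}"   # text before the first "::"
    cmd="${entry#*::}"     # text after the first "::"
    if sh -c "$cmd" >/dev/null 2>&1; then
      echo "OK   $name"
    else
      echo "FAIL $name"
      return 1
    fi
  done
}

# Usage on a worker that stays Not Ready (commands are examples):
#   run_checks \
#     "gpu driver::nvidia-smi" \
#     "nfs mount::mountpoint -q /models" \
#     "worker log::journalctl -u gpustack --no-pager | grep -q 'Using cache dir: /models'"
```

Stopping at the first failure mirrors the layered dependency: the worker log is only worth reading once the GPU driver and the NFS mount are known to be healthy.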
Model deployment and operational verification #
Workers becoming Ready is not the only completion criterion.
Deploy a model from the GUI with two or more replicas, then verify that only the first node downloads it
and that the second and later nodes reuse the same shared file.
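One hedged way to confirm that every worker sees the same shared files is to compare a fingerprint of the cache directory from each node. The sketch below hashes the sorted list of relative paths and sizes, skipping in-progress .part files; the function name cache_fingerprint is hypothetical, and it relies on GNU find and sha256sum as shipped with Ubuntu. It shows that the workers share one cache, while the download-once behaviour itself is still best confirmed in the worker logs.

```shell
# Hypothetical helper: print a fingerprint of completed files in a cache
# directory, based on relative path and size (file contents are not read).
#   $1 cache directory (defaults to /models)
cache_fingerprint() {
  local dir="${1:-/models}"
  (
    cd "$dir" &&
      find . -type f ! -name '*.part' -printf '%P\t%s\n' |
      LC_ALL=C sort |
      sha256sum |
      cut -d' ' -f1
  )
}

# Usage: run on each worker after the deployment finishes and compare.
#   cache_fingerprint /models
```

Identical output on gpu-worker-01 and gpu-worker-02 indicates both are reading the same NFS-backed files rather than separate local copies.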