Building a GPUStack cluster on Bestnet Cloud × GPU-VPS over a 10G private network #
This guide uses Bestnet Cloud as the control plane and GPU-VPS as inference workers, with a shared NFS cache centralizing the model files.
It walks through client portal preparation, implementation on Ubuntu, and operational verification as a how-to.
Scope of this article #
This article covers the initial build pattern of launching AI-SRV on Bestnet Cloud to consolidate GPUStack Server + NFS,
and deploying multiple GPU workers on the GPU-VPS side.
Since role separation and shared cache design are the main themes, public IP addresses and internal naming conventions have been replaced with example values.
The core of the design is to place the control plane that does not require GPU and the model storage location on the general-purpose cloud side,
and treat GPU-VPS purely as inference execution nodes.
By doing this, even when adding workers, you can avoid duplicate storage investment,
and manage the actual model only in the shared area on the Bestnet Cloud side.
Model registration via GUI only
GPUStack Server + NFS integrated into AI-SRV
All workers share NFS mount point /models
The build pattern described here is organized around the 0.6.x series.
Map the Bestnet Cloud knowledge base onto the actual build sequence #
The Bestnet Cloud knowledge base breaks client portal operations down into detailed tasks.
For GPU cluster construction, deciding beforehand which knowledge base article to use at which stage reduces rework.
This article arranges the portal-side preparations in the following order.
- Portal Protection: Determine the approach to MFA and allowed IPs first
- SSH Key Registration: Set up management keys before OS deployment
- Template Creation: Prepare AI-SRV and GPU-VPS on Ubuntu 24
- Private Network: Connect all nodes to the 10G segment
- Firewall / NAT: Narrow down public scope and communication direction
- Snapshot: Keep a restore point at the initial state
Portal
Client Portal Security Settings #
Securing the management interface itself first ensures that subsequent server creation and permission management are secure.
Access
SSH Key Registration and Management #
Rather than manually inserting keys after server creation, determine key operations before template deployment.
Provision
Server Creation from Template #
Standardizing AI-SRV and GPU workers on the same base OS prevents variation in subsequent command procedures.
Network
Private Network #
By predetermining internal routes for NFS and cluster communication, you can minimize public ports.
Protect
Firewall Configuration Changes #
Keep only AI-SRV UI, NFS, and GPU worker SSH, following the policy of not exposing unnecessary public surfaces.
Snapshot Management #
Creating restore points after OS initialization, after GPU driver installation, and before app installation makes testing easier.
Scale
Cloud Resource Upgrade #
When capacity becomes insufficient, expanding the shared model area on the AI-SRV side first provides better cost efficiency.
Virtual Router NAT #
If you want to push GPU workers to private-only, you can also choose a configuration where only outbound communication goes through NAT.
Architecture approach #
The roles are simple. The AI-SRV on Bestnet Cloud side handles Web UI / API / NFS sharing,
while nodes on the GPU-VPS side focus on GPUStack Worker and inference execution only.
Keeping internal communication inside the 10G private network
moves model distribution, NFS, and cluster connections all to the private side.
BESTNET-CLOUD
control-01 #
AI-SRV integrating GPUStack Server + NFS
- Provides Web UI / API
- Export /srv/gpustack_models via NFS
- Bind mount it locally to /models
- Consolidate the actual model storage in one location
NFS / UI / Worker Join / internal communication
GPU-VPS
gpu-worker-01 / 02 #
Worker group sharing /models, handling inference only
- Launch GPUStack Worker
- Standardize on --cache-dir /models
- Each worker does not hold large model storage
- On horizontal expansion, add nodes with same pattern
→ First worker fetches the model only once
→ Other workers read from the same /models
Concentrate storage on AI-SRV side #
Capacity planning centers on the AI-SRV. Size the shared area generously, accounting for the models themselves, temporary downloads, and future model replacements.
Keep GPU-VPS lean #
Design the GPU worker side focusing on OS, GPU driver, GPUStack Worker, and minimal log area, without duplicating the model itself.
Concentrate communication on private side #
When UI, worker join, NFS, and model reuse are all completed on the private network, the public scope becomes small.
Items to decide beforehand #
| Role | Location | Hostname (example) | Private IP (example) | Main responsibilities |
|---|---|---|---|---|
| Server + NFS | Bestnet Cloud | control-01 | 10.10.0.10 | GPUStack UI / API, NFS, model storage |
| GPU Worker #1 | GPU-VPS | gpu-worker-01 | 10.10.0.21 | Inference execution, shared /models usage |
| GPU Worker #2 | GPU-VPS | gpu-worker-02 | 10.10.0.22 | Inference execution, shared /models usage |
Example values for public distribution. Adjust to your own naming rules and private subnet in production.
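If you prefer to address nodes by hostname, the example table can be turned into `/etc/hosts` entries. The following is a minimal sketch that uses only the example values above. The function name `hosts_entries` is hypothetical, and using `/etc/hosts` rather than internal DNS is an assumption, not something the original procedure prescribes.

```shell
# Print /etc/hosts lines for the example nodes in the table above.
# The IPs and hostnames are the article's example values, not real ones.
hosts_entries() {
  cat <<'EOF'
10.10.0.10 control-01
10.10.0.21 gpu-worker-01
10.10.0.22 gpu-worker-02
EOF
}

# Review the output first; append it on each node only once it looks right:
#   hosts_entries | sudo tee -a /etc/hosts
hosts_entries
```

The examples in this article keep using the raw private IPs, so this step is optional.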
Design checkpoints #
- Which nodes to give public IPs
- Whether to make workers private-only
- Whether to use NAT router or give individual outbound communication
- Initial capacity and expansion plan for shared model area
Storage estimation approach #
- Total size of active models
- Allowance for old and new models to temporarily coexist during switching
- Temporary file space, such as .part files
- Logs, operational files, and a buffer for future worker additions
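The items above can be combined into a rough first estimate. This is a sketch, not a sizing rule: the function name `estimate_shared_gb`, the sample figures, and the 20% buffer are all hypothetical values chosen for illustration.

```shell
# Rough sizing for the shared model area. All inputs are in GB.
#   $1 = total size of active models
#   $2 = allowance for old and new models coexisting during a switch
#   $3 = temporary download space
#   $4 = extra buffer in percent (logs, operational files, future workers)
estimate_shared_gb() {
  base=$(( $1 + $2 + $3 ))
  # Round the buffer up so the estimate never undershoots.
  echo $(( base + (base * $4 + 99) / 100 ))
}

# Hypothetical example: 300 GB active, 80 GB coexistence, 80 GB temp, 20% buffer
estimate_shared_gb 300 80 80 20   # prints 552
```

Treat the result as a lower bound for the AI-SRV data volume and revisit it whenever the set of active models changes.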
Solidify client portal preparations first #
Protect the client portal itself #
Before server creation, determine the management protection policy. Especially for team operations,
clarifying MFA and allowed IP handling first prevents later confusion about “who can access the management interface from where”.
Register SSH keys before creating servers #
Rather than inserting keys individually after AI-SRV and GPU worker creation, preparing keys beforehand in the client portal SSH key management
allows immediate management access after template deployment.
- Organize team operational public keys
- Don’t mix shared keys and individual keys in emergency situations
- Don’t post key fingerprints in published articles
Create AI-SRV on Bestnet Cloud from template #
Use Ubuntu 24 series template for AI-SRV, hosting both GPUStack Server and NFS.
Since this node also serves as model storage, design with sufficient storage area separate from the OS disk.
- Role is GPUStack Server + NFS
- Connect to private network
- Consider public side only if UI exposure is necessary
- Allocate more capacity for model storage on AI-SRV than on workers
Create workers on GPU-VPS in necessary quantities #
Since workers are for inference, select GPU-VPS plans based on GPU type, VRAM, and future model size.
On the other hand, since the model itself is kept on the NFS side, you can easily reduce worker storage to the minimum needed for OS and execution.
- Standardize on Ubuntu 24 series
- Create with GPU driver / CUDA available
- Connect to private network
- Consider private-only configuration if no public IP is needed
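When comparing plans against model size, it can help to total the VRAM a worker actually reports. The sketch below sums one MiB value per line from standard input, which is the format produced by `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits`. The function name `total_vram_mib` and the sample figures are hypothetical.

```shell
# Sum per-GPU memory values (MiB, one value per line) read from stdin.
total_vram_mib() {
  awk '{ sum += $1 } END { print sum + 0 }'
}

# On a worker with the NVIDIA driver installed:
#   nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | total_vram_mib
# Sample input standing in for two 24 GiB GPUs:
printf '24576\n24576\n' | total_vram_mib   # prints 49152
```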
Place all nodes on 10G private network #
NFS and GPUStack internal communication flow over the 10G private link connecting Bestnet Cloud and GPU-VPS.
This allows model retrieval and worker join to remain on the private side.
- Accommodate AI-SRV and all workers in the same private segment
- Assign fixed IPs to stabilize subsequent NFS export and systemd settings
- If workers need outbound communication, choose NAT router method or temporary public egress
If you want to push workers to private-only, using Bestnet Cloud’s virtual router / NAT pattern for OS updates and external fetching only is easier to manage.
Determine initial firewall rules #
A policy of “allow only minimum necessary inbound to AI-SRV, and give workers almost no public inbound” is sufficient.
In the portal firewall screen, first check existing rules, then add only necessary ports.
| Target | Port / Protocol | Source consideration | Purpose |
|---|---|---|---|
| AI-SRV | 22/tcp | Restrict to management source IPs only | SSH |
| AI-SRV | 80/tcp | Management network or private side | GPUStack Web UI / API |
| AI-SRV | 2049/tcp | Private subnet only | NFS v4 |
| AI-SRV | 111/tcp,111/udp | Private subnet only when needed | For configurations using rpcbind |
| GPU Worker | 22/tcp | Restrict to management source IPs only | SSH |
Assuming NFS v4, traffic can be consolidated on 2049/tcp, though some environments also use rpcbind. In either case, restrict these ports to the private subnet.
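The table describes rules for the portal firewall. If you also want matching host-level rules on AI-SRV, one option is `ufw`. The sketch below only prints the commands for review and does not apply them. Using `ufw` at all is an assumption, the function name `print_ai_srv_rules` is hypothetical, and `203.0.113.10` is a documentation-range placeholder for the management source IP.

```shell
# Print (do not run) ufw rules that mirror the AI-SRV rows of the table.
#   $1 = management source IP, $2 = private subnet in CIDR form
print_ai_srv_rules() {
  echo "ufw default deny incoming"
  echo "ufw allow from $1 to any port 22 proto tcp"
  echo "ufw allow from $2 to any port 80 proto tcp"
  echo "ufw allow from $2 to any port 2049 proto tcp"
}

print_ai_srv_rules 203.0.113.10 10.10.0.0/24
```

Printing first makes it easy to compare the rules against the portal settings before running each line with `sudo`. The rpcbind ports are left out because the table lists them only for configurations that need them.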
Take initial snapshot #
After OS initial creation, key injection, private network connection, and firewall setup are complete,
take snapshots of AI-SRV and GPU workers to make it easier to restore if middleware installation fails later.
Create OS baseline #
On both the server and the workers, set up time synchronization and the NFS packages first. Because the GPU workers will have the GPU driver / CUDA installed afterward,
finishing OS updates and any required reboot beforehand keeps the setup stable.
# All nodes
sudo apt update
sudo apt install -y nfs-common
# For NFS server (AI-SRV only)
sudo apt install -y nfs-kernel-server
Build GPUStack Server + NFS on AI-SRV #
Install GPUStack Server #
In the initial build example, we deploy GPUStack Server on AI-SRV to enable login to the management UI.
After noting the initial admin password, move it to a safe vault and do not post in the article.
curl -sfL https://get.gpustack.ai | sh -s -
cat /var/lib/gpustack/initial_admin_password
# Web UI (example)
# http://10.10.0.10/
Create NFS shared directory #
Place the actual model files in /srv/gpustack_models.
Export this to the private subnet only, and also bind mount it as /models on AI-SRV itself.
This makes the path visible to the server and the path visible to workers consistent.
sudo mkdir -p /srv/gpustack_models
echo "/srv/gpustack_models 10.10.0.0/24(rw,sync,no_subtree_check,no_root_squash)" | sudo tee -a /etc/exports
sudo exportfs -ra
# Server itself also bind mount
sudo mkdir -p /models
sudo mount --bind /srv/gpustack_models /models
echo "/srv/gpustack_models /models none bind 0 0" | sudo tee -a /etc/fstab
Issue worker join token #
Create a worker join token from the GPUStack GUI. In public articles, express as <WORKER_JOIN_TOKEN>
and do not post actual values. Also, limiting GUI login sources to the private network side is more secure.
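To keep the real token out of shell history and copied commands, one approach is to read it from a file that only its owner can access. This is a sketch under assumptions: the function name `load_join_token` and the file path are hypothetical, and `stat -c` assumes GNU coreutils as shipped with Ubuntu.

```shell
# Print the join token stored in a file, but only if the file mode is 600.
load_join_token() {
  mode=$(stat -c '%a' "$1") || return 1
  if [ "$mode" != "600" ]; then
    echo "refusing to read $1: mode is $mode, expected 600" >&2
    return 1
  fi
  # Strip the trailing newline so the value can be passed as an argument.
  tr -d '\n' < "$1"
}

# Hypothetical usage with the worker install command from this article:
#   curl -sfL https://get.gpustack.ai | sh -s - \
#     --server-url http://10.10.0.10 \
#     --token "$(load_join_token /root/.gpustack_join_token)"
```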
Mount shared cache on GPU-VPS side and have workers join #
First mount /models via NFS #
Each worker mounts the shared area exported by AI-SRV at the same path /models.
This makes the cache path for all workers consistent, preventing duplicate model storage.
sudo mkdir -p /models
echo "10.10.0.10:/srv/gpustack_models /models nfs defaults 0 0" | sudo tee -a /etc/fstab
sudo mount -a
After mounting, verify with mount | grep /models that the intended shared location is visible.
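For a scripted version of that check, the mount table can be tested for the exact export rather than read by eye. The sketch below reads lines in `/proc/mounts` format from standard input; the function name `models_mount_ok` is hypothetical, and the sample line is illustrative rather than captured from a real worker.

```shell
# Succeed only if /models is an NFS mount of the expected export.
# Reads /proc/mounts-style lines (device, mount point, type, ...) on stdin.
models_mount_ok() {
  awk -v src="$1" '
    $2 == "/models" && $1 == src && ($3 == "nfs" || $3 == "nfs4") { found = 1 }
    END { exit found ? 0 : 1 }
  '
}

# On a worker:
#   models_mount_ok 10.10.0.10:/srv/gpustack_models < /proc/mounts && echo "shared /models OK"
# Illustrative sample line:
printf '%s\n' '10.10.0.10:/srv/gpustack_models /models nfs4 rw,relatime 0 0' \
  | models_mount_ok 10.10.0.10:/srv/gpustack_models && echo "shared /models OK"
```

Checking the source as well as the mount point catches the case where /models exists as an empty local directory because the NFS mount failed.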
Install GPUStack Worker #
Use the token issued on the AI-SRV side to have each GPU-VPS join the cluster as a worker.
curl -sfL https://get.gpustack.ai | sh -s - --server-url http://10.10.0.10 --token <WORKER_JOIN_TOKEN>
Pin --cache-dir /models in worker service #
This setting is the most important one in a shared cache configuration. Add --cache-dir /models to the GPUStack Worker systemd definition, and pin the data-dir as well.
[Service]
ExecStart=/root/.local/bin/gpustack start --server-url http://10.10.0.10 --token <WORKER_JOIN_TOKEN> --cache-dir /models --data-dir /var/lib/gpustack-data
sudo systemctl daemon-reload
sudo systemctl enable --now gpustack
# If "Using cache dir: /models" appears, OK
journalctl -u gpustack -f
Verify workers are Ready in GPUStack GUI #
In the GUI Workers screen, verify all GPU-VPS nodes are Ready.
If they remain Not Ready, troubleshoot in order: GPU driver / CUDA, NFS mount, worker logs.
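For the worker-log step, the "Using cache dir: /models" message mentioned earlier can be checked non-interactively. This is a minimal sketch: the function name `cache_dir_confirmed` is hypothetical, and the sample log line only illustrates the match, so the real log prefix may differ.

```shell
# Succeed if the log text on stdin reports the shared cache directory.
cache_dir_confirmed() {
  grep -q 'Using cache dir: /models'
}

# On a worker:
#   journalctl -u gpustack --no-pager | cache_dir_confirmed && echo "cache dir OK"
# Illustrative sample line:
printf '%s\n' 'INFO Using cache dir: /models' | cache_dir_confirmed && echo "cache dir OK"
```

If the check fails on a worker that is otherwise running, the systemd definition is the first place to look, since a missing --cache-dir means the worker is caching models locally.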
Model deployment and operational verification #
The completion criterion is not just workers becoming Ready.
Deploy a model from the GUI with 2 or more replicas, then verify that only the first node downloads it
and that the second and later nodes reuse the same shared file.
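One way to watch this from any node is to look for in-progress temporary files in the shared directory. The sketch assumes that downloads leave `.part` files while they are running, as the storage estimate earlier suggests. The function name `pending_downloads` is hypothetical, and the demo uses a throwaway directory instead of the real /models.

```shell
# List in-progress temporary files (*.part) under a cache directory.
pending_downloads() {
  find "$1" -type f -name '*.part'
}

# On AI-SRV or a worker:
#   pending_downloads /models
# Demo against a throwaway directory:
demo_dir=$(mktemp -d)
touch "$demo_dir/model.gguf.part" "$demo_dir/model.gguf"
pending_downloads "$demo_dir"
rm -rf "$demo_dir"
```

Because every node sees the same export, each in-progress file should appear only once during the first download, and the list should be empty before the second replica starts reading the finished file.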