Deep Dive: Enterprise BeeGFS Deployment & Tuning Guide
Abstract: BeeGFS (formerly FhGFS) is a parallel file system widely used in High-Performance Computing (HPC). Compared to Lustre, it is lighter-weight, easier to manage, and handles concurrent small-file workloads well. Based on large-scale production delivery experience, this document details how to build a BeeGFS cluster supporting PB-scale capacity and Tbps-scale throughput on standard x86 servers.
1. Architecture Design Philosophy
When designing high-performance storage systems, we are not just installing software; we are designing the path of data flow.
1.1 Core Component Logic
BeeGFS adopts a decoupled architecture, mainly consisting of four services:
- Management Service (Mgmtd):
- Role: The "Registry Center" of the cluster, maintaining the status and ID mapping of all service nodes.
- Characteristic: Extremely low load, but critical. If it goes down, the cluster cannot accept new connections (existing connections may persist).
- Metadata Service (Meta):
- Role: Stores the directory tree, permissions, attributes, and data stripe location information.
- Bottleneck: In LOSF (Lots Of Small Files) scenarios, Meta IOPS is the core bottleneck. NVMe SSDs are strongly recommended.
- Storage Service (Storage):
- Role: Stores actual data chunks.
- Strategy: Data is sliced into fixed-size chunks (512KB by default) and distributed across different Storage Targets (see the striping example after this list).
- Client Service (Client):
- Role: Runs on compute nodes, loaded as a kernel module, mapping distributed storage resources to a local POSIX mount point.
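As a concrete illustration of the chunk/stripe behavior described above, the stripe layout of any file or directory can be inspected and changed from a client with `beegfs-ctl`; the paths below are illustrative, not part of the deployment that follows:

```bash
# Show chunk size, number of targets and the target IDs used for an existing file
beegfs-ctl --getentryinfo /mnt/beegfs/dataset/sample.bin

# New files created in this directory will use 1 MB chunks spread across 4 targets
beegfs-ctl --setpattern --chunksize=1m --numtargets=4 /mnt/beegfs/dataset
```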
1.2 Advanced Architecture: Single-Service Multi-Instance (Multi-Mode)
In modern servers, a single process often cannot saturate PCIe 4.0/5.0 or 100Gb+ network bandwidth. To maximize performance, we recommend the "Single Node Multi-Instance" deployment mode:
- Principle: Start multiple `beegfs-meta` or `beegfs-storage` processes on the same physical machine.
- Advantages:
- NUMA Affinity: Bind different instances to different CPU NUMA nodes to reduce cross-socket memory access (a systemd drop-in sketch follows after this list).
- Concurrent Queues: Increase the processing queues for network requests to saturate network card bandwidth.
- Planning Example: A server configured with 2 NVMe drives for Meta and 2 RAID6 groups for Storage. Deploy 2 Meta instances and 2 Storage instances.
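One way to realize the NUMA affinity above is a systemd drop-in that prefixes each instance's daemon with `numactl`. This is a minimal sketch, not the stock unit file: the ExecStart shown is illustrative and should be copied from `systemctl cat beegfs-meta@meta01` on your system, with only the numactl prefix added.

```bash
# Inspect the NUMA topology first
numactl --hardware

# Drop-in that pins Meta instance 1 to NUMA node 0 (CPUs and memory)
mkdir -p /etc/systemd/system/beegfs-meta@meta01.service.d
cat > /etc/systemd/system/beegfs-meta@meta01.service.d/numa.conf <<'EOF'
[Service]
ExecStart=
# Illustrative: copy the real ExecStart from "systemctl cat beegfs-meta@meta01"
# and prepend the numactl call.
ExecStart=/usr/bin/numactl --cpunodebind=0 --membind=0 /opt/beegfs/sbin/beegfs-meta cfgFile=/etc/beegfs/meta01.d/beegfs-meta.conf
EOF
systemctl daemon-reload && systemctl restart beegfs-meta@meta01
```

The second Meta instance and the two Storage instances get equivalent drop-ins bound to the other NUMA node.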
2. Infrastructure & Environment Preparation
2.1 Hardware Selection Suggestions
- Metadata Node (MDS):
- CPU: High frequency, fewer cores (Meta operations are sensitive to single-core frequency).
- Disk: Must use SSD/NVMe. RAID1 for OS, RAID1/10 for Meta data.
- Storage Node (OSS):
- CPU: Many cores (to handle massive concurrent I/O requests).
- Disk: Large capacity HDD (RAID6 10+2 or 16+2) or All-Flash. RAID controller must have Super Capacitor, Cache policy set to `Always Write Back`.
- Network:
- Management Plane: 1GbE/10GbE TCP.
- Data Plane: InfiniBand (EDR/HDR/NDR) or RoCEv2 (100G/200G/400G).
2.2 OS Tuning (Critical)
Execute the following operations on all storage nodes to reduce system jitter.
1. Disable System Interference
# Stop Firewall and NetworkManager
systemctl stop firewalld && systemctl disable firewalld
systemctl stop NetworkManager && systemctl disable NetworkManager
# Disable SELinux
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
setenforce 0
2. I/O Scheduler Optimization
For SSD/NVMe, use `noop` or `none`; for HDD RAID, use `deadline` (or `mq-deadline` on blk-mq kernels). A persistent udev rule sketch follows after step 4.
# Example: Set sdb (SSD) to none
echo none > /sys/block/sdb/queue/scheduler
3. Virtual Memory Parameters
Lower the kernel's tendency to swap out process memory so that it reclaims page cache first.
sysctl -w vm.swappiness=1
sysctl -w vm.vfs_cache_pressure=100
echo "vm.swappiness=1" >> /etc/sysctl.conf4. Prepare Yum Repos Ensure `beegfs-mgmtd`, `beegfs-meta`, `beegfs-storage`, `beegfs-client`, `beegfs-helperd`, `beegfs-utils` packages are installed.
3. Deployment: Multi-Instance Mode
Assume the physical machine hostname is `storage01`, planned as follows:
- `/dev/nvme0n1` (2TB): 10GB for Mgmtd, remainder for Meta Instance 1.
- `/dev/nvme1n1` (2TB): All for Meta Instance 2.
- `/dev/sdc` (RAID6): For Storage Instance 1.
- `/dev/sdd` (RAID6): For Storage Instance 2.
3.1 Management Service (Mgmtd) Deployment
# 1. Partition, format and mount (10GB for Mgmtd; the remainder becomes nvme0n1p2 for Meta Instance 1)
parted -s /dev/nvme0n1 mklabel gpt mkpart primary 0% 10GB mkpart primary 10GB 100%
mkfs.ext4 /dev/nvme0n1p1
mkdir -p /data/beegfs/mgmtd
mount /dev/nvme0n1p1 /data/beegfs/mgmtd
# 2. Initialize Service
/opt/beegfs/sbin/beegfs-setup-mgmtd -p /data/beegfs/mgmtd -f
# 3. Start and Enable
systemctl enable beegfs-mgmtd --now
3.2 Metadata Service (Meta) - Multi-Instance
Instance 1 (Meta01):
# 1. Format (Recommend ext4 for small file performance, large inode)
mkfs.ext4 -i 2048 -I 512 -J size=400 -Odir_index,filetype /dev/nvme0n1p2
mkdir -p /data/beegfs/meta01
mount -o noatime,nodiratime,nobarrier /dev/nvme0n1p2 /data/beegfs/meta01
# 2. Initialize (Specify ServiceID=1, Port=8200)
/opt/beegfs/sbin/beegfs-setup-meta -p /data/beegfs/meta01 -s 1 -S meta01 -m YOUR_MGMT_IP -f
# 3. Modify Port (Critical: Avoid Conflict)
sed -i 's/connMetaPortTCP = 8005/connMetaPortTCP = 8200/g' /etc/beegfs/meta01.d/beegfs-meta.conf
sed -i 's/connMetaPortUDP = 8005/connMetaPortUDP = 8200/g' /etc/beegfs/meta01.d/beegfs-meta.conf
# 4. Start
systemctl enable beegfs-meta@meta01 --now
Instance 2 (Meta02): Repeat the same steps using `/dev/nvme1n1`, mount point `/data/beegfs/meta02`, ServiceID=`2`, Port=`8201`.
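Note that the mount commands above do not survive a reboot. A minimal /etc/fstab sketch for the Mgmtd and Meta volumes, with device names, mount points, and options taken from this guide's example (UUID= entries are safer in production; the storage volumes in 3.3 need equivalent entries):

```bash
# /etc/fstab - example entries for the volumes created above
/dev/nvme0n1p1  /data/beegfs/mgmtd   ext4  defaults,noatime              0 0
/dev/nvme0n1p2  /data/beegfs/meta01  ext4  noatime,nodiratime,nobarrier  0 0
/dev/nvme1n1    /data/beegfs/meta02  ext4  noatime,nodiratime,nobarrier  0 0
```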
3.3 Storage Service (Storage) - Multi-Instance
XFS is recommended for data disks due to better performance with large files and parallel I/O.
Instance 1 (Stor01):
# 1. Format XFS (align to RAID geometry; assume a 128k stripe unit and 10 data disks)
mkfs.xfs -f -d su=128k,sw=10 -l version=2,su=128k -i size=512 /dev/sdc
# 2. Mount (High Performance Params)
mkdir -p /data/beegfs/stor01
mount -o noatime,nodiratime,nobarrier,logbufs=8,logbsize=256k,largeio,inode64,swalloc,allocsize=131072k /dev/sdc /data/beegfs/stor01
# 3. Initialize (TargetID=101, Port=8300)
/opt/beegfs/sbin/beegfs-setup-storage -p /data/beegfs/stor01 -s 1 -S stor01 -i 101 -m YOUR_MGMT_IP -f
# 4. Modify Port
sed -i 's/connStoragePortTCP = 8003/connStoragePortTCP = 8300/g' /etc/beegfs/stor01.d/beegfs-storage.conf
sed -i 's/connStoragePortUDP = 8003/connStoragePortUDP = 8300/g' /etc/beegfs/stor01.d/beegfs-storage.conf
# 5. Start
systemctl enable beegfs-storage@stor01 --now
Instance 2 (Stor02): Repeat the same steps using `/dev/sdd`, mount point `/data/beegfs/stor02`, ServiceID=`2`, TargetID=`201`, Port=`8301`.
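With both storage instances started, it is worth confirming that every service and target has registered with the management daemon before moving on to the clients. A quick check, run from a node whose `/etc/beegfs/beegfs-client.conf` points at the management host (output format varies between versions):

```bash
# Registered metadata and storage services
beegfs-ctl --listnodes --nodetype=meta
beegfs-ctl --listnodes --nodetype=storage

# Target IDs, their owning storage services, and reachability/consistency state
beegfs-ctl --listtargets --nodetype=storage --state
```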
4. High-Performance Client Mounting
Client performance directly impacts AI training efficiency.
4.1 Enable RDMA (InfiniBand/RoCE)
Default installation only supports TCP. For IB environments, client modules must be rebuilt.
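Before touching the autobuild configuration, it helps to confirm that the OFED kernel headers are actually present at the path referenced below; a quick sanity check, assuming the MLNX_OFED default location:

```bash
ls /usr/src/ofa_kernel/default/include/rdma/ib_verbs.h \
  || echo "OFED headers not found - install MLNX_OFED or adjust OFED_INCLUDE_PATH"
```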
Edit Autobuild Config:
vi /etc/beegfs/beegfs-client-autobuild.conf
# Modify as follows:
buildArgs=-j8 BEEGFS_OPENTK_IBVERBS=1 OFED_INCLUDE_PATH=/usr/src/ofa_kernel/default/include/
Execute Rebuild:
/etc/init.d/beegfs-client rebuild
4.2 Mount & Verify
# Initialize
/opt/beegfs/sbin/beegfs-setup-client -m YOUR_MGMT_IP
# Start Services
systemctl start beegfs-helperd
systemctl start beegfs-client
# Check Connection Topology
beegfs-net
5. Advanced Configuration
5.1 Buddy Mirror (High Availability)
BeeGFS Buddy Mirroring provides software-based data redundancy (comparable to RAID10 across servers): data remains available even if one target in a buddy group fails completely.
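Once the buddy groups from the steps below have been created, the pairing can be double-checked from an admin node; a small verification sketch (the group and target IDs will match the examples used in this guide):

```bash
# Metadata and storage buddy groups with their primary/secondary members
beegfs-ctl --listmirrorgroups --nodetype=meta
beegfs-ctl --listmirrorgroups --nodetype=storage
```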
Metadata Mirror (Meta Mirror):
# 1. Stop all Clients
systemctl stop beegfs-client
# 2. Create Mirror Group (Automatic Pairing)
beegfs-ctl --addmirrorgroup --automatic --nodetype=meta
# 3. Activate Mirror
beegfs-ctl --mirrormd
# 4. Restart Meta Services
systemctl restart beegfs-meta@meta01
systemctl restart beegfs-meta@meta02
Storage Mirror: Flexible; it can be enabled for specific directories only.
# 1. Create Mirror Group (Pair ID 101 and 201)
beegfs-ctl --addmirrorgroup --nodetype=storage --primary=101 --secondary=201
# 2. Enable Mirror for Critical Directory
beegfs-ctl --setpattern --pattern=buddymirror /mnt/beegfs/critical_data
5.2 Quota Management
Prevents a single user from filling up the entire storage pool.
Steps:
- Server: Set `quotaEnableEnforcement = true` in Mgmt/Meta/Storage configs.
- Storage Mount: Add mount options `uqnoenforce,gqnoenforce`.
- Client: Set `quotaEnabled = true` in `beegfs-client.conf`.
- Init: Run `beegfs-fsck --enablequota`.
- Set Limit (a usage check follows below):
beegfs-ctl --setquota --uid user1 --sizelimit=10T --inodelimit=1000000
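To confirm that the limit is in effect and to watch consumption over time, quota usage can be queried per user; for example (user name taken from the command above):

```bash
# Current space/inode usage and limits for a single user
beegfs-ctl --getquota --uid user1
```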
5.3 BeeOND (On-Demand Burst Buffer)
AI training generates large numbers of small random I/Os. BeeOND (BeeGFS On Demand) uses the compute nodes' local memory or NVMe drives to build a temporary BeeGFS instance that serves as a burst buffer.
Start Command:
# Build temp FS on node01-node10 using /local/nvme
beeond start -n nodefile -d /local/nvme -c /mnt/beeond -P
Data Warm-up:
beeond-cp copy -n nodefile /mnt/beegfs/imagenet /mnt/beeond/imagenet
6. Operations & Troubleshooting
6.1 Quick Command Reference
| Command | Function |
|---|---|
| `beegfs-df` | Check Target capacity and inode usage |
| `beegfs-ctl --listtargets --state` | Check Target online status |
| `beegfs-check-servers` | Check connectivity of all services |
| `beegfs-net` | View currently established RDMA/TCP connections |
| `beegfs-ctl --getentryinfo <file>` | View a file's stripe distribution |
6.2 Common Issues
- Target Offline: Usually caused by disk failure or network interruption. Check beegfs-storage.log. If the disk has been replaced, reuse the original TargetID so the target can be recovered.
- Client Cannot Mount: Check that port 8008 from the client to Mgmtd is open, and check for a kernel version mismatch (which requires rebuilding the client module).
- Low Performance: Check whether TCP is being used instead of RDMA (inspect with beegfs-net), and check for cross-NUMA-node access (see the triage sketch below).
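A minimal triage sequence for the issues above (all commands ship with `beegfs-utils` or the base OS; interpreting the output depends on your topology):

```bash
# 1. Are all storage targets online and consistent?
beegfs-ctl --listtargets --nodetype=storage --state

# 2. Is the client really using RDMA? Each connection should show RDMA, not TCP
beegfs-net

# 3. Can every registered service be reached?
beegfs-check-servers

# 4. NUMA layout on the server, to spot cross-socket placement
numactl --hardware
```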
