Practical Guide: IBM Spectrum Scale (GPFS) ECE Deployment
Abstract: Based on real-world production implementation records, this document details how to build a highly reliable, high-performance parallel file system on x86 servers with IBM Spectrum Scale Erasure Code Edition (ECE).
1. Architecture Planning & Prerequisites
ECE (Erasure Code Edition) provides data protection on commodity x86 servers through distributed erasure coding, eliminating the need for expensive proprietary storage arrays.
1.1 Hardware Configuration Example
- Storage Nodes: 4x High-Performance Servers (NVMe SSDs for data tier, HDD for capacity tier).
- Network: InfiniBand (IB) HDR/NDR for data plane, Gigabit Ethernet for management plane.
- OS: CentOS 8.x / RHEL 8.x
1.2 Environment Initialization (All Nodes)
1. Disable Firewall & SELinux
systemctl stop firewalld && systemctl disable firewalld
sed -i "s/^SELINUX=.*/SELINUX=disabled/g" /etc/sysconfig/selinux
setenforce 0
2. Configure Trust & Hosts
Ensure /etc/hosts contains all node resolutions and configure root passwordless SSH.
# Distribute keys
ssh-copy-id gpfs01; ssh-copy-id gpfs02; ssh-copy-id gpfs03; ssh-copy-id gpfs04
3. Optimize System Parameters
echo "ulimit -n 65536" >> /etc/profile
# Essential Dependencies
yum -y install gcc-c++ kernel-devel cpp binutils compat-libstdc++-33
yum -y install python3 python3-distro ansible nvme-cli sg3_utils net-snmp
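Before moving on to the network, it is worth confirming that name resolution and passwordless SSH actually work from the installer node to every node. A minimal check, assuming the example hostnames gpfs01-gpfs04 used above:
# Verify passwordless SSH and hostname resolution from the installer node
for h in gpfs01 gpfs02 gpfs03 gpfs04; do
  ssh -o BatchMode=yes "$h" "hostname; date" || echo "SSH to $h failed"
done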
2. InfiniBand Network Configuration
The core of high-performance storage lies in low-latency networking.
1. Upgrade Firmware (MFT Tool)
# Start MFT service
mst start
# Burn firmware
flint -d /dev/mst/mt4123_pciconf0 -i fw-ConnectX6-rel.bin burn
# Reset device
mlxfwreset --device /dev/mst/mt4123_pciconf0 reset
2. Install OFED Driver
./mlnxofedinstall --force
/etc/init.d/openibd restart
ibstat # Ensure state is LinkUp
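Before installing Spectrum Scale, it can save troubleshooting time later to validate raw fabric bandwidth between two nodes with the perftest suite (assumed to be installed with OFED; adjust the device name to match the ibstat output):
# On gpfs01 (server side)
ib_write_bw -d mlx5_0
# On gpfs02 (client side), point at the server node
ib_write_bw -d mlx5_0 gpfs01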
3. Spectrum Scale Software Installation
Using Ansible Toolkit for automated deployment.
1. Initialize Cluster
cd /usr/lpp/mmfs/5.1.5.1/ansible-toolkit
# Setup primary installer node
./spectrumscale setup -s 10.252.0.21 -st ece
2. Add Nodes
# Add 4 storage nodes (Quorum + Manager)
./spectrumscale node add -a -q -m -so gpfs01
./spectrumscale node add -a -q -m -so gpfs02
./spectrumscale node add -a -q -m -so gpfs03
./spectrumscale node add -a -q -m -so gpfs04
3. Execute Installation
./spectrumscale install --skip no-ece-check
./spectrumscale deploy --skip no-ece-check
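After the install and deploy phases finish, confirm that the cluster has formed and every daemon is active using the standard status commands:
# Verify cluster membership and daemon state on all nodes
mmlscluster
mmgetstate -a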
4. ECE Storage Core Configuration
This is the critical step defining how disks are sliced and protected.
4.1 Specify RDMA Network
# Force communication over IB ports
mmchconfig verbsPorts="mlx5_0" -N ece_cluster
mmshutdown -a && mmstartup -a
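RDMA is only used when verbsRdma is also enabled; most deployments set it alongside verbsPorts (confirm against the documentation for your Spectrum Scale release), and the effective settings can be checked after the restart:
# Enable RDMA verbs if not already enabled (commonly required in addition to verbsPorts)
mmchconfig verbsRdma=enable -N ece_cluster
# Confirm the effective settings and the network status
mmlsconfig | grep -E "verbsRdma|verbsPorts"
mmdiag --network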
4.2 Drive Mapping
ECE needs to know the exact physical slot of each drive.
# Auto-scan NVMe
ecedrivemapping --mode nvme
# Or manually specify HDD Slots
ecedrivemapping --mode lmr --slot-range 0-23
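A quick sanity check that every node actually sees the expected number of drives before mapping them (nvme-cli was installed in the prerequisites; drive counts are deployment-specific):
# List NVMe namespaces and all local block devices on each node
nvme list
lsblk -d -o NAME,SIZE,ROTA,MODEL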
4.3 Create Recovery Group (RG)
# Create Node Class
mmvdisk nodeclass create --node-class nc_1 -N gpfs01,gpfs02,gpfs03,gpfs04
# Configure Servers
mmvdisk server configure --node-class nc_1 --recycle one
# Create RG
mmvdisk recoverygroup create --recovery-group rg_1 --node-class nc_1 --complete-log-format
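Before carving vdisk sets, confirm that the recovery group exists and that the server disk topology is recognized; mmvdisk provides listing commands for both:
# Verify the recovery group and the server disk topology
mmvdisk recoverygroup list
mmvdisk server list --node-class nc_1 --disk-topology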
4.4 Define VDisk Sets
The data tier uses a 4+2P (4 Data + 2 Parity) erasure code; the metadata tier uses 3-way replication.
# 1. Metadata Tier (3-Way Replication)
mmvdisk vdiskset define --vdisk-set vs-meta --recovery-group rg_1 --code 3WayReplication --block-size 1m --nsd-usage metadataonly --storage-pool system --set-size 2%
# 2. Data Tier (4+2P)
mmvdisk vdiskset define --vdisk-set vs-data --recovery-group rg_1 --code 4+2p --block-size 8m --nsd-usage dataonly --storage-pool data-pool --set-size 90%
# 3. Create
mmvdisk vdiskset create --vdisk-set vs-meta
mmvdisk vdiskset create --vdisk-set vs-data
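mmvdisk reports the capacity each vdisk set will consume, which is worth reviewing against the 2%/90% sizing chosen above:
# Review defined/created vdisk sets and their sizing
mmvdisk vdiskset list --vdisk-set all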
4.5 Create File System
mmvdisk filesystem create --file-system gpfs01 --vdisk-set vs-meta
mmvdisk filesystem add --file-system gpfs01 --vdisk-set vs-data
mmchfs gpfs01 -T /gpfs01
mmmount gpfs01 -a
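Once mounted, verify the mount points and per-pool capacity; mmdf should reflect the system/data-pool split defined earlier:
# Verify mounts and pool capacity
mmlsmount gpfs01 -L
mmdf gpfs01
df -h /gpfs01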
5. Client Mounting & Testing
5.1 Client Installation
Clients do not need ECE licenses, only Client licenses.
# Install packages (deb example shown for Debian/Ubuntu clients; use the equivalent rpm packages on RHEL clients)
dpkg -i gpfs.base*.deb gpfs.gpl*.deb gpfs.msg.en*.deb ...
# Build kernel module
mmbuildgpl
5.2 Join Cluster
Execute on Manager Node:
mmaddnode -N client01
mmchlicense client --accept -N client01
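After accepting the license, start the daemon on the client and mount the file system (standard commands, using the client01 hostname from above):
# Start GPFS on the client and mount the file system
mmstartup -N client01
mmmount gpfs01 -N client01
mmgetstate -N client01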
5.3 Performance Test (IOR)
# Sequential write: 16 GB per process (-b), 4 MB transfer size (-t)
mpirun -np 16 --hostfile hosts ior -w -t 4m -b 16g -F -o /gpfs01/test
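A matching read pass can be run in the same invocation by keeping the test files; -w, -r and -k are standard IOR flags:
# Sequential write followed by sequential read, keeping the files (-k)
mpirun -np 16 --hostfile hosts ior -w -r -k -t 4m -b 16g -F -o /gpfs01/test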
6. Maintenance Commands
- Check Health: mmgetstate -a
- Check Physical Disks: mmvdisk pdisk list --recovery-group rg_1
- Check NSD Distribution: mmlsnsd -M
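For a broader, component-level view (disks, network, daemons), the standard mmhealth framework can supplement the checks above:
- Check Cluster Health: mmhealth cluster show
- Check Node Health: mmhealth node show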
