[HPC From Scratch] Episode 4: NFS Storage & FreeIPA: One Drive, One Login

17 minute read

One drive. One login. Every node sees the same home directory.

Welcome back to HPC From Scratch. In Episode 3, we set up the network, installed Rocky Linux on all six nodes, configured DHCP and NAT, and hardened SSH. The cluster is networked and secured. Now it needs two things before Slurm makes any sense: shared storage and centralized authentication.

Without these two pieces, you are manually copying files to every node and creating the same user account six times. This episode fixes both problems.

(Click the image to watch the tutorial on YouTube)

Table of Contents

1. Why Shared Storage Matters
2. Ansible Setup
3. NFS Server Setup
4. NFS Client Setup
5. Time Synchronization (Chrony)
6. The Problem with Local Users
7. FreeIPA Server Installation
8. FreeIPA Client Enrollment
9. Verification
10. What is Next

> 1. Why Shared Storage Matters

Without NFS, submitting an MPI job across two nodes means your input data has to exist on both nodes. You either copy it manually or write a script to sync it. Neither is sustainable.

With NFS, the Samsung 990 Pro on arbiter (the management node) exports a single /home directory. Every node in the cluster mounts it. Write a script on the login node, run it from any compute node. The file is already there.

NFS and FreeIPA diagram

This also matters for Slurm. When a job writes output files, they land in /home on the NFS share. You do not need to SSH into compute nodes to retrieve results.

Prerequisites

Before starting this episode:

  • All nodes are running Rocky Linux 9 with network configured (Episode 3)
  • arbiter has the Samsung 990 Pro NVMe drive installed (Episode 2)
  • SSH key-based login is working from arbiter to all other nodes


> 2. Ansible Setup

From this episode onward, we use Ansible to apply configuration across all nodes at once. Without it, every change means SSHing into six machines individually.

Ansible runs from arbiter. We keep it in /opt/ansible rather than a home directory so it stays off the NFS share. Ansible configuration files contain SSH keys and vault passwords that should not be visible to every node in the cluster.

Install Ansible

[wpaik@arbiter ~]$ sudo dnf install ansible-core
[wpaik@arbiter ~]$ sudo mkdir -p /opt/ansible
[wpaik@arbiter ~]$ sudo chown wpaik:wpaik /opt/ansible
[wpaik@arbiter ~]$ cd /opt/ansible

SSH Key

Generate a dedicated key for Ansible and distribute it to all nodes:

[wpaik@arbiter ansible]$ mkdir .ssh
[wpaik@arbiter ansible]$ ssh-keygen -t ed25519 -f .ssh/worker_ed25519 -N ""

[wpaik@arbiter ansible]$ for node in 192.168.50.1 192.168.50.15 192.168.50.32 192.168.50.11 192.168.50.19; do
    ssh-copy-id -i .ssh/worker_ed25519.pub wpaik@$node
  done

Inventory and Config

Create hosts.ini:

[head]
carrier.cluster.local ansible_host=192.168.50.1

[management]
arbiter.cluster.local ansible_host=192.168.50.50 ansible_connection=local

[workers]
interceptor-01.cluster.local ansible_host=192.168.50.15
interceptor-02.cluster.local ansible_host=192.168.50.32

[gpu]
corsair-01.cluster.local ansible_host=192.168.50.11

[visualization]
observer.cluster.local ansible_host=192.168.50.19

[compute:children]
workers
gpu

[all_nodes:children]
head
management
workers
gpu
visualization

[all_nodes:vars]
ansible_user=wpaik
cluster_network=192.168.50.0/24
cluster_domain=cluster.local
cluster_realm=CLUSTER.LOCAL

Note that arbiter uses ansible_connection=local since it is the Ansible controller itself.

Create ansible.cfg:

[defaults]
private_key_file    = /opt/ansible/.ssh/worker_ed25519
inventory           = ./hosts.ini
host_key_checking   = False
log_path            = ./log/ansible.log
vault_password_file = /opt/ansible/.ansible_vault_pw
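
Two paths referenced in this config do not exist yet: the log directory and the vault password file. Assuming you keep both under /opt/ansible as written above, create them before the first run (the password string is a placeholder, pick your own):

[wpaik@arbiter ansible]$ mkdir -p log
[wpaik@arbiter ansible]$ echo '<your_vault_password>' > .ansible_vault_pw
[wpaik@arbiter ansible]$ chmod 600 .ansible_vault_pw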

Verify connectivity:

[wpaik@arbiter ansible]$ ansible all -m ping
carrier.cluster.local | SUCCESS => { "ping": "pong" }
arbiter.cluster.local | SUCCESS => { "ping": "pong" }
interceptor-01.cluster.local | SUCCESS => { "ping": "pong" }
interceptor-02.cluster.local | SUCCESS => { "ping": "pong" }
corsair-01.cluster.local | SUCCESS => { "ping": "pong" }
observer.cluster.local | SUCCESS => { "ping": "pong" }

All six nodes responding. From here on, playbooks handle the repetitive work.
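
The inventory groups also work for ad-hoc commands, which is handy for quick checks before writing a playbook. For example, to run a one-off command on the compute nodes only:

[wpaik@arbiter ansible]$ ansible compute -m command -a "uptime"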


> 3. NFS Server Setup

All commands in this section run on arbiter.

Partition the NVMe Drive with LVM

A single large partition works, but LVM gives us the flexibility to allocate separate volumes for home directories, work storage, shared software, and scratch space. This mirrors how storage is typically organized on a real HPC cluster.

First, verify the NVMe drive:

[wpaik@arbiter ~]$ lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda      8:0    0 223.6G  0 disk
├─sda1   8:1    0   600M  0 part /boot/efi
├─sda2   8:2    0     1G  0 part /boot
└─sda3   8:3    0   222G  0 part
  ├─rl-root  253:0  0  70G  0 lvm  /
  └─rl-swap  253:1  0 7.7G  0 lvm  [SWAP]
nvme0n1  259:0  0 931.5G  0 disk

The SATA boot drive is sda. The NVMe is nvme0n1. Create a physical volume, volume group, and four logical volumes:

# Install LVM tools
$ sudo dnf install -y lvm2

# Create physical volume and volume group
$ sudo pvcreate /dev/nvme0n1
$ sudo vgcreate vg_nfs /dev/nvme0n1

# Create logical volumes
$ sudo lvcreate -L 167G -n lv_home    vg_nfs
$ sudo lvcreate -L 251G -n lv_work    vg_nfs
$ sudo lvcreate -L  84G -n lv_shared  vg_nfs
$ sudo lvcreate -L 251G -n lv_scratch vg_nfs

# Format as XFS
$ sudo mkfs.xfs /dev/vg_nfs/lv_home
$ sudo mkfs.xfs /dev/vg_nfs/lv_work
$ sudo mkfs.xfs /dev/vg_nfs/lv_shared
$ sudo mkfs.xfs /dev/vg_nfs/lv_scratch

Create mount points and mount:

$ sudo mkdir -p /nfsdata/{home,work,shared,scratch}

$ sudo mount /dev/vg_nfs/lv_home    /nfsdata/home
$ sudo mount /dev/vg_nfs/lv_work    /nfsdata/work
$ sudo mount /dev/vg_nfs/lv_shared  /nfsdata/shared
$ sudo mount /dev/vg_nfs/lv_scratch /nfsdata/scratch

Add to /etc/fstab for persistence:

$ echo '/dev/vg_nfs/lv_home    /nfsdata/home    xfs defaults 0 0' | sudo tee -a /etc/fstab
$ echo '/dev/vg_nfs/lv_work    /nfsdata/work    xfs defaults 0 0' | sudo tee -a /etc/fstab
$ echo '/dev/vg_nfs/lv_shared  /nfsdata/shared  xfs defaults 0 0' | sudo tee -a /etc/fstab
$ echo '/dev/vg_nfs/lv_scratch /nfsdata/scratch xfs defaults 0 0' | sudo tee -a /etc/fstab

Bind mount /nfsdata/home to /home on arbiter itself, so the management node also uses the NFS storage:

$ echo '/nfsdata/home /home none bind 0 0' | sudo tee -a /etc/fstab
$ sudo mount -a

Verify the final layout:

[wpaik@arbiter ~]$ lsblk
NAME                  MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda                     8:0    0 223.6G  0 disk
├─sda1                  8:1    0   600M  0 part /boot/efi
├─sda2                  8:2    0     1G  0 part /boot
└─sda3                  8:3    0   222G  0 part
  ├─rl-root           253:0    0    70G  0 lvm  /
  ├─rl-swap           253:1    0   7.7G  0 lvm  [SWAP]
  └─rl-home           253:6    0 144.3G  0 lvm
nvme0n1               259:0    0 931.5G  0 disk
├─vg_nfs-lv_home      253:2    0   167G  0 lvm  /home
│                                               /nfsdata/home
├─vg_nfs-lv_work      253:3    0   251G  0 lvm  /nfsdata/work
├─vg_nfs-lv_shared    253:4    0    84G  0 lvm  /nfsdata/shared
└─vg_nfs-lv_scratch   253:5    0   251G  0 lvm  /nfsdata/scratch

The bind mount makes lv_home appear twice: once at /nfsdata/home (the actual mount point) and once at /home (the bind mount that arbiter itself uses). The other three volumes only mount at their /nfsdata paths on arbiter. Client nodes will mount them at /work, /shared, and /scratch via NFS.
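
If you want to confirm the bind mount itself, findmnt on either path should report the same vg_nfs-lv_home source device:

[wpaik@arbiter ~]$ findmnt /home
[wpaik@arbiter ~]$ findmnt /nfsdata/home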

Configure the NFS Server

$ sudo dnf install -y nfs-utils
$ sudo systemctl enable --now nfs-server

Configure /etc/exports:

/nfsdata/home    192.168.50.0/24(rw,sync,no_root_squash,no_subtree_check)
/nfsdata/work    192.168.50.0/24(rw,sync,no_root_squash,no_subtree_check)
/nfsdata/shared  192.168.50.0/24(rw,sync,no_root_squash,no_subtree_check)
/nfsdata/scratch 192.168.50.0/24(rw,sync,no_root_squash,no_subtree_check)

A quick note on the options: rw allows read and write, sync commits writes to disk before responding (safer), no_subtree_check avoids a performance penalty when exporting subdirectories, and no_root_squash lets root on client nodes act as root on the share, which Slurm will need later.

Note on no_root_squash: This is appropriate for a trusted internal cluster network. Our cluster is physically isolated on the 192.168.50.x subnet. On a shared cluster with untrusted users, use root_squash instead.

Apply and open the firewall:

$ sudo exportfs -ra
$ sudo firewall-cmd --permanent --add-service={nfs,rpc-bind,mountd}
$ sudo firewall-cmd --reload

# Verify
$ sudo showmount -e localhost
Export list for localhost:
/nfsdata/scratch 192.168.50.0/24
/nfsdata/shared  192.168.50.0/24
/nfsdata/work    192.168.50.0/24
/nfsdata/home    192.168.50.0/24


> 4. NFS Client Setup

Rather than SSHing into each node manually, use Ansible. Run from /opt/ansible on arbiter:

[wpaik@arbiter ansible]$ ansible-playbook playbooks/nfs_setup.yaml -K

What the playbook does on each client node: installs nfs-utils, sets the SELinux boolean for NFS home directories, creates mount points for /work, /shared, and /scratch, adds all four NFS mounts to /etc/fstab with _netdev, and mounts them.

The _netdev option tells systemd to wait for the network before attempting the mount. Without it, the mount is attempted before the NIC is up (or before arbiter is reachable), fails, and can leave the node hanging at boot.
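
For reference, the fstab entries the playbook adds on each client look roughly like this (an approximation based on the mount layout above; the playbook in the repository has the exact lines):

arbiter.cluster.local:/nfsdata/home    /home    nfs defaults,_netdev 0 0
arbiter.cluster.local:/nfsdata/work    /work    nfs defaults,_netdev 0 0
arbiter.cluster.local:/nfsdata/shared  /shared  nfs defaults,_netdev 0 0
arbiter.cluster.local:/nfsdata/scratch /scratch nfs defaults,_netdev 0 0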

The playbook also enables XFS quota on arbiter and reboots it to apply. This is covered in the full playbook in the GitHub repository.
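
The reboot is needed because XFS quotas can only be turned on at mount time. The change itself is small: the home volume's fstab entry on arbiter gains the quota mount options, roughly like this (an assumption; check the playbook for the exact options it uses):

/dev/vg_nfs/lv_home    /nfsdata/home    xfs defaults,uquota,gquota 0 0

After the reboot, sudo xfs_quota -x -c 'report -h' /nfsdata/home should show per-user usage against the quota.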

Verify from carrier after rebooting:

[wpaik@carrier ~]$ df -h
Filesystem                              Size  Used Avail Use% Mounted on
/dev/mapper/rl-root                      70G  5.4G   65G   8% /
arbiter.cluster.local:/nfsdata/home     167G  8.2G  159G   5% /home
arbiter.cluster.local:/nfsdata/work     251G  4.9G  247G   2% /work
arbiter.cluster.local:/nfsdata/shared    84G   23G   62G  27% /shared
arbiter.cluster.local:/nfsdata/scratch  251G   22G  230G   9% /scratch

Note: The playbook reboots worker and GPU nodes automatically. carrier (the head node) requires a manual reboot after the playbook completes since it is the SSH entry point into the cluster. After rebooting carrier, verify mounts with df -h.

Test that the share works:

# Create a test file from interceptor-01
[wpaik@interceptor-01 ~]$ touch /home/nfs_test.txt

# Verify it appears on interceptor-02
[wpaik@interceptor-02 ~]$ ls /home/nfs_test.txt
/home/nfs_test.txt

One file, visible everywhere.


> 5. Time Synchronization (Chrony)

Before setting up FreeIPA, all nodes need to be synchronized to the same time source. FreeIPA uses Kerberos for authentication, and Kerberos will reject tickets if the clock difference between nodes exceeds 5 minutes. On a fresh cluster this is usually fine, but it is better to set it up explicitly.

carrier acts as the NTP server for the cluster. It syncs from external sources (time.cloudflare.com, pool.ntp.org) and serves time to all internal nodes. The other nodes sync from carrier.
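
In /etc/chrony.conf terms, the playbook's result boils down to two variants, roughly like this (a sketch, not the playbook's exact template):

# On carrier: sync from external sources, serve the internal subnet
server time.cloudflare.com iburst
pool pool.ntp.org iburst
allow 192.168.50.0/24
local stratum 10    # keep serving time even if the upstream servers are unreachable

# On every other node: sync from carrier only
server carrier.cluster.local iburst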

[wpaik@arbiter ansible]$ ansible-playbook playbooks/chrony_setup.yaml -K

Verify sync status on any node after the playbook completes:

$ chronyc tracking
Reference ID    : C0A83201 (carrier.cluster.local)
Stratum         : 3
System time     : 0.000123456 seconds fast of NTP time
Last offset     : +0.000045678 seconds
RMS offset      : 0.000089012 seconds

Reference ID pointing to carrier.cluster.local confirms the node is syncing from carrier.
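
On carrier itself, chronyc sources should instead list the external servers, which is a quick way to confirm both halves of the setup:

[wpaik@carrier ~]$ chronyc sources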


> 6. The Problem with Local Users

NFS solves the file sharing problem. But it creates a new one.

NFS handles file permissions with UID (User ID) and GID (Group ID) numbers, not usernames. If user will has UID 1001 on interceptor-01 but UID 1002 on interceptor-02 (because the accounts were created in a different order), the same person has two different identities, and files written from one node show up with the wrong owner on the other.

# On interceptor-01
$ id will
uid=1001(will) gid=1001(will)

# On interceptor-02
$ id will
uid=1002(will) gid=1002(will)

# The NFS file owned by will on interceptor-01 (uid=1001)
# looks like it belongs to a different user on interceptor-02

You can work around this by manually synchronizing UIDs across every node. On a six-node cluster with a few users, that is tedious but manageable. On a real cluster with hundreds of users, it is not viable.

The proper solution is centralized authentication: one place where user accounts are defined, and every node pulls from that source. This is what FreeIPA provides.


> 7. FreeIPA Server Installation

FreeIPA bundles several services into one package: LDAP (directory), Kerberos (authentication), DNS, and a certificate authority. The installation is opinionated and sets everything up together.

All commands in this section run on arbiter.

Prerequisites

FreeIPA requires a fully qualified domain name (FQDN). Verify it resolves correctly before proceeding:

[wpaik@arbiter ~]$ hostname -f
arbiter.cluster.local

[wpaik@arbiter ~]$ ping -c 1 arbiter.cluster.local
PING arbiter.cluster.local (192.168.50.50) 56(84) bytes of data.

Also verify at least 1.5GB of free RAM. The installer is memory-hungry:

$ free -h
              total        used        free
Mem:           15Gi        800Mi       14Gi

Install and Run the Server Setup

$ sudo dnf install -y freeipa-server freeipa-server-dns

$ sudo ipa-server-install \
  --domain=cluster.local \
  --realm=CLUSTER.LOCAL \
  --ds-password=<your_directory_manager_password> \
  --admin-password=<your_admin_password> \
  --hostname=arbiter.cluster.local \
  --ip-address=192.168.50.50 \
  --no-ntp \
  --unattended

A few things to note: --realm must be uppercase, --no-ntp skips NTP configuration since we manage time sync with Chrony separately, and --unattended skips interactive prompts. The installer takes 5-10 minutes and configures LDAP, Kerberos, and the CA.

After completion, open the required firewall ports:

$ sudo firewall-cmd --permanent --add-service={freeipa-ldap,freeipa-ldaps,kerberos,dns,http,https}
$ sudo firewall-cmd --reload

Verify the Installation

$ kinit admin
Password for admin@CLUSTER.LOCAL:

$ klist
Ticket cache: KCM:0
Default principal: admin@CLUSTER.LOCAL

Valid starting     Expires            Service principal
04/27/26 09:00:00  04/28/26 09:00:00  krbtgt/CLUSTER.LOCAL@CLUSTER.LOCAL

$ ipa user-find
---------------
0 users matched
---------------

No users yet. We will add them after enrollment.

Set the default shell to bash (the FreeIPA default is /bin/sh):

$ ipa config-mod --defaultshell=/bin/bash


> 8. FreeIPA Client Enrollment

Before enrolling, add arbiter to /etc/hosts on every node. The enrollment process needs to resolve arbiter.cluster.local, and at this point SSSD is not yet configured. Doing this beforehand ensures enrollment does not fail on DNS resolution.

The Ansible playbook handles this automatically:

[wpaik@arbiter ansible]$ ansible-playbook playbooks/freeipa_setup.yaml -K

If you prefer to do it manually on each node:

# Add arbiter to /etc/hosts
$ echo "192.168.50.50 arbiter.cluster.local arbiter" | sudo tee -a /etc/hosts

# Install and enroll
$ sudo dnf install -y freeipa-client oddjob-mkhomedir

$ sudo ipa-client-install \
  --server=arbiter.cluster.local \
  --domain=cluster.local \
  --realm=CLUSTER.LOCAL \
  --principal=admin \
  --password=<your_admin_password> \
  --mkhomedir \
  --no-ntp \
  --unattended

The --mkhomedir flag tells the system to create a home directory on first login. Since /home is NFS-mounted from arbiter, the directory lands on the NFS share and is immediately visible from all nodes.

After enrollment, confirm each node can reach the IPA server:

[wpaik@interceptor-01 ~]$ ipa user-find
---------------
0 users matched
---------------

If this returns a response (even 0 users), the client is enrolled and talking to the server.

Create a Test User

Back on arbiter:

[wpaik@arbiter ~]$ kinit admin

$ ipa user-add testuser \
  --first=Test \
  --last=User \
  --password

$ ipa user-find testuser
--------------
1 user matched
--------------
  User login: testuser
  First name: Test
  Last name: User
  Home directory: /home/testuser
  Login shell: /bin/bash
  UID: 99100XXXX
  GID: 99100XXXX

Notice the UID range. FreeIPA assigns UIDs starting well above the range used by local system accounts, avoiding any collision. The exact starting range depends on how FreeIPA was configured during installation, but whatever it assigns will be identical on every node in the cluster.
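
If you are curious which range your installation picked, FreeIPA can report it directly. The output lists the first POSIX ID and the size of the range for the realm:

[wpaik@arbiter ~]$ ipa idrange-find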

For ongoing user management, the scripts/user_creation.sh script in the GitHub repository handles the full process: FreeIPA account creation, home directory setup with correct NFS ownership, XFS quota, and Slurm accounting entry.
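
The repository script is the reference, but the manual equivalent of its first three steps looks roughly like this (user jane and the quota sizes are hypothetical; the Slurm accounting step has to wait until Slurm exists in the next episode):

# Create the account in FreeIPA (on arbiter, with an admin ticket)
$ kinit admin
$ ipa user-add jane --first=Jane --last=Doe --password

# Create the home directory on the NFS share with the IPA-assigned ownership
$ sudo mkdir /nfsdata/home/jane
$ sudo cp -r /etc/skel/. /nfsdata/home/jane/
$ sudo chown -R jane:jane /nfsdata/home/jane
$ sudo chmod 700 /nfsdata/home/jane

# Set an XFS quota for the user on the home volume (sizes are examples)
$ sudo xfs_quota -x -c 'limit bsoft=40g bhard=50g jane' /nfsdata/home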

Accessing the FreeIPA Web UI

The FreeIPA web interface is reachable from outside the cluster using sshuttle, a VPN-over-SSH tool that routes traffic through the login node.

On your local machine:

# Install sshuttle
$ sudo dnf install sshuttle    # Fedora/RHEL
# or: pip install sshuttle

# Add arbiter to your local /etc/hosts
$ echo "192.168.50.50 arbiter arbiter.cluster.local" | sudo tee -a /etc/hosts

# Open the tunnel (keep this terminal open)
$ sshuttle -r wpaik@<carrier_external_address> 192.168.50.0/24 --dns

Then open a browser and go to https://arbiter.cluster.local/ipa/ui/. Accept the self-signed certificate warning and log in with the admin credentials.


> 9. Verification

SSH as the new user from the login node to a compute node:

[wpaik@carrier ~]$ ssh testuser@interceptor-01
Password:
Creating home directory for testuser.

[testuser@interceptor-01 ~]$ pwd
/home/testuser

[testuser@interceptor-01 ~]$ id
uid=99100XXXX(testuser) gid=99100XXXX(testuser) groups=99100XXXX(testuser)

Now check the same user from a different node:

[testuser@interceptor-02 ~]$ id
uid=99100XXXX(testuser) gid=99100XXXX(testuser) groups=99100XXXX(testuser)

Same UID on both nodes. Files written on interceptor-01 have correct permissions on interceptor-02. The home directory is the same NFS path regardless of which node you land on.
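
A quick end-to-end check of that claim (the file name is just an example):

# As testuser on interceptor-01
[testuser@interceptor-01 ~]$ touch ~/owned_by_testuser.txt

# The same file, seen from interceptor-02, keeps the same owner
[testuser@interceptor-02 ~]$ ls -l ~/owned_by_testuser.txt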

One account. Every node. One home directory.

Troubleshooting Common Issues

Enrollment fails with DNS error: The playbook adds arbiter.cluster.local to /etc/hosts before enrollment. If it still fails, verify the entry exists on the failing node:

$ getent hosts arbiter.cluster.local
192.168.50.50   arbiter.cluster.local arbiter

If missing, add it manually:

$ echo "192.168.50.50 arbiter.cluster.local arbiter" | sudo tee -a /etc/hosts

NFS mount fails after FreeIPA enrollment: enrollment rewrites /etc/nsswitch.conf. Confirm that both sss and files are listed for passwd and group, so local and FreeIPA users keep resolving:

$ grep -E "^(passwd|group)" /etc/nsswitch.conf
passwd:     sss files systemd
group:      sss files systemd
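
On Rocky 9 this file is managed through authselect, so it is also worth confirming that the sssd profile is active (ipa-client-install selects it during enrollment):

$ sudo authselect current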

If NFS mounts hang after enrollment:

$ sudo setsebool -P use_nfs_home_dirs 1

Home directory not created on first login:

$ sudo systemctl enable --now oddjobd

Node freezes on boot after NFS setup: A stale resume=UUID in GRUB can cause boot hangs. From the GRUB menu, press e, remove the resume=UUID=... argument, then Ctrl+X to boot. Once up:

$ grubby --update-kernel=ALL --remove-args="resume=UUID=<UUID>"


> 10. What is Next

The cluster now has shared storage and centralized authentication. Every node shares the same home directory and every user has a consistent identity across all nodes.

Next episode we install Slurm, the job scheduler. With NFS and FreeIPA already in place, Slurm has everything it needs to schedule jobs across nodes and write output files back to a shared location.

All configuration files and Ansible playbooks from this episode are in the GitHub repository.


Happy Computing!