HPC & ML Infrastructure Engineering

Building a real HPC cluster from scratch?

Hardware sourcing, OS install, Slurm, networking, and GPU upgrades
A full build series with a $1,264 budget cluster


Not sure where to start?

🔧 Build Your Own Cluster

Hardware, upgrades, networking, and full stack setup from $0 to working HPC

> HPC From Scratch Series

🎓 New to HPC

SSH, module systems, and Slurm fundamentals step by step

> HPC 101 Series

🐧 New to Linux/Terminal

Overcome terminal anxiety and master the essential commands

> Linux 101 Series

📺 Latest Video: Building Real HPC on a Budget


📰 Recent Posts

| Post | Good for you if… |
| --- | --- |
| HPC From Scratch 02: RAM, NVMe, and the iGPU Trap | Upgrading budget HPC nodes |
| HPC From Scratch 01: Build Your First HPC With Less Than $1300 | You want to build your own HPC cluster |
| Special Topic: Cloud Storage | Transferring files on cloud storage |
| Lesson 4: Slurm Job Debugging | Your job is stuck in PENDING |
| Linux 101: Don’t Fear the Terminal | The black screen intimidates you |

About Me

Hi, I’m Will Paik. Welcome to The Login Node.

I specialize in scaling AI/ML models on High-Performance Computing (HPC) systems. In supercomputing, there’s always a natural tension between system administrators (“Keep it stable!”) and researchers (“Run it faster!”). My job is to find the technical sweet spot that makes both of them happy.

Currently, I work as an HPC Machine Learning Performance Engineer. By day, I optimize large-scale clusters for training massive AI models. At night, I build (and occasionally break) my own mini-supercomputer to teach you how it all works.

CORE STACK: Slurm, Linux, Docker/Apptainer, PyTorch Distributed, Ansible

Cluster Setup
"Function over Form. The physical cluster building process documented on The Login Node."

My Home Cluster

“If you can’t log in, you can’t compute.”

Hardware Specs

| Role | Hardware Model | Specs |
| --- | --- | --- |
| Login Node | Lenovo IdeaPad 1 | Ryzen 5 7520U, 8GB RAM |
| Management | Lenovo ThinkCentre M715q | Ryzen 5 2400GE, 16GB RAM |
| Visualization | Lenovo ThinkCentre M715q | Ryzen 5 2400GE, 16GB RAM |
| Worker Nodes (x2) | Lenovo ThinkCentre M715q | Ryzen 5 2400GE, 16GB RAM |
| GPU Node | HP Envy TE01 | Core i7-10700F, 32GB RAM, GTX 1660 Super (6GB) |
| Storage | (Shared via Mgmt) | 1TB NVMe SSD (NFS Share) |
| Network | Gigabit Managed Switch | 8-Port, VLAN Support |

Software Stack
  • OS: Rocky Linux 10
  • Scheduler: Slurm 25
  • Provisioning: Ansible
  • Container: Apptainer
  • Monitoring: Prometheus + Grafana (In Progress)