[HPC 101] Virtual Environments: How to Build Your Own Workspace
It looks like it worked. But you just created a hidden mess.
Welcome back to the HPC 101 series.
In the previous post, we learned how to transfer data. Now, you are ready to run your Python code. You log in, type pip install numpy, and hit Enter.
Unlike the old days, you might not see a “Permission Denied” error. Instead, you see this:
[user123@compute-node-01 ~]$ pip install numpy
Defaulting to user installation because normal site-packages is not writeable
Collecting numpy
Using cached numpy-2.3.5...
Installing collected packages: numpy
Successfully installed numpy-2.3.5
It says “Successfully installed”. So everything is fine, right?
No. You just fell into the “User Install” trap.
Today, we will learn why this “automatic” installation is dangerous in HPC and how to build a proper “Private Laboratory” using Virtual Environments.
Table of Contents
- 1. The Trap: The “Backpack” Problem
- 2. The Solution: Your Private Lunchbox
- 3. Choose Your Tool: Conda vs. Venv
- 4. Let’s Build Your Environment
- 5. Maintenance: Cleaning the Trash (Cache)
- 6. Summary & Cheatsheet
> 1. The Trap: The “Backpack” Problem
When the system notices you cannot write to the global library, it quietly installs packages into your hidden home folder (usually ~/.local/lib/python3.x/site-packages).
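To see where this “backpack” lives on your own account, you can ask Python directly. The following is a minimal sketch using only the standard library `site` module; the exact path it prints depends on your Python version and your cluster's setup.

```python
import site
import sys

# Directory that a user install (including pip's silent fallback) writes into.
user_site = site.getusersitepackages()

# True when this interpreter adds that directory to sys.path at startup.
user_site_enabled = bool(site.ENABLE_USER_SITE)

print(f"user site-packages: {user_site}")
print(f"enabled for this interpreter: {user_site_enabled}")
print(f"currently on sys.path: {user_site in sys.path}")
```

If the last line prints `True`, every script you run with this interpreter can see whatever is in the backpack.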
Let’s use an analogy.
- System Python: This is the Restaurant Kitchen Pantry. It has standard ingredients. You are not allowed to touch it.
- User Install (pip install): This is your Backpack. Since you can’t use the pantry, you stuff ingredients into your backpack.
- Virtual Environment: This is a Separate Lunchbox.
Why is the Backpack (User Install) bad?
No Isolation (Dependency Hell):
Project A needs NumPy 1.20 and Project B needs NumPy 2.0. Your backpack can hold only one NumPy at a time, so installing 2.0 for Project B overwrites the 1.20 that Project A depends on. You broke Project A to fix Project B.
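If you are not sure which copy of a package a project is actually picking up, a small check like the one below can help. It is a sketch built on the standard library `importlib.metadata`; `describe_package` is a hypothetical helper name chosen for this example, not part of any tool mentioned here.

```python
from importlib import metadata


def describe_package(name):
    """Return (version, install_location) for a package, or None if it is missing."""
    try:
        dist = metadata.distribution(name)
    except metadata.PackageNotFoundError:
        return None
    # locate_file("") resolves to the directory that holds the installed package.
    return dist.version, str(dist.locate_file(""))


info = describe_package("numpy")
if info is None:
    print("numpy is not installed for this interpreter")
else:
    version, location = info
    print(f"numpy {version} is installed in {location}")
```

A location under ~/.local means the copy came from the backpack rather than from an environment.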
> 2. The Solution: Your Private Lunchbox
Instead of stuffing everything into one backpack, you should use Virtual Environments.
Think of a Virtual Environment as your own private lunchbox.
- Isolation: You create a “Project A Box” and a “Project B Box”. They never touch each other.
- Safety: Even if you mess up the installation in one box, you just throw that box away. Your other projects are safe.
In HPC, using environments is not just “good practice”. It is the only way to survive.
> 3. Choose Your Tool: Conda vs. Venv
There are two main tools for creating environments: Conda and Venv. Which one should you use?
Conda
What is it? A cross-platform package manager that installs Python packages and external libraries (C, C++, CUDA).
Pros:
- Manages Python Versions: You can create an environment with Python 3.8 today and Python 3.12 tomorrow.
- Binary Dependencies: Handles complex libraries with GPU support (CUDA/cuDNN) automatically.
Cons:
- Heavy: It takes up more disk space than venv.
- Slow: Sometimes the "solver" takes a long time to resolve dependencies.
- Shell Pollution: Improper use of conda init can break your terminal (see Step 2).
Recommendation: Best choice for Science, Engineering, and AI/ML projects.
Venv
What is it? A built-in Python module that creates lightweight virtual environments.
Pros:
- Lightweight & Fast: Built into Python, creates environments instantly.
- Clean: Doesn't touch your shell configuration files.
Cons:
- Limited: Cannot easily install non-Python dependencies (like the CUDA toolkit or compiled C libraries).
- Dependent: You are tied to the Python version that created the environment (if the system has Python 3.6, your venv is 3.6).
Recommendation: Good for simple Python scripts or pure software development.
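The “Dependent” point can be seen directly: a venv records the interpreter that created it in a small `pyvenv.cfg` file. The sketch below builds a throwaway environment in a temporary directory with the standard library `venv` module and reads that file back. It passes `with_pip=False` only to keep the demo fast, and `read_pyvenv_cfg` is a hypothetical helper name used for illustration.

```python
import os
import sys
import tempfile
import venv
from pathlib import Path


def read_pyvenv_cfg(env_dir):
    """Parse the 'key = value' lines of <env_dir>/pyvenv.cfg into a dict."""
    cfg = {}
    for line in (Path(env_dir) / "pyvenv.cfg").read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep:
            cfg[key.strip()] = value.strip()
    return cfg


with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / "demo-venv"
    # Similar in spirit to "python3 -m venv <path>", minus the pip bootstrap.
    venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))
    cfg = read_pyvenv_cfg(env_dir)

print("created by Python", cfg.get("version"))
print("base interpreter lives in", cfg.get("home"))
```

The printed version is whatever Python ran the script, which is why loading the right Python module before creating a venv matters.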
> 4. Let’s Build Your Environment
Let’s get practical. Here is how you set up your environment.
Step 1: Create the Environment
# --- Conda ---
# 1. Load the module
$ module load miniconda3
# 2. Create the environment
$ conda create --name myenv python=3.13
# Or store it in your lab's group directory to save space in Home
$ conda create --prefix /projects/myLAB/myCondaEnv python=3.13
# --- Venv ---
# 1. Load the Python module
$ module load python/3.13.10
# 2. Create the environment
$ python3 -m venv ~/myenv
# Or store it in your lab's group directory to save space in Home
$ python3 -m venv /projects/myLAB/myVenv
Step 2: Activate (The Safe Way)
WARNING: Do NOT run conda init
Many tutorials tell you to run conda init. In HPC, this is dangerous.
It modifies your .bashrc file to automatically activate the (base) environment every time you log in. This causes:
- Conflict with system modules (OpenMPI, GCC).
- Open OnDemand Failure: It may prevent Jupyter or RStudio sessions from starting.
If you already ran it, disable auto-activation:
$ conda config --set auto_activate_base false
Instead, use source activate <ENV> or the full path.
# --- Conda ---
# The "HPC Safe" way (Recommended)
$ source activate /projects/myLAB/myCondaEnv
# OR, if your site's conda module already enables the conda activate command:
$ conda activate /projects/myLAB/myCondaEnv
# --- Venv ---
# Source the activate script
$ source /projects/myLAB/myVenv/bin/activate
Step 3: Install Packages
# --- Conda ---
# Handles binary deps (CUDA, etc.) better
$ module load cuda/12.8
# Make sure cuda-version matches the CUDA module you loaded
(myCondaEnv) $ conda install -c conda-forge cupy cuda-version=12.8
(myCondaEnv) $ conda install numpy pandas
# --- Pip ---
# Works in both Conda and Venv
(myVenv) $ pip install matplotlib huggingface-hub
Rule of Thumb:
Never run pip install unless you are inside an activated environment.
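One way to enforce this rule in your own scripts is to check for an active environment before doing anything else. The sketch below relies on two facts: inside a venv, `sys.prefix` differs from `sys.base_prefix`, and activating a conda environment sets the `CONDA_PREFIX` and `CONDA_DEFAULT_ENV` variables. The function name `active_environment` is hypothetical, and treating conda's `base` as “not an environment” is an assumption made for this example.

```python
import os
import sys


def active_environment(environ=None, prefix=None, base_prefix=None):
    """Return ("venv", path) or ("conda", path), or None if nothing is activated."""
    environ = os.environ if environ is None else environ
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix

    # A venv interpreter reports a prefix that differs from its base install.
    if prefix != base_prefix:
        return ("venv", prefix)

    # conda exports these variables when an environment is activated.
    conda_prefix = environ.get("CONDA_PREFIX")
    if conda_prefix and environ.get("CONDA_DEFAULT_ENV") != "base":
        return ("conda", conda_prefix)

    return None


env = active_environment()
if env is None:
    print("No environment is active: pip may fall back to a user install.")
else:
    kind, path = env
    print(f"Active {kind} environment: {path}")
```

The optional arguments exist only so the check can be exercised with made-up values; in normal use you would call `active_environment()` with no arguments.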
Step 4: Deactivate
# For conda
$ conda deactivate
# For venv
$ deactivate
> 5. Maintenance: Cleaning the Trash (Cache)
One day, you might see this error:
Disk quota exceeded.
You check your folder, and it seems small. Where did the space go?
Both Conda and Pip store downloaded files in hidden cache folders (~/.conda/pkgs and ~/.cache/pip). These can easily grow to 10 GB or more.
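To find out how much space those caches are using before you clean them, you can measure the folders directly. This is a minimal sketch: the two paths are the defaults named above and may differ on your cluster, and `dir_size_bytes` is a hypothetical helper.

```python
import os
from pathlib import Path


def dir_size_bytes(path):
    """Sum the sizes of all files below path (0 if path does not exist)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                # lstat avoids following symlinks out of the cache folder.
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


cache_dirs = [Path.home() / ".conda" / "pkgs", Path.home() / ".cache" / "pip"]
for cache in cache_dirs:
    size_gib = dir_size_bytes(cache) / 1024**3
    print(f"{cache}: {size_gib:.2f} GiB")
```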
The Clean Way:
# Remove unused packages and caches
$ conda clean --all
# Remove pip cache
$ pip cache purge
The “Nuclear” Option: If your disk is 100% full, the commands above might fail (because they can’t create a lock file). In that case, you have to delete them manually.
# WARNING: Be careful with rm -rf
# For conda
$ rm -rf ~/.conda/pkgs/*
# For pip
$ rm -rf ~/.cache/pip/*
Don’t worry, deleting cache won’t break your installed environments. It just deletes the downloaded installers.
> 6. Summary & Cheatsheet
Using environments on HPC is about keeping your workspace clean and avoiding the “Disk Quota Exceeded” error.
| Action | Conda Command | Venv Command |
|---|---|---|
| Create | conda create --prefix <path> | python -m venv <path> |
| Activate | source activate <path> | source <path>/bin/activate |
| Install | conda install / pip install | pip install |
| Clean | conda clean --all | pip cache purge |
My Advice: Stick to Miniconda for AI/HPC projects to save headaches with CUDA libraries. But please, avoid conda init to keep your login clean.
Happy Computing!