Training LLaMA2-7B

Create a GPU Instance

Log in to the HPC-AI.com website:
- After logging in, navigate to the Instances page.
Create an instance:
- (Optional) Configure shared storage by going to the shared storage page, clicking Create Storage, and creating the necessary space.
- Click the "Create GPU Instance" button to start the creation process.
Select the configuration:
- Select the GPU type.
- Select the server location.
- Select the number of cards (e.g., 8 cards per machine).
- (Optional) Mount the previously created shared storage.
Configure instance information:
- Instance Name: colossal
- Image: Select the CUDA (12.1) image (ubuntu==20.04, cuda==12.1, python==3.11, conda), or upload a custom image.
- External storage: Not used in this case due to small data size.
Initialize the instance:
- Wait for the instance to initialize after configuration.
- Connect to the instance via SSH once initialized.

Environment Configuration

Create a Conda environment:

conda create --name myenv python=3.11
conda activate myenv

Install PyTorch:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

Verify installation:

import torch
print(f"PyTorch version: {torch.__version__}")
print(f"Is CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    print(f"Number of GPU devices: {torch.cuda.device_count()}")
    for i in range(torch.cuda.device_count()):
        print(f"Device {i}: {torch.cuda.get_device_name(i)}")
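
The check above only confirms that PyTorch can see the GPUs. As an optional follow-up, the sketch below runs a small matrix multiplication on every visible device to confirm that CUDA kernels actually execute. The function name gpu_smoke_test and the matrix size are illustrative choices, not part of the original setup; the function returns None when PyTorch or CUDA is unavailable.

```python
from typing import Dict, Optional


def gpu_smoke_test(size: int = 1024) -> Optional[Dict[int, bool]]:
    """Run one matmul per visible GPU; map device index -> result is finite."""
    try:
        import torch
    except ImportError:
        return None  # PyTorch is not installed in this environment
    if not torch.cuda.is_available():
        return None  # no usable CUDA device

    results = {}
    for index in range(torch.cuda.device_count()):
        device = torch.device(f"cuda:{index}")
        a = torch.randn(size, size, device=device)
        b = torch.randn(size, size, device=device)
        c = a @ b
        torch.cuda.synchronize(device)  # wait until the kernel has finished
        results[index] = bool(torch.isfinite(c).all().item())
    return results


if __name__ == "__main__":
    print(gpu_smoke_test())
```

A result of True for each of the 8 devices suggests the driver, CUDA runtime, and PyTorch build work together; any False or an exception points at the environment rather than at Colossal-AI.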

Installing Colossal-AI

Note: Please execute the following commands within the virtual environment you created earlier to ensure that all dependencies for Colossal-AI are properly installed.

Clone the official repository:

cd ~
git clone https://github.com/hpcaitech/ColossalAI.git
cd ColossalAI

Compile the kernels:
```
BUILD_EXT=1 pip install .
```

Install Flash Attention:

pip install flash-attn --no-build-isolation

Install NVIDIA Apex:

cd ~
git clone https://github.com/NVIDIA/apex
cd apex
git checkout 22.04-dev

Disable the CUDA version check in setup.py (around line 36) by commenting out the condition and its body:

#if (bare_metal_major != torch_binary_major) or (bare_metal_minor != torch_binary_minor):
#    pass
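
Judging by its variable names, the check being disabled compares the CUDA toolkit version on the machine ("bare metal") with the CUDA version the PyTorch binary was built against. Before commenting it out, it can help to see whether the two actually differ. The following is a minimal sketch, assuming nvcc is on the PATH and prints a "release X.Y" string; all helper names are hypothetical and are not part of Apex.

```python
import re
import subprocess
from typing import Optional, Tuple

Version = Tuple[int, int]


def parse_major_minor(text: Optional[str], pattern: str) -> Optional[Version]:
    """Return (major, minor) from the first match of `pattern`, or None."""
    if not text:
        return None
    match = re.search(pattern, text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def versions_match(nvcc_output: str, torch_cuda: Optional[str]) -> bool:
    """True when the nvcc toolkit and the PyTorch build agree on major.minor."""
    toolkit = parse_major_minor(nvcc_output, r"release (\d+)\.(\d+)")
    binary = parse_major_minor(torch_cuda, r"(\d+)\.(\d+)")
    return toolkit is not None and toolkit == binary


def read_nvcc_output() -> str:
    """Return the output of `nvcc --version`, or "" if nvcc is not found."""
    try:
        done = subprocess.run(["nvcc", "--version"], capture_output=True, text=True)
    except FileNotFoundError:
        return ""
    return done.stdout


# Usage on the instance (requires torch and nvcc):
#   import torch
#   print(versions_match(read_nvcc_output(), torch.version.cuda))
```

If the versions already match, the check would pass on its own; if they differ, disabling it is a deliberate choice to build against a mismatched toolkit.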

Install dependencies:
```
pip install -r requirements.txt
```

Build and install:

pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
python setup.py install

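In apex/amp/_initialize.py, comment out the line from torch._six import string_classes and replace string_classes with str on line 42. PyTorch 2.x no longer provides the torch._six module, so the original import fails.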
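
The same edit can be applied programmatically instead of by hand. This is only a sketch of the text transformation: the function name patch_initialize_source is hypothetical, and the file location used in the commented usage lines (apex/amp/_initialize.py inside the Apex checkout) is an assumption to verify against your clone.

```python
import re


def patch_initialize_source(source: str) -> str:
    """Comment out the torch._six import and use the built-in str instead."""
    patched = []
    for line in source.splitlines():
        if line.strip() == "from torch._six import string_classes":
            # Keep the original line visible, but disabled.
            patched.append("# " + line)
        else:
            # Replace remaining uses of the removed alias with str.
            patched.append(re.sub(r"\bstring_classes\b", "str", line))
    return "\n".join(patched) + "\n"


# Usage (path is an assumption; run from the directory containing the clone):
#   from pathlib import Path
#   target = Path("apex/amp/_initialize.py")
#   target.write_text(patch_initialize_source(target.read_text()))
```

If Apex has already been installed from the checkout, the installed copy may also need the same change, or Apex can be reinstalled after patching.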

Install TensorNVMe:

cd ~
git clone https://github.com/hpcaitech/TensorNVMe.git
cd TensorNVMe
apt-get update
apt-get install cmake
pip install -v --no-cache-dir .
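
After the installs in this section, a quick way to spot a package that did not install is to check whether each one can be located by the interpreter. A minimal sketch follows; the import names in EXPECTED are assumptions about how these packages expose themselves and may need adjusting.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed import names for the packages installed above.
EXPECTED = ["torch", "colossalai", "flash_attn", "apex", "tensornvme"]

if __name__ == "__main__":
    missing = missing_modules(EXPECTED)
    print("Missing:", ", ".join(missing) if missing else "none")
```

Run it inside the activated Conda environment; anything reported as missing can then be reinstalled before starting the benchmark.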

Training the LLaMA2-7B Model

Prepare the training script:

Modify /root/ColossalAI/examples/language/llama/scripts/benchmark_7B/gemini.sh:

#!/bin/bash
cd /root/ColossalAI/examples/language/llama/scripts/
export OMP_NUM_THREADS=64
colossalai run --nproc_per_node 8 benchmark.py -g -x -b 6 -s 10 --shard_param_frac 0

Execute the training script:

bash /root/ColossalAI/examples/language/llama/scripts/benchmark_7B/gemini.sh

Result Plot:

Try 3D parallelism:

dp2+tp4+pp1+zero2:

colossalai run --nproc_per_node 8 benchmark.py -p 3d -g -x -b 16 -s 10 --tp 4 --zero 2

dp4+tp1+pp2+zero1:

colossalai run --nproc_per_node 8 benchmark.py -p 3d -g -x -b 128 -s 10 --pp 2 --zero 1
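
In 3D parallelism the GPUs are divided among data, tensor, and pipeline parallel groups, so the data-parallel degree is the total number of processes divided by tp × pp. The sketch below works through that arithmetic for an 8-GPU node. The helper data_parallel_size is hypothetical, and it assumes that an omitted --tp or --pp flag means a degree of 1.

```python
def data_parallel_size(world_size: int, tp: int = 1, pp: int = 1) -> int:
    """Return the data-parallel degree for a given tensor/pipeline split."""
    if world_size < 1 or tp < 1 or pp < 1:
        raise ValueError("world_size, tp and pp must all be positive")
    model_parallel = tp * pp
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by tp*pp={model_parallel}"
        )
    return world_size // model_parallel


if __name__ == "__main__":
    print(data_parallel_size(8))        # 8
    print(data_parallel_size(8, tp=4))  # 2
    print(data_parallel_size(8, pp=2))  # 4
```

Checking this before launching helps confirm that a label such as dp4+tp1+pp2 really matches the flags passed to benchmark.py.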

Note: This tutorial uses the Colossal-AI main branch code (2024.12.8) by default. If packages are missing or fail to install because the main branch has changed since then, install them manually with pip.
