Training LLaMA2-7B
Create a GPU Instance
Log in to the HPC-AI.com website:
- After logging in, navigate to the Instances page.
Create an instance:
- (Optional) Configure shared storage by going to the shared storage page, clicking Create Storage, and creating the necessary space.
- Click the "Create GPU Instance" button to start the creation process.
Select the configuration:
- Select the GPU type.
- Select the server location.
- Select the number of cards (e.g., 8 cards per machine).
- (Optional) Mount the previously created shared storage.
Configure instance information:
- Instance Name: colossal
- Image: Select the CUDA (12.1) image (ubuntu==20.04, cuda==12.1, python==3.11, conda), or upload a custom one.
- External storage: Not used in this case due to small data size.
Initialize the instance:
- Wait for the instance to initialize after configuration.
- Connect to the instance via SSH once initialized.
Environment Configuration
Create a Conda environment:
conda create --name myenv python=3.11
conda activate myenv
Install PyTorch:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
Verify installation:
import torch

print(f"PyTorch version: {torch.__version__}")
print(f"Is CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    print(f"Number of GPU devices: {torch.cuda.device_count()}")
    for i in range(torch.cuda.device_count()):
        print(f"Device {i}: {torch.cuda.get_device_name(i)}")
Installing Colossal-AI
Note: Please execute the following commands within the virtual environment you created earlier to ensure that all dependencies for Colossal-AI are properly installed.
Clone the official repository:
cd ~
git clone https://github.com/hpcaitech/ColossalAI.git
cd ColossalAI
Compile kernels:
BUILD_EXT=1 pip install .
Install Flash Attention:
pip install flash-attn --no-build-isolation
Install NVIDIA Apex:
cd ~
git clone https://github.com/NVIDIA/apex
cd apex
git checkout 22.04-dev
Modify the setup.py file at line 36, commenting out the version check:
#if (bare_metal_major != torch_binary_major) or (bare_metal_minor != torch_binary_minor):
# pass
Install dependencies:
pip install -r requirements.txt
Build and install:
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
python setup.py install
Comment out the line
from torch._six import string_classes
in _initialize.py, and replace string_classes with str on line 42.
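The manual edit to _initialize.py can also be scripted. The following is a minimal sketch, not part of Apex or Colossal-AI: the function names patch_initialize_source and patch_file are hypothetical, and the location of the file inside the clone (apex/amp/_initialize.py) is an assumption to verify before running. It comments out the torch._six import and replaces every remaining use of string_classes with the built-in str, leaving lines that are already comments untouched so the patch can be re-run safely.

```python
import re
from pathlib import Path

# The import that the tutorial asks to comment out.
OLD_IMPORT = "from torch._six import string_classes"


def patch_initialize_source(source: str) -> str:
    """Comment out the torch._six import and use the built-in str instead."""
    patched = []
    for line in source.splitlines(keepends=True):
        stripped = line.lstrip()
        if stripped.startswith(OLD_IMPORT):
            # Keep the original line visible, but disabled.
            indent = line[: len(line) - len(stripped)]
            patched.append(indent + "# " + stripped)
        elif stripped.startswith("#"):
            # Leave existing comments alone so re-running changes nothing.
            patched.append(line)
        else:
            patched.append(re.sub(r"\bstring_classes\b", "str", line))
    return "".join(patched)


def patch_file(path: Path) -> bool:
    """Patch the file in place; return True if anything changed."""
    original = path.read_text()
    updated = patch_initialize_source(original)
    if updated != original:
        path.write_text(updated)
    return updated != original
```

A possible call, assuming the clone lives in the home directory, is patch_file(Path.home() / "apex" / "apex" / "amp" / "_initialize.py"). Inspect the result with git diff before building.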
Install TensorNVMe:
cd ~
git clone https://github.com/hpcaitech/TensorNVMe.git
cd TensorNVMe
apt-get update
apt-get install cmake
pip install -v --no-cache-dir .
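Before moving on to training, it may help to confirm that everything installed so far can be located from the active Conda environment. The sketch below uses only the standard library. The import names in EXPECTED_MODULES are assumptions (import names can differ from pip package names), and find_missing is a hypothetical helper, not part of any of these packages.

```python
import importlib.util

# Import names are assumptions; they can differ from the pip package names.
EXPECTED_MODULES = ["torch", "colossalai", "flash_attn", "apex", "tensornvme"]


def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(EXPECTED_MODULES)
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All expected modules were found.")
```

This only checks that each module can be found. It does not import the modules, so it will not catch a package whose compiled extensions fail to load.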
Training the LLaMA2-7B Model
Prepare the training script:
- Modify /root/ColossalAI/examples/language/llama/scripts/benchmark_7B/gemini.sh to read:
#!/bin/bash
cd /root/ColossalAI/examples/language/llama/scripts/
export OMP_NUM_THREADS=64
colossalai run --nproc_per_node 8 benchmark.py -g -x -b 6 -s 10 --shard_param_frac 0
Execute the training script:
bash /root/ColossalAI/examples/language/llama/scripts/benchmark_7B/gemini.sh
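To vary the run without editing the shell script by hand, the launch command can be assembled programmatically. This is a minimal sketch: build_benchmark_cmd is a hypothetical helper, it uses only the flags that appear in gemini.sh, and it reads -b as the batch size and -s as the number of steps, which is an assumption to check against the argument parser in benchmark.py.

```python
import shlex


def build_benchmark_cmd(nproc_per_node=8, batch_size=6, steps=10, shard_param_frac=0):
    """Assemble the launch command used in gemini.sh as an argument list."""
    return [
        "colossalai", "run",
        "--nproc_per_node", str(nproc_per_node),
        "benchmark.py",
        "-g", "-x",                       # passed through unchanged from gemini.sh
        "-b", str(batch_size),            # assumed: batch size
        "-s", str(steps),                 # assumed: number of steps
        "--shard_param_frac", str(shard_param_frac),
    ]


# With the defaults, this reproduces the command line from the script.
print(shlex.join(build_benchmark_cmd()))
```

The returned list can be passed to subprocess.run from the scripts directory, or printed and pasted into the shell script.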
Result Plot:
Try 3D parallelism:
- dp2+tp4+pp1+zero2:
colossalai run --nproc_per_node 8 benchmark.py -p 3d -g -x -b 16 -s 10 --tp 4 --zero 2
- dp4+tp1+pp2+zero1:
colossalai run --nproc_per_node 8 benchmark.py -p 3d -g -x -b 128 -s 10 --pp 2 --zero 1
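The data-parallel degree in these configurations is not set directly. It follows from the number of processes and the --tp and --pp values. The sketch below shows that arithmetic, assuming the usual decomposition in which the number of processes equals dp × tp × pp and no other parallel dimension (such as sequence parallelism) is in use. data_parallel_size is a hypothetical helper, not a Colossal-AI function.

```python
def data_parallel_size(world_size: int, tp: int = 1, pp: int = 1) -> int:
    """Return the data-parallel degree left over after tensor and pipeline parallelism."""
    model_parallel = tp * pp
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world size {world_size} is not divisible by tp * pp = {model_parallel}"
        )
    return world_size // model_parallel


# 8 processes with --tp 4 leave a data-parallel degree of 2.
print(data_parallel_size(8, tp=4))
# 8 processes with --pp 2 leave a data-parallel degree of 4.
print(data_parallel_size(8, pp=2))
```

Checking this before launching avoids a configuration where tp × pp does not divide the number of GPUs passed to --nproc_per_node.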
Note: This tutorial uses the Colossal-AI main branch as of 2024.12.8. If later changes to the main branch leave packages missing or cause installation problems, install the affected packages manually with pip.