Frugality meets Accuracy: Cost-efficient training of GPT NeoX and Pythia models with AWS Trainium
AWS Machine Learning Blog
DECEMBER 12, 2023
Training steps To run the training, we use SLURM managed multi-node Amazon Elastic Compute Cloud ( Amazon EC2 ) Trn1 cluster, with each node containing a trn1.32xl instance. Next, we also evaluate the loss trajectory of the model training on AWS Trainium and compare it with the corresponding run on a P4d (Nvidia A100 GPU cores) cluster.
Let's personalize your content