Benchmarking LLM finetuning and multi-node NCCL communication
Benchmarks for finetuning LLMs on HPC systems and investigating performance bottlenecks.
We have developed a benchmark that compares the compute performance of fine-tuning LLMs on multiple high-performance computing (HPC) systems, including systems designed for working with sensitive data. In this blog post, we introduce the benchmark, describe the lessons learned while developing it, and release it as open source so that others can use and improve it.