Benchmarking LLM fine-tuning and multi-node NCCL communication

Benchmarks for fine-tuning LLMs on HPC systems and for investigating performance bottlenecks.

Benchmarking LLM fine-tuning on different HPC systems

We have developed a benchmark that compares the compute performance of LLM fine-tuning across multiple high-performance computing (HPC) systems, including systems designed for working with sensitive data. In this blog post, we introduce the benchmark, describe the lessons we learned while developing it, and release it as open source so that others can use and improve it.