Benchmarking LLM finetuning and multi-node NCCL communication

Benchmarks for finetuning LLMs on HPC systems and investigating performance bottlenecks.