- Author: AlphaSignal
- Full Title: ⚡️ This Repo Makes LLMs 40% Faster
Highlights
- Thunder speeds up PyTorch training of Large Language Models (LLMs) by up to 40%, demonstrated on tasks such as training the Llama 2 7B model.
- Apply Thunder to a PyTorch model by calling thunder.jit() on it. Thunder also supports multi-GPU training through Distributed Data Parallel (DDP) and Fully Sharded Data Parallel (FSDP).
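A minimal sketch of the thunder.jit() call described above. It assumes the lightning-thunder package is installed and falls back to the uncompiled model when it is not, so the snippet runs either way; the toy model and shapes are illustrative only.

```python
import torch
import torch.nn as nn

try:
    import thunder
    compile_fn = thunder.jit  # Thunder's entry point, as named in the highlight
except ImportError:
    compile_fn = lambda m: m  # no-op fallback when Thunder is unavailable

# Any ordinary PyTorch module works; a tiny MLP stands in for an LLM here.
model = nn.Sequential(nn.Linear(16, 32), nn.GELU(), nn.Linear(32, 4))
fast_model = compile_fn(model)

x = torch.randn(8, 16)
out = fast_model(x)  # forward pass runs through Thunder's compiled graph (or plain PyTorch)
print(tuple(out.shape))
```

The compiled module keeps the same call signature as the original, so it drops into an existing training loop unchanged; autograd continues to work through the compiled forward.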
- Thunder dispatches work to hardware-optimized executors such as nvFuser, torch.compile, cuDNN, and TransformerEngine FP8, improving performance on both single- and multi-accelerator setups. It integrates seamlessly with PyTorch's standard operations and autograd.