
Metadata

  • Author: AlphaSignal
  • Full Title: ⚡️ This Repo Makes LLMs 40% Faster

Highlights

  • Thunder speeds up PyTorch Large Language Model (LLM) training by up to 40%, as demonstrated on Llama 2 7B training.
  • Apply Thunder to a PyTorch model by calling thunder.jit() on it; the compiled module also works in multi-GPU setups using Distributed Data Parallel (DDP) and Fully Sharded Data Parallel (FSDP). See the sketch after these highlights.
  • Thunder dispatches work to hardware executors such as nvFuser, torch.compile, cuDNN, and TransformerEngine FP8, improving performance on single and multiple accelerators, and it integrates with PyTorch's standard operations and autograd.
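A minimal sketch of the thunder.jit() call mentioned above, assuming a standard PyTorch nn.Module; the model, shapes, and training step are illustrative, and executor selection or distributed (DDP/FSDP) setup are omitted.

```python
# Minimal sketch: compile a PyTorch module with Thunder.
# thunder.jit() is the entry point named in the highlights; everything else
# here (the toy model, shapes) is illustrative.
import torch
import torch.nn as nn
import thunder

model = nn.Sequential(
    nn.Linear(1024, 4096),
    nn.GELU(),
    nn.Linear(4096, 1024),
)

# Compile the module; the returned module is used like the original one.
compiled_model = thunder.jit(model)

x = torch.randn(8, 1024)
out = compiled_model(x)      # forward pass runs through Thunder
out.sum().backward()         # standard autograd still works on the compiled module
```

Because the compiled module keeps the usual nn.Module interface, it can be dropped into an existing training loop without changing the surrounding optimizer or data-loading code.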