🤖 Artificial Intelligence OpenAlex

Training Compute-Optimal Large Language Models

📅 Published: March 29, 2022 👤 Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch et al. 📖 arXiv (Cornell University) 📊 663 citations
AI-Generated Summary

The paper investigates the optimal model size and number of training tokens for a transformer language model under a given compute budget. The resulting compute-optimal model, Chinchilla, matches Gopher's training compute with fewer parameters and more data, so it also uses substantially less compute for fine-tuning and inference, greatly facilitating downstream usage.


Key Findings
  • 1 We find that current large language models are significantly undertrained, a consequence of the recent focus on scaling language models whilst keeping the amount of training data constant.
  • 2 By training over 400 language models ranging from 70 million to over 16 billion parameters on 5 to 500 billion tokens, we find that for compute-optimal training, the model size and the number of training tokens should be scaled equally: for every doubling of model size the number of training tokens should also be doubled.
  • 3 We test this hypothesis by training a predicted compute-optimal model, Chinchilla, that uses the same compute budget as Gopher but with 70B parameters and 4$\times$ more data.
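
The equal-scaling rule in the findings above can be illustrated with a small sketch. It assumes the common rule of thumb that training compute is roughly proportional to parameters times tokens (FLOPs ≈ 6·N·D); that approximation, the function names, and the baseline numbers are assumptions made for illustration and do not appear in this summary.

```python
import math
from typing import Tuple


def approx_train_flops(params: float, tokens: float) -> float:
    """Rough training cost, ASSUMING the rule of thumb FLOPs ~ 6 * N * D."""
    return 6.0 * params * tokens


def scale_equally(params: float, tokens: float,
                  compute_multiplier: float) -> Tuple[float, float]:
    """Grow parameters and tokens by the same factor for a larger budget.

    If compute is proportional to params * tokens, multiplying both by
    sqrt(k) multiplies the compute by k.
    """
    if compute_multiplier <= 0:
        raise ValueError("compute_multiplier must be positive")
    factor = math.sqrt(compute_multiplier)
    return params * factor, tokens * factor


def params_for_same_compute(params: float, data_multiplier: float) -> float:
    """Parameter count that keeps params * tokens fixed when data is multiplied."""
    if data_multiplier <= 0:
        raise ValueError("data_multiplier must be positive")
    return params / data_multiplier


if __name__ == "__main__":
    # Hypothetical baseline, chosen only to make the arithmetic easy to follow.
    base_params, base_tokens = 1e9, 2e10
    new_params, new_tokens = scale_equally(base_params, base_tokens, 4.0)
    ratio = (approx_train_flops(new_params, new_tokens)
             / approx_train_flops(base_params, base_tokens))
    print(f"params: {new_params:.3g}, tokens: {new_tokens:.3g}, compute x{ratio:.3g}")
```

Under the same assumption, holding compute fixed while training on 4$\times$ more data means using one quarter of the parameters, which is the trade-off the third finding describes for Chinchilla relative to Gopher.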
Why It Matters

The results suggest that, for a fixed training compute budget, a smaller model trained on more data is the better choice, which in turn lowers the cost of fine-tuning and inference for downstream use.

This summary is based on publicly available metadata and abstract. For the full research paper, visit the original source:

Read Full Paper at OpenAlex
Article Details
Source: OpenAlex
Category: 🤖 Artificial Intelligence
Published: Mar 29, 2022
Journal: arXiv (Cornell University)
DOI: 10.48550/arxiv.2203.15556
Citations: 663
Authors: Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai