We investigate the optimal model size and number of tokens for training a transformer language model under a given compute budget. Chinchilla, the compute-optimal model trained in this work, is smaller than earlier models trained with a comparable budget, so it also uses substantially less compute for fine-tuning and inference, greatly facilitating downstream usage.
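To make the trade-off between model size and training tokens concrete, here is a minimal sketch of how a fixed compute budget might be split. It rests on two assumptions that are not stated in the summary above: the common approximation that training cost is about 6 FLOPs per parameter per token (C ≈ 6·N·D), and a rule of thumb of roughly 20 training tokens per parameter that is often associated with this paper. The function name and constants are hypothetical and purely illustrative.

```python
import math

# Assumption: training compute C ~= 6 * N * D FLOPs, where N is the parameter
# count and D is the number of training tokens (a common approximation).
FLOPS_PER_PARAM_TOKEN = 6.0

# Assumption: about 20 training tokens per parameter, a rule of thumb often
# associated with this paper; it is not an exact law.
TOKENS_PER_PARAM = 20.0


def compute_optimal_split(compute_flops, tokens_per_param=TOKENS_PER_PARAM):
    """Split a FLOP budget into (n_params, n_tokens) under the assumptions above."""
    if compute_flops <= 0:
        raise ValueError("compute_flops must be positive")
    # Substitute D = r * N into C = 6 * N * D and solve for N: N = sqrt(C / (6 * r)).
    n_params = math.sqrt(compute_flops / (FLOPS_PER_PARAM_TOKEN * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


if __name__ == "__main__":
    budget = 5.88e23  # hypothetical FLOP budget, chosen for illustration
    n, d = compute_optimal_split(budget)
    print(f"parameters: {n:.3g}, tokens: {d:.3g}")
```

Under these assumptions, a budget of 5.88e23 FLOPs works out to roughly 70 billion parameters and 1.4 trillion tokens. Lowering `tokens_per_param` shifts the same budget toward a larger model trained on less data, which in turn costs more at fine-tuning and inference time.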
| Field | Value |
| --- | --- |
| Category | 🤖 Artificial Intelligence |
| Published | Mar 29, 2022 |
| Journal | arXiv (Cornell University) |
| Authors | Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, et al. |
| DOI | 10.48550/arxiv.2203.15556 |
| Citations | 663 |
| Source | OpenAlex |