🤖 Artificial Intelligence OpenAlex

Training language models to follow instructions with human feedback

📅 March 4, 2022 👤 Long Ouyang, Jeff Wu, Xu Jiang et al. 📖 arXiv (Cornell University) 📊 4,287 citations

🤖 Plain-English Summary

Making language models bigger does not inherently make them better at following a user's intent. InstructGPT models, which are fine-tuned with human feedback, show improvements in truthfulness and reductions in toxic output generation, with minimal performance regressions on public NLP datasets.

🔑 Key Findings

  • Large language models can generate outputs that are untruthful, toxic, or simply not helpful to the user.
  • Such models are not aligned with their users.
  • The paper shows an avenue for aligning language models with user intent on a wide range of tasks by fine-tuning with human feedback.
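
This summary does not describe how human feedback becomes a training signal. One common approach, sketched here as an assumption and not as the authors' exact implementation, is to train a reward model on pairs of outputs where a labeler preferred one over the other, using the negative log-sigmoid of the reward difference as the loss. The function name and the example reward values below are hypothetical.

```python
import math


def pairwise_preference_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Loss for one human comparison: -log(sigmoid(r_preferred - r_rejected)).

    Illustrative sketch only; the rewards would normally come from a learned
    reward model scoring two candidate outputs for the same prompt.
    """
    margin = reward_preferred - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch to avoid overflow in exp.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical rewards: the loss is small when the model already ranks the
# human-preferred output higher, and large when it ranks it lower.
agrees_with_labeler = pairwise_preference_loss(2.0, 0.0)
disagrees_with_labeler = pairwise_preference_loss(0.0, 2.0)
print(agrees_with_labeler < disagrees_with_labeler)
```

Minimizing this loss pushes the reward model to score human-preferred outputs higher; the resulting reward can then guide further fine-tuning of the language model.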

💡 Why This Matters

This research shows that fine-tuning with human feedback, and not model size alone, can make language models more helpful, more truthful, and less toxic for the people who use them.

📋 Article Details

Category 🤖 Artificial Intelligence
Published Mar 04, 2022
Journal arXiv (Cornell University)
Authors Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright
DOI 10.48550/arxiv.2203.02155
Citations 4,287
Source OpenAlex
