BERTopic: Neural topic modeling with a class-based TF-IDF procedure

📅 March 11, 2022 👤 Maarten Grootendorst 📖 arXiv (Cornell University) 📊 1,323 citations

🤖 Plain-English Summary

Topic models are useful tools for discovering latent topics in collections of documents. BERTopic builds one in three steps: it generates document embeddings with pre-trained transformer-based language models, clusters those embeddings, and then derives a representation for each topic with a class-based TF-IDF procedure.

🔑 Key Findings

  • Recent studies have shown the feasibility of approaching topic modeling as a clustering task.
  • We present BERTopic, a topic model that extends this process by extracting coherent topic representations through the development of a class-based variation of TF-IDF.
  • More specifically, BERTopic generates document embeddings with pre-trained transformer-based language models, clusters these embeddings, and finally generates topic representations with the class-based TF-IDF procedure.
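
The class-based TF-IDF step named in the findings above can be sketched in a few lines of plain Python. This is a minimal illustration and not the BERTopic library's implementation. It uses the weighting given in the paper, W(t, c) = tf(t, c) · log(1 + A / tf(t)), where tf(t, c) is the frequency of term t in cluster c, tf(t) is its frequency across all clusters, and A is the average number of words per cluster. The function names `class_tfidf` and `top_terms`, the whitespace tokenizer, and the hand-assigned cluster labels are assumptions made for the example; the embedding and clustering steps are skipped entirely.

```python
import math
from collections import Counter


def class_tfidf(docs, labels):
    """Score terms per cluster as tf(t, c) * log(1 + A / tf(t))."""
    # Treat all documents in a cluster as one concatenated bag of words.
    class_counts = {}
    for doc, label in zip(docs, labels):
        class_counts.setdefault(label, Counter()).update(doc.lower().split())

    # tf(t): frequency of each term across all clusters.
    total_counts = Counter()
    for counts in class_counts.values():
        total_counts.update(counts)

    # A: average number of words per cluster.
    avg_words = sum(total_counts.values()) / len(class_counts)

    return {
        label: {
            term: tf * math.log(1 + avg_words / total_counts[term])
            for term, tf in counts.items()
        }
        for label, counts in class_counts.items()
    }


def top_terms(scores, n=3):
    """Return the n highest-scoring terms per cluster (ties broken alphabetically)."""
    return {
        label: [
            term
            for term, _ in sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
        ]
        for label, weights in scores.items()
    }


docs = [
    "cat chased a mouse",
    "cat slept near a mouse",
    "stock market fell",
    "a stock market rally",
]
labels = [0, 0, 1, 1]  # hypothetical cluster assignments, set by hand

scores = class_tfidf(docs, labels)
print(top_terms(scores, n=2))
# {0: ['cat', 'mouse'], 1: ['market', 'stock']}
```

In this toy corpus the word "a" occurs in both clusters, so it scores lower in cluster 0 than "cat" even though both appear there twice. That is the intended effect of the logarithmic term: words spread across clusters are down-weighted relative to words concentrated in one.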

💡 Why This Matters

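By treating topic modeling as clustering over embeddings from pre-trained transformer-based language models, BERTopic separates grouping documents from describing topics, and its class-based TF-IDF step turns each cluster into a coherent topic representation.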

📋 Article Details

Category 🤖 Artificial Intelligence
Published Mar 11, 2022
Journal arXiv (Cornell University)
Authors Maarten Grootendorst
DOI 10.48550/arxiv.2203.05794
Citations 1,323
Source OpenAlex