Gemini: A Family of Highly Capable Multimodal Models

📅 December 19, 2023 👤 Gemini Team, Rohan Anil, Sebastian Borgeaud et al. 📖 arXiv (Cornell University) 📊 811 citations

🤖 Plain-English Summary

This report introduces Gemini, a new family of multimodal models with strong capabilities across image, audio, video, and text understanding. The authors expect the family's advances in cross-modal reasoning and language understanding to enable a wide variety of use cases.

🔑 Key Findings

  • The Gemini family comes in three sizes, Ultra, Pro, and Nano, covering applications from complex reasoning tasks to memory-constrained on-device use cases.
  • Across a broad range of benchmarks, the most capable model, Gemini Ultra, advances the state of the art in 30 of 32. It is the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and it improves the state of the art on all 20 multimodal benchmarks examined.

💡 Why This Matters

A single model family that reasons jointly over text, images, audio, and video, and that scales from data-center deployments down to on-device use, widens the range of tasks such systems can be applied to.

📋 Article Details

Category 🤖 Artificial Intelligence
Published Dec 19, 2023
Journal arXiv (Cornell University)
Authors Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu
DOI 10.48550/arxiv.2312.11805
Citations 811
Source OpenAlex
