Sigmoid Loss for Language Image Pre-Training

📅 Published: October 1, 2023 👤 Xiaohua Zhai, Basil Mustafa, Alexander Kolesnikov et al. 📖 Research Journal 📊 614 citations
AI-Generated Summary

We propose a simple pairwise sigmoid loss for image-text pre-training. We also push the batch size to the extreme, up to one million, and find that the benefits of growing the batch size quickly diminish: a more moderate batch size of 32k is sufficient.

⚡ This is an original paraphrased summary — not copied from the abstract. Full paper available at the source link below.

Key Findings
  • 1 Unlike standard contrastive learning with softmax normalization, the sigmoid loss operates solely on image-text pairs and does not require a global view of the pairwise similarities for normalization.
  • 2 The sigmoid loss simultaneously allows further scaling up the batch size, while also performing better at smaller batch sizes.
  • 3 With only four TPUv4 chips, we can train a Base CLIP model at 4k batch size and a Large LiT model at 20k batch size; the latter achieves 84.5% ImageNet zero-shot accuracy in two days.
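
To make the first finding concrete, here is a minimal pure-Python sketch of a pairwise sigmoid loss. It is an illustration under stated assumptions, not the authors' implementation: the function names, the toy embeddings, and the default `temperature` and `bias` values are chosen here for the example. Each image-text pair is scored as an independent binary classification (matching pairs labeled +1, all others -1), so no softmax over the whole batch is needed.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    # log(sigmoid(x)) = min(x, 0) - log(1 + exp(-|x|))
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def l2_normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def pairwise_sigmoid_loss(image_emb, text_emb, temperature=10.0, bias=-10.0):
    """Sum of independent binary losses over all image-text pairs.

    temperature and bias are illustrative defaults (assumptions for this
    sketch); in practice they would typically be learned.
    """
    imgs = [l2_normalize(v) for v in image_emb]
    txts = [l2_normalize(v) for v in text_emb]
    total = 0.0
    for i, img in enumerate(imgs):
        for j, txt in enumerate(txts):
            logit = temperature * sum(a * b for a, b in zip(img, txt)) + bias
            label = 1.0 if i == j else -1.0  # +1 for the matching pair only
            # Each term depends on one pair alone: no batch-wide normalization.
            total += log_sigmoid(label * logit)
    return -total / len(imgs)


# Toy data (hypothetical): two images and their two captions.
images = [[1.0, 0.0], [0.0, 1.0]]
aligned_texts = [[1.0, 0.0], [0.0, 1.0]]
swapped_texts = [[0.0, 1.0], [1.0, 0.0]]

aligned_loss = pairwise_sigmoid_loss(images, aligned_texts)
swapped_loss = pairwise_sigmoid_loss(images, swapped_texts)
print(aligned_loss < swapped_loss)
```

Because every term in the double loop is computed from a single pair, the sum can be accumulated chunk by chunk across devices, which is what lets this style of loss scale to large batches without gathering a full similarity matrix for normalization.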
Why It Matters

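Because the sigmoid loss needs no batch-wide normalization, image-text pre-training becomes practical on modest hardware: the results above were obtained with only four TPUv4 chips, and a batch size of 32k already captures most of the benefit of scaling.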

This summary is based on publicly available metadata and abstract. For the full research paper, visit the original source:

Read Full Paper at OpenAlex
Article Details
Source: OpenAlex
Category: 🤖 Artificial Intelligence
Published: Oct 1, 2023
Journal: Research Journal
DOI: 10.1109/iccv51070.2023.01100
Citations: 614
Authors: Xiaohua Zhai, Basil Mustafa, Alexander Kolesnikov, Lucas Beyer