
Understanding CLIP latent space – where the common and rare images reside?

Date: December 22, 2025

Time: 10:30 - 11:30

Location: 506, Zisapel Building


Lecturer: Meir Yossef Levi

Research Areas:

Machine learning and intelligent systems

CLIP is a pioneering model that embeds images and text into a shared latent space using contrastive learning. The resulting semantic space has propelled a wide range of vision tasks, from retrieval and classification to text-to-image synthesis.
In this seminar I will focus on the geometry of CLIP’s latent space, drawing on two ICML 2025 papers.
The latent space of CLIP is better modeled by two shifted, non-isometric ellipsoids, one for images and one for text, than by a shared hypersphere. This perspective uncovers systematic geometric biases and motivates conformity, a measure that captures how common or rare a concept is and where it resides geometrically. Common concepts exhibit high conformity, while rare ones lie farther from the mean, revealing a geometric view of concept commonality.
A simple whitening transformation then maps the latent space into an isotropic form in which embedding norms correlate with likelihood. This enables practical applications such as out-of-distribution (OOD) detection and identifying generative artifacts, offering a cohesive geometric–probabilistic understanding of CLIP.
I will begin with a brief overview of my earlier work on robust 3D classification, highlighting how robustness analysis and explainability techniques reveal the structural behavior of point-cloud classifiers.
This is a Ph.D. candidacy seminar by Meir Yossef Levi, given under the supervision of Prof. Guy Gilboa.
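
As a rough illustration of the whitening idea mentioned in the abstract, the sketch below fits a PCA whitening transform on a set of reference embeddings and uses the norm of a whitened embedding as an atypicality score. It is a minimal sketch under stated assumptions, not the speaker's method: the function names `fit_whitening` and `whitened_norm` are hypothetical, PCA whitening is only one common choice of whitening, and synthetic Gaussian vectors stand in for real CLIP embeddings.

```python
import numpy as np


def fit_whitening(embeddings, eps=1e-6):
    """Estimate a mean and a matrix W so that (x - mean) @ W has ~identity covariance.

    Hypothetical helper for illustration; uses PCA whitening.
    """
    mean = embeddings.mean(axis=0)
    centered = embeddings - mean
    cov = centered.T @ centered / (len(embeddings) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # covariance is symmetric
    # Scale each eigenvector (column) by 1 / sqrt(its eigenvalue).
    W = eigvecs / np.sqrt(eigvals + eps)
    return mean, W


def whitened_norm(x, mean, W):
    """Norm of the whitened embedding; larger values mean less typical samples."""
    return np.linalg.norm((x - mean) @ W, axis=-1)


# Synthetic stand-in for embeddings: an anisotropic, shifted Gaussian cloud.
rng = np.random.default_rng(0)
scales = np.linspace(0.5, 3.0, 8)
reference = rng.normal(size=(5000, 8)) * scales + 1.0
mean, W = fit_whitening(reference)

typical = reference[:100]
# Samples drawn with 4x the spread play the role of rare or OOD inputs.
unusual = rng.normal(size=(100, 8)) * scales * 4.0 + 1.0

print("mean whitened norm, typical:", whitened_norm(typical, mean, W).mean())
print("mean whitened norm, unusual:", whitened_norm(unusual, mean, W).mean())
```

Under a Gaussian assumption on the whitened space, the squared norm is, up to a constant, the negative log-likelihood, which is why a larger norm can be read as a less likely sample in this toy setting.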

Meir Yossef Levi (Yossi Levi) is in the final stages of his Ph.D. at the Technion, advised by Prof. Guy Gilboa, after receiving both his B.Sc. and M.Sc. in Electrical Engineering from the Technion. His research focuses on multimodal representation learning, with a particular interest in understanding the latent geometry of vision-language models and its implications. His recent work centers on CLIP, with two papers accepted to ICML 2025 on this topic. Prior to this, he studied robust classification in 3D vision, with publications at ICCV and 3DV.

Seminar: Graduate Seminar

