Seminar: Pixel Club
Toward Generative Models that Understand the Visual World
Despite remarkable advances, visual generative models are still far from faithfully modeling the world, struggling with fundamental aspects such as spatial relations, physics, motion, and dynamic interactions.
In this talk, I will present a line of work that tackles these challenges, grounded in a deep understanding of the inner mechanisms that drive these models. I will begin by analyzing state-of-the-art visual generators to gain insight into the underlying reasons for their limited understanding. Building on these insights, I will demonstrate methods that significantly enhance both spatial and temporal reasoning in image and video generation, surpassing even resource-intensive proprietary models without relying on additional data or model scaling. I will conclude by discussing open challenges and future directions for advancing faithful world modeling in visual generative models.
Bio:
Hila is a PhD candidate at Tel Aviv University, advised by Prof. Lior Wolf. Her research focuses on understanding, interpreting, and correcting the predictions of deep foundation models. During her PhD, she was a visiting researcher at Google Research, Google DeepMind, and Meta AI, where she led work on video generation.
Hila has received several awards, including the Fulbright Postdoctoral Fellowship, the Eric and Wendy Schmidt Postdoctoral Award, the Deutsch Prize for Outstanding PhD Students, and the Council for Higher Education (VATAT) Award for Outstanding PhD Students.