Seminar: Pixel Club

ECE Women Community

Overcoming Critical Challenges – Towards Reliable, Enhanced, and Efficient VLMs

Date: December 2, 2025   Time: 11:30 - 12:30
Location: 506, Zisapel Building
Lecturer: Nimrod Shabtay
Vision-language models (VLMs) have achieved impressive performance across diverse tasks, yet they face critical challenges in how we evaluate them, how they use visual context, and how efficiently they process information. This talk presents three interconnected works addressing these limitations. First, LiveXiv provides contamination-free evaluation by automatically generating benchmarks from newly published scientific papers, revealing that some reported VLM improvements may stem from test-set contamination rather than genuine advances. Second, IPLoc exposes a surprising gap: current VLMs struggle with personalized object localization, failing to learn from visual examples the way humans naturally do. By teaching models to focus on contextual cues rather than relying solely on prior knowledge, we significantly improve their few-shot localization abilities. Finally, CARES addresses efficiency by recognizing that not all queries need high-resolution images. Using a lightweight module to predict the minimal sufficient resolution per query, we reduce computational costs by up to 80% while maintaining accuracy. Together, these works demonstrate that context awareness, in evaluation, visual reasoning, and resource allocation alike, is essential for building VLMs that are more reliable, capable, and practical for real-world deployment.
Nimrod Shabtay is a PhD candidate at the Faculty of Engineering at Tel Aviv University and a research intern at IBM Research, supervised by Prof. Raja Giryes.
His research focuses on Large Multimodal Models (LMMs). He is particularly interested in overcoming critical challenges towards reliable, enhanced, and efficient LMMs.
