Seminar: Pixel Club
Overcoming Critical Challenges – Towards Reliable, Enhanced, and Efficient VLMs
Date:
December 2, 2025
Start Time:
11:30 - 12:30
Location:
506, Zisapel Building
Lecturer:
Nimrod Shabtay
Research Areas:
Vision-language models (VLMs) have achieved impressive performance across diverse tasks, yet they face critical challenges in how we evaluate them, how they use visual context, and how efficiently they process information. This talk presents three interconnected works addressing these limitations. First, LiveXiv provides contamination-free evaluation by automatically generating benchmarks from newly published scientific papers, revealing that some reported VLM improvements may stem from test set contamination rather than genuine advances. Second, IPLoc exposes a surprising gap: current VLMs struggle with personalized object localization, failing to learn from visual examples the way humans naturally do. By teaching models to focus on contextual cues rather than relying solely on prior knowledge, we significantly improve their few-shot localization abilities. Finally, CARES addresses efficiency by recognizing that not all queries need high-resolution images. Using a lightweight module to predict the minimal sufficient resolution per query, we reduce computational costs by up to 80% while maintaining accuracy. Together, these works demonstrate that context-awareness in evaluation, visual reasoning, and resource allocation is essential for building VLMs that are more reliable, capable, and practical for real-world deployment.

Nimrod Shabtay is a PhD candidate at the Faculty of Engineering at Tel Aviv University and a research intern at IBM Research, supervised by Prof. Raja Giryes. His research focuses on Large Multimodal Models (LMMs). He is particularly interested in overcoming critical challenges towards reliable, enhanced, and efficient LMMs.

