From Attacks to Security-Enhancing Insights in NLP Models

Date: January 28, 2026 Time: 11:30 - 12:30

Location: 1061, Meyer Building

Lecturer: Mahmood Sharif

Research Areas:

Communication and Information

Recent advances in natural language processing (NLP) have given rise to transformative models, including large language models (LLMs) and text retrievers. Still, critical concerns remain regarding the security of these models: chiefly, LLMs can be jailbroken and misused (e.g., to launch cyberattacks), and text retrievers in search applications can be manipulated to prioritize adversary-chosen content. In this talk, I will present our recent efforts toward making LLMs and text retrievers more secure. In particular, I will show how potent attacks can provide explanations for models’ vulnerabilities, which, in turn, enable us to enhance security. Crucially, I will also demonstrate how our insights can inform the design of even stronger attacks, establishing a cycle that guides continuous model improvements.

Based on joint work with Matan Ben-Tov and Mor Geva.

Seminar: ceClub: The Technion Computer Engineering Club

Upcoming seminars

Unlocking New Capabilities for Quantum Computation with Neutral Atom Arrays

Reinventing Vision: Artificial Intelligence Through New Artificial Eyes

In-situ quantum signal processing