
Discovering and Erasing Unsafe Concepts

Date: July 16, 2024    Time: 11:30-12:30

Location: 1061, Meyer Building


Lecturer: Niv Cohen

Research Areas:

Machine learning and intelligent systems

The rapid growth of generative models enables an ever-increasing variety of capabilities. Yet these models may also produce undesired content, such as unsafe images, private information, or copyrighted material.

In this talk, I will discuss practical methods for preventing undesired generation and for evaluating such methods. First, I will show how the challenge of avoiding undesired generations manifests itself in a simple Capture-the-Flag LLM setting, where even our top defense strategy was breached. Next, I will demonstrate a similar vulnerability in state-of-the-art concept erasure methods for Text-to-Image models. Finally, I will describe the notion of ‘Unconditional Concept Erasure’, which aims to mitigate these issues. I will show that Task Vectors can achieve Unconditional Concept Erasure, and discuss the opportunities and limitations of applying Task Vectors in practice.
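
The announcement does not define Task Vectors. A common formulation from the task-arithmetic literature takes the difference between a model fine-tuned on some behavior and its pre-trained weights, then subtracts (negates) that difference to suppress the behavior. The sketch below assumes that formulation only to illustrate the idea; it is not the speaker's method, and the function names, the dict-of-lists weight format, and the `scale` parameter are hypothetical.

```python
# Hypothetical sketch: concept erasure by negating a task vector (task arithmetic).
# Weights are modeled as dicts mapping parameter names to lists of floats,
# standing in for a real model's parameter tensors.
from typing import Dict, List

Weights = Dict[str, List[float]]


def task_vector(pretrained: Weights, finetuned: Weights) -> Weights:
    """tau = theta_finetuned - theta_pretrained, computed per parameter."""
    return {
        name: [f - p for p, f in zip(pretrained[name], finetuned[name])]
        for name in pretrained
    }


def negate_task_vector(pretrained: Weights, tau: Weights, scale: float = 1.0) -> Weights:
    """theta_edited = theta_pretrained - scale * tau, i.e. step away from the concept."""
    return {
        name: [p - scale * t for p, t in zip(pretrained[name], tau[name])]
        for name in pretrained
    }


if __name__ == "__main__":
    # Toy numbers only: "fine-tuned" stands for a model tuned on the unwanted concept.
    theta_pre = {"layer.weight": [1.0, 2.0]}
    theta_ft = {"layer.weight": [1.5, 1.0]}
    tau = task_vector(theta_pre, theta_ft)
    edited = negate_task_vector(theta_pre, tau, scale=1.0)
    print(tau, edited)
```

In this formulation, `scale` trades off how strongly the concept is suppressed against how far the edited weights drift from the original model.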

Niv is a postdoctoral researcher at New York University, hosted by Prof. Chinmay Hegde. He received a B.Sc. in mathematics with physics through the Technion Excellence Program, and a Ph.D. in computer science from the Hebrew University of Jerusalem, advised by Prof. Yedid Hoshen. Niv was awarded the Israeli data science scholarship for outstanding postdoctoral fellows (VATAT). He is interested in model personalization, anomaly detection, and AI safety for language and vision-language models.

Seminar: Pixel Club