Seminar: Graduate Seminar


Learning to Learn in Quantization-Aware Training

Date: August 20, 2025 Time: 11:30 - 12:30
Location: Zisapel 506
Lecturer: Gil Denekamp

Quantization-Aware Training (QAT) is a leading technique for compressing neural networks by simulating reduced precision during training, enabling high performance even under severe resource constraints. Despite its success, QAT suffers from a fundamental optimization bottleneck: the non-differentiability of quantization operations. To address this, surrogate gradients, most commonly the Straight-Through Estimator (STE), are used. While the STE has shown empirical success, particularly in moderate bit-width regimes, it introduces significant gradient mismatch in ultra-low precision settings and remains poorly understood and justified theoretically.
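As a toy illustration of the mechanism (a sketch of our own, not the speaker's code), the following PyTorch snippet implements a standard STE: the forward pass applies a uniform quantizer, while the backward pass passes the incoming gradient through as if the quantizer were the identity.

```python
import torch

class STEQuantizer(torch.autograd.Function):
    """Uniform quantizer with a straight-through backward pass."""

    @staticmethod
    def forward(ctx, w, num_bits):
        # Symmetric uniform quantization (an illustrative choice of quantizer).
        qmax = 2 ** (num_bits - 1) - 1
        scale = w.abs().max().clamp(min=1e-8) / qmax
        return torch.round(w / scale).clamp(-qmax - 1, qmax) * scale

    @staticmethod
    def backward(ctx, grad_output):
        # round() has zero gradient almost everywhere; the STE replaces it
        # with the identity, so the gradient flows through unchanged.
        return grad_output, None  # no gradient for num_bits

# Usage: discrete weights in the forward pass, full-precision gradients.
w = torch.randn(8, requires_grad=True)
loss = STEQuantizer.apply(w, 2).pow(2).sum()
loss.backward()  # w.grad is well-defined despite the discrete quantizer
```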

This talk introduces a meta-learning framework for dynamically learning the optimal surrogate gradient during QAT. We begin with a theoretical insight: the standard STE is a special case of the finite-differences method for gradient approximation. This connection highlights inherent limitations of the STE and motivates a learned alternative. Building on it, we define a family of parameterized surrogate functions, Learned Straight-Through Estimators (LSTEs), which are trained via gradient-based optimization, extending ideas from hyperparameter learning in optimizers.
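To see one way such a connection can be made concrete (a sketch with notation introduced here, not the speaker's derivation): for a uniform quantizer with step $\Delta$, a central finite difference whose step spans exactly one quantization bin reduces to the identity gradient used by the STE,

$$
\frac{\partial Q}{\partial w} \approx \frac{Q\!\left(w + \tfrac{\Delta}{2}\right) - Q\!\left(w - \tfrac{\Delta}{2}\right)}{\Delta} = 1
\quad \text{for a.e. } w, \qquad Q(w) = \Delta \,\mathrm{round}\!\left(\frac{w}{\Delta}\right).
$$

Other finite-difference step sizes yield other surrogates, and a learned estimator can be read as parameterizing and optimizing over this choice.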

We propose several optimization strategies for LSTEs, including exact analytical updates and heuristic schemes. Experiments on CIFAR-10 with ResNet architectures demonstrate that LSTEs consistently outperform the standard STE for 1- to 3-bit weight quantization, with the largest gains in the binary (1-bit) regime.
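As a minimal sketch of what a parameterized surrogate might look like (a hypothetical parameterization of our own; the talk's exact family and its analytical updates are not reproduced here), the backward pass below shapes the gradient with a learnable width `alpha`, which an outer, meta-level optimizer would update:

```python
import torch

class LSTE(torch.autograd.Function):
    """Binary quantizer whose surrogate gradient has a learnable shape."""

    @staticmethod
    def forward(ctx, w, alpha):
        ctx.save_for_backward(w, alpha)
        return torch.sign(w)  # 1-bit quantization

    @staticmethod
    def backward(ctx, grad_output):
        w, alpha = ctx.saved_tensors
        # Hypothetical surrogate family: the derivative of tanh(w / alpha),
        # so alpha controls how sharply gradients decay away from zero.
        g = (1.0 - torch.tanh(w / alpha) ** 2) / alpha
        # alpha receives no gradient here; in a meta-learning setup it is
        # updated by an outer loop (e.g. by differentiating a validation
        # loss through the inner weight update), which we do not sketch.
        return grad_output * g, None

alpha = torch.tensor(1.0)           # learnable surrogate parameter
w = torch.randn(8, requires_grad=True)
out = LSTE.apply(w, alpha)
out.sum().backward()                # w.grad shaped by the chosen surrogate
```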

Gil Denekamp is an M.Sc. student at the Technion, under the supervision of Prof. Daniel Soudry.
