Graduate Seminar
Learning to Learn in Quantization-Aware Training
Quantization-Aware Training (QAT) is a leading technique for compressing neural networks by simulating reduced precision during training, enabling high performance even under severe resource constraints. Despite its success, QAT suffers from a fundamental optimization bottleneck: the non-differentiability of the quantization operation. To address this, surrogate gradients, most commonly the Straight-Through Estimator (STE), are used. While the STE has shown empirical success, particularly in moderate bit-width regimes, it introduces significant gradient mismatch in ultra-low precision settings and lacks a solid theoretical justification.
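As a concrete illustration (a minimal PyTorch sketch, not the speaker's code), the STE fake-quantizes the weights in the forward pass while pretending the quantizer is the identity in the backward pass:

```python
import torch

class FakeQuantSTE(torch.autograd.Function):
    """Uniform fake quantization with a straight-through estimator backward."""

    @staticmethod
    def forward(ctx, w, step):
        # Forward: round to the nearest multiple of `step`. The result is
        # piecewise constant, so its true derivative is zero almost everywhere.
        return step * torch.round(w / step)

    @staticmethod
    def backward(ctx, grad_output):
        # STE: pass the incoming gradient through unchanged (identity surrogate).
        return grad_output, None


w = torch.randn(4, requires_grad=True)
q = FakeQuantSTE.apply(w, 0.25)
q.sum().backward()
print(w.grad)  # all ones: the surrogate gradient, not the true (zero) gradient
```

The gap visible here, between the true zero gradient of the rounding step and the ones reported by the backward pass, is the gradient mismatch the talk addresses.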
This talk introduces a meta-learning framework for dynamically learning the optimal surrogate gradient during QAT. We begin with a theoretical insight showing that the standard STE is a special case of the Finite Differences gradient approximation method. This connection highlights inherent limitations of the STE and motivates a learned alternative. Building on this, we define a family of parameterized surrogate functions, Learned Straight-Through Estimators (LSTE), which are trained via gradient-based optimization, extending ideas from hyperparameter learning in optimizers.
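One way to make this connection concrete (a sketch under the assumption of a uniform quantizer Q(w) = Δ·round(w/Δ) with step size Δ; the notation is illustrative and not necessarily the derivation given in the talk): a central finite difference with step h = Δ/2 reproduces the STE's unit gradient away from the quantizer's tie-breaking points,

```latex
\frac{\partial Q}{\partial w}
  \;\approx\; \left.\frac{Q(w+h)-Q(w-h)}{2h}\right|_{h=\Delta/2}
  \;=\; \frac{Q\!\left(w+\tfrac{\Delta}{2}\right)-Q\!\left(w-\tfrac{\Delta}{2}\right)}{\Delta}
  \;=\; \frac{\Delta}{\Delta} \;=\; 1 .
```

Other choices of the finite-difference step yield other surrogate gradients, which is what motivates treating the surrogate as something to be learned rather than fixed.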
We propose several optimization strategies for LSTEs, including exact analytical updates and heuristic schemes. Experiments on CIFAR-10 with ResNet architectures demonstrate that LSTEs consistently outperform the standard STE under 1–3-bit weight quantization, with the largest gains in the binary (1-bit) regime.
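To give a sense of what a parameterized surrogate gradient might look like in practice, here is a hypothetical PyTorch sketch; the window-shaped surrogate family, the parameter name `theta`, and the usage below are illustrative assumptions, not the LSTE family or the update rules from the talk:

```python
import torch

class FakeQuantLSTE(torch.autograd.Function):
    """Fake quantization whose backward pass uses a parameterized surrogate
    gradient instead of the fixed identity used by the plain STE."""

    @staticmethod
    def forward(ctx, w, step, theta):
        ctx.save_for_backward(w, theta)
        ctx.step = step
        return step * torch.round(w / step)

    @staticmethod
    def backward(ctx, grad_output):
        w, theta = ctx.saved_tensors
        # Hypothetical surrogate family: a soft window around each quantization
        # cell whose sharpness is controlled by theta. Large theta approaches
        # the plain STE (a gradient of 1 everywhere inside the cell).
        r = (w / ctx.step) - torch.round(w / ctx.step)       # distance to the grid
        surrogate = torch.sigmoid(theta * (0.5 - r.abs()))   # values in (0, 1)
        return grad_output * surrogate, None, None


# Usage sketch: in a meta-learning setup, theta would be updated by an outer,
# gradient-based step while the weights follow the usual QAT inner updates.
w = torch.randn(4, requires_grad=True)
theta = torch.tensor(4.0)
q = FakeQuantLSTE.apply(w, 0.25, theta)
q.sum().backward()
print(w.grad)  # shaped by the surrogate rather than constant ones
```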
Gil Denekamp is an M.Sc. student at the Technion, under the supervision of Prof. Daniel Soudry.