Seminar: Machine Learning Seminar

Women in Electrical and Computer Engineering Community

Resource Efficient Neural Networks - From Theory to Practice

Date: July 9, 2024 | Time: 10:00 - 11:00
Location: 1061, Meyer Building
Lecturer: Yaniv Blumenfeld
Modern advances in Artificial Intelligence (AI), sometimes referred to as the “AI revolution”, are driven by the development and application of Deep Neural Networks (DNNs). DNNs have been shown to possess impressive and essential capabilities in the context of Machine Learning: not only can they be trained efficiently on large datasets, but they also have a surprising ability to extrapolate and handle data they were not trained on. However, deep neural networks also incur tremendous computational costs, far exceeding the amount of computation previously available for large-scale scientific computing. Furthermore, DNNs tend to perform better and acquire more capabilities when applied with bigger models and larger datasets. This guarantees an almost endless demand for more computational resources, with significant economic, energy, and environmental costs.
The research work presented in this seminar spans a wide range of applications, from image classification and text-to-text generation to high-end latent diffusion models. The majority of the work falls under the subject of network quantization: for example, we will start by presenting a rigorous mathematical formulation that relates the maximal depth of a trainable neural network to the degree to which the network is quantized, and we will offer new solutions for the initialization of quantized neural networks. Other prominent methods for efficient neural networks will be addressed as well, and our research on the importance of random initialization directly challenges the prevailing conceptions that have guided the literature on network pruning.
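As background for readers less familiar with quantization, the toy sketch below illustrates the basic idea of uniform weight quantization: mapping full-precision values onto a small signed integer grid and back. It is a generic NumPy illustration with a hypothetical function name (quantize_uniform), not the method presented in the talk.

```python
import numpy as np

def quantize_uniform(w, num_bits=4):
    """Map a float tensor onto a signed num_bits integer grid and back.

    Toy illustration only: real quantization-aware training usually keeps
    full-precision shadow weights and back-propagates through the rounding
    with a straight-through estimator.
    """
    qmax = 2 ** (num_bits - 1) - 1                # e.g. 7 for signed 4-bit
    scale = max(np.max(np.abs(w)), 1e-8) / qmax   # per-tensor scale factor
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale                              # dequantized ("fake-quant") weights

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
print(quantize_uniform(w, num_bits=4))            # at most 16 distinct values remain
```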
While our starting point in each line of research is always theoretical, a major part of this work focuses on practical solutions and concludes with tangible, feasible answers to real-life problems. When studying the effect of quantizing the accumulators used for matrix multiplication, we will show novel methods that allow the use of significantly cheaper hardware during inference: for example, our method enables inference with 12-bit hardware accumulators while maintaining the high accuracy of common, state-of-the-art models. Likewise, our research on diffusion models offers real-world solutions that enable the use of neural networks with FP8 datatypes, while producing near-identical outputs compared to full-precision models.
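The abstract does not detail how 12-bit accumulation is made accurate; purely as an illustration of what a saturating low-precision accumulator means, the following NumPy sketch (hypothetical name matmul_lowbit_acc, a simplification rather than the speaker's actual method) clips the running partial sums of an integer matrix product to a signed 12-bit range after every accumulation step.

```python
import numpy as np

def matmul_lowbit_acc(a_q, b_q, acc_bits=12):
    """Integer matmul whose running partial sums saturate to a signed
    acc_bits range, mimicking a cheap hardware accumulator."""
    lo, hi = -(2 ** (acc_bits - 1)), 2 ** (acc_bits - 1) - 1
    acc = np.zeros((a_q.shape[0], b_q.shape[1]), dtype=np.int64)
    for t in range(a_q.shape[1]):                 # one rank-1 update per step
        acc += a_q[:, t:t + 1] * b_q[t:t + 1, :]
        np.clip(acc, lo, hi, out=acc)             # saturate after every partial sum
    return acc

rng = np.random.default_rng(0)
a = rng.integers(-8, 8, size=(3, 64))             # 4-bit operands in [-8, 7]
b = rng.integers(-8, 8, size=(64, 3))
full = a @ b                                      # ideal wide accumulator
narrow = matmul_lowbit_acc(a, b, acc_bits=12)     # saturating 12-bit accumulator
print(np.abs(full - narrow).max())                # saturation error, often nonzero
```

A naive network run through such a narrow accumulator typically loses accuracy; the point of the presented research is precisely to adapt the model so that accuracy survives this hardware constraint.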

Ph.D. seminar, under the supervision of Prof. Daniel Soudry.

