๐—ช๐—ฒ๐—ถ๐—ด๐—ต๐˜ ๐—œ๐—ป๐—ถ๐˜๐—ถ๐—ฎ๐—น๐—ถ๐˜‡๐—ฎ๐˜๐—ถ๐—ผ๐—ป in Deep Learning:

๐Ÿ’ก ๐—ช๐—ฒ๐—ถ๐—ด๐—ต๐˜ ๐—œ๐—ป๐—ถ๐˜๐—ถ๐—ฎ๐—น๐—ถ๐˜‡๐—ฎ๐˜๐—ถ๐—ผ๐—ป in Deep Learning: What is it and why you should care about it ?

Let's start with weights.

Those ๐—ณ๐—น๐—ผ๐—ฎ๐˜๐—ถ๐—ป๐—ด ๐—ฝ๐—ผ๐—ถ๐—ป๐˜ numbers which model learn during training and somehow encapsulates the magic of deep learning.

But what should those floats be set to when we start training?

Should they be set randomly? Should they be kept at zero or one? Would that be optimal? Would it help the model converge faster?

Let's find out.

Imagine you are in your garden planting seeds.

If you plant seeds too deep (weights too small), they might never sprout.

If you plant them too shallow (weights too large), they might sprout too fast but won't grow strong roots.

The right depth ensures healthy growth, just like proper weight initialization ensures good learning.

It reminds me of Goldilocks 🙂 Not too deep, not too shallow, but just right.

๐—ช๐—ต๐˜† ๐—ช๐—ฒ๐—ถ๐—ด๐—ต๐˜ ๐—œ๐—ป๐—ถ๐˜๐—ถ๐—ฎ๐—น๐—ถ๐˜‡๐—ฎ๐˜๐—ถ๐—ผ๐—ป ๐— ๐—ฎ๐˜๐˜๐—ฒ๐—ฟ๐˜€

At the beginning of training, weights need to be initialized to some starting values.

Why? Because if you don't start with good initial values, bad things can happen 🙂

1๏ธโƒฃ Vanishing Gradients: weights ttoo small can cause gradients to shrink exponentially as they backpropagate. This slows down learning, especially in the early layers.

2๏ธโƒฃ Exploding Gradients: Conversely, weights that are too large can cause gradients to grow uncontrollably, leading to unstable training.

3๏ธโƒฃ Symmetry Breaking: If all weights are initialized to the same value, neurons in the same layer will learn the same features, defeating the purpose of having multiple neurons.

4๏ธโƒฃ Training Efficiency: Poor initialization can make the optimization process unnecessarily slow / suboptimal.

๐—ช๐—ต๐—ฎ๐˜โ€™๐˜€ ๐˜๐—ต๐—ฒ ๐—ฟ๐—ฒ๐—บ๐—ฒ๐—ฑ๐˜†?

Here are some common weight initialization techniques.

1๏ธโƒฃ Random Initialization : Weights initialized randomly, typically using a uniform or normal distribution.

2๏ธโƒฃ Xavier Initialization (Glorot Initialization) Adjusts the scale of random weights based on the number of input and output neurons.

3๏ธโƒฃ He Initialization Similar to Xavier Initialization but scales weights based on the number of input neurons only.

4๏ธโƒฃ LeCun Initialization Specifically tailored for sigmoid and tanh activations. Scales weights to maintain variance stability for these activation functions.

When in doubt, use He initialization for ReLU-based activations, and Xavier or LeCun initialization for sigmoid/tanh activations.
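If you train with PyTorch, torch.nn.init already covers these schemes, so you rarely need to write the formulas yourself. A small sketch with arbitrary layer sizes; note that PyTorch has no dedicated LeCun initializer, but Kaiming with unit gain (nonlinearity="linear") gives the same 1/sqrt(fan_in) scaling.

```python
import torch.nn as nn

layer = nn.Linear(512, 256)   # arbitrary sizes, just for demonstration

# He (Kaiming) initialization: the usual choice for ReLU-family activations
nn.init.kaiming_normal_(layer.weight, mode="fan_in", nonlinearity="relu")

# Xavier (Glorot) initialization: a common choice for tanh/sigmoid activations
nn.init.xavier_uniform_(layer.weight, gain=nn.init.calculate_gain("tanh"))

# LeCun-style initialization: Kaiming with gain 1, i.e. std = 1/sqrt(fan_in)
nn.init.kaiming_normal_(layer.weight, mode="fan_in", nonlinearity="linear")

# Biases are typically just set to zero
nn.init.zeros_(layer.bias)
```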
