Deep Learning Terminology¶

Linear unit, or layer : Given features \(h\), a linear output layer will produce a vector, \(\hat{y} = W^{\mathsf{T}}h + b\) . They produce the mean of a conditional Gaussian distribution.
- Linear units do not saturate; can therefore be used with a wide variety of optimization algorithms, including, but not limited to, SGD.
Covariance : a measure of the joint variability of two random variables.
- sign of covariance shows the tendency in the linear relationship between the two variables.
- Normalized covariance = correlation coefficient, which by its magnitude shows the strength of the linear relationship.
\[cov(X,Y) = E[(X - E[X])(Y - E[Y])]\]
Dropout :