Skip to content

Deep Learning Terminology

  • Linear unit, or layer : Given features \(h\), a linear output layer will produce a vector, \(\hat{y} = W^{\mathsf{T}}h + b\) . They produce the mean of a conditional Gaussian distribution.

    • Linear units do not saturate; can therefore be used with a wide variety of optimization algorithms, including, but not limited to, SGD.
  • Covariance : a measure of the joint variability of two random variables.

    • sign of covariance shows the tendency in the linear relationship between the two variables.
    • Normalized covariance = correlation coefficient, which by its magnitude shows the strength of the linear relationship.
    \[cov(X,Y) = E[(X - E[X])(Y - E[Y])]\]
  • Dropout :