CS231n: Convolutional Neural Networks for Visual Recognition

Lecture 6 | Training Neural Networks I
Sigmoid
  • Problems of the sigmoid activation function
    • Problem 1: Saturated neurons kill the gradients.
    • Problem 2: Sigmoid outputs are not zero-centered.
      • Suppose a feed-forward neural network has hidden layers and every activation function is sigmoid.
      • Then every layer except the first receives only positive inputs, because sigmoid outputs lie in (0, 1).
      • If \forall i,\ x_i>0, then the local gradients with respect to all weights w_i are positive, so the gradients on the weights all share the sign of the upstream gradient.
        • \frac{\partial \sigma}{\partial w_i} = \frac{\partial \sigma}{\partial \left(\sum_{j}x_j w_j+b\right)}\,x_i = (+)(+)>0
      • When the gradients are all positive or all negative, the update direction is very constrained, so the weights can only move in an inefficient zig-zag pattern. (A small numerical sketch of Problems 1 and 2 follows after this section.)
    • Problem 3: exp() is somewhat expensive to compute – (a minor problem)
      • Modern numerical libraries handle this well, so it is rarely a bottleneck in practice.
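As a rough numerical illustration of Problems 1 and 2 (my own sketch, not code from the lecture; it assumes only NumPy), the sigmoid and its derivative can be evaluated directly: the derivative collapses toward zero for large |x|, and the outputs are always positive.

```python
import numpy as np

def sigmoid(x):
    # sigma(x) = 1 / (1 + exp(-x)); outputs lie in (0, 1), so they are never zero-centered
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # d sigma / dx = sigma(x) * (1 - sigma(x)); at most 0.25, and nearly 0 when |x| is large
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
print(sigmoid(x))       # approx. [0.00005, 0.119, 0.5, 0.881, 0.99995] -> all positive
print(sigmoid_grad(x))  # approx. [0.00005, 0.105, 0.25, 0.105, 0.00005] -> saturated ends kill the gradient
```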
tanh (hyperbolic tangent)
  • Zero-centered
    • Problem 2 is solved.
  • Problems 1 and 3 still remain: tanh still saturates, and it still uses exp(). (See the short sketch below.)
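For comparison, the same kind of sketch for tanh (again my addition, assuming NumPy): the outputs are symmetric around zero, but the derivative 1 - tanh^2(x) still vanishes for large |x|.

```python
import numpy as np

x = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
print(np.tanh(x))           # approx. [-1.0, -0.964, 0.0, 0.964, 1.0] -> zero-centered outputs
print(1.0 - np.tanh(x)**2)  # approx. [0.0, 0.071, 1.0, 0.071, 0.0]  -> gradients still saturate
```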
ReLU (rectified linear unit)
  • Problem 1 is solved in the positive region: ReLU does not saturate for x > 0.
  • It is also argued to be more biologically plausible than sigmoid (the details were not covered in this lecture).
  • AlexNet used ReLU.
  • Problems
    • Problem 1: Not zero-centered
      • ReLU outputs are non-negative, so the local gradient with respect to each weight of the next layer is zero or positive.
      • The update direction is therefore always a combination of zeros and same-signed values, just as with sigmoid.
      • This restriction on the update direction makes optimization inefficient.
    • Problem 2: dead ReLU
      • Units that are never active on any training input are never updated; these are called dead ReLUs, and in practice they can amount to as much as 20% of the network.
  • Initialization
    • People like to initialize ReLU neurons with slightly positive biases (e.g. 0.01) to reduce the chance of dead ReLUs at initialization, though opinions differ on whether this actually helps.
  • Leaky ReLU
  • PReLU (Parametric Rectifier)
  • ELU (Exponential Linear Unit)
    • Behaves like ReLU for x > 0 but smoothly saturates to a negative value for x < 0; roughly in between ReLU and leaky ReLU. (A sketch of these variants follows after this list.)
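To make the ReLU variants concrete, here is a minimal NumPy sketch (my own addition; the 0.01 slope for leaky ReLU and alpha = 1.0 for ELU are common defaults, not values fixed in the lecture). In PReLU the negative-side slope is a parameter learned by backpropagation rather than a constant.

```python
import numpy as np

def relu(x):
    # max(0, x): zero gradient for x < 0, so a unit stuck there never updates (dead ReLU)
    return np.maximum(0.0, x)

def leaky_relu(x, negative_slope=0.01):
    # small positive slope for x < 0 keeps a nonzero gradient everywhere
    return np.where(x > 0, x, negative_slope * x)

def prelu(x, a):
    # same form as leaky ReLU, but the slope `a` is learned by backprop
    return np.where(x > 0, x, a * x)

def elu(x, alpha=1.0):
    # linear for x > 0, smoothly saturates toward -alpha for very negative x
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(relu(x), leaky_relu(x), elu(x), sep="\n")
```

With these definitions the dead-ReLU problem is easy to see: for x < 0 plain ReLU has exactly zero gradient, while the other variants keep a small nonzero slope.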
Maxout
  • Nonlinear
  • A generalized form of ReLU and leaky ReLU: it outputs \max(w_1^{T}x+b_1,\ w_2^{T}x+b_2).
  • Benefits
    • Linear regimes (it is piecewise linear)
    • Its output does not saturate.
    • Its gradient does not die.
  • Drawback
    • It doubles the number of parameters per neuron (see the sketch below).
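A minimal sketch of a maxout unit with two linear pieces (my addition, assuming NumPy; the shapes are only illustrative): with W2 = 0 and b2 = 0 it reduces to a plain ReLU layer, which is why it is a generalization, and it carries two sets of weights per unit.

```python
import numpy as np

def maxout(x, W1, b1, W2, b2):
    # max of two affine functions: max(W1 x + b1, W2 x + b2)
    # with W2 = 0 and b2 = 0 this is exactly ReLU applied to W1 x + b1
    return np.maximum(x @ W1 + b1, x @ W2 + b2)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))           # batch of 4 inputs with 8 features
W1, W2 = rng.standard_normal((2, 8, 16))  # two weight matrices per maxout layer
b1, b2 = np.zeros(16), np.zeros(16)
print(maxout(x, W1, b1, W2, b2).shape)    # (4, 16)
```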
In practice
  • Use ReLU first (but be careful with the learning rate).
  • Try out Leaky ReLU, Maxout, and ELU. (A short sketch of swapping activations follows at the end of these notes.)
  • Try out tanh but don’t expect much.
  • Don’t use sigmoid.
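As a practical note on trying these out (my addition; a minimal PyTorch sketch, not code from the lecture), swapping the activation in a small fully connected network is a one-line change, so it is cheap to start with ReLU and then compare Leaky ReLU or ELU:

```python
import torch.nn as nn

def make_mlp(activation: nn.Module) -> nn.Sequential:
    # two-layer fully connected network; only the activation module changes
    return nn.Sequential(
        nn.Linear(784, 128),
        activation,
        nn.Linear(128, 10),
    )

model_relu  = make_mlp(nn.ReLU())
model_leaky = make_mlp(nn.LeakyReLU(0.01))  # 0.01 is PyTorch's default negative slope
model_elu   = make_mlp(nn.ELU())
```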
