Neural Networks and Deep Learning | Deep Learning Specialization | Coursera

Lecture Planning

Week 1: Introduction to Deep Learning
  • Welcome to the Deep Learning Specialization
    • C1W1L01 Welcome
  • Introduction to Deep Learning
    • C1W1L02 Welcome
    • C1W1L03 What is a neural network?
    • C1W1L04 Supervised Learning with Neural Networks
    • C1W1L05 Why is Deep Learning taking off?
    • C1W1L06 About this Course
    • C1W1R1 Frequently Asked Questions
    • C1W1L07 Course Resources
    • C1W1R2 How to use Discussion Forums
    • C1W1L08 Geoffrey Hinton interview
  • Practice Questions
    • C1W1Q1 Introduction to deep learning
Week 2: Neural Networks Basics
  • Logistic Regression as a Neural Network
    • C1W2L01 Binary Classification
    • C1W2L02 Logistic Regression
    • C1W2L03 Logistic Regression Cost Function
    • C1W2L04 Gradient Descent
    • C1W2L05 Derivatives
    • C1W2L06 More Derivative Examples
    • C1W2L07 Computation Graph
    • C1W2L08 Derivatives with a Computation Graph
    • C1W2L09 Logistic Regression Gradient Descent
    • C1W2L10 Gradient Descent on m examples
  • Python and Vectorization
    • C1W2L11 Vectorization
    • C1W2L12 More Vectorization Examples
    • C1W2L13 Vectorizing Logistic Regression
    • C1W2L14 Vectorizing Logistic Regression’s Gradient Output
    • C1W2L15 Broadcasting in Python
    • C1W2L16 A note on python/numpy vectors
    • C1W2L17 Quick tour of Jupyter/iPython Notebooks
    • C1W2L18 Explanation of logistic regression cost function (optional)
  • Practice Questions
    • C1W2Q1 Neural Network Basics
  • Programming Assignments
    • C1W2P1 Practice Programming Assignment: Python Basics with numpy (optional)
    • C1W2P2 Programming Assignment: Logistic Regression with a Neural Network mindset
Week 3: Shallow Neural Networks
  • Shallow Neural Networks
    • C1W3L01 Neural Networks Overview
    • C1W3L02 Neural Network Representation
    • C1W3L03 Computing a Neural Network’s Output
    • C1W3L04 Vectorizing across multiple examples
    • C1W3L05 Explanation for Vectorized Implementation
    • C1W3L06 Activation functions
    • C1W3L07 Why do you need non-linear activation functions?
    • C1W3L08 Derivatives of activation functions
    • C1W3L09 Gradient descent for Neural Networks
    • C1W3L10 Backpropagation intuition (optional)
    • C1W3L11 Random Initialization
  • Practice Questions
    • C1W3Q1 Shallow Neural Networks
  • Programming Assignment
    • C1W3P1 Planar data classification with a hidden layer
Week 4: Deep Neural Networks
  • Deep Neural Network
    • C1W4L01 Deep L-layer neural network
    • C1W4L02 Forward Propagation in a Deep Network
    • C1W4L03 Getting your matrix dimensions right
    • C1W4L04 Why deep representations?
    • C1W4L05 Building blocks of deep neural networks
    • C1W4L06 Forward and Backward Propagation
    • C1W4L07 Parameters vs Hyperparameters
    • C1W4L08 What does this have to do with the brain?
  • Practice Questions
    • C1W4Q1 Key concepts on Deep Neural Networks
  • Programming Assignments
    • C1W4P2 Building your deep neural network: Step by Step
    • C1W4P3 Deep Neural Network Application

C1W2L03 Logistic Regression Cost Function

Loss function: measures how bad the prediction is for a single example.

  • Loss function = error function
  • For a single example

Cost function: measures the average of the loss over all training examples.

  • For the entire training set

The loss function computes the error for a single training example; the cost function is the average of the loss functions of the entire training set.
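
For logistic regression, the lecture defines the loss for a single example and the cost over m training examples as:

\mathcal{L}(\hat{y}, y) = -\big(y \log \hat{y} + (1-y)\log(1-\hat{y})\big)

J(w, b) = \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}(\hat{y}^{(i)}, y^{(i)})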

Logistic regression can be viewed as a small neural network.

C1W2L04 Gradient Descent

dw = \frac{\partial J}{\partial w}

db = \frac{\partial J}{\partial b}
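
The corresponding gradient descent updates, with learning rate \alpha:

w := w - \alpha \, dw

b := b - \alpha \, db

A minimal Python sketch of one step (function and parameter names are illustrative, not from the course code):

    def gradient_descent_step(w, b, dw, db, learning_rate=0.01):
        # One step of gradient descent: move the parameters against the gradient of J
        w = w - learning_rate * dw
        b = b - learning_rate * db
        return w, b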

C1W2L05 Derivatives

Intuitive understanding of derivatives

If you already understand derivatives, you can skip this video.

C1W2L10 Gradient Descent on m Examples

  • For loop: sequential processing
  • Vectorization = matrix computation: parallel processing
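
A minimal sketch contrasting the two for the logistic regression gradients over m examples (assumed shapes: X is (n_x, m), Y is (1, m), w is (n_x, 1); names are illustrative):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # For-loop version: one example at a time (sequential)
    def grads_loop(w, b, X, Y):
        n_x, m = X.shape
        dw, db = np.zeros((n_x, 1)), 0.0
        for i in range(m):
            z_i = np.dot(w[:, 0], X[:, i]) + b   # scalar pre-activation for example i
            dz_i = sigmoid(z_i) - Y[0, i]        # prediction error for example i
            dw[:, 0] += X[:, i] * dz_i
            db += dz_i
        return dw / m, db / m

    # Vectorized version: all m examples in one matrix computation (parallel)
    def grads_vectorized(w, b, X, Y):
        m = X.shape[1]
        A = sigmoid(np.dot(w.T, X) + b)          # predictions, shape (1, m)
        dZ = A - Y
        dw = np.dot(X, dZ.T) / m                 # shape (n_x, 1)
        db = np.sum(dZ) / m                      # scalar
        return dw, db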

Practice Programming Assignment (Optional)

  • We rarely use the “math” library in deep learning because its functions take real-number (scalar) inputs, while in deep learning we mostly work with matrices and vectors. This is why numpy is more useful.
  • np.linalg.norm: computes norms, e.g., the norm of each row with axis=1, keepdims=True
  • np.reshape is widely used. In the future, you’ll see that keeping your matrix/vector dimensions straight will go toward eliminating a lot of bugs.
  • np.dot(): matrix multiplication
  • np.multiply(), * operator: element-wise multiplication
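
A short illustration of the functions listed above (shapes are arbitrary examples):

    import numpy as np

    x = np.array([[1., 2., 3.],
                  [4., 5., 6.]])                           # shape (2, 3)

    row_norms = np.linalg.norm(x, axis=1, keepdims=True)   # norm of each row, shape (2, 1)
    x_normalized = x / row_norms                           # broadcasting: each row divided by its norm

    v = x.reshape(-1, 1)                                   # reshape into a column vector, shape (6, 1)

    a = np.random.randn(2, 3)
    b = np.random.randn(3, 4)
    matmul = np.dot(a, b)                                  # matrix multiplication, shape (2, 4)

    c = np.random.randn(2, 3)
    elementwise = np.multiply(a, c)                        # element-wise product, same as a * c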

Programming Assignment: Logistic Regression with a Neural Network mindset

  • Many software bugs in deep learning come from having matrix/vector dimensions that don’t fit.
  • Flattening technique
    • X_flatten = X.reshape(X.shape[0], -1).T
  • Common steps for pre-processing a new dataset are:
    • Figure out the dimensions and shapes of the problem (m_train, m_test, num_px, …)
    • Reshape the datasets such that each example is now a vector of size (num_px * num_px * 3, 1)
    • “Standardize” the data
  • Preprocessing the dataset is important.
  • You implemented each function separately: initialize(), propagate(), optimize(), and then combined them in model(). (A minimal sketch of propagate() follows this list.)
  • Tuning the learning rate (which is an example of a “hyperparameter”) can make a big difference to the algorithm.
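
A minimal sketch of propagate() under the shapes above (w of shape (num_px * num_px * 3, 1), X of shape (num_px * num_px * 3, m), Y of shape (1, m)); this is my own reconstruction, not the assignment's reference code:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def propagate(w, b, X, Y):
        # Forward pass: predictions and cross-entropy cost over all m examples
        m = X.shape[1]
        A = sigmoid(np.dot(w.T, X) + b)                              # shape (1, m)
        cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
        # Backward pass: gradients of the cost with respect to w and b
        dw = np.dot(X, (A - Y).T) / m
        db = np.sum(A - Y) / m
        return {"dw": dw, "db": db}, cost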

C1W3L06 Activation Functions

  • Activation functions introduced: sigmoid, tanh, ReLU, LeakyReLU
  • Sigmoid
    • Rarely used for hidden layers in practice; mainly used in the output layer for binary classification.
    • If |z| is large, the gradient is nearly 0, so gradient descent makes almost no update.
  • tanh
    • A shifted and rescaled version of sigmoid.
    • tanh activations can be zero-mean, whereas sigmoid activations cannot (they lie in (0, 1)). Zero-mean activations help the next layer learn faster. (A normalizing effect?)
  • ReLU
    • If z > 0, the gradient is 1, so the parameters can learn without the vanishing gradient problem.
    • If z < 0, the gradient is 0, so the parameters are not updated.
    • It is reported that if a ReLU layer has enough units, the dead region at z < 0 is not a big problem.
      • With further weight updates, z can escape from z < 0 back into z > 0.
  • LeakyReLU
    • For z < 0, the slope is a small positive constant instead of 0, commonly 0.01.
    • ReLU is the default choice; use LeakyReLU if you have a specific reason to.
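
A numpy sketch of the four activations discussed above (the 0.01 slope for LeakyReLU is just the common default):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))        # output in (0, 1), saturates for large |z|

    def tanh(z):
        return np.tanh(z)                      # output in (-1, 1), zero-centered

    def relu(z):
        return np.maximum(0.0, z)              # 0 for z < 0, identity for z > 0

    def leaky_relu(z, slope=0.01):
        return np.where(z > 0, z, slope * z)   # small slope instead of 0 for z < 0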

C1W3L11 Random Initialization

  • W = np.random.randn(a, b) * 0.01 (note: np.random.randn takes the dimensions as separate arguments, not a tuple)
    • Why not W = np.random.randn(a, b) * 100 ?
      • [If activation function g is sigmoid or tanh]
        • |W| is small → |z| is small → the derivative of a = g(z) is not too small → W will be updated!
        • |W| is large → |z| is large → the derivative of a = g(z) is nearly 0 → W will barely be updated!
      • [If the activation function is symmetric around z = 0 – my thought]
        • |W| is small → |z| is small → a gradient update can easily move z between z > 0 and z < 0 → able to learn nonlinear decision boundaries.
        • |W| is large → |z| is large → a gradient update can hardly move z between z > 0 and z < 0 → rarely able to learn nonlinear decision boundaries.
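
A sketch of initialization for a network with one hidden layer (layer sizes here are arbitrary); the biases can start at zero because the random weights already break symmetry:

    import numpy as np

    n_x, n_h, n_y = 2, 4, 1                  # input, hidden, and output layer sizes (illustrative)

    W1 = np.random.randn(n_h, n_x) * 0.01    # small random weights keep |z| small, so tanh/sigmoid gradients stay useful
    b1 = np.zeros((n_h, 1))                  # zeros are fine for biases; W1 already breaks symmetry
    W2 = np.random.randn(n_y, n_h) * 0.01
    b2 = np.zeros((n_y, 1))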

C1W4L04 Why deep representations?

  • To approximate some functions, a deep network requires far fewer hidden units than a shallow network does – circuit theory and deep learning.
    • A shallow network may need exponentially more hidden units than a deep one; e.g., computing the XOR/parity of n inputs takes O(log n) depth with O(n) units, while a single hidden layer needs on the order of 2^n units.
    • Andrew Ng thinks this result is less useful for gaining intuition about deep representations.

C1W4L06 Forward and Backward Propagation

  • Forward Propagation for One Layer
  • Backward Propagation for One Layer
  • Summary of Forward and Backward Propagation
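
For reference, the per-layer equations in the course's notation (vectorized over m examples, where g^{[l]} is layer l's activation function and \ast is element-wise multiplication):

Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}, \quad A^{[l]} = g^{[l]}(Z^{[l]})

dZ^{[l]} = dA^{[l]} \ast g^{[l]\prime}(Z^{[l]})

dW^{[l]} = \frac{1}{m} dZ^{[l]} A^{[l-1]T}

db^{[l]} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[l](i)}

dA^{[l-1]} = W^{[l]T} dZ^{[l]}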

C1W4L08 What does this (deep learning) have to do with the brain?

  • Artificial neural networks borrow a simplified structure from biological neurons, but biological neurons are far more complicated systems.
  • Learning methods used by artificial neural networks, such as backpropagation, do not seem to be used by real biological neural networks; biological neurons may rely on different learning algorithms.
  • The analogy to biological neurons is not very useful for understanding deep learning.
