## Some Basic Neural Networks and Applications

• Single layer perceptrons
• Multilayer perceptrons
• The deep part of deep learning.
• Convolutional Neural Networks
• Image/Video classification
• Recurrent Neural Networks
• Machine translation
• Language models
• Reinforcement Learning

Beginning tools:

• PyTorch
• GPU
• Caffe2
• Can run on smartphones?

Some other ML stuff.

• Deep Q-Network (reinforcement learning: ML for training computers to play games)
• Sequence to sequence with attention (translation, summarization)
• Residual Networks (image recognition)

Neural Networks

Neural networks are awesome at finding patterns and mapping high-dimensional data. Before NNs, the common methods were support vector machines, boosting, and random forests. Perception does not actually imply intelligence.

Hebbian learning: when something positive happens, you increase the weights that contributed to it.

Generative Adversarial Networks: given random samples as input, the network will generate an image.

Genetic Algorithm.

Single Layer Network

Activation Functions

The convergence of the NN depends on the activation function you choose.

• Sigmoid is always positive and bounded (its output lies between 0 and 1), so it is never too large. Somewhat robust to outliers.
• Rectified Linear Unit (ReLU) is essentially the linear function with negative inputs rectified to zero: max(0, x).
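A minimal sketch of the two activation functions above, using only the standard library. The function names `sigmoid` and `relu` are chosen here for illustration; the branch in `sigmoid` is just one common way to avoid overflow for large negative inputs.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid: output is positive and bounded between 0 and 1."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use the equivalent form to avoid overflow in exp(-x).
    z = math.exp(x)
    return z / (1.0 + z)

def relu(x: float) -> float:
    """Rectified Linear Unit: identity for positive x, zero otherwise."""
    return max(0.0, x)

for x in (-100.0, -1.0, 0.0, 1.0, 100.0):
    print(f"x={x:7.1f}  sigmoid={sigmoid(x):.4f}  relu={relu(x):.1f}")
```

Note how a huge input such as 100 barely changes the sigmoid output compared with a moderate one, which is the sense in which it is somewhat robust to outliers.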

Creating Logic Gates from Single Layer Perceptrons

The decision boundaries here are arbitrary and simply represent our choice of weights. There are infinitely many weights that will satisfy our decision boundary conditions here.
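As a rough illustration, the gates can be sketched with a single perceptron using a step (threshold) activation. The specific weights and biases below are assumptions picked by hand, not values from the notes; the second AND gate shows that a different choice of weights gives the same truth table.

```python
def step(h: float) -> int:
    """Heaviside-style threshold: fire (1) when the weighted sum is >= 0."""
    return 1 if h >= 0 else 0

def perceptron(weights, bias, inputs) -> int:
    """Single-layer perceptron: weighted sum of inputs plus bias, then threshold."""
    h = sum(w * x for w, x in zip(weights, inputs)) + bias
    return step(h)

# Hand-picked (hypothetical) weights; many other choices work equally well.
def AND(a, b):
    return perceptron((1.0, 1.0), -1.5, (a, b))

def AND_alt(a, b):
    return perceptron((2.0, 2.0), -3.0, (a, b))

def OR(a, b):
    return perceptron((1.0, 1.0), -0.5, (a, b))

def NOT(a):
    return perceptron((-1.0,), 0.5, (a,))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", AND(a, b), "OR:", OR(a, b))
```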

Classification vs. Regression (as it applies to ML)

Generically, this is determined by asking whether the estimated output is continuous or discrete. Regression is the continuous-output case, for example fitting a line to data. Classification is the discrete-output case, for example distinguishing between two colors, red and blue.

How to Determine Goodness of Model

Training data is known-good data. The objective function measures the difference between the target and the model (NN) output, and it is what we try to minimize over the weights (w). The loss function used here is what is called, in other mathematical language, the residual sum of squares (RSS): the sum of squared errors between target and output.
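A small sketch of the objective as a function of the weights. The one-weight linear model `model(w, x) = w * x` and the toy data are assumptions made up for illustration; the point is only that the RSS is smallest at the weight that reproduces the targets.

```python
def model(w: float, x: float) -> float:
    """Hypothetical one-weight model: output is w times the input."""
    return w * x

def rss(w: float, inputs, targets) -> float:
    """Residual sum of squares between targets and model outputs."""
    return sum((t - model(w, x)) ** 2 for x, t in zip(inputs, targets))

inputs = [1.0, 2.0, 3.0]
targets = [2.0, 4.0, 6.0]   # generated by w = 2 in this toy example

for w in (1.0, 1.5, 2.0, 2.5):
    print(f"w={w:.1f}  RSS={rss(w, inputs, targets):.2f}")
```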

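The weights are updated iteratively, where the new weight is given by

$$w_i^{\text{new}} = w_i + \Delta w_i \quad\text{and}\quad \Delta w_i = \eta\,(t - y)\,g'(h)\,x_i,$$

where

$$h = \sum_i w_i x_i \quad\text{and}\quad y = g(h),$$

and

• $\eta$ is the step size
• $t$ is the target data
• $g'(h)$ is the first derivative of the activation function
• $x_i$ is the input data

Getting more explicit here, take the objective for a single training example to be

$$E = \tfrac{1}{2}\,(t - y)^2,$$

where the factor $\tfrac{1}{2}$ only serves to cancel the 2 from differentiation, then use the chain rule:

$$\frac{\partial E}{\partial w_i} = \frac{\partial E}{\partial y}\,\frac{\partial y}{\partial h}\,\frac{\partial h}{\partial w_i} = -(t - y)\,g'(h)\,x_i,$$

and recall that gradient descent steps against the gradient, $\Delta w_i = -\eta\,\partial E/\partial w_i$, which gives the update above.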

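The derivative of the Heaviside step function is zero everywhere (and undefined at the step itself), so there is no gradient to train with. It is useful to train with the sigmoid instead and then put a threshold on its output afterwards, for example calling anything at or above 0.5 a 1 and anything below a 0.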
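One way to picture "train with the sigmoid, then threshold" is the sketch below. It assumes the usual delta-rule update (step size times error times the sigmoid derivative times the input) and trains a single neuron on the OR gate. The data set, the step size `eta=0.5`, the 5000 epochs, and the 0.5 threshold are all illustrative assumptions.

```python
import math

def sigmoid(h: float) -> float:
    return 1.0 / (1.0 + math.exp(-h))

def train(data, eta=0.5, epochs=5000):
    """Train one sigmoid neuron with per-example delta-rule updates."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, t in data:
            y = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
            # For the sigmoid, the derivative g'(h) equals y * (1 - y).
            delta = eta * (t - y) * y * (1.0 - y)
            w[0] += delta * x[0]
            w[1] += delta * x[1]
            b += delta  # bias acts as a weight on a constant input of 1
    return w, b

def predict(w, b, x, threshold=0.5) -> int:
    """After training, threshold the smooth sigmoid output to get 0 or 1."""
    return 1 if sigmoid(w[0] * x[0] + w[1] * x[1] + b) >= threshold else 0

OR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, b = train(OR_DATA)
print([predict(w, b, x) for x, _ in OR_DATA])
```

The sigmoid is only needed while learning, because it supplies a nonzero gradient; the final thresholded neuron behaves like a step-function perceptron.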

Learning Rate

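The problems are fairly obvious. If the learning rate is too small, training converges so slowly that in practice it never gets there. If it is too big, the updates overshoot and miss the minima.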
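A toy sketch of both failure modes, assuming the simplest possible objective f(w) = w², whose gradient is 2w. The function name `descend`, the starting point, and the three step sizes are arbitrary choices for illustration.

```python
def descend(eta: float, w: float = 1.0, steps: int = 50) -> float:
    """Run plain gradient descent on f(w) = w**2 and return the final w."""
    for _ in range(steps):
        grad = 2.0 * w      # derivative of w**2
        w -= eta * grad
    return w

for eta in (0.001, 0.1, 1.1):
    print(f"eta={eta}: final w = {descend(eta):.6g}")
```

With the tiny step size the weight has barely moved from its starting value after 50 steps, the moderate one lands close to the minimum at 0, and the large one jumps back and forth across the minimum with growing amplitude.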

Batch update is based on gradient descent.

• Compute the average delta weight over the whole dataset.
• Update the weights.
• Repeat until convergence.

Incremental update is based on stochastic gradient descent.

• Compute the delta weight for a single training example.
• Update the weights.
• Repeat until convergence.

Mini-batch update sits in between: mini-batches are subsets of the dataset, sampled randomly, and the average delta weight is computed over each mini-batch before updating.
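The three update schemes can be sketched side by side. This assumes a one-weight linear model `w * x` with a squared-error loss and noise-free toy data generated with w = 3; the function names, step size, epoch count, and batch size are all illustrative.

```python
import random

def grad(w, x, t):
    """Gradient of 0.5 * (t - w*x)**2 with respect to w."""
    return -(t - w * x) * x

def batch_gd(data, eta, epochs):
    """Batch: average the gradient over the whole dataset, then update once."""
    w = 0.0
    for _ in range(epochs):
        g = sum(grad(w, x, t) for x, t in data) / len(data)
        w -= eta * g
    return w

def sgd(data, eta, epochs, rng):
    """Incremental (stochastic): update after every single example."""
    w = 0.0
    for _ in range(epochs):
        for x, t in rng.sample(data, len(data)):  # random order each epoch
            w -= eta * grad(w, x, t)
    return w

def minibatch_gd(data, eta, epochs, batch_size, rng):
    """Mini-batch: average the gradient over small random subsets."""
    w = 0.0
    for _ in range(epochs):
        shuffled = rng.sample(data, len(data))
        for i in range(0, len(shuffled), batch_size):
            batch = shuffled[i:i + batch_size]
            g = sum(grad(w, x, t) for x, t in batch) / len(batch)
            w -= eta * g
    return w

data = [(x, 3.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]
rng = random.Random(0)
w_batch = batch_gd(data, eta=0.1, epochs=100)
w_sgd = sgd(data, eta=0.1, epochs=100, rng=rng)
w_mini = minibatch_gd(data, eta=0.1, epochs=100, batch_size=2, rng=rng)
print(w_batch, w_sgd, w_mini)
```

On this tiny noise-free problem all three end up near the same weight; the practical difference is how many examples are looked at per update.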

Support Vector Machine.

Maximize the margin, that is, the distance between the decision boundary and the data points closest to it.
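To make the idea of margin concrete, the sketch below only measures the margin of two hand-picked linear boundaries on a toy 2D dataset; it does not train an SVM. The data points, the `margin` helper, and both boundaries are assumptions for illustration.

```python
import math

def margin(w, b, points) -> float:
    """Smallest signed distance from any labeled point to the line w.x + b = 0.

    Labels are +1 or -1; a negative result means some point is misclassified.
    """
    norm = math.hypot(*w)
    return min(label * (w[0] * x[0] + w[1] * x[1] + b) / norm
               for x, label in points)

points = [
    ((0.0, 0.0), -1),
    ((1.0, 0.0), -1),
    ((3.0, 0.0), +1),
    ((4.0, 0.0), +1),
]

centered = margin((1.0, 0.0), -2.0, points)   # boundary at x = 2
off_center = margin((1.0, 0.0), -1.5, points) # boundary at x = 1.5
print(centered, off_center)
```

Both boundaries separate the classes, but the one halfway between the closest points of each class has the larger margin, which is the boundary a maximum-margin classifier would prefer.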

Decision Tree

A decision tree breaks the input space up into orthogonal (axis-aligned) sections. Oblique decision trees, which split along arbitrary directions, also exist, but they are expensive to compute.
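A hand-written toy tree shows what orthogonal sections look like in code. Each test compares one feature against one threshold; the thresholds and the red/blue labels are made up for illustration rather than learned from data.

```python
def classify(x: float, y: float) -> str:
    """Tiny hypothetical decision tree over two features.

    Each test looks at one feature only, so the regions it carves out
    are rectangles aligned with the axes.
    """
    if x < 2.0:
        return "red"
    if y < 1.0:
        return "blue"
    return "red"

for point in [(1.0, 0.5), (3.0, 0.5), (3.0, 2.0)]:
    print(point, classify(*point))
```

An oblique tree would instead test a combination of features at a node, such as `0.5 * x + 0.5 * y < 1.5`, which is more flexible but makes the search for good splits much larger.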