MACHINE LEARNING AND PATTERN RECOGNITION: Lecture 3.1 cs.nyu.edu/~yann/2008f-G22-2565-001/diglib/lecture03-basisfn.pdf
Learn about ML
Lecture 1 slides, Yann LeCun cs.nyu.edu/~yann/2008f-G22-2565-001/diglib/lecture01.pdf
Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective
research.fb.com/wp-content/uploads/2017/12/hpca-2018-facebook.pdf
A guide on how to go from zero to ML “expert” in 10 months. www.analyticsvidhya.com/blog/2018/07/mystory-became-a-machine-learning-expert-10-months/
Unitary-matrix RNN paper showing the computational benefits for RNNs. arxiv.org/pdf/1511.06464.pdf
Comments on the previous post from Reddit. https://www.reddit.com/r/MachineLearning/comments/5x5bt6/ama_were_developing_new_deep_learning_hardware/
Coherent photonics applied to ML arxiv.org/pdf/1610.02365.pdf
Gaussian processes for ML www.gaussianprocess.org/gpml/chapters/
Beginning tools:
Some other ML stuff.
Neural Networks
Awesome at finding patterns and mapping high-dimensional data. Before NNs: support vector machines, boosting, random forests. Perception does not actually imply intelligence.
Hebbian learning – when something positive happens, you increase the associated weights.
Generative Adversarial Networks – given random samples, the generator will produce an image.
Genetic Algorithm.
Single Layer Network
Activation Functions
The activation function you choose affects the convergence of the NN.
Creating Logic Gates from Single Layer Perceptrons
The decision boundaries here are arbitrary and simply represent our choice of weights. There are infinitely many weights that will satisfy our decision boundary conditions here.
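The logic-gate idea above can be sketched in a few lines. This is a minimal illustration, not from the lecture itself; the weights and biases below are one arbitrary choice among the infinitely many that satisfy the decision-boundary conditions.

```python
# Single-layer perceptrons as logic gates (AND, OR).
# Weights/biases are one arbitrary choice; infinitely many others work.

def perceptron(inputs, weights, bias):
    """Heaviside activation on a weighted sum of the inputs."""
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if s >= 0 else 0

def AND(x1, x2):
    # Fires only when both inputs are 1 (1 + 1 - 1.5 >= 0).
    return perceptron((x1, x2), weights=(1, 1), bias=-1.5)

def OR(x1, x2):
    # Fires when at least one input is 1 (1 - 0.5 >= 0).
    return perceptron((x1, x2), weights=(1, 1), bias=-0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", AND(a, b), "OR:", OR(a, b))
```

Shifting the bias moves the decision boundary; any bias between -2 and -1 gives the same AND truth table.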
Classification vs. Regression (as it applies to ML)
Generically this is determined by asking whether the estimated output is continuous or discrete. Regression is the continuous case, say fitting a line to data; classification is the discrete case, say distinguishing between two colors, red and blue.
How to Determine Goodness of Model
Training data is known good data. The objective function measures the difference between the target and the model (NN) output; it is what we try to minimize over the weights (w). In other mathematical language, this loss function is the residual sum of squares (RSS).
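The RSS objective can be written out directly for a toy linear model. The data and the linear form here are illustrative assumptions, not from the lecture:

```python
# Residual sum of squares (RSS) for a linear model t ~ w*x + b.
# rss(w, b) is the quantity training minimizes over the weights.

xs = [0.0, 1.0, 2.0, 3.0]   # inputs (training data)
ts = [1.0, 3.0, 5.0, 7.0]   # targets, generated here by t = 2x + 1

def rss(w, b):
    """Sum of squared differences between targets and model outputs."""
    return sum((t - (w * x + b)) ** 2 for x, t in zip(xs, ts))

print(rss(2.0, 1.0))  # perfect fit: loss is 0.0
print(rss(1.0, 0.0))  # worse fit: loss is larger
```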
Weight Updates
The new weight is given by the gradient-descent step: w_new = w_old − η · ∂E/∂w, where η is the learning rate and ∂E/∂w is the gradient of the objective function with respect to the weight.
The Heaviside step function's derivative is zero (almost everywhere), so gradient descent can't train with it. A common workaround is to train with the smooth sigmoid, then apply a threshold to its output at inference time to get hard 0/1 decisions.
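A small sketch of that workaround (the 0.5 threshold is the conventional choice, an assumption here):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    """Nonzero gradient everywhere, unlike the Heaviside step."""
    s = sigmoid(z)
    return s * (1.0 - s)

# Train-time: use the smooth sigmoid so gradients can flow.
z = 0.3
print(sigmoid(z), sigmoid_grad(z))

# Inference-time: threshold the sigmoid output for a hard 0/1 decision.
def predict(z):
    return 1 if sigmoid(z) >= 0.5 else 0

print(predict(0.3), predict(-0.3))
```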
Learning Rate
Obvious problems: if the learning rate is too small, convergence is very slow; if too big, the updates overshoot and miss the minima.
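Both failure modes show up on the simplest possible objective. This toy demo minimizes f(w) = w² (a stand-in for a real loss) with three learning rates:

```python
def minimize(lr, steps=50, w=1.0):
    """Gradient descent on f(w) = w**2 (gradient 2w); minimum at w = 0."""
    for _ in range(steps):
        w -= lr * 2 * w
    return w

print(abs(minimize(lr=0.1)))    # converges close to 0
print(abs(minimize(lr=0.001)))  # too small: barely moved after 50 steps
print(abs(minimize(lr=1.5)))    # too big: each step overshoots, diverges
```

With lr = 1.5 each update multiplies w by (1 − 3) = −2, so the iterate bounces across the minimum with growing magnitude.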
Batch update computes the gradient over the entire data set (plain gradient descent).
Incremental update uses one sample at a time (stochastic gradient descent).
Mini-batch updates use subsets of the data set, sampled randomly.
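The three update schemes differ only in how many samples feed each gradient step. A sketch, with an assumed toy data set (t = 2x) and a one-weight linear model:

```python
import random

xs = [1.0, 2.0, 3.0, 4.0]
ts = [2.0, 4.0, 6.0, 8.0]   # true weight is 2

def grad(w, batch):
    """Gradient of 0.5 * sum (w*x - t)^2 over the given batch."""
    return sum((w * x - t) * x for x, t in batch)

def train(batch_size, lr=0.01, epochs=200, w=0.0, seed=0):
    rng = random.Random(seed)
    data = list(zip(xs, ts))
    for _ in range(epochs):
        rng.shuffle(data)                       # sample the data set randomly
        for i in range(0, len(data), batch_size):
            w -= lr * grad(w, data[i:i + batch_size])
    return w

print(train(batch_size=len(xs)))  # batch: gradient over the full data set
print(train(batch_size=1))        # incremental: stochastic gradient descent
print(train(batch_size=2))        # mini-batch: random subsets
```

All three recover w ≈ 2 here; on noisy data the per-step gradients of SGD and mini-batch are cheaper but noisier estimates of the full-batch gradient.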
Maximize the distance (the margin) between the decision boundary and the nearest data points.
Decision Tree
Breaking the input space up into orthogonal (axis-aligned) sections. There are also oblique decision trees, which split along non-axis-aligned directions, but they are expensive to compute.