Great intro to gradient descent, with nice little examples for beginners. spin.atomicobject.com/2014/06/24/gradient-descent-linear-regression/
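
A minimal sketch of the idea, not taken from the linked article: fitting a line y = m*x + b by batch gradient descent on mean squared error. The function name, sample points, learning rate, and step count are all assumptions chosen for illustration.

```python
def gradient_descent(points, learning_rate=0.05, steps=5000):
    """Fit y = m*x + b by batch gradient descent on mean squared error."""
    m, b = 0.0, 0.0
    n = len(points)
    for _ in range(steps):
        # Partial derivatives of (1/n) * sum((y - (m*x + b))**2) w.r.t. m and b.
        grad_m = sum(-2.0 * x * (y - (m * x + b)) for x, y in points) / n
        grad_b = sum(-2.0 * (y - (m * x + b)) for x, y in points) / n
        # Step against the gradient.
        m -= learning_rate * grad_m
        b -= learning_rate * grad_b
    return m, b


# Hypothetical data lying exactly on y = 2x + 1.
points = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
m, b = gradient_descent(points)
print(m, b)
```

With a learning rate that is too large the updates diverge, so the value above is only a safe choice for this small data set.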

Sent from my iPad

## Introduction to Statistical Learning

Free book: An Introduction to Statistical Learning. www-bcf.usc.edu/~gareth/ISL/index.html

## SiP for neural networks

SiP for neural networks. www.ece.ust.hk/~eexu/OPTICS2018/Bert%20Offrein.pdf

## Activation function – Wikipedia

Activation function for a neuron. en.wikipedia.org/wiki/Activation_function
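
For a concrete picture, here is a rough sketch of a single neuron: a weighted sum of inputs plus a bias, passed through an activation function. The helper names (`sigmoid`, `relu`, `neuron_output`) and the sample weights are assumptions for illustration, not anything from the linked page.

```python
import math


def sigmoid(x):
    """Logistic function: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    """Rectified linear unit: passes positive inputs, zeroes out negatives."""
    return max(0.0, x)


def neuron_output(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through the activation."""
    pre_activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(pre_activation)


# Hypothetical inputs and weights; the weighted sum here is exactly 0.
inputs, weights = [1.0, 2.0], [0.5, -0.25]
print(neuron_output(inputs, weights, 0.0, sigmoid))
print(neuron_output(inputs, weights, 0.0, relu))
print(neuron_output(inputs, weights, 0.0, math.tanh))
```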

## Von Neumann Bottleneck

The Von Neumann architecture bottleneck. wiki.c2.com/?VonNeumannBottleneck

## Should Deep Learning use Complex Numbers? – Intuition Machine – Medium

## Bayesian neural nets by Yann LeCun on FB 2016

m.facebook.com/yann.lecun/posts/10154058859142143

## MESI protocol – Wikipedia

Cache coherence protocol. en.wikipedia.org/wiki/MESI_protocol

## Bisection bandwidth – Wikipedia

From Wikipedia: Theoretical support for the importance of this measure of network performance was developed in the PhD research of Clark Thomborson (formerly Clark Thompson).[3] Thomborson proved that important algorithms for sorting, fast Fourier transformation, and matrix-matrix multiplication become communication-limited (as opposed to CPU-limited or memory-limited) on computers with insufficient bisection width. F. Thomson Leighton's PhD research[4] tightened Thomborson's loose bound[5] on the bisection width of a computationally important variant of the De Bruijn graph known as the shuffle-exchange graph. Based on Bill Dally's analysis of latency, average-case throughput, and hot-spot throughput of m-ary n-cube networks[2] for various m, it can be observed that low-dimensional networks, in comparison to high-dimensional networks (e.g., binary n-cubes) with the same bisection width (e.g., tori), have reduced latency and higher hot-spot throughput.[6]

en.wikipedia.org/wiki/Bisection_bandwidth
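
To make the bisection width in the quote concrete, here is a brute-force sketch: split the nodes into two equal halves every possible way and take the smallest number of links that cross the cut. The function names and the two small example networks (an 8-node ring and a binary 3-cube) are assumptions picked for illustration. The approach only works for tiny graphs, since the number of splits grows combinatorially.

```python
from itertools import combinations


def bisection_width(num_nodes, edges):
    """Minimum number of edges crossing any split into two equal halves."""
    best = None
    for half in combinations(range(num_nodes), num_nodes // 2):
        side = set(half)
        # An edge crosses the cut when exactly one endpoint is in `side`.
        cut = sum(1 for u, v in edges if (u in side) != (v in side))
        if best is None or cut < best:
            best = cut
    return best


def ring_edges(n):
    """Edges of an n-node ring."""
    return [(i, (i + 1) % n) for i in range(n)]


def hypercube_edges(dim):
    """Edges of a binary n-cube: nodes differing in exactly one bit."""
    return [(u, u ^ (1 << bit))
            for u in range(2 ** dim)
            for bit in range(dim)
            if u < u ^ (1 << bit)]  # list each undirected edge once


ring_width = bisection_width(8, ring_edges(8))
cube_width = bisection_width(8, hypercube_edges(3))
print(ring_width, cube_width)
```

Both networks have 8 nodes, but the cube has twice as many links across its narrowest cut, which is the kind of difference the quoted passage is about.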

## Complexity Issues in VLSI | The MIT Press

A thorough derivation of the I/O bound for matrix-matrix multiplication. mitpress.mit.edu/books/complexity-issues-vlsi
