I recently attended the 2020 Hot Chips conference. A number of talks focused on how to scale out ML models and on the need for message passing between machines, in the style of the Message Passing Interface (MPI). One of the most interesting opportunities for hardware is performing the sums inside the network as part of the AllReduce operation, so the TPUs/GPUs can keep doing multiplications. There are some interesting problems here that hardware could potentially solve, so I’m taking it upon myself to learn more about MPI.
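
To keep the collective operations straight, here is a minimal sketch in plain Python that simulates them on one machine. It is not MPI code: the "ranks" are just entries in a list, and the function names `all_gather` and `all_reduce_sum` are hypothetical helpers for illustration. AllGather leaves every rank holding every rank's data, so each rank still has to do the sum itself. A sum AllReduce leaves every rank holding only the summed result, and that sum is the step a network could in principle compute while the data is in flight.

```python
from typing import List

Vector = List[float]


def all_gather(per_rank: List[Vector]) -> List[List[Vector]]:
    """Simulated AllGather: every rank receives a copy of every rank's vector."""
    return [[list(v) for v in per_rank] for _ in per_rank]


def all_reduce_sum(per_rank: List[Vector]) -> List[Vector]:
    """Simulated sum AllReduce: every rank receives the elementwise sum."""
    total = [sum(column) for column in zip(*per_rank)]
    return [list(total) for _ in per_rank]


# Three simulated ranks, each holding a two-element vector (made-up values).
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

gathered = all_gather(grads)     # each rank holds all three vectors
reduced = all_reduce_sum(grads)  # each rank holds only the sum

# After AllGather, a rank must still add the vectors up locally.
local_sum_on_rank_1 = [sum(column) for column in zip(*gathered[1])]
```

Both paths end with the same numbers on every rank. The difference is where the additions happen and how much data each rank has to receive.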