Wednesday, March 23, 2022

Ruminating on CUDA and TPU

CUDA is a parallel computing platform and programming model (exposed as an API) developed by Nvidia to enable developers to leverage the full parallel processing power of its GPUs.
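
To make the programming model concrete, here is a minimal sketch of a CUDA kernel written from Python using the Numba library (Numba is not mentioned in the post; it is just one convenient way to author CUDA kernels). Every GPU thread runs the same kernel function in parallel, each thread handling one array element:

from numba import cuda
import numpy as np

# A CUDA "kernel": every GPU thread executes this function in parallel,
# each thread working on one element of the arrays.
@cuda.jit
def vector_add(a, b, out):
    i = cuda.grid(1)          # this thread's global index in the launch grid
    if i < out.shape[0]:      # guard threads that fall past the end of the data
        out[i] = a[i] + b[i]

n = 1_000_000
a = np.random.rand(n).astype(np.float32)
b = np.random.rand(n).astype(np.float32)
out = np.zeros_like(a)

# Launch configuration: enough blocks of 256 threads to cover all n elements.
threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
vector_add[blocks, threads_per_block](a, b, out)   # runs on the GPU

The same idea, written directly in CUDA C/C++, is what frameworks and libraries build on: thousands of lightweight threads, organized into blocks and grids, all executing the kernel simultaneously.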

Deep learning (training neural nets) requires a humongous amount of processing power, and it is here that GPUs with thousands of cores (e.g. the Nvidia A100) become essential.

Nvidia has also released a library for neural nets called cuDNN. The CUDA Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks, providing highly tuned implementations of standard routines such as forward and backward convolution, pooling, normalization, and activation layers.

Many deep learning frameworks rely on CUDA and the cuDNN library for parallelizing work on GPUs - TensorFlow, Caffe2, H2O.ai, Keras, PyTorch.
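
As a small illustration with PyTorch (one of the frameworks listed above), the convolution below is dispatched to cuDNN behind the scenes whenever a CUDA GPU is present; the exact layer sizes here are just placeholders:

import torch
import torch.nn as nn

# PyTorch routes convolutions on CUDA tensors to cuDNN automatically.
print(torch.backends.cudnn.is_available())   # True if cuDNN was found
print(torch.backends.cudnn.version())        # cuDNN version number

torch.backends.cudnn.benchmark = True        # let cuDNN auto-tune its conv algorithms

conv = nn.Conv2d(3, 64, kernel_size=3, padding=1).cuda()
x = torch.randn(8, 3, 224, 224, device="cuda")
y = conv(x)           # forward convolution handled by cuDNN
y.sum().backward()    # backward convolution handled by cuDNN as well
print(y.shape)        # torch.Size([8, 64, 224, 224])

The application code never calls cuDNN directly; the framework picks the tuned cuDNN routine for each layer.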

To address the growing demand for training ML models, Google came up with its own custom integrated circuit called the TPU (Tensor Processing Unit). A TPU is basically an ASIC (application-specific integrated circuit) designed by Google for ML workloads. TPUs are tailored for TensorFlow and can handle the massive matrix multiplications and additions in neural networks at great speed, while consuming far less power and floor space.
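
A minimal sketch of how a TensorFlow program targets a TPU, assuming a Cloud TPU or Colab TPU runtime is attached (the model itself is just a placeholder):

import tensorflow as tf

# Connect to the TPU attached to this runtime (empty string = local TPU).
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

# Anything created inside the strategy scope is placed and replicated on the
# TPU cores, so the matrix multiplications and additions run on the ASIC.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")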

Examples: Google Photos uses TPUs to process more than 100 million photos every day. TPUs also power Google RankBrain - the part of Google's search algorithm that uses machine learning and artificial intelligence to better understand the intent of a search query.
