- Google has developed its second-generation tensor processor—four 45-teraflops chips packed onto a 180 TFLOPS tensor processor unit (TPU) module, to be used for machine learning and artificial intelligence—and the company is bringing it to the cloud.
- The new TPUs are optimized for both halves of the machine-learning task, allowing the same chips to be used both for training models and for making inferences with them.
- Quite how the new chips' floating-point performance maps to the integer arithmetic the first-generation TPU used for inference isn't clear, and the ability to use the new TPU for training suggests that Google may be using 16-bit floating point instead.
- But as a couple of points of comparison: AMD's forthcoming Vega GPU should offer 13 TFLOPS of single-precision and 25 TFLOPS of half-precision performance, and the machine-learning accelerator that Nvidia announced recently, the Volta GPU-based Tesla V100, offers 15 TFLOPS of single precision and 120 TFLOPS for "deep learning" workloads.
- Microsoft has been using FPGAs for similar workloads, though, again, a performance comparison is tricky; the company has performed demonstrations of more than 1 exa-operations per second (that is, 10^18 operations), though it didn't disclose how many chips that used or the nature of each operation.
- Up to 256 chips can be joined together for 11.5 petaflops of machine-learning power.
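The headline figures above are internally consistent; a quick sketch of the arithmetic, using only the numbers quoted in the article (45 TFLOPS per chip, 4 chips per module, 256 chips joined together):

```python
# Performance arithmetic for Google's second-generation TPU,
# using the figures quoted above.
TFLOPS_PER_CHIP = 45      # per-chip performance
CHIPS_PER_MODULE = 4      # chips packed onto one TPU module
CHIPS_JOINED = 256        # maximum chips joined together

module_tflops = TFLOPS_PER_CHIP * CHIPS_PER_MODULE   # 180 TFLOPS per module
combined_pflops = CHIPS_JOINED * TFLOPS_PER_CHIP / 1000

print(module_tflops)    # 180
print(combined_pflops)  # 11.52, i.e. the ~11.5 petaflops quoted
```

So the "11.5 petaflops" figure is simply 256 chips at 45 TFLOPS each, rounded down slightly from 11.52.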