Here, we introduced an integrated photonics-based TPU that strategically exploits a) photonic parallelism via wavelength-division multiplexing, b) a high throughput of 2 peta-operations per second, enabled by delays of tens of picoseconds in the optoelectronics and by compact photonic integrated circuitry, and c) novel zero-static-power photonic multi-state memories based on phase-change materials, which feature vanishing losses in the amorphous state. Combining these physical synergies of material, function, and system, we show that the performance of this 8-bit photonic TPU can be two to three orders of magnitude higher than that of an electrical TPU while occupying a similar chip area. Once the kernel matrix is programmed, the runtime complexity is O(1); however, the number of required resources (devices) scales as O(N^3). This system could ultimately operate in the range of 10 to 500 fJ/MAC and 1 to 50 TMAC/s per mm^2, with ~100 ps (one clock cycle) per VMM operation.
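To make the quoted figures of merit easier to relate to one another, the following back-of-the-envelope sketch converts the 2 peta-operations-per-second throughput into a MAC rate, the implied power at the two ends of the 10 to 500 fJ/MAC range, and the number of MACs that would have to complete within one ~100 ps clock cycle. It assumes the common convention that one MAC counts as two operations (multiply and add), which the text does not state, and it assumes the throughput and energy figures apply simultaneously; the function names are hypothetical helpers for illustration only.

```python
# Back-of-the-envelope arithmetic relating the quoted figures of merit.
# ASSUMPTION (not stated in the text): 1 MAC = 2 operations (multiply + add).
# ASSUMPTION: the throughput and the energy-per-MAC figures hold at the same time.

OPS_PER_SECOND = 2e15   # quoted throughput: 2 peta-operations per second
OPS_PER_MAC = 2         # assumed counting convention
CYCLE_TIME_S = 100e-12  # quoted ~100 ps per VMM operation (one clock cycle)


def macs_per_second(ops_per_second: float, ops_per_mac: int = OPS_PER_MAC) -> float:
    """Convert an operations-per-second figure into MACs per second."""
    return ops_per_second / ops_per_mac


def power_watts(energy_per_mac_j: float, mac_rate: float) -> float:
    """Power drawn if every MAC costs energy_per_mac_j joules at mac_rate MAC/s."""
    return energy_per_mac_j * mac_rate


def macs_per_cycle(mac_rate: float, cycle_time_s: float = CYCLE_TIME_S) -> float:
    """Number of MACs that must finish in one clock cycle to sustain mac_rate."""
    return mac_rate * cycle_time_s


if __name__ == "__main__":
    rate = macs_per_second(OPS_PER_SECOND)
    for e_fj in (10, 500):  # quoted energy range, in fJ per MAC
        print(f"{e_fj} fJ/MAC -> {power_watts(e_fj * 1e-15, rate):.0f} W")
    print(f"MACs per 100 ps cycle: {macs_per_cycle(rate):.0f}")
```

Under these assumptions the script prints roughly 10 W and 500 W for the two energy bounds and about 100000 MACs per cycle, which illustrates why the O(1) runtime comes at the cost of a large device count.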