February_EDFA

edfas.org ELECTRONIC DEVICE FAILURE ANALYSIS | VOLUME 23 NO. 1

compact device size (scaling down to a 10 × 10 nm² cell area), which places its popularity above that of other modern memory devices. [4] Data is stored in the RRAM as the resistance (leakage conductance) of a dielectric sandwiched between two active/passive metal electrodes. When a sufficiently high electric field is applied to the virgin metal/insulator (dielectric)/metal capacitor device in its high-resistance state (HRS), defects are created in the dielectric that eventually link up in space and time to form a conductive nanometallic filament (a dielectric breakdown), which puts the device in the high-conduction, low-resistance state (LRS). [5,6] Similarly, applying an opposite-polarity field, a high current, or both causes this filament to partially dissolve or rupture, leaving an electron tunnel barrier in the dielectric that returns the device to its low-conductance, high-resistance state (HRS). This reversible transition between the LRS and HRS is made possible by the polarity-dependent drift and diffusion of defects (or the Joule-heating-assisted dissolution of metallic nanobridges) between the dielectric and the metal electrodes. Given the stochasticity of the many nanoscale events involved in filament formation, rupture, and reformation, considerable variance is to be expected both in the switching voltage and in the resistance of the LRS and HRS states. Some RRAM devices also allow the resistance to be tuned more precisely to several intermediate states between LRS and HRS through careful control of the rate and magnitude of the applied voltage (field) and of the current compliance setting. The device structure and a comparison chart showing the advantages of RRAM over other non-volatile memory (NVM) technologies are depicted in Fig. 1.
[7-12] As is known, a CNN is a shared-weight architecture in which more than half of the structure is composed of trained weights. [13] The weights are obtained from a computationally intensive, optimized gradient training process on a vast collection of well-labelled data sets. [14] Hence, employing a power-efficient memory device in a memory-intensive CNN algorithm structure significantly reduces the overall power of the end-use application. Ultralow-power memory such as RRAM therefore stands a good chance of reducing CNN operational power and enabling its use in self-powered or battery-powered applications on the edge for on-the-fly decision making (even in remote locations), without the high-latency problems that arise from permanent and frequent reliance on the cloud. Nevertheless, the intrinsic properties of RRAM, when used as hardware-coded synaptic weights in a low-power CNN, degrade the prediction (classification) accuracy of the algorithm. This loss of accuracy results from the anomalous and stochastic switching characteristics (with high variability, especially at lower switching power) that persist in a given RRAM material structure during the filament shape and size transitions between LRS and HRS. [15] Several research studies have been carried out with the objective of better controlling the variability of filamentary RRAM conductance. Quantifying the trend and impact of RRAM device variability on CNN prediction accuracy is therefore essential, so that the operating regime of RRAM-based AI hardware can be configured to co-optimize the trade-off between operating power and prediction accuracy for the intended end application.

HARDWARE-BASED CNN FORMULATION AND SIMULATION METHODOLOGY

A look-up-table (LUT) approach has been proposed to encode the RRAM physical resistance variability onto the GPU-trained CNN weights, [16] as shown in Fig. 2, using the 32-bit floating-point representation.
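To make the 32-bit encoding concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that a 1 bit is stored as LRS and a 0 bit as HRS, that both states follow lognormal distributions with made-up medians, and that a bit is read back by comparing the cell resistance with a hypothetical threshold `r_th`. All function names and numeric values are illustrative assumptions rather than measured data.

```python
import math
import random
import struct

# Illustrative assumptions only; these are not measured device values.
LRS_MEDIAN_OHM = 1e4  # assumed median resistance of the low-resistance state
HRS_MEDIAN_OHM = 1e6  # assumed median resistance of the high-resistance state


def float_to_bits(weight):
    """Return the 32 bits (MSB first) of the IEEE 754 single-precision value."""
    (as_int,) = struct.unpack(">I", struct.pack(">f", weight))
    return [(as_int >> shift) & 1 for shift in range(31, -1, -1)]


def bits_to_float(bits):
    """Inverse of float_to_bits: rebuild the float from 32 bits (MSB first)."""
    as_int = 0
    for bit in bits:
        as_int = (as_int << 1) | bit
    (weight,) = struct.unpack(">f", struct.pack(">I", as_int))
    return weight


def program_cell(bit, sigma, rng):
    """Sample a cell resistance; bit 1 -> LRS, bit 0 -> HRS (assumed mapping)."""
    median = LRS_MEDIAN_OHM if bit else HRS_MEDIAN_OHM
    return rng.lognormvariate(math.log(median), sigma)


def read_cell(resistance, r_th):
    """Read a bit back by comparing the resistance with the threshold."""
    return 1 if resistance < r_th else 0


def hardware_weight(weight, sigma, r_th, rng):
    """Store a weight in 32 simulated cells and read it back.

    Returns the read-back weight and the number of bits that flipped
    because a sampled resistance fell on the wrong side of r_th.
    """
    stored = float_to_bits(weight)
    read = [read_cell(program_cell(b, sigma, rng), r_th) for b in stored]
    flipped = sum(s != r for s, r in zip(stored, read))
    return bits_to_float(read), flipped
```

Calling `hardware_weight(w, sigma, r_th, random.Random(seed))` with a larger `sigma` widens both distributions, so more sampled resistances cross `r_th` and more bits flip. A flip in the sign or exponent field changes the weight far more than a flip in the low mantissa bits, which is one reason the per-weight error can vary widely.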
Every trained CNN weight is replaced by its 32-bit floating-point representation (Fig. 2b), and the binary 0s and 1s in the floating-point scheme are replaced by hardware-encoded RRAM state resistance values based on a resistance threshold (R_TH) placed in the gap between the probability density functions of the LRS and HRS states (Fig. 2a). Because switching is stochastic, the tails of the LRS and HRS distributions do overlap, to an extent that depends on the current compliance, which in turn determines the switching power of the device. This overlap results in false-0 and false-1 bits being encoded in the hardware realization, as shown by the red bits in Fig. 2b. The LRS and HRS distributions at current compliances (I_comp) of 2 µA, 5 µA, and 10 µA for a sub-stoichiometric HfOx (x < 2) RRAM device fabricated using a 65 nm CMOS process were extracted from the work of Fantini et al. [17] from IMEC and used for the simulation here.

The difference in classification error between the hardware-encoded CNN weights and the algorithm-trained CNN weights is then extracted (Fig. 2c) for various use cases (Fig. 2d), and this difference in accuracy/error serves as a metric to guide the choice of operating regime for the hardware network on the edge. More error is expected in the hardware CNN for lower-power RRAM switching (corresponding to a lower current compliance setting), when the HRS and LRS distributions overlap more in the tail regions.

The naive inception module is a type of inception algorithm used in CNN, as shown in Fig. 3a, with three
