
where each $x_t^{(i)}$ is a vector of $M$ SMART attribute values collected from drive $i$ at time $t \in [1, \ldots, T_i]$ and each $y^{(i)}$ is a Boolean indicator such that

$$y^{(i)} = \begin{cases} 1, & \text{if drive } i \text{ failed during the observation period} \\ 0, & \text{otherwise.} \end{cases}$$

A general failure predictor implements a binary decision rule $\delta_t(x_1, \ldots, x_t)$, which decides whether a drive with SMART data history $(x_1, \ldots, x_t)$ will fail at time $t$:

$$\delta_t(x_1, \ldots, x_t) = \begin{cases} 1, & \text{if the drive is predicted to fail at time } t \\ 0, & \text{otherwise.} \end{cases}$$

Note that datacenters need to predict drive failures some time in advance, so that at-risk data can be backed up. To account for this case, the predictor can be generalized as $\delta_{t+t_{\mathrm{adv}}}(x_1, \ldots, x_t)$.

Many techniques have been developed for learning binary decision rules from data, and a few have been applied to predict field failures from SMART data. A "good" decision rule has a high true positive rate (TPR) for detecting failing drives, while maintaining a low (<1%) false positive rate (FPR). These two metrics are defined as

$$\mathrm{TPR} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}, \qquad \mathrm{FPR} = \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}}$$

where TP and FN count failing drives that are correctly and incorrectly classified, and FP and TN count healthy drives that are incorrectly and correctly classified.

Table 2 summarizes the results from two recent works on predicting datacenter HDD failures, using various modeling techniques and datasets similar to the one published by Backblaze. An earlier report by Zhu et al. in 2013 used support vector machines (SVMs) and feedforward neural networks (NNs) as binary classifiers.[3] Their models were trained using 10 SMART attributes, sampled hourly for up to 20 days, collected from 23,395 drives of a single Seagate model. Using four time samples from the past 12 hours, they were able to achieve a high TPR with both models while maintaining an FPR below 1%.

However, the models used by Zhu et al. do not take advantage of the temporal structure of the data. The recurrent neural network (RNN) is a neural network architecture designed to capture dependencies between present and past entries of a time series, and it has been used with great success for speech recognition and other sequence prediction tasks. In a follow-up publication, an RNN was developed for predicting HDD failures using the same dataset.[4] SMART data were down-sampled before being fed to the RNN; unsurprisingly, a higher TPR and lower FPR were achieved.

CAUSAL INFERENCE TOWARD INTERPRETABLE AND GENERALIZABLE MODELS

While many machine learning models have been successful at predicting field failures from SMART data, little progress has been made toward interpreting how patterns within SMART data are related to the causes of field failures. Which SMART attributes are prognostic for predicting field failures, and which attributes provide only redundant or irrelevant information? How can these dependencies be inferred from data? These are questions of causal inference. This article focuses on one direction of special interest, outlined here: the feature selection problem for failure prediction, approached with tools from causal inference.

Feature selection is useful for both machine learning practitioners and datacenters looking to incorporate failure prediction into their reliability frameworks. The goal is to reduce the dimensionality of the input variables by removing as many redundant and unnecessary variables as possible.

From a machine learning perspective, this is appealing for two reasons. First, it allows for simpler learning algorithms that can achieve high performance from fewer samples, which is especially valuable when data are limited. At the same time, feature selection helps to reduce the chance of overfitting. For example, a failure prediction model may learn nuances in the data specific to a certain drive model or to the operating conditions of a specific datacenter, in which case the model would perform poorly when applied to other datasets. Removing features that contain only noise would reduce the chance of overfitting and increase the chance of successful transfer learning. In other words, it would be much easier to adapt a model trained using one drive model to predict failures of another model than to train a new model from scratch. One common (non-causal) baseline for this kind of feature ranking is sketched below.
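As a point of contrast with the causal approach, the following minimal sketch ranks SMART attributes by their mutual information with the failure label, a common filter-style feature selection baseline. The attribute IDs, array shapes, and synthetic data are illustrative assumptions, not values taken from the studies cited above.

```python
# Minimal sketch of filter-style feature selection: rank SMART attributes
# by mutual information with the failure label. This is a common baseline,
# not the causal-inference method discussed in this article. All attribute
# IDs and data here are synthetic placeholders.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
n_drives, n_attrs = 1000, 10
X = rng.normal(size=(n_drives, n_attrs))   # one SMART snapshot per drive
y = rng.integers(0, 2, size=n_drives)      # 1 = failed, 0 = healthy

# Hypothetical SMART attribute IDs, chosen for illustration only.
attr_ids = [1, 5, 7, 9, 187, 188, 193, 194, 197, 198]

# Estimate how much information each attribute carries about the label.
scores = mutual_info_classif(X, y, random_state=0)

# Rank attributes from most to least informative; low scorers are
# candidates for removal as irrelevant inputs.
for attr, score in sorted(zip(attr_ids, scores), key=lambda p: -p[1]):
    print(f"SMART {attr}: {score:.4f}")
```

Note that a marginal ranking like this cannot tell a truly prognostic attribute from one that is merely correlated with it, nor can it flag redundancy between attributes; distinguishing those cases is precisely where causal-inference tools become useful.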
For datacenters looking to perform analytics on time-series data, feature selection has additional appeal.

Table 2: A comparison of three machine learning models trained and evaluated on the HDD dataset introduced in Reference 3

Model type   True positive rate (TPR)   False positive rate (FPR)
SVM [3]      80%                        0.3%
NN [3]       94.62%                     0.48%
RNN [4]      97.71%                     0.06%
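For concreteness, TPR and FPR values of the kind reported in Table 2 follow directly from the definitions given earlier. Below is a minimal sketch assuming arrays of true and predicted labels (1 = fail, 0 = healthy); the toy labels are illustrative, not data from References 3 or 4.

```python
import numpy as np

def tpr_fpr(y_true, y_pred):
    """Return (TPR, FPR) for binary failure labels, where 1 = fail."""
    tp = np.sum((y_pred == 1) & (y_true == 1))  # failing drives caught
    fn = np.sum((y_pred == 0) & (y_true == 1))  # failing drives missed
    fp = np.sum((y_pred == 1) & (y_true == 0))  # healthy drives flagged
    tn = np.sum((y_pred == 0) & (y_true == 0))  # healthy drives passed
    return tp / (tp + fn), fp / (fp + tn)

# Toy example: 3 failing drives, 5 healthy drives.
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 0, 0, 0, 0, 1, 0])
tpr, fpr = tpr_fpr(y_true, y_pred)
print(f"TPR = {tpr:.2%}, FPR = {fpr:.2%}")  # TPR = 66.67%, FPR = 20.00%
```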
