
three parts: the training, validation, and test datasets. The first is provided to the model during the training process. The validation dataset is used for model selection. For instance, during the training of a neural network, the goal is to find the model with the best performance, which can be evaluated by applying each candidate model to the validation dataset and computing a performance measure. The test dataset is used to measure the generalization ability of a model, i.e., to simulate a practical application of the model to previously unseen examples.

The selection of a quantitative performance measure for testing and validating a model often depends on the task of the learning system and the provided experience. Accuracy is one of the most popular measures, computed as the ratio of correct model outputs to the number of all outputs. Other popular measures are precision and recall, which quantify the correctness and the hit rate of predictions for a particular label, respectively. The harmonic mean of precision and recall is called the F1-score, which is often used as an alternative to accuracy for imbalanced datasets, i.e., when the number of examples with some label is much larger than the number of examples with other labels. A small numerical sketch of these measures is given below.

ALGORITHMS

In the literature,[16,30] algorithms are often divided into classical and deep learning categories. The former, such as linear and logistic regression, support vector machines, or tree-based methods,[33] require a specific representation of features. For instance, natural language processing (NLP) tasks require vectorization of input texts, i.e., representation of the whole text, individual words, or sequences of characters (tokens) as numerical vectors. However, not all possible vector representations are appropriate for further training or application of a model. Ideally, it is desirable to find a function that results in vectors representing the meaning of the input artifacts; e.g., vectors of synonyms or of words that often appear together must be close to each other, whereas vectors of two random words must not.

Term frequency-inverse document frequency (TF-IDF)[34] is a bag-of-words technique that vectorizes an input collection of n documents by computing a vector representation for each word occurring in them, while ignoring all information provided by the positions of this word in the documents. TF-IDF of a word results in a vector with n components, where component i is the number of the word's occurrences in document i multiplied by the logarithm of the ratio of the total number of documents to the number of documents containing this word. The dataset for a classical technique, like a support vector machine, is then an n × m real-valued matrix, where m is the number of words appearing in all documents. A code sketch of this vectorization also follows below.

Deep learning methods help to avoid the tedious and error-prone process of manual feature extraction and representation by incorporating it into the learning technique itself. For the NLP example discussed above, the search for a vectorization method can be replaced by training a neural network whose architecture includes specific layers for representation learning (Fig. 8). For image processing, the architecture of a convolutional neural network (CNN) assumes that all neighboring pixels of an image are related to each other and that more general features can be constructed from specific ones.
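To make the measures above concrete, the following minimal Python sketch computes accuracy, precision, recall, and the F1-score for a small invented set of binary labels (the labels, and the reading of 1 as "defective" and 0 as "good," are purely illustrative):

# Hypothetical ground-truth labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1, 0, 1]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

accuracy  = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision = tp / (tp + fp)   # correctness of "defective" predictions
recall    = tp / (tp + fn)   # hit rate on actual "defective" examples
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean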
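The TF-IDF vectorization described above can likewise be sketched in a few lines. The snippet below uses scikit-learn's TfidfVectorizer on three invented documents; the library choice and the documents are assumptions of this sketch, not prescriptions of the article, and scikit-learn applies a smoothed variant of the textbook IDF formula:

from sklearn.feature_extraction.text import TfidfVectorizer

documents = [                     # n = 3 hypothetical documents
    "short circuit on the metal layer",
    "open circuit after thermal stress",
    "metal layer delamination after stress",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(documents)    # n x m sparse matrix

print(vectorizer.get_feature_names_out())  # the m vocabulary words
print(X.toarray())                         # one row per document

Each row of X could then be fed to a classical learner such as a support vector machine.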
Similarly, in NLP, training of a tokenizer allows the model to find (a) the best possible vocabulary, e.g., "fused" can be split into the tokens "fus" and "ed," and (b) an encoding of every token into a vector that minimizes the error rate of the network's decision layers. The encoder itself is often modeled as a deep neural network, for example, using a recurrent neural network such as the long short-term memory (LSTM)[35] or a transformer architecture.[36] Next, the extracted features are provided to a (multilayer) feedforward network, where each neuron of a layer is connected to the output of every neuron of the previous layer. These densely connected

Fig. 8 Sample architectures of deep neural networks.
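As one minimal illustration of the encoder-plus-decision-layers pattern just described, the sketch below builds a small text classifier in PyTorch; the framework, the LSTM encoder, and all layer sizes are assumptions of this sketch rather than choices made in the article:

import torch
import torch.nn as nn

class TextClassifier(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=64,
                 hidden_dim=128, num_classes=2):
        super().__init__()
        # Representation learning: one trainable vector per token.
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # Feature extraction with a recurrent (LSTM) encoder.
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Densely connected decision layers.
        self.decision = nn.Sequential(
            nn.Linear(hidden_dim, 64),
            nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, token_ids):            # (batch, seq_len) int ids
        vectors = self.embedding(token_ids)  # learned token vectors
        _, (h, _) = self.encoder(vectors)    # final hidden state
        return self.decision(h[-1])          # class scores

model = TextClassifier()
scores = model(torch.randint(0, 1000, (4, 12)))  # 4 dummy sequences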
