ADVANCED MATERIALS & PROCESSES | FEBRUARY/MARCH 2021

...THE MOST VALUABLE MICROSTRUCTURAL DATA SETS INCLUDE METADATA THAT ENRICHES THEIR INFORMATION CONTENT.

Quantitative representation of microstructure is the foundational tool of microstructural science, connecting the material's structure to its composition, process history, and properties. Microstructural quantification traditionally involves a human deciding a priori what to measure and then devising a purpose-built method for doing so. However, recent advances in data science, including computer vision (CV) and machine learning (ML), offer new approaches to extracting information from microstructural images [1-7]. The objective of CV is to represent the visual content of an image in numerical form, and ML makes use of these representations to accomplish a given goal. Given a microstructural image, a CV/ML system can perform a variety of analysis objectives, including image classification (e.g., ferritic, austenitic, martensitic), property prediction (e.g., yield strength), feature measurement (e.g., grain size), constituent identification (e.g., phase identification), or a host of other characterization tasks. The CV/ML approach is not a single solution that addresses every microstructural science challenge, but it offers a path toward objective, repeatable, generalizable, and scalable methods that complement the traditional materials characterization workflow.

COMPUTER VISION AND MACHINE LEARNING

Computer vision encompasses an array of methods for creating a numerical representation of a visual image, termed the feature vector [8]. Machine learning methods then extract quantitative visual information from the high-dimensional feature vector [9]. Most high-performance CV/ML systems currently use convolutional neural networks (CNNs), which take an image as input, apply a variety of signal processing operations to it in order to encode it as a vector, and then utilize an artificial neural network or other ML method to draw a conclusion about the visual content of the image [10,11]. The first part of the CNN pipeline, encoding the image as a feature vector, is termed the feature learning stage, and the second part, drawing a conclusion, is the classification stage.

Designing and training a CNN requires deep expertise and a large data set (typically millions of images), making it impractical for most microstructural data sets. However, CNNs that have been optimized and trained on a large set of natural images have been successfully used with other kinds of images, including microstructures. This transfer learning [12] approach enables using pre-trained CNNs (such as the VGG16 network [13] trained on the ImageNet data set [14]) for microstructural representation. However, because the goal is not to classify microstructural images into the ImageNet categories (broccoli, bucket, bassoon), the network is truncated before the classification stage. Instead, the CNN layers themselves are used as the image representations for ML tasks.

Machine learning methods are either supervised (trained using known correct answers, termed ground truth) or unsupervised (finding patterns without knowledge of a ground truth), and there are important roles for each approach. Supervised ML methods make predictions about new data based on information learned from training data with known ground truth answers [10,11,15]. In contrast, unsupervised ML algorithms find relationships between images without ground truth data or human intervention, typically by generating clusters of related images [6,16]. The choice of ML modality and model depends on the nature of the input data and the desired outcome. In this process, it is helpful to involve an expert in ML algorithms, because best-in-class solutions are ever-evolving.
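To make the truncated-CNN idea concrete, the short sketch below shows one way to extract such a feature vector with the open-source PyTorch and torchvision libraries. It is a minimal illustration rather than a prescribed implementation: the weights argument assumes torchvision 0.13 or later, the file name micrograph.png is a placeholder, and averaging the convolutional output over its spatial dimensions is just one common way to obtain a fixed-length vector.

import torch
from PIL import Image
from torchvision import models, transforms

# Load VGG16 with weights pre-trained on ImageNet (transfer learning),
# then keep only the convolutional "feature learning" layers and drop the
# classification stage.
vgg16 = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
vgg16.eval()
feature_extractor = vgg16.features

# Standard ImageNet preprocessing; a grayscale micrograph is replicated to
# the three channels VGG16 expects.
preprocess = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

image = Image.open("micrograph.png")            # placeholder file name
batch = preprocess(image).unsqueeze(0)          # shape: (1, 3, 224, 224)

with torch.no_grad():
    feature_map = feature_extractor(batch)      # shape: (1, 512, 7, 7)

# Average over the spatial dimensions to obtain a fixed-length feature vector.
feature_vector = feature_map.mean(dim=(2, 3)).squeeze(0)   # shape: (512,)

Because only the pre-trained feature learning stage is used, no network training is required, and the resulting vectors can be passed directly to conventional supervised or unsupervised ML methods.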
MICROSTRUCTURAL IMAGE DATA

When assembling an image data set for CV/ML analysis, image quality is less important than exposing the ML system to the full scope of the data space. This is not a recommendation to ignore good microscopy practices, but rather a suggestion that many acceptable images are better than one perfect image. Data collection practices that increase the performance of CV/ML systems include taking redundant images of a sample with non-overlapping fields of view, standardizing imaging conditions (such as instrument, settings, magnification, and orientation), and data augmentation via subsampling or affine transformations such as translation or rotation [17] (a minimal augmentation sketch appears at the end of this article). Moreover, the most valuable microstructural data sets include metadata that enriches their information content. Metadata may include multiple imaging modalities (e.g., EBSD and backscatter data for the same field of view), as well as information on material system, composition, imaging conditions, processing history, property measurements, and any other data available related to the image.

Data size is often assumed to be the limiting factor in developing CV/ML methods, and in some cases it is. However, excellent results have been achieved with very small numbers of original micrographs (sometimes fewer than 10). This has to do with the data-richness of microstructural images. The upshot is that a relatively modest investment in data may yield a successful CV/ML system.

IMAGE CLASSIFICATION AND CHARACTERIZATION

Image classification may not seem important, because microstructures are usually known. However, classification of images underlies a host of critical archiving and analysis tasks. Classification relies on the fact that the CV feature vector is a numerical representation of the visual information contained in an image. As such, similarities in the feature vector should correspond to visual similarities. Thus, the distance between two feature vectors can be used to perform visual search, clustering, and classification. For example, for a database of 961 ultrahigh carbon steel (UHCS) microstructures [18], Fig. 1 shows the three...
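As a rough illustration of how such distance-based comparisons can be coded, the sketch below uses scikit-learn's nearest-neighbor tools. The feature matrix is random placeholder data standing in for vectors produced by an extractor like the one sketched earlier, the array sizes echo the 961-image UHCS database only for illustration, and cosine distance is one reasonable metric choice among several.

import numpy as np
from sklearn.neighbors import NearestNeighbors, KNeighborsClassifier

# Placeholder data: random vectors standing in for real CV feature vectors
# and hypothetical constituent labels (real data would come from a feature
# extractor and from human annotations).
rng = np.random.default_rng(0)
features = rng.normal(size=(961, 512))
labels = rng.integers(0, 4, size=961)

# Visual search: retrieve the images whose feature vectors lie closest to a
# query image (the query itself is returned as the best match here).
searcher = NearestNeighbors(n_neighbors=4, metric="cosine").fit(features)
distances, indices = searcher.kneighbors(features[:1])
print("Images most similar to image 0:", indices[0])

# Classification: assign a label based on the nearest labeled neighbors.
classifier = KNeighborsClassifier(n_neighbors=5, metric="cosine").fit(features, labels)
print("Predicted class of image 0:", classifier.predict(features[:1]))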

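Finally, returning to the data collection practices discussed under MICROSTRUCTURAL IMAGE DATA, the sketch below shows one possible way to enlarge a small micrograph set by subsampling crops and applying affine transformations with torchvision. The crop size, rotation range, and translation fraction are illustrative values, not recommendations, and micrograph.png is again a placeholder.

from PIL import Image
from torchvision import transforms

# Placeholder input; assumed to be larger than the 224 x 224 crop size.
micrograph = Image.open("micrograph.png")

# Subsample a random field of view, then apply a small random rotation and
# translation (affine transformations). Values here are illustrative only.
augment = transforms.Compose([
    transforms.RandomCrop(224),
    transforms.RandomAffine(degrees=15, translate=(0.1, 0.1)),
])

# Generate several augmented variants of one original micrograph.
augmented_images = [augment(micrograph) for _ in range(8)]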