
TOWARD A FAILURE ANALYSIS CHATBOT WITH RETRIEVAL-AUGMENTED GENERATION

Maik Fichtenkamm1,3, Markus Kofler2,3, Konstantin Schekotihin3, and Christian Burmer1
1Infineon Technologies, Villach, Austria
2Infineon Technologies, Klagenfurt, Austria
3University Klagenfurt, Austria
konstantin.schekotihin@aau.at

INTRODUCTION

Failure analysis (FA) is a critical business process within the semiconductor industry. It is supported by various information systems that store FA data in different formats distributed across databases, file shares, and wikis, which makes simple keyword-search interfaces highly inefficient. The extensive capabilities of large language models (LLMs), such as ChatGPT or Llama, in various text processing tasks allow for the creation of universal search interfaces (chatbots) that provide straight-to-the-point answers to user queries. However, existing LLMs are trained on a wide range of data, which may not be specific enough for the FA domain in general and an FA lab in particular. This leads to a lack of understanding of domain-specific terminology and abbreviations, context, and product-specific nuances of user queries. As a result, LLMs can generate answers that are inaccurate or irrelevant, or may even hallucinate. To address these issues, LLMs are frequently adapted to specific domains through fine-tuning, which involves additional training on domain-specific data, or through zero- or few-shot learning techniques, commonly called prompt engineering or in-context learning.[1,2]

Both approaches to improving LLM performance build on the ability of the transformer architecture to learn context-dependent representations.[3] A transformer network comprises an encoder and a decoder. The former embeds an input sequence into an internal representation vector space. The resulting sequence of vectors is forwarded to the decoder, which determines the output sequence by dynamically focusing on different parts of the representation.[4] Both components can also be used separately, which has led to specialized architectures: decoder-only architectures (generation models) are used in generative applications, whereas encoder-only architectures (embedding models) aim at understanding input sequences and transforming them into meaningful embeddings.

To improve the performance of both types of models, researchers have developed various methods, which can roughly be split into two families. Fine-tuning enhances LLM performance by training on task-specific datasets; it requires significant computational resources and costly data labeling,[5] but the resulting models are more effective in many downstream tasks. Prompting can use pre-trained models as-is but requires careful crafting of input sequences containing a task description, input data, contextual information, and an output format to guide the model's responses.[6] Typically, the task description outlines the specific instructions for the model, while the input data represents the question or input requiring a response. The context provides background knowledge or external information to guide the model, and the format specifies the desired output style or structure.
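As a minimal sketch of this four-part prompt structure, the following Python snippet assembles the components into a single input sequence. The section labels, the FA-style question, and the report snippet are hypothetical and only illustrate the layout, not the authors' actual prompts.

def build_prompt(task: str, context: str, question: str, output_format: str) -> str:
    """Assemble task description, context, input data, and output format."""
    return (
        f"### Task\n{task}\n\n"
        f"### Context\n{context}\n\n"
        f"### Input\n{question}\n\n"
        f"### Output format\n{output_format}\n"
    )

prompt = build_prompt(
    task="Answer the failure analysis question using only the given context.",
    context="Report FA-1234 (hypothetical): SEM imaging revealed a gate-oxide pinhole.",
    question="Which failure mechanism was found on sample FA-1234?",
    output_format="One short sentence naming the failure mechanism.",
)
print(prompt)  # this string would be sent as the input sequence to a generation model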
Prompting methods can operate an LLM in single-turn or multi-turn interactions, with the latter enabling reasoning by dividing questions into subproblems that are solved iteratively.[6] Zero-shot and few-shot prompting are popular single-turn, in-context learning techniques; few-shot prompting incorporates input-output examples to improve performance, emphasizing balanced class labels, diverse examples, and readable formats. Multi-turn interactions, like chain-of-thought (CoT) or tree-of-thought (ToT), enhance reasoning through intermediate steps, using phrases like “let’s think step by step” or specialized examples.[7,8] CoT or ToT can be further enhanced with quality-control mechanisms. For example, “self-consistency” samples multiple answers via few-shot CoT prompting and selects the most frequent final answer by majority vote.
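The sketch below illustrates few-shot CoT combined with self-consistency under stated assumptions: generate() is a hypothetical stand-in for any LLM client returning canned CoT-style completions, and the toy yield question only demonstrates the sampling-and-voting mechanics.

from collections import Counter
import random

FEW_SHOT_COT = (
    "Q: A lot of 50 dies has 5 failures. What is the failure rate?\n"
    "A: Let's think step by step. 5 out of 50 is 5/50 = 0.1, i.e., 10%. "
    "Final answer: 10%\n\n"
)

def generate(prompt: str, temperature: float = 0.8) -> str:
    # Placeholder for a real LLM backend sampled at non-zero temperature.
    return random.choice([
        "2 out of 40 is 2/40 = 0.05, i.e., 5%. Final answer: 5%",
        "2/40 = 0.05, so the rate is 5%. Final answer: 5%",
        "2 * 40 = 80, so 80%. Final answer: 80%",  # a faulty reasoning path
    ])

def self_consistency(question: str, n_samples: int = 5) -> str:
    """Sample several CoT answers and return the majority final answer."""
    prompt = FEW_SHOT_COT + f"Q: {question}\nA: Let's think step by step. "
    finals = [
        generate(prompt).rsplit("Final answer:", 1)[-1].strip()
        for _ in range(n_samples)
    ]
    return Counter(finals).most_common(1)[0][0]

print(self_consistency("A lot of 40 dies has 2 failures. What is the failure rate?"))

Note that the sampling temperature must be above zero so that the model produces diverse reasoning paths; the majority vote then filters out occasional faulty ones.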
