edfas.org ELECTRONIC DEVICE FAILURE ANALYSIS | VOLUME 27 NO. 4

The goal of retrieval techniques is to return the top-k most relevant documents for a query. The data can be stored and retrieved in various forms, ranging from simple tokens[14] to chunks[15] and complex structures such as knowledge graphs,[16] with chunking being the most popular technique. Data retrieval is done associatively, by analyzing the similarity of the data to the query in their vector representations. Sparse methods, like TF-IDF or BM25 (Best Matching 25), represent documents as vectors using simple statistics, such as term frequency (TF) and/or inverse document frequency (IDF).[17] Dense methods use language models to generate vectors that capture semantic similarity between documents in a more compact (and potentially more precise) form.[18] RAG implementations often rely on vector databases and similarity functions like cosine similarity, frequently employing approximate nearest neighbor search for efficiency.[19] Indexing methods can use different approaches to construct vector databases by optimizing the document chunking strategy.[20]

Pre-retrieval techniques refine queries to improve retrieval granularity and reduce ambiguity. For example, query transformation reformulates queries, query expansion generates diverse or decomposed sub-queries, and query construction enables retrieval from structured data sources such as SQL or graph databases.[21] The retrieval approaches discussed above can be combined with quantization techniques for efficiency improvements, compressing vector representations with scalar, product, or binary quantization.

In the post-retrieval stage, re-ranking approaches reorder documents based on similarity or diversity to mitigate noise and improve relevance. The generation methods can use various LLMs to formulate a response.
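The associative retrieval described above can be illustrated with a minimal sketch: dense vectors compared by cosine similarity, with the k highest-scoring documents returned. The toy two-dimensional vectors are purely illustrative; a production RAG system would use embedding-model vectors stored in a vector database with approximate nearest neighbor search.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve_top_k(query_vec, doc_vecs, k=3):
    """Return (index, score) pairs for the k most similar document vectors.

    This exhaustive scan is for illustration only; real systems replace it
    with approximate nearest neighbor search for efficiency.
    """
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical document embeddings and a query embedding
docs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
top = retrieve_top_k([1.0, 0.0], docs, k=2)
```

The same top-k interface applies whether the vectors come from sparse statistics such as TF-IDF or from a dense embedding model; only the vector construction changes.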
Commercial APIs, like ChatGPT or Claude, offer high quality but raise privacy concerns, while locally deployed LLMs provide flexibility at higher ownership costs.[22] Finally, orchestration pipelines combine all components described above into workflows aimed at efficient decision-making in RAG systems. For example, scheduling methods specify retrieval and generation triggers, with iterative, recursive, and adaptive retrieval enhancing reasoning and efficiency.[10,23] Fusion merges results from multiple retrievals using methods like reciprocal rank fusion or response modes such as compact, refine, or tree-summarization.[24]

IMPLEMENTATION OF A FAILURE ANALYSIS CHATBOT

The authors implemented the FA chatbot in two iterations, which allowed issues identified in the first iteration to be addressed in the second. Initially, the focus was on implementing naive and advanced RAG pipelines, which were then optimized toward a modular RAG approach aiming at continuous improvement of the chatbot's quality, reliability, and complexity.

DATA STORAGE

The fundamental architecture of all systems is based on a vector database. Figure 3 shows an overview of the storage pipeline that facilitates effective data retrieval. Various support systems in FA provide structured and unstructured data sources, including analysis results, sample information, laboratory test data, and textual documents such as presentations, articles, and research papers, indexed in the data infrastructure using a custom data crawler connected to the Elasticsearch database. The preprocessing steps involve extracting data from the data infrastructure in JSON format for structured sources, like databases, and in simple textual form for all documents, presentations, etc., that can be converted into it. To maintain coherent text blocks, JSON

Fig. 3 FA indexing process.
Fig. 4 General workflows of (a) naive and (b) advanced RAG pipelines.
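The reciprocal rank fusion mentioned earlier, which merges ranked lists from multiple retrievals, can be sketched as follows. Each document's fused score is the sum of 1/(k + rank) over the lists it appears in; k = 60 is the constant commonly used in the literature, and the document IDs below are hypothetical.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    rankings: list of lists, each ordered best-first.
    A document's fused score is the sum over lists of 1/(k + rank),
    where rank is 1-based; documents missing from a list simply
    contribute nothing for that list.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Two hypothetical retrieval runs over the same corpus
fused = reciprocal_rank_fusion([["a", "b", "c"], ["b", "c", "a"]])
```

Because only ranks are used, this fusion needs no score normalization across retrievers, which is why it combines well with heterogeneous sparse and dense retrieval results.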