November 2025_EDFA_Digital

edfas.org | ELECTRONIC DEVICE FAILURE ANALYSIS | VOLUME 27 NO. 4

(Elasticsearch), and ontology-based methods, along with hierarchical chunking (Small2Big), to enhance accuracy. FA ontology knowledge[29] is encoded and stored as vectors, supporting the retrieval of failure annotations. The post-retrieval pipeline applies the re-ranking methods described above together with compression (extractive and custom abstractive modules) to ensure relevant and concise context for generation.

For answer generation, the pipeline tests several LLMs (Llama-3-70B and -8B, as well as Mixtral-8x7B) and applies optimized prompt templates, including zero- and few-shot prompting, chain-of-thought (CoT), and specific instructions for the following categories: (i) next-step prediction, (ii) related-information identification, (iii) failure-mode identification, and (iv) safety verification. All prompts are designed to satisfy the citation requirements for source verification.

The control unit primarily implements the orchestration technique, which regulates how often retrieval is initiated, if at all. The orchestration also applies semantic routing, whereby the RAG pipeline is initialized with a specific template based on the question category. Finally, it is essential that the prompts be carefully designed to support the generation of citations, so that users can verify statements by reviewing the source documents.

EVALUATION

First, the control unit was evaluated to ensure it correctly triggers retrieval. For this, the 20 benchmark questions and an additional 20 random LLM interactions were used to create a balanced class ratio. The results indicated flawless classification with 100% accuracy. With the modular RAG, the performance of the pipelines was again measured using reference-free metrics and human experts. First, a series of ablation studies was performed to optimize the modular RAG pipeline.
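The semantic-routing idea in the control unit, selecting a category-specific prompt template per question, can be sketched as follows. This is a deliberately simplified illustration: a real router would use embedding similarity rather than keyword overlap, and all keyword sets and template strings below are illustrative, not from the article's implementation.

```python
# Toy sketch of semantic routing: map a question to one of the four prompt
# categories and return the matching template. Keyword overlap stands in for
# embedding similarity to keep the example self-contained.

CATEGORY_KEYWORDS = {
    "next_step": {"next", "step", "proceed", "after"},
    "related_info": {"related", "similar", "reference", "information"},
    "failure_mode": {"failure", "mode", "defect", "crack"},
    "safety": {"safe", "safety", "hazard", "risk"},
}

PROMPT_TEMPLATES = {
    "next_step": "Given the FA workflow so far, predict the next step. Cite sources.",
    "related_info": "Identify related information for this case. Cite sources.",
    "failure_mode": "Identify the failure mode. Cite sources.",
    "safety": "Verify the safety of the proposed step. Cite sources.",
}

def route(question: str) -> tuple:
    """Return (category, prompt template) with the highest keyword overlap."""
    words = set(question.lower().split())
    category = max(CATEGORY_KEYWORDS,
                   key=lambda c: len(words & CATEGORY_KEYWORDS[c]))
    return category, PROMPT_TEMPLATES[category]
```

Routing "What is the next step after EMMI?" would select the next-step template, while "Is it safe to etch the package?" would select the safety-verification template; initializing the RAG pipeline with the matched template is what the article calls semantic routing.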
Initial experiments assessed LLM choice, response mode, and ranking method, showing that Llama-3-70B with the compact response mode and reciprocal rank fusion (high top-k) yielded the best results. Combining dense, sparse, and ontology retrievers improved retrieval accuracy, with Small2Big chunking (high merging ratio) and selective metadata further enhancing performance. Re-rankers and compressors, especially SFR and diversity re-ranking, increased answer quality, while higher top-k values for retrievers and fusion modules consistently improved metrics. Prompt-template selection was critical, with a five-shot CoT prompt outperforming the others. These findings guided the configuration of the modular RAG pipeline for subsequent evaluations.

Next, a human evaluation study was conducted using three approaches: Pipeline A, a simple pipeline that includes a semantic re-ranker and a few-shot prompt; Pipeline B, a complex pipeline that applies only the abstractive and extractive compressors with a semantic re-ranker; and Pipeline C, the most complex pipeline, which includes the abstractive and extractive compressors, diversity and semantic re-ranking, the router approach with specialized prompts, and metadata filtering. The human evaluation confirmed that the modular RAG Pipeline C outperforms the simpler pipelines (A and B) across all question categories (five-point Likert scale, where higher scores indicate better answers).

Fig. 6 Workflow of the modular RAG approach.
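The fusion step favored by the ablations, combining the ranked lists from the dense, sparse, and ontology retrievers via reciprocal rank fusion, can be sketched with the standard RRF formula. The smoothing constant (k = 60) and the document IDs below are illustrative defaults, not values from the article.

```python
from collections import defaultdict

def reciprocal_rank_fusion(result_lists, k=60, top_k=5):
    """Fuse ranked document lists from multiple retrievers by summing
    1 / (k + rank) for each document across all lists; documents ranked
    highly by several retrievers rise to the top of the fused list."""
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Hypothetical ranked outputs of three retrievers for one query:
dense = ["d1", "d2", "d3"]
sparse = ["d2", "d3", "d1"]
ontology = ["d2", "d1"]
fused = reciprocal_rank_fusion([dense, sparse, ontology])
```

With these inputs, "d2" ranks first because two retrievers place it at rank 1; a higher top-k on the individual retrievers simply widens the candidate pool before fusion, which is the knob the ablations found consistently beneficial.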
