
is better): (i) Total average: 3.70; (ii) Total average without refusal: 3.79; (iii) Next-step prediction average: 3.67; (iv) Related information identification: 3.83; (v) Failure mode identification: 3.33; (vi) Safety verification: 4.00; (vii) Default knowledge retrieval: 3.69.

Pipeline C demonstrates robust performance on both general and complex questions, particularly benefiting from metadata filtering in categories like related information identification and safety verification. These results highlight the capability of modular RAG to generate high-quality answers, though further optimization is possible for specific cases.

CONCLUSIONS

This study identified RAG as a suitable adaptation technique that aligns with the specified resource constraints in a cost-effective manner. The research implemented naive, advanced, and modular RAG pipelines, emphasizing iterative enhancements to optimize performance and adaptability. Initial evaluation shows that a naive RAG pipeline can answer simple FA questions but struggles with complex ones due to limited retrieval and reasoning, as well as frequent hallucinations. The experiments confirm that increasing RAG complexity using hybrid retrieval, specialized prompts, re-rankers, and compression improves performance and coherence, with modular architectures yielding higher-quality chatbots (see the sketch below).
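To make the retrieval stage concrete, the following is a minimal sketch of hybrid retrieval followed by cross-encoder re-ranking. It is illustrative rather than the pipeline used in this study: the rank_bm25 and sentence-transformers libraries, the model names, the corpus snippets, and the blending weight alpha are all assumptions.

```python
# Minimal sketch: hybrid (lexical + dense) retrieval with cross-encoder
# re-ranking. Corpus, model names, and weights are illustrative only.
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, CrossEncoder, util

corpus = [
    "Delayering revealed a gate-oxide short in the failing die.",
    "Emission microscopy localized a hot spot near the power rail.",
    # ... failure-analysis passages would go here ...
]

# Stage 1a: lexical index (BM25 over whitespace-tokenized passages).
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])

# Stage 1b: dense index (assumed general-purpose embedding model).
encoder = SentenceTransformer("all-MiniLM-L6-v2")
doc_emb = encoder.encode(corpus, convert_to_tensor=True)

def hybrid_retrieve(query: str, k: int = 10, alpha: float = 0.5):
    """Blend normalized BM25 and cosine scores; alpha weights the dense side."""
    lexical = bm25.get_scores(query.lower().split())
    lexical = lexical / (lexical.max() or 1.0)  # guard against all-zero scores
    dense = util.cos_sim(encoder.encode(query, convert_to_tensor=True), doc_emb)[0]
    blended = alpha * dense.cpu().numpy() + (1 - alpha) * lexical
    return [corpus[i] for i in blended.argsort()[::-1][:k]]

# Stage 2: re-rank candidates with a cross-encoder, which scores
# query-passage pairs jointly; more accurate but slower per pair.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, candidates: list[str], top_n: int = 3):
    scores = reranker.predict([(query, doc) for doc in candidates])
    ranked = sorted(zip(scores, candidates), key=lambda p: p[0], reverse=True)
    return [doc for _, doc in ranked[:top_n]]

query = "Which localization technique indicated the defect site?"
context = rerank(query, hybrid_retrieve(query))
```

The two-stage design reflects the accuracy-versus-runtime trade-off noted below: the cheap hybrid retriever narrows the corpus to a handful of candidates, and the expensive cross-encoder is applied only to those.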
Reference-free metrics used in the evaluations correlate well with human judgments; however, these results, obtained on a small benchmark dataset, have limited generalizability. User testing revealed that more complex RAG components increase accuracy but also runtime, raising trade-off considerations. Even advanced modular RAGs still hallucinate, suggesting that integrating additional adaptation techniques, such as reinforcement learning or targeted fine-tuning, may be necessary for optimal performance.

However, implementation of RAG systems is still highly experimental. Due to the rapid evolution of AI and LLMs, new methods such as agentic systems,[30] and tools like LangChain and LlamaIndex, are developed at a rapid pace. As a result, it is fairly easy to assemble a RAG pipeline from existing components. Nevertheless, when the results are unsatisfactory, they are very hard to improve, because inadequate answers are difficult to trace back to specific components; extensive testing was therefore required to understand and address response patterns. Finally, despite the use of advanced techniques and thorough evaluation, hallucinations persist, highlighting the need to train users and raise awareness of these limitations.

REFERENCES
1. M. Mosbach, et al.: "Few-shot Fine-tuning vs. In-context Learning: A Fair Comparison and Evaluation," ACL (Findings), 2023, p. 12284-12314.
2. J. Wei, et al.: "Finetuned Language Models are Zero-shot Learners," ICLR, 2022.
3. A. Vaswani, et al.: "Attention Is All You Need," 2017, doi.org/10.48550/arXiv.1706.03762.
4. A. Zhang, et al.: Dive into Deep Learning, Cambridge University Press, 2023.
5. Z. Han, et al.: "Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey," 2024, doi.org/10.48550/arXiv.2403.14608.
6. P. Sahoo, et al.: "A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications," 2024, doi.org/10.48550/arXiv.2402.07927.
7. T. Kojima, et al.: "Large Language Models are Zero-shot Reasoners," NeurIPS, 2022.
8. Y. Zhou, et al.: "Thread of Thought Unraveling Chaotic Contexts," 2023, doi.org/10.48550/arXiv.2311.08734.
9. X. Wang, et al.: "Self-Consistency Improves Chain of Thought Reasoning in Language Models," 2022, doi.org/10.48550/arXiv.2203.11171.
10. Y. Gao, et al.: "Retrieval-Augmented Generation for Large Language Models: A Survey," 2024, doi.org/10.48550/arXiv.2312.10997.
11. K. Luu, et al.: "Time Waits for No One! Analysis and Challenges of Temporal Misalignment," NAACL: Human Language Technologies, 2022, p. 5944-5958.
12. A. Asai, et al.: "Reliable, Adaptable, and Attributable Language Models with Retrieval," 2024, doi.org/10.48550/arXiv.2403.03187.
13. G. Izacard, et al.: "Atlas: Few-shot Learning with Retrieval Augmented Language Models," JMLR, 2024, 24(1).
14. U. Khandelwal, et al.: "Generalization through Memorization: Nearest Neighbor Language Models," 2019, doi.org/10.48550/arXiv.1911.00172.
15. O. Ram, et al.: "In-context Retrieval-augmented Language Models," TACL, 2023, 11, p. 1316-1331.
16. M. Kang, et al.: "Knowledge Graph-Augmented Language Models for Knowledge-Grounded Dialogue Generation," 2023, doi.org/10.48550/arXiv.2305.18846.
17. S. Robertson, H. Zaragoza, and M. Taylor: "Simple BM25 Extension to Multiple Weighted Fields," CIKM, 2004, p. 42-49.
18. W.X. Zhao, et al.: "Dense Text Retrieval Based on Pretrained Language Models: A Survey," TOIS, 2024, 42(4).
19. P. Zhao, et al.: "Retrieval-Augmented Generation for AI-Generated Content: A Survey," 2024, doi.org/10.48550/arXiv.2402.19473.
20. A.J. Yepes, et al.: "Financial Report Chunking for Effective Retrieval Augmented Generation," 2024, doi.org/10.48550/arXiv.2402.05131.
21. L. Zha, et al.: "TableGPT: Towards Unifying Tables, Natural Language and Commands into One GPT," 2023, doi.org/10.48550/arXiv.2307.08674.
22. C. Fourrier, et al.: Open LLM Leaderboard v2, 2024.
23. Z. Jiang, et al.: "Active Retrieval Augmented Generation," EMNLP, 2023, p. 7969-7992.
24. P. Finardi, et al.: "The Chronicles of RAG: The Retriever, the Chunk, and the Generator," 2024, doi.org/10.48550/arXiv.2401.07883.
25. N. Muennighoff, et al.: "MTEB: Massive Text Embedding Benchmark," EACL, 2023, p. 2014-2037.
26. S. Xiao, et al.: "C-Pack: Packed Resources for General Chinese Embeddings," SIGIR, 2024, p. 641-649.
27. M. Fichtenkamm, et al.: "Towards an FA Chatbot with Retrieval-augmented Language Modeling," Proc. IPFA, 2024, p. 1-8.
(continued on page 30)
