Secure Curation and Reuse of Digital Scientific Data Using Retrieval-Augmented Generation Systems
DOI: https://doi.org/10.55630/dipp.2025.15.17
Keywords: Retrieval-Augmented Generation (RAG), Intelligent Curation, Digital Scientific Heritage, Privacy-Preserving AI, Creative Reuse of Data
Abstract
This study explores the secure and intelligent curation of digital scientific data using Retrieval-Augmented Generation (RAG) systems. Focusing on privacy-preserving approaches and the reuse of structured healthcare datasets, we propose models for fraud detection, enhanced semantic interpretation, personalization, and secure access to digital knowledge assets in critical sectors.
License
Copyright (c) 2025 Digital Presentation and Preservation of Cultural and Scientific Heritage

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.