Secure Curation and Reuse of Digital Scientific Data Using Retrieval-Augmented Generation Systems

Authors

  • Harshetha Murthy Keshav Murthy, SRH Hochschule Berlin, Sonnenallee 221, Berlin, Germany
  • Alexander Iliev Iliev, SRH Hochschule Berlin, Sonnenallee 221, Berlin, Germany; Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, 8 Acad. Georgi Bonchev Str., 1113 Sofia, Bulgaria

DOI:

https://doi.org/10.55630/dipp.2025.15.17

Keywords:

Retrieval-Augmented Generation (RAG), Intelligent Curation, Digital Scientific Heritage, Privacy-Preserving AI, Creative Reuse of Data

Abstract

This study explores the secure and intelligent curation of digital scientific data using Retrieval-Augmented Generation (RAG) systems. Focusing on privacy-preserving approaches and the reuse of structured healthcare datasets, we propose models for fraud detection, enhanced semantic interpretation, personalization, and secure access to digital knowledge assets in critical sectors.

Published

2025-09-05

How to Cite

Murthy Keshav Murthy, H., & Iliev Iliev, A. (2025). Secure Curation and Reuse of Digital Scientific Data Using Retrieval-Augmented Generation Systems. Digital Presentation and Preservation of Cultural and Scientific Heritage, 15, 187–196. https://doi.org/10.55630/dipp.2025.15.17