A Comparative Analysis of Historical Culinary Recipes Using Topic Modeling and Data Visualization
DOI: https://doi.org/10.55630/dipp.2024.14.15
Keywords: Topic Modelling, Historical Culinary Recipes, Clustering, Data Visualization, Large Language Models (LLM)
Abstract
The proposed method analyzes historical culinary recipes through topic modeling and data visualization. It combines an unsupervised pipeline, pre-trained language models, and visualization techniques. The method was tested on English-language cookbooks and offers insights into how culinary practices evolved over time.
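The abstract describes the pipeline only at a high level. As a rough illustration of how an unsupervised topic-modeling pass over recipe texts can work, the Python sketch below groups recipes into topics and labels each topic with its highest-weighted words. It is a minimal sketch under stated assumptions, not the authors' implementation. TF-IDF vectors stand in for pre-trained language-model embeddings. A simple spherical k-means stands in for the clustering step. Topic words are ranked with a class-based TF-IDF loosely following the weighting used by BERTopic. A text bar chart stands in for the visualization step. The stopword list, the six sample recipes, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list for this toy example (not from the paper).
STOPWORDS = {"a", "an", "and", "the", "in", "on", "with", "then", "it",
             "of", "to", "until", "before", "into"}

RECIPES = [  # hypothetical recipes written in a historical cookbook style
    "Take flour, butter and sugar; bake the cake in a slow oven.",
    "Mix flour with eggs and sugar, then bake the biscuits.",
    "Beat butter and sugar, add flour, bake in a quick oven.",
    "Roast the mutton on a spit and baste with dripping.",
    "Boil the beef with salt, then roast it before the fire.",
    "Stew the mutton with onions and salt until tender.",
]


def tokenize(text):
    """Lowercase, keep alphabetic runs, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def normalize(vec):
    """Scale a sparse vector (dict term -> weight) to unit L2 length."""
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def tfidf_vectors(token_lists):
    """Unit-length TF-IDF vectors; a stand-in for language-model embeddings."""
    n = len(token_lists)
    df = Counter(t for tokens in token_lists for t in set(tokens))
    return [
        normalize({t: c * (math.log((1 + n) / (1 + df[t])) + 1)
                   for t, c in Counter(tokens).items()})
        for tokens in token_lists
    ]


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors (a dot product)."""
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def spherical_kmeans(vectors, k, max_iter=20):
    """k-means by cosine similarity with deterministic farthest-point init."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Add the document least similar to its nearest existing centroid.
        idx = min(range(len(vectors)),
                  key=lambda i: max(cosine(vectors[i], c) for c in centroids))
        centroids.append(vectors[idx])
    labels = None
    for _ in range(max_iter):
        new_labels = [max(range(k), key=lambda j: cosine(v, centroids[j]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            total = Counter()
            for v, lab in zip(vectors, labels):
                if lab == j:
                    total.update(v)  # sum member vectors term by term
            if total:
                centroids[j] = normalize(total)
    return labels


def ctfidf_keywords(token_lists, labels, top_n=3):
    """Class-based TF-IDF: W(t, c) = tf(t, c) / |c| * log(1 + A / f(t)),
    A = mean tokens per cluster, f(t) = frequency of t over all clusters."""
    per_class = {}
    for tokens, lab in zip(token_lists, labels):
        per_class.setdefault(lab, Counter()).update(tokens)
    overall = sum(per_class.values(), Counter())
    avg_tokens = sum(overall.values()) / len(per_class)
    keywords = {}
    for lab, counts in per_class.items():
        size = sum(counts.values())
        scores = {t: c / size * math.log(1 + avg_tokens / overall[t])
                  for t, c in counts.items()}
        # Highest weight first; ties broken alphabetically for stable output.
        keywords[lab] = sorted(scores, key=lambda t: (-scores[t], t))[:top_n]
    return keywords


def text_bar_chart(labels, keywords):
    """Crude text stand-in for a visualization: one bar per topic."""
    sizes = Counter(labels)
    return "\n".join(f"topic {lab}: {'#' * sizes[lab]} {', '.join(keywords[lab])}"
                     for lab in sorted(sizes))


if __name__ == "__main__":
    tokens = [tokenize(r) for r in RECIPES]
    labels = spherical_kmeans(tfidf_vectors(tokens), k=2)
    print(text_bar_chart(labels, ctfidf_keywords(tokens, labels)))
```

On the toy data the baking recipes and the meat recipes fall into separate topics, labeled by words such as "bake, flour, sugar" and "mutton, roast, salt". In a fuller pipeline the `tfidf_vectors` step would plausibly be replaced by sentence embeddings from a pre-trained transformer model. The keyword-ranking step only needs cluster labels, so it works unchanged whichever clustering method produces them.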
License
Copyright (c) 2024 Digital Presentation and Preservation of Cultural and Scientific Heritage
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.