A Comparative Analysis of Historical Culinary Recipes Using Topic Modeling and Data Visualization

Authors

  • Maria Teresa Artese, IMATI-MI National Research Council, Via A. Corti 12, 20133, Milan, Italy
  • Isabella Gagliardi, IMATI-MI National Research Council, Via A. Corti 12, 20133, Milan, Italy

DOI:

https://doi.org/10.55630/dipp.2024.14.15

Keywords:

Topic Modelling, Historical Culinary Recipes, Clustering, Data Visualization, Large Language Models (LLM)

Abstract

This paper proposes a method for analyzing historical culinary recipes through topic modeling and data visualization. The method combines an unsupervised pipeline, pre-trained language models, and visualization techniques. It was thoroughly tested on English-language cookbooks and offers insight into how culinary practices have evolved over time.
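
As a rough illustration of what an unsupervised topic-modeling pipeline of this kind can look like, the sketch below groups a few recipe texts and labels each group with its most distinctive terms. It is not the authors' implementation. The toy corpus, the function names, and the similarity threshold are all hypothetical, a bag-of-words count vector stands in for the embeddings a pre-trained language model would provide, and a simple greedy similarity grouping stands in for a real clustering algorithm.

```python
import math
from collections import Counter

# Hypothetical toy corpus; not taken from the cookbooks studied in the paper.
RECIPES = [
    "boil the beef with onions and salt",
    "roast the beef with salt and pepper",
    "bake the cake with sugar and butter",
    "bake a tart with sugar butter and eggs",
]

STOPWORDS = {"the", "a", "with", "and"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def embed(text):
    """Assumption: a word-count vector stands in for a pre-trained model's embedding."""
    return Counter(tokenize(text))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster(vectors, threshold=0.3):
    """Greedy grouping: a document joins the first cluster whose first member is similar enough."""
    clusters = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def top_terms(docs, clusters, n=3):
    """Rank each cluster's terms by in-cluster frequency, weighted by rarity across clusters."""
    counts_per_cluster = [
        Counter(w for i in members for w in tokenize(docs[i])) for members in clusters
    ]
    total = len(counts_per_cluster)

    def score(term, counts):
        spread = sum(1 for c in counts_per_cluster if term in c)
        return counts[term] * math.log(1 + total / spread)

    # Ties are broken alphabetically so the result is deterministic.
    return [
        sorted(counts, key=lambda t: (-score(t, counts), t))[:n]
        for counts in counts_per_cluster
    ]


vectors = [embed(recipe) for recipe in RECIPES]
clusters = cluster(vectors)
topics = top_terms(RECIPES, clusters)

for members, terms in zip(clusters, topics):
    print(members, terms)
```

On this toy corpus the two meat recipes end up in one group and the two baking recipes in another. In a fuller pipeline, each stand-in step would presumably be replaced by a stronger component (sentence embeddings, dimensionality reduction, density-based clustering), and the resulting topics would feed the visualization stage.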

Published

2024-09-05

How to Cite

Artese, M. T., & Gagliardi, I. (2024). A Comparative Analysis of Historical Culinary Recipes Using Topic Modeling and Data Visualization. Digital Presentation and Preservation of Cultural and Scientific Heritage, 14, 167–178. https://doi.org/10.55630/dipp.2024.14.15
