Automatic Identification of Domain Terms: An Approach for Italian


  • Maria Teresa Artese IMATI – CNR, Via Bassini 15, 20133, Milan, Italy
  • Isabella Gagliardi IMATI – CNR, Via Bassini 15, 20133, Milan, Italy



Classification Methods, Word Embedding Models, Probability, Food, Italian Language


Creating a domain-specific thesaurus fully automatically is a topical problem. This paper presents a novel method that addresses it for the Italian language. The main feature of the approach is the integration of three components: machine learning classification methods that operate on the semantic representation of candidate terms; word embedding models, which capture the semantics of words; and a computation of the degree of specialization of each term. The work is still in progress, and the results obtained so far are promising.
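The abstract does not state how the degree of specialization is computed. As an illustration only, the sketch below uses a common heuristic, the log-ratio of a term's smoothed relative frequency in a domain corpus to its frequency in a general corpus. The function name `specialization`, the add-one smoothing, and the toy token lists are all assumptions made for this example and are not taken from the paper.

```python
import math
from collections import Counter


def specialization(term, domain_counts, general_counts, smoothing=1.0):
    """Log-ratio of smoothed relative frequencies (domain vs. general corpus).

    Hypothetical scoring function: positive values suggest the term is more
    typical of the domain corpus, negative values the opposite.
    """
    domain_total = sum(domain_counts.values())
    general_total = sum(general_counts.values())
    # Shared vocabulary size, used for additive smoothing in both corpora.
    vocab = len(set(domain_counts) | set(general_counts))
    p_domain = (domain_counts[term] + smoothing) / (domain_total + smoothing * vocab)
    p_general = (general_counts[term] + smoothing) / (general_total + smoothing * vocab)
    return math.log(p_domain / p_general)


# Toy corpora (assumed data): a cooking snippet and a news-like snippet.
domain_tokens = "soffriggere la cipolla e aggiungere il riso mantecare il risotto".split()
general_tokens = "il governo ha approvato la legge e il parlamento ha votato la riforma".split()

domain_counts = Counter(domain_tokens)
general_counts = Counter(general_tokens)

# Score every term that occurs in the domain corpus.
scores = {t: specialization(t, domain_counts, general_counts) for t in domain_counts}

for term, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{term:12s} {score:+.3f}")
```

In this toy setting, content words that occur only in the cooking text (such as "risotto") receive positive scores, while function words shared by both corpora (such as "la") score lower. A real pipeline would combine such a score with the classifier and embedding components the abstract mentions.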





How to Cite

Artese, M. T., & Gagliardi, I. (2020). Automatic Identification of Domain Terms: An Approach for Italian. Digital Presentation and Preservation of Cultural and Scientific Heritage, 10, 251–258.
