Language Resources – a Part of World Cultural Heritage


  • Ludmila Dimitrova, Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, Sofia, Bulgaria



natural language, multilingual corpus, parallel corpus, aligned corpus, comparable corpus, annotation


This article briefly reviews multilingual language resources for Bulgarian developed within several international projects: the first-ever annotated Bulgarian MULTEXT-East (MTE) digital lexical resources, the Bulgarian-Polish corpus, the Bulgarian-Slovak parallel and aligned corpus, and the Bulgarian-Polish-Lithuanian corpus. These resources are valuable multilingual datasets for language engineering research and development for the Bulgarian language. Multilingual corpora are large repositories of language data that play an important role in preserving and supporting the world's cultural heritage, because natural language is an outstanding part of human cultural values and collective memory, and a bridge between cultures.


Dimitrova, L., Erjavec, T., Ide, N., Kaalep, H.-J., Petkevic, V., and Tufis, D.: Multext-East: Parallel and Comparable Corpora and Lexicons for Six Central and Eastern European Languages. In: COLING-ACL '98. Montréal, Québec, Canada, pp. 315–319. (1998)

Dimitrova, L., Garabík, R.: Bulgarian-Slovak Parallel Corpus. In: 6th International Conference NLP, Multilinguality. Bratislava. (2011) (to appear)

Dimitrova, L., Koseska, V.: Bulgarian-Polish Corpus. J. Cognitive Studies/Études Cognitives. v. 9, SOW, Warsaw, pp. 133–141. (2009)

Dimitrova, L., Koseska, V., Roszko, D., Roszko, R.: Application of Multilingual Corpus in Contrastive Studies (on the example of the Bulgarian-Polish-Lithuanian Parallel Corpus). J. Cognitive Studies/Études Cognitives. v. 10, SOW, Warsaw, pp. 217–240. (2010)

Dimitrova, L., Pavlov, R., Simov, K., Sinapova, L.: Bulgarian MULTEXT-East Corpus – Structure and Content. J. Cybernetics and Information Technologies. v. 5, n. 1, BAS, Sofia, pp. 67–73. (2005)

Ide, N., Bonhomme, P., and Romary, L.: XCES: An XML-based Encoding Standard for Linguistic Corpora. In: 2nd International Language Resources and Evaluation Conference. Paris: ELRA, pp. 825–830. (2000)

Ide, N., Veronis, J.: Multext (multilingual tools and corpora). In: COLING’94. Kyoto, Japan, pp. 90–96. (1994)




How to Cite

Dimitrova, L. (2011). Language Resources – a Part of World Cultural Heritage. Digital Presentation and Preservation of Cultural and Scientific Heritage, 1, 151–160.
