Language Resources – a Part of World Cultural Heritage
DOI: https://doi.org/10.55630/dipp.2011.1.16
Keywords: natural language, multilingual corpus, parallel corpus, aligned corpus, comparable corpus, annotation
Abstract
This article briefly reviews multilingual language resources for Bulgarian developed within several international projects. These include the first annotated Bulgarian digital lexical resources, created in MULTEXT-East (MTE), the Bulgarian-Polish corpus, the Bulgarian-Slovak parallel and aligned corpus, and the Bulgarian-Polish-Lithuanian corpus. Together these resources form a valuable multilingual dataset for language engineering research and development for Bulgarian. Multilingual corpora are large repositories of language data, and they play an important role in preserving and supporting the world's cultural heritage. Natural language is an outstanding part of human cultural values and collective memory, and it is also a bridge between cultures.