The Role of Language Technologies in Digital Humanities (The Case of Parliamentary Debates)

Authors

  • Petya Osenova, Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, 2, Georgi Bonchev Str., Sofia, 1113, Bulgaria

DOI:

https://doi.org/10.55630/dipp.2023.13.5

Keywords:

Parliamentary Debates, ParlaMint, Comparable Corpora, Language Technology, Digital Humanities

Abstract

The paper focuses on parliamentary debates as a use case in Digital Humanities. First, it outlines the ParlaMint project, a flagship initiative of the CLARIN ERIC infrastructure. The project makes content from national and regional parliaments visible, comparable and accessible for policy making and research. Then, it considers the approaches applied in creating 31 corpora from national and regional parliaments. Finally, it discusses the utility of this multilingual resource.

Published

2023-09-01

How to Cite

Osenova, P. (2023). The Role of Language Technologies in Digital Humanities (The Case of Parliamentary Debates). Digital Presentation and Preservation of Cultural and Scientific Heritage, 13, 61–68. https://doi.org/10.55630/dipp.2023.13.5