The Role of Language Technologies in Digital Humanities (The Case of Parliamentary Debates)
DOI:
https://doi.org/10.55630/dipp.2023.13.5
Keywords:
Parliamentary Debates, ParlaMint, Comparable Corpora, Language Technology, Digital Humanities
Abstract
The paper examines parliamentary debates as a use case in Digital Humanities. First, it outlines the ParlaMint project, a flagship initiative of the CLARIN ERIC infrastructure. The project makes content from national and regional parliaments visible, comparable and accessible for policy making and research. It then considers the approaches applied in creating 31 corpora from national and regional parliaments. Finally, it discusses the utility of this multilingual resource.