The RTA team will focus on resources for the regional Latgalian language. Coordinator Sanita Martena explains, ‘During the years 2007-2013, the first corpus of written Latgalian was created within a joint Lithuanian-Latvian project, alongside a trilingual (Lithuanian-Latvian-Latgalian) dictionary and a parallel corpus of Lithuanian and Latvian texts. This first corpus, called MuLa, is still an important resource for Modern Latgalian, but with 1 million wordforms, it is rather small. Furthermore, it contains texts published between 1987 and 2012, which show different orthographic norms and considerable variation in morphology. Our first aim is therefore to expand this corpus with texts published after 2012, with an emphasis on texts following the standard implemented in 2007. This part of the corpus will be of crucial importance for the development of language technology tools for Latgalian as a lesser-used language – spell checkers, part-of-speech taggers, morphological analyzers, and more. Second, we will create a corpus of spoken Latgalian that will show the speech of different age groups, as well as regional and functional varieties. As making and transcribing recordings is a very time-consuming task, we will benefit from the results of previous projects carried out by RTA researchers alone or in collaboration with other institutions. This will include recordings made during the annual field practice of our teachers and students over ten years as well as the especially interesting collection of Latgalian spoken in Siberia created by RTA student Armands Kociņš.’
In addition to the corpus itself and academic papers discussing its creation and use, the project team will create video lectures for students on ways to use the corpus for educational purposes, thereby furthering the use and understanding of digital humanities tools among teachers and students.
The RTA team is supported by guest professor Nicole Nau from Adam Mickiewicz University in Poznań, who will advise the team, drawing on her rich experience in corpus research on Latvian and Latgalian and her special interest in documenting spoken varieties of Latgalian.