The parallel corpus of
0. General notes

Creating parallel corpora is one of the most promising branches of corpus linguistics. A number of parallel corpora are freely accessible on the Internet today. These are mainly bilingual corpora containing classical literary works and their translations into one other language [1]. Multilingual parallel corpora are much rarer on the Internet, although the need to create them is constantly discussed [2].

We will discuss the multilingual corpus of translations of The Lay of Igor’s Warfare, available on the web at http://nevmenandr.net/slovo/. The parallel corpus of The Lay of Igor’s Warfare has been online since February 2007. Considerable changes to its content and functionality were made after September 2007, when a research team consisting of B.V. Orekhov, E.A. Slobodyan and M.S. Rybin was established to support the project. The corpus contains 206 aligned texts, offers extended ways of presenting them (as draggable lines fitted to the screen or as text blocks) and supports search. The search functionality for Russian texts was developed in cooperation with Andrey Alexandrovich Belov.

Why was The Lay of Igor’s Warfare in particular chosen for such a corpus? It is a comparatively short text but a first-rank one, a literary masterpiece created in Russia before the Mongol yoke, and it has given rise to a great number of works, including research, commentary and literary responses. Translations hold a special place among these works. At the moment, the corpus contains more than 90 translations into modern Russian, and at least thirty more texts have not yet been digitized and included in the corpus. In addition, there are about two hundred translations into other languages (the translators include such key figures of this tradition as V. Nabokov, R.M. Rilke, J. Tuwim, Ph. Soupault, V. Hanka, I. Franko and Y. Kupala).
These are impressive figures, and behind them stand a number of philological, publishing and reader-related problems that the corpus is meant to solve. Gathering the translations in one place remains a high priority, even though the first attempts to do so were made long ago. The point is that the book format offers very limited means of solving this problem. Parallel presentation of a text and its translation is a customary editorial practice, and it has been used in editions of The Lay a number of times. Readers are best served by having the texts they need before their eyes, but it is impossible to determine in advance which texts they will need, in what order, and which will turn out to be unnecessary, whereas the static nature of a printed edition means that this choice must be made once and for all.

The second problem is the layout of the texts. A book can present at most two to four texts for simultaneous study: they must be placed on the left and right pages of a spread or squeezed into two or three columns on one page, and the page width allows nothing more. Even correlating one text with one translation across a spread is inconvenient, as it requires constant effort from the reader to find the corresponding passages.

In the parallel corpus of The Lay of Igor’s Warfare translations, the texts are arranged not in the customary columns but in lines. Corresponding text fragments thus appear one under another, which gives the user full and exact information on the similarities and differences between variants, on translators’ liberties and on the variety of interpretations. In most cases this requires horizontal scrolling, but we chose to sacrifice the traditional principles of HTML layout for the sake of clarity.
If this form is inconvenient for the user, there is another way of presenting the texts, in which the compared fragments are fitted to the screen. Among other things, these text lines can be dragged up and down with the mouse. Finally, the texts may be placed on the screen as movable blocks.

Since any number of texts can now be shown on the screen, users can choose them themselves. The required translations are ticked in a special form, which in effect creates a sub-corpus. If the user presses ENTER without selecting any translation, all the texts available in the database are shown on the screen. When working with blocks, no more than 6 translations can be used at a time.

Each translation in the corpus is divided into 218 fragments (“links”), in accordance with the division of The Lay proposed by R. Jakobson. The text is shown on the screen fragment by fragment, and the number of the fragment can be entered in a special form when selecting translations. The user can move through the text from fragment to fragment while keeping the selection of translations, that is, while staying within the established sub-corpus.

As the corpus is parallel, that is, designed to correlate the texts accumulated within the culture, the system always shows the reference text in Old Russian as a point of comparison, in addition to the translations selected by the user. The text published in the Encyclopedia of The Lay of Igor’s Warfare was chosen as the reference text, as it takes the accepted corrections into account but treats the first edition diplomatically.

The texts in the corpus menu are currently divided into four categories: texts and editions, translations into modern Russian, translations into Slavonic languages and translations into other languages. This division, however, is purely conventional and does not prevent the user from correlating any text from one category with any text from another.
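The selection behaviour described above can be illustrated with a minimal Python sketch. It is not the project's actual implementation: the identifiers `REFERENCE_ID`, `Text`, `select_subcorpus`, `aligned_rows` and `make_text`, the toy text ids and the placeholder fragments are all hypothetical. Only the figures 218 and 6, the "empty selection shows everything" rule and the always-present reference text come from the description. Rejecting an oversized selection in block mode is likewise an assumption about how the limit might be enforced.

```python
from dataclasses import dataclass

FRAGMENT_COUNT = 218          # fragments per text, as stated in the article
MAX_BLOCK_TRANSLATIONS = 6    # block-mode limit, as stated in the article
REFERENCE_ID = "reference"    # hypothetical id of the Old Russian reference text


@dataclass
class Text:
    text_id: str
    fragments: list  # fragments[n - 1] holds fragment number n


def select_subcorpus(corpus, selected_ids, block_mode=False):
    """Return the ids to display: the reference text first, then the translations."""
    translations = [tid for tid in selected_ids if tid != REFERENCE_ID]
    if not translations:
        # Empty selection: fall back to every translation in the database.
        translations = [tid for tid in corpus if tid != REFERENCE_ID]
    if block_mode and len(translations) > MAX_BLOCK_TRANSLATIONS:
        # Assumption: the limit is enforced by rejecting the request.
        raise ValueError("block mode shows at most 6 translations")
    return [REFERENCE_ID] + translations


def aligned_rows(corpus, text_ids, number):
    """Return (text_id, fragment) pairs for one fragment number, one row per text."""
    if not 1 <= number <= FRAGMENT_COUNT:
        raise ValueError(f"fragment number must be in 1..{FRAGMENT_COUNT}")
    return [(tid, corpus[tid].fragments[number - 1]) for tid in text_ids]


def make_text(text_id):
    """Build a placeholder text; real fragments would come from the database."""
    return Text(text_id, [f"{text_id} #{n}" for n in range(1, FRAGMENT_COUNT + 1)])


# Toy corpus with invented ids, keyed by text id.
corpus = {tid: make_text(tid) for tid in (REFERENCE_ID, "ru_a", "ru_b", "uk_a")}

# The user ticks two translations; the reference text is added automatically.
subcorpus = select_subcorpus(corpus, ["ru_b", "uk_a"])
for text_id, fragment in aligned_rows(corpus, subcorpus, 2):
    print(f"{text_id:>10}: {fragment}")
```

Because every text shares the same 218-fragment division, moving to the next fragment only changes the `number` argument, while the sub-corpus returned by `select_subcorpus` stays the same.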
The developers also plan to create a dynamic menu in which the user could sort translations by date of creation, alphabetically, by translator’s name, and so on. At present the translations are listed in the form in no particular order. When the cursor is placed over the name of a translation, a pop-up hint appears giving the source of the text. A click on the name of a translation leads to a page where that text is shown separately. The arrows at the end of each fragment switch to the parallel view. Poetic translations “stretched” into lines also lose their traditional column layout, but division into verse lines, a fundamental characteristic of poetic speech, is preserved in the corpus and is marked by a special sign, a vertical bar: "|".

Such a corpus can be used for more than research into the problems of a particular literary work. We have used the corpus successfully to solve various educational tasks and to study language material as part of linguistic research. An interesting feature of the corpus under consideration is, in our opinion, that it contains several translations of the literary monument into one and the same language, which is not always the case in other parallel corpora; such material is of high value. For example, the corpus contains numerous translations of The Lay of Igor’s Warfare into Russian (starting from the 19th century), Ukrainian and Polish. Let us concentrate on these three series of translations of the Old Russian monument: Russian, Ukrainian and Polish. Such language material allows solving the following linguistic tasks:
It is worth noting that the peculiarities mentioned here are reflected in all three groups of translations considered, with few exceptions. The possible uses of the corpus under consideration are not limited to the problems mentioned. The corpus can be used in education, when learning a language as native or foreign, in literary research, and in linguistic research on aspects not covered in this presentation.

[1] For example: http://ruscorpora.ru/search-para.html, http://korpus.juls.savba.sk/parus/, http://www.linguateca.pt/COMPARA/ etc.

[2] Jörg Tiedemann. Building a Multilingual Parallel Subtitle Corpus. www.let.rug.nl/~tiedeman/paper/clin17.pdf
Simon P. Botley, Anthony M. McEnery and Andrew Wilson (eds.). Multilingual Corpora in Teaching and Research (Language and Computers: Studies in Practical Linguistics, No 22). Amsterdam (Netherlands) and Atlanta, GA (USA): Editions Rodopi B.V., 2000. ISBN: 90-420-0541-6.
* In 2008 the development of the corpus was financially supported by the Russian Foundation for the Humanities (RGNF) as part of the project to create the information system "Parallel Corpus of Translations of The Lay of Igor’s Warfare", project No. 08–04–12104в