The parallel corpus of “The Tale of Igor’s Campaign” translations *


0. General notes

Building parallel corpora is one of the most promising branches of corpus linguistics. A number of parallel corpora are freely accessible on the Internet today. Most of them are bilingual corpora that contain classical literary works and their translations into a single language1. Multilingual parallel corpora are far less common on the Internet, although the need to create them is constantly discussed.2

This paper discusses the multilingual corpus of translations of The Lay of Igor’s Warfare, which is available on the web. The corpus has been online since February 2007. Its content and functionality changed considerably after September 2007, when a research team consisting of B. V. Orekhov, E. A. Slobodyan and M. S. Rybin was formed to support the project. The corpus contains 206 aligned texts, offers several ways of displaying them (as draggable lines fitted to the screen or as text blocks) and supports search. The search functionality for Russian texts was developed in cooperation with Andrey Alexandrovich Belov.

Why was The Lay of Igor’s Warfare chosen for such a corpus? It is a comparatively short text, but one of the first rank: a literary masterpiece created in Russia before the Mongol yoke, which gave rise to a large body of works, including research, commentary and literary responses. Translations hold a special place among these works. At the moment the corpus contains more than 90 translations into modern Russian, and at least thirty more have not yet been digitized and included. In addition, there are about two hundred translations into other languages, and the translators include such key figures as V. Nabokov, R. M. Rilke, J. Tuwim, Ph. Soupault, V. Hanka, I. Franko and Y. Kupala. These are impressive figures, and they raise a number of philological, publishing and reader-related problems that the corpus is meant to solve.

Gathering the translations in one place remains a high-priority task, even though attempts to do so were first made long ago. The problem is that the book format offers very limited means of solving it. Printing a text and its translation in parallel is a well-established editorial practice, and it has been applied to The Lay a number of times. Readers find it most convenient to have the necessary texts before their eyes, but it is impossible to know in advance which texts they will need, in what order, and which will turn out to be unnecessary; the static nature of a paper edition means that this choice has to be made once and for all. The second problem is the placement of the texts. A book can present at most two to four texts for simultaneous study: they must be placed on the left and right pages of a spread or squeezed into two or three columns on one page, since the page width allows nothing more. Even correlating one text with one translation across a spread is inconvenient, because the reader must constantly search for the matching passages.

In the parallel corpus of The Lay of Igor’s Warfare translations, the texts are arranged not in the customary columns but in lines. The corresponding text fragments therefore appear one under another, which gives the user full and exact information on the similarities and differences between variants, on the liberties taken by translators and on the variety of interpretations. In most cases this requires horizontal scrolling, but we chose to sacrifice the traditional principles of HTML layout for the sake of clarity. If this form is inconvenient, there is another display mode in which the compared fragments are fitted to the screen; in this mode the text lines can also be dragged up and down with the mouse. Finally, the texts can be placed on the screen as movable blocks.

Since any number of texts can now be shown on the screen, users can choose them themselves. The required translations are ticked in a special form, which in effect creates a sub-corpus. If the user presses ENTER without selecting any translation, all the texts available in the database are shown. In block mode, no more than 6 translations can be used at a time.

Each translation in the corpus is divided into 218 fragments (“links”), following the division of The Lay proposed by R. Yakobson. The text is displayed fragment by fragment; the number of the fragment can be entered in the same form in which translations are selected. The user can move through the text from fragment to fragment while keeping the selection of translations, that is, within the established sub-corpus.

Because the corpus is parallel, that is, designed to correlate the texts accumulated within the culture, the system always displays the reference text in Old Russian as a point of comparison, in addition to the translations selected by the user. The text published in the Encyclopedia of The Lay of Igor’s Warfare was chosen as the reference text, because it takes the accepted corrections into account while otherwise reproducing the first edition diplomatically.
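The display logic described in this section (fragment-by-fragment output, a user-selected sub-corpus, the always-present reference text and the six-text limit in block mode) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual implementation: the `Corpus` class, its method names, the identifier `old_russian` and the data layout are hypothetical, and the sample data are short excerpts of fragment 185 quoted later in this paper.

```python
from dataclasses import dataclass, field

FRAGMENT_COUNT = 218          # fragments per text, as stated in the article
BLOCK_MODE_LIMIT = 6          # block view accepts at most six translations
REFERENCE_ID = "old_russian"  # hypothetical id of the Old Russian reference text


@dataclass
class Corpus:
    # texts[text_id][fragment_number] -> fragment string (numbers start at 1)
    texts: dict = field(default_factory=dict)

    def select(self, chosen=(), block_mode=False):
        """Return the ids to display; an empty choice means every text."""
        ids = [t for t in (chosen or self.texts) if t != REFERENCE_ID]
        if block_mode and len(ids) > BLOCK_MODE_LIMIT:
            raise ValueError("block view is limited to 6 translations")
        # The reference text always comes first, whatever the user selected.
        return [REFERENCE_ID] + ids

    def rows(self, fragment, chosen=(), block_mode=False):
        """One aligned fragment, one text per row (lines, not columns)."""
        if not 1 <= fragment <= FRAGMENT_COUNT:
            raise ValueError("fragment number out of range")
        ids = self.select(chosen, block_mode)
        return [(t, self.texts[t].get(fragment, "")) for t in ids]


# Excerpts of fragment 185 (hypothetical ids, texts quoted in the article).
corpus = Corpus({
    REFERENCE_ID: {185: "Игорь спитъ, Игорь бдитъ"},
    "zhukovsky": {185: "Игорь-князь спит – не спит"},
    "zabolotsky": {185: "Дремлет Игорь, но не засыпает"},
})

for text_id, fragment in corpus.rows(185, chosen=["zabolotsky"]):
    print(f"{text_id}: {fragment}")
```

Keeping the selection separate from the fragment number mirrors the behaviour described above: the same sub-corpus can be reused while the user moves from one fragment to the next.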

The texts in the corpus menu are currently divided into four categories: texts and editions, translations into modern Russian, translations into Slavonic languages, and translations into other languages. This division is purely a matter of convention and does not prevent the user from comparing any text from one category with any text from another. The developers also plan to create a dynamic menu in which the user can sort translations by date of creation, alphabetically, by translator’s name, and so on. At present the translations are listed in the form in no fixed order. Hovering the cursor over the name of a translation brings up a tooltip giving the source of the text. Clicking the name opens a page with that text alone. The arrows at the end of each fragment switch to the parallel view.

Poetic translations “stretched” into lines also lose their traditional column appearance, but the division into verse lines, a fundamental feature of poetic speech, is preserved in the corpus and marked by a special sign, the vertical bar: "|".
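Because verse boundaries are stored as a vertical bar, the column layout of a poetic translation can be restored from its single-line form whenever needed. A minimal Python sketch, assuming the bar is used only as a line separator; the function name is hypothetical, and the sample is the Steshenko rendering quoted later in this paper:

```python
def restore_verse_lines(fragment: str) -> list[str]:
    """Split a single-line fragment into verse lines at the '|' marker."""
    return [line.strip() for line in fragment.split("|") if line.strip()]


stored = "Згасли вже вечірні зорі.|Ігор спить і Ігор бдить"
for line in restore_verse_lines(stored):
    print(line)
```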

Such a corpus is useful not only for studying a particular literary work. We have used the corpus successfully for various educational tasks and for studying language material as part of linguistic research.

An interesting feature of the corpus, in our opinion, is that it contains several translations of the monument into one and the same language, which is not always the case in other parallel corpora; such material is highly valuable. For example, the corpus contains numerous translations of The Lay of Igor’s Warfare into Russian (starting from the 19th century), Ukrainian and Polish. Let us concentrate on these three series of translations of the Old Russian monument: Russian, Ukrainian and Polish. Such material makes it possible to address the following linguistic questions:

  1. with what means of the given language (syntactic, morphological, lexical, metaphorical) a typical situation can be depicted. For example, the advance of the Cumans, which is perceived through an external sign (the banners), is rendered by some translators with a sound metaphor: Стяги ревут:|«Половцы идут от Дона, и от моря. (G. P. Shtorm); or with words used in their direct meaning to describe a situation of aural perception: Стяги плещут: половцы идут! (N. A. Zabolotsky); while V. P. Buynachov rejects the metaphor of “speaking” banners and conveys the meaning by describing the situation differently: Дороги говорят:| Половцы идут от Дона и от моря. Another fragment of interest describes Prince Igor’s anxiety and impatience before his escape: Погасоша вечеру зари. Игорь спитъ, Игорь бдитъ, Игорь мыслію поля мѣритъ отъ Великаго Дону до Малаго Донца (fr. 185). A number of translators, of course, translated directly, using syntactic parallelism and the antonyms “to sleep” and “to keep awake”: Игорь спит, Игорь бдит (R. O. Yakobson); Згасли вже вечірні зорі.|Ігор спить і Ігор бдить (I. Steshenko); Pogasły zorze wieczorne. Igor śpi – Igor czuwa (A. Bielowski). But there are other ways of conveying the character’s state. A compound predicate: Игорь-князь спит – не спит (V. A. Zhukovsky); Ігор спить, не спить (M. Chernyavsky). Periphrasis: Дремлет Игорь, но не засыпает (N. A. Zabolotsky); Игорь спит, но бодры сны его (V. M. Goncharov); Не спиться князю Ігорю (V. Shevchuk); Зорі звечора погасли;|Береться к півночі,|А князь Ігор то заплющить,|То розплющить очі (A. Kovalenko); Blado już zorza dogasając świeci,|A Igor jeszcze nie skleił powieki (A. Krasiński). Denial of what has just been said: Игорь спит. А Игорь спит ли? (K. D. Balmont); Игорь спит. Игорь не спит. (I. I. Shklyarevsky); Ой спить Ігор, не спить Ігор!|Чи тут спати Ігореві? (V. Shchurat); Ігор спить, здаесь, ні, не спить Ігор (I. Franko).

  2. what words are used to convey a particular meaning. For example, the words Карна and Жля are not fully clear semantically. Some translators leave them as they are, perhaps as proper names; others resolve the obscurity with words denoting negative phenomena connected with war and plunder, compare Ukr. туга (K. Zinkivsky), Плачниця, плач (K. Sklyarenko), вбивство і грабіж (V. Shevchuk). Another example is the description of the wretched state of the Russian land during the internecine wars: уже снесеся хула на хвалу; уже тресну нужда на волю; уже връжеся дивъ на землю. Russian and Ukrainian translators mostly render this phrase with the words хула, хвала, нужда, воля, which are genetically related to the Old Russian words. This is an obvious stylistic failure, since these words require additional commentary for speakers of the modern language: the word хула is obsolete in Russian, and хвала and нужда, though intuitively understandable, can only be classified as bookish. Some translators used symbolic words that native speakers associate with such fundamental concepts as насилие (violence), горе (grief), слава (fame), позор (shame), wolność (freedom), nędza (misery), hańba (disgrace). Some authors capitalize them to further stress the symbolic character of these notions.

  3. how lexemes function in the language and combine with one another. It is curious, for example, to track how the modern language renders the verb denoting the sounds made by foxes: a search in the National Corpus of the Russian Language gives no results. Nevertheless, Russian translators hold almost unanimously that foxes брешут or лают, Ukrainian translators that they брешуть, and Polish translators that they szczekają. The Lay also mentions the sounds made by crows very often. All Russian translators describe them with the words каркать and граять, and in rare cases with кричать, which is semantically broader. The word каркать denotes the sounds made by crows only. The word граять is broader: it denotes the sounds made by birds “of the crow kind” (V. I. Dal), such as jackdaws, magpies and crows. Interestingly, in translations of The Lay the word граяти is used only of crows, although all the birds that Dal assigns to the crow kind are mentioned in the text. The choice between the variants is determined not by the time of the translation (both каркать and граять have ancient Indo-European roots) but by the influence either of the word usage common in the translator’s epoch or of the original (which has граяти).

  4. what characterizes translation in a given period, and how the source text influences the quality of the translation. It should be noted that a fairly adequate history of research on The Lay of Igor’s Warfare can be built from the translations of this literary monument alone. Trying to convey the meaning as exactly as possible and to keep the “medieval color”, translators often made stylistic mistakes without noticing it: they sought cognate words and syntactic structures similar to the original even when these fit poorly into the context of the modern language or are uncharacteristic of it: вран, граять, злато, петь славу, что мне звенит in Russian; piać (‘to sing’), grom (‘thunder’), pieją sławę (‘they sing glory, they glorify’) in Polish. Of interest is how Russian translations render the sounds made by the nightingale. These sounds are named in two fragments: fragment 35, describing the dawn, which reads in Old Russian щекотъ славій успе, говоръ галичь убуди; and fragment 202, which also describes the dawn. In most translations the lexeme щекот is used for the nightingale’s song (from the verb щекотать ‘to sing (of certain birds)’). This verb, known and used in the 18th and 19th centuries, became obsolete in the 20th century: it is not recorded in this meaning in dictionaries published even in the middle of the century, let alone modern ones (it is absent in this meaning from the dictionaries of Ozhegov, Dal and Ushakov), and the National Corpus of the Russian Language gives almost no results for the verb or the noun in this meaning (the few that are found, fewer than five, are dated no later than 1920). Consequently, modern translations show the influence of the original source and of the earlier translation tradition. The lexeme щебет is more natural for modern Russian; it is used by A. Y. Chernov, B. D. Bychenko and A. B. Kozlov.

It is worth noting that the features described here are found, with few exceptions, in all three groups of translations considered.
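Comparisons of this kind lend themselves to simple automation once the fragments are aligned. The Python sketch below groups several Russian renderings of fragment 185 by whether they negate the verb explicitly with не спит. The quotations and translator surnames are taken from the examples above; the function name, the group labels, the data layout and the naive substring test are illustrative assumptions, not the corpus's actual search engine.

```python
from collections import defaultdict

# Renderings of fragment 185 quoted in this paper (translator -> text).
FRAGMENT_185 = {
    "Zhukovsky": "Игорь-князь спит – не спит",
    "Zabolotsky": "Дремлет Игорь, но не засыпает",
    "Goncharov": "Игорь спит, но бодры сны его",
    "Balmont": "Игорь спит. А Игорь спит ли?",
    "Shklyarevsky": "Игорь спит. Игорь не спит.",
}


def group_by_phrase(renderings, phrase):
    """Group translators by whether their rendering contains the phrase."""
    groups = defaultdict(list)
    for translator, text in renderings.items():
        label = "contains" if phrase in text.lower() else "other"
        groups[label].append(translator)
    return dict(groups)


print(group_by_phrase(FRAGMENT_185, "не спит"))
```

A substring test is deliberately crude: it ignores morphology and spelling variants, so a real study would need lemmatized search of the kind the corpus provides for Russian texts.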

The uses of the corpus are not limited to the tasks described above. It can be used in education, in learning a language as native or foreign, in literary research, and in linguistic research on aspects not covered in this presentation.

2 Jörg Tiedemann, “Building a Multilingual Parallel Subtitle Corpus”; Simon P. Botley, Anthony M. McEnery, and Andrew Wilson (eds.), Multilingual Corpora in Teaching and Research (Language and Computers: Studies in Practical Linguistics, No. 22). Amsterdam (Netherlands) and Atlanta, GA (USA): Editions Rodopi B.V., 2000. ISBN 90-420-0541-6.




0. General notes

1. The concept of text presentation in the corpus

1.1. Problems

1.2. Solutions

1.2.1. Lines instead of columns

1.2.2. Dynamic display of texts

1.2.3. Division into fragments

1.2.4. The reference text

2. Composition and structure of the corpus

3. Features of text presentation and the interface

3.1. Interface

3.2. Features of presentation

4. History of the project

* In 2008 the development of the corpus was supported financially by the Russian Foundation for the Humanities (RGNF) as part of the project to create the information system “Parallel Corpus of Translations of ‘The Tale of Igor’s Campaign’”, project No. 08–04–12104в

© Boris Orekhov 2007