# This script is based on two scripts, made by Jarle Ebeling and Vladislav Dorokhin.
# The idea of the script is to collect in one place all the functionality we need to prepare a text for alignment.
# Thus, the script contains a component removing trash symbols (such as trailing spaces), a tagging component (for <...>
# and <...> tagging), a component which adds an id to the <...> tag and the <...> tag [and its own xml-validator (so that
# we don't need to run Oxygen or another console validator to validate xml) - unfortunately, it's not done yet].
# If you install the free ActivePerl (http://www.activestate.com/activeperl), which makes .pl files executable, you can
# run this script with a single click. That means you don't need to use the command line or enter any arguments (including
# the file name) to process your text.
# Usage. This script takes the plain text file in the directory where the script itself is located. If there are several
# plain text files, the script takes the first one in alphabetical order. Obviously, if there are no txt files
# in the directory, the script will do nothing. In the next step the script creates an XML file and writes the header to it,
# followed by the processed text.
# Please note that the ID generated for each tag depends strongly on the file name. The recommended
# format of the file name is: a) the initials of the author, b) the number of that author's text in the corpus,
# c) the letter "T" if this is a translation, d) the letter which marks the language of the text.
# For example, "EH1E" for the first text of Ernest Hemingway, written in English; "EH1TR" for the same case,
# except that this is a Russian translation of Hemingway's text.
# Please pay attention to these four things. They are really important!
# First. Your text file must be true UTF-8.
# Second. Correct file name (see above). The extension ".txt" is required. There must be only one txt file in
# the script directory.
# Third. XML header. After the script has run, you must open the XML file in your text editor and replace the stars (***)
# in the header with the relevant information.
# Fourth. <...> tagging. Unfortunately we can do nothing about this problem, and the only way is to solve it manually.
# At the end of the file, before the closing tag <...>, you'll find a closing tag <...> which is added automatically.
# Please add the correct number to this tag. It must be like this: <...> or this: <...>.
# Feel free to contact me: nevmenandr@gmail.com
# Boris Orekhov