# This script is based on two scripts by Jarle Ebeling and Vladislav Dorokhin.
# Its purpose is to gather in one place all the functionality we need to prepare a text for alignment.
# The script therefore contains a component that removes junk characters (such as trailing spaces), a tagging
# component (for <p> and <s> tagging), and a component that adds an id to each <p> and <s> tag. A built-in XML
# validator (so that there is no need to run Oxygen or a console validator) is planned but not implemented yet.
# If you install the free ActivePerl (http://www.activestate.com/activeperl), which makes .pl files executable, you can
# run this script with a single click. You then do not need to use the command line or enter any arguments (including
# the file name) to process your text.

# Usage. The script takes a plain text file from the directory in which the script itself is located. If there are
# several plain text files, the script takes the first one in alphabetical order. If there are no .txt files
# in the directory, the script does nothing. In the next step the script creates an XML file and writes the header
# to it, followed by the processed text.
# Please note that the ID generated for each tag depends heavily on the file name. The recommended
# format of the file name is: a) the initials of the author, b) the number of the author's text in the corpus, c) the
# letter "T" if the text is a translation, d) the letter that marks the language of the text. For example, "EH1E" is
# the first text by Ernest Hemingway, written in English, and "EH1TR" is the same text in a Russian translation.

# Please pay attention to these four things. They are really important!

# First. Your text file must be genuinely encoded in UTF-8.
# Second. The file name must be correct (see above). The extension ".txt" is required. There must be only one
# .txt file in the script directory.
# Third. XML header. After the script has finished, you must open the resulting XML file in your text editor and
# replace the asterisks (***) in the header with the relevant information.
# Fourth. <div> tagging. Unfortunately, this cannot be automated and has to be done manually.
# At the end of the file, before the closing </body> tag, you will find a closing </div> tag that is added
# automatically. Please add the correct number to this tag, for example </div1> or </div2>.

# Feel free to contact me: nevmenandr@gmail.com
# Boris Orekhov