-
Notifications
You must be signed in to change notification settings - Fork 2
/
TICCL.Template.config
20 lines (20 loc) · 2.4 KB
/
TICCL.Template.config
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
-a abcmdef ## $mode : The mode specifies which submodules will be run.
-b XML ## $texttype : What is the type of your input files? These types may be: IM : images, PDF : images in PDF files, TXT : plain text files, XML : an XML format, FOLIA : FoLiA XML format, TSV : a frequency file (word type - tab - frequency)
-z ticcl/ ## $ROOTDIR : The directory where your version of the TICCL system files are located.
-c ticcl/data/int/nld/nld.aspell.dict.c20.d2.confusion ## $charconfus : A file listing the particular character confusions the system will gather word pairs for.
-d empty.txt ## $KHC : Specify the name of the Known Historical Confusions file (if you have one). If not, create an empty file in TICCL's root directory.
-e xml ## $ext : The extension ending your input file names. Can be single, e.g. '.xml' or double '.folia.xml'.
-f 100000000 ## $artifrq : The artificial frequency. Should be higher than the highest word frequency in your input files frequency list. Typically set at: '100000000' (i.e. one hundred million).
-g ticcl/data/int/nld/nld.aspell.dict.lc.chars ## $alph : The alphabet as derived for your language on the basis of a lexicon or corpus frequency list.
#-i /exp2/mre/ticclops/data/int/nld ## $INPUTDIR : Directory where system input files such as alphabet, character confusions file and lexicon are to be found.
-i inputdir/ ## $dir : Directory from which files to be corrected are to be read
-l ticcl/data/int/nld/nuTICCL.OldandINLlexandINLNamesAspell.v2.COL1.tsv ## $lex : the lexicon for your language
-L 2 ## $LD : The levenshtein limit to be imposed on word pairs collected.
-o outputdir/ ## $OUTPUTDIR : the directory system output files will be written to. Will contain a dir zzz/TICCL for intermediate TICCL files and a directory zzz/FOLIA where corrected FoLiA XML files will be written to.
-j TICCLtest ## $prefix : A prefix to begin intermediate TICCL output file names with.
-r 3 ## $rank : The number of TICCL correction candidates that will be output.
-t nld ## $lang : The language your texts are written in. To be specified in three-letter ISO language code.
-u tooldir/ ## $tooldir : Directory that holds the C++ modules of which TICCL is composed.
-v 12 ## $threads : the number of threads the system is allowed to set up for processing.
-x 5 ## $minlength : Minimum length of word types to be examined, in characters.
-y 50 ## $maxlength : Maximum length of word types to be examined, in characters.