The script converts Arabic text into an ASCII transliteration that can be used with text analysis programs that do not support Arabic script (for example, R for Windows).
[Python 3.4 must be installed; earlier versions handle Unicode files differently]
- Put your texts into ./originals/
- Run the script (on Windows, double-clicking "translit_converter.py" should be enough)
- New files will appear in ./ascii/ and in ./arabic/
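For orientation, the folder workflow above could be sketched roughly as follows. This is not the code of translit_converter.py: the function name `convert_folder`, the UTF-8 encoding, and the reuse of the input file name for the output are all assumptions made for illustration.

```python
import os

def convert_folder(src_dir, dst_dir, convert):
    """Apply convert() to every file in src_dir and write the result to dst_dir.

    Hypothetical helper: assumes UTF-8 input and keeps the original file names.
    """
    os.makedirs(dst_dir, exist_ok=True)
    written = []
    for name in sorted(os.listdir(src_dir)):
        src_path = os.path.join(src_dir, name)
        if not os.path.isfile(src_path):
            continue  # skip subfolders
        with open(src_path, encoding="utf-8") as f:
            text = f.read()
        with open(os.path.join(dst_dir, name), "w", encoding="utf-8") as f:
            f.write(convert(text))
        written.append(name)
    return written
```

A call such as `convert_folder("./originals", "./ascii", some_function)` would then fill ./ascii/ with one converted file per original, where `some_function` stands for whatever conversion is applied to the text.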
The script (translit_converter.py) does the following:
- removes all short vowels
- replaces all Latin letters and numbers with "@"
- transliterates Arabic into ASCII characters (saving into ./ascii/)
- transliterates the ASCII back into Arabic (saving into ./arabic/), mostly as a check on the conversion
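The first two steps can be sketched as below. This is an illustration under stated assumptions, not the script's actual code: "short vowels" is taken to mean the diacritics U+064B to U+0652 (which also covers tanwin, shadda and sukun), and "Latin letters and numbers" is taken to mean ASCII letters and ASCII digits, each replaced one for one. The names `SHORT_VOWELS`, `LATIN_OR_DIGIT` and `preprocess` are hypothetical.

```python
import re

# Assumption: the diacritics U+064B..U+0652 stand in for "short vowels";
# the real script may strip a different set.
SHORT_VOWELS = re.compile("[\u064B-\u0652]")

# Assumption: only ASCII letters and digits are masked, one "@" per character.
LATIN_OR_DIGIT = re.compile("[A-Za-z0-9]")

def preprocess(text):
    """Strip short vowels, then mask Latin letters and digits with "@"."""
    text = SHORT_VOWELS.sub("", text)
    return LATIN_OR_DIGIT.sub("@", text)
```

Masking Latin characters before transliterating presumably matters for the way back: any ASCII letter left in the output can then only be a transliterated Arabic letter.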
Transliteration Table (Simplified Buckwalter):
'ء' : 'c',
'ؤ' : 'u',
'ئ' : 'i',
'ا' : 'A',
'إ' : 'I',
'أ' : 'a',
'آ' : 'O',
'ب' : 'b',
'ة' : 'o',
'ت' : 't',
'ث' : 'v',
'ج' : 'j',
'ح' : 'H',
'خ' : 'x',
'د' : 'd',
'ذ' : 'V',
'ر' : 'r',
'ز' : 'z',
'س' : 's',
'ش' : 'E',
'ص' : 'S',
'ض' : 'D',
'ط' : 'T',
'ظ' : 'Z',
'ع' : 'C',
'غ' : 'g',
'ف' : 'f',
'ق' : 'q',
'ك' : 'k',
'ل' : 'l',
'م' : 'm',
'ن' : 'n',
'ه' : 'h',
'و' : 'w',
'ى' : 'Y',
'ي' : 'y',
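
Because every Arabic letter in the table maps to a distinct ASCII character, the mapping can be inverted for the ASCII-to-Arabic pass. A minimal sketch, using only a few entries copied from the table (the rest would be pasted in the same way); the names `TABLE`, `to_ascii` and `to_arabic` are hypothetical and the script itself may be organised differently:

```python
# A few entries copied from the table above; not the full table.
TABLE = {
    'ا': 'A',
    'ب': 'b',
    'ة': 'o',
    'ت': 't',
    'ك': 'k',
}

# str.maketrans accepts a dict of single characters to replacement strings.
TO_ASCII = str.maketrans(TABLE)
# Inverting is safe only because each ASCII value occurs once.
TO_ARABIC = str.maketrans({v: k for k, v in TABLE.items()})

def to_ascii(text):
    """Arabic letters to ASCII; characters not in the table pass through."""
    return text.translate(TO_ASCII)

def to_arabic(text):
    """ASCII back to Arabic letters; characters not in the table pass through."""
    return text.translate(TO_ARABIC)

print(to_ascii("كتاب"))   # ktAb
print(to_arabic("ktAb"))  # كتاب
```

The round trip restores only the letters: short vowels and anything masked with "@" are already gone by the time the ASCII file is written.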