This notebook contains all the functions we use to preprocess Arabic text:
- Transliteration: we've created a library named DSAraby that transliterates text, i.e. writes a word using the closest corresponding letters of a different alphabet or language. Given a word written in Latin letters, the algorithm generates the possible Arabic words by mapping Latin letters to Arabic ones, then keeps the candidate that occurs most frequently in a corpus.
- Text Normalization: the process of transforming text into a unified form, e.g. by removing Al-tashkil (diacritics) and elongation.
- Stop words: the process of removing words that carry little meaning from a text. Arabic has a particularly large number of stop words, for example: أيّان, هَيْهَاتَ, مابرح
- Dealing with hashtags
- Dealing with emojis
- Removing links
- Removing RT, CC, and mentions
- Filtering out all non-Arabic text
- Removing digits, dashes, punctuation marks, and any other marks
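The transliteration idea (map each Latin letter to its possible Arabic letters, then keep the candidate seen most often in a corpus) can be sketched as follows. This is not DSAraby's actual API: the mapping table `LATIN_TO_ARABIC`, the helper names and the toy corpus are hypothetical and deliberately tiny.

```python
from collections import Counter
from itertools import product

# Hypothetical, deliberately tiny Latin-to-Arabic mapping. A real table would
# cover the whole alphabet, digraphs such as "sh", and Arabizi digits.
LATIN_TO_ARABIC = {
    "a": ["ا", ""],   # a short vowel may or may not be written in Arabic
    "b": ["ب"],
    "k": ["ك", "ق"],
    "t": ["ت", "ط"],
    "s": ["س", "ص"],
    "l": ["ل"],
    "m": ["م"],
}

def candidates(word):
    """Return every Arabic spelling the mapping allows (unknown letters are dropped)."""
    options = [LATIN_TO_ARABIC.get(ch, [""]) for ch in word.lower()]
    return {"".join(combo) for combo in product(*options)}

def transliterate(word, corpus_counts):
    """Return the candidate that occurs most often in the corpus, or None if none occurs."""
    count, best = max((corpus_counts.get(c, 0), c) for c in candidates(word))
    return best if count > 0 else None

# Toy corpus standing in for a real word-frequency table (hypothetical).
corpus = Counter("كتاب كتب كتاب مكتبة سلام".split())
print(transliterate("ktab", corpus))
```

Because the number of candidates grows multiplicatively with every ambiguous letter, a real implementation would likely prune candidates early (e.g. against a prefix index of the corpus) rather than enumerate them all.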
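A minimal sketch of the normalization step, assuming Al-tashkil means the harakat, tanwin, shadda and sukun marks (U+064B to U+0652) plus the superscript alef (U+0670), and that elongation means the tatweel character U+0640. The function name `normalize` is illustrative, not taken from the notebook.

```python
import re

# Diacritics (harakat, tanwin, shadda, sukun) and the superscript alef.
TASHKIL_RE = re.compile(r"[\u064B-\u0652\u0670]")
# Tatweel, the elongation character "ـ".
TATWEEL = "\u0640"

def normalize(text):
    """Strip diacritics and tatweel so that variant spellings compare equal."""
    text = TASHKIL_RE.sub("", text)
    return text.replace(TATWEEL, "")

print(normalize("هَيْهَاتَ"))  # diacritics removed
print(normalize("جمـــيل"))    # tatweel removed
```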
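Stop-word removal can be sketched as a whitespace split followed by a set lookup. The `STOP_WORDS` set below is a small hypothetical sample, stored without diacritics; a real project would load a full Arabic stop-word list. Because words like أيّان carry diacritics in running text, this step assumes the text has already been normalized, otherwise the lookup would miss them.

```python
# Small hypothetical sample of stop words, stored in normalized form (no tashkil).
STOP_WORDS = {"في", "من", "على", "أيان", "هيهات", "مابرح"}

def remove_stop_words(text, stop_words=STOP_WORDS):
    """Drop every whitespace-separated token that appears in the stop-word set."""
    return " ".join(tok for tok in text.split() if tok not in stop_words)

print(remove_stop_words("كتاب في البيت"))
```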
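The tweet-specific steps (hashtags, emojis, links, RT, CC and mentions) can be handled with regular expressions. In this sketch the patterns and the `clean_tweet` name are assumptions: the emoji ranges are only an approximation, since emoji are spread over many Unicode blocks, and hashtags are kept as plain words (dropping `#` and turning `_` into spaces) on the assumption that their content is still useful Arabic text.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
RT_CC_RE = re.compile(r"\b(?:RT|CC)\b", re.IGNORECASE)
HASHTAG_RE = re.compile(r"#(\w+)")  # \w is Unicode-aware, so Arabic hashtags match
# Approximate emoji ranges (an assumption): pictographs, misc symbols/dingbats,
# regional-indicator flags, variation selector 16 and the zero-width joiner.
EMOJI_RE = re.compile(
    "[\U0001F300-\U0001FAFF\u2600-\u27BF\U0001F1E6-\U0001F1FF\uFE0F\u200D]+"
)

def clean_tweet(text):
    """Remove links, mentions, RT/CC markers and emojis; unwrap hashtags."""
    text = URL_RE.sub(" ", text)  # links first, so a "#" inside a URL is not a hashtag
    text = MENTION_RE.sub(" ", text)
    text = RT_CC_RE.sub(" ", text)
    text = HASHTAG_RE.sub(lambda m: m.group(1).replace("_", " "), text)
    text = EMOJI_RE.sub(" ", text)
    return " ".join(text.split())  # collapse the leftover whitespace

print(clean_tweet("RT @user مرحبا \U0001F600 #يوم_جميل https://t.co/abc"))
```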
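Filtering out non-Arabic text and removing digits, dashes and punctuation can be done in one pass by keeping only Arabic letters and whitespace. This sketch assumes "Arabic letters" means U+0621 to U+063A and U+0641 to U+064A, which deliberately excludes the tatweel (U+0640), Arabic-Indic digits and Arabic punctuation such as ، and ؟. The name `keep_arabic_only` is illustrative.

```python
import re

# Anything that is not a basic Arabic letter or whitespace: Latin letters,
# Western and Arabic-Indic digits, dashes, punctuation, symbols, diacritics.
NON_ARABIC_RE = re.compile(r"[^\u0621-\u063A\u0641-\u064A\s]+")

def keep_arabic_only(text):
    """Replace every non-Arabic run with a space, then collapse whitespace."""
    text = NON_ARABIC_RE.sub(" ", text)
    return " ".join(text.split())

print(keep_arabic_only("مرحبا hello 123 ٤٥ - ، كيف؟"))
```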