In this notebook you will find all the functions that we use to process a text

saobou/arabic-text-preprocessing

Arabic text preprocessing

This notebook collects the functions we use to preprocess Arabic text.

What does this notebook contain?

  • Transliteration: We created a library named DSAraby that transliterates text, that is, writes a word using the closest corresponding letters of a different alphabet. Given a word written in Latin letters, the algorithm maps the Latin letters to Arabic ones to generate the possible Arabic words, then picks the candidate that occurs most frequently in a corpus.
  • Text normalization: transforming a text into a unified form, which includes removing Al-tashkil (diacritics) and elongation.
  • Stop words: removing words that carry little meaning from a text. Arabic has a particularly large number of stop words, for example: أيّان,هَيْهَاتَ, مابرح
  • Dealing with hashtags.
  • Dealing with emojis.
  • Removing links.
  • Removing RT, CC, and mentions.
  • Filtering out all non-Arabic text.
  • Removing digits, dashes, punctuation marks, and any other marks.
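
The cleaning steps in the list above can be sketched in a few lines of Python. This is a minimal illustration, not the notebook's actual code: the function names, the five-word stop-word list, and the choice to keep hashtag words while dropping the `#` and `_` characters are all assumptions made for the example.

```python
import re

# Arabic diacritics (tashkeel): fathatan U+064B through sukun U+0652.
TASHKEEL = re.compile(r"[\u064B-\u0652]")
TATWEEL = "\u0640"  # elongation character (kashida)
URL = re.compile(r"https?://\S+|www\.\S+")
MENTION = re.compile(r"@\w+")
RT_CC = re.compile(r"\b(?:RT|CC)\b")
# Anything that is not an Arabic letter (U+0621 to U+064A) or whitespace.
NON_ARABIC = re.compile(r"[^\u0621-\u064A\s]")

# Tiny illustrative stop-word list (assumption); a real list is far longer.
STOP_WORDS = {"في", "من", "على", "عن"}


def strip_tashkeel(text):
    """Remove diacritics without inserting spaces inside words."""
    return TASHKEEL.sub("", text)


def strip_tatweel(text):
    """Remove the elongation character."""
    return text.replace(TATWEEL, "")


def clean_tweet(text):
    """Apply the cleaning steps in order and return space-joined tokens."""
    text = URL.sub(" ", text)        # links
    text = MENTION.sub(" ", text)    # @mentions
    text = RT_CC.sub(" ", text)      # RT / CC markers
    # Assumed hashtag policy: keep the words, drop '#' and '_'.
    text = text.replace("#", " ").replace("_", " ")
    # Normalize before filtering so diacritics do not split words.
    text = strip_tatweel(strip_tashkeel(text))
    # Drops Latin letters, digits, punctuation, emojis and other marks.
    text = NON_ARABIC.sub(" ", text)
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
    return " ".join(tokens)
```

The order matters here: diacritics and elongation are stripped before the non-Arabic filter runs, because the filter replaces characters with spaces and would otherwise break a vocalized word into pieces.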
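
The transliteration idea described in the first item (generate candidate Arabic spellings from Latin letters, then keep the most frequent one in a corpus) could look roughly like the sketch below. It does not use or reproduce the DSAraby API: the mapping table, the function names, and the rule that unmapped Latin letters are skipped are hypothetical and chosen only to keep the example small.

```python
from collections import Counter
from itertools import product

# Hypothetical, deliberately tiny Latin-to-Arabic table. Each Latin letter
# maps to one or more candidate Arabic letters.
LATIN_TO_ARABIC = {
    "s": ["س", "ص"],
    "l": ["ل"],
    "a": ["ا", ""],  # a short vowel is often not written in Arabic
    "m": ["م"],
    "k": ["ك", "ق"],
    "t": ["ت", "ط"],
    "b": ["ب"],
}


def candidates(word):
    """Return every Arabic spelling the table allows for a Latin word."""
    # Letters missing from the table contribute nothing (assumption).
    options = [LATIN_TO_ARABIC.get(ch, [""]) for ch in word.lower()]
    return {"".join(combo) for combo in product(*options)}


def transliterate(word, corpus_counts):
    """Pick the candidate seen most often in the corpus, or None.

    corpus_counts is expected to be a collections.Counter of Arabic words.
    """
    known = [c for c in candidates(word) if corpus_counts[c] > 0]
    if not known:
        return None
    return max(known, key=lambda c: corpus_counts[c])


# Toy corpus: the word for "peace" appears twice, a shorter form once.
corpus = Counter("سلام سلام كتاب سلم".split())
best = transliterate("salam", corpus)
```

The number of candidates grows multiplicatively with word length, so a practical version would prune candidates against the corpus vocabulary while generating them instead of enumerating the full product.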
