Efficient and Intelligent Farsi Text Prediction
Welcome to the Farsi Predictive Typing App repository! This project is an innovative typing application that combines a trie-based autocomplete system with the HooshvareLab GPT-2 (gpt2-fa) model for accurate and intelligent Farsi text prediction.
- Trie-Based Autocomplete: Efficient, lightweight, and fast text lookup for Farsi words.
- GPT-2 Integration: Context-aware predictions leveraging the powerful Farsi adaptation of GPT-2 by HooshvareLab.
- Farsi Language Support: Designed specifically to improve typing efficiency and experience for Farsi users.
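As a rough illustration of the GPT-2 side, the sketch below asks the model for likely next tokens given a Farsi context. It assumes the Hugging Face transformers and torch packages are installed and that the model is published on the Hub as HooshvareLab/gpt2-fa. The function names (next_token_probabilities, top_k_tokens) are hypothetical and are not taken from this repository's code.

```python
from typing import Dict, List, Tuple

# Assumed Hugging Face Hub id for the HooshvareLab model named above.
MODEL_ID = "HooshvareLab/gpt2-fa"


def top_k_tokens(probabilities: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Return the k most probable (token, probability) pairs, best first."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


def next_token_probabilities(
    context: str, model_id: str = MODEL_ID, pool: int = 50
) -> Dict[str, float]:
    """Score likely next tokens for a non-empty Farsi context (hypothetical helper)."""
    if not context.strip():
        raise ValueError("context must contain at least one non-space character")

    # Heavy dependencies are imported lazily so top_k_tokens works without them.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Loaded on every call to keep the sketch short; a real app would load once.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    model.eval()

    inputs = tokenizer(context, return_tensors="pt")
    with torch.no_grad():
        # Logits for the position right after the last context token.
        logits = model(**inputs).logits[0, -1]
    probs = torch.softmax(logits, dim=-1)
    best = torch.topk(probs, pool)

    scores: Dict[str, float] = {}
    for prob, token_id in zip(best.values.tolist(), best.indices.tolist()):
        text = tokenizer.decode([token_id]).strip()
        if text:
            # Several token ids can decode to the same text; keep the best score.
            scores[text] = max(prob, scores.get(text, 0.0))
    return scores
```

Calling top_k_tokens(next_token_probabilities(context), 5) with a Farsi phrase as context would return up to five candidates. GPT-2 works on subword tokens, so a candidate may be a word fragment rather than a full word.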
Learn more about GPT-2 (gpt2-fa) from HooshvareLab, the model's developers.
- Programming Language: Python (3.9+)
- Machine Learning Model: HooshvareLab GPT-2 (gpt2-fa)
- Data Structures: Trie for efficient word lookups
The trie-based autocomplete system relies on a structured database to provide efficient and accurate word suggestions. The database is a SQLite file containing a table named word, which stores Farsi words along with their usage frequencies. This data is critical for building the trie structure and prioritizing commonly used words.
- id (INTEGER): A unique identifier for each word entry.
- word (VARCHAR): The Farsi word stored in the database.
- count (INTEGER): How often the word occurs, used as a measure of how common it is.
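The schema described above could be created and queried with Python's built-in sqlite3 module roughly as follows. This is a minimal sketch: the PRIMARY KEY and NOT NULL constraints and the helper names (create_word_table, add_word, load_word_counts) are assumptions for illustration, not details confirmed by the repository.

```python
import sqlite3
from typing import List, Tuple


def create_word_table(conn: sqlite3.Connection) -> None:
    """Create the word table with the three columns listed above."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS "word" (
            "id"    INTEGER PRIMARY KEY,          -- assumed constraint
            "word"  VARCHAR NOT NULL,             -- assumed constraint
            "count" INTEGER NOT NULL DEFAULT 0    -- assumed constraint
        )
        """
    )


def add_word(conn: sqlite3.Connection, word: str, count: int) -> None:
    """Insert one word with its usage frequency."""
    conn.execute('INSERT INTO "word" ("word", "count") VALUES (?, ?)', (word, count))


def load_word_counts(conn: sqlite3.Connection) -> List[Tuple[str, int]]:
    """Return (word, count) rows, most frequent first."""
    cursor = conn.execute('SELECT "word", "count" FROM "word" ORDER BY "count" DESC')
    return cursor.fetchall()
```

Opening the connection with sqlite3.connect(":memory:") makes it possible to try the helpers without touching the project's database file.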
Here are a few examples of the data stored in the word table:
- انوشه: Frequency 919
- آسیه: Frequency 1974
- دهسالگی: Frequency 489
- بههمپیوستگی: Frequency 819
The word table's data is used to:
- Build the trie structure for fast and efficient lookups.
- Rank suggestions based on the frequency (count) to improve the relevance of predictions.
Because suggestions are drawn from this word list and ordered by count, the autocomplete system returns stored Farsi words with the most common ones first.
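To make the trie and frequency ranking concrete, here is a minimal sketch of a trie that stores each word with its count and returns the most frequent completions for a prefix. The class and method names (Trie, insert, suggest) are hypothetical, and the project's actual implementation may differ.

```python
from typing import Dict, List, Tuple


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_word = False  # True when a stored word ends at this node
        self.count = 0        # usage frequency of that word


class Trie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, word: str, count: int) -> None:
        """Add a word and its frequency, creating nodes as needed."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
        node.count = count

    def suggest(self, prefix: str, limit: int = 5) -> List[str]:
        """Return up to `limit` stored words starting with `prefix`, most frequent first."""
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]

        # Collect every word in the subtree below the prefix node.
        matches: List[Tuple[str, int]] = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.is_word:
                matches.append((text, current.count))
            for ch, child in current.children.items():
                stack.append((child, text + ch))

        # Higher count first; ties broken alphabetically for stable output.
        matches.sort(key=lambda item: (-item[1], item[0]))
        return [word for word, _ in matches[:limit]]
```

In use, each (word, count) row from the word table would be passed to insert, and suggest would be called with whatever the user has typed so far.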
- Clone the repository:
git clone https://github.com/your-username/Farsi-Predictive-Typing-App.git
cd Farsi-Predictive-Typing-App
I am actively working to enhance the application! Here are some upcoming features and improvements:
- Custom Dataset Support: Allow users to add their own Farsi text corpora for fine-tuning.
- Mobile Compatibility: Develop a mobile-friendly version of the application.
- UI Enhancements: Add a graphical interface for easier use.
- Real-Time Suggestions: Improve the speed and accuracy of text predictions.
Feel free to suggest more features by opening an issue in this repository.
This project is licensed under the Apache-2.0 License. See the LICENSE file for details.
Special thanks to HooshvareLab for their development of the GPT-2 (gpt2-fa) model, which plays a vital role in providing intelligent and context-aware Farsi text predictions in this project. Visit their GitHub repository for more information on their work.
Contributions are welcome! If you’d like to improve the app, feel free to fork the repository, create a pull request, or report issues.
If you have any questions or feedback, feel free to reach out:
- Project Maintainer: M.Mardani
- GitHub Profile: My GitHub Profile
I appreciate your interest in the Farsi Predictive Typing App and look forward to your contributions!