- Referencing Flow of Conversation using Usernames
- Topic Clustering
- Getting subject of a sentence - possibly LDA (https://towardsdatascience.com/nlp-extracting-the-main-topics-from-your-dataset-using-lda-in-minutes-21486f5aa925)
- A message is more likely to be a reply if it uses a pronoun
- Message gap: the probability that a message refers to a particular earlier message decreases exponentially with the number of messages between them
- Use context to decide which message is being replied to: find the subject of the conversation and, by default, map the reply to the closest entity
- Direct word matches give a higher probability
- Single-word message - treat as a response to a message
- 1v1 Conversations - based on threshold - always same topic
- Parallel Models - Word Similarity, Sentence Similarity and Reference Matching
- Gap between messages
- TimeStamp
- Unread/Read
- Username
- Removal of Stopwords
- Word Similarity - No Removal
- Sentence Similarity - Removal Needed
- Reference Matching - No Removal
- Tokenisation and Lemmatisation
- Extracting WhatsApp Messages
- Reply feature (used in common messaging applications) is not applicable
- The language used is not that of the villagers.
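
The reply heuristics in these notes (exponential decay with message gap, a boost when a pronoun is used, and a higher score for direct word matches) could be combined roughly as in the sketch below. This is only an illustration under assumptions: the function names, the pronoun list, and the `decay`, `pronoun_boost` and `match_boost` values are hypothetical and not taken from the notes, and the score is unnormalised rather than a true probability.

```python
import math
import re

# Hypothetical pronoun list; a real system would use a fuller one.
PRONOUNS = {"he", "she", "it", "they", "them", "him", "her",
            "this", "that", "these", "those"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def reply_score(candidate, earlier, gap,
                decay=0.5, pronoun_boost=1.5, match_boost=0.5):
    """Unnormalised score that `candidate` replies to `earlier`.

    `gap` is how many messages back `earlier` is (1 = immediately before).
    All weights are illustrative assumptions, not tuned values.
    """
    cand_tokens = tokenize(candidate)
    earlier_tokens = set(tokenize(earlier))

    # Message gap: score decays exponentially with distance.
    score = math.exp(-decay * (gap - 1))

    # A pronoun suggests the message refers back to something.
    if any(tok in PRONOUNS for tok in cand_tokens):
        score *= pronoun_boost

    # Direct word matches raise the score (no stopword removal here,
    # in line with the "Word Similarity - No Removal" note).
    shared = set(cand_tokens) & earlier_tokens
    score *= 1.0 + match_boost * len(shared)
    return score


def most_likely_parent(candidate, history):
    """Return the index in `history` (oldest first) with the highest score."""
    best_idx, best_score = None, float("-inf")
    for idx, message in enumerate(history):
        gap = len(history) - idx
        score = reply_score(candidate, message, gap)
        if score > best_score:
            best_idx, best_score = idx, score
    return best_idx


history = ["Is the bus running today?", "Anyone seen my keys?", "lol"]
parent = most_likely_parent("Yes the bus is running", history)
print(parent, history[parent])
```

In this example the word overlap with the first message outweighs the gap penalty, so the oldest message is chosen as the parent. Because the pronoun boost applies to every candidate equally, it only matters when the score is compared against a threshold for "is this a reply at all".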