Human Suggestions: Semantic Similarity Classification #2

Open
wants to merge 1 commit into
base: main

@navvye navvye commented Apr 5, 2024

Classified data on the basis of various semantic similarity tools.

@rkrishnasanka
Contributor

Here's my feedback @navvye

  • Ignore all the Telugu / non-English words (remove them manually from the Excel file).
  • In all your approaches you are loading the entire set of unique values. Group them per unique column name instead, because you only need to sort among the values within a column; the options will only be one of those.
  • Use a combination of approaches.
  • You can use the Levenshtein distance to group the entries that are just spelling errors / variations, which gives you some preliminary sets.
  • After that, you can use the other word2vec-based versions to keep adding things to and removing things from those sets.
  • I'm curious to see how k-means will work when you group similar words together.
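
A minimal sketch of the first two steps above (group the unique values per column, then cluster spelling variants by Levenshtein distance) might look like the following. The function names, the `(column, value)` row format, and the `max_ratio` threshold of 0.25 are all assumptions made for illustration; none of them come from the PR, and the threshold in particular would need tuning on the real data.

```python
from collections import defaultdict


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string as the DP row
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so 'Lari Driver' == 'Lari driver'."""
    return " ".join(value.lower().split())


def group_spelling_variants(values, max_ratio=0.25):
    """Greedy grouping: a value joins the first group whose first member is
    within max_ratio (distance / longer length); otherwise it starts a group.
    max_ratio is a hypothetical starting point, not a tuned value."""
    groups = []
    for value in values:
        key = normalize(value)
        for group in groups:
            rep = normalize(group[0])
            longest = max(len(key), len(rep)) or 1
            if levenshtein(key, rep) / longest <= max_ratio:
                group.append(value)
                break
        else:
            groups.append([value])
    return groups


def group_per_column(rows, max_ratio=0.25):
    """rows is an iterable of (column_name, value) pairs (assumed format).
    Values are de-duplicated and grouped separately for each column."""
    unique = defaultdict(list)
    for column, value in rows:
        if value not in unique[column]:
            unique[column].append(value)
    return {column: group_spelling_variants(values, max_ratio)
            for column, values in unique.items()}
```

This greedy pass only catches near-identical spellings. Groups that depend on meaning rather than spelling would still need the embedding-based step described in the later bullets.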

Just to clarify, here is an example:

Tailoring
Driver
Ammavadi
Lari Driver
Travels
Contract worker
Bullero (auto)
Dairy
Millets&bullero
Chakali ( battalu wash)
Tractor
Lari
Tailor
Auto
Cloths Iron
Track tar
Lari driver
RMP Docter
Tracter
NREGS
Chepalu pattadam
24000

This will split into the following groups:

(Tailoring, Tailor)
Driver
Ammavadi
Travels
Contract worker
(Bullero (auto), Millets&bullero)
Dairy
(Chakali ( battalu wash), Cloths Iron)
(Lari, Lari driver, Lari Driver)
Auto
(Track tar, Tracter, Tractor)
RMP Docter
NREGS
Chepalu pattadam
24000
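
One possible way to use this example is as a small fixture for comparing approaches: score whatever grouping a method produces against the expected groups above, by counting the pairs of values that end up in the same group. The sketch below is only an illustration under that assumption; `co_clustered_pairs` and `pairwise_scores` are hypothetical helper names, and pairwise precision/recall is one evaluation choice among several.

```python
from itertools import combinations

# The expected grouping from the example above, copied as-is.
EXPECTED_GROUPS = [
    ["Tailoring", "Tailor"],
    ["Driver"],
    ["Ammavadi"],
    ["Travels"],
    ["Contract worker"],
    ["Bullero (auto)", "Millets&bullero"],
    ["Dairy"],
    ["Chakali ( battalu wash)", "Cloths Iron"],
    ["Lari", "Lari driver", "Lari Driver"],
    ["Auto"],
    ["Track tar", "Tracter", "Tractor"],
    ["RMP Docter"],
    ["NREGS"],
    ["Chepalu pattadam"],
    ["24000"],
]


def co_clustered_pairs(groups):
    """Return the set of unordered value pairs that share a group."""
    pairs = set()
    for group in groups:
        # Sorting gives each pair one canonical order.
        for a, b in combinations(sorted(set(group)), 2):
            pairs.add((a, b))
    return pairs


def pairwise_scores(predicted, expected):
    """Pairwise precision and recall of a predicted grouping.

    precision: share of predicted pairs that are also expected pairs
    recall:    share of expected pairs that were recovered
    """
    pred = co_clustered_pairs(predicted)
    exp = co_clustered_pairs(expected)
    hits = len(pred & exp)
    precision = hits / len(pred) if pred else 1.0
    recall = hits / len(exp) if exp else 1.0
    return precision, recall
```

A method that leaves every value in its own group would score perfect precision but zero recall here, so both numbers are worth looking at together.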
