My bachelor project at Hochschule Merseburg, written in Python, using Natural Language Processing and Machine Learning.
My bachelor thesis: ***Classification of advertisements by means of supervised learning methods***
- Learn about NLP
- Scrape data
- Try NLTK / spaCy on the datasets (a minimal classification sketch follows this list)
- Learn more about hierarchical clustering algorithms / neural networks / other NLP methods such as topic modelling, Word2Vec and so on
- Code the thesis project
- Write the thesis itself
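
Since the thesis is about classifying advertisement texts with supervised learning, here is a minimal sketch of such a pipeline, assuming scikit-learn is available; the example ads and category labels are invented for illustration and are not the thesis dataset.

```python
# Minimal sketch of supervised ad-text classification (assumes scikit-learn;
# the ads and category labels below are invented for illustration only).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Tiny toy dataset: advertisement texts and their categories.
ads = [
    "2005 Toyota Corolla, low mileage, one owner",
    "Car phone holder and dashboard camera, brand new",
    "Golden retriever puppies ready for a new home",
    "Cat scratching post, barely used",
]
labels = ["automotive", "automotive", "pets", "pets"]

# TF-IDF features + a linear SVM: a common supervised baseline for text.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(ads, labels)

# Predict the category of an unseen advertisement.
print(model.predict(["Spark plugs and engine oil filter for sale"]))
```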
- Data
  - Scraping data from the web using Scrapy with a Google user agent or proxies. I initially scraped Amazon through proxies, but because of lag and the proxies getting cut off I decided to use a user agent and time.sleep() instead (a minimal spider sketch with this setup appears below the commit-message example).
- ML
  - Code implementation
Commit messages follow the pattern `<one of the two branches above>: <subproject>: <message>`, except for changes to README.md.
Example:
- Data: amazon: added new spider
- README.md: update
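
A minimal sketch of the Scrapy setup described under Data above: a plain user agent plus a request delay instead of proxies. The spider name, start URL, and CSS selector are hypothetical placeholders, not the ones used in the actual spiders.

```python
# Minimal Scrapy spider sketch: custom user agent + polite delay instead of
# proxies. Spider name, start URL, and selector are hypothetical placeholders.
import scrapy


class AmazonSpider(scrapy.Spider):
    name = "amazon_example"
    start_urls = ["https://www.amazon.com/s?k=car+electronics"]  # placeholder query

    custom_settings = {
        # Identify as a regular browser instead of routing through a proxy.
        "USER_AGENT": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
        # Built-in alternative to calling time.sleep() between requests.
        "DOWNLOAD_DELAY": 3,
    }

    def parse(self, response):
        # Placeholder selector: extract product titles from the result page.
        for title in response.css("span.a-text-normal::text").getall():
            yield {"title": title}
```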
- obszone
  - had problems downloading American products for sale, so I had to use a little trick with the URL
- geebo
- adlandpro
- pennysaverusa
- hoobly
- oodle
- gumtree
- letgo
- salespider
- ebay
- amazon
When entering a department on Amazon you can either scrape up to 400 pages of general products from that department, or go into its Featured Categories and scrape more specific products.
For instance: 400 pages of the Automotive department, or Car Care, Car Electronics, and so on.
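
A sketch of how those two crawling strategies could look in a spider: either follow the broad department listing's pagination up to the 400-page limit, or seed the crawl with Featured Category URLs instead. All URLs and selectors are hypothetical placeholders.

```python
# Sketch of the two crawling strategies: paginate a whole department (Amazon
# stops listing results after roughly 400 pages) or start from narrower
# Featured Category URLs. All URLs and selectors are hypothetical placeholders.
import scrapy


class DepartmentSpider(scrapy.Spider):
    name = "department_example"
    max_pages = 400  # Amazon's cap on result pages per listing

    # Strategy A: seed with the broad department listing.
    # Strategy B: seed with narrower Featured Category listings instead.
    start_urls = [
        "https://www.amazon.com/s?i=automotive&page=1",   # whole department
        # "https://www.amazon.com/s?i=car-care&page=1",   # featured category
    ]

    def parse(self, response):
        # Placeholder selector for product titles on a result page.
        for title in response.css("span.a-text-normal::text").getall():
            yield {"title": title}

        # Strategy A: keep following pagination until the 400-page cap.
        page = response.meta.get("page", 1)
        if page < self.max_pages:
            next_url = f"https://www.amazon.com/s?i=automotive&page={page + 1}"
            yield scrapy.Request(next_url, callback=self.parse,
                                 meta={"page": page + 1})
```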