Website-Classification

In this classifier, machine-learning techniques are used to classify a website into one of the categories below in real time, without the webpage having to be downloaded beforehand. The URL of the website is used to extract its contents, and text-processing libraries available in Python are used to lemmatize the words. A Bag-of-Words model is then applied to turn the website's text into a feature vector, which is fed into a Support Vector Machine (SVM) that assigns the website to one of the following categories:

Adult
Arts
Business
Computers
Games
Health
Home
Kids
News
Recreation
Reference
Science
Shopping
Society
Sports
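The feature-extraction and classification steps described above can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the repository's code: the SVM implementation the project uses is not shown here, so the sketch substitutes a linear SVM trained with the Pegasos subgradient method (hinge loss plus L2 regularization), one classifier per category in a one-vs-rest scheme. Lemmatization is skipped and words are only lowercased; a library lemmatizer (for example NLTK's `WordNetLemmatizer`) could be slotted into `tokenize`. The names `WebsiteClassifier`, `train_binary_svm`, `bag_of_words`, and the `lam`/`epochs` values are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens (no lemmatization in this sketch)."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(texts):
    """Map every distinct training token to a column index of the feature vector."""
    words = sorted({token for text in texts for token in tokenize(text)})
    return {word: index for index, word in enumerate(words)}


def bag_of_words(text, vocabulary):
    """Word-count vector over the vocabulary, plus a trailing constant 1.0 bias feature."""
    vector = [0.0] * (len(vocabulary) + 1)
    for token, count in Counter(tokenize(text)).items():
        if token in vocabulary:  # out-of-vocabulary words are ignored
            vector[vocabulary[token]] = float(count)
    vector[-1] = 1.0
    return vector


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_binary_svm(vectors, labels, lam=0.1, epochs=200):
    """Linear SVM via Pegasos; samples are visited in a fixed order (assumption)."""
    w = [0.0] * len(vectors[0])
    t = 0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            t += 1
            eta = 1.0 / (lam * t)
            violated = y * dot(w, x) < 1.0  # hinge-loss margin check with the current w
            w = [(1.0 - eta * lam) * wi for wi in w]  # regularization shrink step
            if violated:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


class WebsiteClassifier:
    """Hypothetical one-vs-rest linear SVMs over Bag-of-Words features."""

    def fit(self, texts, categories):
        self.vocabulary = build_vocabulary(texts)
        vectors = [bag_of_words(text, self.vocabulary) for text in texts]
        self.weights = {}
        for category in sorted(set(categories)):
            labels = [1.0 if c == category else -1.0 for c in categories]
            self.weights[category] = train_binary_svm(vectors, labels)
        return self

    def predict(self, text):
        """Return the category whose SVM gives the highest decision value."""
        x = bag_of_words(text, self.vocabulary)
        return max(self.weights, key=lambda c: dot(self.weights[c], x))
```

For instance, after `WebsiteClassifier().fit(texts, categories)` on a handful of labeled keyword/description strings, `predict("team wins football goal")` returns whichever category's SVM scores highest. One-vs-rest keeps each SVM binary, which is why a separate weight vector is trained per category.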

Dataset

The DMOZ dataset has been used for training. It contains lists of URLs for each category. A web crawler visits these URLs to collect the keywords and the description of each website.
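A crawler step like the one described above might look like the sketch below. It assumes (this is not confirmed by the repository) that "keywords" and "description" mean the HTML `<meta name="keywords">` and `<meta name="description">` tags, and it uses only Python's standard library. `MetaTagParser`, `extract_keywords_and_description`, `crawl`, and the User-Agent string are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class MetaTagParser(HTMLParser):
    """Collects the content of <meta name="keywords"> and <meta name="description">."""

    def __init__(self):
        super().__init__()
        self.meta = {"keywords": "", "description": ""}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":  # html.parser already lowercases tag names
            return
        attributes = {name: (value or "") for name, value in attrs}
        name = attributes.get("name", "").lower()
        if name in self.meta:
            self.meta[name] = attributes.get("content", "").strip()


def extract_keywords_and_description(html):
    """Parse an HTML string and return its meta keywords and description."""
    parser = MetaTagParser()
    parser.feed(html)
    parser.close()
    return parser.meta


def crawl(url, timeout=10):
    """Hypothetical helper: fetch one URL and extract its meta keywords/description."""
    request = Request(url, headers={"User-Agent": "website-classifier-sketch"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return extract_keywords_and_description(html)
```

Reading only the `<head>` metadata keeps each training sample short; the keywords and description can then be joined into one string and passed to the Bag-of-Words step.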

ContentHolmes/Website-Classifier-Native

Folders and files

Latest commit

History

Repository files navigation

Website-Classification

Dataset

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages