Skip to content

A website classifier that classifies a website into categories depending upon its contents.

Notifications You must be signed in to change notification settings

ContentHolmes/Website-Classifier-Native

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 

Repository files navigation

Website-Classification

In the given classifier, Machine-Learning techniques have been used to classify a website into one of the given categories in Real Time without even the need to download the webpage. The URL of the website is used to extract its contents. Text processing libraries available in python are used to lemmatize the words. The text classification technique using Bag-of-Words model is applied to extract the feature vector from the text in the website. This feature vector is then fed into an SVM which accordingly classifies the website into one of the following categories:

  • Adult
  • Arts
  • Business
  • Computers
  • Games
  • Health
  • Home
  • Kids
  • News
  • Recreation
  • Reference
  • Science
  • Shopping
  • Society
  • Sports

Dataset

The DMOZ dataset has been used for the training purpose. The dataset contains the URLs for each category. A web crawler is used on these URLs to get the keywords and the description of the websites.

About

A website classifier that classifies a website into categories depending upon its contents.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages