This application crawls and scrapes data from university websites in Singapore, parses their content, and consolidates the information into an Elasticsearch database for searching. Document clustering may then be performed so that users can discover similar webpages across university websites.
This project aims to assist prospective university students by presenting course information in a consolidated form that makes comparisons easier.
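As a rough illustration of the consolidation step, the sketch below turns parsed pages into the newline-delimited JSON body accepted by Elasticsearch's `_bulk` endpoint. It is not taken from this project's code: the `ParsedPage` shape, its field names, the `toBulkBody` helper, the `webpages` index name, and the example URL are all assumptions made for the example.

```typescript
// Hypothetical shape of one parsed page; the field names are assumptions,
// not the schema used by this project.
interface ParsedPage {
  url: string;
  university: string;
  title: string;
  content: string;
}

// Builds a newline-delimited JSON body for Elasticsearch's _bulk endpoint:
// one action line followed by one document line per page.
function toBulkBody(indexName: string, pages: ParsedPage[]): string {
  const lines: string[] = [];
  for (const page of pages) {
    // Using the URL as the document ID means that re-indexing a page
    // replaces the earlier copy instead of creating a duplicate.
    lines.push(JSON.stringify({ index: { _index: indexName, _id: page.url } }));
    lines.push(JSON.stringify(page));
  }
  // The bulk API requires every line, including the last, to end with a newline.
  return lines.map((line) => line + "\n").join("");
}

// Example usage with a made-up page.
const body = toBulkBody("webpages", [
  {
    url: "https://example.edu/courses/computer-science",
    university: "Example University",
    title: "Computer Science",
    content: "Course overview text extracted by the parser.",
  },
]);
console.log(body);
```

The resulting string could be sent as the body of a `POST /_bulk` request with the `Content-Type: application/x-ndjson` header. Keying documents by URL is one possible design choice; it keeps repeated crawls of the same page from piling up duplicate entries.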
Made for NUS Orbital 2024.
Can be found at https://docs.google.com/document/d/1WzcwicQI4hg8aESSdqEEhPX5unTr_2-EmTcJpDUixUE/edit?usp=sharing
- Clone the repository using
    git clone https://github.com/C5hives/tertiary-trekker
- From the project directory, navigate to the webpage-scraper folder.
- Install the required dependencies using
    npm install
- Compile the TypeScript project files using the tsc command.
- Run the compiled JavaScript files using
    npm start
- Follow the same steps as for the Web Crawler.
- From the project directory, navigate to the webpage-parser folder.
- Install the required dependencies using
    mvn install
- Run the unit tests using
    mvn clean test
Check external README