Project #2 for Stat 517 at the University of Idaho.
In problem #0, a job's salary is predicted from several categorical and text variables. The problem is based on a project for Data Mining at the University of Southampton. The aim is to predict a job's salary with statistical learning, using the information in a job advert. The main dataset consists of roughly 250k rows, each representing an individual job ad, along with a set of variables describing each ad.
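As a rough illustration of how categorical and text variables can be turned into model inputs, the sketch below one-hot encodes categorical fields and counts words in text fields. It is not the code used in this project: the function `encode_ad` and the field names `title`, `category`, and `location` are hypothetical and chosen only for the example.

```python
import re
from collections import Counter


def encode_ad(ad, categorical_fields, text_fields):
    """Turn one job ad (a dict) into a sparse feature dict.

    Categorical fields become one-hot indicators ("field=value": 1).
    Text fields become bag-of-words counts ("field:token": count).
    Field names are hypothetical; adapt them to the real columns.
    """
    features = Counter()
    for field in categorical_fields:
        value = ad.get(field)
        if value is not None:
            features[f"{field}={value}"] = 1
    for field in text_fields:
        text = ad.get(field) or ""
        # Lowercase and split on anything that is not a letter or digit.
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            features[f"{field}:{token}"] += 1
    return dict(features)


# Toy ad with made-up values, not taken from the dataset.
example = encode_ad(
    {"title": "Senior data analyst, data team", "category": "IT"},
    categorical_fields=["category", "location"],
    text_fields=["title"],
)
print(example)
```

Feature dicts of this shape can then be passed to any regression method that accepts sparse inputs; missing fields (here `location`) simply produce no feature.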
In problem #1, unsupervised learning methods are used: PCA, cluster analysis, association, and seriation are performed. The goal of this problem is to determine whether the happiness rankings given by the World Happiness Report are justified.
In problem #2, unsupervised learning methods are used to investigate groupings of individuals in a large (1077 rows by 2712 columns), sparse genomic dataset. The groupings found by these methods are then compared to the haplotype given in the dataset.
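One common way to compare cluster assignments against known labels such as a haplotype is with purity and the adjusted Rand index. The sketch below is a minimal pure-Python version under that assumption; it is not necessarily the comparison used in this project, and the toy labels are made up.

```python
from collections import Counter
from math import comb


def purity(clusters, labels):
    """Fraction of samples that carry the majority label of their cluster."""
    table = Counter(zip(clusters, labels))
    best = {}
    for (cluster, _), count in table.items():
        best[cluster] = max(best.get(cluster, 0), count)
    return sum(best.values()) / len(labels)


def adjusted_rand_index(clusters, labels):
    """Chance-corrected agreement between two partitions (1.0 = identical).

    Assumes both sequences have the same length and at least two samples.
    """
    table = Counter(zip(clusters, labels))
    sum_cells = sum(comb(v, 2) for v in table.values())
    sum_rows = sum(comb(v, 2) for v in Counter(clusters).values())
    sum_cols = sum(comb(v, 2) for v in Counter(labels).values())
    expected = sum_rows * sum_cols / comb(len(labels), 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case, e.g. every sample in a single group on both sides.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Toy example: cluster ids versus made-up group labels.
found = [0, 0, 1, 1, 2, 2]
known = ["A", "A", "B", "B", "C", "C"]
print(purity(found, known), adjusted_rand_index(found, known))
```

Purity is easy to read but rewards having many small clusters, so the adjusted Rand index, which corrects for chance agreement, is a useful second check.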
In problem #3, unsupervised learning methods are used to perform cluster analysis on the Bible. The clusters found by these methods are then compared to the structure of the Bible. The comparison covers the New Testament versus the Old Testament as well as the 7 sections of the Bible. The Testament.html file can be downloaded and then viewed (it takes a while to load and is best viewed in Chrome).