The purpose of this project is to develop a model for the Sale Price of a home in Ames, Iowa. We will use the other variables in the data set to help us predict housing prices.
The dataset used in the analysis can be downloaded from CRAN.
In the analysis, we will touch on concepts such as exploratory data analysis, data preprocessing, model selection, and model diagnostics.
The analysis was done in R; you will need the following packages to run the code.
1.) MASS
2.) ggplot2
3.) Sleuth2
# Install (once) and load the required packages:
install.packages(c("MASS", "ggplot2", "Sleuth2"))
library(MASS)
library(ggplot2)
library(Sleuth2)
There are a lot of variables in this data set. One thing I always like to do is look at the structure and summary of the data. Doing this lets me see how many missing (NA) values are in the data set and the data types I will be working with.
# Execute Summary and Structure of Data:
summary(data)
str(data)
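Beyond `summary()` and `str()`, a direct way to count the missing values per column is `colSums(is.na(...))`. A minimal sketch on a made-up data frame (`demo` is not part of the Ames data; with the real data you would pass the Ames data frame instead):

```r
# Count missing (NA) values per column.
# `demo` is a tiny made-up data frame standing in for the Ames data.
demo <- data.frame(x = c(1, NA, 3), y = c("a", "b", NA))

na.counts <- colSums(is.na(demo))  # one NA count per column
print(na.counts)
```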
It's good practice to plot each of our independent variables against our dependent variable SalePrice so we can see whether there is any correlation between the two. This can also help us eliminate variables right away if we see no correlation at all.
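As a quick screen before plotting, we can also compute the correlation of each numeric predictor with SalePrice. The sketch below uses simulated stand-in data; the column names `Gr.Liv.Area` and `Yr.Sold` merely mimic the Ames naming conventions:

```r
# Simulated stand-in for the Ames data (column names are assumptions).
set.seed(1)
demo <- data.frame(
  Gr.Liv.Area = runif(200, 500, 3000),                 # strong predictor
  Yr.Sold     = sample(2006:2010, 200, replace = TRUE) # pure noise here
)
demo$SalePrice <- 60 * demo$Gr.Liv.Area + rnorm(200, sd = 10000)

# Correlation of every predictor with SalePrice
cors <- sapply(setdiff(names(demo), "SalePrice"),
               function(v) cor(demo[[v]], demo$SalePrice))
print(round(cors, 2))

# Scatter plot of the strongest candidate against SalePrice
plot(demo$Gr.Liv.Area, demo$SalePrice,
     xlab = "Gr.Liv.Area", ylab = "SalePrice")
```

A predictor with near-zero correlation (like `Yr.Sold` in this simulation) is a candidate for early removal.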
We want to split our data into train and test sets; for more information on this, please refer to Train/Test_Split.
### Split Training Set 80/20
set.seed(123)  # assumed seed so the split is reproducible
train <- sample(2258, 1800)  # 1800 of 2258 rows (~80%) for training
test <- c(1:2258)[-train]    # remaining 458 rows for testing
There are many different strategies one can utilize when trying to determine the best predictors for our dependent variable SalePrice. You could use:
1.) forward stepwise regression
2.) best subset
3.) backwards elimination
and many more.
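As a rough illustration of one of these strategies, backward elimination can be sketched as a loop that refits the model and drops the least-significant predictor. The data and variable names below are simulated for illustration, not the Ames columns:

```r
# Sketch of backward elimination by p-value on simulated data:
# refit, and drop the least-significant predictor while any
# coefficient p-value exceeds 0.05.
set.seed(10)
d <- data.frame(x1 = rnorm(300), x2 = rnorm(300), x3 = rnorm(300))
d$y <- 2 * d$x1 + 0.5 * d$x2 + rnorm(300)  # x3 carries no signal

predictors <- c("x1", "x2", "x3")
repeat {
  fit <- lm(reformulate(predictors, "y"), data = d)
  cm <- summary(fit)$coefficients
  pvals <- cm[rownames(cm) != "(Intercept)", "Pr(>|t|)"]
  if (all(pvals <= 0.05)) break
  # drop the predictor with the largest p-value and refit
  predictors <- setdiff(predictors, names(which.max(pvals)))
}
print(predictors)  # the noise variable is typically eliminated
```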
For this specific demonstration, I'll look at the p-value for each coefficient. If the p-value is greater than 0.05, I will remove the variable from the model and rerun it, repeating until all remaining variables are statistically significant. After executing this process, my final model with continuous variables only is the following:
Some diagnostic plots and checks we can look at are the residuals vs. fitted plot, normality checks on the model, and the Shapiro-Wilk test. For this project, I will not go into the breakdown of each of these diagnostic checks, but I will produce a future project going more in depth on this topic.
For now, I will say that we want the variance in our residuals vs. fitted plot to be constant. We can see here that the variance is constantly changing. One way to try to fix this is to use a Box-Cox transformation on our data.
# Box-Cox (from MASS): find a variance-stabilizing power transformation for SalePrice
boxcox(SalePrice~Overall.Qual + Year.Built + Year.Remod.Add + BsmtFin.SF.1 + Total.Bsmt.SF + X1st.Flr.SF + Gr.Liv.Area + TotRms.AbvGrd +Garage.Yr.Blt + Wood.Deck.SF, data = num.ames)
The output below shows that our lambda value is close to zero. Therefore, we will take the log transformation of our dependent variable SalePrice.
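To see why the log helps, here is a sketch on simulated data with multiplicative errors: the raw fit's residual spread grows with the fitted values, while the log-scale fit's does not. All names and numbers below are made up for illustration:

```r
# Simulated prices with multiplicative (log-normal) errors, so the
# raw-scale fit is heteroscedastic and the log-scale fit is not.
set.seed(7)
area  <- runif(300, 500, 3000)
price <- 2000 * exp(6e-4 * area + rnorm(300, sd = 0.2))

fit.raw <- lm(price ~ area)       # variance grows with fitted values
fit.log <- lm(log(price) ~ area)  # variance roughly constant

# Compare how strongly residual magnitude tracks the fitted values
cor(abs(resid(fit.raw)), fitted(fit.raw))  # noticeably positive
cor(abs(resid(fit.log)), fitted(fit.log))  # near zero
```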
While we can still see clusters of data points in some portions of the output, we can see that the variance of our model looks much better after taking the log transformation.
Using the anova function in R, I will add one categorical variable at a time to the numeric-only model until all of the variables remaining in my model are statistically significant.
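This comparison can be sketched as a nested-model F-test: fit the numeric-only model, add one categorical variable, and pass both fits to `anova()`. The data below are simulated, and `neigh` is a made-up three-level factor rather than a real Ames column:

```r
# Simulated log-price data where a categorical variable has a real effect.
set.seed(3)
d <- data.frame(
  area  = runif(300, 500, 3000),
  neigh = factor(sample(c("A", "B", "C"), 300, replace = TRUE))
)
shift <- c(A = 0, B = 0.3, C = -0.2)  # per-level effect (made up)
d$logprice <- 11 + 5e-4 * d$area + shift[as.character(d$neigh)] +
  rnorm(300, sd = 0.1)

fit.num <- lm(logprice ~ area, data = d)          # numeric-only model
fit.cat <- lm(logprice ~ area + neigh, data = d)  # + categorical term

# F-test: does adding neigh significantly improve the fit?
anova(fit.num, fit.cat)
```

A small p-value in the `Pr(>F)` column means the categorical variable stays in the model; otherwise it is dropped and the next candidate is tried.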
After running the anova function, the following is my final model and predictors:
The variance in our residuals vs fitted plot looks consistent in our final model.
We can see some skewness in our normal Q-Q plot, but overall our model looks good when testing normality.
The final model yields the following lower and upper bounds for the housing price prediction:
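Interval predictions like these come from `predict()` with `interval = "prediction"`; since the final model is on the log scale, the bounds are exponentiated back to dollars. A sketch on simulated data (the real model uses the Ames predictors listed above):

```r
# Fit a log-scale model on simulated data, then produce a 95% prediction
# interval for a new home and convert it back to the price scale.
set.seed(5)
area     <- runif(300, 500, 3000)
logprice <- 11 + 5e-4 * area + rnorm(300, sd = 0.15)
fit <- lm(logprice ~ area)

new.home <- data.frame(area = 1500)  # hypothetical new observation
ci <- predict(fit, newdata = new.home,
              interval = "prediction", level = 0.95)
exp(ci)  # lower and upper bounds in dollars
```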