Homework completed for the data science course I took in Fall 2016 at Harvard SEAS.

zhangly811/Data-Science-AC209

Data-Science-AC209

AC209: Data Science is a course offered at Harvard SEAS. I completed eight coding assignments and a final group project for this course in Fall 2016.

What did I learn from this course?

The course focuses on analyzing messy, real-life data to make predictions using statistical and machine learning methods. The material integrates the five key facets of an investigation using data:

(1) data collection: data wrangling, cleaning, and sampling to get a suitable data set;
(2) data management: accessing data quickly and reliably;
(3) exploratory data analysis: generating hypotheses and building intuition;
(4) prediction or statistical learning;
(5) communication: summarizing results through visualization, stories, and interpretable summaries.

Skillset:

Python packages:

NumPy, pandas, SciPy, scikit-learn, Matplotlib, BeautifulSoup

Models:

  1. Linear regression
  2. Linear regression with regularization (Ridge and Lasso)
  3. Logistic regression
  4. Multinomial logistic regression
  5. LDA and QDA
  6. KNN
  7. Random forest
  8. Bagging and boosting
  9. SVM

Model building skills:

  1. dimension reduction
  2. variable selection
  3. parameter tuning
  4. bootstrapping and cross-validation
  5. model evaluation
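
As a rough illustration of item 4, the sketch below shows bootstrap resampling and k-fold splitting using only the Python standard library. It is not taken from the course assignments: the function names `bootstrap_means` and `k_fold_indices`, the sample values, and the percentile interval are all assumptions made for the example. In practice scikit-learn provides equivalents.

```python
import random
import statistics


def bootstrap_means(data, n_resamples=1000, seed=0):
    """Return the mean of each bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(data)
    return [
        statistics.fmean(rng.choices(data, k=n))
        for _ in range(n_resamples)
    ]


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs that partition range(n_samples) into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Strided slices give k disjoint folds that together cover every index.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train_idx, test_idx


# Hypothetical sample, made up for illustration only.
sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]

# Percentile-based 95% interval for the mean.
means = sorted(bootstrap_means(sample))
ci_low = means[int(0.025 * len(means))]
ci_high = means[int(0.975 * len(means))]

# Each index appears in exactly one test fold.
splits = list(k_fold_indices(len(sample), k=4))
```

Fixing the seed keeps the resamples and folds reproducible, which makes it easier to compare models evaluated on the same splits.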

Data wrangling and cleaning:

  1. pulling data out of HTML and XML files
  2. handling imbalanced data
  3. handling missing data
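
Items 1 and 3 can be sketched together: extract a table from HTML, then fill a missing value. This example deliberately uses the standard library's `html.parser` instead of BeautifulSoup so that it runs without extra installs. The `TableParser` class, the sample markup, and the mean-imputation choice are assumptions for illustration, not the approach used in the assignments.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical markup with one empty (missing) score.
page = """
<table>
  <tr><th>name</th><th>score</th></tr>
  <tr><td>a</td><td>3.0</td></tr>
  <tr><td>b</td><td></td></tr>
  <tr><td>c</td><td>5.0</td></tr>
</table>
"""

parser = TableParser()
parser.feed(page)
header, *records = parser.rows

# Treat empty cells as missing, then fill them with the mean of the observed values.
scores = [float(r[1]) if r[1] else None for r in records]
observed = [s for s in scores if s is not None]
mean_score = sum(observed) / len(observed)
imputed = [mean_score if s is None else s for s in scores]
```

Mean imputation is only one option. Whether it is appropriate depends on why the values are missing.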
