This repo is a Python and R extension of the work from two Python notebooks in the Predict repository (some of the code in those notebooks may be obsolete, but they are still a good read):
We illustrate the use of Data Science tools, specifically focusing on exploratory data analysis and predictive modeling, using a small toy data set with oil production data.
Data science tools for petroleum exploration and production
- For pre-talk abstract see here
- For slides see /Talk slides/ above
- Annotated slides coming soon on Speaker Deck
Matteo Niccoli, see /Python/ above.
Blog posts:
- Visual data exploration in Python: correlation, confidence, spuriousness
- Data exploration in Python: distance correlation and variable clustering
- Variable selection in Python, Part I
- Understanding and using confidence interval for the correlation coefficient
- Variable selection in Python, Part II
- Regression model in Python
- Click on the Binder button below, browse to the /Python/notebooks directory, and open one of the notebooks
- Data loading, visualization, significance testing
- NEW: Confidence interval for the correlation coefficient (interactive)
- NEW: OLS regression confidence interval and prediction interval
- Distance correlation and clustering
- Variable selection with distance correlation
- Variable selection with Variance Inflation Factor
- Variable selection with Sequential Feature Selection
- Variable selection with Random Forest drop-column variable importance AND variable dependence
- Variable selection with Linear regression and permutation importance
- Variable selection with SHAP values and linear regression
- Variable selection with conditional statistics
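
To give a flavor of the confidence-interval topic in the notebook list above, here is a minimal sketch of one common approach: an approximate confidence interval for the Pearson correlation coefficient via the Fisher z-transform. It is not taken from the notebooks; the function name `pearson_confidence_interval` and the synthetic arrays `x_demo` and `y_demo` are assumptions made up for illustration, and the notebooks may well use a different method or the actual production data.

```python
import numpy as np
from scipy import stats


def pearson_confidence_interval(x, y, confidence=0.95):
    """Return (r, lower, upper) for the Pearson correlation of x and y.

    Hypothetical helper: uses the Fisher z-transform, which assumes
    roughly bivariate-normal data and is only approximate for small n.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    if n != y.size:
        raise ValueError("x and y must have the same length")
    if n < 4:
        raise ValueError("need at least 4 samples")

    r, _ = stats.pearsonr(x, y)
    z = np.arctanh(r)                    # Fisher z-transform of r
    se = 1.0 / np.sqrt(n - 3)            # approximate standard error of z
    z_crit = stats.norm.ppf(0.5 + confidence / 2.0)
    lower = np.tanh(z - z_crit * se)     # back-transform to the r scale
    upper = np.tanh(z + z_crit * se)
    return float(r), float(lower), float(upper)


# Synthetic stand-in data (NOT the repo's data set), small on purpose
rng = np.random.default_rng(0)
x_demo = rng.normal(size=21)
y_demo = 0.7 * x_demo + rng.normal(scale=0.5, size=21)

r, lower, upper = pearson_confidence_interval(x_demo, y_demo)
print(f"r = {r:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

With a small sample the interval is typically wide, which is the practical point: a seemingly strong correlation can still be compatible with a much weaker one.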
Thomas Speidel, see /R/ above.