diff --git a/Projects-ML/Reg-models/Project-2_Advertising.ipynb b/Projects-ML/Reg-models/Project-2_Advertising.ipynb index 95b39db..79b9640 100644 --- a/Projects-ML/Reg-models/Project-2_Advertising.ipynb +++ b/Projects-ML/Reg-models/Project-2_Advertising.ipynb @@ -1648,11 +1648,818 @@ ] }, { + "attachments": { + "image.png": { + "image/png": "" + } + }, "cell_type": "markdown", "metadata": {}, "source": [ "## K-Fold Cross-Validation\n", - "\n" + "K-fold cross-validation is a robust validation approach that can be used to check whether the model is\n", + "overfitting. A model that generalizes well and does not overfit should not be very sensitive to\n", + "changes in the underlying training samples. K-fold cross-validation checks this by building and validating\n", + "multiple models on different training and validation sets resampled from the original dataset.\n", + "\n", + "1. Split the training dataset into K subsets of equal size. Each subset is called a fold. Let the\n", + "folds be labelled $f_1, f_2, … , f_K$. Typically, $K$ is set to 5 or 10.\n", + "2. For i = 1 to K:\n", + " (a) Use fold $f_i$ as the validation set and the remaining K – 1 folds as the training set.\n", + " (b) Train the model on the training set and calculate its accuracy on fold $f_i$.\n", + "\n", + "Calculate the final accuracy by averaging the accuracies on the validation folds across all K models. The average is an estimate of how the model is likely to perform on unseen data. 
The variance of these accuracies is an indication of the robustness of the model.\n", + "\n", + "![image.png](attachment:image.png)" + ] + }, + { + "cell_type": "code", + "execution_count": 153, + "metadata": {}, + "outputs": [], + "source": [ + "ipl_auction_df = pd.read_csv('IPL-IMB381IPL2013.csv')" + ] + }, + { + "cell_type": "code", + "execution_count": 154, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "RangeIndex: 130 entries, 0 to 129\n", + "Data columns (total 26 columns):\n", + " # Column Non-Null Count Dtype \n", + "--- ------ -------------- ----- \n", + " 0 Sl.NO. 130 non-null int64 \n", + " 1 PLAYER NAME 130 non-null object \n", + " 2 AGE 130 non-null int64 \n", + " 3 COUNTRY 130 non-null object \n", + " 4 TEAM 130 non-null object \n", + " 5 PLAYING ROLE 130 non-null object \n", + " 6 T-RUNS 130 non-null int64 \n", + " 7 T-WKTS 130 non-null int64 \n", + " 8 ODI-RUNS-S 130 non-null int64 \n", + " 9 ODI-SR-B 130 non-null float64\n", + " 10 ODI-WKTS 130 non-null int64 \n", + " 11 ODI-SR-BL 130 non-null float64\n", + " 12 CAPTAINCY EXP 130 non-null int64 \n", + " 13 RUNS-S 130 non-null int64 \n", + " 14 HS 130 non-null int64 \n", + " 15 AVE 130 non-null float64\n", + " 16 SR-B 130 non-null float64\n", + " 17 SIXERS 130 non-null int64 \n", + " 18 RUNS-C 130 non-null int64 \n", + " 19 WKTS 130 non-null int64 \n", + " 20 AVE-BL 130 non-null float64\n", + " 21 ECON 130 non-null float64\n", + " 22 SR-BL 130 non-null float64\n", + " 23 AUCTION YEAR 130 non-null int64 \n", + " 24 BASE PRICE 130 non-null int64 \n", + " 25 SOLD PRICE 130 non-null int64 \n", + "dtypes: float64(7), int64(15), object(4)\n", + "memory usage: 26.5+ KB\n" + ] + } + ], + "source": [ + "ipl_auction_df.info()" + ] + }, + { + "cell_type": "code", + "execution_count": 155, + "metadata": {}, + "outputs": [], + "source": [ + "X_features = [\"AGE\", \"COUNTRY\", \"PLAYING ROLE\", \"T-RUNS\", \"T-WKTS\", \"ODI-RUNS-S\", \"ODI-SR-B\", \n", 
+ " \"ODI-WKTS\", \"ODI-SR-BL\", \"CAPTAINCY EXP\", \"RUNS-S\", \"HS\", \"AVE\", \"SR-B\", \"SIXERS\",\n", + " \"RUNS-C\", \"WKTS\", \"AVE-BL\", \"ECON\", \"SR-BL\"]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Out of these, there are four categorical features that need to be encoded into dummy features using\n", + "OHE (One Hot Encoding)." + ] + }, + { + "cell_type": "code", + "execution_count": 156, + "metadata": {}, + "outputs": [], + "source": [ + "# Initialize a list with the categorical feature names.\n", + "categorical_features = [\"AGE\", \"COUNTRY\", \"PLAYING ROLE\", \"CAPTAINCY EXP\"]\n", + "#get_dummies() is invoked to return the dummy features.\n", + "ipl_auction_encoded_df = pd.get_dummies( ipl_auction_df[X_features], columns = categorical_features, \n", + " drop_first = True)" + ] + }, + { + "cell_type": "code", + "execution_count": 157, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Index(['T-RUNS', 'T-WKTS', 'ODI-RUNS-S', 'ODI-SR-B', 'ODI-WKTS', 'ODI-SR-BL',\n", + " 'RUNS-S', 'HS', 'AVE', 'SR-B', 'SIXERS', 'RUNS-C', 'WKTS', 'AVE-BL',\n", + " 'ECON', 'SR-BL', 'AGE_2', 'AGE_3', 'COUNTRY_BAN', 'COUNTRY_ENG',\n", + " 'COUNTRY_IND', 'COUNTRY_NZ', 'COUNTRY_PAK', 'COUNTRY_SA', 'COUNTRY_SL',\n", + " 'COUNTRY_WI', 'COUNTRY_ZIM', 'PLAYING ROLE_Batsman',\n", + " 'PLAYING ROLE_Bowler', 'PLAYING ROLE_W. 
Keeper', 'CAPTAINCY EXP_1'],\n", + " dtype='object')" + ] + }, + "execution_count": 157, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "ipl_auction_encoded_df.columns" + ] + }, + { + "cell_type": "code", + "execution_count": 158, + "metadata": {}, + "outputs": [], + "source": [ + "X = ipl_auction_encoded_df\n", + "y = ipl_auction_df['SOLD PRICE']" + ] + }, + { + "cell_type": "code", + "execution_count": 167, + "metadata": {}, + "outputs": [], + "source": [ + "# Standardization of X and Y\n", + "from sklearn.preprocessing import StandardScaler\n", + "## Initializing the StandardScaler\n", + "scaler = StandardScaler()\n", + "## Standardize all the feature columns\n", + "X_scaled = scaler.fit_transform(X)\n", + "## Standardizing Y explicitly by subtracting the mean and dividing by the standard deviation\n", + "y = (y - y.mean()) / y.std()" + ] + }, + { + "cell_type": "code", + "execution_count": 168, + "metadata": {}, + "outputs": [], + "source": [ + "# Split the Dataset into Train and Test\n", + "X_train, X_test, y_train, y_test = train_test_split( X_scaled, y, test_size=0.2, random_state = 42)" + ] + }, + { + "cell_type": "code", + "execution_count": 169, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
LinearRegression()
" + ], + "text/plain": [ + "LinearRegression()" + ] + }, + "execution_count": 169, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Building the model\n", + "from sklearn.linear_model import LinearRegression\n", + "linreg = LinearRegression()\n", + "linreg.fit(X_train, y_train)" + ] + }, + { + "cell_type": "code", + "execution_count": 170, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([-0.43539611, -0.04632556, 0.50840867, -0.03323988, 0.2220377 ,\n", + " -0.05065703, 0.17282657, -0.49173336, 0.58571405, -0.11654753,\n", + " 0.24880095, 0.09546057, 0.16428731, 0.26400753, -0.08253341,\n", + " -0.28643889, -0.26842214, -0.21910913, -0.02622351, 0.24817898,\n", + " 0.18760332, 0.10776084, 0.04737488, 0.05191335, 0.01235245,\n", + " 0.00547115, -0.03124706, 0.08530192, 0.01790803, -0.05077454,\n", + " 0.18745577])" + ] + }, + "execution_count": 170, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "linreg.coef_" + ] + }, + { + "cell_type": "code", + "execution_count": 171, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" ], + "text/plain": [ + " columns coef\n", + "0 T-RUNS -0.435396\n", + "1 T-WKTS -0.046326\n", + "2 ODI-RUNS-S 0.508409\n", + "3 ODI-SR-B -0.033240\n", + "4 ODI-WKTS 0.222038" + ] + }, + "execution_count": 171, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Saving the coefficients as a DataFrame\n", + "columns_coef_def = pd.DataFrame({'columns': ipl_auction_encoded_df.columns, 'coef' : linreg.coef_})\n", + "columns_coef_def.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 172, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "
" ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "## Sorting the features by coefficient values in descending order\n", + "sorted_coef_vals = columns_coef_def.sort_values( 'coef', ascending=False)\n", + "plt.figure(figsize=(12,8))\n", + "sns.barplot(x= 'coef', y=\"columns\", data = sorted_coef_vals)\n", + "plt.xlabel(\"Coefficients from Linear regression\")\n", + "plt.ylabel(\"Features\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A few observations from this figure:\n", + "- AVE, ODI-RUNS-S and AVE-BL have the three largest positive coefficients, with SIXERS a close fourth. Since all features are standardized, the coefficient magnitudes are roughly comparable.\n", + "- Higher ECON, SR-B and AGE have a negative effect on SOLD PRICE.\n", + "- Interestingly, higher test runs (T-RUNS) and highest score (HS) have a negative effect on the SOLD\n", + "PRICE. Some of these counter-intuitive coefficient signs could be due to multicollinearity.\n", + "For example, we would expect SR-B (batting strike rate) to have a positive effect on the\n", + "SOLD PRICE." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Calculating the RMSE:** We can calculate the RMSE on the training and test sets to understand the model’s ability to predict SOLD PRICE." + ] + }, + { + "cell_type": "code", + "execution_count": 173, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn import metrics\n", + "\n", + "# Takes a model as a parameter. 
Prints the RMSE on the train and test sets\n", + "def get_train_test_rmse( model ):\n", + " # Predicting on training dataset\n", + "\n", + " y_train_pred = model.predict( X_train )\n", + " # Compare the actual y with predicted y in the training dataset\n", + " rmse_train = round(np.sqrt(metrics.mean_squared_error( y_train, y_train_pred)),3)\n", + "\n", + " # Predicting on test dataset\n", + " y_test_pred = model.predict( X_test )\n", + " # Compare the actual y with predicted y in the test dataset\n", + " rmse_test = round(np.sqrt(metrics.mean_squared_error( y_test, y_test_pred)),3)\n", + " print( \"train: \", rmse_train, \" test: \", rmse_test )" + ] + }, + { + "cell_type": "code", + "execution_count": 174, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "train: 0.679 test: 0.749\n" + ] + } + ], + "source": [ + "get_train_test_rmse(linreg)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The RMSE on the training set is 0.679, while it is 0.749 on the test set. A model that generalizes well\n", + "should have similar errors on the training and test sets. A large difference indicates that the model\n", + "may be overfitting the training set. The most widely used approach to deal with overfitting is regularization, which is discussed in the next section." + ] + }, + { + "attachments": { + "image.png": { + "image/png": "" + } + }, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Applying regularization:** One way to deal with overfitting is regularization. Overfitting typically shows up as inflated (very large) coefficient values. To avoid overfitting, the coefficients are constrained by penalizing large coefficient values. 
Regularization penalizes parameters that grow to large values and keeps any single feature from being weighted too heavily.\n", + "\n", + "The coefficients are penalized by adding a term based on their size to the cost function. If the coefficients\n", + "become large, the cost increases significantly, so minimizing the cost keeps the coefficient values in check. The following are two approaches for adding a penalty to the cost function:\n", + "1. **L1 Norm:** Sum of the absolute values of the coefficients. This is the penalty used by the Least Absolute\n", + "Shrinkage and Selection Operator (**LASSO**) (Tibshirani, 1996). The corresponding cost function is: \n", + "$$\\epsilon_\\text{MSE} = \\frac{1}{n}\\sum_{i=1}^{n} \\left(y_i - (\\beta_0+\\beta_1 x_{i1}+ \\dots + \\beta_p x_{ip})\\right)^2 +\\alpha \\sum_{j=1}^{p} |\\beta_j|$$\n", + "where $n$ is the number of observations, $p$ is the number of features and $\\alpha$ controls the strength of the penalty.\n", + "\n", + "2. **L2 Norm:** Sum of the squared values of the coefficients. This is called the **Ridge** term (Hoerl and Kennard, 1970). The cost function is:\n", + "$$\\epsilon_\\text{MSE} = \\frac{1}{n}\\sum_{i=1}^{n} \\left(y_i - (\\beta_0+\\beta_1 x_{i1}+ \\dots + \\beta_p x_{ip})\\right)^2 +\\alpha \\sum_{j=1}^{p} (\\beta_j)^2$$\n", + "\n", + "The ridge penalty shrinks all the coefficients toward zero and spreads the weight across the features, whereas LASSO tends\n", + "to shrink some of the coefficients exactly to zero. Features with zero coefficients can be treated as making no contribution to the model. So LASSO can also be used for feature selection: removing the features with zero coefficients reduces the number of features.\n", + "\n", + "![image.png](attachment:image.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "1. **Ridge Regression:** `sklearn.linear_model` provides `Ridge` for building linear models with an L2 penalty. 
Ridge regression takes the following parameters:\n", + "1. 
`alpha` ($\\alpha$) – float – is the regularization strength; it must be a non-negative float. Regularization improves the conditioning of the problem and reduces the variance of the estimates. Larger values of alpha imply stronger regularization.\n", + "2. `max_iter` – int – is the maximum number of iterations; it is only used by the iterative solvers (such as `sparse_cg`, `lsqr`, `sag`, `saga` and `lbfgs`)." + ] + }, + { + "cell_type": "code", + "execution_count": 175, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
Ridge(alpha=1, max_iter=500)
" ], + "text/plain": [ + "Ridge(alpha=1, max_iter=500)" + ] + }, + "execution_count": 175, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Importing Ridge Regression\n", + "from sklearn.linear_model import Ridge\n", + "# Applying alpha = 1 and running the solver for a maximum of 500 iterations\n", + "ridge = Ridge(alpha = 1, max_iter = 500)\n", + "ridge.fit( X_train, y_train )" + ] + }, + { + "cell_type": "code", + "execution_count": 176, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "train: 0.68 test: 0.724\n" + ] + } + ], + "source": [ + "get_train_test_rmse(ridge)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Because of the penalty, the difference between train and test RMSE has narrowed (0.68 vs 0.724, compared with 0.679 vs 0.749 without regularization). It can be\n", + "reduced further by applying a stronger penalty, for example alpha = 2.0." + ] + }, + { + "cell_type": "code", + "execution_count": 177, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "train: 0.682 test: 0.706\n" + ] + } + ], + "source": [ + "ridge = Ridge(alpha = 2.0, max_iter = 1000)\n", + "ridge.fit( X_train, y_train )\n", + "get_train_test_rmse( ridge )" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The gap between training and test RMSE has narrowed further. We still need to find the optimal\n", + "value of $\\alpha$. This can be done in several ways, for example by testing multiple values of $\\alpha$ and keeping the best one. Parameters that are set before training rather than learned from the data are called hyperparameters in machine learning; here $\\alpha$ is a hyperparameter." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "`sklearn.model_selection.GridSearchCV` can help search for the optimal value (this will be discussed later). For now, let us assume the optimal value of $\\alpha$ is 2.0."
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "2. **LASSO Regression:** `sklearn.linear_model` provides `Lasso` for building linear models with an L1 penalty. Two key parameters for LASSO regression are:\n", + "1. `alpha` – float – the constant that multiplies the L1 term. The default value is 1.0.\n", + "2. `max_iter` – int – the maximum number of iterations of the coordinate descent solver." + ] + }, + { + "cell_type": "code", + "execution_count": 178, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
Lasso(alpha=0.01, max_iter=500)
" ], + "text/plain": [ + "Lasso(alpha=0.01, max_iter=500)" + ] + }, + "execution_count": 178, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Importing LASSO Regression\n", + "from sklearn.linear_model import Lasso\n", + "\n", + "# Applying alpha = 0.01 and running the solver for a maximum of 500 iterations\n", + "lasso = Lasso(alpha = 0.01, max_iter = 500)\n", + "lasso.fit( X_train, y_train )" + ] + }, + { + "cell_type": "code", + "execution_count": 179, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "train: 0.688 test: 0.698\n" + ] + } + ], + "source": [ + "get_train_test_rmse(lasso)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The model shows little sign of overfitting: the difference between train RMSE (0.688) and test RMSE (0.698)\n", + "is very small. LASSO shrinks some of the coefficient values to exactly 0, which suggests that these features are not needed to explain the variance in the outcome variable." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We will store the feature names and coefficient values in a DataFrame and then filter the features with\n", + "zero coefficients." + ] + }, + { + "cell_type": "code", + "execution_count": 180, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " columns coef\n", + "0 T-RUNS -0.301242\n", + "1 T-WKTS -0.000000\n", + "2 ODI-RUNS-S 0.413059\n", + "3 ODI-SR-B -0.000000\n", + "4 ODI-WKTS 0.157779" + ] + }, + "execution_count": 180, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "## Storing the feature names and coefficient values in the DataFrame\n", + "lasso_coef_df = pd.DataFrame( { 'columns': ipl_auction_encoded_df.columns, 'coef': lasso.coef_ } )\n", + "lasso_coef_df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 181, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" ], + "text/plain": [ + " columns coef\n", + "1 T-WKTS -0.0\n", + "3 ODI-SR-B -0.0\n", + "13 AVE-BL -0.0\n", + "28 PLAYING ROLE_Bowler 0.0" + ] + }, + "execution_count": 181, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "## Selecting the features whose coefficients are exactly zero\n", + "lasso_coef_df[lasso_coef_df.coef == 0]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The LASSO regression indicates that the features listed above (T-WKTS, ODI-SR-B, AVE-BL and PLAYING ROLE_Bowler) do not influence\n", + "the predicted SOLD PRICE, as their coefficients are 0.0." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "3. **Elastic Net Regression:** ElasticNet regression combines L1 and L2 regularization to build a regression model. With $n$ observations and $p$ features, the corresponding cost function is:\n", + "\n", + "$$\\epsilon_\\text{MSE} = \\frac{1}{n}\\sum_{i=1}^{n} \\left(y_i - (\\beta_0+\\beta_1 x_{i1}+ \\dots + \\beta_p x_{ip})\\right)^2 +\\gamma \\sum_{j=1}^{p} |\\beta_j| +\\sigma \\sum_{j=1}^{p} (\\beta_j)^2$$\n", + "\n", + "When building an ElasticNet regression model, both hyperparameters $\\sigma$ (L2) and $\\gamma$ (L1) need to be set. ElasticNet takes the following two key parameters:\n", + "1. `alpha` - Constant that multiplies the penalty terms. The default value is 1.0. (alpha = $\\sigma +\\gamma$)\n", + "2. `l1_ratio`: The ElasticNet mixing parameter, with `0 <= l1_ratio <= 1`.\n", + " \n", + " $$l1\\_ratio = \\frac{\\gamma}{\\sigma+\\gamma}$$\n", + "\n", + " where:\n", + " - `l1_ratio = 0` implies that the penalty is an L2 penalty.\n", + " - `l1_ratio = 1` implies that it is an L1 penalty.\n", + " - `0 < l1_ratio < 1` implies that the penalty is a combination of L1 and L2." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Let's take an example:** the penalties applied are $\\gamma =0.01$ and $\\sigma =1.0$. 
So \n", + "\n", + "alpha = $\\sigma+\\gamma =1.01$ and `l1_ratio` = $\\frac{\\gamma}{\\gamma+\\sigma} \\approx 0.0099$."
+ ] + }, + { + "cell_type": "code", + "execution_count": 182, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "train: 0.789 test: 0.665\n" + ] + } + ], + "source": [ + "from sklearn.linear_model import ElasticNet\n", + "enet = ElasticNet(alpha = 1.01, l1_ratio = 0.001, max_iter = 500)\n", + "enet.fit( X_train, y_train )\n", + "get_train_test_rmse( enet )" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Applying both regularizations did not improve the model. The training RMSE (0.789) is higher than for the earlier models, and the gap between training and test RMSE (0.789 vs 0.665) is much larger\n", + "than with LASSO (0.688 vs 0.698). Note that this cell uses `l1_ratio = 0.001` rather than the 0.0099 derived above. In this case, we can choose to apply only L1 (LASSO) regularization, which appears to deal with the\n", + "overfitting problem effectively." ] }, {
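The models above were compared on a single train/test split. The K-fold procedure described at the start of this section can be sketched with `KFold` and `cross_val_score`, again assuming synthetic stand-in data from `make_regression`. The hyperparameter values mirror those used above, except that `max_iter` is raised (an assumption) to avoid convergence warnings on the demo data. The mean of the per-fold RMSE values estimates performance on unseen data, and their standard deviation shows how sensitive each model is to the choice of training samples.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data (assumption); in the notebook, X_scaled and the
# standardized y would be used instead.
X_demo, y_demo = make_regression(n_samples=130, n_features=31, n_informative=10,
                                 noise=20.0, random_state=0)
y_demo = (y_demo - y_demo.mean()) / y_demo.std()  # standardize y as in the notebook

# K = 5 folds; shuffling guards against any ordering in the rows.
kfold = KFold(n_splits=5, shuffle=True, random_state=42)

models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=2.0),
    "lasso": Lasso(alpha=0.01, max_iter=10_000),
    "enet": ElasticNet(alpha=1.01, l1_ratio=0.0099, max_iter=10_000),
}

cv_results = {}
for name, model in models.items():
    # One score per fold: each fold serves once as the validation set
    # while the other K - 1 folds are used for training.
    scores = cross_val_score(model, X_demo, y_demo, cv=kfold,
                             scoring="neg_root_mean_squared_error")
    rmse = -scores  # convert negated scores back to positive RMSE values
    cv_results[name] = rmse
    print(f"{name:>6}: mean RMSE = {rmse.mean():.3f}, std = {rmse.std():.3f}")
```

Using the same `KFold` object for every model means all four are evaluated on identical folds, so differences in mean RMSE reflect the models rather than the split.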