Is multiple Regression possible with Hyperparameter_hunter #96
I've never tested multiple regression, actually. But I'm very interested in making sure it's supported! Can you provide a minimal toy example that resembles the shape of the data you'd be working with, and uses the appropriate metrics?
We made you an example of Multiple Regression - both a simple linear example and a much harder non-linear example.
Wow! Thank you very much for setting up such a clear example! I have a few questions that are going to sound stupid, but I need to ask just to make sure I understand what you need since I haven’t worked with multiple regression before:
If I am missing any requirements in my third question, please let me know, and if anything I've said seems even slightly questionable, I'd appreciate the correction. Unrelated: In your example's first
It looks like HyperparameterHunter already works with the regressors in your example that aren't wrapped in

```python
from hyperparameter_hunter import Environment, CrossValidationExperiment
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Ridge, MultiTaskLasso, MultiTaskElasticNet
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.tree import DecisionTreeRegressor

#################### Trivial Linear Multiple Regression Problem with a Little Noise (0.1) ####################
x, y = make_regression(
    n_samples=1000,
    n_features=4,
    n_informative=4,
    n_targets=4,
    noise=0.1,
    random_state=42,
)

#################### Train/Holdout Split ####################
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.10, random_state=42)

#################### Scale Data ####################
x_scaler = StandardScaler()
x_train_scaled = x_scaler.fit_transform(x_train)
x_test_scaled = x_scaler.transform(x_test)

y_scaler = StandardScaler()
y_train_scaled = y_scaler.fit_transform(y_train)
y_test_scaled = y_scaler.transform(y_test)

#################### Reorganize Into Scaled DFs ####################
x_train_df = pd.DataFrame(x_train_scaled, columns=['x1', 'x2', 'x3', 'x4'])
y_train_df = pd.DataFrame(y_train_scaled, columns=['y1', 'y2', 'y3', 'y4'])
train_df = pd.concat([x_train_df, y_train_df], axis=1)

x_holdout_df = pd.DataFrame(x_test_scaled, columns=['x1', 'x2', 'x3', 'x4'])
y_holdout_df = pd.DataFrame(y_test_scaled, columns=['y1', 'y2', 'y3', 'y4'])
holdout_df = pd.concat([x_holdout_df, y_holdout_df], axis=1)

regressors = [
    LinearRegression,
    KNeighborsRegressor,
    DecisionTreeRegressor,
    MultiTaskLasso,
    MultiTaskElasticNet,
    Ridge,
    MLPRegressor,
]
regressor_params = [
    dict(),
    dict(),
    dict(),
    dict(alpha=0.01),
    dict(alpha=0.01),
    dict(alpha=0.05),
    dict(
        hidden_layer_sizes=(5,),
        activation='relu',
        solver='adam',
        learning_rate='adaptive',
        max_iter=1000,
        learning_rate_init=0.01,
        alpha=0.01,
    ),
]

#################### HyperparameterHunter ####################
env = Environment(
    train_dataset=train_df,
    holdout_dataset=holdout_df,
    root_results_path="multiple_regression_assets",
    metrics_map=["mean_squared_error"],
    target_column=['y1', 'y2', 'y3', 'y4'],
    cross_validation_type="KFold",
    cross_validation_params=dict(n_splits=10, shuffle=True, random_state=32),
)

for initializer, init_params in zip(regressors, regressor_params):
    exp = CrossValidationExperiment(
        model_initializer=initializer,
        model_init_params=init_params,
    )
```
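As a side note on the `metrics_map=["mean_squared_error"]` setting above: scikit-learn's `mean_squared_error` already accepts multi-target arrays, averaging the squared error across all targets by default (`multioutput='uniform_average'`). A quick sanity check:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Two samples, two targets each; every prediction is off by 0.1
y_true = np.array([[0.0, 1.0], [1.0, 2.0]])
y_pred = np.array([[0.1, 0.9], [0.9, 2.1]])

# Default multioutput='uniform_average' averages over all targets
overall = mean_squared_error(y_true, y_pred)

# 'raw_values' returns one MSE per target column instead
per_target = mean_squared_error(y_true, y_pred, multioutput='raw_values')

print(overall)     # each squared error is 0.01, so the average is ~0.01
print(per_target)  # one value per target column
```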
No problem - it's actually useful for us to set up some toy problems to test ideas.
Yes, we're trying to predict all target columns.
For regression, Also, most fancier models (Keras, TensorFlow, ...) absolutely require scaling in order to work correctly. This is sometimes hidden or implicit in classification problems. For example, I think scaling is part of the data prep done prior to HPH, but it would be good if the provided examples showed scaling.
1, 2 & 4 would be great.
Oops, yes - the perils of Jupyter notebooks and not working strictly from top to bottom.
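The target-scaling workflow described above (fit the scaler on the training targets, predict in scaled space, then map predictions back to original units) can be sketched as follows; the array shapes here are illustrative, not tied to any particular model:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
y_train = rng.normal(loc=50.0, scale=10.0, size=(100, 4))  # 4 regression targets

y_scaler = StandardScaler()
y_train_scaled = y_scaler.fit_transform(y_train)  # ~zero mean, unit variance per column

# A model would be fit and evaluated in scaled space; inverse_transform
# recovers values in the targets' original units.
y_recovered = y_scaler.inverse_transform(y_train_scaled)
assert np.allclose(y_recovered, y_train)
```

The same scaler must be reused (via `transform`, not `fit_transform`) on any holdout targets, exactly as the example above does for `y_test_scaled`.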
Understood. I've had an unfinished
I see; thank you for explaining that. I was unfamiliar with I noticed that
Thank you for your patience in explaining to me what you need and for helping make HyperparameterHunter even better! I appreciate you taking the time! Edit: Another question, would it be an expectation that using
Hi, I tried
Suppose you had two sequences and you wanted to predict the 2nd from the first. The more likely and general approach would be to run
I tried the code you provided; the error line is from predictors.py, with the error presumably because (4, 100) is the expected fold of 4 y's. Notebook with minimal changes to your code here: [HPH multi-regression code notebook](https://github.com/strelzoff-erdc/HPH-experiments/blob/master/HPH_multi-regression%20experiment%20test.ipynb)
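For reference, the shape in that error may be easier to interpret against plain scikit-learn, where multi-target predictions come back as `(n_samples, n_targets)`, not `(n_targets, n_samples)`. A minimal check, using the same 4-feature, 4-target setup as the example above:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# 4 features, 4 targets, as in the example earlier in this thread
x, y = make_regression(n_samples=100, n_features=4, n_targets=4, random_state=42)

model = LinearRegression().fit(x, y)
preds = model.predict(x)

print(preds.shape)  # (100, 4): one row per sample, one column per target
```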
Are you using the current master version of HyperparameterHunter? It looks like you might be using the latest PyPI release (2.0.0), in which predictions for those multi-regression algorithms had not yet been added. I apologize; I should have clarified that I was running the examples with the unreleased master version. Can you try installing HyperparameterHunter from GitHub and running the example again?
Edit: I also just realized that at the time I posted the example, I hadn't even pushed the changes that were making it work. So, once again, I apologize for causing this confusion.
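For anyone following along, the unreleased master version can be installed with pip's Git support. The repository path below is an assumption (the project's GitHub location at the time of this thread); adjust it if the project has moved:

```shell
# Install the current master branch directly from GitHub
# (repository path assumed; verify before running)
pip install git+https://github.com/HunterMcGushion/hyperparameter_hunter.git
```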
I have everything running, but the problem I am looking to solve is a multiple regression problem. Is this possible with hyperparameter_hunter?