Dask integration #8

bendruitt · 2017-11-08T06:49:05Z

Much like your idea for pyspark integration, I would like to see similar support for passing in a dask client, as the dask-xgboost library supports. I have had initial success reducing high-dimensional data with the BoostARoota library, but the bottleneck is the initial load of the parquet file repository. I'll offer what assistance I can with this work.

Ben.

chasedehan · 2017-11-08T16:47:37Z

I have never used dask before, but have been wanting to look into it. This gives me a reason to! I'll start looking into set up and usage, but might reach back out to you for assistance. Feel free to send me an email: chasedehan at yahoo dot com

chasedehan · 2017-11-15T17:53:21Z

Just an update: I have gotten dask and dask-xgboost working both locally and on my cluster, but I will need to do some work on the shadow feature creation. I thought I could just drop in dxgb.train() along with Client(), but I am doing all the feature work under the hood with pandas. The dask dataframe is slightly different. It doesn't look too hard, but it might take me a few days to work out how it will fit in with the rest of the package (I really want to avoid bloating the main functionality).

For example, this is one of the helper functions I need to rework:

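```python
import numpy as np
import pandas as pd

def _create_shadow(x_train):
    """Return x_train with a shuffled copy of each column appended, plus the shadow names."""
    x_shadow = x_train.copy()
    for c in x_shadow.columns:
        # Assign a permuted copy rather than shuffling .values in place,
        # which is not guaranteed to write back to the DataFrame.
        x_shadow[c] = np.random.permutation(x_shadow[c].values)
    # Rename the shadow columns
    shadow_names = ["ShadowVar" + str(i + 1) for i in range(x_train.shape[1])]
    x_shadow.columns = shadow_names
    # Combine into one new DataFrame
    new_x = pd.concat([x_train, x_shadow], axis=1)
    return new_x, shadow_names
```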

jonimatix · 2019-05-24T12:42:26Z

Hello,
Is there any update on this feature? It would be great, as it would speed up processing even more.

chasedehan self-assigned this Nov 15, 2017
