Dask integration #8
I have never used dask before, but I have been wanting to look into it. This gives me a reason to! I'll start looking into setup and usage, but I might reach back out to you for assistance. Feel free to send me an email: chasedehan at yahoo dot com
Just an update: I have gotten dask and dask-xgboost working on my local machine and on my cluster, but I will need to do some work on the shadow feature creation. I thought I would be able to just drop in dxgb.train() along with Client(), but all of the feature work happens under the hood with pandas. The dask dataframe is slightly different. It doesn't look too hard, but it might take me a few days to work out how it will fit in with the rest of the package (I really want to avoid bloat in the main functionality). For example, this is one of the helper functions I need to rework:
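A minimal sketch of the kind of pandas-only shadow-feature step described above, assuming (as in Boruta-style methods) that shadow features are column-wise shuffled copies of the original columns. The function name create_shadow and the "shadow_" prefix are hypothetical; this is an illustration, not the package's actual helper.

```python
from typing import List, Optional, Tuple

import numpy as np
import pandas as pd


def create_shadow(x: pd.DataFrame, seed: Optional[int] = None) -> Tuple[pd.DataFrame, List[str]]:
    """Hypothetical helper: append a shuffled copy of every column to x."""
    rng = np.random.default_rng(seed)
    # Copy so the caller's frame is left untouched.
    x_shadow = x.copy()
    for col in x_shadow.columns:
        # Permute each column independently: this breaks any link to the target
        # while keeping the column's marginal distribution intact.
        x_shadow[col] = rng.permutation(x_shadow[col].to_numpy())
    shadow_names = [f"shadow_{col}" for col in x.columns]
    x_shadow.columns = shadow_names
    # Side-by-side concat aligns on the index, which both frames share.
    return pd.concat([x, x_shadow], axis=1), shadow_names


df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10.0, 20.0, 30.0, 40.0]})
full, names = create_shadow(df, seed=0)
print(names)  # ['shadow_a', 'shadow_b']
```

The step that likely needs rework for dask is the per-column assignment of a whole in-memory array: a dask dataframe is split into partitions, so a global permutation probably has to be done per partition (an approximation) or through an explicit shuffle rather than a direct column assignment.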
Hello,
Much like your idea for pyspark integration, I would like to see similar support for passing in a dask client, as the dask-xgboost library supports. I have had initial success reducing high-dimensional data with the BoostARoota library, but I find the bottleneck is the initial load of the parquet file repository. I'll offer what assistance I can with this work.
Ben.