scikit-learn and scikit-image are two of the major scientific Python toolboxes enabling data-driven discoveries. The former provides simple yet efficient tools for data mining and data analysis, while the latter focuses on image processing algorithms. As the volume of data being processed and analysed grows, these two libraries face unprecedented scalability challenges.
One currently under-utilized avenue for addressing these scalability challenges is the Python library Dask, which provides flexible parallel versions of NumPy arrays and Pandas DataFrames, the core numerical objects used in scientific Python. Our goal is therefore to organize a sprint that brings together a small number of developers from scikit-learn, scikit-image, and Dask to experiment with and improve all three libraries.
Dates: May 28th to June 2nd
Location: Berkeley Institute for Data Science
- Nelle Varoquaux
- Matt Rocklin
- Stéfan van der Walt