polars-learn #202
Hey, thank you for the interest! This is actually my post in our public Discord:
Regarding marketing, I have tried some LinkedIn posts earlier this year and got some stars. I had terrible experiences with Reddit and X before, and thus I am not actively promoting the project on those platforms. I used to be more blunt and liked using strong words, and when I voiced some of my unconventional opinions on programming, e.g. that OOP is not a good fit for scientific computing, I got some terrible comments. Still learning how to navigate the online tech world.

With the current polars-ds, you can actually do MRMR feature selection with many options for correlation (4 different correlations are readily available in the package right now). I cannot publish it because it is used in my company, but if you look up MRMR feature selection online, the logic isn't hard to implement. Traditional ML pipelines are also available, but they strictly only apply to data transformations before being consumed by a model; personally I think those two steps should be separate. You can find more in examples/. PCA is also available in the package if you are not aware. See query_pca and query_principal_components.

Still a lot of work to do. Right now the focus is on regression. I am implementing Lasso with coordinate descent. Let's see how that one goes. :)
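For readers who haven't looked up MRMR before, here is a minimal sketch of the greedy selection logic mentioned above, assuming plain NumPy and absolute Pearson correlation for both relevance and redundancy. The function name and signature are hypothetical and this is not the polars-ds API:

```python
# Hypothetical sketch of greedy MRMR (minimum Redundancy, Maximum Relevance)
# feature selection. Relevance and redundancy both use |Pearson correlation|;
# other correlation measures can be swapped into the two corrcoef calls.
import numpy as np

def mrmr_select(X: np.ndarray, y: np.ndarray, k: int) -> list[int]:
    """Greedily pick k column indices of X that maximize
    relevance(feature, target) minus mean redundancy with already-picked features."""
    n_features = X.shape[1]
    # Relevance: |corr(feature, target)| for every feature.
    relevance = np.array(
        [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_features)]
    )
    selected: list[int] = [int(np.argmax(relevance))]
    while len(selected) < k:
        best_j, best_score = -1, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            # Redundancy: mean |corr| with the features already selected.
            redundancy = np.mean(
                [abs(np.corrcoef(X[:, j], X[:, s])[0, 1]) for s in selected]
            )
            score = relevance[j] - redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected
```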
Hopefully you have seen this solution: https://github.com/azmyrajab/polars_ols. You should get your company on board; open-source packages are becoming the best marketing for great talent acquisition.
Yep, I am fully aware of polars_ols. The main thing is that I do not want to introduce a dependency on third-party BLAS or LAPACK distributions.. It's a nightmare configuring all that. I am betting on Faer-rs, which is an alternative to those old C/Fortran linear algebra libraries. I take dependencies very seriously. You can read up on how SciPy almost couldn't compile for Python 3.12 because of an old Fortran dependency. So far, for linear regression and SVD, speed is on par and even better in some cases, and the author seems to be very knowledgeable.
@firmai By the way, I am an NYU alumnus :)
The functime developers don't seem that interested in the package; they still haven't updated for Polars 1. Don't you think it would be better to absorb it into your package? functime-org/functime#250
Yes and no. I have been adding tsfresh-style features slowly. Currently they are scattered around num.py and stats.py and I haven't consolidated them. Functime did a huge project of rewriting most tsfresh features, and I was part of that project; I did a lot of performance testing and wrote more optimal queries for a lot of the features... So yes, I can take care of the feature extraction easily. I have more than half of what tsfresh and functime offer. And yes, again, the low recognition is because I am not actively marketing.. Sigh... Although I like functional programming, I find it hard to track state in functime's transforms, and I am increasingly feeling that classes are fine as long as they are shallow and serve one focus.. Time series transforms can be very different from traditional tabular ML transforms...
How is your OLS coming along?
Ridge, Lasso, rolling and recursive regressions were added in v0.5.1 (a bugged version of Ridge was introduced in v0.5.0). I have made some changes and improvements to all of these since the release. You can find them in the docs here: https://polars-ds-extension.readthedocs.io/en/latest/num.html#polars_ds.num.query_rolling_lstsq More null_policies would be good, but that can be tricky... We also developed some benchmarks vs. sklearn. These are not strictly apples-to-apples, because the default "solver" may not be the same. For Lasso, I am also "not minding the dual gap" at this moment because I do not understand it well enough. (Also, practically, I think it is enough to stop coordinate descent when the updates are small.) Anyways, using the defaults, we have some good numbers. Standalone modules for rolling and recursive regression are hard, because of the lack of interop support between NumPy and Faer-rs..
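To make the "stop when the updates are small" idea concrete, here is a rough NumPy sketch of Lasso coordinate descent with soft-thresholding, using that stopping rule instead of the dual gap. This is an illustration only, not the polars-ds/Faer-rs implementation, and the function names are made up:

```python
# Illustrative coordinate descent for Lasso:
# minimize (1/(2n)) * ||y - X @ beta||^2 + alpha * ||beta||_1,
# stopping when the largest per-coordinate update falls below `tol`.
import numpy as np

def soft_threshold(z: float, t: float) -> float:
    return float(np.sign(z)) * max(abs(z) - t, 0.0)

def lasso_cd(X: np.ndarray, y: np.ndarray, alpha: float,
             tol: float = 1e-6, max_iter: int = 1000) -> np.ndarray:
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)      # per-column sum of squares
    residual = y - X @ beta            # equals y initially
    for _ in range(max_iter):
        max_update = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:       # skip constant-zero columns
                continue
            old = beta[j]
            # X_j^T (partial residual with feature j excluded), folded into one expression.
            rho = X[:, j] @ residual + col_sq[j] * old
            beta[j] = soft_threshold(rho / n, alpha) / (col_sq[j] / n)
            residual += X[:, j] * (old - beta[j])
            max_update = max(max_update, abs(beta[j] - old))
        if max_update < tol:           # "stop when the updates are small"
            break
    return beta
```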
Turns out standalone linear regression is EASY. A regular LR (linear regression) class and an online LR class have been implemented. I also added weighted lstsq as an option in query_lstsq, and linear regression with rcond. A new version will be released this coming weekend, and I would like to take a break from linear regression.. Next is likely k-means, a standalone kd-tree, and the ball-tree algorithm.
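As an illustration of what an online LR class can look like, here is a hedged sketch of recursive least squares: the inverse Gram matrix is updated per observation with a Sherman-Morrison rank-1 update, so no refit from scratch is needed. The class and method names are hypothetical and not the actual polars-ds API:

```python
# Hypothetical online (recursive) least squares. Initializing P as the inverse
# of a small ridge matrix keeps the first updates well defined, so this is
# effectively ridge regression with a tiny penalty.
import numpy as np

class OnlineLinearRegression:
    def __init__(self, n_features: int, ridge: float = 1e-6):
        self.P = np.eye(n_features) / ridge   # running inverse of (X^T X + ridge*I)
        self.beta = np.zeros(n_features)

    def update(self, x: np.ndarray, y: float) -> None:
        """Incorporate one observation (x, y) without refitting from scratch."""
        Px = self.P @ x
        gain = Px / (1.0 + x @ Px)             # Kalman-style gain vector
        self.beta += gain * (y - x @ self.beta)
        self.P -= np.outer(gain, Px)           # Sherman-Morrison rank-1 downdate

    def predict(self, X: np.ndarray) -> np.ndarray:
        return X @ self.beta
```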
Always funny how these projects start: one goes from like 100 users who understand the tremendous opportunity of fast in-memory computing, and then 2-3 years later 10 million people heavily rely on your solutions.
There is a lot of potential in your project; it lacks a name, clarity (too much text on the README), marketing, and objectives. I think Polars extensions and libraries built on top of Polars are the future of data science for the next 10 years.
Similar to rust-ml/linfa, I think you should aim for something much more grand. Why not seek active open-source sponsorship, corporate or otherwise?
There is currently no Polars sklearn. How could this be? It is currently the most obvious missing link in data science. For a DS geek it is a wheeled-suitcase moment: "We put wheels on bags after we put a man on the moon."
We need clustering, we need dimensionality reduction, feature selection, pairwise interactions; we need a Polars machine learning and data science project worth throwing yourself behind. Everybody is so busy creating LLMs to generate chicken casserole recipes, nobody is doing any actual work.
What are the hundreds of thousands of data scientists doing in their free time? Thanks for picking up the slack, you will be tomorrow's heroes.