is data.table abandoned? Should we switch to something else (arrow, tibble, collapse,...)? #5656
Comments
data.table is not abandoned, but Matt has limited bandwidth. Having lots of issues and PRs is normal. I wrote an NSF POSE grant, which will be funded starting later this summer and lasting for two years, about expanding the ecosystem of contributors in data.table. This project will include creating new decentralized governance, documentation, and testing tools, and it would be great to have your input! |
The large number of open issues is also because we don't use any bots that close or lock issues for inactivity regardless of whether they are resolved. |
@tdhock, are you able to provide more details? My coding skills in C aren't the best, but I'd love to keep documentation up to date and/or create more vignettes (like our missing join vignette 😅 that I already have a draft of), so want to know if I can contribute more actively. |
hey @avimallu thanks for your interest. You could help by volunteering to code review some files, by adding your name to the CODEOWNERS file in PR #5629. |
@tdhock Sorry for the negative comment, but no commits since February is definitely not normal. A lot of people rely on this package; if it is de facto abandoned, people will look (and already are looking) for alternatives. I understand it's open-source so there's no expectation of anything, but it would be a shame for a package as important as data.table to essentially rot. collapse is currently outshining data.table in terms of speed, so there are very few reasons for anyone to use data.table in 2023 (apart from legacy and familiarity). |
Collapse is wonderful, but lacks the data.table's merge and reshape capabilities, and sometimes its memory efficiency. Perhaps we should simply think of data.table as a mature package. |
The main worry is not the lack of commits, it is the lack of a maintainer. There are 134 pull requests, some of them obviously correct even to me (such as fixing the GitHub Actions). There are people willing to contribute, but the only person able to approve changes and upload new versions to CRAN does not seem to be available to do so or to delegate those tasks to someone else. It may take a fork and a new data.table2 to get things moving again. (Thanks for pointing out collapse, it looks interesting.) |
Having only one maintainer who is rarely present is a terrible bus factor for the project, and it gives a feeling that development is hindered by it: looking at the number of open PRs, and comparing the development speed with other projects such as Arrow, Collapse, etc., makes it look like it's slowly dying. Maybe someone who has contact with @mattdowle would be able to speak to him so more people are able to contribute to the project in his absence. If this isn't possible, maybe a project fork is what we need... |
@tdhock is in contact AFAIK. His last message clearly addresses our concerns. If his project works out, there is no need for any forks etc. Since Matt is less responsive now, you can ask Toby for an update, which I am pretty sure he will provide as soon as he has one. What you or your organization can do to help is to look at the CODEOWNERS file and possibly make a commitment to maintain a piece of the package. |
This comment was marked as off-topic.
@tdhock any update on the new form of governance? On the 20th of June, you wrote that the project will start next month. Will that materialise? |
It's a little concerning that the person in contact with the maintainer is also MIA (edit: missing in action?), even though they were meant to start the new form of governance two months ago. How can we get this project going if the |
data.table stands as an unparalleled tool for many, characterized by its efficient data manipulation capabilities, swift performance, and its concise yet potent syntax. Personally, I consider it the most valuable R package, and it's the primary reason my team and I gravitate towards using R. Its role in our everyday tasks and larger projects is monumental. The recent absence of its main maintainer, coupled with a growing roster of unresolved issues, casts shadows over the package's trajectory. Given its significance, we should leave no stone unturned in safeguarding its future. What if we established a task force of dedicated users to sift through and prioritize these issues, potentially even sketching out developmental roadmaps? This could serve as an interim solution as we seek to engage with the primary maintainer. Furthermore, might there be a platform or method allowing us to financially back the package's evolution? The decline of such a pivotal package would be a significant loss. I earnestly hope someone with the requisite skills and passion can rise to champion its continued development. |
I definitely back the proposal of @bluetealatte. Moreover, if I understood it correctly, I think that was the original idea by @MLopez-Ibanez, but it seems to have hit a dead end (?). The first thing should be to contact @mattdowle to check what could be done. Otherwise, without him, the only viable road is a fork. |
Checks have started failing in CRAN: https://cran.r-project.org/web/checks/check_results_data.table.html At this point, it seems a fork (with a new name rather than a git fork) may be the only solution. If someone can create a |
The CRAN checks look good. The timeout on old Windows is a known issue, reported to CRAN and discussed here in another issue. |
hi all, thanks for your concerns and valuable comments. I have created a new issue #5676 to discuss possibilities and proposals to formalize a governance document for data.table, and hopefully that should address some of these concerns. |
I don't have any fundamental problem with the existence of a governance document. However, the existence of such a document doesn't actually solve some of the important questions raised here. From a practical standpoint, when might users expect to see a CRAN release to address issues like #5538 that are slated to be resolved in 1.14.9? Are you proposing that everything will be put on hold for 6-9 months while a document is drafted before anyone besides Matt approves a PR or puts out a release? |
I agree. At the least, 1.14.9 should be completed while the governance document is being drafted. |
Hi @msummersgill thanks for your comment. |
Glad you pointed out #5133. If anyone wants to release fast, this is the place to start. |
Is it necessary to wait 9 months to merge the PR that fixes the GitHub Actions, #5632? |
Another thing that would be useful in the short term: Pin issues like #5676 so people can find them quickly. |
Hi! Another revdep issue that is tricky, but must be resolved prior to releasing new features to CRAN, is #5541, so if anyone has time to investigate and fix it, that would be much appreciated (and would make it possible to release new features to CRAN sooner). |
Another alternative is to fork If by the time this process is finished (or the person doing it has had enough or the next release of R is about to be released), there is no progress with The code here: https://github.com/tdhock/data.table-revdeps (more details here: https://github.com/Rdatatable/data.table/wiki/Release-management-and-revdep-checks) may be helpful to implement the above idea. |
data.table is not abandoned, and Matt has granted Maintainer team to Jan, Michael, and myself, so we definitely do not need to fork. Let's continue working together in this repo to make data.table the best it can be! |
There have been no commits since February.
There are more than 1k issues and 132 PRs (some of them obvious, like fixing the GitHub Actions).
@mattdowle seems to be the only person able to commit to the main branch, and he has not been active on GitHub since February.
Has the project been abandoned? Is there some activity going on behind the scenes that is not visible from the outside?