Switch to tibbles #48

Closed
matthiasgomolka opened this issue Apr 22, 2023 · 8 comments

@matthiasgomolka
Owner

Since data.table does not seem to be under active development any more, we should make the switch to returning tibbles instead of data.tables.

matthiasgomolka added this to the API V3 milestone Apr 22, 2023
matthiasgomolka self-assigned this Apr 22, 2023
@GitHubGeniusOverlord

Incorrect assumption! https://github.com/Rdatatable/data.table/blob/master/NEWS.md
data.table is under development.

@matthiasgomolka
Owner Author

Well, the last real feature update was more than two years ago. That's quite a while, isn't it?

@joshuaulrich

A package not being actively developed is not a good reason to stop using it. You could argue that it's a good reason to keep using it, because it's less likely to make breaking changes. It would be a good idea to stop using it if it weren't actively maintained. But the last release was a few months ago (Feb-2023).

Another very important consideration is that switching from returning data.tables to tibbles is almost certainly going to break your users' code. That's a very significant breaking change that shouldn't be taken lightly.

@matthiasgomolka
Owner Author

Thanks for your thoughtful considerations. I'm aware that this change might break users' code. That's why I would like to make the change now, since I'm updating the package to the new API version, where breaking changes will be inevitable.

But I'll rethink the issue.

@joshuaulrich

joshuaulrich commented May 16, 2023

My main point was that "actively developed" and "actively maintained" are different. For example, most of my packages are feature-complete, so they're not actively developed. They don't get new features often (or at all). That doesn't mean people should avoid them. They're still actively maintained. I still fix bugs and intend to keep them on CRAN.


Good point about updating the API version breaking stuff too. After I made my comment, I noticed you don't have reverse dependencies on CRAN, and your lifecycle is 'experimental'. Those things make breaking changes less of an issue for you right now.

I just had another thought: it's good practice to bump the major version when there are breaking changes.

Anyway, these are just my thoughts. I'm not going to criticize whatever decision you make. It's your package, after all. 😉


EDIT: I reached out to @MichaelChirico and he pointed me to Rdatatable/data.table#5629. So there's some work being done to get data.table development moving again.

@GitHubGeniusOverlord

All valid points!
Maybe one more argument in favor of data.table (which you all probably know): it's faster. As far as I know, it's the fastest data frame package anywhere, even more so when you consider the reliability vs. speed tradeoff (e.g. some Python packages like polars or pydatatable are fast too, but just fail at too many tasks). data.table, however, does what one would expect.
For my projects in which I use your great package, I would convert the tibble back to a data.table if you changed the package to return tibbles. The reason is that the time cost of not using data.table accumulates later in the workflow of a larger project. I can imagine this is the case for many projects in the finance domain, where time is of the essence. As I understand it, data.table is very popular there for that reason.
So while users can certainly deal with the change, it's just going to feel a bit annoying for some. Probably.
I hope this helps.
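
For anyone in the same position, converting a returned tibble back is short. A minimal sketch, assuming the data.table and tibble packages are installed; `res` is a made-up stand-in for whatever the package would return:

```r
library(data.table)

# Hypothetical stand-in for a tibble returned by the package.
res <- tibble::tibble(id = 1:3, value = c(0.1, 0.2, 0.3))

# as.data.table() returns a new data.table and leaves `res` untouched.
dt <- as.data.table(res)

# setDT() converts `res` itself to a data.table by reference, without copying.
setDT(res)
```

For large results, `setDT()` is probably the better fit, since it avoids the extra copy.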

@eddelbuettel

@matthiasgomolka Another option you could move towards is to add a new function argument, say, return_as, that accepts one of several supported return formats. We did this in a few Rblpapi functions years ago because even among the three of us (at the time) looking after the package there wasn't "one" preference. If memory serves, it wasn't an entirely new idea then either, but I no longer recall where we may have gotten it from. I do the same now in package tiledb, and in both I also support a local configuration option, so that for me the return defaults to data.table while the overall default may be another value. Just a thought to make the switch to tibble less disruptive.
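
A rough sketch of what such an argument could look like, not the actual Rblpapi or tiledb code. The helper name `as_requested()`, the option name `mypkg.return_as`, and the set of formats are all assumptions made for illustration:

```r
# Hypothetical helper: convert a result to the class the caller asked for.
# The default comes from a user-settable option and falls back to "data.table".
as_requested <- function(x,
                         return_as = getOption("mypkg.return_as", "data.table")) {
  # Validate the request against the supported formats.
  return_as <- match.arg(return_as, c("data.table", "tibble", "data.frame"))
  # Only the selected branch is evaluated, so unused packages are not loaded.
  switch(return_as,
    data.table = data.table::as.data.table(x),
    tibble     = tibble::as_tibble(x),
    data.frame = as.data.frame(x)
  )
}
```

A user could then set `options(mypkg.return_as = "tibble")` once, for example in `.Rprofile`, and still override it per call with `as_requested(x, return_as = "data.table")`. Existing code keeps working as long as the package-wide default stays `"data.table"`.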

@matthiasgomolka
Owner Author

matthiasgomolka commented Feb 27, 2024

Since data.table got a new update recently, and due to your thoughtful remarks (thanks a lot!), I've decided to stick with data.table. Closing this now.
