[💡SUG] support for atomic files > memory #1618

Open
skunkwerk opened this issue Jan 4, 2023 · 1 comment
Labels
enhancement New feature or request

Comments

@skunkwerk

Thank you for the wonderful library!

Is your feature request related to a problem? Please describe.
Does RecBole's data flow support training on large atomic files that don't fit into memory on a single machine? The bottleneck seems to be the Dataset stage, where the atomic files are loaded into a pandas DataFrame.
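For context, RecBole atomic files are tab-separated text files whose header declares each column as name:type (for example user_id:token or rating:float). Below is a minimal sketch, assuming only that header convention, of reading such a file lazily one row at a time instead of materializing it as a DataFrame. The helpers parse_header and iter_rows and the CONVERTERS table are hypothetical illustrations, not part of RecBole's API.

```python
import csv
import io
from typing import Callable, Dict, Iterator, List, TextIO, Tuple

# Hypothetical converters for RecBole-style field types; the *_seq types are
# assumed to hold space-separated values.
CONVERTERS: Dict[str, Callable[[str], object]] = {
    "token": str,
    "float": float,
    "token_seq": lambda s: s.split(" ") if s else [],
    "float_seq": lambda s: [float(x) for x in s.split(" ")] if s else [],
}


def parse_header(line: str) -> List[Tuple[str, str]]:
    """Split a tab-separated 'name:type' header into (name, type) pairs."""
    fields = []
    for col in line.rstrip("\r\n").split("\t"):
        name, _, ftype = col.rpartition(":")
        fields.append((name, ftype))
    return fields


def iter_rows(f: TextIO) -> Iterator[Dict[str, object]]:
    """Yield one typed row at a time, so memory use does not grow with file size."""
    fields = parse_header(f.readline())
    # QUOTE_NONE: treat quote characters as ordinary data, not CSV quoting.
    reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
    for values in reader:
        yield {name: CONVERTERS[ftype](v) for (name, ftype), v in zip(fields, values)}


if __name__ == "__main__":
    sample = io.StringIO("user_id:token\titem_id:token\trating:float\n1\t10\t4.0\n")
    for row in iter_rows(sample):
        print(row)  # {'user_id': '1', 'item_id': '10', 'rating': 4.0}
```

A generator like this keeps only the current row in memory, which is the property an out-of-core replacement for the DataFrame would need to preserve at scale.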

Describe the solution you'd like
Replace the pandas DataFrame with a data structure that supports out-of-core access, so that large atomic files can be processed without loading them fully into memory.

Describe alternatives you've considered

  • Dask DataFrames would require some code changes, since they are not fully compatible with the pandas API
  • Modin should require only minor code changes, since it is designed as a drop-in replacement for pandas DataFrames
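The partition-and-combine pattern behind Dask-style out-of-core processing can be sketched with the standard library alone: compute a partial result for each fixed-size chunk of rows, then merge it into a running total. This is not Dask's or RecBole's API. The helpers iter_chunks and count_by_field are hypothetical, and the example assumes the same name:type header convention used by RecBole atomic files.

```python
import csv
import io
from collections import Counter
from itertools import islice
from typing import Iterable, Iterator, List, TextIO


def iter_chunks(rows: Iterable[List[str]], chunk_size: int) -> Iterator[List[List[str]]]:
    """Group an iterable of rows into lists of at most chunk_size rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def count_by_field(f: TextIO, field: str, chunk_size: int = 100_000) -> Counter:
    """Count rows per value of `field`, holding at most one chunk of rows in memory."""
    reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
    # Header columns look like 'item_id:token'; keep only the name part.
    header = [col.rpartition(":")[0] for col in next(reader)]
    idx = header.index(field)
    total: Counter = Counter()
    for chunk in iter_chunks(reader, chunk_size):
        partial = Counter(row[idx] for row in chunk)  # per-partition result
        total.update(partial)  # combine step: add partial counts to the total
    return total


if __name__ == "__main__":
    data = io.StringIO("user_id:token\titem_id:token\n1\t10\n2\t10\n1\t11\n")
    print(count_by_field(data, "item_id", chunk_size=2))  # Counter({'10': 2, '11': 1})
```

Only one chunk of rows is held at a time, although the Counter itself still grows with the number of distinct values. Dask applies the same idea across DataFrame partitions while keeping a pandas-like interface.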
@skunkwerk skunkwerk added the enhancement New feature or request label Jan 4, 2023
@Ethan-TZ Ethan-TZ self-assigned this Jan 6, 2023
@Ethan-TZ
Member

Ethan-TZ commented Jan 6, 2023

@skunkwerk Thanks for your suggestion! We will consider it in our future development plans!
