Is your feature request related to a problem? Please describe.
I'm wondering whether RecBole's data flow allows training on large atomic files that don't fit into memory on a single machine. It seems like the bottleneck may be the DataSet stage, where a pandas DataFrame is created.
Describe the solution you'd like
Switch the DataFrame to another data structure that allows out-of-core access to large atomic files.
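To illustrate what out-of-core access could mean here, below is a rough sketch that streams a tab-separated file in fixed-size chunks instead of loading it all at once. It uses only the Python standard library and does not touch RecBole's actual loading code. The helper names `iter_atomic_chunks` and `make_demo_file`, the demo file, and its `user_id:token` / `item_id:token` / `rating:float` header are assumptions made for illustration. pandas offers a comparable pattern through the `chunksize` parameter of `read_csv`.

```python
import csv
import os
import tempfile
from itertools import islice


def iter_atomic_chunks(path, chunk_size):
    """Yield lists of row dicts from a tab-separated file, chunk_size rows at a time.

    Hypothetical helper: only one chunk is held in memory at any moment.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t")
        while True:
            chunk = list(islice(reader, chunk_size))
            if not chunk:
                break
            yield chunk


def make_demo_file():
    """Write a tiny stand-in for an interaction file (assumed layout)."""
    fd, path = tempfile.mkstemp(suffix=".inter")
    with os.fdopen(fd, "w", newline="", encoding="utf-8") as f:
        f.write("user_id:token\titem_id:token\trating:float\n")
        for i in range(10):
            f.write(f"u{i % 3}\ti{i}\t{float(i % 5)}\n")
    return path


demo_path = make_demo_file()

# Each pass reads the file lazily; no pass keeps more than 4 rows in memory.
chunk_sizes = [len(chunk) for chunk in iter_atomic_chunks(demo_path, chunk_size=4)]
total_rating = sum(
    float(row["rating:float"])
    for chunk in iter_atomic_chunks(demo_path, chunk_size=4)
    for row in chunk
)

os.remove(demo_path)
```

Any per-chunk work (filtering, ID remapping, building batches) would go inside the loop over chunks. Steps that need global statistics, such as counting interactions per user, would need a separate streaming pass.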
Describe alternatives you've considered
Dask DataFrames would require some code changes, since they are not 100% compatible with the pandas API.
Modin should require only minor code changes, since it is intended as a drop-in replacement for pandas DataFrames.
Thank you for the wonderful library!