I will have a look into the data provider interface and do a reference implementation for ASE DB files including periodic boundary conditions. For that I would slightly change the current interface, specifically:
- make the DataProvider class abstract, e.g., by removing 'read_database'
- get_properties(property_names, idx=None) will return a dict of properties according to the given indices
- iterate(property_names, idx=None) does the same but as a generator
- implement a reference ASEDataProvider which will include the functionality currently in DataProvider
- then we could have subclasses of DataProvider on top that do batching, pre-loading, etc. for the low-level DataProviders
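To make the proposal concrete, here is a rough sketch of what the abstract interface could look like. Only `DataProvider`, `get_properties`, and `iterate` come from the list above; the `__len__` method and the `InMemoryDataProvider` class are assumptions added purely for illustration. The in-memory class stands in for the planned ASEDataProvider so that the example runs without ASE or a database file.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable, Iterator, List, Optional, Sequence


class DataProvider(ABC):
    """Abstract low-level interface; no file reading happens here."""

    @abstractmethod
    def __len__(self) -> int:
        """Number of entries available (assumed helper, not in the proposal)."""

    @abstractmethod
    def get_properties(
        self, property_names: Sequence[str], idx: Optional[Iterable[int]] = None
    ) -> Dict[str, List[Any]]:
        """Return {property name: list of values} for the given indices."""

    def iterate(
        self, property_names: Sequence[str], idx: Optional[Iterable[int]] = None
    ) -> Iterator[Dict[str, Any]]:
        """Yield one dict of properties per entry, lazily."""
        indices = range(len(self)) if idx is None else idx
        for i in indices:
            row = self.get_properties(property_names, [i])
            yield {name: values[0] for name, values in row.items()}


class InMemoryDataProvider(DataProvider):
    """Hypothetical stand-in for ASEDataProvider, backed by a list of dicts."""

    def __init__(self, rows: Iterable[Dict[str, Any]]) -> None:
        self._rows = list(rows)

    def __len__(self) -> int:
        return len(self._rows)

    def get_properties(self, property_names, idx=None):
        # Materialize the indices once so they can be reused for every property.
        indices = list(range(len(self))) if idx is None else list(idx)
        return {
            name: [self._rows[i][name] for i in indices]
            for name in property_names
        }


provider = InMemoryDataProvider(
    [{"energy": -1.0, "n_atoms": 2}, {"energy": -2.5, "n_atoms": 3}]
)
subset = provider.get_properties(["energy"], idx=[1])
```

Batching or pre-loading providers could then wrap any object with these two methods without knowing where the data comes from.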
Any thoughts?
After pondering a bit over the DataProvider (I was on vacation, so only pondering), I have some thoughts and ideas, also for the whole project:
- When I have periodic boundary conditions, I preprocess the data by collecting neighborhood information and writing it into the ASE database. How should I deal with that here? I could (1) just write that information into the DB, but this is somewhat method-specific. Option (2) would be to copy the whole database, which would be a waste of memory. Perhaps it would be best to require the user to choose in the config.
- Perhaps we don't even need these DataProvider classes; we could have this package as a pure interface (as in the name). I would put all the code that does the work in my GitHub as a separate, stand-alone package; same for GDML, SOAP, etc. Then we can have only template files for the interface here, i.e., we will just do duck-typing to avoid dependencies on this package. That way, this package only needs to "point" to the interface classes of the other packages, each of which takes a path to an ASE DB, train_idx, val_idx, and a JSON file with the model config, and returns a predefined results object (e.g., a NamedTuple). This way, we can quickly integrate more codes and regularly check for compatibility issues with some CI tool.
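As a rough illustration of the pure-interface idea, the sketch below shows a results tuple and a duck-typed call signature. All names here (`Results`, `ModelInterface`, `dummy_interface`, and the result fields) are hypothetical; only the four inputs (path to the ASE DB, train_idx, val_idx, JSON config) and the NamedTuple return type are taken from the comment above. The dummy function does not open the database; it only shows the shape of the contract.

```python
import json
from typing import NamedTuple, Optional, Protocol, Sequence


class Results(NamedTuple):
    """Hypothetical predefined results object; the fields are placeholders."""
    model_name: str
    n_train: int
    n_val: int
    val_error: Optional[float]


class ModelInterface(Protocol):
    """Template only: external packages match this shape by duck-typing
    and never need to import it."""

    def __call__(
        self,
        db_path: str,
        train_idx: Sequence[int],
        val_idx: Sequence[int],
        config_path: str,
    ) -> Results: ...


def dummy_interface(db_path, train_idx, val_idx, config_path):
    """Stand-in for an external package's entry point (no real training)."""
    with open(config_path) as fh:
        config = json.load(fh)
    # A real implementation would read db_path, train, and validate here.
    return Results(
        model_name=config.get("name", "dummy"),
        n_train=len(train_idx),
        n_val=len(val_idx),
        val_error=None,
    )
```

A CI job could then call each registered interface on a small test database and check that the returned object has the expected fields.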