Quoting a recent discussion concerning pylearn2, Pascal Lamblin (@lamblin) offered some nice solutions to a problem both our libraries are having:
Consider Datasets immutable, and only allow read access
to the data through an iterator. Current-style preprocessing
could be done either by a different script beforehand, or by
a function that returns a different Dataset object. That would
help make experiments checkpoint and restart.
Have an explicit pipeline of on-the-fly processing on minibatches
between the Dataset and the Model. These transformations would not
be part of the Theano graph, but happen on the numeric data. These
could be iterators not unlike TransformerIterator, but would not
be limited to batch-by-batch processing, and could do things like
caching, data augmentation, and data reordering.
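The two ideas above can be sketched in a few lines of Python. This is an illustrative sketch only, not pylearn2's (or dp's) actual API: `Dataset` and `TransformStream` are hypothetical names, the dataset exposes its data solely through a batch iterator, and the transform runs on numeric data outside any symbolic graph.

```python
import numpy as np

class Dataset:
    """Immutable dataset: data is only reachable through an iterator."""
    def __init__(self, data):
        self._data = np.asarray(data, dtype="float64")
        self._data.setflags(write=False)  # enforce read-only storage

    def iterator(self, batch_size):
        # Read access is batch-by-batch; the backing array never changes.
        for start in range(0, len(self._data), batch_size):
            yield self._data[start:start + batch_size]

class TransformStream:
    """On-the-fly step between the Dataset and the Model: wraps a batch
    iterator and yields transformed copies of each batch."""
    def __init__(self, iterator, fn):
        self._iterator = iterator
        self._fn = fn

    def __iter__(self):
        for batch in self._iterator:
            yield self._fn(batch)

dataset = Dataset([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
stream = TransformStream(dataset.iterator(batch_size=2),
                         lambda b: (b - b.mean()) / (b.std() + 1e-8))
batches = list(stream)
```

Because every transform returns a new array, checkpointing an experiment only needs to record the stream configuration, never a mutated dataset.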
While these solutions are offered for pylearn2, they also concern dp. The Preprocess objects currently modify the DataSets in place, so preprocessing has to be redone each time you run an experiment. But you could easily do it once and reuse that Checkpoint across experiments. All you would need is a script to create the checkpoint and a means of referring to the resulting files from your experiment.
Change Preprocesses so that they work with Batches and produce an output (no in-place modification).
preprocess.lua : a script that applies Preprocesses to DataSources and saves the resulting data to disk.
Checkpoint : a generic DataSource that works with the output of the preprocess.lua script.
common data format : HDF5 + view spec, or th7 + view spec (for now), with the view spec in JSON format.
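A rough sketch of that workflow, in Python for illustration (the real script would be preprocess.lua operating on Torch tensors): a non-in-place Preprocess is applied once, the result is written to disk next to a JSON view spec, and a later run reloads both. The `standardize` function, the file names, and the view-spec schema (`axes`/`shape` keys) are all hypothetical; the issue does not specify the actual format.

```python
import json
import tempfile
from pathlib import Path

import numpy as np

def standardize(batch, mean, std):
    """Non-in-place Preprocess: returns a new batch, input untouched."""
    return (batch - mean) / std

# Emulate preprocess.lua: apply the Preprocess once, save data + view spec.
raw = np.random.RandomState(0).rand(100, 3)
processed = standardize(raw, raw.mean(axis=0), raw.std(axis=0))

out = Path(tempfile.mkdtemp())
np.save(out / "inputs.npy", processed)
(out / "view.json").write_text(json.dumps(
    {"inputs": {"axes": "bf", "shape": list(processed.shape)}}))

# A later experiment (the "Checkpoint" DataSource) just reloads the files.
spec = json.loads((out / "view.json").read_text())
reloaded = np.load(out / "inputs.npy")
```

The view spec carries only metadata (axis meanings and shapes), so the same loader can serve HDF5 or th7 payloads interchangeably.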
We call the on-the-fly preprocessors data streams, while the datasets themselves are immutable. For a lengthier discussion of how we do checkpointing, you can look here.
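Because each data stream wraps a source of batches, streams compose naturally into a pipeline. A minimal sketch of that chaining, with a hypothetical `DataStream` class rather than the library's real one:

```python
class DataStream:
    """On-the-fly preprocessor: wraps any iterable of batches and yields
    transformed batches, leaving the underlying dataset untouched."""
    def __init__(self, source, fn):
        self.source = source
        self.fn = fn

    def __iter__(self):
        for batch in self.source:
            yield self.fn(batch)

raw = [[1, 2], [3, 4]]  # stands in for an immutable dataset's iterator
doubled = DataStream(raw, lambda b: [x * 2 for x in b])
shifted = DataStream(doubled, lambda b: [x + 1 for x in b])
result = [list(b) for b in shifted]  # [[3, 5], [7, 9]]
```

Checkpointing then reduces to recording the chain of stream constructors and their arguments, since no stream holds mutable state of its own.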