Shuffling your training data every epoch and splitting a dataset into training and validation sets are common practices.
DataLoaders itself only provides tools for loading your data efficiently, but the underlying MLDataPattern
package makes these operations easy.
using MLDataPattern: shuffleobs
data = ...
dataloader = DataLoader(shuffleobs(data), batchsize)
using MLDataPattern: datasubset
data = ...
idxs = 1:1000 # indices to select from dataset
dataloader = DataLoader(datasubset(data, idxs), batchsize)
using MLDataPattern: splitobs
data = ...
traindata, valdata = splitobs(data, 0.8) # 80/20 split
dataloader = DataLoader(shuffleobs(traindata), batchsize)
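The operations above can also be combined. The following is a minimal sketch, not the package's prescribed workflow: it shuffles once before splitting, then reshuffles the training subset at the start of every epoch. The toy array and the name `nepochs` are assumptions made for illustration, and the snippet uses only MLDataPattern so that it runs without a real dataset.

```julia
using MLDataPattern: shuffleobs, splitobs, nobs

# Toy dataset (an assumption for illustration): 100 observations with 4 features each.
# For arrays, MLDataPattern treats the last dimension as the observation dimension.
data = rand(Float32, 4, 100)

# Shuffle once before splitting so that both subsets are drawn at random.
traindata, valdata = splitobs(shuffleobs(data), at = 0.8)

nepochs = 3  # hypothetical number of epochs
for epoch in 1:nepochs
    # Reshuffle the training subset so that each epoch sees a new order.
    epochdata = shuffleobs(traindata)
    # epochdata can now be wrapped in a DataLoader in the same way as the
    # snippets above, e.g. DataLoader(epochdata, batchsize).
    println("epoch ", epoch, ": ", nobs(epochdata), " training observations")
end
```

Because `shuffleobs` and `splitobs` return lazy views, reshuffling each epoch only permutes indices and does not copy the underlying data.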
For other dataset operations, such as weighted sampling, see MLDataPattern's documentation.