Datasets & DataLoaders

PyTorch Tutorials

PyTorch provides two data primitives:

  1. torch.utils.data.Dataset - stores the samples and their corresponding labels.
  2. torch.utils.data.DataLoader - wraps an iterable around a Dataset for easy access to the samples.
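
A minimal sketch of how the two primitives relate, assuming small in-memory tensors stand in for real data. TensorDataset is a ready-made Dataset from torch.utils.data; the tensor values and the batch size of 4 are arbitrary choices for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Six made-up samples with two features each, plus one label per sample.
features = torch.arange(12, dtype=torch.float32).reshape(6, 2)
labels = torch.tensor([0, 1, 0, 1, 0, 1])

# The Dataset stores samples and labels and returns one pair per index.
dataset = TensorDataset(features, labels)
x0, y0 = dataset[0]

# The DataLoader wraps the Dataset in an iterable that yields batches.
loader = DataLoader(dataset, batch_size=4)
batch_shapes = [tuple(xb.shape) for xb, yb in loader]
```

With six samples and a batch size of 4, the loader yields one full batch and one smaller final batch.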

Loading a Dataset

Iterating & Visualizing the Dataset

Create a Custom Dataset for Files

__init__

__len__

__getitem__
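
The three methods above could fit together roughly as follows. This is a sketch under stated assumptions, not the tutorial's own code: the class name FileSampleDataset, the helper build_demo_dir, the CSV layout (filename, label) and the use of one .pt tensor file per sample are all hypothetical choices that keep the example self-contained.

```python
import csv
import os
import tempfile

import torch
from torch.utils.data import Dataset


class FileSampleDataset(Dataset):
    """Hypothetical dataset: one tensor file per sample, labels in a CSV."""

    def __init__(self, annotations_file, data_dir, transform=None):
        # Runs once: read the (filename, label) rows, but load no samples yet.
        with open(annotations_file, newline="") as f:
            self.rows = [(name, int(label)) for name, label in csv.reader(f)]
        self.data_dir = data_dir
        self.transform = transform

    def __len__(self):
        # Number of samples in the dataset.
        return len(self.rows)

    def __getitem__(self, idx):
        # Load a single sample from disk by index and return it with its label.
        name, label = self.rows[idx]
        sample = torch.load(os.path.join(self.data_dir, name))
        if self.transform:
            sample = self.transform(sample)
        return sample, label


def build_demo_dir(root, n=4):
    # Hypothetical helper: writes n tiny tensor files and a labels CSV.
    path = os.path.join(root, "labels.csv")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for i in range(n):
            name = f"sample_{i}.pt"
            torch.save(torch.full((2, 2), float(i)), os.path.join(root, name))
            writer.writerow([name, i % 2])
    return path


with tempfile.TemporaryDirectory() as root:
    dataset = FileSampleDataset(build_demo_dir(root), root)
    n_samples = len(dataset)
    sample, label = dataset[3]
```

Reading files lazily in __getitem__ rather than in __init__ keeps memory use low when the dataset is larger than RAM.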

Prepare data for training with DataLoaders

A Dataset retrieves its features and labels one sample at a time. During training, we usually want to pass samples in mini-batches, reshuffle the data at every epoch to reduce overfitting, and use Python's multiprocessing to speed up data retrieval.

DataLoader is an iterable that hides this complexity behind a simple API.
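
A minimal sketch of batching and per-epoch reshuffling, assuming a toy in-memory dataset of ten samples. The batch size, the seed and the two-epoch loop are arbitrary; num_workers=0 keeps loading in the main process, and a positive value would hand loading to worker processes instead.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Ten made-up samples: the feature is the sample index, the label its parity.
dataset = TensorDataset(torch.arange(10).unsqueeze(1), torch.arange(10) % 2)

# A seeded generator makes the shuffling reproducible in this sketch.
generator = torch.Generator().manual_seed(0)
loader = DataLoader(
    dataset,
    batch_size=4,      # samples per mini-batch
    shuffle=True,      # draw a new sample order at the start of each epoch
    num_workers=0,     # 0 = load in the main process
    generator=generator,
)

epochs = []
for epoch in range(2):
    seen = []
    for xb, yb in loader:
        # xb has shape (batch, 1); flatten it to record which samples appeared.
        seen.extend(xb.squeeze(1).tolist())
    epochs.append(seen)
```

Each epoch visits every sample exactly once, in three batches (4, 4 and 2 samples), with the order redrawn per epoch.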

Iterate through the DataLoader