ml.tasks.datasets.multi_iter

Defines a dataset for iterating from multiple sub-datasets.

It’s often the case that you want to write a dataset that iterates over a single sample source, then combine all of those datasets into one mega-dataset that iterates over all the samples. This dataset serves that purpose: at each iteration, it randomly chooses one of its child datasets and gets its next sample, until all samples in all datasets have been exhausted.

class ml.tasks.datasets.multi_iter.DatasetInfo(dataset: torch.utils.data.dataset.IterableDataset[T], sampling_rate: float = 1.0)[source]

Bases: Generic[T]

dataset: IterableDataset[T]
sampling_rate: float = 1.0
class ml.tasks.datasets.multi_iter.MultiIterDataset(datasets: Iterable[DatasetInfo[T]], *, until_all_empty: bool = False, iterate_forever: bool = False)[source]

Bases: IterableDataset[T]

Defines a dataset for iterating from multiple iterable datasets.

Parameters:
  • datasets – The information about the datasets to iterate from and how to iterate them; specifically, the sampling rate of each dataset.

  • until_all_empty – If set, iterates until all datasets are empty; otherwise, iteration stops as soon as any one dataset is empty.

  • iterate_forever – If set, iterates over the child datasets forever.

iterators: list[Iterator[T]]
rate_cumsum: ndarray
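To make the sampling behavior concrete, here is a minimal sketch of the core loop: pick a child iterator with probability proportional to its sampling rate, yield its next sample, and either stop or drop the iterator when one is exhausted. This is a simplified, hypothetical reimplementation for illustration, not the library's actual code (which operates on IterableDataset objects and precomputes a cumulative sum of the rates, per the rate_cumsum attribute).

```python
import random
from typing import Iterable, Iterator, Sequence, TypeVar

T = TypeVar("T")

def multi_iter(
    datasets: Sequence[Iterable[T]],
    rates: Sequence[float],
    until_all_empty: bool = False,
) -> Iterator[T]:
    """Yields samples from several iterables, weighted by sampling rate.

    Mirrors the documented semantics: by default, stops as soon as any
    dataset is exhausted; with ``until_all_empty``, exhausted datasets
    are dropped and iteration continues until none remain.
    """
    iterators = [iter(d) for d in datasets]
    weights = list(rates)
    while iterators:
        # Choose an iterator with probability proportional to its rate.
        i = random.choices(range(len(iterators)), weights=weights)[0]
        try:
            yield next(iterators[i])
        except StopIteration:
            if not until_all_empty:
                return
            # Drop the exhausted iterator and keep sampling the rest.
            del iterators[i]
            del weights[i]
```

For example, `multi_iter([range(3), range(10, 13)], [1.0, 2.0], until_all_empty=True)` yields all six values in a randomly interleaved order, drawing from the second range roughly twice as often while both remain non-empty.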