ml.tasks.datasets.multi_iter
Defines a dataset for iterating from multiple sub-datasets.
It is often the case that you want to write a dataset that iterates over a single sample source, then combine all of those datasets into one mega-dataset that iterates over all the samples. This dataset serves that purpose: at each iteration, it randomly chooses one of its child datasets (weighted by sampling rate) and takes its next sample, until all samples in all datasets have been exhausted.
- class ml.tasks.datasets.multi_iter.DatasetInfo(dataset: torch.utils.data.dataset.IterableDataset[T], sampling_rate: float = 1.0)[source]
  Bases: Generic[T]
  - dataset: IterableDataset[T]
  - sampling_rate: float = 1.0
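Conceptually, `DatasetInfo` is a small generic container pairing a dataset with the rate at which it should be sampled. The stand-in below is an illustrative sketch only, not the library's code; it substitutes a plain `Iterable` for torch's `IterableDataset` so it runs without torch:

```python
from dataclasses import dataclass
from typing import Generic, Iterable, TypeVar

T = TypeVar("T")


@dataclass
class DatasetInfo(Generic[T]):
    """Pairs an iterable dataset with the rate at which it is sampled."""

    dataset: Iterable[T]
    sampling_rate: float = 1.0  # relative weight; defaults to uniform
```

A dataset with `sampling_rate=2.0` would be drawn from roughly twice as often as one with the default rate of `1.0`.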
- class ml.tasks.datasets.multi_iter.MultiIterDataset(datasets: Iterable[DatasetInfo[T]], *, until_all_empty: bool = False, iterate_forever: bool = False)[source]
  Bases: IterableDataset[T]
  Defines a dataset for iterating from multiple iterable datasets.
  - Parameters:
    - datasets – Information about the datasets to iterate from and how to iterate them; specifically, the sampling rate of each dataset.
    - until_all_empty – If set, iterates until all datasets are empty; otherwise, iterates only until any one dataset is empty.
    - iterate_forever – If set, iterates over the child datasets forever.
- iterators: list[Iterator[T]]
- rate_cumsum: ndarray
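To illustrate the sampling scheme these parameters describe, here is a simplified standalone sketch in plain Python. It is not the library's implementation: the `multi_iter` function and its `rates` and `seed` parameters are hypothetical, though the cumulative-sum selection mirrors what the `rate_cumsum` attribute suggests the class does internally:

```python
import bisect
import random
from itertools import accumulate
from typing import Iterable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def multi_iter(
    datasets: Sequence[Iterable[T]],
    rates: Sequence[float],
    until_all_empty: bool = False,
    seed: int = 0,
) -> Iterator[T]:
    """Yields samples from several datasets, choosing a dataset at each
    step with probability proportional to its sampling rate."""
    rng = random.Random(seed)
    iters: List[Iterator[T]] = [iter(d) for d in datasets]
    weights: List[float] = list(rates)
    while iters:
        # Pick an index using the cumulative sum of the remaining rates,
        # analogous to the `rate_cumsum` attribute above.
        cumsum = list(accumulate(weights))
        i = bisect.bisect_right(cumsum, rng.random() * cumsum[-1])
        try:
            yield next(iters[i])
        except StopIteration:
            if not until_all_empty:
                return  # stop as soon as any one dataset runs dry
            del iters[i], weights[i]  # drop the exhausted dataset, keep going
```

With `until_all_empty=True` every sample from every dataset is eventually yielded; with the default behavior, iteration stops the first time an exhausted dataset is selected.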