ml.tasks.datasets.clippify
Defines a dataset for converting frame streams to clips.
- class ml.tasks.datasets.clippify.ClippifyDataset(image_dataset: IterableDataset[Batch], num_images: int, *, stride: int = 1, jump_size: int = 1, sample_first: bool = False, use_last: bool = False)[source]
Bases: IterableDataset[tuple[Tensor, Batch]], Generic[Batch]
Defines a dataset which efficiently yields sequences of images.
The underlying image dataset just needs to iterate through a sequence of images, in order. This wrapper dataset collates the images into clips with some striding between adjacent images.
Images are inserted into a deque and routinely popped (see the sketch at the end of this section). The underlying dataset should do any necessary error handling, since this dataset simply raises an error on failure.
- Parameters:
image_dataset – The child dataset which yields the images
num_images – The number of images in each clip
stride – The stride between adjacent images
jump_size – How many frames to jump forward when starting the next clip
sample_first – If set, don't always start on the first item; instead, sample the starting item from within the first jump_size items
use_last – If set, always use the last item in the dataset
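As a usage illustration (not taken from the upstream docs), the sketch below wraps a hypothetical ToyFrames dataset in ClippifyDataset. The contents of each yielded tuple are assumed here to be a tensor of frame indices and the collated frames, matching the tuple[Tensor, Batch] base type.

```python
import torch
from torch.utils.data import IterableDataset

from ml.tasks.datasets.clippify import ClippifyDataset


class ToyFrames(IterableDataset):
    """Hypothetical frame source: yields 100 single-channel 8x8 frames in order."""

    def __iter__(self):
        for i in range(100):
            yield torch.full((1, 8, 8), float(i))


# Collate the frame stream into clips of 4 frames, taking every 2nd buffered
# frame and jumping the clip start forward by 3 frames each time.
clips = ClippifyDataset(ToyFrames(), num_images=4, stride=2, jump_size=3)

for ids, clip in clips:
    # Items are typed as tuple[Tensor, Batch]; the assumption here is that the
    # tensor carries frame indices and `clip` holds the collated frames.
    print(ids, type(clip))
    break
```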
- image_iter: Iterator[Batch]
- inds: list[int]
- image_queue: Deque[tuple[int, Batch]]
- image_ptr: int
- image_queue_ptr: int
- hit_last: bool
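The attributes above reflect the deque-based buffering described earlier. The following is a simplified, illustrative re-implementation of that collation scheme, not the library's actual code; the clippify helper is hypothetical and omits sample_first and use_last.

```python
from collections import deque
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def clippify(
    frames: Iterable[T],
    num_images: int,
    stride: int = 1,
    jump_size: int = 1,
) -> Iterator[list[tuple[int, T]]]:
    """Yield clips of (index, frame) pairs from an in-order frame stream."""
    span = (num_images - 1) * stride + 1  # frames covered by one clip
    queue: deque[tuple[int, T]] = deque()
    start = 0  # index of the first frame of the next clip
    for i, frame in enumerate(frames):
        queue.append((i, frame))
        # Drop frames that can no longer appear in any future clip.
        while queue and queue[0][0] < start:
            queue.popleft()
        # Emit a clip once the buffered window covers the required span.
        if i - start + 1 >= span:
            yield [queue[k * stride] for k in range(num_images)]
            start += jump_size


# With num_images=3, stride=2, jump_size=4 the clips cover frame indices
# [0, 2, 4], [4, 6, 8], [8, 10, 12], ...
for clip in clippify(range(14), num_images=3, stride=2, jump_size=4):
    print([idx for idx, _ in clip])
```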