ml.utils.audio

Defines utilities for saving and loading audio streams.

The main API for using this module is:

from ml.utils.audio import read_audio, write_audio

This just uses FFMPEG, so it should be reasonably quick.
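As a rough sketch of how the two functions might be combined, based only on the signatures documented below, the following helper streams one file into another while resampling. The helper name `copy_audio` and its default sample rate are hypothetical and not part of the module.

```python
from pathlib import Path
from typing import Union


def copy_audio(src: Union[str, Path], dst: Union[str, Path], sample_rate: int = 16000) -> None:
    """Streams `src` to `dst`, resampling to `sample_rate` on the way.

    Hypothetical helper; it relies only on the documented signatures.
    """
    # Imported lazily so that defining this helper does not require the package.
    from ml.utils.audio import read_audio, write_audio

    # read_audio yields chunks with shape (channels, num_frames).
    chunks = read_audio(src, sample_rate=sample_rate)
    write_audio(chunks, dst, sample_rate)
```

Because both sides work on iterators, a helper like this should not need to hold the whole file in memory at once.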

class ml.utils.audio.AudioProps(sample_rate: int, channels: int, num_frames: int)

Bases: object

sample_rate: int

channels: int

num_frames: int

classmethod from_file(fpath: str | Path) → AudioProps
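The three fields are enough to derive a clip's duration. The sketch below uses a local stand-in class, `Props`, that mirrors the documented fields rather than importing `AudioProps`, and it assumes `num_frames` counts frames per channel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Props:
    """Local stand-in mirroring the three documented AudioProps fields."""

    sample_rate: int
    channels: int
    num_frames: int


def duration_seconds(props: Props) -> float:
    # Assumption: frames are counted per channel, so the duration
    # does not depend on `channels`.
    return props.num_frames / props.sample_rate


props = Props(sample_rate=16000, channels=1, num_frames=48000)
print(duration_seconds(props))  # 3.0
```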

class ml.utils.audio.AudioFile(path: pathlib.Path, props: ml.utils.audio.AudioProps)

Bases: object

path: Path

props: AudioProps

classmethod parse(line: str) → AudioFile

ml.utils.audio.rechunk_audio(audio_chunks: Iterator[ndarray], *, prefetch_n: int = 1, chunk_length: int | None = None, sample_rate: tuple[int, int] | None = None) → Iterator[ndarray]

Rechunks audio chunks to a new size.

Parameters:

audio_chunks – The input audio chunks.
prefetch_n – The number of samples to prefetch.
chunk_length – The length of the chunks to yield.
sample_rate – If set, resample all chunks to this sample rate. The first argument is the input sample rate and the second argument is the output sample rate.

Yields:

Chunks of waveforms with shape (channels, num_frames).
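To illustrate what rechunking means here, the following is a simplified stand-in written with plain NumPy. It is not the library's implementation: it ignores prefetching and resampling, and it assumes that a shorter trailing remainder is still emitted. The name `rechunk_sketch` is hypothetical.

```python
from typing import Iterable, Iterator

import numpy as np


def rechunk_sketch(chunks: Iterable[np.ndarray], chunk_length: int) -> Iterator[np.ndarray]:
    """Re-slices (channels, num_frames) chunks into fixed-length chunks."""
    buffer = None
    for chunk in chunks:
        # Append the new frames to whatever was left over from earlier chunks.
        buffer = chunk if buffer is None else np.concatenate([buffer, chunk], axis=-1)
        while buffer.shape[-1] >= chunk_length:
            yield buffer[..., :chunk_length]
            buffer = buffer[..., chunk_length:]
    # Assumption: a shorter trailing remainder is still emitted.
    if buffer is not None and buffer.shape[-1] > 0:
        yield buffer


inputs = [np.zeros((1, 5)), np.zeros((1, 7)), np.zeros((1, 3))]
lengths = [c.shape[-1] for c in rechunk_sketch(inputs, chunk_length=4)]
print(lengths)  # [4, 4, 4, 3]
```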

ml.utils.audio.read_audio(in_file: str | Path, *, blocksize: int = 16000, prefetch_n: int = 1, chunk_length: int | None = None, sample_rate: int | None = None) → Iterator[ndarray]

Function that reads an audio file to a stream of numpy arrays using SoundFile.

Parameters:

in_file – Path to the input file.
blocksize – Number of samples to read at a time.
prefetch_n – The number of samples to prefetch.
chunk_length – The length of the chunks to yield.
sample_rate – If set, resample all chunks to this sample rate.

Yields:

Audio chunks as numpy arrays, with shape (channels, num_frames).
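Since the function yields chunks rather than a single array, a caller who wants the full waveform has to join them along the frame axis. A minimal sketch, using a fake generator in place of the real iterator (the name `collect_chunks` is hypothetical):

```python
from typing import Iterable

import numpy as np


def collect_chunks(chunks: Iterable[np.ndarray]) -> np.ndarray:
    """Joins (channels, num_frames) chunks along the frame axis."""
    return np.concatenate(list(chunks), axis=-1)


# Stand-in for the iterator returned by read_audio: three mono chunks.
fake_stream = (np.full((1, n), fill_value=float(n)) for n in (4, 4, 2))
waveform = collect_chunks(fake_stream)
print(waveform.shape)  # (1, 10)
```

Note that `np.concatenate` raises a `ValueError` on an empty list, so an empty stream would need separate handling.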

ml.utils.audio.write_audio(itr: Iterator[ndarray | Tensor], out_file: str | Path, sample_rate: int) → None

Function that writes a stream of audio to a file using SoundFile.

Parameters:

itr – Iterator of audio chunks, with shape (channels, num_frames).
out_file – Path to the output file.
sample_rate – Sampling rate of the audio.
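As an illustration of the expected input shape, the generator below produces mono sine-wave chunks shaped (channels, num_frames). The name `sine_chunks`, the float32 dtype, and the [-1, 1] amplitude range are assumptions made for this sketch, not requirements stated by the module.

```python
from typing import Iterator

import numpy as np


def sine_chunks(freq_hz: float, sample_rate: int, num_chunks: int, chunk_length: int) -> Iterator[np.ndarray]:
    """Yields mono sine-wave chunks with shape (1, chunk_length)."""
    for i in range(num_chunks):
        # Continue the time axis across chunks so the phase stays continuous.
        t = (np.arange(chunk_length) + i * chunk_length) / sample_rate
        yield np.sin(2 * np.pi * freq_hz * t).astype(np.float32)[None, :]


chunks = list(sine_chunks(440.0, sample_rate=16000, num_chunks=3, chunk_length=1600))
print(len(chunks), chunks[0].shape)  # 3 (1, 1600)
```

An iterator like this could then be passed as the `itr` argument, for example `write_audio(sine_chunks(440.0, 16000, 3, 1600), "tone.wav", 16000)`, where the file name is a placeholder.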

ml.utils.audio.get_audio_props(fpath: str | Path) → AudioProps

ml.utils.audio.read_audio_random_order(in_file: str | Path | BinaryIO, chunk_length: int, *, sample_rate: int | None = None, include_last: bool = False) → Iterator[ndarray]

Function that reads a stream of audio from a file in random order.

This is similar to read_audio, but it yields chunks in random order, which can be useful for training purposes.

Parameters:

in_file – Path to the input file.
chunk_length – Size of the chunks to read.
sample_rate – Sampling rate to resample the audio to. If None, will use the sampling rate of the input audio.
include_last – Whether to include the last chunk, even if it’s smaller than chunk_length.

Yields:

Audio chunks as arrays, with shape (n_channels, chunk_length).
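The effect of `include_last` can be sketched by computing chunk boundaries and shuffling them. This is only an illustration: it assumes chunks are aligned to multiples of `chunk_length`, which the library may not do, and the function name and `seed` parameter are hypothetical.

```python
import random
from typing import List, Optional, Tuple


def random_chunk_offsets(
    num_frames: int,
    chunk_length: int,
    include_last: bool = False,
    seed: Optional[int] = None,
) -> List[Tuple[int, int]]:
    """Returns shuffled (start, end) frame spans covering the audio."""
    spans = [(s, min(s + chunk_length, num_frames)) for s in range(0, num_frames, chunk_length)]
    if not include_last:
        # Drop the final span when it is shorter than chunk_length.
        spans = [(s, e) for s, e in spans if e - s == chunk_length]
    random.Random(seed).shuffle(spans)
    return spans


spans = random_chunk_offsets(10, 4, include_last=True, seed=0)
print(sorted(spans))  # [(0, 4), (4, 8), (8, 10)]
```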

class ml.utils.audio.AudioSarFileDataset(sar_file: str | Path, sample_rate: int, length_ms: float, max_iters: int | None = None, channel_idx: int = 0, include_file_fn: Callable[[str, int], bool] | None = None)

Bases: IterableDataset[tuple[Tensor, int, tuple[str, int]]]

Defines a dataset for iterating through audio samples in a SAR file.

This dataset yields samples with shape (num_channels, num_samples), along with the name of the file they were read from.

Parameters:

sar_file – The SAR file to read from.
sample_rate – The sampling rate to resample the audio to.
length_ms – The length of the audio clips in milliseconds.
channel_idx – The index of the channel to use.
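Together, `sample_rate` and `length_ms` determine how many samples each clip contains. A quick sketch of that arithmetic, assuming rounding to the nearest sample (the library may truncate instead); `clip_num_samples` is a hypothetical name:

```python
def clip_num_samples(sample_rate: int, length_ms: float) -> int:
    # Assumption: round to the nearest sample; the library may truncate instead.
    return round(sample_rate * length_ms / 1000)


print(clip_num_samples(16000, 250.0))  # 4000
```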

include_file(name: str, num_bytes: int) → bool

property sar: sarfile

property names: list[str]

class ml.utils.audio.AudioSarFileSpeakerDataset(ds: AudioSarFileDataset)

Bases: IterableDataset[tuple[Tensor, int]], ABC

Defines a dataset with speaker information for a SAR file.

abstract get_speaker_id(name: str, num_bytes: int) → str | int

Returns the speaker ID for a given file.

Parameters:

name – The file entry name.
num_bytes – The number of bytes in the file entry.

Returns:

The speaker ID corresponding to the file.
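A subclass has to supply this mapping itself. As one possible sketch, the standalone function below derives the speaker from the first path component of the entry name. The directory layout ("speaker/.../file.flac") is an assumption made for illustration, and `num_bytes` is accepted only to match the documented signature.

```python
from typing import Union


def speaker_id_from_name(name: str, num_bytes: int) -> Union[str, int]:
    """Takes the first path component as the speaker ID."""
    first = name.split("/", 1)[0]
    # Numeric speaker folders become ints; anything else stays a string.
    return int(first) if first.isdigit() else first


print(speaker_id_from_name("1034/121119/0001.flac", 2048))  # 1034
print(speaker_id_from_name("alice/0001.flac", 2048))  # alice
```

In a real subclass, the same logic would live in the body of the overridden `get_speaker_id` method.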

property num_speakers: int

property ds_iter: AudioSarFileDataset

property speaker_ids: list[str | int]

property speaker_map: dict[str | int, int]

property inv_speaker_map: dict[int, str | int]

property speaker_counts: Counter[str | int]
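The relationship between these properties can be sketched with plain Python containers. The speaker IDs below are made up, and numbering IDs in first-seen order is an assumption; the library may order them differently.

```python
from collections import Counter

# Made-up speaker IDs, one per audio file, mixing str and int as the types allow.
observed = ["alice", 1034, "alice", "bob", 1034, "alice"]

speaker_counts = Counter(observed)
# Assumption: IDs are numbered in first-seen order.
speaker_ids = list(dict.fromkeys(observed))
speaker_map = {sid: i for i, sid in enumerate(speaker_ids)}
inv_speaker_map = {i: sid for sid, i in speaker_map.items()}

print(speaker_map)  # {'alice': 0, 1034: 1, 'bob': 2}
print(speaker_counts["alice"])  # 3
```

Under these assumptions, `num_speakers` would correspond to `len(speaker_ids)`, and `inv_speaker_map` recovers the original ID from a class index.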