ml.utils.audio

Defines utilities for saving and loading audio streams.

The main API for using this module is:

from ml.utils.audio import read_audio, write_audio

This just uses FFMPEG so it should be reasonably quick.

class ml.utils.audio.AudioProps(sample_rate: int, channels: int, num_frames: int)[source]

Bases: object

sample_rate: int
channels: int
num_frames: int
classmethod from_file(fpath: str | Path) AudioProps[source]
class ml.utils.audio.AudioFile(path: pathlib.Path, props: ml.utils.audio.AudioProps)[source]

Bases: object

path: Path
props: AudioProps
classmethod parse(line: str) AudioFile[source]
ml.utils.audio.rechunk_audio(audio_chunks: Iterator[ndarray], *, prefetch_n: int = 1, chunk_length: int | None = None, sample_rate: tuple[int, int] | None = None) Iterator[ndarray][source]

Rechunks audio chunks to a new size.

Parameters:
  • audio_chunks – The input audio chunks.

  • prefetch_n – The number of samples to prefetch.

  • chunk_length – The length of the chunks to yield.

  • sample_rate – If set, resample all chunks to this sample rate. The first argument is the input sample rate and the second argument is the output sample rate.

Yields:

Chunks of waveforms with shape (channels, num_frames).
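
To illustrate what rechunking to a fixed chunk_length means, here is a minimal NumPy sketch. It is not the library's implementation: the helper name rechunk_sketch is hypothetical, resampling and prefetching are left out, and yielding the leftover frames as a final shorter chunk is an assumption.

```python
from collections.abc import Iterator

import numpy as np


def rechunk_sketch(chunks: Iterator[np.ndarray], chunk_length: int) -> Iterator[np.ndarray]:
    """Regroups (channels, num_frames) chunks into chunks of `chunk_length` frames."""
    buffer: list[np.ndarray] = []
    buffered = 0  # Number of frames currently held in the buffer.
    for chunk in chunks:
        buffer.append(chunk)
        buffered += chunk.shape[-1]
        while buffered >= chunk_length:
            joined = np.concatenate(buffer, axis=-1)
            yield joined[..., :chunk_length]
            remainder = joined[..., chunk_length:]
            buffer = [remainder]
            buffered = remainder.shape[-1]
    # Assumption: leftover frames are yielded as a final, shorter chunk.
    if buffered > 0:
        yield np.concatenate(buffer, axis=-1)


# Three uneven two-channel chunks (3 + 5 + 2 frames) regrouped into 4-frame chunks.
inputs = [np.zeros((2, 3)), np.ones((2, 5)), np.zeros((2, 2))]
shapes = [c.shape for c in rechunk_sketch(iter(inputs), chunk_length=4)]
```

The time axis is the last axis, which matches the (channels, num_frames) layout described above.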

ml.utils.audio.read_audio(in_file: str | Path, *, blocksize: int = 16000, prefetch_n: int = 1, chunk_length: int | None = None, sample_rate: int | None = None) Iterator[ndarray][source]

Function that reads an audio file to a stream of numpy arrays using SoundFile.

Parameters:
  • in_file – Path to the input file.

  • blocksize – Number of samples to read at a time.

  • prefetch_n – The number of samples to prefetch.

  • chunk_length – The length of the chunks to yield.

  • sample_rate – If set, resample all chunks to this sample rate.

Yields:

Audio chunks as numpy arrays, with shape (channels, num_frames).

ml.utils.audio.write_audio(itr: Iterator[ndarray | Tensor], out_file: str | Path, sample_rate: int) None[source]

Function that writes a stream of audio to a file using SoundFile.

Parameters:
  • itr – Iterator of audio chunks, with shape (channels, num_frames).

  • out_file – Path to the output file.

  • sample_rate – Sampling rate of the audio.
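
As an example of the input write_audio expects, the sketch below builds an iterator of mono chunks with shape (channels, num_frames). The generator name sine_chunks, the float32 dtype, and the output file name are assumptions made for illustration; the write_audio call is shown as a comment because it needs the ml package and a writable path.

```python
from collections.abc import Iterator

import numpy as np


def sine_chunks(
    sample_rate: int,
    seconds: float,
    chunk_frames: int,
    freq_hz: float = 440.0,
) -> Iterator[np.ndarray]:
    """Yields a sine tone as chunks with shape (1, num_frames)."""
    total = int(sample_rate * seconds)
    for start in range(0, total, chunk_frames):
        stop = min(start + chunk_frames, total)
        t = np.arange(start, stop) / sample_rate
        wave = np.sin(2 * np.pi * freq_hz * t).astype(np.float32)
        yield wave[None, :]  # Add a leading channel axis.


chunks = list(sine_chunks(sample_rate=16000, seconds=0.5, chunk_frames=4000))

# Hypothetical usage, assuming the ml package is installed:
# from ml.utils.audio import write_audio
# write_audio(iter(chunks), "tone.wav", sample_rate=16000)
```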

ml.utils.audio.get_audio_props(fpath: str | Path) AudioProps
ml.utils.audio.read_audio_random_order(in_file: str | Path | BinaryIO, chunk_length: int, *, sample_rate: int | None = None, include_last: bool = False) Iterator[ndarray][source]

Function that reads a stream of audio from a file in random order.

This is similar to read_audio, but it yields chunks in random order, which can be useful for training purposes.

Parameters:
  • in_file – Path to the input file.

  • chunk_length – Size of the chunks to read.

  • sample_rate – Sampling rate to resample the audio to. If None, will use the sampling rate of the input audio.

  • include_last – Whether to include the last chunk, even if it’s smaller than chunk_length.

Yields:

Audio chunks as arrays, with shape (n_channels, chunk_length).
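
One way to picture the effect of chunk_length and include_last is to enumerate the frame ranges a file would be split into and shuffle them. The sketch below does only that; the helper name random_chunk_spans and its seed parameter are hypothetical, and it assumes non-overlapping chunks aligned to multiples of chunk_length, which the library may or may not do.

```python
import random


def random_chunk_spans(
    num_frames: int,
    chunk_length: int,
    include_last: bool = False,
    seed: int = 0,
) -> list[tuple[int, int]]:
    """Returns shuffled (start, end) frame ranges covering the file."""
    spans = [
        (start, min(start + chunk_length, num_frames))
        for start in range(0, num_frames, chunk_length)
    ]
    if not include_last:
        # Drop a trailing chunk that is shorter than chunk_length.
        spans = [(s, e) for s, e in spans if e - s == chunk_length]
    random.Random(seed).shuffle(spans)
    return spans


full_only = random_chunk_spans(num_frames=10, chunk_length=4)
with_last = random_chunk_spans(num_frames=10, chunk_length=4, include_last=True)
```

With 10 frames and chunk_length=4, the default keeps the two full chunks, while include_last=True also keeps the 2-frame tail.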

class ml.utils.audio.AudioSarFileDataset(sar_file: str | Path, sample_rate: int, length_ms: float, max_iters: int | None = None, channel_idx: int = 0, include_file_fn: Callable[[str, int], bool] | None = None)[source]

Bases: IterableDataset[tuple[Tensor, int, tuple[str, int]]]

Defines a dataset for iterating through audio samples in a SAR file.

This dataset yields samples with shape (num_channels, num_samples), along with the name of the file they were read from.

Parameters:
  • sar_file – The SAR file to read from.

  • sample_rate – The sampling rate to resample the audio to.

  • length_ms – The length of the audio clips in milliseconds.

  • channel_idx – The index of the channel to use.
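
The sample_rate and length_ms parameters together determine how many frames each clip contains. A minimal sketch of that arithmetic follows; the helper name clip_num_frames is hypothetical, and rounding to the nearest frame is an assumption about how the dataset converts milliseconds to frames.

```python
def clip_num_frames(sample_rate: int, length_ms: float) -> int:
    """Converts a clip length in milliseconds to a number of frames."""
    # Assumption: round to the nearest whole frame.
    return round(sample_rate * length_ms / 1000)


frames = clip_num_frames(sample_rate=16000, length_ms=250.0)  # 4000 frames
```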

include_file(name: str, num_bytes: int) bool[source]
property sar: sarfile
property names: list[str]
class ml.utils.audio.AudioSarFileSpeakerDataset(ds: AudioSarFileDataset)[source]

Bases: IterableDataset[tuple[Tensor, int]], ABC

Defines a dataset with speaker information for a SAR file.

abstract get_speaker_id(name: str, num_bytes: int) str | int[source]

Returns the speaker ID for a given file.

Parameters:
  • name – The file entry name.

  • num_bytes – The number of bytes in the file entry.

Returns:

The speaker ID corresponding to the file.
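
Because get_speaker_id is abstract, a subclass has to decide how a file entry maps to a speaker. As a sketch of what such a method body might do, the standalone function below assumes a hypothetical naming scheme in which the first path component of the entry name is the speaker, for example "spk_a/utt1.wav". Real datasets may encode the speaker differently.

```python
def speaker_id_from_name(name: str, num_bytes: int) -> str:
    """Extracts the speaker ID from an entry name like 'spk_a/utt1.wav'."""
    # num_bytes is part of the documented signature but is unused here.
    return name.split("/", 1)[0]


speaker = speaker_id_from_name("spk_a/utt1.wav", num_bytes=1234)
```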

property num_speakers: int
property ds_iter: AudioSarFileDataset
property speaker_ids: list[str | int]
property speaker_map: dict[str | int, int]
property inv_speaker_map: dict[int, str | int]
property speaker_counts: Counter[str | int]
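
The relationship between speaker_ids, speaker_map, inv_speaker_map, and speaker_counts can be sketched with plain Python containers. This is an illustration under assumptions rather than the library's code: the helper name build_speaker_maps is hypothetical, and assigning indices in order of first appearance is a guess at the ordering.

```python
from collections import Counter
from typing import Union

SpeakerId = Union[str, int]


def build_speaker_maps(
    ids: list[SpeakerId],
) -> tuple[dict[SpeakerId, int], dict[int, SpeakerId], Counter]:
    """Builds ID-to-index, index-to-ID, and per-speaker count mappings."""
    counts = Counter(ids)
    # dict.fromkeys de-duplicates while preserving first-seen order.
    speaker_map = {sid: idx for idx, sid in enumerate(dict.fromkeys(ids))}
    inv_speaker_map = {idx: sid for sid, idx in speaker_map.items()}
    return speaker_map, inv_speaker_map, counts


ids: list[SpeakerId] = ["alice", "bob", "alice", 7, "bob", "alice"]
speaker_map, inv_speaker_map, speaker_counts = build_speaker_maps(ids)
num_speakers = len(speaker_map)
```

Mapping arbitrary string or integer IDs to contiguous indices is what makes them usable as classification targets.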