ml.utils.audio
Defines utilities for saving and loading audio streams.
The main API for using this module is:
from ml.utils.audio import read_audio, write_audio
This just uses FFmpeg, so it should be reasonably quick.
- class ml.utils.audio.AudioProps(sample_rate: int, channels: int, num_frames: int)
Bases: object
- sample_rate: int
- channels: int
- num_frames: int
- classmethod from_file(fpath: str | Path) → AudioProps
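The three fields are enough to derive a clip's duration. The sketch below uses a stand-in tuple with the same field names rather than the real class; `PropsStandIn` and `duration_seconds` are hypothetical names, and it assumes `num_frames` counts frames per channel.

```python
from typing import NamedTuple


class PropsStandIn(NamedTuple):
    """Hypothetical stand-in with the same field names as AudioProps."""

    sample_rate: int
    channels: int
    num_frames: int


def duration_seconds(props) -> float:
    # Assumes num_frames is the number of frames per channel.
    return props.num_frames / props.sample_rate


props = PropsStandIn(sample_rate=16000, channels=1, num_frames=48000)
duration = duration_seconds(props)
```

With the real module, the same helper should accept the object returned by `AudioProps.from_file`, since it only reads `num_frames` and `sample_rate`.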
- class ml.utils.audio.AudioFile(path: pathlib.Path, props: ml.utils.audio.AudioProps)
Bases: object
- path: Path
- props: AudioProps
- ml.utils.audio.rechunk_audio(audio_chunks: Iterator[ndarray], *, prefetch_n: int = 1, chunk_length: int | None = None, sample_rate: tuple[int, int] | None = None) → Iterator[ndarray]
Rechunks audio chunks to a new size.
- Parameters:
audio_chunks – The input audio chunks.
prefetch_n – The number of samples to prefetch.
chunk_length – The length of the chunks to yield.
sample_rate – If set, resample all chunks to this sample rate. The first element of the tuple is the input sample rate and the second is the output sample rate.
- Yields:
Chunks of waveforms with shape (channels, num_frames).
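To make the rechunking idea concrete, here is a minimal sketch of regrouping a stream of (channels, num_frames) arrays into fixed-length chunks. It is not the library's implementation: `rechunk_sketch` is a hypothetical name, prefetching and resampling are left out, and emitting the leftover frames as a shorter final chunk is an assumption.

```python
from typing import Iterator, Optional

import numpy as np


def rechunk_sketch(chunks: Iterator[np.ndarray], chunk_length: int) -> Iterator[np.ndarray]:
    """Regroups (channels, num_frames) arrays into chunks of chunk_length frames."""
    buffer: Optional[np.ndarray] = None
    for chunk in chunks:
        # Append the new frames along the time axis.
        buffer = chunk if buffer is None else np.concatenate([buffer, chunk], axis=1)
        # Emit as many full-length chunks as the buffer currently holds.
        while buffer.shape[1] >= chunk_length:
            yield buffer[:, :chunk_length]
            buffer = buffer[:, chunk_length:]
    # Assumption: leftover frames are emitted as a final, shorter chunk.
    if buffer is not None and buffer.shape[1] > 0:
        yield buffer


# Three uneven mono chunks (10 frames in total) regrouped into chunks of 4 frames.
pieces = [
    np.arange(0.0, 3.0).reshape(1, 3),
    np.arange(3.0, 8.0).reshape(1, 5),
    np.arange(8.0, 10.0).reshape(1, 2),
]
rechunked = list(rechunk_sketch(iter(pieces), chunk_length=4))
```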
- ml.utils.audio.read_audio(in_file: str | Path, *, blocksize: int = 16000, prefetch_n: int = 1, chunk_length: int | None = None, sample_rate: int | None = None) → Iterator[ndarray]
Reads an audio file as a stream of NumPy arrays using SoundFile.
- Parameters:
in_file – Path to the input file.
blocksize – Number of samples to read at a time.
prefetch_n – The number of samples to prefetch.
chunk_length – The length of the chunks to yield.
sample_rate – If set, resample all chunks to this sample rate.
- Yields:
Audio chunks as NumPy arrays, with shape (channels, num_frames).
- ml.utils.audio.write_audio(itr: Iterator[ndarray | Tensor], out_file: str | Path, sample_rate: int) → None
Writes a stream of audio to a file using SoundFile.
- Parameters:
itr – Iterator of audio chunks, with shape (channels, num_frames).
out_file – Path to the output file.
sample_rate – Sampling rate of the audio.
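The two functions compose naturally: the iterator returned by `read_audio` can be passed straight to `write_audio`. The sketch below relies only on the signatures documented here; `resample_file` is a hypothetical helper name, and it assumes the `ml` package is installed when the function is called.

```python
from pathlib import Path


def resample_file(in_file: Path, out_file: Path, sample_rate: int = 16000) -> None:
    """Copies in_file to out_file, resampled to sample_rate."""
    # Imported lazily so the sketch can be defined without the package installed.
    from ml.utils.audio import read_audio, write_audio

    # read_audio yields (channels, num_frames) arrays resampled to sample_rate.
    chunks = read_audio(in_file, sample_rate=sample_rate)
    # write_audio consumes the iterator and writes it out at the same rate.
    write_audio(chunks, out_file, sample_rate)
```

Passing the same `sample_rate` to both calls keeps the written file consistent with the resampled chunks.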
- ml.utils.audio.get_audio_props(fpath: str | Path) → AudioProps
- ml.utils.audio.read_audio_random_order(in_file: str | Path | BinaryIO, chunk_length: int, *, sample_rate: int | None = None, include_last: bool = False) → Iterator[ndarray]
Reads a stream of audio from a file in random order.
This is similar to read_audio, but it yields chunks in random order, which can be useful for training purposes.
- Parameters:
in_file – Path to the input file, or an open binary file object.
chunk_length – Size of the chunks to read.
sample_rate – Sampling rate to resample the audio to. If None, will use the sampling rate of the input audio.
include_last – Whether to include the last chunk, even if it’s smaller than chunk_length.
- Yields:
Audio chunks as arrays, with shape (n_channels, chunk_length).
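One way to picture random-order reading is to split the file into chunk-sized frame spans and shuffle them before reading. The sketch below only computes the spans; `random_chunk_spans` and its `seed` argument are hypothetical, and treating `include_last=False` as "drop only a short final chunk" is an assumption about the behavior described above.

```python
import random
from typing import List, Optional, Tuple


def random_chunk_spans(
    num_frames: int,
    chunk_length: int,
    include_last: bool = False,
    seed: Optional[int] = None,
) -> List[Tuple[int, int]]:
    """Returns shuffled (start, end) frame spans covering a file of num_frames frames."""
    spans = [(start, min(start + chunk_length, num_frames)) for start in range(0, num_frames, chunk_length)]
    if not include_last:
        # Assumption: only a final chunk shorter than chunk_length is dropped.
        spans = [(start, end) for start, end in spans if end - start == chunk_length]
    random.Random(seed).shuffle(spans)
    return spans


full_only = random_chunk_spans(10, 4, seed=0)
with_last = random_chunk_spans(10, 4, include_last=True, seed=0)
```

Each span could then be read with a seek to `start` followed by a read of `end - start` frames.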
- class ml.utils.audio.AudioSarFileDataset(sar_file: str | Path, sample_rate: int, length_ms: float, max_iters: int | None = None, channel_idx: int = 0, include_file_fn: Callable[[str, int], bool] | None = None)
Bases: IterableDataset[tuple[Tensor, int, tuple[str, int]]]
Defines a dataset for iterating through audio samples in a SAR file.
This dataset yields samples with shape (num_channels, num_samples), along with the name of the file they were read from.
- Parameters:
sar_file – The SAR file to read from.
sample_rate – The sampling rate to resample the audio to.
length_ms – The length of the audio clips in milliseconds.
channel_idx – The index of the channel to use.
- property sar: sarfile
- property names: list[str]
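Since clip length is given in milliseconds while audio arrays are indexed in samples, a conversion like the following is implied. This is a sketch under the assumption that the result is rounded to the nearest sample; `clip_num_samples` is a hypothetical helper, not part of the module.

```python
def clip_num_samples(sample_rate: int, length_ms: float) -> int:
    """Converts a clip length in milliseconds to a number of samples."""
    # Assumption: round to the nearest whole sample.
    return round(sample_rate * length_ms / 1000)


samples_per_clip = clip_num_samples(sample_rate=16000, length_ms=250.0)
```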
- class ml.utils.audio.AudioSarFileSpeakerDataset(ds: AudioSarFileDataset)
Bases: IterableDataset[tuple[Tensor, int]], ABC
Defines a dataset with speaker information for a SAR file.
- abstract get_speaker_id(name: str, num_bytes: int) → str | int
Returns the speaker ID for a given file.
- Parameters:
name – The file entry name.
num_bytes – The number of bytes in the file entry.
- Returns:
The speaker ID corresponding to the file.
- property num_speakers: int
- property ds_iter: AudioSarFileDataset
- property speaker_ids: list[str | int]
- property speaker_map: dict[str | int, int]
- property inv_speaker_map: dict[int, str | int]
- property speaker_counts: Counter[str | int]
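The property types suggest how the speaker bookkeeping fits together: raw speaker IDs (str or int) are mapped to contiguous integer indices, with an inverse map and per-speaker counts alongside. The sketch below shows one plausible construction with plain Python; `per_file_ids` is hypothetical data, and assigning indices in order of first appearance is an assumption, not the documented behavior.

```python
from collections import Counter

# Hypothetical speaker IDs, one per file entry, as get_speaker_id might return them.
per_file_ids = ["alice", "bob", "alice", 7, "bob", "alice"]

# Number of entries per speaker.
speaker_counts = Counter(per_file_ids)

# Assumption: indices follow order of first appearance (dict.fromkeys keeps that order).
speaker_map = {speaker_id: index for index, speaker_id in enumerate(dict.fromkeys(per_file_ids))}

# Inverse lookup from integer index back to the raw speaker ID.
inv_speaker_map = {index: speaker_id for speaker_id, index in speaker_map.items()}

num_speakers = len(speaker_map)
```

First-appearance order sidesteps sorting, which would fail for a mix of str and int IDs.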