ml.loggers.multi

Defines a general logger for munging logged values to an expected format.

This logger handles munging, rate limiting, and multiplexing logged values to each of the implemented child loggers. It is the logging interface that is exposed to the task and model.

ml.loggers.multi.standardize_text(text: str, max_line_length: int | None = None, remove_non_ascii: bool = False) → list[str]

Standardizes a text string to a list of lines.

Parameters:
  • text – The text to standardize

  • max_line_length – If set, truncate lines to this length

  • remove_non_ascii – Remove non-ASCII characters if present

Returns:

The standardized text lines
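As a rough illustration of the behaviour described above, the following pure-Python sketch splits text into lines, optionally drops non-ASCII characters, and truncates each line. The name `standardize_text_sketch` is hypothetical and this is not the library's implementation; stripping whitespace and dropping empty lines are assumptions made for the example.

```python
from __future__ import annotations


def standardize_text_sketch(
    text: str,
    max_line_length: int | None = None,
    remove_non_ascii: bool = False,
) -> list[str]:
    """Hypothetical stand-in that mirrors the documented parameters."""
    if remove_non_ascii:
        # Keep only characters in the 7-bit ASCII range.
        text = "".join(ch for ch in text if ord(ch) < 128)
    # Assumption: lines are stripped and empty lines are dropped.
    lines = [line.strip() for line in text.splitlines()]
    lines = [line for line in lines if line]
    if max_line_length is not None:
        # Truncate (rather than wrap) each line, as the parameter description states.
        lines = [line[:max_line_length] for line in lines]
    return lines


print(standardize_text_sketch("héllo world\n\nsecond line", max_line_length=8, remove_non_ascii=True))
```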

ml.loggers.multi.get_audio_channel(audio: Tensor, channel_select_mode: Literal['first', 'last', 'mean']) → Tensor

For stereo audio, selects a single channel.

Parameters:
  • audio – The audio tensor to select a channel from, with shape (C, L)

  • channel_select_mode – The channel selection mode

Returns:

The selected audio channel

Raises:

ValueError – If the audio shape is invalid
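The three selection modes can be pictured with a small list-based analogue. This is only a sketch: `get_audio_channel_sketch` is a hypothetical name, it works on nested Python lists rather than tensors, and returning a flat sequence of length L is an assumption about the output shape.

```python
from __future__ import annotations

from typing import Literal


def get_audio_channel_sketch(
    audio: list[list[float]],
    channel_select_mode: Literal["first", "last", "mean"],
) -> list[float]:
    """Hypothetical list-based analogue; `audio` has shape (C, L)."""
    if not audio or any(len(ch) != len(audio[0]) for ch in audio):
        raise ValueError("Expected audio with shape (C, L)")
    if channel_select_mode == "first":
        return list(audio[0])
    if channel_select_mode == "last":
        return list(audio[-1])
    if channel_select_mode == "mean":
        # Average across channels at each sample position.
        return [sum(samples) / len(audio) for samples in zip(*audio)]
    raise ValueError(f"Unknown channel select mode: {channel_select_mode}")


stereo = [[0.0, 1.0, 2.0], [2.0, 3.0, 4.0]]
print(get_audio_channel_sketch(stereo, "mean"))
```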

ml.loggers.multi.make_human_viewable_resolution(image: Tensor, interpolation: InterpolationMode = InterpolationMode.BILINEAR, trg_res: tuple[int, int] = (250, 250)) → Tensor

Resizes image to human-viewable resolution.

Parameters:
  • image – The image to resize, with shape (C, H, W)

  • interpolation – Interpolation mode to use for image resizing

  • trg_res – The target image resolution; the image will be resized to have approximately the same area as an image with this resolution

Returns:

The resized image
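One way to read "approximately the same area" is to scale both sides by the square root of the area ratio. The sketch below computes such a target size; the helper `human_viewable_size` is hypothetical, and preserving the aspect ratio and rounding to the nearest pixel are assumptions rather than confirmed library behaviour.

```python
from __future__ import annotations

import math


def human_viewable_size(
    height: int,
    width: int,
    trg_res: tuple[int, int] = (250, 250),
) -> tuple[int, int]:
    """Returns an (H, W) with roughly the area of `trg_res`, keeping the aspect ratio."""
    trg_h, trg_w = trg_res
    # Scaling both sides by sqrt(area ratio) scales the area by the ratio itself.
    scale = math.sqrt((trg_h * trg_w) / (height * width))
    return max(1, round(height * scale)), max(1, round(width * scale))


print(human_viewable_size(32, 64))
```

The resulting size could then be passed to whichever resize routine is in use, together with the chosen interpolation mode.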

ml.loggers.multi.standardize_image(image: Tensor, *, log_key: str | None = None, normalize: bool = True, keep_resolution: bool = False) → Tensor

Converts an arbitrary image to shape (C, H, W).

Parameters:
  • image – The image tensor to log

  • log_key – An optional logging key to use in the exception message

  • normalize – Normalize images to (0, 1)

  • keep_resolution – If set, preserve original image resolution, otherwise change image resolution to human-viewable

Returns:

The normalized image, with shape (C, H, W)

Raises:

ValueError – If the image shape is invalid
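The shape handling can be sketched on shape tuples alone, using the layouts this page lists for image inputs: (C, H, W), (H, W, C) or (H, W), with 1 or 3 channels. The function `standardized_image_shape` is hypothetical, and checking for channels-first before channels-last is an assumption about how ambiguous shapes are resolved.

```python
from __future__ import annotations


def standardized_image_shape(
    shape: tuple[int, ...],
    log_key: str | None = None,
) -> tuple[int, ...]:
    """Returns the (C, H, W) shape that an input of `shape` would be mapped to."""
    if len(shape) == 2:
        # (H, W): add a leading grayscale channel.
        return (1, shape[0], shape[1])
    if len(shape) == 3:
        if shape[0] in (1, 3):
            # Already channels-first.
            return shape
        if shape[2] in (1, 3):
            # Channels-last, so move the channel axis to the front.
            return (shape[2], shape[0], shape[1])
    suffix = "" if log_key is None else f" for key '{log_key}'"
    raise ValueError(f"Invalid image shape{suffix}: {shape}")


print(standardized_image_shape((32, 48, 3)))
```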

ml.loggers.multi.standardize_images(images: Tensor, labels: LabelT, *, max_images: int | None = None, log_key: str | None = None, normalize: bool = True, keep_resolution: bool = False) → tuple[torch.Tensor, LabelT]

Converts an arbitrary set of images to shape (B, C, H, W).

Parameters:
  • images – The image tensor to log

  • labels – The labels for the images

  • max_images – Maximum number of images to select

  • log_key – An optional logging key to use in the exception message

  • normalize – Normalize images to (0, 1)

  • keep_resolution – If set, preserve original image resolution, otherwise change image resolution to human-viewable

Returns:

The normalized images, with shape (B, C, H, W), and the corresponding labels

Raises:

ValueError – If the image shape is invalid

ml.loggers.multi.audio_warning_ticker() → IntervalTicker

ml.loggers.multi.standardize_audio(audio: Tensor, *, log_key: str | None = None) → Tensor

Converts an arbitrary audio tensor to shape (C, T).

Parameters:
  • audio – The audio tensor to log

  • log_key – An optional logging key to use in the exception message

Returns:

The standardized audio tensor, with shape (C, T)

Raises:

ValueError – If the audio shape is invalid

ml.loggers.multi.standardize_audios(audios: Tensor, *, log_key: str | None = None, max_audios: int | None = None) → Tensor

Converts an arbitrary audio tensor to shape (B, C, T).

Parameters:
  • audios – The audio tensor to log

  • log_key – An optional logging key to use in the exception message

  • max_audios – Maximum number of audios to select

Returns:

The standardized audio tensor, with shape (B, C, T)

Raises:

ValueError – If the audio shape is invalid

ml.loggers.multi.separate_with_padding(audio: Tensor, sep_frames: int) → Tensor

Converts a (B, C, T) waveform to (C, B * (T + sep_frames) - sep_frames).

Parameters:
  • audio – The audio tensor to separate

  • sep_frames – Number of frames to insert between each audio tensor

Returns:

The separated audio tensor

Raises:

ValueError – If the audio shape is invalid
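The output length B * (T + sep_frames) - sep_frames follows from placing a separator between clips but not after the last one. A list-based sketch of that layout is shown below; `separate_with_padding_sketch` is a hypothetical name, and filling the separator with zeros (silence) is an assumption.

```python
from __future__ import annotations


def separate_with_padding_sketch(
    audio: list[list[list[float]]],
    sep_frames: int,
) -> list[list[float]]:
    """Hypothetical list-based analogue: (B, C, T) -> (C, B * (T + sep_frames) - sep_frames)."""
    if not audio or not audio[0]:
        raise ValueError("Expected audio with shape (B, C, T)")
    num_channels = len(audio[0])
    out: list[list[float]] = [[] for _ in range(num_channels)]
    for i, clip in enumerate(audio):
        if len(clip) != num_channels:
            raise ValueError("All clips must have the same number of channels")
        for c, channel in enumerate(clip):
            if i > 0:
                # Assumption: the gap between clips is filled with zeros.
                out[c].extend([0.0] * sep_frames)
            out[c].extend(channel)
    return out


clips = [[[1.0, 1.0]], [[2.0, 2.0]], [[3.0, 3.0]]]  # B=3, C=1, T=2
print(separate_with_padding_sketch(clips, sep_frames=1))
```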

ml.loggers.multi.standardize_video(video: Tensor, *, log_key: str | None = None, normalize: bool = True) → Tensor

Converts an arbitrary video to shape (T, C, H, W).

Parameters:
  • video – The video tensor to log

  • log_key – An optional logging key to use in the exception message

  • normalize – Normalize the video frames to (0, 1)

Returns:

The normalized video, with shape (T, C, H, W)

Raises:

ValueError – If the video shape is invalid

ml.loggers.multi.standardize_videos(videos: Tensor, *, max_videos: int | None = None, log_key: str | None = None, normalize: bool = True) → Tensor

Converts an arbitrary set of videos to shape (B, T, C, H, W).

Parameters:
  • videos – The video tensor to log

  • max_videos – Maximum number of videos to select

  • log_key – An optional logging key to use in the exception message

  • normalize – Normalize the video frames to (0, 1)

Returns:

The normalized videos, with shape (B, T, C, H, W)

Raises:

ValueError – If the video shape is invalid

ml.loggers.multi.image_with_text(image: Tensor, text: list[str], max_num_lines: int | None = None, line_spacing: int = 4, centered: bool = True) → Tensor

Adds a text label to an image.

Parameters:
  • image – The image to label, with shape (C, H, W)

  • text – The text label for the image

  • max_num_lines – The number of lines of spacing to add to the bottom of the image

  • line_spacing – The spacing between adjacent lines

  • centered – If set, center the text labels, otherwise align to the left

Returns:

The image with a text label

ml.loggers.multi.normalize_video_fps(video: Tensor | list[torch.Tensor], fps: int | None, length: float | None, stack_dim: int = 0, target_fps: int = 12) → Tensor

Normalizes a video to have a particular FPS.

Parameters:
  • video – The video to normalize, with shape (T, C, H, W)

  • fps – The desired frames per second

  • length – The desired video length, in seconds, at the target FPS

  • target_fps – The target frames per second for the logger

  • stack_dim – Which dimension to stack along, for lists

Returns:

The normalized video

ml.loggers.multi.standardize_point_cloud(value: Tensor, max_points: int, *, log_key: str | None) → Tensor

ml.loggers.multi.make_square_image_or_video(images_or_videos: Tensor, *, sep: int = 0, squareness_weight: float = 1.0, emptiness_weight: float = 1.0) → Tensor

Makes a square image by concatenating all the child images.

This does a simple ternary search to minimize a squareness penalty and an emptiness penalty (i.e., the resulting image should be mostly filled in and also approximately square).

Parameters:
  • images_or_videos – The images tensor, with shape (B, C, H, W) or (B, T, C, H, W)

  • sep – Some optional padding around the images

  • squareness_weight – Weight for number of non-square pixels in penalty

  • emptiness_weight – Weight for number of empty pixels in penalty

Returns:

The square image, with shape (C, H’, W’) or (T, C, H’, W’)
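To make the two penalties concrete, the sketch below picks a grid layout for B equally sized images. It deliberately swaps in an exhaustive search over the number of columns instead of the ternary search mentioned above, ignores `sep`, and uses assumed penalty definitions (pixels outside the largest inscribed square, and pixels in unused grid cells). The function `choose_grid` is hypothetical.

```python
from __future__ import annotations

import math


def choose_grid(
    num_images: int,
    height: int,
    width: int,
    squareness_weight: float = 1.0,
    emptiness_weight: float = 1.0,
) -> tuple[int, int]:
    """Returns (rows, cols) minimizing an assumed squareness + emptiness penalty."""
    if num_images < 1:
        raise ValueError("Need at least one image")

    def penalty(cols: int) -> float:
        rows = math.ceil(num_images / cols)
        grid_h, grid_w = rows * height, cols * width
        # Assumed definition: pixels lying outside the largest square that fits in the grid.
        non_square = grid_h * grid_w - min(grid_h, grid_w) ** 2
        # Pixels belonging to grid cells that hold no image.
        empty = (rows * cols - num_images) * height * width
        return squareness_weight * non_square + emptiness_weight * empty

    # Exhaustive search over column counts (not the ternary search the docs describe).
    best_cols = min(range(1, num_images + 1), key=penalty)
    return math.ceil(num_images / best_cols), best_cols


print(choose_grid(9, 32, 32))
print(choose_grid(2, 10, 20))
```

With nine square images a 3 x 3 grid has zero penalty under these definitions, while two 10 x 20 images stack vertically into a 20 x 20 square.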

ml.loggers.multi.cast_fp32(value: T) → T

class ml.loggers.multi.namespace_context(name: str | None)

Bases: object

class ml.loggers.multi.MultiLogger(default_namespace: str = 'value')

Bases: object

Defines an intermediate container which holds values to log somewhere else.

resolve_namespace(namespace: str | None = None) → str

log_scalar(key: str, value: Callable[[], int | float | Tensor] | int | float | Tensor, *, namespace: str | None = None) → None

Logs a scalar value.

Parameters:
  • key – The key being logged

  • value – The scalar value being logged

  • namespace – An optional logging namespace

log_string(key: str, value: Callable[[], str] | str, *, namespace: str | None = None) → None

Logs a string value.

Parameters:
  • key – The key being logged

  • value – The string value being logged

  • namespace – An optional logging namespace

log_image(key: str, value: Callable[[], Tensor] | Tensor, *, namespace: str | None = None, keep_resolution: bool = False) → None

Logs an image.

Parameters:
  • key – The key being logged

  • value – The image being logged; can be (C, H, W), (H, W, C) or (H, W) as an RGB (3 channel) or grayscale (1 channel) image

  • namespace – An optional logging namespace

  • keep_resolution – If set, keep the image resolution the same, otherwise upscale or downscale the image to a standard resolution

log_labeled_image(key: str, value: Callable[[], tuple[torch.Tensor, str]] | tuple[torch.Tensor, str], *, namespace: str | None = None, max_line_length: int | None = None, keep_resolution: bool = False, centered: bool = True) → None

Logs an image with a label.

Parameters:
  • key – The key being logged

  • value – The image and label being logged; the image can be (C, H, W), (H, W, C) or (H, W) as an RGB (3 channel) or grayscale (1 channel) image

  • namespace – An optional logging namespace

  • max_line_length – Labels longer than this length are wrapped around

  • keep_resolution – If set, keep the image resolution the same, otherwise upscale or downscale the image to a standard resolution

  • centered – If set, center the text labels, otherwise align to the left
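The effect of `max_line_length` on a long label can be approximated with the standard library's `textwrap` module. This is only an illustration of wrapping at a maximum width; whether the logger wraps on word boundaries in the same way is an assumption, and `wrap_label` is a hypothetical helper.

```python
from __future__ import annotations

import textwrap


def wrap_label(label: str, max_line_length: int | None = None) -> list[str]:
    """Splits a label into lines no longer than `max_line_length` characters."""
    if max_line_length is None:
        # No limit: keep the label's own line breaks.
        return label.splitlines()
    lines: list[str] = []
    for line in label.splitlines():
        # textwrap.wrap breaks on whitespace where possible.
        lines.extend(textwrap.wrap(line, width=max_line_length))
    return lines


print(wrap_label("a photo of a small brown dog", max_line_length=10))
```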

log_images(key: str, value: Callable[[], Tensor] | Tensor, *, namespace: str | None = None, keep_resolution: bool = False, max_images: int | None = None, sep: int = 0) → None

Logs a set of images.

The images are tiled to be nearly-square.

Parameters:
  • key – The key being logged

  • value – The images being logged; can be (B, C, H, W), (B, H, W, C) or (B, H, W) as RGB (3 channel) or grayscale (1 channel) images

  • namespace – An optional logging namespace

  • keep_resolution – If set, keep the image resolution the same, otherwise upscale or downscale the image to a standard resolution

  • max_images – The maximum number of images to show; extra images are clipped

  • sep – An optional separation amount between adjacent images

log_labeled_images(key: str, value: Callable[[], tuple[torch.Tensor, Sequence[str]]] | tuple[torch.Tensor, Sequence[str]], *, namespace: str | None = None, max_line_length: int | None = None, keep_resolution: bool = False, max_images: int | None = None, sep: int = 0, centered: bool = True) → None

Logs a set of images with labels.

The images are tiled to be nearly-square.

Parameters:
  • key – The key being logged

  • value – The images and labels being logged; images can be (B, C, H, W), (B, H, W, C) or (B, H, W) as an RGB (3 channel) or grayscale (1 channel) image, with exactly B labels

  • namespace – An optional logging namespace

  • max_line_length – Labels longer than this length are wrapped around

  • keep_resolution – If set, keep the image resolution the same, otherwise upscale or downscale the image to a standard resolution

  • max_images – The maximum number of images to show; extra images are clipped

  • sep – An optional separation amount between adjacent images

  • centered – If set, center the text labels, otherwise align to the left

log_audio(key: str, value: Callable[[], Tensor] | Tensor, *, namespace: str | None = None, sample_rate: int = 44100, log_spec: bool = True, n_fft_ms: float = 32.0, hop_length_ms: float | None = None, channel_select_mode: Literal['first', 'last', 'mean'] = 'first', keep_resolution: bool = False) → None

Logs an audio clip.

Parameters:
  • key – The key being logged

  • value – The audio clip being logged; can be (C, T) or (T) as a mono (1 channel) or stereo (2 channel) audio clip

  • namespace – An optional logging namespace

  • sample_rate – The sample rate of the audio clip

  • log_spec – If set, also log the spectrogram

  • n_fft_ms – FFT size, in milliseconds

  • hop_length_ms – The FFT hop length, in milliseconds

  • channel_select_mode – How to select the channel if the audio is stereo; can be “first”, “last”, or “mean”; this is only used for the spectrogram

  • keep_resolution – If set, keep the resolution of the spectrogram; otherwise, make human-viewable
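Because `n_fft_ms` and `hop_length_ms` are given in milliseconds, they have to be converted to a number of samples using `sample_rate`. The sketch below shows that arithmetic; `ms_to_samples` is a hypothetical helper, and rounding to the nearest sample (rather than truncating) is an assumption. How the hop length is chosen when `hop_length_ms` is None is not stated here, so it is left out.

```python
from __future__ import annotations


def ms_to_samples(duration_ms: float, sample_rate: int = 44100) -> int:
    """Converts a duration in milliseconds to a whole number of samples."""
    return round(sample_rate * duration_ms / 1000.0)


# With the documented defaults (32 ms window at 44.1 kHz):
print(ms_to_samples(32.0))
# The same window length at a 16 kHz sample rate:
print(ms_to_samples(32.0, sample_rate=16000))
```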

log_audios(key: str, value: Callable[[], Tensor] | Tensor, *, namespace: str | None = None, sep_ms: float = 0.0, max_audios: int | None = None, sample_rate: int = 44100, log_spec: bool = True, n_fft_ms: float = 32.0, hop_length_ms: float | None = None, channel_select_mode: Literal['first', 'last', 'mean'] = 'first', spec_sep: int = 0, keep_resolution: bool = False) → None

Logs multiple audio clips.

Parameters:
  • key – The key being logged

  • value – The audio clips being logged; can be (B, C, T) or (B, T) as mono (1 channel) or stereo (2 channel) audio clips, with exactly B clips

  • namespace – An optional logging namespace

  • sep_ms – An optional separation amount between adjacent audio clips, in milliseconds

  • max_audios – An optional maximum number of audio clips to log

  • sample_rate – The sample rate of the audio clip

  • log_spec – If set, also log the spectrogram

  • n_fft_ms – FFT size, in milliseconds

  • hop_length_ms – The FFT hop length, in milliseconds

  • channel_select_mode – How to select the channel if the audio is stereo; can be “first”, “last”, or “mean”; this is only used for the spectrogram

  • spec_sep – An optional separation amount between adjacent spectrograms

  • keep_resolution – If set, keep the resolution of the spectrogram; otherwise, make human-viewable

log_spectrogram(key: str, value: Callable[[], Tensor] | Tensor, *, namespace: str | None = None, sample_rate: int = 44100, n_fft_ms: float = 32.0, hop_length_ms: float | None = None, channel_select_mode: Literal['first', 'last', 'mean'] = 'first', keep_resolution: bool = False) → None

Logs spectrograms of an audio clip.

Parameters:
  • key – The key being logged

  • value – The audio clip being logged; can be (C, T) or (T) as a mono (1 channel) or stereo (2 channel) audio clip

  • namespace – An optional logging namespace

  • sample_rate – The sample rate of the audio clip

  • n_fft_ms – FFT size, in milliseconds

  • hop_length_ms – The FFT hop length, in milliseconds

  • channel_select_mode – How to select the channel if the audio is stereo; can be “first”, “last”, or “mean”; this is only used for the spectrogram

  • keep_resolution – If set, keep the resolution of the spectrogram; otherwise, make human-viewable

log_spectrograms(key: str, value: Callable[[], Tensor] | Tensor, *, namespace: str | None = None, max_audios: int | None = None, sample_rate: int = 44100, n_fft_ms: float = 32.0, hop_length_ms: float | None = None, channel_select_mode: Literal['first', 'last', 'mean'] = 'first', spec_sep: int = 0, keep_resolution: bool = False) → None

Logs spectrograms of audio clips.

Parameters:
  • key – The key being logged

  • value – The audio clips being logged; can be (B, C, T) or (B, T) as mono (1 channel) or stereo (2 channel) audio clips, with exactly B clips

  • namespace – An optional logging namespace

  • max_audios – An optional maximum number of audio clips to log

  • sample_rate – The sample rate of the audio clip

  • n_fft_ms – FFT size, in milliseconds

  • hop_length_ms – The FFT hop length, in milliseconds

  • channel_select_mode – How to select the channel if the audio is stereo; can be “first”, “last”, or “mean”; this is only used for the spectrogram

  • spec_sep – An optional separation amount between adjacent spectrograms

  • keep_resolution – If set, keep the resolution of the spectrogram; otherwise, make human-viewable

log_video(key: str, value: Callable[[], Tensor] | Tensor, *, namespace: str | None = None, fps: int | None = None, length: float | None = None) → None

Logs a video.

Parameters:
  • key – The key being logged

  • value – The video being logged; the video can be (T, C, H, W), (T, H, W, C) or (T, H, W) as an RGB (3 channel) or grayscale (1 channel) video

  • namespace – An optional logging namespace

  • fps – The video frames per second

  • length – The desired video length, in seconds, at the target FPS

log_videos(key: str, value: Callable[[], Tensor | list[torch.Tensor]] | Tensor | list[torch.Tensor], *, namespace: str | None = None, max_videos: int | None = None, sep: int = 0, fps: int | None = None, length: int | None = None) → None

Logs a set of videos.

Parameters:
  • key – The key being logged

  • value – The videos being logged; can be (B, T, C, H, W), (B, T, H, W, C) or (B, T, H, W) as RGB (3 channel) or grayscale (1 channel) videos

  • namespace – An optional logging namespace

  • max_videos – The maximum number of videos to show; extra videos are clipped

  • sep – An optional separation amount between adjacent videos

  • fps – The video frames per second

  • length – The desired video length, in seconds, at the target FPS

log_histogram(key: str, value: Callable[[], Tensor] | Tensor, *, namespace: str | None = None) → None

Logs a histogram.

Parameters:
  • key – The key being logged

  • value – The values to create a histogram from, with arbitrary shape

  • namespace – An optional logging namespace

log_point_cloud(key: str, value: Callable[[], Tensor] | Tensor, *, namespace: str | None = None, max_points: int = 1000) → None

Logs a point cloud.

Parameters:
  • key – The key being logged

  • value – The point cloud values, with shape (N, 3) or (B, …, 3); can pass multiple batches in order to show multiple point clouds

  • namespace – An optional logging namespace

  • max_points – An optional maximum number of points in the point cloud

write_dict(loggers: list[ml.loggers.base.BaseLogger], values: dict[str, dict[str, Callable[[], LogT]]], state: State, func: Callable[[BaseLogger], Callable[[str, Callable[[], LogT], State, str], None]]) → None

write(loggers: list[ml.loggers.base.BaseLogger], state: State) → None
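The `values` argument of `write_dict` is typed as a mapping from namespace to key to a zero-argument callable, and every `log_*` method accepts either a value or a callable producing one. A toy container can illustrate why that is useful: values are only computed when something is written. `ToyMultiLogger` below is a hypothetical miniature, not the library class; it handles scalars only, writes into a plain dict instead of child loggers, and the "namespace/key" naming is an assumption.

```python
from __future__ import annotations

from typing import Callable, Union

Scalar = Union[int, float]


class ToyMultiLogger:
    """Hypothetical miniature container; not the library's MultiLogger."""

    def __init__(self, default_namespace: str = "value") -> None:
        self.default_namespace = default_namespace
        # namespace -> key -> zero-argument callable producing the value.
        self.scalars: dict[str, dict[str, Callable[[], Scalar]]] = {}

    def resolve_namespace(self, namespace: str | None = None) -> str:
        return self.default_namespace if namespace is None else namespace

    def log_scalar(
        self,
        key: str,
        value: Callable[[], Scalar] | Scalar,
        *,
        namespace: str | None = None,
    ) -> None:
        # Wrap plain values so that everything stored is a callable.
        thunk = value if callable(value) else (lambda: value)
        self.scalars.setdefault(self.resolve_namespace(namespace), {})[key] = thunk

    def write(self, sink: dict[str, Scalar]) -> None:
        # Values are only computed here, when something is actually written.
        for namespace, values in self.scalars.items():
            for key, thunk in values.items():
                sink[f"{namespace}/{key}"] = thunk()
        self.scalars.clear()


logger = ToyMultiLogger()
logger.log_scalar("loss", lambda: 0.25, namespace="train")  # deferred
logger.log_scalar("lr", 0.001)  # plain value, stored under the default namespace
sink: dict[str, Scalar] = {}
logger.write(sink)
print(sink)
```

Deferring evaluation this way means an expensive value is never computed on steps where nothing is written, which fits the rate limiting described in the module overview.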