Tensorboard not working with Trainer Pattern #20809

Open

GeraudK opened this issue Jan 24, 2025 · 2 comments

GeraudK commented Jan 24, 2025

I'm using the Keras Trainer pattern as illustrated here. The issue with this pattern is that when you use TensorBoard, only the top-level weights are recorded.

The reason is that TensorBoard records the weights for all the layers in self.model.layers here. But this list is equal to [<Sequential name=sequential, built=True>], and the weights for that Sequential object are [].
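A quick way to see this (hypothetical inspection, using the trainer_1 defined in the full example below, not part of the original report):

print(trainer_1.layers)
# prints [<Sequential name=sequential, built=True>], as described above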

I tried several things:

  1. Passing a CallbackList to the TensorFlow trainer when calling fit, built with model_a instead of trainer_1, but this fails because model_a has no optimizer.
  2. Overriding the layers property on the Trainer object to use recursive=True, but the weights were still not showing in TensorBoard, suggesting that something else is going on (a sketch of this attempt follows below).

I'm open to any suggestions here.
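For reference, a minimal sketch of the override attempted in item 2 (it relies on _flatten_layers, a private Keras helper that the stock Model.layers property calls with recursive=False, so treat this as an assumption about Keras internals):

import keras

class MyTrainer(keras.Model):
    def __init__(self, model):
        super().__init__()
        self.model = model

    @property
    def layers(self):
        # Recurse into sublayers so that a callback iterating
        # self.model.layers sees the inner Dense layers instead
        # of just the single Sequential wrapper.
        return list(self._flatten_layers(include_self=False, recursive=True))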

Full example:

import os

os.environ["KERAS_BACKEND"] = "tensorflow"

import tensorflow as tf
import keras
from keras.callbacks import TensorBoard

# Load MNIST dataset and standardize the data
mnist = keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

class MyTrainer(keras.Model):
    def __init__(self, model):
        super().__init__()
        self.model = model
        # Create loss and metrics here.
        self.loss_fn = keras.losses.SparseCategoricalCrossentropy()
        self.accuracy_metric = keras.metrics.SparseCategoricalAccuracy()

    @property
    def metrics(self):
        # List metrics here.
        return [self.accuracy_metric]

    def train_step(self, data):
        x, y = data
        with tf.GradientTape() as tape:
            y_pred = self.model(x, training=True)  # Forward pass
            # Compute loss value
            loss = self.loss_fn(y, y_pred)

        # Compute gradients
        trainable_vars = self.trainable_variables
        gradients = tape.gradient(loss, trainable_vars)

        # Update weights
        self.optimizer.apply_gradients(zip(gradients, trainable_vars))

        # Update metrics
        for metric in self.metrics:
            metric.update_state(y, y_pred)

        # Return a dict mapping metric names to current value.
        return {m.name: m.result() for m in self.metrics}

    def test_step(self, data):
        x, y = data

        # Inference step
        y_pred = self.model(x, training=False)

        # Update metrics
        for metric in self.metrics:
            metric.update_state(y, y_pred)
        return {m.name: m.result() for m in self.metrics}

    def call(self, x):
        # Equivalent to `call()` of the wrapped keras.Model
        x = self.model(x)
        return x

model_a = keras.models.Sequential(
    [
        keras.layers.Flatten(input_shape=(28, 28)),
        keras.layers.Dense(256, activation="relu"),
        keras.layers.Dropout(0.2),
        keras.layers.Dense(10, activation="softmax"),
    ]
)

callbacks = [TensorBoard(histogram_freq=1)]
trainer_1 = MyTrainer(model_a)
trainer_1.compile(optimizer=keras.optimizers.SGD())
trainer_1.fit(
    x_train, y_train, epochs=5, batch_size=64, validation_data=(x_test, y_test), callbacks=callbacks,
)
GeraudK (Author) commented Jan 29, 2025

After some more investigation I found out that nothing is wrong with TensorBoard or the Trainer pattern; the issue resides in how the weights are named (basically, they aren't unique). In the TensorBoard callback, when you look at the self.model.layers[0].weights object, you find the following list:

[<KerasVariable shape=(784, 256), dtype=float32, path=sequential_2/dense_4/kernel>,
 <KerasVariable shape=(256,), dtype=float32, path=sequential_2/dense_4/bias>,
 <KerasVariable shape=(256, 10), dtype=float32, path=sequential_2/dense_5/kernel>,
 <KerasVariable shape=(10,), dtype=float32, path=sequential_2/dense_5/bias>]

Now if you look at the name of each weight you will find the following:

['kernel', 'bias', 'kernel', 'bias']

This leads TensorBoard to save everything under the same names: kernel and bias.
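The collision is easy to reproduce from the example above (hypothetical inspection snippet, reusing trainer_1):

seq = trainer_1.layers[0]  # the wrapped Sequential model
print([w.path for w in seq.weights])
# unique: ['sequential_2/dense_4/kernel', 'sequential_2/dense_4/bias', ...]
print([w.name for w in seq.weights])
# not unique: ['kernel', 'bias', 'kernel', 'bias']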

[Screenshot: TensorBoard histograms grouped under the shared tags kernel and bias]

GeraudK (Author) commented Jan 29, 2025

I fixed it like this, but obviously this does not address the underlying cause:

from keras.callbacks import TensorBoard as Base
from pathlib import Path

__all__ = ["TensorBoard"]


def find_new_name(weights_names, name: str) -> str:
    # Return name unchanged if it is unused; otherwise append a
    # numeric suffix until the result no longer collides.
    if name not in weights_names:
        return name

    for i in range(1, 1000):
        new_name = f"{name}_{i}"
        if new_name not in weights_names:
            return new_name
    raise ValueError("Could not find a new name")


class TensorBoard(Base):
    # Class-level (shared) cache mapping id(weight) -> unique tag name.
    weights_names = {}

    def get_weight_name(self, weight):
        key = id(weight)
        name = self.weights_names.get(key)
        values = self.weights_names.values()

        if name is None:
            # Keep only the trailing path components (layer/weight),
            # dropping the outer model prefixes, then de-duplicate.
            name = str(Path(*Path(weight.path).parts[2:]))
            name = find_new_name(values, name)
            self.weights_names[key] = name

        return name

    def _log_weights(self, epoch):
        """Logs the weights of the Model to TensorBoard."""
        with self._train_writer.as_default():
            for layer in self.model.layers:
                for weight in layer.weights:
                    weight_name = self.get_weight_name(weight)
                    # Add a suffix to prevent summary tag name collision.
                    histogram_weight_name = weight_name + "/histogram"
                    self.summary.histogram(histogram_weight_name, weight, step=epoch)
                    if self.write_images:
                        # Add a suffix to prevent summary tag name
                        # collision.
                        image_weight_name = weight_name + "/image"
                        self._log_weight_as_image(weight, image_weight_name, epoch)
            self._train_writer.flush()
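Usage is then a drop-in replacement for the stock callback; for example (assuming the subclass above lives in a local module, here hypothetically named patched_tensorboard):

from patched_tensorboard import TensorBoard  # hypothetical module holding the subclass above

trainer_1.fit(
    x_train, y_train, epochs=5, batch_size=64,
    validation_data=(x_test, y_test),
    callbacks=[TensorBoard(histogram_freq=1)],
)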
