Tensorboard not working with Trainer Pattern #20809

Open

GeraudK opened this issue Jan 24, 2025 · 2 comments

GeraudK commented Jan 24, 2025

I'm using the Keras Trainer pattern as illustrated here. The issue with this pattern is that when you use TensorBoard, only the top-level weights are recorded.

The reason is that TensorBoard records the weights for all the layers in self.model.layers here. But this list is equal to [<Sequential name=sequential, built=True>], and the weights for that Sequential object are [].
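A quick way to see this (hypothetical inspection, using the trainer_1 defined in the full example below, not part of the original report):

print(trainer_1.layers)
# prints [<Sequential name=sequential, built=True>], as described above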

I tried several things:

  1. Passing a CallbackList to the TensorFlow trainer when calling fit, built with model_a instead of trainer_1, but this fails because model_a has no optimizer.
  2. Overriding the layers property on the Trainer object to use recursive=True, but the weights were still not showing in TensorBoard, suggesting that something else is going on (a sketch of this attempt follows below).

I'm open to any suggestions here.
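For reference, a minimal sketch of the override attempted in item 2 (it relies on _flatten_layers, a private Keras helper that the stock Model.layers property calls with recursive=False, so treat this as an assumption about Keras internals):

import keras

class MyTrainer(keras.Model):
    def __init__(self, model):
        super().__init__()
        self.model = model

    @property
    def layers(self):
        # Recurse into sublayers so that a callback iterating
        # self.model.layers sees the inner Dense layers instead
        # of just the single Sequential wrapper.
        return list(self._flatten_layers(include_self=False, recursive=True))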

Full example:

import os

os.environ["KERAS_BACKEND"] = "tensorflow"

import tensorflow as tf
import keras
from keras.callbacks import TensorBoard

# Load MNIST dataset and standardize the data
mnist = keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

class MyTrainer(keras.Model):
    def __init__(self, model):
        super().__init__()
        self.model = model
        # Create loss and metrics here.
        self.loss_fn = keras.losses.SparseCategoricalCrossentropy()
        self.accuracy_metric = keras.metrics.SparseCategoricalAccuracy()

    @property
    def metrics(self):
        # List metrics here.
        return [self.accuracy_metric]

    def train_step(self, data):
        x, y = data
        with tf.GradientTape() as tape:
            y_pred = self.model(x, training=True)  # Forward pass
            # Compute loss value
            loss = self.loss_fn(y, y_pred)

        # Compute gradients
        trainable_vars = self.trainable_variables
        gradients = tape.gradient(loss, trainable_vars)

        # Update weights
        self.optimizer.apply_gradients(zip(gradients, trainable_vars))

        # Update metrics
        for metric in self.metrics:
            metric.update_state(y, y_pred)

        # Return a dict mapping metric names to current value.
        return {m.name: m.result() for m in self.metrics}

    def test_step(self, data):
        x, y = data

        # Inference step
        y_pred = self.model(x, training=False)

        # Update metrics
        for metric in self.metrics:
            metric.update_state(y, y_pred)
        return {m.name: m.result() for m in self.metrics}

    def call(self, x):
        # Equivalent to `call()` of the wrapped keras.Model
        x = self.model(x)
        return x

model_a = keras.models.Sequential(
    [
        keras.layers.Flatten(input_shape=(28, 28)),
        keras.layers.Dense(256, activation="relu"),
        keras.layers.Dropout(0.2),
        keras.layers.Dense(10, activation="softmax"),
    ]
)

callbacks = [TensorBoard(histogram_freq=1)]
trainer_1 = MyTrainer(model_a)
trainer_1.compile(optimizer=keras.optimizers.SGD())
trainer_1.fit(
    x_train, y_train, epochs=5, batch_size=64, validation_data=(x_test, y_test), callbacks=callbacks,
)
GeraudK (Author) commented Jan 29, 2025

After some more investigation I found out that nothing is wrong with TensorBoard or the Trainer pattern; the issue resides in how the weights are named (basically, they aren't unique). In the TensorBoard callback, when you look at the self.model.layers[0].weights object, you find the following list:

[<KerasVariable shape=(784, 256), dtype=float32, path=sequential_2/dense_4/kernel>,
 <KerasVariable shape=(256,), dtype=float32, path=sequential_2/dense_4/bias>,
 <KerasVariable shape=(256, 10), dtype=float32, path=sequential_2/dense_5/kernel>,
 <KerasVariable shape=(10,), dtype=float32, path=sequential_2/dense_5/bias>]

Now if you look at the name of each weight you will find the following:

['kernel', 'bias', 'kernel', 'bias']

This leads TensorBoard to save everything under the same names: kernel and bias.
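The collision is easy to reproduce from the example above (hypothetical inspection snippet, reusing trainer_1):

seq = trainer_1.layers[0]  # the wrapped Sequential model
print([w.path for w in seq.weights])
# unique: ['sequential_2/dense_4/kernel', 'sequential_2/dense_4/bias', ...]
print([w.name for w in seq.weights])
# not unique: ['kernel', 'bias', 'kernel', 'bias']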

[Screenshot: TensorBoard histograms grouped under the shared tags kernel and bias]

GeraudK (Author) commented Jan 29, 2025

I fixed it like this, but obviously this does not address the underlying cause:

from keras.callbacks import TensorBoard as Base
from pathlib import Path

__all__ = ["TensorBoard"]


def find_new_name(weights_names, name: str) -> str:
    # Return name unchanged if it is unused; otherwise append a
    # numeric suffix until the result no longer collides.
    if name not in weights_names:
        return name

    for i in range(1, 1000):
        new_name = f"{name}_{i}"
        if new_name not in weights_names:
            return new_name
    raise ValueError("Could not find a new name")


class TensorBoard(Base):
    # Class-level (shared) cache mapping id(weight) -> unique tag name.
    weights_names = {}

    def get_weight_name(self, weight):
        key = id(weight)
        name = self.weights_names.get(key)
        values = self.weights_names.values()

        if name is None:
            # Keep only the trailing path components (layer/weight),
            # dropping the outer model prefixes, then de-duplicate.
            name = str(Path(*Path(weight.path).parts[2:]))
            name = find_new_name(values, name)
            self.weights_names[key] = name

        return name

    def _log_weights(self, epoch):
        """Logs the weights of the Model to TensorBoard."""
        with self._train_writer.as_default():
            for layer in self.model.layers:
                for weight in layer.weights:
                    weight_name = self.get_weight_name(weight)
                    # Add a suffix to prevent summary tag name collision.
                    histogram_weight_name = weight_name + "/histogram"
                    self.summary.histogram(histogram_weight_name, weight, step=epoch)
                    if self.write_images:
                        # Add a suffix to prevent summary tag name
                        # collision.
                        image_weight_name = weight_name + "/image"
                        self._log_weight_as_image(weight, image_weight_name, epoch)
            self._train_writer.flush()
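Usage is then a drop-in replacement for the stock callback; for example (assuming the subclass above lives in a local module, here hypothetically named patched_tensorboard):

from patched_tensorboard import TensorBoard  # hypothetical module holding the subclass above

trainer_1.fit(
    x_train, y_train, epochs=5, batch_size=64,
    validation_data=(x_test, y_test),
    callbacks=[TensorBoard(histogram_freq=1)],
)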
