
tensorflow - Different loss values computed by train_on_batch and evaluate

Evaluating before training on a batch, training on that batch, and evaluating again afterwards return different loss values.

pre_train_loss = model.evaluate(batch_x, batch_y, verbose=0)
train_loss = model.train_on_batch(batch_x, batch_y)
post_train_loss = model.evaluate(batch_x, batch_y, verbose=0)


Pre batch train loss  : 2.3195652961730957
train_on_batch loss   : 2.3300909996032715
Post batch train loss : 2.2722578048706055

I assumed train_on_batch returns the loss computed before the parameter update (i.e. before backpropagation), but pre_train_loss and train_loss are not exactly the same. Moreover, all three loss values are different.

Is my assumption about train_on_batch right? If so, why are all the loss values different?

Colab example
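
For context, model, batch_x and batch_y are defined in the linked Colab; a minimal, self-contained setup that reproduces the same comparison could look like the following (the architecture and the random data here are hypothetical stand-ins, not the exact Colab code):

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical small classifier; the Dropout layer is the important part
model = tf.keras.Sequential([
    layers.Dense(64, activation="relu", input_shape=(784,)),
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Random stand-in batch
batch_x = np.random.rand(32, 784).astype("float32")
batch_y = np.random.randint(0, 10, size=(32,))

pre_train_loss = model.evaluate(batch_x, batch_y, verbose=0)
train_loss = model.train_on_batch(batch_x, batch_y)
post_train_loss = model.evaluate(batch_x, batch_y, verbose=0)

print("Pre batch train loss  :", pre_train_loss)
print("train_on_batch loss   :", train_loss)
print("Post batch train loss :", post_train_loss)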

Question from: https://stackoverflow.com/questions/65902096/different-loss-values-computed-by-train-on-batch-and-evaluate


1 Reply


Let me give you a detailed explanation of what is going on.

Calling model.evaluate (or model.test_on_batch) invokes model.make_test_function, which in turn invokes model.test_step, and that function does the following:

y_pred = self(x, training=False)
# Updates stateful loss metrics.
self.compiled_loss(
    y, y_pred, sample_weight, regularization_losses=self.losses)

Calling model.train_on_batch invokes model.make_train_function, which in turn invokes model.train_step, and that function does the following:

with backprop.GradientTape() as tape:
  y_pred = self(x, training=True)
  loss = self.compiled_loss(
      y, y_pred, sample_weight, regularization_losses=self.losses)

As you can see from the source code above, the only difference between model.test_step and model.train_step when computing the loss is whether training=True is set when the data is forward-passed through the model.
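
You can reproduce both code paths by hand and confirm that the discrepancy comes from the training flag alone. A rough sketch (assuming model, batch_x and batch_y from the question, a sparse categorical crossentropy loss, and ignoring regularization losses):

import tensorflow as tf

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

# What evaluate / test_on_batch effectively reports: inference-mode forward pass
eval_like_loss = loss_fn(batch_y, model(batch_x, training=False))

# What train_on_batch reports: training-mode forward pass (dropout active)
train_like_loss = loss_fn(batch_y, model(batch_x, training=True))

print(float(eval_like_loss), float(train_like_loss))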

Some neural network layers behave differently during training and inference (e.g. Dropout and BatchNormalization layers), so the training argument lets those layers know which of the two "paths" they should take, e.g.:

  • During training, dropout will randomly drop out units and correspondingly scale up activations of the remaining units.

  • During inference, it does nothing (since you usually don't want the randomness of dropping out units here).

Since your model contains a Dropout layer, the higher loss in training mode is expected.
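
To see this in isolation, you can call a Dropout layer directly with both values of the training argument (standalone sketch):

import tensorflow as tf

dropout = tf.keras.layers.Dropout(0.5)
x = tf.ones((1, 6))

# Inference mode: the layer is an identity, output is all ones
print(dropout(x, training=False).numpy())

# Training mode: roughly half the units are zeroed, the rest scaled up to 2.0
print(dropout(x, training=True).numpy())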

If you remove the line layers.Dropout(0.5) when defining the model, you will see that the losses are nearly identical (up to small floating point precision mismatches), e.g. the outputs of three epochs:

Epoch: 1
Pre batch train loss  : 1.6852061748504639
train_on_batch loss   : 1.6852061748504639
Post batch train loss : 1.6012675762176514

Pre batch train loss  : 1.7325702905654907
train_on_batch loss   : 1.7325704097747803
Post batch train loss : 1.6512296199798584

Epoch: 2
Pre batch train loss  : 1.5149778127670288
train_on_batch loss   : 1.5149779319763184
Post batch train loss : 1.4209072589874268

Pre batch train loss  : 1.567994475364685
train_on_batch loss   : 1.5679945945739746
Post batch train loss : 1.4767804145812988

Epoch: 3
Pre batch train loss  : 1.3269715309143066
train_on_batch loss   : 1.3269715309143066
Post batch train loss : 1.2274967432022095

Pre batch train loss  : 1.3868262767791748
train_on_batch loss   : 1.3868262767791748
Post batch train loss : 1.2916004657745361
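
If you would rather check this programmatically than compare the printed numbers, a rough sketch (assuming the same model as before but defined without the Dropout layer):

import numpy as np

# With no Dropout (and no BatchNormalization) in the model, the loss reported
# by train_on_batch should match the preceding evaluate up to floating point noise.
pre_train_loss = model.evaluate(batch_x, batch_y, verbose=0)
train_loss = model.train_on_batch(batch_x, batch_y)

assert np.isclose(pre_train_loss, train_loss, rtol=1e-5), (pre_train_loss, train_loss)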

Reference:

Documentation and source code of tf.keras.Model

What does training=True mean when calling a TensorFlow Keras model?

