
Running model.evaluate many times gives different accuracy and loss values (TensorFlow 2)

Data Science, asked by hamid.khb on March 23, 2021

I have trained a CNN, using dataset = tf.data.Dataset.from_tensor_slices((data, label)) to create the dataset. Training went well, but evaluating the model on the test dataset gives different values every time, without anything changing in the test dataset or the network, and I am not using Dropout or BatchNormalization.

Here is my code, if necessary:

```
import tensorflow as tf
from tensorflow.keras.layers import Input, Conv2D, MaxPool2D, Flatten, Dense

model = tf.keras.Sequential([
    Input((1, 30, 30)),
    Conv2D(filters=8, kernel_size=(3, 3), padding="same", activation="relu", name="c1", data_format="channels_first"),
    Conv2D(filters=16, kernel_size=(3, 3), padding="same", activation="relu", name="c2", data_format="channels_first"),
    MaxPool2D(pool_size=(2, 2), strides=(1, 1), padding="same", name="m1", data_format="channels_first"),

    Conv2D(filters=16, kernel_size=(3, 3), padding="same", activation="relu", name="c3", data_format="channels_first"),
    MaxPool2D(pool_size=(2, 2), strides=(1, 1), padding="same", name="m2", data_format="channels_first"),

    Flatten(),
    Dense(256, activation="relu", use_bias=True),
    Dense(5, use_bias=True)])  # logits; loss uses from_logits=True

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=["accuracy"])
model.fit(train_data, verbose=1, validation_data=valid_data, epochs=20)

model.evaluate(test_data)
```

How I made the Dataset:

```
def split_dataset(dataset: tf.data.Dataset, validation_data_fraction: float):

    validation_data_percent = round(validation_data_fraction * 100)
    if not (0 <= validation_data_percent <= 100):
        raise ValueError("validation data fraction must be ∈ [0,1]")

    dataset = dataset.enumerate()
    train_dataset = dataset.filter(lambda f, data: f % 100 >= validation_data_percent)
    validation_dataset = dataset.filter(lambda f, data: f % 100 < validation_data_percent)

    # remove enumeration
    train_dataset = train_dataset.map(lambda f, data: data)
    validation_dataset = validation_dataset.map(lambda f, data: data)

    return train_dataset, validation_dataset

def load_data(path):
    data, label = data_prep(path)
    dataset = tf.data.Dataset.from_tensor_slices((data, label))
    dataset = dataset.shuffle(100000)
    train_dataset, rest = split_dataset(dataset, 0.3)
    test_dataset, valid_dataset = split_dataset(rest, 0.5)
    train_data = train_dataset.shuffle(1000).batch(10)
    valid_data = valid_dataset.batch(10)
    test_data = test_dataset.batch(10)
    return train_data, valid_data, test_data
```
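To make the modulo split concrete, here is a small check (my addition, not part of the original question) using split_dataset as defined above; with a fraction of 0.3, enumeration indices 0 through 29 of every block of 100 go to the second split:

```
import tensorflow as tf

ds = tf.data.Dataset.range(10)
train, val = split_dataset(ds, 0.3)
print(list(train.as_numpy_iterator()))  # [3, 4, 5, 6, 7, 8, 9]
print(list(val.as_numpy_iterator()))    # [0, 1, 2]
```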

For example, running model.evaluate(test_data) five times in a row gives:

```
885/Unknown - 2s 2ms/step - loss: 0.1039 - accuracy: 0.9663
885/Unknown - 2s 2ms/step - loss: 0.0959 - accuracy: 0.9675
885/Unknown - 2s 2ms/step - loss: 0.0999 - accuracy: 0.9661
885/Unknown - 2s 2ms/step - loss: 0.0888 - accuracy: 0.9688
885/Unknown - 2s 2ms/step - loss: 0.0799 - accuracy: 0.9715
```

2 Answers

If you train your model and then run model.evaluate N times (without retraining the model), you should get the same result each time, PROVIDED your test data is the SAME each time. However, if you train your model, then run evaluate, and repeat that combination N times, the results will vary due to the random weight initialization of the network.
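As a minimal sketch (my addition, not part of this answer) of how to remove that source of randomness, the global seeds can be reset before each model build so that repeated train-plus-evaluate runs start from identical weights; this assumes TensorFlow 2.7 or later, where tf.keras.utils.set_random_seed is available:

```
import numpy as np
import tensorflow as tf

def build_model():
    # Same architecture on each call; the initial weights are drawn
    # from the framework's random state at build time.
    return tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(3),
    ])

tf.keras.utils.set_random_seed(42)  # seeds Python, NumPy and TensorFlow
m1 = build_model()

tf.keras.utils.set_random_seed(42)  # reset before the second build
m2 = build_model()

# Both models now start from identical initial weights.
for w1, w2 in zip(m1.get_weights(), m2.get_weights()):
    assert np.array_equal(w1, w2)
```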

Answered by Gerry P on March 23, 2021

The problem lies in your first shuffle of the whole dataset. Can you inspect your test_data just before calling model.evaluate(test_data), for example with list(test_data.as_numpy_iterator())? My assumption is that this would yield different results every time you call it. In other words: your model is fine, but your dataset is different each time, most likely because you use dataset.shuffle without a seed and without deactivating reshuffle_each_iteration. The former explains differences between runs, the latter differences within a run.
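The reshuffling effect is easy to see in isolation; here is a small sketch (my addition, assuming TensorFlow 2.x):

```
import tensorflow as tf

ds = tf.data.Dataset.range(5)

# Default (reshuffle_each_iteration=True): every pass reorders the
# data, so two iterations over the same pipeline disagree.
reshuffling = ds.shuffle(5, seed=1)
print(list(reshuffling.as_numpy_iterator()))  # e.g. [2, 0, 4, 1, 3]
print(list(reshuffling.as_numpy_iterator()))  # typically a different order

# With reshuffle_each_iteration=False, the seeded order is fixed,
# so every pass over the pipeline is identical.
stable = ds.shuffle(5, seed=1, reshuffle_each_iteration=False)
print(list(stable.as_numpy_iterator()))
print(list(stable.as_numpy_iterator()))  # same order as above
```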

My suggestion would be something like:

```
seed = 42

def load_data(path):
    data, label = data_prep(path)
    dataset = tf.data.Dataset.from_tensor_slices((data, label))
    # shuffle your dataset **once**, but reliably so that each run yields the same results
    dataset = dataset.shuffle(100000, seed=seed, reshuffle_each_iteration=False)
    train_dataset, rest = split_dataset(dataset, 0.3)
    test_dataset, valid_dataset = split_dataset(rest, 0.5)
    # (re)shuffle only the training set, but again, using a seed
    train_data = train_dataset.shuffle(1000, seed=seed).batch(10)
    valid_data = valid_dataset.batch(10)
    test_data = test_dataset.batch(10)
    return train_data, valid_data, test_data
```
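As a quick check (my addition, not part of the original answer), the now-deterministic pipeline can be verified by iterating over the test set twice; with a fixed test set, repeated calls to model.evaluate on the same trained model will agree:

```
import numpy as np

train_data, valid_data, test_data = load_data(path)

# The test set should now be identical on every pass.
first = [y for _, y in test_data.unbatch().as_numpy_iterator()]
second = [y for _, y in test_data.unbatch().as_numpy_iterator()]
assert np.array_equal(first, second)
```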

Answered by Christian Steinmeyer on March 23, 2021
