A guide to transfer learning with Keras using ResNet50

Hamdi Ghorbel
Feb 18, 2021

Abstract

In this blog we present a guide to transfer learning, with an example implementation in Keras using ResNet50 as the pre-trained model. The case is to transfer the learning of a ResNet50 trained on ImageNet to a model that identifies images from the CIFAR-10 dataset. The final model in this blog reaches an accuracy of 94% on the test set.

Introduction

Humans have an inherent ability to transfer knowledge across tasks. What we acquire as knowledge while learning about one task, we utilize in the same way to solve related tasks. The more related the tasks, the easier it is for us to transfer, or cross-utilize our knowledge.

In all of these scenarios, we don't learn everything from scratch when we attempt to learn new aspects or topics. We transfer and leverage our knowledge from what we have learnt in the past!

Conventional machine learning and deep learning algorithms have traditionally been designed to work in isolation. These algorithms are trained to solve specific tasks, and the models have to be rebuilt from scratch once the feature-space distribution changes. Transfer learning is the idea of overcoming this isolated learning paradigm and utilizing knowledge acquired for one task to solve related ones. In this article, we cover the concept of transfer learning and showcase a hands-on example.

Materials and Methods

Environment :

We are going to use Keras, an API designed for human beings, not machines. Keras follows best practices for reducing cognitive load: it offers consistent & simple APIs, it minimizes the number of user actions required for common use cases, and it provides clear & actionable error messages. It also has extensive documentation and developer guides.

Database:

The CIFAR-10 dataset is a collection of images that are commonly used to train machine learning and computer vision algorithms. It is one of the most widely used datasets for machine learning research. The CIFAR-10 dataset contains 60,000 32x32 color images in 10 different classes. The 10 different classes represent airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. There are 6,000 images of each class.

Computer algorithms for recognizing objects in photos often learn by example. CIFAR-10 is a set of images that can be used to teach a computer how to recognize objects. Since the images in CIFAR-10 are low-resolution (32x32), this dataset can allow researchers to quickly try different algorithms to see what works. Various kinds of convolutional neural networks tend to be the best at recognizing the images in CIFAR-10.

To load the dataset with Keras, we use:

import tensorflow.keras as K
K.datasets.cifar10.load_data()
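
As a quick sanity check, the loader returns two (images, labels) tuples; a short sketch of inspecting their shapes:

# load_data() returns (train_images, train_labels), (test_images, test_labels)
(trainX, trainy), (testX, testy) = K.datasets.cifar10.load_data()
print(trainX.shape)  # (50000, 32, 32, 3) -- 50,000 training images of 32x32 RGB
print(trainy.shape)  # (50000, 1)         -- integer class labels from 0 to 9
print(testX.shape)   # (10000, 32, 32, 3) -- 10,000 test images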

Processing the data for the model:

Now that the data is loaded, we are going to build a preprocessing function for the data using predefined functions from Keras. X is a numpy.ndarray of shape (m, 32, 32, 3) containing the CIFAR-10 data, where m is the number of data points. Y is a numpy.ndarray of shape (m,) containing the CIFAR-10 labels for X. The function returns: X_p, Y_p

  • X_p is a numpy.ndarray containing the preprocessed X
  • Y_p is a numpy.ndarray containing the preprocessed Y

As we said before, we are going to use ResNet50. There are also many other models available here.

def preprocess_data(X, Y):
    """Pre-processes the CIFAR-10 images and labels for the ResNet50 model"""
    X_p = K.applications.resnet50.preprocess_input(X)
    Y_p = K.utils.to_categorical(Y, 10)
    return X_p, Y_p

Next, we are going to call our function with the data loaded from the CIFAR-10 dataset:

(trainX, trainy), (testX, testy) = K.datasets.cifar10.load_data()
trainX, trainy = preprocess_data(trainX, trainy)
testX, testy = preprocess_data(testX, testy)
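
After preprocessing, the images keep their shape and the labels become one-hot vectors, which a quick shape check confirms:

print(trainX.shape, trainy.shape)  # (50000, 32, 32, 3) (50000, 10)
print(testX.shape, testy.shape)    # (10000, 32, 32, 3) (10000, 10)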

Using weights of a trained ResNet50:

A pretrained model from Keras Applications has the advantage of letting you use weights that are already calibrated to make predictions. In this case, we use the weights from ImageNet and the network is a ResNet50. The option include_top=False allows feature extraction by removing the last dense layers. This lets us control the output and input of the model.

inputs = K.Input(shape=(224, 224, 3))
# Loading the ResNet50 model with pre-trained ImageNet weights
resnet = K.applications.ResNet50(weights='imagenet', include_top=False, input_tensor=inputs)

We are going to 'freeze' all layers except for the last block of the ResNet50. The way to do this in Keras is:

for layer in resnet.layers[:170]:
    layer.trainable = False
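
To double-check which part of the network will actually be updated, we can count the trainable versus frozen layers (a small sketch using the resnet variable defined above):

# Count how many layers were frozen and how many remain trainable
frozen = sum(1 for layer in resnet.layers if not layer.trainable)
trainable = sum(1 for layer in resnet.layers if layer.trainable)
print("frozen layers:", frozen)        # 170
print("trainable layers:", trainable)  # the remaining layers of the last block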

Next, we need to connect our pretrained model to the new layers of our model. We can use a Flatten or a GlobalAveragePooling2D layer (one or the other) to bridge the dimensions of the previous layers with the new layers:

# Either of these works as the bridge; the final model below uses GlobalAveragePooling2D
model.add(K.layers.Flatten())
model.add(K.layers.GlobalAveragePooling2D())
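
The two options differ in how many features they pass to the dense layers. A small sketch, reusing the K alias from above, shows the output sizes for the 7x7x2048 feature map that ResNet50 produces for 224x224 inputs:

features = K.Input(shape=(7, 7, 2048))                 # ResNet50 output for 224x224 images
flat = K.layers.Flatten()(features)                    # shape (None, 100352)
pooled = K.layers.GlobalAveragePooling2D()(features)   # shape (None, 2048)
print(flat.shape, pooled.shape)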

The final layers are below; you can see the complete code here. We also explain some more aspects that improve the model and help it make a good classification. We present the main aspects taken into account to build the model.

import tensorflow as tf

model = K.models.Sequential()
# Resize the 32x32 CIFAR-10 images to the 224x224 input expected by ResNet50
model.add(K.layers.Lambda(lambda x: tf.image.resize(x, (224, 224))))
model.add(resnet)
model.add(K.layers.GlobalAveragePooling2D())
model.add(K.layers.BatchNormalization())
model.add(K.layers.Dense(256, activation='relu'))
model.add(K.layers.BatchNormalization())
model.add(K.layers.Dense(128, activation='relu'))
model.add(K.layers.Dropout(0.3))
model.add(K.layers.BatchNormalization())
model.add(K.layers.Dense(64, activation='relu'))
model.add(K.layers.Dropout(0.3))
model.add(K.layers.BatchNormalization())
model.add(K.layers.Dense(10, activation='softmax'))

We have regularizers to help us avoid overfitting and optimizers to converge faster. Each of them can also affect our accuracy, so we present what to take into account. The most important are:

  • Batch size: It is recommended to use a batch size that is a power of 2 (8, 16, 32, 64, 128, …) because it fits well with the memory of the hardware.
  • Learning rate: For transfer learning a very low learning rate is recommended because we don't want to change too much of what was previously learned.
  • Number of layers: This depends on how much you rely on the layers of the pretrained model. We found that if we leave the whole model trainable, just a flatten layer and a dense layer with softmax is enough, but since we incorporated feature extraction (frozen layers), more layers were required at the end.
  • Optimization methods: We tested SGD and RMSprop. SGD with a very low learning rate required more epochs (30) to complete a reasonable training. We used RMSprop with 5 epochs to get our result.
  • Regularization methods: To avoid overfitting we used batch normalization and dropout in between the dense layers.
  • Callbacks: In Keras, we can use callbacks in our model to perform certain actions during training, such as saving weights.
model.compile(loss='categorical_crossentropy',
              optimizer=K.optimizers.RMSprop(learning_rate=2e-5),
              metrics=['accuracy'])
checkpointer = K.callbacks.ModelCheckpoint(filepath='cifar10.h5',
                                           monitor="val_accuracy", verbose=1,
                                           save_best_only=True)
model.fit(trainX, trainy, batch_size=32, epochs=10, verbose=1,
          callbacks=[checkpointer], validation_data=(testX, testy),
          shuffle=True)
model.summary()
model.save("cifar10.h5")

Results:

We obtained an accuracy of 89.55% on the training set and 91.24% on the validation set after 10 epochs.

The summary of the model can be printed with model.summary(). We found that batch normalization and dropout greatly reduce overfitting and help to get better accuracy on the validation set. The method of 'freezing layers' allows faster computation but hurts the accuracy, so it was necessary to add dense layers at the end. The shape of the layers keeps part of the structure of the original ResNet50, as if the new layers were a continuation of it, but with the features we mentioned.
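
To check the final numbers on the held-out images, the trained model can simply be evaluated on the preprocessed test set (a short sketch reusing the testX and testy arrays from above):

# Evaluate the trained model on the preprocessed test set
loss, acc = model.evaluate(testX, testy, batch_size=32, verbose=1)
print('test accuracy: {:.2%}'.format(acc))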

Discussion

We confirmed that ResNet50 works best with input images of 224 x 224. As CIFAR-10 has 32 x 32 images, it was necessary to perform a resize. With this adjustment alone the model can already achieve a high accuracy; I think it was the most important adjustment for ResNet50.
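
In the model this resizing is handled by the Lambda layer shown earlier; as a quick sketch, the same upsampling can be checked on a small batch:

import tensorflow as tf

# Upsample a batch of CIFAR-10 images from 32x32 to the 224x224 expected by ResNet50
batch = trainX[:8]                        # shape (8, 32, 32, 3)
resized = tf.image.resize(batch, (224, 224))
print(resized.shape)                      # (8, 224, 224, 3)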

A good recommendation when building a model using transfer learning is to first test optimizers to get a low bias and good results on the training set, then look at regularizers if you see overfitting on the validation set.

The discussion over whether to freeze layers of the pretrained model continues. Freezing reduces computation time and reduces overfitting, but lowers accuracy. When the new dataset is very different from the dataset used for pre-training, it may be necessary to leave more layers open for adjustment.

On the selection of hyperparameters, it is important for transfer learning to use a low learning rate to take advantage of the weights of the pretrained model. This choice, like the choice of optimizer (SGD, Adam, RMSprop), will impact the number of epochs needed to get a successfully trained model.
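
For example, both optimizers can be set up with a small learning rate at compile time; the SGD values below are illustrative assumptions rather than the settings used for the final model:

# Two low-learning-rate options for fine-tuning
rmsprop = K.optimizers.RMSprop(learning_rate=2e-5)        # the setting used above
sgd = K.optimizers.SGD(learning_rate=1e-4, momentum=0.9)  # illustrative alternative (assumed values)
model.compile(loss='categorical_crossentropy', optimizer=rmsprop, metrics=['accuracy'])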

