Transfer Learning (using MobileNet) on Cats and Dogs (with 99% accuracy)

Shivam Chauhan
7 min read · Dec 12, 2020


Tutorial Overview

  1. Dogs vs. Cats Prediction Problem
  2. Dogs vs. Cats Dataset Preparation
  3. Develop a Data Pipeline
  4. Batch Size
  5. Data Augmentation
  6. Model Definition
  7. Train the Model
  8. Loss Function
  9. Optimizer
  10. Learning Rate
  11. Metric
  12. Save the Model
  13. Test the Model

Dogs vs. Cats Prediction Problem

The train folder contains 25,000 images of dogs and cats. Each image in this folder has the label as part of the filename. The test folder contains 12,500 images, named according to a numeric id.
For each image in the test set, you should predict a probability that the image is a dog (1 = dog, 0 = cat).

Dogs vs. Cats Dataset Preparation

The dataset can be downloaded from the Kaggle website. If you do not have an account on Kaggle, you will have to sign up first.

For the training process, we need to store our dataset in the proper folder structure. We’ll divide the images into two sets: training and validation. For an image file, Keras will automatically assign the name of the class (category) based on its parent folder name.
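Keras can infer the class labels from a layout like the one below (the folder and file names here are only illustrative; what matters is that each class has its own subfolder under the training and validation directories):

data/
    train/
        cat/
            cat.0.jpg
            ...
        dog/
            dog.0.jpg
            ...
    validation/
        cat/
            ...
        dog/
            ...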

Plot a cat or dog photo

Take a look at a few random photos from the directory; here is one example.

import matplotlib.pyplot as plt
from tensorflow.keras.preprocessing import image

IMG_PATH = 'file_path/cat.jpg'

# Load the image at the 224x224 size MobileNet expects and display it
img = image.load_img(IMG_PATH, target_size=(224, 224))
plt.imshow(img)
plt.show()

Develop a Data Pipeline

To start off with our Python program, we begin by importing the necessary packages:

import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Flatten, Dense, Dropout, GlobalAveragePooling2D
from tensorflow.keras.applications.mobilenet import MobileNet, preprocess_input
import math
import pandas as pd
import numpy as np

Place the following configuration lines right after the import statements; we can modify them based on our dataset:

TRAIN_DATA_DIR = 'file_path_training_dir'
VALIDATION_DATA_DIR = 'file_path_val_dir'
NUM_CLASSES = 2
IMG_WIDTH, IMG_HEIGHT = 224, 224
BATCH_SIZE = 64

With two classes to distinguish between, we can treat this problem as a binary classification task.

As a binary classification task, "cat versus dog" could just as well be framed as "cat versus not cat": a dog would be classified as "not cat," much like a desk or a ball would. In that framing, the model outputs a single probability for the "cat" class, so the probability of "not cat" is 1 - P(cat); if P(cat) is higher than 0.5 we predict "cat," otherwise "not cat." To keep things simple, we assume the validation set contains only images of cats and dogs. In this tutorial we keep two explicit output classes ("cat" and "dog," hence NUM_CLASSES = 2) with a softmax over them, which for this dataset is equivalent to the single-probability framing.
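Purely as an illustration of the single-probability decision rule described above (the probability value here is made up):

p_cat = 0.73  # hypothetical model output for P(cat)
label = 'cat' if p_cat > 0.5 else 'not cat'
print(label)  # prints: cat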

Batch Size

The batch size defines how many images are seen by the model at a time. It’s important that each batch has a good variety of images from different classes in order to prevent large fluctuations in the accuracy metric between iterations. A sufficiently large batch size would be necessary for that. However, it’s important not to set the batch size too large; a batch that is too large might not fit in GPU memory, resulting in an “out of memory” crash. Usually, batch sizes are set as powers of 2. A good number to start with is 64 for most problems, and we can play with the number by increasing or decreasing it.
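The batch size also determines how many iterations (steps) make up one epoch. A quick sketch, using the 25,000 training images mentioned earlier and the math import from the data pipeline section:

num_train_samples = 25000  # images in the train folder
steps_per_epoch = math.ceil(num_train_samples / BATCH_SIZE)
print(steps_per_epoch)  # 391 batches per epoch with BATCH_SIZE = 64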

Data Augmentation

Data augmentation gives us ways to increase the effective size of the dataset. It introduces noise during training, making the model more robust to varied inputs. This technique is especially useful when the dataset is small, and it can be combined with the transfer learning approach we use here.

Fig.: Some common image transformations applied for data augmentation.

By combining rotation, shifting, and zooming, the program can generate an almost infinite number of unique images. This important step is called data augmentation. Data augmentation is useful not only for adding more data, but also for training more robust models for real-world scenarios. For example, not all images have the cat properly centered in the middle or at a perfect 0-degree angle. Keras provides the ImageDataGenerator function that augments the data while it is being loaded from the directory.

train_datagen = ImageDataGenerator(
    preprocessing_function=preprocess_input,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    zoom_range=0.2)
val_datagen = ImageDataGenerator(preprocessing_function=preprocess_input)

Note that we pass MobileNet's preprocess_input function to both generators; it scales pixel values into the range the network expects (for MobileNet, roughly -1 to 1).
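The train_generator and validation_generator that we will feed to the model during training are created from these two ImageDataGenerator objects. A minimal sketch, assuming the directory layout described earlier:

train_generator = train_datagen.flow_from_directory(
    TRAIN_DATA_DIR,
    target_size=(IMG_WIDTH, IMG_HEIGHT),
    batch_size=BATCH_SIZE,
    shuffle=True,
    class_mode='categorical')
validation_generator = val_datagen.flow_from_directory(
    VALIDATION_DATA_DIR,
    target_size=(IMG_WIDTH, IMG_HEIGHT),
    batch_size=BATCH_SIZE,
    shuffle=False,
    class_mode='categorical')

class_mode='categorical' produces one-hot labels, which matches the two-class softmax head and the categorical cross-entropy loss used below.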

Model Definition

Now that the data is taken care of, we come to the most crucial component of our training process: the model. In the code that follows, we reuse a CNN previously trained on the ImageNet dataset (MobileNet in our case), throw away its last few layers (the ImageNet-specific fully connected classifier layers), and replace them with our own classifier suited to the task at hand. For transfer learning, we "freeze" the weights of the original model; that is, we set those layers as unmodifiable, so only the layers of the new classifier that we add can be trained. We use MobileNet here to keep things fast, but this method works just as well with other pretrained networks. The new classifier is built from the Dense, Dropout, and GlobalAveragePooling2D layers imported earlier.

def model_maker():
    # Load MobileNet pretrained on ImageNet, without its classifier head
    base_model = MobileNet(include_top=False,
                           input_shape=(IMG_WIDTH, IMG_HEIGHT, 3))

    # Freeze every layer of the pretrained base so only our new head is trained
    for layer in base_model.layers[:]:
        layer.trainable = False

    input = Input(shape=(IMG_WIDTH, IMG_HEIGHT, 3))
    my_model = base_model(input)
    my_model = GlobalAveragePooling2D()(my_model)
    my_model = Dense(64, activation='relu')(my_model)
    my_model = Dropout(0.5)(my_model)
    # Two-way softmax output: probabilities for "cat" and "dog"
    predictions = Dense(NUM_CLASSES, activation='softmax')(my_model)
    return Model(inputs=input, outputs=predictions)
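As a quick, optional sanity check, we can build the model and print its summary; the frozen MobileNet weights should show up as non-trainable parameters:

model = model_maker()
model.summary()  # the pretrained MobileNet weights are listed under non-trainable params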

Train the Model

With both the data and model ready, all we have left to do is train the model. This is also known as fitting the model to the data. For training a model, we need to select and modify a few different training parameters.

Loss function

The loss function is the penalty we impose on the model for incorrect predictions during the training process. It is the value of this function that we seek to minimize. For example, in a task to predict house prices, the loss function could be the root-mean-square error.
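For our two-class setup, the loss compiled into the model below is categorical cross-entropy. A tiny illustrative sketch with made-up numbers, using the tf import from earlier:

# Toy example: the true label is "dog" (one-hot [cat, dog] = [0, 1])
# and the model predicts P(cat) = 0.2, P(dog) = 0.8.
cce = tf.keras.losses.CategoricalCrossentropy()
loss = cce([[0.0, 1.0]], [[0.2, 0.8]]).numpy()
print(loss)  # ~0.223, i.e. -log(0.8); a worse prediction gives a larger penalty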

Optimizer

This is the algorithm that adjusts the model's weights to minimize the loss function. We use Adam, a widely used optimizer that typically converges quickly.

Learning rate

Learning is incremental. The learning rate tells the optimizer how big a step to take toward the solution, that is, toward the point where the loss is at its minimum. Take too big a step and we swing wildly past the target; take too small a step and it can take a very long time to reach the target loss value. It is important to set a sensible learning rate so that we reach our learning goal in a reasonable amount of time. In our example, we set the learning rate to 0.001.

Metric

Choose a metric to judge the performance of the trained model. Accuracy is a good, explainable metric, especially when the classes are not imbalanced (i.e., there is roughly the same amount of data for each class). Note that accuracy is not the loss function; it is used for reporting rather than as feedback to the model. In the following piece of code, we create the custom model using the model_maker function that we wrote earlier, and use the parameters described here to customize it further for our task of cats versus dogs:

model = model_maker()
model.compile(loss='categorical_crossentropy',
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              metrics=['acc'])
# Generators can be passed directly to fit; fit_generator is deprecated in TF 2.x
model.fit(train_generator,
          epochs=10,
          validation_data=validation_generator)

It took just 10 to 15 seconds in the very first epoch to reach 96% accuracy on the validation set. Not bad! And by the 10th epoch, we observe about 99% validation accuracy. That's the power of transfer learning.

Save the Model

Before we forget, let's save the model we just trained so that we can use it later:

model.save('model.h5')

Test the Model

Now that we have a trained model, we will eventually want to use it in our application. We can load this model anytime and classify an image with it. load_model, as its name suggests, loads the model.

from tensorflow.keras.models import load_model
model = load_model('model.h5')

Now let’s try loading our original sample images and see what results we get:

img_path = 'file_path/dog.jpg'
img = image.load_img(img_path, target_size=(224,224))
img_array = image.img_to_array(img)
expanded_img_array = np.expand_dims(img_array, axis=0)
preprocessed_img = preprocess_input(expanded_img_array)
prediction = model.predict(preprocessed_img)
print(prediction)
print(validation_generator.class_indices)
[[0.9967706]]
{'dog': 1, 'cat': 0}

Printing the probability, we see that it is about 0.997. This is the probability of the given image belonging to class 1, which (per class_indices) is "dog." Because the probability is greater than 0.5, the image is predicted as a dog.
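Assuming the two-output softmax head defined in model_maker (so predict returns one probability per class), one way to turn the prediction into a readable label is:

# Invert the class_indices mapping, e.g. {'cat': 0, 'dog': 1} -> {0: 'cat', 1: 'dog'},
# then pick the class with the highest predicted probability.
index_to_class = {v: k for k, v in validation_generator.class_indices.items()}
predicted_label = index_to_class[int(np.argmax(prediction[0]))]
print(predicted_label)  # e.g. 'dog'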

Reference:

https://www.kaggle.com/c/dogs-vs-cats
