Building Convolutional Neural Network Model
Introduction
The main objective of this tutorial is to get hands-on experience in building a Convolutional Neural Network (CNN) model on Cloudera Data Platform (CDP). This tutorial explains how to fine-tune the parameters to improve the model, and also how to use transfer learning to achieve state-of-the-art performance. Finally, you will understand the concept of model interpretability through which CNN makes decisions. To get started with this tutorial, you should have some background knowledge of Python and neural network concepts for better understanding. You will be using Keras, which is an open-source neural network library written in Python. It is capable of running on top of TensorFlow, Microsoft Cognitive Toolkit, Theano, or PlaidML. Designed to enable fast experimentation with deep neural networks, it focuses on being user-friendly, modular, and extensible. You would be using Cloudera Machine Learning (CML), which is a powerful tool to leverage Deep Learning skills to build your models.
Prerequisites
Read through “Introduction to Convolutional Neural Networks”
Read through “Tour of Cloudera Data Science Workbench”
Basic concepts of Python and neural networks
Ouline
Build a Convolutional Neural Network model
Introduction to Transfer Learning
Visual explanations from Convolutional Neural Networks
1.Build a Convolutional Neural Network model
1.1 Setting up your environment
Using the “Tour of Cloudera Data Science Workbench” tutorial, create your own project and choose Python session
Make sure to install tensorflow
Install NumPy, matplotlib
1.2 Implementation
Step1 : Import all the libraries necessary to run your code
import keras
from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten, BatchNormalization
from keras.layers import Conv2D, MaxPooling2D
from keras.utils.np_utils import to_categorical
import matplotlib.pyplot as plt
from keras.callbacks import LearningRateScheduler
import numpy as np
Note: If you find any errors importing the above packages, try to make sure all the modules are installed properly before importing them.
Step 2: Import data using keras datasets.
Dataset : CIFAR10 (multi class classification)
Brief description about dataset: The CIFAR10 dataset is a standard dataset for starting with the basics. It consists of 60000 images with size 32X32, with 10 categories of data. They are pretty low resolution images. You can load the dataset using the below command.
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
You can check the shape of your data using the below commands.
print(x_train.shape[0], 'training examples')
print(x_test.shape[0], 'testing examples’')
Step 3: Initialization of required variables
batch_size = 32
num_classes = 10
epochs = 10
data_augmentation = True
Data augmentation is used to provide better training. You can also turn it off by initializing it to False.
Using one hot encoding to convert the data into a feature vector to pass to machine learning models. Convert array of labeled data to one-hot vector.
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
Step 4: Building your model. The below model is a Sequential model.
Conv2D Layer:
Arguments used:
- Filters: This is an integer which denoted the output space dimensionality in each convolution
- Kernel size: It represents the height and width of the convolution window to perform convolution operation.
- Padding: same(add padding) or valid(do not add padding)
- Input shape: provide the dimensions of input, here 32x32x3.
Activation Layer:
You will be using ReLU activation but you can also try different activation functions for practice.
Pooling Layer:
You will be using MaxPooling layer in this model. You can also try using AvgPooling to observe the results.
Arguments used:
Pool size: It is an integer or tuple used to downscale. Here we used (2, 2) which will halve the input in both spatial dimension(height, width)
Dropout Layer: Dropout layer is used to minimize overfitting.
Flatten Layer: Flattens the input into a single column vector to pass it into Artificial Neural Network(ANN)which is Dense Layer.
Below is the basic model built using keras. Now you can copy the model and execute it on CML.
model = Sequential()
model.add(Conv2D(6, (3, 3), padding='same',
input_shape=x_train.shape[1:]))
model.add(Activation('relu'))
model.add(Conv2D(16, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(16, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes))
model.add(Activation('softmax'))
Converting the datatype to float to perform operations
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
Rescaling the image to stay between 0 and 1
x_train /= 255
x_test /= 255
Step 5: Once the model is built it's time to compile the model using below command. You will be using categorical_crossentropy as loss function and metrics as accuracy.
opt = keras.optimizers.adam()
model.compile(loss='categorical_crossentropy',
optimizer=opt,
metrics=['accuracy'])
Provide an option to perform Data Augmentation when required
if not data_augmentation:
print('Not using data augmentation.')
model.fit(x_train, y_train,
batch_size=batch_size,
epochs=epochs,
validation_data=(x_test, y_test),
shuffle=True)
else:
print('Using real-time data augmentation.')
# This will do preprocessing and real time data augmentation:
datagen = ImageDataGenerator(
rotation_range=15,
width_shift_range=0.1,
height_shift_range=0.1,
horizontal_flip=True,
)
When Data Augmentation is set to True fit the data as below
datagen.fit(x_train)
Before fitting the model set seed to reproduce the results
from numpy.random import seed
seed(1)
Step 6: Once the model is compiled you have to fit the model using the training data and validation data is used the same as the test data and the history variable records the metrics.
history= model.fit_generator(datagen.flow(x_train, y_train,
batch_size=batch_size),
epochs=epochs,
steps_per_epoch=x_train.shape[0] // batch_size,
validation_data=(x_test, y_test),
shuffle= False)
Note: Seed is used for reproducibility. You can observe slight variations in the accuracy scores when you re-run the same model multiple times. Hence, to ensure not much variation, seed is used. Note that every time you fit the model again, make sure you run the seed command; otherwise, you might see a huge difference in your accuracy, due to random weight assignments.
Step 7: Evaluate your results and print the metrics
scores = model.evaluate(x_test, y_test, batch_size=32, verbose=1)
print("Test Accuracy",scores[1]*100)
print("Loss value",scores[0])
The loss for the 10 epochs is shown below:
def plotLosses(history):
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
plotLosses(history)
Fig 1: Training and validation loss graph
You can make some of the modifications in the hyper parameters and architectural structure to improve accuracy. Come up with your own ideas to make changes to this model. Here are a few techniques that can improve accuracy:
Use of data augmentation
Use different data transformation techniques
Use of LRfinder and one cycle policy to find out the best learning rate [link]
You can also use a heuristic approach of providing custom learning rates and decreasing it when epochs increases using Keras LearningRateScheduler
Use additional layers and build a deeper neural network with appropriate hyper parameters
You can try some of the techniques shown below to observe the difference. Feel free to use different hyper parameters and optimizers for your model.
You can use LRscheduler with your customized learning rates through the epochs. Observe that learning rate should decrease as you go deeper into epochs to converge.
def lr_schedule(epoch):
lrate = 0.0001
if epoch > 5:
lrate = 0.00005
if epoch > 10:
lrate = 0.00003
return lrate
Note: This was only scheduled for 10 epochs. This can be changed according to the number of epochs.
Recompile and refit the model using step 5 and step 6.
opt = keras.optimizers.adam(lr=0.0001)
model.compile(loss='categorical_crossentropy',
optimizer=opt,
metrics=['accuracy'])
from numpy.random import seed
seed(1)
history= res_model.fit_generator(datagen.flow(x_train, y_train, batch_size=batch_size),\
steps_per_epoch=x_train.shape[0] // batch_size,epochs=epochs,\
verbose=1,validation_data=(x_test,y_test), shuffle= False, callbacks=[LearningRateScheduler(lr_schedule)])
Re run step 7 to evaluate and print the results.
Adding a Batch Normalization Layer
Importance of Batch Normalization
The main purpose of Batch Normalization is to reduce the internal covariance shift. When the deep neural network is trained, the parameters of the preceding layers change, and the distribution of the inputs to the current layer also need to constantly rearrange to adjust to the new parameters. This is a huge overhead and is also severe in deep layers because the shallow layers will be amplified to further layers in the network, resulting in a huge shift in deeper hidden layers. As a result, this layer will reduce the unwanted shifts during the training and speed up the process. It also helps to reduce the vanishing gradient problem.
Import this layer using from keras.layers import BatchNormalization
model = Sequential()
model.add(Conv2D(32, (3, 3), padding='same',
input_shape=x_train.shape[1:]))
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(Conv2D(128, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes))
model.add(Activation('softmax'))
Recompile and refit the model using step 5 and step 6. Re run step 7 to print the results.
The typical training process is explained by the below figure:
Fig 2: Flowchart describing the process of model training
2. Introduction to Transfer Learning
2.1 Introduction
In this section, you will learn about transfer learning, how to use state-of-the-art algorithms to train your dataset, and also understand the importance of transfer learning. This section will focus on one of the models known as RESNET50, winner of the 2015 IMAGENET challenge.
Due to the advancements in computational resources, deep neural networks have gained in importance to solve various problems. But if you have to build your own neural network with state-of-the-art performance, you might require access to GPUs that might cost you more than you think. In real life, to deal with a complex problem it is always useful to think intelligently about how to utilize the resources that are already available to us. Thankfully, transfer learning helps by using pretrained models that have already attained state-of-the-art performance levels, and you can fine-tune them according to your use cases very easily and gain more accurate results with small changes.
Why transfer learning is useful
Transfer learning is similar to knowledge transfer in real life. Consider an example where you join a sports team and learn how to play from your coach, gaining expert advice and knowledge from your coach’s experience that helps you in facing situations in the future. In a similar manner, a deep neural network that is already trained on a huge dataset, and has knowledge about it in the form of weights, enables you to use this knowledge to solve your problems by making only certain changes as needed, rather than having to build your model from scratch.
Most of the time spent on building your own model, as seen in the previous section, would be spent on trial-and-error methodology to see which weights and hyper parameters would fine-tune your model. Transfer learning helps you avoid wasting time on those aspects.
Using the architecture pretrained on the classic IMAGENET dataset, which consists of 1000 different classes with more than 1.2 million images, helps make your job much easier. Thanks to this advancement, we can build models in less time.
How to use pre trained models
You can understand the exact architecture followed in pretrained models and can build your own model that can work better in your use case.
You can use pretrained models as a feature extractor, where you simply modify the output layer according to your dataset.
You can freeze some of the layers in the pretrained model, then make other layers trainable on your dataset. (This may lead to updating of weights and consume more time.)
You can compare your own architecture to the pretrained model to observe the performance that can be learned in this tutorial.
Brief overview on RESNET50
Deep neural networks are very hard to train: as they go deeper, there is a vanishing gradient problem. RESNET50 is a deep neural network that has brought a great improvement in dealing with the computer vision problem. RESNET50 dealt with the vanishing gradient by using skip connections. RESNET50 architecture has ResNet blocks that are responsible for dealing with the skip connections. Each layer feeds into the next layer and directly into the layers about 2–3 hops away. It is very important to understand why this actually helps to build state-of-the-art performance. Let’s look at us the Fig 3:
Fig 3: Resnet block
Neural networks are known as universal function approximators, and the assumption is that accuracy increases with the number of layers added to the network. But there should be a limit on how many layers should be added. If you tend to add as many layers as you want, then there is no point wasting a lot of time on training. And also it leads to the curse of dimensionality and the vanishing gradient problem. This is one of the reasons you can still observe shallow networks performing better than the deeper counterparts with additional layers. This leads to the concept of skip connections, where you can skip a few layers as shown in the above image. You can also observe that the identity function learns over the skip connections, hence they are also known as identity shortcut connections. The identity function here is trying to learn the residual block instead of the true output of each layer. This offers the opportunity to have larger gradients in the initial training of the network and helps it to train faster, giving us a way to train deep neural networks.
Let’s understand through the code how to use transfer learning with a pretrained ResNet50 model. Using some of the implementations in the previous section, let’s see how transfer learning can help
2.2 Implementation
Step 1: Import all necessary libraries
from matplotlib import pyplot
from scipy.misc import toimage
import keras
from keras.datasets import cifar10
from keras.applications.resnet50 import ResNet50
from keras.models import Sequential
from keras.utils import np_utils
from keras.preprocessing.image import ImageDataGenerator
from keras.layers import Dense, Activation, Flatten, Dropout, BatchNormalization
from keras.layers import Conv2D, MaxPooling2D
from keras import regularizers
from keras.callbacks import LearningRateScheduler
import numpy as np
from keras.callbacks import Callback
import math
from keras import models
from keras import layers
from keras import optimizers
from keras.models import Sequential
import matplotlib.pyplot as plt
Step2: Loading and transformation of data
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
print(x_train.shape[0], 'training examples')
print(x_test.shape[0], 'testing examples’')
y_train = keras.utils.to_categorical(y_train,10)
y_test = keras.utils.to_categorical(y_test,10)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
Step 3: Build the model
Convolution_base layer holds the Resnet50 model with weights as imagenet. Include_top parameter is set to false because we are defining our own output layer to this model with 10 classes. You can also add more layers on top of the base layer. Here we are using batch size as 32.
convolution_base = ResNet50(weights= 'imagenet', include_top=False, input_shape=(32, 32, 3))
res_model= models.Sequential()
res_model.add(convolution_base)
res_model.add(layers.Flatten())
res_model.add(Dropout(0.2))
res_model.add(layers.Dense(10, activation='softmax'))
batch_size=32
Add Data Augmentation
datagen = ImageDataGenerator(
rotation_range=15,
width_shift_range=0.1,
height_shift_range=0.1,
horizontal_flip=True,
)
datagen.fit(x_train)
Step 4: Use LRscheduler as the previous section and compile the model
def lr_schedule(epoch):
lrate = 0.0001
if epoch > 5:
lrate = 0.00005
if epoch > 10:
lrate = 0.00003
return lrate
opt_adam = keras.optimizers.Adam(lr=0.0001)
res_model.compile(loss='categorical_crossentropy', optimizer=opt_adam, metrics=['accuracy'])
Step 5: Fit the model
from numpy.random import seed
seed(1)
history= res_model.fit_generator(datagen.flow(x_train, y_train, batch_size=batch_size),\
steps_per_epoch=x_train.shape[0] // batch_size,epochs=epochs,\
verbose=1,validation_data=(x_test,y_test), shuffle= False, callbacks=[LearningRateScheduler(lr_schedule)])
Step 6: Evaluate the model
scores = res_model.evaluate(x_test, y_test, batch_size=128, verbose=1)
print("Test Accuracy",scores[1]*100)
print("Loss value",scores[0])
Step 7: Visualize the loss
def plotLosses(history):
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
plotLosses(history)
Fig 4: Training and validation loss graph using Transfer Learning
Now you can observe a good improvement in the accuracy using the Resnet model with few steps.
3. Visual Explanations from Convolutional Neural Networks
3.1 Introduction
It is often said that a deep neural network is a black box, and it is very difficult to understand how the model makes predictions. Interpretability of models is very important to trusting the model, and therefore it is very important to understand how the model predicts. In order to build trust in intelligent systems and move toward their meaningful integration into our everyday lives, it is clear that we have to build transparent models that explain why they predict what they do. There are many techniques that have been developed to understand the representation and interpretation of visual concepts.This section focuses on a few techniques such as Grad-CAM and visualizing interpretations of intermediate convolution outputs. This helps us understand how CNN layers transform their input to get the first idea of the meaning of individual convnet filters. Also, you will be able to see which features of the image are being used to make predictions. By the end of this section, you’ll understand how the model uses these important features to make its predictions.
3.2 Visualizing the intermediate convolution layer outputs
Visualizing the intermediate outputs of convolution layers helps us understand how the input is transformed by the layers and how different filters are learned by the network. The output of a given input is taken care of by the activation layer, which is the activation function output. Let’s observe the implementation on how to visualize the activation outputs. We’ll try using a custom image to understand it better. You can use your own image to test this.
Run the below code to load the ResNet50 pretrained model and predict the input image. You will make use of Keras library to do this quickly.
from keras.applications import ResNet50
from keras import backend as K
model = ResNet50(weights='imagenet', include_top=True)
from keras.preprocessing import image
from keras.applications.resnet50 import preprocess_input, decode_predictions
import numpy as np
img_path = 'tyson.jpeg' # You can input your image here or use the same image from assets
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)
preds = model.predict(x)
print('Predicted class:', decode_predictions(preds, top=1)[0]) # This will output the predicted label
As the above code makes a prediction on the input image, let’s try to understand what made the model predict the class.
from keras.preprocessing import image
import numpy as np
img = image.load_img(img_path, target_size=(224, 224))
img_tensor = image.img_to_array(img)
img_tensor = np.expand_dims(img_tensor, axis=0)
img_tensor /= 255.
print(img_tensor.shape)
import matplotlib.pyplot as plt
%matplotlib inline
plt.imshow(img_tensor[0])
plt.show()
from keras import models
layer_outputs = [layer.output for layer in model.layers[1:]]
activation_model = models.Model(inputs=model.input, outputs=layer_outputs)
activations = activation_model.predict(img_tensor)
first_layer_activation = activations[1]
print(first_layer_activation.shape)
Fig 5: Original input image on first activation
plt.matshow(first_layer_activation[0, :, :, 3], cmap='viridis')
plt.show()
Fig 6: Third channel of the first activation layer
Fig 6 shows the output of the first activation layer with channel 3. You can try using other activation layers and channels to observe the variations in the interpretations.The code below helps to visualize the filters applied to the image on the first activation layer.
def display_outputs(activations, col_size, row_size, act_index):
activation = activations[act_index]
activation_index=0
fig, ax = plt.subplots(row_size, col_size, figsize=(row_size*2.5,col_size*1.5))
for row in range(0,row_size):
for col in range(0,col_size):
ax[row][col].imshow(activation[0, :, :, activation_index], cmap='viridis')
activation_index += 1
display_outputs(activations, 8, 8, 1)
Fig 7: Filter ouputs of the first activation layer using various channels
Fig 8 shows the output of the activation layer 3.
display_outputs(activations, 8, 8, 3)
Fig 8: Filter outputs for third activation layer using various channels
As you can observe, in the initial activation the visual interpretations are detecting edges but in deeper activations it is not so visually interpretable and more abstract detecting the complex patterns. In the deeper layers, the model is trying to highlight only the areas that make its prediction stronger, such as dog nose, dog ear, etc. You can see that some of the filters are blank in the deeper layers: this happens when the pattern encoded by the filter is not found in the image. We can now see that deep layers focus only on the high-level representations that have greater importance than irrelevant details of the image.
3.3 Using Gradient Based visualization methods
3.3.1 Saliency Maps
The idea behind saliency is pretty simple in hindsight. We compute the gradient of the output category with respect to the input image. This should tell us how the output value changes with respect to a small change in input. We can use these gradients to highlight input regions that cause the most change in the output. Intuitively, this should highlight the salient image regions that contribute most to the output.
You can use the below code on CML to visualize the interpretations.
from vis.utils import utils
from keras import activations
import warnings
warnings.filterwarnings('ignore')
layer_idx = -1
# Swap softmax with linear
model.layers[layer_idx].activation = activations.linear
model = utils.apply_modifications(model)
from vis.utils import utils
from matplotlib import pyplot as plt
%matplotlib inline
img1 = utils.load_img('tyson.jpeg', target_size=(224, 224))
from vis.visualization import visualize_saliency, overlay
from vis.utils import utils
from keras import activations
layer_idx = -1
grads = visualize_saliency(model, layer_idx, filter_indices=2, seed_input=img1)
plt.imshow(grads, cmap='jet')
for modifier in ['guided', 'relu']:
plt.figure()
plt.suptitle(modifier)
grads = visualize_saliency(model, layer_idx, filter_indices=2,
seed_input=img1, backprop_modifier=modifier)
plt.imshow(grads, cmap='jet')
Although we can see these kinds of maps without backpropagation modifiers, we can see them more clearly when we apply back propagation modifiers. The most likely reason is that we have optimizers that make it much more complex in terms of determining the direct impact that each pixel would have on the output. The two backpropagation algorithms that can be used are guided and ReLU. In guided saliency, the backprop step is modified to only propagate positive gradients for positive activations. ReLU modifies backpropagation by zeroing negative gradients and letting positive ones go through unchanged.
Fig 9: Interpretation of the image using saliency maps
You can observe that guided is better in visually presenting and highlighting just the important features (i.e., ears, nose, eyes of the dog) than the ReLU representation.
3.3.2 Grad-CAM
Class activation maps, or grad-CAM, is another way of visualizing attention over input. Instead of using gradients with respect to output, grad-CAM uses the penultimate (pre–Dense layer) Conv layer output. The intuition is to use the nearest Conv layer to utilize spatial information that gets completely lost in Dense layers.
Gradient-weighted Class Activation Mapping (Grad-CAM) uses the class-specific gradient information flowing into the final convolutional layer of a CNN to produce a coarse localization map of the important regions in the image. Grad-CAM is a strict generalization of the Class Activation Mapping (CAM). There typically exists a trade-off between accuracy and simplicity/interpretability. Grad-CAM is highly class-discriminative (i.e., the “tiger” explanation exclusively highlights the “tiger” regions, and not the “dog” regions, and vice versa). Guided Grad-CAM visualizations are both high-resolution and class-discriminative. As a result, important regions of the image that correspond to a class of interest are visualized in high-resolution detail even if the image contains multiple classes. When visualized for “tiger cat,” guided Grad-CAM not only highlights the cat regions, but also highlights the stripes on the cat, which are important for predicting that particular variety of cat.
Below is the code that can be used to test this kind of approach. Paste the below code into CML to view the results:
import numpy as np
import matplotlib.cm as cm
from vis.visualization import visualize_cam
penultimate_layer = utils.find_layer_idx(model, 'res5c_branch2c')
for modifier in [None, 'guided', 'relu']:
plt.figure()
plt.suptitle("vanilla" if modifier is None else modifier)
# 20 is the imagenet index corresponding to `ouzel`
grads = visualize_cam(model, layer_idx, filter_indices=2,
seed_input=img1, penultimate_layer_idx=penultimate_layer,
backprop_modifier=modifier)
# Lets overlay the heatmap onto original image.
jet_heatmap = np.uint8(cm.jet(grads)[..., :3] * 255)
plt.imshow(overlay(jet_heatmap, img1))
Fig 10: Visual representations using GRAD-CAM
Out of the three modifiers, guided works best in highlighting important portions of the dog type that led to the prediction of breed.
Further Reading
This tutorial gives you a general idea for building the models, but the architecture can be made better by adding various hyper parameters. You can start playing with it. Check out the links below for more information.
Deep Learning made easier with Transfer Learning: A blog by Cloudera Fast Forward Labs focused on on Transfer Learning.
Deep Learning Image Analysis: Report by Cloudera Fast Forward Labs on deep learning image analysis. The prototype in the report highlights two important concepts: similarity search and CNN model interpretation. Similarity search returns results from a database that are most similar to a given image. Model interpretability focuses on how neural network layers learn important features to predict an image, depending on selected models and number of channels. Click here to play with it.