# Classification of the CIFAR-10 dataset

The [CIFAR-10 dataset](https://www.cs.toronto.edu/~kriz/cifar.html) provides 60000 32x32-pixel images, classified into 10 categories.  The figure below provides a random sample of some images in each category.

![images.png](images.png)

During this session, you will learn how to build a Convolutional Neural Network (CNN), which (when trained) will be able to automatically classify new images into one of these categories.  We will make use of the [Keras library](https://www.tensorflow.org/guide/keras) which provides a high-level interface to TensorFlow. Begin by importing the necessary modules.

It is strongly recommended to use Google Colab (with a GPU) to run this notebook. This will drastically speed up computations.

In [None]:
import os
import time
import datetime
from tqdm import tqdm_notebook

import numpy as np
import pandas
import matplotlib.pyplot as plt
from sklearn import metrics

import torch
from torch import nn
import torch.nn.functional as F
from torchvision import datasets, transforms, utils
from torch.utils.data import Dataset, DataLoader
import torch.optim as optim

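# Import every Keras object from tensorflow.keras, so that layers, models and
# optimizers all come from the same package
from tensorflow.keras.layers import Dense, Flatten, Activation
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras.layers import Dropout, BatchNormalization
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.models import Model
from tensorflow.keras.models import Sequential
from tensorflow.keras.callbacks import EarlyStopping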

# Table of contents

[1. A first look at the data set](#dataset)<br>

[2. A first naive model](#first_model)<br>

[3. Convolutional Neural Networks](#cnn)<br>
- [3.1 Create your first CNN](#first_cnn)<br>
- [3.2 Influence of parameters on the performance](#cnn_parameters)<br>
- [3.3 Studying predictions](#results)<br>

[4. Pretrained Networks](#pretrained_cnn)<br>


<a id='dataset'></a>
# 1 - A first look at the data set

**1) Download the dataset. See [`keras.datasets`](https://keras.io/datasets/) for how to download the data.  Note that the dataset is already divided into a training set of 50000 images, and a test set of 10000.**

In [None]:
################## TODO BLOCK
 
################## END TODO BLOCK

**2) Check the shape of images and targets.**

In [None]:
################## TODO BLOCK

################## END TODO BLOCK

**3) Create a list of labels corresponding to the 10 categories.  This will be used to convert the 0-9 digits in the target arrays to string labels. The categories are labeled as follows:**
  0. airplane
  1. automobile
  2. bird
  3. cat
  4. deer
  5. dog
  6. frog
  7. horse
  8. ship
  9. truck


In [None]:
################## TODO BLOCK


################## END TODO BLOCK

**4) Normalize images from [0,255] to be [0,1] (normalizing usually improves model training).**

In [None]:
################## TODO BLOCK

################## END TODO BLOCK
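
As an illustration of the rescaling asked for in step 4, here is a minimal sketch on a made-up batch of images. The array `x_demo` is a hypothetical stand-in for the real training images, not the dataset itself.

```python
import numpy as np

# Hypothetical stand-in for CIFAR-10 images: 2 RGB images of 32x32 pixels,
# stored as unsigned 8-bit integers in [0, 255].
rng = np.random.default_rng(0)
x_demo = rng.integers(0, 256, size=(2, 32, 32, 3), dtype=np.uint8)

# Cast to float before dividing, so the result is stored as 32-bit floats
# in [0, 1].
x_demo_scaled = x_demo.astype("float32") / 255.0

print(x_demo_scaled.dtype, x_demo_scaled.min(), x_demo_scaled.max())
```

The same scaling should be applied to the training and test images so that both live in the same range.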

**5) Convert the target arrays to one-hot encodings.  Hint: check out the [`keras.utils.to_categorical()`](https://www.tensorflow.org/api_docs/python/tf/keras/utils/to_categorical) function.**

In [None]:
################## TODO BLOCK

################## END TODO BLOCK
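
To make the idea of a one-hot encoding concrete, the sketch below builds one by hand with NumPy on three made-up labels. The helper `one_hot` is a hypothetical name used only for illustration; in the lab, `to_categorical` does this job.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return one row per label, with a 1 in the column of that label."""
    labels = np.asarray(labels).reshape(-1)  # accept shape (N,) or (N, 1)
    return np.eye(num_classes, dtype="float32")[labels]

# Made-up integer targets, shaped (N, 1)
y_demo = np.array([[3], [0], [9]])
y_demo_onehot = one_hot(y_demo, num_classes=10)

print(y_demo_onehot.shape)
print(y_demo_onehot[0])
```

Taking `argmax` along the last axis recovers the original integer labels, which is useful later when comparing predictions with targets.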

**6) Visualize some images in each category using the `imshow()` function in `matplotlib.pyplot`.  Can you recreate the figure at the top?  Hint: the top figure was created using the first 8 images belonging to each category in the training data.**

In [None]:
################## TODO BLOCK


################## END TODO BLOCK
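
The hint in step 6 relies on finding the first few images of each category. A possible way to select those indices is sketched below on made-up labels; the helper `first_indices_per_class` is a hypothetical name, not part of any library.

```python
import numpy as np

def first_indices_per_class(labels, num_classes, per_class):
    """For each class, return the indices of its first `per_class` samples."""
    labels = np.asarray(labels).reshape(-1)
    return [np.flatnonzero(labels == c)[:per_class] for c in range(num_classes)]

# Made-up integer labels for 8 samples and 3 classes
y_demo = np.array([2, 0, 1, 0, 2, 2, 1, 0])
rows = first_indices_per_class(y_demo, num_classes=3, per_class=2)

for c, idx in enumerate(rows):
    print(c, idx.tolist())
```

Each entry of `rows` can then be used to fill one row of a grid of subplots, showing each selected image with `imshow()`.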

In [None]:
# The following code summarizes all previous operations.
# No need to fill in this cell. You can continue the lab.

################## TODO BLOCK


################## END TODO BLOCK

<a id='first_model'></a>

# 2 - First naive model

In order to better understand the importance of CNNs, it is instructive to first see how well a naive dense network performs on the dataset.

**7) Create a sequential model with 4 `Dense` hidden layers of 1024, 512, 256, and 100 nodes respectively, with ReLU activation, and an output layer suited for the learning task. For the training, use the SGD optimizer with a learning rate of 0.1 and a decay of $10^{-6}$. The performance of the network will be assessed via the accuracy metric.**

In [None]:
################## TODO BLOCK


################## END TODO BLOCK

**8) Compute by hand the total number of trainable parameters (weights and biases) in the model.**

################## TODO BLOCK



################## END TODO BLOCK
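
For the hand computation in step 8, recall that a dense layer with `n_in` inputs and `n_out` outputs has `n_in * n_out` weights plus `n_out` biases. The sketch below applies this rule to a deliberately tiny toy network, so the numbers are not those of the lab model; `dense_param_count` is a hypothetical helper.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases of a stack of fully connected layers.

    layer_sizes lists the input size followed by the size of each layer.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out  # weights + biases
    return total

# Toy network: 4 inputs -> 3 hidden units -> 2 outputs
toy_sizes = [4, 3, 2]
print(dense_param_count(toy_sizes))  # (4*3 + 3) + (3*2 + 2) = 23
```

For the lab model, the first entry of the list would be the length of a flattened image, that is 32 * 32 * 3 = 3072 for a 32x32 RGB image.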

**9) Use the `summary()` function on the model to get a text summary of the model.  Did you compute the number of parameters correctly?**

In [None]:
################## TODO BLOCK

################## END TODO BLOCK

**10) Train the model for 10 epochs, with a batch size of 32 (you may also use early stopping). What is the model performance?**

In [None]:
################## TODO BLOCK


################## END TODO BLOCK

**11) Plot several images with their predictions. Any comment?**

In [None]:
################## TODO BLOCK

################## END TODO BLOCK

<a id='cnn'></a>

# 3 - Convolutional Neural Network
 

Convolutional neural networks allow us to do drastically better on this dataset (and many image classification problems in general).  In this task, you will build your first convolutional network and see how it performs during training.

<a id='first_cnn'></a>
## 3.1 - Create your first CNN

**12) Create a new model with the following layers (use the same optimizer and loss as above)**
  - 3x3 2D convolution with zero-padding, a stride of 1, 8 filters
  - ReLU activation
  - 3x3 2D convolution, no padding, a stride of 1, 8 filters
  - ReLU activation
  - Max pooling with size (2,2) and a stride of 2
  - 3x3 2D convolution, with zero-padding, a stride of 1, 32 filters
  - ReLU activation
  - 3x3 2D convolution, no padding, a stride of 1, 32 filters
  - ReLU activation
  - Max pooling with size (2,2) and a stride of 2
  - Flatten
  - Dense layer with 408 nodes, ReLU activation
  - A well-chosen output layer

In [None]:
################## TODO BLOCK

################## END TODO BLOCK

**13) Compute by hand the number of trainable parameters in this network.  Are there more or fewer than in the simpler dense network?  Why?  Confirm with `summary()`.**

################## TODO BLOCK


################## END TODO BLOCK

In [None]:
################## TODO BLOCK

################## END TODO BLOCK
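
When counting by hand as in step 13, two rules are enough: a convolution layer has one kernel of size `k x k x in_channels` plus one bias per filter, and the spatial size is unchanged with zero-padding ("same") at stride 1 but shrinks by `k - 1` without padding ("valid"). The helpers below are hypothetical names and only a sketch of these rules.

```python
def conv2d_param_count(kernel_size, in_channels, filters):
    """Weights of one k x k x in_channels kernel per filter, plus one bias each."""
    return (kernel_size * kernel_size * in_channels + 1) * filters

def conv2d_output_size(size, kernel_size, padding, stride=1):
    """Spatial output size along one dimension for 'same' or 'valid' padding."""
    if padding == "same":
        return -(-size // stride)  # ceiling division
    return (size - kernel_size) // stride + 1

def pool_output_size(size, pool_size=2, stride=2):
    """Spatial output size of a max-pooling layer without padding."""
    return (size - pool_size) // stride + 1

# A 3x3 convolution on an RGB input (3 channels) with 8 filters
print(conv2d_param_count(3, in_channels=3, filters=8))  # (3*3*3 + 1) * 8 = 224

# Spatial size through: padded conv, unpadded conv, 2x2 pooling
size = conv2d_output_size(32, 3, "same")
size = conv2d_output_size(size, 3, "valid")
size = pool_output_size(size)
print(size)  # 32 -> 32 -> 30 -> 15
```

The number of inputs to the first dense layer is the final spatial size squared times the number of filters of the last convolution, which is usually where most of the parameters of such a network sit.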

**14) Train the network for 10 epochs, with batch size of 32. How does the validation accuracy change with each epoch?**

In [None]:
################## TODO BLOCK

################## END TODO BLOCK

In [None]:
# Plot the evolution of (train/val) accuracy through epochs

################## TODO BLOCK

################## END TODO BLOCK

<a id='cnn_parameters'></a>
## 3.2 - Influence of parameters on the performance

**15) How does the performance depend on batch size?**

In [None]:
################## TODO BLOCK

################## END TODO BLOCK

In [None]:
# Plot the evolution of (train/val) accuracy through epochs

################## TODO BLOCK


################## END TODO BLOCK

**16) Now fix the batch size to 32 and return to the CNN above. Add 3 dropout layers to this model, one after each max-pooling layer and one before the last layer, with dropout parameter p=0.25. Does this improve the model? How does the performance vary with the dropout ratio? What does p correspond to?**

In [None]:
################## TODO BLOCK


################## END TODO BLOCK

In [None]:
# Plot the evolution of (train/val) accuracy through epochs

################## TODO BLOCK

################## END TODO BLOCK
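
To see what a dropout layer does during training, here is a minimal NumPy sketch of "inverted" dropout, where `p` is the probability of setting an input to zero and the surviving inputs are rescaled by `1 / (1 - p)`. The function `dropout_forward` is a hypothetical helper written for illustration, not the Keras implementation.

```python
import numpy as np

def dropout_forward(x, p, rng):
    """Zero each entry of x with probability p, rescale the others by 1/(1-p)."""
    keep_mask = rng.random(x.shape) >= p
    return x * keep_mask / (1.0 - p)

rng = np.random.default_rng(0)
x_demo = np.ones(100_000)
out = dropout_forward(x_demo, p=0.25, rng=rng)

dropped_fraction = float(np.mean(out == 0.0))
print(dropped_fraction)  # close to p
print(out.mean())        # close to the input mean, thanks to the rescaling
```

At prediction time dropout is switched off and the inputs pass through unchanged, which is why the rescaling is applied during training.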

**17) Add batch normalization layers before each dropout layer, with the dropout parameter of your choice. What is the impact of batch normalization on the model's performance?**

In [None]:
################## TODO BLOCK

################## END TODO BLOCK

In [None]:
# Plot the evolution of (train/val) accuracy through epochs

################## TODO BLOCK

################## END TODO BLOCK
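
The core computation of a batch normalization layer in training mode can be sketched as follows: each feature is standardized using the mean and variance of the current batch, then rescaled by learnable parameters `gamma` and `beta`. This is a simplified illustration under the assumption of 2D inputs (batch, features); the function name and the value of `eps` are chosen for the example.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-3):
    """Standardize each feature over the batch axis, then scale and shift."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
# Made-up activations: batch of 64 samples, 4 features, far from zero mean
x_demo = rng.normal(loc=5.0, scale=3.0, size=(64, 4))
out = batch_norm_forward(x_demo, gamma=np.ones(4), beta=np.zeros(4))

print(out.mean(axis=0))  # approximately 0 for every feature
print(out.std(axis=0))   # approximately 1 for every feature
```

For convolutional feature maps the statistics are typically computed per channel rather than per individual activation, and at prediction time running averages collected during training replace the batch statistics.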

**18) Based on the CNNs you have considered so far in this lab, what would be the next improvement you would like to test to increase the validation accuracy of your model? Test it and comment.**

In [None]:
################## TODO BLOCK


################## END TODO BLOCK

In [None]:
# Plot the evolution of (train/val) accuracy through epochs

################## TODO BLOCK


################## END TODO BLOCK

In [None]:
################## TODO BLOCK

################## END TODO BLOCK

In [None]:
# Plot the evolution of (train/val) accuracy through epochs

################## TODO BLOCK

################## END TODO BLOCK

<a id='results'></a>

## 3.3 - Studying predictions

Assuming all went well during the previous tasks, you can now predict the category of a new image!  Here are a few examples of my predictions:

![predictions.png](predictions.png)

**19) Use `predict` on your trained model (the best you have created so far) to test its prediction on a few example images of the test set. Using `imshow` and `barh` from `matplotlib.pyplot`, try to recreate the image above for a few test images. Compute the accuracy of your model on the test set and comment.**

NB: You can save the model after training it (with its `save` method in Keras) and, on later runs, load it from the saved file with the Keras function `load_model` instead of rebuilding and retraining it.

In [None]:
################## TODO BLOCK

################## END TODO BLOCK

# Compute the accuracy
# (y_test and y_pred must both hold class indices here, not one-hot vectors or probabilities)
print("The accuracy on the test set is", metrics.accuracy_score(y_test, y_pred))
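
The accuracy call above compares class indices. If the targets are one-hot encoded and the model returns one probability per class, both can be converted with `argmax` first. The sketch below uses made-up arrays (`probs_demo`, `onehot_demo`) rather than real model outputs.

```python
import numpy as np

# Hypothetical predicted probabilities for 4 images and 3 classes
probs_demo = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.3, 0.3, 0.4],
    [0.5, 0.4, 0.1],
])
# Hypothetical one-hot encoded true classes for the same 4 images
onehot_demo = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
    [1, 0, 0],
])

pred_idx = probs_demo.argmax(axis=1)   # most probable class per image
true_idx = onehot_demo.argmax(axis=1)  # position of the 1 in each row

accuracy = float(np.mean(pred_idx == true_idx))
print(pred_idx, true_idx, accuracy)
```

The index arrays obtained this way can be passed directly to `metrics.accuracy_score`.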

A confusion matrix is often used in supervised learning to understand how well (or not) each category is being classified.  Each element (i,j) of the confusion matrix counts the examples of true class i that were predicted as class j.  Consider the following 10 predictions for a two-category model predicting male or female.

| example     | true category  | predicted category  |
|-------------|----------------|---------------------|
| 1           | male           | male                |
| 2           | female         | male                |
| 3           | female         | female              |
| 4           | male           | male                |
| 5           | male           | female              |
| 6           | male           | male                |
| 7           | female         | female              |
| 8           | male           | female              |
| 9           | female         | female              |
| 10          | female         | female              |

Based on the above data, the model is accurate 70% of the time.  The confusion matrix is

|        | predicted male | predicted female |
|--------|------|--------|
| true male   | 3    | 2      |
| true female | 1    | 4      |

The confusion matrix gives us more information than a simple accuracy measurement. 
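
The matrix above can be rebuilt from the table of 10 predictions with a few lines of NumPy. This is only a sketch of the counting rule (rows are true classes, columns are predicted classes); `sklearn.metrics.confusion_matrix` uses the same layout.

```python
import numpy as np

classes = ["male", "female"]
# True and predicted categories of examples 1 to 10, copied from the table
true_labels = ["male", "female", "female", "male", "male",
               "male", "female", "male", "female", "female"]
pred_labels = ["male", "male", "female", "male", "female",
               "male", "female", "female", "female", "female"]

index = {name: i for i, name in enumerate(classes)}
confusion = np.zeros((len(classes), len(classes)), dtype=int)
for t, p in zip(true_labels, pred_labels):
    confusion[index[t], index[p]] += 1  # row = true class, column = predicted

# Correct predictions sit on the diagonal
accuracy = np.trace(confusion) / confusion.sum()
# Fraction of each true class that was classified correctly
per_class_recall = confusion.diagonal() / confusion.sum(axis=1)

print(confusion)
print(accuracy, per_class_recall)
```

Here the overall accuracy is 0.7, but the per-class view shows that males are recognized correctly 3 times out of 5 and females 4 times out of 5, a difference that the single accuracy figure hides.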

**20) Create the confusion matrix for the CIFAR-10 dataset using the test data.  What does it tell you about the relationships between the classes?**

In [None]:
################## TODO BLOCK

################## END TODO BLOCK

<a id='pretrained_cnn'></a>
# 4 - Pretrained Networks

Several pre-trained networks are directly accessible via Keras (see the `keras.applications` module).

**21) Build a classifier with a better accuracy on the test set than all the CNNs you have built before. One rule only: do not use a CNN pretrained on CIFAR-10.** 

In [None]:
# Import all useful libraries

################## TODO BLOCK

################## END TODO BLOCK

In [None]:
################## TODO BLOCK

################## END TODO BLOCK

In [None]:
# Plot the evolution of (train/val) accuracy through epochs

################## TODO BLOCK

################## END TODO BLOCK

In [None]:
################## TODO BLOCK

################## END TODO BLOCK

**22) Plot several images with their predictions. Any comment?**

In [None]:
################## TODO BLOCK

################## END TODO BLOCK