
# Multimodal Learning Lab

In this lab we will show how to work with mixed data, also called multimodal learning, using deep learning. Mixed data is anything that is not just one type of data, such as mixing traditional structured data with images, or with audio, video, etc. The possibilities are endless!

We will use Keras' functional Model API, which lets us create models with multiple inputs, multiple outputs, forks, and anything else we can imagine. This tutorial borrows from [this post](https://www.pyimagesearch.com/2019/02/04/keras-multiple-inputs-and-mixed-data/), where I originally found the application, and uses the data from [this GitHub repository](https://github.com/emanhamed/Houses-dataset), which accompanies the paper by Ahmed & Moustafa (2016), available [here](https://arxiv.org/abs/1609.08399).

The plan is to:

- Import the data and prepare it for multimodal learning.
- Create an architecture with three parts: an image branch built on a pre-trained model, a structured-data branch that runs a small feature engineering step (i.e. a dense layer), and a final section that combines both branches into a final set of dense layers.

## Data Import and Preprocessing

I have modified the original dataset for easier processing. The objective is to create a model that predicts house prices using the property's basic information (bedrooms, bathrooms, area in sqft, zipcode) and a collage of four images of the house.

In [None]:
!nvidia-smi -L

In [None]:
# Download and unzip the data.
!gdown 'https://drive.google.com/uc?id=1XZGDY0XVHNDawfMymX7d1BRO5a17KIN5'

In [None]:
!unzip HousesDatasetClean.zip

In [None]:
import pandas as pd

HouseData = pd.read_csv('HousesDatasetClean/HousesInfo.csv')
HouseData.describe()

In [None]:
HouseData.head()

So, the data has the following attributes:

- Number of bedrooms and bathrooms.
- Area of the house in square feet.
- Zipcode (categorical!).
- House price.

Our target is to predict a single house price using the information from all other variables. Let's see an example of an image.

In [None]:
from IPython.display import Image
Image(filename='HousesDatasetClean/1/4.jpg') 

Let's start with a model that uses only the images, as a benchmark.

## Image-only Deep Learning model

In the following code we will use ResNet50V2 as the base of our model.

This model was trained on the ImageNet data to classify images among 1,000 different types of objects, using a very large database of images. We can leverage these already-trained weights and adapt just the last few layers for our purposes.

We start by loading the model on the fly. Keras comes pre-packaged with a series of models in the [Applications library](https://keras.io/applications/), and we can check the available options in the documentation of the [ResNet50V2](https://keras.io/api/applications/resnet/#resnet50v2-function) function. Note that ResNet50V2 requires inputs between -1 and 1, so we do **not** rescale in the generator; instead we simply apply [ResNet50V2's preprocessing function](https://www.tensorflow.org/api_docs/python/tf/keras/applications/resnet_v2/preprocess_input), which takes care of the scaling.
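
As a rough illustration of the scaling mentioned above, here is a minimal NumPy sketch that maps 8-bit pixel values onto the [-1, 1] range. It assumes the arithmetic is `x / 127.5 - 1`; the helper name `scale_to_unit_range` is hypothetical, and the sketch is not a replacement for the packaged `preprocess_input`.

```python
import numpy as np

def scale_to_unit_range(pixels):
    """Map pixel values in [0, 255] to floats in [-1, 1] (assumed formula)."""
    pixels = np.asarray(pixels, dtype=np.float32)
    return pixels / 127.5 - 1.0

# A made-up 2x2 RGB image containing the extreme and middle pixel values.
fake_image = np.array([[[0, 0, 0], [255, 255, 255]],
                       [[127.5, 127.5, 127.5], [255, 0, 127.5]]])
scaled = scale_to_unit_range(fake_image)
print(scaled.min(), scaled.max())
```

Because the preprocessing function already handles this step, adding `rescale=1/255` in the generator on top of it would scale the inputs twice.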

We also need a package that stores models efficiently in a binary format. The package is called [h5py](https://www.h5py.org/), and it lets you save and reload your trained models. You can read a tutorial on this [here](https://machinelearningmastery.com/save-load-keras-deep-learning-models/).


In [None]:
import numpy as np
import h5py
import PIL

# Others
from sklearn.model_selection import train_test_split

# For AUC estimation and ROC plots
from sklearn.metrics import roc_curve, auc

# Plots
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# Image and directories
import cv2
import os

# Tensorflow
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras import optimizers
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import *
from tensorflow import keras

In [None]:
# Parameters
ImageSize = (224,224)
BatchSize = 32

In [None]:
# Import base model. Using ResNet50v2.
from tensorflow.keras.applications.resnet_v2 import ResNet50V2, preprocess_input

# Import model with input layer
base_model = ResNet50V2(weights = 'imagenet', # The weights from the ImageNet competition
 include_top = False, # Do not include the top layer, which classifies.
 input_shape= (224, 224, 3) # Input shape. Three channels.
 )

Now we will use the model API. The core trick is that now we need to write the models in the form

``` next_layer = LAYER_FUNCTION()(previous_layer)```

Calling a layer on the output of the previous layer tells Keras how the two connect, so the model "learns" which layer comes after which.
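
To see why this calling pattern is enough to wire layers together, here is a toy sketch in plain Python. `ToyLayer` is a hypothetical stand-in, not a Keras class: it only records the order in which layers are applied, which is roughly the bookkeeping the Model API does for us.

```python
class ToyLayer:
    """Toy stand-in for a layer: calling it on a node records the connection."""

    def __init__(self, name):
        self.name = name

    def __call__(self, previous):
        # `previous` is the list of layer names applied so far.
        return previous + [self.name]

# Same pattern as next_layer = LAYER_FUNCTION()(previous_layer)
x = ["input"]
x = ToyLayer("flatten")(x)
x = ToyLayer("dense_64")(x)
outputs = ToyLayer("dense_1")(x)
print(" -> ".join(outputs))
```

The first pair of parentheses configures the layer, and the second pair applies it to whatever came before.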

Our process will be to:

1. Set the base model (ResNet) to non-trainable (freeze weights).
2. Add the new model.
3. Train only the top to convergence.
4. Unfreeze the weights to fine-tune.

These steps are needed to adjust the weights properly and are the recommended practice when fine-tuning more complex models, as [explained here](https://keras.io/guides/transfer_learning/). That guide also has an example using Xception, another model.

In [None]:
# Set the base model to untrainable.
base_model.trainable = False

In [None]:
# Create the full model using the Model API

# Input layer
inputs = keras.Input(shape=ImageSize + (3,),
 name = 'image_only_input')

# Add the ResNet model. training=False keeps its batch normalization
# layers in inference mode, even after we unfreeze the weights later.
x = base_model(inputs, training=False)

# Flatten the ResNet feature map into one vector per image
x = Flatten()(x)

# Now we actually add it to a layer. Note the way of writing it.
x = Dense(64, activation='relu')(x)
x = Dropout(0.5)(x)
x = Dense(64, activation='relu')(x)
x = Dropout(0.5)(x)

# Add final output layer.
outputs = Dense(1, activation='relu')(x)

# Create the complete model object
ImageOnlyModel = keras.Model(inputs, outputs)

In [None]:
# This is what the model looks like now.
ImageOnlyModel.summary()

In [None]:
# Compiling the model! Note the learning rate.
opt = optimizers.Adam(learning_rate=1e-5, # Learning rate needs to be tweaked for convergence and be small!
 decay=1e-3 / 200 # Decay of the LR 10^-3 / 1 / 50 / 100 / 200
 ) 
ImageOnlyModel.compile(loss=keras.losses.MeanAbsolutePercentageError(), # This is NOT a classification problem!
 optimizer=opt
 )

Now we will train the model, using the [```flow_from_dataframe```](https://keras.io/api/preprocessing/image/) method. It looks for images in a particular folder, starting from a known dataframe.

This will allow us to create a train and a test dataframe, but it requires the dataframe to hold the path to each image. We can easily create this, noting that the images are named after the index of their row. We first create this new column with a list comprehension.

In [None]:
ImagePath = 'HousesDatasetClean/1/'
HouseData['path'] = [os.path.join(ImagePath, str(i) + '.jpg') for i in HouseData.index.values]
HouseData.head()

In [None]:
# Create a train / test split
from sklearn.model_selection import train_test_split
train, test = train_test_split(HouseData, 
 test_size = 0.3,
 random_state = 20201207)

In [None]:
train.head()

In [None]:
test.head()

Ready! We can now load the data. We have our set of pictures ready for this example. For this problem we will use a generator. A generator takes images from a directory and feeds them to the model as needed. **This is necessary to work with big data**. We cannot expect the datasets we work with here to fit in memory, so we load the images as needed.
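
The idea behind a generator can be sketched in a few lines of plain Python. This is only an illustration: `batch_generator` and `fake_load` are hypothetical names, and the loader returns blank arrays instead of reading files from disk.

```python
import numpy as np

def batch_generator(paths, batch_size, load_fn):
    """Yield one batch at a time so only `batch_size` images are in memory."""
    for start in range(0, len(paths), batch_size):
        batch_paths = paths[start:start + batch_size]
        yield np.stack([load_fn(p) for p in batch_paths])

# Hypothetical loader: returns a blank 4x4 RGB array instead of reading a file.
def fake_load(path):
    return np.zeros((4, 4, 3), dtype=np.float32)

fake_paths = [f"img_{i}.jpg" for i in range(10)]
batch_shapes = [b.shape for b in batch_generator(fake_paths, 4, fake_load)]
print(batch_shapes)
```

Note that the last batch is smaller than the others whenever the number of images is not a multiple of the batch size.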

We will first build two image generators (one for testing and one for training), which will generate new samples on the fly using our pictures as input.

We will also conduct **data augmentation**, a series of random transformations (zooms, shears, flips, and so on) applied to the images so that the model sees slightly different versions of each one and has to learn more general patterns. If you use augmentation, learning will take longer but will be more robust. The process to work with this data is the following:

1. Create an [ImageDataGenerator](https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator) object which will process the images and load them as needed.

2. Call the ```flow_from_dataframe``` method from our generators. We call it twice on the training generator to split the data into two parts, one for training and one for validation, and once more on the test generator for the test set.
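
To give a feel for what an augmentation does to an image, here is a minimal NumPy sketch of a random zoom. It is a simplification under stated assumptions: the function name `random_zoom_in` is hypothetical, it only zooms in (a zoom range in Keras can also zoom out), and it resizes with nearest-neighbour sampling rather than the interpolation the generator uses.

```python
import numpy as np

def random_zoom_in(image, max_zoom=0.2, rng=None):
    """Crop a random central fraction and resize back with nearest-neighbour sampling."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    # Keep between (1 - max_zoom) and 100% of each side.
    keep = rng.uniform(1.0 - max_zoom, 1.0)
    ch, cw = max(1, int(round(h * keep))), max(1, int(round(w * keep)))
    top, left = (h - ch) // 2, (w - cw) // 2
    crop = image[top:top + ch, left:left + cw]
    # Nearest-neighbour index maps from output pixels back to the crop.
    rows = np.arange(h) * ch // h
    cols = np.arange(w) * cw // w
    return crop[rows][:, cols]

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8)
augmented = random_zoom_in(image, max_zoom=0.2, rng=rng)
print(augmented.shape)
```

The output has the same shape as the input, so the model never notices that the image was transformed; it just sees a slightly different house each epoch.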

In [None]:
# Define parameters

target_size = (224, 224)
batch_size = 128
DataDir = 'HousesDatasetClean/1'

# Define generators
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
 rescale=None, # Inputs are scaled in the preprocessing function
 shear_range=0, # Shear?
 zoom_range=0.2, # Zoom? 0.2 means from 80% to 120%
 horizontal_flip=False, # Flip horizontally?
 vertical_flip=False, # Flip vertically?
 preprocessing_function=preprocess_input, # ResNet expects specific input. Set it up with this function that comes prepackaged.
 validation_split = 0.2 # Create a validation cut?
 )

test_datagen = ImageDataGenerator(
 rescale=None, # Inputs are scaled in the preprocessing function
 shear_range=0, # Shear?
 zoom_range=0, # No zoom: the test set is not augmented
 horizontal_flip=False, # Flip horizontally?
 vertical_flip=False, # Flip vertically?
 preprocessing_function=preprocess_input, # ResNet expects specific input. Set it up with this function that comes prepackaged.
 )

# Point to the data and **give the targets**. Note the "raw" class_mode
train_generator = train_datagen.flow_from_dataframe(train,
 directory='.', # Look from root directory
 x_col='path', # Path to images
 y_col='price', # Target
 target_size=target_size, # Same as last lab
 batch_size=batch_size,
 shuffle=True,
 class_mode='raw',
 subset='training',
 interpolation="bilinear"
 )

validation_generator = train_datagen.flow_from_dataframe(train,
 directory='.',
 x_col='path',
 y_col='price',
 target_size=target_size,
 batch_size=batch_size,
 shuffle=True,
 class_mode='raw',
 subset='validation',
 interpolation="bilinear"
 )

test_generator = test_datagen.flow_from_dataframe(test,
 directory='.',
 x_col='path',
 y_col='price',
 target_size=target_size,
 batch_size=batch_size,
 shuffle=False,
 class_mode='raw',
 interpolation="bilinear"
 )

Now let's train! We can easily train this model by calling the fit function and passing the generator. This will **only** train the dense layers, which is the recommended first step: it gives the new layers' weights a sensible starting point before we touch the pre-trained ones. This is called **warming up the model**. We can train the rest of the model in a second round.

You only need to give it a few rounds.
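
The number of steps per epoch is simply the number of batches needed to see every sample once. A small sketch of that arithmetic follows; `steps_for` is a hypothetical helper and the sample counts are made up for illustration (with the generators in this lab, the counts would come from attributes such as `train_generator.samples`).

```python
import math

def steps_for(n_samples, batch_size):
    """Number of batches needed to see every sample once (at least one)."""
    return max(1, math.ceil(n_samples / batch_size))

# Hypothetical sample counts, chosen only for illustration.
train_steps = steps_for(300, 128)
val_steps = steps_for(75, 128)
print(train_steps, val_steps)
```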

In [None]:
# Number of epochs
epochs = 2

# Train!
ImageOnlyModel.fit(
 train_generator,
 epochs=epochs,
 validation_data=validation_generator,
 steps_per_epoch = 3, # Usually cases / batch_size = 3.
 validation_steps = 1 # Number of validation steps. Again cases / batch_size = 1.
 )

The model did not learn much, but so far we have only trained the dense layers. Let's now train all layers. First, let's set the base model to trainable and recompile.


In [None]:
base_model.trainable = True

# Recompile as we changed things.
ImageOnlyModel.compile(loss=keras.losses.MeanAbsolutePercentageError(), # This is NOT a classification problem!
 optimizer=opt
 )


Now we can train the model. We will also add a **[callback](https://keras.io/api/callbacks/)**. Callbacks allow us to stop the training early if we reach convergence, save the model, create temporary plots... anything really. They are fairly powerful and quite necessary when we train big models. We will do two things:

1. Add an [EarlyStopping](https://keras.io/api/callbacks/early_stopping/) callback to stop training once the validation error stays flat for a couple of epochs.
2. Add a [ModelCheckpoint](https://keras.io/api/callbacks/model_checkpoint/) callback that automatically saves the weights of the best-performing model. I strongly suggest you save the weights to your Drive folder so you can resume training if your run crashes or if Google Colab kicks you out. It is also good business practice, of course. You can then load models back following [these instructions](https://www.tensorflow.org/tutorials/keras/save_and_load).
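
The early stopping rule can be sketched in plain Python to show how `patience` and `min_delta` interact. This is a simplified illustration, not the Keras implementation: `early_stopping_epoch` is a hypothetical function and the loss values are made up.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=1e-5):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            # Improvement larger than min_delta: remember it and reset the counter.
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Made-up validation losses that stop improving after the third epoch.
losses = [70.0, 65.0, 64.0, 64.0, 64.5, 64.2, 63.0]
stopped_at = early_stopping_epoch(losses)
print(stopped_at)
```

In this made-up run the loss would have improved again at the seventh epoch, which is the usual trade-off: a small `patience` saves time but can stop before a late improvement.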

In [None]:
# Define callbacks
checkpoint_path='checkpoints/ImageOnlyModel.{epoch:02d}-{val_loss:.2f}.h5'
checkpoint_dir=os.path.dirname(checkpoint_path)
os.makedirs(checkpoint_dir, exist_ok=True) # Make sure the checkpoint folder exists

my_callbacks = [
 # Stop training if validation error stays within 0.00001 for three rounds.
 tf.keras.callbacks.EarlyStopping(monitor='val_loss',
 min_delta=0.00001,
 patience=3),
 # Save the weights of the best performing model to the checkpoint folder.
 tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path,
 save_best_only=True,
 save_weights_only=True),
]

# Number of epochs
epochs = 10

# Train!
ImageOnlyModel.fit(
 train_generator, # Pass the train generator
 epochs=epochs, # Pass the epochs
 validation_data=validation_generator, # Pass the validation generator
 steps_per_epoch = 3, # Usually cases / batch_size = 3.
 validation_steps = 1, # Number of validation steps. Again cases / batch_size = 1.
 callbacks=my_callbacks # Add the callbacks
 )

Keras gives us the full training history of the model, which we can use to track convergence. The following code plots this history.

In [None]:
# Plotting training history.
loss = ImageOnlyModel.history.history['loss']
val_loss = ImageOnlyModel.history.history['val_loss']
epochs = range(1, len(loss) + 1)
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()

Let's restore the best model.



In [None]:
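# Load the weights. This requires the model architecture to be built first.
# Replace the file name with the best checkpoint saved during your own run.
ImageOnlyModel.load_weights('/content/checkpoints/ImageOnlyModel.10-65.23.h5')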

In [None]:
# Applying to the test set with a generator.
test_generator.reset()

# Get predicted prices
output = ImageOnlyModel.predict(test_generator)

In [None]:
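def mean_absolute_percentage_error(y_true, y_pred):
    # Flatten both inputs: predict() returns shape (n, 1) while the labels have shape (n,),
    # and subtracting them directly would broadcast into an (n, n) matrix.
    y_true, y_pred = np.ravel(y_true), np.ravel(y_pred)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100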

In [None]:
mape = mean_absolute_percentage_error(test_generator.labels, output)
print('The mean absolute percentage error over the test set is %.2f%%' % mape)
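
One detail worth checking when computing this metric by hand is the shape of the arrays. A model with a single output unit returns predictions of shape `(n, 1)`, while the labels usually have shape `(n,)`. The sketch below, with made-up prices, shows what NumPy broadcasting does in that case.

```python
import numpy as np

y_true = np.array([100.0, 200.0, 400.0])        # shape (3,)
y_pred = np.array([[110.0], [180.0], [400.0]])  # shape (3, 1), like a predict() output

# Broadcasting (3,) against (3, 1) silently builds a 3x3 matrix.
wrong = np.mean(np.abs((y_true - y_pred) / y_true)) * 100

# Flattening the predictions compares each one with its own label.
right = np.mean(np.abs((y_true - y_pred.ravel()) / y_true)) * 100
print((y_true - y_pred).shape, round(right, 2), round(wrong, 2))
```

The mismatched version raises no error and still returns a number, so a quick look at the shape of the difference is a cheap sanity check.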

The image-only model is not very good. That makes sense, right? The price of a house does not depend only on how pretty the house is; we are missing the location context. We can add this with a mixed model.

## Multimodal model

Now we can train a multi-input model. The idea of this model is to create two inputs, one with the images and the second one with the structured data. Before that, we need to transform the zipcodes into binary (one-hot) inputs and normalize the continuous variables. The following code does the preprocessing of the data.
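
As a reminder of what min-max normalization computes, here is a minimal NumPy sketch with made-up areas. The important part is that the minimum and maximum are learned from the training data only and then reused on the test data.

```python
import numpy as np

# Made-up areas in square feet, for illustration only.
train_area = np.array([1000.0, 2000.0, 3000.0])
test_area = np.array([1500.0, 3500.0])

# "Fit": learn the range from the training data only.
lo, hi = train_area.min(), train_area.max()

# "Transform": apply the same range to both sets.
train_scaled = (train_area - lo) / (hi - lo)
test_scaled = (test_area - lo) / (hi - lo)
print(train_scaled, test_scaled)
```

Test values can fall outside [0, 1] when a test house is larger or smaller than anything in the training set. That is expected and is the price of not leaking test information into the scaler.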

In [None]:
# Import preprocessors
from sklearn.preprocessing import MinMaxScaler

# What are the continuous variables?
continousCols = ["bedrooms", "bathrooms", "area"]

# Define scaler and train it over the train set.
Scaler = MinMaxScaler()
Scaler.fit(train[continousCols])

In [None]:
# Apply over sets. Ignore warning.
train[continousCols] = Scaler.transform(train[continousCols])
test[continousCols] = Scaler.transform(test[continousCols])

In [None]:
# Import and train binarizer.
from sklearn.preprocessing import LabelBinarizer
zipBinarizer = LabelBinarizer().fit(HouseData["zipcode"])

In [None]:
# Store binary variables in matrix
cat_train = zipBinarizer.transform(train['zipcode'])
cat_test = zipBinarizer.transform(test['zipcode'])

In [None]:
# Check number of columns created.
cat_train.shape
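
For intuition, the binarization step can be reproduced by hand with NumPy. This is a sketch with made-up zipcodes, assuming one column per distinct class in sorted order (which is what a label binarizer produces when there are more than two classes).

```python
import numpy as np

# Made-up zipcodes, for illustration only.
zipcodes = np.array([92276, 85255, 92276, 93510])

# Sorted unique classes define the column order.
classes = np.unique(zipcodes)

# Compare every row against every class: True where the zipcode matches.
one_hot = (zipcodes[:, None] == classes[None, :]).astype(int)
print(classes)
print(one_hot)
```

Each row contains exactly one 1, so the number of columns equals the number of distinct zipcodes.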

In [None]:
# Assign to new columns in dataframe. Ignore warning. 
for i in range(0, 49):
 col_name = "zipcode_" + str(i)
 train[col_name] = pd.Series(cat_train[:,i], index=train.index)
 test[col_name] = pd.Series(cat_test[:,i], index=test.index)

In [None]:
train

Now we can create the model!

We will:

1. Create two models, one for each data type.
2. Concatenate these inputs.
3. Add a few dense layers.
4. Compile the output.
5. Create a generator that can deal with this mixed data.
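
The concatenation in step 2 is easiest to understand by looking at shapes. The NumPy sketch below uses zero-filled arrays and assumes a 7x7x2048 feature map coming out of the image branch and 6 units coming out of the structured branch; both sizes are assumptions for illustration.

```python
import numpy as np

batch = 4
# Assumed shapes: a 7x7x2048 feature map from the image branch,
# and 6 units from the structured-data branch.
image_features = np.zeros((batch, 7, 7, 2048), dtype=np.float32)
structured_features = np.zeros((batch, 6), dtype=np.float32)

# Flatten each image's feature map into a single vector.
flat_image = image_features.reshape(batch, -1)

# Join both branches side by side, one long vector per house.
merged = np.concatenate([flat_image, structured_features], axis=1)
print(flat_image.shape, merged.shape)
```

Under these assumptions the image branch contributes far more values than the structured branch, which is one reason the dense layers after the merge matter: they decide how much weight each source gets.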

In [None]:
image_input = tf.keras.Input(shape=ImageSize + (3,),
 name = 'image_input')

# Load a second ResNet with the pre-trained ImageNet weights
resnet_input = ResNet50V2(weights = 'imagenet', # The weights from the ImageNet competition
 include_top = False, # Do not include the top layer, which classifies.
 input_shape= (224, 224, 3) # Input shape. Three channels.
 )
resnet_input.trainable = False

# Use the model API to attach it to our input layer.
ImageClassifier = resnet_input(image_input, training=False)

# Add a Flatten layer with the model API.
ImageClassifier = Flatten()(ImageClassifier)

# Now we create the structured data layer.
predictive_features = 3 + 48 # Three continuous, 48 zipcode indicators (the last of the 49 is left out)
features_input = keras.Input(shape=(predictive_features,),
 name="structured_data") 
Structured = Dense( 12, activation = 'relu' )(features_input) # Add one processing layer
Structured = Dropout(0.5)(Structured) # Dropout after Dense
Structured = Dense( 6, activation = 'relu' )(Structured)
Structured = Dropout(0.5)(Structured) # Dropout after Dense

# Merge all available features into a single large vector via concatenation
merged = concatenate([ImageClassifier, Structured])

# Add a few prediction layers
merged = Dense(256, activation='relu')(merged)
merged = Dropout(0.5)(merged)

house_price_multi = Dense(1, activation='relu', name="house_price_multi")(merged)

# Instantiate an end-to-end model predicting house_prices
multimodal_model = keras.Model(inputs=[image_input, features_input], 
 outputs=[house_price_multi])

In [None]:
# Compile with same optimizer as before.
multimodal_model.compile(optimizer = opt,
 loss='mean_absolute_percentage_error')

In [None]:
import pydot as pyd
from tensorflow.keras.utils import plot_model

#Visualize Model
plot_model(
 multimodal_model, to_file='model.png', show_shapes=False, show_layer_names=True,
 rankdir='TB', expand_nested=False, dpi=96
 )

Now the last part is to modify the generator. The structure is almost the same, except we need to code a custom function that will be able to process the multiple inputs and outputs. The following code accomplishes this.
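
The trick is to let the image generator return every structured column, target included, in its label matrix, and then split that matrix ourselves. Here is a small NumPy sketch of the split with a toy matrix; the values and the six-column layout are made up, and only the position of the target (column 3) mirrors this lab.

```python
import numpy as np

# Toy "label" matrix: 2 rows, 6 columns, with the target stored in column 3.
labels = np.array([[3, 2, 0.5, 250000, 1, 0],
                   [4, 3, 0.8, 480000, 0, 1]], dtype=float)

target_location = 3
predictive_columns = np.r_[0:3, 4:6]   # every column except the target

features = labels[:, predictive_columns]
targets = labels[:, target_location]
print(features.shape, targets)
```

`np.r_` simply concatenates the index ranges, which makes it a convenient way to say "all columns but this one".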

In [None]:
# We will need the position of the variables.
train.columns

In [None]:
pred_cols = np.r_[0:3, 4, 6:(train.columns.shape[0]-1)]

train.columns[pred_cols]

In [None]:
# Define parameters

target_size = (224, 224)
batch_size = 32
DataDir = 'HousesDatasetClean'

# What are the useful columns? Note the position of the target. 
pred_cols = np.r_[0:3, 4, 6:(train.columns.shape[0]-1)]

# We only modify the generators. Note the y vector.
train_generator = train_datagen.flow_from_dataframe(train,
 directory='.',
 x_col='path',
 y_col=train.columns[pred_cols],
 target_size=target_size,
 batch_size=batch_size,
 shuffle=True,
 class_mode='raw',
 subset='training',
 interpolation="bilinear"
 )

validation_generator = train_datagen.flow_from_dataframe(train,
 directory='.',
 x_col='path',
 y_col=train.columns[pred_cols],
 target_size=target_size,
 batch_size=batch_size,
 shuffle=True,
 class_mode='raw',
 subset='validation',
 interpolation="bilinear"
 )

# The test set uses test_datagen: no augmentation and no validation split.
test_generator = test_datagen.flow_from_dataframe(test,
 directory='.',
 x_col='path',
 y_col=test.columns[pred_cols],
 target_size=target_size,
 batch_size=batch_size,
 shuffle=False,
 class_mode='raw',
 interpolation="bilinear"
 )

# Split one batch from flow_from_dataframe into ([images, features], targets).
def split_batch(data):
    # Let's identify where is what.
    target_location = 3
    predictive_columns = np.r_[0:3, 4:52]

    # First the images.
    imgs = data[0]
    # Now we need to extract which ones are the predictive variables.
    cols = data[1][:, predictive_columns]
    # Finally we need the targets.
    targets = data[1][:, target_location]
    return [imgs, cols], targets


# Define combined generators
def train_generator_func():
    count = 0
    while True:
        if count == len(train.index):
            train_generator.reset()
            break
        count += 1
        yield split_batch(next(train_generator))


def validation_generator_func():
    count = 0
    while True:
        if count == len(train.index):
            validation_generator.reset()
            break
        count += 1
        yield split_batch(next(validation_generator))


def test_generator_func():
    count = 0
    test_generator.reset()
    while True:
        if count == len(test.index):
            test_generator.reset()
            break
        count += 1
        yield split_batch(next(test_generator))

With this, we are ready to train. First, let's check one data / target batch from the train.

In [None]:
# This is how the data comes out now.
train_test_output = train_generator_func()
next(train_test_output)

Everything looks perfect. Now we train!

In [None]:
# Warmup
# Steps and epochs
epochs=3
steps_per_epoch = train_generator.samples // train_generator.batch_size
validation_steps = np.amax([validation_generator.samples // validation_generator.batch_size, 1])

# Train!
multimodal_model.fit(train_generator_func(),
 epochs=epochs,
 steps_per_epoch=steps_per_epoch,
 validation_data=validation_generator_func(),
 validation_steps=validation_steps
 )

Now that we warmed the model up, we continue with the real training.

In [None]:
# Set it as trainable
resnet_input.trainable = True

# Recompile
multimodal_model.compile(optimizer = opt,
 loss='mean_absolute_percentage_error')

# Define callbacks
checkpoint_path='checkpoints/MultimodalModel{epoch:02d}-{val_loss:.2f}.h5'
checkpoint_dir=os.path.dirname(checkpoint_path)
os.makedirs(checkpoint_dir, exist_ok=True) # Make sure the checkpoint folder exists

my_callbacks = [
 # Stop training if validation error stays within 0.00001 for three rounds.
 tf.keras.callbacks.EarlyStopping(monitor='val_loss', 
 min_delta=0.00001,
 patience=3),
 # Save the weights of the best performing model to the checkpoint folder.
 tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path,
 save_best_only=True,
 save_weights_only=True,
 monitor='val_loss'
 ),
]

# Steps and epochs
epochs=5
steps_per_epoch = train_generator.samples // train_generator.batch_size
validation_steps = np.amax([validation_generator.samples // validation_generator.batch_size, 1])

# Train!
multimodal_model.fit(train_generator_func(),
 epochs=epochs,
 steps_per_epoch=steps_per_epoch,
 validation_data=validation_generator_func(),
 validation_steps=validation_steps,
 callbacks=my_callbacks
 )

We should again restore the best model.

In [None]:
multimodal_model.load_weights('/content/checkpoints/MultimodalModel01-53.15.h5')

The model is better! This is the power of a multimodal learning strategy. It allows combining background knowledge (structured data) with unstructured data to reach combined outputs. Let's check the training plot and the testing error.

In [None]:
loss = multimodal_model.history.history['loss']
val_loss = multimodal_model.history.history['val_loss']
epochs = range(1, len(loss) + 1)
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()

In [None]:
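# Calculate outputs in test set. Round up so the last partial batch is included
# without requesting an extra batch when the size divides evenly.
STEP_SIZE_TEST = int(np.ceil(test_generator.n / test_generator.batch_size))

house_test = multimodal_model.predict(test_generator_func(),
 steps=STEP_SIZE_TEST,
 verbose=1)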

In [None]:
mape = mean_absolute_percentage_error(test_generator.labels[:,3], house_test)
print('The mean absolute percentage error over the test set is %.2f%%' % mape)

We have a lower error now! This is almost half of the error of a model just with the structured data (try it!). Our multimodal model is capable of learning context from our multiple inputs!

Now you know how to include multiple data sources in your models. Go create your own now!