{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "name": "Multimodal Learning - House Prices.ipynb", "provenance": [], "collapsed_sections": [], "authorship_tag": "ABX9TyP62R95ZYFONIBRePSp5o4I", "include_colab_link": true }, "kernelspec": { "name": "python3", "display_name": "Python 3" }, "accelerator": "GPU" }, "cells": [ { "cell_type": "markdown", "metadata": { "id": "61XgR25OATPI" }, "source": [ "# Multimodal Learning Lab\n", "\n", "In this lab we will show how to train deep learning models on mixed data, an approach also called multimodal learning. Mixed data combines more than one type of data, such as traditional structured data together with images, audio, or video. The possibilities are endless!\n", "\n", "We will use Keras' very powerful Model API (the functional API), which allows us to create models with multiple inputs, multiple outputs, forks, and wherever else our imagination takes us. This tutorial borrows a bit from [this post](https://www.pyimagesearch.com/2019/02/04/keras-multiple-inputs-and-mixed-data/) where I originally found the application, and uses the data from [this GitHub](https://github.com/emanhamed/Houses-dataset) referencing the paper by Ahmed & Moustafa (2016) that can be found [here](https://arxiv.org/abs/1609.08399).\n", "\n", "The plan is to:\n", "\n", "- Import the data and prepare it for multimodal learning.\n", "- Create an architecture with three parts: a branch that passes the image input through a pre-trained model, a second branch that takes the structured data and runs a small feature engineering process (i.e. a few dense layers), and a final section that combines both branches into a final set of dense layers." ] }, { "cell_type": "markdown", "metadata": { "id": "494FK6yQDja-" }, "source": [ "## Data Import and Preprocessing\n", "\n", "I have modified the original dataset for easier processing.
The objective is to create a model that predicts house prices using the property's basic information (bedrooms, bathrooms, area in sqft, zipcode) and a collage of four images of the house." ] }, { "cell_type": "code", "metadata": { "id": "-Kvbf_JI7UUP" }, "source": [ "!nvidia-smi -L" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "ciqIw-4lDgco" }, "source": [ "# Download the data (it is unzipped in the next cell).\n", "!gdown 'https://drive.google.com/uc?id=1XZGDY0XVHNDawfMymX7d1BRO5a17KIN5'" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "NNyO9C8gGQWS" }, "source": [ "!unzip HousesDatasetClean.zip" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "ghev0W02UIv1" }, "source": [ "import pandas as pd\n", "\n", "HouseData = pd.read_csv('HousesDatasetClean/HousesInfo.csv')\n", "HouseData.describe()" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "0fjK1O2D17ut" }, "source": [ "HouseData.head()" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "O54hZemEUgLt" }, "source": [ "So, the data has the following attributes:\n", "\n", "- Number of bedrooms and bathrooms.\n", "- Area of the house in square feet.\n", "- Zipcode (categorical!).\n", "- House price.\n", "\n", "Our goal is to predict the price of a house using the information from all other variables. Let's see an example of an image." ] }, { "cell_type": "code", "metadata": { "id": "hs02smwBAt3F" }, "source": [ "from IPython.display import Image\n", "Image(filename='HousesDatasetClean/1/4.jpg') " ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "NjaRFbAjAuFS" }, "source": [ "Let's start with a model that uses only the images, as a benchmark." 
] }, { "cell_type": "markdown", "metadata": { "id": "FNEPfCPNTP7N" }, "source": [ "## Image-only Deep Learning model\n", "\n", "In the following code we will use ResNet50V2 as the basis of our model.\n", "\n", "This model was trained on the ImageNet data to classify images among 1000 different types of objects, using a very large database of images. We can leverage these already-trained weights and adapt just the last few layers for our purposes.\n", "\n", "Keras comes pre-packaged with a series of models, available in the [Applications library](https://keras.io/applications/). We start by loading the model on-the-fly using the library. The available options are listed in the documentation of the function [ResNet50V2](https://keras.io/api/applications/resnet/#resnet50v2-function). Note that ResNet50V2 expects inputs between -1 and 1, so we do **not** need to rescale in the generator; we simply apply [ResNet50V2's preprocessing function](https://www.tensorflow.org/api_docs/python/tf/keras/applications/resnet_v2/preprocess_input).\n", "\n", "We also need a package that allows for efficient storage of the model in a binary format. The package is called [h5py](https://www.h5py.org/), and it also lets you store your trained models.
You can read a tutorial for this [here](https://machinelearningmastery.com/save-load-keras-deep-learning-models/).\n" ] }, { "cell_type": "code", "metadata": { "id": "KemswKg9TVYc" }, "source": [ "import numpy as np\n", "import h5py\n", "import PIL\n", "\n", "# Others\n", "from sklearn.model_selection import train_test_split\n", "\n", "# Plots\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "%matplotlib inline\n", "\n", "# Image and directories\n", "import cv2\n", "import os\n", "\n", "# Tensorflow\n", "import tensorflow as tf\n", "from tensorflow.keras.preprocessing.image import ImageDataGenerator\n", "from tensorflow.keras import optimizers\n", "from tensorflow.keras.models import Sequential\n", "from tensorflow.keras.layers import *\n", "from tensorflow import keras" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "QePGGCg5TeKI" }, "source": [ "# Parameters\n", "ImageSize = (224,224)\n", "BatchSize = 32" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "qLeA2N62TzbQ" }, "source": [ "# Import base model. Using ResNet50V2.\n", "from tensorflow.keras.applications.resnet_v2 import ResNet50V2, preprocess_input\n", "\n", "# Import model with input layer\n", "base_model = ResNet50V2(weights = 'imagenet', # The weights from the ImageNet competition\n", "                        include_top = False, # Do not include the top layer, which classifies.\n", "                        input_shape = (224, 224, 3) # Input shape. Three channels.\n", "                        )" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "Now we will use the Model API. The core trick is that each layer is now written in the form\n", "\n", "`next_layer = LAYER_FUNCTION()(previous_layer)`\n", "\n", "Calling a layer on the output of the previous one tells Keras how the layers are connected, so the model knows which output feeds which input." 
], "metadata": { "id": "8I4v-P-SLAmo" } }, { "cell_type": "markdown", "source": [ "Our process will be to:\n", "\n", "1. Set the base model (ResNet) to non-trainable (freeze weights).\n", "2. Add the new layers on top.\n", "3. Train only the top to convergence.\n", "4. Unfreeze the weights to fine-tune.\n", "\n", "These steps are needed to properly adjust the weights and are the recommended practice when fine-tuning more complex models, as [explained here](https://keras.io/guides/transfer_learning/). That guide also has an example using Xception, another model." ], "metadata": { "id": "OSx6BIvFLplO" } }, { "cell_type": "code", "metadata": { "id": "ziuR4dv7VC_y" }, "source": [ "# Set the base model to untrainable.\n", "base_model.trainable = False" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# Create the full model using the Model API\n", "\n", "# Input layer\n", "inputs = keras.Input(shape=ImageSize + (3,),\n", "                     name = 'image_only_input')\n", "\n", "# Add the ResNet model in inference mode (training=False),\n", "# so its batch normalization layers keep their stored statistics.\n", "x = base_model(inputs, training=False)\n", "\n", "# Flatten the feature maps into a single vector.\n", "x = Flatten()(x)\n", "\n", "# Now we add the dense layers on top. Note the way of writing them.\n", "x = Dense(64, activation='relu')(x)\n", "x = Dropout(0.5)(x)\n", "x = Dense(64, activation='relu')(x)\n", "x = Dropout(0.5)(x)\n", "\n", "# Add final output layer.\n", "outputs = Dense(1, activation='relu')(x)\n", "\n", "# Create the complete model object\n", "ImageOnlyModel = keras.Model(inputs, outputs)" ], "metadata": { "id": "8tXnuI_KNS6t" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# This is what the model looks like now.\n", "ImageOnlyModel.summary()" ], "metadata": { "id": "wfZdP6HJQXD0" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "GW3-A2JUVZiM" }, "source": [ "# Compiling the model! 
Note the learning rate.\n", "opt = optimizers.Adam(learning_rate=1e-5, # Learning rate needs to be tweaked for convergence and be small!\n", "                      decay=1e-3 / 200 # Decay of the LR 10^-3 / 1 / 50 / 100 / 200\n", "                      ) \n", "ImageOnlyModel.compile(loss=keras.losses.MeanAbsolutePercentageError(), # This is NOT a classification problem!\n", "                       optimizer=opt\n", "                       )" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "Xcau6DQGWAI4" }, "source": [ "Now we will train the model using the [```flow_from_dataframe```](https://keras.io/api/preprocessing/image/) method. It looks up the images listed in a dataframe, relative to a particular folder.\n", "\n", "This lets us split the data into a train and a test dataframe, but requires each dataframe to contain the path to every image. We can easily create this column, noting that the images are named after the index of their row. We build it with a list comprehension." ] }, { "cell_type": "code", "metadata": { "id": "b2axXomcV3N0" }, "source": [ "ImagePath = 'HousesDatasetClean/1/'\n", "HouseData['path'] = [os.path.join(ImagePath, str(i) + '.jpg') for i in HouseData.index.values]\n", "HouseData.head()" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "rvLt1jdHXKTl" }, "source": [ "# Create a train / test split\n", "from sklearn.model_selection import train_test_split\n", "train, test = train_test_split(HouseData, \n", "                               test_size = 0.3,\n", "                               random_state = 20201207)" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "FFSOXQRP4qzi" }, "source": [ "train.head()" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "HIgE1i4U4u7C" }, "source": [ "test.head()" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "muTk2Bk6xikW" }, "source": [ "Ready! We can now load the data. We have our set of pictures ready for this example.
For this problem we will use a generator. A generator takes images from a directory and feeds them to the model as needed. **This is necessary to work with big data**. We cannot expect the datasets we work with here to fit in memory, so we load the images as needed.\n", "\n", "We will first build two image generators (one for training and one for testing), which will generate new samples on the fly using our pictures as input.\n", "\n", "We will also conduct **data augmentation**, a series of random transformations (such as zooms) applied to the training images so the model sees more varied examples. If you use augmentation, learning will take longer but be more robust. The process to work with this data is the following:\n", "\n", "1. Create an [ImageDataGenerator](https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator) object which will process the images and load them as needed.\n", "\n", "2. Call ```flow_from_dataframe``` on each generator. The training generator is split into two parts, one for training and one for validation, and the test generator provides a third flow for the test set." ] }, { "cell_type": "code", "metadata": { "id": "_XFTBH4NY80A" }, "source": [ "# Define parameters\n", "\n", "target_size = (224, 224)\n", "batch_size = 128\n", "DataDir = 'HousesDatasetClean/1'\n", "\n", "# Define generators\n", "from tensorflow.keras.preprocessing.image import ImageDataGenerator\n", "\n", "train_datagen = ImageDataGenerator(\n", "    rescale=None, # Inputs are scaled in the preprocessing function\n", "    shear_range=0, # Shear?\n", "    zoom_range=0.2, # Zoom? 0.2 means from 80% to 120%\n", "    horizontal_flip=False, # Flip horizontally?\n", "    vertical_flip=False, # Flip vertically?\n", "    preprocessing_function=preprocess_input, # ResNet expects specific input.
Set it up with this function that comes prepackaged.\n", "    validation_split = 0.2 # Create a validation cut?\n", "    )\n", "\n", "test_datagen = ImageDataGenerator(\n", "    rescale=None, # Inputs are scaled in the preprocessing function\n", "    shear_range=0, # Shear?\n", "    zoom_range=0, # No zoom: we do not augment the test set\n", "    horizontal_flip=False, # Flip horizontally?\n", "    vertical_flip=False, # Flip vertically?\n", "    preprocessing_function=preprocess_input, # ResNet expects specific input. Set it up with this function that comes prepackaged.\n", "    )\n", "\n", "# Point to the data and **give the targets**. Note the \"raw\" class_mode\n", "train_generator = train_datagen.flow_from_dataframe(train,\n", "    directory='.', # Look from root directory\n", "    x_col='path', # Path to images\n", "    y_col='price', # Target\n", "    target_size=target_size, # Same as last lab\n", "    batch_size=batch_size,\n", "    shuffle=True,\n", "    class_mode='raw',\n", "    subset='training',\n", "    interpolation=\"bilinear\"\n", "    )\n", "\n", "validation_generator = train_datagen.flow_from_dataframe(train,\n", "    directory='.',\n", "    x_col='path',\n", "    y_col='price',\n", "    target_size=target_size,\n", "    batch_size=batch_size,\n", "    shuffle=True,\n", "    class_mode='raw',\n", "    subset='validation',\n", "    interpolation=\"bilinear\"\n", "    )\n", "\n", "test_generator = test_datagen.flow_from_dataframe(test,\n", "    directory='.',\n", "    x_col='path',\n", "    y_col='price',\n", "    target_size=target_size,\n", "    batch_size=batch_size,\n", "    shuffle=False,\n", "    class_mode='raw',\n", "    interpolation=\"bilinear\"\n", "    )" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "5HFFaRJFbIGj" }, "source": [ "Now let's train! We can easily train this model by calling the fit function and passing the generator. This will **only** train the dense layers, which is the recommended first step. It is always a good idea to first give the new parameters a reasonable place to start from.
This is called **model warming up**. We can train the rest of the model in a second round.\n", "\n", "You only need to give it a few rounds." ] }, { "cell_type": "code", "metadata": { "id": "34W5yok-aesL" }, "source": [ "# Number of epochs\n", "epochs = 2\n", "\n", "# Train!\n", "ImageOnlyModel.fit(\n", " train_generator,\n", " epochs=epochs,\n", " validation_data=validation_generator,\n", " steps_per_epoch = 3, # Usually cases / batch_size = 3.\n", " validation_steps = 1 # Number of validation steps. Again cases / batch_size = 1.\n", " )" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "The model did not learn much, but we are only training the dense layers. Let's try to train now all layers. First, let's set the model to trainable and recompile.\n" ], "metadata": { "id": "ECptdsXyfC3K" } }, { "cell_type": "code", "source": [ "base_model.trainable = True\n", "\n", "# Recompile as we changed things.\n", "ImageOnlyModel.compile(loss=keras.losses.MeanAbsolutePercentageError(), # This is NOT a classification problem!\n", " optimizer=opt\n", " )\n" ], "metadata": { "id": "917DPhkXfB_g" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "Now we can train the model. We will also add a **[callback](https://keras.io/api/callbacks/)**. Callbacks allow us to stop the training early if we reach convergence, save the model, create temporary plots... anything really. They are fairly powerful and quite necessary when we train big models. We will do two things:\n", "\n", "1. Add an [EarlyStopping](https://keras.io/api/callbacks/early_stopping/) callback to stop training once the validation error stays flat for a couple of epochs.\n", "2. Add a [ModelCheckpoint](https://keras.io/api/callbacks/model_checkpoint/) callback that saves the weights of the model with the best performance automatically. 
I strongly suggest you save the weights to your Drive folder so you can resume training if your run crashes or if Google Colab disconnects you. It is also good practice in general. You can then load models back following [these instructions](https://www.tensorflow.org/tutorials/keras/save_and_load)." ], "metadata": { "id": "8G5jtoS5Q0Pf" } }, { "cell_type": "code", "source": [ "# Define callbacks\n", "checkpoint_path='checkpoints/ImageOnlyModel.{epoch:02d}-{val_loss:.2f}.h5'\n", "checkpoint_dir=os.path.dirname(checkpoint_path)\n", "os.makedirs(checkpoint_dir, exist_ok=True) # Make sure the checkpoint folder exists\n", "\n", "my_callbacks = [\n", "    # Stop training if the validation loss does not improve by at least 0.00001 for three epochs in a row.\n", "    tf.keras.callbacks.EarlyStopping(monitor='val_loss',\n", "                                     min_delta=0.00001,\n", "                                     patience=3),\n", "    # Save the weights of the best performing model to the checkpoint folder.\n", "    tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path,\n", "                                       save_best_only=True,\n", "                                       save_weights_only=True),\n", "]\n", "\n", "# Number of epochs\n", "epochs = 10\n", "\n", "# Train!\n", "ImageOnlyModel.fit(\n", "    train_generator, # Pass the train generator\n", "    epochs=epochs, # Pass the epochs\n", "    validation_data=validation_generator, # Pass the validation generator\n", "    steps_per_epoch = 3, # Usually cases / batch_size = 3.\n", "    validation_steps = 1, # Number of validation steps. Again cases / batch_size = 1.\n", "    callbacks=my_callbacks # Add the callbacks\n", "    )" ], "metadata": { "id": "ChVrFfX6Quom" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "2p3PYpJAx9fL" }, "source": [ "Keras gives us the full training history of the model, which we can use to track convergence. The following code plots this history." 
] }, { "cell_type": "code", "metadata": { "id": "WtEc0Zh8bC40" }, "source": [ "# Plotting training history.\n", "loss = ImageOnlyModel.history.history['loss']\n", "val_loss = ImageOnlyModel.history.history['val_loss']\n", "epochs = range(1, len(loss) + 1)\n", "plt.plot(epochs, loss, 'bo', label='Training loss')\n", "plt.plot(epochs, val_loss, 'b', label='Validation loss')\n", "plt.title('Training and validation loss')\n", "plt.xlabel('Epochs')\n", "plt.ylabel('Loss')\n", "plt.legend()\n", "plt.show()" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "CFu9PFUyyNiz" }, "source": [ "Let's restore the best model.\n", "\n" ] }, { "cell_type": "code", "source": [ "# Load the weights. This requires building the model architecture first.\n", "# The file name depends on your run: pick the checkpoint with the lowest validation loss.\n", "ImageOnlyModel.load_weights('/content/checkpoints/ImageOnlyModel.10-65.23.h5')" ], "metadata": { "id": "mGxm9na3RhYG" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "hlUeA9v3g4EJ" }, "source": [ "# Applying to the test set with a generator.\n", "test_generator.reset()\n", "\n", "# Get predictions\n", "output = ImageOnlyModel.predict(test_generator)" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "fG524uWJkf-c" }, "source": [ "def mean_absolute_percentage_error(y_true, y_pred): \n", "    # Flatten both inputs: predict() returns shape (n, 1) while the labels have shape (n,).\n", "    # Without this, the subtraction would broadcast to an (n, n) matrix.\n", "    y_true = np.asarray(y_true, dtype=float).ravel()\n", "    y_pred = np.asarray(y_pred, dtype=float).ravel()\n", "    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "G4c-1gXvhGKV" }, "source": [ "mape = mean_absolute_percentage_error(test_generator.labels, output)\n", "print('The mean absolute percentage error over the test set is %.2f%%' % mape)" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "8J3sKh6LBz8A" }, "source": [ "The image-only model is not very good. That makes sense, right?
The price of a house does not depend only on how pretty the house is; we are missing the location context. We can add this with a mixed model." ] }, { "cell_type": "markdown", "metadata": { "id": "6aspthBLchyU" }, "source": [ "## Multimodal model\n", "\n", "Now we can train a multi-input model. The idea of this model is to create two inputs, one with the images and the second one with the structured data. Before that, we need to transform the zipcodes into binary (one-hot) inputs and normalize the continuous variables. The following code does the preprocessing of the data." ] }, { "cell_type": "code", "metadata": { "id": "wV8gDyONwYPs" }, "source": [ "# Import preprocessors\n", "from sklearn.preprocessing import MinMaxScaler\n", "\n", "# What are the continuous variables?\n", "continousCols = [\"bedrooms\", \"bathrooms\", \"area\"]\n", "\n", "# Define scaler and train it over the train set.\n", "Scaler = MinMaxScaler()\n", "Scaler.fit(train[continousCols])" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "GW9CbnKPxubG" }, "source": [ "# Apply over sets.
Ignore warning.\n", "train[continousCols] = Scaler.transform(train[continousCols])\n", "test[continousCols] = Scaler.transform(test[continousCols])" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "nA0JhBezyBX3" }, "source": [ "# Import and train binarizer.\n", "from sklearn.preprocessing import LabelBinarizer\n", "zipBinarizer = LabelBinarizer().fit(HouseData[\"zipcode\"])" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "8yn_89nTySk3" }, "source": [ "# Store binary variables in matrix\n", "cat_train = zipBinarizer.transform(train['zipcode'])\n", "cat_test = zipBinarizer.transform(test['zipcode'])" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "jB22rK-4zP4b" }, "source": [ "# Check number of columns created.\n", "cat_train.shape" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "NcZw-8ZkyjTP" }, "source": [ "# Assign to new columns in dataframe. Ignore warning. \n", "for i in range(0, 49):\n", " col_name = \"zipcode_\" + str(i)\n", " train[col_name] = pd.Series(cat_train[:,i], index=train.index)\n", " test[col_name] = pd.Series(cat_test[:,i], index=test.index)" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "9LiQ5tY2Qa4V" }, "source": [ "train" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "yFOhY0cHz8Qc" }, "source": [ "Now we can create the model!\n", "\n", "We will:\n", "\n", "1. Create two models, one for each data type.\n", "2. Concatenate these inputs.\n", "3. Add a few dense layers.\n", "4. Compile the output.\n", "5. Create a generator that can deal with this mixed data." 
] }, { "cell_type": "code", "metadata": { "id": "p7M8a6O5zpCZ" }, "source": [ "image_input = tf.keras.Input(shape=ImageSize + (3,),\n", "                             name = 'image_input')\n", "\n", "# Load a pre-trained ResNet and freeze its weights\n", "resnet_input = ResNet50V2(weights = 'imagenet', # The weights from the ImageNet competition\n", "                          include_top = False, # Do not include the top layer, which classifies.\n", "                          input_shape = (224, 224, 3) # Input shape. Three channels.\n", "                          )\n", "resnet_input.trainable = False\n", "\n", "# Use the model API to attach it to our input layer.\n", "ImageClassifier = resnet_input(image_input, training=False)\n", "\n", "# Add a Flatten layer with the model API.\n", "ImageClassifier = Flatten()(ImageClassifier)\n", "\n", "# Now we create the structured data branch.\n", "predictive_features = 3 + 48 # Three continuous variables, 48 zipcode indicators\n", "features_input = keras.Input(shape=(predictive_features,),\n", "                             name=\"structured_data\") \n", "Structured = Dense( 12, activation = 'relu' )(features_input) # Add one processing layer\n", "Structured = Dropout(0.5)(Structured) # Dropout after Dense\n", "Structured = Dense( 6, activation = 'relu' )(Structured)\n", "Structured = Dropout(0.5)(Structured) # Dropout after Dense\n", "\n", "# Merge all available features into a single large vector via concatenation\n", "merged = concatenate([ImageClassifier, Structured])\n", "\n", "# Add a few prediction layers\n", "merged = Dense(256, activation='relu')(merged)\n", "merged = Dropout(0.5)(merged)\n", "\n", "house_price_multi = Dense(1, activation='relu', name=\"house_price_multi\")(merged)\n", "\n", "# Instantiate an end-to-end model predicting house prices\n", "multimodal_model = keras.Model(inputs=[image_input, features_input], \n", "                               outputs=[house_price_multi])" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "hqv8cGKg0U--" }, "source": [ "# Compile with same optimizer as before.\n", "multimodal_model.compile(optimizer = opt,\n", "                         
loss='mean_absolute_percentage_error')" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "iq_M8GTY0XAw" }, "source": [ "import pydot as pyd\n", "from tensorflow.keras.utils import plot_model\n", "\n", "#Visualize Model\n", "plot_model(\n", " multimodal_model, to_file='model.png', show_shapes=False, show_layer_names=True,\n", " rankdir='TB', expand_nested=False, dpi=96\n", " )" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "7LYXIaR-51BU" }, "source": [ "Now the last part is to modify the generator. The structure is almost the same, except we need to code a custom function that will be able to process the multiple inputs and outputs. The following code accomplishes this." ] }, { "cell_type": "code", "metadata": { "id": "H2kN99D-D-QZ" }, "source": [ "# We will need the position of the variables.\n", "train.columns" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "EQkowCmvDLop" }, "source": [ "pred_cols = np.r_[0:3, 4, 6:(train.columns.shape[0]-1)]\n", "\n", "train.columns[pred_cols]" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "2aHVwsFQ5twc" }, "source": [ "# Define parameters\n", "\n", "target_size = (224, 224)\n", "batch_size = 32\n", "DataDir = 'HousesDatasetClean'\n", "\n", "# What are the useful columns? Note the position of the target. \n", "pred_cols = np.r_[0:3, 4, 6:(train.columns.shape[0]-1)]\n", "\n", "# We only modify the generators. 
Note the y vector.\n", "train_generator = train_datagen.flow_from_dataframe(train,\n", "    directory='.',\n", "    x_col='path',\n", "    y_col=train.columns[pred_cols],\n", "    target_size=target_size,\n", "    batch_size=batch_size,\n", "    shuffle=True,\n", "    class_mode='raw',\n", "    subset='training',\n", "    interpolation=\"bilinear\"\n", "    )\n", "\n", "validation_generator = train_datagen.flow_from_dataframe(train,\n", "    directory='.',\n", "    x_col='path',\n", "    y_col=train.columns[pred_cols],\n", "    target_size=target_size,\n", "    batch_size=batch_size,\n", "    shuffle=True,\n", "    class_mode='raw',\n", "    subset='validation',\n", "    interpolation=\"bilinear\"\n", "    )\n", "\n", "# The test set uses test_datagen: no augmentation and no validation split,\n", "# so every test image is scored exactly once.\n", "test_generator = test_datagen.flow_from_dataframe(test,\n", "    directory='.',\n", "    x_col='path',\n", "    y_col=test.columns[pred_cols],\n", "    target_size=target_size,\n", "    batch_size=batch_size,\n", "    shuffle=False,\n", "    class_mode='raw',\n", "    interpolation=\"bilinear\"\n", "    )\n", "\n", "# Define combined generator\n", "def train_generator_func():\n", "    count = 0\n", "\n", "    while True:\n", "        if count == len(train.index):\n", "            train_generator.reset()\n", "            break\n", "        count += 1\n", "        data = train_generator.next()\n", "\n", "        # Let's identify where is what.\n", "        target_location = 3\n", "        predictive_columns = np.r_[0:3, 4:52]\n", "\n", "        # Now we reshape everything.
First the images.\n", "        imgs = data[0]\n", "        # Now we need to extract which ones are the predictive variables.\n", "        cols = data[1][:, predictive_columns]\n", "        # Finally we need the targets.\n", "        targets = data[1][:, target_location]\n", "        yield [imgs, cols], targets\n", "\n", "\n", "def validation_generator_func():\n", "    count = 0\n", "    while True:\n", "        if count == len(train.index):\n", "            validation_generator.reset()\n", "            break\n", "        count += 1\n", "        data = validation_generator.next()\n", "\n", "        # Let's identify where is what.\n", "        target_location = 3\n", "        predictive_columns = np.r_[0:3, 4:52]\n", "\n", "        # Now we reshape everything. First the images.\n", "        imgs = data[0]\n", "        # Now we need to extract which ones are the predictive variables.\n", "        cols = data[1][:, predictive_columns]\n", "        # Finally we need the targets.\n", "        targets = data[1][:, target_location]\n", "        yield [imgs, cols], targets\n", "\n", "\n", "def test_generator_func():\n", "    count = 0\n", "    test_generator.reset()\n", "    while True:\n", "        if count == len(test.index):\n", "            test_generator.reset()\n", "            break\n", "        count += 1\n", "        data = test_generator.next()\n", "\n", "        # Let's identify where is what.\n", "        target_location = 3\n", "        predictive_columns = np.r_[0:3, 4:52]\n", "\n", "        # Now we reshape everything. First the images.\n", "        imgs = data[0]\n", "        # Now we need to extract which ones are the predictive variables.\n", "        cols = data[1][:, predictive_columns]\n", "        # Finally we need the targets.\n", "        targets = data[1][:, target_location]\n", "        yield [imgs, cols], targets" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "jsnZXYQA9al_" }, "source": [ "With this, we are ready to train. First, let's check one data / target batch from the training generator." 
] }, { "cell_type": "code", "metadata": { "id": "EHGL3Q8-9InA" }, "source": [ "# This is how the data comes out now.\n", "train_test_output = train_generator_func()\n", "next(train_test_output)" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "BxGG7_I197mO" }, "source": [ "Everything looks perfect. Now we train!" ] }, { "cell_type": "code", "metadata": { "id": "NeT2V9At91K-" }, "source": [ "# Warmup\n", "# Steps and epochs\n", "epochs=3\n", "steps_per_epoch = train_generator.samples // train_generator.batch_size\n", "validation_steps = np.amax([validation_generator.samples // validation_generator.batch_size, 1])\n", "\n", "# Train!\n", "multimodal_model.fit(train_generator_func(),\n", " epochs=epochs,\n", " steps_per_epoch=steps_per_epoch,\n", " validation_data=validation_generator_func(),\n", " validation_steps=validation_steps\n", " )" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "Now that we warmed the model up, we continue with the real training." 
], "metadata": { "id": "rZ_fdxCsiowU" } }, { "cell_type": "code", "source": [ "# Set it as trainable\n", "resnet_input.trainable = True\n", "\n", "# Recompile\n", "multimodal_model.compile(optimizer = opt,\n", " loss='mean_absolute_percentage_error')\n", "\n", "# Define callbacks\n", "checkpoint_path='checkpoints/MultimodalModel{epoch:02d}-{val_loss:.2f}.h5'\n", "checkpoint_dir=os.path.dirname(checkpoint_path)\n", "\n", "my_callbacks = [\n", " # Stop training if validation error stays within 0.00001 for three rounds.\n", " tf.keras.callbacks.EarlyStopping(monitor='val_loss', \n", " min_delta=0.00001,\n", " patience=3),\n", " # Save the weights of the best performing model to the checkpoint folder.\n", " tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path,\n", " save_best_only=True,\n", " save_weights_only=True,\n", " monitor='val_loss'\n", " ),\n", "]\n", "\n", "# Steps and epochs\n", "epochs=5\n", "steps_per_epoch = train_generator.samples // train_generator.batch_size\n", "validation_steps = np.amax([validation_generator.samples // validation_generator.batch_size, 1])\n", "\n", "# Train!\n", "multimodal_model.fit(train_generator_func(),\n", " epochs=epochs,\n", " steps_per_epoch=steps_per_epoch,\n", " validation_data=validation_generator_func(),\n", " validation_steps=validation_steps,\n", " callbacks=my_callbacks\n", " )" ], "metadata": { "id": "4R80s57witZf" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "We should again restore the best model." ], "metadata": { "id": "RmUwaaDPZvor" } }, { "cell_type": "code", "source": [ "multimodal_model.load_weights('/content/checkpoints/MultimodalModel01-53.15.h5')" ], "metadata": { "id": "jCvaZ4HSZuyh" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "1SVrHncw-dT_" }, "source": [ "The model is better! This is the power of a multimodal learning strategy. 
It allows combining background knowledge (structured data) with unstructured data to reach combined outputs. Let's check the training plot and the testing error." ] }, { "cell_type": "code", "metadata": { "id": "L4GvK3_r-OGh" }, "source": [ "loss = multimodal_model.history.history['loss']\n", "val_loss = multimodal_model.history.history['val_loss']\n", "epochs = range(1, len(loss) + 1)\n", "plt.plot(epochs, loss, 'bo', label='Training loss')\n", "plt.plot(epochs, val_loss, 'b', label='Validation loss')\n", "plt.title('Training and validation loss')\n", "plt.xlabel('Epochs')\n", "plt.ylabel('Loss')\n", "plt.legend()\n", "plt.show()" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "IQeB88re-8Si" }, "source": [ "# Calculate outputs in test set.\n", "# Round the number of steps up so the last (partial) batch is included exactly once.\n", "STEP_SIZE_TEST = int(np.ceil(test_generator.n / test_generator.batch_size))\n", "\n", "house_test = multimodal_model.predict(test_generator_func(),\n", "                                      steps=STEP_SIZE_TEST,\n", "                                      verbose=1)" ], "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "abWqodAZ_IHb" }, "source": [ "# Flatten the predictions from shape (n, 1) to (n,) so they line up with the labels.\n", "mape = mean_absolute_percentage_error(test_generator.labels[:,3], house_test.ravel())\n", "print('The mean absolute percentage error over the test set is %.2f%%' % mape)" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "TPzyXkRL_fV4" }, "source": [ "We have a lower error now! This is almost half of the error of a model just with the structured data (try it!). Our multimodal model is capable of learning context from our multiple inputs!\n", "\n", "Now you know how to include multiple data sources in your models. Go create your own now!" ] } ] }