{ "cells": [ { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "view-in-github" }, "source": [ "\"Open" ] }, { "cell_type": "markdown", "metadata": { "id": "4KefNczLWrL8" }, "source": [ "# 2D Convolutional Neural Networks" ] }, { "cell_type": "markdown", "metadata": { "id": "neKsl_mxWrL9" }, "source": [ "In this lab, we will train a traditional VGG16 model, where we will train the last couple of layers to adapt it to our problem. We will also study how to use Pytorch directly, without Keras, and [torchvision](https://pytorch.org/vision/main/), the image subpackage from torch.\n", "\n", "The problem is to predict one of 6 categories of pictures, whether they are. 'buildings', 'forest', 'glacier', 'mountain', 'sea' or 'street'. This problem comes from an Intel Image Classification challenge and is available [here](https://www.kaggle.com/puneet6060/intel-image-classification/version/2).\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "5XCoJIDC-mG1" }, "outputs": [], "source": [ "# Base\n", "import numpy as np\n", "import h5py as h5py\n", "import PIL\n", "import os\n", "from datetime import datetime\n", "import pandas as pd\n", "\n", "\n", "# Scikit-learn\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.metrics import roc_curve, auc, confusion_matrix\n", "\n", "# Plots\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "import graphviz\n", "from IPython.display import Image\n", "%matplotlib inline\n", "graphviz.set_jupyter_format('jpg')\n", "\n", "# Torch\n", "import torch\n", "import torchvision\n", "from torchvision import datasets, transforms\n", "from torch.utils.data import DataLoader, random_split" ] }, { "cell_type": "markdown", "metadata": { "id": "bRJxJfgVonEc" }, "source": [ "Now, we can load the data." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "kLUGY4T_W4Z1" }, "outputs": [], "source": [ "!gdown 'https://drive.google.com/uc?id=1HEB7JHl6uSANiENvaHBxhzmkkZ9YlhEC'" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "ha2b0IUlXibC" }, "outputs": [], "source": [ "!unzip IntelClassification.zip" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "iEQUpIntcRKE" }, "outputs": [], "source": [ "# Let's remove a few samples for speed.\n", "classes = ['buildings', 'forest', 'glacier', 'mountain', 'sea', 'street']\n", "for model_class in classes:\n", " l = os.listdir(os.path.join('IntelClassification/seg_train', model_class))\n", " for n in l[::2]:\n", " os.unlink(os.path.join('IntelClassification/seg_train', model_class, n))" ] }, { "cell_type": "markdown", "metadata": { "id": "3DQLotoTVJ2p" }, "source": [ "Let's see a couple of examples." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "pC4CgxXiVJbq" }, "outputs": [], "source": [ "Image(filename='IntelClassification/seg_train/sea/17997.jpg')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "YY3wzNdfWTuL" }, "outputs": [], "source": [ "Image(filename='IntelClassification/seg_test/street/20066.jpg')" ] }, { "cell_type": "markdown", "metadata": { "id": "9FQiCBWM0Zou" }, "source": [ "Finally, let's verify torch is running on GPU. The following code gets that and also allocates the device (the GPU) to a variable." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "HbeZ-qs40iEK" }, "outputs": [], "source": [ "print(f\"Is CUDA supported by this system? 
{torch.cuda.is_available()}\")\n", "print(f\"CUDA version: {torch.version.cuda}\")\n", "\n", "# Storing the ID of the current CUDA device\n", "cuda_id = torch.cuda.current_device()\n", "print(f\"ID of current CUDA device: {torch.cuda.current_device()}\")\n", "\n", "print(f\"Name of current CUDA device: {torch.cuda.get_device_name(cuda_id)}\")\n", "\n", "# Making the code device-agnostic\n", "device = 'cuda' if torch.cuda.is_available() else 'cpu'\n", "print(f\"The default device is set to {device}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "leoz9lLV05Sy" }, "source": [ "All good! Now we can run our models on the GPU." ] }, { "cell_type": "markdown", "metadata": { "id": "oYYauoNEWrL-" }, "source": [ "## VGG16\n", "\n", "The VGG16 model is a classic deep learning model. It is a 16-layer network, following the structure that we discussed in the lectures.\n", "\n", "This model was trained on the ImageNet data, learning to classify among 1000 different types of objects over a very large database of images. We can leverage these already-trained weights and adapt just the last few layers for our purposes.\n", "\n", "We start by loading the VGG16 model. Torch comes pre-packaged with a series of models, available in the [Models library in torchvision](https://pytorch.org/vision/main/models.html). We load the model on-the-fly using the library; its options are documented in the function [VGG16](https://pytorch.org/vision/main/models/generated/torchvision.models.vgg16.html#torchvision.models.vgg16).\n", "\n", "This will download the model and save it to our unoriginally named variable ```model```. It is a pretty beefy download, at around 500MB. Luckily, we have fast internet in Colab :). It uses the default weights, which were pretrained by Torch.\n", "\n", "We then move the model to the GPU." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "el4B7dJ_WrMD" }, "outputs": [], "source": [ "# Load the model from PyTorch using the default weights.\n", "model = torchvision.models.vgg16(weights='DEFAULT')\n", "print(f\"This model is trained for {model.classifier[6].out_features} classes\")\n", "\n", "# Move to the GPU\n", "model = model.to(device)" ] }, { "cell_type": "markdown", "metadata": { "id": "pOE3mUa_KoQE" }, "source": [ "We now show the basic structure of the model." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "nLkRgm1NKrkf" }, "outputs": [], "source": [ "print(model)" ] }, { "cell_type": "markdown", "metadata": { "id": "J7pY9LJW8LEv" }, "source": [ "Due to the structure of torch, visualizing models is hard. We can, however, use some packages to plot models. They are a bit experimental, so be careful with complex models. This one, though, is simple." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "d02R7QbX8ZYn" }, "outputs": [], "source": [ "!pip install torchview" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "yeXu2iqy8d0c" }, "outputs": [], "source": [ "from torchview import draw_graph\n", "\n", "model_graph = draw_graph(model, input_size=(1,3,224,224), device=device)\n", "model_graph.visual_graph" ] }, { "cell_type": "markdown", "metadata": { "id": "xDVzqhVBWrMJ" }, "source": [ "At this point, every single parameter is trainable. We don't want this, as we want to keep the parameters that come with the model. 
We will change the model to first freeze all convolutional layers, and we will also replace the \"top\" of the model, that is, the layers that lead to the classification. These are the ones above the \"Flatten\" layer.\n", "\n", "The following code does that." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "9FjhGiqR-qi-" }, "outputs": [], "source": [ "# List the indexes of the top layers\n", "model.classifier" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "QJLpTDWzWrMM" }, "outputs": [], "source": [ "# Freeze training for all feature (convolutional) layers.\n", "# Note: the attribute is 'requires_grad'; a typo such as 'require_grad'\n", "# would silently create a new attribute and freeze nothing.\n", "for param in model.features.parameters():\n", " param.requires_grad = False" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "70y6jmxU-hh0" }, "outputs": [], "source": [ "# Replace the top layers. Reduce the hidden size to 128 and the output layer to 6.\n", "model.classifier[0] = torch.nn.Linear(in_features=25088, out_features=128, bias=True)\n", "model.classifier[3] = torch.nn.Linear(in_features=128, out_features=128, bias=True)\n", "model.classifier[6] = torch.nn.Linear(in_features=128, out_features=6, bias=True)\n", "\n", "# Add a softmax layer at the end to change the logits to probabilities.\n", "model.classifier = torch.nn.Sequential(\n", " *model.classifier,\n", " torch.nn.Softmax(dim=1),\n", ")\n", "\n", "# Set the last layers of the feature part of the model as trainable\n", "trainable_layers = [28, 29, 30]\n", "for layer in trainable_layers:\n", " for param in model.features[layer].parameters():\n", " param.requires_grad = True\n", "\n", "# Move to the GPU\n", "model = model.to(device)\n", "print(next(model.parameters()).is_cuda) # Should say True" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "j37nOTAzUxcr" }, "outputs": [], "source": [ "# Check that the new model worked.\n", "print(model)" ] }, { "cell_type": "markdown", "metadata": { "id": "94fkn6CvWrMV" }, "source": [ "Ready! We can now load the data. We have our set of pictures ready for this example. For this problem we will use a generator. A generator takes images from a directory and feeds them to the model as needed. **This is necessary to work with big data**. We cannot expect the datasets we work with here to fit in memory, so we read the images as needed.\n", "\n", "We will now write a function to load the training data. It will take a directory and apply the preprocessing transforms that VGG16 expects. This will use PyTorch's [torchvision](https://pytorch.org/vision/stable/index.html) package, which comes with functions tailored to image data."
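] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before writing that function, here is a minimal, optional sketch of the lazy-loading idea, assuming the unpacked ```IntelClassification/seg_train``` folder from above: ```ImageFolder``` indexes the directory, and the ```DataLoader``` reads from disk only the images of the batch being requested." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Optional sketch: ImageFolder + DataLoader stream batches lazily,\n", "# so only the requested batch is read from disk and decoded.\n", "demo_dataset = datasets.ImageFolder(\n", " root='IntelClassification/seg_train',\n", " transform=transforms.Compose([transforms.Resize((224, 224)),\n", " transforms.ToTensor()]))\n", "demo_loader = DataLoader(demo_dataset, batch_size=8, shuffle=True)\n", "\n", "images, labels = next(iter(demo_loader)) # reads just these 8 images\n", "print(images.shape) # torch.Size([8, 3, 224, 224])\n", "print(labels) # the class indices of the batch"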
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "8c4AnfLGWrMW" }, "outputs": [], "source": [ "# Get the transforms for VGG16\n", "transforms_vgg16 = torchvision.models.VGG16_Weights.DEFAULT.transforms\n", "\n", "def load_image_data_trainval(data_dir='IntelClassification/seg_train',\n", " batch_size=32,\n", " validation_split=0.33,\n", " random_seed=42):\n", " \"\"\"\n", " Load an image dataset and splits it into training and validation sets.\n", "\n", " Parameters:\n", " - data_dir (str): Path to the data directory.\n", " - batch_size (int): Number of images to be loaded in each batch.\n", " - validation_split (float): The fraction of the dataset to be used as validation set.\n", " - random_seed (int): A seed to ensure reproducibility for the random split.\n", "\n", " Returns:\n", " - train_loader (DataLoader): DataLoader for the training set.\n", " - val_loader (DataLoader): DataLoader for the validation set.\n", " \"\"\"\n", "\n", " # Define a transformation pipeline\n", " transform = transforms.Compose([\n", " transforms_vgg16()\n", " ])\n", "\n", " # Create the dataset using ImageFolder\n", " full_dataset = datasets.ImageFolder(root=data_dir, transform=transform)\n", "\n", " # Determine the split sizes\n", " total_size = len(full_dataset)\n", " val_size = int(validation_split * total_size)\n", " train_size = total_size - val_size\n", "\n", " # Ensure reproducibility of the split\n", " torch.manual_seed(random_seed)\n", "\n", " # Split the dataset\n", " train_dataset, val_dataset = random_split(full_dataset, [train_size, val_size])\n", "\n", " # Create the DataLoaders to feed data to the model\n", " train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)\n", " val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)\n", "\n", " return train_loader, val_loader" ] }, { "cell_type": "markdown", "metadata": { "id": "f9Mv54j0uKi6" }, "source": [ "For the test set, we will create a generator that applies no transformations to the data except resizing and centering." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "nfafeS0duKJ7" }, "outputs": [], "source": [ "def load_image_test_dataset(data_dir='IntelClassification/seg_test', batch_size=32):\n", " \"\"\"\n", " Load the IntelClassification image dataset.\n", "\n", " Parameters:\n", " - data_dir (str): Path to the data directory (the directory should contain 'train' and 'test' subdirectories).\n", " - batch_size (int): Number of images to be loaded in each batch.\n", "\n", " Returns:\n", " - DataLoader: PyTorch DataLoader for the dataset.\n", " \"\"\"\n", "\n", " # Test data has no transformations beyond the basics.\n", " transforms_list = [\n", " transforms_vgg16()\n", " ]\n", "\n", " # Initialize the transformation function\n", " transform = transforms.Compose(transforms_list)\n", "\n", " # Create the dataset using ImageFolder\n", " full_data_dir = os.path.join(data_dir)\n", " dataset = datasets.ImageFolder(root=full_data_dir, transform=transform)\n", "\n", " # Create the DataLoader to feed data to the model\n", " dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=False)\n", "\n", " return dataloader" ] }, { "cell_type": "markdown", "metadata": { "id": "-PbDRYPzxFZ1" }, "source": [ "We must now create our data loaders." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "-hqVkoNrxH-J" }, "outputs": [], "source": [ "train_loader, val_loader = load_image_data_trainval(batch_size=64)\n", "test_loader = load_image_test_dataset(batch_size=64)\n", "\n", "print(f\"The train set has {len(train_loader.dataset)} images.\")\n", "print(f\"The validation set has {len(val_loader.dataset)} images.\")\n", "print(f\"The test set has {len(test_loader.dataset)} images.\")" ] }, { "cell_type": "markdown", "metadata": { "id": "hnRCZF6GD2tM" }, "source": [ "Let's visualize a few examples." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Q7vY7SYwD4mg" }, "outputs": [], "source": [ "def imshow(inp, title=None):\n", " inp = inp.numpy().transpose((1, 2, 0))\n", " # plt.figure(figsize=(10, 10))\n", " plt.axis('off')\n", " plt.imshow(inp)\n", " if title is not None:\n", " plt.title(title)\n", " plt.pause(0.001)\n", "\n", "def show_databatch(inputs, classes):\n", " out = torchvision.utils.make_grid(inputs)\n", " imshow(out, title=[x for x in classes])\n", "\n", "# Get a batch of training data\n", "inputs, classes = next(iter(train_loader))\n", "show_databatch(inputs, test_loader.dataset.classes)" ] }, { "cell_type": "markdown", "metadata": { "id": "iXm66fBIDUjj" }, "source": [ "Now we can create the train loop. In Pytorch, we must be explicit in each of the operations of the backpropagation algorithm. This allows us to control the training loop explicitely, but it does lead to a somewhat extensive codebase.\n", "\n", "The good news is that this is boilerplate and can be copy-pasted with ease for future projects. First, let's define the loss and optimizer." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "JG4qts27IU3h" }, "outputs": [], "source": [ "# Defining the loss function\n", "loss_fn = torch.nn.CrossEntropyLoss().to(device)\n", "\n", "# Defining the optimizer\n", "optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)" ] }, { "cell_type": "markdown", "metadata": { "id": "f8dNySdLdkC7" }, "source": [ "To monitor the training loop, we'll use the [livelossplot](https://p.migdal.pl/livelossplot/) package. It is a simple interface to visualize training as it happens." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "NMTCfVnITD1z" }, "outputs": [], "source": [ "!pip install livelossplot" ] }, { "cell_type": "markdown", "metadata": { "id": "StwDzLG9dwlw" }, "source": [ "Now we'll define the training loop. The following code does ```EPOCHS``` rounds of training, showing the ongoing loss every 10 batches. Once per epoch, liveloss is updated." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "VSn79525TCFN" }, "outputs": [], "source": [ "from livelossplot import PlotLosses\n", "\n", "dataloaders = {\n", " \"train\": train_loader,\n", " \"validation\": val_loader\n", "}\n", "\n", "liveloss = PlotLosses()\n", "EPOCHS = 3\n", "best_vloss = 1000000\n", "timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')\n", "\n", "for epoch in range(EPOCHS):\n", " # Run the epoch\n", " logs = {}\n", "\n", " # Run a train epoch, and then a validation epoch.\n", " for phase in ['train', 'validation']:\n", " if phase == 'train':\n", " model.train()\n", " else:\n", " model.eval()\n", "\n", " running_loss = 0.0\n", " running_corrects = 0\n", "\n", " for i, data in enumerate(dataloaders[phase]):\n", " inputs, labels = data\n", " inputs = inputs.to(device)\n", " labels = labels.to(device)\n", "\n", " outputs = model(inputs).to(device)\n", " loss = loss_fn(outputs, labels)\n", "\n", " if phase == 'train':\n", " optimizer.zero_grad()\n", " loss.backward()\n", " optimizer.step()\n", "\n", " _, preds = torch.max(outputs, 1)\n", " running_loss += loss.detach() * inputs.size(0)\n", " running_corrects += torch.sum(preds == labels.data)\n", "\n", " if i % 10 == 9:\n", " batch_loss = running_loss / (10 * (i+1))\n", " print(f'{phase} batch {i+1} loss: {batch_loss:.3f}')\n", " tb_x = epoch * len(dataloaders[phase]) + i + 1\n", "\n", " # Delete the used VRAM\n", " torch.cuda.empty_cache()\n", "\n", " epoch_loss = running_loss / len(dataloaders[phase].dataset)\n", " epoch_acc = running_corrects.float() / len(dataloaders[phase].dataset)\n", "\n", " prefix = ''\n", " if phase == 'validation':\n", " prefix = 'val_'\n", "\n", " # Track best performance, and save the model's state\n", " if epoch_loss < best_vloss:\n", " best_vloss = epoch_loss\n", " model_path = 'model_{}_{}.ph'.format(timestamp, epoch)\n", " print(f'New best model found. Saving it as {model_path}')\n", " torch.save(model.state_dict(), model_path)\n", "\n", " logs[prefix + 'log loss'] = epoch_loss.item()\n", " logs[prefix + 'accuracy'] = epoch_acc.item()\n", "\n", " liveloss.update(logs)\n", " liveloss.send()" ] }, { "cell_type": "markdown", "metadata": { "id": "yxwL7VUQMSrh" }, "source": [ "As you can see, the model still has much more to learn! Keep training until you get convergence.\n", "\n", "Now, let's apply it to the test set and get a confusion matrix." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "bah8rC0sbSGt" }, "outputs": [], "source": [ "# Wrapper to save memory by not recomputing gradients.\n", "with torch.no_grad():\n", " # Set the model in evaluation mode.\n", " model.eval()\n", "\n", " # Calculate running loss and accuracy\n", " running_loss = 0.0\n", " running_corrects = 0\n", " test_labels = np.array([])\n", " test_probs = np.array([])\n", " test_predictions = np.array([])\n", "\n", " # Apply to the test set\n", " for i, data in enumerate(test_loader):\n", " inputs, labels = data\n", " test_labels = np.append(test_labels, labels.cpu().numpy())\n", " inputs = inputs.to(device)\n", " labels = labels.to(device)\n", "\n", " outputs = model(inputs)\n", " test_probs = np.append(test_probs, outputs.cpu().numpy())\n", " outputs = outputs.to(device)\n", " loss = loss_fn(outputs, labels)\n", "\n", " _, preds = torch.max(outputs, 1)\n", " test_predictions = np.append(test_predictions, preds.cpu().numpy())\n", " running_loss += loss.detach() * inputs.size(0)\n", " running_corrects += torch.sum(preds == labels.data)\n", "\n", "test_loss = running_loss / len(test_loader.dataset)\n", "test_acc = running_corrects.float() / len(test_loader.dataset)\n", "\n", "print(f'The test set accuracy is {test_acc*100:.2f}%')\n", "print(f'The test set loss is {test_loss:.3f}')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "2rm-uOqmiVGr" }, "outputs": [], "source": [ "test_probs" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "kiWE90q1dEOq" }, "outputs": [], "source": [ "# Calculate confusion matrix\n", "confusion_matrix_net = confusion_matrix(y_true = test_labels,\n", " y_pred = test_predictions)\n", "\n", "# Turn matrix to percentages\n", "confusion_matrix_net = confusion_matrix_net.astype('float') / confusion_matrix_net.sum(axis=1)[:, np.newaxis]\n", "\n", "# Turn to dataframe\n", "df_cm = pd.DataFrame(\n", " confusion_matrix_net,\n", " index=['building', 'forest', 'glacier', 'mountain', 'sea', 'street'],\n", " columns=['building', 'forest', 'glacier', 'mountain', 'sea', 'street'],\n", ")\n", "\n", "# Parameters of the image\n", "figsize = (10,7)\n", "fontsize=14\n", "\n", "# Create image\n", "fig = plt.figure(figsize=figsize)\n", "heatmap = sns.heatmap(df_cm, annot=True, fmt='.2f')\n", "\n", "# Make it nicer\n", "heatmap.yaxis.set_ticklabels(heatmap.yaxis.get_ticklabels(), rotation=0,\n", " ha='right', fontsize=fontsize)\n", "heatmap.xaxis.set_ticklabels(heatmap.xaxis.get_ticklabels(), rotation=45,\n", " ha='right', fontsize=fontsize)\n", "\n", "# Add labels\n", "plt.ylabel('True label')\n", "plt.xlabel('Predicted label')\n", "\n", "# Plot!\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "id": "LnI4UU5Mj8Ro" }, "source": [ "Excellent results!! We can now take the latest weights we created and recreate the weights without any further training, or keep training to reach a better accuracy. A few more rounds and the accuracy can get up to 95% or so." ] }, { "cell_type": "markdown", "metadata": { "id": "BBQ9y9kUWrMr" }, "source": [ "## Visualizing learning\n", "\n", "As a final example. We will visualize the learning, to detect exactly what is happening.\n", "\n", "We will use [SmoothCAM](https://arxiv.org/pdf/2006.14255.pdf), a method that allows visualizing how one image activates the neural network by tracing the gradient activations. There are several Pytorch implementations and non of them is dominant. 
[torch-cam](https://github.com/frgfm/torch-cam) comes with both activation-based and gradient-based methods, so it serves our purposes. Let's install the package." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "t57bdZbloKw0" }, "outputs": [], "source": [ "!pip install torchcam" ] }, { "cell_type": "markdown", "metadata": { "id": "bs3lUew83H02" }, "source": [ "The package will extract the gradients automatically when we evaluate the model. Note that gradients must be allowed to flow, so the wrapper ```with torch.no_grad()``` should not be used in this case, but the model can still be placed in evaluation mode.\n", "\n", "We also must point the extractor to a layer. In general, you want to point it to the last convolutional layer of the feature extractor. We do that when constructing the ```SmoothGradCAMpp``` object." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "mINLj9T7tcHT" }, "outputs": [], "source": [ "# Set up your CAM extractor\n", "from torchcam.methods import SmoothGradCAMpp\n", "from torchvision.io.image import read_image\n", "from torchvision.transforms.functional import normalize, resize, to_pil_image\n", "from torchcam.utils import overlay_mask\n", "\n", "\n", "# Set the model to evaluation mode and attach the extractor.\n", "model.eval()\n", "cam_extractor = SmoothGradCAMpp(model, target_layer=model.features[28])" ] }, { "cell_type": "markdown", "metadata": { "id": "0Qgko3KA3vjr" }, "source": [ "Now we read an image and apply the VGG16 transforms." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "kZwXRgdwto_l" }, "outputs": [], "source": [ "# Get your input\n", "img_path = 'IntelClassification/seg_test/forest/20056.jpg'\n", "img = read_image(img_path)\n", "\n", "# Preprocess for the model (avoid shadowing the torchvision.transforms module)\n", "preprocess = torchvision.models.VGG16_Weights.DEFAULT.transforms()\n", "input_tensor = preprocess(img)" ] }, { "cell_type": "markdown", "metadata": { "id": "i_2TlEe-3z4f" }, "source": [ "Now we are ready to get the output. We apply the model using the ```cam_extractor``` object as a wrapper, which will record our gradients and run the algorithm. We save the outcome in the variable ```activation_map```." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Lwe4ShCdxhVZ" }, "outputs": [], "source": [ "with cam_extractor:\n", " # Preprocess your data and feed it to the model\n", " out = model(input_tensor.unsqueeze(0).to(device))\n", " # Retrieve the CAM by passing the class index and the model output\n", " activation_map = cam_extractor(out.squeeze(0).argmax().item(), out)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "mgzorNhe8lr3" }, "outputs": [], "source": [ "out" ] }, { "cell_type": "markdown", "metadata": { "id": "FddX8emG8yee" }, "source": [ "We can see the model predicts with essentially 100% probability that the image is of class 'forest', which is correct. Let's see what it is doing to make this happen."
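] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a quick check, here is a small sketch that maps the raw output vector to a class name and confidence. It relies on our classifier ending in a Softmax, so the entries of ```out``` can be read as probabilities." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Optional sketch: turn the output vector into a named prediction.\n", "class_names = test_loader.dataset.classes\n", "probs = out.squeeze(0).detach().cpu()\n", "pred_idx = int(probs.argmax())\n", "print(f'Predicted: {class_names[pred_idx]} ({probs[pred_idx].item():.1%})')"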
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "n9w524xQ1xKk" }, "outputs": [], "source": [ "# Visualize the raw CAM\n", "activation_map_raw = activation_map[0].squeeze(0).cpu().numpy()\n", "plt.imshow(activation_map_raw)\n", "plt.axis('off')\n", "plt.tight_layout()\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "l4rbuEVj1fx8" }, "outputs": [], "source": [ "# Resize the CAM and overlay it\n", "result = overlay_mask(to_pil_image(img),\n", " to_pil_image((255*activation_map_raw).astype(np.uint8)),\n", " alpha=0.8)\n", "\n", "# Display it\n", "plt.imshow(result)\n", "plt.axis('off')\n", "plt.tight_layout()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "id": "x8EYX_fgWrNC" }, "source": [ "Now we can clearly see what is happening.\n", "\n", "This is the importance of the modelling procedures: We need to understand how much diversity we are including in our images: If we don't we might end up learning other things!\n", "\n", "Now change the image and study what is the model doing." ] } ], "metadata": { "accelerator": "GPU", "colab": { "include_colab_link": true, "provenance": [] }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.5" } }, "nbformat": 4, "nbformat_minor": 4 }