{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "Vp0p_-0iXesy" }, "source": [ "# Recurrent models\n", "\n", "In this lab, we will work with time series of ATM logs in order to classify failures. Many other applications are possible, such as clustering, regression, or even anomaly detection using autoencoders.\n", "\n", "As always, let's first import the packages we will use and the data." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Rl1J04DQjGKg" }, "outputs": [], "source": [ "# Install necessary packages, if not done before\n", "!pip install torchview\n", "!pip install livelossplot" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "352wYg3NXbZq" }, "outputs": [], "source": [ "import numpy as np\n", "import PIL\n", "import os\n", "\n", "import pandas as pd\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.utils import class_weight\n", "from sklearn.preprocessing import OneHotEncoder, StandardScaler\n", "\n", "# For validation\n", "from sklearn.metrics import roc_auc_score, confusion_matrix, roc_curve, auc\n", "\n", "# Plots\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "from IPython.display import Image\n", "from torchview import draw_graph\n", "import graphviz\n", "from livelossplot import PlotLosses\n", "graphviz.set_jupyter_format('png')\n", "%matplotlib inline\n", "\n", "# Import PyTorch libraries\n", "import torch\n", "import torch.nn as nn\n", "import torch.optim as optim\n", "from torch.autograd import Variable\n", "from torch.utils.data import TensorDataset, DataLoader, random_split\n", "from torch.optim.lr_scheduler import _LRScheduler" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "TOPF72a6YSXu" }, 
"outputs": [], "source": [ "!gdown https://drive.google.com/uc?id=1rQfs6djY8MjeMPqHxl0cHbqTCwLB3giD" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "o3QGgovzZvHO" }, "outputs": [], "source": [ "!unzip ATM.zip\n", "!rm ATM.zip" ] }, { "cell_type": "markdown", "metadata": { "id": "Dot-GLJHZQsU" }, "source": [ "Now we read the data. It comes from [this Github](https://github.com/victormvy/sigma-convkernels/blob/main/main.py) and [its related paper](https://arxiv.org/ftp/arxiv/papers/2305/2305.10059.pdf) that uses the logs from ATM to detect failures. The failures can be either because of foreign body, generic failure, jam, preventive maintenance, replacement of any part or bad usage. There are 38 measurements the ATM keeps track of over a day, divided into series of 144 points. So, our input is a matrix of size (38, 144) for every point, predicting if the machine will be functional one week into the future.\n", "\n", "The following image, from the paper, describes the data.\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "id": "PMmD_7CmVFIS" }, "source": [ "![image.png]()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "ckyNW9bEZSnl" }, "outputs": [], "source": [ "# Read the data\n", "data = np.load('sigma_pdm.npy', allow_pickle=True)\n", "\n", "X = data[:, :-4, :]\n", "y = data[:, -4:, -1]\n", "\n", "del data\n", "\n", "# Move to pandas\n", "y_df = pd.DataFrame(y, columns = ['ID', 'Day', 'Type', 'Status'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "aKoYOB2nTWxJ" }, "outputs": [], "source": [ "# Create Pandas DataFrame for X\n", "x_df = pd.DataFrame.from_records(X)\n", "x_df\n", "\n", "# Add the target\n", "x_df['target'] = y_df['Status'].values\n", "\n", "# Split the data\n", "seed = 20240202\n", "x_df['if_test'] = np.random.RandomState(seed=seed).binomial(1, 0.3, size=x_df.shape[0])\n", "x_df.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": 
"uBBIaDh-UF3j" }, "outputs": [], "source": [ "# Delete the original data\n", "del X, y" ] }, { "cell_type": "markdown", "metadata": { "id": "Fx5vDDz3dXGR" }, "source": [ "Now we can plot one of the series for one of the cases." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Vj9TDNPtjGKx" }, "outputs": [], "source": [ "# For case 3, plot series 9.\n", "plt.plot(x_df[2][8])\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "id": "97K1bMJC34Z8" }, "source": [ "Now we are ready to start training models. Let's define torch's device so it runs the most efficient way possible." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "ZPQ5MpXbjGKy" }, "outputs": [], "source": [ "print(f\"Is CUDA supported by this system? {torch.cuda.is_available()}\")\n", "print(f\"CUDA version: {torch.version.cuda}\")\n", "\n", "# Storing ID of current CUDA device\n", "cuda_id = torch.cuda.current_device()\n", "print(f\"ID of current CUDA device: {torch.cuda.current_device()}\")\n", "\n", "print(f\"Name of current CUDA device: {torch.cuda.get_device_name(cuda_id)}\")\n", "\n", "# Making the code device-agnostic\n", "device = 'cuda' if torch.cuda.is_available() else 'cpu'\n", "print(f\"The default device is set to {device}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "M_JOdShOYTvc" }, "source": [ "## Long-Short Term Memory (LSTM) Networks\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "id": "SvJMgihGBoNP" }, "source": [ "First, we will train an LSTM. The LSTM is a fairly complex model, so VRAM will be a signficant constraint. Let's first set up the train and test dataset. We will normalize the data and One Hot Encode the labels using sklearn's [OneHotEncode](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html) function. We will also set the class weights as we have done before." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "BhG5-p25jGK0" }, "outputs": [], "source": [ "def scale_dataframe_sequences(x_df, test_var=None):\n", "    # Prepare the scaler\n", "    scaler = StandardScaler()\n", "\n", "    # Split the data if test_var given\n", "    if test_var is not None:\n", "        x_train = x_df.loc[test_var==0, :]\n", "        x_test = x_df.loc[test_var==1, :]\n", "    else:\n", "        x_train = x_df\n", "\n", "    # Dictionary to store the scaled variables\n", "    scaled_data = {}\n", "\n", "    if test_var is not None:\n", "        scaled_data_test = {}\n", "\n", "    # Get the number of elements in a sequence (positional access, so it\n", "    # works even if row label 0 is not in the training split)\n", "    seq_len = x_train.iloc[0, 0].shape[0]\n", "\n", "    for column in x_df.columns:\n", "        # Get data from the DataFrame and reshape to 2D array\n", "        data = np.stack(x_train[column].values).reshape(-1, seq_len)\n", "\n", "        # Fit the scaler on the training data only and scale it\n", "        scaled_data[column] = scaler.fit_transform(data).reshape(-1, 1, seq_len).tolist()\n", "\n", "        # Apply the fitted scaler to the test set if needed\n", "        if test_var is not None:\n", "            data_test = np.stack(x_test[column].values).reshape(-1, seq_len)\n", "            scaled_data_test[column] = scaler.transform(data_test).reshape(-1, 1, seq_len).tolist()\n", "\n", "    # Create a new DataFrame with the scaled data\n", "    scaled_df = pd.DataFrame(scaled_data)\n", "    scaled_df = scaled_df.map(lambda x: np.array(x[0]))\n", "    if test_var is not None:\n", "        scaled_df_test = pd.DataFrame(scaled_data_test)\n", "        scaled_df_test = scaled_df_test.map(lambda x: np.array(x[0]))\n", "        return scaled_df, scaled_df_test\n", "    else:\n", "        return scaled_df, None\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "j6FOITwNRlXl" }, "outputs": [], "source": [ "# Create train and test datasets.\n", "x_columns = x_df.columns[np.r_[0:37]]\n", "\n", "# Normalize train and test\n", "x_train, x_test = scale_dataframe_sequences(x_df.loc[:, x_columns], x_df['if_test'])\n", "\n", "# Encode the labels\n", "enc = OneHotEncoder(sparse_output=False, 
handle_unknown='ignore')\n", "y_train = enc.fit_transform(x_df.loc[x_df['if_test']==0, 'target'].values.reshape(-1, 1))[:, 1]\n", "y_test = enc.transform(x_df.loc[x_df['if_test']==1, 'target'].values.reshape(-1, 1))[:, 1]\n", "\n", "# Class weights: ratio of negative to positive training samples\n", "pos_weight = np.sum(1 - y_train) / np.sum(y_train)\n", "pos_weight = torch.tensor(pos_weight, dtype=torch.float32).to(device)\n", "pos_weight" ] }, { "cell_type": "markdown", "metadata": { "id": "cXvwmqlSKbz9" }, "source": [ "\n", "Now we will create the LSTM. First, let's be naïve and see what happens if we simply train an LSTM with 128 hidden units and pass it the raw sequence. As we saw in the lectures, Google's GoogLeNet showed that stacking parallel layers of convolution (so, not sequential models) seemed like a good idea. Would this idea work with LSTM?\n", "\n", "Let's find out. For this, we will create a PyTorch model and its corresponding forward pass. We will use an LSTM to create an output embedding of the time series, and then pass that to a dense layer with dropout that will classify the sequence. 
Finally, we will determine whether the sequence suggests a failure with a sigmoid output, which is applied inside the loss function.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "mNEHdDWcjGK2" }, "outputs": [], "source": [ "# Define the LSTM model\n", "class LSTMClassifier(nn.Module):\n", "    def __init__(self, input_dim, hidden_dim, layer_dim, classifier_dim, output_dim):\n", "        super(LSTMClassifier, self).__init__()\n", "\n", "        # Hidden dimensions\n", "        self.hidden_dim = hidden_dim\n", "\n", "        # Number of hidden layers\n", "        self.layer_dim = layer_dim\n", "\n", "        # Building your LSTM\n", "        # batch_first=True causes input/output tensors to be of shape\n", "        # (batch_dim, seq_dim, feature_dim)\n", "        self.lstm = nn.LSTM(input_dim, hidden_dim, layer_dim, batch_first=True)\n", "\n", "        # Classifier network\n", "        self.classifier = nn.Sequential(\n", "            nn.Linear(hidden_dim, classifier_dim),\n", "            nn.ReLU(),\n", "            nn.Dropout(0.5),\n", "            nn.Linear(classifier_dim, output_dim),\n", "            # No output activation: the loss function works on logits.\n", "        )\n", "\n", "    # Forward method\n", "    def forward(self, x):\n", "        # Initialize hidden state with zeros, on the same device as the input\n", "        h0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim, device=x.device)\n", "\n", "        # Initialize cell state\n", "        c0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim, device=x.device)\n", "\n", "        # Forward pass: classify from the output at the last time step\n", "        out, (hn, cn) = self.lstm(x, (h0, c0))\n", "        out = self.classifier(out[:, -1, :])\n", "        return out\n" ] }, { "cell_type": "markdown", "metadata": { "id": "_NbNZoYpjGK2" }, "source": [ "Let's instantiate the model and look at its diagram." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "ZvJi6VlsjGK2" }, "outputs": [], "source": [ "# Initialize the model\n", "n_vars = 37\n", "atm_lstm_model = LSTMClassifier(n_vars, 128, 1, 256, 1).to(device)\n", "print(atm_lstm_model)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "tfYkC42kjGK2" }, "outputs": [], "source": [ "# Draw the model. input_size is (batch, seq_len, n_vars); each series has 144 points\n", "model_graph = draw_graph(atm_lstm_model, input_size=(1, 144, n_vars),\n", "                         device=device,\n", "                         expand_nested=True)\n", "model_graph.visual_graph\n" ] }, { "cell_type": "markdown", "metadata": { "id": "9gSVdvASLTvb" }, "source": [ "Now we are ready to train. Recurrent models are tricky to train, and their gradients often behave erratically. It is a good idea to clip the gradients, that is, to scale them down so they do not explode. There is a Torch utility called [```clip_grad_norm_```](https://pytorch.org/docs/stable/generated/torch.nn.utils.clip_grad_norm_.html), which rescales the gradients of all parameters whenever their total norm exceeds a maximum value.\n", "\n", "I would generally advise using a maximum norm of around 1 to 2 if you are seeing erratic training behaviour.\n", "\n", "We will use RMSprop, a common choice for recurrent models, as our optimizer, and implement norm clipping in the training loop." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "sZgctUJIjGK3" }, "outputs": [], "source": [ "# Set up optimizer and loss\n", "learning_rate = 0.0001\n", "optimizer = optim.RMSprop(atm_lstm_model.parameters(), lr=learning_rate)\n", "loss_fn = nn.BCEWithLogitsLoss(pos_weight=pos_weight).to(device)\n", "\n", "# Set global run parameters\n", "best_vloss = 10000000" ] }, { "cell_type": "markdown", "metadata": { "id": "P6HBjt1qjGK4" }, "source": [ "We also need to create the data loaders. These are pretty simple: just read from the pandas DataFrame and split the training data into train and validation sets. We can do that easily with the following code." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Fr8A3euFjGK6" }, "outputs": [], "source": [ "def create_train_val_dataloaders(x_train_df, y_train, batch_size=64, val_size=0.33, seed=42):\n", "    \"\"\"\n", "    Create DataLoader instances for training and validation datasets suited for LSTM models.\n", "\n", "    Parameters:\n", "        x_train_df (pandas.DataFrame): DataFrame where each cell is a sequence (1D array or list) of length seq_len.\n", "        y_train (numpy.ndarray): The labels for training.\n", "        batch_size (int): The batch size for both train and validation loaders.\n", "        val_size (float): The fraction of the data to be used for validation.\n", "        seed (int): The random seed used for the train/validation split.\n", "\n", "    Returns:\n", "        train_loader (DataLoader): DataLoader for the training set.\n", "        val_loader (DataLoader): DataLoader for the validation set.\n", "    \"\"\"\n", "\n", "    # Set the random seed for reproducibility\n", "    torch.manual_seed(seed)\n", "\n", "    # Convert the DataFrame of sequences into a correctly shaped 3D numpy array\n", "    # sequences.shape should be (number of samples, seq_len, number of features per timestep)\n", "    # Note: reshape() reinterprets the stacked (n_features, seq_len) values; it is not a transpose.\n", "    sequences = np.stack(x_train_df.apply(lambda s: np.stack(s.values).reshape(s.values[0].shape[0], -1), axis=1).values)\n", "\n", "    # Check if y_train is a numpy array, if not convert it\n", "    if not isinstance(y_train, np.ndarray):\n", "        y_train = np.array(y_train)\n", "\n", "    if y_train.ndim == 1:\n", "        y_train = y_train[:, None]  # Convert to 2D array if necessary\n", "\n", "    # Convert to PyTorch tensors\n", "    x_train_tensor = torch.from_numpy(sequences).float()\n", "    y_train_tensor = torch.from_numpy(y_train).float()\n", "\n", "    # Create the TensorDataset\n", "    train_data = TensorDataset(x_train_tensor, y_train_tensor)\n", "\n", "    # Split the dataset into train and validation sets\n", "    total_samples = len(train_data)\n", "    train_size = int((1 - val_size) * total_samples)\n", "    val_size = total_samples - train_size\n", "    train_dataset, val_dataset = random_split(train_data, 
[train_size, val_size])\n", "\n", "    # Create the dataloaders\n", "    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)\n", "    val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)\n", "\n", "    return train_loader, val_loader" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "0wW_Du5BjGK6" }, "outputs": [], "source": [ "# Create the data loaders\n", "batch_size = 2048 # A100\n", "train_loader, val_loader = create_train_val_dataloaders(x_train, y_train.reshape(-1,1),\n", "                                                        batch_size=batch_size)\n", "dataloaders = {\n", "    \"train\": train_loader,\n", "    \"validation\": val_loader\n", "}\n", "\n", "# Sigmoid function that turns the output logits into probabilities\n", "# (despite its name, this is a sigmoid, not a softmax)\n", "softmax_func = np.vectorize(lambda x: 1/(1+np.exp(-1 * x)))" ] }, { "cell_type": "markdown", "metadata": { "id": "41NBhJMAjGK7" }, "source": [ "Now we are finally ready to train." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "TuNxYU9kjGK8" }, "outputs": [], "source": [ "# Set run parameters\n", "n_epochs = 200\n", "liveloss = PlotLosses()\n", "\n", "# Train!\n", "for epoch in range(n_epochs):\n", "    # Run the epoch\n", "    logs = {}\n", "\n", "    # Run a train epoch, and then a validation epoch.\n", "    for phase in ['train', 'validation']:\n", "        if phase == 'train':\n", "            atm_lstm_model.train()\n", "        else:\n", "            atm_lstm_model.eval()\n", "\n", "        running_loss = 0.0\n", "        running_corrects = 0\n", "\n", "        for i, data in enumerate(dataloaders[phase]):\n", "            inputs, labels = data\n", "            inputs = inputs.to(device)\n", "            labels = labels.to(device)\n", "\n", "            outputs = atm_lstm_model(inputs).to(device)\n", "            loss = loss_fn(outputs, labels)\n", "\n", "            if phase == 'train':\n", "                optimizer.zero_grad()\n", "                loss.backward()\n", "\n", "                # Clip the gradient norm\n", "                nn.utils.clip_grad_norm_(atm_lstm_model.parameters(), 2)\n", "\n", "                # Update the weights\n", "                optimizer.step()\n", "\n", "            preds = 
softmax_func(outputs.detach().cpu().numpy())\n", "            preds = np.round(preds)\n", "            running_loss += loss.detach() * inputs.size(0)\n", "            running_corrects += np.sum(preds.flatten() == labels.data.flatten().cpu().numpy())\n", "\n", "            if i % 10 == 9:\n", "                # Average loss per sample so far (approximate if a batch is not full)\n", "                batch_loss = running_loss / ((i+1) * dataloaders[phase].batch_size)\n", "                print(f'{phase} batch {i+1} loss: {batch_loss:.3f}')\n", "\n", "        # Delete the used VRAM\n", "        torch.cuda.empty_cache()\n", "\n", "        epoch_loss = running_loss / len(dataloaders[phase].dataset)\n", "        epoch_acc = running_corrects / len(dataloaders[phase].dataset)\n", "\n", "        prefix = ''\n", "        if phase == 'validation':\n", "            prefix = 'val_'\n", "\n", "            # Track best validation performance, and save the model's state\n", "            if epoch_loss < best_vloss:\n", "                best_vloss = epoch_loss\n", "                model_path = 'best_model.ph'\n", "                print(f'New best model found. Saving it as {model_path}')\n", "                torch.save(atm_lstm_model.state_dict(), model_path)\n", "\n", "        logs[prefix + 'log loss'] = epoch_loss.item()\n", "        logs[prefix + 'accuracy'] = epoch_acc.item()\n", "\n", "    liveloss.update(logs)\n", "    liveloss.send()" ] }, { "cell_type": "markdown", "metadata": { "id": "1fOOwFP1MaJ4" }, "source": [ "Training is progressing! Let's apply the model to the test set and see what we get. You may also want to keep training, as the model is still learning after 200 epochs." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "LuFDOrHJ0tPe" }, "outputs": [], "source": [ "def create_test_dataloader(x_test_df, y_test=None, batch_size=64, seed=42):\n", "    \"\"\"\n", "    Create DataLoader instance for the test dataset.\n", "\n", "    Parameters:\n", "        x_test_df (pandas.DataFrame): DataFrame where each cell is a sequence (1D array or list) of length seq_len.\n", "        y_test (numpy.ndarray, optional): The labels for testing. 
Pass None if there are no labels.\n", "        batch_size (int): The batch size for the test loader.\n", "        seed (int): The random seed.\n", "\n", "    Returns:\n", "        test_loader (DataLoader): DataLoader for the test set.\n", "    \"\"\"\n", "\n", "    # Set the random seed for reproducibility\n", "    torch.manual_seed(seed)\n", "\n", "    # Convert the DataFrame of sequences into a correctly shaped 3D numpy array\n", "    # sequences.shape should be (number of samples, seq_len, number of features per timestep)\n", "    # Note: reshape() reinterprets the stacked (n_features, seq_len) values; it is not a transpose.\n", "    sequences = np.stack(x_test_df.apply(lambda s: np.stack(s.values).reshape(s.values[0].shape[0], -1), axis=1).values)\n", "\n", "    # Convert to PyTorch tensors\n", "    x_test_tensor = torch.from_numpy(sequences).float()\n", "\n", "    # Create a TensorDataset from the input data\n", "    if y_test is not None:\n", "        # Check if y_test is a numpy array, if not convert it\n", "        if not isinstance(y_test, np.ndarray):\n", "            y_test = np.array(y_test)\n", "\n", "        # Convert labels to a PyTorch tensor\n", "        y_test_tensor = torch.from_numpy(y_test).float()\n", "\n", "        # Assert that the number of samples matches\n", "        assert len(x_test_tensor) == len(y_test_tensor), \"The number of input samples and labels must be the same.\"\n", "        test_data = TensorDataset(x_test_tensor, y_test_tensor)\n", "    else:\n", "        test_data = TensorDataset(x_test_tensor)\n", "\n", "    # Create the DataLoader for the test set\n", "    test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)\n", "\n", "    return test_loader" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "2uGJrOgAjGK-" }, "outputs": [], "source": [ "test_loader = create_test_dataloader(x_test, y_test.reshape(-1,1))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "C4GpibuVjGK-" }, "outputs": [], "source": [ "atm_lstm_model.load_state_dict(torch.load('best_model.ph', map_location=device))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "aOh60-rUGQrc" }, "outputs": [], "source": [ "# Wrapper to save memory by not computing 
gradients.\n", "with torch.no_grad():\n", "    # Set the model in evaluation mode.\n", "    atm_lstm_model.eval()\n", "\n", "    # Calculate running loss and accuracy\n", "    running_loss = 0.0\n", "    running_corrects = 0\n", "    test_labels = np.array([])\n", "    test_probs = np.array([])\n", "    test_predictions = np.array([])\n", "\n", "    # Apply to the test set\n", "    for i, data in enumerate(test_loader):\n", "        inputs, labels = data\n", "        test_labels = np.append(test_labels, labels.cpu().numpy())\n", "        inputs = inputs.to(device)\n", "        labels = labels.to(device)\n", "\n", "        outputs = atm_lstm_model(inputs)\n", "        loss = loss_fn(outputs, labels)\n", "\n", "        # Turn the logits into probabilities, then into class predictions\n", "        probs = softmax_func(outputs.cpu().numpy())\n", "        test_probs = np.append(test_probs, probs)\n", "        preds = np.round(probs)\n", "        test_predictions = np.append(test_predictions, preds)\n", "        running_loss += loss.detach() * inputs.size(0)\n", "        running_corrects += np.sum(preds.flatten() == labels.data.flatten().cpu().numpy())\n", "\n", "test_loss = running_loss / len(test_loader.dataset)\n", "test_acc = running_corrects / len(test_loader.dataset)\n", "\n", "print(f'The test set accuracy is {test_acc*100:.2f}%')\n", "print(f'The test set loss is {test_loss:.3f}')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "72gAC9XpGri-" }, "outputs": [], "source": [ "# Calculate confusion matrix\n", "confusion_matrix_net = confusion_matrix(y_true = test_labels,\n", "                                        y_pred = test_predictions)\n", "\n", "# Normalize each row, so entries are proportions of the true class\n", "confusion_matrix_net = confusion_matrix_net.astype('float') / confusion_matrix_net.sum(axis=1)[:, np.newaxis]\n", "\n", "# Turn to dataframe\n", "df_cm = pd.DataFrame(\n", "    confusion_matrix_net,\n", "    index=np.unique(test_labels),\n", "    columns=np.unique(test_labels),\n", ")\n", "\n", "# Parameters of the image\n", "figsize = (10,7)\n", "fontsize=14\n", "\n", "# Create image\n", "fig = plt.figure(figsize=figsize)\n", "heatmap = 
sns.heatmap(df_cm, annot=True, fmt='.2f')\n", "\n", "# Make it nicer\n", "heatmap.yaxis.set_ticklabels(heatmap.yaxis.get_ticklabels(), rotation=0,\n", "                             ha='right', fontsize=fontsize)\n", "heatmap.xaxis.set_ticklabels(heatmap.xaxis.get_ticklabels(), rotation=45,\n", "                             ha='right', fontsize=fontsize)\n", "\n", "# Add labels\n", "plt.ylabel('True label')\n", "plt.xlabel('Predicted label')\n", "\n", "# Plot!\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "k35SBXAwjGLA" }, "outputs": [], "source": [ "# Calculate the ROC curve points\n", "fpr, tpr, thresholds = roc_curve(test_labels, test_probs)\n", "\n", "# Save the AUC in a variable to display it. Round it first\n", "# (named auc_value so it does not shadow sklearn's auc function)\n", "auc_value = np.round(roc_auc_score(y_true = test_labels,\n", "                                   y_score = test_probs),\n", "                     decimals = 3)\n", "\n", "# Create and show the plot\n", "plt.plot(fpr, tpr, label=\"ATM Failures, auc=\" + str(auc_value))\n", "plt.legend(loc=4)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "id": "pZlv4i1hjGLA" }, "source": [ "The model is not learning much with this architecture. Let's see if we can improve upon this." ] }, { "cell_type": "markdown", "metadata": { "id": "NutqVO_2jGLA" }, "source": [ "## LSTM with multiple layers" ] }, { "cell_type": "markdown", "metadata": { "id": "ycb_xjwljGLB" }, "source": [ "To improve training, we can try to chain LSTMs. This can help because, instead of adding one very large (and thus intractable) LSTM, you can train two smaller ones. This is easily done in PyTorch by setting the ```layer_dim``` argument of our classifier, which is passed to ```nn.LSTM``` as its number of layers, to a higher number. Let's try to stack two LSTMs and see the performance." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "D7_r381LLawo" }, "outputs": [], "source": [ "# Initialize the model\n", "n_vars = 37\n", "atm_lstm_model = LSTMClassifier(n_vars, 256, 2, 256, 1).to(device)\n", "print(atm_lstm_model)\n", "\n", "# Draw the model. input_size is (batch, seq_len, n_vars); each series has 144 points\n", "model_graph = draw_graph(atm_lstm_model, input_size=(1, 144, n_vars),\n", "                         device=device,\n", "                         expand_nested=True)\n", "model_graph.visual_graph\n" ] }, { "cell_type": "markdown", "metadata": { "id": "6DLoYlPUGxIg" }, "source": [ "As we can see, the model now stacks two LSTM layers: the second layer receives the sequence of hidden states produced by the first one instead of the raw series. Of course, this will only be as good as the features the first layer learns, so you'll need to experiment to get this right.\n", "\n", "Let's train the model." ] }, { "cell_type": "markdown", "metadata": { "id": "OYXlDNTRx8-2" }, "source": [ "We will be very aggressive with the norm clipping now. You can identify the need for this if the losses are very unstable. Experiment with this value for your own applications! We'll keep RMSprop as our optimizer, now with a higher learning rate." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "HAPOj-R4jGLC" }, "outputs": [], "source": [ "# Set up optimizer and loss\n", "learning_rate = 0.001\n", "optimizer = optim.RMSprop(atm_lstm_model.parameters(), lr=learning_rate)\n", "loss_fn = nn.BCEWithLogitsLoss(pos_weight=pos_weight).to(device)\n", "\n", "# Set global run parameters\n", "best_vloss = 10000000" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "jdyX6khzjGLC" }, "outputs": [], "source": [ "# Create the data loaders\n", "train_loader, val_loader = create_train_val_dataloaders(x_train, y_train.reshape(-1,1),\n", "                                                        batch_size=batch_size)\n", "dataloaders = {\n", "    \"train\": train_loader,\n", "    \"validation\": val_loader\n", "}" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "U3OfKnDbjGLD" }, "outputs": [], "source": [ "# Set run parameters\n", "n_epochs = 300\n", "liveloss = PlotLosses()\n", "\n", "\n", "# Train!\n", "for epoch in range(n_epochs):\n", "    # Run the epoch\n", "    logs = {}\n", "\n", "    # Run a train epoch, and then a validation epoch.\n", "    for phase in ['train', 'validation']:\n", "        if phase == 'train':\n", "            atm_lstm_model.train()\n", "        else:\n", "            atm_lstm_model.eval()\n", "\n", "        running_loss = 0.0\n", "        running_corrects = 0\n", "\n", "        for i, data in enumerate(dataloaders[phase]):\n", "            inputs, labels = data\n", "            inputs = inputs.to(device)\n", "            labels = labels.to(device)\n", "\n", "            outputs = atm_lstm_model(inputs).to(device)\n", "            loss = loss_fn(outputs, labels)\n", "\n", "            if phase == 'train':\n", "                optimizer.zero_grad()\n", "                loss.backward()\n", "                # Clip the gradient norm\n", "                nn.utils.clip_grad_norm_(atm_lstm_model.parameters(), 0.1)\n", "                # Update the weights\n", "                optimizer.step()\n", "\n", "            preds = softmax_func(outputs.detach().cpu().numpy())\n", "            preds = np.round(preds)\n", "            running_loss += loss.detach() * inputs.size(0)\n", "            running_corrects += np.sum(preds.flatten() == 
labels.data.flatten().cpu().numpy())\n", "\n", "            if i % 10 == 9:\n", "                # Average loss per sample so far (approximate if a batch is not full)\n", "                batch_loss = running_loss / ((i+1) * dataloaders[phase].batch_size)\n", "                print(f'{phase} batch {i+1} loss: {batch_loss:.3f}')\n", "\n", "        # Delete the used VRAM\n", "        torch.cuda.empty_cache()\n", "\n", "        epoch_loss = running_loss / len(dataloaders[phase].dataset)\n", "        epoch_acc = running_corrects / len(dataloaders[phase].dataset)\n", "\n", "        prefix = ''\n", "        if phase == 'validation':\n", "            prefix = 'val_'\n", "\n", "            # Track best validation performance, and save the model's state\n", "            if epoch_loss < best_vloss:\n", "                best_vloss = epoch_loss\n", "                model_path = 'best_model.ph'\n", "                print(f'New best model found. Saving it as {model_path}')\n", "                torch.save(atm_lstm_model.state_dict(), model_path)\n", "\n", "        logs[prefix + 'log loss'] = epoch_loss.item()\n", "        logs[prefix + 'accuracy'] = epoch_acc.item()\n", "\n", "    liveloss.update(logs)\n", "    liveloss.send()" ] }, { "cell_type": "markdown", "metadata": { "id": "rx0ZhYzWxTf1" }, "source": [ "That's better! We can see the LSTM now reaches a much better loss. Remember, this was done without **any** data cleaning, just standardization! Also of note is how long it took the optimizer to find a proper direction of descent. It pays to be patient!\n", "\n", "We can see that learning stalls around 100 epochs. This is why we save the best model during training: we can now simply recover it and train again with a lower learning rate if needed.\n", "\n", "Finally, I did not **acid test** this model. It is very easy to add redundant layers to the model or even redundant series, as our model has 37 series to get patterns from. Try playing around with the model and continue training. It should not be hard to improve these results further." ] }, { "cell_type": "markdown", "metadata": { "id": "XYN6HKCjxoHq" }, "source": [ "Let's now see the performance on the test set. We start by loading the best model." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "U3ClcS2DjGLE" }, "outputs": [], "source": [ "atm_lstm_model.load_state_dict(torch.load('best_model.ph', map_location=device))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "9I65_c3fjGLE" }, "outputs": [], "source": [ "# Wrapper to save memory by not computing gradients.\n", "with torch.no_grad():\n", "    # Set the model in evaluation mode.\n", "    atm_lstm_model.eval()\n", "\n", "    # Calculate running loss and accuracy\n", "    running_loss = 0.0\n", "    running_corrects = 0\n", "    test_labels = np.array([])\n", "    test_probs = np.array([])\n", "    test_predictions = np.array([])\n", "\n", "    # Apply to the test set\n", "    for i, data in enumerate(test_loader):\n", "        inputs, labels = data\n", "        test_labels = np.append(test_labels, labels.cpu().numpy())\n", "        inputs = inputs.to(device)\n", "        labels = labels.to(device)\n", "\n", "        outputs = atm_lstm_model(inputs)\n", "        loss = loss_fn(outputs, labels)\n", "\n", "        # Turn the logits into probabilities, then into class predictions\n", "        probs = softmax_func(outputs.cpu().numpy())\n", "        test_probs = np.append(test_probs, probs)\n", "        preds = np.round(probs)\n", "        test_predictions = np.append(test_predictions, preds)\n", "        running_loss += loss.detach() * inputs.size(0)\n", "        running_corrects += np.sum(preds.flatten() == labels.data.flatten().cpu().numpy())\n", "\n", "test_loss = running_loss / len(test_loader.dataset)\n", "test_acc = running_corrects / len(test_loader.dataset)\n", "\n", "print(f'The test set accuracy is {test_acc*100:.2f}%')\n", "print(f'The test set loss is {test_loss:.3f}')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "QLQCeHU1jGLF" }, "outputs": [], "source": [ "# Calculate confusion matrix\n", "confusion_matrix_net = confusion_matrix(y_true = test_labels,\n", "                                        y_pred = test_predictions)\n", "\n", "# Normalize each row, so entries are proportions of the true class\n", "confusion_matrix_net = confusion_matrix_net.astype('float') / confusion_matrix_net.sum(axis=1)[:, 
np.newaxis]\n", "\n", "# Turn to dataframe\n", "df_cm = pd.DataFrame(\n", "    confusion_matrix_net,\n", "    index=np.unique(test_labels),\n", "    columns=np.unique(test_labels),\n", ")\n", "\n", "# Parameters of the image\n", "figsize = (10,7)\n", "fontsize=14\n", "\n", "# Create image\n", "fig = plt.figure(figsize=figsize)\n", "heatmap = sns.heatmap(df_cm, annot=True, fmt='.2f')\n", "\n", "# Make it nicer\n", "heatmap.yaxis.set_ticklabels(heatmap.yaxis.get_ticklabels(), rotation=0,\n", "                             ha='right', fontsize=fontsize)\n", "heatmap.xaxis.set_ticklabels(heatmap.xaxis.get_ticklabels(), rotation=45,\n", "                             ha='right', fontsize=fontsize)\n", "\n", "# Add labels\n", "plt.ylabel('True label')\n", "plt.xlabel('Predicted label')\n", "\n", "# Plot!\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "id": "3NrQHgs01X5b" }, "source": [ "This is a pretty good model! The ROC curve will give us a better view of what is happening." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true, "id": "W3gjKFDMjGLG" }, "outputs": [], "source": [ "# Calculate the ROC curve points\n", "fpr, tpr, thresholds = roc_curve(test_labels, test_probs)\n", "\n", "# Save the AUC in a variable to display it. Round it first\n", "# (named auc_value so it does not shadow sklearn's auc function)\n", "auc_value = np.round(roc_auc_score(y_true = test_labels,\n", "                                   y_score = test_probs),\n", "                     decimals = 3)\n", "\n", "# Create and show the plot\n", "plt.plot(fpr, tpr, label=\"ATM Failures - LSTM, auc=\" + str(auc_value))\n", "plt.legend(loc=4)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "id": "Wk2sAqh_jGLG" }, "source": [ "We are obtaining an amazing AUC! Given how unbalanced our data is, though, the AUC may be misleading. An alternative would be to calculate the area under the precision-recall curve (AUPRC) and use that. In any case, our model is looking very good!" ] }, { "cell_type": "markdown", "metadata": { "id": "f2Vlv_O_2GYz" }, "source": [ "## GRU" ] }, { "cell_type": "markdown", "metadata": { "id": "E2_y20ug1ncL" }, "source": [ "Now we will train a GRU. 
In theory, a GRU can reach results comparable to the LSTM while using fewer parameters." ] }, { "cell_type": "markdown", "metadata": { "id": "qHazPX4b10ML" }, "source": [ "GRUs are more efficient, so you can either keep the same size and train faster, or increase the size to try to learn more. Let's do the former." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "mGhDAKwejGLH" }, "outputs": [], "source": [ "# Define the GRU model\n", "class GRUClassifier(nn.Module):\n", " def __init__(self, input_dim, hidden_dim, layer_dim, classifier_dim, output_dim):\n", " super(GRUClassifier, self).__init__()\n", "\n", " # Hidden dimensions\n", " self.hidden_dim = hidden_dim\n", "\n", " # Number of hidden layers\n", " self.layer_dim = layer_dim\n", "\n", " # Building your GRU\n", " # batch_first=True causes input/output tensors to be of shape\n", " # (batch_dim, seq_dim, feature_dim)\n", " self.gru = nn.GRU(input_dim, hidden_dim, layer_dim, batch_first=True)\n", "\n", " # Classifier network\n", " self.classifier = nn.Sequential(\n", " nn.Linear(hidden_dim, classifier_dim),\n", " nn.ReLU(),\n", " nn.Dropout(0.5),\n", " nn.Linear(classifier_dim, output_dim),\n", " #nn.Softmax(dim=1) # No need for softmax with logit loss.\n", " )\n", "\n", " # Forward method\n", " def forward(self, x):\n", " # Initialize hidden state with zeros\n", " h0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).to(device)\n", "\n", " # Forward pass\n", " out, hn = self.gru(x, h0)\n", " out = self.classifier(out[:, -1, :])\n", " return out\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "aeWDJASY2SX8" }, "outputs": [], "source": [ "# Initialize the model\n", "n_vars = 37\n", "atm_gru_model = GRUClassifier(n_vars, 256, 2, 256, 1).to(device)\n", "print(atm_gru_model)\n", "\n", "# Draw the model\n", "model_graph = draw_graph(atm_gru_model, input_size=(1, 256, n_vars),\n", " device=device,\n", " expand_nested=True)\n", 
"model_graph.visual_graph\n" ] }, { "cell_type": "markdown", "metadata": { "id": "Ts7cN7ByjGLI" }, "source": [ "Perfect! Let's train the model now. The process is the same as before." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "9nw8L2-5jGLJ" }, "outputs": [], "source": [ "# Set up optimizer and loss\n", "learning_rate = 0.001\n", "optimizer = optim.RMSprop(atm_gru_model.parameters(), lr=learning_rate)\n", "loss_fn = nn.BCEWithLogitsLoss(pos_weight=pos_weight).to(device)\n", "\n", "# Set global run parameters\n", "best_vloss = 10000000" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Tf4rsUHmjGLJ" }, "outputs": [], "source": [ "# Create the data loaders\n", "train_loader, val_loader = create_train_val_dataloaders(x_train, y_train.reshape(-1,1),\n", " batch_size=batch_size)\n", "dataloaders = {\n", " \"train\": train_loader,\n", " \"validation\": val_loader\n", "}" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Wu_KftImjGLJ" }, "outputs": [], "source": [ "# Set run parameters\n", "n_epochs = 200\n", "liveloss = PlotLosses()\n", "\n", "# Train!\n", "for epoch in range(n_epochs):\n", " # Run the epoch\n", " logs = {}\n", "\n", " # Run a train epoch, and then a validation epoch.\n", " for phase in ['train', 'validation']:\n", " if phase == 'train':\n", " atm_gru_model.train()\n", " else:\n", " atm_gru_model.eval()\n", "\n", " running_loss = 0.0\n", " running_corrects = 0\n", "\n", " for i, data in enumerate(dataloaders[phase]):\n", " inputs, labels = data\n", " inputs = inputs.to(device)\n", " labels = labels.to(device)\n", "\n", " outputs = atm_gru_model(inputs).to(device)\n", " loss = loss_fn(outputs, labels)\n", "\n", " if phase == 'train':\n", " optimizer.zero_grad()\n", " loss.backward()\n", " # Clip the gradient norm\n", " nn.utils.clip_grad_norm_(atm_gru_model.parameters(), 0.1)\n", " # Update the weights\n", " optimizer.step()\n", "\n", " preds = 
softmax_func(outputs.detach().cpu().numpy())\n", " preds = np.round(preds)\n", " running_loss += loss.detach() * inputs.size(0)\n", " running_corrects += np.sum(preds.flatten() == labels.data.flatten().cpu().numpy())\n", "\n", " if i % 10 == 9:\n", " batch_loss = running_loss / (10 * (i+1))\n", " print(f'{phase} batch {i+1} loss: {batch_loss:.3f}')\n", " tb_x = epoch * len(dataloaders[phase]) + i + 1\n", "\n", " # Delete the used VRAM\n", " torch.cuda.empty_cache()\n", "\n", " epoch_loss = running_loss / len(dataloaders[phase].dataset)\n", " epoch_acc = running_corrects / len(dataloaders[phase].dataset)\n", "\n", " prefix = ''\n", " if phase == 'validation':\n", " prefix = 'val_'\n", "\n", " # Track best performance, and save the model's state\n", " if epoch_loss < best_vloss:\n", " best_vloss = epoch_loss\n", " model_path = 'best_gru_model.ph'\n", " print(f'New best model found. Saving it as {model_path}')\n", " torch.save(atm_gru_model.state_dict(), model_path)\n", "\n", " logs[prefix + 'log loss'] = epoch_loss.item()\n", " logs[prefix + 'accuracy'] = epoch_acc.item()\n", "\n", " liveloss.update(logs)\n", " liveloss.send()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "VCFaTnrIjGLK" }, "outputs": [], "source": [ "atm_gru_model.load_state_dict(torch.load('best_gru_model.ph'))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "6_24kZRNjGLK" }, "outputs": [], "source": [ "# Wrapper to save memory by not recomputing gradients.\n", "with torch.no_grad():\n", " # Set the model in evaluation mode.\n", " atm_gru_model.eval()\n", "\n", " # Calculate running loss and accuracy\n", " running_loss = 0.0\n", " running_corrects = 0\n", " test_labels = np.array([])\n", " test_probs = np.array([])\n", " test_predictions = np.array([])\n", "\n", " # Apply to the test set\n", " for i, data in enumerate(test_loader):\n", " inputs, labels = data\n", " test_labels = np.append(test_labels, labels.cpu().numpy())\n", " inputs = 
inputs.to(device)\n", " labels = labels.to(device)\n", "\n", " outputs = atm_gru_model(inputs)\n", " test_probs = np.append(test_probs, outputs.cpu().numpy())\n", " outputs = outputs.to(device)\n", " loss = loss_fn(outputs, labels)\n", "\n", " preds = softmax_func(outputs.cpu().numpy())\n", " preds = np.round(preds)\n", " test_predictions = np.append(test_predictions, preds)\n", " running_loss += loss.detach() * inputs.size(0)\n", " running_corrects += np.sum(preds.flatten() == labels.data.flatten().cpu().numpy())\n", "\n", "test_loss = running_loss / len(test_loader.dataset)\n", "test_acc = running_corrects / len(test_loader.dataset)\n", "\n", "print(f'The test set accuracy is {test_acc*100:.2f}%')\n", "print(f'The test set loss is {test_loss:.3f}')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true, "id": "6EybNNfijGLL" }, "outputs": [], "source": [ "# Calculate confusion matrix\n", "confusion_matrix_net = confusion_matrix(y_true = test_labels,\n", " y_pred = test_predictions)\n", "\n", "# Turn matrix to percentages\n", "confusion_matrix_net = confusion_matrix_net.astype('float') / confusion_matrix_net.sum(axis=1)[:, np.newaxis]\n", "\n", "# Turn to dataframe\n", "df_cm = pd.DataFrame(\n", " confusion_matrix_net,\n", " index=np.unique(test_labels),\n", " columns=np.unique(test_labels),\n", ")\n", "\n", "# Parameters of the image\n", "figsize = (10,7)\n", "fontsize=14\n", "\n", "# Create image\n", "fig = plt.figure(figsize=figsize)\n", "heatmap = sns.heatmap(df_cm, annot=True, fmt='.2f')\n", "\n", "# Make it nicer\n", "heatmap.yaxis.set_ticklabels(heatmap.yaxis.get_ticklabels(), rotation=0,\n", " ha='right', fontsize=fontsize)\n", "heatmap.xaxis.set_ticklabels(heatmap.xaxis.get_ticklabels(), rotation=45,\n", " ha='right', fontsize=fontsize)\n", "\n", "# Add labels\n", "plt.ylabel('True label')\n", "plt.xlabel('Predicted label')\n", "\n", "# Plot!\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { 
"scrolled": true, "id": "JV3Jzl55jGLL" }, "outputs": [], "source": [ "# Calculate the ROC curve points\n", "fpr, tpr, thresholds = roc_curve(test_labels, test_probs)\n", "\n", "# Save the AUC in a variable to display it. Round it first.\n", "# Named auc_value so it does not shadow sklearn's auc function.\n", "auc_value = np.round(roc_auc_score(y_true = test_labels,\n", " y_score = test_probs),\n", " decimals = 3)\n", "\n", "# Create and show the plot\n", "plt.plot(fpr,tpr,label=\"ATM Failures - GRU, auc=\"+str(auc_value))\n", "plt.legend(loc=4)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "id": "FxyntjFs6tlu" }, "source": [ "We got an even better model, and it took only 200 epochs to get here. See if you can reach an even better one!" ] }, { "cell_type": "markdown", "metadata": { "id": "O4j-ndvT7pFL" }, "source": [ "Can you do better? Experiment with the parameters, or set a [Learning Rate Scheduler](https://www.kaggle.com/code/isbhargav/guide-to-pytorch-learning-rate-scheduling) to train more slowly after epoch 200. I got these results after playing around with the parameters for a few hours. See what you can get!\n", "\n", "In any case, sequence-based models have evolved greatly beyond LSTM and GRU. In particular, one specific architecture has been shown to be a significant advancement over these models: the Transformer." ] } ], "metadata": { "accelerator": "GPU", "colab": { "provenance": [], "include_colab_link": true }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.5" } }, "nbformat": 4, "nbformat_minor": 0 }