{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": [],
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/CBravoR/AdvancedAnalyticsLabs/blob/master/notebooks/python/Lab_2_Capital_Requirements_and_Pandas.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "lg7bCCwd0bHd"
      },
      "source": [
        "## Reading the data\n",
        "\n",
        "For this exercise we will read a dataset from credit scoring. I previously uploaded the data to Google, and it is available at https://docs.google.com/spreadsheets/d/1Am74y2ZVQ6dRFYVZUv_VoyP-OTS8BM4x0svifHQvtNc/export?gid=819627738&format=csv\n",
        "\n",
        "The dataset (called **Bankloan**, from IBM) has a set of 1,000 loans with default information. It includes the following variables:\n",
        "\n",
        "- Customer: ID, or unique label, of the borrower (NOT predictive).\n",
        "- Age: Age of the borrower in years.\n",
        "- Education: Maximum education level the borrower reached.\n",
        "1: Complete primary. 2: Completed Secondary. 3: Incomplete Higher Ed. 4: Complete Higher Ed. 5: With postgraduate studies (complete MSc or PhD).\n",
        "- Employ: Years at current job.\n",
        "- Address: Years at current address.\n",
        "- Income: Income in ‘000s USD.\n",
        "- Leverage: Debt/Income Ratio.\n",
        "- CredDebt: Credit card standing debt.\n",
        "- OthDebt: Other debt in ‘000s USD.\n",
        "- MonthlyLoad: Monthly percentage from salary used to repay debts.\n",
        "- Default: 1 If default has occurred, 0 if not (Target variable).\n",
        "- PD: The calibrated probability of default of the loan.\n",
        "- LGD: The estimated LGD for the loan.\n",
        "- Outstanding: EAD.\n",
        "\n",
        "goal:whether the loan is going to default or not"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "Fp2he3jP0acB"
      },
      "source": [
        "!gdown https://drive.google.com/uc?id=1lyEd01JaoVbL1mbgn-wr3YvLmURAgQ8B"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "HO8ZUMz1Mzf0"
      },
      "source": [
        "!head /content/bankloan_scored_nodefault.csv"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "h-P17DdYM-40"
      },
      "source": [
        "import pandas as pd\n",
        "import numpy as np"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "djm2FGO3AjyR"
      },
      "source": [
        "!pip install scorecardpy"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "0yg6tQqrAr-h"
      },
      "source": [
        "import scorecardpy as scp"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "hqPM4W2VxYZ1"
      },
      "source": [
        "bankloan_data = pd.read_csv('/content/bankloan_scored_nodefault.csv')"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "bankloan_data.head()"
      ],
      "metadata": {
        "id": "nAh6m2j0a9LJ"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "pjY06WDPNFWN"
      },
      "source": [
        "bankloan_data.dtypes"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "EPZ8Ol9SNHrR"
      },
      "source": [
        "summary statistics of the numerical variables"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "uluQVPhLNK2q"
      },
      "source": [
        "bankloan_data.describe()"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "g13R0-vQNO_a"
      },
      "source": [
        " plot the histograms of the variables"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "d1sqyXdANQDa"
      },
      "source": [
        "import matplotlib.pyplot as plt\n",
        "%matplotlib inline"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "R_opsrRVNTD2"
      },
      "source": [
        "histograms = bankloan_data.hist()"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "XKj1LjVR0H9A"
      },
      "source": [
        "bankloan_data.loc[bankloan_data.loc[:, 'Age'] < 37, :]"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "bankloan_data.iloc[0:5, 1:2]"
      ],
      "metadata": {
        "id": "1p-ktEfzb7iB"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "lvoO_TfRNcSS"
      },
      "source": [
        "import seaborn as sns\n",
        "import numpy as np"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "rdiV2Gc9NeAG"
      },
      "source": [
        "sns.set(color_codes=True)\n",
        "sns.pairplot(bankloan_data)\n",
        "plt.savefig('Hist.pdf')\n",
        "plt.savefig('Hist.jpg')\n",
        "plt.show()"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "NpaRoXKD3x3b"
      },
      "source": [
        "## Basel III Capital Requirements\n",
        "\n",
        "Recalling the last lecture, the equation for the capital requirement of any operation is:\n",
        "\n",
        "$$\n",
        "K = LGD \\cdot \\left\\{ N\\left( \\sqrt{\\frac{1}{1-R}} \\cdot N^{-1}(PD) + \\sqrt{\\frac{R}{1-R}} \\cdot N^{-1}(0.999) \\right) - PD \\right\\} \\left( \\frac{1 + (M - 2.5)b}{1 - 1.5b}\\right)\n",
        "$$\n",
        "\n",
        "The values of $b$ and $M$ will be variable for bonds, but for retail and mortgages the maturity is fixed at 1, and the b term dissapears. The correlations are given by the regulation:\n",
        "\n",
        "- Mortgages: $R = 0.15$\n",
        "- Revolving: $R = 0.04$\n",
        "- Other retail: $R = 0.03 \\left( \\frac{1 - e^{-35PD}}{1 - e^{-35}} \\right) + 0.16 \\left( 1 - \\frac{1 - e^{-35PD}}{1 - e^{-35}} \\right)$\n",
        "- Corporate and sovereign exposures $ R = 0.12 \\left( \\frac{1 - e^{-50PD}}{1 - e^{-50}} \\right) + 0.24 \\left( 1 - \\frac{1 - e^{-50PD}}{1 - e^{-50}} \\right)$\n",
        "\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "W_fu2Owu9Gtq"
      },
      "source": [
        "#Other retail\n",
        "def capital_requirement_retail(PD, LGD):\n",
        "  import numpy as np\n",
        "  from scipy.stats import norm\n",
        "  # Check if PD satisfies floor\n",
        "  if PD < 0.0003:\n",
        "    PD = 0.0003\n",
        "  # First part of the equation, lower correlation\n",
        "  R =  0.03 * ( (1 - np.exp(-35 * PD)) / (1 - np.exp(-35)) )\n",
        "  # Second part of the equation, higher correlation\n",
        "  R += 0.16 * (1 - ( (1 - np.exp(-35 * PD)) / (1 - np.exp(-35)) ) )\n",
        "  # Now we can calculate the capital\n",
        "  K = norm.cdf(np.sqrt( (1 - R) ** (-1) ) * norm.ppf(PD) +\n",
        "               np.sqrt( R / (1 - R) ) * norm.ppf(0.999) ) - PD\n",
        "  K *= LGD\n",
        "  return(K)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "znRZuAgFLlot"
      },
      "source": [
        "capital_requirement_retail(0.5, 0.5)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "Or we can print it in a nicer format using a [f-string](https://statics.teams.cdn.office.net/evergreen-assets/safelinks/1/atp-safelinks.html)."
      ],
      "metadata": {
        "id": "5-Eu9y0rKq9j"
      }
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "L0UdCTHKLsbR"
      },
      "source": [
        "print(f'PD = 0.5 & LGD = 0.5. K = {capital_requirement_retail(0.5, 0.5):.3f}')"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "EXczxRiABF5o"
      },
      "source": [
        "Xseries = np.arange(0, 1, 0.001)\n",
        "LGD = 1\n",
        "Yseries = [capital_requirement_retail(x, LGD) for x in Xseries]\n",
        "plt.plot(Xseries, Yseries)\n",
        "plt.title('PD curve at LGD = 1')\n",
        "plt.xlabel('PD')\n",
        "plt.ylabel('Capital Req. %')\n",
        "plt.show()"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "Now, let's apply the result to the full dataset. For this, we need a [lambda function](https://www.w3schools.com/python/python_lambda.asp) that will map the vector inputs to the function inputs."
      ],
      "metadata": {
        "id": "JBL_uW0oLMeU"
      }
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "lwNbnu-dE3uP"
      },
      "source": [
        "bankloan_data['CapitalReq'] = bankloan_data.apply(lambda x : capital_requirement_retail(x['PD'], x['LGD']), axis = 1)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "bankloan_data['CapitalReq']"
      ],
      "metadata": {
        "id": "8hIlKDqLeYhF"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "And now we can plot the distribution using Seaborn. The distplot function does this and adds the KDE."
      ],
      "metadata": {
        "id": "6UaI2ZlYLeRj"
      }
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "Xxf5cWTVR3jP"
      },
      "source": [
        "sns.displot(bankloan_data['CapitalReq'], kde=True)\n",
        "plt.show()"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "14A51czTSKet"
      },
      "source": [
        "And we can finally calculate the maximum Risk Weighted Asset (RWA) value that would be required to cover these instruments. Assuming a factor $F = 8\\%$, remember that:\n",
        "\n",
        "$$\n",
        "RWA = \\frac{1}{F} * K * EAD\n",
        "$$\n",
        "\n",
        "in retail lending the Exposure at Default is equal to the outstanding amount, leading to:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "StQaz_o5SdP4"
      },
      "source": [
        "RWA = (1 / 0.08) * np.dot(bankloan_data['CapitalReq'], bankloan_data['Outstanding'])\n",
        "RWA"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Nh_dpo2vTtbF"
      },
      "source": [
        "Every bank will have a different factor of the RWA which it must conserve. This will depend on its own characteristics. If, for example, the bank had a 12% requirement, then its (total) regulatory capital would be equal to:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "afkmkLclT7tO"
      },
      "source": [
        "RWA = (1 / 0.12) * np.dot(bankloan_data['CapitalReq'], bankloan_data['Outstanding'])\n",
        "\n",
        "\n",
        "# To format money correctly\n",
        "import locale\n",
        "locale.setlocale( locale.LC_ALL, '' )\n",
        "\n",
        "# Display\n",
        "out = locale.currency( RWA, grouping=True )\n",
        "print('The maximum value for the RWA at a 12% capital requirement is equal to ' + out)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "However, Basel says that the RWA per business line is 12.5 times the capital requirement (i.e. it does not adjust by the bank's own load), so the 12.5 factor is the correct value to use when calculating the RWA of the line."
      ],
      "metadata": {
        "id": "CoKMc5oRC5vm"
      }
    },
    {
      "cell_type": "code",
      "source": [],
      "metadata": {
        "id": "8z25GuXoeUD_"
      },
      "execution_count": null,
      "outputs": []
    }
  ]
}