
## Reading the data

For this exercise we will read a dataset from credit scoring. I previously uploaded the data to Google, and it is available at https://docs.google.com/spreadsheets/d/1Am74y2ZVQ6dRFYVZUv_VoyP-OTS8BM4x0svifHQvtNc/export?gid=819627738&format=csv

The dataset (called **Bankloan**, from IBM) has a set of 1,000 loans with default information. It includes the following variables:

- Customer: ID, or unique label, of the borrower (NOT predictive).
- Age: Age of the borrower in years.
- Education: Maximum education level the borrower reached.
  1: Completed primary. 2: Completed secondary. 3: Incomplete higher education. 4: Completed higher education. 5: Postgraduate studies (completed MSc or PhD).
- Employ: Years at current job.
- Address: Years at current address.
- Income: Income in thousands of USD.
- Leverage: Debt/Income Ratio.
- CredDebt: Credit card standing debt.
- OthDebt: Other debt in thousands of USD.
- MonthlyLoad: Monthly percentage from salary used to repay debts.
- Default: 1 if default has occurred, 0 if not (target variable).
- PD: The calibrated probability of default of the loan.
- LGD: The estimated loss given default (LGD) of the loan.
- Outstanding: Outstanding amount of the loan, used as the exposure at default (EAD).

Goal: predict whether a loan is going to default or not.

In [None]:
!gdown https://drive.google.com/uc?id=1lyEd01JaoVbL1mbgn-wr3YvLmURAgQ8B

In [None]:
!head /content/bankloan_scored_nodefault.csv

In [None]:
import pandas as pd
import numpy as np

In [None]:
!pip install scorecardpy

In [None]:
import scorecardpy as scp

In [None]:
bankloan_data = pd.read_csv('/content/bankloan_scored_nodefault.csv')

In [None]:
bankloan_data.head()

In [None]:
bankloan_data.dtypes

Summary statistics of the numerical variables:

In [None]:
bankloan_data.describe()

Plot the histograms of the variables:

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
histograms = bankloan_data.hist()

In [None]:
bankloan_data.loc[bankloan_data.loc[:, 'Age'] < 37, :]

In [None]:
bankloan_data.iloc[0:5, 1:2]

In [None]:
import seaborn as sns

In [None]:
sns.set(color_codes=True)
sns.pairplot(bankloan_data)
plt.savefig('Hist.pdf')
plt.savefig('Hist.jpg')
plt.show()

## Basel III Capital Requirements

Recalling the last lecture, the equation for the capital requirement of any operation is:

$$
K = LGD \cdot \left\{ N\left( \sqrt{\frac{1}{1-R}} \cdot N^{-1}(PD) + \sqrt{\frac{R}{1-R}} \cdot N^{-1}(0.999) \right) - PD \right\} \left( \frac{1 + (M - 2.5)b}{1 - 1.5b}\right)
$$

The values of $b$ and $M$ vary for bonds, but for retail and mortgage exposures the maturity is fixed at $M = 1$, so the maturity factor becomes $(1 - 1.5b)/(1 - 1.5b) = 1$ and the $b$ term disappears. The correlations are given by the regulation:

- Mortgages: $R = 0.15$
- Revolving: $R = 0.04$
- Other retail: $R = 0.03 \left( \frac{1 - e^{-35PD}}{1 - e^{-35}} \right) + 0.16 \left( 1 - \frac{1 - e^{-35PD}}{1 - e^{-35}} \right)$
- Corporate and sovereign exposures: $R = 0.12 \left( \frac{1 - e^{-50PD}}{1 - e^{-50}} \right) + 0.24 \left( 1 - \frac{1 - e^{-50PD}}{1 - e^{-50}} \right)$
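
The correlation formulas above, together with the maturity factor from the capital equation, can be sketched in a few lines of Python. The function names are hypothetical, chosen only for this illustration. The expression used for $b$, $b = (0.11852 - 0.05478 \ln PD)^2$, is not stated in this notebook; it is quoted here from the Basel IRB text as an assumption to make the sketch complete.

```python
import numpy as np

def correlation_other_retail(PD):
    """Asset correlation for other retail exposures (hypothetical helper)."""
    weight = (1 - np.exp(-35 * PD)) / (1 - np.exp(-35))
    return 0.03 * weight + 0.16 * (1 - weight)

def correlation_corporate(PD):
    """Asset correlation for corporate and sovereign exposures (hypothetical helper)."""
    weight = (1 - np.exp(-50 * PD)) / (1 - np.exp(-50))
    return 0.12 * weight + 0.24 * (1 - weight)

def maturity_factor(PD, M):
    """Maturity factor (1 + (M - 2.5) b) / (1 - 1.5 b).

    Assumption: b follows the Basel IRB expression for the maturity
    adjustment, which is not given in this notebook.
    """
    b = (0.11852 - 0.05478 * np.log(PD)) ** 2
    return (1 + (M - 2.5) * b) / (1 - 1.5 * b)

# Illustrative PD values only (not taken from the Bankloan data)
for pd_value in (0.001, 0.01, 0.10):
    print(f'PD = {pd_value:.3f}: '
          f'R retail = {correlation_other_retail(pd_value):.4f}, '
          f'R corporate = {correlation_corporate(pd_value):.4f}, '
          f'maturity factor at M = 1: {maturity_factor(pd_value, 1.0):.4f}')
```

Both correlations fall as PD rises, moving from the upper bound (0.16 or 0.24) towards the lower bound (0.03 or 0.12). At $M = 1$ the numerator and denominator of the maturity factor coincide, which is why it plays no role for retail exposures.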



In [None]:
# Other retail
def capital_requirement_retail(PD, LGD):
    import numpy as np
    from scipy.stats import norm
    # Check if PD satisfies the floor of 0.03%
    if PD < 0.0003:
        PD = 0.0003
    # First part of the equation, lower correlation
    R = 0.03 * ((1 - np.exp(-35 * PD)) / (1 - np.exp(-35)))
    # Second part of the equation, higher correlation
    R += 0.16 * (1 - ((1 - np.exp(-35 * PD)) / (1 - np.exp(-35))))
    # Now we can calculate the capital
    K = norm.cdf(np.sqrt((1 - R) ** (-1)) * norm.ppf(PD) +
                 np.sqrt(R / (1 - R)) * norm.ppf(0.999)) - PD
    K *= LGD
    return K

In [None]:
capital_requirement_retail(0.5, 0.5)

Or we can print it in a nicer format using an f-string.

In [None]:
print(f'PD = 0.5 & LGD = 0.5. K = {capital_requirement_retail(0.5, 0.5):.3f}')

In [None]:
Xseries = np.arange(0, 1, 0.001)
LGD = 1
Yseries = [capital_requirement_retail(x, LGD) for x in Xseries]
plt.plot(Xseries, Yseries)
plt.title('PD curve at LGD = 1')
plt.xlabel('PD')
plt.ylabel('Capital Req. %')
plt.show()

Now, let's apply the result to the full dataset. For this, we need a [lambda function](https://www.w3schools.com/python/python_lambda.asp) that passes the `PD` and `LGD` values of each row to the function.

In [None]:
bankloan_data['CapitalReq'] = bankloan_data.apply(lambda x : capital_requirement_retail(x['PD'], x['LGD']), axis = 1)

In [None]:
bankloan_data['CapitalReq']
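
The row-wise `apply` is easy to read but calls the function once per loan. As an alternative, the same other-retail formula can be written with NumPy arrays so that all loans are processed at once. This is a minimal sketch: the helper name `capital_requirement_retail_vec` and the example values are made up for illustration and are not part of the original notebook.

```python
import numpy as np
from scipy.stats import norm

def capital_requirement_retail_vec(pd_values, lgd_values):
    """Vectorised other-retail capital requirement (hypothetical helper)."""
    # Apply the 0.03% PD floor to every element
    pd_arr = np.maximum(np.asarray(pd_values, dtype=float), 0.0003)
    lgd_arr = np.asarray(lgd_values, dtype=float)
    # PD-dependent correlation for other retail exposures
    weight = (1 - np.exp(-35 * pd_arr)) / (1 - np.exp(-35))
    R = 0.03 * weight + 0.16 * (1 - weight)
    # Capital requirement per unit of exposure
    K = norm.cdf(np.sqrt(1 / (1 - R)) * norm.ppf(pd_arr)
                 + np.sqrt(R / (1 - R)) * norm.ppf(0.999)) - pd_arr
    return lgd_arr * K

# Made-up PD and LGD values, for illustration only
example_pd = np.array([0.0001, 0.01, 0.05, 0.5])
example_lgd = np.array([0.45, 0.45, 0.60, 0.50])
example_K = capital_requirement_retail_vec(example_pd, example_lgd)
print(example_K)
```

On the dataset this would be called as `capital_requirement_retail_vec(bankloan_data['PD'], bankloan_data['LGD'])`, which should give the same values as the `apply` version.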

And now we can plot the distribution using Seaborn. The `displot` function does this and, with `kde=True`, overlays a kernel density estimate (KDE).

In [None]:
sns.displot(bankloan_data['CapitalReq'], kde=True)
plt.show()

And we can finally calculate the maximum Risk Weighted Asset (RWA) value that would be required to cover these instruments. Assuming a factor $F = 8\%$, remember that:

$$
RWA = \frac{1}{F} \cdot K \cdot EAD
$$

In retail lending the Exposure at Default is equal to the outstanding amount, leading to:

In [None]:
RWA = (1 / 0.08) * np.dot(bankloan_data['CapitalReq'], bankloan_data['Outstanding'])
RWA
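
To see what the dot product is doing, here is a small worked example on a hypothetical three-loan portfolio. The numbers are made up and do not come from the Bankloan data. Each loan contributes $\frac{1}{F} \cdot K \cdot EAD$, and the portfolio RWA is the sum of those contributions.

```python
import numpy as np

# Hypothetical three-loan portfolio (made-up numbers)
K = np.array([0.05, 0.10, 0.02])         # capital requirement per unit of exposure
EAD = np.array([1000.0, 2000.0, 500.0])  # outstanding amounts
F = 0.08                                 # factor from the text

# RWA contribution of each loan: about 625, 2500 and 125
rwa_per_loan = (1 / F) * K * EAD

# Portfolio RWA, summed loan by loan and via the dot product: about 3250
rwa_total = rwa_per_loan.sum()
rwa_total_dot = (1 / F) * np.dot(K, EAD)

print(rwa_per_loan)
print(rwa_total, rwa_total_dot)
```

Keeping the per-loan vector is useful when you want to see which loans drive the total.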

Every bank will have a different factor of the RWA which it must conserve, depending on its own characteristics. If, for example, the bank had a 12% requirement, then the RWA implied by that factor would be:

In [None]:
# RWA implied by a 12% factor
RWA = (1 / 0.12) * np.dot(bankloan_data['CapitalReq'], bankloan_data['Outstanding'])

# To format money correctly, using the system's default locale
import locale
locale.setlocale(locale.LC_ALL, '')

# Display
out = locale.currency(RWA, grouping=True)
print('The maximum value for the RWA at a 12% capital requirement is equal to ' + out)

However, Basel defines the RWA of a business line as 12.5 times the capital requirement (i.e. $1 / 0.08$, with no adjustment for the bank's own load), so 12.5 is the correct factor to use when calculating the RWA of the line.
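
One way to keep the two ideas apart is to fix the RWA factor at 12.5 and let the bank-specific ratio enter only afterwards, when the RWA is converted into a capital amount. The sketch below assumes that a bank's own requirement is applied as a ratio times the RWA. The function names and the figure of 250 for the summed $K \cdot EAD$ are hypothetical.

```python
def risk_weighted_assets(capital_charge):
    """RWA of a business line: 12.5 times the capital requirement."""
    return 12.5 * capital_charge

def required_capital(rwa, capital_ratio):
    """Capital held against a given RWA (assumed to be ratio times RWA)."""
    return capital_ratio * rwa

capital_charge = 250.0  # hypothetical sum of K * EAD over the portfolio
rwa = risk_weighted_assets(capital_charge)

# At the 8% ratio the capital equals the original charge (about 250);
# at a 12% ratio the bank holds 1.5 times as much (about 375).
print(rwa)
print(required_capital(rwa, 0.08))
print(required_capital(rwa, 0.12))
```

Under this reading the 12% affects how much capital is held against the RWA, not the RWA itself.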