Ada-Lending_Club_Default_Prediction

This repository contains a machine learning model (predictive analysis) to predict the default rate of Lending Club loans. Lending Club is an American peer-to-peer lending platform that connects investors with borrowers. The dataset has 396,000 observations spanning 2007 to 2016, with a class imbalance of 1 to 5 in favor of Fully Paid. In 2019, defaulting borrowers wiped out roughly $811 million USD from Lending Club's investors.
Because this is a classification problem on imbalanced data, the preprocessing included robust scaling, standardization, quantile transformation, SMOTENC, ADASYN, and under-sampling. The processed data was fed to the following predictive models: logistic regression, adaptive boosting, random forest, a neural network, and extreme gradient boosting.
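As an illustration of the under-sampling step only, here is a minimal sketch in plain Python. It is not the repository's implementation: the function name `random_under_sample`, the label `"Default"`, and the toy data are assumptions made for this example. It randomly drops majority-class rows until both classes are the same size.

```python
import random
from collections import Counter


def random_under_sample(rows, labels, seed=0):
    """Randomly keep only as many rows per class as the smallest class has."""
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    # Every class is cut down to the size of the rarest class.
    target = min(len(group) for group in by_class.values())

    sampled = []
    for label, group in by_class.items():
        for row in rng.sample(group, target):
            sampled.append((row, label))

    # Shuffle so the classes are not stored in contiguous blocks.
    rng.shuffle(sampled)
    new_rows = [row for row, _ in sampled]
    new_labels = [label for _, label in sampled]
    return new_rows, new_labels


# Toy data (hypothetical) with a 1-to-5 imbalance: 10 defaults, 50 fully paid.
labels = ["Default"] * 10 + ["Fully Paid"] * 50
rows = [[i] for i in range(len(labels))]

X_bal, y_bal = random_under_sample(rows, labels)
print(Counter(y_bal))
```

Under-sampling discards majority-class information, which is one reason it is usually compared against over-sampling methods such as SMOTENC and ADASYN rather than used alone.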
Overall, Adaptive Boosting seems to perform better than the other models on recall (that is, correctly classifying defaulting borrowers and minimizing false negatives). However, hyperparameter tuning and probability calibration gave better results in terms of F1 score and ROC curve.
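To make the metrics concrete, the sketch below computes recall, precision, and F1 from a list of true and predicted labels. The helper name and the toy labels are assumptions for illustration, not results from this repository; `1` marks a defaulting borrower.

```python
def recall_precision_f1(y_true, y_pred, positive=1):
    """Compute recall, precision, and F1 for the positive (default) class."""
    tp = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # default correctly flagged
        elif pred == positive and truth != positive:
            fp += 1  # good borrower wrongly flagged
        elif pred != positive and truth == positive:
            fn += 1  # default missed (false negative)

    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return recall, precision, f1


# Hypothetical labels: 4 defaults and 6 fully paid loans.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

recall, precision, f1 = recall_precision_f1(y_true, y_pred)
print(f"recall={recall:.2f} precision={precision:.2f} f1={f1:.2f}")
```

A model can score high on recall simply by flagging many loans as defaults, which lowers precision. F1 balances the two, which is why a model that wins on recall does not necessarily win on F1.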
End note: The million-dollar question is which side the company should favor in the trade-off. For this model, the trade-off is between investors' returns and company profitability.
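One way to look at this trade-off is through the decision threshold applied to the predicted default probability. The sketch below is an assumption-laden illustration: the scores are invented, and reading missed defaults as a cost to investors and rejected good borrowers as lost business for the company is an interpretation, not something measured in this repository.

```python
def outcomes_at_threshold(y_true, scores, threshold):
    """Count missed defaults and rejected good borrowers at a given threshold.

    A loan is flagged as a predicted default when its score >= threshold.
    """
    missed_defaults = sum(
        1 for truth, score in zip(y_true, scores)
        if truth == 1 and score < threshold
    )
    rejected_good = sum(
        1 for truth, score in zip(y_true, scores)
        if truth == 0 and score >= threshold
    )
    return missed_defaults, rejected_good


# Hypothetical data: 1 = default, 0 = fully paid, with made-up model scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.35, 0.7, 0.4, 0.3, 0.2, 0.1]

results = {}
for threshold in (0.3, 0.5, 0.8):
    results[threshold] = outcomes_at_threshold(y_true, scores, threshold)
    missed, rejected = results[threshold]
    print(f"threshold={threshold}: missed defaults={missed}, "
          f"rejected good borrowers={rejected}")
```

On this toy data, a low threshold misses no defaults but rejects three good borrowers, while a high threshold rejects none and misses two defaults. Choosing the threshold is a business decision rather than a purely statistical one.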
Key finding: Revolving line utilization rate, debt-to-income ratio (DTI), interest rate, grade, employment length, and total number of credit lines are key indicators of default.