Project author: RickFSA

Project description:
Classify default borrowers from initial loan application for Lending Club
Language: Jupyter Notebook
Project URL: git://github.com/RickFSA/Lending_Club_Default_Prediction.git
Created: 2020-09-21T10:37:17Z
Project community: https://github.com/RickFSA/Lending_Club_Default_Prediction

License:

This repository contains a machine learning model (predictive analysis) that predicts loan default for Lending Club, an American peer-to-peer lending platform that connects investors with borrowers. The dataset has 396,000 observations from 2007 to 2016, and the classes are imbalanced roughly 1 to 5 in favor of Fully Paid. In 2019, defaulting borrowers wiped out roughly USD 811 million of Lending Club investors' money.
Since this is a classification problem on imbalanced data, the preprocessing included robust scaling, standardization, quantile transformation, SMOTENC, ADASYN and under-sampling. The processed data was then fed to the predictive models: Logistic Regression, Adaptive Boosting, Random Forest, a Neural Network and Extreme Gradient Boosting.
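Of the resampling strategies mentioned, random under-sampling is the easiest to show in isolation. The following is a minimal sketch using only the Python standard library, not the notebook's own code. The helper `random_undersample`, the toy data and the minority label `"Charged Off"` are assumptions made for illustration; only the `"Fully Paid"` class name and the 1 to 5 ratio come from the description above.

```python
import random
from collections import Counter


def random_undersample(rows, labels, majority_label, seed=0):
    """Randomly drop majority-class rows until both classes are the same size.

    Assumes a binary problem: every label other than `majority_label`
    is treated as the minority class.
    """
    rng = random.Random(seed)
    majority = [i for i, y in enumerate(labels) if y == majority_label]
    minority = [i for i, y in enumerate(labels) if y != majority_label]
    # Keep as many majority rows as there are minority rows.
    kept = rng.sample(majority, k=min(len(majority), len(minority)))
    keep = sorted(kept + minority)
    return [rows[i] for i in keep], [labels[i] for i in keep]


# Toy data with a 1-to-5 imbalance (hypothetical values, not the real dataset).
labels = ["Charged Off"] * 100 + ["Fully Paid"] * 500
rows = [[float(i)] for i in range(len(labels))]

balanced_rows, balanced_labels = random_undersample(rows, labels, "Fully Paid")
counts = Counter(balanced_labels)
print(counts)
```

Under-sampling discards majority rows, so it trades information for balance. Over-sampling methods such as SMOTENC and ADASYN instead synthesize extra minority rows. In either case, resampling is normally applied to the training split only, so that the evaluation data keeps the original class ratio.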
Overall, Adaptive Boosting seems to have performed better than the other models on recall (i.e. correctly classifying default borrowers and so minimising false negatives). However, hyperparameter tuning and probability calibration gave better results in terms of F1 score and ROC curve.
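To make the difference between optimising for recall and optimising for F1 concrete, here is a small standard-library sketch. The labels and predicted probabilities are invented for illustration (1 means default), and the helper names are hypothetical; none of it reproduces the repository's results.

```python
def confusion_counts(y_true, y_prob, threshold):
    """Count TP, FP, FN, TN when predicting default for prob >= threshold."""
    tp = fp = fn = tn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def recall_precision_f1(tp, fp, fn):
    """Recall = TP/(TP+FN), precision = TP/(TP+FP), F1 = their harmonic mean."""
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return recall, precision, f1


# Hypothetical labels (1 = default) and model probabilities of default.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.7, 0.45, 0.2, 0.6, 0.4, 0.35, 0.3, 0.1, 0.05]

results = {}
for threshold in (0.5, 0.3):
    tp, fp, fn, tn = confusion_counts(y_true, y_prob, threshold)
    results[threshold] = recall_precision_f1(tp, fp, fn)
    recall, precision, f1 = results[threshold]
    print(f"threshold={threshold}: recall={recall:.2f} "
          f"precision={precision:.2f} f1={f1:.2f}")
```

On this toy data, lowering the threshold from 0.5 to 0.3 catches more defaulters (recall rises from 0.50 to 0.75) but flags more good borrowers, so precision and F1 both fall. A model that ranks first on recall can therefore rank lower on F1.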
End notes: The million-dollar question is which side of the trade-off the company should favor. For this model, the trade-off is between investors' returns and company profitability.
Key findings: Revolving line utilization rate, debt-to-income ratio (DTI), interest rate, grade, employment length and total number of credit lines are key indicators of default.