Project author: macemaclean

Project description:
Python scripts for regression models, using the Scikit-Learn framework: Diagnostic plots, confidence intervals & approximate Shapley values
Primary language: Jupyter Notebook
Repository: git://github.com/macemaclean/regression-model-tools.git
Created: 2020-06-23T19:01:00Z
Project page: https://github.com/macemaclean/regression-model-tools

License: GNU General Public License v3.0



Regression model tools

Python scripts for regression models, using the Scikit-Learn framework:

  • Diagnostic plots
  • Bootstrapped confidence intervals for predictions
  • Approximate Shapley values

Diagnostic plots

ML models generally do not share the residual distribution assumptions of classical linear regression, but residual plots are still worth examining: they can reveal systematic bias, non-constant error variance and outliers.

    import lightgbm as lgb
    import pandas as pd
    from sklearn.datasets import load_boston
    from sklearn.model_selection import train_test_split
    from regression_diagnostics import RegressionDiagnostics
    import warnings
    warnings.filterwarnings('ignore')

    # Load the Boston house-prices dataset
    # Note: load_boston was removed in scikit-learn 1.2, so this example needs an older release
    boston = load_boston()
    X = pd.DataFrame(boston["data"], columns=boston.feature_names)
    y = boston.target
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Fit model
    lgb_model = lgb.LGBMRegressor()
    lgb_model.fit(X_train, y_train)

    # Generate diagnostic plots from the held-out test set
    diagnostics = RegressionDiagnostics(lgb_model)
    diagnostics.fit(X_test, y_test)

    # Fitted values against actual values
    diagnostics.fitted_actual()

Fitted values against actual values

    # Residuals against fitted values
    diagnostics.residuals_fitted()

Residuals against fitted values

    # Histogram of residuals
    diagnostics.hist_residuals()

Histogram of residuals

    # QQ plot of residuals
    diagnostics.qq_plot()

QQ plot of residuals
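
For readers who want to see what sits behind the four plots, the sketch below computes the underlying arrays directly from actual and predicted values. It is not the code in regression_diagnostics: the helper name diagnostic_quantities, the sign convention (actual minus fitted), the standardisation of residuals and the plotting positions for the QQ plot are all assumptions made for illustration, and the data are synthetic.

```python
from statistics import NormalDist

import numpy as np


def diagnostic_quantities(y_true, y_pred):
    """Return the arrays behind the four diagnostic plots (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    # Assumed sign convention: residual = actual - fitted
    residuals = y_true - y_pred

    # QQ plot: sorted standardised residuals against standard normal quantiles
    n = residuals.size
    standardised = (residuals - residuals.mean()) / residuals.std(ddof=1)
    probs = (np.arange(1, n + 1) - 0.5) / n  # assumed plotting positions
    normal = NormalDist()
    theoretical = np.array([normal.inv_cdf(float(p)) for p in probs])

    return {
        "fitted": y_pred,                      # x-axis of fitted vs actual and residuals vs fitted
        "actual": y_true,                      # y-axis of fitted vs actual
        "residuals": residuals,                # y-axis of residuals vs fitted, input to the histogram
        "qq_theoretical": theoretical,         # x-axis of the QQ plot
        "qq_sample": np.sort(standardised),    # y-axis of the QQ plot
    }


# Synthetic data standing in for model predictions on a test set
rng = np.random.default_rng(0)
y_pred = rng.uniform(10, 50, size=200)
y_true = y_pred + rng.normal(0, 2, size=200)

q = diagnostic_quantities(y_true, y_pred)
```

Each plot is then a scatter or histogram of one or two of these arrays. Points on the QQ plot that lie close to the diagonal suggest approximately normal residuals.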

Bootstrapped confidence intervals for predictions

This script generates local bootstrapped confidence intervals for predictions by resampling the observed residuals of the k nearest neighbours in a reference data set. Increasing k brings the result closer to a global error interval.
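
The idea can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the repository's script: the function name local_bootstrap_interval, its parameters, the Euclidean distance metric and the percentile interval are all choices made here for the example, and the reference data are synthetic.

```python
import numpy as np


def local_bootstrap_interval(x_new, pred_new, X_ref, residuals_ref,
                             k=20, n_boot=1000, alpha=0.05, seed=0):
    """Interval for one prediction from residuals of its k nearest reference rows.

    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    X_ref = np.asarray(X_ref, dtype=float)
    residuals_ref = np.asarray(residuals_ref, dtype=float)

    # Euclidean distance from the new point to every reference row
    dists = np.linalg.norm(X_ref - np.asarray(x_new, dtype=float), axis=1)
    neighbours = np.argsort(dists)[:k]
    local_residuals = residuals_ref[neighbours]

    # Resample the local residuals with replacement and add them to the prediction
    sampled = rng.choice(local_residuals, size=n_boot, replace=True)
    simulated = pred_new + sampled

    # Percentile interval at level 1 - alpha
    lower, upper = np.quantile(simulated, [alpha / 2, 1 - alpha / 2])
    return lower, upper


# Synthetic reference set whose residual spread grows with the feature value
rng = np.random.default_rng(1)
X_ref = rng.uniform(0, 10, size=(500, 1))
residuals_ref = rng.normal(0, 0.2 + 0.5 * X_ref[:, 0])

lo_small, hi_small = local_bootstrap_interval([1.0], 5.0, X_ref, residuals_ref, k=30)
lo_large, hi_large = local_bootstrap_interval([9.0], 5.0, X_ref, residuals_ref, k=30)
```

With a small k the interval reflects the error spread near the query point, so in this synthetic example it is narrower at x = 1 than at x = 9. Setting k to the size of the reference set uses every residual and so approximates a single global interval.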
