Project author: madrury

Project description:
Generalized Linear Models in Sklearn Style
Language: Python
Project URL: git://github.com/madrury/py-glm.git
Created: 2017-08-26T04:49:07Z
Project community: https://github.com/madrury/py-glm

License: BSD 3-Clause "New" or "Revised" License


py-glm: Generalized Linear Models in Python

py-glm is a library for fitting, inspecting, and evaluating Generalized Linear Models in Python.

Installation

The py-glm library can be installed directly from GitHub.

    pip install git+https://github.com/madrury/py-glm.git

Features

Model Fitting

py-glm supports models from various exponential families:

    from glm.glm import GLM
    from glm.families import Gaussian, Bernoulli, Poisson, Exponential

    linear_model = GLM(family=Gaussian())
    logistic_model = GLM(family=Bernoulli())
    poisson_model = GLM(family=Poisson())
    exponential_model = GLM(family=Exponential())

Models with dispersion parameters are also supported. The dispersion parameters
in these models are estimated using the deviance.

    from glm.families import QuasiPoisson, Gamma

    quasi_poisson_model = GLM(family=QuasiPoisson())
    gamma_model = GLM(family=Gamma())
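As an illustration of what a deviance-based dispersion estimate can look like, here is a minimal sketch. It assumes the common estimator "deviance divided by residual degrees of freedom" and uses the Poisson unit deviance. The text above does not spell out the exact estimator py-glm uses, so the function names and the formula are illustrative assumptions, not the library's code.

```python
import math


def poisson_deviance(y, mu):
    """Total Poisson deviance: 2 * sum(y * log(y / mu) - (y - mu)).

    The y * log(y / mu) term is taken to be 0 when y == 0.
    """
    total = 0.0
    for y_i, mu_i in zip(y, mu):
        log_term = y_i * math.log(y_i / mu_i) if y_i > 0 else 0.0
        total += 2.0 * (log_term - (y_i - mu_i))
    return total


def dispersion_estimate(deviance, n_samples, n_params):
    """Assumed estimator: deviance over residual degrees of freedom."""
    return deviance / (n_samples - n_params)


# Hypothetical observed counts and fitted means from a 2-parameter model.
y = [0, 2, 1, 4, 3]
mu = [1.0, 1.5, 2.0, 2.5, 3.0]

dev = poisson_deviance(y, mu)
phi = dispersion_estimate(dev, n_samples=len(y), n_params=2)
```

A value of `phi` well above 1 would suggest overdispersion relative to a plain Poisson model, which is the situation quasi-Poisson models are meant for.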

Fitting a model proceeds in sklearn style, and uses the Fisher scoring algorithm:

    logistic_model.fit(X, y_logistic)
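For readers unfamiliar with Fisher scoring, the following is a minimal sketch of the algorithm for a logistic (Bernoulli) model with the canonical logit link, written with NumPy. It is not py-glm's implementation. The function name, the iteration limit, and the tolerance are assumptions chosen for illustration.

```python
import numpy as np


def fit_logistic_fisher_scoring(X, y, n_iter=25, tol=1e-8):
    """Fit logistic regression coefficients by Fisher scoring.

    Each step solves I(beta) @ step = score(beta), where the score is
    X^T (y - mu) and the Fisher information is X^T W X, W = diag(mu * (1 - mu)).
    """
    n_params = X.shape[1]
    beta = np.zeros(n_params)
    for _ in range(n_iter):
        eta = X @ beta                      # linear predictor
        mu = 1.0 / (1.0 + np.exp(-eta))     # conditional mean E[y | X]
        w = mu * (1.0 - mu)                 # variance function weights
        score = X.T @ (y - mu)
        information = X.T @ (X * w[:, None])
        step = np.linalg.solve(information, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Simulated data; the first column of X is an explicit intercept.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.normal(size=(500, 2))])
true_beta = np.array([0.5, -1.0, 1.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ true_beta))))

beta_hat = fit_logistic_fisher_scoring(X, y)
```

With a canonical link, Fisher scoring coincides with Newton's method, so convergence usually takes only a handful of iterations on well-behaved data.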

If your data resides in a pandas.DataFrame, you can pass this to fit along with a model formula.

    logistic_model.fit(X, formula="y ~ Moshi + SwimSwim")

Offsets and sample weights are supported when fitting:

    import numpy as np

    linear_model.fit(X, y_linear, sample_weights=sample_weights)
    poisson_model.fit(X, y_poisson, offset=np.log(expos))

Predictions are also made in sklearn style:

    logistic_model.predict(X)

Note: There is one major place where we deviate from the sklearn interface. The predict method on a GLM object always returns an estimate of the conditional expectation E[y | X]. This is in contrast to sklearn's behavior for classification models, where predict returns a class assignment. We make this choice so that predict behaves consistently across the py-glm library. Users who want class assignments from a model need to threshold the probability returned by predict manually.
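Thresholding the predicted probabilities takes one line of NumPy. The sketch below uses a hypothetical helper, `to_class_labels`, and a hand-written probability array standing in for the output of predict. The 0.5 cutoff is a conventional default, not something the library prescribes.

```python
import numpy as np


def to_class_labels(probabilities, threshold=0.5):
    """Map probabilities to 0/1 labels; values >= threshold become class 1."""
    return (np.asarray(probabilities) >= threshold).astype(int)


# Stand-in for the probabilities a fitted logistic model might return.
probs = np.array([0.1, 0.5, 0.73, 0.49])
labels = to_class_labels(probs)
```

A different threshold may be appropriate when the two kinds of misclassification carry different costs.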

Inference

Once the model is fit, parameter estimates, parameter covariance estimates, and p-values from a standard z-test are available:

    logistic_model.coef_
    logistic_model.coef_covariance_matrix_
    logistic_model.coef_standard_error_
    logistic_model.p_values_
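The relationship between these quantities can be sketched as follows, assuming the standard two-sided z-test in which each estimate is divided by its standard error and compared against a standard normal distribution. The helper `z_test` and its inputs are hypothetical, and the library's internal computation may differ in detail.

```python
import math

import numpy as np


def z_test(coef, covariance):
    """Return standard errors, z statistics, and two-sided p-values."""
    standard_error = np.sqrt(np.diag(covariance))
    z = coef / standard_error
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_values = np.array([math.erfc(abs(z_i) / math.sqrt(2.0)) for z_i in z])
    return standard_error, z, p_values


# Hypothetical estimates and covariance matrix.
coef = np.array([1.96, 0.0])
covariance = np.array([[1.0, 0.0],
                       [0.0, 4.0]])

standard_error, z, p_values = z_test(coef, covariance)
```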

To get a quick summary, use the summary method:

    logistic_model.summary()

    Binomial GLM Model Summary.
    ===============================================
    Name         Parameter Estimate  Standard Error
    -----------------------------------------------
    Intercept                  1.02            0.01
    Moshi                     -2.00            0.02
    SwimSwim                   1.00            0.02

Resampling methods are also supported: the simulation subpackage provides both
the parametric and the non-parametric bootstrap:

    from glm.simulation import Simulation

    sim = Simulation(logistic_model)
    sim.parametric_bootstrap(X, n_sim=1000)
    sim.non_parametric_bootstrap(X, n_sim=1000)
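To make the difference between the two bootstraps concrete, here is a minimal NumPy sketch for a Gaussian linear model fit by least squares. It does not use or reproduce the Simulation class. The function names, signatures, and the choice of a Gaussian model are all assumptions for illustration.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients, standing in for a GLM fit."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def non_parametric_bootstrap(X, y, n_sim, rng):
    """Refit on rows of (X, y) resampled with replacement."""
    n = X.shape[0]
    coefs = []
    for _ in range(n_sim):
        idx = rng.integers(0, n, size=n)
        coefs.append(fit_ols(X[idx], y[idx]))
    return np.array(coefs)


def parametric_bootstrap(X, y, n_sim, rng):
    """Refit on responses simulated from the fitted Gaussian model."""
    coef = fit_ols(X, y)
    mu = X @ coef
    dof = X.shape[0] - X.shape[1]
    sigma = np.sqrt(np.sum((y - mu) ** 2) / dof)
    coefs = []
    for _ in range(n_sim):
        y_sim = rng.normal(mu, sigma)
        coefs.append(fit_ols(X, y_sim))
    return np.array(coefs)


# Simulated data with an explicit intercept column.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=200)

np_coefs = non_parametric_bootstrap(X, y, n_sim=200, rng=rng)
p_coefs = parametric_bootstrap(X, y, n_sim=200, rng=rng)
```

Each row of the returned arrays is one bootstrap replicate of the coefficient vector, so column-wise standard deviations or percentiles give approximate standard errors or confidence intervals.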

Regularization

Ridge regression is supported for each model (note: the regularization parameter is called alpha rather than lambda, because lambda is a reserved word in Python):

    logistic_model.fit(X, y_logistic, alpha=1.0)
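The effect of the penalty is easiest to see in the Gaussian case, where the ridge solution has a closed form. The sketch below assumes the penalty is alpha times the squared norm of the coefficients and that every coefficient is penalized. How py-glm scales the penalty, and whether it penalizes the intercept, is not stated here, so treat both as assumptions.

```python
import numpy as np


def ridge_linear_fit(X, y, alpha):
    """Solve (X^T X + alpha * I) beta = X^T y for a Gaussian model."""
    n_params = X.shape[1]
    penalty = alpha * np.eye(n_params)
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)


# Simulated data without an intercept column.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=100)

beta_unpenalized = ridge_linear_fit(X, y, alpha=0.0)
beta_ridge = ridge_linear_fit(X, y, alpha=10.0)
```

Increasing alpha shrinks the coefficient vector toward zero. For non-Gaussian families the same penalty term would typically be added to the information matrix and score inside each fitting iteration.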

Warning

The glmnet code included in glm.glmnet is experimental. Please use at your own risk.