Project author: haziqj

Project description:
Binary and multinomial probit regression using I-priors
Language: R
Repository: git://github.com/haziqj/iprobit.git
Created: 2017-04-06T15:32:17Z
Project community: https://github.com/haziqj/iprobit

License: GNU General Public License v3.0


R/iprobit: Binary and multinomial probit regression using I-priors

iprobit is an R package that extends I-prior regression to unordered categorical responses through a probit link function. Fitted models can be used for classification, or for inference based on the fitted class probabilities. Estimation is performed using a variational EM algorithm. Visit http://phd.haziqj.ml for details.
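
As a rough illustration of what a probit link does in the binary case, the sketch below maps latent regression values to class probabilities with the standard normal CDF and checks the result against the latent-variable reading (class 1 when the latent value plus standard normal noise is positive). It uses base R only and does not call the package. The latent values `f` are made up for illustration, and the package's internal parameterisation may differ.

```r
# Hypothetical latent regression values for five observations (illustrative only)
f <- c(-2, -0.5, 0, 0.5, 2)

# Probit link: P(y = 1) is the standard normal CDF evaluated at f
p <- pnorm(f)

# Classify by thresholding the fitted probability at 0.5
y_hat <- ifelse(p > 0.5, 1L, 0L)

# Monte Carlo check of the latent-variable reading: y = 1 when f + eps > 0
set.seed(1)
eps <- rnorm(1e5)
p_mc <- sapply(f, function(fi) mean(fi + eps > 0))

print(round(cbind(f, p, p_mc, y_hat), 3))
```

The simulated frequencies in `p_mc` should sit close to `pnorm(f)`, which is the sense in which the probit model produces fitted probabilities.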

Binary classification (toy example)

Model fitting

    dat <- gen_spiral(n = 300, seed = 123)  # generate binary toy example data set
    mod <- iprobit(y ~ X1 + X2, dat, one.lam = TRUE, kernel = "fbm")
    ## ==========================================
    ## Converged after 56 iterations.

Model summary

    summary(mod)
    ## Call:
    ## iprobit(formula = y ~ X1 + X2, data = dat, kernel = "fbm", one.lam = TRUE)
    ##
    ## Classes: 1, 2
    ##
    ## RKHS used:
    ## Fractional Brownian motion with Hurst 0.5 (X1 + X2)
    ##
    ## Hyperparameters:
    ##             Mean   S.D.    2.5%  97.5%
    ## Intercept 0.0000 0.0577 -0.1132 0.1132
    ## lambda    5.6718 0.2320  5.2171 6.1265
    ## ---
    ##
    ## Closed-form VB-EM algorithm. Iterations: 56/100
    ## Converged to within 1e-05 tolerance. Time taken: 3.573205 secs
    ## Variational lower bound: -140.711
    ## Training error: 0%. Brier score: 0.01466541
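
The 2.5% and 97.5% columns in the summary above look consistent with a normal approximation, that is, mean ± 1.96 × S.D. The sketch below recomputes the interval from the printed mean and S.D. values. This is an observation from the printed numbers, not a documented property of the package, and small differences in the last digit are expected because the printed S.D. is rounded.

```r
# Values copied from the printed summary above
post_mean <- c(Intercept = 0.0000, lambda = 5.6718)
post_sd   <- c(Intercept = 0.0577, lambda = 0.2320)

# 97.5% quantile of the standard normal (about 1.96)
z <- qnorm(0.975)

# Symmetric interval under a normal approximation
lower <- post_mean - z * post_sd
upper <- post_mean + z * post_sd

print(round(cbind(lower, upper), 4))
```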

Boundary plot for two-dimensional covariates

    iplot_predict(mod)

Multiclass classification (toy example)

Model fit report and parameter estimates

    dat <- gen_mixture(n = 400, m = 4, sd = 1.5, seed = 123)  # generate 4-class toy example data set
    (mod <- iprobit(y ~ X1 + X2, dat, train.samp = sample(1:400, size = 392),
                    control = list(maxit = 10)))  # set aside 8 points for testing
    ## ===========================================================================
    ## Convergence criterion not met.
    ## Training error rate: 6.89 %
    ## Lower bound value: -208.9813
    ##
    ##            Class = 1 Class = 2 Class = 3 Class = 4
    ## Intercept    0.32094   0.26087   0.30916   0.46155
    ## lambda[1,]  -0.21341   0.00000   0.62978   0.00000
    ## lambda[2,]   0.00000  -0.50221   0.00000  -2.66854
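
The call above passes 392 sampled row indices as `train.samp`, which leaves 8 of the 400 rows for testing. A minimal base R sketch of that index split follows. The seed and the name `test.samp` are assumptions made for this illustration, and how the package handles the held-out rows internally is not shown in this README.

```r
set.seed(123)  # arbitrary seed so the sketch is reproducible
n <- 400

# Indices of the rows used for fitting, as in the call above
train.samp <- sample(1:n, size = 392)

# The remaining indices form the held-out set (hypothetical name)
test.samp <- setdiff(1:n, train.samp)

length(test.samp)
```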

Boundary plot for two-dimensional covariates

    iplot_predict(mod, dec.bound = TRUE, plot.test = TRUE, grid.len = 50)

Obtain out-of-sample test error rates, predicted classes and probabilities

    predict(mod)
    ## Training error: 0.000%
    ## Brier score : 0.071
    ##
    ## Predicted classes:
    ## [1] 1 1 2 2 3 4 4 4
    ## Levels: 1 2 3 4
    ##
    ## Predicted probabilities:
    ##       1     2     3     4
    ## 1 0.750 0.126 0.007 0.117
    ## 2 0.852 0.137 0.000 0.010
    ## 3 0.064 0.831 0.103 0.002
    ## 4 0.046 0.859 0.095 0.001
    ## 5 0.000 0.089 0.906 0.005
    ## 6 0.399 0.011 0.015 0.576
    ## 7 0.136 0.050 0.229 0.585
    ## 8 0.146 0.015 0.110 0.729
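
The printed output can be cross-checked by hand. The sketch below copies the probability table, takes the most probable class in each row, and computes a Brier-type score as the mean of (1 - p)^2, where p is the probability given to the true class. Two assumptions are made here: the true classes equal the predicted ones (which is what the printed 0.000% error implies), and the score uses this particular definition. The definition is inferred from the fact that it reproduces the printed 0.071, not taken from the package documentation.

```r
# Predicted probabilities copied from the printed output above (rows = test points)
probs <- matrix(c(
  0.750, 0.126, 0.007, 0.117,
  0.852, 0.137, 0.000, 0.010,
  0.064, 0.831, 0.103, 0.002,
  0.046, 0.859, 0.095, 0.001,
  0.000, 0.089, 0.906, 0.005,
  0.399, 0.011, 0.015, 0.576,
  0.136, 0.050, 0.229, 0.585,
  0.146, 0.015, 0.110, 0.729
), ncol = 4, byrow = TRUE,
dimnames = list(as.character(1:8), as.character(1:4)))

# Predicted class = column with the largest probability in each row
pred <- max.col(probs, ties.method = "first")

# Assumption: 0% error, so the true classes equal the predicted classes
truth <- pred

# Probability assigned to the true class of each test point
p_true <- probs[cbind(seq_len(nrow(probs)), truth)]

# Brier-type score under the assumed definition
brier <- mean((1 - p_true)^2)

print(pred)
print(round(brier, 3))
```

The classes recovered this way match the "Predicted classes" line in the output.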

Fisher’s Iris data set

Model fitting (common RKHS scale across classes for each covariate)

    mod <- iprobit(Species ~ ., iris, kernel = "fbm", one.lam = TRUE,
                   common.RKHS.scale = TRUE, common.intercept = FALSE,
                   control = list(alpha0 = 1, theta0 = 1,
                                  stop.crit = 1e-1))
    ## ==========================
    ## Converged after 34 iterations.
    summary(mod)
    ## Call:
    ## iprobit(formula = Species ~ ., data = iris, kernel = "fbm", one.lam = TRUE,
    ##     common.intercept = FALSE, common.RKHS.scale = TRUE, control = list(alpha0 = 1,
    ##         theta0 = 1, stop.crit = 0.1))
    ##
    ## Classes: setosa, versicolor, virginica
    ##
    ## RKHS used:
    ## Fractional Brownian motion with Hurst 0.5 (Sepal.Length + ... + Petal.Width)
    ##
    ## Hyperparameters:
    ##                Mean   S.D.   2.5%  97.5%
    ## Intercept[1] 0.8813 0.0816 0.7213 1.0413
    ## Intercept[2] 1.0581 0.0816 0.8980 1.2181
    ## Intercept[3] 1.0606 0.0816 0.9006 1.2207
    ## lambda       0.3589 0.0120 0.3353 0.3824
    ## ---
    ##
    ## Closed-form VB-EM algorithm. Iterations: 34/100
    ## Converged to within 0.1 tolerance. Time taken: 5.10163 secs
    ## Variational lower bound: -50.39342
    ## Training error: 4%. Brier score: 0.02759783

Obtain training error rates, predicted classes and probabilities with posterior quantiles

    fitted(mod, quantiles = TRUE)
    ##                     2.5%   25%   50%   75% 97.5%
    ## Training error (%) 2.000 2.667 3.333 4.000 5.683
    ## Brier score        0.025 0.027 0.029 0.032 0.036
    ##
    ## Predicted probabilities for Class = setosa
    ##    2.5%   25%   50%   75% 97.5%
    ## 1 0.966 0.979 0.985 0.989 0.993
    ## 2 0.937 0.964 0.975 0.982 0.991
    ## 3 0.957 0.977 0.982 0.988 0.994
    ## 4 0.938 0.961 0.972 0.981 0.990
    ## 5 0.967 0.980 0.986 0.990 0.994
    ## # ... with 145 more rows
    ##
    ## Predicted probabilities for Class = versicolor
    ##    2.5%   25%   50%   75% 97.5%
    ## 1 0.003 0.006 0.008 0.012 0.022
    ## 2 0.004 0.010 0.014 0.022 0.042
    ## 3 0.002 0.004 0.008 0.012 0.021
    ## 4 0.004 0.009 0.014 0.022 0.033
    ## 5 0.002 0.005 0.007 0.011 0.021
    ## # ... with 145 more rows
    ##
    ## Predicted probabilities for Class = virginica
    ##    2.5%   25%   50%   75% 97.5%
    ## 1 0.002 0.004 0.006 0.009 0.014
    ## 2 0.003 0.007 0.010 0.015 0.028
    ## 3 0.003 0.006 0.009 0.013 0.030
    ## 4 0.004 0.008 0.012 0.017 0.034
    ## 5 0.002 0.004 0.006 0.008 0.015
    ## # ... with 145 more rows
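
One generic way to obtain quantiles like those above is to draw samples of a latent value from its approximate posterior, push each draw through the link, and take empirical quantiles of the resulting probabilities. The sketch below does this for a single observation in a binary probit setting. Everything in it is an assumption made for illustration: the posterior mean and S.D. are invented, and whether `fitted()` works by sampling in this way is not stated in this README.

```r
set.seed(42)  # arbitrary seed for reproducibility

# Hypothetical normal posterior for one latent regression value
f_mean <- 2.1
f_sd   <- 0.25

# Draw from the posterior and transform each draw to a probability
f_draws <- rnorm(1000, mean = f_mean, sd = f_sd)
p_draws <- pnorm(f_draws)

# Empirical quantiles at the same levels as the printed tables
q <- quantile(p_draws, probs = c(0.025, 0.25, 0.5, 0.75, 0.975))

print(round(q, 3))
```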

Monitor convergence

    iplot_lb(mod)
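
The printed summaries report the number of iterations, a tolerance, and whether the convergence criterion was met. A stopping rule of that kind can be sketched as follows: iterate until the change in the lower bound falls below `stop.crit`, or until `maxit` iterations have run. The function `run_until_converged` and the toy sequence `toy_lb` are hypothetical and stand in for a real variational update. The package's actual criterion may be defined differently.

```r
# Hypothetical helper: iterate until the lower bound changes by less than stop.crit
run_until_converged <- function(lb_fun, stop.crit = 1e-5, maxit = 100) {
  lb <- numeric(maxit)
  converged <- FALSE
  for (t in seq_len(maxit)) {
    lb[t] <- lb_fun(t)  # stand-in for one update plus lower bound evaluation
    if (t > 1 && abs(lb[t] - lb[t - 1]) < stop.crit) {
      converged <- TRUE
      break
    }
  }
  list(lb = lb[seq_len(t)], niter = t, converged = converged)
}

# Toy lower bound that increases monotonically towards -140
toy_lb <- function(t) -140 - 10 * 0.5^t

res   <- run_until_converged(toy_lb)              # default tolerance and maxit
res10 <- run_until_converged(toy_lb, maxit = 10)  # capped early, as with maxit = 10

print(c(niter = res$niter, converged = res$converged))
print(c(niter = res10$niter, converged = res10$converged))
```

Plotting `res$lb` against the iteration number gives the same kind of picture as a lower bound trace: a curve that rises and then flattens once the updates stop improving the bound.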

Plot of training error over time

    iplot_error(mod)

Plot of fitted probabilities

    iplot_fitted(mod)


Copyright (C) 2017 Haziq Jamil.