Project author: atecon

Project description:
Forward stagewise sparse regression estimation implemented for gretl.
Language: Python
Project URL: git://github.com/atecon/fsboost.git
Created: 2019-04-05T16:32:26Z
Project community: https://github.com/atecon/fsboost

License: GNU General Public License v3.0


The gretl fsboost package

Package for computing forward-stagewise shrinkage and selection regressions.

Introduction

So-called shrinkage and/or selection estimators such as Ridge or Lasso, among others, are known to handle such issues by imposing an additional restriction on an otherwise ordinary least squares setting. An alternative estimation approach is the so-called forward-stagewise regression approach (fsboost henceforth).
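
To make the "additional restriction" concrete, the textbook constrained forms of the two estimators can be written as follows. This is standard notation and not taken from the package documentation: y is the response vector, X the regressor matrix, and t ≥ 0 is a tuning parameter introduced here for illustration.

```latex
\hat{\beta} = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2
\quad \text{subject to} \quad
\begin{cases}
\lVert \beta \rVert_2^2 \le t & \text{(Ridge)} \\
\lVert \beta \rVert_1 \le t & \text{(Lasso)}
\end{cases}
```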

fsboost is a simple strategy for constructing a sequence of sparse regression estimates: Initially set all coefficients to zero, and iteratively update the coefficient (by a small amount, depending on the learning rate) of the variable that achieves (under quadratic loss) the maximal absolute correlation with the current residual.
Learning from the residuals has some connection to an approach known as boosting in the machine-learning community.
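
The update rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used by the package: the function names `standardize` and `forward_stagewise` are hypothetical, the regressors are centered and scaled to unit norm (so that the inner product with the residual is proportional to the correlation), and the intercept is handled by centering the response.

```python
import math


def standardize(column):
    """Center a column and scale it to unit Euclidean norm.

    Assumes the column is not constant (otherwise the norm is zero).
    """
    mean = sum(column) / len(column)
    centered = [v - mean for v in column]
    norm = math.sqrt(sum(v * v for v in centered))
    return [v / norm for v in centered]


def forward_stagewise(columns, y, learning_rate=0.01, n_iter=1000):
    """Return coefficients on the standardized columns (illustrative sketch)."""
    X = [standardize(col) for col in columns]
    y_mean = sum(y) / len(y)
    residual = [v - y_mean for v in y]   # start from the centered response
    beta = [0.0] * len(X)                # all coefficients start at zero

    for _ in range(n_iter):
        # Inner products with the current residual; for centered unit-norm
        # columns these are proportional to the correlations.
        inner = [sum(a * r for a, r in zip(x, residual)) for x in X]
        j = max(range(len(X)), key=lambda k: abs(inner[k]))
        if inner[j] == 0.0:
            break                        # residual is orthogonal to all columns
        step = math.copysign(learning_rate, inner[j])
        beta[j] += step                  # small update of the selected coefficient
        residual = [r - step * a for r, a in zip(residual, X[j])]
    return beta


# Toy data (made up for illustration): y depends on x1 only.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0]
y = [2.0 * v for v in x1]
beta = forward_stagewise([x1, x2], y, learning_rate=0.01, n_iter=2000)
print(beta)
```

In this toy example only the first coefficient moves away from zero, because x2 is uncorrelated with the response. Stopping after fewer iterations yields a more strongly shrunken estimate, which is the sense in which the number of iterations and the learning rate act as tuning parameters.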

References

  • Hastie, T., Taylor, J., Tibshirani, R. and Walther, G. (2007): “Forward stagewise regression and the monotone lasso”, Electronic Journal of Statistics, Vol. 1, 1-29.
  • Tibshirani, R. (2015): “A General Framework for Fast Stagewise Algorithms”, Journal of Machine Learning Research, Vol. 16, 2543-2588.

Features

  • Support for linear regression.
  • Simple API.
  • Plot coefficient paths.
  • GUI access through the gretl menu.

Detailed help file

A detailed help file can be found here: https://github.com/atecon/fsboost/blob/master/docs/fsboost.pdf

Installation and usage

Get the package from the gretl package server and install it:

    pkg install fsboost

GUI interface

Once the package is installed, the GUI can be accessed via the “Model -> Other linear models -> Forward Stagewise” menu. The dialog looks like this:

[Screenshot: fsboost GUI dialog]

Simple scripting example

Here is a sample script on how to use it (see also: https://raw.githubusercontent.com/atecon/fsboost/master/src/fsboost_sample.inp):

First, load the package and open the well-known cross-sectional data set mroz87.gdt (753 observations). In this example, we ‘model’ the hourly wages of women, WW, by means of 17 features (exogenous variables). The fsreg() function runs the linear regression computation. All relevant output is stored in the returned bundle (a kind of struct data type), named B here:

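    clear
    set verbose off
    include fsboost.gfn
    open mroz87.gdt --quiet          # load the example data set

    list RHS = const dataset         # constant plus all series in the data set
    RHS -= WW                        # drop the endogenous variable
    bundle opts = null               # empty options bundle (assumed to select the defaults)
    bundle B = fsreg(WW, RHS, opts)  # run estimation
    print B                          # print content of the returned bundle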

On this data set, estimation takes less than half a second.

Once the computation is finished, the user can print the summary results by means of the print_fsboost_results() function:

    print_fsboost_results(B)

The printed table looks like this:

    Forward-stagewise regression results (no inference)
    -------------------------------------------------------
                 coefficient    std. error     z    p-value
    ---------------------------------------------------
    const        -1.24238           NA        NA      NA
    LFP           2.60587           NA        NA      NA
    WHRS         -0.000284255       NA        NA      NA
    WE            0.132787          NA        NA      NA
    RPWG          0.494871          NA        NA      NA
    FAMINC        1.02652e-05       NA        NA      NA
    MTR          -0.644518          NA        NA      NA

    Learning rate        = 0.0002
    Number of iterations = 4964
    Correl. w. residuals = -0.0578633
    S.E. of regression   = 2.18792
    R-squared            = 0.544504
    R-squared alt.       = 0.547703

The active set (the variables with non-zero coefficients) can be retrieved as a list from the resulting model bundle:

    list X_final = B.X_final    # Retrieve list of selected regressors
    eval varnames(X_final)      # Print names of selected regressors
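
Conceptually, the active set is just the subset of regressors whose coefficient ended up different from zero. The following Python sketch illustrates that idea only; the helper `active_set` is hypothetical and not part of the package, and the two zero entries (for KL6 and WA) are made-up values added for illustration. The non-zero values are copied from the table printed above.

```python
def active_set(names, coefficients, tol=0.0):
    """Return the names whose coefficient is non-zero (in absolute value > tol)."""
    return [name for name, b in zip(names, coefficients) if abs(b) > tol]


# Non-zero values taken from the printed results; the zeros are illustrative.
names = ["const", "LFP", "WHRS", "KL6", "WA", "WE", "RPWG", "FAMINC", "MTR"]
coefficients = [-1.24238, 2.60587, -0.000284255, 0.0, 0.0,
                0.132787, 0.494871, 1.02652e-05, -0.644518]

selected = active_set(names, coefficients)
print(selected)
```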

The estimated coefficient paths can be plotted through the function plot_coefficient_paths():

    plot_coefficient_paths(B)

The resulting plot looks similar to the following one:

[Plot: estimated coefficient paths]
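
A coefficient path is simply the sequence of coefficient vectors recorded after every update. The Python sketch below shows how such a path could be collected; it is an illustration under assumptions, not the package's implementation. The function name `stagewise_path` is hypothetical, the regressor columns are assumed to be already centered with unit norm, and the response is assumed to be centered.

```python
def stagewise_path(X, y, learning_rate, n_iter):
    """Record the coefficient vector after every forward-stagewise update.

    X: list of centered, unit-norm columns; y: centered response.
    Returns a list of n_iter + 1 coefficient vectors (the first is all zeros).
    """
    beta = [0.0] * len(X)
    residual = list(y)
    path = [list(beta)]
    for _ in range(n_iter):
        inner = [sum(a * r for a, r in zip(x, residual)) for x in X]
        j = max(range(len(X)), key=lambda k: abs(inner[k]))
        step = learning_rate if inner[j] > 0 else -learning_rate
        beta[j] += step
        residual = [r - step * a for r, a in zip(residual, X[j])]
        path.append(list(beta))
    return path


# Toy data (made up): two orthonormal columns and y = 3 * x1 + 1 * x2.
x1 = [0.5, 0.5, -0.5, -0.5]
x2 = [0.5, -0.5, 0.5, -0.5]
y = [2.0, 1.0, -1.0, -2.0]

path = stagewise_path([x1, x2], y, learning_rate=0.5, n_iter=8)
for iteration, beta in enumerate(path):
    print(iteration, beta)
```

Plotting each coefficient against the iteration number gives a picture of the same kind as the one produced by plot_coefficient_paths(): coefficients enter one after another and grow in small steps.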

For more details on the information stored in the bundle, see the PDF help file.

Unit-Tests

The gretl script containing the unit tests can be found at “./tests/run_tests.inp”. Coverage is already fairly high (probably > 75%). The script can be executed through the shell script “./run_tests.sh”.

Changelog

v0.1, September 2020

  • initial release