Project author: KISysBio

Project description:
Mixed Integer Piecewise Regression Algorithm with Regularisation
Language: Python
Project URL: git://github.com/KISysBio/OPLRAreg.git
Created: 2018-02-16T16:37:46Z
Project community: https://github.com/KISysBio/OPLRAreg

License: GNU General Public License v3.0


OPLRAreg

Mixed Integer Piecewise Linear Regression Algorithm with Regularisation

OPLRAreg is a regression technique based on mathematical programming that splits data into separate regions and
identifies independent linear equations for each region.
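To make the idea of "one linear equation per region" concrete, here is a minimal sketch in plain Python. It is not the OPLRAreg algorithm: OPLRAreg chooses the partition through mathematical programming, whereas this illustration assumes a single feature and a breakpoint that is already known, and fits each side by ordinary least squares. The function names and the synthetic data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_two_regions(xs, ys, breakpoint):
    """Split the samples at a fixed breakpoint and fit one line per region."""
    low = [(x, y) for x, y in zip(xs, ys) if x <= breakpoint]
    high = [(x, y) for x, y in zip(xs, ys) if x > breakpoint]
    return {"low": fit_line(*zip(*low)), "high": fit_line(*zip(*high))}


# Synthetic data (assumed for illustration):
# y = 2x + 1 for x <= 5, and y = -x + 16 for x > 5.
xs = [i * 0.25 for i in range(41)]
ys = [2.0 * x + 1.0 if x <= 5.0 else -x + 16.0 for x in xs]

models = fit_two_regions(xs, ys, breakpoint=5.0)
for region, (slope, intercept) in models.items():
    print(f"{region}: y = {slope:.2f} * x + {intercept:.2f}")
```

Each region gets its own slope and intercept, which is the shape of the model OPLRAreg produces; the hard part that OPLRAreg solves is deciding where the regions begin and end.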

This repository contains only the regression algorithm and is meant for general-purpose use. For its application to Quantitative Structure-Activity Relationship models, please refer to: https://github.com/KISysBio/qsar-models

Requirements

To run oplrareg, you will need:

Installing

The easiest way to install oplrareg is with pip:

  pip3 install oplra_reg

Alternatively, download a release and, from within the extracted directory, run:

  pip3 install -e .

This installs the oplrareg package along with its dependencies and also creates the oplrareg command,
which can be used to run the OPLRAreg algorithm.

Dependencies

The following packages will be automatically installed with oplrareg:

  - Pyomo 5.3
  - scikit-learn 0.19.0
  - scipy 0.19.1
  - numpy 1.13.1
  - pandas 0.20.3

Running OPLRAreg

The command oplrareg runs OplraRegularised on the provided data. It accepts tab- and other space-delimited files
(.tab, .data, .dat), CSV files (.csv) and Excel spreadsheets (.xls and .xlsx).

All columns must be numeric, and the outcome variable to be predicted
must be the last column in the data.
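The two input requirements can be checked before calling oplrareg. The following is a minimal sketch using only the standard library; the helper name prepare_rows and the column names (activity, weight, logp) are hypothetical and not part of the package. It moves the chosen outcome column to the end and fails if any cell is not numeric.

```python
import csv
import io


def prepare_rows(rows, target):
    """Move the target column last and check that every value parses as a number."""
    header, *records = rows
    if target not in header:
        raise ValueError(f"unknown column: {target}")
    # Keep the other columns in their original order, then append the target.
    order = [i for i, name in enumerate(header) if name != target]
    order.append(header.index(target))
    out = [[header[i] for i in order]]
    for record in records:
        for value in record:
            float(value)  # raises ValueError on a non-numeric cell
        out.append([record[i] for i in order])
    return out


# Hypothetical input in which the outcome column comes first.
raw = "activity,weight,logp\n1.5,300,2.1\n0.7,250,3.4\n"
rows = list(csv.reader(io.StringIO(raw)))
prepared = prepare_rows(rows, target="activity")

# Write the reordered table; a real script would write to a .csv file instead.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(prepared)
print(buffer.getvalue())
```

With real data, the reordered rows would be written to a .csv file and that file passed to oplrareg.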

Run it with:

  oplrareg --input <filename>

or

  oplrareg -i <filename>

This will execute OPLRAreg with default parameters:

  • lambda = 0.005 (Regularisation parameter. With lambda = 0, no regularisation is used; recommended values are between 0.001 and 0.2)
  • beta = 0.03 (Stopping criterion; lower values will slow down the algorithm)
  • epsilon = 0.01 (Interval between regions; lower values will slow down the algorithm)
  • solver = glpk (If you have a commercial license for CPLEX, use ‘cplex’ instead)

To test different parameters, pass them as arguments to the oplrareg command:

  oplrareg --input <filename> --lambda 0.005 --beta 0.03 --epsilon 0.01 --solver glpk

Advanced options

It is possible to control which feature will be used to partition the data.
Simply pass the command line argument --partition_feature along with the desired column name:

Example:

  oplrareg --input yacht_hydrodynamics.data --partition_feature beam_draught_ratio

By default, OPLRAreg determines the best number of regions to fit the data but, if desired, you can set
the number of regions with the parameter --regions:

  oplrareg --input yacht_hydrodynamics.data --partition_feature beam_draught_ratio --regions 2
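
When trying several parameter combinations, it can help to assemble the command line from a script. The sketch below builds an argument list from the flags documented in this README (--input, --lambda, --beta, --epsilon, --solver, --partition_feature, --regions). The helper build_command is hypothetical and not part of oplrareg, and the snippet only prints the command; it does not run it.

```python
import shlex


def build_command(input_file, lam=None, beta=None, epsilon=None,
                  solver=None, partition_feature=None, regions=None):
    """Assemble an oplrareg argument list; options left as None are omitted."""
    command = ["oplrareg", "--input", input_file]
    options = [
        ("--lambda", lam),
        ("--beta", beta),
        ("--epsilon", epsilon),
        ("--solver", solver),
        ("--partition_feature", partition_feature),
        ("--regions", regions),
    ]
    for flag, value in options:
        if value is not None:
            command += [flag, str(value)]
    return command


command = build_command(
    "yacht_hydrodynamics.data",
    partition_feature="beam_draught_ratio",
    regions=2,
)
print(shlex.join(command))

# To execute it, oplrareg must be installed and on the PATH:
#   import subprocess
#   subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems if a file or column name contains spaces.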