项目作者: shaunfg

项目描述 :
Improve local search of Optimal Classification Trees by implementing parallelization to expand the node search radius (Julia)
高级语言: Jupyter Notebook
项目地址: git://github.com/shaunfg/parallel-node-search.git
创建时间: 2020-11-18T15:19:54Z
项目社区:https://github.com/shaunfg/parallel-node-search

开源协议:

下载


Parallel Local Search on Optimal Classification Trees

(Shaun Fendi Gan, Arkira Tanglertsumpun, MIT Master of Business Analytics, Dec 2020)

The local-search heuristic was introduced as an attempt to remedy the limitations on the scalability of Bertsimas and Dunn’s Optimal Classification Tree mixed-integer optimization formulation. This repo examined the effectiveness of this heuristic as well as introducing four parallelized versions that could improve heuristic optimality.

Key Results

  • The Half-Split method was found to be the most scalable and time efficient, reducing run time by 30%.
  • Standard Multi-threading on random restarts did not have a significant improvement (<5%),
  • Deep Subtree Search Class Assignments were found to only benefit on datasets with many features (m > 15).

How to Use

Open benchmark.jl to find all functions used to test and benchmark different local search. Original local search can be found as the function LocalSearch in local_search.jl The MIO formulation implement in JuMP/ Gurobi can also be found in oct.jl

Files

  • local_search.jl- Main file containing serially optimized functions to run the local search heuristic
  • tree.jl - Serially optimized functions to create tree structures
  • local_search_z.jl - Contains function for Local Search with parallel Observation assignments
  • local_search_half.jl - Contains function for Local Search with parallel half splits
  • local_search_deep.jl - Contains function for Local Search for a deep subtree search
  • oct.jl - Contains the JuMP MIO Formulation for an Optimal Classification Tree

Testing Files

  • model_evaluation.jl - Contains function for calculating accuracy of a tree’s predictions
  • benchmark.jl - Contains code used to benchmark performance of different Local Search Functions
  • unit_test.jl - Contains function test_tree to verify Tree struct output from Local Search
  • testing OCT.ipynb - Contains IntepretableAI Optimal Classification Tree structure, used as reference
  • onethread_oct.ipynb - Contains MIO OCT function used to test on one thread only (Gurobi Default != 1)

Source: [Dimitris Bertsimas, Jack Dunn.Machine Learning under a ModernOptimization Lens. Dynamic Ideas LLC, Massachusetts, 2019]

Report

Illustration
Full Report