Light curve classification prototyping
Currently focusing on broad classification of astronomical objects. Project in
hibernation as of May 2018. Like a bear.
Set the LCML environment variable to the path of your repo checkout:
export LCML=/Users/*/code/light_curve_ml
Install the package in editable mode:
cd $LCML && pip install -e . --user
For Ubuntu, see the instructions in conf/dev/ubuntu_install.txt.
Supervised and unsupervised machine learning pipelines are run via the run_pipeline.py
entry point. It expects the path to a job (config) file and a file name for logger
output. For example:
python3 lcml/pipeline/run_pipeline.py --path conf/local/supervised/macho.json --logFileName super_macho.log
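The command-line interface above can be pictured with Python's standard argparse module. The sketch below is hypothetical and is not the actual contents of run_pipeline.py: only the flag names --path and --logFileName come from the example, while the helper names parse_args and load_job are assumptions, and the job file is assumed to be plain JSON (as its .json extension suggests).

```python
import argparse
import json


def parse_args(argv=None):
    """Parse the two flags from the example command (hypothetical sketch)."""
    parser = argparse.ArgumentParser(description="Run an LCML pipeline job (sketch)")
    parser.add_argument("--path", required=True,
                        help="job (config) file, e.g. conf/local/supervised/macho.json")
    parser.add_argument("--logFileName", required=True,
                        help="file name for logger output, e.g. super_macho.log")
    # argv=None makes argparse read sys.argv[1:]; passing a list is handy for testing
    return parser.parse_args(argv)


def load_job(path):
    """Read a JSON job file into a dict (hypothetical helper)."""
    with open(path) as f:
        return json.load(f)
```

A script built on this sketch would call args = parse_args() followed by job = load_job(args.path). Note that argparse keeps the camelCase flag name as the attribute, so the log file name is available as args.logFileName.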
The pipeline expects a job file (macho.json in the example above) that specifies the
configuration of the pipeline and declares the experiment parameters in detail.
The job file supersedes the default job file, conf/common/pipeline.json, recursively
on a per-field basis. Any subset of the default fields, including none, may be
overridden, and every field the job file does not set keeps its default value.
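Under one plausible reading of these override rules, combining a job file with the defaults behaves like the recursive dictionary merge sketched below. This is an assumption for illustration, not the repo's actual implementation. Nested objects are merged key by key, and any other value in the job file replaces the default. The function name deep_merge and all values are hypothetical. Only the key names (globalParams, loadData, skip, params, writeTable) follow the job file layout described in this README.

```python
import copy


def deep_merge(defaults, overrides):
    """Return a new dict: `defaults` recursively overridden by `overrides`.

    Hypothetical sketch: nested dicts are merged key by key; any other
    value (number, string, list, ...) in `overrides` replaces the default.
    Neither input is mutated.
    """
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical values; only the key names mirror the job file layout.
default_job = {
    "globalParams": {"exampleParam": 1},
    "loadData": {"skip": False, "params": {"exampleLimit": 100},
                 "writeTable": "example_raw_table"},
}
job_file = {"loadData": {"skip": True, "params": {"exampleLimit": 10}}}

job = deep_merge(default_job, job_file)
# Only loadData.skip and loadData.params.exampleLimit change;
# globalParams and loadData.writeTable keep their default values.
```

Replacing non-dict values wholesale, rather than merging lists element by element, is the simplest rule consistent with "per field" overriding. The real pipeline may treat lists differently.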
Job files have the following structure:
- globalParams - Parameters used across multiple pipeline stages
- database - All db config and table names
- loadData - Stage converting raw data into coherent light curves
- preprocessData - Stage cleaning and preprocessing light curves
- extractFeatures - Stage extracting features from cleaned light curves
- postprocessFeatures - Stage further processing extracted features
- modelSearch - Stage testing several ML models with differing hyperparameters
  - function - search function name
  - model - ML model spec, including non-searched parameters
  - params - parameters controlling the model search
- serialization - Stage persisting ML model and metadata to disk

Pipeline ‘stages’ are customizable processors. Each stage definition has the
following components:
- skip - Boolean determining whether the stage should execute
- params - stage-specific parameters
- writeTable - name of the db table to which output is written

Some representative job files provided in this repo include:
- local/supervised/fast_macho.json - Runs a tiny portion of the MACHO dataset
- local/supervised/macho.json - Full supervised learning pipeline for MACHO, using the
  feets library for feature extraction and random forests for classification
- local/supervised/ogle3.json - Ditto for OGLE3
- local/unsupervised/macho.json - Unsupervised learning pipeline for MACHO

Other modules of note:
- lcml.data.acquisistion - Scripts used to acquire and/or process various datasets
- lcml.poc - One-off proof-of-concept scripts for various libraries

The LoggingManager class allows for convenient customization of Python Logger
objects. The default logging config is specified in conf/common/logging.json.
This config should contain the following main keys:
- basicConfig - values passed to logging.basicConfig
- handlers - handler definitions, each with a type attribute that may be stream or file
- modules - list of module-specific logger level settings

Main modules should initialize the manager by invoking LoggingManager.initLogging
at the start of execution, before any logger objects have been created.
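The sketch below shows one way a config with these three keys could be applied using only Python's standard logging module. It is hypothetical. It does not reproduce LoggingManager.initLogging, and the real conf/common/logging.json may be shaped differently. The function name init_logging, the list shape of handlers and modules, the per-handler level and filename options, and the log_file_name argument (mirroring the pipeline's --logFileName flag) are all assumptions.

```python
import logging
import sys

# Hypothetical config mirroring the three documented keys; all values are made up.
EXAMPLE_CONFIG = {
    "basicConfig": {"level": "INFO",
                    "format": "%(asctime)s %(name)s %(levelname)s: %(message)s"},
    "handlers": [
        {"type": "stream", "level": "INFO"},
        {"type": "file", "level": "DEBUG", "filename": "example.log"},
    ],
    "modules": [{"name": "example.noisy.module", "level": "WARNING"}],
}


def init_logging(config, log_file_name=None):
    """Apply a LoggingManager-style config (hypothetical sketch, stdlib only).

    Call this once at startup, before other code creates its loggers.
    """
    handlers = []
    for spec in config.get("handlers", []):
        if spec["type"] == "stream":
            handler = logging.StreamHandler(sys.stdout)
        elif spec["type"] == "file":
            # A file name passed at run time takes precedence over the config's
            handler = logging.FileHandler(log_file_name or spec["filename"])
        else:
            raise ValueError(f"unknown handler type: {spec['type']!r}")
        handler.setLevel(spec.get("level", "NOTSET"))
        handlers.append(handler)
    # basicConfig attaches the handlers to the root logger and gives each one the
    # configured format; force=True (Python 3.8+) removes previously attached handlers.
    logging.basicConfig(handlers=handlers, force=True, **config.get("basicConfig", {}))
    # Per-module level overrides, e.g. to quiet a chatty library
    for module in config.get("modules", []):
        logging.getLogger(module["name"]).setLevel(module["level"])
```

Running this before any module-level loggers emit messages matches the advice above. Handlers end up on the root logger, so loggers created later via logging.getLogger(__name__) inherit them through propagation.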