Project author: pnb

Project description:
Unsupervised Deep Autoencoders for Feature Extraction with Educational Data
Language: Python
Repository: git://github.com/pnb/dlwed17.git
Created: 2017-06-27T15:32:08Z
Project community: https://github.com/pnb/dlwed17

License: MIT License


Unsupervised Deep Autoencoders for Feature Extraction with Educational Data

This repository contains the code for the paper (see bosch-dlwed17-camera.pdf) presented at the
Deep Learning with Educational Data workshop at the
2017 Educational Data Mining conference.

Citation

Bosch, N., & Paquette, L. (2017). Unsupervised deep autoencoders for feature extraction with educational data. In Deep Learning with Educational Data Workshop at the 10th International Conference on Educational Data Mining.

Requirements

The code was tested with the Keras 2.0.3 and TensorFlow 1.1.0 neural network libraries.

The data come from Betty’s Brain. They are required for the code to run but are not publicly
available. However, the code could be adapted to another dataset relatively easily.

Model-building steps

Model building generally consists of data preprocessing, autoencoder feature extraction, and
supervised learning phases.

Data preprocessing

  1. preprocess_bromp.py - takes raw BROMP files created by the HART application and combines them
    into an easy-to-use format
  2. preprocess_timeseries.py - creates timeseries (evenly spaced in time) data from Betty’s Brain
    interaction logs
  3. preprocess_seq.py - creates sequences suitable for training RNN models from the timeseries
    data; sequences are saved to numpy binary files for faster loading later
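
The last two preprocessing steps can be illustrated with a minimal sketch. It is not the code from
preprocess_timeseries.py or preprocess_seq.py: the function names, the event list, and the choice
of counting events per interval are all assumptions made for illustration, using only the Python
standard library.

```python
def resample_counts(event_times, start, stop, step):
    """Count events falling into each evenly spaced interval [t, t + step)."""
    n_bins = int((stop - start) // step)
    counts = [0] * n_bins
    for t in event_times:
        idx = int((t - start) // step)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts


def sliding_windows(series, length, stride=1):
    """Cut a series into fixed-length, possibly overlapping sequences."""
    return [series[i:i + length]
            for i in range(0, len(series) - length + 1, stride)]


# Hypothetical log: seconds at which a student performed some action.
event_times = [0.5, 1.2, 1.9, 4.0, 7.3, 7.4, 9.9]

# Evenly spaced timeseries with one value per 2-second interval.
series = resample_counts(event_times, start=0, stop=10, step=2)

# Fixed-length sequences of 3 timesteps each, suitable as RNN input.
sequences = sliding_windows(series, length=3, stride=1)
print(series)     # [3, 0, 1, 2, 1]
print(sequences)  # [[3, 0, 1], [0, 1, 2], [1, 2, 1]]
```

To cache sequences like these in numpy binary files, as the repository does, `numpy.save` and
`numpy.load` would be the usual choice.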

Autoencoder feature extraction

  1. ae_lstm.py - this and similar files (e.g., vae_lstm.py) train the autoencoders
  2. extract_embeddings.py - takes a trained model, feeds in data sequences, and saves the
    embeddings generated by the model to be used as features for supervised models
  3. align_embeddings+labels.py - matches up BROMP affect/behavior labels to the embeddings
    extracted from a model, saving only the rows with labels to create a file with features and labels
    which can be used for supervised learning
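
The alignment step can be sketched as a keyed join that keeps only labeled rows. This is a rough
illustration rather than the logic of align_embeddings+labels.py: the key layout (student ID plus
timestep index), the `align` function, the 20-second step, and the example labels are all
hypothetical.

```python
def align(embeddings, labels, step):
    """Keep only the embedding rows that have a label and attach the label.

    embeddings: dict mapping (student_id, timestep_index) -> list of features
    labels: list of (student_id, seconds_since_start, label) observations
    step: width of one timestep in seconds
    """
    rows = []
    for student, seconds, label in labels:
        key = (student, int(seconds // step))
        if key in embeddings:  # embeddings without a label are never emitted
            rows.append(embeddings[key] + [label])
    return rows


# Hypothetical embeddings, two features per timestep.
embeddings = {
    ("s1", 0): [0.1, 0.9],
    ("s1", 1): [0.4, 0.2],
    ("s2", 0): [0.7, 0.3],
}

# Hypothetical observations; "s3" has no embeddings and is skipped.
labels = [("s1", 25.0, "engaged"), ("s2", 3.0, "bored"), ("s3", 1.0, "confused")]

rows = align(embeddings, labels, step=20)
print(rows)  # [[0.4, 0.2, 'engaged'], [0.7, 0.3, 'bored']]
```

Each resulting row holds the features followed by the label, which is the shape a supervised
learner expects.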

Supervised learning

  1. supervised/ae_feats_test.py - trains a decision tree (CART) model with the autoencoder features
  2. supervised/expert_feats_extract.py - extracts some simple features with the traditional method
    (manual design by experts) of feature extraction for model building
  3. supervised/expert_feats_test.py - builds a model using the expert features to serve as a
    baseline

Visualization

visualize_activations.py generates images of model activations by feeding a random subset of
samples to a trained autoencoder and creating histograms of the activations of every layer in the
network. For layers with many neurons (more than 15), a subset of neurons is sampled to keep the
image tractable.
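
The neuron subsampling and per-neuron histograms described above could look roughly like the
following sketch. It does not reproduce visualize_activations.py: the helper names, the random
activations, the bin count, and the value range are assumptions, and plotting is left out so the
example needs only the standard library.

```python
import random

MAX_NEURONS = 15  # layers wider than this are subsampled


def pick_neurons(n_neurons, max_neurons=MAX_NEURONS, seed=0):
    """Return sorted indices of the neurons to plot for one layer."""
    if n_neurons <= max_neurons:
        return list(range(n_neurons))
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_neurons), max_neurons))


def histogram(values, n_bins, lo, hi):
    """Count values per equal-width bin over [lo, hi]; hi lands in the last bin."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        if lo <= v <= hi:
            idx = min(int((v - lo) / width), n_bins - 1)
            counts[idx] += 1
    return counts


# Hypothetical activations[sample][neuron] for a 64-unit layer and 100 samples.
rng = random.Random(1)
activations = [[rng.uniform(-1, 1) for _ in range(64)] for _ in range(100)]

chosen = pick_neurons(64)
hists = {n: histogram([row[n] for row in activations], 10, -1.0, 1.0)
         for n in chosen}
```

Fixing the seed keeps the same neurons selected across runs, which makes images from different
training checkpoints easier to compare.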

The model structure is also visualized (requires the pydot package).