Multidimensional Scaling using Cliques (MDS-Clique)
Generates cliques using MDS-Clique from topics extracted by LDA from a corpus.
Uses Python 3 and various Python libraries (gensim, networkx, scikit-learn, etc.)
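1. Create a virtual environment:
       virtualenv -p python3 venv
2. Activate it (you should now see something like (venv) in your console):
       source venv/bin/activate
3. Install the requirements:
       pip install -r requirements.txt
4. For the v0.19.0 source build, run the build and then the install (this may take a while):
       python setup.py build
       python setup.py install
5. Fill out config.ini (see the config.ini section).
6. Pre-process the corpus:
       python corpus.py
7. Extract the topics:
       python gen_topic.py
8. Run MDS-Clique:
       python sim.py -dim 2 -data corpus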
Run corpus.py with config.ini filled out (see the config.ini section). It reads a corpus (a directory of text documents) and pre-processes it (e.g. stemming and tokenization). Then execute gen_topic.py, which uses the output artifacts of corpus.py to perform LDA topic modeling over the pre-processed corpus. Finally, execute sim.py, which will either perform a specific experiment or simply execute MDS-Clique (see the sections below).
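The three steps can also be chained from a small driver script. The following is a minimal sketch, not part of the repository: the helper names pipeline_commands and run_pipeline are hypothetical, and the only arguments used are the ones shown above (-dim and -data). It assumes the script is run from the repository root inside the virtualenv.

```python
import subprocess
import sys


def pipeline_commands(dim=2, data="corpus"):
    """Return the three workflow commands in the order they must run."""
    py = sys.executable  # interpreter of the currently active virtualenv
    return [
        [py, "corpus.py"],                                # pre-process the corpus
        [py, "gen_topic.py"],                             # LDA topic modeling
        [py, "sim.py", "-dim", str(dim), "-data", data],  # MDS-Clique
    ]


def run_pipeline(dim=2, data="corpus"):
    """Run each step in turn and stop at the first one that fails."""
    for cmd in pipeline_commands(dim, data):
        print("running:", " ".join(cmd))
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Calling run_pipeline() reproduces the manual sequence; check=True ensures gen_topic.py is never run against missing corpus.py artifacts.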
sim.py
Execute python sim.py --help (make sure you are in your virtualenv) and go through the commands.
sim.py
Generate a random pre-computed dissimilarity matrix and run MDS-Clique using the standard deviation measure (named stress); cliques are written to out/cliques_<num>:
    python sim.py -dim 2 -data random --matrix -clique stress
Use the extracted LDA topics and run MDS-Clique using the distance measure; cliques are written to out/cliques_<num>. Note that -clusters <num> needs to be higher than the number of topics extracted, or an error will be thrown:
    python sim.py -dim 2 -data corpus -clusters 3 -clique distance
Run the RMDS experiment; set -data none, since each sample generates its own random data set:
    python sim.py -dim 2 -data none --matrix -clique stress --rmds
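For readers unfamiliar with the term, a pre-computed dissimilarity matrix is a square, symmetric matrix of non-negative values with zeros on the diagonal. The sketch below shows one way to build a random one with the standard library only. How sim.py actually generates its matrix is not documented here, so the function name, the uniform [0, 1) distribution, and the seed parameter are all assumptions made for illustration.

```python
import random


def random_dissimilarity_matrix(n, seed=None):
    """Build an n x n symmetric matrix with a zero diagonal.

    Off-diagonal entries are drawn uniformly from [0, 1); this
    distribution is an assumption, not necessarily what sim.py uses.
    """
    rng = random.Random(seed)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = rng.random()
            matrix[i][j] = d  # fill the upper triangle
            matrix[j][i] = d  # mirror it to keep the matrix symmetric
    return matrix
```

A matrix of this shape is the kind of input that MDS implementations accepting pre-computed dissimilarities expect, for example scikit-learn's MDS with dissimilarity="precomputed".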
Each experiment is selected with a flag --<experiment_codename>. By default an experiment runs 8 samples and uses 1/4 of the max cores available on the system. You can specify the number of samples manually with -e <num_samples> and the number of cores with -c <num_cores>.
Relative MDS experiment (k-values are hard-coded):
    python sim.py -dim 2 -data random --matrix --relative
MDS-Clique RMDS experiment:
    python sim.py -dim 2 -data none --matrix --rmds
MDS-Clique experiment:
    python sim.py -dim 2 -data none --matrix --rclique
Relative Online experiment:
    python sim.py -dim 2 -data none --matrix --relativeonline
Online Clique experiment:
    python sim.py -dim 2 -data none --matrix -clique stress --onlineclique
Online experiment:
    python sim.py -dim 2 -data none --matrix --online
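To estimate how many cores an experiment will use when -c is not given, the "1/4 of the max cores" default can be approximated as below. This is a sketch under assumptions: the function name default_cores is hypothetical, and the rounding (floor division with a minimum of one core) is a guess, since the exact rule used by sim.py is not stated.

```python
import os


def default_cores(total=None):
    """Return roughly a quarter of the available cores, never fewer than one.

    Floor division and the minimum of 1 are assumptions about the rounding.
    """
    if total is None:
        total = os.cpu_count() or 1  # cpu_count() can return None
    return max(1, total // 4)
```

On an 8-core machine this gives 2, which would match running with -c 2 explicitly.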
config.ini
corpus: directory of the corpus (text documents); used by corpus.py
mds_seed: sets the MDS random_state; used by sim.py
config.ini
[Global]
corpus = /path/to/sample-corpus/
mds_seed = 7
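A file in this format can be read with Python's standard configparser module. The sketch below parses the sample above from a string so that it is self-contained; the load_settings helper is hypothetical, and whether the project scripts read the file this way is an assumption.

```python
import configparser

# Same content as the sample config.ini above.
SAMPLE = """\
[Global]
corpus = /path/to/sample-corpus/
mds_seed = 7
"""


def load_settings(text):
    """Parse the [Global] section and return its two options."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["Global"]
    return {
        "corpus": section["corpus"],             # kept as a string path
        "mds_seed": section.getint("mds_seed"),  # converted to int
    }


settings = load_settings(SAMPLE)
print(settings)
```

To read the real file instead, parser.read("config.ini") can replace read_string. The integer seed is what an MDS random_state argument would typically receive.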
Run in interactive debug mode:
    ipython -i -c "%run sim.py -dim 2 -data corpus" --pdb
You may need to manually create the store, out/final, out/experiment, and out/ident directories.
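Creating these directories can be scripted. The following sketch uses only the directory names listed above; the ensure_dirs helper and its base parameter are hypothetical, and it assumes the directories live relative to the repository root.

```python
import os

# Directory names taken from the note above.
REQUIRED_DIRS = ["store", "out/final", "out/experiment", "out/ident"]


def ensure_dirs(base="."):
    """Create each required directory under base if it does not exist yet."""
    created = []
    for rel in REQUIRED_DIRS:
        path = os.path.join(base, *rel.split("/"))
        os.makedirs(path, exist_ok=True)  # no error if it already exists
        created.append(path)
    return created
```

Because exist_ok=True is set, calling ensure_dirs() repeatedly is safe and leaves existing output untouched.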