Project author: geniusai-research

Project description:
A module for E-mail Summarization which uses clustering of skip-thought sentence embeddings.
Language: Python
Repository: git://github.com/geniusai-research/email-summarization.git
Created: 2018-08-01T15:02:10Z
Project page: https://github.com/geniusai-research/email-summarization


email-summarization

A module for E-mail Summarization which uses clustering of skip-thought sentence embeddings.

The code in this repository complements this Medium article.
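To make the idea of clustering sentence embeddings concrete, here is a minimal sketch of cluster-based extractive summarization. It is an illustration under stated assumptions, not the repository's implementation: the 2-D vectors are hand-made stand-ins for skip-thought embeddings, and the helper names (kmeans, summarize_by_clustering) and the small pure-Python k-means are hypothetical.

```python
# Sketch only: toy 2-D vectors stand in for skip-thought embeddings, and this
# small k-means is illustrative, not the code shipped in the repository.
import math


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = float(len(vectors))
    return [sum(component) / n for component in zip(*vectors)]


def kmeans(vectors, k, iterations=20):
    """Plain k-means; the first k vectors seed the centroids (deterministic)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each vector to its nearest centroid.
        labels = [min(range(k), key=lambda c: distance(v, centroids[c]))
                  for v in vectors]
        # Move each centroid to the mean of its current members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = mean(members)
    return centroids, labels


def summarize_by_clustering(sentences, embeddings, k):
    """Keep, per cluster, the sentence nearest the centroid, in original order."""
    centroids, labels = kmeans(embeddings, k)
    chosen = []
    for c in range(k):
        members = [i for i, label in enumerate(labels) if label == c]
        if members:
            chosen.append(min(
                members, key=lambda i: distance(embeddings[i], centroids[c])))
    return [sentences[i] for i in sorted(chosen)]


sentences = [
    "Hi all, a quick note on the release.",
    "The release is moving to Friday.",
    "Friday is the new date, to be clear.",
    "On a separate topic, tickets need attention.",
    "Please update your tickets by Thursday.",
    "Stale tickets slow everyone down.",
]
# Hypothetical embeddings: two well-separated groups of three sentences each.
embeddings = [[0.0, 0.0], [1.0, 0.0], [4.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [10.0, 14.0]]

summary = summarize_by_clustering(sentences, embeddings, k=2)
print(summary)
```

With these toy vectors the sketch keeps one sentence per topic: "The release is moving to Friday." and "Please update your tickets by Thursday." In the real module the vectors would come from the skip-thoughts encoder rather than being written by hand.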

Instructions

  • The code is written in Python 2.
  • The module uses the code from the Skip-Thoughts paper, which can be found here. Do:
    1. git clone https://github.com/ryankiros/skip-thoughts
  • The code for the skip-thoughts paper uses Theano. Make sure Theano is installed and, for faster execution, that GPU acceleration is working.
  • Clone this repository and copy the file email_summarization.py to the root of the cloned skip-thoughts repository. Do:
    1. git clone https://github.com/jatana-research/email-summarization
    2. cp email-summarization/email_summarization.py skip-thoughts/
  • Install dependencies. Do:
    1. pip install -r email-summarization/requirements.txt
    2. python -c 'import nltk; nltk.download("punkt")'
  • Download the pre-trained models. The total download size is around 5 GB. Do:
    1. mkdir skip-thoughts/models
    2. wget -P ./skip-thoughts/models http://www.cs.toronto.edu/~rkiros/models/dictionary.txt
    3. wget -P ./skip-thoughts/models http://www.cs.toronto.edu/~rkiros/models/utable.npy
    4. wget -P ./skip-thoughts/models http://www.cs.toronto.edu/~rkiros/models/btable.npy
    5. wget -P ./skip-thoughts/models http://www.cs.toronto.edu/~rkiros/models/uni_skip.npz
    6. wget -P ./skip-thoughts/models http://www.cs.toronto.edu/~rkiros/models/uni_skip.npz.pkl
    7. wget -P ./skip-thoughts/models http://www.cs.toronto.edu/~rkiros/models/bi_skip.npz
    8. wget -P ./skip-thoughts/models http://www.cs.toronto.edu/~rkiros/models/bi_skip.npz.pkl
  • Verify the MD5 hashes of the downloaded files to ensure that the files haven’t been corrupted during the download. Do:
    1. md5sum skip-thoughts/models/*
    The output should be:
    1. 9a15429d694a0e035f9ee1efcb1406f3 bi_skip.npz
    2. c9b86840e1dedb05837735d8bf94cee2 bi_skip.npz.pkl
    3. 022b5b15f53a84c785e3153a2c383df6 btable.npy
    4. 26d8a3e6458500013723b380a4b4b55e dictionary.txt
    5. 8eb7c6948001740c3111d71a2fa446c1 uni_skip.npz
    6. e1a0ead377877ff3ea5388bb11cfe8d7 uni_skip.npz.pkl
    7. 5871cc62fc01b79788c79c219b175617 utable.npy
  • Change lines 23-24 of the file skip-thoughts/skipthoughts.py to provide the correct paths to the downloaded models:
    1. path_to_models = 'models/'
    2. path_to_tables = 'models/'
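
On systems without md5sum, the hash check from the instructions above could be done in Python instead. The following is a minimal sketch: the expected digests are copied from the list above, while the helper names md5_of_file and check_models are hypothetical and not part of either repository.

```python
# Sketch: recompute MD5 digests in Python and compare them with the published list.
# md5_of_file and check_models are hypothetical helper names.
import hashlib
import os

# Expected digests, copied from the md5sum output listed in the instructions.
EXPECTED_MD5 = {
    "bi_skip.npz": "9a15429d694a0e035f9ee1efcb1406f3",
    "bi_skip.npz.pkl": "c9b86840e1dedb05837735d8bf94cee2",
    "btable.npy": "022b5b15f53a84c785e3153a2c383df6",
    "dictionary.txt": "26d8a3e6458500013723b380a4b4b55e",
    "uni_skip.npz": "8eb7c6948001740c3111d71a2fa446c1",
    "uni_skip.npz.pkl": "e1a0ead377877ff3ea5388bb11cfe8d7",
    "utable.npy": "5871cc62fc01b79788c79c219b175617",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_models(directory, expected=EXPECTED_MD5):
    """Return a sorted list of file names that are missing or have a wrong digest."""
    problems = []
    for name, digest in expected.items():
        path = os.path.join(directory, name)
        if not os.path.isfile(path) or md5_of_file(path) != digest:
            problems.append(name)
    return sorted(problems)


if __name__ == "__main__":
    bad = check_models("skip-thoughts/models")
    print("all files OK" if not bad else "problem files: " + ", ".join(bad))
```

Reading in chunks matters here because several of the model files are large; any file reported as a problem should be downloaded again.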

Running the module

  • Find an English email dataset online or create a small one of your own.
  • The module expects a list of emails as input and returns a list of summaries.
  • Open the Python interpreter in the skip-thoughts/ folder and do:
    1. >>> from email_summarization import summarize
    2. >>> summaries = summarize(emails) # emails is a Python list containing English emails.
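
Since the module takes a plain Python list of emails, one simple way to build that list is to read it from a folder of text files. The sketch below assumes one email per .txt file; load_emails and the my_emails folder name are hypothetical and not part of the module.

```python
# Sketch: build the `emails` list from a folder of plain-text files.
# load_emails and the "my_emails" folder are hypothetical, not part of the module.
import io
import os


def load_emails(directory):
    """Read every .txt file in `directory` (sorted by name) into a list of strings."""
    emails = []
    for name in sorted(os.listdir(directory)):
        if name.endswith(".txt"):
            path = os.path.join(directory, name)
            with io.open(path, encoding="utf-8") as handle:
                text = handle.read().strip()
            if text:  # skip empty files
                emails.append(text)
    return emails


# Inside skip-thoughts/, with the models downloaded, the list could then be used as:
#   from email_summarization import summarize
#   summaries = summarize(load_emails("my_emails"))
```

The call to summarize is left as a comment because it needs the skip-thoughts code and the pre-trained models to be in place.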