Project author: DrompiX

Project description: Document summarisation and query expansion
Language: Python
Repository: git://github.com/DrompiX/ir_hw2.git
Created: 2019-04-04T12:31:47Z
Project page: https://github.com/DrompiX/ir_hw2


Document summarization & query expansion - Information Retrieval homework 2, Innopolis University

How to run the code?

  • Download two folders (data.nosync and engine_data) from my Google Drive
  • Put both folders in the root directory of the project
  • Make sure your Python version is 3.7 or later
  • Install the required packages by running pip3 install -r requirements.txt
    P.S.: using a virtual environment is recommended
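
The setup steps above can be checked before running anything. The following is only a minimal sketch, not part of the repository: the helper name preflight and its parameters are hypothetical, and it assumes the two folders keep the names given in the list.

```python
import sys
from pathlib import Path

# Folder names and minimum version are taken from the setup steps above.
REQUIRED_DIRS = ("data.nosync", "engine_data")
MIN_PYTHON = (3, 7)


def preflight(root=".", version=None):
    """Return a list of setup problems; an empty list means the checks passed.

    `preflight` is a hypothetical helper, not a function from the project.
    `version` can be overridden (e.g. (3, 6)) to simulate another interpreter.
    """
    version = sys.version_info[:2] if version is None else tuple(version)
    problems = []

    # Step 3: the interpreter must be Python 3.7 or later.
    if version < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
            f"found {version[0]}.{version[1]}"
        )

    # Steps 1-2: both downloaded folders must sit in the project root.
    for name in REQUIRED_DIRS:
        if not (Path(root) / name).is_dir():
            problems.append(f"missing folder: {name}")

    return problems
```

Calling preflight() from the project root and printing the returned list shows what is still missing; an empty list means the folders and the Python version look fine.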

Now you can run the code: python3 doc_sum.py for the document summarization task and python3 query_exp.py for the query expansion task.
To summarize documents for a different query, edit the line query = "your query here" in the launch() function in doc_sum.py.

If you run into a problem with nltk (most likely because its datasets have not been downloaded), run the following in Python:

    import nltk
    nltk.download('wordnet')    # required for query expansion
    nltk.download('stopwords')  # required for both parts
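
The same downloads can be wrapped in a small helper that reports which packages failed. This is a rough sketch rather than code from the project: ensure_nltk_data is a hypothetical name, and the download parameter exists only so the function can be tried without network access. It relies on nltk.download returning True on success and False otherwise.

```python
def ensure_nltk_data(packages=("wordnet", "stopwords"), download=None):
    """Download the given NLTK packages and return the names that failed.

    Hypothetical helper, not part of the repository. By default it uses
    nltk.download, which skips packages that are already up to date.
    """
    if download is None:
        # Imported lazily so the module loads even if nltk is not installed yet.
        import nltk
        download = nltk.download

    failed = []
    for name in packages:
        # quiet=True suppresses the progress output of the downloader.
        if not download(name, quiet=True):
            failed.append(name)
    return failed
```

Calling ensure_nltk_data() once before running doc_sum.py or query_exp.py should be enough; a non-empty return value lists the packages that still need attention.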