Project author: pskrunner14

Project description:
Boolean Query Model for Information Retrieval in Python
Primary language: Python
Repository: git://github.com/pskrunner14/info-retrieval.git
Created: 2018-09-12T18:17:01Z
Project page: https://github.com/pskrunner14/info-retrieval

License: MIT License

Boolean Query Model for IR

This project implements a Boolean query model for information retrieval (IR) in Python. Information retrieval is the task of finding, within a collection of resources, those that are relevant to an information need. Searches can be based on full-text or other content-based indexing; this project implements text-based indexing only. Documents are retrieved with a Boolean query model, in which a query combines terms with AND, OR, and NOT and each document either matches or does not. The Boolean model is a classical IR model, the first and most widely adopted one, and many IR systems still use it today.
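To make the indexing idea concrete, here is a minimal sketch of an inverted index, the structure that maps each term to the IDs of the documents containing it. It is not the code in search.py: the function name `build_index`, the sample documents, and the whitespace tokenizer are all assumptions made for illustration, and it skips the stemming that the project appears to do with NLTK.

```python
from collections import defaultdict


def build_index(docs, stop_words=()):
    """Map each term to the sorted list of 1-based IDs of documents containing it.

    Hypothetical helper for illustration; not taken from search.py.
    """
    index = defaultdict(set)
    for doc_id, text in enumerate(docs, start=1):
        for token in text.lower().split():
            # Crude normalization: strip surrounding punctuation, no stemming.
            token = token.strip(".,!?;:'\"")
            if token and token not in stop_words:
                index[token].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


# Made-up sample documents, not the ones shipped with the project.
docs = ["New home sales", "Home prices rise", "New prices"]
index = build_index(docs, stop_words={"the"})
print(index["home"])  # IDs of the documents that contain "home"
```

Storing document IDs as sets while building keeps duplicates out; a Boolean query can then be answered with set intersection, union, and difference over these posting lists.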

Getting Started

To run the search script, install the dependencies first:

  pip install nltk

You also need to download the Python Algorithms Library and install it from source:

  cd python-algorithms/
  python setup.py install

Once that is done, edit the docs and stop_words lists in search.py and start searching:

  python search.py

Results

  ~$ python search.py
  INVERTED INDEX:
  hello: [1, 2]
  i: [1]
  m: [1]
  machin: [1, 4]
  learn: [1, 4]
  engin: [1, 2, 4]
  bad: [2, 3]
  world: [2, 3]
  peopl: [2]
  place: [3]
  great: [4]
  that: [4]
  Enter boolean query: machine AND engineer
  Processing time: 0.00031224 secs
  Doc IDS:
  [1, 4]
  Enter boolean query: hello OR machine AND NOT engineer
  Processing time: 0.00019799 secs
  Doc IDS:
  [1, 2]
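The two sample queries can be reproduced with plain set operations over the index printed above. The sketch below is one possible evaluator, not necessarily how search.py works: it assumes the collection holds exactly documents 1 to 4, applies the conventional precedence NOT > AND > OR, and uses a tiny hard-coded `STEMS` table as a stand-in for a real stemmer. The names `evaluate`, `INDEX`, `ALL_DOCS`, and `STEMS` are hypothetical.

```python
# Inverted index copied from the sample output (terms are already stemmed).
INDEX = {
    "hello": {1, 2}, "i": {1}, "m": {1}, "machin": {1, 4}, "learn": {1, 4},
    "engin": {1, 2, 4}, "bad": {2, 3}, "world": {2, 3}, "peopl": {2},
    "place": {3}, "great": {4}, "that": {4},
}
ALL_DOCS = {1, 2, 3, 4}  # assumption: the collection is exactly these four docs

# Stand-in for a real stemmer; covers only the terms used in the demo queries.
STEMS = {"machine": "machin", "engineer": "engin"}


def evaluate(query):
    """Evaluate a whitespace-separated boolean query; NOT binds tighter than AND, AND tighter than OR."""
    tokens = query.split()
    pos = 0

    def parse_or():
        nonlocal pos
        result = parse_and()
        while pos < len(tokens) and tokens[pos] == "OR":
            pos += 1
            result = result | parse_and()
        return result

    def parse_and():
        nonlocal pos
        result = parse_not()
        while pos < len(tokens) and tokens[pos] == "AND":
            pos += 1
            result = result & parse_not()
        return result

    def parse_not():
        nonlocal pos
        if tokens[pos] == "NOT":
            pos += 1
            # NOT is the complement with respect to the whole collection.
            return ALL_DOCS - parse_not()
        term = tokens[pos].lower()
        pos += 1
        # Unknown terms match no documents.
        return set(INDEX.get(STEMS.get(term, term), set()))

    return sorted(parse_or())


print(evaluate("machine AND engineer"))               # [1, 4]
print(evaluate("hello OR machine AND NOT engineer"))  # [1, 2]
```

With this precedence the second query is read as hello OR (machine AND (NOT engineer)), which matches the sample result [1, 2]; evaluating strictly left to right would instead return no documents. The sketch does no error handling, so a malformed query such as a trailing operator raises an IndexError.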

Built With

  • Python
  • NLTK