Project author: sayarghoshroy

Project description:
A Search Engine for the Wikipedia Dump built from scratch
Primary language: Jupyter Notebook
Repository: git://github.com/sayarghoshroy/wikisearch.git
Created: 2020-08-22T16:09:36Z
Project page: https://github.com/sayarghoshroy/wikisearch

License: (none listed)

Wikisearch

  • Refer to ./problem_statements for the complete specification of requirements and deliverables.

The raw dump can be accessed here.
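The dump is a single large MediaWiki XML export, so an indexer has to stream it rather than load it whole. The sketch below shows one way to do that with the standard library's `xml.etree.ElementTree.iterparse`, building an in-memory inverted index of the form term -> {doc_id: {field: count}}. It is not the code in phase_2_src: the two fields ('t' for title, 'b' for body), the tiny stopword list, the lack of stemming, and the sample XML are all assumptions for illustration. A full-dump indexer would also need to flush partial indexes to disk and merge them, which this sketch omits.

```python
import io
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

TOKEN_RE = re.compile(r"[a-z0-9]+")
# Tiny illustrative stopword list; a real indexer would use a fuller list
# and usually a stemmer as well.
STOPWORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def local_name(tag):
    """Drop the XML namespace: '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def tokenize(text):
    return [t for t in TOKEN_RE.findall(text.lower()) if t not in STOPWORDS]


def build_index(stream):
    """Stream <page> elements from a MediaWiki XML export.

    Returns (titles, index): titles maps doc_id -> lower-cased title, and
    index maps term -> {doc_id: {field: count}} for the assumed fields
    't' (title) and 'b' (body text).
    """
    titles = {}
    index = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for _, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, doc_id, body = "", None, ""
        for child in elem:  # direct children only: <revision> has its own <id>
            name = local_name(child.tag)
            if name == "title":
                title = child.text or ""
            elif name == "id":
                doc_id = int(child.text)
        for node in elem.iter():
            if local_name(node.tag) == "text":
                body = node.text or ""
                break
        titles[doc_id] = title.lower()
        for field, content in (("t", title), ("b", body)):
            for term in tokenize(content):
                index[term][doc_id][field] += 1
        elem.clear()  # release the page subtree once it has been indexed
    return titles, index


# Hypothetical two-page export used only to exercise the function.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>Barack Obama</title><id>1</id>
    <revision><id>10</id><text>Obama was the 44th president.</text></revision>
  </page>
  <page><title>2012 Democratic National Convention</title><id>2</id>
    <revision><id>20</id><text>Obama was nominated in 2012.</text></revision>
  </page>
</mediawiki>"""

if __name__ == "__main__":
    titles, index = build_index(io.BytesIO(SAMPLE))
    print(titles)
    # {1: 'barack obama', 2: '2012 democratic national convention'}
    print({doc: dict(counts) for doc, counts in index["obama"].items()})
    # {1: {'t': 1, 'b': 1}, 2: {'b': 1}}
```

Keeping per-field counts in each posting is what later lets a query such as t:... or b:... weight matches by field.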

  • File paths in phase_2_src currently point to the directory layout on the IIIT Ada server
  • Update these paths before running the index maker and the search functionality
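One way to make that change less error-prone is to resolve every path in one place and let environment variables override the defaults. This is a minimal sketch, assuming hypothetical variable names (WIKISEARCH_HOME, WIKISEARCH_DUMP, WIKISEARCH_INDEX_DIR) and hypothetical default file names; none of these come from the repository, which hard-codes its paths.

```python
import os
from pathlib import Path

# Hypothetical defaults; phase_2_src hard-codes its own paths instead.
DEFAULT_DUMP_NAME = "enwiki-latest-pages-articles.xml"
DEFAULT_INDEX_DIR_NAME = "index"


def resolve_paths(env=None):
    """Return (dump_path, index_dir), letting environment variables override defaults.

    WIKISEARCH_HOME sets a base directory; WIKISEARCH_DUMP and
    WIKISEARCH_INDEX_DIR override the individual paths outright.
    """
    env = os.environ if env is None else env
    base = Path(env.get("WIKISEARCH_HOME", "."))
    dump_path = Path(env.get("WIKISEARCH_DUMP", base / DEFAULT_DUMP_NAME))
    index_dir = Path(env.get("WIKISEARCH_INDEX_DIR", base / DEFAULT_INDEX_DIR_NAME))
    return dump_path, index_dir


if __name__ == "__main__":
    dump_path, index_dir = resolve_paths()
    print(f"dump:  {dump_path}")
    print(f"index: {index_dir}")
```

For example, setting WIKISEARCH_HOME=/data/wiki makes both paths resolve under /data/wiki without editing any source file.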

Sample Query

Top 10 results for t:presidential campaign b:obama i:2012
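The query above mixes field prefixes with plain terms. A parser for this syntax might look like the sketch below. It assumes that t, b and i mean title, body and infobox and that a prefix applies to every following term until the next prefix; both readings are assumptions, and ./problem_statements remains the authoritative definition.

```python
import re

# Assumed prefix meanings, limited to the prefixes used in the sample query.
FIELDS = {"t": "title", "b": "body", "i": "infobox"}
PREFIX_RE = re.compile(r"^([a-z]):(.*)$")


def parse_query(query):
    """Split a field query into {prefix: [terms]}.

    A token such as 'b:obama' switches the active field, and bare tokens
    that follow stay in that field. Terms that appear before any prefix are
    grouped under None, i.e. a plain (unfielded) query.
    """
    parsed = {}
    field = None
    for token in query.lower().split():
        match = PREFIX_RE.match(token)
        if match and match.group(1) in FIELDS:
            field, token = match.group(1), match.group(2)
        if token:  # 't: obama' leaves an empty token after the prefix
            parsed.setdefault(field, []).append(token)
    return parsed


if __name__ == "__main__":
    print(parse_query("t:presidential campaign b:obama i:2012"))
    # {'t': ['presidential', 'campaign'], 'b': ['obama'], 'i': ['2012']}
```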

Results

Doc ID     Doc Title
-------    ---------
33115      barack obama 2012 presidential campaign
6222757    barack obama
2693794    barack obama 2008 presidential campaign
890287     2012 united states presidential election
8217429    list of barack obama 2008 presidential campaign endorsements
1518579    barack obama 2008 presidential primary campaign
885341     presidency of barack obama
6149017    2008 united states presidential election
5646093    john mccain 2008 presidential campaign
574345     2012 democratic national convention

Time Taken: 4.5 seconds
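The README does not say how these results are ranked. One common approach for fielded queries is field-weighted TF-IDF, sketched below over a toy index: each posting stores per-field term counts, title hits weigh more than body hits, and a hit in the field the query named gets an extra boost. The weights, the boost, the toy postings and their counts are all invented for illustration (only the query terms are indexed), and this is not necessarily the scoring used in phase_2_src.

```python
import heapq
import math
import time

# Hypothetical weights: a title hit counts more than an infobox or body hit,
# and hits in the field the query asked for (e.g. b:obama) get an extra boost.
FIELD_WEIGHTS = {"t": 3.0, "i": 2.0, "b": 1.0}
QUERY_FIELD_BOOST = 2.0


def rank(query, index, num_docs, k=10):
    """Return the top-k (doc_id, score) pairs for a {field: [terms]} query.

    index maps term -> {doc_id: {field: term_frequency}}.
    """
    scores = {}
    for query_field, terms in query.items():
        for term in terms:
            postings = index.get(term)
            if not postings:
                continue  # an unseen term contributes nothing
            idf = math.log(num_docs / len(postings))
            for doc_id, tf_by_field in postings.items():
                for field, tf in tf_by_field.items():
                    weight = FIELD_WEIGHTS.get(field, 1.0)
                    if field == query_field:
                        weight *= QUERY_FIELD_BOOST
                    # Log-scaled term frequency times inverse document frequency.
                    scores[doc_id] = scores.get(doc_id, 0.0) + weight * (1 + math.log(tf)) * idf
    # nlargest keeps only the k best instead of sorting every matching document.
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


# Toy data with invented counts; not real Wikipedia statistics or doc IDs.
NUM_DOCS = 4
TITLES = {1: "barack obama", 2: "barack obama 2012 presidential campaign",
          3: "john mccain 2008 presidential campaign",
          4: "2012 democratic national convention"}
INDEX = {
    "obama": {1: {"t": 1, "b": 4}, 2: {"t": 1, "b": 1}},
    "presidential": {2: {"t": 1}, 3: {"t": 1}},
    "campaign": {2: {"t": 1, "b": 2}, 3: {"t": 1, "b": 1}},
    "2012": {2: {"t": 1, "i": 1}, 4: {"t": 1}},
}
QUERY = {"t": ["presidential", "campaign"], "b": ["obama"], "i": ["2012"]}

if __name__ == "__main__":
    start = time.perf_counter()
    results = rank(QUERY, INDEX, NUM_DOCS)
    elapsed = time.perf_counter() - start
    print("Doc ID  Doc Title")
    for doc_id, _ in results:
        print(f"{doc_id:<7} {TITLES[doc_id]}")
    print(f"Time Taken: {elapsed:.4f} seconds")
```

On this toy data the document matching every query term in the requested fields ranks first, followed by the partial matches. On a real index most of the query time would go to reading posting lists from disk rather than to the scoring arithmetic.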