Author: handsomezebra

Description:
Fast fuzzy string search
Language: C++
Repository: git://github.com/handsomezebra/ffsearch.git
Created: 2018-03-09T23:59:29Z
Project page: https://github.com/handsomezebra/ffsearch

License: MIT License


Fast fuzzy string search

Usage

cd src
make
./ffsearch <query_file_name> <dictionary_file_name>

Example data

cd test_data
./download_data.sh
../src/ffsearch noisy_entity.txt entity_name_unique.txt
../src/ffsearch noisy_query_en_1000.txt frequency_dictionary_en_500_000.txt
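
For readers new to the problem, the following is a minimal sketch of what a fuzzy dictionary lookup computes. It is a brute-force baseline built on Levenshtein edit distance, not the algorithm ffsearch implements. The names `edit_distance` and `fuzzy_lookup` and the `max_distance` parameter are assumptions made for illustration and do not come from the project.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Levenshtein edit distance between a and b, computed with two rolling rows.
std::size_t edit_distance(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
    // Distance from the empty prefix of a to each prefix of b.
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = i;  // deleting the first i characters of a
        for (std::size_t j = 1; j <= b.size(); ++j) {
            std::size_t subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, subst});
        }
        std::swap(prev, cur);
    }
    return prev[b.size()];
}

// Hypothetical helper: linear scan that returns every dictionary entry
// within max_distance edits of the query.
std::vector<std::string> fuzzy_lookup(const std::vector<std::string>& dictionary,
                                      const std::string& query,
                                      std::size_t max_distance) {
    std::vector<std::string> hits;
    for (const auto& entry : dictionary) {
        if (edit_distance(query, entry) <= max_distance) hits.push_back(entry);
    }
    return hits;
}
```

A scan like this costs one distance computation per dictionary entry for every query, which is the cost that indexed approaches to fuzzy search aim to avoid on dictionaries with millions of strings.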

Benchmark

  • Dictionary: entity_name_unique.txt
  • 4,243,940 strings
  • 92 MB file size
  • Machine: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz

Ours

  • 981 MB memory, 5 s to load
  • “NewYork”: 10 results, 0.225 ms
  • “Loma Linda Estate Coloni”: 1 result, 0.0018 ms

SymSpell