Project author: mendab1e

Project description:
Countries synonyms in all languages for Solr
Language: Ruby
Repository: git://github.com/mendab1e/SolrCountriesSynonyms.git
Created: 2018-01-07T14:18:13Z
Project page: https://github.com/mendab1e/SolrCountriesSynonyms


SolrCountriesSynonyms

This repo contains a synonyms file covering all countries in all languages, for analyzers that accept the Solr synonym format. It can be used to configure a synonym token filter that explicitly maps country names in various languages to their English names.
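To illustrate what an explicit mapping does, here is a minimal Ruby sketch that parses rules in the Solr synonym format (`a, b => c`) and uses them to normalize a country name. The sample rules and the helper names `parse_synonyms` and `normalize_country` are assumptions made for illustration; they are not taken from this repo, and a real analyzer applies the rules at the token level rather than on whole strings.

```ruby
# Hypothetical sample rules in the Solr synonym format; not copied from the repo's file.
SAMPLE_RULES = <<~RULES
  # lines starting with '#' are comments in the Solr synonym format
  Deutschland, Allemagne, Германия => Germany
  Frankreich, Francia, Франция => France
RULES

# Build a lookup table: every variant on the left side points to the right side.
def parse_synonyms(text)
  table = {}
  text.each_line do |line|
    line = line.strip
    next if line.empty? || line.start_with?('#')

    left, right = line.split('=>', 2).map(&:strip)
    next if right.nil? # skip lines that are not explicit mappings

    left.split(',').map(&:strip).each do |variant|
      table[variant.downcase] = right
    end
  end
  table
end

LOOKUP = parse_synonyms(SAMPLE_RULES)

# Return the mapped English name, or the input unchanged if no rule matches.
def normalize_country(name)
  LOOKUP.fetch(name.downcase, name)
end

puts normalize_country('Allemagne') # prints "Germany"
puts normalize_country('Франция')   # prints "France"
```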

example

The country data was gathered from country-list.

Usage

If you use Elasticsearch, you can define a synonym token filter like this:

    "filter" : {
      "countries_synonyms" : {
        "type" : "synonym",
        "synonyms_path" : "countries_synonyms.txt"
      }
    }

Then use countries_synonyms in any custom analyzer. You can find more information about the synonym token filter in the Elasticsearch documentation.
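As a rough sketch of how the filter could be wired into a custom analyzer, the Ruby snippet below builds an index settings body and prints it as JSON. The analyzer name `countries`, the `standard` tokenizer, and the `lowercase` filter are assumptions chosen for illustration; only the `countries_synonyms` filter definition comes from the example above.

```ruby
require 'json'

# Index settings with the synonym filter from above plus a custom analyzer.
# The analyzer name, tokenizer, and lowercase filter are illustrative assumptions.
INDEX_SETTINGS = {
  'settings' => {
    'analysis' => {
      'filter' => {
        'countries_synonyms' => {
          'type' => 'synonym',
          'synonyms_path' => 'countries_synonyms.txt'
        }
      },
      'analyzer' => {
        'countries' => {
          'type' => 'custom',
          'tokenizer' => 'standard',
          # lowercase runs first, so matching is case-insensitive
          'filter' => %w[lowercase countries_synonyms]
        }
      }
    }
  }
}.freeze

SETTINGS_JSON = JSON.pretty_generate(INDEX_SETTINGS)
puts SETTINGS_JSON
```

The printed JSON could be sent as the body of an index creation request. Because `lowercase` precedes the synonym filter in this sketch, the emitted tokens would be lowercase as well.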

Tokenization to other languages

If you need an explicit target language other than English, you can generate a synonyms file yourself:

Download countries data

    wget https://github.com/umpirsky/country-list/archive/master.zip

Extract it

    unzip -e master.zip

Check all available languages

    ls country-list-master/data

Run the generator with a language option. Here is an example for Russian:

    ruby main.rb ru_RU
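For intuition, the following Ruby sketch shows one way such a generator could work: for each country it collects the names from every other locale and maps them to the name in the target locale. It is not the repo's main.rb. The in-memory `COUNTRY_NAMES` hash is a hypothetical stand-in for the downloaded data, and `build_synonym_lines` is a name introduced here for illustration.

```ruby
# Hypothetical stand-in for per-locale country data (country code => name).
COUNTRY_NAMES = {
  'en'    => { 'DE' => 'Germany',     'FR' => 'France' },
  'de'    => { 'DE' => 'Deutschland', 'FR' => 'Frankreich' },
  'ru_RU' => { 'DE' => 'Германия',    'FR' => 'Франция' }
}.freeze

# Build one Solr explicit-mapping line per country: "variant, variant => target name".
def build_synonym_lines(names_by_locale, target_locale)
  target = names_by_locale.fetch(target_locale)
  others = names_by_locale.reject { |locale, _| locale == target_locale }

  target.each_with_object([]) do |(code, target_name), lines|
    # Names of the same country in all other locales, without duplicates
    # and without the target name itself.
    variants = others.values.map { |names| names[code] }.compact.uniq - [target_name]
    lines << "#{variants.join(', ')} => #{target_name}" unless variants.empty?
  end
end

puts build_synonym_lines(COUNTRY_NAMES, 'ru_RU')
# prints:
# Germany, Deutschland => Германия
# France, Frankreich => Франция
```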