Countries synonyms in all languages for Solr
This repo contains a synonyms file that covers country names in all languages, written in the Solr synonyms format for analyzers that accept it. It can be used to configure a synonym token filter that explicitly maps country names in various languages to their English names.
The country data comes from country-list.
If you use Elasticsearch, you can define a synonym token filter like this:
"filter" : {
    "countries_synonyms" : {
        "type" : "synonym",
        "synonyms_path" : "countries_synonyms.txt"
    }
}
Then use countries_synonyms in any custom analyzer. You can find more information about the Synonym Token Filter in the Elasticsearch documentation.
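As a rough illustration of the custom analyzer step, the sketch below builds index settings in Ruby and prints them as JSON. The analyzer name countries_analyzer, the helper countries_index_settings, and the choice of the standard tokenizer with a lowercase filter are assumptions made for this example, not something this repo prescribes.

```ruby
require 'json'

# Builds index settings that register the synonym filter from above and a
# custom analyzer that uses it. "countries_analyzer" is a hypothetical name;
# the tokenizer and the lowercase filter are assumptions for this sketch.
def countries_index_settings(synonyms_path = 'countries_synonyms.txt')
  {
    'settings' => {
      'analysis' => {
        'filter' => {
          'countries_synonyms' => {
            'type' => 'synonym',
            'synonyms_path' => synonyms_path
          }
        },
        'analyzer' => {
          'countries_analyzer' => {
            'type' => 'custom',
            'tokenizer' => 'standard',
            'filter' => ['lowercase', 'countries_synonyms']
          }
        }
      }
    }
  }
end

# Print the settings as JSON, ready to be used as an index creation body.
puts JSON.pretty_generate(countries_index_settings)
```

The printed JSON could be sent as the body of an index creation request. The synonyms path is resolved by Elasticsearch relative to its config directory, so the file has to be present there on each node.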
If you need explicit mappings to a language other than English, you can generate a synonyms file yourself:
Download the country data
wget https://github.com/umpirsky/country-list/archive/master.zip
Extract it
unzip master.zip
Check all available languages
ls country-list-master/data
Run the generator with a language option. Here is an example for Russian:
ruby main.rb ru_RU
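To give an idea of what generating such a file involves, here is a minimal sketch. It is not the main.rb from this repo. It assumes that each locale directory in the country-list data contains a country.json file mapping country codes to names, and that the synonyms parser accepts backslash-escaped commas. The helpers load_names, escape_commas and synonym_lines are hypothetical names introduced for this example.

```ruby
require 'json'

# Reads <data_dir>/<locale>/country.json for every locale (assumed layout)
# and returns { locale => { country_code => country_name } }.
def load_names(data_dir)
  paths = Dir.glob(File.join(data_dir, '*', 'country.json')).sort
  paths.each_with_object({}) do |path, acc|
    locale = File.basename(File.dirname(path))
    acc[locale] = JSON.parse(File.read(path))
  end
end

# Escapes commas, since a comma separates synonyms in the Solr format
# (assumes the parser honors backslash escapes).
def escape_commas(name)
  name.gsub(',') { '\\,' }
end

# Builds explicit-mapping lines of the form "name1, name2 => target name",
# one line per country code, mapping every other locale's name to the
# name used in target_locale.
def synonym_lines(names_by_locale, target_locale)
  targets = names_by_locale.fetch(target_locale)
  lines = targets.sort.map do |code, target_name|
    variants = names_by_locale.values
                              .map { |names| names[code] }
                              .compact
                              .uniq
                              .reject { |name| name == target_name }
    next if variants.empty? # nothing to map for this country

    left = variants.map { |name| escape_commas(name) }.join(', ')
    "#{left} => #{escape_commas(target_name)}"
  end
  lines.compact
end

# Small in-memory demo so the sketch runs without the downloaded data.
sample = {
  'en_US' => { 'DE' => 'Germany', 'ES' => 'Spain' },
  'de_DE' => { 'DE' => 'Deutschland', 'ES' => 'Spanien' },
  'fr_FR' => { 'DE' => 'Allemagne', 'ES' => 'Espagne' }
}
puts synonym_lines(sample, 'fr_FR')
# prints:
# Germany, Deutschland => Allemagne
# Spain, Spanien => Espagne
```

With the extracted data, and if the assumed layout holds, the same function could be called as synonym_lines(load_names('country-list-master/data'), 'ru_RU') and its lines written to countries_synonyms.txt.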