Project author: patrick-llgc

Project description:
Fuzzy matching of company records
Language: Python
Repository: git://github.com/patrick-llgc/fuzzy_match_company_name.git
Created: 2019-01-28T06:53:43Z
Project page: https://github.com/patrick-llgc/fuzzy_match_company_name


Fuzzy Matching of Company Names

This repo documents functions for matching company records by the following criteria:

  • company name (with fuzzy matching)
  • distance, computed from the zip code or from a (city, state) tuple
  • street address (work in progress)
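
The first two criteria can be sketched as follows. This is a minimal illustration, not the repo's implementation: it substitutes the standard-library `difflib.SequenceMatcher` for whatever fuzzy-matching library the repo uses, and it assumes each record has already been resolved to a (latitude, longitude) pair, since the zip code or (city, state) lookup is not shown here. The names `name_score`, `haversine_miles`, and `is_match`, and the default cutoffs, are hypothetical.

```python
import math
from difflib import SequenceMatcher

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def name_score(a: str, b: str) -> float:
    """Similarity in [0, 100], ignoring case and repeated whitespace."""
    a_norm = " ".join(a.lower().split())
    b_norm = " ".join(b.lower().split())
    return 100.0 * SequenceMatcher(None, a_norm, b_norm).ratio()


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against rounding error before asin.
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(min(1.0, h)))


def is_match(name_a, name_b, coord_a, coord_b, min_score=86.0, max_miles=30.0):
    """Two records match if the names are similar AND the locations are close.

    min_score and max_miles are illustrative placeholders, not tuned values.
    """
    close_enough = haversine_miles(*coord_a, *coord_b) <= max_miles
    return close_enough and name_score(name_a, name_b) >= min_score
```

Requiring both conditions keeps similarly named companies in different regions from being merged, at the cost of missing companies that have moved or that list a distant head office.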

error log

  1. with jelly: [4, 14, 21, 26, 27, 37, 38, 42, 45, 47, 49, 56, 65, 72, 75, 76, 90, 96]
      • 4: 86, mismatch
      • 14: 86, mismatch
      • 21: 91, 85
      • 26: 86; if in NY, then use 5 miles as cutoff
      • 27: 86, NJ
      • 37: ('BEAR STEARN', 'Bear Stearns Asset Management Inc.'); maybe there is a bug
      • 38: add academy, healthcare to keys
      • 42: add property to keys
      • 45: need to merge records in the target dataset (SEQA and seneca)
      • 47: get rid of 'the'
      • 49: more than 30 miles away
      • 56: ('SATURN CAPITAL MANAGEMENT LLC', 'Saturn Partners, LLC')
      • 65: add lab to keys
      • 72: add america, asia to keys
      • 75: add health to keys
      • 76: RFI intl vs RFI investments
      • 90: score 87
      • 96: add law to keys; score 85
  2. with fuzzy: [16, 26, 28, 37, 45, 49, 56, 60, 86]
  • add software, academy, school to keys
  • adjust the threshold based on how far apart the records are
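
The two follow-up ideas in this log (extending the keys and making the threshold depend on distance) might look like the sketch below. It rests on assumptions the log does not confirm: that "keys" means a set of generic words stripped from names before scoring, and that a closer pair of records should be allowed a lower score cutoff. The names `GENERIC_KEYS`, `strip_keys`, `score_without_keys`, and `required_score`, and all numeric cutoffs, are hypothetical, and `difflib.SequenceMatcher` again stands in for the repo's scorer.

```python
import re
from difflib import SequenceMatcher

# Assumption: "keys" are generic words removed before comparing names.
# The words are the ones mentioned in the log, plus 'the'.
GENERIC_KEYS = {
    "the", "academy", "healthcare", "health", "property", "lab",
    "america", "asia", "law", "software", "school",
}


def strip_keys(name: str) -> str:
    """Lowercase, split on non-alphanumerics, and drop generic words."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in GENERIC_KEYS)


def score_without_keys(a: str, b: str) -> float:
    """Similarity in [0, 100] computed on the names with keys removed."""
    a_core, b_core = strip_keys(a), strip_keys(b)
    if not a_core or not b_core:
        return 0.0  # nothing distinctive left to compare
    return 100.0 * SequenceMatcher(None, a_core, b_core).ratio()


def required_score(distance_miles: float) -> float:
    """Minimum name score needed at a given distance (illustrative cutoffs)."""
    if distance_miles <= 5:
        return 85.0
    if distance_miles <= 30:
        return 90.0
    return float("inf")  # too far apart: never accept


def is_match(a: str, b: str, distance_miles: float) -> bool:
    return score_without_keys(a, b) >= required_score(distance_miles)
```

Stripping generic words removes the shared tokens that inflate scores between unrelated companies, while the stepped cutoff demands stronger name agreement as the records get farther apart.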