Project author: wosaku

Project description:
Python function to generate a mask analysis
Primary language: Jupyter Notebook
Repository: git://github.com/wosaku/data-profiling-mask-analyzer.git
Created: 2017-07-22T07:23:22Z
Project page: https://github.com/wosaku/data-profiling-mask-analyzer



Data Profiling Mask Analyzer

A custom Python function that generates a mask analysis.

A mask analysis (also called a string-pattern analysis) is useful for fields such as city, postal code, and phone number. It shows how each field has been populated, which lets us infer data quality issues.

Rules:

  • Lowercase letter returns ‘l’
  • Uppercase letter returns ‘L’
  • Digit returns ‘D’
  • Space returns ‘s’
  • Any other (special) character returns itself
  • Missing value returns ‘-null-’
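The rules above can be sketched as a single Python function. This is a minimal sketch, not the notebook's actual code: the name `mask`, treating `None` and float `NaN` as missing, and masking non-string values through their `str()` form are assumptions. Case and digit tests rely on `str.islower`, `str.isupper` and `str.isdigit`, so accented letters such as ‘é’ also map to ‘l’.

```python
import math

NULL_MASK = "-null-"


def mask(value):
    """Return the mask (string pattern) of one value.

    Hypothetical sketch of the rules: lowercase -> 'l', uppercase -> 'L',
    digit -> 'D', space -> 's', any other character is kept as-is.
    """
    # Assumption: None and float NaN (how pandas marks gaps) count as missing.
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return NULL_MASK

    out = []
    # Assumption: non-string values (e.g. ints) are masked via str().
    for ch in str(value):
        if ch.islower():
            out.append("l")
        elif ch.isupper():
            out.append("L")
        elif ch.isdigit():
            out.append("D")
        elif ch == " ":
            out.append("s")
        else:
            out.append(ch)  # special characters return themselves
    return "".join(out)
```

Building a list and joining it once avoids repeated string concatenation inside the loop.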

Examples:

  • ‘Van’ returns ‘Lll’
  • ‘VAN’ returns ‘LLL’
  • ‘Van BC’ returns ‘LllsLL’
  • ‘+1 123-1234-5555’ returns ‘+DsDDD-DDDD-DDDD’
  • A Canadian postal code in the standard format should return ‘LDLsDLD’
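To profile a whole field, the masks can be tallied so that the dominant pattern and its outliers stand out. The sketch below assumes an ASCII-only variant built on `str.maketrans` and `str.translate` (so accented letters such as ‘é’ are kept as-is), with `collections.Counter` doing the counting; the helper names `fast_mask` and `mask_profile` and the sample postal codes are made up for illustration.

```python
import math
import string
from collections import Counter

# Hypothetical ASCII-only translation table:
# a-z -> 'l', A-Z -> 'L', 0-9 -> 'D', ' ' -> 's'.
# Characters absent from the table (punctuation, accented letters, ...) stay as-is.
_MASK_TABLE = str.maketrans(
    string.ascii_lowercase + string.ascii_uppercase + string.digits + " ",
    "l" * 26 + "L" * 26 + "D" * 10 + "s",
)


def fast_mask(value):
    """Mask one value; None and float NaN are treated as missing (assumption)."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return "-null-"
    return str(value).translate(_MASK_TABLE)


def mask_profile(values):
    """Count how often each mask occurs in a field (hypothetical helper)."""
    return Counter(fast_mask(v) for v in values)


# Made-up sample values for a postal code column.
postal_codes = ["V6B 1A1", "V5K 0A1", "v6b1a1", None, "M5V 3L9"]
for pattern, count in mask_profile(postal_codes).most_common():
    print(f"{pattern:10} {count}")
```

On this sample, ‘LDLsDLD’ is the most frequent mask (3 values), while ‘lDlDlD’ (lowercase, no space) and ‘-null-’ each appear once and point at the entries worth reviewing.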