Project author: mibarg

Project description:
HyperLogLog with intersection
Language: Python
Repository: git://github.com/mibarg/py-hyperminhash.git
Created: 2020-07-01T10:19:12Z
Project page: https://github.com/mibarg/py-hyperminhash

License: MIT License


HyperMinHash

This repository is a port of the Go hyperminhash library to Python (3.6 or later).

Besides being a compact and pretty speedy HyperLogLog implementation for cardinality counting, this modified HyperLogLog allows intersection and similarity estimation of different HyperLogLogs.
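
One way to see how an intersection estimate can follow from a similarity estimate: for exact sets, the intersection size equals the Jaccard similarity multiplied by the union size. The sketch below checks that identity on small exact sets. The helper names `jaccard` and `intersection_from_similarity` are hypothetical and not part of this package, and whether the library derives its estimate this way is an assumption.

```python
def jaccard(a: set, b: set) -> float:
    """Exact Jaccard similarity |A & B| / |A | B| (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def intersection_from_similarity(similarity: float, union_size: int) -> int:
    """Recover an intersection size from a similarity and a union size."""
    return round(similarity * union_size)


a = set(range(0, 60))
b = set(range(40, 100))

# Both numbers agree for exact sets; with sketches, both inputs are estimates,
# so the result carries the error of each.
estimate = intersection_from_similarity(jaccard(a, b), len(a | b))
print(estimate, len(a & b))
```

With sketches, the similarity and the union cardinality are both approximate, so the intersection estimate inherits error from each of them.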

Install

    pip install hyperminhash

Example Usage

    from hyperminhash import HyperMinHash

    sk1 = HyperMinHash()
    sk2 = HyperMinHash()

    for i in range(10000):
        sk1.add(i)

    print(len(sk1))
    # 10001 (should be 10000)

    for i in range(3333, 23333):
        sk2.add(i)

    print(len(sk2))
    # 19977 (should be 20000)

    print(sk1.similarity(sk2))
    # 0.284589082 (should be 0.2857326533)

    print(sk1.intersection(sk2))
    # 6623 (should be 6667)

    sk1.merge(sk2)
    print(sk1.cardinality())
    # 23271 (should be 23333)
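
The "should be" values in the example can be reproduced with exact Python sets, which also makes it easy to see how far each quoted estimate is from the truth. This is a minimal sketch that uses only built-in sets and does not call the library; the `relative_error` helper is hypothetical, and the estimates are copied from the comments above.

```python
a = set(range(10000))
b = set(range(3333, 23333))

exact_intersection = len(a & b)
exact_union = len(a | b)
exact_similarity = exact_intersection / exact_union

print(len(a), len(b))
print(exact_intersection, exact_union)
print(round(exact_similarity, 10))


def relative_error(estimate: float, exact: float) -> float:
    """Absolute difference between estimate and exact, relative to exact."""
    return abs(estimate - exact) / exact


# (name, estimate quoted in the example, exact value)
quoted = [
    ("cardinality of sk1", 10001, len(a)),
    ("cardinality of sk2", 19977, len(b)),
    ("similarity", 0.284589082, exact_similarity),
    ("intersection", 6623, exact_intersection),
    ("cardinality after merge", 23271, exact_union),
]

for name, estimate, exact in quoted:
    print(f"{name}: {relative_error(estimate, exact):.2%} relative error")
```

Every quoted estimate falls within one percent of its exact value for this input.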