项目作者: eachain

项目描述 :
A simple Golang implementation of Aho-Corasick algorithm.
高级语言: Go
项目地址: git://github.com/eachain/aca.git
创建时间: 2017-06-29T09:10:47Z
项目社区:https://github.com/eachain/aca

开源协议:

下载


ACA (Aho-Corasick automation)

Golang implementation of Aho-Corasick algorithm.

Aho-Corasick automation,1975年产生于贝尔实验室,著名的多模匹配算法。

Prerequisite

  • Golang 1.7+

Preview

Debug mode, you can print the hole trie (with color).

调试模式下,可以把整棵树(带颜色地)打印出来.

  1. a := aca.New()
  2. a.Add("abcdefg")
  3. a.Add("abcd")
  4. a.Add("bcdefg")
  5. a.Add("cde")
  6. a.Add("cdefg")
  7. a.Add("defg")
  8. a.Add("efg")
  9. a.Add("fg")
  10. a.Add("g")
  11. a.Add("abcdeg")
  12. a.Build()
  13. a.Debug()

Then, this is the trie:

然后, 这是整棵树:

  1. .[0](fail->0)
  2. ├── a[1](fail->0)
  3. | └── b[2](fail->9)
  4. | └── c[3](fail->10)
  5. | └── d[4](fail->11)
  6. | └── e[5](fail->12)
  7. | ├── f[6](fail->13)
  8. | | └── g[7](fail->14)
  9. | └── g[8](fail->29)
  10. ├── b[9](fail->0)
  11. | └── c[10](fail->15)
  12. | └── d[11](fail->16)
  13. | └── e[12](fail->17)
  14. | └── f[13](fail->18)
  15. | └── g[14](fail->19)
  16. ├── c[15](fail->0)
  17. | └── d[16](fail->20)
  18. | └── e[17](fail->21)
  19. | └── f[18](fail->22)
  20. | └── g[19](fail->23)
  21. ├── d[20](fail->0)
  22. | └── e[21](fail->24)
  23. | └── f[22](fail->25)
  24. | └── g[23](fail->26)
  25. ├── e[24](fail->0)
  26. | └── f[25](fail->27)
  27. | └── g[26](fail->28)
  28. ├── f[27](fail->0)
  29. | └── g[28](fail->29)
  30. └── g[29](fail->0)

打印规则: “字符[ID](fail->ID)”,如果是完整单词,会在后面出现”√”。

Usage

1. Find out all words appear in a sentence.(查找所有出现过的词)

REMEBER: Your call sequence should always be: “Add/Del”, “Build”, and then “Find”.

注意: 你应该永远以”Add/Del”, “Build”, “Find”的顺序调用。

If you call “Find” before “Build”, it will panic.

如果你在”Build”之前调用了”Find”,你的程序会挂逼。

  1. package main
  2. import (
  3. "fmt"
  4. "github.com/eachain/aca"
  5. )
  6. func main() {
  7. a := aca.New()
  8. a.Add("say")
  9. a.Add("erh")
  10. a.Add("she")
  11. a.Add("shr")
  12. a.Del("erh")
  13. a.Add("he")
  14. a.Del("shr")
  15. a.Add("her")
  16. a.Build()
  17. words := a.Find("yasherhs")
  18. fmt.Println(words) // prints: [she he her]
  19. }

2. You should write uniq function yourself.(你应该自己实现唯一函数。)

For example:

  1. func Uniq(words []string) []string {
  2. if len(words) == 0 {
  3. return words
  4. }
  5. sort.Strings(words)
  6. n := 0
  7. for i := 1; i < len(words); i++ {
  8. if words[n] != words[i] {
  9. n++
  10. words[n] = words[i]
  11. }
  12. }
  13. return words[:n+1]
  14. }

3. If you want any concurrency for both read and write, encapsulate it yourself.(如果你想要读写并发安全,你应该自己封装实现。)

For example:

  1. type Filter struct {
  2. a *aca.ACA
  3. mtx sync.RWMutex
  4. }
  5. func (f *Filter) AddAndBuild(word string) {
  6. f.mtx.Lock()
  7. if f.a == nil {
  8. f.a = aca.New()
  9. }
  10. f.a.Add(word)
  11. f.a.Build()
  12. f.mtx.Unlock()
  13. }
  14. func (f *Filter) DelAndBuild(word string) {
  15. f.mtx.Lock()
  16. f.a.Del(word)
  17. f.a.Build()
  18. f.mtx.Unlock()
  19. }
  20. func (f *Filter) Find(s string) []string {
  21. f.mtx.RLock()
  22. words := f.a.Find(s)
  23. f.mtx.RUnlock()
  24. return words
  25. }

But if you just build once, and only find after that, it is concurrently-read-secure. Like this:

如果你只会build一次,之后只find,那它本身是并发读安全的。

  1. package main
  2. import (
  3. "fmt"
  4. "time"
  5. "github.com/eachain/aca"
  6. )
  7. var globalAca *aca.ACA
  8. func init() {
  9. a := aca.New()
  10. a.Add("say")
  11. a.Add("she")
  12. a.Add("shr")
  13. a.Add("he")
  14. a.Add("her")
  15. a.Build()
  16. globalAca = a
  17. }
  18. func main() {
  19. for i := 0; i < 100; i++ {
  20. go func() {
  21. words := globalAca.Find("yasherhs")
  22. fmt.Println(words) // prints: [she he her]
  23. }()
  24. }
  25. time.Sleep(time.Second)
  26. }

Contribution

If you want to participate, you can create an issue or request a ‘Pull Request’.

Welcome any and all suggestions.