Author: bcicen

Description:
Streaming JSON parser for Go
Language: Go
Repository: git://github.com/bcicen/jstream.git
Created: 2018-06-25T19:39:12Z
Community: https://github.com/bcicen/jstream

License: MIT License


jstream

jstream is a streaming JSON parser and value extraction library for Go.

Unlike most JSON parsers, jstream is aware of document position and depth. This enables the extraction of values at a specified depth, eliminating the overhead of allocating the encompassing arrays or objects. For example, using the document below:

[Image: example JSON document, a top-level array of two objects, each with a "desc" string and a "colors" array of strings]

we can choose to extract and act only on the objects within the top-level array:

    f, _ := os.Open("input.json") // error handling omitted for brevity
    decoder := jstream.NewDecoder(f, 1) // extract JSON values at a depth level of 1
    for mv := range decoder.Stream() {
        fmt.Printf("%v\n", mv.Value)
    }

output:

    map[desc:RGB colors:[red green blue]]
    map[desc:CMYK colors:[cyan magenta yellow black]]

likewise, increasing depth level to 3 yields:

    red
    green
    blue
    cyan
    magenta
    yellow
    black
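The depth numbering can be illustrated with a small standard-library sketch: the top-level array is depth 0, its objects are depth 1, the values of their keys are depth 2, and the color strings are depth 3. Unlike jstream, this sketch parses the whole document first, so it only shows which values belong to a given depth, not how to stream them. `sampleDoc`, `collect`, and `valuesAtDepth` are hypothetical names, and `sampleDoc` is an assumed reconstruction of the example document.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// sampleDoc is an assumed reconstruction of the example document.
const sampleDoc = `[{"desc":"RGB","colors":["red","green","blue"]},{"desc":"CMYK","colors":["cyan","magenta","yellow","black"]}]`

// collect appends to out every value nested exactly target levels below
// the root, where the root itself is depth 0.
func collect(v interface{}, depth, target int, out *[]interface{}) {
	if depth == target {
		*out = append(*out, v)
		return
	}
	switch t := v.(type) {
	case []interface{}:
		for _, elem := range t {
			collect(elem, depth+1, target, out)
		}
	case map[string]interface{}:
		// Go maps are unordered, so visit keys in sorted order
		// to keep the result deterministic.
		keys := make([]string, 0, len(t))
		for k := range t {
			keys = append(keys, k)
		}
		sort.Strings(keys)
		for _, k := range keys {
			collect(t[k], depth+1, target, out)
		}
	}
}

// valuesAtDepth parses doc in full and returns the values at the given depth.
func valuesAtDepth(doc string, target int) ([]interface{}, error) {
	var root interface{}
	if err := json.Unmarshal([]byte(doc), &root); err != nil {
		return nil, err
	}
	var out []interface{}
	collect(root, 0, target, &out)
	return out, nil
}

func main() {
	vals, err := valuesAtDepth(sampleDoc, 3)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, v := range vals {
		fmt.Println(v)
	}
}
```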

optionally, key:value pairs can be emitted as an individual struct:

    decoder := jstream.NewDecoder(f, 2).EmitKV() // enable KV streaming at a depth level of 2

output:

    jstream.KV{desc RGB}
    jstream.KV{colors [red green blue]}
    jstream.KV{desc CMYK}
    jstream.KV{colors [cyan magenta yellow black]}

Installing

    go get github.com/bcicen/jstream

Commandline

jstream comes with a CLI tool for quick viewing of parsed values from JSON input:

    jstream -d 1 < input.json

    {"colors":["red","green","blue"],"desc":"RGB"}
    {"colors":["cyan","magenta","yellow","black"],"desc":"CMYK"}

detailed output with -v option:

    cat input.json | jstream -v -d -1

    depth  start  end  type   | value
    2      018    023  string | "RGB"
    3      041    046  string | "red"
    3      048    055  string | "green"
    3      057    063  string | "blue"
    2      039    065  array  | ["red","green","blue"]
    1      004    069  object | {"colors":["red","green","blue"],"desc":"RGB"}
    2      087    093  string | "CMYK"
    3      111    117  string | "cyan"
    3      119    128  string | "magenta"
    3      130    138  string | "yellow"
    3      140    147  string | "black"
    2      109    149  array  | ["cyan","magenta","yellow","black"]
    1      073    153  object | {"colors":["cyan","magenta","yellow","black"],"desc":"CMYK"}
    0      000    155  array  | [{"colors":["red","green","blue"],"desc":"RGB"},{"colors":["cyan","magenta","yellow","black"],"desc":"CMYK"}]

Options

Opt      Description
-d <n>   emit values at depth n. if n < 0, all values will be emitted
-kv      output inner key value pairs as newly formed objects
-v       output depth and offset details for each value
-h       display help dialog

Benchmarks

Obligatory benchmarks performed on files with arrays of objects, where the decoded objects are to be extracted.

Two file sizes are used: regular (1.6 MB, 1000 objects) and large (128 MB, 100000 objects).

input size  lib       MB/s  Allocated
regular     standard  97    3.6MB
regular     jstream   175   2.1MB
large       standard  92    305MB
large       jstream   404   69MB

In a real-world scenario, including initialization and reader overhead from varying blob sizes, performance can be expected as below:

[Image: jstream performance chart]