Project author: mongoeye

Project description:
Schema and data analyzer for MongoDB written in Go.
Language: Go
Repository: git://github.com/mongoeye/mongoeye.git
Created: 2017-04-10T14:57:17Z
Project community: https://github.com/mongoeye/mongoeye

License: GNU General Public License v3.0

Overview

Mongoeye provides a quick overview of the data in your MongoDB database.

Key features

  • Fast: the fastest schema analyzer for MongoDB
  • Single binary: pre-built binaries for Windows, Linux, and MacOS (Darwin)
  • Local analysis: quick local analysis using a parallel algorithm (MongoDB 2.0+)
  • Remote analysis: distributed analysis in database using the aggregation framework (MongoDB 3.5.10+)
  • Rich features: histogram (value, length, weekday, hour), most frequent values, …
  • Integrable: table, JSON or YAML output

Installation

Mongoeye is distributed as a single executable binary.

You can download the archive from the GitHub releases page and extract the binary for your platform.

Compilation

Compilation requires Go 1.8. All external dependencies are included in the repository's vendor directory.

Compilation process:

    $ go get github.com/mongoeye/mongoeye
    $ cd $GOPATH/src/github.com/mongoeye/mongoeye
    $ make build

For development, you need additional dependencies, which can be installed with make get-deps.

The test setup uses Docker to create the MongoDB test database.

If you want to contribute to this project, see the actions in Makefile and the _contrib directory.

Usage

    mongoeye [host] database collection [flags]

The command mongoeye --help lists all available options.

Table output

The default output format is a table. It shows only the schema, without the other analyses.

Example table output:

    KEY                        COUNT       %
    ────────────────────────────────────────────
    all documents               2548
    analyzed documents          1000    39.2
    _id - objectId              1000   100.0
    address                     1000   100.0
      - int                        1     0.1
    └╴- string                   999    99.9
    address line 2 - string     1000   100.0
    name - string               1000   100.0
    outcode - string            1000   100.0
    postcode - string           1000   100.0
    rating                      1000   100.0
      - int                      523    52.3
      - double                   451    45.1
    └╴- string                    26     2.6
    type_of_food - string       1000   100.0
    URL - string                1000   100.0

    OK  0.190s (local analysis)
        1000/2548 docs (39.2%)
        9 fields, depth 2

JSON and YAML output

Use --format json or --format yaml flags to set these formats.

To write the output to a file, use the option -F /path/to/file.

Features

This chapter explains the features of Mongoeye and their various outputs.

Use --format json or --format yaml to get detailed results, otherwise only the schema table will appear.

The output of the analysis always contains these basic keys:

  • database: database name
  • collection: collection name
  • plan: local for local analysis, db for analysis using aggregation framework
  • duration: duration of analysis
  • allDocs: number of all documents in collection
  • analyzedDocs: number of analyzed documents from collection
  • fieldsCount: number of found fields
  • fields: result of the analysis for each field
    • name: name of field
    • level: level of nested field, 0 is root level
    • count: number of occurrences
    • types: result of the analysis for each type of field
      • type: name of type
      • count: number of occurrences of type

Example result:

    database: company
    collection: users
    plan: local
    duration: 46.515331ms
    allDocs: 2548
    analyzedDocs: 1000
    fieldsCount: 9
    fields:
    - name: rating
      level: 0
      count: 1000
      types:
      - type: int
        count: 549
        < other outputs according to settings >
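
For integration with other tools, the basic keys can be decoded from the JSON output into Go structs. The following is a minimal sketch, assuming the JSON key names match the list above. The struct names and the sample input are illustrative (the input is hand-written from the YAML example, not captured Mongoeye output), and the duration key is left out because its JSON encoding is not shown here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Result mirrors the basic keys listed above. The JSON key names are assumed
// to match that list; "duration" is omitted on purpose.
type Result struct {
	Database     string  `json:"database"`
	Collection   string  `json:"collection"`
	Plan         string  `json:"plan"`
	AllDocs      int     `json:"allDocs"`
	AnalyzedDocs int     `json:"analyzedDocs"`
	FieldsCount  int     `json:"fieldsCount"`
	Fields       []Field `json:"fields"`
}

// Field holds the analysis of one field.
type Field struct {
	Name  string      `json:"name"`
	Level int         `json:"level"`
	Count int         `json:"count"`
	Types []FieldType `json:"types"`
}

// FieldType holds the analysis of one type of a field.
type FieldType struct {
	Type  string `json:"type"`
	Count int    `json:"count"`
}

// sampleJSON is a hand-written input modelled on the YAML example above.
const sampleJSON = `{
  "database": "company", "collection": "users", "plan": "local",
  "allDocs": 2548, "analyzedDocs": 1000, "fieldsCount": 9,
  "fields": [
    {"name": "rating", "level": 0, "count": 1000,
     "types": [{"type": "int", "count": 549}]}
  ]
}`

// parseResult decodes a JSON report into a Result.
func parseResult(data []byte) (Result, error) {
	var r Result
	err := json.Unmarshal(data, &r)
	return r, err
}

func main() {
	r, err := parseResult([]byte(sampleJSON))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	for _, f := range r.Fields {
		for _, t := range f.Types {
			// Share of this type among all occurrences of the field.
			pct := 100 * float64(t.Count) / float64(f.Count)
			fmt.Printf("%s %s %.1f%%\n", f.Name, t.Type, pct)
		}
	}
}
```

Keys produced by optional analyses (value, length, histograms) are ignored by this decoder and could be added as further struct fields.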

Value - min, max, avg

Use the flag --value or -v to enable calculation of minimum, maximum, and average values.

Supported types:

  • Minimum and maximum: objectId, double, string, bool, date, int, timestamp, long, decimal
  • Average: double, bool, int, long, decimal

Example result:

    value:
      min: 11.565586
      max: 60.206787
      avg: 38.51128

Length - min, max, avg

Use the flag --length or -l to enable calculation of minimum, maximum, and average lengths.

Supported types: string, array, object

Example result:

    length:
      min: 29
      max: 153
      avg: 112

Number of unique values

Use the flag --count-unique to count all unique values.

Supported types: double, string, date, int, timestamp, long, decimal

Example result:

    unique: 894

Frequency of values

Use the flag --most-freq N or --least-freq N to get the N most or least frequent values.

Supported types: double, string, date, int, timestamp, long, decimal

Example result:

    mostFrequent:
    - value: USD
      count: 599
    - value: EUR
      count: 21
    - value: GBP
      count: 5
    - value: CAD
      count: 4
    leastFrequent:
    - value: EUR
      count: 21
    - value: GBP
      count: 5
    - value: CAD
      count: 4
    - value: JPY
      count: 3
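
The idea behind these two lists can be sketched in a few lines of Go: tally the values, sort by count, and take the first or last N entries. This is an illustration only, not Mongoeye's implementation; the function names and the alphabetical tie-breaking are assumptions made for the sketch.

```go
package main

import (
	"fmt"
	"sort"
)

// ValueCount pairs a value with its number of occurrences.
type ValueCount struct {
	Value string
	Count int
}

// countValues tallies occurrences and returns them sorted by descending count.
// Ties are broken alphabetically, which is a choice made for this sketch.
func countValues(values []string) []ValueCount {
	counts := make(map[string]int)
	for _, v := range values {
		counts[v]++
	}
	out := make([]ValueCount, 0, len(counts))
	for v, c := range counts {
		out = append(out, ValueCount{Value: v, Count: c})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Count != out[j].Count {
			return out[i].Count > out[j].Count
		}
		return out[i].Value < out[j].Value
	})
	return out
}

// clamp limits n to the range [0, max].
func clamp(n, max int) int {
	if n < 0 {
		return 0
	}
	if n > max {
		return max
	}
	return n
}

// mostFrequent returns the first n entries of the sorted tally.
func mostFrequent(values []string, n int) []ValueCount {
	all := countValues(values)
	return all[:clamp(n, len(all))]
}

// leastFrequent returns the last n entries of the sorted tally, still
// ordered from more to less frequent, as in the example above.
func leastFrequent(values []string, n int) []ValueCount {
	all := countValues(values)
	return all[len(all)-clamp(n, len(all)):]
}

func main() {
	values := []string{"USD", "EUR", "USD", "GBP", "USD", "EUR"}
	fmt.Println(mostFrequent(values, 2))
	fmt.Println(leastFrequent(values, 1))
}
```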

Value histogram

Use the flag --value-hist or -V to generate value histogram.

Supported types: objectId (processed as a date), double, date, int, long, decimal

Calculation of step

The flag --value-hist-steps sets the maximum number of steps (default 100).

  • Step of the int and long type is a whole number
  • Step of the double and decimal type is:
    • the smallest product of 1, 2.5 or 5 and 10^n for which the maximum number of steps is not exceeded
    • e.g. …, 100, 50, 25, 10, 5, 2.5, 1, 0.5, 0.25, 0.1, …
  • Step of the date and objectId type is rounded to:
    • 1, 2, 5, 10, 15, 30 seconds
    • 1, 2, 5, 10, 15, 30 minutes
    • 1, 2, 3, 6, 12 hours
    • 1, 2, 3, 4, … days

Example result:

    valueHistogram:
      start: 2.5
      end: 12
      range: 9.5
      step: 0.5
      numOfSteps: 19
      intervals: [36, 25, 14, 81, 95, 86, 59, 6, 82, 84, 62, 33, 19, 9, 1, 14, 67, 2, 45]
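
The step rule for double and decimal values can be sketched as follows. This is a floating-point illustration of the rule as described above, not Mongoeye's own code; the function name niceStep is hypothetical. The limit of 20 steps in main is chosen only so that the result lines up with the example (step 0.5, 19 steps); the limit actually used for that example is not stated.

```go
package main

import (
	"fmt"
	"math"
)

// niceStep returns the smallest step of the form m * 10^n, with m in
// {1, 2.5, 5}, for which valueRange/step does not exceed maxSteps.
// It is a floating-point sketch; exact boundary cases may round either way.
func niceStep(valueRange float64, maxSteps int) float64 {
	if valueRange <= 0 || maxSteps < 1 {
		return 0
	}
	raw := valueRange / float64(maxSteps) // smallest step that would still fit
	base := math.Pow(10, math.Floor(math.Log10(raw)))
	for _, m := range []float64{1, 2.5, 5, 10} {
		if step := m * base; step >= raw {
			return step
		}
	}
	return 10 * base // only reachable through rounding in Log10 or Pow
}

func main() {
	// Range 9.5 with at most 20 steps: 0.25 would need 38 steps, 0.5 needs 19.
	step := niceStep(9.5, 20)
	fmt.Println(step, math.Ceil(9.5/step))
}
```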


Length histogram

Use the flag --length-hist or -L to generate length histogram.

Flag --length-hist-steps sets the maximum number of steps (default 100).

Supported types: string, array, object

Example result:

    lengthHistogram:
      start: 0
      end: 300
      range: 300
      step: 50
      numOfSteps: 6
      intervals: [96, 78, 3, 1, 1, 0]
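
To read the intervals array, it helps to see how lengths could be assigned to buckets. The sketch below assumes half-open intervals, with interval i covering [start + i*step, start + (i+1)*step); the boundary convention is not stated for this histogram, so that is an assumption, as is the function name bucketize.

```go
package main

import "fmt"

// bucketize counts values into numOfSteps intervals of width step, starting
// at start. Interval i is assumed to cover [start+i*step, start+(i+1)*step);
// values outside the covered range are ignored in this sketch.
func bucketize(values []int, start, step, numOfSteps int) []int {
	if numOfSteps <= 0 || step <= 0 {
		return nil
	}
	intervals := make([]int, numOfSteps)
	for _, v := range values {
		if v < start {
			continue
		}
		if i := (v - start) / step; i < numOfSteps {
			intervals[i]++
		}
	}
	return intervals
}

func main() {
	lengths := []int{0, 12, 49, 50, 75, 130, 299}
	fmt.Println(bucketize(lengths, 0, 50, 6)) // [3 2 1 0 0 1]
}
```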

Weekday histogram

Use the flag --weekday-hist or -W to generate weekday histogram.

The day of the week is determined using the time zone from the --timezone flag (default: local).

The first value is for Sunday.

Example result:

    weekdayHistogram: [5, 48, 23, 124, 45, 15, 87]
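
The time zone matters because the same instant can fall on different days in different zones. A minimal Go sketch of a Sunday-first weekday histogram, using only the standard time package (the function names and sample timestamps are illustrative, not taken from Mongoeye):

```go
package main

import (
	"fmt"
	"time"
)

// weekdayHistogram counts timestamps per day of week in the given location.
// Index 0 is Sunday, matching both time.Weekday and the output described above.
func weekdayHistogram(times []time.Time, loc *time.Location) [7]int {
	var hist [7]int
	for _, t := range times {
		hist[t.In(loc).Weekday()]++
	}
	return hist
}

// demo builds two sample timestamps and counts them in UTC and in UTC+2.
func demo() (inUTC, inPlus2 [7]int) {
	times := []time.Time{
		time.Date(2017, 4, 9, 23, 30, 0, 0, time.UTC),   // Sunday 23:30 UTC
		time.Date(2017, 4, 10, 14, 57, 17, 0, time.UTC), // Monday 14:57 UTC
	}
	// A fixed UTC+2 offset stands in for a named zone such as Europe/Berlin.
	plus2 := time.FixedZone("UTC+2", 2*60*60)
	return weekdayHistogram(times, time.UTC), weekdayHistogram(times, plus2)
}

func main() {
	inUTC, inPlus2 := demo()
	fmt.Println(inUTC)   // [1 1 0 0 0 0 0]
	fmt.Println(inPlus2) // [0 2 0 0 0 0 0]
}
```

The Sunday-evening timestamp moves to Monday once it is viewed two hours east of UTC, which is why the two histograms differ.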

Hour histogram

Use the flag --hour-hist or -H to generate an hour histogram.

The hour is determined using the time zone from the --timezone flag (default: local).

The first value is for the interval [ 00, 01 ), the last for the interval [ 23, 24 ).

Example result:

    hourHistogram: [47, 73, 18, 26, 30, 46, 91, 13, 28, 11, 52, 99, 76, 25, 94, 51, 87, 86, 19, 22, 11, 62, 28, 47]

Scope of analysis

The scope of analysis is defined by the following options.

The --match option is applied first:

  • selects documents for the analysis using the $match aggregation stage
  • the value is a string in JSON format
  • suitable for including or excluding documents from the analysis
  • by default (if the argument is not present), all documents are included

The --sample option is applied second:

  • determines the sampling method using the $sort, $limit and $sample aggregation stages
  • valid values are: all, first:N, last:N, random:N, where N > 1
  • the default value is random:1000

The --project option is applied third:

  • modifies each document before the analysis using the $project aggregation stage
  • the value is a string in JSON format
  • suitable for including or excluding fields from the analysis
  • not applied by default (if the argument is not present)

*Note: Be sure to escape JSON options correctly, e.g. --project "{\"Field\": 0}".*
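
When Mongoeye is started from another program, one way to avoid shell escaping altogether is to build the argument list directly and let a JSON encoder produce the option values. The sketch below only assembles and prints the arguments; the helper buildArgs, the host, the database and collection names, and the filter contents are all hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildArgs assembles the argument list for a mongoeye invocation following
// the usage line "mongoeye [host] database collection [flags]".
// match and project may be nil; otherwise they are encoded as JSON strings.
func buildArgs(host, db, col string, match, project interface{}, sample string) ([]string, error) {
	args := []string{host, db, col}
	if match != nil {
		b, err := json.Marshal(match)
		if err != nil {
			return nil, err
		}
		args = append(args, "--match", string(b))
	}
	if sample != "" {
		args = append(args, "--sample", sample)
	}
	if project != nil {
		b, err := json.Marshal(project)
		if err != nil {
			return nil, err
		}
		args = append(args, "--project", string(b))
	}
	return args, nil
}

func main() {
	// Hypothetical filter and projection; the field names are illustrative.
	match := map[string]interface{}{"rating": map[string]interface{}{"$gt": 3}}
	project := map[string]int{"Field": 0}

	args, err := buildArgs("localhost:27017", "company", "users", match, project, "first:500")
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	// The slice can be handed to exec.Command("mongoeye", args...) unchanged:
	// no shell is involved, so the JSON needs no extra escaping.
	fmt.Printf("%q\n", args)
}
```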

List of flags and options

Connection options

        --host                 mongodb host (default "localhost:27017")
        --connection-mode      connection mode (default "SecondaryPreferred")
        --connection-timeout   connection timeout (default 5)
        --socket-timeout       socket timeout (default 300)
        --sync-timeout         sync timeout (default 300)

Authentication

    -u, --user        username for authentication (default "admin")
    -p, --password    password for authentication
        --auth-db     auth database (default: same as the working db)
        --auth-mech   auth mechanism

Input options

        --db        database for analysis
        --col       collection for analysis
        --match     filter documents before analysis (json, $match aggregation)
    -s, --sample    all, first:N, last:N, random:N (default "random:1000")
        --project   filter/project fields before analysis (json, $project aggregation)
    -d, --depth     max depth in nested documents (default 2)

Output options

        --full                all available analyzes
    -v, --value               get min, max, avg value
    -l, --length              get min, max, avg length
    -V, --value-hist          get value histogram
        --value-hist-steps    max steps of value histogram >=3 (default 100)
    -L, --length-hist         get length histogram
        --length-hist-steps   max steps of length histogram >=3 (default 100)
    -W, --weekday-hist        get weekday histogram for dates
    -H, --hour-hist           get hour histogram for dates
        --count-unique        get count of unique values
        --most-freq           get the N most frequent values
        --least-freq          get the N least frequent values
    -f, --format              output format: table, json, yaml (default "table")
    -F, --file                path to the output file

Other options

    -t, --timezone            timezone, eg. UTC, Europe/Berlin (default "local")
        --use-aggregation     analyze with aggregation framework (mongodb 3.5.10+)
        --string-max-length   max string length (default 100)
        --array-max-length    analyze only first N array elements (default 20)
        --concurrency         number of local processes (default 0 = auto)
        --buffer              size of the buffer between local stages (default 5000)
        --batch               size of batch from database (default 500)
        --no-color            disable color output
        --version             show version
    -h, --help                show this help

Environment variables

Environment variables can also be used for configuration.

The names of the environment variables have the MONGOEYE_ prefix and match the flags.

Instead of the --count-unique flag, for example, you can use export MONGOEYE_COUNT-UNIQUE=true.
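
The naming rule can be expressed as a small helper. This is a sketch under an assumption: the rule (strip the leading dashes, upper-case the rest, add the prefix) is inferred from the single example above, so names for other flags, such as the MONGOEYE_SAMPLE line printed below, are unverified. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// envName derives the environment variable for a flag: leading dashes are
// dropped, the rest is upper-cased and prefixed with MONGOEYE_. The rule is
// inferred from the single example above and is an assumption.
func envName(flag string) string {
	return "MONGOEYE_" + strings.ToUpper(strings.TrimLeft(flag, "-"))
}

// envEntry formats one NAME=value entry, the form used by os.Environ and
// by the Env field of exec.Cmd.
func envEntry(flag, value string) string {
	return envName(flag) + "=" + value
}

func main() {
	fmt.Println(envEntry("--count-unique", "true"))
	fmt.Println(envEntry("--sample", "first:500"))
}
```

Entries in this form can be appended to the Env field of an exec.Cmd when Mongoeye is launched from a Go program.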

TODO

  • Create a shared library for integration into other languages (Python, Node.js, …)
  • Selection of fields for analysis (include and exclude list)
  • TLS/SSL support
  • Create a web interface.

Donation

If this tool is useful to you, feel free to support its further development.

paypal

License

Mongoeye is under the GPL-3.0 license. See the LICENSE file for details.


AMDG