generate tons of random data based on custom definitions in parallel
A tool that generates JSON or CSV files based on generic definitions.
Example of a definition:
{
"number": ["number", 100],
"string": ["string", 3],
"string-with-range": ["string-range", 3, 8],
"string-with-placeholder": ["placeholder", "hello-*-test", 100],
"random-from-collection": ["take-random", ["first", "second", "third", "fourth"]]
}
Example of one of the items generated from this definition:
in json:
{
"number": 47,
"string": "cVU",
"string-with-range": "Bq2jO4t",
"string-with-placeholder": "hello-96-test",
"random-from-collection": "second"
}
in csv (columns follow the lexicographical order of the keys in the definition hashmap):
81 third AXq hello-69-test Z05jhpup
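The column ordering can be illustrated with a short Python sketch. It is not the tool's own code; it only shows that sorting the keys of the example definition lexicographically yields the column order of the row above.

```python
# Keys of the example definition, in the order they were written.
definition_keys = [
    "number",
    "string",
    "string-with-range",
    "string-with-placeholder",
    "random-from-collection",
]

# Sorting the keys lexicographically gives the CSV column order.
columns = sorted(definition_keys)

# The example row from above, paired with the sorted column names.
row = ["81", "third", "AXq", "hello-69-test", "Z05jhpup"]
record = dict(zip(columns, row))

for name, value in record.items():
    print(f"{name}: {value}")
```

Read this way, 81 is the number, third is the random-from-collection value, and Z05jhpup is the string-with-range value.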
This tool was developed while evaluating different storage options for a completely unrelated project. I needed to generate a large amount of data to compare query times on big data sets.
If you have lein, you can run the app without building a jar file. Use the ./bin/generate script with parameters.
Otherwise, you can use an already built jar file in the build folder. Run java -jar data-gen and pass the supported params.
Generate 10 files, each of size ~5 GB, based on the “employee” definition from the “some-definitions.json” file, and put the results into the ./output folder:
java -jar data-gen.jar -f some-definitions.json -d employee -n 10 -s 5000 -o json -u ./output
Generator value types and parameters are specified as tuples in a definition.
Example of a definition file that contains one main definition:
{
"main": {
"number": ["number", 100],
"string": ["string", 3],
"string-with-range": ["string-range", 3, 8],
"string-with-placeholder": ["placeholder", "hello-*-test", 100],
"random-from-collection": ["take-random", ["first", "second", "third", "fourth"]]
}
}
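To make the file layout concrete, here is a minimal Python sketch of how a named definition could be looked up in such a file. The helper select_definition is hypothetical and only illustrates the name-to-definition mapping; it is not how the tool itself reads the file.

```python
import json

# Contents of a definition file: a JSON object mapping names to definitions.
definitions_text = """
{
  "main": {
    "number": ["number", 100],
    "string": ["string", 3],
    "string-with-range": ["string-range", 3, 8],
    "string-with-placeholder": ["placeholder", "hello-*-test", 100],
    "random-from-collection": ["take-random", ["first", "second", "third", "fourth"]]
  }
}
"""


def select_definition(text, name):
    """Hypothetical helper: parse the file contents and return one named definition."""
    definitions = json.loads(text)
    if name not in definitions:
        available = ", ".join(sorted(definitions))
        raise KeyError(f"no definition named {name!r}; available: {available}")
    return definitions[name]


main = select_definition(definitions_text, "main")
print(main["string-with-range"])
```

A file can presumably hold several definitions side by side under different names, with one picked by name per run.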
Types:
["number", inclusive-upper-boundary]
["string", length-of-random-string]
["string-range", min-chars-in-string, max-chars-in-string(inclusive)]
["placeholder", string-with-single-star-placeholder, max-placeholder(inclusive)]
["take-random", array-with-items-from-which-to-pick]
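The semantics of these tuples can be sketched in Python as follows. This is an illustrative reading of the list above, not the tool's implementation. Several details are assumptions: numbers start at 0, random strings use ASCII letters and digits, and the placeholder star is replaced by an integer between 0 and the maximum. The function names are hypothetical.

```python
import random
import string

# Assumption: random strings are drawn from ASCII letters and digits.
ALPHABET = string.ascii_letters + string.digits


def random_string(length):
    """Return a random string of exactly `length` characters."""
    return "".join(random.choice(ALPHABET) for _ in range(length))


def generate_value(spec):
    """Produce one random value for a [type, params...] tuple."""
    kind, *params = spec
    if kind == "number":
        (upper,) = params
        return random.randint(0, upper)  # assumption: lower bound is 0
    if kind == "string":
        (length,) = params
        return random_string(length)
    if kind == "string-range":
        low, high = params
        return random_string(random.randint(low, high))  # both ends inclusive
    if kind == "placeholder":
        template, upper = params
        # Assumption: the single star becomes an integer in 0..upper.
        return template.replace("*", str(random.randint(0, upper)), 1)
    if kind == "take-random":
        (items,) = params
        return random.choice(items)
    raise ValueError(f"unknown generator type: {kind!r}")


def generate_item(definition):
    """Generate one item by evaluating every key of a definition."""
    return {key: generate_value(spec) for key, spec in definition.items()}


definition = {
    "number": ["number", 100],
    "string": ["string", 3],
    "string-with-range": ["string-range", 3, 8],
    "string-with-placeholder": ["placeholder", "hello-*-test", 100],
    "random-from-collection": ["take-random", ["first", "second", "third", "fourth"]],
}

item = generate_item(definition)
print(item)
```

Each call to generate_item yields one record shaped like the JSON example item shown earlier, with different random values each time.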