Project author: artem0

Project description:
System benchmarks over JVM with JMH - SIMD (superscalar processing), Branch prediction, False sharing.
Language: Java
Repository: git://github.com/artem0/benchmarking.git
Created: 2017-08-24T15:46:52Z
Project page: https://github.com/artem0/benchmarking

License: GNU General Public License v3.0


Benchmarks with JMH

This project contains the following benchmarks:

  1. A demonstration of how the JVM mitigates false sharing
    with the @Contended annotation
  2. Branch prediction, illustrated by processing a sorted vs. an unsorted array
  3. SIMD showcase: a loop with increment operators

Prerequisites

  • Java 8+
  • Gradle

Branch prediction

The branch prediction benchmarks demonstrate the phenomenon in which a sorted array is processed faster than an unsorted one.
See this wonderful discussion on Stack Overflow
for more details.

@OperationsPerInvocation declares how many abstract units of work a single benchmark invocation performs, which lets JMH adjust the scores accordingly.
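The effect can be sketched in plain Java without the JMH harness. The class name, array size, threshold, and crude timing loop below are assumptions for illustration, not the repository's code. Dividing the elapsed time by the number of processed elements is the same kind of normalization that @OperationsPerInvocation asks JMH to apply. Depending on the JIT and CPU, the branch may be compiled to a conditional move, in which case the gap can shrink or disappear.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch: names and sizes are assumptions, not the repository's code.
class BranchPredictionSketch {
    static final int SIZE = 32_768;    // assumed array length
    static final int THRESHOLD = 128;  // assumed branch threshold
    static volatile long sink;         // keeps results alive so the JIT cannot drop the loop

    // Sums only the elements >= THRESHOLD; the 'if' is the branch under test.
    static long sumAboveThreshold(int[] data) {
        long sum = 0;
        for (int value : data) {
            if (value >= THRESHOLD) {
                sum += value;
            }
        }
        return sum;
    }

    // Random values in [0, 256), so the branch is taken about half the time.
    static int[] randomData(long seed) {
        Random random = new Random(seed);
        int[] data = new int[SIZE];
        for (int i = 0; i < SIZE; i++) {
            data[i] = random.nextInt(256);
        }
        return data;
    }

    // Crude timing: total nanoseconds divided by the number of processed elements.
    static double nanosPerElement(int[] data, int rounds) {
        long total = 0;
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            total += sumAboveThreshold(data);
        }
        long elapsed = System.nanoTime() - start;
        sink = total;
        return (double) elapsed / ((double) rounds * data.length);
    }

    public static void main(String[] args) {
        int[] unsorted = randomData(42L);
        int[] sorted = unsorted.clone();
        Arrays.sort(sorted);
        nanosPerElement(sorted, 200);   // warm-up so the JIT compiles the loop
        nanosPerElement(unsorted, 200);
        System.out.printf("sorted:   %.3f ns/element%n", nanosPerElement(sorted, 1_000));
        System.out.printf("unsorted: %.3f ns/element%n", nanosPerElement(unsorted, 1_000));
    }
}
```

A hand-rolled loop like this is far less reliable than JMH (no forks, no statistical treatment), so it is only meant to show the shape of the measured code.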

The benchmark generates the following output for me:

  Benchmark                           Mode  Cnt  Score   Error  Units
  BranchPredictionBenchmark.sorted    avgt   25  4.289 ± 0.754  ns/op
  BranchPredictionBenchmark.unsorted  avgt   25  9.742 ± 0.466  ns/op

The AverageTime mode is used here, so a lower value is better.

For more details about the modes in JMH, see the
specification.

False sharing benchmark

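The -XX:-RestrictContended flag is required for the @Contended annotation to take effect in application classes; by default the JVM honors it only in trusted JDK classes.
JOL can be very helpful for analyzing the object layout in the JVM and
understanding how the JVM allocates memory for objects.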

The benchmark compares the increment operation on a padded class, which prevents false sharing by filling the extra
space in the cache line, with the same operation on an unpadded class, which has no such filler.
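The padded vs. unpadded comparison can be sketched as follows. This sketch deliberately swaps @Contended for manual padding with unused long fields, so that it runs without extra JVM flags. It assumes 64-byte cache lines and that the JVM keeps the fields in declaration order, neither of which is guaranteed. All class and method names are hypothetical and not taken from the repository.

```java
// Hypothetical sketch: manual padding stands in for @Contended.
class FalseSharingSketch {
    // Two hot fields that will most likely end up in the same cache line.
    static final class Unpadded {
        volatile long a;
        volatile long b;
    }

    // Seven unused longs (56 bytes) put b 64 bytes after a,
    // assuming the JVM lays the fields out in declaration order.
    static final class Padded {
        volatile long a;
        long p1, p2, p3, p4, p5, p6, p7;
        volatile long b;
    }

    // Runs both tasks on their own threads and returns the elapsed nanoseconds.
    static long timePair(Runnable first, Runnable second) {
        Thread t1 = new Thread(first);
        Thread t2 = new Thread(second);
        long start = System.nanoTime();
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return System.nanoTime() - start;
    }

    // Each field has exactly one writer thread, so the final counts are exact.
    static long timeUnpadded(Unpadded c, long iterations) {
        return timePair(
            () -> { for (long i = 0; i < iterations; i++) c.a++; },
            () -> { for (long i = 0; i < iterations; i++) c.b++; });
    }

    static long timePadded(Padded c, long iterations) {
        return timePair(
            () -> { for (long i = 0; i < iterations; i++) c.a++; },
            () -> { for (long i = 0; i < iterations; i++) c.b++; });
    }

    public static void main(String[] args) {
        long iterations = 50_000_000L; // assumed workload size
        System.out.printf("unpadded: %d ms%n", timeUnpadded(new Unpadded(), iterations) / 1_000_000);
        System.out.printf("padded:   %d ms%n", timePadded(new Padded(), iterations) / 1_000_000);
    }
}
```

Manual padding is fragile because the JVM is free to reorder fields, which is one reason to prefer @Contended together with -XX:-RestrictContended in real measurements.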

The output for both cases:

  Benchmark                                  Mode  Cnt  Score           Error        Units
  ContendedBenchmarks.padded                 thrpt  30  236434399.252 ± 9209297.004  ops/s
  ContendedBenchmarks.padded:updatePaddedA   thrpt  30  114409851.688 ± 4803042.336  ops/s
  ContendedBenchmarks.padded:updatePaddedB   thrpt  30  122024547.564 ± 5108447.473  ops/s
  ContendedBenchmarks.unpadded               thrpt  30   63941093.638 ± 2478065.296  ops/s
  ContendedBenchmarks.unpadded:updateUnpaddedA thrpt 30  32050855.915 ± 1371581.512  ops/s
  ContendedBenchmarks.unpadded:updateUnpaddedB thrpt 30  31890237.723 ± 1344691.282  ops/s

The Throughput mode is used here, so a higher value is better.
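Because the two modes point in opposite directions, the speedup is computed differently for each. The small helper below is a hypothetical illustration (its class and method names are not part of JMH or of this project) applied to the scores reported in this README, ignoring the error margins.

```java
// Hypothetical helper for comparing JMH scores; not part of JMH itself.
class ScoreRatioSketch {
    // Throughput (ops/s): higher is better, so speedup = faster / slower.
    static double throughputSpeedup(double fasterOpsPerSec, double slowerOpsPerSec) {
        return fasterOpsPerSec / slowerOpsPerSec;
    }

    // AverageTime (ns/op): lower is better, so speedup = slower / faster.
    static double averageTimeSpeedup(double slowerNsPerOp, double fasterNsPerOp) {
        return slowerNsPerOp / fasterNsPerOp;
    }

    public static void main(String[] args) {
        // Scores copied from the benchmark outputs in this README.
        double padded = 236434399.252;   // ops/s
        double unpadded = 63941093.638;  // ops/s
        double sorted = 4.289;           // ns/op
        double unsorted = 9.742;         // ns/op

        // Padded comes out about 3.7 times faster than unpadded.
        System.out.printf("padded vs unpadded: %.2fx%n", throughputSpeedup(padded, unpadded));
        // Sorted comes out about 2.27 times faster than unsorted.
        System.out.printf("sorted vs unsorted: %.2fx%n", averageTimeSpeedup(unsorted, sorted));
    }
}
```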

For more details about false sharing in Java, see my article (@rukavitsya/what-is-false-sharing-and-how-jvm-prevents-it-82a4ed27da84).

SIMD benchmark

SIMD (single instruction, multiple data) is a class of parallel computers in Flynn’s taxonomy.
It describes computers with multiple processing elements that perform the same operation on multiple data points simultaneously.

This benchmark compares incrementing the values of an array with and without SIMD.

The JVM uses the -XX:+UseSuperWord flag to transform scalar operations into superword operations.
This option is enabled by default.
For the benchmark that uses SIMD, either specify -XX:+UseSuperWord in jvmArgsAppend via the @Fork annotation
or leave it blank; to disable SIMD for comparison purposes, use the flag -XX:-UseSuperWord.
Only the Java HotSpot Server VM supports this option.
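The kind of loop this flag affects can be sketched in plain Java, outside JMH. The class name, array size, and timing loop are assumptions for illustration. To compare, compile the class once and run it as `java -XX:+UseSuperWord SimdIncrementSketch` and then as `java -XX:-UseSuperWord SimdIncrementSketch`.

```java
// Hypothetical sketch of an increment loop that HotSpot may auto-vectorize.
class SimdIncrementSketch {
    static final int SIZE = 1_024; // assumed array length

    // A simple counted loop over an int[]: a typical candidate for superword transformation.
    static void increment(int[] values) {
        for (int i = 0; i < values.length; i++) {
            values[i]++;
        }
    }

    // Crude timing: average nanoseconds per call of increment().
    static double nanosPerCall(int[] values, int rounds) {
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            increment(values);
        }
        return (double) (System.nanoTime() - start) / rounds;
    }

    public static void main(String[] args) {
        int[] values = new int[SIZE];
        nanosPerCall(values, 100_000); // warm-up so the JIT compiles the loop
        System.out.printf("increment: %.3f ns/call%n", nanosPerCall(values, 1_000_000));
        System.out.println("values[0] = " + values[0]); // use the data so the work is not dead code
    }
}
```

Whether the loop is actually vectorized depends on the JVM version and the CPU, so the numbers from such a sketch are only indicative.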

SIMD increment benchmark:

With SIMD:

  Iteration 1: 258.638 ns/op
  Iteration 2: 257.273 ns/op
  Iteration 3: 260.226 ns/op
  Iteration 4: 268.770 ns/op
  Iteration 5: 255.863 ns/op
  Iteration 6: 256.047 ns/op
  Iteration 7: 259.685 ns/op
  Iteration 8: 261.838 ns/op
  Iteration 9: 273.019 ns/op
  Iteration 10: 265.190 ns/op
  Result "increment":
    261.655 ±(99.9%) 8.604 ns/op [Average]
    (min, avg, max) = (255.863, 261.655, 273.019), stdev = 5.691
    CI (99.9%): [253.050, 270.259] (assumes normal distribution)

Without SIMD:

  Iteration 1: 981.102 ns/op
  Iteration 2: 1024.397 ns/op
  Iteration 3: 1010.879 ns/op
  Iteration 4: 1025.998 ns/op
  Iteration 5: 980.072 ns/op
  Iteration 6: 1003.025 ns/op
  Iteration 7: 998.950 ns/op
  Iteration 8: 1026.248 ns/op
  Iteration 9: 991.682 ns/op
  Iteration 10: 972.846 ns/op
  Result "increment":
    1001.520 ±(99.9%) 30.348 ns/op [Average]
    (min, avg, max) = (972.846, 1001.520, 1026.248), stdev = 20.073
    CI (99.9%): [971.172, 1031.868] (assumes normal distribution)

The AverageTime mode is used here, so a lower value is better.

Running

To launch the main method of a benchmark with Gradle (the class is specified via the main parameter of the runMain task):

gradle runMain

You can build a *.jar file with the gradle jar command; the main class is specified via the Main-Class parameter.

After generating the *.jar file, you can launch the app with the following command:

java -jar build/libs/bencmarking-1.0.jar

Support in IntelliJ IDEA

To launch the benchmarks in IntelliJ IDEA, enable annotation processing:

Go to Settings -> Compiler -> Annotation Processors -> Enable annotation processing,
check Processor path and enter the path of the exported .jar file.

License: GNU General Public License v3.0