项目作者: YanzheL

项目描述 :
Matrix multiplication optimization
高级语言: C
项目地址: git://github.com/YanzheL/matmul_demo.git
创建时间: 2018-12-03T06:17:40Z
项目社区:https://github.com/YanzheL/matmul_demo

开源协议:

下载


Description

12 different implementations of matrix multiplication

Basic versions

Function Name Description
mm1_ijk simple ijk version
mm1_ikj simple ikj version
mm1_kji simple kji version
mm1_ikj_v vectorized version of mm1_ikj, unroll loop by length 8
mm2 aviod 1 load and 1 store in inner iteration
mm2_v vectorized mm2, unroll loop by length 8
mm4 2 by 2 mini matrix multiplications in registers, n must be multiple of 2
mm4_cache_block_ijk Cache friendly block multiplication, n must be multiple of BSIZE (32)
mm4_cache_block_ikj ikj version of mm4_cache_block, n must be multiple of BSIZE (32)

V Strassen Algorithm

Function Name Description
v_strassen V Strassen Core
mm5_vstrassen_ijk ijk version
mm5_vstrassen_ikj Ikj version
mm5_vstrassen_ikj_v partially vectorized Ikj version

V Strassen Core

  1. /*
  2. [Illustration]
  3. Draft Zone NxN Matrix A NxN Matrix B NxN Matrix C
  4. nmx := next mx
  5. --------------------- --------------------- --------------------- ---------------------
  6. | | | | | | | | | | | |
  7. | mx | nmx | | a00 | a01 | | b00 | b01 | | c00 | c01 |
  8. | | | | | | | | | | | |
  9. --------------------- --------------------- X --------------------- = ---------------------
  10. | | | | | | | | | | | |
  11. | azone | bzone | | a10 | a11 | | b10 | b11 | | c10 | c11 |
  12. | | | | | | | | | | | |
  13. --------------------- --------------------- --------------------- ---------------------
  14. m1 = (a00+a11)*(b00+b11) c00 = m1 + m4 - m5 + m7
  15. m2 = (a10+a11)*b00 c01 = m3 + m5
  16. m3 = a00*(b01-b11) c10 = m2 + m4
  17. m4 = a11*(b10-b00) c11 = m1 + m3 - m2 + m6
  18. m5 = (a00+a01)*b11
  19. m6 = (a10-a00)*(b00+b01)
  20. m7 = (a01-a11)*(b10+b11)
  21. [Params]
  22. Inputs:
  23. a00 := pointer of the element at matrix A's UP-LEFT corner
  24. b00 := pointer of the element at matrix B's UP-LEFT corner
  25. n := the size of current square window of operation
  26. nn := the size of original matrix before first recursion, this value shouldn't be changed during recursion
  27. mx := the allocated draft zone provided by caller, should be initialized to 0 before first recursion
  28. length = nn * nn, size = sizeof(double) * nn * nn
  29. Outputs:
  30. c00 := pointer of the element at result matrix C's UP-LEFT corner
  31. */

Get Started

Build the source

  1. git clone https://github.com/YanzheL/matmul_demo.git
  2. cd matmul_demo
  3. mkdir cmake-build-debug && cd cmake-build-debug
  4. cmake ..
  5. make

Run

  1. ./matmul_demo

You can modify main.c to change test cases. (e.g. n range)

Test Results

Platform Info

OS CPU Memory Compiler
Ubuntu 18.04 Intel Core i7-8700, 6 Core, 4.6 GHz, L1=384 KiB, L2=1.5 MiB, L3=15 MiB DDR4 16GB 3200MHz Clang-8.0

Test cases

Use mm1_ijk version as standard result to test other implementation’s correctness. If the multiplication result is incorrect, it will output an ERROR message and Mean Square Error (MSE) of the corresponding mm function.

More benchmark results are located at benchmark directory

  1. With compiler optimization flag -O3, running each test case one-by-one (multithread disabled)

    test_suite_2048_O3

  2. Without compiler optimization flag, running each test case one-by-one (multithread disabled)

    test_suite_2048_O0.png