Martin Kroeker
4cf7315a5d
Adjust ARMV8 SGEMM unrolling when using the C fallback kernel_2x2 for IOS
7 years ago
Arjan van de Ven
6eb4b9ae7c
Tune HASWELL SWITCH_RATIO as well
Similar to the SKYLAKEX patch, 32 seems to work best
(much better than 4 or 16)
Before (4)
Matrix      SGEMM cycles   MPC           DGEMM cycles   MPC
48 x 48          15554.3   7.2     0.2%       30353.8   3.7     0.3%
64 x 64          30346.8   8.7     1.6%       63495.0   4.1    -0.1%
65 x 65          81668.1   3.4  -123.3%       82705.2   3.3   -21.2%
80 x 80         105045.9   4.9   -95.5%      115226.0   4.5    -2.2%
96 x 96         152461.2   5.8   -74.3%      148156.3   6.0    16.4%
112 x 112       188505.2   7.5   -42.2%      171187.3   8.2    36.4%
128 x 128       257884.0   8.1   -39.5%      224764.8   9.3    46.0%
Intermediate (16)
Matrix      SGEMM cycles   MPC           DGEMM cycles   MPC
48 x 48          15565.7   7.2     0.2%       30378.9   3.7     0.2%
64 x 64          30430.2   8.7     1.3%       63046.4   4.2     0.6%
65 x 65          27306.0  10.1    25.3%       38879.2   7.1    43.0%
80 x 80          51008.7  10.1     5.1%       61007.6   8.4    45.9%
96 x 96          70856.7  12.5    19.0%       83403.1  10.6    53.0%
112 x 112        84769.9  16.6    36.0%       99920.1  14.1    62.9%
128 x 128        84213.2  25.0    54.5%      113024.2  18.6    72.8%
After (32)
Matrix      SGEMM cycles   MPC           DGEMM cycles   MPC
48 x 48          15537.3   7.2     0.3%       30537.0   3.6    -0.3%
64 x 64          30352.7   8.7     1.6%       62597.8   4.2     1.3%
65 x 65          36857.0   7.5    -0.8%       56167.6   4.9    17.7%
80 x 80          42552.6  12.1    20.8%       69536.7   7.4    38.3%
96 x 96          52101.5  17.1    40.5%       91016.1   9.7    48.7%
112 x 112        63853.7  22.1    51.8%      110507.4  12.7    58.9%
128 x 128        73966.1  28.4    60.0%      163146.4  12.9    60.8%
7 years ago
Arjan van de Ven
5c6f008365
Tune param.h for SkylakeX
param.h defines a per-platform SWITCH_RATIO, which is used as a measure of how
fine-grained the blocks for gemm need to be when the work is split up. Many platforms define this as 4.
The reality is that the gemm low-level implementation for SkylakeX likes bigger blocks
due to the nature of SIMD. Tuning SWITCH_RATIO to 32 improves threading performance
significantly:
Before
Matrix      SGEMM cycles   MPC           DGEMM cycles   MPC
48 x 48          10756.0  10.5    -0.5%       18296.7   6.1    -1.7%
64 x 64          20490.0  12.9     1.4%       40615.0   6.5     0.0%
65 x 65          83528.3   3.3  -210.9%       96319.0   2.9   -83.3%
80 x 80         101453.5   5.1  -166.3%      128021.7   4.0   -76.6%
96 x 96         149795.1   5.9  -143.1%      168059.4   5.3   -47.4%
112 x 112       191481.2   7.3  -105.8%      204165.0   6.9   -14.6%
128 x 128       265019.2   7.9   -99.0%      272006.4   7.7    -5.3%
After
Matrix      SGEMM cycles   MPC           DGEMM cycles   MPC
48 x 48          10666.3  10.6     0.4%       18236.9   6.2    -1.4%
64 x 64          20410.1  13.0     1.8%       39925.8   6.6     1.7%
65 x 65          34983.0   7.9   -30.2%       51494.6   5.4     2.0%
80 x 80          39769.1  13.0    -4.4%       63805.2   8.1    12.0%
96 x 96          45169.6  19.7    26.7%       80065.8  11.1    29.8%
112 x 112        57026.1  24.7    38.7%       99535.5  14.2    44.1%
128 x 128        64789.8  32.5    51.3%      117407.2  17.9    54.6%
With this change, threading starts to be a win already at 96x96
7 years ago
Arjan van de Ven
99c7bba8e4
Initial support for SkylakeX / AVX512
This patch adds the basic infrastructure for adding the SkylakeX (Intel Skylake server)
target. The SkylakeX target will use the AVX512 (AVX512VL level) instruction set,
which brings 2 basic things:
1) 512 bit wide SIMD (2x width of AVX2)
2) 32 SIMD registers (2x the number on AVX2)
This initial patch only contains a trivial transformation of the Haswell SGEMM kernel
to AVX512VL; more will follow later but this patch aims to get the infrastructure
in place for this "later".
Full performance tuning has not been done yet; with more registers and wider SIMD
it's in theory possible to retune the kernels but even without that there's an
interesting enough performance increase (30-40% range) with just this change.
7 years ago
Martin Kroeker
d94d7baf7e
Add mips32r2 api target
7 years ago
Shivraj Patil
e3d844b062
Added mips I6500 core
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
8 years ago
Gian-Carlo Pascutto
832a272784
Revert Zen param.h to Haswell values (instead of Excavator).
8 years ago
Denis Steckelmacher
c9ff735da6
Add ZEN support (tested for auto-detected static backend)
8 years ago
Martin Kroeker
cd135e2b59
Merge pull request #1130 from quickwritereader/develop
Blas 3 for single precision
8 years ago
Abdurrauf
08786c4b95
strmm and ctrmm
8 years ago
Abdurrauf
82e80fa82b
initial strmm(sgemm). not tuned yet
8 years ago
Martin Kroeker
ffc1d6c468
Merge pull request #1108 from ashwinyes/develop_20170203_thunderx2t99
Optimized Implementations for ThunderX2T99
8 years ago
Ashwin Sekhar T K
19ba133383
THUNDERX2T99: Add Optimized ZGEMM Implementation
8 years ago
Abdurrauf
0d96b0e2a7
Merge branch 'z13' into develop
8 years ago
Abdurrauf
848cb27b1e
ztrmm kernel.
8 years ago
Ashwin Sekhar T K
2757b49767
THUNDERX2T99: Add Optimized CGEMM Implementation
8 years ago
Ashwin Sekhar T K
f279ff4789
THUNDERX2T99: Add Optimized SGEMM Implementation
8 years ago
Ashwin Sekhar T K
4b55fae337
ARM64: Add Cavium THUNDERX2T99 Target
8 years ago
Andrew Pinski
fb200c7245
ARM64: Add Cavium THUNDERX Target
8 years ago
Ashwin Sekhar T K
4713e7c47f
ARM64: Add the VULCAN Target
9 years ago
Zhang Xianyi
b678471d65
Merge branch 'z13' into develop
Conflicts:
CONTRIBUTORS.md
9 years ago
Abdurrauf
6418667818
dtrmm and dgemm for z13
9 years ago
Shivraj Patil
9687437928
MIPS n32 ABI and build time mips simd support check
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
9 years ago
Shivraj Patil
d1c6469283
MIPS n32 ABI support, MSA support detection and rename ARCH, ARCHFLAGS
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
9 years ago
Shivraj Patil
beb1d076a4
Added MSA optimization for GEMV_N, GEMV_T, ASUM, DOT functions
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
9 years ago
Zhang Xianyi
8a592ee386
Merge pull request #924 from ashwinyes/develop_aarch64_improvements_20160714
Improvements to Aarch64 kernels
9 years ago
Ashwin Sekhar T K
0a5ff9f9f9
Improvements to TRMM and GEMM kernels
9 years ago
Shivraj Patil
57df7956ee
Added CGEMM, ZGEMM, STRMM, DTRMM, CTRMM, ZTRMM. Updated macros in SGEMM, DGEMM, STRMM.
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
9 years ago
Shivraj Patil
c4ba40e308
SGEMM optimization for MIPS P5600 and I6400 using MSA. Unrolled k loop in DGEMM kernel function
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
9 years ago
Werner Saar
88011f625d
Merge pull request #876 from wernsaar/develop
optimized dgemm on power8 for 20 threads
9 years ago
Werner Saar
8310d4d3f7
optimized dgemm for 20 threads
9 years ago
Shivraj Patil
085cf236c2
conflict resolved by syncing with 'xianyi:develop'
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
9 years ago
Shivraj Patil
b7b3d8ec8e
DGEMM optimization for MIPS P5600 and I6400 using MSA
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
9 years ago
Zhang Xianyi
cd7af5260a
Merge pull request #847 from sva-img/develop
MIPS P5600(32 bit) and I6400(64 bit) cores support added.
9 years ago
Werner Saar
782f75ba94
optimized param.h for POWER8
9 years ago
Werner Saar
0d0c6f7d7d
optimized dgemm for POWER8
9 years ago
Werner Saar
40ac64ae4f
updated param.h for EXCAVATOR
9 years ago
Werner Saar
089aad57f7
updated param.h for POWER8
9 years ago
Werner Saar
879a51165f
Optimized zgemm and tested zgemm again
9 years ago
Shivraj Patil
2c3dfe2bf3
MIPS P5600(32 bit) and I6400(64 bit) cores support added.
Separated mips and mips64 files.
Configurations support for mips 32 bit.
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
9 years ago
Werner Saar
3c6294ca3d
added optimized sgemm_tcopy for power8
9 years ago
Zhang Xianyi
dd43661cfd
Init IBM z system (s390x) porting.
9 years ago
Werner Saar
e173c51c04
updated zgemm- and ztrmm-kernel for POWER8
9 years ago
Werner Saar
9c42f0374a
Updated cgemm- and sgemm-kernel for POWER8 SMP
9 years ago
Werner Saar
a51102e9b7
bugfixes for sgemm- and cgemm-kernel
9 years ago
Werner Saar
c5b1fbcb2e
updated optimized cgemm- and ctrmm-kernel for POWER8
9 years ago
Werner Saar
6a9bbfc227
updated sgemm- and strmm-kernel for POWER8
9 years ago
Werner Saar
e1df5a6e23
fixed sgemm- and strmm-kernel
9 years ago
Werner Saar
5c658f8746
add optimized cgemm- and ctrmm-kernel for POWER8
9 years ago
Werner Saar
96284ab295
added sgemm- and strmm-kernel for POWER8
9 years ago