Martin Kroeker
ea8eec5d17
Merge pull request #2422 from wjc404/develop
Adjust SkylakeX GEMM3M parameters, add an AVX512 STRMM kernel and fix performance bugs in AVX2 s/c/z GEMM
5 years ago
Ali Saidi
c623a965f9
Add Neoverse-N1 core
The implementation is a hybrid of the ARMV8 one with some of the
improved TX2 routines along with specifying -march=v8.2-a
5 years ago
wjc404
dd22eb7621
Update cgemm_kernel_8x2_haswell.c
5 years ago
wjc404
2352331e60
Update zgemm_kernel_4x2_haswell.c
5 years ago
wjc404
1b980001dd
Update zgemm_kernel_4x2_haswell.c
5 years ago
wjc404
2515e1152f
Update cgemm_kernel_8x2_haswell.c
5 years ago
Martin Kroeker
ddcbed6690
Merge pull request #2437 from martin-frbg/issue2434
[WIP] Add support for Ampere EMAG8180 ARMV8 cpu
5 years ago
wjc404
903854c168
Add files via upload
5 years ago
wjc404
a2ff577a30
Update KERNEL.ZEN
5 years ago
wjc404
97a32cb0a5
Update KERNEL.HASWELL
5 years ago
Martin Kroeker
07454bf4d5
Add proper defaults for IxMIN/IxMAX kernels
the fallbacks from Makefile.L1 assume a combined source for absolute value and non-absolute (with ifdef USE_ABS) but here we have separate implementations
5 years ago
Martin Kroeker
4046985913
Add proper defaults for IxMIN/IxMAX kernels
the fallbacks from Makefile.L1 assume a combined source for absolute value and non-absolute (with ifdef USE_ABS) but here we have separate implementations
5 years ago
Martin Kroeker
e57b11acca
Add preliminary support for EMAG8180
5 years ago
Martin Kroeker
0b39cf95b0
Fix endianness conditionals
5 years ago
Martin Kroeker
9f39f0a2c3
Specify ismin/ismax assembly kernels for POWER8 directly
to fix utest failure in new ismin test - Makefile.L1 defaults look wrong
5 years ago
Martin Liska
aeea14ee40
Come up with LOAD_AND_COMPARE_TO_MXX macro in iamax_sse.S.
5 years ago
Martin Liska
18bcc36a69
Fix implementation of iamax_sse.S as reported in #2116.
There was a typo in iamax_sse.S where one of the comparisons
was cmpeqps instead of cmpeqss. That misdetected the index
for sequences where the minimum value was 0.
5 years ago
Martin Liska
0e7f43c898
Add missing USE_MIN in kernel/CMakeLists.txt.
5 years ago
wjc404
f566787e6e
Update KERNEL.SKYLAKEX
5 years ago
wjc404
e3368cbf18
AVX512 STRMM kernel
5 years ago
Martin Kroeker
cafdd999b8
Update caxpy_power8.S
5 years ago
Martin Kroeker
92ca92a46c
Update caxpy_power8.S
5 years ago
Martin Kroeker
486c35c5dc
Update icamin_power8.S
5 years ago
Martin Kroeker
5ba3699f41
Update isamin_power8.S
5 years ago
Martin Kroeker
8eefa530cd
Update isamax_power8.S
5 years ago
Martin Kroeker
de40d47edf
Update isamin_power8.S
5 years ago
Martin Kroeker
7c162b8a21
Update isamax_power8.S
5 years ago
Martin Kroeker
0544cbc806
Fix syntax of endianness conditional
5 years ago
Martin Kroeker
120d20731f
Fix syntax of endianness conditional
5 years ago
Martin Kroeker
dc345d84df
Fix syntax of endianness conditional and add gcc version check for workaround
5 years ago
Bart Oldeman
7ea5e07d1c
Fix inline asm in dscal: mark x, x1 as clobbered. Fixes #2408
The leaq instructions in dscal_kernel_inc_8 modify x and x1 so they
must be declared as input/output constraints, otherwise the compiler
may assume the corresponding registers are not modified.
5 years ago
Martin Kroeker
7e5cbb6f35
Fix bad conditional syntax that caused spurious application of USE_TRMM
5 years ago
wjc404
3447d04eaf
Update dgemm_kernel_16x2_skylakex.c
5 years ago
wjc404
8b5cdcc64c
Update sgemm_kernel_8x4_haswell.c
5 years ago
wjc404
4e00d96a78
Update dgemm_kernel_16x2_skylakex.c
5 years ago
wjc404
096da2f51a
Update dgemm_kernel_16x2_skylakex.c
5 years ago
wjc404
081b188529
Update KERNEL.SKYLAKEX
5 years ago
wjc404
8019e70211
AVX512 16x2 DGEMM kernel
5 years ago
Qiyu8
ff42e68652
Optimize general Gemm Beta
5 years ago
Martin Kroeker
70f45749b9
Merge pull request #2367 from wjc404/develop
Improve paralleled SGEMM performance on SKYLAKEX CPUs
5 years ago
wjc404
e5dcdeb550
Update sgemm_direct_skylakex.c
5 years ago
wjc404
952cc2ba38
Update sgemm_kernel_16x4_skylakex_2.c
5 years ago
wjc404
feaafbedd3
make skylakex sgemm code more friendly for readers
BTW some kernels were adjusted to improve performance
5 years ago
Martin Kroeker
b36018be6d
Merge pull request #2365 from wjc404/develop
Fix SKYLAKEX STRMM issues
5 years ago
wjc404
3a100b2797
Update KERNEL.SKYLAKEX
5 years ago
Martin Kroeker
38742d5547
Merge pull request #2361 from wjc404/develop
Optimize AVX2 SGEMM & STRMM
5 years ago
wjc404
bd4c032f52
Update sgemm_kernel_8x4_haswell.c
5 years ago
wjc404
9dc9b7b95e
Update sgemm_kernel_8x4_haswell.c
5 years ago
wjc404
92b10212de
optimize AVX2 SGEMM
5 years ago
wjc404
b73bf01378
optimize AVX2 SGEMM
5 years ago