Martin Kroeker
144be81ca1
fix initialization to zero in the NEON SGEMM_BETA kernel as well
5 years ago
Martin Kroeker
07cdd5d05c
Fix zero initialization for beta=0 case
use immediate initialization instead of multiplication in case register content is a NaN
5 years ago
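The reasoning behind this fix can be illustrated with a minimal C sketch. It is not the kernel code from this commit; the function names `scale_by_multiply`, `scale_with_zero_store` and `beta_zero_demo` are hypothetical and exist only for illustration. Under IEEE 754 arithmetic, NaN * 0 is still NaN, so scaling by beta=0 via multiplication does not clear stale NaN content, while storing an immediate zero does.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: scales c by beta using multiplication only.
 * With beta == 0 any NaN already present in c survives, since NaN * 0 is NaN. */
static void scale_by_multiply(float *c, size_t n, float beta) {
    for (size_t i = 0; i < n; i++)
        c[i] *= beta;
}

/* Hypothetical helper: treats beta == 0 as a special case and stores zeros
 * directly, so the previous content of c (NaN or not) is irrelevant. */
static void scale_with_zero_store(float *c, size_t n, float beta) {
    if (beta == 0.0f) {
        for (size_t i = 0; i < n; i++)
            c[i] = 0.0f;
        return;
    }
    for (size_t i = 0; i < n; i++)
        c[i] *= beta;
}

/* Small self-check contrasting the two approaches on a buffer holding a NaN. */
static void beta_zero_demo(void) {
    float a[2] = {NAN, 2.0f};
    float b[2] = {NAN, 2.0f};

    scale_by_multiply(a, 2, 0.0f);
    scale_with_zero_store(b, 2, 0.0f);

    assert(isnan(a[0]));   /* multiplication keeps the NaN */
    assert(b[0] == 0.0f);  /* immediate store clears it */
}
```

Note that the contrast assumes default floating-point semantics; aggressive options such as `-ffast-math` may change how the multiplication is compiled.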
Martin Kroeker
567d2760e6
Merge pull request #2520 from wjc404/develop
Fix avx512 sgemm performance bug when ldc is a multiple of 1024
5 years ago
wjc404
b8307768e2
Add files via upload
5 years ago
Martin Kroeker
af8a619e1f
Merge pull request #2517 from wjc404/develop
Temporary fix for SKX STRSM
5 years ago
wjc404
62b9608986
Update KERNEL.SKYLAKEX
5 years ago
Martin Kroeker
a1b181cea2
Merge pull request #2516 from wjc404/develop
AVX2 STRSM kernels
5 years ago
wjc404
cdc0e9011e
Update KERNEL.ZEN
5 years ago
wjc404
fa049d49c2
AVX2 STRSM kernel
5 years ago
s00548429
bec7923a0d
Fix the functional bugs for zamax.
5 years ago
Rajalakshmi Srinivasaraghavan
2afc074803
Fix DYNAMIC_ARCH build for POWER9
Setting DYNAMIC_ARCH=1 on POWER9 does not build POWER9 files due to some
compiler version checks. This patch fixes some of the macros that are used
to check compiler version. On fixing those checks, there are some new make
failures related to icamin, icamax, isamin, isamax and caxpy files on POWER9.
This patch fixes those failures as well.
5 years ago
Martin Kroeker
4f371b0fbf
Use POWER8 kernels on big-endian POWER9 for now
5 years ago
Martin Kroeker
ea8eec5d17
Merge pull request #2422 from wjc404/develop
Adjust SkylakeX GEMM3M parameters, add an AVX512 STRMM kernel and fix performance bugs in AVX2 s/c/z GEMM
5 years ago
Ali Saidi
c623a965f9
Add Neoverse-N1 core
The implementation is a hybrid of the ARMV8 one with some of the
improved TX2 routines, along with specifying -march=v8.2-a
5 years ago
wjc404
dd22eb7621
Update cgemm_kernel_8x2_haswell.c
5 years ago
wjc404
2352331e60
Update zgemm_kernel_4x2_haswell.c
5 years ago
Xianyi Zhang
265ab484c8
Change default RISC-V 64-bit corename to RISCV64_GENERIC
e.g. make CC=riscv64-unknown-linux-gnu-gcc FC=riscv64-unknown-linux-gnu-gfortran TARGET=RISCV64_GENERIC HOSTCC=gcc
5 years ago
Xianyi Zhang
44020a42a4
Fixed compile bug for RV64.
5 years ago
Xianyi Zhang
4aa2d89217
Merge branch 'develop' into risc-v
5 years ago
wjc404
1b980001dd
Update zgemm_kernel_4x2_haswell.c
5 years ago
wjc404
2515e1152f
Update cgemm_kernel_8x2_haswell.c
5 years ago
Martin Kroeker
ddcbed6690
Merge pull request #2437 from martin-frbg/issue2434
[WIP] Add support for Ampere EMAG8180 ARMV8 cpu
5 years ago
wjc404
903854c168
Add files via upload
5 years ago
wjc404
a2ff577a30
Update KERNEL.ZEN
5 years ago
wjc404
97a32cb0a5
Update KERNEL.HASWELL
5 years ago
Martin Kroeker
07454bf4d5
Add proper defaults for IxMIN/IxMAX kernels
the fallbacks from Makefile.L1 assume a combined source for absolute value and non-absolute (with ifdef USE_ABS) but here we have separate implementations
5 years ago
Martin Kroeker
4046985913
Add proper defaults for IxMIN/IxMAX kernels
the fallbacks from Makefile.L1 assume a combined source for absolute value and non-absolute (with ifdef USE_ABS) but here we have separate implementations
5 years ago
Martin Kroeker
e57b11acca
Add preliminary support for EMAG8180
5 years ago
Martin Kroeker
0b39cf95b0
Fix endianness conditionals
5 years ago
Martin Kroeker
9f39f0a2c3
Specify ismin/ismax assembly kernels for POWER8 directly
to fix utest failure in new ismin test - Makefile.L1 defaults look wrong
5 years ago
Martin Liska
aeea14ee40
Introduce LOAD_AND_COMPARE_TO_MXX macro in iamax_sse.S.
5 years ago
Martin Liska
18bcc36a69
Fix implementation of iamax_sse.S as reported in #2116.
There was a typo in iamax_sse.S where one of the comparisons
was cmpeqps instead of cmpeqss. That caused the index to be
misdetected for sequences where the minimum value was 0.
5 years ago
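The difference between the two instructions can be sketched with SSE intrinsics. This is only an illustration of packed versus scalar compare, not the code from iamax_sse.S; the function names `packed_eq_mask`, `scalar_eq_mask` and `compare_demo` are hypothetical, and the sample values are chosen for the demonstration. `_mm_cmpeq_ps` (cmpeqps) compares all four lanes, so zero-valued upper lanes can report a match, whereas `_mm_cmpeq_ss` (cmpeqss) compares only the lowest lane and copies the upper three lanes from the first operand. A portable fallback emulating the same lane behaviour is included for targets without SSE.

```c
#include <assert.h>
#include <math.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Packed compare (cmpeqps): bit i of the result is set when a[i] == b[i]. */
static int packed_eq_mask(const float a[4], const float b[4]) {
#if defined(__SSE__)
    return _mm_movemask_ps(_mm_cmpeq_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
#else
    int mask = 0;
    for (int i = 0; i < 4; i++)
        if (a[i] == b[i]) mask |= 1 << i;
    return mask;
#endif
}

/* Scalar compare (cmpeqss): only lane 0 is compared; lanes 1..3 are copied
 * from a, so their mask bits are just the sign bits of a[1..3]. */
static int scalar_eq_mask(const float a[4], const float b[4]) {
#if defined(__SSE__)
    return _mm_movemask_ps(_mm_cmpeq_ss(_mm_loadu_ps(a), _mm_loadu_ps(b)));
#else
    int mask = (a[0] == b[0]) ? 1 : 0;
    for (int i = 1; i < 4; i++)
        if (signbit(a[i])) mask |= 1 << i;
    return mask;
#endif
}

/* Lane 0 differs (5 vs 1) while the unused upper lanes are all zero. */
static void compare_demo(void) {
    const float a[4] = {5.0f, 0.0f, 0.0f, 0.0f};
    const float b[4] = {1.0f, 0.0f, 0.0f, 0.0f};

    assert(packed_eq_mask(a, b) == 14); /* lanes 1..3 match spuriously */
    assert(scalar_eq_mask(a, b) == 0);  /* only lane 0 is considered */
}
```

In this sketch the packed compare signals equality on the zero-filled upper lanes even though the element of interest differs, which is the kind of false match a scalar compare avoids.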
Martin Liska
0e7f43c898
Add missing USE_MIN in kernel/CMakeLists.txt.
5 years ago
wjc404
f566787e6e
Update KERNEL.SKYLAKEX
5 years ago
wjc404
e3368cbf18
AVX512 STRMM kernel
5 years ago
Martin Kroeker
cafdd999b8
Update caxpy_power8.S
5 years ago
Martin Kroeker
92ca92a46c
Update caxpy_power8.S
5 years ago
Martin Kroeker
486c35c5dc
Update icamin_power8.S
5 years ago
Martin Kroeker
5ba3699f41
Update isamin_power8.S
5 years ago
Martin Kroeker
8eefa530cd
Update isamax_power8.S
5 years ago
Martin Kroeker
de40d47edf
Update isamin_power8.S
5 years ago
Martin Kroeker
7c162b8a21
Update isamax_power8.S
5 years ago
Martin Kroeker
0544cbc806
Fix syntax of endianness conditional
5 years ago
Martin Kroeker
120d20731f
Fix syntax of endianness conditional
5 years ago
Martin Kroeker
dc345d84df
Fix syntax of endianness conditional and add gcc version check for workaround
5 years ago
Bart Oldeman
7ea5e07d1c
Fix inline asm in dscal: mark x, x1 as clobbered. Fixes #2408
The leaq instructions in dscal_kernel_inc_8 modify x and x1 so they
must be declared as input/output constraints, otherwise the compiler
may assume the corresponding registers are not modified.
5 years ago
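The constraint issue described above can be sketched in a few lines of GCC-style extended asm. This is not the dscal kernel itself; `advance_by_leaq` and `advance_demo` are hypothetical names, and the 64-byte step is an arbitrary choice for the illustration. Because the `leaq` writes to the register holding `x`, the operand is declared read-write with `"+r"`. Declaring it as a plain input would let the compiler assume the register is unchanged after the asm statement. A plain C fallback is used on targets other than x86-64.

```c
#include <assert.h>
#include <stddef.h>

/* Advances x by 8 doubles (64 bytes) using leaq where available.
 * The "+r" constraint tells the compiler that the asm both reads and
 * modifies the register that holds x. */
static double *advance_by_leaq(double *x) {
#if defined(__x86_64__) && !defined(__ILP32__) && defined(__GNUC__)
    __asm__ volatile(
        "leaq 64(%0), %0 \n\t"  /* x = x + 64 bytes */
        : "+r"(x)               /* read-write operand, modified by the asm */
    );
#else
    x += 8;                     /* portable fallback with the same effect */
#endif
    return x;
}

/* Small self-check: the returned pointer is 8 elements past the start. */
static void advance_demo(void) {
    double buf[16] = {0};
    assert(advance_by_leaq(buf) == buf + 8);
}
```

Since `leaq` does not modify the flags register, no `"cc"` clobber is listed in this sketch.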
Martin Kroeker
7e5cbb6f35
Fix bad conditional syntax that caused spurious application of USE_TRMM
5 years ago
wjc404
3447d04eaf
Update dgemm_kernel_16x2_skylakex.c
5 years ago
wjc404
8b5cdcc64c
Update sgemm_kernel_8x4_haswell.c
5 years ago
wjc404
4e00d96a78
Update dgemm_kernel_16x2_skylakex.c
5 years ago