Martin Kroeker
cd8e57040c
Merge pull request #3691 from martin-frbg/issue3679-sparc
SPARC: fix DNRM2 returning INF instead of zero due to intermediate overflow
3 years ago
Martin Kroeker
6c118b7977
Fix DNRM2 returning INF instead of zero due to intermediate overflow
3 years ago
Martin Kroeker
c43ec53bdd
Merge pull request #3690 from RajalakshmiSR/cdotp10
POWER: Fix complex dot function failures
3 years ago
Martin Kroeker
b7c65d08cb
Merge pull request #3689 from RajalakshmiSR/dgemvgcc10
POWER10: dgemv builtin rename
3 years ago
Martin Kroeker
06ef015234
fix DNRM2 returning INF instead of zero due to intermediate overflow
3 years ago
Rajalakshmi Srinivasaraghavan
a612e78a97
POWER: Fix complex dot function failures
Some complex dot function tests fail when compiling with gcc12.
The machine constraints currently used do not update all four elements of the
expected result array. This is fixed by reducing the optimization level.
This does not change any performance numbers; the code will be converted to C in the future.
3 years ago
Rajalakshmi Srinivasaraghavan
432fd99445
POWER10: dgemv builtin rename
Add a check to use the correct builtin name with older versions
of gcc10 compilers.
3 years ago
gxw
4dd05e526b
LoongArch64: Fix dnrm2_tiny testcase failure
3 years ago
gxw
cce4b1d956
MIPS64: Fix dnrm2_tiny testcase failure
3 years ago
Martin Kroeker
e12d474780
Eliminate uses of CREAL on left-hand side of assignments
3 years ago
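As background for the CREAL change above: in standard C, `creal(z)` yields a value rather than an assignable object, so writing to it relies on compiler extensions. The sketch below shows one portable alternative, which rebuilds the complex value from its parts. It is a minimal illustration, not the change made in this commit, and the helper name `set_real` is hypothetical.

```c
#include <complex.h>

/* Hypothetical helper: return a copy of z with its real part replaced.
 * This avoids the non-portable form "creal(z) = re;". */
double complex set_real(double complex z, double re) {
    return re + cimag(z) * I;
}
```

A caller would then write `z = set_real(z, 3.0);` instead of assigning through the accessor.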
Martin Kroeker
9e29598575
workaround fault with ssq=inf,scale=0
3 years ago
Honglin Zhu
123e0dfb62
Neoverse N2 sbgemm:
1. Modify the algorithm to resolve multithreading failures
2. No memory allocation in sbgemm kernel
3. Optimize when alpha == 1.0f
3 years ago
Honglin Zhu
bc3728475f
format code
3 years ago
Honglin Zhu
55d686d41e
neoverse n2 sbgemm:
implement ncopy tcopy kernel_8x4
3 years ago
Honglin Zhu
04593bb27c
neoverse n2 sbgemm: init file
3 years ago
Martin Kroeker
be5500e704
Merge pull request #3669 from VFerrari/fix_small_matrix_kernel
POWER: fix issues with the small matrix kernel
3 years ago
Martin Kroeker
92275a7902
Merge pull request #3642 from nursik/develop
Add ARM64 support for Windows
3 years ago
VFerrari
cac634fce3
POWER10: Fix multithreading check when USE_THREAD=0
This patch fixes an issue that occurs when OpenBLAS is compiled for TARGET=POWER10
with the flag USE_THREAD set to 0.
The function `num_cpu_avail` is only available when USE_THREAD=1,
that is, when SMP is defined.
3 years ago
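The kind of guard described in this commit message can be sketched as follows. This is an illustrative pattern only, assuming that a threaded build defines `SMP`. The stand-in body of `num_cpu_avail` and the wrapper `kernel_thread_count` are hypothetical and are not taken from the patch.

```c
#include <assert.h>

#ifdef SMP
/* Stand-in for the library function, which exists only in threaded
 * builds. The body here is a placeholder for illustration. */
int num_cpu_avail(int level) {
    (void)level;
    return 4;
}
#endif

/* Hypothetical wrapper: query the thread count only when threading
 * support is compiled in, and fall back to one thread otherwise. */
int kernel_thread_count(void) {
    int nthreads;
#ifdef SMP
    nthreads = num_cpu_avail(3);
#else
    nthreads = 1;
#endif
    assert(nthreads >= 1);
    return nthreads;
}
```

Built without `SMP`, the call to `num_cpu_avail` is never compiled, so the single-threaded build no longer references a function that does not exist.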
Martin Kroeker
9283c7c0b5
Merge pull request #3655 from RajalakshmiSR/zgemmasmp10
POWER10: Fix ZGEMM testcase failures
3 years ago
Rajalakshmi Srinivasaraghavan
f191bc652b
POWER10: Fix ZGEMM testcase failures
This patch fixes the saving and restoring of non-volatile registers
in the zgemm POWER10 kernel.
3 years ago
Rajalakshmi Srinivasaraghavan
8419d538ff
POWER10: convert dgemv inline assembly
This patch uses compiler builtins and matches the performance of the
assembly version. Tested with clang14 and gcc12.
3 years ago
Xianyi Zhang
5e9a912591
Merge branch 'develop' into risc-v
3 years ago
Xianyi Zhang
968e1f51d8
Update RISC-V Intrinsic API.
3 years ago
Nursultan Zarlyk
1bb7993a97
Fix MSVC ARM64 build. Add generic kernel for ARM64
3 years ago
Martin Kroeker
dc49edd4e6
Revert "roll back DGEMM kernel ... for DYNAMIC_ARCH"
3 years ago
Rajalakshmi Srinivasaraghavan
b62173c5a0
POWER10: Changing store instructions for Level1 functions
This patch replaces each 32-byte store with two 16-byte stores
to fix a recent degradation caused by the 32-byte stores.
3 years ago
Martin Kroeker
84cb58b7fb
Fix generator rules for ?laswp_ncopy and ?neg_tcopy
3 years ago
Martin Kroeker
05dcfa176e
fix undefined prefetchsizes
3 years ago
Martin Kroeker
2bbb9f05c7
fix undefined prefetchsize
3 years ago
Martin Kroeker
115bc9b98f
CortexX1 is ARMV8 like A7x
3 years ago
Martin Kroeker
b3b4672c30
Add initial support for Phytium FT2000 series and ARMV9 Cortex 510/710/X1/X2
3 years ago
Martin Kroeker
40302558ed
Remove extraneous (and wrong) definition of sbgemm_r on x86_64
3 years ago
Caroline Newcombe
5cc1111383
fix unsafe read of Y in assembly kernel
3 years ago
Xianyi Zhang
45786b05da
Merge branch 'develop' into risc-v
3 years ago
Wangyang Guo
225683218c
Small Matrix: use proper inline asm input constraint for AVX512 mask
3 years ago
Martin Kroeker
9c626e466e
really fix definition of SHUFFLE_MAGIC_NO
3 years ago
Martin Kroeker
0698212c8c
Remove stray $
3 years ago
Martin Kroeker
9d7429406f
Declare SHUFFLE_MAGIC_NO as const to placate clang
3 years ago
Martin Kroeker
d9894f45d3
Define sbgemm_r to fix DYNAMIC_ARCH builds
3 years ago
Martin Kroeker
522f809825
Merge pull request #3542 from martin-frbg/issue3540
Fix compilation for CooperLake on Windows/clang
3 years ago
Mosè Giordano
abbc947edb
Fix compilation of Skylake AVX512 kernels with GCC 6
3 years ago
Martin Kroeker
c62f8e2c01
Prevent compiler attempts to use k0 as mask register
3 years ago
Martin Kroeker
80eb581c83
Fix non-portable u_int64_t
3 years ago
Martin Kroeker
73ffabe6ba
Guard uses of _mm512_reduce_add_p?
3 years ago
Martin Kroeker
7656aba00e
Merge pull request #3493 from martin-frbg/casts+cleanup
WIP casts and cleanups
3 years ago
Martin Kroeker
addc2a7aaa
Add proper defaults for IMIN/IMAX
3 years ago
Martin Kroeker
299d4d70a3
Add default KERNEL file for Elbrus E2K arch
3 years ago
Martin Kroeker
3492bea602
Create Makefile
3 years ago
Martin Kroeker
898cf5faf3
Add Elbrus e2k architecture support
3 years ago
Martin Kroeker
c1c0d5ce1d
Merge pull request #3492 from binebrank/arm_sve_zgemm
SVE zgemm&cgemm (and other BLAS 3 complex)
3 years ago