Martin Kroeker
3677b3886c
Merge pull request #4702 from bashimao/detect-nv-grace
Correctly detect ARM Neoverse V2 CPUs.
1 year ago
Martin Kroeker
f3c364c2cc
temporarily(?) disable the alpha=0 branch as it fails to handle INF,NAN
1 year ago
Martin Kroeker
2a5fe97e3b
temporarily(?) disable the alpha=0 branch as it does not handle INF,NAN
1 year ago
Martin Kroeker
c1019d5832
Handle INF and NAN in inputs
1 year ago
Martin Kroeker
9e24121e7e
temporarily(?) disable da=0 shortcut to handle x=Inf or NAN
1 year ago
Martin Kroeker
a11f086c17
Update sscal_msa.c
1 year ago
Martin Kroeker
541e1b6959
disable the fast path for inc=1, alpha=0 as it does not handle x=NaN or Inf
1 year ago
Martin Kroeker
c08113c279
fix special cases of x = NAN or INF
1 year ago
Martin Kroeker
bd47630bcf
exclude the alpha=0 branch as it does not handle NaN or Inf in x
1 year ago
Martin Kroeker
68f2501958
temporarily(?) disable the alpha=0 branch to handle Inf/NaN in x
1 year ago
Martin Kroeker
0a744a939a
temporarily(?) disable the alpha=0 branch to handle NaN/Inf in x
1 year ago
Martin Kroeker
7f8f037a36
handle INF and NAN in input
1 year ago
Martin Kroeker
f1248b849d
handle INF and NAN in input
1 year ago
Martin Kroeker
a2ee4b1966
Merge branch 'OpenMathLib:develop' into issue4728
1 year ago
Martin Kroeker
3ec59922b6
Add a clobber list to fix utest errors seen with gcc13 on Apple M
1 year ago
Martin Kroeker
3d8054fb16
add clobber list
1 year ago
Martin Kroeker
dd7efcf9ef
Avoid exceeding the configured thread count in x86_64 TOBF16 (#4748)
* avoid setting nthreads higher than available
1 year ago
Martin Kroeker
6ffaf99817
disable da=0 shortcut to handle NAN and INF correctly
1 year ago
Martin Kroeker
c7cacd9b38
disable the shortcut for da=0 to ensure proper handling of INF and NAN
1 year ago
Martin Kroeker
5ed4f24d6e
Handle corner cases with INF and NAN arguments
1 year ago
Martin Kroeker
2bd43ad0eb
Merge branch 'OpenMathLib:develop' into issue4728
1 year ago
Martin Kroeker
1abafcd9b2
handle corner cases involving NAN and/or INF
1 year ago
Martin Kroeker
442dec28df
Merge pull request #4738 from martin-frbg/issue4737
Disable GEMM3M for generic targets (not implemented)
1 year ago
Martin Kroeker
2787c9f8e4
Disable GEMM3M for generic targets (not implemented)
1 year ago
gxw
af73ae6208
LoongArch: Fixed issue 4728
1 year ago
gxw
8ab2e9ec65
LoongArch: DGEMM small matrix opt
2 years ago
Martin Kroeker
83bc8d5dd8
Merge pull request #4712 from RajalakshmiSR/zscalp10
POWER: Fix issues in zscal to address lapack failures
1 year ago
Martin Kroeker
020b3e1682
fix handling of INF arguments
1 year ago
Martin Kroeker
8c05765a5a
fix other corner cases where x=INF
1 year ago
Martin Kroeker
516743f7dc
fix other instances of mishandling INF
1 year ago
Martin Kroeker
9ff4e9714e
additional fixes for handling INF arguments
1 year ago
Martin Kroeker
ce130f11d2
Update zscal.c
1 year ago
Martin Kroeker
ab13cfef93
more fixes for infinite x
1 year ago
Martin Kroeker
ad2b5c67c8
fix another corner case involving infinity
1 year ago
Bart Oldeman
62f7b244ff
Replace use of FLT_MAX in x86_64 zscal.c by isinf()
Commit def4996 fixed issues with inf and nan values in zscal,
but used FLT_MAX where DBL_MAX or isinf() is more appropriate,
as FLT_MAX is for single precision only.
Using FLT_MAX caused test case failures in the LAPACK tests.
isinf() is consistent with the later fix 969601a1.
1 year ago
Rajalakshmi Srinivasaraghavan
e112191b54
POWER: Fix issues in zscal to address lapack failures
This patch fixes the following LAPACK failures with the clang compiler on POWER.
zed.out: ZVX: 18 out of 5190 tests failed to pass the threshold
zgd.out: ZGV drivers: 25 out of 1092 tests failed to pass the threshold
zgd.out: ZGV drivers: 6 out of 1092 tests failed to pass the threshold
1 year ago
Martin Kroeker
aa259b141d
Merge pull request #4704 from amritahs-ibm/saxpy_perf_fix
Fix SAXPY regression when compiled with the OpenXL compiler.
1 year ago
Matthias Langer
0050a9660b
Correctly detect ARM Neoverse V2 CPUs.
1 year ago
Chip Kerchner
3a1417671a
POWER: Fixing endianness issue in cswap/zswap kernel for AIX
1 year ago
Amrita H S
87b3d9054f
Fix SAXPY regression when compiled with the OpenXL compiler.
SAXPY built with OpenXL regresses compared to SAXPY
built with gcc. The OpenXL compiler does not know that the
SAXPY inner kernel assembly is a 64-element loop, so it
treats the remainder loop as the main loop. It vectorizes
and interleaves the remainder into a loop that processes
48 elements per iteration. Since the remainder is at most
63 elements, the 48-element loop mostly does not execute,
and the 1-element scalar loop that follows it is probably
what executes most of the time.
This can be fixed by adding a pragma, loop interleave_count(2),
which results in an 8-element loop.
Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>
1 year ago
Martin Kroeker
8da6f7e5f2
Merge pull request #4686 from XiWeiGu/loongarch64_dgemm_kernel_16x6
Loongarch64: Improving the Performance and Stability of dgemm
1 year ago
gxw
f9a26240a7
loongarch64: Fixed icamax_lsx
1 year ago
gxw
cb0f707409
loongarch64: Fixed utest fork:safety
1 year ago
Martin Kroeker
b45d8e1ab2
remove stray comma
1 year ago
gxw
6017ad7146
loongarch64: Update dgemm_kernel_16x4 to dgemm_kernel_16x6
1 year ago
Martin Kroeker
992b71fea2
remove stray comma
1 year ago
Martin Kroeker
d421dec278
Merge pull request #4656 from zboszor/fix-x86-64-build-v2
Add forgotten conditional uses of PREFETCH
1 year ago
Martin Kroeker
ae695d4ca0
Merge pull request #4642 from XiWeiGu/loongarch64_clang
CI: Add clang test for loongarch64
1 year ago
gxw
7cd438a5ac
loongarch64: Fixed clang compilation issues
1 year ago
Zoltán Böszörményi
ca64861ce8
Add forgotten conditional uses of PREFETCH
This fixes a (cross-)compilation/linker error for PRESCOTT
on Yocto.
Signed-off-by: Zoltán Böszörményi <zoltan.boszormenyi@xenial.com>
1 year ago