Martin Kroeker
db5328e85b
make array dimensions constant
1 year ago
Martin Kroeker
d9ae4609fb
remove C99 requirement
1 year ago
Martin Kroeker
a875304eb0
fix inverted conditional for NAN handling
1 year ago
Martin Kroeker
24acdd6bbb
correct offset
1 year ago
Martin Kroeker
fb7c53c5e5
Merge pull request #4807 from martin-frbg/scalfixes
[WIP] Make NAN handling in the SCAL kernels depend on the dummy2 parameter
1 year ago
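The SCAL commits in this range tie NaN handling to a flag parameter. The sketch below is only an illustration of why such a flag matters when alpha is zero; it is not the kernel code from the pull request. The function name `scal_sketch`, the parameter name `propagate_nan`, and the polarity of the flag are all assumptions.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical stand-in for a SCAL kernel: x[i] = alpha * x[i].
 * `propagate_nan` plays the role of the flag the commits call dummy2;
 * its polarity here is an assumption made for illustration. */
void scal_sketch(size_t n, double alpha, double *x, int propagate_nan)
{
    for (size_t i = 0; i < n; i++) {
        if (alpha == 0.0 && !propagate_nan) {
            /* Shortcut: overwrite with zero, discarding any NaN or Inf in x. */
            x[i] = 0.0;
        } else {
            /* Plain IEEE arithmetic: 0 * NaN and 0 * Inf both yield NaN. */
            x[i] = alpha * x[i];
        }
    }
}
```

With the flag clear, scaling by zero acts as a fast "clear the vector" operation; with it set, NaN and Inf inputs survive the multiplication by zero.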
Martin Kroeker
15c53dd2e0
Merge pull request #4794 from XiWeiGu/Fixed_Numpy_CI_Test
Try to fix NumPy CI test failures
1 year ago
Martin Kroeker
a4e56e0452
Merge pull request #4806 from Mousius/small-gemm
Small GEMM for AArch64 with SVE
1 year ago
Martin Kroeker
949a7f9393
Merge pull request #4811 from yamazakimitsufumi/add_a64fx_to_dynamic_arch
Add A64FX to the list of CPUs supported by DYNAMIC_ARCH
1 year ago
yamazaki-mitsufumi
88caf02f62
Fix ambiguous error on Mac OS
1 year ago
Martin Kroeker
b613754143
Update scal.c
1 year ago
Martin Kroeker
4140ac45d7
Merge pull request #4813 from martin-frbg/issue4812
Fix incompatible definitions of MAXLOC in f2c-converted LAPACK sources
1 year ago
Martin Kroeker
0096482f03
fix incompatible definitions of MAXLOC
1 year ago
Martin Kroeker
ed82fd24fc
Merge pull request #4810 from martin-frbg/issue4805
Work around a gcc14.1 bug that breaks utest on Loongarch
1 year ago
yamazaki-mitsufumi
821ef34635
Add A64FX to the list of CPUs supported by DYNAMIC_ARCH
1 year ago
Martin Kroeker
29f3e759b9
work around a gcc14.1 bug observed on Loongarch
1 year ago
Martin Kroeker
f5d04318e3
Merge branch 'OpenMathLib:develop' into scalfixes
1 year ago
Martin Kroeker
73f8866ffb
make NAN handling depend on DUMMY2 parameter
1 year ago
Martin Kroeker
dfbc2348a8
fix NAN handling
1 year ago
Martin Kroeker
c064319ecb
fix alpha=NAN case
1 year ago
Martin Kroeker
c2ffd90e8c
make NAN handling depend on dummy2 parameter
1 year ago
Chris Sidebottom
ea4ab3b310
Better header guard around bridge
1 year ago
Chris Sidebottom
7311d93016
Unroll TT further
1 year ago
Martin Kroeker
a815594fd1
Merge pull request #4801 from markdryan/markdryan/riscv-dynamic-arch
Add autodetection for riscv64
1 year ago
Martin Kroeker
dd6c33d34d
make NAN handling depend on dummy2 parameter
1 year ago
Martin Kroeker
5a845ef1f4
Merge pull request #4809 from penghongbo/reorder_gemm_gemvt
Change computational order in GEMV and GEMM Power6 kernel
1 year ago
Hong Bo Peng
db98f8753f
Try to fix LAPACK testing failures on P7.
1. Remove the FADD instruction from the GEMV Transpose code.
2. Remove the FADD instruction from the GEMM and ZGEMM code.
3. Reorder the computation of the imaginary part in the ZGEMM code.
1 year ago
Chris Sidebottom
a9edddb695
Unroll TN further
1 year ago
Chris Sidebottom
9984c5ce9d
Clean up k2 removal more and unroll SGEMM more
1 year ago
Chris Sidebottom
b1c9fafabb
Remove k2 loop from DGEMM TN and use a more conservative heuristic for SGEMM
1 year ago
Martin Kroeker
2020569705
fix NAN handling and make it depend on dummy2 parameter
1 year ago
Martin Kroeker
3870995f01
make NAN handling depend on dummy2 parameter
1 year ago
Martin Kroeker
7284c533b5
make NAN handling depend on dummy2 parameter
1 year ago
Martin Kroeker
73751218a4
make NAN handling depend on dummy2 parameter
1 year ago
Martin Kroeker
b9bfc8ce09
make NAN handling depend on dummy2 parameter
1 year ago
Martin Kroeker
eb4879e04c
make NAN handling depend on the dummy2 parameter
1 year ago
Martin Kroeker
ee87cb90d0
Merge pull request #4803 from iha-taisei/SVESupportSDGEMV
A64FX: Add support for SVE to SGEMV/DGEMV kernels.
1 year ago
gxw
34b80ce03f
mips64: Fixed numpy CI failure
1 year ago
gxw
f6d6c14a96
mips: Fixed numpy CI failure
1 year ago
Martin Kroeker
e9f6aa46a4
Merge pull request #4800 from vlad0x00/patch-2
Add missing parentheses
1 year ago
Martin Kroeker
b1aa2e1768
Merge pull request #4802 from markdryan/markdryan/rvv_axpby_incy0
Fix axpby_rvv kernels for cases where inc_y = 0
1 year ago
iha fujitsu
0985fdc82b
A64FX: Add support for SVE to SGEMV/DGEMV kernels.
1 year ago
Vladimir Nikolić
56e1782ffb
Add another missing parenthesis
1 year ago
Vladimir Nikolić
127ea5d0d9
Add missing parenthesis
1 year ago
Martin Kroeker
a3c10c6c25
Merge pull request #4799 from martin-frbg/issue4762
Improve the error message for (p)thread creation failure
1 year ago
Martin Kroeker
a373d0f107
Improve the error message for thread creation failure
1 year ago
Mark Ryan
67bf4b6998
Fix axpby_rvv kernels for cases where inc_y = 0
The following openblas_utest tests fail when the RISCV64_ZVL128B target is
enabled.
TEST 89/103 axpby:zaxpby_inc_0 [FAIL]
TEST 92/103 axpby:caxpby_inc_0 [FAIL]
TEST 95/103 axpby:daxpby_inc_0 [FAIL]
TEST 98/103 axpby:saxpby_inc_0 [FAIL]
The issue is that the vectorized kernels do not work when inc_y == 0.
This patch updates the kernels to fall back to the scalar algorithms
when inc_y == 0, fixing the failing tests.
Signed-off-by: Mark Ryan <markdryan@rivosinc.com>
1 year ago
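The inc_y == 0 fallback described in this commit can be sketched as follows. This is a minimal illustration, not the RVV kernel: `axpby_blocked` is a hypothetical scalar stand-in that mimics a vector kernel by computing a block of four results before storing any of them, and all function names are assumptions. Strides are assumed to be non-negative.

```c
#include <stddef.h>

/* Scalar reference: y[i*inc_y] = alpha * x[i*inc_x] + beta * y[i*inc_y]. */
static void axpby_scalar(size_t n, double alpha, const double *x, size_t inc_x,
                         double beta, double *y, size_t inc_y)
{
    for (size_t i = 0; i < n; i++)
        y[i * inc_y] = alpha * x[i * inc_x] + beta * y[i * inc_y];
}

/* Hypothetical stand-in for a vector kernel: each block of 4 results is
 * computed from the old y values and only then stored. That is correct
 * only when the 4 destinations are distinct, i.e. when inc_y != 0. */
static void axpby_blocked(size_t n, double alpha, const double *x, size_t inc_x,
                          double beta, double *y, size_t inc_y)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        double t[4];
        for (size_t k = 0; k < 4; k++)
            t[k] = alpha * x[(i + k) * inc_x] + beta * y[(i + k) * inc_y];
        for (size_t k = 0; k < 4; k++)
            y[(i + k) * inc_y] = t[k];
    }
    /* Remaining elements are handled one at a time. */
    for (; i < n; i++)
        y[i * inc_y] = alpha * x[i * inc_x] + beta * y[i * inc_y];
}

/* Dispatch: with inc_y == 0 every update targets y[0] and depends on the
 * previous one, so fall back to the sequential scalar loop. */
void axpby_sketch(size_t n, double alpha, const double *x, size_t inc_x,
                  double beta, double *y, size_t inc_y)
{
    if (inc_y == 0) {
        axpby_scalar(n, alpha, x, inc_x, beta, y, inc_y);
        return;
    }
    axpby_blocked(n, alpha, x, inc_x, beta, y, inc_y);
}
```

With inc_y == 0 the n updates form a chain through y[0], which is why a kernel that loads a whole block of y values up front produces a different result from the scalar loop.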
Mark Ryan
3b715e6162
Add autodetection for riscv64
Implement DYNAMIC_ARCH support for riscv64. Three cpu types are
supported, riscv64_generic, riscv64_zvl256b, riscv64_zvl128b.
The two non-generic kernels require CPU support for RVV 1.0 to
function correctly. Detecting that a riscv64 device supports
RVV 1.0 is a little complicated as there are some boards on the
market that advertise support for V via hwcap but only support
RVV 0.7.1, which is not binary compatible with RVV 1.0. The
approach taken is to first try hwprobe. If hwprobe is not
available, we fall back to hwcap + an additional check to distinguish
between RVV 1.0 and RVV 0.7.1.
Tested on a VM with VLEN=256, a CanMV K230 with VLEN=128 (with only
the big core enabled), a Lichee Pi with RVV 0.7.1, and a VF2 with no
vector support.
A compiler with RVV 1.0 support must be used to build OpenBLAS for
riscv64 when DYNAMIC_ARCH=1.
Signed-off-by: Mark Ryan <markdryan@rivosinc.com>
1 year ago
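The detection order described in this commit (hwprobe first, then hwcap plus an extra check) can be sketched as a pure decision function. This is an illustration under stated assumptions, not the code from the patch: the `rvv_probe` struct, its field names, both enums, and the VLEN thresholds used to pick a kernel are hypothetical. It also assumes that a V report from hwprobe means RVV 1.0.

```c
#include <stdbool.h>

typedef enum { RVV_NONE, RVV_0_7_1, RVV_1_0 } rvv_level;
typedef enum { RISCV64_GENERIC, RISCV64_ZVL128B, RISCV64_ZVL256B } riscv64_core;

/* Hypothetical summary of what the platform reported. */
typedef struct {
    bool hwprobe_available;  /* the hwprobe interface could be queried */
    bool hwprobe_has_v;      /* hwprobe reported the V extension */
    bool hwcap_has_v;        /* hwcap advertised the V bit */
    bool extra_check_is_v1;  /* additional check indicates RVV 1.0, not 0.7.1 */
} rvv_probe;

/* Prefer hwprobe; otherwise use hwcap plus the extra check, because some
 * boards advertise V via hwcap while only implementing RVV 0.7.1. */
rvv_level classify_rvv(const rvv_probe *p)
{
    if (p->hwprobe_available)
        return p->hwprobe_has_v ? RVV_1_0 : RVV_NONE;
    if (!p->hwcap_has_v)
        return RVV_NONE;
    return p->extra_check_is_v1 ? RVV_1_0 : RVV_0_7_1;
}

/* Only RVV 1.0 devices may use the vector kernels; everything else,
 * including RVV 0.7.1, gets the generic kernel. The VLEN thresholds
 * are assumptions for this sketch. */
riscv64_core select_core(rvv_level level, unsigned vlen_bits)
{
    if (level != RVV_1_0)
        return RISCV64_GENERIC;
    if (vlen_bits >= 256)
        return RISCV64_ZVL256B;
    if (vlen_bits >= 128)
        return RISCV64_ZVL128B;
    return RISCV64_GENERIC;
}
```

Keeping the decision separate from the system queries makes it possible to exercise every branch on a machine that is not riscv64.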
gxw
9b3e80efe2
utest: Add test_gemv
1 year ago
gxw
3f39c8f94f
LoongArch: Fixed numpy CI failure
1 year ago
Martin Kroeker
6013b36b16
Merge pull request #4796 from martin-frbg/ppcbuf
Suffix BUFFER_SIZEs on POWER as UL to prevent int overflow in computations
1 year ago
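The effect of the UL suffix mentioned in this pull request can be illustrated with a small sketch. The macro names and the 64 MiB value below are hypothetical and are not the POWER settings; the example also assumes a platform where unsigned long is 64 bits wide (LP64, as on Linux).

```c
/* Hypothetical values for illustration only. */
#define BUFFER_SIZE_INT (64 << 20)    /* type int: products are computed in int */
#define BUFFER_SIZE_UL  (64UL << 20)  /* type unsigned long */

/* Total bytes needed for `nbuffers` buffers.
 * Writing BUFFER_SIZE_INT * nbuffers instead would multiply two ints,
 * which overflows (undefined behavior) once the product exceeds INT_MAX,
 * for example at nbuffers = 64. With the UL suffix the multiplication is
 * carried out in unsigned long. */
unsigned long total_buffer_bytes(int nbuffers)
{
    return BUFFER_SIZE_UL * (unsigned long)nbuffers;
}
```

The suffix changes the type of the constant itself, so every expression built from it is promoted without needing a cast at each use site.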