Martin Kroeker
217324d880
Merge pull request #5162 from taoye9/add_sbgemv_tests
add beta and alpha testcase for sbgemv
7 months ago
Martin Kroeker
e4630ed15a
Merge pull request #5160 from taoye9/sbgemv_n_neon
Add SBGEMVN Kernel for ARM64
7 months ago
Martin Kroeker
35914aa9a2
Expose the option to build without LAPACKE to ccmake
7 months ago
Martin Kroeker
2b941c44b5
Merge branch 'develop' into sbgemv_n_neon
7 months ago
Martin Kroeker
c797e27a1c
Merge pull request #5159 from annop-w/sbgemv_t_bfdot
Add sbgemv_t_bfdot kernel for ARM64
7 months ago
Ye Tao
4346b91559
add beta and alpha testcase for sbgemv
7 months ago
Ye Tao
35bdbca153
Add sbgemv_n_neon kernel for arm64.
7 months ago
Annop Wongwathanarat
edaf51dd99
Add sbgemv_t_bfdot kernel for ARM64
This improves performance for sbgemv_t by up to 100x on NEOVERSEV1.
The geometric mean speedup is ~61x for M=N=[2,512].
7 months ago
Martin Kroeker
ef9e3f7159
Merge pull request #5149 from martin-frbg/fixup5077-5088
Make the Neoverse GEMM/GEMV throttling code conditional on SMP
7 months ago
Martin Kroeker
09ba099461
make throttling code conditional on SMP
7 months ago
Harishmcw
030ae1fd97
Redefined threading logic for WoA
7 months ago
Martin Kroeker
1533fe49be
Merge pull request #5144 from taoye9/dispatch_neoversve2_to_neoversven2
dispatch NEOVERSEV2 to NEOVERSEN2 under dynamic setting
7 months ago
Martin Kroeker
c03a81b927
Merge pull request #5141 from michalowski-arm/fork-throttle
Add throttling profile for SGEMM and SGEMV on `NEOVERSEV2`
7 months ago
Martin Kroeker
643966d9c7
Merge pull request #5146 from martin-frbg/issue5123
Fix "dummy2" flag reading in PPC970 S/DSCAL
7 months ago
Martin Kroeker
77fba0f400
Fix "dummy2" flag handling
7 months ago
Ye Tao
f0bea79a6e
dispatch NEOVERSEV2 to NEOVERSEN2 under dynamic setting
7 months ago
Martin Kroeker
20d1118865
Merge pull request #5143 from martin-frbg/issue5111
Fix GEMMT transforming the input array B in some complex cases
7 months ago
Martin Kroeker
75b958a018
Transform the B array back if necessary before returning
7 months ago
Marek Michalowski
650a062e19
Add thread throttling profile for SGEMV on `NEOVERSEV2`
7 months ago
Marek Michalowski
b723c1b7b7
Add thread throttling profile for SGEMM on `NEOVERSEV2`
7 months ago
Martin Kroeker
ceb8f1e34b
Merge pull request #5140 from martin-frbg/issue5139
Add ARM64 options for NVIDIA HPC
7 months ago
Martin Kroeker
f1fa370579
fix missing endif
7 months ago
Martin Kroeker
6d1444be3a
Add ARM64 options for NVIDIA HPC
7 months ago
Martin Kroeker
eb84aac7ad
Merge pull request #5084 from quic/topic/sgemm_direct_sme1
Support for SGEMM_DIRECT Kernel based on SME1
7 months ago
Martin Kroeker
abbd78aa59
Merge pull request #5138 from martin-frbg/issue5131
Ensure that gmake builds with flang-new link the flang runtime into the shared library
7 months ago
Martin Kroeker
ebcab90976
Handle flang-new runtime library linking on Linux like classic-flang
7 months ago
Martin Kroeker
ed1584666c
Merge pull request #5137 from martin-frbg/issue5136
Fix the CMake build to define USE_TRMM for RISCV64 targets as well
7 months ago
Martin Kroeker
b9ae246f20
define USE_TRMM for RISCV64 targets as well
7 months ago
Martin Kroeker
86cf9d8a2e
Merge pull request #5133 from OpenMathLib/revert-4920-issue4917
Revert "Fix potential inaccuracy in multithreaded level3 related to SWITCH_RATIO"
7 months ago
Martin Kroeker
0b3c56968d
Merge pull request #5135 from martin-frbg/ghwf-n2
CI: remove the express NeoverseN2 target from the Cobalt100 job in the gh workflow
7 months ago
Martin Kroeker
c1bb90a823
remove the express NeoverseN2 target from the Cobalt100 job
7 months ago
Martin Kroeker
77c638db67
Revert "Fix potential inaccuracy in multithreaded level3 related to SWITCH_RATIO"
7 months ago
Vaisakh K V
f66ca05b31
Merge branch 'develop' into topic/sgemm_direct_sme1
7 months ago
Vaisakh K V
d23eb3b93e
Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API
* Added ARMV9SME target
* Added SGEMM_DIRECT kernel based on SME1
10 months ago
Martin Kroeker
a64b75a2e0
Merge pull request #5127 from Harishmcw/gesv-threshold
Refined GESV Parallelization Logic for Windows on ARM64
7 months ago
Martin Kroeker
453efbd103
Merge pull request #5128 from martin-frbg/issue5120
Add -O2 to flang flags when building on WoA in Release mode
7 months ago
Martin Kroeker
877d5a5be6
Add -O2 to flang flags when building on WoA in Release mode
7 months ago
Martin Kroeker
8d487ef6eb
Merge pull request #5124 from XiWeiGu/LoongArch64-LA264-lapack-fixed
LoongArch64: Fixed lapack test for LA264
7 months ago
Harish-Gits
daf16b8229
Adjusted GESV threading logic for optimal performance on WoA
7 months ago
Martin Kroeker
e8b11a126b
Merge pull request #5125 from martin-frbg/issue5122
Fix SGEMV on POWER8 by reverting to the non-vectorized earlier code
7 months ago
Martin Kroeker
9a3948df82
Merge pull request #5126 from martin-frbg/cirrusbsd4
CirrusCI: Update FreeBSD jobs to 14.2
7 months ago
Martin Kroeker
7f1f776f58
Update FreeBSD jobs to 14.2
7 months ago
Martin Kroeker
81eed868b6
Restore the non-vectorized code from before PR4880 for POWER8
7 months ago
Martin Kroeker
98b5ef929c
Restore the non-vectorized code from before PR4880 for POWER8
7 months ago
gxw
2c4a5cc6e6
LoongArch64: Fixed snrm2_lsx.S and cnrm2_lsx.S
When the data type is single-precision real or single-precision complex,
converting it to double precision does not prevent overflow (as exposed by the LAPACK tests).
The only solution is to follow C's approach: find the maximum value in the
array and divide each element by that maximum to avoid this issue.
7 months ago
gxw
9e75d6b3d1
LoongArch64: Fixed swap_lsx.S
Fixed the error when the stride is zero
7 months ago
gxw
e8c740368c
LoongArch64: Fixed rot_lsx.S and crot_lsx.S
Do not check whether the input parameters c and s are zero,
as this may cause errors with special values (same as scal).
Although OpenBLAS's own test suite doesn't catch this, it will
cause LAPACK test cases to fail.
7 months ago
Hao Chen
c2212d0abd
LoongArch64: Fixed copy_lsx.S
Fixed incorrect store operation
Signed-off-by: gxw <guxiwei-hf@loongson.cn>
7 months ago
Hao Chen
7f1ebc7ae6
LoongArch64: Fixed iamax_lsx.S
Fixed index retrieval issue when there are
identical maximum absolute values
Signed-off-by: Hao Chen <chenhao@loongson.cn>
Signed-off-by: gxw <guxiwei-hf@loongson.cn>
7 months ago
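The tie case mentioned in the commit above can be illustrated with a short C sketch. In the reference BLAS, ISAMAX returns the 1-based index of the first element with the largest absolute value, so ties go to the earliest position. `first_isamax` is a hypothetical name, a positive stride and NaN-free input are assumed, and this is not the LSX kernel.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper (not part of OpenBLAS): 1-based index of the first
 * element with the largest absolute value among n floats read with
 * stride inc. Returns 0 when n == 0. Assumes inc >= 1 and no NaN. */
size_t first_isamax(size_t n, const float *x, size_t inc)
{
    if (n == 0)
        return 0;

    size_t best = 0;
    float maxabs = fabsf(x[0]);
    for (size_t i = 1; i < n; i++) {
        float a = fabsf(x[i * inc]);
        /* Strict '>' keeps the earliest index when values tie. */
        if (a > maxabs) {
            maxabs = a;
            best = i;
        }
    }
    return best + 1;
}
```

A vectorized kernel that reduces per-lane maxima has to preserve the same tie-breaking rule when it merges the lanes, which is where such index bugs tend to appear.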
Hao Chen
31d326f895
LoongArch64: Fixed dot_lsx.S
Fixed incorrect register usage in instructions
Signed-off-by: gxw <guxiwei-hf@loongson.cn>
8 months ago