Marek Michalowski
650a062e19
Add thread throttling profile for SGEMV on `NEOVERSEV2`
9 months ago
Marek Michalowski
b723c1b7b7
Add thread throttling profile for SGEMM on `NEOVERSEV2`
9 months ago
Martin Kroeker
ceb8f1e34b
Merge pull request #5140 from martin-frbg/issue5139
Add ARM64 options for NVIDIA HPC
9 months ago
Martin Kroeker
f1fa370579
fix missing endif
9 months ago
Martin Kroeker
6d1444be3a
Add ARM64 options for NVIDIA HPC
9 months ago
Martin Kroeker
eb84aac7ad
Merge pull request #5084 from quic/topic/sgemm_direct_sme1
Support for SGEMM_DIRECT Kernel based on SME1
9 months ago
Martin Kroeker
abbd78aa59
Merge pull request #5138 from martin-frbg/issue5131
Ensure that gmake builds with flang-new link the flang runtime into the shared library
9 months ago
Martin Kroeker
ebcab90976
Handle flang-new runtime library linking on Linux like classic-flang
9 months ago
Martin Kroeker
ed1584666c
Merge pull request #5137 from martin-frbg/issue5136
Fix the CMake build to define USE_TRMM for RISCV64 targets as well
9 months ago
Martin Kroeker
b9ae246f20
define USE_TRMM for RISCV64 targets as well
9 months ago
Martin Kroeker
86cf9d8a2e
Merge pull request #5133 from OpenMathLib/revert-4920-issue4917
Revert "Fix potential inaccuracy in multithreaded level3 related to SWITCH_RATIO"
9 months ago
Martin Kroeker
0b3c56968d
Merge pull request #5135 from martin-frbg/ghwf-n2
CI: remove the express NeoverseN2 target from the Cobalt100 job in the gh workflow
9 months ago
Martin Kroeker
c1bb90a823
remove the express NeoverseN2 target from the Cobalt100 job
9 months ago
Martin Kroeker
77c638db67
Revert "Fix potential inaccuracy in multithreaded level3 related to SWITCH_RATIO"
9 months ago
Vaisakh K V
f66ca05b31
Merge branch 'develop' into topic/sgemm_direct_sme1
9 months ago
Vaisakh K V
d23eb3b93e
Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API
* Added ARMV9SME target
* Added SGEMM_DIRECT kernel based on SME1
11 months ago
Martin Kroeker
a64b75a2e0
Merge pull request #5127 from Harishmcw/gesv-threshold
Refined GESV Parallelization Logic for Windows on ARM64
9 months ago
Martin Kroeker
453efbd103
Merge pull request #5128 from martin-frbg/issue5120
Add -O2 to flang flags when building on WoA in Release mode
9 months ago
Martin Kroeker
877d5a5be6
Add -O2 to flang flags when building on WoA in Release mode
9 months ago
Martin Kroeker
8d487ef6eb
Merge pull request #5124 from XiWeiGu/LoongArch64-LA264-lapack-fixed
LoongArch64: Fixed lapack test for LA264
9 months ago
Harish-Gits
daf16b8229
Adjusted GESV threading logic for optimal performance on WoA
9 months ago
Martin Kroeker
e8b11a126b
Merge pull request #5125 from martin-frbg/issue5122
Fix SGEMV on POWER8 by reverting to the non-vectorized earlier code
9 months ago
Martin Kroeker
9a3948df82
Merge pull request #5126 from martin-frbg/cirrusbsd4
CirrusCI: Update FreeBSD jobs to 14.2
9 months ago
Martin Kroeker
7f1f776f58
Update FreeBSD jobs to 14.2
9 months ago
Martin Kroeker
81eed868b6
Restore the non-vectorized code from before PR4880 for POWER8
9 months ago
Martin Kroeker
98b5ef929c
Restore the non-vectorized code from before PR4880 for POWER8
9 months ago
gxw
2c4a5cc6e6
LoongArch64: Fixed snrm2_lsx.S and cnrm2_lsx.S
When the data type is single-precision real or single-precision complex,
converting it to double precision does not prevent overflow (as exposed in LAPACK tests).
The only solution is to follow C's approach: find the maximum absolute value in the
array and divide each element by that maximum to avoid this issue.
9 months ago
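The scaling approach described in the commit message above can be sketched as follows. This is only an illustration in Python using double-precision floats, not the LSX assembly from the commit; the function name `scaled_nrm2` is hypothetical, and the handling of empty, all-zero, and infinite inputs is an assumption made for this sketch.

```python
import math

def scaled_nrm2(x):
    """Euclidean norm computed by dividing every element by the largest magnitude.

    Hypothetical helper for illustration; not part of OpenBLAS.
    """
    # Largest absolute value in the array (0.0 for an empty input).
    amax = max((abs(v) for v in x), default=0.0)
    if amax == 0.0:
        return 0.0          # empty or all-zero vector
    if math.isinf(amax):
        return math.inf     # assumed convention: an infinite entry gives an infinite norm
    total = 0.0
    for v in x:
        r = v / amax        # |r| <= 1, so r * r cannot overflow
        total += r * r
    return amax * math.sqrt(total)

def naive_nrm2(x):
    """Unscaled sum of squares, shown only for comparison."""
    return math.sqrt(sum(v * v for v in x))
```

With inputs such as `[1e200, 1e200]`, the squares in `naive_nrm2` overflow to infinity, while `scaled_nrm2` only ever squares ratios no larger than 1 in magnitude and returns a finite result.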
gxw
9e75d6b3d1
LoongArch64: Fixed swap_lsx.S
Fixed the error when the stride is zero
9 months ago
gxw
e8c740368c
LoongArch64: Fixed rot_lsx.S and crot_lsx.S
Do not check whether the input parameters c and s are zero,
as this may cause errors with special values (same as scal).
Although OpenBLAS's own test suite doesn't catch this, it will
cause LAPACK test cases to fail.
9 months ago
Hao Chen
c2212d0abd
LoongArch64: Fixed copy_lsx.S
Fixed incorrect store operation
Signed-off-by: gxw <guxiwei-hf@loongson.cn>
9 months ago
Hao Chen
7f1ebc7ae6
LoongArch64: Fixed iamax_lsx.S
Fixed index retrieval issue when there are
identical maximum absolute values
Signed-off-by: Hao Chen <chenhao@loongson.cn>
Signed-off-by: gxw <guxiwei-hf@loongson.cn>
9 months ago
Hao Chen
31d326f895
LoongArch64: Fixed dot_lsx.S
Fixed incorrect register usage in instructions
Signed-off-by: gxw <guxiwei-hf@loongson.cn>
10 months ago
Hao Chen
5d6356bc16
LoongArch64: Fixed amax_lsx.S
Fixed register zeroing operation
Signed-off-by: Hao Chen <chenhao@loongson.cn>
Signed-off-by: gxw <guxiwei-hf@loongson.cn>
10 months ago
Martin Kroeker
f42ce7067f
Merge pull request #5116 from martin-frbg/issue5110
Handle INCX=0 in ?NRM2
9 months ago
Martin Kroeker
7478c10268
Merge branch 'OpenMathLib:develop' into issue5110
9 months ago
Martin Kroeker
c54f5417cc
Merge pull request #5118 from martin-frbg/zrot_utestext
Disable extended utests for CSROT/ZDROT that invoke undefined behavior
9 months ago
Martin Kroeker
57208b8bce
Disable tests with incx,incy=0 (undefined behavior)
9 months ago
Martin Kroeker
3a4a9b21eb
Disable tests with incx,incy=0 (undefined behavior)
9 months ago
Martin Kroeker
60d0be0e97
Update nrm2.c
9 months ago
Martin Kroeker
0fd5448b2c
Handle INCX=0
9 months ago
Martin Kroeker
1b85b6a396
Merge pull request #5108 from taoye9/sbgemm_neoversev1
Add SBGEMM for arm neoversev1
9 months ago
Martin Kroeker
cae480683a
Merge pull request #5113 from martin-frbg/issue5112
Ensure that GEMMTR name appears in XERBLA if GEMMT was called as such
9 months ago
Martin Kroeker
db7e5f1fa7
Update gemmt.c
9 months ago
Martin Kroeker
ff30ac9666
Update Makefile
9 months ago
Martin Kroeker
7c3e169b67
Update gemmt.c
9 months ago
Martin Kroeker
09414a4187
Ensure that GEMMTR name appears in XERBLA if gemmt was called as such
9 months ago
Ye Tao
c748e6a338
optimized sbgemm kernel for neoverse-v1 (sve-256)
Signed-off-by: Ye Tao <ye.tao@arm.com>
11 months ago
Aditya Tewari
4379a6fbe3
* checkpoint sbgemm for SVE-256
1 year ago
Martin Kroeker
c139b63342
Merge pull request #5107 from jhgit/develop
fix signedness of pointer to integer type passed to blas_lock()
9 months ago
John Hein
6cd9bbe531
fix signedness of pointer to integer type passed to blas_lock()
9 months ago