Vaisakh K V
f66ca05b31
Merge branch 'develop' into topic/sgemm_direct_sme1
9 months ago
Vaisakh K V
d23eb3b93e
Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API
* Added ARMV9SME target
* Added SGEMM_DIRECT kernel based on SME1
11 months ago
Martin Kroeker
a64b75a2e0
Merge pull request #5127 from Harishmcw/gesv-threshold
Refined GESV Parallelization Logic for Windows on ARM64
9 months ago
Martin Kroeker
453efbd103
Merge pull request #5128 from martin-frbg/issue5120
Add -O2 to flang flags when building on WoA in Release mode
9 months ago
Martin Kroeker
877d5a5be6
Add -O2 to flang flags when building on WoA in Release mode
9 months ago
Martin Kroeker
8d487ef6eb
Merge pull request #5124 from XiWeiGu/LoongArch64-LA264-lapack-fixed
LoongArch64: Fixed lapack test for LA264
9 months ago
Harish-Gits
daf16b8229
Adjusted GESV threading logic for optimal performance on WoA
9 months ago
Martin Kroeker
e8b11a126b
Merge pull request #5125 from martin-frbg/issue5122
Fix SGEMV on POWER8 by reverting to the non-vectorized earlier code
9 months ago
Martin Kroeker
9a3948df82
Merge pull request #5126 from martin-frbg/cirrusbsd4
CirrusCI: Update FreeBSD jobs to 14.2
9 months ago
Martin Kroeker
7f1f776f58
Update FreeBSD jobs to 14.2
9 months ago
Martin Kroeker
81eed868b6
Restore the non-vectorized code from before PR4880 for POWER8
9 months ago
Martin Kroeker
98b5ef929c
Restore the non-vectorized code from before PR4880 for POWER8
9 months ago
gxw
2c4a5cc6e6
LoongArch64: Fixed snrm2_lsx.S and cnrm2_lsx.S
When the data type is single-precision real or single-precision complex,
converting it to double precision does not prevent overflow (as exposed in LAPACK tests).
The only solution is to follow the approach of the C implementation: find the maximum
value in the array and divide each element by that maximum to avoid this issue.
9 months ago
gxw
9e75d6b3d1
LoongArch64: Fixed swap_lsx.S
Fixed the error when the stride is zero
9 months ago
gxw
e8c740368c
LoongArch64: Fixed rot_lsx.S and crot_lsx.S
Do not check whether the input parameters c and s are zero,
as this may cause errors with special values (same as scal).
Although OpenBLAS's own test suite doesn't catch this, it will
cause LAPACK test cases to fail.
9 months ago
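To illustrate why a zero check on c and s can interact badly with special values, here is a minimal C sketch of a plane rotation applied unconditionally. It is not the LSX assembly from the commit above; `plain_srot` is a hypothetical name, and positive strides are assumed.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper (not an OpenBLAS symbol): applies the rotation
 *   x[i] = c*x[i] + s*y[i],  y[i] = c*y[i] - s*x[i]
 * to every element, with no shortcut for c == 0 or s == 0. */
void plain_srot(size_t n, float *x, size_t incx,
                float *y, size_t incy, float c, float s)
{
    for (size_t i = 0; i < n; i++) {
        float xi = x[i * incx];
        float yi = y[i * incy];
        /* 0 * NaN and 0 * Inf are NaN under IEEE 754, so special values
         * in the inputs reach the outputs even when c and s are zero. */
        x[i * incx] = c * xi + s * yi;
        y[i * incy] = c * yi - s * xi;
    }
}
```

A shortcut that writes zeros when c and s are both zero would return 0 where this version returns NaN for a NaN input, which is the kind of mismatch a stricter test suite can detect.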
Hao Chen
c2212d0abd
LoongArch64: Fixed copy_lsx.S
Fixed incorrect store operation
Signed-off-by: gxw <guxiwei-hf@loongson.cn>
9 months ago
Hao Chen
7f1ebc7ae6
LoongArch64: Fixed iamax_lsx.S
Fixed index retrieval issue when there are
identical maximum absolute values
Signed-off-by: Hao Chen <chenhao@loongson.cn>
Signed-off-by: gxw <guxiwei-hf@loongson.cn>
9 months ago
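The tie case mentioned in the commit above can be sketched in C. Reference BLAS defines ISAMAX as returning the index of the first element with the largest absolute value; the sketch below follows that convention. The name `first_isamax` is hypothetical, a positive stride is assumed, and this is not the LSX kernel itself.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper (not an OpenBLAS symbol): 1-based index of the first
 * element with the largest absolute value, or 0 when n == 0. */
size_t first_isamax(size_t n, const float *x, size_t incx)
{
    if (n == 0)
        return 0;

    size_t best = 0;
    float maxabs = fabsf(x[0]);
    for (size_t i = 1; i < n; i++) {
        float a = fabsf(x[i * incx]);
        /* Strict '>' keeps the earliest index when several elements tie. */
        if (a > maxabs) {
            maxabs = a;
            best = i;
        }
    }
    return best + 1; /* Fortran-style 1-based result */
}
```

For the input `{1, -3, 3, 2}` the elements at positions 2 and 3 tie in absolute value, and the strict comparison selects position 2. A vectorized kernel has to preserve the same choice when it reduces across lanes.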
Hao Chen
31d326f895
LoongArch64: Fixed dot_lsx.S
Fixed incorrect register usage in instructions
Signed-off-by: gxw <guxiwei-hf@loongson.cn>
10 months ago
Hao Chen
5d6356bc16
LoongArch64: Fixed amax_lsx.S
Fixed register zeroing operation
Signed-off-by: Hao Chen <chenhao@loongson.cn>
Signed-off-by: gxw <guxiwei-hf@loongson.cn>
10 months ago
Martin Kroeker
f42ce7067f
Merge pull request #5116 from martin-frbg/issue5110
Handle INCX=0 in ?NRM2
9 months ago
Martin Kroeker
7478c10268
Merge branch 'OpenMathLib:develop' into issue5110
9 months ago
Martin Kroeker
c54f5417cc
Merge pull request #5118 from martin-frbg/zrot_utestext
Disable extended utests for CSROT/ZDROT that invoke undefined behavior
9 months ago
Martin Kroeker
57208b8bce
Disable tests with incx,incy=0 (undefined behavior)
9 months ago
Martin Kroeker
3a4a9b21eb
Disable tests with incx,incy=0 (undefined behavior)
9 months ago
Martin Kroeker
60d0be0e97
Update nrm2.c
9 months ago
Martin Kroeker
0fd5448b2c
Handle INCX=0
9 months ago
Martin Kroeker
1b85b6a396
Merge pull request #5108 from taoye9/sbgemm_neoversev1
Add SBGEMM for arm neoversev1
9 months ago
Martin Kroeker
cae480683a
Merge pull request #5113 from martin-frbg/issue5112
Ensure that GEMMTR name appears in XERBLA if GEMMT was called as such
9 months ago
Martin Kroeker
db7e5f1fa7
Update gemmt.c
9 months ago
Martin Kroeker
ff30ac9666
Update Makefile
9 months ago
Martin Kroeker
7c3e169b67
Update gemmt.c
9 months ago
Martin Kroeker
09414a4187
Ensure that GEMMTR name appears in XERBLA if gemmt was called as such
9 months ago
Ye Tao
c748e6a338
optimized sbgemm kernel for neoverse-v1 (sve-256)
Signed-off-by: Ye Tao <ye.tao@arm.com>
11 months ago
Aditya Tewari
4379a6fbe3
* checkpoint sbgemm for SVE-256
1 year ago
Martin Kroeker
c139b63342
Merge pull request #5107 from jhgit/develop
fix signedness of pointer to integer type passed to blas_lock()
9 months ago
John Hein
6cd9bbe531
fix signedness of pointer to integer type passed to blas_lock()
9 months ago
Martin Kroeker
5de5072940
Improve flang-new identification and add CI job for it on OSX-x86_64 (#5103)
* AzureCI: Add LLVM/flang-new build on OSX-x86_64
* distinguish classic flang from flang-new in name based recognition
9 months ago
Martin Kroeker
1f74fb9a07
Merge pull request #5101 from martin-frbg/issue5100
Fix CMake build for PPCG4 breaking due to unparsable KERNEL file
10 months ago
Martin Kroeker
d7036cfd74
Remove trailing blanks that break the cmake parser
10 months ago
Martin Kroeker
3375a0c990
Merge pull request #5099 from martin-frbg/issue5097-2
Simplify build instructions for Windows on Arm
10 months ago
Martin Kroeker
7a27e2b00d
Simplify build instructions for Windows on Arm
10 months ago
Martin Kroeker
fdeac17237
Merge pull request #5098 from martin-frbg/issue5095
Fix compilation with BUILD_BFLOAT16 enabled
10 months ago
Martin Kroeker
1829ac5b44
Add (dummy) declaration of SBROT_M
10 months ago
Martin Kroeker
53d20a83f3
Merge pull request #5089 from annop-w/gemv_t
Simplify gemv_t_sve_v1x3 kernel
10 months ago
Martin Kroeker
6e393a5599
Merge branch 'develop' into gemv_t
10 months ago
Martin Kroeker
9b11fd5802
Merge pull request #5088 from michalowski-arm/develop
Add thread throttling profile for SGEMV on `NEOVERSEV1`
10 months ago
Martin Kroeker
5930c162ef
Merge pull request #5097 from matthew-brett/fix-woa-cmd
Fix Windows on ARM build instructions
10 months ago
Marek Michalowski
838bb57e27
Merge branch 'develop' into develop
10 months ago
Matthew Brett
252c43265d
Fix Windows on ARM build instructions
The command as merged uses the compiler target as the compiler path.
I have run and tested a build with this command.
@Mugundanmcw - is this correct?
10 months ago
Martin Kroeker
876ba58e28
Merge pull request #5091 from goplanid/develop
Small gemm kernel improvements for AArch64
10 months ago