gxw
2c4a5cc6e6
LoongArch64: Fixed snrm2_lsx.S and cnrm2_lsx.S
When the data type is single-precision real or single-precision complex,
converting it to double precision does not prevent overflow (as exposed by the LAPACK tests).
The only solution is to follow the C implementation's approach: find the maximum absolute value in the
array and divide each element by that maximum to avoid this issue.
7 months ago
gxw
9e75d6b3d1
LoongArch64: Fixed swap_lsx.S
Fixed the error when the stride is zero
7 months ago
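The commit does not say what went wrong, so this is only a sketch of the behavior a swap kernel is expected to reproduce, assuming the sequential semantics of the Netlib reference loop. `sswap_ref` is a hypothetical name. With a zero stride the same element is swapped on every iteration, so the result depends on processing order, which a vectorized kernel can easily get wrong.

```c
/* Hypothetical scalar reference for ?swap with arbitrary strides, assuming
 * the Netlib reference loop. Elements are visited strictly in order, which
 * matters when a stride is zero: the same element is swapped repeatedly. */
void sswap_ref(int n, float *x, int incx, float *y, int incy)
{
    if (n <= 0)
        return;

    /* Negative strides start from the far end, as in reference BLAS. */
    int ix = (incx < 0) ? (1 - n) * incx : 0;
    int iy = (incy < 0) ? (1 - n) * incy : 0;

    for (int i = 0; i < n; i++) {
        float t = x[ix];
        x[ix] = y[iy];
        y[iy] = t;
        ix += incx;
        iy += incy;
    }
}
```

With `x = {1}`, `y = {10, 20, 30}`, `n = 3`, `incx = 0` and `incy = 1`, this leaves `x[0] == 30` and `y == {1, 10, 20}`.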
gxw
e8c740368c
LoongArch64: Fixed rot_lsx.S and crot_lsx.S
Do not check whether the input parameters c and s are zero,
as this may cause errors with special values (same as scal).
Although OpenBLAS's own test suite doesn't catch this, it will
cause LAPACK test cases to fail.
7 months ago
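A hypothetical C illustration of why skipping work when c or s is zero is unsafe with special values, assuming IEEE 754 arithmetic (no -ffast-math). Both function names are made up, and `srot_shortcut` stands in for the kind of branch the commit removes; it is not the removed LSX code.

```c
#include <math.h>

/* Hypothetical reference for srot: applies
 *   x' = c*x + s*y,  y' = c*y - s*x
 * to every element pair, with no special-casing of c or s. */
void srot_ref(int n, float *x, int incx, float *y, int incy, float c, float s)
{
    if (n <= 0)
        return;
    int ix = (incx < 0) ? (1 - n) * incx : 0;
    int iy = (incy < 0) ? (1 - n) * incy : 0;
    for (int i = 0; i < n; i++) {
        float xv = x[ix], yv = y[iy];
        x[ix] = c * xv + s * yv;
        y[iy] = c * yv - s * xv;
        ix += incx;
        iy += incy;
    }
}

/* Hypothetical shortcut (unit stride only): when s == 0 the s*y and s*x
 * terms are skipped, so a NaN or Inf in the other vector is silently
 * dropped instead of propagating as 0 * NaN = NaN would. */
void srot_shortcut(int n, float *x, float *y, float c, float s)
{
    for (int i = 0; i < n; i++) {
        if (s == 0.0f) {
            x[i] = c * x[i];
            y[i] = c * y[i];
        } else {
            float xv = x[i], yv = y[i];
            x[i] = c * xv + s * yv;
            y[i] = c * yv - s * xv;
        }
    }
}
```

With `x = {1}`, `y = {NAN}`, `c = 1` and `s = 0`, `srot_ref` turns `x[0]` into NaN because `0 * NaN` is NaN, while `srot_shortcut` leaves `x[0] == 1`.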
Hao Chen
c2212d0abd
LoongArch64: Fixed copy_lsx.S
Fixed incorrect store operation
Signed-off-by: gxw <guxiwei-hf@loongson.cn>
7 months ago
Hao Chen
7f1ebc7ae6
LoongArch64: Fixed iamax_lsx.S
Fixed index retrieval issue when there are
identical maximum absolute values
Signed-off-by: Hao Chen <chenhao@loongson.cn>
Signed-off-by: gxw <guxiwei-hf@loongson.cn>
7 months ago
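Reference BLAS returns the smallest index among elements tied for the largest |x_i|. A vector kernel keeps one candidate per lane, so the tie-break has to be applied again in the final cross-lane reduction. The sketch below emulates four lanes in plain C; the name `isamax_lanes` and the lane count are assumptions, and only unit stride is handled.

```c
#include <math.h>

/* Hypothetical 4-lane emulation of a vectorized isamax (unit stride only).
 * Returns the 1-based index of the first element with the largest |x[i]|,
 * or 0 when n < 1. */
int isamax_lanes(int n, const float *x)
{
    enum { LANES = 4 };
    float best[LANES];
    int idx[LANES];

    if (n < 1)
        return 0;
    for (int l = 0; l < LANES; l++) {
        best[l] = -1.0f; /* below any |x|, so each lane takes its first element */
        idx[l] = 0;      /* 0 marks a lane that saw no elements */
    }

    /* Element i lands in lane i % LANES, as with contiguous vector loads.
     * Strict '>' keeps the earliest index within a lane. */
    for (int i = 0; i < n; i++) {
        int l = i % LANES;
        float a = fabsf(x[i]);
        if (a > best[l]) {
            best[l] = a;
            idx[l] = i + 1;
        }
    }

    /* Cross-lane reduction: on equal values, prefer the smaller index,
     * not the lower-numbered lane. */
    int r = 0;
    for (int l = 1; l < LANES; l++) {
        if (idx[l] == 0)
            continue;
        if (best[l] > best[r] || (best[l] == best[r] && idx[l] < idx[r]))
            r = l;
    }
    return idx[r];
}
```

For `{0, 0, 0, 7, 0, 7, 0, 0}`, lane 1 finds the 7 at position 6 and lane 3 the 7 at position 4. A reduction that keeps the lower-numbered lane on ties would return 6; comparing indices returns 4.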
Hao Chen
31d326f895
LoongArch64: Fixed dot_lsx.S
Fixed incorrect register usage in instructions
Signed-off-by: gxw <guxiwei-hf@loongson.cn>
8 months ago
Hao Chen
5d6356bc16
LoongArch64: Fixed amax_lsx.S
Fixed register zeroing operation
Signed-off-by: Hao Chen <chenhao@loongson.cn>
Signed-off-by: gxw <guxiwei-hf@loongson.cn>
8 months ago
Ye Tao
c748e6a338
optimized sbgemm kernel for neoverse-v1 (sve-256)
Signed-off-by: Ye Tao <ye.tao@arm.com>
10 months ago
Aditya Tewari
4379a6fbe3
* checkpoint sbgemm for SVE-256
11 months ago
Martin Kroeker
d7036cfd74
Remove trailing blanks that break the cmake parser
8 months ago
Martin Kroeker
6e393a5599
Merge branch 'develop' into gemv_t
8 months ago
Martin Kroeker
876ba58e28
Merge pull request #5091 from goplanid/develop
Small gemm kernel improvements for AArch64
8 months ago
Martin Kroeker
180ba5e7d0
Merge pull request #5069 from tingboliao/dev_rotm_20250107
Further rearranged the rotm kernel for the different architectures.
8 months ago
Deeksha Goplani
d1bfa979f7
small gemm kernel packing modifications
8 months ago
Martin Kroeker
1a6a9fb22f
add another generator line for rotm
8 months ago
Martin Kroeker
4924319c50
fix position of srotm, qrotm
8 months ago
Martin Kroeker
b58cba9eb6
fix qrotm build rules
8 months ago
tingbo.liao
3c8df6358f
Further rearranged the rotm kernel for the different architectures.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
8 months ago
Annop Wongwathanarat
c0318cea6e
Simplify gemv_t_sve_v1x3 kernel
8 months ago
Martin Kroeker
87083fdbf6
[WIP] Work around assembler limitations in current LLVM for Windows on Arm (#5076)
* Protect align directives in assembly files that are currently problematic with LLVM on WoA
* use the armv8 zdot on WoA to work around other LLVM issues
8 months ago
tingbo.liao
ef7f54b357
Optimized gemm_tcopy_8_rvv to be compatible with vector lengths (vlens) of 128 and 256.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
8 months ago
gxw
e0a8216554
LoongArch64: Update dsymv LSX version
8 months ago
gxw
a9070ba3f9
LoongArch64: Update ssymv LSX version
8 months ago
Xi Ruoyao
af10c132b8
LoongArch64: Fix dsymv and ssymv LASX version
"fmov.d $f2, $f4" leaves all the bits above bit 63
unpredictable, but it's obvious that the following code uses the value of
those high bits. We actually want to replicate the lower 64 bits here,
so we should use xvreplve0.d instead.
LA464 (Loongson 3[A-Z]-5000) happens to replicate them for us due to
some uarch internal details so the issue was not detected, but for LA664
(Loongson 3[A-Z]-6000) and future uarch we need to do things correctly
or we end up getting a lot of test failures.
Closes: https://bbs.aosc.io/t/topic/302
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
8 months ago
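A toy C model of the distinction this commit relies on: fmov.d defines only the low 64-bit lane of the register, while xvreplve0.d broadcasts lane 0 to all four 64-bit lanes of a 256-bit LASX register. These are not real intrinsics; `xreg`, `model_fmov_d` and `model_xvreplve0_d` are hypothetical. Modeling "unpredictable" as "left unchanged" is an assumption, since on real hardware the upper lanes could hold anything, and the commit notes that LA464 happened to replicate lane 0.

```c
/* Toy model of a 256-bit LASX register as four 64-bit double lanes.
 * Illustration only; real code uses the instructions in assembly. */
typedef struct {
    double lane[4];
} xreg;

/* Model of "fmov.d $fd, $fj": only lane 0 of the destination is defined.
 * Here the upper lanes keep their previous contents, which is one possible
 * outcome of "unpredictable". */
void model_fmov_d(xreg *xd, const xreg *xj)
{
    xd->lane[0] = xj->lane[0];
}

/* Model of "xvreplve0.d $xd, $xj": lane 0 of the source is replicated
 * into all four lanes, so every lane is well defined. */
void model_xvreplve0_d(xreg *xd, const xreg *xj)
{
    for (int i = 0; i < 4; i++)
        xd->lane[i] = xj->lane[0];
}
```

Code that later operates on all four lanes sees stale data in lanes 1 to 3 under the first model, which is the kind of mismatch that produced the LA664 test failures described above.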
Martin Kroeker
d74eb02954
Merge pull request #5057 from martin-frbg/issue5050
Replace while loop in generic C/ZGEMM_BETA to avoid going out of bounds
8 months ago
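The commits in this merge only say that the while loops in the generic C/ZGEMM_BETA were replaced with for loops to avoid going out of bounds. The following is a hypothetical sketch, not the OpenBLAS source: `cgemm_beta_sketch` and its simplified signature are assumptions. Each inner loop runs exactly m times per column, so the padding rows between m and ldc are never touched, and the beta = 0 branch writes zeros instead of multiplying, following the usual BLAS convention that C need not be set on input when beta is zero.

```c
/* Hypothetical sketch: scale an m-by-n column-major complex matrix C
 * (interleaved re/im floats, leading dimension ldc in complex elements)
 * by beta = beta_r + i*beta_i, using counted for loops. */
void cgemm_beta_sketch(int m, int n, float beta_r, float beta_i,
                       float *c, int ldc)
{
    for (int j = 0; j < n; j++) {
        float *col = c + 2 * (long)j * ldc; /* start of column j */

        if (beta_r == 0.0f && beta_i == 0.0f) {
            /* beta == 0: overwrite with zeros, do not multiply. */
            for (int i = 0; i < m; i++) {
                col[2 * i] = 0.0f;
                col[2 * i + 1] = 0.0f;
            }
        } else {
            for (int i = 0; i < m; i++) {
                float re = col[2 * i];
                float im = col[2 * i + 1];
                col[2 * i] = beta_r * re - beta_i * im;
                col[2 * i + 1] = beta_r * im + beta_i * re;
            }
        }
    }
}
```

Scaling a 2-by-2 matrix stored with `ldc = 3` by `beta = i` maps `(1, 2)` to `(-2, 1)` and leaves the third (padding) row of each column unchanged.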
Martin Kroeker
30f7a4120b
Merge pull request #5056 from tingboliao/dev_omatcopy_20250108
Optimize the omatcopy_cn/zomatcopy_cn kernels with RVV 1.0 intrinsics.
8 months ago
gxw
20a8e48f25
LoongArch64: Update ssymv LASX version
8 months ago
gxw
e0748588b8
LoongArch64: Update dsymv LASX version
8 months ago
Martin Kroeker
d91d4fa6e9
convert the beta=0 branch to a for loop as well
8 months ago
Martin Kroeker
09e75f1588
fix absurd typo
8 months ago
Martin Kroeker
2891fd8d6d
Replace while loop with for
8 months ago
tingbo.liao
0a5dbf13d3
Optimize the omatcopy_cn and zomatcopy_cn kernels with RVV 1.0 intrinsics.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
8 months ago
Sergey Fedorov
229efa42ff
scal.S: use r11 on 32-bit Darwin on powerpc
9 months ago
Sergey Fedorov
81e1be8d90
Revert "temporarily disable the default S/DSCAL kernel"
This reverts commit 9b9c0aa5c9.
9 months ago
Martin Kroeker
9b9c0aa5c9
temporarily disable the default S/DSCAL kernel
9 months ago
tingbo.liao
c37509c213
Optimize the nrm2_rvv function to further improve performance.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
9 months ago
tingbo.liao
0bea1cfd9d
Optimize the zgemm_tcopy_4_rvv function to be compatible with vector lengths (vlens) of 128 and 256.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
9 months ago
tingbo.liao
d00cc400b1
Replaced __riscv_vid_v_i32m2 and __riscv_vid_v_i64m2 with __riscv_vid_v_u32m2 and __riscv_vid_v_u64m2 so that the code compiles with riscv64-unknown-linux-gnu-gcc.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
9 months ago
Martin Kroeker
229d8a025e
Merge pull request #4959 from CDAC-Bengaluru/level-1-sve
SVE Implementation for Level-1 BLAS Routines
9 months ago
SushilPratap04
3368a4e697
Update swap_kernel_sve.c
9 months ago
CDAC-SSDG
dd71e4234a
Added updated swap and rot SVE kernels.
9 months ago
CDAC-SSDG
06ffd411a5
Update KERNEL.ARMV8SVE
9 months ago
CDAC-SSDG
765850194e
Delete kernel/arm64/swap_kernel_sve.c
9 months ago
CDAC-SSDG
c17c19fbcf
Delete kernel/arm64/swap_kernel_c.c
9 months ago
CDAC-SSDG
f6416c0e37
Delete kernel/arm64/swap.c
9 months ago
CDAC-SSDG
3b7b74664c
Delete kernel/arm64/scal_kernel_sve.c
9 months ago
CDAC-SSDG
95a97012e8
Delete kernel/arm64/scal_kernel_c.c
9 months ago
CDAC-SSDG
5540f2121e
Delete kernel/arm64/scal.c
9 months ago
CDAC-SSDG
f62519cc87
Delete kernel/arm64/rot_kernel_sve.c
9 months ago
CDAC-SSDG
10857c9df4
Delete kernel/arm64/rot_kernel_c.c
9 months ago