Sergey Fedorov
229efa42ff
scal.S: use r11 on 32-bit Darwin on powerpc
9 months ago
Sergey Fedorov
81e1be8d90
Revert "temporarily disable the default S/DSCAL kernel"
This reverts commit 9b9c0aa5c9.
9 months ago
Martin Kroeker
9b9c0aa5c9
temporarily disable the default S/DSCAL kernel
9 months ago
tingbo.liao
c37509c213
Optimize the nrm2_rvv function to further improve performance.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
9 months ago
tingbo.liao
0bea1cfd9d
Optimize the zgemm_tcopy_4_rvv function to be compatible with vector lengths (vlens) of 128 and 256.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
9 months ago
tingbo.liao
d00cc400b1
Replaced the __riscv_vid_v_i32m2 and __riscv_vid_v_i64m2 with __riscv_vid_v_u32m2 and __riscv_vid_v_u64m2 for riscv64-unknown-linux-gnu-gcc compiling.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
9 months ago
Martin Kroeker
229d8a025e
Merge pull request #4959 from CDAC-Bengaluru/level-1-sve
SVE Implementation for Level-1 BLAS Routines
9 months ago
SushilPratap04
3368a4e697
Update swap_kernel_sve.c
9 months ago
CDAC-SSDG
dd71e4234a
Added updated swap and rot SVE kernels.
9 months ago
CDAC-SSDG
06ffd411a5
Update KERNEL.ARMV8SVE
9 months ago
CDAC-SSDG
765850194e
Delete kernel/arm64/swap_kernel_sve.c
9 months ago
CDAC-SSDG
c17c19fbcf
Delete kernel/arm64/swap_kernel_c.c
9 months ago
CDAC-SSDG
f6416c0e37
Delete kernel/arm64/swap.c
9 months ago
CDAC-SSDG
3b7b74664c
Delete kernel/arm64/scal_kernel_sve.c
9 months ago
CDAC-SSDG
95a97012e8
Delete kernel/arm64/scal_kernel_c.c
9 months ago
CDAC-SSDG
5540f2121e
Delete kernel/arm64/scal.c
9 months ago
CDAC-SSDG
f62519cc87
Delete kernel/arm64/rot_kernel_sve.c
9 months ago
CDAC-SSDG
10857c9df4
Delete kernel/arm64/rot_kernel_c.c
9 months ago
CDAC-SSDG
b9f51a5cf7
Delete kernel/arm64/rot.c
9 months ago
Martin Kroeker
81666de4ef
Merge pull request #5007 from martin-frbg/issue5006
Revert the NRM2 kernels for NeoverseN2 and ARMV8SVE targets to the generic NEON version
10 months ago
Martin Kroeker
3345007d8f
retire the thunderx2 NRM2 kernels due to reported inaccuracies and NAN
10 months ago
Martin Kroeker
5fe983db29
retire the thunderx2 nrm2 kernels for now due to NAN and inaccuracies
10 months ago
Iha, Taisei
4918beecbe
Loop-unrolled transposed [SD]GEMV kernels for A64FX and Neoverse V1
10 months ago
Juliya32
3b2421cba0
Add files via upload
11 months ago
Juliya32
012fe4da36
Delete kernel/arm64/rot_kernel_sve.c
11 months ago
Juliya32
d90ee00f85
Delete kernel/arm64/rot_kernel_c.c
11 months ago
Juliya32
668e28adc4
Delete kernel/arm64/rot.c
11 months ago
SushilPratap04
fa880ab1cf
Update KERNEL.ARMV8SVE
updated KERNEL.ARMV8SVE for level 1 sve (swap, rot and scal) kernels.
11 months ago
SushilPratap04
7822ae9617
Added sve kernels for rot routine.
11 months ago
SushilPratap04
b8bc2a752e
Added sve optimized kernels for swap routine
11 months ago
CDAC-SSDG
0667cf6c92
Added optimized scal routine files
11 months ago
gxw
73c6a28073
x86_64: opt somatcopy_ct with AVX
11 months ago
Ayappan Perumal
020cce1068
Fix build issues with gcc compiler as well
11 months ago
Ayappan Perumal
b6ec73e77c
Fix AIX build
11 months ago
Martin Kroeker
016bdb9b0b
Merge pull request #4946 from XiWeiGu/la64_omatcopy_lasx
LoongArch64: Opt somatcopy with LASX
11 months ago
Chip Kerchner
ab71a1edf2
Better VSX.
11 months ago
gxw
bb31bbef52
LoongArch64: Opt somatcopy_ct with LASX
11 months ago
gxw
b37129341b
LoongArch64: Opt somatcopy_cn with LASX
11 months ago
gxw
acf6cab304
LoongArch64: Opt somatcopy_rn with LASX
11 months ago
gxw
15edb441bf
LoongArch64: Opt somatcopy_rt with LASX
11 months ago
Chip Kerchner
36bd3eeddf
Vectorize BF16 GEMV (VSX & MMA). Use GEMM_GEMV_FORWARD_BF16 (for Power).
11 months ago
Martin Kroeker
e52d9b4cf1
Merge pull request #4928 from austinpagan/czgemm_in_c
CGEMM & ZGEMM using C code, Power only, P10 only.
1 year ago
Gordon Fossum
0b7fb5c791
CGEMM & ZGEMM using C code.
1 year ago
Martin Kroeker
9783dd07ab
Rename KERNEL.LOONGSONGENERIC to KERNEL.LA64_GENERIC
1 year ago
Martin Kroeker
c9e92348a6
Handle inf/nan if dummy2 flag is set
1 year ago
Martin Kroeker
d714013ab9
change sgemm kernel to 4x4 as the 16x4 altivec goes out of bounds
1 year ago
Martin Kroeker
de421b7764
Merge pull request #4904 from XiWeiGu/la64_cross_cmake
LoongArch64: Enable cmake cross-compilation
1 year ago
gxw
30af9278dc
LoongArch64: Enable cmake cross-compilation
1 year ago
gxw
48698b2b1d
LoongArch64: Rename core
Use microarchitecture name instead of meaningless strings to name the core,
the legacy core is still retained.
1. Rename LOONGSONGENERIC to LA64_GENERIC
2. Rename LOONGSON3R5 to LA464
3. Rename LOONGSON2K1000 to LA264
1 year ago
Deeksha Goplani
4894c54055
Improve TN case with further unrolling
1 year ago