CDAC-SSDG
f62519cc87
Delete kernel/arm64/rot_kernel_sve.c
11 months ago
CDAC-SSDG
10857c9df4
Delete kernel/arm64/rot_kernel_c.c
11 months ago
CDAC-SSDG
b9f51a5cf7
Delete kernel/arm64/rot.c
11 months ago
Martin Kroeker
81666de4ef
Merge pull request #5007 from martin-frbg/issue5006
Revert the NRM2 kernels for NeoverseN2 and ARMV8SVE targets to the generic NEON version
11 months ago
Martin Kroeker
3345007d8f
retire the thunderx2 NRM2 kernels due to reported inaccuracies and NAN
11 months ago
Martin Kroeker
5fe983db29
retire the thunderx2 nrm2 kernels for now due to NAN and inaccuracies
11 months ago
Iha, Taisei
4918beecbe
Loop-unrolled transposed [SD]GEMV kernels for A64FX and Neoverse V1
11 months ago
Juliya32
3b2421cba0
Add files via upload
1 year ago
Juliya32
012fe4da36
Delete kernel/arm64/rot_kernel_sve.c
1 year ago
Juliya32
d90ee00f85
Delete kernel/arm64/rot_kernel_c.c
1 year ago
Juliya32
668e28adc4
Delete kernel/arm64/rot.c
1 year ago
SushilPratap04
fa880ab1cf
Update KERNEL.ARMV8SVE
updated KERNEL.ARMV8SVE for level 1 sve (swap, rot and scal) kernels.
1 year ago
SushilPratap04
7822ae9617
Added sve kernels for rot routine.
1 year ago
SushilPratap04
b8bc2a752e
Added sve optimized kernels for swap routine
1 year ago
CDAC-SSDG
0667cf6c92
Added optimized scal routine files
1 year ago
gxw
73c6a28073
x86_64: opt somatcopy_ct with AVX
1 year ago
Ayappan Perumal
020cce1068
Fix build issues with gcc compiler as well
1 year ago
Ayappan Perumal
b6ec73e77c
Fix AIX build
1 year ago
Martin Kroeker
016bdb9b0b
Merge pull request #4946 from XiWeiGu/la64_omatcopy_lasx
LoongArch64: Opt somatcopy with LASX
1 year ago
Chip Kerchner
ab71a1edf2
Better VSX.
1 year ago
gxw
bb31bbef52
LoongArch64: Opt somatcopy_ct with LASX
1 year ago
gxw
b37129341b
LoongArch64: Opt somatcopy_cn with LASX
1 year ago
gxw
acf6cab304
LoongArch64: Opt somatcopy_rn with LASX
1 year ago
gxw
15edb441bf
LoongArch64: Opt somatcopy_rt with LASX
1 year ago
Chip Kerchner
36bd3eeddf
Vectorize BF16 GEMV (VSX & MMA). Use GEMM_GEMV_FORWARD_BF16 (for Power).
1 year ago
Martin Kroeker
e52d9b4cf1
Merge pull request #4928 from austinpagan/czgemm_in_c
CGEMM & ZGEMM using C code, Power only, P10 only.
1 year ago
Gordon Fossum
0b7fb5c791
CGEMM & ZGEMM using C code.
1 year ago
Martin Kroeker
9783dd07ab
Rename KERNEL.LOONGSONGENERIC to KERNEL.LA64_GENERIC
1 year ago
Martin Kroeker
c9e92348a6
Handle inf/nan if dummy2 flag is set
1 year ago
Martin Kroeker
d714013ab9
change sgemm kernel to 4x4 as the 16x4 altivec goes out of bounds
1 year ago
Martin Kroeker
de421b7764
Merge pull request #4904 from XiWeiGu/la64_cross_cmake
LoongArch64: Enable cmake cross-compilation
1 year ago
gxw
30af9278dc
LoongArch64: Enable cmake cross-compilation
1 year ago
gxw
48698b2b1d
LoongArch64: Rename core
Use microarchitecture names instead of meaningless strings to name the cores;
the legacy core names are still retained.
1. Rename LOONGSONGENERIC to LA64_GENERIC
2. Rename LOONGSON3R5 to LA464
3. Rename LOONGSON2K1000 to LA264
1 year ago
Deeksha Goplani
4894c54055
Improve TN case with further unrolling
1 year ago
Martin Kroeker
e05d98d00a
expressly use fld.d/fst.d for floating point registers instead of LD/ST macros
1 year ago
Chip Kerchner
a0aeba631d
Merge branch 'develop' into betterPowerGEMVTail
1 year ago
Chip Kerchner
083faf7556
Merge branch 'develop' into betterPowerGEMVTail
1 year ago
Chip Kerchner
75472b830a
Merge branch 'develop' into betterPowerGEMVTail
1 year ago
Henry Chen
ef94b96530
Use ldc1 and sdc1 for the prologue and epilogue on LOONGSON3A
This fix is similar to 2d8064174c.
1 year ago
Martin Kroeker
7ca835a82c
address clang array overflow warning
1 year ago
Martin Kroeker
46e331a917
remove the unworkable GEMM3M restriction from GENERIC again
1 year ago
Martin Kroeker
ccc23338d7
have the dummy GEMM3M kernel at least forward to regular GEMM
1 year ago
Martin Kroeker
f1c9803f9a
add proper return statement
1 year ago
Martin Kroeker
60abcc3991
add proper return statement
1 year ago
Chip Kerchner
1a7b8c650d
Merge branch 'develop' into betterPowerGEMVTail
1 year ago
Martin Kroeker
9afd0c8afd
Merge pull request #4814 from Mousius/gemv-proxy
Forward GEMM to GEMV when one argument is actually a vector
1 year ago
Martin Kroeker
edbf093c98
Update zarch SCAL kernels to handle INF and NAN arguments (#4829)
* handle INF and NAN in input (for S/D only if DUMMY2 argument is set)
1 year ago
Chris Sidebottom
ba2e989c67
Add accumulators to AArch64 GEMV Kernels
This helps to reduce values going missing as we accumulate.
1 year ago
Martin Kroeker
a875304eb0
fix inverted conditional for NAN handling
1 year ago
Martin Kroeker
24acdd6bbb
correct offset
1 year ago