Martin Kroeker
d714013ab9
change sgemm kernel to 4x4 as the 16x4 altivec goes out of bounds
1 year ago
Martin Kroeker
de421b7764
Merge pull request #4904 from XiWeiGu/la64_cross_cmake
LoongArch64: Enable cmake cross-compilation
1 year ago
gxw
30af9278dc
LoongArch64: Enable cmake cross-compilation
1 year ago
gxw
48698b2b1d
LoongArch64: Rename core
Use microarchitecture name instead of meaningless strings to name the core,
the legacy core is still retained.
1. Rename LOONGSONGENERIC to LA64_GENERIC
2. Rename LOONGSON3R5 to LA464
3. Rename LOONGSON2K1000 to LA264
1 year ago
Deeksha Goplani
4894c54055
Improve TN case with further unrolling
1 year ago
Martin Kroeker
e05d98d00a
expressly use fld.d/fst.d for floating point registers instead of LD/ST macros
1 year ago
Chip Kerchner
a0aeba631d
Merge branch 'develop' into betterPowerGEMVTail
1 year ago
Chip Kerchner
083faf7556
Merge branch 'develop' into betterPowerGEMVTail
1 year ago
Chip Kerchner
75472b830a
Merge branch 'develop' into betterPowerGEMVTail
1 year ago
Henry Chen
ef94b96530
Use ldc1 and sdc1 for the prologue and epilogue on LOONGSON3A
This fix is similar to 2d8064174c.
1 year ago
Martin Kroeker
7ca835a82c
address clang array overflow warning
1 year ago
Martin Kroeker
46e331a917
remove the unworkable GEMM3M restriction from GENERIC again
1 year ago
Martin Kroeker
ccc23338d7
have the dummy GEMM3M kernel at least forward to regular GEMM
1 year ago
Martin Kroeker
f1c9803f9a
add proper return statement
1 year ago
Martin Kroeker
60abcc3991
add proper return statement
1 year ago
Chip Kerchner
1a7b8c650d
Merge branch 'develop' into betterPowerGEMVTail
1 year ago
Martin Kroeker
9afd0c8afd
Merge pull request #4814 from Mousius/gemv-proxy
Forward GEMM to GEMV when one argument is actually a vector
1 year ago
Martin Kroeker
edbf093c98
Update zarch SCAL kernels to handle INF and NAN arguments ( #4829 )
* handle INF and NAN in input (for S/D only if DUMMY2 argument is set)
1 year ago
Chris Sidebottom
ba2e989c67
Add accumulators to AArch64 GEMV Kernels
This helps to reduce values going missing as we accumulate.
1 year ago
Martin Kroeker
a875304eb0
fix inverted conditional for NAN handling
1 year ago
Martin Kroeker
24acdd6bbb
correct offset
1 year ago
Martin Kroeker
fb7c53c5e5
Merge pull request #4807 from martin-frbg/scalfixes
[WIP]Make NAN handling in the SCAL kernels depend on the dummy2 parameter
1 year ago
Martin Kroeker
15c53dd2e0
Merge pull request #4794 from XiWeiGu/Fixed_Numpy_CI_Test
Try to fix numpy CI test failures
1 year ago
Martin Kroeker
a4e56e0452
Merge pull request #4806 from Mousius/small-gemm
Small GEMM for AArch64 with SVE
1 year ago
yamazaki-mitsufumi
88caf02f62
Fix ambiguous error on Mac OS
1 year ago
Martin Kroeker
b613754143
Update scal.c
1 year ago
Martin Kroeker
f5d04318e3
Merge branch 'OpenMathLib:develop' into scalfixes
1 year ago
Martin Kroeker
73f8866ffb
make NAN handling depend on DUMMY2 parameter
1 year ago
Martin Kroeker
dfbc2348a8
fix NAN handling
1 year ago
Martin Kroeker
c064319ecb
fix alpha=NAN case
1 year ago
Martin Kroeker
c2ffd90e8c
make NAN handling depend on dummy2 parameter
1 year ago
Chris Sidebottom
ea4ab3b310
Better header guard around bridge
1 year ago
Chris Sidebottom
7311d93016
Unroll TT further
1 year ago
Martin Kroeker
a815594fd1
Merge pull request #4801 from markdryan/markdryan/riscv-dynamic-arch
Add autodetection for riscv64
1 year ago
Martin Kroeker
dd6c33d34d
make NAN handling depend on dummy2 parameter
1 year ago
Hong Bo Peng
db98f8753f
Try to fix LAPACK testing failures on P7.
1. Remove the FADD insn from the GEMV Transpose code.
2. Remove the FADD insn from GEMM and ZGEMM code.
3. Reorder the computation of the imaginary part in ZGEMM code.
1 year ago
Chris Sidebottom
a9edddb695
Unroll TN further
1 year ago
Chris Sidebottom
9984c5ce9d
Clean up k2 removal more and unroll SGEMM more
1 year ago
Chris Sidebottom
b1c9fafabb
Remove k2 loop from DGEMM TN and use a more conservative heuristic for SGEMM
1 year ago
Martin Kroeker
2020569705
fix NAN handling and make it depend on dummy2 parameter
1 year ago
Martin Kroeker
3870995f01
make NAN handling depend on dummy2 parameter
1 year ago
Martin Kroeker
7284c533b5
make NAN handling depend on dummy2 parameter
1 year ago
Martin Kroeker
73751218a4
make NAN handling depend on dummy2 parameter
1 year ago
Martin Kroeker
b9bfc8ce09
make NAN handling depend on dummy2 parameter
1 year ago
Martin Kroeker
eb4879e04c
make NAN handling depend on the dummy2 parameter
1 year ago
Martin Kroeker
ee87cb90d0
Merge pull request #4803 from iha-taisei/SVESupportSDGEMV
A64FX: Add support for SVE to SGEMV/DGEMV kernels.
1 year ago
gxw
34b80ce03f
mips64: Fixed numpy CI failure
1 year ago
gxw
f6d6c14a96
mips: Fixed numpy CI failure
1 year ago
Chip Kerchner
ba47c7f4f3
Vectorize reduction stage of sgemv_t.
1 year ago
iha fujitsu
0985fdc82b
A64FX: Add support for SVE to SGEMV/DGEMV kernels.
1 year ago