Martin Kroeker
675cd551da
fix improper function prototypes (empty parentheses)
2 years ago
gxw
d15e0a055c
LoongArch64: Fix compilation issues when DYNAMIC_ARCH is enabled
2 years ago
gxw
4670eb1462
LoongArch64: Add dtrsm kernel
2 years ago
gxw
f2cf929374
LoongArch64: Add sgemv kernel
2 years ago
Martin Kroeker
8e6d93359d
Merge pull request #4196 from TiborGY/obsolete_inlines
Modernize obsolete inline order
2 years ago
gxw
394a1fd1bf
LoongArch64: Compatible with early internal toolchain
__loongarch_grlen and __loongarch_frlen were introduced in gcc version 8.3.0
(Loongnix 8.3.0-6.lnd.vec.31), an internal Loongson toolchain, to standardize
the general and floating-point register widths. Earlier versions do not
define them, so additional checks are required.
2 years ago
Martin Kroeker
9c4ae4d4fb
Merge pull request #4206 from martin-frbg/issue4201-2
Work around miscompilation of zdot_thunderx2t99 by the current NVIDIA HPC compiler
2 years ago
Martin Kroeker
88435104c8
Merge pull request #4204 from martin-frbg/llvm17-2
Work around LLVM17 miscompiling the AVX512 microkernels for CASUM/ZASUM
2 years ago
Martin Kroeker
fc8894dd98
Work around miscompilation by NVIDIA nvc
2 years ago
Martin Kroeker
7a6203ffa1
restore default Neoverse SVE build instructions for non-NVIDIA compilers
2 years ago
Martin Kroeker
2c3034ff7f
Disable the C/ZASUM AVX512 microkernels when compiling with LLVM17 as well
2 years ago
Martin Kroeker
8794544b43
Add support for compiling the Neoverse SVE kernels with the NVIDIA HPC compiler
2 years ago
gxw
553cc1372f
LoongArch64: Add sgemm_kernel
2 years ago
Martin Kroeker
12ede72ab7
Merge pull request #4192 from imciner2/im/clangfix
Fix cooperlake and sapphire rapids march flags on clang
2 years ago
Ian McInerney
79c15db348
Fix power10 gcc intrinsic check
__builtin_vsx_assemble_pair was only in GCC 10-11.2 and was replaced by
__builtin_vsx_build_pair thereafter.
2 years ago
TGY
b5ba95a6c0
Modernize obsolete inline order
2 years ago
Ian McInerney
8a8a8479be
Fix cooperlake and sapphire rapids march flags on clang
The march=cooperlake and march=sapphirerapids flags were never getting
added when building with Clang targeting those architectures. Instead,
the build was falling back to the skylake AVX512 implementation.
Clang added support for these two architectures in Clang 9 and Clang 12,
respectively, so introduce new checks for those versions to enable the
appropriate march flag, and fall back to skylake otherwise.
2 years ago
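The version gating described in the Clang march-flag commit above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual build-system change: the helper name pick_march_flag, the target strings, and the choice to pass the Clang major version as an integer are all hypothetical. Only the version thresholds (Clang 9 for cooperlake, Clang 12 for sapphirerapids) and the skylake fallback come from the commit message.

```c
#include <string.h>

/*
 * Hypothetical helper (not part of the repository): choose a -march flag
 * for a target name and a Clang major version, following the thresholds
 * stated in the commit message.
 */
const char *pick_march_flag(const char *target, int clang_major)
{
    /* Per the commit message, Clang supports cooperlake from Clang 9. */
    if (strcmp(target, "COOPERLAKE") == 0 && clang_major >= 9)
        return "-march=cooperlake";

    /* Per the commit message, Clang supports sapphirerapids from Clang 12. */
    if (strcmp(target, "SAPPHIRERAPIDS") == 0 && clang_major >= 12)
        return "-march=sapphirerapids";

    /* Older compilers (or any other target in this sketch) fall back. */
    return "-march=skylake-avx512";
}
```

With an older compiler, both targets end up on the skylake AVX512 flag, which matches the fallback the commit message describes.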
Martin Kroeker
34da1a067d
Allow negative INCX (API change from version 3.10 of the reference implementation)
2 years ago
Martin Kroeker
07e32c4cb8
Allow negative INCX (API change from version 3.10 of the reference implementation)
2 years ago
Martin Kroeker
c211da0688
Allow negative INCX (API change from version 3.10 of the reference implementation)
2 years ago
Martin Kroeker
a34a0a7abc
Allow negative INCX (API change from version 3.10 of the reference implementation)
2 years ago
Martin Kroeker
54d3246fc6
Allow negative INCX (API change from version 3.10 of the reference implementation)
2 years ago
Martin Kroeker
7dd441d5db
Allow negative INCX (API change from version 3.10 of the reference implementation)
2 years ago
Martin Kroeker
f692178792
Allow negative INCX (API change from version 3.10 of the reference implementation)
2 years ago
Martin Kroeker
d15ffb7fdf
Allow negative INCX (API change from version 3.10 of the reference implementation)
2 years ago
Martin Kroeker
a2d867f4d1
Allow negative INCX (API change from version 3.10 of the reference implementation)
2 years ago
Martin Kroeker
afdc56a421
Merge pull request #4158 from XiWeiGu/loongarch64_update_dgemm_kernel
LoongArch64: Update dgemm kernel
2 years ago
gxw
e8b571d245
LoongArch64: Add dgemv_t_8_lasx.S and dgemv_n_8_lasx.S V2
2 years ago
gxw
71fcee6eef
LoongArch64: Update dgemm kernel
2 years ago
Martin Kroeker
0f521ece25
Merge pull request #4183 from martin-frbg/issue4181
Apply USE_TRMM to MIPS64_GENERIC as to GENERIC in gmake builds
2 years ago
Martin Kroeker
41c31bc1d4
Revert "LoongArch64: Add dgemv_t_8_lasx.S and dgemv_n_8_lasx.S"
2 years ago
Martin Kroeker
61d803547a
Apply USE_TRMM to MIPS64_GENERIC as to GENERIC
2 years ago
Martin Kroeker
f8ee309402
Merge pull request #4153 from XiWeiGu/dgemv
LoongArch64: Add dgemv_t_8_lasx.S and dgemv_n_8_lasx.S
2 years ago
gxw
ec1e96aac8
LoongArch64: Add dgemv_t_8_lasx.S and dgemv_n_8_lasx.S
2 years ago
gxw
d46772e037
LoongArch64: Add compiler feature checks
2 years ago
Martin Kroeker
4664b57e6e
use shortcut only when both incx and incy are zero
2 years ago
Martin Kroeker
09131f79a6
Merge pull request #4164 from martin-frbg/issue4162
Enable use of AVX512 microkernels with NVIDIA HPC from version 22.3
2 years ago
Martin Kroeker
6a428b5629
Update casum_microk_skylakex-2.c
2 years ago
Martin Kroeker
ebb447e32e
Update zasum_microk_skylakex-2.c
2 years ago
Martin Kroeker
9f6847583a
nvc currently miscompiles this, hopefully fixed in release 23.09
2 years ago
Martin Kroeker
fe54ee3d15
nvc currently miscompiles this, hopefully fixed in release 23.09
2 years ago
Martin Kroeker
5720fa02c5
Merge pull request #4168 from Mousius/sve-zgemm-cgemm
Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core
2 years ago
Chris Sidebottom
84a268b6ca
Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core
This patch removes the prefetches from cgemm/zgemm, which improves performance much as the same change did for sgemm/dgemm in #3868, so I'm happy to enable this on any applicable cores.
I also replicated the unrolling of the copies from sgemm and dgemm.
2 years ago
Chris Sidebottom
730ca04b48
Fix ZHEMM copy for SVE
Whilst disambiguating whilelt, I inadvertently used the wrong datatype
for offsets, which can be negative. This rectifies that.
2 years ago
Martin Kroeker
2a62d2df96
Enable use of AVX512 microkernels with NVIDIA HPC from version 22.3
2 years ago
Martin Kroeker
849c8806b8
Merge pull request #4161 from Mousius/non-sve-kernels
Use latest non-SVE kernels in ARMV8SVE
2 years ago
Chris Sidebottom
24586bc4ff
Disambiguate whilelt
2 years ago
Chris Sidebottom
aea2a4622b
Use latest non-SVE kernels in ARMV8SVE
These are generally better and, in some cases, include threading, which helps on the cores we're targeting here.
2 years ago
martin-frbg
7976deff80
Fix file permissions (issue 4095)
2 years ago
Martin Kroeker
76ef1672f8
Override DSDOT with generic code to get rid of qemu precision error
2 years ago