Vaisakh K V d23eb3b93e Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API 10 months ago
KERNEL fix assignment of default CSUM kernel 1 year ago
KERNEL.A64FX A64FX: Add support for SVE to SGEMV/DGEMV kernels. 1 year ago
KERNEL.ARMV8 Add workaround for NVIDIA HPC mishandling of the asm DOT kernels 4 years ago
KERNEL.ARMV8SVE Small GEMM for AArch64 1 year ago
KERNEL.ARMV9SME Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API 7 months ago
KERNEL.CORTEXA53 optimize cgemm on ARM cortex A53 & cortex A55 3 years ago
KERNEL.CORTEXA55 Reduce duplication in kernel definitions 1 year ago
KERNEL.CORTEXA57 Add workaround for NVIDIA HPC mishandling of the asm DOT kernels 4 years ago
KERNEL.CORTEXA72 Simplifying ARMv8 build parameters 6 years ago
KERNEL.CORTEXA73 Simplifying ARMv8 build parameters 6 years ago
KERNEL.CORTEXA76 Add support for Cortex-A76 1 year ago
KERNEL.CORTEXA510 Fix outdated SVE kernel definitions for Cortex cpus by aliasing to ARMV8SVE 1 year ago
KERNEL.CORTEXA710 Fix outdated SVE kernel definitions for Cortex cpus by aliasing to ARMV8SVE 1 year ago
KERNEL.CORTEXX1 CortexX1 is ARMV8 like A7x 3 years ago
KERNEL.CORTEXX2 Fix outdated SVE kernel definitions for Cortex cpus by aliasing to ARMV8SVE 1 year ago
KERNEL.EMAG8180 Add preliminary support for EMAG8180 5 years ago
KERNEL.FALKOR Simplifying ARMv8 build parameters 6 years ago
KERNEL.FT2000 Add initial support for Phytium FT2000 series and ARMV9 Cortex 510/710/X1/X2 3 years ago
KERNEL.NEOVERSEN1 revert the C/Z NRM2 kernels to the base NEON kernel as well 1 year ago
KERNEL.NEOVERSEN2 Merge pull request #3846 from lilh9598/sbgemm_opt 2 years ago
KERNEL.NEOVERSEV1 Add accumulators to AArch64 GEMV Kernels 1 year ago
KERNEL.NEOVERSEV2 Correctly detect ARM Neoverse V2 CPUs. 1 year ago
KERNEL.THUNDERX Add workaround for NVIDIA HPC 4 years ago
KERNEL.THUNDERX2T99 Add SVE implementation for sdot/ddot 2 years ago
KERNEL.THUNDERX3T110 Reduce duplication in kernel definitions 1 year ago
KERNEL.TSV110 Add workaround for NVIDIA HPC 4 years ago
KERNEL.VORTEX Use Neoverse's current mix of ThunderX2 kernels for Vortex as well 4 years ago
KERNEL.generic Fix MSVC ARM64 build. Add generic kernel for ARM64 3 years ago
Makefile added experimental support for ARMV8 12 years ago
amax.S ARM64: Convert all labels to local labels 8 years ago
asum.S ARM64: Convert all labels to local labels 8 years ago
axpy.S ARM64: Convert all labels to local labels 8 years ago
casum.S ARM64: Convert all labels to local labels 8 years ago
casum_thunderx2t99.c Fixed a few more unnecessary calls to num_cpu_avail. 7 years ago
cgemm_kernel_4x4.S ARM64: Convert all labels to local labels 8 years ago
cgemm_kernel_8x4.S move ALPHA_I out of register 18 (reserved on OSX) 2 years ago
cgemm_kernel_8x4_cortexa53.c optimize cgemm on ARM cortex A53 & cortex A55 3 years ago
cgemm_kernel_8x4_thunderx2t99.S Move ALPHA_I out of register 18 (reserved on OSX) 2 years ago
cgemm_kernel_sve_v1x4.S Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core 2 years ago
cgemm_ncopy_sve_v1.c Disambiguate whilelt 2 years ago
cgemm_tcopy_sve_v1.c Disambiguate whilelt 2 years ago
copy.S ARM64: Convert all labels to local labels 8 years ago
copy_thunderx2t99.c Fixed a few more unnecessary calls to num_cpu_avail. 7 years ago
csum.S Add ARM64 implementations of ?sum 6 years ago
csum_thunderx2t99.c add csum/zsum kernels (trivially derived from the asum ones) 1 year ago
ctrmm_kernel_4x4.S ARM64: Convert all labels to local labels 8 years ago
ctrmm_kernel_8x4.S Move ALPHA_I out of register 18 (reserved on OSX) 2 years ago
ctrmm_kernel_sve_v1x4.S add cgemm ctrmm sve kernels 3 years ago
dasum_thunderx2t99.c Fixed a few more unnecessary calls to num_cpu_avail. 7 years ago
daxpy_thunderx.c aarch64 fix std=c18 compilation 5 years ago
daxpy_thunderx2t99.S ARM64: Improve DAXPY for ThunderX2 5 years ago
ddot_thunderx.c ARM64: Rename kernel files to have consistent naming 8 years ago
dgemm_beta.S Fix zero initialization for beta=0 case 5 years ago
dgemm_kernel_4x4.S ARM64: Convert all labels to local labels 8 years ago
dgemm_kernel_4x4_cortexa53.c MOD: optimize normal DGEMM on ARMV8 cortex-A53 & cortex-A55 3 years ago
dgemm_kernel_4x8.S ARM64: Convert all labels to local labels 8 years ago
dgemm_kernel_8x4.S ARM64: Convert all labels to local labels 8 years ago
dgemm_kernel_8x4_thunderx2t99.S ARM64: Move parameters from parameter.c to param.h 7 years ago
dgemm_kernel_sve_v1x8.S some clean-up & commentary 3 years ago
dgemm_kernel_sve_v2x8.S Remove prefetches from SVE kernels 2 years ago
dgemm_ncopy_4.S ARM64: Convert all labels to local labels 8 years ago
dgemm_ncopy_8.S ARM64: Convert all labels to local labels 8 years ago
dgemm_small_kernel_nn_sve.c Better header guard around bridge 1 year ago
dgemm_small_kernel_nt_sve.c Better header guard around bridge 1 year ago
dgemm_small_kernel_tn_sve.c Improve TN case with further unrolling 1 year ago
dgemm_small_kernel_tt_sve.c Better header guard around bridge 1 year ago
dgemm_tcopy_4.S ARM64: Convert all labels to local labels 8 years ago
dgemm_tcopy_8.S Remove unused TEMP2 and reshuffle to leave x18 unused (reserved on OSX) 4 years ago
dot.S ARM64: Fix utest dsdot errors 7 years ago
dot.c Wrap SVE header with __has_include check 2 years ago
dot_kernel_asimd.c Add SVE implementation for sdot/ddot 2 years ago
dot_kernel_sve.c add clobber list 1 year ago
dot_thunderx.c ARM64: Rename kernel files to have consistent naming 8 years ago
dtrmm_kernel_4x4.S ARM64: Convert all labels to local labels 8 years ago
dtrmm_kernel_4x8.S ARM64: Convert all labels to local labels 8 years ago
dtrmm_kernel_8x4.S Move temp to x21 to leave x18 unused (reserved on OSX) 4 years ago
dtrmm_kernel_sve_v1x8.S some clean-up & commentary 3 years ago
dznrm2_thunderx2t99.c remove another early exit for incx < 0 1 year ago
dznrm2_thunderx2t99_fast.c Fixed a few more unnecessary calls to num_cpu_avail. 7 years ago
gemm_ncopy_complex_sve_v1x4.c Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core 2 years ago
gemm_ncopy_sve_v1x8.c Work around Clang failing to disambiguate SVE intrinsics and add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH in AzureCI (#4140) 2 years ago
gemm_small_kernel_permit_sve.c Remove k2 loop from DGEMM TN and use a more conservative heuristic for SGEMM 1 year ago
gemm_tcopy_complex_sve_v1x4.c Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core 2 years ago
gemm_tcopy_sve_v1x8.c Work around Clang failing to disambiguate SVE intrinsics and add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH in AzureCI (#4140) 2 years ago
gemv_n.S ARM64: Convert all labels to local labels 8 years ago
gemv_n_sve.c Fix ambiguous error on Mac OS 1 year ago
gemv_t.S Add accumulators to AArch64 GEMV Kernels 1 year ago
gemv_t_sve.c Add accumulators to AArch64 GEMV Kernels 1 year ago
iamax.S ARM64: Convert all labels to local labels 8 years ago
iamax_thunderx2t99.c Fixed a few more unnecessary calls to num_cpu_avail. 7 years ago
izamax.S ARM64: Convert all labels to local labels 8 years ago
izamax_thunderx2t99.c Fixed a few more unnecessary calls to num_cpu_avail. 7 years ago
nrm2.S Fix accidental duplication of jump instruction 6 years ago
rot.S ARM64: Convert all labels to local labels 8 years ago
sasum_thunderx2t99.c Fixed a few more unnecessary calls to num_cpu_avail. 7 years ago
sbgemm_beta_neoversen2.c neoverse n2 sbgemm: init file 3 years ago
sbgemm_kernel_8x4_neoversen2.c Change file name to match the norm and delete useless code. 2 years ago
sbgemm_kernel_8x4_neoversen2_impl.c Change file name to match the norm and delete useless code. 2 years ago
sbgemm_ncopy_4_neoversen2.c Change file name to match the norm and delete useless code. 2 years ago
sbgemm_ncopy_8_neoversen2.c bugfix for sbgemm_ncopy_8_neoversen2 2 years ago
sbgemm_tcopy_4_neoversen2.c Add sbgemm_ncopy_8 and sbgemm_tcopy_4 2 years ago
sbgemm_tcopy_8_neoversen2.c Improve the performance of sbgemm_tcopy on neoversen2 2 years ago
scal.S make NAN handling depend on the dummy2 parameter 1 year ago
scnrm2_thunderx2t99.c remove another early exit for incx < 0 1 year ago
sgemm_beta.S fix initialization to zero in the NEON SGEMM_BETA kernel as well 5 years ago
sgemm_direct_arm64_sme1.c Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API 7 months ago
sgemm_direct_sme1.S Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API 7 months ago
sgemm_direct_sme1_preprocess.S Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API 7 months ago
sgemm_kernel_4x4.S ARM64: Convert all labels to local labels 8 years ago
sgemm_kernel_8x8.S ARM64: Convert all labels to local labels 8 years ago
sgemm_kernel_8x8_cortexa53.S fix INIT8x4 5 years ago
sgemm_kernel_16x4.S ARM64: Convert all labels to local labels 8 years ago
sgemm_kernel_16x4_thunderx2t99.S ARM64: Convert all labels to local labels 8 years ago
sgemm_kernel_sve_v1x8.S add sgemm kernel and copy functions for sgemm and ssymm 3 years ago
sgemm_kernel_sve_v2x8.S Remove prefetches from SVE kernels 2 years ago
sgemm_ncopy_4.S change line endings from CRLF to LF 2 years ago
sgemm_ncopy_8.S sgemm copy source init 5 years ago
sgemm_small_kernel_nn_sve.c Better header guard around bridge 1 year ago
sgemm_small_kernel_nt_sve.c Better header guard around bridge 1 year ago
sgemm_small_kernel_tn_sve.c Better header guard around bridge 1 year ago
sgemm_small_kernel_tt_sve.c Better header guard around bridge 1 year ago
sgemm_tcopy_8.S sgemm copy source init 5 years ago
sgemm_tcopy_16.S change line endings from CRLF to LF 2 years ago
strmm_kernel_4x4.S ARM64: Convert all labels to local labels 8 years ago
strmm_kernel_8x8.S ARM64: Convert all labels to local labels 8 years ago
strmm_kernel_8x8_cortexa53.S use general register to speedup 5 years ago
strmm_kernel_16x4.S Move temp to x21 to leave x18 unused (reserved on OSX) 4 years ago
strmm_kernel_sve_v1x8.S strmm sve v1x8 kernel 3 years ago
sum.S Add ARM64 implementations of ?sum 6 years ago
swap.S ARM64: Convert all labels to local labels 8 years ago
swap_thunderx2t99.S THUNDERX2T99: Add optimized S/D/C/Z SWAP Implementations 8 years ago
symm_lcopy_sve.c Disambiguate whilelt 2 years ago
symm_ucopy_sve.c Disambiguate whilelt 2 years ago
trmm_lncopy_sve_v1.c Work around Clang failing to disambiguate SVE intrinsics and add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH in AzureCI (#4140) 2 years ago
trmm_ltcopy_sve_v1.c Work around Clang failing to disambiguate SVE intrinsics and add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH in AzureCI (#4140) 2 years ago
trmm_uncopy_sve_v1.c Work around Clang failing to disambiguate SVE intrinsics and add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH in AzureCI (#4140) 2 years ago
trmm_utcopy_sve_v1.c Work around Clang failing to disambiguate SVE intrinsics and add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH in AzureCI (#4140) 2 years ago
trsm_kernel_LN_sve.c add sve ztrsm 3 years ago
trsm_kernel_LT_sve.c add sve ztrsm 3 years ago
trsm_kernel_RN_sve.c add sve ztrsm 3 years ago
trsm_kernel_RT_sve.c add sve ztrsm 3 years ago
trsm_lncopy_sve.c Work around Clang failing to disambiguate SVE intrinsics and add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH in AzureCI (#4140) 2 years ago
trsm_ltcopy_sve.c Disambiguate whilelt 2 years ago
trsm_uncopy_sve.c Disambiguate whilelt 2 years ago
trsm_utcopy_sve.c Disambiguate whilelt 2 years ago
zamax.S Fix the functional bugs for zamax. 5 years ago
zasum.S ARM64: Convert all labels to local labels 8 years ago
zasum_thunderx2t99.c Fixed a few more unnecessary calls to num_cpu_avail. 7 years ago
zaxpy.S ARM64: Convert all labels to local labels 8 years ago
zdot.S ARM64: Convert all labels to local labels 8 years ago
zdot_thunderx2t99.c Add a clobber list to fix utest errors seen with gcc13 on Apple M 1 year ago
zgemm_kernel_4x4.S move alpha to x19/x20 to leave x18 unused for OSX 4 years ago
zgemm_kernel_4x4_cortexa53.c MOD: add comments to a53 zgemm kernel 3 years ago
zgemm_kernel_4x4_thunderx2t99.S ARM64: Convert all labels to local labels 8 years ago
zgemm_kernel_sve_v1x4.S Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core 2 years ago
zgemm_ncopy_sve_v1.c Disambiguate whilelt 2 years ago
zgemm_tcopy_sve_v1.c Disambiguate whilelt 2 years ago
zgemv_n.S ARM64: Convert all labels to local labels 8 years ago
zgemv_t.S ARM64: Convert all labels to local labels 8 years ago
zhemm_ltcopy_sve.c Fix ZHEMM copy for SVE 2 years ago
zhemm_utcopy_sve.c Fix ZHEMM copy for SVE 2 years ago
znrm2.S Remove automatic label postfixes from macro included only once 6 years ago
zrot.S ARM64: Convert all labels to local labels 8 years ago
zscal.S Fix handling of NAN 1 year ago
zsum.S Add ARM64 implementations of ?sum 6 years ago
zsum_thunderx2t99.c add csum/zsum kernels (trivially derived from the asum ones) 1 year ago
zsymm_lcopy_sve.c Disambiguate whilelt 2 years ago
zsymm_ucopy_sve.c Disambiguate whilelt 2 years ago
ztrmm_kernel_4x4.S Move alphaI to x22 to leave x18 unused (reserved on OSX) 4 years ago
ztrmm_kernel_sve_v1x4.S fix sve ztrmm kernel 3 years ago
ztrmm_lncopy_sve_v1.c Disambiguate whilelt 2 years ago
ztrmm_ltcopy_sve_v1.c Disambiguate whilelt 2 years ago
ztrmm_uncopy_sve_v1.c Disambiguate whilelt 2 years ago
ztrmm_utcopy_sve_v1.c Disambiguate whilelt 2 years ago
ztrsm_lncopy_sve.c Disambiguate whilelt 2 years ago
ztrsm_ltcopy_sve.c Disambiguate whilelt 2 years ago
ztrsm_uncopy_sve.c Disambiguate whilelt 2 years ago
ztrsm_utcopy_sve.c Disambiguate whilelt 2 years ago