Chris Sidebottom 2c3cdaf74e Optimized BGEMV for NEOVERSEV1 target 2 months ago
KERNEL Further rearranged the rotm kernel for the different architectures. 8 months ago
KERNEL.A64FX Performance improvements of [SD]DOT with loop-unrolling on A64FX 3 months ago
KERNEL.AMPERE1 reduce duplicate kernel code 3 months ago
KERNEL.ARMV8 Add workaround for NVIDIA HPC mishandling of the asm DOT kernels 4 years ago
KERNEL.ARMV8SVE Use SVE kernel for S/DGEMVN for SVE machines 5 months ago
KERNEL.ARMV9SME Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API 7 months ago
KERNEL.CORTEXA53 optimize cgemm on ARM cortex A53 & cortex A55 3 years ago
KERNEL.CORTEXA55 Reduce duplication in kernel definitions 1 year ago
KERNEL.CORTEXA57 Add workaround for NVIDIA HPC mishandling of the asm DOT kernels 4 years ago
KERNEL.CORTEXA72 Simplifying ARMv8 build parameters 6 years ago
KERNEL.CORTEXA73 Simplifying ARMv8 build parameters 6 years ago
KERNEL.CORTEXA76 Add support for Cortex-A76 1 year ago
KERNEL.CORTEXA510 Fix outdated SVE kernel definitions for Cortex cpus by aliasing to ARMV8SVE 1 year ago
KERNEL.CORTEXA710 Fix outdated SVE kernel definitions for Cortex cpus by aliasing to ARMV8SVE 1 year ago
KERNEL.CORTEXX1 CortexX1 is ARMV8 like A7x 3 years ago
KERNEL.CORTEXX2 Fix outdated SVE kernel definitions for Cortex cpus by aliasing to ARMV8SVE 1 year ago
KERNEL.EMAG8180 Add preliminary support for EMAG8180 5 years ago
KERNEL.FALKOR Simplifying ARMv8 build parameters 6 years ago
KERNEL.FT2000 Add initial support for Phytium FT2000 series and ARMV9 Cortex 510/710/X1/X2 3 years ago
KERNEL.NEOVERSEN1 Merge pull request #5225 from annop-w/gemv_n 5 months ago
KERNEL.NEOVERSEN2 Use SVE kernel for S/DGEMVN for SVE machines 5 months ago
KERNEL.NEOVERSEV1 Optimized BGEMV for NEOVERSEV1 target 2 months ago
KERNEL.NEOVERSEV2 Add dispatch of SBGEMVNKERNEL for NEOVERSEN2 and NEOVERSEV2 7 months ago
KERNEL.THUNDERX Add workaround for NVIDIA HPC 4 years ago
KERNEL.THUNDERX2T99 Add SVE implementation for sdot/ddot 2 years ago
KERNEL.THUNDERX3T110 Reduce duplication in kernel definitions 1 year ago
KERNEL.TSV110 Add workaround for NVIDIA HPC 4 years ago
KERNEL.VORTEX Use Neoverse's current mix of ThunderX2 kernels for Vortex as well 4 years ago
KERNEL.generic Further rearranged the rotm kernel for the different architectures. 8 months ago
Makefile added experimental support for ARMV8 12 years ago
amax.S ARM64: Convert all labels to local labels 8 years ago
asum.S ARM64: Convert all labels to local labels 8 years ago
axpy.S ARM64: Convert all labels to local labels 8 years ago
bgemm_beta_neon.c Add optimized BGEMM kernel for NEOVERSEV1 target 2 months ago
bgemm_kernel_4x4_neoversev1.c Add optimized BGEMM kernel for NEOVERSEV1 target 2 months ago
bgemm_kernel_4x4_neoversev1_impl.c Add optimized BGEMM kernel for NEOVERSEV1 target 2 months ago
bgemv_n_sve_v3x4.c Optimized BGEMV for NEOVERSEV1 target 2 months ago
casum.S ARM64: Convert all labels to local labels 8 years ago
casum_thunderx2t99.c Fixed a few more unnecessary calls to num_cpu_avail. 7 years ago
cgemm_kernel_4x4.S ARM64: Convert all labels to local labels 8 years ago
cgemm_kernel_8x4.S move ALPHA_I out of register 18 (reserved on OSX) 2 years ago
cgemm_kernel_8x4_cortexa53.c optimize cgemm on ARM cortex A53 & cortex A55 3 years ago
cgemm_kernel_8x4_thunderx2t99.S Move ALPHA_I out of register 18 (reserved on OSX) 2 years ago
cgemm_kernel_sve_v1x4.S Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core 2 years ago
cgemm_ncopy_sve_v1.c Disambiguate whilelt 2 years ago
cgemm_tcopy_sve_v1.c Disambiguate whilelt 2 years ago
copy.S ARM64: Convert all labels to local labels 8 years ago
copy_thunderx2t99.c [WIP] Work around assembler limitations in current LLVM for Windows on Arm (#5076) 8 months ago
csum.S Add ARM64 implementations of ?sum 6 years ago
csum_thunderx2t99.c add csum/zsum kernels (trivially derived from the asum ones) 1 year ago
ctrmm_kernel_4x4.S ARM64: Convert all labels to local labels 8 years ago
ctrmm_kernel_8x4.S Move ALPHA_I out of register 18 (reserved on OSX) 2 years ago
ctrmm_kernel_sve_v1x4.S add cgemm ctrmm sve kernels 3 years ago
dasum_thunderx2t99.c [WIP] Work around assembler limitations in current LLVM for Windows on Arm (#5076) 8 months ago
daxpy_thunderx.c aarch64 fix std=c18 compilation 5 years ago
daxpy_thunderx2t99.S ARM64: Improve DAXPY for ThunderX2 5 years ago
ddot_thunderx.c ARM64: Rename kernel files to have consistent naming 8 years ago
dgemm_beta.S Fix zero initialization for beta=0 case 5 years ago
dgemm_kernel_4x4.S ARM64: Convert all labels to local labels 8 years ago
dgemm_kernel_4x4_cortexa53.c MOD: optimize normal DGEMM on ARMV8 cortex-A53 & cortex-A55 3 years ago
dgemm_kernel_4x8.S ARM64: Convert all labels to local labels 8 years ago
dgemm_kernel_8x4.S ARM64: Convert all labels to local labels 8 years ago
dgemm_kernel_8x4_thunderx2t99.S ARM64: Move parameters from parameter.c to param.h 7 years ago
dgemm_kernel_sve_v1x8.S some clean-up & commentary 3 years ago
dgemm_kernel_sve_v2x8.S Remove prefetches from SVE kernels 2 years ago
dgemm_ncopy_4.S ARM64: Convert all labels to local labels 8 years ago
dgemm_ncopy_8.S ARM64: Convert all labels to local labels 8 years ago
dgemm_small_kernel_nn_sve.c Better header guard around bridge 1 year ago
dgemm_small_kernel_nt_sve.c Better header guard around bridge 1 year ago
dgemm_small_kernel_tn_sve.c small gemm kernel packing modifications 8 months ago
dgemm_small_kernel_tt_sve.c small gemm kernel packing modifications 8 months ago
dgemm_tcopy_4.S ARM64: Convert all labels to local labels 8 years ago
dgemm_tcopy_8.S Remove unused TEMP2 and reshuffle to leave x18 unused (reserved on OSX) 4 years ago
dot.S ARM64: Fix utest dsdot errors 7 years ago
dot.c Performance improvements of [SD]DOT with loop-unrolling on A64FX 3 months ago
dot_kernel_asimd.c Accumulate results in output register explicitly 3 months ago
dot_kernel_sve.c add clobber list 1 year ago
dot_kernel_sve_v8.c Performance improvements of [SD]DOT with loop-unrolling on A64FX 3 months ago
dot_sve_v8.c Performance improvements of [SD]DOT with loop-unrolling on A64FX 3 months ago
dot_thunderx.c ARM64: Rename kernel files to have consistent naming 8 years ago
dtrmm_kernel_4x4.S ARM64: Convert all labels to local labels 8 years ago
dtrmm_kernel_4x8.S ARM64: Convert all labels to local labels 8 years ago
dtrmm_kernel_8x4.S Move temp to x21 to leave x18 unused (reserved on OSX) 4 years ago
dtrmm_kernel_sve_v1x8.S some clean-up & commentary 3 years ago
dznrm2_thunderx2t99.c remove another early exit for incx < 0 1 year ago
dznrm2_thunderx2t99_fast.c Fixed a few more unnecessary calls to num_cpu_avail. 7 years ago
gemm_ncopy_complex_sve_v1x4.c Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core 2 years ago
gemm_ncopy_sve_v1x8.c Work around Clang failing to disambiguate SVE intrinsics and add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH in AzureCI (#4140) 2 years ago
gemm_small_kernel_permit_sve.c Remove k2 loop from DGEMM TN and use a more conservative heuristic for SGEMM 1 year ago
gemm_tcopy_complex_sve_v1x4.c Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core 2 years ago
gemm_tcopy_sve_v1x8.c Work around Clang failing to disambiguate SVE intrinsics and add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH in AzureCI (#4140) 2 years ago
gemv_n.S ARM64: Convert all labels to local labels 8 years ago
gemv_n_sve.c Optimize gemv_n_sve kernel 6 months ago
gemv_n_sve_v1x3.c Optimize gemv_n_sve_v1x3 kernel 3 months ago
gemv_n_sve_v4x3.c fixed a potential out-of-bounds on gemv. 5 months ago
gemv_t.S Add accumulators to AArch64 GEMV Kernels 1 year ago
gemv_t_sve.c Add accumulators to AArch64 GEMV Kernels 1 year ago
gemv_t_sve_v1x3.c Simplify gemv_t_sve_v1x3 kernel 8 months ago
gemv_t_sve_v4x3.c Loop-unrolled transposed [SD]GEMV kernels for A64FX and Neoverse V1 10 months ago
iamax.S ARM64: Convert all labels to local labels 8 years ago
iamax_thunderx2t99.c Fixed a few more unnecessary calls to num_cpu_avail. 7 years ago
izamax.S ARM64: Convert all labels to local labels 8 years ago
izamax_thunderx2t99.c Fixed a few more unnecessary calls to num_cpu_avail. 7 years ago
nrm2.S Fix accidental duplication of jump instruction 6 years ago
rot.S ARM64: Convert all labels to local labels 8 years ago
rot.c Added Updated swap and rot sve kernels. 9 months ago
rot_kernel_c.c Added Updated swap and rot sve kernels. 9 months ago
rot_kernel_sve.c Added Updated swap and rot sve kernels. 9 months ago
sasum_thunderx2t99.c [WIP] Work around assembler limitations in current LLVM for Windows on Arm (#5076) 8 months ago
sbgemm_beta_neoversen2.c neoverse n2 sbgemm: init file 3 years ago
sbgemm_beta_neoversev1.c * checkpoint sbgemm for SVE-256 8 months ago
sbgemm_kernel_4x4_neoversev1.c optimized sbgemm kernel for neoverse-v1 (sve-256) 7 months ago
sbgemm_kernel_4x4_neoversev1_impl.c optimized sbgemm kernel for neoverse-v1 (sve-256) 7 months ago
sbgemm_kernel_8x4_neoversen2.c Change file name to match the norm and delete useless code. 2 years ago
sbgemm_kernel_8x4_neoversen2_impl.c Change file name to match the norm and delete useless code. 2 years ago
sbgemm_ncopy_4_neoversen2.c Change file name to match the norm and delete useless code. 2 years ago
sbgemm_ncopy_4_neoversev1.c optimized sbgemm kernel for neoverse-v1 (sve-256) 7 months ago
sbgemm_ncopy_8_neoversen2.c bugfix for sbgemm_ncopy_8_neoversen2 2 years ago
sbgemm_tcopy_4_neoversen2.c Add sbgemm_ncopy_8 and sbgemm_tcopy_4 2 years ago
sbgemm_tcopy_4_neoversev1.c Update sbgemm_tcopy_4_neoversev1 kernel to use standard C types 3 months ago
sbgemm_tcopy_8_neoversen2.c Improve the performance of sbgemm_tcopy on neoversen2 2 years ago
sbgemv_n_neon.c fix bugs in aarch64 sbgemv_n kernel 6 months ago
sbgemv_t_bfdot.c Optimized BGEMV for NEOVERSEV1 target 2 months ago
scal.S make NAN handling depend on the dummy2 parameter 1 year ago
scnrm2_thunderx2t99.c remove another early exit for incx < 0 1 year ago
sgemm_beta.S fix initialization to zero in the NEON SGEMM_BETA kernel as well 5 years ago
sgemm_direct_alpha_beta_arm64_sme1.c SME1 based direct kernel with alpha and beta for cblas_sgemm level 3 API. 2 months ago
sgemm_direct_arm64_sme1.c Fix ARMV9SME target in DYNAMIC_ARCH and add SME query code for MacOS (#5222) 4 months ago
sgemm_direct_sme1.S Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API 7 months ago
sgemm_direct_sme1_preprocess.S Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API 7 months ago
sgemm_kernel_4x4.S ARM64: Convert all labels to local labels 8 years ago
sgemm_kernel_8x8.S ARM64: Convert all labels to local labels 8 years ago
sgemm_kernel_8x8_cortexa53.S fix INIT8x4 5 years ago
sgemm_kernel_16x4.S ARM64: Convert all labels to local labels 8 years ago
sgemm_kernel_16x4_thunderx2t99.S ARM64: Convert all labels to local labels 8 years ago
sgemm_kernel_sve_v1x8.S add sgemm kernel and copy functions for sgemm and ssymm 3 years ago
sgemm_kernel_sve_v2x8.S Remove prefetches from SVE kernels 2 years ago
sgemm_ncopy_4.S Optimize aarch64 sgemm_ncopy 6 months ago
sgemm_ncopy_8.S Optimize aarch64 sgemm_ncopy 6 months ago
sgemm_small_kernel_nn_sve.c Better header guard around bridge 1 year ago
sgemm_small_kernel_nt_sve.c Better header guard around bridge 1 year ago
sgemm_small_kernel_tn_sve.c small gemm kernel packing modifications 8 months ago
sgemm_small_kernel_tt_sve.c small gemm kernel packing modifications 8 months ago
sgemm_tcopy_8.S sgemm copy source init 5 years ago
sgemm_tcopy_16.S change line endings from CRLF to LF 2 years ago
sgemv_n_neon.c Improve performance for SGEMVN on NEOVERSEN1 5 months ago
sme_abi.h SME1 based direct kernel with alpha and beta for cblas_sgemm level 3 API. 2 months ago
strmm_kernel_4x4.S ARM64: Convert all labels to local labels 8 years ago
strmm_kernel_8x8.S ARM64: Convert all labels to local labels 8 years ago
strmm_kernel_8x8_cortexa53.S use general register to speedup 5 years ago
strmm_kernel_16x4.S Move temp to x21 to leave x18 unused (reserved on OSX) 4 years ago
strmm_kernel_sve_v1x8.S strmm sve v1x8 kernel 3 years ago
sum.S Add ARM64 implementations of ?sum 6 years ago
swap.S ARM64: Convert all labels to local labels 8 years ago
swap.c Added Updated swap and rot sve kernels. 9 months ago
swap_kernel_c.c Added Updated swap and rot sve kernels. 9 months ago
swap_kernel_sve.c Update swap_kernel_sve.c 9 months ago
swap_thunderx2t99.S THUNDERX2T99: Add optimized S/D/C/Z SWAP Implementations 8 years ago
symm_lcopy_sve.c Disambiguate whilelt 2 years ago
symm_ucopy_sve.c Disambiguate whilelt 2 years ago
symv_L_asimd_4x4.c Add symv kernels for arm64 5 months ago
symv_L_sve_v1x4.c Add symv kernels for arm64 5 months ago
symv_U_asimd_4x4.c Add symv kernels for arm64 5 months ago
symv_U_sve_v1x4.c Add symv kernels for arm64 5 months ago
symv_microk_asimd_4x4.c Add symv kernels for arm64 5 months ago
symv_microk_sve_v1x4.c Add symv kernels for arm64 5 months ago
trmm_lncopy_sve_v1.c Work around Clang failing to disambiguate SVE intrinsics and add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH in AzureCI (#4140) 2 years ago
trmm_ltcopy_sve_v1.c Work around Clang failing to disambiguate SVE intrinsics and add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH in AzureCI (#4140) 2 years ago
trmm_uncopy_sve_v1.c Work around Clang failing to disambiguate SVE intrinsics and add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH in AzureCI (#4140) 2 years ago
trmm_utcopy_sve_v1.c Work around Clang failing to disambiguate SVE intrinsics and add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH in AzureCI (#4140) 2 years ago
trsm_kernel_LN_sve.c add sve ztrsm 3 years ago
trsm_kernel_LT_sve.c add sve ztrsm 3 years ago
trsm_kernel_RN_sve.c add sve ztrsm 3 years ago
trsm_kernel_RT_sve.c add sve ztrsm 3 years ago
trsm_lncopy_sve.c Work around Clang failing to disambiguate SVE intrinsics and add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH in AzureCI (#4140) 2 years ago
trsm_ltcopy_sve.c Disambiguate whilelt 2 years ago
trsm_uncopy_sve.c Disambiguate whilelt 2 years ago
trsm_utcopy_sve.c Disambiguate whilelt 2 years ago
zamax.S Fix the functional bugs for zamax. 5 years ago
zasum.S ARM64: Convert all labels to local labels 8 years ago
zasum_thunderx2t99.c [WIP] Work around assembler limitations in current LLVM for Windows on Arm (#5076) 8 months ago
zaxpy.S ARM64: Convert all labels to local labels 8 years ago
zdot.S ARM64: Convert all labels to local labels 8 years ago
zdot_thunderx2t99.c Add a clobber list to fix utest errors seen with gcc13 on Apple M 1 year ago
zgemm_kernel_4x4.S move alpha to x19/x20 to leave x18 unused for OSX 4 years ago
zgemm_kernel_4x4_cortexa53.c MOD: add comments to a53 zgemm kernel 3 years ago
zgemm_kernel_4x4_thunderx2t99.S ARM64: Convert all labels to local labels 8 years ago
zgemm_kernel_sve_v1x4.S Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core 2 years ago
zgemm_ncopy_sve_v1.c Disambiguate whilelt 2 years ago
zgemm_tcopy_sve_v1.c Disambiguate whilelt 2 years ago
zgemv_n.S ARM64: Convert all labels to local labels 8 years ago
zgemv_t.S ARM64: Convert all labels to local labels 8 years ago
zhemm_ltcopy_sve.c Fix ZHEMM copy for SVE 2 years ago
zhemm_utcopy_sve.c Fix ZHEMM copy for SVE 2 years ago
znrm2.S Remove automatic label postfixes from macro included only once 6 years ago
zrot.S ARM64: Convert all labels to local labels 8 years ago
zscal.S fix handling of dummy2 3 months ago
zsum.S Add ARM64 implementations of ?sum 6 years ago
zsum_thunderx2t99.c add csum/zsum kernels (trivially derived from the asum ones) 1 year ago
zsymm_lcopy_sve.c Disambiguate whilelt 2 years ago
zsymm_ucopy_sve.c Disambiguate whilelt 2 years ago
ztrmm_kernel_4x4.S Move alphaI to x22 to leave x18 unused (reserved on OSX) 4 years ago
ztrmm_kernel_sve_v1x4.S fix sve ztrmm kernel 3 years ago
ztrmm_lncopy_sve_v1.c Disambiguate whilelt 2 years ago
ztrmm_ltcopy_sve_v1.c Disambiguate whilelt 2 years ago
ztrmm_uncopy_sve_v1.c Disambiguate whilelt 2 years ago
ztrmm_utcopy_sve_v1.c Disambiguate whilelt 2 years ago
ztrsm_lncopy_sve.c Disambiguate whilelt 2 years ago
ztrsm_ltcopy_sve.c Disambiguate whilelt 2 years ago
ztrsm_uncopy_sve.c Disambiguate whilelt 2 years ago
ztrsm_utcopy_sve.c Disambiguate whilelt 2 years ago