Sunita Nadampalli 19c8f615dc OpenBLAS: aarch64: Add neoverse-v1/n2 architecture specifics 3 years ago
KERNEL Fix paths to C kernels for nrm2 7 years ago
KERNEL.A64FX fix UNROLL_MN and add to targets for SVE 4 years ago
KERNEL.ARMV8 Add workaround for NVIDIA HPC mishandling of the asm DOT kernels 4 years ago
KERNEL.ARMV8SVE fix UNROLL_MN and add to targets for SVE 4 years ago
KERNEL.CORTEXA53 optimize cgemm on ARM cortex A53 & cortex A55 4 years ago
KERNEL.CORTEXA55 optimize cgemm on ARM cortex A53 & cortex A55 4 years ago
KERNEL.CORTEXA57 Add workaround for NVIDIA HPC mishandling of the asm DOT kernels 4 years ago
KERNEL.CORTEXA72 Simplifying ARMv8 build parameters 7 years ago
KERNEL.CORTEXA73 Simplifying ARMv8 build parameters 7 years ago
KERNEL.EMAG8180 Add preliminary support for EMAG8180 5 years ago
KERNEL.FALKOR Simplifying ARMv8 build parameters 7 years ago
KERNEL.NEOVERSEN1 arm64: Fix nrm2 for input vectors with Inf 4 years ago
KERNEL.NEOVERSEN2 OpenBLAS: aarch64: Add neoverse-v1/n2 architecture specifics 3 years ago
KERNEL.NEOVERSEV1 OpenBLAS: aarch64: Add neoverse-v1/n2 architecture specifics 3 years ago
KERNEL.THUNDERX Add workaround for NVIDIA HPC 4 years ago
KERNEL.THUNDERX2T99 arm64: Fix nrm2 for input vectors with Inf 4 years ago
KERNEL.THUNDERX3T110 arm64: Fix nrm2 for input vectors with Inf 4 years ago
KERNEL.TSV110 Add workaround for NVIDIA HPC 4 years ago
KERNEL.VORTEX Use Neoverse's current mix of ThunderX2 kernels for Vortex as well 4 years ago
Makefile added experimental support for ARMV8 12 years ago
amax.S ARM64: Convert all labels to local labels 8 years ago
asum.S ARM64: Convert all labels to local labels 8 years ago
axpy.S ARM64: Convert all labels to local labels 8 years ago
casum.S ARM64: Convert all labels to local labels 8 years ago
casum_thunderx2t99.c Fixed a few more unnecessary calls to num_cpu_avail. 7 years ago
cgemm_kernel_4x4.S ARM64: Convert all labels to local labels 8 years ago
cgemm_kernel_8x4.S ARM64: Convert all labels to local labels 8 years ago
cgemm_kernel_8x4_cortexa53.c optimize cgemm on ARM cortex A53 & cortex A55 4 years ago
cgemm_kernel_8x4_thunderx2t99.S ARM64: Convert all labels to local labels 8 years ago
copy.S ARM64: Convert all labels to local labels 8 years ago
copy_thunderx2t99.c Fixed a few more unnecessary calls to num_cpu_avail. 7 years ago
csum.S Add ARM64 implementations of ?sum 6 years ago
ctrmm_kernel_4x4.S ARM64: Convert all labels to local labels 8 years ago
ctrmm_kernel_8x4.S ARM64: Convert all labels to local labels 8 years ago
dasum_thunderx2t99.c Fixed a few more unnecessary calls to num_cpu_avail. 7 years ago
daxpy_thunderx.c aarch64 fix std=c18 compilation 5 years ago
daxpy_thunderx2t99.S ARM64: Improve DAXPY for ThunderX2 5 years ago
ddot_thunderx.c ARM64: Rename kernel files to have consistent naming 8 years ago
dgemm_beta.S Fix zero initialization for beta=0 case 5 years ago
dgemm_kernel_4x4.S ARM64: Convert all labels to local labels 8 years ago
dgemm_kernel_4x4_cortexa53.c MOD: optimize normal DGEMM on ARMV8 cortex-A53 & cortex-A55 4 years ago
dgemm_kernel_4x8.S ARM64: Convert all labels to local labels 8 years ago
dgemm_kernel_8x4.S ARM64: Convert all labels to local labels 8 years ago
dgemm_kernel_8x4_thunderx2t99.S ARM64: Move parameters from parameter.c to param.h 7 years ago
dgemm_kernel_sve_v1x8.S some clean-up & commentary 4 years ago
dgemm_kernel_sve_v2x8.S some clean-up & commentary 4 years ago
dgemm_ncopy_4.S ARM64: Convert all labels to local labels 8 years ago
dgemm_ncopy_8.S ARM64: Convert all labels to local labels 8 years ago
dgemm_ncopy_sve_v1.c some clean-up & commentary 4 years ago
dgemm_tcopy_4.S ARM64: Convert all labels to local labels 8 years ago
dgemm_tcopy_8.S Remove unused TEMP2 and reshuffle to leave x18 unused (reserved on OSX) 4 years ago
dgemm_tcopy_sve_v1.c some clean-up & commentary 4 years ago
dot.S ARM64: Fix utest dsdot errors 7 years ago
dot_thunderx.c ARM64: Rename kernel files to have consistent naming 8 years ago
dot_thunderx2t99.c Fixed a few more unnecessary calls to num_cpu_avail. 7 years ago
dtrmm_kernel_4x4.S ARM64: Convert all labels to local labels 8 years ago
dtrmm_kernel_4x8.S ARM64: Convert all labels to local labels 8 years ago
dtrmm_kernel_8x4.S Move temp to x21 to leave x18 unused (reserved on OSX) 4 years ago
dtrmm_kernel_sve_v1x8.S some clean-up & commentary 4 years ago
dznrm2_thunderx2t99.c arm64: add the missing d9 register to the clobber list 4 years ago
dznrm2_thunderx2t99_fast.c Fixed a few more unnecessary calls to num_cpu_avail. 7 years ago
gemv_n.S ARM64: Convert all labels to local labels 8 years ago
gemv_t.S ARM64: Convert all labels to local labels 8 years ago
iamax.S ARM64: Convert all labels to local labels 8 years ago
iamax_thunderx2t99.c Fixed a few more unnecessary calls to num_cpu_avail. 7 years ago
izamax.S ARM64: Convert all labels to local labels 8 years ago
izamax_thunderx2t99.c Fixed a few more unnecessary calls to num_cpu_avail. 7 years ago
nrm2.S Fix accidental duplication of jump instruction 6 years ago
rot.S ARM64: Convert all labels to local labels 8 years ago
sasum_thunderx2t99.c Fixed a few more unnecessary calls to num_cpu_avail. 7 years ago
scal.S ARM64: Convert all labels to local labels 8 years ago
scnrm2_thunderx2t99.c Fixed a few more unnecessary calls to num_cpu_avail. 7 years ago
sgemm_beta.S fix initialization to zero in the NEON SGEMM_BETA kernel as well 5 years ago
sgemm_kernel_4x4.S ARM64: Convert all labels to local labels 8 years ago
sgemm_kernel_8x8.S ARM64: Convert all labels to local labels 8 years ago
sgemm_kernel_8x8_cortexa53.S fix INIT8x4 5 years ago
sgemm_kernel_16x4.S ARM64: Convert all labels to local labels 8 years ago
sgemm_kernel_16x4_thunderx2t99.S ARM64: Convert all labels to local labels 8 years ago
sgemm_kernel_sve_v1x8.S add sgemm kernel and copy functions for sgemm and ssymm 4 years ago
sgemm_kernel_sve_v2x8.S sgemm v2x8 SVE kernel 4 years ago
sgemm_ncopy_4.S Use arm neon instructions to optimize ncopy operation 5 years ago
sgemm_ncopy_8.S sgemm copy source init 5 years ago
sgemm_ncopy_sve_v1.c add sgemm kernel and copy functions for sgemm and ssymm 4 years ago
sgemm_tcopy_8.S sgemm copy source init 5 years ago
sgemm_tcopy_16.S Use x21 for I to leave x18 unused (reserved on OSX) 4 years ago
sgemm_tcopy_sve_v1.c add sgemm kernel and copy functions for sgemm and ssymm 4 years ago
strmm_kernel_4x4.S ARM64: Convert all labels to local labels 8 years ago
strmm_kernel_8x8.S ARM64: Convert all labels to local labels 8 years ago
strmm_kernel_8x8_cortexa53.S use general register to speedup 5 years ago
strmm_kernel_16x4.S Move temp to x21 to leave x18 unused (reserved on OSX) 4 years ago
strmm_kernel_sve_v1x8.S strmm sve v1x8 kernel 4 years ago
sum.S Add ARM64 implementations of ?sum 6 years ago
swap.S ARM64: Convert all labels to local labels 8 years ago
swap_thunderx2t99.S THUNDERX2T99: Add optimized S/D/C/Z SWAP Implementations 8 years ago
symm_lcopy_sve.c add sgemm kernel and copy functions for sgemm and ssymm 4 years ago
symm_ucopy_sve.c add sgemm kernel and copy functions for sgemm and ssymm 4 years ago
trmm_lncopy_sve_v1.c trmm sve copy functions for single precision 4 years ago
trmm_ltcopy_sve_v1.c trmm sve copy functions for single precision 4 years ago
trmm_uncopy_sve_v1.c trmm sve copy functions for single precision 4 years ago
trmm_utcopy_sve_v1.c trmm sve copy functions for single precision 4 years ago
zamax.S Fix the functional bugs for zamax. 5 years ago
zasum.S ARM64: Convert all labels to local labels 8 years ago
zasum_thunderx2t99.c Fixed a few more unnecessary calls to num_cpu_avail. 7 years ago
zaxpy.S ARM64: Convert all labels to local labels 8 years ago
zdot.S ARM64: Convert all labels to local labels 8 years ago
zdot_thunderx2t99.c Fixed a few more unnecessary calls to num_cpu_avail. 7 years ago
zgemm_kernel_4x4.S move alpha to x19/x20 to leave x18 unused for OSX 4 years ago
zgemm_kernel_4x4_cortexa53.c MOD: add comments to a53 zgemm kernel 4 years ago
zgemm_kernel_4x4_thunderx2t99.S ARM64: Convert all labels to local labels 8 years ago
zgemv_n.S ARM64: Convert all labels to local labels 8 years ago
zgemv_t.S ARM64: Convert all labels to local labels 8 years ago
znrm2.S Remove automatic label postfixes from macro included only once 6 years ago
zrot.S ARM64: Convert all labels to local labels 8 years ago
zscal.S ARM64: Convert all labels to local labels 8 years ago
zsum.S Add ARM64 implementations of ?sum 6 years ago
ztrmm_kernel_4x4.S Move alphaI to x22 to leave x18 unused (reserved on OSX) 4 years ago