Werner Saar
|
faa5e2e5e3
|
FIX: forgot to add the files cgemv_n_4.c and cgemv_t_4.c
|
9 years ago |
Werner Saar
|
fdf291be30
|
Added optimized cgemv_n and cgemv_t kernels for bulldozer, piledriver and steamroller
|
9 years ago |
Werner Saar
|
c99cc41cbd
|
Added optimized zgemv_n kernel for bulldozer, piledriver and steamroller
|
9 years ago |
Werner Saar
|
acdff55a6a
|
Bugfix for ztrmv
|
9 years ago |
Zhang Xianyi
|
7d6b68eb4a
|
Refs #786. Revert to default assembly kernel.
|
9 years ago |
Werner Saar
|
cd5241d0cf
|
modified KERNEL for power to use the generic DSDOT-KERNEL
|
9 years ago |
Zhang Xianyi
|
8c43d7fa5f
|
Merge remote-tracking branch 'origin/power8' into develop
Refs #774
|
9 years ago |
Werner Saar
|
085f215257
|
Modified assembly label names so that they are hidden.
Added license information.
|
9 years ago |
Zhang Xianyi
|
8f758eeff9
|
Refs #786. Avoid old assembly c/zgemv kernels.
|
9 years ago |
Werner Saar
|
0afc76fd65
|
enabled gemm_beta assembly kernels
|
9 years ago |
Werner Saar
|
91e1c5080c
|
modified configuration to use the power6 sgemm kernel for power8
|
9 years ago |
Werner Saar
|
73f04c2c72
|
enabled hemv assembly function for power8
|
9 years ago |
Werner Saar
|
3e633152c6
|
enabled symv assembly kernels on power8
|
9 years ago |
Werner Saar
|
d5130ce7e3
|
enabled gemv assembly on power8
|
9 years ago |
Werner Saar
|
4824b88fcb
|
enabled all level1 assembly kernels for power8
|
9 years ago |
Werner Saar
|
b752858d6c
|
added dgemm-, dtrmm-, zgemm- and ztrmm-kernels for power8
|
9 years ago |
Zhang Xianyi
|
efa4f5c936
|
Refs #695 #783. Replace the default x86_64 cgemv_t
asm kernel with a C kernel.
|
9 years ago |
Zhang Xianyi
|
74b0672223
|
Fix c/zaxpyc kernel bug on Cortex-A57.
|
9 years ago |
Zhang Xianyi
|
6e7be06e07
|
Refs JuliaLang/julia#5728. Fix gemv performance bug on Haswell Mac OS X.
On Mac OS X, it should use .align 4 (equivalent to .align 16 on Linux).
I saw no performance benefit from .align, so I deleted it.
|
9 years ago |
Zhang Xianyi
|
d06b92906a
|
Add gemm3m building for CMake.
|
9 years ago |
Zhang Xianyi
|
962376664d
|
Refs #768. Swap the result of the zdot x87 FP kernel.
|
9 years ago |
Zhang Xianyi
|
c44ff4d648
|
Refs #714. Avoid compiler warnings.
|
9 years ago |
Werner Saar
|
63a7d7fb24
|
updated gemv_n_vfpv3.S for armv7
|
9 years ago |
Werner Saar
|
b4ede558a5
|
updated nrm2 kernel for armv7
|
9 years ago |
Werner Saar
|
de3e2d4349
|
updated trmm kernels for armv7
|
9 years ago |
Werner Saar
|
a0e51e96f1
|
updated gemm kernels for armv7
|
9 years ago |
Werner Saar
|
c2891330bc
|
updated KERNEL.ARMV6
|
9 years ago |
Werner Saar
|
ceaa931e48
|
updated gemv kernel for armv6
|
9 years ago |
Werner Saar
|
eaa63165df
|
updated cgemv and zgemv kernels for armv6
|
9 years ago |
Werner Saar
|
c65357c566
|
updated trmm_kernels for armv6
|
9 years ago |
Werner Saar
|
e63e9f9f26
|
updated gemm_kernels for armv6
|
9 years ago |
Werner Saar
|
aafd3ab60e
|
updated cdot and zdot on arm
|
9 years ago |
Werner Saar
|
d2f84c9c8a
|
Ref #740: updated nrm2_vfp.S
|
9 years ago |
Werner Saar
|
ca32253f32
|
Ref #740: updated asum_vfp.S and iamax_vfp.S
|
9 years ago |
Werner Saar
|
9066d1f982
|
Ref #750 and Ref #740: bugfix for sdot, dsdot and ddot on arm
|
9 years ago |
Werner Saar
|
692d9c881c
|
Ref #740: simple solution to clear floating point register on arm
|
9 years ago |
Zhang Xianyi
|
3602a2cd1f
|
#736 Revert #733 patch to fix bus error on ARM.
|
9 years ago |
Zhang Xianyi
|
e3e20e2242
|
Merge pull request #733 from yuyichao/arm-asm
Do not use vsub to clear the register values
|
9 years ago |
Yichao Yu
|
594b9f4c73
|
Do not use vsub to clear the register values since it doesn't work with non-normal numbers.
|
9 years ago |
Werner Saar
|
c8f2c5d636
|
added optimized trsm_kernels
|
9 years ago |
Ashwin Sekhar T K
|
318f0949c3
|
lapack-test fixes in nrm2 kernels for Cortex-A57
|
10 years ago |
Ashwin Sekhar T K
|
98965da2e8
|
lapack-test fixes for Cortex-A57
|
10 years ago |
Ashwin Sekhar T K
|
c99c43d51e
|
Optimized trmm kernels for CORTEXA57
|
10 years ago |
Ashwin Sekhar T K
|
1397b47197
|
Optimized zgemm kernel for CORTEXA57
|
10 years ago |
Ashwin Sekhar T K
|
45f78963ac
|
Optimized cgemm kernel for CORTEXA57
Also, add a generic ztrmm 4x4 kernel
|
10 years ago |
Ashwin Sekhar T K
|
402443bf9c
|
Optimized dgemm kernel for CORTEXA57
|
10 years ago |
Ashwin Sekhar T K
|
19fdbee291
|
Improve the sgemm kernel for CORTEXA57
|
10 years ago |
Ashwin Sekhar T K
|
3b0cdfab1e
|
Optimized gemv kernels for CORTEXA57
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
|
10 years ago |
Ashwin Sekhar T K
|
46efa6a1da
|
Optimized swap kernels for CORTEXA57
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
|
10 years ago |
Ashwin Sekhar T K
|
ea1465cdf8
|
Optimized scal kernels for CORTEXA57
Co-Authored-By: Ralph Campbell <ralph.campbell@broadcom.com>
|
10 years ago |