wernsaar | e125a3dc33 | 9 years ago
Merge pull request #824 from wernsaar/develop
added optimized drot-kernel and srot-kernel for POWER8

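For context, ?rot applies a plane (Givens) rotation to a pair of vectors. Below is a plain C sketch of the operation a drot kernel implements, assuming positive increments. rot_ref is a hypothetical name, and this shows only the reference semantics, not the POWER8 assembly:

```c
#include <stddef.h>

/* Hypothetical reference helper: applies the plane rotation
   [ c  s ] [x_i]
   [-s  c ] [y_i]
   element-wise. Increments are assumed positive. */
void rot_ref(size_t n, double *x, size_t incx, double *y, size_t incy,
             double c, double s)
{
    for (size_t i = 0; i < n; i++) {
        double xi = x[i * incx];
        double yi = y[i * incy];
        x[i * incx] = c * xi + s * yi;  /* x' = c*x + s*y */
        y[i * incy] = c * yi - s * xi;  /* y' = c*y - s*x */
    }
}
```

With c = 0 and s = 1 the rotation moves y into x and -x into y, which makes a quick sanity check.
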
Werner Saar | 35f1f21a7f | 9 years ago
added drot- and srot-kernel optimized for POWER8

Zhang Xianyi | 7b4b7179ba | 9 years ago
Merge pull request #819 from ashwinyes/develop_20160324_fixes_optimizations
Cortex-A57: Fixes and Optimizations

Werner Saar | 3d9a50e841 | 9 years ago
added optimized sswap kernel for POWER8

Werner Saar | 828c849b44 | 9 years ago
added optimized ccopy kernel for POWER8

Werner Saar | ecc0bc9813 | 9 years ago
added optimized scopy kernel for POWER8

Werner Saar | 12f209b7b0 | 9 years ago
added optimized zswap kernel for POWER8

Werner Saar | 7316a87930 | 9 years ago
added optimized dswap kernel for POWER8

Werner Saar | 0bff057a87 | 9 years ago
added optimized dcopy kernel for POWER8

Werner Saar | 1e6cf9808c | 9 years ago
added optimized dscal kernel for POWER8

Ashwin Sekhar T K | 278511ad2d | 9 years ago
Cortex-A57: Fix clang compilation errors

Ashwin Sekhar T K | 3b5ffb49d3 | 9 years ago
Cortex-A57: Improve DGEMM 8x4 Implementation

Werner Saar | 55eda3813b | 9 years ago
added optimized zaxpy kernel for POWER8

Werner Saar | 0664ba4c97 | 9 years ago
added optimized daxpy kernel for POWER8

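The axpy kernels compute y := alpha*x + y. Here is a minimal C sketch of the double-precision case with strides. axpy_ref is a hypothetical name, and the early return for alpha == 0 mirrors the reference BLAS daxpy rather than anything stated in these commits:

```c
#include <stddef.h>

/* Hypothetical reference helper: y := alpha*x + y.
   Strides incx and incy are assumed positive. Not the POWER8 kernel. */
void axpy_ref(size_t n, double alpha, const double *x, size_t incx,
              double *y, size_t incy)
{
    if (alpha == 0.0)
        return;  /* nothing to add; reference daxpy also returns here */
    for (size_t i = 0; i < n; i++)
        y[i * incy] += alpha * x[i * incx];
}
```
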
Werner Saar | 11c44dede1 | 9 years ago
added optimized sdot kernel for POWER8

Werner Saar | 9e4584d069 | 9 years ago
added optimized zdot kernel for POWER8

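Complex dot kernels come in an unconjugated form (zdotu, sum of x_i*y_i) and a conjugated form (zdotc, sum of conj(x_i)*y_i). A C sketch over interleaved (re, im) storage, the layout BLAS uses for double complex vectors; zdot_ref and its conj flag are hypothetical, for illustration only:

```c
#include <stddef.h>

/* Hypothetical reference helper on interleaved (re, im) arrays.
   conj == 0: sum x[i]*y[i]        (zdotu semantics)
   conj != 0: sum conj(x[i])*y[i]  (zdotc semantics) */
void zdot_ref(size_t n, const double *x, const double *y, int conj,
              double *re, double *im)
{
    double sr = 0.0, si = 0.0;
    for (size_t i = 0; i < n; i++) {
        double xr = x[2 * i];
        double xi = conj ? -x[2 * i + 1] : x[2 * i + 1];
        double yr = y[2 * i], yi = y[2 * i + 1];
        sr += xr * yr - xi * yi;  /* real part of the product */
        si += xr * yi + xi * yr;  /* imaginary part of the product */
    }
    *re = sr;
    *im = si;
}
```

For x = 1+2i and y = 3+4i it yields -5+10i unconjugated and 11-2i conjugated.
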
Werner Saar | cd9fafc054 | 9 years ago
ddot for POWER8: updated licence information

Werner Saar | 84b92e6373 | 9 years ago
added optimized ddot kernel for POWER8

wernsaar | c279a53ed8 | 9 years ago
Merge pull request #806 from wernsaar/develop
adding optimized single precision blas level3 kernels for POWER8

Werner Saar | e1df5a6e23 | 9 years ago
fixed sgemm- and strmm-kernel

Werner Saar | 5c658f8746 | 9 years ago
add optimized cgemm- and ctrmm-kernel for POWER8

Ashwin Sekhar T K | 5ac02f6dc7 | 9 years ago
Optimize Dgemm 4x4 for Cortex A57

Ashwin Sekhar T K | 7aa1ad4923 | 9 years ago
Functional Assembly Kernels for CortexA57
Adding functional (non-optimized) kernels for Cortex-A57
with the following layouts.
SGEMM - 16x4, 8x8
CGEMM - 8x4
DGEMM - 8x4, 4x8

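Layouts such as DGEMM 8x4 describe the tile of C that one GEMM micro-kernel call updates. Reading 8x4 as MR x NR (an assumption), a plain C sketch of a packed-panel micro-kernel is shown below. The name gemm_micro_ref, the packing order and the column-major C tile are illustrative assumptions, not the Cortex-A57 assembly:

```c
#include <stddef.h>

#define MR 8  /* rows of the C tile (assumed meaning of "8x4") */
#define NR 4  /* columns of the C tile */

/* Hypothetical micro-kernel: C[0:MR, 0:NR] += alpha * A_panel * B_panel.
   a: packed A panel, k groups of MR values (one panel column per step)
   b: packed B panel, k groups of NR values (one panel row per step)
   c: column-major C tile with leading dimension ldc */
void gemm_micro_ref(size_t k, double alpha, const double *a, const double *b,
                    double *c, size_t ldc)
{
    double acc[MR][NR] = {{0.0}};  /* stands in for the vector registers */
    for (size_t p = 0; p < k; p++)             /* one rank-1 update per step */
        for (size_t i = 0; i < MR; i++)
            for (size_t j = 0; j < NR; j++)
                acc[i][j] += a[p * MR + i] * b[p * NR + j];
    for (size_t j = 0; j < NR; j++)            /* write C back once at the end */
        for (size_t i = 0; i < MR; i++)
            c[j * ldc + i] += alpha * acc[i][j];
}
```

Keeping the whole MR x NR accumulator live across the k loop is what lets an assembly kernel hold it in registers; the tile shape is chosen to fit the register file.
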
Werner Saar | dcd15b546c | 9 years ago
BUGFIX: KERNEL.POWER8

Werner Saar | 96284ab295 | 9 years ago
added sgemm- and strmm-kernel for POWER8

Werner Saar | faa5e2e5e3 | 9 years ago
FIX: forgot to add the files cgemv_n_4.c and cgemv_t_4.c

Werner Saar | fdf291be30 | 9 years ago
Added optimized cgemv_n and cgemv_t kernels for bulldozer, piledriver and steamroller

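The file names cgemv_n_4.c and cgemv_t_4.c suggest kernels that walk four columns of A per pass. That reading is an assumption; the messages do not say so. Below is a real-valued C sketch of that blocking for the non-transposed case, y := y + alpha*A*x. gemv_n_ref is hypothetical, and this is not the bulldozer/piledriver/steamroller code:

```c
#include <stddef.h>

/* Hypothetical helper: y := y + alpha*A*x, A column-major (m x n, lda).
   Columns are consumed four at a time so each y[i] is loaded and stored
   once per four columns; leftover columns are handled one by one. */
void gemv_n_ref(size_t m, size_t n, double alpha, const double *a, size_t lda,
                const double *x, double *y)
{
    size_t j = 0;
    for (; j + 4 <= n; j += 4) {
        const double *a0 = a + j * lda, *a1 = a0 + lda;
        const double *a2 = a1 + lda, *a3 = a2 + lda;
        double x0 = alpha * x[j], x1 = alpha * x[j + 1];
        double x2 = alpha * x[j + 2], x3 = alpha * x[j + 3];
        for (size_t i = 0; i < m; i++)
            y[i] += a0[i] * x0 + a1[i] * x1 + a2[i] * x2 + a3[i] * x3;
    }
    for (; j < n; j++) {  /* remaining 0 to 3 columns */
        const double *aj = a + j * lda;
        double xj = alpha * x[j];
        for (size_t i = 0; i < m; i++)
            y[i] += aj[i] * xj;
    }
}
```
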
Werner Saar | c99cc41cbd | 9 years ago
Added optimized zgemv_n kernel for bulldozer, piledriver and steamroller

Werner Saar | acdff55a6a | 9 years ago
Bugfix for ztrmv

Zhang Xianyi | 7d6b68eb4a | 9 years ago
Refs #786. Revert to default assembly kernel.

Werner Saar | cd5241d0cf | 9 years ago
modified KERNEL for power, to use the generic DSDOT-KERNEL

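DSDOT is the BLAS routine that takes single-precision vectors but accumulates and returns the dot product in double precision. Here is a minimal C sketch of that contract. dsdot_ref is a hypothetical name, and this is not the generic kernel itself:

```c
#include <stddef.h>

/* Hypothetical reference helper: float inputs, double accumulation and
   double result. Increments are assumed positive. */
double dsdot_ref(size_t n, const float *x, size_t incx,
                 const float *y, size_t incy)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += (double)x[i * incx] * (double)y[i * incy];  /* widen first */
    return sum;
}
```

With x = {2^24, 1, 1} and y all ones, the double accumulator returns 16777218 exactly. A float accumulator would stay at 16777216, because 2^24 + 1 is not representable in single precision.
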
Zhang Xianyi | 8c43d7fa5f | 9 years ago
Merge remote-tracking branch 'origin/power8' into develop
Refs #774

Werner Saar | 085f215257 | 9 years ago
Modified assembly label names so that they are hidden.
Added license information.

Zhang Xianyi | 8f758eeff9 | 9 years ago
Refs #786. avoid old assembly c/zgemv kernels.

Werner Saar | 0afc76fd65 | 9 years ago
enabled gemm_beta assembly kernels

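A gemm_beta kernel performs the C := beta*C pre-scaling step of GEMM, before alpha*A*B is accumulated into C. Below is a plain C sketch with column-major storage. gemm_beta_ref is hypothetical and is not the assembly being enabled here. Writing explicit zeros when beta == 0 follows the BLAS convention that C need not be set on input when beta is zero:

```c
#include <stddef.h>

/* Hypothetical helper: C := beta*C for an m x n column-major block with
   leading dimension ldc. Rows m..ldc-1 of each column are left untouched. */
void gemm_beta_ref(size_t m, size_t n, double beta, double *c, size_t ldc)
{
    for (size_t j = 0; j < n; j++) {
        double *cj = c + j * ldc;
        for (size_t i = 0; i < m; i++)
            /* overwrite instead of multiplying when beta == 0, so stale
               NaN/Inf in an unset C cannot leak into the result */
            cj[i] = (beta == 0.0) ? 0.0 : beta * cj[i];
    }
}
```
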
Werner Saar | 91e1c5080c | 9 years ago
modified configuration, to use power6 sgemm kernel for power8

Werner Saar | 73f04c2c72 | 9 years ago
enabled hemv assembly function for power8

Werner Saar | 3e633152c6 | 9 years ago
enabled symv assembly kernels on power8

Werner Saar | d5130ce7e3 | 9 years ago
enabled gemv assembly on power8

Werner Saar | 4824b88fcb | 9 years ago
enabled all level1 assembly kernels for power8

Werner Saar | b752858d6c | 9 years ago
added dgemm-, dtrmm-, zgemm- and ztrmm-kernel for power8

Zhang Xianyi | efa4f5c936 | 9 years ago
Refs #695 #783. Replace default x86_64 cgemv_t
asm kernel by C kernel.

Zhang Xianyi | 74b0672223 | 9 years ago
Fix c/zaxpyc kernel bug on Cortex-A57.

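Assuming the trailing c in c/zaxpyc marks the conjugated variant, y := alpha*conj(x) + y, here is a plain C sketch on interleaved (re, im) storage. zaxpyc_ref and its argument order are hypothetical and are not the Cortex-A57 kernel interface:

```c
#include <stddef.h>

/* Hypothetical helper: y := alpha*conj(x) + y with alpha = ar + i*ai,
   x and y stored as interleaved (re, im) pairs, unit stride.
   (ar + i ai)(xr - i xi) = (ar xr + ai xi) + i (ai xr - ar xi) */
void zaxpyc_ref(size_t n, double ar, double ai, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++) {
        double xr = x[2 * i], xi = x[2 * i + 1];
        y[2 * i]     += ar * xr + ai * xi;  /* real part */
        y[2 * i + 1] += ai * xr - ar * xi;  /* imaginary part */
    }
}
```

For alpha = i and x = 1+2i this adds i*(1-2i) = 2+i to y. Dropping the conjugation would instead give -2+i, which is the kind of sign mix-up such a kernel bug can produce.
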
Zhang Xianyi | 6e7be06e07 | 9 years ago
Refs JuliaLang/julia#5728. Fix gemv performance bug on Haswell Mac OSX.
On Mac OS X, it should use .align 4 (equal to .align 16 on Linux).
I didn't get the performance benefit from .align. Thus, I deleted it.

Zhang Xianyi | d06b92906a | 9 years ago
Add gemm3m building for CMake.

Zhang Xianyi | 962376664d | 9 years ago
Refs #768. Swap the result of zdot x87 fp kernel.

Zhang Xianyi | c44ff4d648 | 9 years ago
Refs #714. avoid compiler warnings.

Werner Saar | 63a7d7fb24 | 9 years ago
updated gemv_n_vfpv3.S for armv7

Werner Saar | b4ede558a5 | 9 years ago
updated nrm2 kernel for armv7

Werner Saar | de3e2d4349 | 9 years ago
updated trmm kernels for armv7