wernsaar
|
a135f5d9ed
|
added gemm_tcopy_2_bulldozer.S
|
12 years ago |
wernsaar
|
d0b6299b13
|
added dgemm_tcopy_8_bulldozer.S
|
12 years ago |
wernsaar
|
9e58dd509e
|
added gemm_ncopy_2_bulldozer.S
|
12 years ago |
wernsaar
|
7c8227101b
|
cleanup of dgemv_n_bulldozer.S and optimization of inner loop
|
12 years ago |
wernsaar
|
f67fa62851
|
added dgemv_n_bulldozer.S
|
12 years ago |
Zhang Xianyi
|
cd1d473ba0
|
Merge pull request #230 from wernsaar/develop
Refs #230. New dgemm and sgemm Kernel for BULLDOZER
|
12 years ago |
wernsaar
|
0ded1fcc1c
|
performance optimizations in sgemm_kernel_16x2_bulldozer.S
|
12 years ago |
wernsaar
|
a789b588cd
|
added cgemm_kernel_4x2_bulldozer.S
|
12 years ago |
wernsaar
|
8eaa04acbb
|
added zgemm_kernel_2x2_bulldozer.S
|
12 years ago |
wernsaar
|
d854b30ae6
|
Added UNROLL values for 3M to getarch_2nd.c, Makefile.system and Makefile.L3
|
12 years ago |
wernsaar
|
d65bbec99b
|
added new sgemm kernel for BULLDOZER
|
12 years ago |
wernsaar
|
e4c39c7c26
|
changed stack touching
|
12 years ago |
wernsaar
|
25491e42f9
|
New dgemm kernel for BULLDOZER: dgemm_kernel_8x2_bulldozer.S
|
12 years ago |
Zhang Xianyi
|
9f59f384d8
|
Refs #223. Fixed s/dgemv bug on windows.
|
12 years ago |
wangqian
|
23965f164c
|
Fixed overflow internal buffer bug of (s/d/c/z)gemv on x86_64.
|
12 years ago |
wangqian
|
6a72840945
|
Fixed overflow internal buffer bug of (s/d/c/z)gemv on x86.
|
12 years ago |
wernsaar
|
69aa6c8fb1
|
bad performance with some data
|
12 years ago |
wernsaar
|
60b263f3d2
|
removed trsm_kernel_RT_4x4_bulldozer.S. wrong results
|
12 years ago |
wernsaar
|
7ac306e0da
|
added trsm_kernel_RT_4x4_bulldozer.S
|
12 years ago |
wernsaar
|
4cb454cdf2
|
added trsm_kernel_LT_4x4_bulldozer.S
|
12 years ago |
wernsaar
|
19ad2fb128
|
prefetch improved. Defined 2 different kernels for inner loop
|
12 years ago |
wernsaar
|
6821677489
|
minor improvements and code cleanup
|
12 years ago |
Zhang Xianyi
|
3326f3152c
|
Merge pull request #213 from wernsaar/develop
Merged some improvements into dgemm_kernel_4x4_bulldozer.S.
|
12 years ago |
wernsaar
|
7641f6e253
|
Merged some improvements into dgemm_kernel_4x4_bulldozer.S.
Changed the copy functions to generic to solve prefetch conflicts
|
12 years ago |
Zhang Xianyi
|
3ad29452d1
|
Merge pull request #211 from wernsaar/develop
New version of dgemm_kernel_4x4_bulldozer.S
|
12 years ago |
wernsaar
|
6e3f6f25a5
|
New version of dgemm_kernel_4x4_bulldozer.S
The peak performance with 8 cores is now 90 GFlops
|
12 years ago |
Zhang Xianyi
|
724ae159ce
|
Fixed the Windows x86_64 ABI bug in s/daxpy kernels.
|
12 years ago |
wernsaar
|
f300ce3df5
|
new optimization of dgemm kernel for bulldozer: 10% performance increase
|
12 years ago |
wernsaar
|
66e64131ed
|
optimized again bulldozer dgemm kernel
|
12 years ago |
wernsaar
|
9405f26f4b
|
new dgemm_kernel for bulldozer
|
12 years ago |
Zhang Xianyi
|
5c8bf6ae0e
|
Merge branch 'bulldozer' into develop
|
12 years ago |
Zhang Xianyi
|
a1ead62f28
|
Disable the warning of sgemm bulldozer kernel.
|
12 years ago |
Zhang Xianyi
|
0133580148
|
Used sgemm bulldozer kernel on 64 bit.
|
12 years ago |
Zhang Xianyi
|
274246651d
|
Merge branch 'bulldozer' of git://github.com/wernsaar/OpenBLAS into bulldozer
|
12 years ago |
Zhang Xianyi
|
299b5a44dc
|
Merge branch 'develop' of github.com:xianyi/OpenBLAS into bulldozer
|
12 years ago |
Zhang Xianyi
|
d311236dfd
|
Refs #189. Fixed the bug of s/cdot about invalid reading NAN on x86_64.
|
12 years ago |
Zhang Xianyi
|
0b08f7479e
|
Refs #154. Fixed gemv_t bug about overflow 16MB buffer on x86.
|
12 years ago |
Zhang Xianyi
|
99d1978df7
|
Fixed #180. the typos in kernel/x86_64/sgemv_t.S
|
12 years ago |
Zhang Xianyi
|
08bf6674d5
|
Refs #177. Fixed sgemv_t compiling bug on Win64.
|
12 years ago |
Zhang Xianyi
|
69200884e1
|
Refs #173. Fixed overflow internal buffer bug of gemv_n on x86
|
12 years ago |
Zhang Xianyi
|
0d1518add9
|
Refs #173. Fixed overflow internal buffer bug of sgemv_t on x86
|
12 years ago |
Zhang Xianyi
|
91ed4e4450
|
Refs #171. Prevent loading the dirty number from the buffer in sgemv_t x86 kernel.
|
12 years ago |
Zhang Xianyi
|
fd3046b32a
|
Refs #173. Fixed overflow internal buffer bug of gemv_t on x86.
|
12 years ago |
Julian Taylor
|
9fb341a9f8
|
set parameters for CORE_ATHLON
else dgemm_p is set to zero leading to a segfault in alloc_mmap due to
allocsize being zero
|
12 years ago |
wernsaar
|
d48cff8cf1
|
Added optimized sgemm_kernel
|
13 years ago |
Zhang Xianyi
|
f19af5ecc0
|
Refs #54. Added AMD Bulldozer x86_64 dgemm kernel developed by Werner Saar <wernsaar at googlemail.com>
Based on the dgemm kernel for AMD Barcelona, he used AVX and FMA4 instructions.
Thanks, Werner Saar!
|
13 years ago |
Zhang Xianyi
|
bfaaa975e6
|
Added BULLDOZER target. So far it uses barcelona kernels.
|
13 years ago |
Zhang Xianyi
|
b7c0fa6bd2
|
Init AMD Bulldozer codebase.
|
13 years ago |
Zhang Xianyi
|
cea1a885b5
|
Refs #154. Fixed the build bug of dgemv_t on MinGW64.
|
13 years ago |
Zhang Xianyi
|
5f0117385e
|
Refs #154. Fixed a SEGFAULT bug of dgemv_t when m is very large.
It overflowed the internal buffer. Thus, we split vector x into blocks when m is very large.
Thanks to @wangqian for this patch.
|
13 years ago |