Zhang Xianyi
|
3326f3152c
|
Merge pull request #213 from wernsaar/develop
Merged some improvements into dgemm_kernel_4x4_bulldozer.S.
|
12 years ago |
wernsaar
|
7641f6e253
|
Merged some improvements into dgemm_kernel_4x4_bulldozer.S.
Changed the copy functions to generic to solve prefetch conflicts
|
12 years ago |
Zhang Xianyi
|
3ad29452d1
|
Merge pull request #211 from wernsaar/develop
New version of dgemm_kernel_4x4_bulldozer.S
|
12 years ago |
wernsaar
|
6e3f6f25a5
|
New version of dgemm_kernel_4x4_bulldozer.S
The peak performance with 8 cores is now 90 GFlops
|
12 years ago |
Zhang Xianyi
|
724ae159ce
|
Fixed the Windows x86_64 ABI bug in s/daxpy kernels.
|
12 years ago |
wernsaar
|
f300ce3df5
|
new optimization of dgemm kernel for bulldozer: 10% performance increase
|
12 years ago |
wernsaar
|
66e64131ed
|
optimized bulldozer dgemm kernel again
|
12 years ago |
wernsaar
|
9405f26f4b
|
new dgemm_kernel for bulldozer
|
12 years ago |
Zhang Xianyi
|
5c8bf6ae0e
|
Merge branch 'bulldozer' into develop
|
12 years ago |
Zhang Xianyi
|
d311236dfd
|
Refs #189. Fixed the s/cdot bug that invalidly read NaN on x86_64.
|
12 years ago |
Zhang Xianyi
|
0b08f7479e
|
Refs #154. Fixed the gemv_t bug that overflowed the 16MB buffer on x86.
|
12 years ago |
Zhang Xianyi
|
99d1978df7
|
Fixed #180, the typos in kernel/x86_64/sgemv_t.S
|
12 years ago |
Zhang Xianyi
|
08bf6674d5
|
Refs #177. Fixed sgemv_t compiling bug on Win64.
|
12 years ago |
Zhang Xianyi
|
69200884e1
|
Refs #173. Fixed the internal buffer overflow bug of gemv_n on x86.
|
12 years ago |
Zhang Xianyi
|
0d1518add9
|
Refs #173. Fixed the internal buffer overflow bug of sgemv_t on x86.
|
12 years ago |
Zhang Xianyi
|
91ed4e4450
|
Refs #171. Prevent loading the dirty number from the buffer in sgemv_t x86 kernel.
|
12 years ago |
Zhang Xianyi
|
fd3046b32a
|
Refs #173. Fixed the internal buffer overflow bug of gemv_t on x86.
|
12 years ago |
Julian Taylor
|
9fb341a9f8
|
set parameters for CORE_ATHLON
else dgemm_p is set to zero leading to a segfault in alloc_mmap due to
allocsize being zero
|
12 years ago |
Zhang Xianyi
|
f19af5ecc0
|
Refs #54. Added AMD Bulldozer x86_64 dgemm kernel developed by Werner Saar <wernsaar at googlemail.com>
Based on the dgemm kernel for AMD Barcelona, he used AVX and FMA4 instructions.
Thanks, Werner Saar!
|
13 years ago |
Zhang Xianyi
|
bfaaa975e6
|
Added BULLDOZER target. So far it uses barcelona kernels.
|
13 years ago |
Zhang Xianyi
|
b7c0fa6bd2
|
Init AMD Bulldozer codebase.
|
13 years ago |
Zhang Xianyi
|
cea1a885b5
|
Refs #154. Fixed the build bug of dgemv_t on MinGW64.
|
13 years ago |
Zhang Xianyi
|
5f0117385e
|
Refs #154. Fixed a SEGFAULT bug of dgemv_t when m is very large.
It overflowed the internal buffer. Thus, we split vector x into blocks when m is very large.
Thanks to @wangqian for this patch.
|
13 years ago |
Zhang Xianyi
|
2573311308
|
Refs #140. Fixed the zdot ABI incompatibility issue with GCC 4.7 on Win32.
GCC 4.7 uses the MSVC ABI on Win32. This means the caller pops the hidden pointer for returning
aggregate structures larger than 8 bytes.
|
13 years ago |
Jameson Nash
|
d0e731e8b8
|
provide support for passing CFLAGS, FFLAGS, PFLAGS, FPFLAGS to make on the command line
|
13 years ago |
Xianyi Zhang
|
25f1a573fd
|
Fixed the build bug when DYNAMIC_ARCH=0.
|
13 years ago |
wangqian
|
857a0fa0df
|
Fixed the issue of mixing AVX and SSE code in S/D/C/ZGEMM.
|
13 years ago |
wangqian
|
d34fce56e4
|
Refs #83. Fixed the S/DGEMM calling convention bug on Windows.
|
13 years ago |
wangqian
|
6cfcb54a28
|
Fixed the alignment problem in S and C precision GEMM kernels.
|
13 years ago |
wangqian
|
3ef96aa567
|
Fixed the bug in the MOVQ redefinition and the ALIGN SIZE problem.
|
13 years ago |
wangqian
|
f76f952547
|
Refs #83 #53. Added Intel Sandy Bridge (AVX supported) kernel code for BLAS level 3 functions.
|
13 years ago |
Zhang Xianyi
|
eefd30881c
|
Refs #113. Fixed the build bug on AMD Bobcat 64-bit OS.
|
13 years ago |
Zhang Xianyi
|
d3b67d0bd8
|
Refs #113. Fixed the typo BOBCATE -> BOBCAT
|
13 years ago |
Zhang Xianyi
|
d6cab3f37e
|
Refs #113. Support AMD Bobcat using Barcelona kernel code. Replace 3DNow! with MMX.
|
13 years ago |
Xianyi Zhang
|
a53c6e2440
|
Merge branch 'develop' into sandybridge
|
13 years ago |
Xianyi Zhang
|
5d657c6e67
|
Fixed #96, a SEGFAULT bug in samax on x86.
|
13 years ago |
Xianyi Zhang
|
03b0eb19f7
|
Refs #86. Test alpha=NaN in x86/x86_64 dscale.
|
13 years ago |
Xianyi Zhang
|
19a48b82cf
|
Init Sandybridge code based on Nehalem.
|
13 years ago |
Xianyi Zhang
|
3871b6a86d
|
Merge branch 'loongson3b' into release-0.1.0
|
13 years ago |
Xianyi Zhang
|
83ecfbb9b3
|
Merge branch 'loongson3a' into release-0.1.0
|
13 years ago |
unknown
|
dff146e306
|
refs #80. Used GEMV SSE2 kernels on x86.
|
13 years ago |
Wang Qian
|
8e53b57bb2
|
Appending gemmkernel and trmmkernel C code in kernel/generic. This code can be used on a new platform that does not have an optimized assembly kernel.
|
13 years ago |
Wang Qian
|
66904fc4e8
|
BLAS3 used standard MIPS instructions without extensions on Loongson 3B.
|
14 years ago |
Xianyi Zhang
|
0884f6b78d
|
Merge branch 'loongson3a' of github.com:xianyi/OpenBLAS into loongson3b
|
14 years ago |
traz
|
2d78fb05c8
|
Add conjugate condition to gemv.
|
14 years ago |
Xianyi Zhang
|
b95ad4cfaf
|
Support detecting ICT Loongson-3B CPU.
|
14 years ago |
Xianyi Zhang
|
3bbe3ddb31
|
Merge branch 'develop' of github.com:xianyi/OpenBLAS into loongson3b
|
14 years ago |
traz
|
a32e56500a
|
Fix the computation error in gemv when incx and incy are negative.
|
14 years ago |
traz
|
c1e618ea2d
|
Add complete gemv function on Loongson3a platform.
|
14 years ago |
traits
|
19f5b5c132
|
Fixed #66, the bug in the zgemv kernel with a transposed matrix on 64-bit MinGW (Windows).
|
14 years ago |