Zhang Xianyi
53b6023a6c
Fix cmake bug on MSVC 32-bit.
10 years ago
Zhang Xianyi
7df0820160
Use C kernels for s/dgemv on x86.
10 years ago
Zhang Xianyi
1cf2b10224
Use pure C generic target on x86 and x86_64.
make TARGET=GENERIC
?gemm3m is unimplemented on generic target.
10 years ago
wernsaar
0884b73c69
Lapack-test Windows 32bit now error free
11 years ago
wernsaar
9bd9472ae9
Lapack-test: cleanup of x86 32bit KERNEL file
11 years ago
Timothy Gu
6c2ead30f0
Remove all trailing whitespace except lapack-netlib
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
11 years ago
wernsaar
793509a3b5
replaced files for sdot, sgemv_n and sgemv_t for bug #348
11 years ago
wernsaar
9423f980f6
modified trsm kernel
12 years ago
wernsaar
c6156b2ef2
added trsm kernels from origin
12 years ago
wernsaar
6216ab8a7e
removed obsolete gemm_kernels from haswell branch
12 years ago
Zhang Xianyi
f51a849d91
Merge pull request #278 from wernsaar/haswell
Merge wernsaar's Haswell gemm kernels.
12 years ago
wernsaar
4070d9a123
added dgemm_kernel_16x2_haswell.S
12 years ago
wernsaar
0b90c0ec64
added sgemm_kernel_16x4_haswell.S
12 years ago
Zhang Xianyi
2638370844
Init code base for Intel Haswell.
12 years ago
Zhang Xianyi
886cbaf4e4
Support AMD Piledriver by bulldozer kernels.
12 years ago
Zhang Xianyi
fa916a0fac
Fixed #238 bug in lsame on x86.
12 years ago
wangqian
6a72840945
Fixed internal buffer overflow bug in (s/d/c/z)gemv on x86.
12 years ago
Zhang Xianyi
5c8bf6ae0e
Merge branch 'bulldozer' into develop
12 years ago
Zhang Xianyi
0b08f7479e
Refs #154 . Fixed gemv_t bug that overflowed the 16MB buffer on x86.
12 years ago
Zhang Xianyi
69200884e1
Refs #173 . Fixed internal buffer overflow bug in gemv_n on x86.
12 years ago
Zhang Xianyi
0d1518add9
Refs #173 . Fixed internal buffer overflow bug in sgemv_t on x86.
12 years ago
Zhang Xianyi
91ed4e4450
Refs #171 . Prevent loading the dirty number from the buffer in sgemv_t x86 kernel.
12 years ago
Zhang Xianyi
fd3046b32a
Refs #173 . Fixed internal buffer overflow bug in gemv_t on x86.
12 years ago
Zhang Xianyi
bfaaa975e6
Added BULLDOZER target. So far it uses barcelona kernels.
13 years ago
Zhang Xianyi
b7c0fa6bd2
Init AMD Bulldozer codebase.
13 years ago
Zhang Xianyi
2573311308
Refs #140 . Fixed zdot ABI incompatibility issue with GCC 4.7 on Win 32.
GCC 4.7 uses the MSVC ABI on Win 32. This means the caller pops the hidden pointer used for returning
aggregate structures larger than 8 bytes.
13 years ago
Zhang Xianyi
d3b67d0bd8
Refs #113 . Fixed the typo BOBCATE -> BOBCAT
13 years ago
Zhang Xianyi
d6cab3f37e
Refs #113 . Support AMD Bobcat using Barcelona kernel code. Replace 3DNow! with MMX.
13 years ago
Xianyi Zhang
a53c6e2440
Merge branch 'develop' into sandybridge
13 years ago
Xianyi Zhang
5d657c6e67
Fixed #96, a SEGFAULT bug in samax on x86.
13 years ago
Xianyi Zhang
03b0eb19f7
Refs #86 . Test alpha=NaN in x86/x86_64 dscal.
13 years ago
Xianyi Zhang
19a48b82cf
Init Sandybridge codes based on Nehalem.
13 years ago
unknown
dff146e306
refs #80 . Used GEMV SSE2 kernels on x86.
13 years ago
Zhang Xiianyi
7b410b7f0e
Fixed #58, a zdot SEGFAULT bug with GCC-4.6. Thanks to Mr. John for this patch.
In the i386 calling convention, the caller passes the address of zdot's return value as the first hidden parameter.
Thus, the callee should remove this address from the stack before returning.
Actually, I fixed the same bug in x86/zdot_sse2.S (issue #32 ). However, that was not a good implementation, since it used 3 instructions. Mr. John told me to use "ret $0x4" to skip the first hidden address (4 bytes).
14 years ago
traits
b1fe26c45a
refs #55 . Changed DTB_ENTRIES to DTB_DEFAULT_ENTRIES in x86 gemv_n kernel codes.
14 years ago
Xianyi
31040e4d80
Fixed #32, a SEGFAULT bug with gcc-4.6. According to the i386 calling convention, the called function should remove the hidden return value address from the stack.
14 years ago
Xianyi
272f62a2b6
Changed the movlps macro name to capitals in the x86/zdot_sse2.S file.
14 years ago
Xianyi
36016fe349
On 32-bit x86, gcc 4.4.3 generated the wrong code (movsd) from movlps in zdot_sse2.S line 191.
This would cause zdotu & zdotc failures. Instead, use movlpd to work around it. Fixed #8 . Fixed #9 .
14 years ago
Xianyi
12214e1d0f
Fixed #7 . Modified axpy kernel code to avoid the unrolled loop when incx==0 or incy==0 on 32-bit x86.
14 years ago
Xianyi
bfaa80c316
Fixed #4 : csrot & drot returned the wrong result when incx==incy==0 on i686.
14 years ago
Xianyi Zhang
342bbc3871
Import GotoBLAS2 1.13 BSD version codes.
14 years ago