
On 32-bit x86, gcc 4.4.3 generated wrong code (movsd) from movlps in zdot_sse2.S line 1191.

This would cause zdotu & zdotc failures. Instead, use movlpd to work around it. Fixed #8. Fixed #9.
tags/v0.1alpha1
Xianyi 14 years ago
parent
commit
36016fe349
2 changed files with 5 additions and 2 deletions
  1. Changelog.txt (+3, -0)
  2. kernel/x86/zdot_sse2.S (+2, -2)

Changelog.txt (+3, -0)

@@ -13,6 +13,9 @@ common:
* Imported GotoBLAS2 1.13 BSD version

x86/x86_64:
+* On 32-bit x86, gcc 4.4.3 generated wrong code (movsd) from movlps
+in zdot_sse2.S line 1191. This would cause zdotu & zdotc failures.
+Instead, work around it with movlpd. (Refs issue #8 #9 on github)
* Modified ?axpy functions to return same netlib BLAS results
when incx==0 or incy==0 (Refs issue #7 on github)
* Modified ?swap functions to return same netlib BLAS results


kernel/x86/zdot_sse2.S (+2, -2)

@@ -1188,8 +1188,8 @@
testl $1, N
jle .L48

-movlps -16 * SIZE(X), %xmm4
-movlps -16 * SIZE(Y), %xmm6
+movlpd -16 * SIZE(X), %xmm4
+movlpd -16 * SIZE(Y), %xmm6

pshufd $0x4e, %xmm6, %xmm3
mulpd %xmm4, %xmm6
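
For readers less familiar with the three instructions named in the commit message, here is a minimal C sketch of their load semantics using SSE2 intrinsics. It does not reproduce the gcc 4.4.3 problem. It only shows that the movlps-style and movlpd-style loads put the same 64 bits into the low half of a register and leave the high half alone, while the movsd-style load zeroes the high half. Whether that difference is what actually broke zdotu and zdotc is an assumption here, not something the commit states. The helper names are hypothetical and are not part of the kernel source.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics; on 32-bit x86 build with -msse2 */

/* Copy both 64-bit lanes of a register out as doubles: out[0] = low, out[1] = high. */
static void lanes(__m128d v, double out[2]) {
    _mm_storeu_pd(out, v);
}

/* movlpd-style load: replace the low lane from memory, keep the high lane. */
static void load_low_movlpd(const double *p, double out[2]) {
    __m128d reg = _mm_set_pd(99.0, 77.0);   /* high = 99.0, low = 77.0 */
    reg = _mm_loadl_pd(reg, p);
    lanes(reg, out);
}

/* movlps-style load: the same 64 bits, typed as two packed floats. */
static void load_low_movlps(const double *p, double out[2]) {
    __m128 reg = _mm_castpd_ps(_mm_set_pd(99.0, 77.0));
    reg = _mm_loadl_pi(reg, (const __m64 *)p);
    lanes(_mm_castps_pd(reg), out);
}

/* movsd-style load: low lane from memory, high lane forced to zero. */
static void load_low_movsd(const double *p, double out[2]) {
    lanes(_mm_load_sd(p), out);
}

/* Self-check for the three behaviours described above. */
static void check_low_half_loads(void) {
    const double x = 1.5;
    double a[2], b[2], c[2];

    load_low_movlpd(&x, a);
    load_low_movlps(&x, b);
    load_low_movsd(&x, c);

    assert(a[0] == 1.5 && a[1] == 99.0);    /* high lane preserved */
    assert(b[0] == a[0] && b[1] == a[1]);   /* movlps agrees with movlpd */
    assert(c[0] == 1.5 && c[1] == 0.0);     /* high lane cleared */
}
```

Calling check_low_half_loads() from a test program should pass silently on any SSE2-capable x86 target. The sketch uses intrinsics rather than inline assembly so that the stated semantics do not depend on which encoding the compiler or assembler picks.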

