Zhang Xianyi
2c9a203bd1
Merge pull request #198 from wernsaar/develop
new optimization of dgemm kernel for bulldozer: 10% performance increase
12 years ago
wernsaar
f300ce3df5
new optimization of dgemm kernel for bulldozer: 10% performance increase
12 years ago
Zhang Xianyi
e2c7c75715
Merge pull request #197 from wernsaar/develop
optimized the bulldozer dgemm kernel again
12 years ago
wernsaar
66e64131ed
optimized the bulldozer dgemm kernel again
12 years ago
Zhang Xianyi
5900b1462e
Merge pull request #195 from wernsaar/develop
Develop dgemm for bulldozer
12 years ago
wernsaar
9405f26f4b
new dgemm_kernel for bulldozer
12 years ago
Zhang Xianyi
54e7b37630
Merge branch 'develop'
12 years ago
Zhang Xianyi
529f1b5006
Refs#194. Export the missing LAPACK s/dlamc3 functions.
12 years ago
Zhang Xianyi
e5ac3007e0
Merge branch 'develop'
12 years ago
Zhang Xianyi
0d0405b434
Updated the doc for 0.2.6 version.
12 years ago
Zhang Xianyi
f1ce74ffdd
Improved the message printed when the OS doesn't support AVX.
12 years ago
Zhang Xianyi
d744c9590a
In OpenMP threading, preallocate the thread buffer instead of allocating the buffer every time. This patch improved the performance slightly.
12 years ago
Zhang Xianyi
3cc6ae793e
Refs #174 . Return the sb pointer when using OpenMP or on Windows.
12 years ago
Zhang Xianyi
4c2123c334
Fixed the overflow bug in single-threaded Cholesky factorization.
12 years ago
Zhang Xianyi
5155e3f509
Refs #174 . Fixed the buffer overflow bug in multithreaded hbmv and sbmv.
Instead of sharing the buffer of thread 0, each thread now uses its own sb buffer.
This avoids overflowing the buffer of thread 0.
12 years ago
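The per-thread buffer idea behind 5155e3f509 (and the preallocation in d744c9590a) might be sketched as follows. This is only an illustration under stated assumptions: `MAX_THREADS`, `BUF_DOUBLES`, `buffers_init`, `buffer_for`, and `buffers_free` are hypothetical names, not OpenBLAS internals, and the actual sb buffer management in the library is more involved.

```c
#include <stdlib.h>

#define MAX_THREADS 4      /* hypothetical thread limit, for illustration only */
#define BUF_DOUBLES 1024   /* hypothetical per-thread capacity */

/* One scratch buffer per thread, allocated once and reused across calls. */
static double *thread_buf[MAX_THREADS];

/* Allocate every per-thread buffer up front; returns 0 on success, -1 on failure. */
static int buffers_init(void)
{
    for (int i = 0; i < MAX_THREADS; i++) {
        thread_buf[i] = malloc(BUF_DOUBLES * sizeof(double));
        if (thread_buf[i] == NULL) {
            /* Roll back the buffers allocated so far. */
            while (i-- > 0) {
                free(thread_buf[i]);
                thread_buf[i] = NULL;
            }
            return -1;
        }
    }
    return 0;
}

/* Each thread asks for its own buffer instead of sharing the buffer of thread 0. */
static double *buffer_for(int tid)
{
    return (tid >= 0 && tid < MAX_THREADS) ? thread_buf[tid] : NULL;
}

/* Release all buffers at shutdown. */
static void buffers_free(void)
{
    for (int i = 0; i < MAX_THREADS; i++) {
        free(thread_buf[i]);
        thread_buf[i] = NULL;
    }
}
```

Because every thread writes only into the buffer returned for its own index, one thread's partial results cannot run past the end of another thread's storage.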
Zhang Xianyi
5c8bf6ae0e
Merge branch 'bulldozer' into develop
12 years ago
Zhang Xianyi
6ae2f868fd
Set the affinity. Only use 1 core of each module on bulldozer.
12 years ago
Zhang Xianyi
a1ead62f28
Disable the warning of sgemm bulldozer kernel.
12 years ago
Zhang Xianyi
0133580148
Used sgemm bulldozer kernel on 64 bit.
12 years ago
Zhang Xianyi
274246651d
Merge branch 'bulldozer' of git://github.com/wernsaar/OpenBLAS into bulldozer
12 years ago
Zhang Xianyi
299b5a44dc
Merge branch 'develop' of github.com:xianyi/OpenBLAS into bulldozer
12 years ago
Zaheer Chothia
a9500d0079
Missing line continuation -- follow-up to last commit (64ad8b9809).
12 years ago
Zaheer Chothia
64ad8b9809
Refs #193 . Don't use C99 complex numbers when building C++ code.
12 years ago
Zaheer Chothia
875d520ccf
Refs #193 . cblas: move #include out of extern "C" block.
Standard headers may contain C++ templates which are not permitted inside an
extern "C" block. This might be the case when we include <complex.h>.
12 years ago
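The header layout described in 875d520ccf can be illustrated with a minimal sketch. `demo_sum` is a hypothetical function used only for this example; it is not part of cblas.h. The point is the ordering: standard headers come first, and only the library's own declarations sit inside the `extern "C"` block.

```c
/* Standard headers go BEFORE the extern "C" block: in C++ they may
   contain templates, which are not permitted inside extern "C". */
#include <stddef.h>

#ifdef __cplusplus
extern "C" {
#endif

/* Only the library's own C declarations belong inside the block.
   demo_sum is a hypothetical example declaration. */
double demo_sum(const double *x, size_t n);

#ifdef __cplusplus
}
#endif

/* Definition of the hypothetical function: sums n doubles. */
double demo_sum(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        s += x[i];
    }
    return s;
}
```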
Zhang Xianyi
d311236dfd
Refs #189 . Fixed the s/cdot bug involving an invalid read of NaN on x86_64.
12 years ago
Zhang Xianyi
36e0982966
Refs #187 . Use perl to generate cblas_noconst.h instead of sed.
Thanks to Dan Povey for the patch. https://github.com/xianyi/OpenBLAS/issues/187
12 years ago
Zhang Xianyi
8cdb795438
Refs #187 . Use binary code for xgetbv, which is compatible with older compilers.
12 years ago
Zaheer Chothia
4db6660de4
Refs #185 . Add missing 'const' to declarations in <cblas.h>. Thanks to Dan Povey!
The 'const' modifications were done automatically using these scripts:
https://kaldi.svn.sourceforge.net/svnroot/kaldi/sandbox/dan/tools/for_openblas
12 years ago
Zhang Xianyi
0b08f7479e
Refs #154 . Fixed the gemv_t bug that overflowed the 16MB buffer on x86.
12 years ago
Zaheer Chothia
200e4acf15
cblas: typedef enums for improved compatibility with Intel MKL.
Netlib style:
    enum CBLAS_XYZ {X=1, Y=2, Z=3};
Intel MKL style:
    typedef enum {X=1, Y=2, Z=3} CBLAS_XYZ;
With this hybrid style, code written in the latter form won't need any
modifications to be built with OpenBLAS. This change should not affect existing
code, although a warning may be emitted for C code which does the following
(does not occur with C++):
    typedef enum CBLAS_XYZ CBLAS_XYZ;
    warning: redefinition of typedef 'CBLAS_XYZ' [-pedantic]
13 years ago
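A minimal sketch of the hybrid declaration described in 200e4acf15, reusing the placeholder names `CBLAS_XYZ`, `X`, `Y`, and `Z` from the commit message. The exact form used in cblas.h is assumed here, not quoted, and the two helper functions are hypothetical, added only to show that both spellings of the type are accepted.

```c
/* Hybrid style (assumed form): the enum keeps its Netlib-style tag
   and also gets an MKL-style typedef of the same name. */
typedef enum CBLAS_XYZ {X = 1, Y = 2, Z = 3} CBLAS_XYZ;

/* Netlib-style caller: refers to the type through the enum tag. */
static int netlib_style(enum CBLAS_XYZ v)
{
    return (int)v;
}

/* MKL-style caller: refers to the type through the typedef name. */
static int mkl_style(CBLAS_XYZ v)
{
    return (int)v;
}
```

Both functions take the same underlying type, so code written against either convention can be compiled against one header.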
Zhang Xianyi
99d1978df7
Fixed #180 . Corrected the typos in kernel/x86_64/sgemv_t.S
12 years ago
Zhang Xianyi
08bf6674d5
Refs #177 . Fixed sgemv_t compiling bug on Win64.
12 years ago
Zhang Xianyi
8b122ff9dc
Refs #176 . Fixed the bug where make.inc overrode RANLIB when cross-compiling LAPACK.
12 years ago
Zhang Xianyi
69200884e1
Refs #173 . Fixed the internal buffer overflow bug of gemv_n on x86.
12 years ago
Zhang Xianyi
0d1518add9
Refs #173 . Fixed the internal buffer overflow bug of sgemv_t on x86.
12 years ago
Zhang Xianyi
91ed4e4450
Refs #171 . Prevent loading stale values from the buffer in the sgemv_t x86 kernel.
12 years ago
Zhang Xianyi
fd3046b32a
Refs #173 . Fixed the internal buffer overflow bug of gemv_t on x86.
12 years ago
Zhang Xianyi
a4ee6f3915
Fixed #172 . Support Intel Xeon E7540.
12 years ago
Zhang Xianyi
a0363e9b48
Merge branch 'master' into develop
12 years ago
Zhang Xianyi
b471d52e61
Merge pull request #170 from juliantaylor/athlon-defaults
set parameters for CORE_ATHLON
12 years ago
Julian Taylor
9fb341a9f8
set parameters for CORE_ATHLON
otherwise dgemm_p is set to zero, leading to a segfault in alloc_mmap due to
allocsize being zero
12 years ago
Zhang Xianyi
fba6b590f2
Merge branch 'master' into develop
12 years ago
Zhang Xianyi
97f68f7f3a
Merge pull request #169 from juliantaylor/sanity-check-cpu
add a sanity check on the detected cpu type
12 years ago
Julian Taylor
1138817dd2
add a sanity check on the detected cpu type
if we have 64 bit pointers we can't have a 32 bit cpu, so fall back to
the 64bit cpu fallback (prescott)
E.g. the cpu detection fails in amd qemu64 emulation (family 6 model 2)
causing it to use the uninitialized gotoblas_ATHLON
12 years ago
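The sanity check from 1138817dd2 could look roughly like the sketch below. The `cpu_kind` enum and both helper functions are hypothetical stand-ins, not the identifiers used in OpenBLAS's dynamic dispatch code. Only the rule itself comes from the commit message: a build with 64-bit pointers cannot be running on a 32-bit-only CPU, so such a detection result is replaced by the 64-bit fallback (prescott).

```c
#include <stddef.h>

/* Hypothetical CPU kinds, for illustration only. */
typedef enum { CPU_ATHLON, CPU_PRESCOTT, CPU_CORE2 } cpu_kind;

/* In this sketch, ATHLON stands in for any core type that exists only as 32-bit. */
static int is_32bit_only(cpu_kind k)
{
    return k == CPU_ATHLON;
}

/* If pointers are 8 bytes wide, a 32-bit-only detection result must be
   wrong (e.g. under emulation), so fall back to the 64-bit default. */
static cpu_kind sanitize_cpu(cpu_kind detected, size_t pointer_size)
{
    if (pointer_size == 8 && is_32bit_only(detected)) {
        return CPU_PRESCOTT;
    }
    return detected;
}
```

A caller would pass `sizeof(void *)` as `pointer_size`, so the check reflects how the library itself was built.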
Zhang Xianyi
13f8fc0b1a
Write FMA4 flag to the configure file.
13 years ago
Zhang Xianyi
bdf8d9411e
Refs #163 . Obtain the build configuration at runtime.
The openblas_get_config function returns the configuration string.
So far, it supports USE64BITINT, NO_CBLAS, NO_LAPACK, NO_LAPACKE,
DYNAMIC_ARCH, NO_AFFINITY.
Example:
    #include <stdio.h>
    /* Declared by hand here; link against OpenBLAS to resolve it. */
    extern char *openblas_get_config(void);
    int main(void)
    {
        /* Print the build configuration string. */
        printf("%s\n", openblas_get_config());
        return 0;
    }
13 years ago
Zhang Xianyi
bb10cb8442
Refs #165 . Added a fallback for DTB_DEFAULT_ENTRIES on some virtual machines.
13 years ago
wernsaar
d48cff8cf1
Added optimized sgemm_kernel
13 years ago
Zhang Xianyi
f19af5ecc0
Refs #54 . Added AMD Bulldozer x86_64 dgemm kernel developed by Werner Saar <wernsaar at googlemail.com>
The kernel is based on the dgemm kernel for AMD Barcelona and uses AVX and FMA4 instructions.
Thanks, Werner Saar!
13 years ago
Zhang Xianyi
bfaaa975e6
Added BULLDOZER target. So far it uses barcelona kernels.
13 years ago