wernsaar
5400a9f4e4
redefined functions for TIMING and YIELDING for ARMV7 processor
12 years ago
Sébastien Villemot
eae4cfa3f6
Avoid failure on qemu guests declaring an Athlon CPU without 3dnow!
The present patch verifies that, on machines declaring an Athlon CPU model and
family, the 3dnow and 3dnowext feature flags are indeed present. If they are
not, it falls back to the most generic x86 kernel. This prevents crashes due to
illegal instructions on qemu guests with a weird configuration.
Closes #272
12 years ago
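The check described in the commit above can be sketched in C as follows. This is a minimal illustration, not the code from the patch: the names `kernel_t`, `KERNEL_ATHLON`, `KERNEL_GENERIC_X86` and `select_kernel` are hypothetical, and the caller is assumed to have already read EDX from CPUID leaf 0x80000001, where bit 31 reports 3DNow! and bit 30 reports the 3DNow! extensions.

```c
#include <stdint.h>

/* Feature bits in EDX of CPUID leaf 0x80000001 (AMD extended features). */
#define EDX_3DNOWEXT (UINT32_C(1) << 30)
#define EDX_3DNOW    (UINT32_C(1) << 31)

/* Hypothetical kernel identifiers, used only for this illustration. */
typedef enum { KERNEL_GENERIC_X86, KERNEL_ATHLON } kernel_t;

/*
 * declares_athlon: nonzero when vendor, family and model claim an Athlon.
 * ext_edx:         EDX value returned by CPUID leaf 0x80000001.
 *
 * The Athlon kernel is chosen only when both 3DNow! flags are really
 * present; otherwise the most generic x86 kernel is used. Only the
 * Athlon branch of the selection is modeled here.
 */
kernel_t select_kernel(int declares_athlon, uint32_t ext_edx)
{
    const uint32_t needed = EDX_3DNOW | EDX_3DNOWEXT;

    if (declares_athlon && (ext_edx & needed) == needed)
        return KERNEL_ATHLON;
    return KERNEL_GENERIC_X86;
}
```

A qemu guest that reports an Athlon model but exposes neither flag would get `KERNEL_GENERIC_X86` from this function instead of a kernel that executes 3DNow! instructions.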
Zhang Xianyi
673e453b3f
Enable bulldozer kernels.
12 years ago
Zhang Xianyi
143cca4dd5
Merge branch 'develop' into bulldozer
12 years ago
Zhang Xianyi
534c5ec919
Fixed #261 . Use strncmp instead of a comparing trick.
12 years ago
Zhang Xianyi
5b504d6c23
Refs #263 . Rollback bulldozer and piledriver kernels to barcelona kernels.
12 years ago
Zhang Xianyi
72b1edaf1b
Merge branch 'develop' into bulldozer
Conflicts:
kernel/x86_64/KERNEL.BULLDOZER
12 years ago
Zhang Xianyi
4471c77905
Fixed #261 . Use strncmp instead of a comparing trick.
12 years ago
Zhang Xianyi
77b572fa0b
Merge branch 'loongson3a' into develop
Conflicts:
Makefile.system
12 years ago
Zhang Xianyi
2a7503e563
Refs #225 . Fixed a bug in GEMM OpenMP threading.
12 years ago
grisuthedragon
c19a488af2
create openblas_get_parallel to retrieve information about which
parallelization model is used by OpenBLAS.
12 years ago
Zhang Xianyi
32d2ca3035
Refs #214 , #221 , #246 . Fixed the getrf overflow bug on Windows.
I used a smaller threshold since the stack size is 1MB on Windows.
12 years ago
wernsaar
6f008abcef
replaced defined(DOUBLE) by !defined(XDOUBLE)
12 years ago
Zhang Xianyi
f54f5bac9e
Refs #248 . Fixed the LSB compatibility issue for BLAS only.
For example, make CC=lsbcc NO_LAPACK=1.
12 years ago
Zhang Xianyi
5d3312142a
Refs #221 #246 . Fixed the stack overflow bug in multithreading BLAS3.
When NUM_THREADS (MAX_CPU_NUMBER) is very large, e.g. 256:
typedef struct {
    volatile BLASLONG working[MAX_CPU_NUMBER][CACHE_LINE_SIZE * DIVIDE_RATE];
} job_t;
job_t job[MAX_CPU_NUMBER];
The job array occupies 8MB.
Thus, we use malloc instead of stack allocation.
12 years ago
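The size arithmetic behind the commit above, and the move from stack to heap, can be sketched as follows. The constant values are assumptions picked so that the total matches the 8MB stated in the message (`CACHE_LINE_SIZE` of 8, `DIVIDE_RATE` of 2, an 8-byte `BLASLONG`); the helper names `job_array_bytes`, `alloc_jobs` and `free_jobs` are hypothetical and not taken from the patch.

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumed values; chosen so the total equals the 8MB quoted above. */
#define MAX_CPU_NUMBER  256
#define CACHE_LINE_SIZE 8
#define DIVIDE_RATE     2
typedef long long BLASLONG;   /* assumed to be 8 bytes wide */

typedef struct {
    volatile BLASLONG working[MAX_CPU_NUMBER][CACHE_LINE_SIZE * DIVIDE_RATE];
} job_t;

/* 256 jobs * (256 * 16 entries * 8 bytes) = 8388608 bytes = 8MB. */
size_t job_array_bytes(void)
{
    return sizeof(job_t) * MAX_CPU_NUMBER;
}

/* Heap allocation replaces `job_t job[MAX_CPU_NUMBER];` on the stack.
 * Returns NULL on failure; the caller must check the result. */
job_t *alloc_jobs(void)
{
    return malloc(sizeof(job_t) * MAX_CPU_NUMBER);
}

/* Release the array once the threaded routine has finished. */
void free_jobs(job_t *job)
{
    free(job);
}
```

With these assumed sizes the array alone would fill a typical 8MB default stack and far exceed the 1MB Windows stack mentioned in a related commit, which is why a heap allocation is the safer choice.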
Zhang Xianyi
886cbaf4e4
Support AMD Piledriver by bulldozer kernels.
12 years ago
Zhang Xianyi
32dbeb636d
Refs #221 . Set stack limit to 16MB to prevent a SEGFAULT bug on Mac OS X with DYNAMIC_ARCH=1 & NUM_THREADS=256.
12 years ago
Dan Luu
88ef307cef
Refs #241 . Add Haswell support (using sandybridge optimizations)
12 years ago
Zhang Xianyi
cd1d473ba0
Merge pull request #230 from wernsaar/develop
Refs #230 . New dgemm and sgemm Kernel for BULLDOZER
12 years ago
wernsaar
25491e42f9
New dgemm kernel for BULLDOZER: dgemm_kernel_8x2_bulldozer.S
12 years ago
Zhang Xianyi
65ffead0cf
Refs #124 . Check XSAVE flag on x86 CPU.
12 years ago
Xianyi Zhang
6b01d58712
Disable the optimization of multi-threading gemm on the Loongson3A.
12 years ago
Zhang Xianyi
f1ce74ffdd
Improved the message printed when the OS doesn't support AVX.
12 years ago
Zhang Xianyi
d744c9590a
In OpenMP threading, preallocate the thread buffer instead of allocating the buffer every time. This patch improved the performance slightly.
12 years ago
Zhang Xianyi
3cc6ae793e
Refs #174 . Return sb pointer when using OpenMP or on Windows.
12 years ago
Zhang Xianyi
5155e3f509
Refs #174 . Fixed the buffer overflow bug of multithreading hbmv and sbmv.
Instead of using the thread 0 buffer, each thread uses its own sb buffer.
Thus, it avoids overflowing the thread 0 buffer.
12 years ago
Zhang Xianyi
5c8bf6ae0e
Merge branch 'bulldozer' into develop
12 years ago
Zhang Xianyi
6ae2f868fd
Set the affinity. Only use 1 core of each module on bulldozer.
12 years ago
Zhang Xianyi
299b5a44dc
Merge branch 'develop' of github.com:xianyi/OpenBLAS into bulldozer
12 years ago
Zhang Xianyi
8cdb795438
Refs #187 . Use binary code for xgetbv, which is compatible with old compilers.
12 years ago
Zhang Xianyi
a4ee6f3915
Fixed #172 . Support Intel Xeon E7540.
12 years ago
Zhang Xianyi
fba6b590f2
Merge branch 'master' into develop
12 years ago
Julian Taylor
1138817dd2
add a sanity check on the detected cpu type
if we have 64-bit pointers we can't have a 32-bit cpu, so use the
64-bit cpu fallback (prescott)
E.g. the cpu detection fails in amd qemu64 emulation (family 6 model 2)
causing it to use the uninitialized gotoblas_ATHLON
12 years ago
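The sanity check from the commit above could look roughly like this. It is a sketch under stated assumptions: the enum `cpu_t`, its members, and the functions `is_32bit_only` and `sanitize_cpu` are hypothetical names for illustration, and the pointer width is passed in as a parameter so the logic can be exercised on any host.

```c
#include <stddef.h>

/* Hypothetical CPU identifiers, used only for this illustration. */
typedef enum { CPU_ATHLON, CPU_PRESCOTT, CPU_OTHER64 } cpu_t;

/* Nonzero for detected types that only exist as 32-bit CPUs. */
static int is_32bit_only(cpu_t cpu)
{
    return cpu == CPU_ATHLON;
}

/*
 * pointer_bytes would normally be sizeof(void *). A process with
 * 8-byte pointers cannot be running on a 32-bit CPU, so a 32-bit
 * detection result must be wrong and is replaced by the 64-bit
 * fallback (prescott).
 */
cpu_t sanitize_cpu(cpu_t detected, size_t pointer_bytes)
{
    if (pointer_bytes == 8 && is_32bit_only(detected))
        return CPU_PRESCOTT;
    return detected;
}
```

In a real build the call would be `sanitize_cpu(detected, sizeof(void *))`, which covers the qemu64 case from the message where family 6 model 2 is misread as an Athlon.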
Zhang Xianyi
bdf8d9411e
Refs #163 . Obtain the build configuration at runtime.
The openblas_get_config function returns the configuration string.
So far, it supports USE64BITINT, NO_CBLAS, NO_LAPACK, NO_LAPACKE,
DYNAMIC_ARCH, NO_AFFINITY.
Example:
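#include <stdio.h>
/* Provided by OpenBLAS; link with -lopenblas. */
extern char *openblas_get_config(void);
int main(void)
{
    /* Print the build options compiled into the library. */
    printf("%s\n", openblas_get_config());
    return 0;
}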
13 years ago
Zhang Xianyi
bfaaa975e6
Added BULLDOZER target. So far it uses barcelona kernels.
13 years ago
Zhang Xianyi
b7c0fa6bd2
Init AMD Bulldozer codebase.
13 years ago
Zhang Xianyi
6751f7b9a7
Fixed #157 . Only detect the number of physical CPU cores on Mac OS X.
13 years ago
Zhang Xianyi
538c764d2b
Refs #153 . Restore the original CPU affinity when calling openblas_set_num_threads(1).
Please read the issue on github.com for details.
13 years ago
Zhang Xianyi
6c5899dff5
Don't use xgetbv instruction when NO_AVX=1
13 years ago
Zhang Xianyi
735ca38b8f
Refs #139 . Check at runtime whether the OS supports AVX.
13 years ago
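A runtime test of this kind is commonly built from two values: ECX of CPUID leaf 1 (bit 27 is OSXSAVE, bit 28 is AVX) and the XCR0 register read with xgetbv (bits 1 and 2 cover SSE and AVX state). The sketch below shows only the decision logic and takes both values as parameters; the function name `os_supports_avx` is hypothetical, and this is not claimed to be the code in the commit above.

```c
#include <stdint.h>

/* Bits in ECX of CPUID leaf 1. */
#define ECX_OSXSAVE (UINT32_C(1) << 27)  /* OS has enabled XSAVE/XGETBV */
#define ECX_AVX     (UINT32_C(1) << 28)  /* CPU implements AVX */

/* Bits in XCR0: the OS saves and restores these register states. */
#define XCR0_SSE (UINT64_C(1) << 1)
#define XCR0_AVX (UINT64_C(1) << 2)

/*
 * Returns 1 when AVX kernels are safe to use, 0 otherwise.
 * xcr0 is only meaningful when OSXSAVE is set, so the CPUID bits
 * are checked first; a caller must not execute xgetbv otherwise.
 */
int os_supports_avx(uint32_t cpuid1_ecx, uint64_t xcr0)
{
    const uint32_t cpu_bits = ECX_OSXSAVE | ECX_AVX;
    const uint64_t os_bits  = XCR0_SSE | XCR0_AVX;

    if ((cpuid1_ecx & cpu_bits) != cpu_bits)
        return 0;
    return (xcr0 & os_bits) == os_bits;
}
```

When this returns 0 on a Sandy Bridge machine, falling back to non-AVX kernels avoids illegal-instruction faults on operating systems that do not preserve the YMM registers.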
Zhang Xianyi
f76a384841
Refs #139 . Added NO_AVX flag to use old Nehalem kernels on Sandy Bridge.
For example, make NO_AVX=1 or make DYNAMIC_ARCH=1 NO_AVX=1
13 years ago
Jameson Nash
d0e731e8b8
provide support for passing CFLAGS, FFLAGS, PFLAGS, FPFLAGS to make on the command line
13 years ago
Zhang Xianyi
fe4ab95cd5
Refs #136 . Fixed a bug in controlling the number of threads on Windows.
13 years ago
Xianyi Zhang
801383effe
Fixed a hang when shutting down the BLAS thread server on Windows. Added support for dynamically changing the number of threads on Windows.
13 years ago
Zhang Xianyi
54cd65e47f
Use sandy bridge kernel when DYNAMIC_ARCH=1.
13 years ago
Zhang Xianyi
a55821a2ec
Refs #132 . Kill the threads when unloading the library.
13 years ago
Zhang Xianyi
d007cca61d
Refs #134 . Fixed the build bug on IBM Power.
13 years ago
Xianyi Zhang
25f1a573fd
Fixed the build bug when DYNAMIC_ARCH=0.
13 years ago
Sylvestre Ledru
3692b4d631
Improve the detection of sparc
13 years ago
Xianyi Zhang
a507b56ab1
Refs #119 #118 . Fixed the bug in disabling hyper-threading.
13 years ago