wernsaar
7aae4a62e7
enabled use of GEMM3M functions
11 years ago
wernsaar
1d33547222
optimized zgemm kernel for haswell
11 years ago
wernsaar
3ea4dadd30
optimizations for trsm
11 years ago
wernsaar
1b10ff129a
optimizations for trmm
11 years ago
wernsaar
125610d23b
allow to set custom value for ?GEMM_DEFAULT_UNROLL_MN, optimizations for syrk
11 years ago
wernsaar
be94db096c
disabled *3M functions for x86_64 platforms
11 years ago
Timothy Gu
6c2ead30f0
Remove all trailing whitespace except lapack-netlib
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
11 years ago
wernsaar
c947ab85dc
changed level3.c
12 years ago
wernsaar
2840d56aeb
added dgemm_kernel for Piledriver
12 years ago
Zhang Xianyi
77b572fa0b
Merge branch 'loongson3a' into develop
Conflicts:
Makefile.system
12 years ago
Zhang Xianyi
32d2ca3035
Refs #214, #221, #246. Fixed the getrf overflow bug on Windows.
I used a smaller threshold since the stack size is 1 MB on Windows.
12 years ago
wernsaar
6f008abcef
replaced defined(DOUBLE) by !defined(XDOUBLE)
12 years ago
Zhang Xianyi
5d3312142a
Refs #221, #246. Fixed the stack overflow bug in multithreaded BLAS3,
which occurs when NUM_THREADS (MAX_CPU_NUMBER) is very large, e.g. 256.
typedef struct {
  volatile BLASLONG working[MAX_CPU_NUMBER][CACHE_LINE_SIZE * DIVIDE_RATE];
} job_t;
job_t job[MAX_CPU_NUMBER];
In that case the job array is 8 MB in size.
Thus, we use malloc instead of stack allocation.
12 years ago
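The stack overflow fix described in commit 5d3312142a can be illustrated with a minimal sketch. It is not the code from that commit: the values of CACHE_LINE_SIZE and DIVIDE_RATE, the 64-bit BLASLONG typedef, and the helpers job_array_bytes, alloc_jobs, and free_jobs are assumptions introduced here, chosen so that the array comes out at the 8 MB quoted in the message.

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumed values for illustration only; with these the array is 8 MB. */
#define MAX_CPU_NUMBER  256
#define CACHE_LINE_SIZE 8
#define DIVIDE_RATE     2

typedef long long BLASLONG; /* assumption: 64-bit integer type */

typedef struct {
  volatile BLASLONG working[MAX_CPU_NUMBER][CACHE_LINE_SIZE * DIVIDE_RATE];
} job_t;

/* Hypothetical helper: total size of one job_t per possible thread. */
static size_t job_array_bytes(void) {
  return sizeof(job_t) * (size_t)MAX_CPU_NUMBER;
}

/* Hypothetical helper: allocate the job array on the heap rather than
 * declaring `job_t job[MAX_CPU_NUMBER];` as a local variable, which would
 * exceed a typical 1 MB to 8 MB thread stack.
 * calloc zero-fills the array; returns NULL on failure. */
static job_t *alloc_jobs(void) {
  return (job_t *)calloc((size_t)MAX_CPU_NUMBER, sizeof(job_t));
}

/* Hypothetical helper: release the array obtained from alloc_jobs. */
static void free_jobs(job_t *job) {
  free(job);
}
```

A caller would check the result of alloc_jobs for NULL, use the array exactly as it would have used the stack array, and call free_jobs on every exit path. The trade-off is one heap allocation per call in exchange for a stack footprint that no longer grows with MAX_CPU_NUMBER squared.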
wernsaar
25491e42f9
New dgemm kernel for BULLDOZER: dgemm_kernel_8x2_bulldozer.S
12 years ago
Xianyi Zhang
6b01d58712
Disable the optimization of multi-threading gemm on the Loongson3A.
12 years ago
Wang Qian
8163ab7e55
Change the block size on Loongson 3B.
14 years ago
traz
9fe3049de6
Adding conditional compilation (#if defined(LOONGSON3A)) to avoid affecting the performance of other platforms.
14 years ago
traz
831858b883
Modify the aligned addresses of sa and sb to improve multi-threaded performance.
14 years ago
Xianyi Zhang
1b97ec1a7c
Added DEBUG option in Makefile.rule. Fixed DEBUG typos.
14 years ago
Xianyi Zhang
342bbc3871
Import GotoBLAS2 1.13 BSD version codes.
14 years ago