wernsaar
e73a0113ec
added optimized gemv kernels
11 years ago
wernsaar
44f2bf9bae
added optimized dgemv_t kernel for haswell
11 years ago
Martin Koehler
a057e5434d
add CBLAS interface for s/d/c/zimatcopy
11 years ago
wernsaar
cd34e9701b
removed obsolete files
11 years ago
Martin Köhler
7794766d3c
Add cblas_(s/d/c/z)omatcopy in order to have cblas interface for them.
11 years ago
wernsaar
658939faaa
optimized dgemv_n kernel for small sizes
11 years ago
wernsaar
f511807fc0
modified multithreading threshold
11 years ago
wernsaar
c4d9d4e5f8
added haswell optimized kernel
11 years ago
wernsaar
7c0a94ff47
bugfix in sgemv_n_microk_haswell-4.c
11 years ago
wernsaar
cbbc80aad3
added optimized sgemv_t kernel for haswell
11 years ago
wernsaar
2be5c7a640
bugfix for windows
11 years ago
wernsaar
80f7786875
enabled optimized sgemv kernels for piledriver
11 years ago
wernsaar
553e275407
optimized sgemv_n kernel for sandybridge
11 years ago
wernsaar
7b3932b3f3
optimized sgemv_n kernel for nehalem
11 years ago
wernsaar
75207b1148
optimized sgemv_n for very small size of m
11 years ago
wernsaar
274828fa50
optimizations for very small sizes
11 years ago
wernsaar
5ae1731fe6
better optimizations for sgemv_t kernel
11 years ago
wernsaar
c8eaf3ae2d
optimized sgemv_t_4 kernel for very small sizes
11 years ago
wernsaar
3a7ab47ee9
optimized sgemv_t
11 years ago
wernsaar
cf5544b417
optimization for small size
11 years ago
wernsaar
d143f84dd2
added optimized sgemv_n kernel for haswell
11 years ago
wernsaar
7794237475
undef WHEREAMI
11 years ago
wernsaar
a64fe9bcc9
added optimized sgemv_n kernel for sandybridge
11 years ago
wernsaar
2021d0f9d6
experimentally removed expensive function calls
11 years ago
wernsaar
6df7a88930
optimized sgemv_t for sandybridge
11 years ago
wernsaar
53de943690
bugfix for sgemv_n_4.c
11 years ago
wernsaar
7f910010a0
optimized sgemv_n kernel for small sizes
11 years ago
wernsaar
3a5d8dbff9
optimized sgemv_n_4.c
11 years ago
wernsaar
2a60c6d4b0
optimized sgemv_n for small sizes
11 years ago
wernsaar
0fc560ba23
bugfix for buffer overflow
11 years ago
wernsaar
d1800397f5
optimized interface/gemv.c for multithreading
11 years ago
wernsaar
f4ff889491
updated interface/gemv.c for multithreading
11 years ago
wernsaar
210bec9111
added plot-header to compare multithreading
11 years ago
wernsaar
f3b50dcf5b
removed obsolete instructions from sgemv_t_4.c
11 years ago
wernsaar
93eaba959d
optimized sgemv_t for bulldozer
11 years ago
wernsaar
9570e56965
optimized sgemv_t_4.c for small sizes
11 years ago
wernsaar
d7f91f8b4f
extended gemv.c benchmark
11 years ago
wernsaar
53f1277b6b
modified benchmark/gemv.c
11 years ago
wernsaar
bc99faef1b
optimized sgemv_t_4.c for uneven sizes
11 years ago
wernsaar
848c0f16f7
optimized sgemv_t_4.c for small size
11 years ago
wernsaar
e2fc8c8c2c
changed 1 test value (bug in lapack-testing?)
11 years ago
wernsaar
53e6dbf6ca
optimized sgemv_t kernel for small sizes
11 years ago
Zhang Xianyi
868f8a8756
Merge pull request #443 from idunham/fix
Workaround PIC limitations in cpuid.
11 years ago
Isaac Dunham
db7e6366cd
Workaround PIC limitations in cpuid.
cpuid uses register ebx, but ebx is reserved in PIC.
So save ebx, swap ebx & edi, and return edi.
Copied from Igor Pavlov's equivalent fix for 7zip (in CpuArch.c),
which is public domain and thus OK license-wise.
11 years ago
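The workaround described in the commit message above can be sketched roughly as follows. This is a minimal illustration and not the code from the commit: the function name `sketch_cpuid` and its return convention are hypothetical, and it assumes GCC or Clang extended inline assembly. On 32-bit PIC builds it parks ebx in edi, runs cpuid, then swaps the two so ebx is restored and edi carries the result. On other x86 builds it names ebx directly, and on non-x86 targets it simply reports that cpuid is unavailable.

```c
/* Hypothetical sketch of an ebx-preserving cpuid wrapper.
   Not the actual code from the commit; names are illustrative only. */

/* Returns 1 and fills the four outputs on x86 targets, 0 elsewhere. */
static int sketch_cpuid(unsigned int leaf,
                        unsigned int *eax, unsigned int *ebx,
                        unsigned int *ecx, unsigned int *edx)
{
#if defined(__i386__) && defined(__PIC__)
    /* 32-bit PIC: ebx is reserved (it holds the GOT pointer), so it is
       not named as an output. Save ebx in edi, run cpuid, then swap:
       ebx gets its old value back and edi holds cpuid's ebx result. */
    __asm__ __volatile__(
        "movl %%ebx, %%edi\n\t"
        "cpuid\n\t"
        "xchgl %%ebx, %%edi\n\t"
        : "=a"(*eax), "=D"(*ebx), "=c"(*ecx), "=d"(*edx)
        : "0"(leaf), "2"(0u)   /* leaf in eax, subleaf 0 in ecx */
        : "cc");
    return 1;
#elif defined(__i386__) || defined(__x86_64__)
    /* Non-PIC 32-bit or any 64-bit build: ebx can be used directly. */
    __asm__ __volatile__(
        "cpuid"
        : "=a"(*eax), "=b"(*ebx), "=c"(*ecx), "=d"(*edx)
        : "0"(leaf), "2"(0u)
        : "cc");
    return 1;
#else
    /* Not an x86 target: no cpuid instruction, report zeros. */
    (void)leaf;
    *eax = *ebx = *ecx = *edx = 0;
    return 0;
#endif
}
```

Calling `sketch_cpuid(0, &a, &b, &c, &d)` on an x86 machine should leave the highest supported leaf in `a` and the vendor string bytes in `b`, `d`, `c`.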
Zhang Xianyi
2702323f7d
Merge pull request #440 from wernsaar/develop
optimizations for level1 and level2 BLAS functions
11 years ago
wernsaar
20cd850125
modification for clang compiler
11 years ago
wernsaar
5fa6158731
removed flag -no-integrated-as because it does not work on macosx
11 years ago
wernsaar
84badf8086
EXPERIMENTAL: added the flag -no-integrated-as for clang compiler in Makefile.system
11 years ago
Zhang Xianyi
c8cc4a0d22
Fixed the typo in Changelog.txt
11 years ago
wernsaar
3885eebdb8
added optimized zaxpy bulldozer kernel
11 years ago