Martin Kroeker
96d2f2c9b2
Merge pull request #1831 from brada4/hemv
disable threading in C/ZSWAP copying from S/DSWAP
7 years ago
Andrew
2992e3886a
disable threading in C/ZSWAP copying from S/DSWAP
7 years ago
Martin Kroeker
e3c262e5cf
Merge pull request #1825 from brada4/hemv
Delay _hemv threading in attempt to address #1820
7 years ago
Andrew
a293bdcd5e
re-arrange new code for readability
7 years ago
Andrew
c7bbf9c987
Attempt to tame _hemv threading #1820
7 years ago
Ashwin Sekhar T K
21f46a1cf2
ARM64: Use THUNDERX2T99 Neon Kernels for ARMV8
Currently the generic ARMV8 target uses C implementations
for many routines. Replace these with the NEON implementations
written for the THUNDERX2T99 target, which are up to 6x faster for
certain routines.
7 years ago
Martin Kroeker
b991570210
Merge pull request #1762 from martin-frbg/issue1710-2
Add explicit casts to silence compiler warnings
7 years ago
Martin Kroeker
f3c262156e
Add an explicit cast to silence a warning
for #1710
7 years ago
Martin Kroeker
30f5a69ab8
Add explicit cast to silence a warning
for #1710
7 years ago
Martin Kroeker
4a553e8678
Merge pull request #1713 from martin-frbg/issue1710
Introduce blasabs macro and use it to switch between abs and labs for INTERFACE64
7 years ago
Martin Kroeker
165f00c159
fabs -> fabsl
7 years ago
Martin Kroeker
933896a1d0
Use blasabs to switch between abs and labs as needed for INTERFACE64
7 years ago
Steven G. Johnson
a4e321400b
fabs -> fabsl
Fixes two calls that were using `fabs` on a `long double` argument rather than `fabsl`, which looks like it is doing an unintentional truncation to `double` precision.
7 years ago
Martin Kroeker
9cf22b7d91
Build cblas_iXamin interfaces
7 years ago
Craig Donner
c2545b0fd6
Fixed a few more unnecessary calls to num_cpu_avail.
I don't have as many benchmarks for these as for gemm, but it should still
make a difference for small matrices.
7 years ago
Craig Donner
66316b9f4c
Improve performance of GEMM for small matrices when SMP is defined.
Always checking num_cpu_avail() regardless of whether threading will actually
be used adds noticeable overhead for small matrices. Most other uses of
num_cpu_avail() do so only if threading will be used, so do the same here.
7 years ago
Martin Kroeker
e8880c1699
Use a single thread for small input size
copies daxpy improvement from #27, see #1560
7 years ago
Martin Kroeker
1d27fa8507
Merge pull request #1539 from martin-frbg/ztrmv-1332
Disable multithreading in ztrmv
7 years ago
Martin Kroeker
a8ed428bab
Disable multithreading in ztrmv
BLAS-Tester shows that the same problem exists as with DTRMV (issue #1332)
7 years ago
Martin Kroeker
809fd0d451
Rewrite ROTMG to address cases not covered by the netlib algorithm (#1480)
* Rewrite ROTMG following the new implementation in GONUM, which is based on the algorithm proposed by Tim Hopkins; see issue 1452 for the reference
* Correct ROTMG utest for issue1452 and add another from gonum, also correct transposition of expected and observed values in error messages
7 years ago
Martin Kroeker
72f14a0363
Fix conditionals in the rescaling against GAMSQ
7 years ago
Martin Kroeker
798f1595d5
Fix condition in both second scaling loops
7 years ago
Martin Kroeker
0464aa6784
Remove debug printfs
7 years ago
Martin Kroeker
55840f0bc9
Keep the flag handling separate from the scaling loops
Fixes #1452 and is more in line with how ATLAS does it. The earlier fix from #356 only moved the bug elsewhere, but we will never want the iterative rescaling to change the dflag setting and variable associations with each cycle.
7 years ago
Andrew
47deec2c1a
fix a couple of dead assignment warnings
7 years ago
Martin Kroeker
38763ec4f3
Disable multithreading for trmv
as a (hopefully temporary) workaround for #1332
7 years ago
Martin Kroeker
9251a2efde
Merge pull request #1359 from brada4/develop
Eliminate mode variable where not needed in syrk interface
7 years ago
Martin Kroeker
b46e2b57cc
Make return parameter of cblas_Xdotc_sub, cblas_Xdotu_sub a void pointer as well
7 years ago
Martin Kroeker
3ce401f51b
Make last parameter of cblas_Xdotc_sub/cblas_Xdotu_sub a void pointer as well
7 years ago
Andrew
27575d200a
Eliminate mode variable where not needed
7 years ago
Martin Kroeker
2c222f1faa
Modify complex CBLAS functions to take void pointers
Modify complex CBLAS functions to take void pointers instead of float or double arguments (to bring the prototypes in line with netlib and other implementations' cblas.h)
8 years ago
Martin Kroeker
742f54c235
Merge pull request #1303 from martin-frbg/imatcopy-rowscols
Fix cols/rows mixup in omatcopy 2nd step for BlasTrans cases
8 years ago
Martin Kroeker
d674fbb4c7
Fix cols/rows mixup in omatcopy 2nd step for BlasTrans cases
Equivalent of #1244 (issue #899) for the non-complex cases. Fixes #1289
8 years ago
Martin Kroeker
46c9357c72
Merge pull request #1288 from quickwritereader/develop
Optimized standard Blas Level-1,2 (excluding nrm2 functions) for z13 (double precision). Issue 884
8 years ago
Abdurrauf
1cfdb2295d
Optimized standard Blas Level-1,2 (excluding nrm2 functions) for z13 (double precision)
8 years ago
Martin Kroeker
00740c0e34
Merge pull request #1290 from martin-frbg/imatcopy
Use in-place transform shortcut only if matrix is square
8 years ago
Martin Kroeker
254db9bd7c
Use in-place transform shortcut only if matrix is square
8 years ago
Isuru Fernando
d245caa49a
Support out-of-source build
8 years ago
Martin Kroeker
376048156b
Use in-place transform shortcut only if matrix is square
8 years ago
Martin Kroeker
d1c5b8f913
Add files via upload
8 years ago
Martin Kroeker
91bde7d315
Exchange rows and cols in final omatcopy with BlasTrans
This is MicMuc's patch from #899
8 years ago
Martin Kroeker
1e06b49854
Update xerbla.c
8 years ago
Martin Kroeker
7f546f54fa
Add cblas_xerbla
8 years ago
Martin Kroeker
a809431e34
Add cblas_xerbla()
8 years ago
Andrew
99880f7906
Address unlikely memleak in zimatcopy interface (#1129)
* fix unlikely memleak in zimatcopy interface
* fix only unlikely memleak in zimatcopy interface
* fix only unlikely memleak in zimatcopy interface
8 years ago
Martin Kroeker
211d2eceb5
Update zdot.c
8 years ago
Martin Kroeker
5813ed095b
Update zdot.c
8 years ago
Martin Kroeker
e44b028fe5
Replace gnu _real_, _imag_ extensions in initializers
8 years ago
Ashwin Sekhar T K
071a830e8b
THUNDERX2T99: Add optimized S/D/C/Z SWAP Implementations
8 years ago
Werner Saar
dd6212e684
Updated some level1 functions that are not thread safe
8 years ago