gxw
48698b2b1d
LoongArch64: Rename core
Use microarchitecture names instead of meaningless strings to name the cores;
the legacy core names are still retained.
1. Rename LOONGSONGENERIC to LA64_GENERIC
2. Rename LOONGSON3R5 to LA464
3. Rename LOONGSON2K1000 to LA264
1 year ago
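The rename keeps the legacy names usable alongside the new ones. The following is a minimal sketch, assuming a simple lookup table. The table and the function `canonical_core_name` are hypothetical illustrations, not OpenBLAS's actual build logic. It shows how legacy target names could be aliased to the microarchitecture names listed in the commit:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical alias table: legacy LoongArch64 core name -> new name (pairs from this commit). */
static const char *const core_aliases[][2] = {
    {"LOONGSONGENERIC", "LA64_GENERIC"},
    {"LOONGSON3R5",     "LA464"},
    {"LOONGSON2K1000",  "LA264"},
};

/* Returns the new core name for either a legacy or a new name, or NULL if unknown. */
static const char *canonical_core_name(const char *name)
{
    size_t count = sizeof core_aliases / sizeof core_aliases[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(name, core_aliases[i][0]) == 0 || strcmp(name, core_aliases[i][1]) == 0)
            return core_aliases[i][1];
    }
    return NULL;
}
```

Accepting both spellings is what lets existing build scripts keep working after the rename.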
Martin Kroeker
7878976236
disable forwarding from SBGEMM to SBGEMV for now
1 year ago
Chris Sidebottom
b26424c6a2
Allow opt into GEMM -> GEMV forwarding
1 year ago
Chris Sidebottom
90eb863d4b
Re-add accidental removal
1 year ago
Chris Sidebottom
28b5334f22
Complete implementation of GEMV forwarding
1 year ago
Martin Kroeker
3db5dbc88e
forward to GEMV when one argument is actually a vector
1 year ago
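The forwarding commits rest on one observation. When N == 1, the GEMM operand B is a single column, so C = alpha*A*B + beta*C is really a matrix-vector product. When M == 1, A is a single row and the result can be computed as a transposed GEMV over B. The sketch below is minimal and assumes column-major storage, no transposes on the GEMM inputs, and double precision. All function names are hypothetical. It does not model the opt-in switch or the disabled SBGEMM path mentioned in the commits above.

```c
#include <stddef.h>

/* Naive column-major y = alpha*A*x + beta*y, where A is m x k. */
static void gemv_n(size_t m, size_t k, double alpha, const double *A, size_t lda,
                   const double *x, size_t incx, double beta, double *y, size_t incy)
{
    for (size_t i = 0; i < m; i++) {
        double sum = 0.0;
        for (size_t p = 0; p < k; p++)
            sum += A[i + p * lda] * x[p * incx];
        y[i * incy] = alpha * sum + beta * y[i * incy];
    }
}

/* Naive column-major y = alpha*A^T*x + beta*y, where A is k x n. */
static void gemv_t(size_t k, size_t n, double alpha, const double *A, size_t lda,
                   const double *x, size_t incx, double beta, double *y, size_t incy)
{
    for (size_t j = 0; j < n; j++) {
        double sum = 0.0;
        for (size_t p = 0; p < k; p++)
            sum += A[p + j * lda] * x[p * incx];
        y[j * incy] = alpha * sum + beta * y[j * incy];
    }
}

/* Naive column-major C = alpha*A*B + beta*C, A is m x k, B is k x n. */
static void gemm_nn(size_t m, size_t n, size_t k, double alpha, const double *A, size_t lda,
                    const double *B, size_t ldb, double beta, double *C, size_t ldc)
{
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < m; i++) {
            double sum = 0.0;
            for (size_t p = 0; p < k; p++)
                sum += A[i + p * lda] * B[p + j * ldb];
            C[i + j * ldc] = alpha * sum + beta * C[i + j * ldc];
        }
}

/* Hypothetical GEMM entry point that forwards vector-shaped problems to GEMV. */
static void gemm_forwarding(size_t m, size_t n, size_t k, double alpha, const double *A, size_t lda,
                            const double *B, size_t ldb, double beta, double *C, size_t ldc)
{
    if (n == 1) {
        /* B is one column: C(:,0) = alpha*A*B(:,0) + beta*C(:,0). */
        gemv_n(m, k, alpha, A, lda, B, 1, beta, C, 1);
    } else if (m == 1) {
        /* A is one row (element stride lda); the row C(0,:) has stride ldc. */
        gemv_t(k, n, alpha, B, ldb, A, lda, beta, C, ldc);
    } else {
        gemm_nn(m, n, k, alpha, A, lda, B, ldb, beta, C, ldc);
    }
}
```

The row case reuses the same GEMV kernel by treating A's row as a strided vector, so no data has to be copied or transposed.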
gxw
f3cebb3ca3
x86: Fixed numpy CI failure when the target is ZEN.
1 year ago
Martin Kroeker
2f12a47405
fix build options for CAXPYC/ZAXPYC
1 year ago
Martin Kroeker
db9f7bc552
fix float array types to include bfloat16
1 year ago
Martin Kroeker
076766df4e
Update CMakeLists.txt
1 year ago
Martin Kroeker
ff6670cb83
don't generate non-cblas files for gemm_batch
1 year ago
Martin Kroeker
362a063396
remove return value
1 year ago
Martin Kroeker
89c7bbcba6
add cblas_?gemm_batch
1 year ago
Martin Kroeker
2957281275
Introduce a lower limit for multithreading
1 year ago
Martin Kroeker
5fd871d7ea
Introduce a lower limit for multithreading
1 year ago
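A lower limit for multithreading means that small problems stay on one thread, because thread startup and synchronization would cost more than the computation. The following is a minimal sketch assuming the workload is measured as m*n*k multiply-adds. The cutoff value and the function name are hypothetical, not OpenBLAS's actual constants. Raising the threshold, as a later commit in this log does by a factor of 50, would correspond to raising `HYPOTHETICAL_MT_THRESHOLD`.

```c
#include <stddef.h>

/* Hypothetical cutoff: problems with fewer multiply-adds than this run on one thread. */
#define HYPOTHETICAL_MT_THRESHOLD 10000.0

/* Returns the number of threads to use for an m x n x k GEMM-like problem. */
static int threads_with_lower_limit(size_t m, size_t n, size_t k, int max_threads)
{
    /* Compute in double so large dimensions cannot overflow size_t. */
    double work = (double)m * (double)n * (double)k;
    if (max_threads < 1 || work < HYPOTHETICAL_MT_THRESHOLD)
        return 1;
    return max_threads;
}
```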
gxw
637c650f4f
loongarch64: Add buffer offset for target LOONGSON3R5
1 year ago
Martin Kroeker
93d975d8fd
Merge pull request #4593 from XiWeiGu/loongarch_add_buffer_offset
loongarch: Optimizing the performance of the GEMM on servers
1 year ago
gxw
d8c4ea8793
loongarch: Optimizing the performance of the GEMM on servers
1 year ago
Martin Kroeker
d277c6d15b
Merge pull request #4585 from martin-frbg/issue1881
Cap the number of parallel threads for GEMM, GETRF and POTRF to ensure sensible workloads on big systems
1 year ago
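Capping the thread count differs from a single on/off threshold. It scales the number of threads with the problem size so that each thread still gets a meaningful share of work on machines with many cores. The following is a minimal sketch assuming a fixed minimum amount of work per thread. The constant and function name are hypothetical, and the sketch does not reproduce the per-routine heuristics used for GEMM, GETRF or POTRF.

```c
#include <stddef.h>

/* Hypothetical minimum number of multiply-adds each thread should receive. */
#define HYPOTHETICAL_WORK_PER_THREAD 65536.0

/* Caps the thread count so every thread gets at least a minimum share of work. */
static int capped_threads(size_t m, size_t n, size_t k, int max_threads)
{
    double work = (double)m * (double)n * (double)k;
    double useful = work / HYPOTHETICAL_WORK_PER_THREAD;
    if (max_threads < 1 || useful < 1.0)
        return 1;
    if (useful < (double)max_threads)
        return (int)useful; /* truncates toward zero */
    return max_threads;
}
```

For a 100 x 100 x 100 problem this yields 15 threads even when 64 are available, instead of spreading the work too thinly.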
Igor Zhuravlov
22d305e2df
fix dtrtrs_ and ztrtrs_ to accept case-insensitive parameters uplo and diag
Changes to be committed:
modified: interface/lapack/trtrs.c
modified: interface/lapack/ztrtrs.c
1 year ago
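LAPACK character arguments such as UPLO and DIAG are conventionally case-insensitive, so 'u' and 'U' must mean the same thing. The following is a minimal sketch of that normalization. The helper `parse_uplo_diag` is hypothetical and is not the code in trtrs.c or ztrtrs.c. It follows the LAPACK convention of reporting the negated position of the first invalid argument (UPLO is argument 1 and DIAG is argument 3 of ?TRTRS). The TRANS check between them is omitted.

```c
#include <ctype.h>

/* Hypothetical parser for TRTRS-style UPLO/DIAG arguments.
   Returns 0 on success, -1 for a bad UPLO, or -3 for a bad DIAG. */
static int parse_uplo_diag(char uplo, char diag, int *upper, int *unit)
{
    /* Fold to upper case so 'u'/'U', 'l'/'L', 'n'/'N' are treated alike. */
    int u = toupper((unsigned char)uplo);
    int d = toupper((unsigned char)diag);

    if (u != 'U' && u != 'L')
        return -1;
    if (d != 'U' && d != 'N')
        return -3;
    *upper = (u == 'U');
    *unit = (d == 'U');
    return 0;
}
```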
Martin Kroeker
68ab5185d0
Update potrf.c
1 year ago
Martin Kroeker
19b29b3448
Update getrf.c
1 year ago
Martin Kroeker
a3354a7630
Cap the number of parallel threads
1 year ago
Martin Kroeker
5da4c93ef2
Cap the number of parallel threads
1 year ago
Martin Kroeker
496106642f
Cap the number of parallel threads
1 year ago
Martin Kroeker
cb8131cfd9
Merge pull request #4499 from kseniyazaytseva/new-tests
Tests for BLAS-like and BLAS API
1 year ago
Martin Kroeker
baf88564bc
Fix potential buffer overflow
1 year ago
kseniyazaytseva
7e9b1c0807
fix uninitialized data usage
1 year ago
kseniyazaytseva
c6f30fd414
check for zero inc
1 year ago
kseniyazaytseva
5e9ead09ac
fix info return
1 year ago
Martin Kroeker
500ac4de5e
fix incompatible pointer types
1 year ago
Martin Kroeker
d4db6a9f16
Separate the interface for SBGEMMT from GEMMT due to differences in GEMV arguments
1 year ago
Martin Kroeker
68d354814f
Fix incompatible pointer type in BFLOAT16 mode
1 year ago
Sergei Lewis
3ffd6868d7
Merge branch 'develop' into dev/slewis/merge-from-riscv
1 year ago
Martin Kroeker
47bd064763
Fix names in build rules
1 year ago
Martin Kroeker
a7d004e820
Fix CBLAS prototype
1 year ago
Martin Kroeker
b54cda8490
Unify creation of CBLAS interfaces for ?AMIN/?AMAX and C/ZAXPYC between gmake and cmake builds
1 year ago
Sergei Lewis
1093def0d1
Merge branch 'risc-v' into develop
1 year ago
kseniyazaytseva
f89e0034a4
Fix LAPACK usage from BLAS
1 year ago
Martin Kroeker
f7cf637d7a
redo lost edit
2 years ago
Martin Kroeker
85548e66ca
Fix build failures seen with the NO_LAPACK option - cspr/csymv/csyr belong on the LAPACK list
2 years ago
Martin Kroeker
f129161453
restore C/Z SPMV, SPR, SYR, SYMV
2 years ago
Martin Kroeker
5b4df851d7
fix stray blank on continuation line
2 years ago
kseniyazaytseva
ff41cf5c49
Fix BLAS, BLAS-like functions and Generic RISC-V kernels
* Fixed gemmt, imatcopy, zimatcopy_cnc functions
* Fixed cblas_cscal testing in ctest
* Removed rotmg unreachable code
* Added zero size checks
2 years ago
Martin Kroeker
d2fc4f3b4d
Increase multithreading threshold by a factor of 50
1 year ago
Martin Kroeker
a7ed60bfe9
Add lower limit for multithreading
1 year ago
Angelika Schwarz
5ffbe646e1
Improve matcopy interface
* rows = 0 or cols = 0 is now a legal input and
takes quick return path
* Follow BLAS/LAPACK convention that the leading
dimensions must be at least 1.
1 year ago
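Together, the two matcopy rules make an empty matrix a legal no-op while still rejecting invalid leading dimensions, including 0 when rows is 0. The following is a minimal sketch for a no-transpose, column-major, out-of-place copy B = alpha*A. The function name and its argument order are hypothetical and do not match the OpenBLAS ?omatcopy signature. The sketch also assumes the usual column-major requirement ld >= max(1, rows).

```c
#include <stddef.h>

/* Hypothetical no-transpose column-major copy B = alpha*A.
   Returns 0 on success or -(argument position) of the first invalid argument:
   rows(1), cols(2), alpha(3), a(4), lda(5), b(6), ldb(7). */
static int omatcopy_n_sketch(long rows, long cols, double alpha,
                             const double *a, long lda, double *b, long ldb)
{
    /* Assumed convention: leading dimensions must be >= max(1, rows). */
    long min_ld = rows > 1 ? rows : 1;

    if (rows < 0) return -1;
    if (cols < 0) return -2;
    if (lda < min_ld) return -5;
    if (ldb < min_ld) return -7;

    /* Empty matrix: legal input, quick return without touching a or b. */
    if (rows == 0 || cols == 0) return 0;

    for (long j = 0; j < cols; j++)
        for (long i = 0; i < rows; i++)
            b[i + j * ldb] = alpha * a[i + j * lda];
    return 0;
}
```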
Martin Kroeker
cd8eb83bae
Fix allocations and compiler warnings in ZROTG (#4289)
* Clean up ZROTG
1 year ago
Martin Kroeker
4a0f86397b
Merge pull request #4235 from angsch/develop
Fix division by zero in [z]rotg
2 years ago
Martin Kroeker
13ba4edf43
fix function prototypes (empty parentheses)
2 years ago