Vaisakh K V
d23eb3b93e
Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API
* Added ARMV9SME target
* Added SGEMM_DIRECT kernel based on SME1
10 months ago
Martin Kroeker
0cf656fd3e
Add copies of GEMMT under its new name GEMMTR
11 months ago
Chris Daley
cb48505251
optimize gemv forwarding on ARM64 systems
11 months ago
Chip Kerchner
36bd3eeddf
Vectorize BF16 GEMV (VSX & MMA). Use GEMM_GEMV_FORWARD_BF16 (for Power).
11 months ago
Chip Kerchner
1d51ca5798
Change multi-threading logic for SBGEMV to be the same as SGEMV.
11 months ago
Martin Kroeker
9762464718
Fix CBLAS interface filling in the wrong triangle for Row-Major
11 months ago
gxw
48698b2b1d
LoongArch64: Rename core
Use the microarchitecture name instead of meaningless strings to name the core;
the legacy core is still retained.
1. Rename LOONGSONGENERIC to LA64_GENERIC
2. Rename LOONGSON3R5 to LA464
3. Rename LOONGSON2K1000 to LA264
1 year ago
Martin Kroeker
7878976236
disable forwarding from SBGEMM to SBGEMV for now
1 year ago
Chris Sidebottom
b26424c6a2
Allow opting into GEMM -> GEMV forwarding
1 year ago
Chris Sidebottom
90eb863d4b
Re-add accidental removal
1 year ago
Chris Sidebottom
28b5334f22
Complete implementation of GEMV forwarding
1 year ago
Martin Kroeker
3db5dbc88e
forward to GEMV when one argument is actually a vector
1 year ago
gxw
f3cebb3ca3
x86: Fixed numpy CI failure when the target is ZEN.
1 year ago
Martin Kroeker
2f12a47405
fix build options for CAXPYC/ZAXPYC
1 year ago
Martin Kroeker
db9f7bc552
fix float array types to include bfloat16
1 year ago
Martin Kroeker
076766df4e
Update CMakeLists.txt
1 year ago
Martin Kroeker
ff6670cb83
don't generate non-cblas files for gemm_batch
1 year ago
Martin Kroeker
362a063396
remove return value
1 year ago
Martin Kroeker
89c7bbcba6
add cblas_?gemm_batch
1 year ago
Martin Kroeker
2957281275
Introduce a lower limit for multithreading
1 year ago
Martin Kroeker
5fd871d7ea
Introduce a lower limit for multithreading
1 year ago
gxw
637c650f4f
loongarch64: Add buffer offset for target LOONGSON3R5
1 year ago
Martin Kroeker
93d975d8fd
Merge pull request #4593 from XiWeiGu/loongarch_add_buffer_offset
loongarch: Optimizing the performance of the GEMM on servers
1 year ago
gxw
d8c4ea8793
loongarch: Optimizing the performance of the GEMM on servers
1 year ago
Martin Kroeker
d277c6d15b
Merge pull request #4585 from martin-frbg/issue1881
Cap the number of parallel threads for GEMM, GETRF and POTRF to ensure sensible workloads on big systems
1 year ago
Igor Zhuravlov
22d305e2df
fix dtrtrs_ and ztrtrs_ to accept case-insensitive parameters uplo and diag
Changes to be committed:
modified: interface/lapack/trtrs.c
modified: interface/lapack/ztrtrs.c
1 year ago
Martin Kroeker
68ab5185d0
Update potrf.c
1 year ago
Martin Kroeker
19b29b3448
Update getrf.c
1 year ago
Martin Kroeker
a3354a7630
Cap the number of parallel threads
1 year ago
Martin Kroeker
5da4c93ef2
Cap the number of parallel threads
1 year ago
Martin Kroeker
496106642f
Cap the number of parallel threads
1 year ago
Martin Kroeker
cb8131cfd9
Merge pull request #4499 from kseniyazaytseva/new-tests
Tests for BLAS-like and BLAS API
1 year ago
Martin Kroeker
baf88564bc
Fix potential buffer overflow
1 year ago
kseniyazaytseva
7e9b1c0807
fix uninitialized data usage
1 year ago
kseniyazaytseva
c6f30fd414
check for zero inc
1 year ago
kseniyazaytseva
5e9ead09ac
fix info return
1 year ago
Martin Kroeker
500ac4de5e
fix incompatible pointer types
1 year ago
Martin Kroeker
d4db6a9f16
Separate the interface for SBGEMMT from GEMMT due to differences in GEMV arguments
1 year ago
Martin Kroeker
68d354814f
Fix incompatible pointer type in BFLOAT16 mode
1 year ago
Sergei Lewis
3ffd6868d7
Merge branch 'develop' into dev/slewis/merge-from-riscv
1 year ago
Martin Kroeker
47bd064763
Fix names in build rules
1 year ago
Martin Kroeker
a7d004e820
Fix CBLAS prototype
1 year ago
Martin Kroeker
b54cda8490
Unify creation of CBLAS interfaces for ?AMIN/?AMAX and C/ZAXPYC between gmake and cmake builds
1 year ago
Sergei Lewis
1093def0d1
Merge branch 'risc-v' into develop
1 year ago
kseniyazaytseva
f89e0034a4
Fix LAPACK usage from BLAS
1 year ago
Martin Kroeker
f7cf637d7a
redo lost edit
2 years ago
Martin Kroeker
85548e66ca
Fix build failures seen with the NO_LAPACK option - cspr/csymv/csyr belong on the LAPACK list
2 years ago
Martin Kroeker
f129161453
restore C/Z SPMV, SPR, SYR, SYMV
2 years ago
Martin Kroeker
5b4df851d7
fix stray blank on continuation line
2 years ago
kseniyazaytseva
ff41cf5c49
Fix BLAS, BLAS-like functions and Generic RISC-V kernels
* Fixed gemmt, imatcopy, zimatcopy_cnc functions
* Fixed cblas_cscal testing in ctest
* Removed rotmg unreachable code
* Added zero size checks
2 years ago