gxw
990507e3b8
LoongArch64: Opt zgemv with LASX
1 year ago
gxw
d51ffec3a2
LoongArch64: Opt cgemv with LASX
1 year ago
pengxu
4787a55c64
Optimized cgemm kernel 16x4 LASX for LoongArch
1 year ago
Sergei Lewis
ba17758c02
fix axpy implementations where y has a stride of 0
1 year ago
Dmitry Mikushin
d0f5dc763b
Adding USE_GEMM3M macro to kernel targets, so that the *gemm3m functions and parameters can be included into the gotoblas structure. Fixes #4500
1 year ago
Sergei Lewis
ff1523163f
Fix axpy test hangs when n==0. Reenable zaxpy_vector kernel for C910V.
1 year ago
pengxu
fe3da43b7d
Optimized zgemm kernel 8*4 LASX, 4*4 LSX and cgemm kernel 8*4 LSX for LoongArch
1 year ago
Martin Kroeker
e5d2725e5a
Merge pull request #4185 from XiWeiGu/mips_enable_msa
MIPS: Enable MSA
1 year ago
Martin Kroeker
b537528feb
Merge pull request #4480 from XiWeiGu/loongarch64-fixed-{s/d}amin-lsx
LoongArch64: Fixed {s/d}amin LSX optimization
1 year ago
Martin Kroeker
6d8a273cca
Handle zero increment(s) in C910V ?AXPBY (#4483)
* Handle zero increment(s)
1 year ago
Martin Kroeker
dbcf4f8b7d
Merge pull request #4479 from XiWeiGu/loongarch-opt-axpby
Loongarch opt axpby
1 year ago
Martin Kroeker
dc802dd637
Merge pull request #4474 from ChipKerchner/sgemmIncopy_PR
Vectorize in-copy packing/copying for SGEMM - up to 4X faster.
1 year ago
gxw
adde725321
LoongArch64: Fixed {s/d}amin LSX optimization
1 year ago
gxw
7bc93d95a1
LoongArch64: Opt {c/z}axpby
1 year ago
gxw
1e1f487dc7
LoongArch64: Fixed {s/d}axpby
1 year ago
Martin Kroeker
4d8dee508c
temporarily disable the CAXPY/ZAXPY kernels
1 year ago
Chip Kerchner
2bb7ea64a1
Only vectorize 64-bit version for Power8.
1 year ago
Sergei Lewis
3ffd6868d7
Merge branch 'develop' into dev/slewis/merge-from-riscv
1 year ago
Sergei Lewis
a3b0ef6596
Restore riscv64 fixes from develop branch: dot product double precision accumulation, zscal NaN handling
1 year ago
Martin Kroeker
d1343302bd
Merge pull request #4465 from XiWeiGu/utest-zscal
utest: Add tests for zscal
1 year ago
gxw
969601a1dc
X86_64: Fixed bug in zscal
Fixed handling of NAN and INF arguments when
inc is greater than 1.
1 year ago
Martin Kroeker
98c9ff3194
Merge pull request #4464 from XiWeiGu/loongarch64-zscal
LoongArch64: Handle NAN and INF
1 year ago
Chip Kerchner
09bb48d1b9
Vectorize in-copy packing/copying for SGEMM - 4X faster.
1 year ago
gxw
83ce97a4ca
LoongArch64: Handle NAN and INF
1 year ago
gxw
a79d117405
LoongArch64: Fixed bug for {s/d}amin
1 year ago
Sergei Lewis
1093def0d1
Merge branch 'risc-v' into develop
1 year ago
Martin Kroeker
889c5d026a
Merge pull request #4456 from kseniyazaytseva/riscv-rvv10
Fix BLAS and LAPACK tests for RVV 1.0 target, update to 0.12.0 intrinsics
1 year ago
Martin Kroeker
4e2a32ff51
Merge pull request #4454 from kseniyazaytseva/riscv-rvv07
Fix BLAS and LAPACK tests for C910V and RISCV64_ZVL256B targets
1 year ago
gxw
276e3ebf9e
LoongArch64: Add dzamax and dzamin opt
1 year ago
Martin Kroeker
a21b2fa5e4
Merge pull request #4452 from kseniyazaytseva/riscv-generic
Fix BLAS, BLAS-like functions and Generic RISC-V kernels
1 year ago
Andrey Sokolov
9c49a81d54
Resolve conflicts
1 year ago
kseniyazaytseva
e1afb23811
Fix BLAS and LAPACK tests for C910V and RISCV64_ZVL256B targets
* Fixed bugs in dgemm, [a]min/max, asum kernels
* Added zero checks for BLAS kernels
* Added dsdot implementation for RVV 0.7.1
* Fixed bugs in _vector files for C910V and RISCV64_ZVL256B targets
* Added additional definitions for RISCV64_ZVL256B target
2 years ago
Octavian Maghiar
deecfb1a39
Merge branch 'risc-v' into img-riscv64-zvl128b
1 year ago
kseniyazaytseva
5222b5fc18
Added axpby kernels for GENERIC RISC-V target
2 years ago
kseniyazaytseva
ff41cf5c49
Fix BLAS, BLAS-like functions and Generic RISC-V kernels
* Fixed gemmt, imatcopy, zimatcopy_cnc functions
* Fixed cblas_cscal testing in ctest
* Removed rotmg unreachable code
* Added zero size checks
2 years ago
kseniyazaytseva
b193ea3d7b
Fix BLAS and LAPACK tests for RVV 1.0 target, update to 0.12.0 intrinsics
* Update intrinsics API to 0.12.0 version (Stride Segment Loads/Stores)
* Fixed nrm2, axpby, ncopy, zgemv and scal kernels
* Added zero size checks
1 year ago
Martin Kroeker
88e994116c
Merge pull request #4354 from imaginationtech/img-rvv-kernel-generator
[RISC-V] Improve RVV kernel generator LMUL usage
1 year ago
Dirreke
ec89466e14
Add CSKY support
1 year ago
Sergei Lewis
9edb805e64
fix builds with t-head toolchains that use old versions of the intrinsics spec
1 year ago
Martin Kroeker
0d2e486edf
Handle NAN and INF
1 year ago
Martin Kroeker
5f5b7c4f45
Merge pull request #4423 from martin-frbg/issue4422
Check compiler support for AVX512BF16 and base COL/SPR kernel choice on that
1 year ago
Martin Kroeker
f31bea07dd
Merge pull request #4419 from martin-frbg/issue4413
[WIP] Add fixes and utests for ZSCAL with NaN or Inf arguments
1 year ago
Martin Kroeker
20413ee6ec
Update zscal.c
1 year ago
Martin Kroeker
b57627c27f
Handle NAN and INF
1 year ago
Martin Kroeker
995a990e24
Make AVX512 BFLOAT16 kernels conditional on compiler capability
1 year ago
Martin Kroeker
7df363e1e2
temporarily disable the MSA C/ZSCAL kernels
1 year ago
Chip-Kerchner
058dd2a4cb
Replace two vector loads with one vector pair load and fix endianness of stores - DGEMM versions.
1 year ago
Martin Kroeker
1c31f56e5a
Handle NAN
1 year ago
Martin Kroeker
7ee1ee38e2
Handle NaN in input
1 year ago
Martin Kroeker
f637e12713
Handle INF and NAN
1 year ago