pengxu
a978ad3180
Loongarch64: add C functions of zgemm_ncopy_16
4 months ago
tingbo.liao
3c8df6358f
Further rearranged the rotm kernel for the different architectures.
Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>
8 months ago
Martin Kroeker
d91d4fa6e9
convert the beta=0 branch to a for loop as well
9 months ago
Martin Kroeker
09e75f1588
fix absurd typo
9 months ago
Martin Kroeker
2891fd8d6d
Replace while loop with for
9 months ago
Martin Kroeker
ccc23338d7
have the dummy GEMM3M kernel at least forward to regular GEMM
1 year ago
gxw
6017ad7146
loongarch64: Update dgemm_kernel_16x4 to dgemm_kernel_16x6
1 year ago
pengxu
4787a55c64
Optimized cgemm kernel 16x4 LASX for LoongArch
1 year ago
Sergei Lewis
1093def0d1
Merge branch 'risc-v' into develop
1 year ago
kseniyazaytseva
ff41cf5c49
Fix BLAS, BLAS-like functions and Generic RISC-V kernels
* Fixed gemmt, imatcopy, zimatcopy_cnc functions
* Fixed cblas_cscal testing in ctest
* Removed rotmg unreachable code
* Added zero size checks
2 years ago
martin-frbg
7976deff80
Fix file permissions (issue 4095)
2 years ago
Martin Kroeker
cfa0a80664
Restore initialization of data variables
2 years ago
Martin Kroeker
9567305e4c
Restore initialization of data01,data02
2 years ago
Sergei Lewis
cb0a70e0e2
dot.c early bail fix
2 years ago
Sergei Lewis
2406958629
* update intrinsics to match latest spec at https://github.com/riscv-non-isa/rvv-intrinsic-doc (in particular, __riscv_ prefixes for rvv intrinsics)
* fix multiple numerical stability and corner case issues
* add a script to generate arbitrary gemm kernel shapes
* add a generic zvl256b target to demonstrate large gemm kernel unrolls
2 years ago
Ivan Pribec
802e71bf05
Add const attribute to lsame
3 years ago
Martin Kroeker
ef24712030
Move a conditionally used variable
4 years ago
Wangyang Guo
619588fbab
sbgemm: remove unnecessary b0 files
4 years ago
Wangyang Guo
1d83ca4bca
Small Matrix: support BFLOAT16 data type
4 years ago
Wangyang Guo
989e6bbdd3
Small Matrix: reduce generic kernel source files
4 years ago
Wangyang Guo
6b58bca18b
Small Matrix: disable low performance default kernel
4 years ago
Wangyang Guo
5dc7c3c8e5
Small Matrix: add GEMM_SMALL_MATRIX_PERMIT to tune the small matrix case
4 years ago
Xianyi Zhang
6022e5629c
Refs #2587 fix small matrix c/zgemm bug.
5 years ago
Xianyi Zhang
57ed58cefe
Refs #2587 Add small matrix optimization reference kernel for c/zgemm.
5 years ago
Xianyi Zhang
17d32a4a82
Change a1b0 gemm to b0 gemm.
5 years ago
Xianyi Zhang
be3349405d
Add alpha=1.0 beta=0.0 for small gemm.
5 years ago
Xianyi Zhang
0a2077901c
Add small matrix optimization kernel interface.
make SMALL_MATRIX_OPT=1
5 years ago
damonyu
ef8e7d0279
Add the support for RISC-V Vector.
Change-Id: Iae7800a32f5af3903c330882cdf6f292d885f266
5 years ago
Martin Kroeker
756062afa5
Rename "HALF" and "sh" to "BFLOAT16" and "sb"
5 years ago
Qiyu8
60e6c68e38
Adapt ARM architecture
5 years ago
Qiyu8
1b1a757f5f
Optimize the performance of dot by using universal intrinsics in X86/ARM
5 years ago
Rajalakshmi Srinivasaraghavan
d23419accc
powerpc: Optimized SHGEMM kernel for POWER10
This patch introduces a new optimized version of the SHGEMM kernel
using the POWER10 Matrix-Multiply Assist (MMA) feature introduced in
POWER ISA v3.1. This patch makes use of new POWER10 compute instructions
for matrix multiplication operations.
Tested on a simulator; there are no new test failures.
5 years ago
Rajalakshmi Srinivasaraghavan
a87793e03c
Fix DYNAMIC_ARCH compilation errors
5 years ago
Rajalakshmi Srinivasaraghavan
7eb55504b1
RFC : Add half precision gemm for bfloat16 in OpenBLAS
This patch adds support for bfloat16 data type matrix multiplication kernel.
For architectures that don't support bfloat16, it is defined as unsigned short
(2 bytes). Default unroll sizes can be changed as per architecture as done for
SGEMM and for now 8 and 4 are used for M and N. Size of ncopy/tcopy can be
changed as per architecture requirement and for now, size 2 is used.
Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and
powerpc64. For reference, added a small test compare_sgemm_shgemm.c to compare
sgemm and shgemm output.
This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm.
Complex type implementation can be discussed and added once this is approved.
5 years ago
Qiyu8
ff42e68652
Optimize general Gemm Beta
5 years ago
Andrew
1e531701b7
fix small typo
7 years ago
Martin Kroeker
7a7619af6d
Revert changes from PR#1419
at least one of these changes apparently is an oversimplification, leading to TRMM breakage on some platforms as observed in #1563
7 years ago
Andrew
e5cc3d72c0
core.IdenticalExpr clang501 checker
7 years ago
Andrew
9fa986337d
add missing brackets to silence indentation warnings gcc721
7 years ago
Andrew
3eed97f6b9
Initialize values to silence cppcheck
7 years ago
Andrew
d602b99386
LAPACK helpers in C that need care too
7 years ago
Andrew
4d0b005e5b
Eliminate remaining unused results in kernels (clang5 analyzer)
7 years ago
Andrew
03e5ff0687
initialize potentially uninitialized variables (clang5)
7 years ago
Andrew
47deec2c1a
fix couple of dead assignment warnings
7 years ago
Andrew
281a2b952f
warning cleanup ( #1380 )
* dead increments in driver/level2
* dead increments in kernel/generic
* part dead increments in kernel/x86_64
7 years ago
Martin Kroeker
8213385ab8
Work around compiler warnings for unused variables in the generic zgemm3m_Xcopy kernels
7 years ago
Andrew
441a9c8385
more dead increments clang4 scan-build deadcode.deadstores
7 years ago
Andrew
1236dbe5a6
Eliminate 2-8 dead increments code
7 years ago
Martin Kroeker
65bf0a343c
Remove unused variable btpr
8 years ago
Martin Kroeker
9d92f526dd
Comment out a code block that performs out-of-bounds memory accesses
...and does not appear to be needed even when it stays within the bounds of the array
8 years ago