pengxu
4787a55c64
Optimized cgemm kernel 16x4 LASX for LoongArch
1 year ago
Sergei Lewis
1093def0d1
Merge branch 'risc-v' into develop
1 year ago
kseniyazaytseva
ff41cf5c49
Fix BLAS, BLAS-like functions and Generic RISC-V kernels
* Fixed gemmt, imatcopy, zimatcopy_cnc functions
* Fixed cblas_cscal testing in ctest
* Removed rotmg unreachable code
* Added zero size checks
2 years ago
martin-frbg
7976deff80
Fix file permissions (issue 4095)
2 years ago
Martin Kroeker
cfa0a80664
Restore initialization of data variables
2 years ago
Martin Kroeker
9567305e4c
Restore initialization of data01,data02
2 years ago
Sergei Lewis
cb0a70e0e2
dot.c early bail fix
2 years ago
Sergei Lewis
2406958629
* update intrinsics to match latest spec at https://github.com/riscv-non-isa/rvv-intrinsic-doc (in particular, __riscv_ prefixes for rvv intrinsics)
* fix multiple numerical stability and corner case issues
* add a script to generate arbitrary gemm kernel shapes
* add a generic zvl256b target to demonstrate large gemm kernel unrolls
2 years ago
Ivan Pribec
802e71bf05
Add const attribute to lsame
3 years ago
Martin Kroeker
ef24712030
Move a conditionally used variable
4 years ago
Wangyang Guo
619588fbab
sbgemm: remove unnecessary b0 files
4 years ago
Wangyang Guo
1d83ca4bca
Small Matrix: support BFLOAT16 data type
4 years ago
Wangyang Guo
989e6bbdd3
Small Matrix: reduce generic kernel source files
4 years ago
Wangyang Guo
6b58bca18b
Small Matrix: disable low performance default kernel
4 years ago
Wangyang Guo
5dc7c3c8e5
Small Matrix: add GEMM_SMALL_MATRIX_PERMIT to tune the small matrix case
4 years ago
Xianyi Zhang
6022e5629c
Refs #2587 fix small matrix c/zgemm bug.
5 years ago
Xianyi Zhang
57ed58cefe
Refs #2587 Add small matrix optimization reference kernel for c/zgemm.
5 years ago
Xianyi Zhang
17d32a4a82
Change a1b0 gemm to b0 gemm.
5 years ago
Xianyi Zhang
be3349405d
Add alpha=1.0 beta=0.0 for small gemm.
5 years ago
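Several entries above (the small-matrix kernels and the "a1b0"/"b0" kernels) deal with the beta = 0 special case of GEMM. The following is a minimal sketch of what such a kernel computes. It assumes column-major storage, no transposes, and double precision. `dgemm_small_b0_nn` is a hypothetical name, not the actual OpenBLAS kernel. With beta = 0 the kernel only writes C and never reads it, so stale contents of C cannot leak into the result.

```c
#include <assert.h>

/* Hypothetical reference "b0" kernel: C := alpha * A * B (beta == 0).
   Column-major, no transposes. C is written but never read. */
static void dgemm_small_b0_nn(int m, int n, int k, double alpha,
                              const double *a, int lda,
                              const double *b, int ldb,
                              double *c, int ldc) {
    assert(m >= 0 && n >= 0 && k >= 0);
    assert(lda >= m && ldb >= k && ldc >= m);
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < m; i++) {
            double sum = 0.0;                 /* accumulate row i of A times column j of B */
            for (int l = 0; l < k; l++)
                sum += a[i + l * lda] * b[l + j * ldb];
            c[i + j * ldc] = alpha * sum;     /* overwrite: no beta * C term */
        }
    }
}
```

For example, with A = [[1, 2], [3, 4]], B the 2x2 identity, and alpha = 2, C ends up as 2A regardless of what it held before the call.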
Xianyi Zhang
0a2077901c
Add small matrix optimization kernel interface.
make SMALL_MATRIX_OPT=1
5 years ago
damonyu
ef8e7d0279
Add the support for RISC-V Vector.
Change-Id: Iae7800a32f5af3903c330882cdf6f292d885f266
5 years ago
Martin Kroeker
756062afa5
Rename "HALF" and "sh" to "BFLOAT16" and "sb"
5 years ago
Qiyu8
60e6c68e38
Adapt to ARM architecture
5 years ago
Qiyu8
1b1a757f5f
Optimize the performance of dot by using universal intrinsics in X86/ARM
5 years ago
Rajalakshmi Srinivasaraghavan
d23419accc
powerpc: Optimized SHGEMM kernel for POWER10
This patch introduces a new optimized version of the SHGEMM kernel
using the POWER10 Matrix-Multiply Assist (MMA) feature introduced in
POWER ISA v3.1. This patch makes use of new POWER10 compute instructions
for matrix multiplication operation.
Tested on simulator and there are no new test failures.
5 years ago
Rajalakshmi Srinivasaraghavan
a87793e03c
Fix DYNAMIC_ARCH compilation errors
5 years ago
Rajalakshmi Srinivasaraghavan
7eb55504b1
RFC : Add half precision gemm for bfloat16 in OpenBLAS
This patch adds support for bfloat16 data type matrix multiplication kernel.
For architectures that don't support bfloat16, it is defined as unsigned short
(2 bytes). Default unroll sizes can be changed as per architecture as done for
SGEMM and for now 8 and 4 are used for M and N. Size of ncopy/tcopy can be
changed as per architecture requirement and for now, size 2 is used.
Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and
powerpc64. For reference, added a small test compare_sgemm_shgemm.c to compare
sgemm and shgemm output.
This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm.
Complex type implementation can be discussed and added once this is approved.
5 years ago
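The commit above stores bfloat16 as a 2-byte `unsigned short` on architectures without native support. The following is a minimal sketch of that representation. It assumes bfloat16 is the upper 16 bits of an IEEE 754 single-precision value. The rounding mode (round to nearest, ties to even) and the names `bf16_t`, `bf16_to_float`, and `float_to_bf16` are assumptions for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed storage type: bfloat16 kept as a 2-byte unsigned short
   on hardware without native bfloat16 support. */
typedef unsigned short bf16_t;
static_assert(sizeof(bf16_t) == 2, "bfloat16 storage must be 2 bytes");

/* bfloat16 -> float: the 16 stored bits become the top half of a
   binary32 value; the low 16 mantissa bits are zero, so this is exact. */
static float bf16_to_float(bf16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* float -> bfloat16, rounding the dropped 16 bits to nearest, ties to even. */
static bf16_t float_to_bf16(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    if ((bits & 0x7fffffffu) > 0x7f800000u)      /* NaN: force the quiet bit */
        return (bf16_t)((bits >> 16) | 0x0040u);
    uint32_t lsb = (bits >> 16) & 1u;            /* lowest kept bit, for ties */
    bits += 0x7fffu + lsb;
    return (bf16_t)(bits >> 16);
}
```

Widening to `float` is exact. Only narrowing loses precision: for example, 1.01171875f lies halfway between two bfloat16 values and rounds to the even one, 1.015625.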
Qiyu8
ff42e68652
Optimize general GEMM beta
5 years ago
Andrew
1e531701b7
fix small typo
7 years ago
Martin Kroeker
7a7619af6d
Revert changes from PR#1419
at least one of these changes apparently is an oversimplification, leading to TRMM breakage on some platforms as observed in #1563
7 years ago
Andrew
e5cc3d72c0
core.IdenticalExpr clang501 checker
7 years ago
Andrew
9fa986337d
add missing brackets to silence indentation warnings gcc721
7 years ago
Andrew
3eed97f6b9
Initialize values to silence cppcheck
7 years ago
Andrew
d602b99386
LAPACK helpers in C that need care too
7 years ago
Andrew
4d0b005e5b
Eliminate remaining unused results in kernels (clang5 analyzer)
7 years ago
Andrew
03e5ff0687
initialize potentially uninitialized variables (clang5)
7 years ago
Andrew
47deec2c1a
fix couple of dead assignment warnings
7 years ago
Andrew
281a2b952f
warning cleanup ( #1380 )
* dead increments in driver/level2
* dead increments in kernel/generic
* part dead increments in kernel/x86_64
7 years ago
Martin Kroeker
8213385ab8
Work around compiler warnings for unused variables in the generic zgemm3m_Xcopy kernels
7 years ago
Andrew
441a9c8385
more dead increments clang4 scan-build deadcode.deadstores
7 years ago
Andrew
1236dbe5a6
Eliminate 2-8 dead increments code
7 years ago
Martin Kroeker
65bf0a343c
Remove unused variable btpr
7 years ago
Martin Kroeker
9d92f526dd
Comment out a code block that performs out-of-bounds memory accesses
...and does not appear to be needed even when it stays within the bounds of the array
8 years ago
Martin Kroeker
f96afd94b0
Fix out-of-bounds accesses where the data should be zero anyway
8 years ago
Andrew
becf8bc7a0
remove dead code
9 years ago
Yichao Yu
594b9f4c73
Do not use vsub to clear the register values since it doesn't work with non-normal numbers.
9 years ago
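Zeroing a register by subtracting it from itself relies on x - x == 0. That identity fails when the register holds Inf or NaN, because both yield NaN. The following sketch models a 4-lane vector register in plain C. It reads "non-normal" in the commit message as Inf/NaN, which is an assumption, since subnormal values do cancel to zero. `vreg_t`, `clear_by_sub`, and `clear_by_zero` are hypothetical names.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical model of a 4-lane floating-point vector register. */
typedef struct { double lane[4]; } vreg_t;

/* Clearing by self-subtraction (v - v): yields +0.0 only for lanes
   holding finite values; Inf or NaN lanes become NaN. */
static vreg_t clear_by_sub(vreg_t v) {
    for (size_t i = 0; i < 4; i++)
        v.lane[i] = v.lane[i] - v.lane[i];
    return v;
}

/* Clearing by writing an explicit zero: correct whatever the lanes held. */
static vreg_t clear_by_zero(vreg_t v) {
    for (size_t i = 0; i < 4; i++)
        v.lane[i] = 0.0;
    return v;
}
```

Without `-ffast-math`, compilers may not fold `x - x` to zero for floating point, so the subtraction really is performed and the NaN propagates.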
Ashwin Sekhar T K
45f78963ac
Optimized cgemm kernel for CORTEXA57
Also, add a generic ztrmm 4x4 kernel
10 years ago
Martin Koehler
711ca33bc6
Improved Ximatcopy when lda==ldb.
The Ximatcopy functions create a copy of the input matrix
although they appear to work in place. The new routines
XIMATCOPY_K_YY perform the operations in place if the leading
dimension does not change.
10 years ago
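When input and output share one buffer with the same leading dimension, a square transpose-and-scale needs no scratch copy. Each off-diagonal pair (i, j), (j, i) can be swapped directly. The following is a minimal sketch for the square, column-major, double-precision case only. `imatcopy_square_t` is a hypothetical name, and the sketch does not attempt the non-square case or reproduce the XIMATCOPY_K_YY routines themselves.

```c
#include <assert.h>

/* Hypothetical helper: in place, A := alpha * A^T for an n x n
   column-major matrix stored with leading dimension lda. Input and
   output share the buffer, so no temporary copy of A is allocated. */
static void imatcopy_square_t(int n, double alpha, double *a, int lda) {
    assert(n >= 0 && lda >= n);
    for (int j = 0; j < n; j++) {
        a[j + j * lda] *= alpha;            /* diagonal: scale only */
        for (int i = j + 1; i < n; i++) {
            double lower = a[i + j * lda];  /* element (i, j) */
            double upper = a[j + i * lda];  /* element (j, i) */
            a[i + j * lda] = alpha * upper;
            a[j + i * lda] = alpha * lower;
        }
    }
}
```

Padding rows beyond `n` in each column (when `lda > n`) are never touched.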
Zhang Xianyi
1cf2b10224
Use pure C generic target on x86 and x86_64.
make TARGET=GENERIC
?gemm3m is unimplemented on generic target.
10 years ago
Werner Saar
9bd962f655
modified haswell parameter dgemm_unroll_n
10 years ago