Rajendra Prasad Matcha
eae0abfdb6
SME1-based direct kernel with alpha and beta for the cblas_sgemm Level 3 API.
2 months ago
Srangrang
ec14e1648c
fix: resolve non-RISCV host build failed issue
- adjust interface to disable "small matrix" pathway
- separate HFLOAT16 from BFLOAT16
- remove SHGEMM_UNROLL_M and SHGEMM_UNROLL_N equal conditions
Related to PR#5290
Co-authored-by Martin
3 months ago
gkdddd
670ec6f757
Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B
Added HFLOAT16 support for RISCV64
Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B based on HFLOAT16
The instruction set extensions used are ZVFH and ZFH, which require RVV 1.0 support
Related to issue #5279
Co-authored-by Linjin Li <linjin_li@163.com>
4 months ago
Martin Kroeker
5141a90993
Fix ARMV9SME target in DYNAMIC_ARCH and add SME query code for macOS (#5222)
* Fix ARMV9SME target and add support_sme1 code for macOS
* make sgemm_direct unconditionally available on all arm64
* build a (dummy) sgemm_direct kernel on all arm64
* Update dynamic_arm64.c
4 months ago
Vaisakh K V
d23eb3b93e
Support for SME1-based sgemm_direct kernel for the cblas_sgemm Level 3 API
* Added ARMV9SME target
* Added SGEMM_DIRECT kernel based on SME1
10 months ago
Martin Kroeker
46e331a917
remove the unworkable GEMM3M restriction from GENERIC again
1 year ago
Martin Kroeker
2787c9f8e4
Disable GEMM3M for generic targets (not implemented)
1 year ago
Martin Kroeker
04bc801999
(Re)apply fixes for supporting only a subset of precision types from PR 3915
1 year ago
Rajalakshmi Srinivasaraghavan
9f42570e33
POWER: Increase macro size limit for AIX
This patch increases the macro size limit from 4096 to 16384 to
allow compiling larger assembly files in AIX.
Tested with GCC and IBM Open XL C.
2 years ago
Rajalakshmi Srinivasaraghavan
71d733e5f7
POWER: Avoid m4 conversions for C files
This patch removes intermediate m4 conversions used in sbgemm
compilation as it is not needed for .c files.
Tested on AIX with gcc and IBM Open XL C.
2 years ago
Martin Kroeker
61d803547a
Apply USE_TRMM to MIPS64_GENERIC as to GENERIC
2 years ago
Martin Kroeker
898cf5faf3
Add Elbrus e2k architecture support
3 years ago
Bine Brank
b6a445cfd8
adapt Makefile for SVE trsm
3 years ago
Bine Brank
bb33446b40
fix makefile.L3
3 years ago
Bine Brank
07fa6fa3b1
configure Makefile for sve
3 years ago
Bine Brank
0140373802
add sve ztrmm
3 years ago
Bine Brank
774267fdac
adjust Makefile.L3 for SVE
3 years ago
Bine Brank
86ae89bf33
add sgemm kernel and copy functions for sgemm and ssymm
3 years ago
Bine Brank
9b9cb90bb1
modify Makefile for SVE copy
3 years ago
Bine Brank
9388f05a3c
configure SVE Makefile
3 years ago
Wangyang Guo
3dc6052c7e
initial support for Sapphire Rapids platform
4 years ago
Martin Kroeker
f1e3305974
Add workaround for Windows10 macro name clash
4 years ago
Wangyang Guo
619588fbab
sbgemm: remove unnecessary b0 files
4 years ago
Wangyang Guo
1d83ca4bca
Small Matrix: support BFLOAT16 data type
4 years ago
Wangyang Guo
989e6bbdd3
Small Matrix: reduce generic kernel source files
4 years ago
Wangyang Guo
5dc7c3c8e5
Small Matrix: add GEMM_SMALL_MATRIX_PERMIT to tune small matrix cases
4 years ago
Xianyi Zhang
57ed58cefe
Refs #2587 Add small matrix optimization reference kernel for c/zgemm.
5 years ago
Xianyi Zhang
17d32a4a82
Change a1b0 gemm to b0 gemm.
5 years ago
Xianyi Zhang
59cb5de46b
Refs #2587 Fix typos.
5 years ago
Xianyi Zhang
be3349405d
Add alpha=1.0 beta=0.0 for small gemm.
5 years ago
Xianyi Zhang
0a2077901c
Add small matrix optimization kernel interface.
make SMALL_MATRIX_OPT=1
5 years ago
Martin Kroeker
c4da892ba0
Only filter out -mavx on Sandybridge ZGEMM/ZTRMM kernels
4 years ago
Martin Kroeker
bd60fb6ffc
filter out -mavx flag on zgemm kernels as it can cause problems with older gcc
4 years ago
gxw
4b548857d6
Add MSA support for Loongson
1. Use cores loongson3r3 and loongson3r4 for Loongson
2. Add DYNAMIC_ARCH for loongson
Change-Id: I1c6b54dbeca3a0cc31d1222af36a7e9bd6ab54c1
4 years ago
Zhang Xianyi
d7ba7679b6
Merge branch 'develop' into risc-v
5 years ago
Rajalakshmi Srinivasaraghavan
b5d30b390d
Fix build issues with bfloat16
This patch fixes compilation errors due to recent renaming from SH to SB
with BUILD_BFLOAT16.
5 years ago
Martin Kroeker
3aecafad80
Change "HALF" and "sh" to "BFLOAT16" and "sb"
5 years ago
Martin Kroeker
6b6adf8a4a
Allow compiling only a subset of kernels for specific variable types
5 years ago
Martin Kroeker
9ee21a0a39
Merge pull request #2780 from Guobing-Chen/CPL_build_support
Enable COOPERLAKE build target
5 years ago
Martin Kroeker
75eeb265d7
[WIP] Refactor the driver code for direct SGEMM (#2782)
Move "direct SGEMM" functionality out of the SkylakeX SGEMM kernel and make it available
(on x86_64 targets only for now) in DYNAMIC_ARCH builds
* Add sgemm_direct targets in the kernel Makefile.L3 and CMakeLists.txt
* Add direct_sgemm functions to the gotoblas struct in common_param.h
* Move sgemm_direct_performant helper to separate file
* Update gemm.c to use macros for sgemm_direct to support dynamic_arch naming via common_s.h
* (Conditionally) add sgemm_direct functions in setparam-ref.c
5 years ago
Chen, Guobing
e740c4873d
Enable COOPERLAKE build target
Enable new build target platform -- COOPERLAKE. This target platform
supports all the SKYLAKEX supported ISAs + avx512bf16. So all the
SKYLAKEX specific kernels/drivers and related code are now extended
to be active on COOPERLAKE as well. In addition, new BF16-related kernels
are active under this target.
5 years ago
Rajalakshmi Srinivasaraghavan
475b5c95b9
Remove extra symbol in Makefile
While trying out different unroll values, noted that
make failed due to this extra symbol.
5 years ago
Martin Kroeker
da17abec87
fix trailing whitespace
5 years ago
Martin Kroeker
b144423f0f
Do not define USE_TRMM for 32bit POWER8
5 years ago
Martin Kroeker
ed7e155c35
Merge branch 'develop' into aix
5 years ago
Martin Kroeker
c854ef5471
Fix variable names in conditional
5 years ago
Martin Kroeker
c0afc11742
Fix POWERPC builds on AIX (gcc/gfortran 7)
1. macro preprocessing for POWER8 and later kernels only
2. the default buffer size used by the AIX version of m4 is too small
5 years ago
Kavana Bhat
df4ade070f
Fix for #2671
5 years ago
Rajalakshmi Srinivasaraghavan
9fe930f205
powerpc: Add support for future processor
This is the initial patch to support build infrastructure
for POWER10 architecture.
5 years ago
Martin Kroeker
5dd14e3d48
Make building the bfloat16 functions conditional on option BUILD_HALF (#2590)
* make building the bfloat16 BLAS functions conditional on BUILD_HALF
* pass the BUILD_HALF option to gensymbol
* Pass BUILD_HALF as a compiler define for dynamic_arch builds
5 years ago