OpenBLAS

Commit Graph

Author	SHA1	Message	Date
Chris Sidebottom	740efd71c4	Add optimized BGEMM kernel for NEOVERSEV1 target This also improves the testing and generic kernel by re-using the BF16 conversion functions. Built on top of https://github.com/OpenMathLib/OpenBLAS/pull/5357 and derived from https://github.com/OpenMathLib/OpenBLAS/pull/5287 Co-authored-by: Ye Tao <ye.tao@arm.com>	2 months ago
Martin Kroeker	343830c26f	Add BGEMM parameter tables	2 months ago
Chris Sidebottom	f95e7b0e32	Add infrastructure for BGEMM Setting up all the infrastructure for BGEMM support in OpenBLAS, hopefully I found all the right places. Derived mostly from the previous work done in https://github.com/OpenMathLib/OpenBLAS/pull/5287 Co-authored-by: Ye Tao <ye.tao@arm.com>	3 months ago
gkdddd	670ec6f757	Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B Added HFLOAT16 support for RISCV64 Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B based on HFLOAT16 The instruction sets used are ZVFH and ZFH, which need to be supported by RVV1.0 Related to issue #5279 Co-authored-by Linjin Li <linjin_li@163.com>	4 months ago
Martin Kroeker	5141a90993	Fix ARMV9SME target in DYNAMIC_ARCH and add SME query code for MacOS (#5222) * Fix ARMV9SME target and add support_sme1 code for MacOS * make sgemm_direct unconditionally available on all arm64 * build a (dummy) sgemm_direct kernel on all arm64 * Update dynamic_arm64.c	4 months ago
Vaisakh K V	f66ca05b31	Merge branch 'develop' into topic/sgemm_direct_sme1	7 months ago
Vaisakh K V	d23eb3b93e	Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API * Added ARMV9SME target * Added SGEMM_DIRECT kernel based on SME1	10 months ago
Martin Kroeker	4924319c50	fix position of srotm, qrotm	8 months ago
tingbo.liao	3c8df6358f	Further rearranged the rotm kernel for the different architectures. Signed-off-by: tingbo.liao <tingbo.liao@starfivetech.com>	8 months ago
gxw	48698b2b1d	LoongArch64: Rename core Use microarchitecture name instead of meaningless strings to name the core, the legacy core is still retained. 1. Rename LOONGSONGENERIC to LA64_GENERIC 2. Rename LOONGSON3R5 to LA464 3. Rename LOONGSON2K1000 to LA264	1 year ago
Mark Ryan	3b715e6162	Add autodetection for riscv64 Implement DYNAMIC_ARCH support for riscv64. Three cpu types are supported, riscv64_generic, riscv64_zvl256b, riscv64_zvl128b. The two non-generic kernels require CPU support for RVV 1.0 to function correctly. Detecting that a riscv64 device supports RVV 1.0 is a little complicated as there are some boards on the market that advertise support for V via hwcap but only support RVV 0.7.1, which is not binary compatible with RVV 1.0. The approach taken is to first try hwprobe. If hwprobe is not available, we fall back to hwcap + an additional check to distinguish between RVV 1.0 and RVV 0.7.1. Tested on a VM with VLEN=256, a CanMV K230 with VLEN=128 (with only the big core enabled), a Lichee Pi with RVV 0.7.1 and a VF2 with no vector. A compiler with RVV 1.0 support must be used to build OpenBLAS for riscv64 when DYNAMIC_ARCH=1. Signed-off-by: Mark Ryan <markdryan@rivosinc.com>	1 year ago
Martin Kroeker	93d975d8fd	Merge pull request #4593 from XiWeiGu/loongarch_add_buffer_offset loongarch: Optimizing the performance of the GEMM on servers	1 year ago
gxw	d8c4ea8793	loongarch: Optimizing the performance of the GEMM on servers	1 year ago
Chen Yu	8e39c05efd	Get the l2 cache size via environment variable on confidential VM The CPUID(leaf:2 or leaf:0x80000006) is not supported on some confidential VMs. As a result the get_l2_size() returns the default 512M which brings performance issues. Introduce the environment variable OPENBLAS_L2_SIZE provided by the user to get the l2 cache size. Suggested-by: "Keshavamurthy, Anil S" <anil.s.keshavamurthy@intel.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com>	1 year ago
Honglin Zhu	90f041e348	Invoke the syscall to allow the use of amx tiles	2 years ago
Martin Kroeker	437c0bf2b4	Merge pull request #3843 from Mousius/switch-ratio Propagate SWITCH_RATIO to DYNAMIC_ARCH builds	2 years ago
Chris Sidebottom	32f2fafde7	Propagate SWITCH_RATIO to DYNAMIC_ARCH builds Previously dynamic builds were either using the default SWITCH_RATIO or one from the higher level architecture; this patch ensures the dynamic builds can use this parameter as well.	2 years ago
Martin Kroeker	38d6fb4225	Fix dependencies in builds with specified subsets of precision types	2 years ago
Martin Kroeker	5481c328e8	fix DYNAMIC_ARCH builds that use only a subset of precisions	2 years ago
Martin Kroeker	c9d78dc3b2	Remove excess initializer (leftover from rework of PR 3793)	2 years ago
Honglin Zhu	4989e039a5	Define SBGEMM_ALIGN_K for DYNAMIC_ARCH build	2 years ago
Honglin Zhu	843e9fd0b9	Fix typo error	2 years ago
Honglin Zhu	b00d5b9746	New sbgemm implementation for Neoverse N2 1. Use UZP instructions but not gather load and scatter store instructions to get lower latency. 2. Padding k to a power of 4.	2 years ago
gxw	fbfe1daf6e	LoongArch64: Add DYNAMIC_ARCH support	3 years ago
Martin Kroeker	40302558ed	Remove extraneous (and wrong) definition of sbgemm_r on x86_64	3 years ago
Martin Kroeker	d9894f45d3	Define sbgemm_r to fix DYNAMIC_ARCH builds	3 years ago
Wangyang Guo	3dc6052c7e	initial support for Sapphire Rapids platform	4 years ago
Wangyang Guo	1d83ca4bca	Small Matrix: support BFLOAT16 data type	4 years ago
Wangyang Guo	478d1086c1	Small Matrix: support DYNAMIC_ARCH build	4 years ago
gxw	4b548857d6	Add msa support for loongson 1. Using core loongson3r3 and loongson3r4 for loongson 2. Add DYNAMIC_ARCH for loongson Change-Id: I1c6b54dbeca3a0cc31d1222af36a7e9bd6ab54c1	4 years ago
Chen, Guobing	a7b1f9b1bb	Implementation of BF16 based gemv 1. Add a new API -- sbgemv to support bfloat16 based gemv 2. Implement a generic kernel for sbgemv 3. Implement an avx512-bf16 based kernel for sbgemv Signed-off-by: Chen, Guobing <guobing.chen@intel.com>	5 years ago
Martin Kroeker	10379fc83b	Use ifdef instead of if	5 years ago
Martin Kroeker	3aecafad80	Change "HALF" and "sh" to "BFLOAT16" and "sb"	5 years ago
Martin Kroeker	6b6adf8a4a	Allow compiling only a subset of kernels for specific variable types	5 years ago
Martin Kroeker	dfbc62ef7e	Support building only a subset of types	5 years ago
Chen, Guobing	deaeb6c5b8	Add bfloat16 based dot and conversion with single/double 1. Added bfloat16 based dot as new API: shdot 2. Implemented generic kernel and cooperlake-specific (AVX512-BF16) kernel for shdot 3. Added 4 conversion APIs for bfloat16 data type <=> single/double: shstobf16 shdtobf16 sbf16tos dbf16tod shstobf16 -- convert single float array to bfloat16 array shdtobf16 -- convert double float array to bfloat16 array sbf16tos -- convert bfloat16 array to single float array dbf16tod -- convert bfloat16 array to double float array 4. Implemented generic kernels for all 4 conversion APIs, and cooperlake-specific kernel for shstobf16 and shdtobf16 5. Update level1 thread facilitate functions and macros to support multi-threading for these new APIs 6. Fix Cooperlake platform detection/specify issue when under dynamic-arch building 7. Change the typedef of bfloat16 from unsigned short to more strict uint16_t Signed-off-by: Chen, Guobing <guobing.chen@intel.com>	5 years ago
Martin Kroeker	9ee21a0a39	Merge pull request #2780 from Guobing-Chen/CPL_build_support Enable COOPERLAKE build target	5 years ago
Martin Kroeker	75eeb265d7	[WIP] Refactor the driver code for direct SGEMM (#2782) Move "direct SGEMM" functionality out of the SkylakeX SGEMM kernel and make it available (on x86_64 targets only for now) in DYNAMIC_ARCH builds * Add sgemm_direct targets in the kernel Makefile.L3 and CMakeLists.txt * Add direct_sgemm functions to the gotoblas struct in common_param.h * Move sgemm_direct_performant helper to separate file * Update gemm.c to macros for sgemm_direct to support dynamic_arch naming via common_s,h * (Conditionally) add sgemm_direct functions in setparam-ref.c	5 years ago
Chen, Guobing	e740c4873d	Enable COOPERLAKE build target Enable new build target platform -- COOPERLAKE. This target platform supports all the SKYLAKEX supported ISAs + avx512bf16. So all the SKYLAKEX specific kernels/drivers and related code are now extended to be also active on COOPERLAKE. Besides, new BF16 related kernels are active under this target.	5 years ago
Martin Kroeker	5dd14e3d48	Make building the bfloat16 functions conditional on option BUILD_HALF (#2590) * make building the bfloat16 BLAS functions conditional on BUILD_HALF * pass the BUILD_HALF option to gensymbol * Pass BUILD_HALF as a compiler define for dynamic_arch builds	5 years ago
Rajalakshmi Srinivasaraghavan	67cc4b9e16	Fix warnings in clang and export symbol	5 years ago
Rajalakshmi Srinivasaraghavan	a87793e03c	Fix DYNAMIC_ARCH compilation errors	5 years ago
Rajalakshmi Srinivasaraghavan	7eb55504b1	RFC : Add half precision gemm for bfloat16 in OpenBLAS This patch adds support for bfloat16 data type matrix multiplication kernel. For architectures that don't support bfloat16, it is defined as unsigned short (2 bytes). Default unroll sizes can be changed as per architecture as done for SGEMM and for now 8 and 4 are used for M and N. Size of ncopy/tcopy can be changed as per architecture requirement and for now, size 2 is used. Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and powerpc64. For reference, added a small test compare_sgemm_shgemm.c to compare sgemm and shgemm output. This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm. Complex type implementation can be discussed and added once this is approved.	5 years ago
int_13h	96ad579428	add in runtime cpu detection for zarch (#2349) add in runtime cpu detection for zarch	5 years ago
Martin Kroeker	ccfb7ead15	Merge pull request #2072 from martin-frbg/sum Add (C)BLAS extension ?sum	6 years ago
Rashmica Gupta	bcdf1d4917	Add in runtime CPU detection for POWER.	6 years ago
Martin Kroeker	b9f4943a14	Add ?sum	6 years ago
Ashwin Sekhar T K	d5aeff636f	ARM64: Enable DYNAMIC_ARCH Enable DYNAMIC_ARCH feature on ARM64. This patch uses the cpuid feature in linux kernel to detect the core type at runtime (https://www.kernel.org/doc/Documentation/arm64/cpu-feature-registers.txt). If this feature is missing in kernel, then the user should use the OPENBLAS_CORETYPE env variable to select the desired core type.	7 years ago
Ashwin Sekhar T K	e7b66cd36e	ARM64: Fix DYNAMIC_ARCH compilation for cores which dont use GEMM3M	7 years ago
Martin Kroeker	6f71c0fce4	Return a somewhat sane default value for L2 cache size if cpuid retur… (#1611) * Return a somewhat sane default value for L2 cache size if cpuid returned something unexpected Fixes #1610, the KVM hypervisor on Google Chromebooks returning zero for CPUID 0x80000006, causing DYNAMIC_ARCH builds of OpenBLAS to hang	7 years ago


79 Commits (740efd71c4ad0f6f56371cabd23f086b985e0602)