Martin Kroeker
49689fbef7
Add support for compiling SVE kernels with the NVIDIA HPC compiler
2 years ago
Martin Kroeker
ac698cedad
Add compiler options for ARM64 SVE targets in DYNAMIC_ARCH builds
2 years ago
Martin Kroeker
d2144b2981
Add NVHPC
2 years ago
Martin Kroeker
de937b3194
Add clang option to avoid running out of registers in AVX512 assembly
2 years ago
Martin Kroeker
e964ebd0d0
Add compiler option for AVX512-capable Ryzen(4)
2 years ago
Martin Kroeker
a0a4f7c447
Add -mfma to -mavx2 for clang, and add AVX2 declaration for Zen in DYNAMIC_ARCH builds
3 years ago
Martin Kroeker
85fd3c4279
Support compilation with the Cray C and Fortran compilers (#3712)
* Add support for the Cray Fortran compiler
3 years ago
Martin Kroeker
18b19d135b
C_LAPACK: Fixes to make it compile with MSVC (#3605)
* Fix f2c-like support functions to compile with MSVC, and
re-enable C_LAPACK for MSVC in CMAKE
* Add MSVC&flang build to Azure CI in order to check C_LAPACK correctness
3 years ago
Martin Kroeker
b7873605d4
Use f2c translations of LAPACK when no Fortran compiler is available (#3539)
* Add C equivalents of the Fortran routines from Reference-LAPACK as fallbacks, and C_LAPACK variable to trigger their use
3 years ago
Rafael Cardoso Fernandes Sousa
d38110a5ce
Use CMake variables instead of as
3 years ago
Rafael Cardoso Fernandes Sousa
214fbcee15
Fix cmake for power
3 years ago
Markus Mützel
de2ed66596
cmake: Set SUFFIX64 also for NOFORTRAN
3 years ago
Wangyang Guo
3dc6052c7e
initial support for Sapphire Rapids platform
4 years ago
Martin Kroeker
e02df9fc55
Propagate BUILD_BFLOAT16 to CFLAGS
4 years ago
Wangyang Guo
76ea8db4da
Small Matrix: enable by default for x86_64 arch
If no customized GEMM_SMALL_M_PERMIT kernel is defined, it will just bypass to the normal path.
4 years ago
Wangyang Guo
fee5abd84b
Small Matrix: support cmake build
4 years ago
Martin Kroeker
30f23be0f9
Rework setting of -mfma to only apply it where necessary
4 years ago
User User-User
91e2b11d3c
add to cmake listings too
4 years ago
刘雨培
725432efaa
pass NO_AVX512 macro def
4 years ago
Martin Kroeker
33b5670122
Merge pull request #3096 from martin-frbg/fixclangcmake
Fix Cooperlake/DYNAMIC_ARCH builds with clang on Windows
4 years ago
Martin Kroeker
95e19e2e23
fix case in compiler name check
Co-authored-by: xoviat <49173759+xoviat@users.noreply.github.com>
4 years ago
Martin Kroeker
99ac042702
remove spurious lines (probably editor malfunction)
4 years ago
Martin Kroeker
774b9f8653
handle AppleClang in Cooperlake support condition
4 years ago
Martin Kroeker
eb1d2344f7
Fix compiler version check for Intel Cooperlake support (clang-cl does not accept -dumpversion)
4 years ago
xoviat
b60de4447a
add cortex-m platform
4 years ago
Martin Kroeker
438a8e5624
Fix placement of getarch call and spurious cpu property accumulation in DYNAMIC_ARCH builds
4 years ago
Martin Kroeker
0155cd53a3
Add -msse3 where needed for DYNAMIC_ARCH builds
4 years ago
Martin Kroeker
b9bc76aec4
Add files via upload
4 years ago
Martin Kroeker
f64243ff57
Add compiler options for sse/sse2/ssse3/sse4.1
5 years ago
Martin Kroeker
e3a29f6b58
Change "HALF" and "sh" to "BFLOAT16" and "sb"
5 years ago
Martin Kroeker
68e6823d36
Adapt for supporting only a subset of variable types
5 years ago
Martin Kroeker
e1b7123bbe
Merge pull request #2867 from Qiyu8/usimd-floatdot
Optimize the performance of dot by using universal intrinsics in X86/ARM
5 years ago
Qiyu8
f32d34a015
add sse3 compiler flag
5 years ago
Martin Kroeker
a5feea6611
make BLAS3_MEM_ALLOC_THRESHOLD configurable on non-Windows
5 years ago
Martin Kroeker
c4aeeeb9f4
Activate all BUILD_ options if none was specified
5 years ago
Martin Kroeker
26792d2096
Copy BUILD_* directives to the compiler options to allow ifdef in tests
5 years ago
Martin Kroeker
68b1713c30
Merge pull request #2811 from martin-frbg/issue2806
Make NO_AVX512 option override the AVX512 compile test in CMAKE builds as well
5 years ago
Martin Kroeker
bd3207b4b4
Update system.cmake
5 years ago
Martin Kroeker
b8ebfc9335
Update system.cmake
5 years ago
Martin Kroeker
71d33c952d
Typo fix
5 years ago
Martin Kroeker
6a3c074786
-march=cooperlake requires gcc10
5 years ago
Chen, Guobing
e740c4873d
Enable COOPERLAKE build target
Enable new build target platform -- COOPERLAKE. This target platform
supports all the SKYLAKEX supported ISAs + avx512bf16. So all the
SKYLAKEX specific kernels/drivers and related code are now extended
to be also active on COOPERLAKE. Besides, new BF16 related kernels
are active under this target.
5 years ago
Martin Kroeker
6876221cf3
Remove optimization level limit for flang again and add -fno-unroll-loops for AOCC flang 2.x instead
5 years ago
Martin Kroeker
3ce469a34f
Limit optimization level to O1 for flang and add -frecursive
5 years ago
Martin Kroeker
bb12c2c854
Limit MAX_STACK_ALLOC availability to non-Windows
5 years ago
Martin Kroeker
6e97df7b47
Add CMAKE support for MAX_STACK_ALLOC setting
5 years ago
Rajalakshmi Srinivasaraghavan
7eb55504b1
RFC : Add half precision gemm for bfloat16 in OpenBLAS
This patch adds support for bfloat16 data type matrix multiplication kernel.
For architectures that don't support bfloat16, it is defined as unsigned short
(2 bytes). Default unroll sizes can be changed as per architecture as done for
SGEMM and for now 8 and 4 are used for M and N. Size of ncopy/tcopy can be
changed as per architecture requirement and for now, size 2 is used.
Added shgemm in kernel/power/KERNEL.POWER9 and tested in powerpc64le and
powerpc64. For reference, added a small test compare_sgemm_shgemm.c to compare
sgemm and shgemm output.
This patch does not cover OpenBLAS test, benchmark and lapack tests for shgemm.
Complex type implementation can be discussed and added once this is approved.
5 years ago
Martin Kroeker
7f0d523b42
Make BUFFER_SIZE configurable
5 years ago
Martin Kroeker
e3d846ab57
Do not use -march=native with the PGI compiler
6 years ago
Martin Kroeker
f69a0be712
Add getarch flags to disable AVX on x86
(and other small fixes to match Makefile behaviour)
6 years ago