Qiyu8
e5c2ceb675
Fix the CI failure caused by the missing header
4 years ago
Qiyu8
a87e537b8c
modify macro
4 years ago
Qiyu8
5bc0a7583f
Only FMA3 and vectors wider than 128 bits have a positive effect.
4 years ago
Qiyu8
8c0b206d4c
Optimize the performance of rot by using universal intrinsics
4 years ago
Martin Kroeker
ff16329cb7
Merge pull request #2972 from xiegengxin/rot-intrinsic
Improve the performance of rot by using AVX512 and AVX2 intrinsic
4 years ago
Martin Kroeker
433637ccd8
Merge pull request #2980 from martin-frbg/fixgetarch
Fix missing AVX2 and FMA3 capabilities in FORCE_target mode
4 years ago
Martin Kroeker
ec088bf33a
Fix missing AVX2 and FMA3 capabilities in FORCE_target mode
4 years ago
Martin Kroeker
110c7a6de0
Merge pull request #2979 from RajalakshmiSR/dot_power10
Optimize sdot/ddot for POWER10
4 years ago
Martin Kroeker
d2faa1be4e
Merge pull request #2978 from martin-frbg/fixdynfeatures
Fix handling of cpu capability flags in DYNAMIC_ARCH builds
4 years ago
Martin Kroeker
1c4cfdc139
Stay compatible with old gmake that did not support undefine
4 years ago
Martin Kroeker
f6a57d8f63
Update Makefile.system
4 years ago
Martin Kroeker
f4b7ba12b7
Update Makefile.system
4 years ago
Rajalakshmi Srinivasaraghavan
6e364981a8
Optimize sdot/ddot for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
4 years ago
Martin Kroeker
b976a0bf40
Remove previous workaround for compiler flags related to cpu capabilities in x86_64 DYNAMIC_ARCH builds
4 years ago
Martin Kroeker
a04f532edf
Reset cpu property flags between build cycles in DYNAMIC_ARCH mode
4 years ago
Martin Kroeker
ccb9731c7b
Fix propagation of cpu properties to compiler options
4 years ago
Martin Kroeker
a29338aaa6
Remove extraneous quotes that caused a cmake policy warning
4 years ago
Martin Kroeker
438a8e5624
Fix placement of getarch call and spurious cpu property accumulation in DYNAMIC_ARCH builds
4 years ago
Martin Kroeker
e5967810b7
Merge pull request #110 from xianyi/develop
rebase
4 years ago
Martin Kroeker
ff74319ea5
Merge pull request #2977 from martin-frbg/issue2976
Fix macro name used in ifdef for POWERPC/PGI
4 years ago
Martin Kroeker
28d2dfe2b3
Fix macro name used in ifdef
4 years ago
Gengxin Xie
725ffbf041
fix typo
4 years ago
Gengxin Xie
d9ba49165a
Improve the performance of rot by using AVX512 and AVX2 intrinsic
5 years ago
Martin Kroeker
60ab9c783f
Merge pull request #2966 from martin-frbg/issue2964
Ensure that EXPRECISION is disabled for DYNAMIC_ARCH with TARGET=GENERIC and fix CMAKE DYNAMIC_ARCH builds
4 years ago
Martin Kroeker
8cc73fee98
Export NO_EXPRECISION after overriding for DYNAMIC_ARCH with GENERIC target
4 years ago
Martin Kroeker
0155cd53a3
Add -msse3 where needed for DYNAMIC_ARCH builds
4 years ago
Martin Kroeker
a9f9354296
Fix target test
4 years ago
Martin Kroeker
b9bc76aec4
Add files via upload
4 years ago
Martin Kroeker
f071245939
Merge pull request #2967 from RajalakshmiSR/dgemm88
POWER10: Change dgemm unroll factors
4 years ago
Martin Kroeker
e5f8c2bf8a
typo fix
4 years ago
Martin Kroeker
6baf8af658
Disable EXPRECISION for the combination of DYNAMIC_CORE and GENERIC target
4 years ago
Martin Kroeker
40a93c232b
Disable EXPRECISION for DYNAMIC_ARCH in combination with TARGET=GENERIC
NO_EXPRECISION is disabled for the GENERIC_TARGET already, so prevent mixing with code parts that use a different float size by default
4 years ago
Martin Kroeker
fab952bee4
Merge pull request #2962 from brada4/develop
add openbsd 68+ gfortran name
4 years ago
Martin Kroeker
1cf04a6f0e
Merge pull request #2963 from martin-frbg/issue2959
Reunify default BUFFER_SIZE on ARM64 to avoid crashes in DYNAMIC_ARCH mode
4 years ago
Rajalakshmi Srinivasaraghavan
dd7a9cc5bf
POWER10: Change dgemm unroll factors
Changing the unroll factors for dgemm to 8 improves performance with the
POWER10 MMA feature. Also made some minor changes in sgemm for edge cases.
4 years ago
Martin Kroeker
7f26be4802
Reunify BUFFERSIZE across arm64 platforms to avoid segfaults in DYNAMIC_ARCH
4 years ago
User User-User
9fab65e90a
add openbsd gfortran
4 years ago
Martin Kroeker
9efc3f0815
Merge pull request #109 from xianyi/develop
rebase
4 years ago
Martin Kroeker
aa21cb5217
Merge pull request #2960 from thrasibule/avx2_detection
fix avx2 detection
4 years ago
Guillaume Horel
1f564d729b
fix avx2 detection
reword commits to make it clearer
4 years ago
Martin Kroeker
9349dcd206
Merge pull request #2956 from RajalakshmiSR/caxpy_p10
Optimize caxpy for POWER10
4 years ago
Rajalakshmi Srinivasaraghavan
b435491885
Optimize caxpy for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
4 years ago
Martin Kroeker
9a058f2451
Merge pull request #2940 from Qiyu8/optimize-benchmark
Refactor the performance measurement system
4 years ago
Martin Kroeker
074927a7d0
Merge pull request #2954 from Guobing-Chen/BF16_gemv_support
Implementation of BF16 based gemv
4 years ago
Martin Kroeker
60b22e3462
Merge pull request #2955 from Guobing-Chen/Fix_cooperlake_build_issue
Fix cooperlake compile issue
4 years ago
Chen, Guobing
c5e62dad69
Fix cooperlake compile issue
Add a missing macro that Makefile.x86_64 requires after the recent
cleanup; its absence caused a build failure on the Cooper Lake platform.
4 years ago
Chen, Guobing
a7b1f9b1bb
Implementation of BF16 based gemv
1. Add a new API -- sbgemv to support bfloat16 based gemv
2. Implement a generic kernel for sbgemv
3. Implement an avx512-bf16 based kernel for sbgemv
Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
5 years ago
Martin Kroeker
67f39ad813
Merge pull request #2939 from thrasibule/Makefile_cleanup
reuse variables defined in Makefile.system
5 years ago
Martin Kroeker
6e13a7e99e
Merge pull request #2951 from martin-frbg/cleanup_make
Minor Makefile cleanup
5 years ago
Martin Kroeker
2207a16235
Merge pull request #2952 from martin-frbg/issue2931
Try to read cpu ID from /sys/devices/.../cpu0 if HWCAP_CPUID fails
5 years ago