Martin Kroeker
89ae305e11
Workaround for cmake having its own C_COMPILER variable
4 years ago
Martin Kroeker
b716c0ef01
Add workaround for NVIDIA HPC
4 years ago
Martin Kroeker
2efa3b70dc
Add workaround for NVIDIA HPC
4 years ago
Martin Kroeker
49959d4f1c
Add workaround for NVIDIA HPC
4 years ago
Martin Kroeker
0f27a03607
Add workaround for NVIDIA HPC mishandling of the asm DOT kernels
4 years ago
Martin Kroeker
c2a8ebfe69
Add workaround for NVIDIA HPC mishandling of the asm DOT kernels
4 years ago
Martin Kroeker
43aac5bacc
Support NVIDIA HPC compiler
4 years ago
Martin Kroeker
bff2b7c94d
Support compilation with NVIDIA HPC compilers (which do not take gcc-style arch options)
4 years ago
Martin Kroeker
2d45a262d9
Support compilation with nvfortran
4 years ago
Martin Kroeker
018dec8588
Merge pull request #7 from xianyi/develop
rebase
4 years ago
Martin Kroeker
5d6209e1f9
Merge pull request #3055 from RajalakshmiSR/swapp10
Optimize swap function for POWER10
4 years ago
Rajalakshmi Srinivasaraghavan
601b711c78
Optimize swap function for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
4 years ago
Martin Kroeker
78702753f2
Merge pull request #3053 from pkubaj/patch-1
Fix build on FreeBSD/powerpc64le
4 years ago
pkubaj
7aa1ff8ff6
Fix build on FreeBSD/powerpc64le
4 years ago
Martin Kroeker
d6c97cf010
Merge pull request #3052 from ashwinyes/arm64_fix_nrm2
arm64: Fix nrm2 for input vectors with Inf
4 years ago
Ashwin Sekhar T K
1b2508362b
arm64: Fix nrm2 for input vectors with Inf
Fix double precision nrm2 kernels returning NaN when the
input vectors contain Inf/-Inf.
4 years ago
Martin Kroeker
cd898af59f
Merge pull request #3050 from aurel32/riscv64-openblas-supported
getarch.c: define OPENBLAS_SUPPORTED for riscv64
4 years ago
Aurelien Jarno
0a535e58d8
getarch.c: define OPENBLAS_SUPPORTED for riscv64
4 years ago
Martin Kroeker
9ce9e295fe
Merge pull request #3049 from martin-frbg/readme
Expand the introductory paragraph of the README with links to netlib docs and linear algebra lecture videos
4 years ago
Martin Kroeker
9a38592c79
Add pointers to the netlib documentation and Gilbert Strang's linear algebra primers
4 years ago
Martin Kroeker
9b3965b08c
Merge pull request #6 from xianyi/develop
rebase
4 years ago
Martin Kroeker
531cb4f673
Merge pull request #3035 from Joshua-Ashton/patch-1
Define BLAS acronym in README
4 years ago
Martin Kroeker
3559c5d7a2
Merge pull request #3048 from martin-frbg/issue2998
Temporarily revert to the old NRM2 kernels for ThunderX2/3 and NeoverseN1
4 years ago
Martin Kroeker
8631e2976a
Temporarily revert to the old nrm2 kernels
4 years ago
Martin Kroeker
2768bc1764
Temporarily revert to the old nrm2 kernels
4 years ago
Martin Kroeker
6f4698ee1f
Temporarily revert to the old nrm2 kernel
4 years ago
Martin Kroeker
85e5165e98
Merge pull request #3046 from martin-frbg/nvidiasdk-ppc
Support NVIDIA HPC SDK on POWERPC
4 years ago
Martin Kroeker
17c16f2a71
Implement builtin_cpu_is and limit cpu choices to P8 and P9 for NVIDIA compilers
4 years ago
Martin Kroeker
91c3f86c2b
NVIDIA compiler does not yet support POWER10
4 years ago
Martin Kroeker
75b1f3becc
Limit POWERPC DYNAMIC_CORE list to P8 and P9 for NVIDIA compilers
4 years ago
Martin Kroeker
07c5e549b2
Merge pull request #3045 from martin-frbg/nvidiasdk
Support NVIDIA HPC SDK 20.11 compilers on x86_64
4 years ago
Martin Kroeker
114eb159a4
Disable FMA intrinsics in the srot kernel when the compiler is PGI/NVIDIA
4 years ago
Martin Kroeker
005cce5507
Amend SkylakeX options to support the NVIDIA compiler
4 years ago
Martin Kroeker
b859b6e79d
Add nvfortran
4 years ago
Martin Kroeker
b212a2fb9f
Add/modify "PGI" compiler options for NVIDIA SDK 20.11
4 years ago
Martin Kroeker
e40416567a
Add version printout for PGI/NVIDIA compiler
4 years ago
Martin Kroeker
b37e5fa2f8
Merge pull request #5 from xianyi/develop
rebase
4 years ago
Martin Kroeker
326469ef4a
Merge pull request #3042 from martin-frbg/develop
Move FMA3 option setting to the kernel makefile
4 years ago
Martin Kroeker
c73d8ee40d
Conditionally add -mfma to compiler options where needed
4 years ago
Martin Kroeker
abef2ea770
Move -fma option setting to kernel/Makefile.L1
4 years ago
Martin Kroeker
b26e32c3af
Merge pull request #3040 from martin-frbg/fixfcheck
Fix undefined CC variable in check for clang+gfortran combo
4 years ago
Martin Kroeker
7822eff936
Merge pull request #3038 from martin-frbg/issue3037
Fix spurious assumption of cross-compilation on some architectures
4 years ago
Martin Kroeker
b03dc011be
Fix undefined CC variable in clang check
4 years ago
Martin Kroeker
00ce35336e
Fix spurious removal of a trailing character from the hostarch string on x86_64
4 years ago
Martin Kroeker
723776ddf7
Merge pull request #4 from xianyi/develop
rebase
4 years ago
Martin Kroeker
5a77ec7f1c
Merge pull request #3036 from RajalakshmiSR/p10copyalign
POWER10: Improve copy performance
4 years ago
Rajalakshmi Srinivasaraghavan
2fb11f873b
POWER10: Improve copy performance
This patch aligns the stores to a 32-byte boundary for scopy and dcopy
before entering the vector pair loop. For ccopy, it changes the store
instructions to stxv to improve performance in unaligned cases.
4 years ago
Joshie
ad63647446
Define BLAS acronym in README
4 years ago
Martin Kroeker
87315e8a8d
Update version to 0.3.13.dev
4 years ago
Martin Kroeker
9031ebd7d5
Update version to 0.3.13.dev
4 years ago