Martin Kroeker
751d127d7c
Include cblas_test.h to achieve int/long size change with INTERFACE64
4 years ago
Martin Kroeker
fc101b67e5
Merge pull request #23 from xianyi/develop
rebase
4 years ago
Martin Kroeker
b0239a05fd
Merge pull request #3183 from martin-frbg/2715-x
Restore __volatile__ keyword in ARM64 DYNAMIC_ARCH detection mechanism
4 years ago
Martin Kroeker
623d580b4c
Restore __volatile__ keyword
4 years ago
Martin Kroeker
974acb39ff
Merge pull request #3181 from RajalakshmiSR/dgemmp10vp
POWER10: Improve dgemm performance
4 years ago
Rajalakshmi Srinivasaraghavan
2379abaa5e
POWER10: Improve dgemm performance
This patch uses a vector pair pointer for the input load operations,
which helps to generate POWER10 lxvp instructions.
4 years ago
Martin Kroeker
3caf781d7c
Merge pull request #3179 from RajalakshmiSR/zgemvp10
POWER10: Optimized zgemv
4 years ago
Rajalakshmi Srinivasaraghavan
55bb9f639a
POWER10: Optimized zgemv
This patch makes use of the Matrix-Multiply Assist (MMA)
feature introduced in POWER ISA v3.1 for zgemv_n and zgemv_t.
4 years ago
Martin Kroeker
0dba04bb58
Merge pull request #3178 from martin-frbg/fix2864
Fix unwanted fallback to implicit typing in slanv2/dlanv2
4 years ago
Martin Kroeker
e96f5e3c65
Fix implicit typing of new variable TWO
4 years ago
Martin Kroeker
558724e99f
Fix implicit typing of new variable TWO
4 years ago
Martin Kroeker
067c96a873
Merge pull request #3177 from martin-frbg/issue3176
Use "old" compute(24) function with clang due to register limitations
4 years ago
Martin Kroeker
4b380c0b40
Merge pull request #3175 from LYP951018/develop
Pass NO_AVX512 macro def when `DYNAMIC_ARCH` is enabled
4 years ago
Martin Kroeker
2dfb24730d
Use "old" compute(24) function with clang due to register limitations
4 years ago
刘雨培
725432efaa
pass NO_AVX512 macro def
4 years ago
Martin Kroeker
a2216ef19f
Merge pull request #3173 from martin-frbg/dyna-sse3
Fix spillover of host-specific build flags into the shared part of x86 DYNAMIC_ARCH builds
4 years ago
Martin Kroeker
5332cbae18
Avoid adding host-specific cpuflags to the common part of DYNAMIC_ARCH builds
4 years ago
Martin Kroeker
209b026e46
Merge pull request #3172 from martin-frbg/lapack477-final
Copy missing fixes from the final revision of Reference-LAPACK PR477
4 years ago
Martin Kroeker
1ae607beca
Update Makefile.x86_64
4 years ago
Martin Kroeker
d393f1923f
Fix spillover of host-specific build flags into the shared part of DYNAMIC_ARCH builds with gmake
for #3139
4 years ago
Martin Kroeker
081d5ae971
Fix typo and potentially undefined variables
(copies fixes made in Reference-LAPACK PR 477 after the initial cherry-pick)
4 years ago
Martin Kroeker
0492f0f3f9
Merge pull request #22 from xianyi/develop
rebase
4 years ago
Martin Kroeker
147e0a75fd
Merge pull request #3170 from CodesWithWolves/sgemm_tcopy_16-invalid-read
Remove Unnecessary/Erroneous Adds/Reads In sgemm_tcopy_16.S COPY1x8 Macro
4 years ago
Martin Kroeker
ee068af843
Merge pull request #3171 from RajalakshmiSR/BE_p10
POWER10: Adding check for little endian
4 years ago
Rajalakshmi Srinivasaraghavan
2dbcddd83d
POWER10: Adding check for little endian
This patch makes sure that the recent POWER10 patches are used
only on little-endian systems.
4 years ago
CodesWithWolves
d2bda3b56a
Remove Unnecessary/Erroneous Reads In sgemm_tcopy_16.S COPY1x8 Macro
There appears to be leftover code copied from the COPY2x8 macro
above: we read 8 bytes into each of d4-d7 directly after
reading 4 bytes into each of s4-s7. These 32 bytes in d4-d7 are unused
and can overrun the boundary of allocated memory. Valgrind detected
this on a 128,1 copy, which is what drew my attention to it.
Additionally, there is no need to update the addresses stored in A0-A7,
as the only possible paths after running this macro either overwrite
A0-A7 when looping to the next 8 rows, or overwrite A0-A3 when moving
to 4 rows, in which case A4-A7 are unused.
4 years ago
Martin Kroeker
903fd85c85
Merge pull request #3167 from xianyi/fix3126
Fix compilation of the benchmarks on older OSX versions
4 years ago
Martin Kroeker
d57c681a6d
Fix compilation on older OSX versions
4 years ago
Martin Kroeker
d7efe5857c
Merge pull request #3165 from martin-frbg/azure-osx
Add OSX build to Azure
4 years ago
Martin Kroeker
8fd694c18f
Update .travis.yml
4 years ago
Martin Kroeker
e69b0b1771
Update azure-pipelines.yml
4 years ago
Martin Kroeker
9dc0bfd617
Update azure-pipelines.yml
4 years ago
Martin Kroeker
e6664ec2c9
Update azure-pipelines.yml
4 years ago
Martin Kroeker
dbb33f412f
Update azure-pipelines.yml
4 years ago
Martin Kroeker
70b89a6205
Add OSX build to Azure
4 years ago
Martin Kroeker
07b144855a
Merge pull request #3164 from martin-frbg/travisosxomp
Fix xcode12 build on Travis and add OSX/OpenMP job
4 years ago
Martin Kroeker
292a0aed66
Fix xcode12 build and add OSX/OpenMP
4 years ago
Martin Kroeker
42f0201e21
Merge pull request #20 from xianyi/develop
rebase
4 years ago
Martin Kroeker
22db876d48
Merge pull request #3158 from austinpagan/Gemm.CZPQ
Changed default P/Q values for CGEMM and ZGEMM (Power10 only)
4 years ago
Martin Kroeker
bdd6e3a153
Merge pull request #3157 from martin-frbg/issue3020-final
Add workaround for LAPACK testsuite failures with the NVIDIA HPC compiler on PPC
4 years ago
Martin Kroeker
7b8f580941
Merge pull request #3156 from martin-frbg/omatcopy_d
Move x86_64 DOMATCOPY_RT back to the C implementation
4 years ago
Gordon Fossum
198adea961
Changed default P/Q values for CGEMM and ZGEMM (Power10 only)
4 years ago
Martin Kroeker
86c5a0013f
Add workaround for LAPACK testsuite failures with the NVIDIA HPC compiler
4 years ago
Martin Kroeker
ef85c22474
Add workaround for LAPACK test failures with the NVIDIA HPC compiler
4 years ago
Martin Kroeker
d3555d2e50
Add workaround for LAPACK test failures with the NVIDIA HPC compiler
4 years ago
Martin Kroeker
c4b91bfcf1
Merge pull request #3155 from martin-frbg/issue3152
Fix recent SGEMM_direct breakage on SkylakeX and Cooperlake
4 years ago
Martin Kroeker
0f5e86a0d9
Remove premature entry for DOMATCOPY_RT
4 years ago
Martin Kroeker
7b294a99fd
Move common.h back to the top of the file so that SKYLAKEX (from config.h) is defined in time
4 years ago
Martin Kroeker
1e4b2e98d9
Merge pull request #3154 from martin-frbg/issue3153
Fix premature include in getarch_2nd
4 years ago
Martin Kroeker
3fd6ccdf76
Include just the definition of BLASLONG rather than all of common.h
4 years ago