Martin Kroeker
d85b24e103
Clean up STACKSIZE redefinition
5 years ago
Martin Kroeker
7d6c85f9da
Add compiler option -mmma for POWER10
5 years ago
Martin Kroeker
89eea6b455
Merge pull request #102 from xianyi/develop
rebase
5 years ago
Martin Kroeker
0ac6102708
Update version string to 0.3.11.dev
5 years ago
Martin Kroeker
26a701f4ad
Update version string to 0.3.11.dev
5 years ago
Martin Kroeker
fcd0fa1a3a
Merge pull request #2908 from xianyi/release-0.3.0
Synchronise tag with release 0.3.11
5 years ago
Martin Kroeker
51c22612eb
Merge pull request #2907 from xianyi/develop
Update from develop for 0.3.11
5 years ago
Martin Kroeker
b8f689200e
Update version number to 0.3.11
5 years ago
Martin Kroeker
fe9015b619
Update version for 0.3.11 release
5 years ago
Martin Kroeker
f99b8c1502
Merge pull request #2906 from martin-frbg/changelog-0311
Update Changelog.txt with the 0.3.11 changes
5 years ago
Martin Kroeker
5381a18056
Update Changelog.txt with the 0.3.11 changes
5 years ago
Martin Kroeker
e35576c6fc
Merge pull request #2905 from martin-frbg/aocc-clang
Add -mavx for clang & aocc
5 years ago
Martin Kroeker
f1bb85d378
Add AVX flags for clang/aocc as well
5 years ago
Martin Kroeker
25907e672b
Merge pull request #101 from xianyi/develop
rebase
5 years ago
Martin Kroeker
9789375389
Merge pull request #2900 from martin-frbg/fixcmake_sse
Add compiler options for SSE to the cmake support files
5 years ago
Martin Kroeker
f64243ff57
Add compiler options for sse/sse2/ssse3/sse4.1
5 years ago
Martin Kroeker
786c0a3ce8
Add sse options for use of intrinsics with older compilers
5 years ago
Martin Kroeker
df70667043
fix core list for sse/sse2
5 years ago
Martin Kroeker
e6c5b13a18
Merge pull request #2898 from martin-frbg/morefixes
More pre-release fixes
5 years ago
Martin Kroeker
f071d1207a
add sse2
5 years ago
Martin Kroeker
dc6cefd2f5
Expressly enable -msse for 32bit DYNAMIC_ARCH kernels
5 years ago
Martin Kroeker
c339c40c01
Silence a redefinition warning
5 years ago
Martin Kroeker
ac8af9cec6
Add -msse where supported, apparently required for older gcc
5 years ago
Martin Kroeker
10379fc83b
Use ifdef instead of if
5 years ago
Martin Kroeker
a85ac71633
Merge pull request #100 from xianyi/develop
rebase
5 years ago
Martin Kroeker
4c25910da0
Merge pull request #2896 from martin-frbg/intrin-double
Add compiler flag for SSE4 where available
5 years ago
Martin Kroeker
9b9ee92d5f
Merge pull request #2897 from Qiyu8/usimd-double
Add double precision universal intrinsics for X86/ARM
5 years ago
Martin Kroeker
ae6ac83991
Revert "add double precision SSE"
5 years ago
Qiyu8
4fac91ef37
adapt arm platform
5 years ago
Qiyu8
bfdf4b56da
Add double precision universal intrinsics for X86/ARM
5 years ago
Martin Kroeker
ebf0470fc2
add sse4.1 for DYNAMIC_ARCH kernels
5 years ago
Martin Kroeker
ca160bb440
Add -msse4.1 when SSE4.1 is supported
5 years ago
Martin Kroeker
c9c3ae07af
Add double precision operations
5 years ago
Martin Kroeker
a897bc3bd2
Merge pull request #99 from xianyi/develop
rebase
5 years ago
Martin Kroeker
756802df61
Merge pull request #2890 from martin-frbg/s-d-sum
Revert special handling of Windows xNRM2 and enable C+intrinsics kern…
5 years ago
Martin Kroeker
01492decf4
Merge pull request #2895 from martin-frbg/sb-tests
Fix remaining build errors related to bfloat16 and cmake
5 years ago
Martin Kroeker
bd0752444a
Merge pull request #2894 from RajalakshmiSR/bf16_packing
POWER10: Change the packing format for bfloat16
5 years ago
Martin Kroeker
c1f4f5d4e7
Replace Makefile with simplified version again
5 years ago
Martin Kroeker
75e3a92df6
Add express -mavx and -msse options (and fix a stray = for cooperlake)
5 years ago
Martin Kroeker
2a329baa81
Add the BFLOAT16 functions to cmake builds
5 years ago
Rajalakshmi Srinivasaraghavan
0826d68f93
POWER10: Change the packing format for bfloat16
As the new MMA instructions need the inputs in 4x2 order for bfloat16,
this patch changes the format in the copy/packing code. This avoids permute
instructions in the gemm kernel inner loop.
5 years ago
Martin Kroeker
4bb73c0171
Rename "HALF" type to "BFLOAT16"
5 years ago
Martin Kroeker
bc5c7f9578
Cleanup
5 years ago
Martin Kroeker
437b7fe261
sh prefix renamed to sb
5 years ago
Martin Kroeker
a0ada4bcb8
Merge pull request #98 from xianyi/develop
rebase
5 years ago
Martin Kroeker
602a0c7a69
Merge pull request #2892 from RajalakshmiSR/bf16_make
Fix build issues with bfloat16
5 years ago
Rajalakshmi Srinivasaraghavan
b5d30b390d
Fix build issues with bfloat16
This patch fixes compilation errors due to recent renaming from SH to SB
with BUILD_BFLOAT16.
5 years ago
Martin Kroeker
137ae618db
Fix typo
5 years ago
Martin Kroeker
9e3cff5cf2
Expressly enable -mavx2 on Zen, SkylakeX and Cooperlake as well
5 years ago
Martin Kroeker
d85b968424
Merge pull request #2891 from martin-frbg/fix-2886
Fix several bugs and omissions from the BFLOAT16 rename
5 years ago