2023 Commits (2d0b2334259d41c2003b51a07580dbd25cfe267c)

Author SHA1 Message Date
  Martin Kroeker c339c40c01 Silence a redefinition warning 5 years ago
  Martin Kroeker 10379fc83b Use ifdef instead of if 5 years ago
  Martin Kroeker 4c25910da0 Merge pull request #2896 from martin-frbg/intrin-double 5 years ago
  damonyu ef8e7d0279 Add the support for RISC-V Vector. 5 years ago
  Martin Kroeker ae6ac83991 Revert "add double precision SSE" 5 years ago
  Qiyu8 4fac91ef37 adapt arm platform 5 years ago
  Qiyu8 bfdf4b56da Add double precision universal intrinsics for X86/ARM 5 years ago
  Martin Kroeker ebf0470fc2 add sse4.1 for DYNAMIC_ARCH kernels 5 years ago
  Martin Kroeker c9c3ae07af Add double precision operations 5 years ago
  Martin Kroeker 756802df61 Merge pull request #2890 from martin-frbg/s-d-sum 5 years ago
  Rajalakshmi Srinivasaraghavan 0826d68f93 POWER10: Change the packing format for bfloat16 5 years ago
  Rajalakshmi Srinivasaraghavan b5d30b390d Fix build issues with bfloat16 5 years ago
  Martin Kroeker fecedc9c69 Add -mssse3 5 years ago
  Martin Kroeker 0eacbca85f Add Haswell and Zen to temporary sse3 whitelist 5 years ago
  Martin Kroeker 6999086a2b whitelist SANDYBRIDGE for SSE3 5 years ago
  Martin Kroeker 8d2df7d066 Revert special handling of Windows xNRM2 and enable C+intrinsics kernel for SSUM/DSUM 5 years ago
  Martin Kroeker 08929430cd Merge pull request #2886 from martin-frbg/issue_2767 5 years ago
  Martin Kroeker 0c84ffe05f Merge pull request #2881 from mattip/fninit 5 years ago
  Matti Picus 403eb513a0 use emms instead, add WIN guards 5 years ago
  Qiyu8 0ed1f07660 Optimize the performance of sum by using universal intrinsics 5 years ago
  Martin Kroeker 3aecafad80 Change "HALF" and "sh" to "BFLOAT16" and "sb" 5 years ago
  Martin Kroeker 756062afa5 Rename "HALF" and "sh" to "BFLOAT16" and "sb" 5 years ago
  Martin Kroeker 2061f7fdff Rename "HALF" and "sh" to "BFLOAT16" and "sb" 5 years ago
  Martin Kroeker dc8a1afa63 Rename "HALF" and "sh" to "BFLOAT16" and "sb" 5 years ago
  Martin Kroeker fd94236042 Rename "HALF" and "sh" to "BFLOAT16" and "sb" 5 years ago
  Martin Kroeker 68ce719fac Rename shdot_microk_cooperlake.c to sbdot_microk_cooperlake.c 5 years ago
  Martin Kroeker d7dd9b396c Rename shdot.c to sbdot.c 5 years ago
  Martin Kroeker 9ae80490e0 rename "HALF" and "sh" to "BFLOAT16" and "sb" 5 years ago
  Martin Kroeker d314d1f49f Rename shgemm_kernel_power10.c to sbgemm_kernel_power10.c 5 years ago
  Martin Kroeker c589c3e2a1 Merge pull request #2882 from martin-frbg/issue2709 5 years ago
  Martin Kroeker ec638a82bf Merge pull request #2852 from martin-frbg/issue2588-cmake 5 years ago
  Martin Kroeker 6b6adf8a4a Allow compiling only a subset of kernels for specific variable types 5 years ago
  Martin Kroeker ac653c94f3 Merge branch 'develop' into issue2588-cmake 5 years ago
  Martin Kroeker 7a53128481 Add whitelist of DYNAMIC_ARCH kernels for which -msse3 needs to be enabled 5 years ago
  Martin Kroeker e1b7123bbe Merge pull request #2867 from Qiyu8/usimd-floatdot 5 years ago
  Qiyu8 f32d34a015 add sse3 compiler flag 5 years ago
  Martin Kroeker 7812486091 Use generic C for D/Z nrm2 kernels on Windows to work around fpu exception bug 5 years ago
  Matti Picus a5b164946c add fninit to reset fpu registers before assembler routines 5 years ago
  User User-User d2333e7842 aarch64 fix std=c18 compilation 5 years ago
  Qiyu8 60e6c68e38 Adapt ARM architect 5 years ago
  Qiyu8 1b1a757f5f Optimize the performance of dot by using universal intrinsics in X86/ARM 5 years ago
  Rajalakshmi Srinivasaraghavan 2df4235e00 Optimize dcopy/zcopy for POWER10 5 years ago
  Martin Kroeker dfbc62ef7e Support building only a subset of types 5 years ago
  Qiyu8 14f7dad3b7 performance improved 5 years ago
  Qiyu8 325b539c26 Optimize the performance of daxpy by using universal intrinsics 5 years ago
  Marius Hillenbrand 22aa81f3e5 s390x: fix cscal and zscal implementations 5 years ago
  Marius Hillenbrand f91057cbad s390x: move common vector definitions and utils into header 5 years ago
  Rajalakshmi Srinivasaraghavan be43d2cb96 Optimize daxpy/zaxpy for POWER10 5 years ago
  Martin Kroeker 91c84e1c01 Merge pull request #2796 from Guobing-Chen/BF16_dot_coversion_apis 5 years ago
  Martin Kroeker e72430fe46 Merge pull request #2803 from xiegengxin/AVX2-asum 5 years ago