132 Commits (develop)

Author SHA1 Message Date
  Masato Nakagawa 7e29f11396 Multi-thread GEMM Performance Improvement on NeoverseV1 (DIVIDE_RATE=1) 2 months ago
  youcai 41f9701ebc Fix cmake building with cblas_bgemm 2 months ago
  Chris Sidebottom 48394384ef Use correct constants for per-target BGEMM/SBGEMM 2 months ago
  Chris Sidebottom f95e7b0e32 Add infrastructure for BGEMM 3 months ago
  Chris Sidebottom 7a97c4ca97 Rename HALF -> BFLOAT16 in some more places 2 months ago
  Masato Nakagawa 5253c8f165 Multi-thread Performance Improvement of GEMM with DIVIDE_RATE=1 for 3 months ago
  Srangrang 9f13b2c6ac style: modify HALF to BFLOAT16 in benchmark folder 3 months ago
  gkdddd 670ec6f757 Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B 4 months ago
  Martin Kroeker 20f2ba0141 Move declaration of i for pre-C99 compilers 4 months ago
  Masato Nakagawa 2351a98005 Update 2D thread-partitioned GEMM for M << N case. 4 months ago
  Ruiyang Wu 02fd1df10b CMake: Pass `OpenMP` compiler and linker flags through CMake targets 6 months ago
  Masato Nakagawa 80d3c2ad95 Add Improving Load Imbalance in Thread-Parallel GEMM 6 months ago
  Martin Kroeker 77c638db67 Revert "Fix potential inaccuracy in multithreaded level3 related to SWITCH_RATIO" 7 months ago
  John Hein 6cd9bbe531 fix signedness of pointer to integer type passed to blas_lock() 8 months ago
  Martin Kroeker 8a1710dd0d don't apply switch_ratio to tail of loop 1 year ago
  shivammonaka 9e22d70957 Dynamic locking in Pthread Backend to allow multiple BLAS calls to be executed parallelly 1 year ago
  Martin Kroeker db070a9223 add gemm_batch drivers 1 year ago
  Martin Kroeker d0794f88dc add gemm_batch driver 1 year ago
  yamazaki-mitsufumi 51ab1903e7 Expanding the scop of 2D thread distribution 1 year ago
  shivammonaka d49ebc54e1 Merge branch 'shivam-develop' into shivam-Locks 1 year ago
  shivammonaka bc191015e3 Using OpenMP locks with NUM_PARALLEL 1 year ago
  Martin Kroeker c4bd4a2e5d fix improper function prototypes (empty parentheses) 2 years ago
  Chris Sidebottom 32f2fafde7 Propagate SWITCH_RATIO to DYNAMIC_ARCH builds 2 years ago
  Honglin Zhu 4989e039a5 Define SBGEMM_ALIGN_K for DYNAMIC_ARCH build 2 years ago
  Honglin Zhu b00d5b9746 New sbgemm implementation for Neoverse N2 2 years ago
  Wangyang Guo 3dc6052c7e initial support for Sapphire Rapids platform 4 years ago
  Martin Kroeker 2f8220d757 Add sbgemm 4 years ago
  Martin Kroeker 307c4c0786 Fix typo 4 years ago
  Martin Kroeker e83df93975 Work around another recent macro name collision with winnt.h 4 years ago
  Martin Kroeker a554712439 remove extra/intermediate size step for min_jj introduced in PR747 4 years ago
  Martin Kroeker 5d26223f4a remove extra/intermediate size step of min_jj from PR747 4 years ago
  Martin Kroeker d3ff1f889f Convert ifndefs to ifneq 4 years ago
  Rajalakshmi Srinivasaraghavan b5d30b390d Fix build issues with bfloat16 5 years ago
  Martin Kroeker 006c7f6671 Change "HALF" and "sh" to "BFLOAT16" and "sb" 5 years ago
  Martin Kroeker 886a8e3190 Adapt for supporting only a subset of variable types 5 years ago
  Martin Kroeker ac653c94f3 Merge branch 'develop' into issue2588-cmake 5 years ago
  Martin Kroeker 988a6f429e Add BUILD_vartype defines 5 years ago
  Martin Kroeker e5e2fbd593 Support building only selected types 5 years ago
  y00512012 06cf73a239 fix a bug of trmm 5 years ago
  Martin Kroeker ddec244a5a Merge pull request #2838 from austinpagan/gordon_trmm 5 years ago
  fossum dfeca46098 Adding performance patch for trmm, just like #2836 5 years ago
  fossum 274d6e015b Fixing a performance bug in trsm_[LR].c. 5 years ago
  Martin Kroeker 330044d821 Fix potentiol domain error in sqrt 5 years ago
  Chen, Guobing e740c4873d Enable COOPERLAKE build target 5 years ago
  Martin Kroeker ce45af8151 Update conditional for atomics to use HAVE_C11 5 years ago
  Martin Kroeker 6f38de06d2 Update conditional for atomics to use HAVE_C11 5 years ago
  Martin Kroeker 5dd14e3d48 Make building the bfloat16 functions conditional on option BUILD_HALF (#2590) 5 years ago
  Rajalakshmi Srinivasaraghavan 7eb55504b1 RFC : Add half precision gemm for bfloat16 in OpenBLAS 5 years ago
  Ali Saidi 97ce6bbce2 Fix barriers in level3_thread 5 years ago
  wjc404 2f96a2c55b Update trmm_R.c 5 years ago