51 Commits (d23680b81d5179ce6ae1ca5546303b81646ecac1)

Author SHA1 Message Date
  Masato Nakagawa 7e29f11396 Multi-thread GEMM Performance Improvement on NeoverseV1 (DIVIDE_RATE=1) 2 months ago
  Chris Sidebottom 48394384ef Use correct constants for per-target BGEMM/SBGEMM 2 months ago
  Chris Sidebottom 7a97c4ca97 Rename HALF -> BFLOAT16 in some more places 2 months ago
  Martin Kroeker 20f2ba0141 Move declaration of i for pre-C99 compilers 4 months ago
  Masato Nakagawa 2351a98005 Update 2D thread-partitioned GEMM for M << N case. 4 months ago
  Masato Nakagawa 80d3c2ad95 Add Improving Load Imbalance in Thread-Parallel GEMM 6 months ago
  Martin Kroeker 77c638db67 Revert "Fix potential inaccuracy in multithreaded level3 related to SWITCH_RATIO" 7 months ago
  John Hein 6cd9bbe531 fix signedness of pointer to integer type passed to blas_lock() 8 months ago
  Martin Kroeker 8a1710dd0d don't apply switch_ratio to tail of loop 1 year ago
  shivammonaka 9e22d70957 Dynamic locking in Pthread Backend to allow multiple BLAS calls to be executed parallelly 1 year ago
  yamazaki-mitsufumi 51ab1903e7 Expanding the scop of 2D thread distribution 1 year ago
  shivammonaka d49ebc54e1 Merge branch 'shivam-develop' into shivam-Locks 1 year ago
  shivammonaka bc191015e3 Using OpenMP locks with NUM_PARALLEL 1 year ago
  Chris Sidebottom 32f2fafde7 Propagate SWITCH_RATIO to DYNAMIC_ARCH builds 2 years ago
  Honglin Zhu 4989e039a5 Define SBGEMM_ALIGN_K for DYNAMIC_ARCH build 2 years ago
  Honglin Zhu b00d5b9746 New sbgemm implementation for Neoverse N2 2 years ago
  Wangyang Guo 3dc6052c7e initial support for Sapphire Rapids platform 4 years ago
  Martin Kroeker a554712439 remove extra/intermediate size step for min_jj introduced in PR747 4 years ago
  Chen, Guobing e740c4873d Enable COOPERLAKE build target 5 years ago
  Rajalakshmi Srinivasaraghavan 7eb55504b1 RFC : Add half precision gemm for bfloat16 in OpenBLAS 5 years ago
  Ali Saidi 97ce6bbce2 Fix barriers in level3_thread 5 years ago
  wjc404 77b8f49556 Update level3_thread.c 5 years ago
  Martin Kroeker f72fdf525c Merge pull request #1875 from martin-frbg/issue1851 6 years ago
  Martin Kroeker 113cb00b95 fix missing parenthesis 6 years ago
  Martin Kroeker 5192651706 Add CriticalSection handling instead of mutexes for Windows 6 years ago
  Martin Kroeker 2e6fae2aad Serialize accesses to parallelized level3 functions from multiple callers 6 years ago
  Arjan van de Ven 5b708e5eb1 sgemm/dgemm: add a way for an arch kernel to specify prefered sizes 7 years ago
  Martin Kroeker 5f2a3c05cd Revert "Rewrite &= -> = and simplify the initial blocking phase." 7 years ago
  Craig Donner 0144068537 Rewrite &= -> = and simplify the initial blocking phase. 7 years ago
  Arjan van de Ven 73de17664d Add missing barriers in gemm scheduler 7 years ago
  Arjan van de Ven d148ec4ea1 Don't use _Atomic for jobs sometimes... 7 years ago
  Arjan van de Ven 9e162146a9 Only initialize the part of the jobs array that will get used 7 years ago
  Zhiyong Dang 3716267124 Change _STDC_VERSION__ to __STDC_VERSION__ 7 years ago
  Martin Kroeker 6a99fcce94 Use _Atomic instead of volatile for thread safety where C11 is supported 7 years ago
  Andrew 11a627c54e remove surplus parentheses to silence clang5 7 years ago
  Tim Moon 30486a356c Reduce number of data partitions in n. 8 years ago
  Tim Moon 9de52b489a Cleaning up and documenting multi-threaded GEMM code. 8 years ago
  Tim Moon 860dcfc703 Use 2D thread distribution for small GEMMs. 8 years ago
  Tim Moon 6aaa107865 Reducing threads for multi-threaded GEMMs on small matrices. 8 years ago
  Werner Saar a2672d5589 prepared driver/level3 functions for UNROLL values, that are not a power of two 8 years ago
  Werner Saar b07d733a71 added updates for syrk and syr2k 9 years ago
  Ralph Campbell fbc21266e6 Minor C code fixes in driver/ 10 years ago
  wernsaar 1d33547222 optimized zgemm kernel for haswell 11 years ago
  Timothy Gu 6c2ead30f0 Remove all trailing whitespace except lapack-netlib 11 years ago
  wernsaar c947ab85dc changed level3.c 12 years ago
  wernsaar 2840d56aeb added dgemm_kernel for Piledriver 12 years ago
  Zhang Xianyi 32d2ca3035 Refs #214, #221, #246. Fixed the getrf overflow bug on Windows. 12 years ago
  wernsaar 6f008abcef replaced defined(DOUBLE) by !defined(XDOUBLE) 12 years ago
  Zhang Xianyi 5d3312142a Refs #221 #246. Fixed the overflowing stack bug in mutlithreading BLAS3. 12 years ago
  wernsaar 25491e42f9 New dgemm kernel for BULLDOZER: dgemm_kernel_8x2_bulldozer.S 12 years ago