799 Commits (57c2936a4359d8c91af567456fa5668ba70c0772)

Author SHA1 Message Date
  Masato Nakagawa 7e29f11396 Multi-thread GEMM Performance Improvement on NeoverseV1 (DIVIDE_RATE=1) 2 months ago
  youcai 41f9701ebc Fix cmake building with cblas_bgemm 2 months ago
  Chris Sidebottom e105411460 Add infrastructure for bgemv/bscal 2 months ago
  Martin Kroeker b37516add6 Add BGEMM parameters 2 months ago
  Chris Sidebottom 48394384ef Use correct constants for per-target BGEMM/SBGEMM 2 months ago
  Chris Sidebottom f95e7b0e32 Add infrastructure for BGEMM 3 months ago
  Martin Kroeker 3d31887073 Merge pull request #5362 from Mousius/fix-bf16 2 months ago
  Martin Kroeker 0ddf8ebd42 Merge pull request #5354 from pratiklp00/p11 2 months ago
  Chris Sidebottom 7a97c4ca97 Rename HALF -> BFLOAT16 in some more places 2 months ago
  Masato Nakagawa 5253c8f165 Multi-thread Performance Improvement of GEMM with DIVIDE_RATE=1 for 3 months ago
  Martin Kroeker 8f0a1a3f82 Merge pull request #5303 from martin-frbg/issue5289 3 months ago
  Martin Kroeker 9bcffbd655 Declare the server_lock mutex volatile in addition to static 3 months ago
  pratiklp00 1dde4a13c0 p11 changes 3 months ago
  zhoupeng 134b21ae60 Fix some hyperthreading errors. 4 months ago
  Martin Kroeker d96daa220d Merge pull request #5290 from Srangrang/develop 3 months ago
  Martin Kroeker e541bf68f5 support AmpereOne/OneA as NeoverseN1 3 months ago
  Srangrang 9f13b2c6ac style: modify HALF to BFLOAT16 in benchmark folder 3 months ago
  Martin Kroeker 31ef2cbbb3 Exit if memory allocation keeps failing, instead of looping forever 3 months ago
  gkdddd 670ec6f757 Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B 4 months ago
  Martin Kroeker 20f2ba0141 Move declaration of i for pre-C99 compilers 4 months ago
  Masato Nakagawa 2351a98005 Update 2D thread-partitioned GEMM for M << N case. 4 months ago
  Martin Kroeker 5141a90993 Fix ARMV9SME target in DYNAMIC_ARCH and add SME query code for MacOS (#5222) 4 months ago
  Ruiyang Wu 02fd1df10b CMake: Pass `OpenMP` compiler and linker flags through CMake targets 6 months ago
  Masato Nakagawa 80d3c2ad95 Add Improving Load Imbalance in Thread-Parallel GEMM 6 months ago
  Martin Kroeker 39eb43d441 Improve thread safety of pthreads builds that rely on C11 atomic operations for locking (#5170) 6 months ago
  Martin Kroeker 1533fe49be Merge pull request #5144 from taoye9/dispatch_neoversve2_to_neoversven2 7 months ago
  Ye Tao f0bea79a6e dispatch NEOVERSEV2 to NEOVERSEN2 under dynamic setting 7 months ago
  Martin Kroeker eb84aac7ad Merge pull request #5084 from quic/topic/sgemm_direct_sme1 7 months ago
  Martin Kroeker 77c638db67 Revert "Fix potential inaccuracy in multithreaded level3 related to SWITCH_RATIO" 7 months ago
  Vaisakh K V f66ca05b31 Merge branch 'develop' into topic/sgemm_direct_sme1 7 months ago
  Vaisakh K V d23eb3b93e Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API 10 months ago
  John Hein 6cd9bbe531 fix signedness of pointer to integer type passed to blas_lock() 8 months ago
  Martin Kroeker a182251284 fix typo 9 months ago
  Martin Kroeker ed95791618 fix conflicting variables 9 months ago
  Martin Kroeker 3c3d1c4849 Identify all cores and select the most performant one as TARGET 9 months ago
  Ralf Gommers 765ad8bcd2 Fix guard around `alloc_hugetlb`, fixes compile warning 9 months ago
  Ralf Gommers 48caf2303d Fix build warning about discarding volatile qualifier in memory.c 9 months ago
  Martin Kroeker 4060dd43e3 Add dummy implementations of openblas_get/set_affinity 10 months ago
  Martin Kroeker 8a1710dd0d don't apply switch_ratio to tail of loop 1 year ago
  Martin Kroeker de421b7764 Merge pull request #4904 from XiWeiGu/la64_cross_cmake 1 year ago
  gxw 30af9278dc LoongArch64: Enable cmake cross-compilation 1 year ago
  gxw 48698b2b1d LoongArch64: Rename core 1 year ago
  Martin Kroeker 3ee9e9d8d0 Merge pull request #4879 from martin-frbg/issue4868-2 1 year ago
  Martin Kroeker a8d6b0219a Merge pull request #4877 from XiWeiGu/fixed_undefined_blas_set_parameter 1 year ago
  Martin Kroeker d24b3cf393 properly fix buffer allocation and assignment 1 year ago
  gxw fd033467ac Fixed the undefined reference to blas_set_parameter 1 year ago
  Martin Kroeker 23b5d66a86 Ensure a memory buffer has been allocated for each thread before invoking it 1 year ago
  Martin Kroeker 753c7ebe17 Merge pull request #4835 from martin-frbg/revertwin4359 1 year ago
  Martin Kroeker 50397e017a Merge pull request #4838 from martin-frbg/fix4662-3 1 year ago
  Martin Kroeker 5257f807a9 fix invalid ifdef syntax in HUGETLB handling 1 year ago