799 Commits (develop)

Author SHA1 Message Date
  Masato Nakagawa 7e29f11396 Multi-thread GEMM Performance Improvement on NeoverseV1 (DIVIDE_RATE=1) 3 months ago
  youcai 41f9701ebc Fix cmake building with cblas_bgemm 3 months ago
  Chris Sidebottom e105411460 Add infrastructure for bgemv/bscal 4 months ago
  Martin Kroeker b37516add6 Add BGEMM parameters 4 months ago
  Chris Sidebottom 48394384ef Use correct constants for per-target BGEMM/SBGEMM 4 months ago
  Chris Sidebottom f95e7b0e32 Add infrastructure for BGEMM 4 months ago
  Martin Kroeker 3d31887073 Merge pull request #5362 from Mousius/fix-bf16 4 months ago
  Martin Kroeker 0ddf8ebd42 Merge pull request #5354 from pratiklp00/p11 4 months ago
  Chris Sidebottom 7a97c4ca97 Rename HALF -> BFLOAT16 in some more places 4 months ago
  Masato Nakagawa 5253c8f165 Multi-thread Performance Improvement of GEMM with DIVIDE_RATE=1 for 4 months ago
  Martin Kroeker 8f0a1a3f82 Merge pull request #5303 from martin-frbg/issue5289 4 months ago
  Martin Kroeker 9bcffbd655 Declare the server_lock mutex volatile in addition to static 4 months ago
  pratiklp00 1dde4a13c0 p11 changes 4 months ago
  zhoupeng 134b21ae60 Fix some hyperthreading errors. 5 months ago
  Martin Kroeker d96daa220d Merge pull request #5290 from Srangrang/develop 4 months ago
  Martin Kroeker e541bf68f5 support AmpereOne/OneA as NeoverseN1 5 months ago
  Srangrang 9f13b2c6ac style: modify HALF to BFLOAT16 in benchmark folder 5 months ago
  Martin Kroeker 31ef2cbbb3 Exit if memory allocation keeps failing, instead of looping forever 5 months ago
  gkdddd 670ec6f757 Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B 5 months ago
  Martin Kroeker 20f2ba0141 Move declaration of i for pre-C99 compilers 6 months ago
  Masato Nakagawa 2351a98005 Update 2D thread-partitioned GEMM for M << N case. 6 months ago
  Martin Kroeker 5141a90993 Fix ARMV9SME target in DYNAMIC_ARCH and add SME query code for MacOS (#5222) 6 months ago
  Ruiyang Wu 02fd1df10b CMake: Pass `OpenMP` compiler and linker flags through CMake targets 8 months ago
  Masato Nakagawa 80d3c2ad95 Add Improving Load Imbalance in Thread-Parallel GEMM 8 months ago
  Martin Kroeker 39eb43d441 Improve thread safety of pthreads builds that rely on C11 atomic operations for locking (#5170) 8 months ago
  Martin Kroeker 1533fe49be Merge pull request #5144 from taoye9/dispatch_neoversve2_to_neoversven2 8 months ago
  Ye Tao f0bea79a6e dispatch NEOVERSEV2 to NEOVERSEN2 under dynamic setting 9 months ago
  Martin Kroeker eb84aac7ad Merge pull request #5084 from quic/topic/sgemm_direct_sme1 9 months ago
  Martin Kroeker 77c638db67 Revert "Fix potential inaccuracy in multithreaded level3 related to SWITCH_RATIO" 9 months ago
  Vaisakh K V f66ca05b31 Merge branch 'develop' into topic/sgemm_direct_sme1 9 months ago
  Vaisakh K V d23eb3b93e Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API 11 months ago
  John Hein 6cd9bbe531 fix signedness of pointer to integer type passed to blas_lock() 9 months ago
  Martin Kroeker a182251284 fix typo 10 months ago
  Martin Kroeker ed95791618 fix conflicting variables 10 months ago
  Martin Kroeker 3c3d1c4849 Identify all cores and select the most performant one as TARGET 10 months ago
  Ralf Gommers 765ad8bcd2 Fix guard around `alloc_hugetlb`, fixes compile warning 11 months ago
  Ralf Gommers 48caf2303d Fix build warning about discarding volatile qualifier in memory.c 11 months ago
  Martin Kroeker 4060dd43e3 Add dummy implementations of openblas_get/set_affinity 1 year ago
  Martin Kroeker 8a1710dd0d don't apply switch_ratio to tail of loop 1 year ago
  Martin Kroeker de421b7764 Merge pull request #4904 from XiWeiGu/la64_cross_cmake 1 year ago
  gxw 30af9278dc LoongArch64: Enable cmake cross-compilation 1 year ago
  gxw 48698b2b1d LoongArch64: Rename core 1 year ago
  Martin Kroeker 3ee9e9d8d0 Merge pull request #4879 from martin-frbg/issue4868-2 1 year ago
  Martin Kroeker a8d6b0219a Merge pull request #4877 from XiWeiGu/fixed_undefined_blas_set_parameter 1 year ago
  Martin Kroeker d24b3cf393 properly fix buffer allocation and assignment 1 year ago
  gxw fd033467ac Fixed the undefined reference to blas_set_parameter 1 year ago
  Martin Kroeker 23b5d66a86 Ensure a memory buffer has been allocated for each thread before invoking it 1 year ago
  Martin Kroeker 753c7ebe17 Merge pull request #4835 from martin-frbg/revertwin4359 1 year ago
  Martin Kroeker 50397e017a Merge pull request #4838 from martin-frbg/fix4662-3 1 year ago
  Martin Kroeker 5257f807a9 fix invalid ifdef syntax in HUGETLB handling 1 year ago