82 Commits (9a64b32b44c4cb66c4976f0f678d0914d7343a6c)

Author SHA1 Message Date
  Martin Kroeker 39c90f9859 Merge pull request #5380 from quic/topic/sgemm_direct_sme1_alpha_beta 4 months ago
  Rajendra Prasad Matcha eae0abfdb6 SME1 based direct kernel with alpha and beta for cblas_sgemm level 3 API. 4 months ago
  Chris Sidebottom e105411460 Add infrastructure for bgemv/bscal 4 months ago
  Chris Sidebottom 740efd71c4 Add optimized BGEMM kernel for NEOVERSEV1 target 4 months ago
  Martin Kroeker 343830c26f Add BGEMM parameter tables 4 months ago
  Chris Sidebottom f95e7b0e32 Add infrastructure for BGEMM 4 months ago
  gkdddd 670ec6f757 Added shgemm_kernel_8x8 for RISCV64_ZVL128B and shgemm_kernel_16x8 for RISCV64_ZVL256B 5 months ago
  Martin Kroeker 5141a90993 Fix ARMV9SME target in DYNAMIC_ARCH and add SME query code for MacOS (#5222) 6 months ago
  Vaisakh K V f66ca05b31 Merge branch 'develop' into topic/sgemm_direct_sme1 9 months ago
  Vaisakh K V d23eb3b93e Support for SME1 based sgemm_direct kernel for cblas_sgemm level 3 API 11 months ago
  Martin Kroeker 4924319c50 fix position of srotm, qrotm 10 months ago
  tingbo.liao 3c8df6358f Further rearranged the rotm kernel for the different architectures. 10 months ago
  gxw 48698b2b1d LoongArch64: Rename core 1 year ago
  Mark Ryan 3b715e6162 Add autodetection for riscv64 1 year ago
  Martin Kroeker 93d975d8fd Merge pull request #4593 from XiWeiGu/loongarch_add_buffer_offset 1 year ago
  gxw d8c4ea8793 loongarch: Optimizing the performance of the GEMM on servers 1 year ago
  Chen Yu 8e39c05efd Get the l2 cache size via environment variable on confidential VM 1 year ago
  Honglin Zhu 90f041e348 Invoke the syscall to allow the use of amx tiles 2 years ago
  Martin Kroeker 437c0bf2b4 Merge pull request #3843 from Mousius/switch-ratio 2 years ago
  Chris Sidebottom 32f2fafde7 Propagate SWITCH_RATIO to DYNAMIC_ARCH builds 3 years ago
  Martin Kroeker 38d6fb4225 Fix dependencies in builds with specified subsets of precision types 2 years ago
  Martin Kroeker 5481c328e8 fix DYNAMIC_ARCH builds that use only a subset of precisions 2 years ago
  Martin Kroeker c9d78dc3b2 Remove excess initializer (leftover from rework of PR 3793) 3 years ago
  Honglin Zhu 4989e039a5 Define SBGEMM_ALIGN_K for DYNAMIC_ARCH build 3 years ago
  Honglin Zhu 843e9fd0b9 Fix typo error 3 years ago
  Honglin Zhu b00d5b9746 New sbgemm implementation for Neoverse N2 3 years ago
  gxw fbfe1daf6e LoongArch64: Add DYNAMIC_ARCH support 3 years ago
  Martin Kroeker 40302558ed Remove extraneous (and wrong) definition of sbgemm_r on x86_64 3 years ago
  Martin Kroeker d9894f45d3 Define sbgemm_r to fix DYNAMIC_ARCH builds 3 years ago
  Wangyang Guo 3dc6052c7e initial support for Sapphire Rapids platform 4 years ago
  Wangyang Guo 1d83ca4bca Small Matrix: support BFLOAT16 data type 4 years ago
  Wangyang Guo 478d1086c1 Small Matrix: support DYNAMIC_ARCH build 4 years ago
  gxw 4b548857d6 Add msa support for loongson 5 years ago
  Chen, Guobing a7b1f9b1bb Implementation of BF16 based gemv 5 years ago
  Martin Kroeker 10379fc83b Use ifdef instead of if 5 years ago
  Martin Kroeker 3aecafad80 Change "HALF" and "sh" to "BFLOAT16" and "sb" 5 years ago
  Martin Kroeker 6b6adf8a4a Allow compiling only a subset of kernels for specific variable types 5 years ago
  Martin Kroeker dfbc62ef7e Support building only a subset of types 5 years ago
  Chen, Guobing deaeb6c5b8 Add bfloat16 based dot and conversion with single/double 5 years ago
  Martin Kroeker 9ee21a0a39 Merge pull request #2780 from Guobing-Chen/CPL_build_support 5 years ago
  Martin Kroeker 75eeb265d7 [WIP] Refactor the driver code for direct SGEMM (#2782) 5 years ago
  Chen, Guobing e740c4873d Enable COOPERLAKE build target 5 years ago
  Martin Kroeker 5dd14e3d48 Make building the bfloat16 functions conditional on option BUILD_HALF (#2590) 5 years ago
  Rajalakshmi Srinivasaraghavan 67cc4b9e16 Fix warnings in clang and export symbol 5 years ago
  Rajalakshmi Srinivasaraghavan a87793e03c Fix DYNAMIC_ARCH compilation errors 5 years ago
  Rajalakshmi Srinivasaraghavan 7eb55504b1 RFC : Add half precision gemm for bfloat16 in OpenBLAS 5 years ago
  int_13h 96ad579428 add in runtime cpu detection for zarch (#2349) 5 years ago
  Martin Kroeker ccfb7ead15 Merge pull request #2072 from martin-frbg/sum 6 years ago
  Rashmica Gupta bcdf1d4917 Add in runtime CPU detection for POWER. 6 years ago
  Martin Kroeker b9f4943a14 Add ?sum 6 years ago