2556 Commits (855945befb1c5855b3739d1200bf89533a82a0d1)

Author SHA1 Message Date
  h-motoki 855945befb Implementing SVE in [SD]AXPY Kernels for A64FX and Graviton3E 1 month ago
  Martin Kroeker 9d6df1dd3e Merge pull request #5422 from ChipKerchner/addRVVVectorizedPacking 1 month ago
  Martin Kroeker f3b2a15fad Merge pull request #5420 from yuanjia111/develop 1 month ago
  Chip Kerchner 64401b4417 Disable vectorized packing for DGEMM - since it is slower than scalar. 1 month ago
  Chip Kerchner c00afc86a6 Add and use vectorized packing to ZVL128B and ZVL256B. Up to 3x+ faster than generic scalar functions. 1 month ago
  yuanjia 803e8d4838 Move the value assignment of vector x in gemv_n_sve.c to the outermost loop to reduce the repeated data retrieval. 1 month ago
  Chris Sidebottom 5f47b872f1 Remove older kernels for BGEMM on NEOVERSEV1 1 month ago
  Chris Sidebottom 114316f361 Optimize SBGEMM / BGEMM for NEOVERSEV1 further 1 month ago
  Martin Kroeker f1ee61ea30 Include NEON header for the bfloat conversion functions 1 month ago
  Martin Kroeker b3ffd5524a Include NEON header for the bfloat conversion functions 1 month ago
  Martin Kroeker 0968dddf1a Merge pull request #5409 from martin-frbg/issue5372 2 months ago
  Martin Kroeker a3b9c933c5 mark xbuffer as volatile to work around gcc15.1 optimizer bug 2 months ago
  Chip Kerchner 72f082f31d Fix bad vector zero initializer and other compiler warnings for RISC-V. 2 months ago
  Martin Kroeker a5e7c0e3e0 Merge pull request #5396 from abhishek-iitmadras/abhishekk_bfloat16 2 months ago
  abhishek-fujitsu 0bc79da587 add neon header 2 months ago
  Chris Sidebottom ea2faf0c9a Add optimized BGEMM for NEOVERSEN2 target 2 months ago
  Chris Sidebottom 2c3cdaf74e Optimized BGEMV for NEOVERSEV1 target 2 months ago
  Martin Kroeker e2d941e9af Declare the "small" kernel static in addition to inline 2 months ago
  Martin Kroeker 8214700930 Declare the "small" kernel static in addition to inline 2 months ago
  Martin Kroeker 39c90f9859 Merge pull request #5380 from quic/topic/sgemm_direct_sme1_alpha_beta 2 months ago
  Rajendra Prasad Matcha eae0abfdb6 SME1 based direct kernel with alpha and beta for cblas_sgemm level 3 API. 2 months ago
  Chris Sidebottom 947d7af4c9 Fix CMake references to bscal and bgemv 2 months ago
  Chris Sidebottom e105411460 Add infrastructure for bgemv/bscal 2 months ago
  Chris Sidebottom 740efd71c4 Add optimized BGEMM kernel for NEOVERSEV1 target 2 months ago
  Martin Kroeker 343830c26f Add BGEMM parameter tables 2 months ago
  Martin Kroeker ff614575c9 Fix arm64 HAVE_SME setting for DYNAMIC_ARCH builds 2 months ago
  Martin Kroeker 0e11537cab Merge pull request #5357 from Mousius/bgemm-init 2 months ago
  Chris Sidebottom 66d9185ebe Fix CMake support 2 months ago
  Martin Kroeker fd37406817 Merge branch 'develop' into optimized_gemv_n_1x3 2 months ago
  Chris Sidebottom f95e7b0e32 Add infrastructure for BGEMM 3 months ago
  Iha, Taisei f7ad906b49 Performance improvements of [SD]DOT with loop-unrolling on A64FX 3 months ago
  Martin Kroeker d96daa220d Merge pull request #5290 from Srangrang/develop 3 months ago
  Martin Kroeker ee26caffb3 Merge pull request #5309 from davidz-ampere/dev-ampereone 3 months ago
  davidz-ampere aa90ab4142 Add support for Ampere AmpereOne processors 3 months ago
  Ian McInerney badef1d32e Update sbgemm_tcopy_4_neoversev1 kernel to use standard C types 3 months ago
  Martin Kroeker 3318a2b904 override CDOT and ZDOT with the generic C kernel 3 months ago
  davidz-ampere 84730068af reduce duplicate kernel code 3 months ago
  davidz-ampere be68ef03b4 Add support for Ampere processors 3 months ago
  Srangrang 9f13b2c6ac style: modify HALF to BFLOAT16 in benchmark folder 3 months ago
  Srangrang ec14e1648c fix: resolve non-RISCV host build failed issue 3 months ago
  Martin Kroeker e338d34ce1 fix path 3 months ago
  Martin Kroeker d36093d084 temporarily change default C/ZSCAL to the non-asm implementation 3 months ago
  Martin Kroeker b3c90564d7 resync with the generic arm version for inf/nan handling 3 months ago
  Martin Kroeker 6bdc7f9eb7 Merge pull request #5300 from martin-frbg/fixup5296 3 months ago
  Martin Kroeker 73af02b89f use dummy2 as Inf/NAN handling flag 3 months ago
  Martin Kroeker 549a9f1dbb Disable the default SSE kernels for CSCAL/ZSCAL for now 3 months ago
  Martin Kroeker 58eeb9041c fix handling of dummy2 3 months ago
  Martin Kroeker 7c77537b25 Merge pull request #5297 from martin-frbg/zscal_x86_sparc 3 months ago
  Martin Kroeker 63287e1855 Merge pull request #5296 from martin-frbg/zscal_riscv 3 months ago
  Martin Kroeker d2855d3dab Merge pull request #5285 from martin-frbg/zscal_zarch 3 months ago