2558 Commits (06c09deee94e4d03ab814d576da95fb047acbdda)

Author SHA1 Message Date
  Martin Kroeker 06c09deee9 Merge pull request #5426 from hideaki-motoki/issue5417_axpy_sve 2 months ago
  yuanjia c2cc7a3602 riscv64: optimize gemv_t_vector.c 3 months ago
  h-motoki 855945befb Implementing SVE in [SD]AXPY Kernels for A64FX and Graviton3E 3 months ago
  Martin Kroeker 9d6df1dd3e Merge pull request #5422 from ChipKerchner/addRVVVectorizedPacking 3 months ago
  Martin Kroeker f3b2a15fad Merge pull request #5420 from yuanjia111/develop 3 months ago
  Chip Kerchner 64401b4417 Disable vectorized packing for DGEMM - since it is slower than scalar. 3 months ago
  Chip Kerchner c00afc86a6 Add and use vectorized packing to ZVL128B and ZVL256B. Up to 3x+ faster than generic scalar functions. 3 months ago
  yuanjia 803e8d4838 Move the value assignment of vector x in gemv_n_sve.c to the outermost loop to reduce the repeated data retrieval. 3 months ago
  Chris Sidebottom 5f47b872f1 Remove older kernels for BGEMM on NEOVERSEV1 3 months ago
  Chris Sidebottom 114316f361 Optimize SBGEMM / BGEMM for NEOVERSEV1 further 3 months ago
  Martin Kroeker f1ee61ea30 Include NEON header for the bfloat conversion functions 3 months ago
  Martin Kroeker b3ffd5524a Include NEON header for the bfloat conversion functions 3 months ago
  Martin Kroeker 0968dddf1a Merge pull request #5409 from martin-frbg/issue5372 3 months ago
  Martin Kroeker a3b9c933c5 mark xbuffer as volatile to work around gcc15.1 optimizer bug 3 months ago
  Chip Kerchner 72f082f31d Fix bad vector zero initializer and other compiler warnings for RISC-V. 3 months ago
  Martin Kroeker a5e7c0e3e0 Merge pull request #5396 from abhishek-iitmadras/abhishekk_bfloat16 3 months ago
  abhishek-fujitsu 0bc79da587 add neon header 3 months ago
  Chris Sidebottom ea2faf0c9a Add optimized BGEMM for NEOVERSEN2 target 4 months ago
  Chris Sidebottom 2c3cdaf74e Optimized BGEMV for NEOVERSEV1 target 4 months ago
  Martin Kroeker e2d941e9af Declare the "small" kernel static in addition to inline 4 months ago
  Martin Kroeker 8214700930 Declare the "small" kernel static in addition to inline 4 months ago
  Martin Kroeker 39c90f9859 Merge pull request #5380 from quic/topic/sgemm_direct_sme1_alpha_beta 4 months ago
  Rajendra Prasad Matcha eae0abfdb6 SME1 based direct kernel with alpha and beta for cblas_sgemm level 3 API. 4 months ago
  Chris Sidebottom 947d7af4c9 Fix CMake references to bscal and bgemv 4 months ago
  Chris Sidebottom e105411460 Add infrastructure for bgemv/bscal 4 months ago
  Chris Sidebottom 740efd71c4 Add optimized BGEMM kernel for NEOVERSEV1 target 4 months ago
  Martin Kroeker 343830c26f Add BGEMM parameter tables 4 months ago
  Martin Kroeker ff614575c9 Fix arm64 HAVE_SME setting for DYNAMIC_ARCH builds 4 months ago
  Martin Kroeker 0e11537cab Merge pull request #5357 from Mousius/bgemm-init 4 months ago
  Chris Sidebottom 66d9185ebe Fix CMake support 4 months ago
  Martin Kroeker fd37406817 Merge branch 'develop' into optimized_gemv_n_1x3 4 months ago
  Chris Sidebottom f95e7b0e32 Add infrastructure for BGEMM 4 months ago
  Iha, Taisei f7ad906b49 Performance improvements of [SD]DOT with loop-unrolling on A64FX 4 months ago
  Martin Kroeker d96daa220d Merge pull request #5290 from Srangrang/develop 4 months ago
  Martin Kroeker ee26caffb3 Merge pull request #5309 from davidz-ampere/dev-ampereone 4 months ago
  davidz-ampere aa90ab4142 Add support for Ampere AmpereOne processors 4 months ago
  Ian McInerney badef1d32e Update sbgemm_tcopy_4_neoversev1 kernel to use standard C types 5 months ago
  Martin Kroeker 3318a2b904 override CDOT and ZDOT with the generic C kernel 5 months ago
  davidz-ampere 84730068af reduce duplicate kernel code 5 months ago
  davidz-ampere be68ef03b4 Add support for Ampere processors 5 months ago
  Srangrang 9f13b2c6ac style: modify HALF to BFLOAT16 in benchmark folder 5 months ago
  Srangrang ec14e1648c fix: resolve non-RISCV host build failed issue 5 months ago
  Martin Kroeker e338d34ce1 fix path 5 months ago
  Martin Kroeker d36093d084 temporarily change default C/ZSCAL to the non-asm implementation 5 months ago
  Martin Kroeker b3c90564d7 resync with the generic arm version for inf/nan handling 5 months ago
  Martin Kroeker 6bdc7f9eb7 Merge pull request #5300 from martin-frbg/fixup5296 5 months ago
  Martin Kroeker 73af02b89f use dummy2 as Inf/NAN handling flag 5 months ago
  Martin Kroeker 549a9f1dbb Disable the default SSE kernels for CSCAL/ZSCAL for now 5 months ago
  Martin Kroeker 58eeb9041c fix handling of dummy2 5 months ago
  Martin Kroeker 7c77537b25 Merge pull request #5297 from martin-frbg/zscal_x86_sparc 5 months ago