230 Commits (2d0b2334259d41c2003b51a07580dbd25cfe267c)

Author SHA1 Message Date
  Martin Kroeker fc8894dd98 Workaround miscompilation by NVIDIA nvc 2 years ago
  Martin Kroeker 5720fa02c5 Merge pull request #4168 from Mousius/sve-zgemm-cgemm 2 years ago
  Chris Sidebottom 84a268b6ca Use SVE zgemm/cgemm on Arm(R) Neoverse(TM) V1 core 2 years ago
  Chris Sidebottom 730ca04b48 Fix ZHEMM copy for SVE 2 years ago
  Martin Kroeker 849c8806b8 Merge pull request #4161 from Mousius/non-sve-kernels 2 years ago
  Chris Sidebottom 24586bc4ff Disambiguate whilelt 2 years ago
  Chris Sidebottom aea2a4622b Use latest non-SVE kernels in ARMV8SVE 2 years ago
  martin-frbg 7976deff80 Fix file permissions (issue 4095) 2 years ago
  Martin Kroeker 3d31191b0f Work around Clang failing to disambiguate SVE intrinsics and add AppleClang crossbuild to MacOS/arm64 DYNAMIC_ARCH in AzureCI (#4140) 2 years ago
  Martin Kroeker 72caceb324 Merge pull request #4009 from Mousius/sve-gemm 2 years ago
  Chris Sidebottom ec334e69dc Use SVE kernel for SGEMM/DGEMM on Arm(R) Neoverse(TM) V1 2 years ago
  Martin Kroeker 44164e3a3d revert "move alpha out of register 18" (out of PR scope, no SVE on Apple hw) 2 years ago
  Martin Kroeker 8be68fa7f4 move declaration of sca to really keep the compiler from throwing it out (for now) 2 years ago
  Martin Kroeker 3727672a74 Improve workaround and keep compilers from optimizing it out 2 years ago
  Martin Kroeker 108a21e47a Move ALPHA out of register 18 (reserved on OSX) 2 years ago
  Martin Kroeker 0b1acb0ba3 Move ALPHA_I out of register 18 (reserved on OSX) 2 years ago
  Martin Kroeker c7bbad09ad Move ALPHA_I out of register 18 (reserved on OSX) 2 years ago
  Martin Kroeker cda29633a3 move ALPHA_I out of register 18 (reserved on OSX) 2 years ago
  Martin Kroeker 09ace3cf23 Merge pull request #3846 from lilh9598/sbgemm_opt 2 years ago
  Chris Sidebottom 1361229291 Remove prefetches from SVE kernels 2 years ago
  lilianhuang 729af6406f bugfix for sbgemm_ncopy_8_neoversen2 2 years ago
  Chris Sidebottom eea006a688 Wrap SVE header with __has_include check 2 years ago
  Chris Sidebottom fd4f52c797 Add SVE implementation for sdot/ddot 2 years ago
  lilianhuang fdac8a97c1 Add sbgemm_ncopy_8 and sbgemm_tcopy_4 2 years ago
  lilianhuang 135718eafc Improve the performance of sbgemm_tcopy on neoversen2 2 years ago
  Chris Sidebottom 4f7b77e08a Remove unnecessary instructions from Advanced SIMD dot 2 years ago
  Martin Kroeker 1688c7da43 change line endings from CRLF to LF 2 years ago
  Honglin Zhu 79066b6bf3 Change file name to match the norm and delete useless code. 2 years ago
  Honglin Zhu b00d5b9746 New sbgemm implementation for Neoverse N2 3 years ago
  Martin Kroeker e12d474780 Eliminate uses of CREAL on left-hand side of assignments 3 years ago
  Martin Kroeker 9e29598575 workaround fault with ssq=inf,scale=0 3 years ago
  Honglin Zhu 123e0dfb62 Neoverse N2 sbgemm: 3 years ago
  Honglin Zhu bc3728475f format code 3 years ago
  Honglin Zhu 55d686d41e neoverse n2 sbgemm: 3 years ago
  Honglin Zhu 04593bb27c neoverse n2 sbgemm: init file 3 years ago
  Nursultan Zarlyk 1bb7993a97 Fix MSVC ARM64 build. Add generic kernel for ARM64 3 years ago
  Martin Kroeker 115bc9b98f CortexX1 is ARMV8 like A7x 3 years ago
  Martin Kroeker b3b4672c30 Add initial support for Phytium FT2000 series and ARMV9 Cortex 510/710/X1/X2 3 years ago
  Martin Kroeker c1c0d5ce1d Merge pull request #3492 from binebrank/arm_sve_zgemm 3 years ago
  Bine Brank 19d435b1b3 update armv8sve + contributors 3 years ago
  Bine Brank 0fb6cc07bf fix ztrsm lt/ut copy 3 years ago
  Bine Brank f1315288a8 add sve ztrsm 3 years ago
  Bine Brank aaa2b1a861 fix sve dtrsm kernels 3 years ago
  Bine Brank 8071e179f1 add remaining sve trsm copy kernels 3 years ago
  Bine Brank f87468ac91 trsm_lncopy_sve 3 years ago
  Bine Brank e8939b3d30 sve trsmRN and trsmRT 3 years ago
  Bine Brank 098672b51b add trsm_kernel_LT_sve 3 years ago
  Bine Brank be7e55880c sve trsm_kernel_LN 3 years ago
  Sunita Nadampalli 19c8f615dc OpenBLAS: aarch64: Add neoverse-v1/n2 architecture specifics 3 years ago
  Bine Brank f33543d029 combine zchemm into single file 3 years ago