33 Commits (f54f5bac9e9b0127052a23aaa7c9cfc4170b1f00)

Author SHA1 Message Date
  Zhang Xianyi 886cbaf4e4 Support AMD Piledriver by bulldozer kernels. 12 years ago
  Zhang Xianyi 6e8501c8a1 Fixed #239 bug in param.h about BARCELONA and BULLDOZER. 12 years ago
  wernsaar f67fa62851 added dgemv_n_bulldozer.S 12 years ago
  wernsaar d65bbec99b added new sgemm kernel for BULLDOZER 12 years ago
  wernsaar ba800f0883 correct GEMM_THREAD in param.h 12 years ago
  wernsaar 25491e42f9 New dgemm kernel for BULLDOZER: dgemm_kernel_8x2_bulldozer.S 12 years ago
  wernsaar 731220f870 changed DGEMM_DEFAULT_P and DGEMM_DEFAULT_Q to 248 for BULLDOZER 64bit 12 years ago
  Zhang Xianyi b7c0fa6bd2 Init AMD Bulldozer codebase. 13 years ago
  Sébastien Villemot 01e3c984ce Fix compilation with TARGET=GENERIC 13 years ago
  Sylvestre Ledru 3692b4d631 Improve the detection of sparc 13 years ago
  Xianyi Zhang b39c51195b Fixed the build bug about Sandy Bridge on 32-bit. 13 years ago
  Xianyi Zhang 996dc6d1c8 Fixed dynamic_arch building bug. 13 years ago
  wangqian f76f952547 Refs #83 #53. Adding Intel Sandy Bridge (AVX supported) kernel codes for BLAS level 3 functions. 13 years ago
  Zhang Xianyi d3b67d0bd8 Refs #113. Fixed the typo BOBCATE -> BOBCAT 13 years ago
  Zhang Xianyi d6cab3f37e Refs #113. Support AMD Bobcate using Barcelona kernel codes. Replace 3DNow! with MMX. 13 years ago
  Xianyi Zhang 19a48b82cf Init Sandybridge codes based on Nehalem. 13 years ago
  traz 7af0139a09 Modify P Q R size of Loongson3b. 13 years ago
  Wang Qian 66904fc4e8 BLAS3 used standard MIPS instructions without extensions on Loongson 3B. 14 years ago
  Wang Qian 8163ab7e55 Change the block size on Loongson 3B. 14 years ago
  Xianyi Zhang b95ad4cfaf Support detecting ICT Loongson-3B CPU. 14 years ago
  traz 831858b883 Modify aligned address of sa and sb to improve the performance of multi-threads. 14 years ago
  traz d238a768ab Use ps instructions in cgemm. 14 years ago
  Xianyi Zhang 4727fe8abf Refs #47. On Loongson 3A, set DGEMM_R parameter depending on different number of threads. It would improve double precision BLAS3 on multi-threads. 14 years ago
  traz 74a3f63489 Tuning mb, kb, nb size to get the best performance. 14 years ago
  traz cb0214787b Modify compile options. 14 years ago
  traz c8360e3ae5 Complete all the plura single precision functions of level3 on Loongson3a, the performance is 2.3GFlops. 14 years ago
  traz e72113f06a Add ztrmm and ztrsm part on loongson3a. The average performance is 2.2G. 14 years ago
  traz 1c96d345e2 Improve zgemm performance from 1G to 1.8G, change block size in param.h. 14 years ago
  traz 88d94d0ec8 Fixed #30 strmm computational error on Loongson3A. 14 years ago
  traz ab9e4ce351 Adjust kc size from 112 to 116 . 14 years ago
  traz 1aa9a298e1 Change BLOCK SIZE of LOONGSON3A TARGET. 14 years ago
  Xianyi Zhang 0597c1076f Added the configures of loongson 3a. refs #1 14 years ago
  Xianyi Zhang 342bbc3871 Import GotoBLAS2 1.13 BSD version codes. 14 years ago