Ashwin Sekhar T K
d50abc8903
ARM64: Move parameters from parameter.c to param.h
Remove the runtime setting of P, Q, R parameters for
targets ARMV8, THUNDERX2T99. Instead set them as constants
in param.h at compile time.
7 years ago
Ashwin Sekhar T K
21f46a1cf2
ARM64: Use THUNDERX2T99 Neon Kernels for ARMV8
Currently the generic ARMV8 target uses C implementations
for many routines. Replace these with the NEON implementations
written for the THUNDERX2T99 target, which are up to 6x faster for
certain routines.
7 years ago
Arjan van de Ven
99c7bba8e4
Initial support for SkylakeX / AVX512
This patch adds the basic infrastructure for adding the SkylakeX (Intel Skylake server)
target. The SkylakeX target will use the AVX512 (AVX512VL level) instruction set,
which brings 2 basic things:
1) 512 bit wide SIMD (2x width of AVX2)
2) 32 SIMD registers (2x the number on AVX2)
This initial patch only contains a trivial transformation of the Haswell SGEMM kernel
to AVX512VL; more will follow later but this patch aims to get the infrastructure
in place for this "later".
Full performance tuning has not been done yet; with more registers and wider SIMD
it's in theory possible to retune the kernels but even without that there's an
interesting enough performance increase (30-40% range) with just this change.
7 years ago
Denis Steckelmacher
c9ff735da6
Add ZEN support (tested for auto-detected static backend)
8 years ago
Ashwin Sekhar T K
a86474c6f7
THUNDERX2T99: Performance fix for ZGEMM
8 years ago
Ashwin Sekhar T K
19ba133383
THUNDERX2T99: Add Optimized ZGEMM Implementation
8 years ago
Ashwin Sekhar T K
2757b49767
THUNDERX2T99: Add Optimized CGEMM Implementation
8 years ago
Ashwin Sekhar T K
f279ff4789
THUNDERX2T99: Add Optimized SGEMM Implementation
8 years ago
Zhang Xianyi
0863a0d4b4
Merge pull request #1061 from ashwinyes/develop_aarch64_vulcan_thunderx_patch
Add new targets for ARM64
8 years ago
Werner Saar
c1c5a63d3c
prepared parameter.c for UNROLL values that are not a power of two
8 years ago
Ashwin Sekhar T K
4b55fae337
ARM64: Add Cavium THUNDERX2T99 Target
8 years ago
Ashwin Sekhar T K
0b8e876d89
VULCAN: Add optimized DGEMM implementation
9 years ago
Ashwin Sekhar T K
4713e7c47f
ARM64: Add the VULCAN Target
9 years ago
Werner Saar
78b05f6476
bugfix for EXCAVATOR and DYNAMIC_ARCH
9 years ago
Zhang Xianyi
05196a8497
Refs #716 . Only call getenv at init function.
9 years ago
Werner Saar
4319769b79
added target processor STEAMROLLER
11 years ago
wernsaar
a64fe9bcc9
added optimized sgemv_n kernel for sandybridge
11 years ago
wernsaar
2021d0f9d6
experimentally removed expensive function calls
11 years ago
wernsaar
50e99a52ea
added definitions for PILEDRIVER and HASWELL
11 years ago
Zhang Xianyi
7a8949e0ce
Merge branch 'develop' of https://github.com/TimothyGu/OpenBLAS into TimothyGu-develop
Conflicts:
driver/others/memory.c
11 years ago
Timothy Gu
6c2ead30f0
Remove all trailing whitespace except lapack-netlib
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
11 years ago
Jameson Nash
f41f03ab83
fix #394 . this cleans up some handles after using them, and doesn't disable ALL process privileges upon success
11 years ago
Zhang Xianyi
bfaaa975e6
Added BULLDOZER target. So far it uses barcelona kernels.
13 years ago
Zhang Xianyi
d3b67d0bd8
Refs #113 . Fixed the typo BOBCATE -> BOBCAT
13 years ago
Zhang Xianyi
d6cab3f37e
Refs #113 . Support AMD Bobcat using Barcelona kernel codes. Replace 3DNow! with MMX.
13 years ago
Xianyi Zhang
19a48b82cf
Init Sandybridge codes based on Nehalem.
13 years ago
Wang Qian
8163ab7e55
Change the block size on Loongson 3B.
14 years ago
Xianyi Zhang
b95ad4cfaf
Support detecting ICT Loongson-3B CPU.
14 years ago
traz
831858b883
Modify aligned address of sa and sb to improve the performance of multi-threads.
14 years ago
Xianyi Zhang
16fc083322
Refs #47 . Fixed the parameter-setting bug on Loongson 3A single thread version.
14 years ago
Xianyi Zhang
4727fe8abf
Refs #47 . On Loongson 3A, set DGEMM_R parameter depending on different number of threads. It would improve double precision BLAS3 on multi-threads.
14 years ago
Xianyi Zhang
342bbc3871
Import GotoBLAS2 1.13 BSD version codes.
15 years ago