Martin Kroeker
1eb43cccad
Merge pull request #1317 from martin-frbg/power8-asm
Save and restore VSX registers
8 years ago
Martin Kroeker
9d92f526dd
Comment out a code block that performs out-of-bounds memory accesses
...and does not appear to be needed even when it stays within the bounds of the array
8 years ago
Martin Kroeker
514d237257
Merge pull request #1279 from xsacha/develop
CMake improvements
8 years ago
Martin Kroeker
f96afd94b0
Fix out-of-bounds accesses where the data should be zero anyway
8 years ago
Martin Kroeker
9c017a2218
Save and restore VSX registers
8 years ago
Shivraj Patil
e3d844b062
Added mips I6500 core
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
8 years ago
Martin Kroeker
46c9357c72
Merge pull request #1288 from quickwritereader/develop
Optimized standard Blas Level-1,2 (excluding nrm2 functions) for z13 (double precision). Issue 884
8 years ago
Abdurrauf
1cfdb2295d
Optimized standard Blas Level-1,2 (excluding nrm2 functions) for z13 (double precision)
8 years ago
Sacha Refshauge
47ebce4d1a
Clean up, fix old typos. Simplify arch usages. Move system arch check to earlier position.
8 years ago
Sacha Refshauge
69b560751c
Improvements to previous commit (cross-compile).
Fix typos and bad if statements discovered in 0.2.20.
8 years ago
Sacha Refshauge
11911fd941
Add kernel/Makefile.LA to CMake
8 years ago
Isuru Fernando
d3b677fe87
Add commonobjs
8 years ago
Isuru Fernando
505b218829
Merge remote-tracking branch 'upstream/develop' into dyn
8 years ago
Isuru Fernando
d9346930dd
Merge remote-tracking branch 'upstream/develop' into develop
8 years ago
Ashwin Sekhar T K
4899d67f7d
THUNDERX2T99: Fix clang compilation
8 years ago
Isuru Fernando
1d1854032b
Add missing EXCAVATOR
8 years ago
Isuru Fernando
2c51a990ac
Fix extra whitespace. The CMake parser macro fails on it
TODO: Fix the parser macro to strip trailing whitespace
8 years ago
Isuru Fernando
7892434572
Add hemm3m and symm3m objects
8 years ago
Isuru Fernando
d798487213
Fixes for dynamic_arch. Almost there
8 years ago
Isuru Fernando
251715d9ef
configure kernel_core.h
8 years ago
Isuru Fernando
50deeb49b7
configure setparam
8 years ago
Isuru Fernando
4260215adf
Support DYNAMIC_ARCH with cmake
8 years ago
Isuru Fernando
d245caa49a
Support out-of-source build
8 years ago
Isuru Fernando
ca17b4b75c
Fix complex support for MSVC headers
8 years ago
Isuru Fernando
dc24914415
check compiler is msvc instead of msvc
8 years ago
Zhang Xianyi
d5ef0dee9a
Merge pull request #1226 from ashwinyes/develop_arm_clang_ual_fix
arm: Fix clang compilation for ARMv7
8 years ago
Zhang Xianyi
4239dd65ce
Merge branch 'develop' into develop_arm_softfp
8 years ago
Ashwin Sekhar T K
f02d535fde
arm: Fix clang compilation for ARMv7
clang is not recognizing some pre-UAL VFP mnemonics like fnmacs, fnmacd,
fnmuls and fnmuld. Replaced them with equivalent UAL mnemonics which are
vmls.f32, vmls.f64, vnmul.f32 and vnmul.f64 respectively.
8 years ago
Zhang Xianyi
a6515bb858
Merge pull request #1218 from m-brow/power9
Optimise loads on Power9 LE
8 years ago
Ashwin Sekhar T K
37efb5bc1d
arm: Remove unnecessary files/code
Since softfp code has been added to all required vfp kernels,
the code for auto detection of abi is no longer required.
The option to force softfp ABI on make command line by giving
ARM_SOFTFP_ABI=1 is retained. But there is no need to give this option
anymore.
Also the newly added C versions of 4x4/4x2 gemm/trmm kernels are removed.
These are no longer required. Moreover, these kernels have bugs.
8 years ago
Ashwin Sekhar T K
97d671eb61
arm: add softfp support in zgemm/ztrmm vfp kernels
8 years ago
Ashwin Sekhar T K
305cd2e8b4
arm: add softfp support in cgemm/ctrmm vfp kernels
8 years ago
Ashwin Sekhar T K
09bc6ebe5b
arm: add softfp support in dgemm/dtrmm vfp kernels
8 years ago
Ashwin Sekhar T K
872a11a2bf
arm: add softfp support in sgemm/strmm vfp kernels
8 years ago
Ashwin Sekhar T K
eda9e8632a
generic: Bug fixes in generic 4x2 and 4x4 gemm kernels
8 years ago
Ashwin Sekhar T K
8f83d3f961
arm: add softfp support in vfp gemv kernels
8 years ago
Ashwin Sekhar T K
83bd547517
arm: add softfp support in kernel/arm/swap_vfp.S
8 years ago
Ashwin Sekhar T K
e25f4c01d6
arm: add softfp support in kernel/arm/nrm2_vfp*.S
8 years ago
Ashwin Sekhar T K
54915ce343
arm: add softfp support in kernel/arm/*dot_vfp.S
8 years ago
Ashwin Sekhar T K
0150fabdb6
arm: add softfp support in kernel/arm/rot_vfp.S
8 years ago
Ashwin Sekhar T K
4f0773f07d
arm: add softfp support in kernel/arm/axpy_vfp.S
8 years ago
Ashwin Sekhar T K
aa5edebc80
arm: add softfp support in kernel/arm/asum_vfp.S
8 years ago
Ashwin Sekhar T K
89924b3d5b
arm: Use assembly implementations based on the ARM abi
In case of softfp abi, only the assembly implementations of those APIs
that do not have a floating point argument or return value are used.
In case of hard abi, all assembly implementations are used.
8 years ago
Ashwin Sekhar T K
da7f0ff425
generic: add some generic gemm and trmm kernels
Added generic 4x4 and 4x2 gemm kernels
Added generic 4x2 trmm kernel
8 years ago
Zhang Xianyi
482015f8d6
Merge branch 'arm_soft_fp_abi' into develop
8 years ago
Matt Brown
bd831a03a8
Optimise sscal for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
8 years ago
Matt Brown
edc97918f8
Optimise srot for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
8 years ago
Matt Brown
e0034de22d
Optimise sdot for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
8 years ago
Matt Brown
32c7fe6bff
Optimise sasum for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
8 years ago
Matt Brown
19bdf9d52b
Optimise casum for POWER9
Use lxvd2x instruction instead of lxvw4x.
lxvd2x performs far better on the new POWER architecture than lxvw4x.
8 years ago