
s390x/SGEMM: adjust default P and Q to multiples of M

We recently changed the register blocking for SGEMM on s390x to 16x4.
However, we did not adjust P and Q to multiples of 16 and thus fell back
to the 8x4 kernel at each block's margin unnecessarily. Adjust P and Q
to multiples of 16 so that full-sized blocks are handled entirely by the
faster 16x4 kernel.

Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
tags/v0.3.11^2
Marius Hillenbrand 5 years ago
commit e115c97e05
1 changed file with 2 additions and 2 deletions
param.h

@@ -3092,12 +3092,12 @@ is a big desktop or server with abundant cache rather than a phone or embedded d
 #define ZGEMM_DEFAULT_UNROLL_M 4
 #define ZGEMM_DEFAULT_UNROLL_N 4

-#define SGEMM_DEFAULT_P 456
+#define SGEMM_DEFAULT_P 480
 #define DGEMM_DEFAULT_P 320
 #define CGEMM_DEFAULT_P 480
 #define ZGEMM_DEFAULT_P 224

-#define SGEMM_DEFAULT_Q 488
+#define SGEMM_DEFAULT_Q 512
 #define DGEMM_DEFAULT_Q 384
 #define CGEMM_DEFAULT_Q 128
 #define ZGEMM_DEFAULT_Q 352

