Author | SHA1 | Message | Date |
---|---|---|---|
| | 0826d68f93 | POWER10: Change the packing format for bfloat16. As the new MMA instructions need the inputs in 4x2 order for bfloat16, changing the format in copy/packing code. This avoids permute instructions in the gemm kernel inner loop. | 5 years ago |