This patch replaces each 32-byte store with two 16-byte stores to fix a recent performance regression caused by the 32-byte stores.
This patch aligns the stores to a 32-byte boundary for saxpy and daxpy before entering the vector pair loop. For caxpy, the store instructions were changed to stxv to improve performance in unaligned cases.
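The alignment step for daxpy can be pictured with a portable C sketch. This is not the kernel's code. It is plain scalar C with no POWER10 intrinsics, and the function name `daxpy_aligned_sketch` is hypothetical. A scalar prologue peels elements until `y` reaches a 32-byte boundary. The main loop then handles 32-byte chunks (4 doubles, the width of one vector pair), and a short epilogue finishes the tail.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch (not the actual kernel): y[i] += alpha * x[i].
 * A scalar prologue advances until y is 32-byte aligned so that the
 * main loop's 32-byte chunks start on aligned store addresses. */
void daxpy_aligned_sketch(size_t n, double alpha, const double *x, double *y)
{
    size_t i = 0;

    /* Prologue: peel elements until &y[i] sits on a 32-byte boundary.
     * If y is not even 8-byte aligned, this simply processes every
     * element here, which is still correct (just never vectorized). */
    while (i < n && ((uintptr_t)(y + i) & 31u) != 0) {
        y[i] += alpha * x[i];
        i++;
    }

    /* Main loop: 4 doubles (32 bytes) per iteration. In a POWER10
     * kernel this is where vector pair loads/stores would be used. */
    for (; i + 4 <= n; i += 4) {
        y[i]     += alpha * x[i];
        y[i + 1] += alpha * x[i + 1];
        y[i + 2] += alpha * x[i + 2];
        y[i + 3] += alpha * x[i + 3];
    }

    /* Epilogue: remaining 0-3 elements. */
    for (; i < n; i++)
        y[i] += alpha * x[i];
}
```

Peeling on `y` rather than `x` reflects the description above, which aligns the stores. Loads from `x` may still be unaligned in the main loop.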
This patch uses the new POWER10 vector pair instructions (lxvp/stxvp) for loads and stores. Tested in the simulator with no new failures.