I don't have as many benchmarks for these as for gemm, but it should still make a difference for small matrices.