I have fewer benchmarks for these than for gemm, but the change should still make a difference for small matrices.
Signed-off-by: Timothy Gu <timothygu99@gmail.com>