Zhang Xianyi
|
2fb02626da
|
Update organization info.
|
11 years ago |
Timothy Gu
|
6c2ead30f0
|
Remove all trailing whitespace except lapack-netlib
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
|
11 years ago |
Wang Qian
|
8e53b57bb2
|
Append gemmkernel and trmmkernel C code in kernel/generic; this code can be used on a new platform which does not have an optimized assembly kernel.
|
13 years ago |
Wang Qian
|
66904fc4e8
|
BLAS3 used standard MIPS instructions without extensions on Loongson 3B.
|
14 years ago |
Xianyi Zhang
|
0884f6b78d
|
Merge branch 'loongson3a' of github.com:xianyi/OpenBLAS into loongson3b
|
14 years ago |
traz
|
2d78fb05c8
|
Add conjugate condition to gemv.
|
14 years ago |
Xianyi Zhang
|
b95ad4cfaf
|
Support detecting ICT Loongson-3B CPU.
|
14 years ago |
traz
|
a32e56500a
|
Fix the compute error of gemv when incx and incy are negative numbers.
|
14 years ago |
traz
|
c1e618ea2d
|
Add complete gemv function on Loongson3a platform.
|
14 years ago |
traz
|
e08cfaf9ca
|
Complete all the complex single-precision functions of level3, but the performance needs further improvement.
|
14 years ago |
traz
|
ee4bb8bd25
|
Add ctrmm part in cgemm_kernel_loongson3a_4x2_ps.S.
|
14 years ago |
traz
|
7fa3d23dd9
|
Complete cgemm function, but no optimization.
|
14 years ago |
traz
|
9679dd077e
|
Fix some compute errors.
|
14 years ago |
traz
|
d238a768ab
|
Use ps instructions in cgemm.
|
14 years ago |
traz
|
74d4cdb81a
|
Fix an illegal instruction for strmm_RTLU.
|
14 years ago |
traz
|
7906146836
|
Fix an error for strmm_LLTN.
|
14 years ago |
traz
|
3274ff47b8
|
Fix an error for strmm_LLTN.
|
14 years ago |
traz
|
a059c553a1
|
Fix a compute error for strmm.
|
14 years ago |
traz
|
23e182ca7c
|
Fix stack-pointer bug for strmm.
|
14 years ago |
traz
|
a15bc95824
|
Add strmm part.
|
14 years ago |
traz
|
09f49fa891
|
Use PS instructions to improve the performance of sgemm; it is 4.2 GFlops now.
|
14 years ago |
traz
|
cb0214787b
|
Modify compile options.
|
14 years ago |
traz
|
2e8cdd1542
|
Using ps instruction.
|
14 years ago |
traz
|
c8360e3ae5
|
Complete all the complex single-precision functions of level3 on Loongson3a; the performance is 2.3 GFlops.
|
14 years ago |
traz
|
68532fa9ec
|
Merge branch 'loongson3a' of github.com:xianyi/OpenBLAS into loongson3a
|
14 years ago |
traz
|
708d2b6255
|
Fix compute error in ztrmm.
|
14 years ago |
traz
|
e72113f06a
|
Add ztrmm and ztrsm part on loongson3a. The average performance is 2.2G.
|
14 years ago |
traz
|
14f81da375
|
Change prefetch length of A and B; the performance is 2.1G now.
|
14 years ago |
Xianyi Zhang
|
fc21f7ad28
|
Merge branch 'release-v0.1alpha2' into loongson3a
|
14 years ago |
traz
|
1c96d345e2
|
Improve zgemm performance from 1G to 1.8G, change block size in param.h.
|
14 years ago |
Xianyi Zhang
|
c4efde7713
|
Merge branch 'loongson3a' into release-v0.1alpha2
|
14 years ago |
traz
|
88d94d0ec8
|
Fixed #30 strmm computational error on Loongson3A.
|
14 years ago |
traz
|
fc84909115
|
Modify single precision compiler conditions, increasing single precision kernel code on Loongson3a.
|
14 years ago |
traz
|
5ca4e51df0
|
Remove the useless code, modify code comments and format.
|
14 years ago |
Xianyi Zhang
|
fcb5ce011b
|
Fixed #28. Convert the result to double precision in MIPS64 dsdot_k kernel.
|
14 years ago |
traz
|
a9320f896e
|
Fixed #25 dtrmm and dtrsm computational error on Loongson3A.
|
14 years ago |
traz
|
29dce62b8f
|
Finish dtrsm_kernel_Rx.S on Loongson3A.
|
14 years ago |
traz
|
432c309f63
|
Finish dtrsm_kernel_Lx.S on Loongson3A.
|
14 years ago |
traz
|
d2f351d819
|
Modify dtrsm compiler options
|
14 years ago |
traz
|
5a991b7149
|
Fixed #24 drmm error on Loongson3A
|
14 years ago |
traz
|
9320933520
|
Complete dtrmm function.
|
14 years ago |
traz
|
921caefa56
|
Add trmm handling, without edge handling. Test sizes (M and N) must be multiples of 4.
|
14 years ago |
traz
|
ecd4c1f3d9
|
Modify prefetching C.
|
14 years ago |
traz
|
ab9e4ce351
|
Adjust kc size from 112 to 116.
|
14 years ago |
traz
|
782205a693
|
Add dgemm compiler Options in KERNEL.LOONGSON3A.
|
14 years ago |
traz
|
ac494c0d04
|
New kernel in LOONGSON3A.
|
14 years ago |
Xianyi Zhang
|
f405b5bcc5
|
Fixed the bug about Loongson3A gsLQC1 & gsSQC1 instructions in daxpy kernel. Now daxpy is correct.
|
14 years ago |
Wang Qian
|
d5cffd506a
|
Modified the default kernel makefile in MIPS64 arch.
|
14 years ago |
Xianyi Zhang
|
5838f12995
|
Support unaligned addresses in daxpy on loongson3a simd.
|
14 years ago |
Xianyi Zhang
|
5444a3f8f7
|
Unroll to 16 in daxpy on loongson3a.
|
14 years ago |