Xianyi Zhang
b95ad4cfaf
Support detecting ICT Loongson-3B CPU.
14 years ago
Xianyi Zhang
3bbe3ddb31
Merge branch 'develop' of github.com:xianyi/OpenBLAS into loongson3b
14 years ago
traz
a32e56500a
Fix the compute error of gemv when incx and incy are negative numbers.
14 years ago
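Commit a32e56500a does not show the kernel change itself. As a reminder of what the kernel has to reproduce, here is a minimal C sketch of the reference BLAS increment convention: with a negative increment, logical element 0 sits at the far end of the stored vector. The function name `ref_dgemv_n` is hypothetical and is not part of OpenBLAS.

```c
#include <stddef.h>

/*
 * y := alpha * A * x + y for a column-major m-by-n matrix A (no transpose).
 * Hypothetical reference helper, not the Loongson kernel from the commit.
 */
static void ref_dgemv_n(int m, int n, double alpha, const double *a, int lda,
                        const double *x, int incx, double *y, int incy)
{
    /* With a negative increment, start from the last stored element. */
    ptrdiff_t kx = incx < 0 ? (ptrdiff_t)(n - 1) * -incx : 0;
    ptrdiff_t ky = incy < 0 ? (ptrdiff_t)(m - 1) * -incy : 0;

    for (int j = 0; j < n; j++) {
        /* Logical x[j], scaled by alpha. */
        double t = alpha * x[kx + (ptrdiff_t)j * incx];
        for (int i = 0; i < m; i++) {
            /* Logical y[i] accumulates column j of A. */
            y[ky + (ptrdiff_t)i * incy] += t * a[(size_t)j * lda + i];
        }
    }
}
```

Calling it with `incx = -1` and `incy = -1` reads x back to front and writes y back to front, which is the case the commit message says was computed incorrectly.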
traz
c1e618ea2d
Add complete gemv function on Loongson3a platform.
14 years ago
traits
19f5b5c132
Fixed #66: the bug in the zgemv kernel with a transposed matrix on 64-bit MinGW (Windows).
14 years ago
traits
c852ce3981
Ref #65. Fixed 64-bit Windows calling convention bug in cdot and zdot.
According to the 64-bit Windows calling convention, the cdot kernel returns its value in %rax instead of %xmm0.
In zdot, the caller allocates memory for the return value and passes its address as the first hidden parameter. Thus, the callee (zdot) should store the result in this memory and return its address in %rax.
14 years ago
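The zdot half of commit c852ce3981 can be pictured in C as follows. This is only a sketch of how a 16-byte result returned in memory looks from the callee's side; the type `zcomplex` and the function `zdotu_lowered` are hypothetical names, and the real kernel is hand-written assembly.

```c
/* Hypothetical 16-byte type standing in for a double-precision complex value. */
typedef struct { double re, im; } zcomplex;

/*
 * Unconjugated complex dot product, written the way the callee sees it when
 * the result is returned in memory: `ret` plays the role of the hidden first
 * parameter, and the same address is handed back as the return value
 * (the pointer that ends up in %rax on x86-64).
 */
static zcomplex *zdotu_lowered(zcomplex *ret, int n,
                               const zcomplex *x, const zcomplex *y)
{
    double re = 0.0, im = 0.0;
    for (int i = 0; i < n; i++) {
        re += x[i].re * y[i].re - x[i].im * y[i].im;
        im += x[i].re * y[i].im + x[i].im * y[i].re;
    }
    /* Store the result through the caller-provided address. */
    ret->re = re;
    ret->im = im;
    /* Return that same address. */
    return ret;
}
```

A kernel that instead leaves the result in %xmm0 never writes to the caller's buffer, so the caller reads whatever happened to be in that memory.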
traz
e08cfaf9ca
Complete all the complex single-precision functions of level3, but the performance needs further improvement.
14 years ago
traz
ee4bb8bd25
Add ctrmm part in cgemm_kernel_loongson3a_4x2_ps.S.
14 years ago
traz
7fa3d23dd9
Complete cgemm function, but no optimization.
14 years ago
traz
9679dd077e
Fix some compute errors.
14 years ago
Zhang Xiianyi
7b410b7f0e
Fixed #58: zdot SEGFAULT bug with GCC-4.6. Thanks to Mr. John for this patch.
In the i386 calling convention, the caller puts the address of zdot's return value into the first hidden parameter.
Thus, the callee should remove this address from the stack before returning.
Actually, I had fixed the same bug in x86/zdot_sse2.S (issue #32). However, that implementation was not good because it used 3 instructions. Mr. John told me to use "ret $0x4" to skip the first hidden address (4 bytes).
14 years ago
traz
d238a768ab
Use ps instructions in cgemm.
14 years ago
traits
b1fe26c45a
Refs #55. Changed DTB_ENTRIES to DTB_DEFAULT_ENTRIES in the x86 gemv_n kernel code.
14 years ago
traits
9fc6764fa7
Refs #55. Added DTB_ENTRIES to the dynamic arch setting parameters. Now DTB_ENTRIES can be read at runtime.
14 years ago
traz
74d4cdb81a
Fix an illegal instruction for strmm_RTLU.
14 years ago
traz
7906146836
Fix an error for strmm_LLTN.
14 years ago
traz
3274ff47b8
Fix an error for strmm_LLTN.
14 years ago
traz
a059c553a1
Fix a compute error for strmm.
14 years ago
traz
23e182ca7c
Fix stack-pointer bug for strmm.
14 years ago
traz
a15bc95824
Add strmm part.
14 years ago
traz
09f49fa891
Use PS instructions to improve the performance of sgemm; it is now 4.2 GFlops.
14 years ago
traz
cb0214787b
Modify compile options.
14 years ago
traz
2e8cdd1542
Using ps instruction.
14 years ago
traz
c8360e3ae5
Complete all the plura single precision functions of level3 on Loongson3a, the performance is 2.3GFlops.
14 years ago
traz
68532fa9ec
Merge branch 'loongson3a' of github.com:xianyi/OpenBLAS into loongson3a
14 years ago
traz
708d2b6255
Fix compute error in ztrmm.
14 years ago
traz
e72113f06a
Add ztrmm and ztrsm part on loongson3a. The average performance is 2.2G.
14 years ago
traz
14f81da375
Change the prefetch length of A and B; the performance is now 2.1G.
14 years ago
Xianyi Zhang
fc21f7ad28
Merge branch 'release-v0.1alpha2' into loongson3a
14 years ago
traz
1c96d345e2
Improve zgemm performance from 1G to 1.8G, change block size in param.h.
14 years ago
Xianyi Zhang
c4efde7713
Merge branch 'loongson3a' into release-v0.1alpha2
14 years ago
Xianyi Zhang
32353a9d30
Refs #20. Fixed the installation bug with DYNAMIC_ARCH=1.
14 years ago
Xianyi Zhang
b3d1887745
Fixed #35: a build bug with NO_LAPACK=1 DYNAMIC_ARCH=1 FC=gfortran. I forgot to test with gfortran in the previous bug-fix commit.
14 years ago
Xianyi Zhang
8d50a9fd1a
Fixed #35 a build bug with NO_LAPACK=1 & DYNAMIC_ARCH=1.
14 years ago
Wang Qian
4335bca2f7
Fixed #33 ztrmm bug on Nehalem.
14 years ago
Xianyi
31040e4d80
Fixed #32: a SEGFAULT bug with gcc-4.6. According to the i386 calling convention, the called function should remove the hidden return value address from the stack.
14 years ago
traz
88d94d0ec8
Fixed #30 strmm computational error on Loongson3A.
14 years ago
traz
fc84909115
Modify single precision compiler conditions, increasing single precision kernel code on Loongson3a.
14 years ago
traz
5ca4e51df0
Remove the useless code, modify code comments and format.
14 years ago
Xianyi Zhang
fcb5ce011b
Fixed #28. Convert the result to double precision in the MIPS64 dsdot_k kernel.
14 years ago
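For context on issue #28: dsdot takes single-precision inputs but returns a double-precision result. A minimal C sketch of that contract follows, with products accumulated in double as the reference BLAS does. The name `ref_dsdot` is hypothetical, and the commit itself only describes converting the kernel's result to double.

```c
#include <stddef.h>

/*
 * Dot product of two float vectors with a double result.
 * Hypothetical reference helper, not the MIPS64 kernel from the commit.
 */
static double ref_dsdot(int n, const float *x, int incx,
                        const float *y, int incy)
{
    double acc = 0.0;
    /* Same increment convention as other BLAS routines: a negative
       increment walks the stored vector from its far end. */
    ptrdiff_t ix = incx < 0 ? (ptrdiff_t)(n - 1) * -incx : 0;
    ptrdiff_t iy = incy < 0 ? (ptrdiff_t)(n - 1) * -incy : 0;

    for (int i = 0; i < n; i++) {
        /* Widen each operand before multiplying so the sum is kept in double. */
        acc += (double)x[ix] * (double)y[iy];
        ix += incx;
        iy += incy;
    }
    return acc;
}
```

A kernel that returns its single-precision accumulator without the final conversion hands the caller a value of the wrong type, which matches the symptom the commit message describes.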
traz
a9320f896e
Fixed #25 dtrmm and dtrsm computational error on Loongson3A.
14 years ago
Xianyi Zhang
b206fc7075
Fixed #28. Convert the result to double precision at the end of the dsdot kernel.
14 years ago
traz
29dce62b8f
Finish dtrsm_kernel_Rx.S on Loongson3A.
14 years ago
traz
432c309f63
Finish dtrsm_kernel_Lx.S on Loongson3A.
14 years ago
traz
d2f351d819
Modify dtrsm compiler options
14 years ago
traz
5a991b7149
Fixed #24: dtrmm error on Loongson3A.
14 years ago
traz
9320933520
Complete the dtrmm function.
14 years ago
traz
921caefa56
Added handling of the trmm part, without edge handling. Test sizes (M and N) must be multiples of 4.
14 years ago
traz
ecd4c1f3d9
Modify prefetching C.
14 years ago
traz
ab9e4ce351
Adjust kc size from 112 to 116.
14 years ago