traz
|
23e182ca7c
|
Fix stack-pointer bug for strmm.
|
14 years ago |
traz
|
a15bc95824
|
Add strmm part.
|
14 years ago |
traz
|
74a3f63489
|
Tuning mb, kb, nb size to get the best performance.
|
14 years ago |
traz
|
09f49fa891
|
Use PS instructions to improve sgemm performance; it now reaches 4.2 GFlops.
|
14 years ago |
traz
|
cb0214787b
|
Modify compile options.
|
14 years ago |
traz
|
2e8cdd1542
|
Using ps instruction.
|
14 years ago |
traz
|
b29d327d14
|
Merge branch 'loongson3a' of github.com:xianyi/OpenBLAS into loongson3a
|
14 years ago |
traz
|
c8360e3ae5
|
Complete all the complex single-precision Level 3 functions on Loongson3a; the performance is 2.3 GFlops.
|
14 years ago |
Xianyi Zhang
|
33313b0221
|
Merge branch 'develop' into loongson3a
|
14 years ago |
traits
|
a5300420e2
|
Merge branch 'hotfix-0.1alpha2.1' into develop
|
14 years ago |
traits
|
c06b7be32f
|
Refs #42. Output an error message when Fortran compiler detection fails.
|
14 years ago |
traz
|
68532fa9ec
|
Merge branch 'loongson3a' of github.com:xianyi/OpenBLAS into loongson3a
|
14 years ago |
traz
|
708d2b6255
|
Fix compute error in ztrmm.
|
14 years ago |
traz
|
e72113f06a
|
Add ztrmm and ztrsm part on loongson3a. The average performance is 2.2G.
|
14 years ago |
traz
|
14f81da375
|
Change prefetch length of A and B; the performance is now 2.1G.
|
14 years ago |
Xianyi Zhang
|
fc21f7ad28
|
Merge branch 'release-v0.1alpha2' into loongson3a
|
14 years ago |
Xianyi Zhang
|
ca8bf5abb0
|
Merge branch 'release-v0.1alpha2' into develop
|
14 years ago |
traits
|
4a73f5c5ea
|
Merge branch 'release-v0.1alpha2'
|
14 years ago |
traits
|
6a0762949d
|
Fixed #38. Released v0.1 alpha2.
|
14 years ago |
traits
|
859b71645a
|
Refs #37. Updated README about the compatibility issue with the EKOPath compiler.
|
14 years ago |
Xianyi Zhang
|
078bfd0b4f
|
Refs #39. Moved the shared lib (dll) to the top directory in the MinGW64 compiler environment.
|
14 years ago |
traz
|
1c96d345e2
|
Improve zgemm performance from 1G to 1.8G, change block size in param.h.
|
14 years ago |
Xianyi Zhang
|
82f5274828
|
Refs #39. It's unnecessary to include sys/mman.h file in blas_server_omp.c.
|
14 years ago |
Xianyi Zhang
|
e568df0dae
|
Refs #38. Prepare the docs with v0.1alpha2.
|
14 years ago |
Xianyi Zhang
|
c4efde7713
|
Merge branch 'loongson3a' into release-v0.1alpha2
|
14 years ago |
Xianyi Zhang
|
7a1e6202e1
|
Merge branch 'add_install_target' into develop
|
14 years ago |
Xianyi Zhang
|
32353a9d30
|
Refs #20. Fixed the installation bug with DYNAMIC_ARCH=1.
|
14 years ago |
Xianyi Zhang
|
2e6e9272fe
|
Merge branch 'add_install_target' into develop
Conflicts:
Changelog.txt
|
14 years ago |
Xianyi Zhang
|
d978436c4b
|
Refs #20. Updated the docs.
|
14 years ago |
Xianyi Zhang
|
fab36f1adb
|
Fixed #20. Added install target in makefile. You can use "make install PREFIX=your_installation_directory".
|
14 years ago |
Xianyi Zhang
|
7945919f22
|
Updated gitignore file.
|
14 years ago |
Xianyi Zhang
|
c642b61d4d
|
Merge branch 'master' of github.com:xianyi/OpenBLAS into develop
|
14 years ago |
Xianyi Zhang
|
aeed8d6225
|
Fixed #27. Temporarily work around axpy's low performance with small input sizes and multiple threads.
|
14 years ago |
Xianyi Zhang
|
1a4181afd0
|
Merge pull request #36 from pipping/master
Fixed the bug about USE_OPENMP=0 enabling OpenMP
|
14 years ago |
Elias Pipping
|
49742cb2d3
|
Make USE_OPENMP=0 disable openmp
|
14 years ago |
Xianyi Zhang
|
b3d1887745
|
Fixed #35, a build bug with NO_LAPACK=1 DYNAMIC_ARCH=1 FC=gfortran. I forgot to test it with gfortran in the last bug-fix commit.
|
14 years ago |
Xianyi Zhang
|
8d50a9fd1a
|
Fixed #35 a build bug with NO_LAPACK=1 & DYNAMIC_ARCH=1.
|
14 years ago |
Xianyi Zhang
|
1496383224
|
Print the wall time (cycles) when FUNCTION_PROFILE is enabled.
|
14 years ago |
Wang Qian
|
4335bca2f7
|
Fixed #33 ztrmm bug on Nehalem.
|
14 years ago |
Xianyi
|
31040e4d80
|
Fixed #32, a SEGFAULT bug with gcc-4.6. According to the i386 calling convention, the called function should remove the hidden return value address from the stack.
|
14 years ago |
Xianyi Zhang
|
3d7e62eb8b
|
Fixed #31, shared library placement on Mac. Thanks to Mr. Viral B. Shah for this patch.
|
14 years ago |
traz
|
88d94d0ec8
|
Fixed #30 strmm computational error on Loongson3A.
|
14 years ago |
Xianyi Zhang
|
af40551c9f
|
Fixed the makefile bug about openblas_set_num_threads.
|
14 years ago |
Xianyi Zhang
|
c30c22a76c
|
Fixed a bug about detecting underscore prefix in c_check.
|
14 years ago |
Xianyi Zhang
|
cc09e6ef3a
|
Ignore *.obj files in git.
|
14 years ago |
traz
|
fc84909115
|
Modify single precision compiler conditions, increasing single precision kernel code on Loongson3a.
|
14 years ago |
traz
|
5ca4e51df0
|
Remove the useless code, modify code comments and format.
|
14 years ago |
Xianyi Zhang
|
fcb5ce011b
|
Fixed #28. Convert the result to double precision in MIPS64 dsdot_k kernel.
|
14 years ago |
traz
|
a9320f896e
|
Fixed #25 dtrmm and dtrsm computational error on Loongson3A.
|
14 years ago |
Xianyi Zhang
|
830a823be1
|
Added missing test code for dsdot.
|
14 years ago |