# Notes on OpenBLAS usage

## Usage

#### Program is Terminated. Because you tried to allocate too many memory regions

OpenBLAS manages a pool of memory buffers, and the number of buffers is
defined as follows.

```
#define NUM_BUFFERS (MAX_CPU_NUMBER * 2)
```

This error indicates that the program exceeded the number of buffers.
Please rebuild OpenBLAS with a larger `NUM_THREADS`, for example
`make NUM_THREADS=32` or `make NUM_THREADS=64`. `Makefile.system` sets
`MAX_CPU_NUMBER=NUM_THREADS`.
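
As a rough illustration of the arithmetic above, the sketch below mirrors the two definitions in plain C. `buffers_for_threads` is a hypothetical helper written for this note, not an OpenBLAS function; it only reproduces `NUM_BUFFERS = MAX_CPU_NUMBER * 2` with `MAX_CPU_NUMBER = NUM_THREADS`.

```c
#include <assert.h>

/* Hypothetical helper (not part of OpenBLAS): mirrors
 *   MAX_CPU_NUMBER = NUM_THREADS        (set in Makefile.system)
 *   NUM_BUFFERS    = MAX_CPU_NUMBER * 2
 * to show how the buffer pool grows with the build-time NUM_THREADS. */
static int buffers_for_threads(int num_threads)
{
    assert(num_threads > 0);
    int max_cpu_number = num_threads;
    return max_cpu_number * 2;
}
```

Under these definitions, `make NUM_THREADS=32` gives a pool of 64 buffers and `make NUM_THREADS=64` gives 128.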

#### How can I use OpenBLAS in multi-threaded applications?

If your application is already multi-threaded, it will conflict with OpenBLAS
multi-threading. Thus, you must set OpenBLAS to use a single thread in any of
the following ways:

* Set `export OPENBLAS_NUM_THREADS=1` in the environment.
* Call `openblas_set_num_threads(1)` in the application at runtime.
* Build the single-threaded version of OpenBLAS, e.g. `make USE_THREAD=0`.

If the application is parallelized by OpenMP, please use OpenBLAS built with
`USE_OPENMP=1`.
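
A multi-threaded application may want to check at startup whether OpenBLAS has been limited to one thread through the environment. The sketch below is one way to do that in standard C. `parse_thread_count` is a hypothetical helper, not an OpenBLAS API; it only interprets the string value of `OPENBLAS_NUM_THREADS` and does not query the library.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper (not part of OpenBLAS): interprets the value of
 * OPENBLAS_NUM_THREADS. Returns the positive thread count, or -1 if the
 * value is NULL (variable unset), empty, non-numeric, or not positive. */
static int parse_thread_count(const char *value)
{
    if (value == NULL || *value == '\0')
        return -1;

    char *end = NULL;
    errno = 0;
    long n = strtol(value, &end, 10);
    assert(end != NULL);  /* strtol always stores the stop position */

    /* Reject overflow, trailing garbage, and non-positive counts. */
    if (errno != 0 || *end != '\0' || n <= 0 || n > INT_MAX)
        return -1;
    return (int)n;
}
```

A caller could pass `getenv("OPENBLAS_NUM_THREADS")` to this helper and, if the result is not 1, fall back to calling `openblas_set_num_threads(1)` as described above.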

#### How to choose TARGET manually at runtime when compiled with DYNAMIC_ARCH

The environment variable that controls the kernel selection is
`OPENBLAS_CORETYPE` (see `driver/others/dynamic.c`), e.g.
`export OPENBLAS_CORETYPE=Haswell`. The function `char* openblas_get_corename()`
returns the target in use.

#### How could I disable OpenBLAS threading affinity at runtime?

You can define the `OPENBLAS_MAIN_FREE` or `GOTOBLAS_MAIN_FREE` environment
variable to disable threading affinity at runtime. For example, before running
the program:

```
export OPENBLAS_MAIN_FREE=1
```

Alternatively, you can disable the affinity feature by enabling `NO_AFFINITY=1`
in `Makefile.rule`.

## Linking with the library

* Link with the shared library

  `gcc -o test test.c -I /your_path/OpenBLAS/include/ -L/your_path/OpenBLAS/lib -lopenblas`

  If the library is multithreaded, please add `-lpthread`. If the library
  contains LAPACK functions, please add `-lgfortran` or other Fortran libs.

* Link with the static library

  `gcc -o test test.c /your/path/libopenblas.a`

You can download `test.c` from https://gist.github.com/xianyi/5780018

On Linux, if OpenBLAS was compiled with threading support (`USE_THREAD=1` by
default), custom programs statically linked against `libopenblas.a` should also
link with the pthread library, e.g.:

```
gcc -static -I/opt/OpenBLAS/include -L/opt/OpenBLAS/lib -o my_program my_program.c -lopenblas -lpthread
```

Failing to add the `-lpthread` flag will cause errors such as:

```
/opt/OpenBLAS/libopenblas.a(memory.o): In function `_touch_memory':
memory.c:(.text+0x15): undefined reference to `pthread_mutex_lock'
memory.c:(.text+0x41): undefined reference to `pthread_mutex_unlock'
...
```

## Code examples

#### Call CBLAS interface

This example shows calling `cblas_dgemm` in C. https://gist.github.com/xianyi/6930656

```
#include <cblas.h>
#include <stdio.h>

int main(void)
{
  int i = 0;
  /* A and B are 3x2 matrices stored in column-major order (lda = ldb = 3). */
  double A[6] = {1.0, 2.0, 1.0, -3.0, 4.0, -1.0};
  double B[6] = {1.0, 2.0, 1.0, -3.0, 4.0, -1.0};
  double C[9] = {.5, .5, .5, .5, .5, .5, .5, .5, .5};

  /* C = 1 * A * B^T + 2 * C, with m = n = 3 and k = 2. */
  cblas_dgemm(CblasColMajor, CblasNoTrans, CblasTrans, 3, 3, 2, 1, A, 3, B, 3, 2, C, 3);

  for (i = 0; i < 9; i++)
    printf("%lf ", C[i]);
  printf("\n");
  return 0;
}
```

`gcc -o test_cblas_open test_cblas_dgemm.c -I /your_path/OpenBLAS/include/ -L/your_path/OpenBLAS/lib -lopenblas -lpthread -lgfortran`
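
To check the output of the `cblas_dgemm` call above without OpenBLAS, the following plain-C sketch computes the same update, C = alpha * A * B^T + beta * C, on column-major arrays. `ref_dgemm_nt` is a hypothetical reference routine written for this note, not part of CBLAS, and it only covers the no-transpose/transpose case used in the example.

```c
#include <assert.h>

/* Hypothetical reference routine (not part of CBLAS): computes
 *   C = alpha * A * B^T + beta * C
 * for column-major A (m x k, leading dimension lda), B (n x k, ldb)
 * and C (m x n, ldc). Element (i, j) of a column-major matrix X with
 * leading dimension ldx is stored at X[i + j * ldx]. */
static void ref_dgemm_nt(int m, int n, int k, double alpha,
                         const double *A, int lda,
                         const double *B, int ldb,
                         double beta, double *C, int ldc)
{
    assert(lda >= m && ldb >= n && ldc >= m);
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < m; i++) {
            double sum = 0.0;
            for (int p = 0; p < k; p++)
                sum += A[i + p * lda] * B[j + p * ldb];  /* (A * B^T)(i, j) */
            C[i + j * ldc] = alpha * sum + beta * C[i + j * ldc];
        }
    }
}
```

With the `A`, `B` and `C` arrays from the example (m = n = 3, k = 2, alpha = 1, beta = 2), this routine fills `C` with 11, -9, 5, -9, 21, -1, 5, -1, 3, which is the result the `cblas_dgemm` call is expected to print.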

#### Call BLAS Fortran interface

This example shows calling the `dgemm` Fortran interface in C. https://gist.github.com/xianyi/5780018

```
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <time.h>

/* Fortran BLAS symbol: every argument is passed by reference. */
extern void dgemm_(char*, char*, int*, int*, int*, double*, double*, int*, double*, int*, double*, double*, int*);

int main(int argc, char* argv[])
{
  int i;
  printf("test!\n");
  if (argc < 4) {
    printf("Input Error\n");
    return 1;
  }

  int m = atoi(argv[1]);
  int n = atoi(argv[2]);
  int k = atoi(argv[3]);
  int sizeofa = m * k;
  int sizeofb = k * n;
  int sizeofc = m * n;
  char ta = 'N';
  char tb = 'N';
  double alpha = 1.2;
  double beta = 0.001;

  struct timeval start, finish;
  double duration;

  double* A = (double*)malloc(sizeof(double) * sizeofa);
  double* B = (double*)malloc(sizeof(double) * sizeofb);
  double* C = (double*)malloc(sizeof(double) * sizeofc);
  if (A == NULL || B == NULL || C == NULL) {
    printf("Allocation Error\n");
    return 1;
  }

  srand((unsigned)time(NULL));

  for (i = 0; i < sizeofa; i++)
    A[i] = i % 3 + 1; //(rand()%100)/10.0;

  for (i = 0; i < sizeofb; i++)
    B[i] = i % 3 + 1; //(rand()%100)/10.0;

  for (i = 0; i < sizeofc; i++)
    C[i] = i % 3 + 1; //(rand()%100)/10.0;

  printf("m=%d,n=%d,k=%d,alpha=%lf,beta=%lf,sizeofc=%d\n", m, n, k, alpha, beta, sizeofc);

  /* Time a single C = alpha * A * B + beta * C call (column-major, no transposes). */
  gettimeofday(&start, NULL);
  dgemm_(&ta, &tb, &m, &n, &k, &alpha, A, &m, B, &k, &beta, C, &m);
  gettimeofday(&finish, NULL);

  duration = ((double)(finish.tv_sec - start.tv_sec) * 1000000 + (double)(finish.tv_usec - start.tv_usec)) / 1000000;

  /* 2 * m * n * k floating-point operations, reported in MFLOPS. */
  double mflops = 2.0 * m * n * k;
  mflops = mflops / duration * 1.0e-6;

  FILE *fp = fopen("timeDGEMM.txt", "a");
  if (fp != NULL) {
    fprintf(fp, "%dx%dx%d\t%lf s\t%lf MFLOPS\n", m, n, k, duration, mflops);
    fclose(fp);
  }

  free(A);
  free(B);
  free(C);
  return 0;
}
```

`gcc -o time_dgemm time_dgemm.c /your/path/libopenblas.a`

`./time_dgemm <m> <n> <k>`
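
The throughput figure written to `timeDGEMM.txt` comes from the usual operation count for a matrix multiply: 2 * m * n * k floating-point operations, one multiply and one add per inner-product term. A minimal sketch of that arithmetic, with `dgemm_mflops` as a hypothetical helper name that does not appear in the gist:

```c
#include <assert.h>

/* Hypothetical helper: converts a DGEMM problem size and elapsed wall time
 * into MFLOPS, counting 2 * m * n * k floating-point operations. */
static double dgemm_mflops(int m, int n, int k, double seconds)
{
    assert(seconds > 0.0);
    double flops = 2.0 * m * n * k;  /* evaluated in double, so no int overflow */
    return flops / seconds / 1.0e6;
}
```

For instance, a 1000x1000x1000 multiply that takes 2 seconds corresponds to 1000 MFLOPS.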

## Troubleshooting

* Please read the [Faq](https://github.com/xianyi/OpenBLAS/wiki/Faq) first.
* Please use gcc version 4.6 or above to compile Sandy Bridge AVX kernels on Linux/MingW/BSD.
* Please use Clang version 3.1 or above to compile the library on the Sandy Bridge microarchitecture. Clang 3.0 generates incorrect AVX binary code.
* The number of CPUs/cores should be less than or equal to 256. On Linux x86_64 (amd64), there is experimental support for up to 1024 CPUs/cores and 128 NUMA nodes if you build the library with `BIGNUMA=1`.
* OpenBLAS does not set processor affinity by default. On Linux, you can enable processor affinity by commenting out the line `NO_AFFINITY=1` in `Makefile.rule`. But this may cause [a conflict with R parallel](https://stat.ethz.ch/pipermail/r-sig-hpc/2012-April/001348.html).
* On Loongson 3A, `make test` fails because of a `pthread_create` error. The error code is `EAGAIN`. However, the same test case runs correctly when started from the shell.

## BLAS reference manual

If you want to understand every BLAS function and definition, please read the
[Intel MKL reference manual](https://software.intel.com/sites/products/documentation/doclib/iss/2013/mkl/mklman/GUID-F7ED9FB8-6663-4F44-A62B-61B63C4F0491.htm)
or [netlib.org](http://netlib.org/blas/).

Here are the [OpenBLAS extension functions](https://github.com/xianyi/OpenBLAS/wiki/OpenBLAS-Extensions).

## How to reference OpenBLAS

You can reference our [papers](https://github.com/xianyi/OpenBLAS/wiki/publications).

Alternatively, you can cite the OpenBLAS homepage http://www.openblas.net directly.