GotoBLAS2 FAQ
1. General
1.1 Q Where can I find useful papers about GotoBLAS2?
    A Check the following URL.
      http://www.cs.utexas.edu/users/flame/Publications/index.htm
      11. Kazushige Goto and Robert A. van de Geijn, "Anatomy of
          High-Performance Matrix Multiplication," ACM Transactions on
          Mathematical Software, accepted.
      15. Kazushige Goto and Robert van de Geijn, "High-Performance
          Implementation of the Level-3 BLAS," ACM Transactions on
          Mathematical Software, submitted.
1.2 Q Does GotoBLAS2 work with Hyper-Threading (SMT)?
    A Yes, it will work. GotoBLAS2 detects Hyper-Threading and
      avoids scheduling threads on the same core.
1.3 Q When I type "make", the following error occurs. What's wrong?
      $shell> make
      "./Makefile.rule", line 58: Missing dependency operator
      "./Makefile.rule", line 61: Need an operator
      ...
    A This error occurs because you are not using GNU make. Some binary
      packages install GNU make as "gmake", so it is worth trying that
      command instead.
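      One way to check which command is GNU make before building is
      sketched below. The helper name find_gnu_make is hypothetical and
      not part of GotoBLAS2; the sketch only assumes that GNU make
      prints "GNU Make" in its --version output.

```shell
# Hypothetical helper: print the name of the first command that
# identifies itself as GNU make. "gmake" is tried first because some
# systems install GNU make under that name next to a non-GNU "make".
find_gnu_make() {
    for candidate in gmake make; do
        if command -v "$candidate" >/dev/null 2>&1 &&
           "$candidate" --version 2>/dev/null | grep -q 'GNU Make'; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Usage (not run here):
#   MAKE_CMD=$(find_gnu_make) && "$MAKE_CMD"
```

      If the function prints nothing and returns non-zero, no GNU make
      was found under either name.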
1.4 Q Function "xxx" is slow. Why?
    A GotoBLAS2 has many well-optimized functions, but it is far
      from perfect. Level 1 and Level 2 function performance in
      particular depends on how you call BLAS. You should understand
      what happens between your function and GotoBLAS2 by using the
      profile-enabled version or hardware performance counters. Again,
      please don't regard GotoBLAS2 as a black box.
1.5 Q I have a commercial C compiler and want to compile GotoBLAS2 with
      it. Is that possible?
    A All functions that affect performance are written in assembler;
      C code is used only for wrappers around the assembler functions
      and for complicated functions. I also use many inline assembler
      functions, and unfortunately most commercial compilers can't
      handle inline assembler. Therefore you should use gcc.
1.6 Q I use an OpenMP compiler. How can I use GotoBLAS2 with it?
    A Please understand that OpenMP is a compromise as a way of using
      threads. If you want to use OpenMP-based code with GotoBLAS2,
      you should enable "USE_OPENMP=1" in Makefile.rule.
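      A minimal sketch of enabling the option from the shell follows.
      It assumes that Makefile.rule carries the setting as a
      commented-out line such as "# USE_OPENMP = 1"; check your copy
      first, since this layout is an assumption. The helper name
      enable_openmp is hypothetical.

```shell
# Hypothetical helper: uncomment the USE_OPENMP line in a
# Makefile.rule-style file. Assumes the option is present as a
# commented-out line such as "# USE_OPENMP = 1"; other lines are
# left untouched.
enable_openmp() {
    rule_file=$1
    tmp_file="$rule_file.tmp"
    sed 's/^#[[:space:]]*USE_OPENMP[[:space:]]*=.*$/USE_OPENMP = 1/' \
        "$rule_file" > "$tmp_file" &&
        mv "$tmp_file" "$rule_file"
}

# Usage (not run here):
#   enable_openmp Makefile.rule && gmake
```

      Editing the file by hand has the same effect; the function only
      makes the change repeatable.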
1.7 Q Could you tell me how to use the profiled library?
    A You need to build and link your application with the -pg
      option. After you run your application, "gmon.out" is
      generated in your current directory.
      $shell> gprof <your application name> gmon.out
      Each sample counts as 0.01 seconds.
        %   cumulative   self              self     total
       time   seconds   seconds    calls  Ks/call  Ks/call  name
      89.86    975.02    975.02    79317     0.00     0.00  .dgemm_kernel
       4.19   1020.47     45.45       40     0.00     0.00  .dlaswp00N
       2.28   1045.16     24.69     2539     0.00     0.00  .dtrsm_kernel_LT
       1.19   1058.03     12.87    79317     0.00     0.00  .dgemm_otcopy
       1.05   1069.40     11.37     4999     0.00     0.00  .dgemm_oncopy
      ....
      I think a profiled BLAS library is really useful for your
      research. Please find the bottleneck of your application and
      improve it.
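      To pick the bottleneck candidates out of a long flat profile, a
      small filter such as the following may help. The helper name
      hot_functions is hypothetical. It assumes gprof's flat-profile
      layout shown above, where "% time" is the first column and the
      function name is the last.

```shell
# Hypothetical helper: read a gprof flat profile on stdin and print
# "<% time> <name>" for every function at or above the given
# percentage. Data rows are recognized by a numeric first field; the
# name is taken from the last field, so names containing spaces are
# not handled.
hot_functions() {
    awk -v min="$1" \
        '$1 ~ /^[0-9]+(\.[0-9]+)?$/ && $1 + 0 >= min + 0 { print $1, $NF }'
}

# Usage (not run here); -b and -p ask gprof for a brief flat profile only:
#   gprof -b -p <your application name> gmon.out | hot_functions 2
```

      With the profile above and a threshold of 2, this would list
      .dgemm_kernel, .dlaswp00N and .dtrsm_kernel_LT.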
1.8 Q Is the number of threads limited?
    A Basically, there is no limit on the number of threads. You can
      specify as many threads as you want, but a larger number of
      threads will consume extra resources. I recommend specifying
      the minimum number of threads you need.
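      One way to keep the thread count small is to cap a requested
      value at the number of online CPUs, as sketched below. The helper
      name cap_threads is hypothetical. This FAQ does not say how the
      thread count is passed to the library; the usage line assumes
      the GOTO_NUM_THREADS environment variable, so verify that
      against your GotoBLAS2 installation notes.

```shell
# Hypothetical helper: cap a requested thread count at the number of
# online CPUs reported by getconf. Falls back to 1 if the CPU count
# cannot be determined.
cap_threads() {
    requested=$1
    online=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    case "$online" in
        ''|*[!0-9]*) online=1 ;;
    esac
    if [ "$requested" -gt "$online" ]; then
        echo "$online"
    else
        echo "$requested"
    fi
}

# Usage (not run here); GOTO_NUM_THREADS is an assumption, see above:
#   GOTO_NUM_THREADS=$(cap_threads 8) ./your_app
```

      The cap is only an upper bound; a smaller count may still be the
      better choice for small problems.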
1.9 Q I get segfaults when I compile with USE_OPENMP=1. What's wrong?
    A This may be related to a bug in the Linux kernel 2.6.32. Try
      applying the patch segfaults.patch using
      $shell> patch < segfaults.patch
      and see if the crashes persist. Note that this patch will lead
      to many compiler warnings.
2. Architecture-Specific Issues and Implementation
2.1 Q GotoBLAS2 seems to support any combination of OS and
      architecture. Is that possible?
    A The combinations are limited to those that currently exist.
      For example, the combination of OS X with SPARC is impossible.
      But support will be possible with slight modification if such a
      combination appears.
2.2 Q I have POWER architecture systems. Do I need extra work?
    A Although the POWER architecture defines a special instruction,
      like CPUID, to detect the exact architecture, it is privileged
      and can't be accessed by a user process. So you have to set
      your architecture manually in getarch.c.
2.3 Q I can't create a DLL on Cygwin (Error 53). What's wrong?
    A Make sure that lib.exe and mspdb80.dll from Microsoft Visual
      Studio are on your PATH. The easiest way to check is to use the
      'which' command.
      $shell> which lib.exe
      /cygdrive/c/Program Files/Microsoft Visual Studio/VC98/bin/lib.exe
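      Because 'which' may only report files it considers executable, a
      plain PATH search can be used to check both files the same way.
      This is a sketch; the helper name find_on_path is hypothetical.

```shell
# Hypothetical helper: print the first PATH entry that contains a
# file with the given name, whether or not it is executable.
# An empty PATH entry is treated as the current directory.
find_on_path() {
    name=$1
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        [ -n "$dir" ] || dir=.
        if [ -e "$dir/$name" ]; then
            IFS=$old_ifs
            echo "$dir/$name"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Usage (not run here):
#   for tool in lib.exe mspdb80.dll; do
#       find_on_path "$tool" || echo "missing: $tool"
#   done
```

      Any file reported as missing points to a directory that still
      has to be added to PATH.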