# Submission of jobs
This folder contains a few more examples of how jobs can be submitted in Slurm. Some examples use containers.
Attention: The parameters for job names and partitions probably have to be adjusted!
# Simple Jobs
## submit_job.go
In this example, a simple bash job is submitted. The partition used is *long* (adapt if necessary):
```
job_desc.Partition = "long"
```
The job sets two environment variables and executes
```
hostname
env | grep SLURM
```
on a single node of the cluster (single-task job).
The application does not wait until the job is completed, but returns directly.
The standard output is written to `out-jobid.txt`, the standard error to `err-jobid.txt`:
```
job_desc.Std_out = ("./out-%j.txt")
job_desc.Std_err = ("./err-%j.txt")
```
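For orientation, the pieces of this section could be combined roughly as follows. This is only a sketch: the submit call and the concrete environment variables are assumptions here, not the exact API of the Go Slurm binding; the authoritative version is the code in `submit_job.go`.
```
// Sketch only: SubmitJob and the exported variables are assumed names.
job_desc.Partition = "long"             // adapt to an existing partition
job_desc.Script = "#!/bin/bash\n" +
	"export EXAMPLE_VAR1=hello\n" +     // placeholder for the two environment variables
	"export EXAMPLE_VAR2=world\n" +
	"hostname\n" +
	"env | grep SLURM\n"
job_desc.Std_out = ("./out-%j.txt")     // %j is replaced by the job id
job_desc.Std_err = ("./err-%j.txt")
jobId := SubmitJob(job_desc)            // assumed submit call; returns without waiting
```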
## update_job.go
This example allows updating the QOS and the partition of a job. This can help to move the job to another queue with a different partition.
Note to users: In theory, the API also allows updating the number of nodes and the tasks per node. However, since this is only allowed for root or a Slurm admin, we do not include an example here.
Syntax:
```
./update_job JobId qos partition
```
(Note: This requires that the job with the id JobId has already been submitted and is in a pending state.)
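The following sketch illustrates how the three command-line arguments could be mapped onto the update options. The `UpdateJob` call and the options struct are assumed names (as are the `os` and `strconv` imports); `update_job.go` contains the real implementation.
```
// Sketch only: UpdateJob and the ops struct are assumed names.
jobId, _ := strconv.ParseUint(os.Args[1], 10, 32) // JobId
ops.Qos = os.Args[2]                              // e.g. "shortjobs"
ops.Partition = os.Args[3]                        // e.g. "short"
UpdateJob(uint32(jobId), ops)                     // only succeeds while the job is pending
```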
# Container jobs
The following examples all submit jobs that start Singularity containers.
If the containers do not exist, they are created. However, problems can arise if the user does not have sudo permissions.
## The containers
The first container is an MPI container. It is used by `submit_mpi_containier.go` and `submit_mpi_and_update.go`. The definition is stored in `mpi_container.def`.
It can also be created with the command:
```
sudo singularity build mpi_container.img mpi_container.def
```
The program `mpi_pingpong` (source code enclosed: `mpi_pingpong.c`) is built into the container. It performs a ping-pong test between two processes.
This container uses the hybrid model, which assumes that MPI is installed on the cluster (to start the job) and also installs it in the container itself. It works with OpenMPI.
The second container is an OpenMP container, including a sample OpenMP program `openmp_example` (source code: `openmp_example.c`).
It can also be created with the command:
```
sudo singularity build openmp_container.img openmp_container.def
```
This container is used by `submit_openmp_container.go`.
## submit_mpi_containier.go
Submits an MPI container job to the cluster. It runs two processes on two nodes:
```
job_desc.Min_nodes = uint32(2)
job_desc.Num_tasks = uint32(2)
```
The application blocks until the job is completed. The standard output is written to `jobid-out.txt`, the standard error to `jobid-err.txt`:
```
job_desc.Std_out = ("./%j-out.txt")
job_desc.Std_err = ("./%j-err.txt")
```
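For illustration, the job script could start the ping-pong test inside the container roughly like this. The launcher and the location of `mpi_pingpong` inside the image are assumptions; see `submit_mpi_containier.go` and `mpi_container.def` for the real command line.
```
// Sketch only: launcher and binary path are assumptions (hybrid MPI model).
job_desc.Script = "#!/bin/bash\n" +
	"srun singularity exec mpi_container.img mpi_pingpong\n"
```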
## submit_omp_container.go
Submits two OpenMP jobs to the cluster and waits until they are completed.
Both jobs allocate *one process* for the job, but *two CPUs per task/process* (for multi-threading):
```
job_desc.Num_tasks = uint32(1)
job_desc.Cpus_per_task = uint16(2)
```
The first job reads the environment variable `SLURM_JOB_CPUS_PER_NODE` and sets the number of OpenMP threads to exactly the number of CPUs that are available per task/process:
```
job_desc.Script += "export OMP_NUM_THREADS=$SLURM_JOB_CPUS_PER_NODE\n"
```
The second job sets the number of threads to 4 (which is oversubscribing, because more threads are started than CPUs are allocated) and executes the same job:
```
job_desc.Script += "export OMP_NUM_THREADS=4\n"
```
The program waits until both jobs are completed. The results are written to the two output files, similar to `submit_mpi_containier.go`.
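Putting the snippets together, the relevant part of the first OpenMP job could look like this sketch; the location of `openmp_example` inside the image is an assumption (see `submit_omp_container.go` for the real script).
```
// Sketch only: the binary location inside the container is an assumption.
job_desc.Num_tasks = uint32(1)
job_desc.Cpus_per_task = uint16(2)
job_desc.Script = "#!/bin/bash\n"
job_desc.Script += "export OMP_NUM_THREADS=$SLURM_JOB_CPUS_PER_NODE\n"
job_desc.Script += "singularity exec openmp_container.img openmp_example\n"
```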
## submit_mpi_and_update.go
This application does the same as `submit_mpi_containier.go`, but additionally updates the QOS and the partition of the job while it is still pending:
```
ops.Qos = "shortjobs"
ops.Partition = "short"
```
This situation can, for example, be created by first submitting other, longer jobs in the background (depending on the partition size) and then starting this application:
```
./submit_mpi_containier & ./submit_mpi_containier & ./submit_mpi_and_update
```
