Revision as of 19:55, 20 February 2017

SLURM Partitions

Dispatching

It is no longer necessary to discuss queues in the traditional sense. In the past, we created queues based on pools of hardware resources, and a user who wanted a particular hardware resource would request the appropriate queue. Often, however, what a user wants, what is best for that user, and what is best for all users are not the same. Allowing individuals to dictate where their jobs run inevitably leads to throughput problems, since it is unreasonable to expect users to understand the complete state and behavior of the scheduler.

Below is a general description of how jobs make their way through the queue. Please see Scheduling and Dispatch Policy for more information.

When a user submits a job to a specific partition, the scheduler determines whether the job's requested hardware and time requirements (see Using Features) match the resources the partition provides. If they do, the job is executed as soon as resources are available. If no resources are available, the job is held until the next scheduler iteration, when the scheduler checks again.

“Available resources” include processors and memory. Processors generally correspond to the number of slots in a given partition, while memory is defined as a complex value that may not be obvious to query. If your job is waiting in the pending state (PD in squeue output; qw in Grid Engine terminology), it is likely that either the slots requested or the memory requested exceed what the system can provide at that point in time.
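
The dispatch decision described above can be sketched as a small model. This is a simplified illustration under stated assumptions, not SLURM's actual scheduling algorithm: the `Partition` and `Job` classes, their fields, and the `dispatch` function are hypothetical names, and the `no_match` outcome is an assumption because the text does not say what happens when a job's requirements do not match the partition.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Hypothetical view of a partition for one scheduler iteration."""
    name: str
    max_runtime_hours: float              # float("inf") means no limit
    features: set = field(default_factory=set)
    free_cores: int = 0                   # processors (slots) currently free
    free_mem_gb: int = 0                  # memory currently free


@dataclass
class Job:
    """Hypothetical job request."""
    cores: int
    mem_gb: int
    runtime_hours: float
    features: set = field(default_factory=set)


def dispatch(job, partition):
    """Return 'no_match', 'running', or 'pending' for one scheduler pass."""
    # Step 1: do the hardware/time requirements match what the partition provides?
    if job.runtime_hours > partition.max_runtime_hours:
        return "no_match"
    if not job.features <= partition.features:
        return "no_match"
    # Step 2: are enough processors and memory available right now?
    if job.cores <= partition.free_cores and job.mem_gb <= partition.free_mem_gb:
        return "running"
    # Step 3: otherwise the job is held until the next scheduler iteration.
    return "pending"


himem = Partition("himem", max_runtime_hours=168, features={"mem_512G"},
                  free_cores=20, free_mem_gb=512)
print(dispatch(Job(cores=40, mem_gb=256, runtime_hours=24), himem))  # pending
```

In this model a job stays pending when either the requested cores or the requested memory exceed what is free, which mirrors the two causes named in the paragraph above.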

CIRCE Partition Node Sets

The following node sets are available:

Memory | CPU | Cores | Interconnect | Nodes | Slots | GPUs | Complex Flags | Location
24GB | Opteron 2384 | 12 | 4x DDR IB | 32 | 384 | n/a | ib_ddr, ib_psm, tpa, sse4, sse4a, cpu_amd, opteron_2384 | Tampa
24GB | Xeon E5649 | 12 | 4x QDR IB | 107 | 1284 | 8 | ib_qdr, ib_psm, sse4, sse41, sse42, cpu_xeon, xeon_E5649 | Tampa
24GB | Xeon E5-2630 | 12 | 4x QDR IB | 67 | 804 | n/a | ib_qdr, ib_psm, sse41, sse42, avx, cpu_xeon, xeon_E52630 | Tampa
24GB | Xeon E5649 | 12 | 4x QDR IB | 14 | 168 | n/a | ib_qdr, ib_ofa, sse41, sse42, avx, cpu_xeon, xeon_E5649 | Tampa
32GB | Xeon E5-2670 | 16 | 4x QDR IB | 129 | 2064 | 40 | ib_qdr, ib_psm, sse4, sse41, sse42, avx, cpu_xeon, xeon_E52670, gpu_K20 | Tampa
512GB | Xeon E5-2650 | 20 | 4x QDR IB | 3 | 60 | n/a | ib_qdr, ib_psm, tpa, sse4_1, sse4_2, avx, avx2, gpfs, cpu_xeon, xeon_E52650, mem_512G | Tampa
Total | | | | | 4764 | 48 | |
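
The Slots column in the table above is the product of Cores and Nodes for each node set, and the Total row sums the Slots and GPUs columns. The short check below reproduces those totals from the table's figures; the `NODE_SETS` list and the `totals` function are illustrative names, and "n/a" GPU entries are treated as 0.

```python
# (CPU, cores per node, nodes, GPUs) transcribed from the node set table;
# "n/a" in the GPUs column is recorded as 0.
NODE_SETS = [
    ("Opteron 2384", 12, 32, 0),
    ("Xeon E5649", 12, 107, 8),
    ("Xeon E5-2630", 12, 67, 0),
    ("Xeon E5649", 12, 14, 0),
    ("Xeon E5-2670", 16, 129, 40),
    ("Xeon E5-2650", 20, 3, 0),
]


def totals(node_sets):
    """Return (total slots, total GPUs), where slots = cores * nodes."""
    slots = sum(cores * nodes for _, cores, nodes, _ in node_sets)
    gpus = sum(gpu_count for _, _, _, gpu_count in node_sets)
    return slots, gpus


print(totals(NODE_SETS))  # (4764, 48)
```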

CIRCE Partition Layout

The node sets are associated with the following queues:

Queue Name | Max Runtime | QOS Required | Description (Preempt Grace Period) | Notes
circe | infinite | none | default general-purpose queue | The default partition if no partition is specified
gpfsgpu | infinite | none | CUDA GPU nodes |
cuda | infinite | none | CUDA GPU nodes |
cms2016 | infinite | cms16, preempt | CMS nodes (2 hour grace period) | *See Preemption Guidelines for more info
mri2016 | infinite | mri16, preempt | MRI nodes (2 hour grace period) | *See Preemption Guidelines for more info
himem | 1 week | memaccess | large memory job queue (>= 24 GB) | To request access, email rc-help@usf.edu
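
As a rough illustration of how the QOS requirements in the CIRCE table translate into a submission, the sketch below builds an `sbatch` argument list using the standard `--partition` and `--qos` options. It rests on assumptions: the helper `sbatch_command` and the script name `job.sh` are hypothetical, and where the table lists two QOS values the sketch assumes that either one is acceptable, which the table does not confirm.

```python
# Required QOS values per CIRCE partition, transcribed from the table.
# An empty set means no QOS is required ("none").
REQUIRED_QOS = {
    "circe": set(),
    "gpfsgpu": set(),
    "cuda": set(),
    "cms2016": {"cms16", "preempt"},
    "mri2016": {"mri16", "preempt"},
    "himem": {"memaccess"},
}


def sbatch_command(script, partition="circe", qos=None):
    """Build an sbatch argument list, checking the QOS against the table."""
    if partition not in REQUIRED_QOS:
        raise ValueError(f"unknown partition: {partition}")
    allowed = REQUIRED_QOS[partition]
    # Assumption: any single QOS listed for the partition satisfies the requirement.
    if allowed and qos not in allowed:
        raise ValueError(f"{partition} requires one of: {sorted(allowed)}")
    cmd = ["sbatch", f"--partition={partition}"]
    if qos:
        cmd.append(f"--qos={qos}")
    cmd.append(script)
    return cmd


print(" ".join(sbatch_command("job.sh", partition="himem", qos="memaccess")))
# sbatch --partition=himem --qos=memaccess job.sh
```

The function only assembles the command; it does not run `sbatch`, so it can be tried on a machine without SLURM installed.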

SC Partition Layout

The node sets are associated with the following queues:

Queue Name | Max Runtime | QOS Required | Description | Notes
sc | 2 days | none | default general-purpose queue | The default partition if no partition is specified
cuda | 2 days | none | CUDA GPU nodes |
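
Both SC partitions cap runtime at 2 days, so a time request can be checked against that cap before submitting. The sketch below parses the time formats that `sbatch --time` accepts (minutes, minutes:seconds, hours:minutes:seconds, days-hours, days-hours:minutes, days-hours:minutes:seconds) and compares the result with the limit. The function names and the `SC_MAX_SECONDS` constant are illustrative, and the assumption that a request exactly equal to the limit is accepted is not confirmed by the table.

```python
def slurm_time_to_seconds(spec):
    """Convert a SLURM time string such as '1-12:00:00' or '90' to seconds."""
    days = 0
    if "-" in spec:
        # With a day count, the remaining fields are hours[:minutes[:seconds]].
        day_part, rest = spec.split("-", 1)
        days = int(day_part)
        fields = [int(x) for x in rest.split(":")]
        fields += [0] * (3 - len(fields))
        hours, minutes, seconds = fields
    else:
        fields = [int(x) for x in spec.split(":")]
        if len(fields) == 1:        # minutes
            hours, minutes, seconds = 0, fields[0], 0
        elif len(fields) == 2:      # minutes:seconds
            hours, (minutes, seconds) = 0, fields
        elif len(fields) == 3:      # hours:minutes:seconds
            hours, minutes, seconds = fields
        else:
            raise ValueError(f"unrecognized time format: {spec}")
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


SC_MAX_SECONDS = 2 * 24 * 3600  # "2 days" from the SC table


def fits_sc_limit(spec):
    """True if the requested time does not exceed the SC 2-day cap."""
    return slurm_time_to_seconds(spec) <= SC_MAX_SECONDS


print(fits_sc_limit("1-12:00:00"))  # True
print(fits_sc_limit("3-00"))        # False
```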