SLURM Partitions
Dispatching
It is no longer necessary to discuss queues in the traditional sense. In the past, queues were created around pools of hardware resources: a user who wanted a particular resource would request the corresponding queue. In practice, however, what a user requests and what is best for that user, or for all users, are often not the same. Allowing individuals to dictate where their jobs run inevitably leads to throughput problems, since users cannot reasonably be expected to understand the complete state and behavior of the scheduler.
Below is a general description of how jobs make their way through the queue. Please see Scheduling and Dispatch Policy for more information.
When a user submits a job to a specific partition, the scheduler determines whether the job's requested hardware and time requirements (see Using Features) match the resources the partition provides. If they do, the job runs as soon as resources are available. If no resources are available, the job is held until the next scheduler iteration, when resource availability is re-evaluated.
“Available resources” include processors and memory. Processors generally correspond to the number of slots in a given partition, while memory is tracked as a consumable value that may not be obvious to query. If your job is waiting in the pending (PD) state, it is likely that either the cores or the memory requested exceed what the system can provide at that particular point in time.
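As a sketch of how these requests are expressed (the job name, program, and resource values below are illustrative; substitute your own, and partition names from the tables that follow), a minimal SLURM batch script might look like:

```shell
#!/bin/bash
#SBATCH --job-name=example     # job name shown in squeue
#SBATCH --partition=circe      # target partition (the default if omitted)
#SBATCH --nodes=1              # number of nodes
#SBATCH --ntasks=4             # total tasks (the "slots" being requested)
#SBATCH --mem=8G               # memory per node
#SBATCH --time=02:00:00        # walltime limit (HH:MM:SS)

srun ./my_program              # my_program is a placeholder for your executable
```

If the job stays pending, `squeue -u $USER` shows its state (PD) and a Reason column (e.g. `Resources` or `Priority`) indicating why it has not started.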
CIRCE Partitions
CIRCE Node Sets
The following node sets are available:
| Memory | CPU | Cores | Interconnect | Nodes | Slots | GPUs | Constraint Flags | Location |
|---|---|---|---|---|---|---|---|---|
| 24GB | Xeon E5-2630 | 12 | 4x QDR IB | 28 | 336 | n/a | ib_qdr, ib_psm, sse41, sse42, avx, cpu_xeon, xeon_E52630 | Tampa |
| 32GB | Xeon E5-2670 | 16 | 4x QDR IB | 19 | 304 | 37 | ib_qdr, ib_psm, sse4, sse41, sse42, avx, cpu_xeon, xeon_E52670, gpu_K20 | Tampa |
| 32GB | Xeon E5-2670 | 16 | 4x QDR IB | 84 | 1344 | n/a | ib_qdr, ib_psm, sse4, sse41, sse42, avx, cpu_xeon, xeon_E52670 | Tampa |
| 512GB | Xeon E5-2650 v3 | 20 | 4x QDR IB | 3 | 60 | n/a | ib_qdr, ib_psm, tpa, sse4_1, sse4_2, avx, avx2, gpfs, cpu_xeon, xeon_E52650, mem_512G | Tampa |
| 64GB | Xeon E5-2650 v4 | 24 | 4x QDR IB | 108 | 2592 | n/a | ib_qdr, ib_psm, tpa, sse4_1, sse4_2, avx, avx2, gpfs, cpu_xeon, xeon_E52650 | Tampa |
| 192GB | Xeon Silver 4114 | 20 | 100G Omni-Path | 57 | 1140 | 57 | ib_opa, ib_psm, tpa, sse4_1, sse4_2, avx, avx2, gpfs, cpu_xeon, silver_4114, gpu_gtx1080ti | Tampa |
| 192GB | Xeon Silver 4114 | 20 | 100G Omni-Path | 39 | 780 | 49 | ib_opa, ib_psm, tpa, sse4_1, sse4_2, avx, avx2, gpfs, cpu_xeon, silver_4114, gpu_gtx1070ti | Tampa |
| 96GB | Xeon Silver 4114 | 20 | 100G Omni-Path | 10 | 200 | 6 | ib_opa, ib_psm, tpa, sse4_1, sse4_2, avx, avx2, gpfs, cpu_xeon, silver_4114, gpu_gtx1080ti | Tampa |
| 96GB | Xeon Silver 4114 | 20 | 100G Omni-Path | 2 | 40 | 2 | ib_opa, ib_psm, tpa, sse4_1, sse4_2, avx, avx2, gpfs, cpu_xeon, silver_4114, gpu_titanv100 | Tampa |
| 96GB | Xeon Gold 6136 | 20 | 100G Omni-Path | 8 | 160 | 24 | ib_opa, ib_psm, tpa, sse4_1, sse4_2, avx, avx2, gpfs, cpu_xeon, gold_6136, gpu_gtx1080ti | Tampa |
| 96GB | Xeon Gold 6136 | 20 | 100G Omni-Path | 20 | 400 | 60 (480) | ib_opa, ib_psm, tpa, sse4_1, sse4_2, avx, avx2, gpfs, cpu_xeon, gold_6136, gpu_gtx1080ti | Tampa |
| **Total** | | | | 378 | 7356 | 235 (655) | | |
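The Constraint Flags column maps onto SLURM's `--constraint` option. As an illustrative sketch (flag names taken from the table above; the exact GRES name for requesting a GPU is site-dependent), a job script targeting a GTX 1080 Ti node with AVX2 support might include:

```shell
#!/bin/bash
#SBATCH --partition=cuda                   # GPU partition (see layout table below)
#SBATCH --constraint="gpu_gtx1080ti&avx2"  # require both features on the same node
#SBATCH --gres=gpu:1                       # request one GPU
#SBATCH --time=04:00:00                    # walltime limit

srun ./gpu_program                         # gpu_program is a placeholder executable
```

The `&` operator in `--constraint` requires all listed features; `|` would accept any one of them.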
CIRCE Partition Layout
The node sets are associated with the following partitions:
| Queue Name | Max Runtime | QOS Required | Description (Preempt Grace Period) | Operating System | $WORK file system path | Notes |
|---|---|---|---|---|---|---|
| bgfsqdr | Per QOS | el7, el7_cms | EL7 application testing | RHEL 7.4 | /work_bgfs | To request access, email rc-help@usf.edu |
| circe | 1 week | none | default general-purpose queue | RHEL 6.8 | /work | The default partition if no partition is specified |
| cms2016 | 1 week | cms16 | CMS nodes | RHEL 6.8 | /work | |
| cuda | 1 week | none | CUDA GPU nodes | RHEL 6.8 | /work | |
| devel | Per QOS | devel, trial | development partition | RHEL 6.8 | /work | max: 24 cores and 2 nodes per user |
| mri2016 | 2 days | mri16, mri16_npi, preempt | MRI nodes (2 hour grace period) | RHEL 6.8 | /work | See Preemption Guidelines for more info |
| henderson_itn18 | Per QOS | hen18 | Chemical Engineering GPU nodes (2 hour grace period) | RHEL 7.4 | /work_bgfs | 3 GPUs (GTX 1080 Ti) per node |
| himem | 1 week | memaccess | large memory job queue (>= 64 GB) | RHEL 6.8 | /work | To request access, email rc-help@usf.edu |
| rra | 1 week | rra | Genomics Center/Restricted Research | RHEL 7.4 | /work | HIPAA certification required & audited; BeeGFS file system |
| simmons_itn18 | Per QOS | sim18 | Chemical Engineering GPU nodes (2 hour grace period) | RHEL 7.4 | /work_bgfs | 24 GPUs (GTX 1080 Ti) per node (oversubscribed) |
| snsm_itn19 | Per QOS | snsm19, snsm19_long | SNSM grant nodes | RHEL 7.4 | /work_bgfs | 1 GPU (GTX 1070 Ti) per node |
- Note: For jobs requiring longer than 1 week to run, please email rc-help@usf.edu with your project details (hardware/runtime requested, duration of project, etc.).
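Partitions whose QOS column is not "none" require a matching `--qos` flag at submission. As a sketch (assuming access to the devel QOS has been granted, and `job.sh` is a placeholder for your batch script), a short development job could be submitted as:

```shell
# Submit to the devel partition under the devel QOS, within its
# per-user limits (max 24 cores and 2 nodes per user):
sbatch --partition=devel --qos=devel --ntasks=4 --time=01:00:00 job.sh
```

Submitting to a QOS-restricted partition without the matching QOS (or without having been granted it) will cause the job to be rejected or held.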
SC Partitions
SC Node Sets
The following node sets are available:
| Memory | CPU | Cores | Interconnect | Nodes | Slots | GPUs | Constraint Flags | Location |
|---|---|---|---|---|---|---|---|---|
| 64GB | Xeon E5-2650 v4 | 24 | 4x QDR IB | 10 | 240 | n/a | ib_qdr, ib_psm, avx, avx2, sse4_1, sse4_2, gpfs, cpu_xeon, xeon_E52650 | Tampa |
| 48GB | Xeon E5649 | 12 | 4x QDR IB | 7 | 84 | 7 | ib_qdr, ib_psm, avx, avx2, sse4_1, sse4_2, gpfs, cpu_xeon, xeon_E5649 | Tampa |
| **Total** | | | | 17 | 324 | 7 | | |
SC Partition Layout
The node sets are associated with the following partitions:
| Queue Name | Max Runtime | QOS Required | Description | Notes |
|---|---|---|---|---|
| sc | 2 days | none | default general-purpose queue | The default partition if no partition is specified |
| cuda | 2 days | none | CUDA GPU nodes | |