SLURM Partitions

Dispatching

It is no longer necessary to discuss queues in the traditional sense. In the past, we created queues based on pools of hardware resources: if a user wanted to use a particular hardware resource, he or she would request the corresponding queue. In most cases, however, what the user requests and what is best for that user, or for all users, are not the same. Allowing individuals to dictate where their jobs run inevitably leads to throughput problems, since users cannot reasonably be expected to understand the complete state and behavior of the scheduler.

Below is a general description of how jobs make their way through the queue. Please see Scheduling and Dispatch Policy for more information.

When a user submits a job to a specific partition, the scheduler determines whether the job's requested hardware and time requirements (see Using Features) match the resources the partition provides. If they do, the job runs as soon as resources are available. If no resources are available, the job is held until the next scheduler iteration, which checks again whether resources have become available.
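
For example, a minimal batch script that names a partition and states its hardware and time requirements might look like the following sketch (the job name, partition, core count, memory value, and payload are placeholders; adjust them to your own job):

    #!/bin/bash
    #SBATCH --job-name=example_job      # name shown in squeue output
    #SBATCH --partition=circe           # partition (queue) to submit to
    #SBATCH --time=02:00:00             # requested wall time (hh:mm:ss)
    #SBATCH --ntasks=16                 # number of tasks (cores) requested
    #SBATCH --mem-per-cpu=2048          # memory per core, in MB

    # Illustrative payload; replace with your own application
    srun hostname

The script is then handed to the scheduler with "sbatch example_job.sh".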

“Available resources” include processors and memory. Processor counts generally correspond to the slot counts listed for each partition, while available memory is tracked per node and is less obvious to query. If your job is sitting in the pending (PD) state, it is likely that either the cores or the memory requested exceed what the system can provide at that particular point in time.
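
If you want to see why a particular job is still waiting, SLURM can report the scheduler's reason directly. A short sketch, where 123456 stands in for your actual job ID:

    # Job ID, partition, state, and the scheduler's reason for holding it
    squeue -j 123456 -o "%.10i %.9P %.2t %.20r"

    # Full detail, including the requested cores, memory, and features
    scontrol show job 123456

Reasons such as Resources or Priority indicate the job is simply waiting its turn; QOS or association limits point at policy rather than hardware.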

CIRCE Partitions

CIRCE Node Sets

The following node sets are available:

Memory | CPU | Cores | Interconnect | Nodes | Slots | GPUs | Constraint Flags | Location
24GB | Xeon E5-2630 | 12 | 4x QDR IB | 28 | 336 | n/a | ib_qdr, ib_psm, sse41, sse42, avx, cpu_xeon, xeon_E52630 | Tampa
32GB | Xeon E5-2670 | 16 | 4x QDR IB | 19 | 304 | 37 | ib_qdr, ib_psm, sse4, sse41, sse42, avx, cpu_xeon, xeon_E52670, gpu_K20 | Tampa
32GB | Xeon E5-2670 | 16 | 4x QDR IB | 84 | 1344 | n/a | ib_qdr, ib_psm, sse4, sse41, sse42, avx, cpu_xeon, xeon_E52670 | Tampa
512GB | Xeon E5-2650 v3 | 20 | 4x QDR IB | 3 | 60 | n/a | ib_qdr, ib_psm, tpa, sse4_1, sse4_2, avx, avx2, gpfs, cpu_xeon, xeon_E52650, mem_512G | Tampa
64GB | Xeon E5-2650 v4 | 24 | 4x QDR IB | 108 | 2592 | n/a | ib_qdr, ib_psm, tpa, sse4_1, sse4_2, avx, avx2, gpfs, cpu_xeon, xeon_E52650 | Tampa
192GB | Xeon Silver 4114 | 20 | 100G Omni-Path | 57 | 1140 | 57 | ib_opa, ib_psm, tpa, sse4_1, sse4_2, avx, avx2, gpfs, cpu_xeon, silver_4114, gpu_gtx1080ti | Tampa
192GB | Xeon Silver 4114 | 20 | 100G Omni-Path | 39 | 780 | 49 | ib_opa, ib_psm, tpa, sse4_1, sse4_2, avx, avx2, gpfs, cpu_xeon, silver_4114, gpu_gtx1070ti | Tampa
96GB | Xeon Silver 4114 | 20 | 100G Omni-Path | 10 | 200 | 6 | ib_opa, ib_psm, tpa, sse4_1, sse4_2, avx, avx2, gpfs, cpu_xeon, silver_4114, gpu_gtx1080ti | Tampa
96GB | Xeon Silver 4114 | 20 | 100G Omni-Path | 2 | 40 | 2 | ib_opa, ib_psm, tpa, sse4_1, sse4_2, avx, avx2, gpfs, cpu_xeon, silver_4114, gpu_titanv100 | Tampa
96GB | Xeon Gold 6136 | 20 | 100G Omni-Path | 8 | 160 | 24 | ib_opa, ib_psm, tpa, sse4_1, sse4_2, avx, avx2, gpfs, cpu_xeon, gold_6136, gpu_gtx1080ti | Tampa
96GB | Xeon Gold 6136 | 20 | 100G Omni-Path | 20 | 400 | 60 (480) | ib_opa, ib_psm, tpa, sse4_1, sse4_2, avx, avx2, gpfs, cpu_xeon, gold_6136, gpu_gtx1080ti | Tampa
Total | | | | 378 | 7356 | 235 (655) | |
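
The constraint flags in the table are the feature names accepted by SLURM's --constraint option (see Using Features). A hedged sketch of how they might be combined at submission time; job.sh and the chosen flags are only illustrative:

    # Run only on nodes that advertise both the Omni-Path and GPFS feature flags
    sbatch --constraint="ib_opa&gpfs" job.sh

    # Ask for a GTX 1080 Ti node and a single GPU on it
    sbatch --constraint=gpu_gtx1080ti --gres=gpu:1 job.sh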

CIRCE Partition Layout

The node sets are associated with the following queues:

Queue Name | Max Runtime | QOS Required | Description (Preempt Grace Period) | Operating System | $WORK File System Path | Notes
bgfsqdr | Per QOS | el7, el7_cms | EL7 application testing | RHEL 7.4 | /work_bgfs | To request access, email rc-help@usf.edu
circe | 1 week | none | default general-purpose queue | RHEL 6.8 | /work | The default partition if no partition is specified
cms2016 | 1 week | cms16 | CMS nodes | RHEL 6.8 | /work | Limited to College of Marine Science only
cuda | 1 week | none | Pseudo partition for access to Kepler K20s | RHEL 6.8 | /work | CUDA GPU nodes
devel | Per QOS | devel, trial | development partition | RHEL 6.8 | /work | Max 24 cores and 2 nodes per user
mri2016 | 2 days | mri16, mri16_npi, preempt | MRI nodes (2 hour grace period) | RHEL 6.8 | /work | See Preemption Guidelines for more info
henderson_itn18 | Per QOS | hen18 | Chemical Engineering GPU nodes (2 hour grace period) | RHEL 7.4 | /work_bgfs | 3 GPUs (GTX 1080 Ti) per node
himem | 1 week | memaccess | large-memory job queue (>= 64 GB) | RHEL 6.8 | /work | To request access, email rc-help@usf.edu
rra | 1 week | rra | Genomics Center/Restricted Research | RHEL 7.4 | /work | HIPAA certification required & audited; BeeGFS file system
simmons_itn18 | Per QOS | sim18 | Chemical Engineering GPU nodes (2 hour grace period) | RHEL 7.4 | /work_bgfs | 24 GPUs (GTX 1080 Ti) per node (oversubscribed)
snsm_itn19 | Per QOS | snsm19, snsm19_long | SNSM grant nodes | RHEL 7.4 | /work_bgfs | 1 GPU (GTX 1070 Ti) per node
  • Note: For jobs requiring longer than 1 week to run, please email rc-help@usf.edu with your project details (hardware/runtime requested, duration of project, etc).
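
As a sketch, submitting to one of the QOS-gated partitions above combines --partition with the matching --qos; the script name, core counts, and times below are placeholders:

    # Development work on the devel partition (devel QOS; max 24 cores and 2 nodes per user)
    sbatch --partition=devel --qos=devel --ntasks=4 --time=01:00:00 job.sh

    # Preemptable run on the MRI nodes (2-day limit; see Preemption Guidelines)
    sbatch --partition=mri2016 --qos=preempt --time=2-00:00:00 job.sh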

SC Partitions

SC Node Sets

The following node sets are available:

Memory | CPU | Cores | Interconnect | Nodes | Slots | GPUs | Constraint Flags | Location
64GB | Xeon E5-2650 v4 | 24 | 4x QDR IB | 10 | 240 | n/a | ib_qdr, ib_psm, avx, avx2, sse4_1, sse4_2, gpfs, cpu_xeon, xeon_E52650 | Tampa
48GB | Xeon E5649 | 12 | 4x QDR IB | 7 | 84 | 7 | ib_qdr, ib_psm, avx, avx2, sse4_1, sse4_2, gpfs, cpu_xeon, xeon_E5649 | Tampa
Total | | | | 17 | 324 | 7 | |

SC Partition Layout

The node sets are associated with the following queues:

Queue Name | Max Runtime | QOS Required | Description | Notes
sc | 2 days | none | default general-purpose queue | The default partition if no partition is specified
cuda | 2 days | none | CUDA GPU nodes |
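
To check the live layout and limits of these partitions, sinfo can be queried directly; the format string below is just one possible choice of columns:

    # Partition, time limit, node count, cores and memory per node, and GRES (GPUs)
    sinfo -o "%12P %10l %6D %4c %8m %10G"

    # Restrict the output to a single partition, e.g. the SC cuda queue
    sinfo -p cuda -o "%12P %10l %6D %4c %8m %10G"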