SLURM Partitions

== Dispatching ==
It is really no longer necessary to discuss queues in the traditional sense. In the past, we would create queues based on pools of hardware resources: if a user wanted to utilize a particular hardware resource, he or she would request the appropriate queue. Most of the time, however, what the user wants, what is best for that user, and what is best for all users are not the same thing. Allowing individuals to dictate where their jobs will run inevitably leads to throughput problems, since it is unreasonable to expect users to understand the complete state and behavior of the scheduler.

Below is a general description of how jobs make their way through the queue. Please see the Scheduling and Dispatch Policy page for more information.

When a user submits a job to a specific partition, the scheduler determines whether the requested hardware and time requirements of the job (see Using Features) match the resources the partition provides. If they do, the job is executed as soon as resources are available. If no resources are available, the job is held until the next scheduler iteration, which checks whether resources have become available.

“Available resources” include processors and memory. Processors generally correspond to the number of slots in a given queue, while memory is defined as a complex value that may not be as obvious to query. If your job is waiting in the pending (<code>PD</code>) state, it is likely that either the slots or the memory requested are beyond what the system can provide at that particular point in time.
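A typical batch script therefore just declares what the job needs and lets the scheduler place it. Below is a minimal sketch using the default <code>circe</code> partition described later on this page; the job name, program, and resource values are placeholders, not recommendations.

<pre>
#!/bin/bash
#SBATCH --job-name=example_job       # placeholder name
#SBATCH --partition=circe            # general-purpose partition (optional; circe is the default)
#SBATCH --time=02:00:00              # requested walltime; must fit within the partition's Max Runtime
#SBATCH --ntasks=1                   # number of tasks (slots)
#SBATCH --cpus-per-task=4            # cores per task
#SBATCH --mem=8G                     # memory per node; part of the "available resources" the scheduler checks
#SBATCH --output=%x_%j.out           # stdout/stderr file, named from job name and job ID

# Replace with your actual workload
srun ./my_program
</pre>

If the requested cores, memory, or walltime cannot currently be satisfied, the job simply remains pending until they can.
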
== CIRCE Partitions ==
=== CIRCE Node Sets ===
The following node sets are available:
{| class=wikitable
|- style="background-color:#f1edbe;"
|'''Memory'''
|'''CPU'''
|'''Cores'''
|'''Interconnect'''
|'''Nodes'''
|'''Slots'''
|'''GPUs'''
|'''Constraint Flags'''
|'''Location'''
|-
|24GB
|Opteron 2384
|12
|4x DDR IB
|32
|384
|n/a
|ib_ddr, ib_psm, tpa, sse4, sse4a, cpu_amd, opteron_2384
|Tampa
|-
|24GB
|Xeon E5649
|12
|4x QDR IB
|107
|1284
|8
|ib_qdr, ib_psm, sse4, sse41, sse42, cpu_xeon, xeon_E5649
|Tampa
|-
|24GB
|Xeon E5-2630
|12
|4x QDR IB
|67
|804
|n/a
|ib_qdr, ib_psm, sse41, sse42, avx, cpu_xeon, xeon_E52630
|Tampa
|-
|24GB
|Xeon E5649
|12
|4x QDR IB
|14
|168
|n/a
|ib_qdr, ib_ofa, sse41, sse42, avx, cpu_xeon, xeon_E5649
|Tampa
|-
|32GB
|Xeon E5-2670
|16
|4x QDR IB
|129
|2064
|40
|ib_qdr, ib_psm, sse4, sse41, sse42, avx, cpu_xeon, xeon_E52670, gpu_K20
|Tampa
|-
|512GB
|Xeon E5-2650
|20
|4x QDR IB
|3
|60
|n/a
|ib_qdr, ib_psm, tpa, sse4_1, sse4_2, avx, avx2, gpfs, cpu_xeon, xeon_E52650, mem_512G
|Tampa
|-
|'''Total'''
|
|
|
|352
|4764
|48
|
|
|}
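The constraint flags in the table above are what <code>--constraint</code> matches against when a job needs a specific node set (see Using Features). A minimal sketch, assuming these flags are defined as node features on CIRCE; several flags can be combined with <code>&</code> for an AND match:

<pre>
#!/bin/bash
#SBATCH --partition=circe
#SBATCH --time=01:00:00
#SBATCH --ntasks=16
#SBATCH --constraint="avx&ib_qdr"    # only nodes advertising both the avx and ib_qdr features

srun ./my_program
</pre>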


=== CIRCE Partition Layout ===
The following partitions (aka queues) are available on CIRCE:
 
=== Current QOS' configuration and limits ===
[[SLURM_Active_QOS']]
=== Per Partition Hardware ===
[[CIRCE_Hardware]]
{|class=wikitable
|- style="background-color:#f1edbe;text-align:center"
|'''Queue Name'''
|'''Max Runtime'''
|'''QOS' Required'''
|'''Description (Preempt Grace Period)'''
|'''Operating System'''
|'''$WORK file system path'''
|'''Notes'''
|-
|amd_2021
|1 week
|none
|AMD 2021 hardware purchase
|RHEL 7.4
|'''/work_bgfs'''
|
|-
|amdwoods_2022
|Per QOS
|amdwoods22, physics22, preempt
|Physics 2022 hardware purchase
|RHEL 7.4
|'''/work_bgfs'''
|Joint hardware purchase within Physics
|-
|bfbsm_2019
|Per QOS
|bfbsm19, preempt
|BFBSM_2019 hardware purchase
|RHEL 7.4
|'''/work_bgfs'''
|*See [[SLURM_Preemption|Preemption Guidelines]] for more info
|-
|cbcs
|Per QOS
|fawcett_access, preempt
|CBCS/Engineering queue
|RHEL 7.4
|'''/work_bgfs'''
|
|-
|charbonnier_2022
|Per QOS
|charbonnier22, preempt
|Physics 2022 hardware purchase
|RHEL 7.4
|'''/work_bgfs'''
|Joint hardware purchase within Physics
|-
|chbme_2018
|Per QOS
|chbme18, sim18, preempt
|Chemical Engineering GPU nodes
|RHEL 7.4
|'''/work_bgfs'''
|3 GPUs (GTX 1080 Ti) per node. Uses Omni-Path.
|-
|circe
|1 week
|none
|default general-purpose queue
|RHEL 7.4
|'''/work'''
|The default general-use partition if no partition is specified
|-
|cms_ocg
|Per QOS
|cms_ocg
|CMS OCG nodes
|RHEL 7.4
|'''/work_bgfs'''
|Limited to College of Marine Science OCG only
|-
|cool2022
|Per QOS
|cool22, preempt
|CMS OCG nodes
|RHEL 7.4
|'''/work'''
|College of Marine Science OOL AMD hardware purchase
|-
|hchg
|1 week
|hchg, interactive
|general-purpose interactive/serial partition
|RHEL 7.4
|'''/work_bgfs'''
|*See [[SLURM_Preemption|Preemption Guidelines]] for more info
|-
|himem
|1 week
|memaccess
|large memory job queue (&gt;= 64 GB)
|RHEL 7.4
|'''/work'''
|To request access, email {{rchelp}}
|-
|margres_2020
|Per QOS
|margres20, preempt
|Margres lab nodes, Integrative Biology
|RHEL 7.4
|'''/work_bgfs'''
|
|-
|muma_2021
|Per QOS
|muma21, preempt_short
|MUMA
|RHEL 7.4
|'''/work_bgfs'''
|
|-
|qcg_gayles_2022
|Per QOS
|qcg_gayles22, physics22, preempt
|Physics 2022 hardware purchase
|RHEL 7.4
|'''/work_bgfs'''
|Joint hardware purchase within Physics
|-
|rra
|1 week
|rra, rra_guest
|Genomics Center/Restricted Research
|RHEL 7.4
|'''/work'''
|HIPAA certification required & audited BeeGFS file system. Uses Omni-Path.
|-
|rra_con2020
|35 days
|rradl
|Genomics Center/College of Nursing deep learning partition
|RHEL 7.4
|'''/work'''
|HIPAA certification required & audited BeeGFS file system. Uses HDR InfiniBand. Access limited to approved personnel and workflows only.
|-
|simmons_itn18
|Per QOS
|sim18, chbme18, preempt, preempt_short
|Chemical Engineering GPU nodes (30 minute grace period)
|RHEL 7.4
|'''/work_bgfs'''
|*See [[SLURM_Preemption|Preemption Guidelines]] for more info. 3 GPUs (GTX 1080 Ti) per node. Uses Omni-Path.
|-
|snsm_itn19
|Per QOS
|openaccess, snsm19, snsm19_long, snsm19_special
|SNSM grant nodes
|RHEL 7.4
|'''/work_bgfs'''
|1 GPU (GTX 1070 Ti) per node. Uses Omni-Path.
|}


* Note: For jobs requiring longer than 1 week to run, please email {{rchelp}} with your project details (hardware/runtime requested, duration of project, etc).
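Partitions whose Max Runtime is listed as ''Per QOS'' must be paired with one of the QOS' shown in the table. The sketch below targets the chbme_2018 GPU partition with its chbme18 QOS; it assumes access has already been granted and that the GPUs are requested through the usual <code>--gres=gpu</code> mechanism (the GRES name is an assumption, not something documented on this page).

<pre>
#!/bin/bash
#SBATCH --partition=chbme_2018       # Chemical Engineering GPU partition (see table above)
#SBATCH --qos=chbme18                # one of the QOS' required for this partition
#SBATCH --time=12:00:00              # walltime limit is governed by the QOS
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --gres=gpu:1                 # one of the node's GTX 1080 Ti GPUs (GRES name assumed)

srun ./my_gpu_program
</pre>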


== SC Partitions ==
=== SC Node Sets ===
The following node sets are available:
{| class=wikitable
|- style="background-color:#f1edbe;"
|'''Memory'''
|'''CPU'''
|'''Cores'''
|'''Interconnect'''
|'''Nodes'''
|'''Slots'''
|'''GPUs'''
|'''Constraint Flags'''
|'''Location'''
|-
|64GB
|Xeon E5-2650 v4
|24
|4x QDR IB
|10
|240
|n/a
|ib_qdr, ib_psm, avx, avx2, sse4_1, sse4_2, gpfs, cpu_xeon, xeon_E52650
|Tampa
|-
|48GB
|Xeon E5649
|12
|4x QDR IB
|7
|84
|7
|ib_qdr, ib_psm, avx, avx2, sse4_1, sse4_2, gpfs, cpu_xeon, xeon_E5649
|Tampa
|-
|'''Total'''
|
|
|
|17
|324
|7
|
|
|}
=== SC Partition Layout ===
The following partitions (aka queues) are available on SC:


{|class=wikitable
|- style="background-color:#f1edbe;text-align:center"
|'''Queue Name'''
|'''Max Runtime'''
|'''QOS' Required'''
|'''Description'''
|'''Notes'''
|-
|sc
|2 days
|none
|default general-purpose queue
|The default partition if no partition is specified
|}
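Because limits, QOS', and node counts change over time, the scheduler itself is the authoritative source on both clusters. The standard SLURM query commands below show how to confirm what a partition or QOS currently provides; the partition name used is an example taken from this page.

<pre>
# List partitions with their time limits, node counts, and CPU availability
sinfo -o "%P %l %D %C"

# Show the full configuration of a single partition
scontrol show partition circe

# Show QOS limits (walltime, per-user job limits, preemption)
sacctmgr show qos format=Name,MaxWall,MaxJobsPU,Preempt

# List nodes together with the feature/constraint flags they advertise
sinfo -N -o "%N %f"
</pre>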
