= Active QOSes =


The QOSes configured in SLURM on CIRCE are listed in the table below. QOSes are assigned based on several factors, usually through inheritance from the user's account.

Some hardware on CIRCE is provided by research contributors. All CIRCE users can use this hardware by specifying the contributor's partition together with the "preempt" QOS (example: ''sbatch --partition=mri2016 --qos=preempt ./submit-script.sh''). However, because this is contributor hardware, non-contributor jobs running on these partitions are subject to preemption.

When a contributor's job needs the nodes, the non-contributor jobs occupying them are cancelled after a grace period (listed in the ''Grace Period'' column of the table below, typically 2 hours). Anyone using this hardware should therefore enable some form of checkpointing, so that an interrupted job can be resubmitted and resume from its last checkpoint instead of starting over.
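The checkpointing advice above can be illustrated with a minimal Python sketch. It is not a CIRCE-provided tool: it assumes the job's processes receive SIGTERM when preemption begins (SLURM's preemption documentation describes sending SIGCONT and SIGTERM to a job's tasks at the start of its grace period), and the functions `run`, `load_checkpoint`, `save_checkpoint` and the JSON state layout are hypothetical names chosen for illustration. The loop saves its progress periodically and once more when asked to stop, and a resubmitted job picks up from the saved step.

```python
import json
import os
import signal
import tempfile


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"step": 0, "total": 0}


def save_checkpoint(path, state):
    """Write to a temporary file, then rename it over the old checkpoint,
    so a job killed mid-write never leaves a half-written file behind."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def run(path, n_steps, checkpoint_every=100):
    """Advance a toy computation to n_steps, resuming from `path` if present."""
    state = load_checkpoint(path)
    stop_requested = False

    def on_sigterm(signum, frame):
        # Assumed preemption notice: finish the current step, then stop and save.
        nonlocal stop_requested
        stop_requested = True

    previous = signal.signal(signal.SIGTERM, on_sigterm)
    try:
        while state["step"] < n_steps and not stop_requested:
            state["total"] += state["step"]  # placeholder for the real work
            state["step"] += 1
            if state["step"] % checkpoint_every == 0:
                save_checkpoint(path, state)
    finally:
        # Final save whether the loop finished, was asked to stop, or raised.
        save_checkpoint(path, state)
        if previous is not None:
            signal.signal(signal.SIGTERM, previous)
    return state
```

Each submission (for example with ''--qos=preempt'') should pass the same checkpoint path, kept on storage that persists after the job ends, so that every run continues where the previous one stopped. Writing through a temporary file and `os.replace` keeps the last good checkpoint intact even if the job is killed during a save.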
{| class=wikitable
|- style="background-color:#f1edbe;"
|'''QOS'''
|'''Base Priority'''
|'''Preemptable'''
|'''Preempted by'''
|'''Grace Period'''
|'''Maximum runtime'''
|'''Maximum permitted resources'''
|'''Maximum submitted jobs'''
|'''Available partition(s)'''
|-
|ac
|1000
|NO
|N/A
|N/A
|7 days
|2048 CPUs
|2000
|circe, cuda
|-
|cms16
|2000
|NO
|N/A
|N/A
|7 days
|384 CPUs
|12001
|cms2016
|-
|deadline
|15000
|NO
|N/A
|N/A
|3 days
|384 CPUs
|30
|circe, cuda
|-
|devel
|0
|NO
|N/A
|N/A
|30 minutes
|24 CPUs and 2 nodes
|3
|devel
|-
|el7
|1000
|NO
|N/A
|N/A
|2 days
|16 CPUs and 2 nodes
|4
|bgfsqdr
|-
|el7_cms
|1000
|NO
|N/A
|N/A
|7 days
|180 CPUs
|100
|bgfsqdr
|-
|faculty
|50
|NO
|N/A
|N/A
|7 days
|2048 CPUs
|1024
|circe, cuda
|-
|hen18
|1000
|NO
|N/A
|N/A
|UNLIMITED
|UNLIMITED
|UNLIMITED
|henderson_itn18
|-
|ic
|1100
|NO
|N/A
|N/A
|7 days
|2048 CPUs
|2000
|circe, cuda
|-
|longrun
|1000
|NO
|N/A
|N/A
|35 days
|384 CPUs
|80
|circe, cuda, himem
|-
|memaccess
|1000
|NO
|N/A
|N/A
|7 days
|UNLIMITED
|100
|himem
|-
|mri16
|2000
|NO
|N/A
|N/A
|7 days
|1560 CPUs
|500
|mri2016
|-
|mri16_npi
|1000
|NO
|N/A
|N/A
|7 days
|1560 CPUs
|250
|mri2016
|-
|normal
|0
|NO
|N/A
|N/A
|7 days
|1024 CPUs
|10000
|circe, cuda
|-
|preempt
|500
|YES
|cms16, mri16, mri16_npi
|2 hours
|7 days
|1024 CPUs
|2000
|mri2016
|-
|rra
|1000
|NO
|N/A
|N/A
|7 days
|1200 CPUs
|500
|rra
|-
|sim18
|1000
|NO
|N/A
|N/A
|UNLIMITED
|432 CPUs
|UNLIMITED
|simmons_itn18
|-
|snsm19
|2000
|NO
|N/A
|N/A
|2 days
|400 CPUs, 100 CPUs per job
|50
|snsm_itn19
|-
|snsm19_long
|2000
|NO
|N/A
|N/A
|5 days
|40 CPUs and 3 nodes
|50
|snsm_itn19
|-
|trial
|0
|NO
|N/A
|N/A
|7 days
|UNLIMITED
|3
|devel
|-
|}
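The table above is a snapshot; the live QOS settings can be read back from SLURM's accounting database with `sacctmgr --noheader --parsable2 show qos format=Name,Priority,Preempt,GraceTime,MaxWall,MaxSubmitPU`. The following is a minimal Python sketch for doing so, not an official CIRCE tool. The `sacctmgr` options and field names are standard SLURM, but the mapping to this page's columns is an assumption: in particular, ''Maximum submitted jobs'' is assumed to correspond to the per-user `MaxSubmitPU` limit, and the `Preempt` field is assumed to be comma-separated. Note that SLURM records preemption from the other direction: a QOS's `Preempt` field lists the QOSes it may preempt, so the ''Preempted by'' entries for preempt appear in the `Preempt` field of cms16, mri16 and mri16_npi. The helper names are hypothetical.

```python
import subprocess

# sacctmgr "show qos" format fields requested, in output order.
FIELDS = ["Name", "Priority", "Preempt", "GraceTime", "MaxWall", "MaxSubmitPU"]


def slurm_duration_to_seconds(text):
    """Convert a duration as printed by sacctmgr ('[D-]HH:MM:SS' or 'MM:SS')
    to seconds. An empty field or 'UNLIMITED' means no limit (None)."""
    if not text or text.upper() == "UNLIMITED":
        return None
    days = 0
    if "-" in text:
        day_part, text = text.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in text.split(":")]
    while len(parts) < 3:
        parts.insert(0, 0)  # left-pad 'MM:SS' to 'HH:MM:SS'
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def parse_qos_table(text):
    """Parse '--noheader --parsable2' output (one '|'-separated row per QOS)."""
    qos = {}
    for line in text.strip().splitlines():
        row = dict(zip(FIELDS, line.split("|")))
        max_submit = row["MaxSubmitPU"]
        qos[row["Name"]] = {
            "priority": int(row["Priority"]),
            # QOSes that *this* QOS is allowed to preempt.
            "can_preempt": [name for name in row["Preempt"].split(",") if name],
            "grace_seconds": slurm_duration_to_seconds(row["GraceTime"]),
            "max_wall_seconds": slurm_duration_to_seconds(row["MaxWall"]),
            "max_submit_per_user": int(max_submit) if max_submit else None,
        }
    return qos


def query_qos():
    """Run sacctmgr (e.g. on a login node) and return the parsed table."""
    cmd = ["sacctmgr", "--noheader", "--parsable2", "show", "qos",
           "format=" + ",".join(FIELDS)]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_qos_table(result.stdout)
```

On a login node, values such as `query_qos()["preempt"]["grace_seconds"]` can be compared against the table; a mismatch suggests this page is out of date rather than that the scheduler is wrong.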

Revision as of 22:31, 29 May 2019
