Difference between revisions of "SLURM Preemption"

(Created page with "= Preemption= Some hardware on CIRCE is provided by research contributors. This hardware is available for use by all CIRCE users by specifying the partition and the "preempt"...")
 
Line 1: Line 1:
= Preemption=
= Preemption=


Some hardware on CIRCE is provided by research contributors. This hardware is available for use by all CIRCE users by specifying the partition and the "preempt" QOS (example: ''sbatch --partition=mri2016 --qos=preempt ./submit-script.sh''). The caveat however is that as this is contributor hardware, non-contributor jobs running on this partition are subject to preemption.
Some hardware on CIRCE is provided by research contributors. This hardware is available for use by all CIRCE users by specifying the partition and the "preempt" QOS (example: ''sbatch --partition=margres_2020 --qos=preempt ./submit-script.sh''). The caveat however is that as this is contributor hardware, non-contributor jobs running on this partition are subject to preemption.


There is a partition-specified grace period (listed above, typically 2 hours) before a contributor’s job(s) will cancel the non-contributors job(s). This means that any user taking advantage of the hardware should have some kind of check-pointing enabled, so that interrupted jobs can be re-submitted without needing to start over.
There is a partition-specified grace period (listed above, typically 2 hours) before a contributor’s job(s) will cancel the non-contributors job(s). This means that any user taking advantage of the hardware should have some kind of check-pointing enabled, so that interrupted jobs can be re-submitted without needing to start over.

Revision as of 15:01, 24 August 2021

Preemption

Some hardware on CIRCE is provided by research contributors. This hardware is available to all CIRCE users who specify the partition and the "preempt" QOS (example: sbatch --partition=margres_2020 --qos=preempt ./submit-script.sh). The caveat is that, because this is contributor hardware, non-contributor jobs running on these partitions are subject to preemption.

There is a partition-specified grace period (listed above, typically 2 hours) before a contributor’s job(s) will cancel a non-contributor’s job(s). Any user taking advantage of this hardware should therefore have some form of checkpointing enabled, so that an interrupted job can be re-submitted and resume from its last saved state instead of starting over.
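
As an illustration of the checkpointing advice above, here is a minimal Python sketch of a workload that saves its progress periodically and resumes from the last save when re-submitted. It assumes that the job receives SIGTERM when the grace period begins, which is how Slurm's preemption grace time is documented to behave but has not been confirmed for CIRCE's configuration. The file name checkpoint.json, the function names, and the placeholder "work" are all hypothetical and would be replaced by the application's own state and computation.

```python
import json
import os
import signal
import tempfile

CHECKPOINT = "checkpoint.json"  # hypothetical file name, kept in the job's working directory

stop_requested = False


def request_stop(signum, frame):
    """Signal handler: ask the main loop to save its state and exit."""
    global stop_requested
    stop_requested = True


def load_state(path):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as fh:
            return json.load(fh)
    return {"step": 0, "total": 0}


def save_state(path, state):
    """Write to a temporary file, then rename, so a kill mid-write cannot corrupt the checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as fh:
        json.dump(state, fh)
    os.replace(tmp_path, path)


def run(path, n_steps, save_every=10_000):
    """Resume from the checkpoint at `path` and work until done or asked to stop."""
    state = load_state(path)
    while state["step"] < n_steps and not stop_requested:
        state["total"] += state["step"]  # placeholder for one unit of real work
        state["step"] += 1
        if state["step"] % save_every == 0:
            save_state(path, state)
    save_state(path, state)  # final save, also reached after a stop request
    return state


if __name__ == "__main__":
    # Assumption: preemption is announced with SIGTERM at the start of the grace period.
    signal.signal(signal.SIGTERM, request_stop)
    final = run(CHECKPOINT, n_steps=100_000)
    print("stopped at step", final["step"])
```

With a script like this, re-submitting the same job after preemption continues from the step recorded in the checkpoint file. Whether the signal actually reaches the Python process depends on how the submit script launches it, so the periodic saves remain the main safeguard.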