CIRCE Data Management

Managing your data on CIRCE

This page provides guidelines for managing your data effectively on CIRCE. Since several storage locations are available, each with different rules for how it is managed, it's good to know which location suits your data best depending on your requirements.

Guidelines for Running Jobs

1. Use /work (rather than your /home directory) as the storage location for running jobs. This file system was designed for that purpose and keeps disk-heavy applications from slowing down the system for other users. It also offers a large amount of space, so you are much less likely to run into quota issues. Do not run jobs out of your /home directory: the /home file system is not designed or intended to handle the I/O created by jobs running on the cluster.

2. Move data that you would like to keep to /home. /work is not backed up, is not redundant, and files older than 6 months are purged, so do not use /work for permanent storage! You do not want to lose important data. A quick way to check which files are approaching the purge window is shown after this list.

3. Compress results that you would like to store permanently! There is no reason not to do this, and it helps keep you under your quota, allowing you to store more results.
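
For example, the following command lists files in your /work directory that have not been modified in roughly five months and are therefore approaching the purge window (a sketch only; adjust the -mtime value to match the current purge policy):

[user@login0 ~]$ find $WORK -type f -mtime +150 -ls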

Running Jobs Example

Typically, jobs should be run from a staging directory under /work. This can be accomplished by creating a directory under /work for your job input files and resulting output. We’ll show an example below:

Create the directory

[user@login0 ~]$ mkdir $WORK/myjob

Put your input files inside the directory

[user@login0 ~]$ cp INPUT1 INPUT2 INPUT3 $WORK/myjob

Next, let's change to our job directory:

[user@login0 ~]$ cd $WORK/myjob

Next, let's create our submit script and then run the job. We'll be running myapp against the input files in this directory, so create a submit script (named myjob.sh) with the following contents:

#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --time=10:00:00
#SBATCH --ntasks=16
#SBATCH --output=output.%j.myjob

module add apps/myapp/1.0

myapp INPUT1 INPUT2 INPUT3

Then, we can submit the job to the SLURM scheduler:

[user@login0 myjob]$ sbatch ./myjob.sh
...
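
While the job is waiting in the queue or running, you can check on its status with squeue:

[user@login0 myjob]$ squeue -u $USER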

Once the job completes, the output files should be present as well:

[user@login0 myjob]$ ls
INPUT1 INPUT2 INPUT3 OUTPUT1 OUTPUT2 OUTPUT3 output.45698.myjob

Let's do some post-processing and review our data while it's still on /work:

[user@login0 myjob]$ 
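
For example (a hypothetical check; what is useful here depends on myapp's actual output format), you might scan the output files for errors before deciding what to keep:

[user@login0 myjob]$ grep -i error OUTPUT*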

Good, we have what we need, but we should store our calculation somewhere safe to recall later. We'll put the directory into a compressed archive in our /home directory, where it will be backed up and kept safe:

[user@login0 myjob]$ pushd ..; tar -czvf $HOME/myjob.tar.gz myjob; popd
...
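
Before removing anything from /work, it's worth listing the archive's contents to confirm it was written correctly:

[user@login0 myjob]$ tar -tzvf $HOME/myjob.tar.gz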

Let's go ahead and remove the job directory from /work to keep our disk utilization low:

[user@login0 myjob]$ cd
[user@login0 ~]$ rm -rf $WORK/myjob
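
Should you need the data again later, the archive can be extracted back into a staging directory under /work (a sketch using the same paths as above):

[user@login0 ~]$ tar -xzvf $HOME/myjob.tar.gz -C $WORK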