CIRCE Data Management


Managing your data on CIRCE

This page provides guidelines for managing your data effectively on CIRCE. Since several storage locations are available, each with its own management rules, it's good to know where your data belongs depending on your requirements.

Guidelines for Running Jobs

1. Use /work or /work_bgfs (as opposed to your /home directory) as your storage location for running jobs. Do not use your /home directory for running jobs on the cluster.

2. Move data that you would like to keep to /home, as /work and /work_bgfs are not backed up, are not redundant, and files older than 6 months are purged. You do not want to lose important data, so do not use /work nor /work_bgfs for permanent storage!

3. Compress results that you would like to store permanently! There is no reason not to do this, and it helps keep you under your quota, allowing you to store more results. See Data Storage and Archiving for more info.
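As a sketch of guideline 3 (the "results" directory and file names below are placeholders, not a real results set), a directory can be compressed into a single archive with tar before moving it to /home:

```shell
# Create a throwaway results directory to illustrate the compression step
mkdir -p results
printf 'sample data\n' > results/run1.out

# Compress the whole directory into a single gzipped archive
tar -czvf results.tar.gz results

# Check how much space the archive takes
du -sh results.tar.gz
```

A side benefit: the archive is a single file, which also helps with the file count quota.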

Running Jobs Example

Typically, jobs should be run from a staging directory under /work or /work_bgfs. This can be accomplished by creating a directory under /work or /work_bgfs for your job input files and resulting output. We’ll show an example below using /work:

1. Create the directory:

[user@login0 ~]$ mkdir $WORK/myjob


2. Put your input files inside the directory:

[user@login0 ~]$ cp INPUT1 INPUT2 INPUT3 $WORK/myjob


3. Next, let's change to our job directory:

[user@login0 ~]$ cd $WORK/myjob


4. Next, let's create our submit script and run the job. We'll be running myapp against the input files in this directory, so create a submit script (named myjob.sh) with the following contents:

#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --time=10:00:00
#SBATCH --ntasks=16
#SBATCH --output=output.%j.myjob

module add apps/myapp/1.0

# "<" cannot redirect from more than one file, so feed the inputs via cat:
cat INPUT* | myapp


5. Then, we can submit the job to the SLURM scheduler:

[user@login0 myjob]$ sbatch ./myjob.sh


6. After the job completes, the output files should appear alongside the inputs:

[user@login0 myjob]$ ls
INPUT1 INPUT2 INPUT3 OUTPUT1 OUTPUT2 OUTPUT3 output.45698.myjob


7. Let's do some post-processing and review our data while it's on /work:

[user@login0 myjob]$ 
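What "post-processing" looks like depends entirely on your application; as a hypothetical sketch (the OUTPUT file contents and the "Total Energy" pattern below are made up for illustration), you might collect one quantity of interest from every output file:

```shell
# Simulated output files standing in for myapp's real results
printf 'Total Energy = -1.0\n' > OUTPUT1
printf 'Total Energy = -2.0\n' > OUTPUT2

# Pull the quantity of interest out of every output file into a summary;
# -H prefixes each match with its file name
grep -H "Total Energy" OUTPUT* > summary.txt
cat summary.txt
```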


8. Good. We have what we need, but we should store our calculation somewhere safe to recall later. We'll put the directory in a compressed archive in our /home directory, where it will be backed up and kept safe:

[user@login0 myjob]$ pushd ..; tar -czvf $HOME/myjob.tar.gz myjob; popd
...
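Before deleting anything in the next step, it's prudent to confirm the archive is readable and complete. A minimal sketch (run in a scratch directory; the small job directory below stands in for the real one from step 8):

```shell
# Recreate a small job directory and archive it, standing in for step 8
mkdir -p myjob
touch myjob/INPUT1 myjob/OUTPUT1
tar -czf myjob.tar.gz myjob

# "tar -t" lists the archive's members without extracting them;
# every expected file should appear before the originals are removed
tar -tzf myjob.tar.gz
```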


9. Let’s go ahead and remove the job directory from /work to keep our disk utilization low:

[user@login0 myjob]$ cd
[user@login0 ~]$ rm -rf $WORK/myjob



Checking Your Disk Space/File Count Quotas

When logged in to CIRCE, you can easily check your disk space and file count quotas with the command "myquota". Running the command will produce output like the following:

[user@login0 ~]$ myquota

RC Filesystem Current Quota Utilization:
Date: Sat Nov  5 16:37:53 EDT 1955


Filesystem        Space Used  Space Quota  % of Quota  File Count  File Count Quota  % of Quota 
CIRCE /home       27.97 GB    200.00 GB    14%         335683      512000            66% 
CIRCE /shares     9.21 MB     2.00 TB      0%          45374       626000            7% 
CIRCE /work       662.07 GB   2.00 TB      33%         267530      691200            44% 
CIRCE /work_bgfs  618.00 GiB  2.00 TiB     31%         9025        626600            1%


This will show you all of your existing space/file count quotas on the filesystems that you have access to. More information can be found in the manual page for myquota, using the command:

[user@login0 ~]$ man myquota
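If the file count column is the one approaching its limit, counting the files under a directory helps find the culprit. A sketch using a throwaway directory in place of a real /home subtree:

```shell
# Build a small directory tree to count (stands in for a real /home subtree)
mkdir -p demo/sub
touch demo/a demo/b demo/sub/c

# find + wc counts every regular file below the directory
find demo -type f | wc -l
# prints 3
```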

Checking Your Disk Space/File Count Quotas on BeeGFS

When logged in to CIRCE, you can easily check your disk space and file count quota on BeeGFS with the command "beegfs-ctl --getquota --uid username". Running the command will produce output similar to:

[user@login0 ~]$ beegfs-ctl --getquota --uid user

Quota information for storage pool BeeGFS_work (ID: 2):

      user/group      ||           size          ||    chunk files
     name     |  id   ||    used    |    hard    ||  used   |  hard
--------------|------ ||------------|------------||---------|---------
          user|0001234||   30.00 GiB|    2.00 TiB||      800|   626000


This will show your space/file count quotas on the BeeGFS storage pools that you have access to. More information can be found in the help output for beegfs-ctl --getquota, using the command:

[user@login0 ~]$ beegfs-ctl --getquota --help


To check the disk space and file count quota of a shared group, use the command "beegfs-ctl --getquota --gid groupname" (a numeric group ID works as well, as in the example below). Running the command will produce output similar to:

[user@login0 ~]$ beegfs-ctl --getquota --gid 0001234

Quota information for storage pool BeeGFS_shares (ID: 3):

      user/group      ||           size          ||    chunk files
     name     |  id   ||    used    |    hard    ||  used   |  hard
--------------|------ ||------------|------------||---------|---------
          user|0001234||   22.00 TiB|   50.00 TiB||  2250000| 20000000