CIRCE Data Archiving

Revision as of 20:45, 21 March 2018

Storing and Archiving Data on CIRCE

Research Computing is committed to partnering with researchers to ensure that computational resources are available and reliable for our community. It is important for everyone to understand the proper usage and the limitations of our storage systems. Research Computing provides the basic levels of storage outlined below to USF researchers. However, these resources are not unlimited, so quotas must be set to keep them reliably available and accessible.

In addition to total storage size, we must also consider the number of files and directories stored. Most, if not all, modern file systems rely on data structures that contain information about the files. On Unix-based file systems, this information is contained in inodes. Each file and directory has an associated inode, which contains metadata (data that describes data) as well as owner and permission details. Each file system has a designated number of inodes set at the creation of the file system. The number of inodes is chosen to give the best balance between performance and capacity. Unfortunately, it is possible to run out of inodes before the capacity of the file system is reached. This is usually the result of a large number of very small files being written. When this happens, new files cannot be created on the file system, even though there may be free space available.
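To illustrate how small files consume inodes, the sketch below creates a throwaway directory holding 100 one-byte files, then counts the entries and the distinct inode numbers they occupy. It assumes a Linux system with GNU coreutils and findutils (for "stat -c" and "find"). The directory and file names are placeholders chosen only for this demonstration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway directory for the demonstration (placeholder; removed at the end).
demo_dir=$(mktemp -d)

# Write 100 one-byte files: almost no data, but one inode per file.
for i in $(seq 1 100); do
  printf 'x' > "$demo_dir/file_$i.txt"
done

# Count every entry find reports: the 100 files plus demo_dir itself.
entry_count=$(find "$demo_dir" | wc -l)

# Cross-check by counting the distinct inode numbers that stat reports.
inode_count=$(find "$demo_dir" -exec stat -c '%i' {} + | sort -u | wc -l)

# Total bytes of file contents actually stored.
data_bytes=$(cat "$demo_dir"/file_*.txt | wc -c)

echo "entries: $entry_count  distinct inodes: $inode_count  data bytes: $data_bytes"

rm -rf "$demo_dir"
```

The script reports 101 entries and 101 distinct inodes (one per file plus one for the directory) for only 100 bytes of file contents. This is why many tiny files can exhaust a file count quota long before the space quota is reached.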

Fortunately, there are ways to work around the inode limit, primarily by bundling a large number of files into a single archive. The rest of this page describes the file space available and how to use the Linux tar command to archive (and optionally compress) files and then access them.

Directories and usage

The following lists the directories, their intended usage, and the limitations:

Home Directory (/home): Storage of research data after processing. The home directories have a 200 GB size limit with a maximum of 62,600 files/directories (inodes) per user. The user home directories are backed up nightly.

Group Shares Directory (/shares): Storage of data and files that are intended to be shared with others in a research group. The shares directories have a 2 TB size limit with a maximum of 626,000 files/directories. The group shares directories are backed up on a regular basis.

Work Directory (/work): Scratch space for research data being processed. The work directories have a 2 TB limit with a maximum of 626,000 files/directories per user. Data stored on the work directory file system is not backed up in any way, so we strongly encourage users not to store important data here. Data on this partition can be removed at any time to ensure system operability, although every attempt will be made to give warning first.

Checking my quota

When logged in to CIRCE/SC, you can check your disk space and file count quotas with the command "myquota". Running the command produces output similar to the following:

[user@login0 ~]$ myquota
RC Filesystem Current Quota Utilization:
Date: Sat Nov  5 16:37:53 EDT 1955
Filesystem     Space Used  Space Quota  % of Quota  File Count  File Count Quota  % of Quota
SC /home       10.10 GB    500.00 GB    2%          123456      409600            30%
CIRCE /home    27.97 GB    200.00 GB    14%         335683      512000            66%
CIRCE /shares  9.21 MB     500.00 GB    1%          45374       300000            15%
CIRCE /work    662.07 GB   2.00 TB      33%         267530      614400            44%
SC /shares     none
For more information about the data above, please refer to the manual page using the command:  man myquota


This shows all of your existing space and file count quotas on the file systems you have access to. More information can be found in the manual page for myquota, using the command:

[user@login0 ~]$ man myquota
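myquota reports totals, but not which directories hold the files. The sketch below ranks the immediate subdirectories of a starting directory by how many files and directories each contains, which helps identify candidates for archiving. It assumes GNU findutils and coreutils. The function "count_entries_by_dir" is a hypothetical helper written for this example, not a command provided on CIRCE, and the demonstration tree uses placeholder names.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not a CIRCE-provided command): print
# "<number of entries> <directory>" for each immediate subdirectory of $1,
# sorted with the largest count first. Each entry uses one inode.
count_entries_by_dir() {
  local top=$1
  find "$top" -mindepth 1 -maxdepth 1 -type d -print0 |
    while IFS= read -r -d '' sub; do
      printf '%s %s\n' "$(find "$sub" | wc -l)" "$sub"
    done |
    sort -rn
}

# Demonstration on a small throwaway tree (placeholder names).
demo_dir=$(mktemp -d)
mkdir "$demo_dir/many_files" "$demo_dir/few_files"
for i in $(seq 1 50); do
  : > "$demo_dir/many_files/f$i"      # 50 empty files
done
: > "$demo_dir/few_files/only_file"   # a single empty file

report=$(count_entries_by_dir "$demo_dir")
printf '%s\n' "$report"

# The first line of the report is the directory holding the most entries.
top_line=${report%%$'\n'*}
top_count=${top_line%% *}
top_dir=${top_line#* }

rm -rf "$demo_dir"
```

On CIRCE you could run, for example, count_entries_by_dir "$HOME" to see where your file count is concentrated. Scanning a very large tree with find can take a while.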

Archiving and compressing data

When faced with inode limitations, archiving files that are not actively being accessed allows you to free up inodes. Archiving bundles directories and files into a single file, so the many inodes used by those files and subdirectories are replaced by the single inode of the archive; compressing the archive afterwards additionally reduces the space it uses. It is important to remove the original files after archiving them, otherwise the inode count is not reduced. If all inodes on a file system are consumed, no new files can be created, which will cause jobs running on the system to stop and, in some cases, may prevent users from logging in to the system at all.
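The sketch below walks through that workflow on a throwaway directory: it counts the entries in use, bundles them with tar, checks that the archive lists the same number of members, deletes the originals, and counts again. It assumes GNU tar and findutils; "results_dir" and the file count are hypothetical placeholders. The originals are deleted only after tar has succeeded and the member count matches.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway stand-in for a directory under /home or /work.
work_dir=$(mktemp -d)
cd "$work_dir"

# Hypothetical data set: one directory holding 200 small files.
mkdir results_dir
for i in $(seq 1 200); do
  echo "sample $i" > "results_dir/out_$i.txt"
done

# Entries (and therefore inodes) in use: results_dir itself + 200 files.
before=$(find . -mindepth 1 | wc -l)

# Bundle everything into a single archive; set -e stops the script if tar fails.
tar -cf results_dir.tar results_dir

# Verify the archive lists as many members as were counted, before deleting anything.
listed=$(tar -tf results_dir.tar | wc -l)
if [ "$listed" -ne "$before" ]; then
  echo "archive has $listed members, expected $before; keeping originals" >&2
  exit 1
fi

# Remove the originals; only now are their inodes released.
rm -rf results_dir

# Only results_dir.tar remains.
after=$(find . -mindepth 1 | wc -l)
echo "entries before: $before  entries after: $after"
```

Here 201 entries shrink to 1. The member-count check is a simple safeguard rather than a full integrity test; GNU tar's -d (--compare) option can also compare archive contents against the files on disk before they are removed.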

To assist users with data archiving and management, the following tar examples are provided for reference.

1. tar -cvf archive_name.tar directory-or-files

In the example above, a new archive file is created from the contents of a directory: -c creates the archive, -v lists each file as it is added, and -f names the archive file. The "directory-or-files" argument can be either a directory containing the files to be archived or a space-separated list of files and directories.

2. gzip archive_name.tar

This compresses (reduces the size of) the archive file using gzip. gzip replaces archive_name.tar with archive_name.tar.gz, so the resulting file has the ".tar.gz" extension. Files can be extracted from, but not appended to, a compressed archive.

3. tar -rvf archive_name.tar file-to-add

Using the "-r" flag adds files to an existing archive file. As in example 1, "file-to-add" can be a single file or a list of files and directories. However, only uncompressed .tar archives can have files appended to them.

4. tar -tvf name_of_archive.tar<.gz>

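This lists all directories and files in an archive without extracting them. It works on both compressed (.tar.gz) and uncompressed (.tar) archive files; the "<.gz>" in the command indicates that the .gz suffix is only present on compressed archives.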

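5. tar -xvf name_of_archive.tar --wildcards '*/file1' '*/file2'

This command extracts only the named files from an archive. The single quotes stop the shell from expanding the asterisks, and the --wildcards option tells GNU tar to treat each asterisk as matching any leading path inside the archive, so you do not need to specify the full path of each file. GNU tar detects gzip compression automatically when reading, so the same command also works on a .tar.gz archive.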
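The sketch below runs the five examples in a workable order on a throwaway directory: create, append, compress, list, and finally extract two files by pattern. Appending happens before compressing because compressed archives cannot be appended to. It assumes GNU tar and gzip as found on typical Linux systems; the names "project", "notes.txt", and "restored" are placeholders, and -C (extract into a given directory) is used only to keep the restored copies separate from the originals.

```shell
#!/usr/bin/env bash
set -euo pipefail

demo=$(mktemp -d)   # throwaway directory for this demonstration
cd "$demo"

# Hypothetical inputs: a directory with two files, plus one extra file.
mkdir -p project/data
echo "alpha" > project/data/file1
echo "beta"  > project/data/file2
echo "gamma" > notes.txt

# Example 1: create an archive from a directory.
tar -cvf archive_name.tar project

# Example 3: append a file (only possible while the archive is uncompressed).
tar -rvf archive_name.tar notes.txt

# Example 2: compress; gzip replaces archive_name.tar with archive_name.tar.gz.
gzip archive_name.tar

# Example 4: list the contents; GNU tar detects gzip compression on its own.
tar -tvf archive_name.tar.gz

# Example 5: extract only file1 and file2 by pattern into a separate directory.
# --wildcards must come before the patterns it applies to.
mkdir restored
tar -xvf archive_name.tar.gz -C restored --wildcards '*/file1' '*/file2'
```

After this runs, restored/project/data contains file1 and file2, while notes.txt stays in the archive because no pattern matched it.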

If you need more help with archiving files, please contact Research Computing at rc-help@usf.edu