RRA Data Archiving

Storing and Archiving Data on RRA

Research Computing is committed to partnering with researchers to ensure computational resources are available and reliable for our community. It is important for everyone to understand the proper usage and limitations of our storage systems. The basic levels of storage offered through Research Computing, as outlined below, are provided to USF researchers. However, these resources are not unlimited, and quotas must be set to provide reliable availability and access.

In addition to total storage size, we must also consider the number of files and directories stored. Most, if not all, modern file systems rely on data structures that contain information about the files. On Unix-based file systems, this information is contained in inodes. Each file and directory has an associated inode, which contains metadata (data that describes data) as well as owner and permission details. Each file system has a designated number of inodes set at the creation of the file system. The number of inodes is chosen to give the best balance between performance and capacity. Unfortunately, it is possible to run out of inodes before the capacity of the file system is reached. This is usually the result of a large number of very small files being written. When this happens, new files cannot be created on the file system, even though there may be free space available.
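As a rough illustration of how file counts add up, the sketch below builds a small throwaway directory tree and counts the entries in it. The directory and file names are hypothetical, and the count assumes no hard links, so each file and each directory corresponds to one inode.

```shell
#!/bin/bash
# Build a small throwaway tree (hypothetical names) to illustrate inode usage.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/results/run1" "$demo_dir/results/run2"
touch "$demo_dir/results/run1/a.dat" "$demo_dir/results/run1/b.dat" \
      "$demo_dir/results/run2/c.dat"

# find prints one line per file and per directory, including demo_dir itself:
# 4 directories (demo_dir, results, run1, run2) plus 3 files.
inode_count=$(find "$demo_dir" | wc -l)
echo "Entries (inodes) under $demo_dir: $inode_count"

# Remove the demo tree again.
rm -rf "$demo_dir"
```

Note that the three small files here cost seven inodes in total, because every directory that holds them counts as well.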

Fortunately, there are ways to work around the inode limit, primarily by combining a large number of files into a single archive. The rest of this page describes the file space available and how to use the Linux tar command to archive, compress, and then access files.

Directories and usage

The following lists the directories, their intended usage, and the limitations:

Home Directory (/home)

Storage of research data after processing. The home directories have a 200G size limit with a maximum of 409,600 files/directories (inodes) per user. The user home directories are backed up nightly. Account data will be preserved for a period of 5 years from the time of last affiliation with the university and can be removed thereafter at the discretion of administration.

Group Shares Directory (/shares)

Storage of data and files that are intended to be shared with others in a research group. The shares directories have a 2TB size limit with a maximum of 626,000 files/directories. The group shares directories are backed up on a regular basis. Group shares data will be preserved for a period of 5 years from the time of the group leader's last affiliation with the university and can be removed thereafter at the discretion of administration.

Work Directory (/work)

Scratch space for research data being processed. The work directories have a 2TB limit with a maximum of 626,600 files/directories per user.

  • IMPORTANT NOTE: Data stored in /work is not backed up or archived in any way! We strongly encourage users not to store important data here, and instead to copy it to their /home directory (which is backed up) for safekeeping. Although every attempt will be made to give warning first, data on this partition can be removed at any time to ensure system operability.
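One possible way to keep a copy of finished results in a backed-up location without using many inodes there is to write them as a single compressed archive. The sketch below is only an illustration: temporary directories stand in for the /work and /home locations, and the run42 directory and its files are hypothetical.

```shell
#!/bin/bash
set -e
# Temporary directories stand in for a user's /work and /home areas (assumption).
work_dir=$(mktemp -d)
home_dir=$(mktemp -d)
mkdir -p "$work_dir/run42"
echo "result" > "$work_dir/run42/output.dat"
echo "log"    > "$work_dir/run42/job.log"

# Write one gzip-compressed archive directly into the backed-up location.
# -C switches into work_dir first, so the archive stores the relative path run42/...
tar -czf "$home_dir/run42.tar.gz" -C "$work_dir" run42

# Confirm the archive is readable before relying on it.
saved=$(tar -tzf "$home_dir/run42.tar.gz" | sort)
echo "$saved"

# The backed-up location now holds a single entry, whatever the size of the run.
home_entries=$(find "$home_dir" -mindepth 1 | wc -l)
echo "Entries added to the backed-up location: $home_entries"

rm -rf "$work_dir" "$home_dir"
```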

Checking Your Disk Space/File Count Quotas

You can check your quota with the rra-myquota command.

[user@rra-login0 ~]$ rra-myquota

RC Filesystem Current Quota Utilization:
Date: Sat Nov  5 16:37:53 EDT 1955

Filesystem  Space Used  Space Quota  File Count  File Count Quota
/home       27.97 GiB   200.00 GiB   143598      409600            
/work       662.07 GiB  2.00 TiB     9025        626600            

For more information about the data above, please refer to the manual page using the command:  man rra-myquota
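When the file count is approaching its quota, it can help to see which directories hold the most entries. The following is a minimal sketch using standard tools only; a temporary demo tree with hypothetical names stands in for a real directory such as your home directory.

```shell
#!/bin/bash
# Demo tree standing in for a real directory such as $HOME (hypothetical names).
top=$(mktemp -d)
mkdir -p "$top/small" "$top/large"
touch "$top/small/one"
touch "$top/large/f1" "$top/large/f2" "$top/large/f3"

# For each immediate subdirectory, count the entries beneath it (including the
# subdirectory itself), then sort so the largest consumer is listed first.
report=$(for d in "$top"/*/; do
    printf '%s %s\n' "$(find "$d" | wc -l)" "$(basename "$d")"
done | sort -rn)
echo "$report"

# Name of the subdirectory with the highest entry count.
largest=$(echo "$report" | head -n 1 | awk '{print $2}')
echo "Largest: $largest"

rm -rf "$top"
```

Directories near the top of such a report are usually the best candidates for archiving.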


Archiving and compressing data

When faced with inode limitations, archiving files that are not actively being accessed allows you to free up inodes. Archiving directories and files combines all of their contents (files and subdirectories alike) into one file, which uses a single inode. It is important to remove the original files after archiving them, once you have confirmed the archive is readable, to ensure the inode count is in fact reduced. If all inodes are consumed on a file system, new files cannot be created, which will cause jobs running on the system to stop, and in some cases users may not be able to log in to the system at all.
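The effect on the entry count can be sketched as follows. This is an illustration under assumptions, not a prescribed procedure: old_project and its files are hypothetical, and listing the archive with tar -tf is only a basic readability check made before the originals are removed.

```shell
#!/bin/bash
set -e
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p old_project/data
touch old_project/data/s1.txt old_project/data/s2.txt old_project/notes.txt

# Entries before archiving: ., old_project, data, and three files.
before=$(find . | wc -l)

# Archive the directory, then remove the originals only if the archive lists cleanly.
tar -cf old_project.tar old_project
if tar -tf old_project.tar > /dev/null; then
    rm -rf old_project
fi

# Entries after archiving: . and old_project.tar.
after=$(find . | wc -l)
echo "Entries before: $before, after: $after"

cd /
rm -rf "$workdir"
```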

In order to assist users with data archiving and management, the following tar examples are provided as reference.

1. tar -cvf archive_name.tar directory-or-files

In the example above, a new archive file will be created using the contents of a directory. Here -c creates a new archive, -v lists each file as it is processed, and -f names the archive file. The "directory-or-files" placeholder can be either a directory containing the files to be archived, or a space-separated list of files and directories.

2. gzip archive_name.tar

This compresses the archive file with gzip to reduce its size. The resulting file replaces the original and has the ".tar.gz" extension. Files can be extracted from, but not appended to, a compressed archive.

3. tar -rvf archive_name.tar file-to-add

Using the "-r" flag will add a file to an existing archive file. As in example 1, the "file-to-add" can be a single file or a list of files and directories. Only uncompressed .tar archives can have files appended to them, however.

4. tar -tvf name_of_archive.tar<.gz>

Lists all directories and files in an archive. The "<.gz>" in the command marks an optional suffix: this works on both compressed (.tar.gz) and uncompressed (.tar) archive files.

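5. tar -xvf name_of_archive.tar<.gz> --wildcards '*/file1' '*/file2'

This command allows you to extract selected files from an archive. With GNU tar, the default tar on Linux, it works on both compressed and uncompressed archives, because the compression is detected automatically when an archive is read. The "--wildcards" option tells GNU tar to treat the file names as patterns, and the single quotes keep the shell from expanding the asterisk, so you can extract multiple files without specifying the full path in the archive.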
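The five examples above can be tried end to end on throwaway data. The sketch below assumes GNU tar and gzip are available; the project directory and its files are hypothetical, and the -v flag is left out to keep the output short. It uses GNU tar's --wildcards option, which current GNU tar versions need in order to match patterns when extracting, and it extracts from the compressed archive, relying on GNU tar detecting gzip compression automatically when reading.

```shell
#!/bin/bash
set -e
demo=$(mktemp -d)
cd "$demo"
mkdir project
echo "alpha" > project/file1
echo "beta"  > project/file2
echo "gamma" > project/other
echo "delta" > extra.txt

# Example 1: create an uncompressed archive from a directory.
tar -cf archive_name.tar project

# Example 3: append a file; this only works while the archive is uncompressed.
tar -rf archive_name.tar extra.txt

# Example 2: compress; gzip replaces archive_name.tar with archive_name.tar.gz.
gzip archive_name.tar

# Example 4: list the contents of the compressed archive.
listing=$(tar -tf archive_name.tar.gz)
echo "$listing"

# Example 5: extract two members by pattern into a separate directory.
# The quotes stop the shell from expanding '*' before tar sees it.
mkdir restored
tar -xf archive_name.tar.gz -C restored --wildcards '*/file1' '*/file2'
restored_files=$(cd restored && find . -type f | sort)
echo "$restored_files"

cd /
rm -rf "$demo"
```

Appending before compressing matters here: running the append step after gzip would fail, because compressed archives cannot be updated in place.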

If you need more help with archiving files, please contact Research Computing at rc-help@usf.edu