CIRCE Data Archiving
Storing and Archiving Data on CIRCE
Research Computing is committed to partnering with researchers to ensure that computational resources are available and reliable for our community. It is important for everyone to understand the proper usage and the limitations of our storage systems. The basic levels of storage offered through Research Computing, outlined below, are provided to USF researchers. However, these resources are not unlimited, and quotas are set so that storage remains reliably available and accessible to everyone.
In addition to total storage size, we must also consider the number of files and directories stored. Most, if not all, modern file systems rely on data structures that contain information about the files. On Unix-based file systems, this information is contained in inodes. Each file and directory has an associated inode, which contains metadata (data that describes data) as well as owner and permission details. Each file system has a designated number of inodes set at the creation of the file system. The number of inodes is chosen to give the best balance between performance and capacity. Unfortunately, it is possible to run out of inodes before the capacity of the file system is reached. This is usually the result of a large number of very small files being written. When this happens, new files cannot be created on the file system, even though there may be free space available.
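A quick way to see how many inodes a directory tree consumes is to count its entries, because each file and each directory uses one. The sketch below assumes GNU findutils, which is standard on Linux. count_inodes is a hypothetical helper name, not a CIRCE command. Hard-linked files share a single inode but are counted once per link.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a CIRCE command): count the inodes used by a
# directory tree. find visits the top directory plus every file and
# subdirectory beneath it, and each of those entries uses one inode.
# -printf '\n' (GNU find) prints exactly one newline per entry, so file names
# that themselves contain newlines are still counted correctly.
# Hard links share an inode, so they are counted once per link here.
count_inodes() {
    find "$1" -printf '\n' | wc -l
}
```

For example, count_inodes ~/old_project (a hypothetical directory) prints roughly how many inodes archiving that directory into a single file could free. The standard df -i command reports total and free inodes for a whole file system.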
Fortunately, there are ways to work around the inode limit, primarily by combining a large number of files into a single archive. The rest of this page describes the file space available and how to use the Linux tar command to archive (and optionally compress) files and then access them again.
Directories and usage
The following list describes each directory, its intended usage, and its limitations:
Home Directory (/home)
Storage of research data after processing. The home directories have a 200 GB size limit with a maximum of 409,600 files/directories (inodes) per user. User home directories are backed up nightly. Account data will be preserved for 5 years from the time of the user's last affiliation with the university and can be removed thereafter at the discretion of administration.
Shares Directory (/shares)
Storage of data and files that are intended to be shared with others in a research group. The shares directories have a 2 TB size limit with a maximum of 626,000 files/directories. Group shares directories are backed up on a regular basis. Group shares data will be preserved for 5 years from the time of the group leader's last affiliation with the university and can be removed thereafter at the discretion of administration.
Work Directory (/work or /work_bgfs)
Scratch space for research data being processed. The work directories have a 2 TB size limit with a maximum of 691,200 files/directories per user.
- IMPORTANT NOTE: Data stored in either /work or /work_bgfs is not backed up or archived in any way! We strongly encourage users not to store important data here and instead to copy it to their /home directory (which is backed up) for safekeeping. Although every attempt will be made to give advance warning, data on these partitions can be removed at any time to ensure system operability.
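Copying important results out of scratch space can be as simple as a recursive copy followed by a check. The sketch below uses only GNU coreutils and diffutils. save_results and the example paths are hypothetical, so substitute your own directories. The /home file count limit still applies, so consider archiving many small files first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a results directory from scratch space (/work or
# /work_bgfs) to a backed-up location such as your /home directory.
save_results() {
    local src="$1" dest="$2"
    mkdir -p "$dest"
    # -a preserves permissions, timestamps, and symlinks; "$src/." copies the
    # directory's contents (including hidden files) into dest.
    cp -a "$src/." "$dest/"
    # diff -r exits non-zero if the two trees differ or cannot be compared.
    diff -r "$src" "$dest" > /dev/null
}
```

For example: save_results /work/<your_work_dir>/run42 "$HOME/results/run42", where both paths are placeholders. A non-zero return status means the copy should not be trusted.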
Checking Your Disk Space/File Count Quotas
When logged in to CIRCE, you can check your disk space and file count quotas with the command "myquota". Running the command produces output similar to the following:
[user@login0 ~]$ myquota
RC Filesystem Current Quota Utilization:
Date: Sat Nov 5 16:37:53 EDT 1955

Filesystem        Space Used   Space Quota   % of Quota   File Count   File Count Quota   % of Quota
CIRCE /home       27.97 GB     200.00 GB     14%          335683       512000             66%
CIRCE /shares     9.21 MB      2.00 TB       0%           45374        626000             7%
CIRCE /work       662.07 GB    2.00 TB       33%          267530       691200             44%
CIRCE /work_bgfs  618.00 GiB   2.00 TiB      31%          9025         626600             1%

For more information about the data above, please refer to the manual page using the command: man myquota
This shows all of your existing space and file count quotas on the file systems you have access to. More information can be found in the manual page for myquota, displayed with the command:
[user@login0 ~]$ man myquota
Checking Your Disk Space/File Count Quotas on BeeGFS
When logged in to CIRCE, you can check your BeeGFS disk space and file count quota with the command "beegfs-ctl --getquota --uid username", replacing username with your own user name. Running the command produces output similar to the following:
[user@login0 ~]$ beegfs-ctl --getquota --uid user
Quota information for storage pool BeeGFS_work (ID: 2):

      user/group      ||           size          ||    chunk files
     name     |   id  ||    used    |    hard    ||  used   |  hard
--------------|-------||------------|------------||---------|---------
          user|0001234||   30.00 GiB|    2.00 TiB||      800|   626000
This shows your existing space and file count quotas on BeeGFS. More information can be found in the help output for beegfs-ctl --getquota, displayed with the command:
[user@login0 ~]$ beegfs-ctl --getquota --help
To check the disk space and file count quota of a shared group, use the command "beegfs-ctl --getquota --gid groupname". Running the command produces output similar to the following:
[user@login0 ~]$ beegfs-ctl --getquota --gid 0001234
Quota information for storage pool BeeGFS_shares (ID: 3):

      user/group      ||           size          ||    chunk files
     name     |   id  ||    used    |    hard    ||  used   |  hard
--------------|-------||------------|------------||---------|---------
          user|0001234||   22.00 TiB|   50.00 TiB||  2250000| 20000000
Archiving and compressing data
When faced with inode limitations, you can free up inodes by archiving files that are not actively being accessed. Archiving bundles directories and files, together with all of the inodes they use (for both files and subdirectories), into a single file, which reduces the number of inodes you are using. Compressing the archive additionally reduces the space it occupies. After archiving, remove the original files once you have confirmed the archive is readable, so that the inode count is actually reduced. If all inodes are consumed on a file system, new files cannot be created. Jobs running on the system will then stop, and in some cases users may not be able to log in to the system at all.
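The archive-then-remove workflow can be wrapped in a small shell function. This is a sketch under stated assumptions, not an official Research Computing tool. archive_and_remove is a hypothetical name, and the function assumes GNU tar, GNU find, and gzip. It deletes the original directory only after the archive lists the same number of entries as the directory it replaces.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace DIR with DIR.tar.gz, freeing the inodes that
# DIR's files and subdirectories were using.
archive_and_remove() {
    local dir="${1%/}"                      # "run42/" -> "run42"
    local parent name
    parent=$(dirname -- "$dir")
    name=$(basename -- "$dir")

    # Work from the parent so the archive stores relative paths (run42/...).
    (
        cd -- "$parent" || exit 1

        # -c create, -z compress with gzip, -f write to the named file.
        tar -czf "$name.tar.gz" "$name" || exit 1

        # Each directory and file appears once in both listings, so matching
        # counts indicate that every entry made it into the archive.
        on_disk=$(find "$name" -printf '\n' | wc -l)
        archived=$(tar -tzf "$name.tar.gz" | wc -l)
        [ "$on_disk" -eq "$archived" ] || exit 1

        # The inode count only drops once the originals are gone.
        rm -rf -- "$name"
    )
}
```

Run it on a directory you are no longer actively using, for example archive_and_remove old_project. If the archive cannot be created or does not match the directory, the function returns a non-zero status and leaves the original directory in place.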
To assist users with data archiving and management, the following tar examples are provided as a reference.
1. tar -cvf archive_name.tar directory-or-files
- In the example above, a new archive file is created from the contents of a directory. The "directory-or-files" placeholder can be either a directory containing the files to be archived or a space-separated list of files and directories. The -c flag creates a new archive, -v lists each file as it is processed, and -f names the archive file.
2. gzip archive_name.tar
- This compresses (reduces the size of) the archive file using gzip. gzip replaces archive_name.tar with a compressed file that has the ".tar.gz" extension. Files can be extracted from, but not appended to, a compressed archive.
3. tar -rvf archive_name.tar file-to-add
- The "-r" flag appends files to an existing archive file. As in example 1, "file-to-add" can be a single file or a list of files and directories. However, only uncompressed .tar archives can have files appended to them.
4. tar -tvf name_of_archive.tar (or name_of_archive.tar.gz)
- List all directories and files in an archive. This works on both compressed and uncompressed archive files.
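5. tar -xvf name_of_archive.tar --wildcards '*/file1' '*/file2'
- This command extracts only the named files from an archive. With GNU tar, the tar found on most Linux systems, it works on both compressed and uncompressed archives because tar detects gzip compression automatically when reading. The --wildcards option turns on pattern matching for member names, and the asterisk matches any leading directory path. The single quotes stop the shell from expanding the asterisk itself. Together, these let you extract files without specifying their full path in the archive.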
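You can try the five commands above safely in a throwaway directory before using them on real data. This sketch assumes GNU tar and gzip, and the directory and file names are made up for illustration. Note the order: files are appended (example 3) before the archive is compressed (example 2), because a .tar.gz archive cannot be appended to.

```shell
#!/usr/bin/env bash
# Walk through tar examples 1-5 in a temporary directory so no real data is touched.
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# Made-up sample data: a small project directory with a few files.
mkdir -p project/results
echo "alpha" > project/results/file1
echo "beta"  > project/results/file2
echo "notes" > project/README

# Example 1: create an uncompressed archive from a directory.
tar -cvf archive_name.tar project

# Example 3: append a file; this only works while the archive is uncompressed.
echo "late result" > extra.txt
tar -rvf archive_name.tar extra.txt

# Example 2: compress; gzip replaces archive_name.tar with archive_name.tar.gz.
gzip archive_name.tar

# Example 4: list the contents; GNU tar detects the gzip compression itself.
tar -tvf archive_name.tar.gz

# Example 5: extract two files by pattern into a separate directory.
mkdir restore
tar -xvf archive_name.tar.gz -C restore --wildcards '*/file1' '*/file2'
```

After it runs, restore/project/results holds only file1 and file2, and archive_name.tar.gz still contains everything, including extra.txt.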
If you need more help with archiving files, please contact Research Computing at email@example.com