Difference between revisions of "CIRCE Data Archiving"

Revision as of 17:30, 14 July 2017

Storing and Archiving Data on CIRCE

Research Computing is committed to partnering with researchers to ensure that computational resources are available and reliable for all research users. To this end, it is important for research users to understand the proper usage and limitations of our storage systems. Our basic levels of storage, as outlined below, are provided free of charge to research users. Since these resources are not unlimited, we must set user quotas to ensure availability and access for all users.

In addition to total storage size, we must also consider the number of files and directories stored. Unix-style file systems rely on data structures that hold information about each file, separate from the file's content. Each file and directory has an associated inode, which contains metadata (data that describes data) such as owner and permission details. Each file system has a fixed number of inodes, so it is possible to run out of them. When this happens, new files cannot be created on the file system, even though there may be free space available.
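As a rough illustration of the point above, the following sketch shows one way inode usage might be checked from the shell. It assumes standard df, find, and wc utilities are available; the scratch directory and file names are hypothetical and are created only for the demonstration.

```shell
#!/bin/bash
# Sketch: inspect inode usage. The demo directory and files are hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/project/run1"
touch "$demo/project/a.dat" "$demo/project/run1/b.dat" "$demo/project/run1/c.dat"

# Inode totals for the file system that holds the directory
# (see the IUsed and IFree columns).
df -i "$demo"

# Each file and directory normally consumes one inode, so counting
# entries approximates the inodes used under a directory tree.
inode_count=$(find "$demo/project" | wc -l)
echo "entries under project: $inode_count"

rm -rf "$demo"
```

Here the count covers two directories and three files. Running the same find command on your own directory gives a quick estimate of how close you are to a file-count limit.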

Directories and usage

The following lists the directories, their intended usage, and the limitations:

Home Directory (/home): Storage of research data after processing. The home directories have a 200 GB size limit with a maximum of 62,600 files/directories (inodes) per user. The user home directories are backed up nightly.

Group Shares Directory (/shares): Storage of data and files that are intended to be shared with others in a research group. The shares directories have a 2 TB size limit with a maximum of 626,000 files/directories. The group shares directories are backed up on a regular basis.

Work Directory (/work): Scratch space for research data being processed. The work directories have a 2 TB limit with a maximum of 626,000 files/directories per user. Data stored on the work directory file system is not backed up in any way, and we strongly encourage users not to store important data here. Although we will make every attempt to give warning first, data on this partition can be removed at any time to ensure system operability.

Archiving and compressing data

When faced with inode limitations, archiving your unused files allows you to free up inodes. Archiving bundles directories and files into a single file, so the many inodes they used (one for each file and subdirectory) are replaced by the one inode of the archive. It is important to remove the original files after archiving them; otherwise the inode count is not actually reduced. If all inodes on a file system are consumed, new files cannot be created, which will cause jobs running on the system to stop and, in some cases, may prevent users from logging in to the system at all.
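The archive-then-remove workflow described above might look like the following sketch. The directory and archive names (old_results, old_results.tar) are hypothetical, and the example runs in a temporary directory so that no real data is touched.

```shell
#!/bin/bash
# Sketch: archive a directory, check the archive, then remove the originals.
# All names below are hypothetical.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir old_results
for i in 1 2 3 4 5; do echo "data $i" > "old_results/file$i.txt"; done

before=$(find . | wc -l)    # ".", the directory, and 5 files: 7 entries

# Create the archive, then remove the originals only if the archive
# can be read back in full.
tar -cf old_results.tar old_results
if tar -tf old_results.tar > /dev/null; then
    rm -rf old_results
fi

after=$(find . | wc -l)     # "." and the single .tar file: 2 entries
echo "entries before: $before, after: $after"

cd / && rm -rf "$work"
```

Listing the archive before deleting anything is a cheap safeguard: if tar cannot read the archive back, the originals are left in place.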

To assist users with data archiving and management, the following tar examples are provided for reference.

1. tar -cvf archive_name.tar directory-or-files

The example above creates a new archive file from the contents of a directory. The "directory-or-files" placeholder can be either a directory containing the files to be archived or a space-separated list of files and directories.

2. gzip archive_name.tar

This compresses the archive file with gzip to reduce its size. The resulting file replaces the original and has the ".tar.gz" extension. Files can be extracted from, but not appended to, a compressed archive.

3. tar -rvf archive_name.tar file-to-add

The "-r" flag appends a file to an existing archive file. As in example 1, "file-to-add" can be a single file or a list of files and directories. Only uncompressed .tar archives can have files appended to them, however.

4. tar -tvf name_of_archive.tar<.gz>

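This lists all directories and files in an archive. It works on both compressed and uncompressed archive files. The "<.gz>" in the command marks an optional suffix and is not typed literally: use either "name_of_archive.tar" or "name_of_archive.tar.gz".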

5. tar -xvf name_of_archive.tar<.gz> '*/file1' '*/file2'

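This command extracts selected files from an archive. The asterisk lets you name a file without specifying its full path in the archive, and the single quotes keep the shell from expanding the asterisk before tar sees it. Note that recent versions of GNU tar do not treat member names as patterns by default when extracting, so the "--wildcards" option may be needed for the asterisk to take effect.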
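The five examples above can be tried end to end with the following sketch. It assumes GNU tar and gzip; all file and directory names (project, extra.txt, restore) are hypothetical, and everything happens in a temporary directory. The "-v" flag is omitted to keep the output short, and "-C" and "--wildcards" are additions not used in the examples above.

```shell
#!/bin/bash
# Sketch: the five examples above, run on hypothetical files.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir project
echo "alpha" > project/file1
echo "beta"  > project/file2
echo "gamma" > project/file3
echo "notes" > extra.txt

# Example 1: create an archive from a directory.
tar -cf project.tar project

# Example 3: append a file (only possible while the archive is uncompressed).
tar -rf project.tar extra.txt

# Example 4: list the contents.
listing=$(tar -tf project.tar)
echo "$listing"

# Example 2: compress; gzip replaces project.tar with project.tar.gz.
gzip project.tar

# Example 5: extract selected files into a separate directory.
# --wildcards is a GNU tar option (assumed here); the quotes stop the
# shell from expanding the asterisk itself.
mkdir restore
tar -xzf project.tar.gz -C restore --wildcards '*/file1' '*/file2'
restored=$(cat restore/project/file1)
if [ -e restore/project/file3 ]; then file3_restored=yes; else file3_restored=no; fi
echo "restored file1 contains: $restored (file3 restored: $file3_restored)"

cd / && rm -rf "$work"
```

The append step comes before compression here because, as noted above, files cannot be appended to a compressed archive.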