File Management

From Research Computing Center Wiki



Revision as of 15:22, 24 September 2024

To help us optimize storage space and improve the performance of the /project file system, we kindly ask you to bundle your files into archives (e.g., tar files), compressed where possible, rather than storing numerous individual files there. Compressing files reduces the amount of storage used, and archiving them simplifies file management (e.g., backup and recovery) and transfer.


File Compression

In order to save space, please compress your files before transferring them into your group's /project file system. The gzip command can be used to compress files, but it uses a single thread on a single core.

The pigz command is a parallel implementation of gzip that can run with multiple threads, making use of multiple cores. The unpigz command is equivalent to gunzip and can be used to uncompress gzip'ed files. The .gz files created by pigz are compatible with gzip/gunzip. The pigz command is particularly helpful for compressing large files or a large number of files (for example, a whole folder).

The compute nodes on Sapelo2 and on the teaching cluster have pigz installed centrally, so you don't need to load any modules in order to use this command. The help page for this command shows the available options, and it can be viewed with the command

pigz --help

Some simple examples

Compress a file

pigz filename

Compress a file with the best compression level (the two commands below are equivalent)

pigz -9 filename
pigz --best filename

Uncompress a file

unpigz filename.gz
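
Like gzip, pigz replaces the original file with the .gz file by default. The following is a minimal sketch of keeping the original with the -k option and checking the compressed file with the -t option. The scratch directory and the file name data.txt are made up for illustration, and the fallback to gzip is an assumption added so that the sketch also runs on machines without pigz.

```shell
#!/bin/sh
# Sketch: compress a file while keeping the original, then test the result.
# Assumption: fall back to gzip (which accepts the same -k and -t options)
# on machines where pigz is not installed.
GZ=$(command -v pigz || command -v gzip)

workdir=$(mktemp -d)                          # scratch directory for the demo
yes 'sample line' | head -n 1000 > "$workdir/data.txt"   # hypothetical input file

"$GZ" -k "$workdir/data.txt"                  # -k keeps data.txt and writes data.txt.gz

# -t tests the integrity of the compressed file without writing any output
if "$GZ" -t "$workdir/data.txt.gz"; then
    intact=yes
else
    intact=no
fi

ls "$workdir"
echo "archive intact: $intact"
# Remove the scratch directory with: rm -rf "$workdir"
```

Checking the .gz file before deleting any other copy of the data is a cheap safeguard, since a truncated file may otherwise go unnoticed until it is needed.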


We suggest that you run pigz in an interactive session that requests multiple cores, and that you use the '-p num_threads' option to specify the number of threads (num_threads) to use.

For example, start an interactive session with 10 cores and 4GB of RAM with

interact -c 10 --mem=4g 

and then run pigz with 10 threads with

pigz -9 -p 10 my_big_file
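
Rather than typing the thread count by hand, it can be taken from the session itself. The sketch below assumes that the interactive session runs under Slurm and that the SLURM_CPUS_PER_TASK environment variable is set in it; that is an assumption about the cluster setup, not something stated on this page, so the script falls back to one thread when the variable is missing. The file contents are generated only for the demo, and the gzip fallback is likewise an assumption for machines without pigz.

```shell
#!/bin/sh
# Sketch: match the pigz thread count to the cores granted to the session.
# Assumption: under Slurm, SLURM_CPUS_PER_TASK holds the requested core count;
# if it is not set, use a single thread.
threads=${SLURM_CPUS_PER_TASK:-1}

workdir=$(mktemp -d)                          # scratch directory for the demo
file="$workdir/my_big_file"                   # stand-in for a real large file
yes 'sample line' | head -n 100000 > "$file"

if command -v pigz >/dev/null 2>&1; then
    pigz -9 -p "$threads" "$file"             # parallel compression
else
    gzip -9 "$file"                           # single-threaded fallback (assumption)
fi

ls -l "$file.gz"
```

Using more threads than the session was granted cores does not speed up compression and can slow down other work on the node.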

To recursively compress all files in a directory (e.g., one called dirname), use the -r option; each file is replaced by its own .gz file. For example, using 10 threads:

pigz -9 -r -p 10 dirname

To recursively uncompress all files in a directory:

unpigz -r dirname


Creating tar files

Having a large number of files in a file system can overload the storage metadata server and delay data recovery from backups, etc. If you need to store a large number of files in your group's /project area, instead of storing a large number of individual files, please first create a tar file with the files or with a directory, and transfer the tar file to /project.
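
To judge whether a directory is worth bundling, it helps to count the files it contains. The sketch below builds a small throwaway directory (the name dirname and the 25 empty files are made up for the demo) and counts its regular files with find.

```shell
#!/bin/sh
# Sketch: count the regular files under a directory before deciding to tar it.
workdir=$(mktemp -d)                          # scratch directory for the demo
mkdir "$workdir/dirname"

# Create 25 empty demo files.
i=1
while [ "$i" -le 25 ]; do
    : > "$workdir/dirname/file_$i.txt"
    i=$((i + 1))
done

# find lists every regular file, including those in subdirectories;
# wc -l counts them, and tr strips any padding that wc may add.
count=$(find "$workdir/dirname" -type f | wc -l | tr -d ' ')
echo "dirname holds $count files"
```

On a real directory, only the find line is needed, with the path replaced by the directory of interest.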

The tar command can be run in an interactive job on Sapelo2 with

tar cvf dirname.tar dirname

Note: Please do not run tar directly on the Sapelo2 login nodes.

In an interactive session that requested multiple cores, pigz can compress a tar file in parallel, replacing dirname.tar with dirname.tar.gz:

pigz dirname.tar

Alternatively, you could use pigz to compress the files in your directory, before creating a tar file.
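
The two steps can also be combined by piping the output of tar straight into pigz, which avoids writing an uncompressed dirname.tar to disk first. The following is a minimal sketch rather than a site-recommended procedure: the demo directory and its three files are made up, the thread count assumes that Slurm sets SLURM_CPUS_PER_TASK in the session, and gzip is used as a fallback where pigz is not installed.

```shell
#!/bin/sh
# Sketch: create a compressed tar file in one step with tar piped into pigz.
# Assumption: SLURM_CPUS_PER_TASK gives the core count; default to 1 thread.
threads=${SLURM_CPUS_PER_TASK:-1}

workdir=$(mktemp -d)                          # scratch directory for the demo
cd "$workdir" || exit 1
mkdir dirname
for n in 1 2 3; do
    echo "data $n" > "dirname/file_$n.txt"    # hypothetical demo files
done

# "tar cf -" writes the archive to standard output instead of a file.
if command -v pigz >/dev/null 2>&1; then
    tar cf - dirname | pigz -p "$threads" > dirname.tar.gz
else
    tar cf - dirname | gzip > dirname.tar.gz  # fallback (assumption)
fi

tar tzf dirname.tar.gz                        # list the archive contents
```

The result is a single dirname.tar.gz file, which is the form best suited to /project. Compressing the files individually before running tar reduces their size but still produces a tar file with one entry per file.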

To extract the files from a tar file:

tar xvf dirname.tar 

To extract a tar.gz file:

tar zxvf dirname.tar.gz
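
Before removing the original files, it is prudent to check that the archive is readable and complete. The sketch below is one possible check, not an official procedure: it tests the compressed stream, counts the archived files, extracts into a separate directory with the -C option, and compares the result with the original. The directory names dirname and restored are made up for the demo.

```shell
#!/bin/sh
# Sketch: verify a tar.gz archive and extract it into a separate directory.
workdir=$(mktemp -d)                          # scratch directory for the demo
cd "$workdir" || exit 1
mkdir dirname restored
for n in 1 2 3; do
    echo "data $n" > "dirname/file_$n.txt"    # hypothetical demo files
done
tar czf dirname.tar.gz dirname                # build the demo archive

gzip -t dirname.tar.gz                        # test the compressed stream

# List the archive and count entries that are not directories
# (directory entries end with a slash).
archived=$(tar tzf dirname.tar.gz | grep -c -v '/$')

tar xzf dirname.tar.gz -C restored            # extract under restored/

# Compare the extracted copy with the original, file by file.
if diff -r dirname restored/dirname >/dev/null; then
    same=yes
else
    same=no
fi
echo "$archived files archived; matches original: $same"
```

For large directories the full comparison can take a while, so comparing the file counts alone may be a reasonable first check.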