Borg backup
Introduction
Borg is a deduplicating backup program that also supports compression with several codecs.
Borg is installed on all xfer nodes, and you can use it to archive your files. If you have a large amount of data, deduplication and compression can reduce the storage it requires. This is especially true for large genomic datasets.
Example: archiving Lustre data into your lab's project directory
Initializing a borg repository
Borg stores backup data in a repository (a special directory in your filesystem). You must initialize a repository before writing backup data to it. You can create the repository in any filesystem; in this example, it is stored in the project filesystem. Note that you must be logged in to an xfer node to run these commands. Run the following command to initialize the repository without encryption.
$ borg init --encryption none /project/gclab/raj76/my_project
If successful, the command prints nothing.
Creating a backup
The next step is to create a backup to the repository. Run the following command to create a backup with deduplication and fast compression.
$ borg create -s --compression auto,lz4 /project/gclab/raj76/my_project::lustre1-{now} /lustre1/raj76/my_project
In the above command, lustre1-{now} (the part after the :: separator) is the name of the archive that this backup creates. The {now} placeholder tells Borg to use the current timestamp as part of the archive name, which is useful later for identifying archives. The -s option prints statistics for the new archive when the backup finishes. Run the command in a screen or tmux session, because it will take a while for large datasets.
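The advice about tmux can be sketched as a small shell function. This is a minimal sketch, assuming tmux is available on the xfer node; the function name start_borg_backup, the default session name borg_backup, and the log file location are placeholders chosen for illustration, not part of Borg.

```shell
#!/bin/bash
# Start a Borg backup inside a detached tmux session.
# Usage: start_borg_backup REPO SOURCE_DIR [SESSION_NAME]
# Hypothetical helper; assumes the paths contain no single quotes.
start_borg_backup() {
    local repo=$1 source_dir=$2 session=${3:-borg_backup}
    local log="$HOME/${session}.log"
    # Borg expands {now} itself; the shell passes the braces through unchanged.
    local cmd="borg create -s --compression auto,lz4 '${repo}::lustre1-{now}' '${source_dir}'"
    # Keep a log, because the tmux window closes when the command exits.
    tmux new-session -d -s "$session" "$cmd 2>&1 | tee '$log'" || return 1
    echo "Backup running in tmux session '$session'; log: $log"
}
```

For the example above, the call would be start_borg_backup /project/gclab/raj76/my_project /lustre1/raj76/my_project. You can then reattach with tmux attach -t borg_backup to watch progress.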
Checking the repository
You can check the repository for consistency using the following command.
$ borg check /project/gclab/raj76/my_project
You can list the contents of the repository using the following command. This lists the archives in the repository.
$ borg list /project/gclab/raj76/my_project
lustre-2018-09-01T23:26:59 Sat, 2018-09-01 23:27:00 [d2365ef51f205d20428c3df74bc9ae9ffadb779283f806722acc6e282b6abd27]
You can list the files in each archive using a similar command.
$ borg list /project/gclab/raj76/my_project::lustre-2018-09-01T23:26:59
drwxrwx--- raj76 jlmlab 0 Wed, 2018-04-11 23:39:20 scratch/raj76/Dogwood
drwx------ raj76 jlmlab 0 Wed, 2017-08-30 08:40:08 scratch/raj76/Dogwood/PASA_runs
drwxr-xr-x raj76 jlmlab 0 Tue, 2017-09-05 01:37:31 scratch/raj76/Dogwood/PASA_runs/run_2
drwxr-xr-x raj76 jlmlab 0 Mon, 2017-09-04 15:54:19 scratch/raj76/Dogwood/PASA_runs/run_2/compreh_init_build
-rw-r--r-- raj76 jlmlab 141790367 Wed, 2017-08-30 22:53:47 scratch/raj76/Dogwood/PASA_runs/run_2/StringTie-Merged.gtf.fasta
-rw-r--r-- raj76 jlmlab 80133357 Mon, 2017-09-04 17:18:08 scratch/raj76/Dogwood/PASA_runs/run_2/pasa_spakala_201.assemblies.fasta.transdecoder.pep
-rw------- raj76 jlmlab 63993718 Tue, 2017-09-05 01:37:46 scratch/raj76/Dogwood/PASA_runs/run_2/pasa_spakala_201.assemblies.fasta.transdecoder.pep-longest-ORFs.fasta
drwxr-xr-x raj76 jlmlab 0 Wed, 2017-08-30 21:44:27 scratch/raj76/Dogwood/PASA_runs/run_2/blat_out_dir
-rw-r--r-- raj76 jlmlab 37565886 Wed, 2017-08-30 13:37:15 scratch/raj76/Dogwood/PASA_runs/run_2/blat_out_dir/partition.0.fa
-rw-r--r-- raj76 jlmlab 0 Wed, 2017-08-30 21:29:35 scratch/raj76/Dogwood/PASA_runs/run_2/blat_out_dir/partition.0.fa.pslx.completed
-rw-r--r-- raj76 jlmlab 51667326 Wed, 2017-08-30 21:36:05 scratch/raj76/Dogwood/PASA_runs/run_2/blat_out_dir/partition.0.fa.pslx.top_1
-rw-r--r-- raj76 jlmlab 0 Wed, 2017-08-30 21:36:48 scratch/raj76/Dogwood/PASA_runs/run_2/blat_out_dir/partition.0.fa.pslx.top_1.completed
-rw-r--r-- raj76 jlmlab 36687377 Wed, 2017-08-30 13:37:55 scratch/raj76/Dogwood/PASA_runs/run_2/blat_out_dir/partition.111180.fa
-rw-r--r-- raj76 jlmlab 0 Wed, 2017-08-30 21:29:35 scratch/raj76/Dogwood/PASA_runs/run_2/blat_out_dir/partition.111180.fa.pslx.completed
-rw-r--r-- raj76 jlmlab 49636612 Wed, 2017-08-30 21:36:01 scratch/raj76/Dogwood/PASA_runs/run_2/blat_out_dir/partition.111180.fa.pslx.top_1
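Once a repository holds many timestamped archives, typing the full archive name becomes tedious. The two listing commands above can be combined as in the following minimal sketch. The function names latest_archive and list_latest_archive are hypothetical, and the sketch assumes that borg list prints archives oldest first, which is the default ordering in recent Borg 1.x releases; check the behaviour of the version installed on the xfer nodes before relying on it.

```shell
#!/bin/bash
# Print the name of the newest archive in a Borg repository.
# Usage: latest_archive REPO
latest_archive() {
    local repo=$1
    # --short prints only archive names, one per line; this sketch assumes
    # the default ordering (oldest first), so the last line is the newest.
    borg list --short "$repo" | tail -n 1
}

# List the files in the newest archive.
# Usage: list_latest_archive REPO
list_latest_archive() {
    local repo=$1 archive
    archive=$(latest_archive "$repo")
    [ -n "$archive" ] || { echo "No archives found in $repo" >&2; return 1; }
    borg list "${repo}::${archive}"
}
```

For the repository in this example, the call would be list_latest_archive /project/gclab/raj76/my_project.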
Accessing the data in the backup archives
One of the nice features of Borg is the ability to mount a backup repository as a FUSE filesystem and access the files in the backup as if they were in a real directory tree. Each archive in the repository appears as a top-level directory in the mount. Create a mount point in your home directory and mount the repository using the following commands.
$ mkdir /home/raj76/backup_mount
$ borg mount /project/gclab/raj76/my_project /home/raj76/backup_mount
$ cd /home/raj76/backup_mount
$ ls -l
total 0
drwxr-xr-x. 1 raj76 rccstaff 0 Sep 1 23:27 lustre-2018-09-01T23:26:59
You can browse the archive directories, access the files in the backup, and, if need be, copy files to a location outside the backup. The mount is read-only, so files and directories in the archive cannot be changed or deleted through it. When you are done, leave the mount point and unmount the backup using the following commands.
$ cd
$ borg umount /home/raj76/backup_mount
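The mount, copy, and unmount steps can be combined so that the repository is always unmounted, even when the copy fails. The following is a minimal sketch; the function name borg_copy_out and its argument order are hypothetical, and it uses only the borg mount and borg umount commands shown above.

```shell
#!/bin/bash
# Copy one path out of a mounted Borg repository, then always unmount.
# Usage: borg_copy_out REPO MOUNT_POINT PATH_INSIDE_MOUNT DESTINATION
# Hypothetical helper; PATH_INSIDE_MOUNT starts with the archive name.
borg_copy_out() {
    local repo=$1 mount_point=$2 inner_path=$3 dest=$4
    mkdir -p "$mount_point" || return 1
    borg mount "$repo" "$mount_point" || return 1
    # Copy while preserving timestamps and permissions.
    cp -a "$mount_point/$inner_path" "$dest"
    local status=$?
    # Unmount even if the copy failed.
    borg umount "$mount_point"
    return $status
}
```

For example, borg_copy_out /project/gclab/raj76/my_project /home/raj76/backup_mount lustre-2018-09-01T23:26:59/scratch/raj76/Dogwood/PASA_runs/run_2/StringTie-Merged.gtf.fasta /home/raj76/ would copy one file from the archive listed earlier into the home directory.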
More information
Please refer to the Borg backup manual for more information and a detailed explanation of the available options: https://borgbackup.readthedocs.io/en/stable/index.html