Borg backup

From Research Computing Center Wiki
Revision as of 14:00, 17 September 2018 by Raj76 (talk | contribs)

Introduction

Borg is a deduplicating backup program. It also supports compression with several codecs.

Borg is installed on all xfer nodes, and you can use it to archive your files. For large amounts of data, deduplication and compression can reduce the storage required. This is especially true for large genomic datasets.

Example: archiving a Lustre project directory

Initializing a borg repository

Borg stores backup data in a repository, which is a special directory in your filesystem. You must initialize a repository before writing backup data to it. The repository can be created on any filesystem; in this example it is stored in the project filesystem. Run the following command to initialize the repository. The --encryption none option creates the repository without encryption.

$ borg init --encryption none /project/gclab/raj76/my_project

If successful, the command prints nothing.
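
To make the initialization step safe to repeat, it can be wrapped in a small guard. The following is a minimal sketch rather than part of the original instructions. The function name init_repo_if_missing is hypothetical, and the sketch assumes that anything already present at the repository path means the repository was created earlier.

```shell
# Hypothetical helper: initialize a borg repository only if nothing
# exists at the given path yet.
# Assumption: an existing path means the repository was already created.
init_repo_if_missing() {
    repo="$1"
    if [ -e "$repo" ]; then
        echo "Repository path already exists, skipping init: $repo"
    else
        # Same command as shown above, with the path passed as an argument.
        borg init --encryption none "$repo"
    fi
}

# Example call with the repository path used on this page:
# init_repo_if_missing /project/gclab/raj76/my_project
```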

Creating a backup

The next step is to create a backup in the repository. Run the following command to create a backup with deduplication and fast compression. The -s option prints statistics about the new archive when the backup finishes.

$ borg create -s --compression auto,lz4 /project/gclab/raj76/my_project::lustre1-{now} /lustre1/raj76/my_project

In the above command, lustre1-{now} (the part after ::) is the name of the archive that this backup creates. The {now} placeholder tells borg to insert the current timestamp into the archive name, which makes archives easier to identify later. Run the command in a screen or tmux session, because it can take a long time for large datasets.
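
If you back up more than one directory, the create command can be wrapped in a function so that the repository, source directory and archive prefix become arguments. This is only a sketch under stated assumptions: create_backup and its argument order are hypothetical names chosen for illustration, and the borg options are the ones already shown above.

```shell
# Hypothetical helper wrapping the create command from this page.
# Arguments: repository path, directory to back up, archive name prefix
# (the prefix defaults to lustre1, as in the example above).
create_backup() {
    repo="$1"
    src="$2"
    prefix="${3:-lustre1}"
    # {now} is expanded by borg, not by the shell, so it is passed literally.
    borg create -s --compression auto,lz4 "${repo}::${prefix}-{now}" "$src"
}

# Example call, to be run inside a screen or tmux session:
# create_backup /project/gclab/raj76/my_project /lustre1/raj76/my_project
```

A tmux session can be started with tmux new -s borg before calling the function, so the backup keeps running if your connection drops.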

Checking the repository

You can check the repository for consistency using the following command.

$ borg check /project/gclab/raj76/my_project
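
When the check runs unattended, its exit status is the easiest thing to act on. The sketch below assumes borg's usual convention of exit status 0 for success, 1 for warnings and 2 or higher for errors. The function name check_repo and the messages it prints are hypothetical.

```shell
# Hypothetical helper: run a consistency check and report the result
# based on borg's exit status (assumed: 0 = success, 1 = warning,
# anything else = error).
check_repo() {
    repo="$1"
    rc=0
    borg check "$repo" || rc=$?
    case "$rc" in
        0) echo "check passed: $repo" ;;
        1) echo "check finished with warnings: $repo" ;;
        *) echo "check failed (exit status $rc): $repo" ;;
    esac
    # Pass borg's exit status on to the caller.
    return "$rc"
}

# Example call:
# check_repo /project/gclab/raj76/my_project
```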

You can list the contents of the repository using the following command, which lists the archives in the repository.

$ borg list /project/gclab/raj76/my_project
lustre-2018-09-01T23:26:59    Sat, 2018-09-01 23:27:00 [d2365ef51f205d20428c3df74bc9ae9ffadb779283f806722acc6e282b6abd27]

You can list the files in each archive using a similar command.

$ borg list /project/gclab/raj76/my_project::lustre-2018-09-01T23:26:59
drwxrwx--- raj76  jlmlab        0 Wed, 2018-04-11 23:39:20 scratch/raj76/Dogwood
drwx------ raj76  jlmlab        0 Wed, 2017-08-30 08:40:08 scratch/raj76/Dogwood/PASA_runs
drwxr-xr-x raj76  jlmlab        0 Tue, 2017-09-05 01:37:31 scratch/raj76/Dogwood/PASA_runs/run_2
drwxr-xr-x raj76  jlmlab        0 Mon, 2017-09-04 15:54:19 scratch/raj76/Dogwood/PASA_runs/run_2/compreh_init_build
-rw-r--r-- raj76  jlmlab 141790367 Wed, 2017-08-30 22:53:47 scratch/raj76/Dogwood/PASA_runs/run_2/StringTie-Merged.gtf.fasta
-rw-r--r-- raj76  jlmlab 80133357 Mon, 2017-09-04 17:18:08 scratch/raj76/Dogwood/PASA_runs/run_2/pasa_spakala_201.assemblies.fasta.transdecoder.pep
-rw------- raj76  jlmlab 63993718 Tue, 2017-09-05 01:37:46 scratch/raj76/Dogwood/PASA_runs/run_2/pasa_spakala_201.assemblies.fasta.transdecoder.pep-longest-ORFs.fasta
drwxr-xr-x raj76  jlmlab        0 Wed, 2017-08-30 21:44:27 scratch/raj76/Dogwood/PASA_runs/run_2/blat_out_dir
-rw-r--r-- raj76  jlmlab 37565886 Wed, 2017-08-30 13:37:15 scratch/raj76/Dogwood/PASA_runs/run_2/blat_out_dir/partition.0.fa
-rw-r--r-- raj76  jlmlab        0 Wed, 2017-08-30 21:29:35 scratch/raj76/Dogwood/PASA_runs/run_2/blat_out_dir/partition.0.fa.pslx.completed
-rw-r--r-- raj76  jlmlab 51667326 Wed, 2017-08-30 21:36:05 scratch/raj76/Dogwood/PASA_runs/run_2/blat_out_dir/partition.0.fa.pslx.top_1
-rw-r--r-- raj76  jlmlab        0 Wed, 2017-08-30 21:36:48 scratch/raj76/Dogwood/PASA_runs/run_2/blat_out_dir/partition.0.fa.pslx.top_1.completed
-rw-r--r-- raj76  jlmlab 36687377 Wed, 2017-08-30 13:37:55 scratch/raj76/Dogwood/PASA_runs/run_2/blat_out_dir/partition.111180.fa
-rw-r--r-- raj76  jlmlab        0 Wed, 2017-08-30 21:29:35 scratch/raj76/Dogwood/PASA_runs/run_2/blat_out_dir/partition.111180.fa.pslx.completed
-rw-r--r-- raj76  jlmlab 49636612 Wed, 2017-08-30 21:36:01 scratch/raj76/Dogwood/PASA_runs/run_2/blat_out_dir/partition.111180.fa.pslx.top_1
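
Because archive names contain a timestamp, it is convenient to look up the newest archive instead of typing its name. The following sketch uses borg list with the --short option, which prints only the names. It assumes that borg lists archives oldest first, so the last line is the most recent archive. The function names latest_archive and list_latest_files are hypothetical.

```shell
# Hypothetical helper: print the name of the most recent archive.
# Assumption: borg lists archives oldest first.
latest_archive() {
    # --short prints only the archive names, one per line.
    borg list --short "$1" | tail -n 1
}

# Hypothetical helper: list the files in the most recent archive.
list_latest_files() {
    repo="$1"
    name=$(latest_archive "$repo")
    if [ -z "$name" ]; then
        echo "no archives found in $repo" >&2
        return 1
    fi
    borg list "${repo}::${name}"
}

# Example call:
# list_latest_files /project/gclab/raj76/my_project
```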

Accessing the data in the backup archives

One of the nice features of borg is the ability to mount a backup archive as a FUSE filesystem and access the files in the backup as if they were in a regular directory tree. Create a mount point in your home directory and mount the borg repository using the following commands.

$ mkdir /home/raj76/backup_mount
$ borg mount /project/gclab/raj76/my_project /home/raj76/backup_mount
$ cd /home/raj76/backup_mount
$ ls -l
total 0
drwxr-xr-x. 1 raj76 rccstaff 0 Sep  1 23:27 lustre-2018-09-01T23:26:59

You can browse the archive directories, access the files in the backup and, if need be, copy them to a location outside the backup. The borg mount is read-only, so files and directories in the archive cannot be changed or deleted.
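
Copying files out of the mount and releasing it afterwards might look like the sketch below. The function names copy_from_mount and unmount_backup are hypothetical, and the example paths in the comments are illustrative. The borg umount command takes the mount point as its argument.

```shell
# Hypothetical helper: copy a file or directory out of the mounted backup.
# Arguments: mount point, path relative to the mount point, destination dir.
copy_from_mount() {
    mnt="$1"
    rel="$2"
    dest="$3"
    # -R copies directories recursively, -p preserves modes and timestamps.
    mkdir -p "$dest" && cp -Rp "$mnt/$rel" "$dest/"
}

# Hypothetical helper: unmount the backup when you are done.
# Leave the mount point directory first, or the unmount may fail
# because the filesystem is still in use.
unmount_backup() {
    borg umount "$1"
}

# Example calls (the destination directory is illustrative):
# copy_from_mount /home/raj76/backup_mount lustre-2018-09-01T23:26:59/scratch /home/raj76/restored
# cd /home/raj76 && unmount_backup /home/raj76/backup_mount
```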