
Backups and other useful file transfer operations

Two main reasons for backups:

  • to protect against broken hardware
  • to protect against your own stupidity

Do backups on a regular basis, both of your computer/operating system as a whole and of your simulation data. Ideally, you want your (important simulation) data to exist in two physically different places, e.g., your laptop and an external hard drive, or your desktop and the cluster. Keeping two copies is typically unnecessary for the operating system itself, because most of the time you can recover a broken OS.

There are many programs to do automatic backups, two of them are:

In general, there are two types of backups. A full backup is a complete copy of all data, which usually means large amounts of data need to be stored. An incremental backup saves only the changes made since the last backup, which needs less storage. To restore data, you need the last full backup plus the whole chain of incremental backups made since then.
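As a rough illustration of the full/incremental idea, here is a minimal sketch using GNU tar's --listed-incremental option. It assumes GNU tar (the BSD tar shipped with macOS does not have this option); the function names and the file names state.snar, full.tar.gz and incr_<label>.tar.gz are made up for this example.

```shell
#!/bin/bash
# Sketch only: requires GNU tar. All file and function names are illustrative.

# full_backup SRC_DIR OUT_DIR: start a new chain with a complete copy.
full_backup() {
    local src=$1 out=$2
    mkdir -p "$out" || return 1
    # Without an existing snapshot file, tar saves everything.
    rm -f "$out/state.snar"
    tar --listed-incremental="$out/state.snar" -czf "$out/full.tar.gz" \
        -C "$(dirname "$src")" "$(basename "$src")"
}

# incremental_backup SRC_DIR OUT_DIR LABEL: save only what changed since
# the previous full or incremental run (tracked in state.snar).
incremental_backup() {
    local src=$1 out=$2 label=$3
    tar --listed-incremental="$out/state.snar" -czf "$out/incr_$label.tar.gz" \
        -C "$(dirname "$src")" "$(basename "$src")"
}
```

To restore, extract full.tar.gz first and then every incr_*.tar.gz in the order they were made, each with tar --listed-incremental=/dev/null -xzf <archive>.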

What to backup:

  • input scripts to run simulations
  • trajectories, log files, other output from simulations (trim down if too large, make zip/tar archives)
  • analyzing scripts
  • plotting scripts
  • README files
  • Code (should be automatically done with git/github anyway!)

When actively working on a project, a good backup frequency is about once a week. You shouldn't delete too much, but do clean up and update the data regularly. Once a project is completely finished (e.g., published), spend some time to clean up, organize, and compress the data. Then a tar.gz archive can be made and permanently moved to a backup location.
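Making the tar.gz archive of a finished project could look roughly like this. It is a sketch, not a fixed recipe: the function name archive_project and the example paths in the usage line are placeholders.

```shell
#!/bin/bash
# archive_project PROJECT_DIR DEST_DIR
# Packs PROJECT_DIR into DEST_DIR/<name>.tar.gz, checks that the archive
# is readable, and prints the archive path.
archive_project() {
    local project_dir=$1 dest_dir=$2
    local name archive
    name=$(basename "$project_dir")
    archive="$dest_dir/$name.tar.gz"

    mkdir -p "$dest_dir" || return 1
    # -C switches to the parent folder so the archive stores relative paths
    tar -czf "$archive" -C "$(dirname "$project_dir")" "$name" || return 1
    # Listing the archive reads it end to end and fails if it is damaged
    tar -tzf "$archive" > /dev/null || return 1
    echo "$archive"
}
```

Usage would be something like archive_project ~/simulations/triblocks /path/to/backup/location. Only delete the original folder after you have checked that the archive really is where you expect it.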

What not to backup:

  • cluster log files (e.g. slurm.out,..)
  • random testing data and files (periodically clean up your projects/folders)
  • unnecessary output

Sync files using Globus (webpage)

Globus (https://www.globus.org/) is a web service for transferring files. It can do a lot of things, but the most important feature for us is that the UIUC clusters and Box.com are linked to it. You can also link your personal computer/laptop. Globus calls the different devices “Collections”; think of them as the origin and destination of your data transfer. Here are some how-to's: https://docs.globus.org/how-to/, and the full documentation is here: https://docs.globus.org/.

Setup for Cluster + Box.com

This is very straightforward. Log into Globus at https://www.globus.org/. UIUC has a subscription, so you log in with your netid and DUO authentication. Navigate to the File Manager. You should see two search boxes; type “Illinois Research Storage Box” into one. This should ask you for permissions; again use your netid and DUO login. This links your Box.com storage to Globus as a “Collection”. In the other search box you can link a cluster (Campus Cluster or Delta, maybe more). Their respective names to search for are:

  • Delta = NCSA Delta Collection
  • CampusCluster = Illinois Research Storage (without BOX)

Now you can transfer between the two using the “Transfer/Sync” button in the middle. There are advanced options under “Transfer and Timer Options”, also in the middle. Useful ones are “sync - only transfer new or changed files” and “preserve source file modification times”. You can transfer in both directions; the arrows on the “Start” button indicate the direction. You can queue multiple transfers at the same time; they show up under “Activities” as they progress.

Setup for Local Computer/Laptop + Box.com

To transfer files from/to your local device (computer or laptop), you need to install a program called Globus Connect Personal. It is available for all major operating systems; see here for instructions: https://docs.globus.org/globus-connect-personal/. Once installed and configured, your local device will also show up in the web interface as a “Collection” with the name you gave it during setup, and you can transfer files between it and any other Collection.

Automate Backups

There is an option to do timed syncs that can be used for backups. Globus also has something called “Flows” that might be able to create an automated backup system: https://docs.globus.org/api/flows/. Will edit once we find out how to use this.

There is also a terminal/command line option, https://docs.globus.org/cli/, that may be used for scripting and complex tasks. There is a Python API as well: https://globus-sdk-python.readthedocs.io/en/stable/tutorial.html. Neither of these has been tested by anyone so far. Edit this as we gain experience.
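For orientation, a scripted sync with the Globus CLI might look roughly like the sketch below. Like the rest of this section it is untested on our clusters. The function name globus_backup is made up, the collection IDs are placeholders you would have to look up yourself (for example with globus endpoint search "NCSA Delta"), and the flags should be double checked against globus transfer --help. You need to run globus login once beforehand.

```shell
#!/bin/bash
# Untested sketch. SRC_ID and DST_ID stand for collection UUIDs; they are
# placeholders, not real values.
#
# globus_backup SRC_ID SRC_PATH DST_ID DST_PATH
globus_backup() {
    local src_id=$1 src_path=$2 dst_id=$3 dst_path=$4
    # --recursive transfers the whole folder tree,
    # --sync-level mtime copies a file only if the source copy is newer
    globus transfer "$src_id:$src_path" "$dst_id:$dst_path" \
        --recursive --sync-level mtime \
        --label "backup $(date +%Y-%m-%d)"
}
```

A call would then be globus_backup <cluster-collection-id> /path/on/cluster/ <box-collection-id> /backups/, and the submitted task should show up in the web interface like any other transfer.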

Sync files with Box.com over ftp (terminal)

You might find yourself in a situation where you want or need to transfer a large number of files from or to Box, and the webpage keeps timing out or the graphical web interface is simply annoying to use. Box has official instructions on how to use (S)FTP, i.e., (secure) file transfer protocol, for this purpose. It works from the terminal. There is also this post which might be helpful.

In practice, you first have to set up a password for your Box account. Sign into your Box account, go to “Account Settings”, and then create or set a password under Account→Authentication (scroll down to see it).

Now you need to have ncftp installed. Type ncftp into the terminal to see if it is installed. If you have ssh installed, you are probably good to go. If not, you need to install it first.
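If you prefer a check that does not start the program, something like the following sketch works in bash. The function name missing_tools is made up; the list of tools is just the transfer clients mentioned on this page.

```shell
#!/bin/bash
# missing_tools NAME...: print every listed program that is not on the PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        # command -v succeeds only if the shell can find the program
        command -v "$tool" > /dev/null 2>&1 || echo "$tool"
    done
}

# Example: check the transfer clients mentioned on this page.
missing=$(missing_tools ncftp lftp rclone)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line
    echo "Not installed:" $missing
fi
```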

  • Windows users: look into filezilla (?).
  • Mac users: brew install ncftp.

Now to actually transfer files:

ncftp -u <username> -p <password> ftp.box.com

Your username should be your university email (<netid>@illinois.edu) and the password is whatever you set in the Box account settings. It should say “331 User name okay, need password for <netid>@illinois.edu. Password: 230 User logged in, proceed.”. You are now in a terminal where the prompt says ncftp>.

From here you can transfer files. If something goes wrong, or you typed your password wrong, you can always type quit to get out of ftp and start again. First, use cd to go to the folder in Box you want to back up to (what you see in your box.com web interface is “/”), then use lcd to go to the local folder you want to back up from, and then put or mput the file(s) you want to back up.

cd /backups
lcd /Users/statt/simulations/triblocks/equilibrium_properties/plots 
put test.png 
(or mput *.png for all pngs)
(or mput -r * to sync all files recursively)

Try a single file first and see that it works and ends up in the right place. Double check in the graphical web interface that the file/folder structure is right. If you don't change your directories correctly it might attempt to re-create all folders and sub-folders and make a gigantic mess. mput copies many files. It will ask you to confirm every file by entering 'y'. To deactivate this, do:

Downloading files is done with get or mget. See here. One really important thing to realize is that commands like ls, cd, pwd and so on now refer to the REMOTE location, i.e., your box.com folders. The LOCAL equivalents are !ls, lcd and !pwd.

You can upload an entire directory from the terminal using this command:

ncftpput -R -u <user>@illinois.edu -p <password> ftp.box.com /remote/foldername/ /local/path/to/folder 

or, alternatively, in interactive mode:

ncftp -u <username>@illinois.edu -p <password> ftp.box.com 
cd /remote/foldername/
mput -r /local/path/to/folder/* 

For downloading files/folders, use ncftpget. In practice, all of this is fairly limited and clunky. It is best to install a more powerful ftp client or program.

Sync files with Box.com over ftp in a script (terminal)

This can be useful if you want to sync an entire folder as is to Box. Copy this script to the folder you want to sync and adjust username, passwd, and the remote location (i.e., the folder in Box).

#!/bin/bash
ftp_site=ftp.box.com
username=XXX@illinois.edu
passwd=XXX
remote=/XXX
dir=$(pwd)

# files in main folder (names containing spaces are not supported)
files=$(find . -maxdepth 1 -type f)
echo $files
for f in $files
do
    ftp -in <<HERE
    open $ftp_site
    user $username $passwd
    passive on
    epsv
    cd $remote/
    prompt n
    mput $f
    close
    bye
HERE
done

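# folders and files in subfolders; -mindepth 1 skips "." so the files
# in the main folder are not uploaded a second time
folder=$(find . -mindepth 1 -type d)
echo $folder
for f in $folder
do
    cd "$dir/$f" || continue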
    ftp -in <<HERE
    open $ftp_site
    user $username $passwd
    passive on
    epsv
    mkdir $remote/$f
    cd $remote/$f
    prompt n
    mput *
    close
    bye
HERE
done

exit 0

This script syncs up to 1 subfolder deep (I believe, check with a test folder!). Alternatively, using lftp one can do this:

#!/bin/bash
ftp_site=ftp.box.com
username=XXX@illinois.edu
passwd=XXX
remote=/XXX
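dir=$(pwd)

# mirror --reverse uploads (local -> remote). $remote is passed as the
# target; without it lftp uploads into a folder named after the local one.
# -p (--no-perms) skips setting file permissions on the remote side.
lftp -u "$username","$passwd" $ftp_site <<HERE
set sftp:auto-confirm yes
mkdir -f $remote
mirror --verbose --parallel=10 --reverse --only-newer -p $dir $remote
HERE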

This script will sync the entire folder and subfolders, only copying newer files.

Sync files with Box.com over ftp from Pitt cluster (terminal)

Use lftp. Type lftp ftp.box.com into your terminal, then user <username>@illinois.edu, and type your password. After that you can use the same lcd, cd, put, and get commands as described above.

Rclone for Box.com syncing

If ftp or ncftp etc. don't quite suit your needs, try Rclone. The most common limitation is that you might want to 'synchronize' a remote and a local folder without up- or downloading ALL files again, skipping the ones that already exist.

First, install Rclone:

Then, configure it, following these instructions (usually default 'enter' is fine). Now you can use it.

Most useful commands:

rclone sync /local/path/ remote:/remote/path/  #sync, see https://rclone.org/commands/rclone_sync/
rclone lsd remote: #list all remote folders

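Useful flags for sync are --dry-run, to test first what would happen and check that it's correct, and -P for progress output. Note that sync makes the remote folder match the local one, which includes deleting remote files that no longer exist locally, so a dry run first is worth the time. See https://rclone.org/commands/ for a list and documentation of all commands.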
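One way to make the dry run a habit is to wrap both steps in a small shell function, as in the sketch below. The function name sync_to_box is made up, and "remote" in the usage line stands for whatever name you chose during rclone config.

```shell
#!/bin/bash
# sync_to_box LOCAL_DIR REMOTE_PATH
# Shows what rclone sync would do, then runs it only after confirmation.
sync_to_box() {
    local src=$1 dst=$2 answer
    # --dry-run only reports what would be copied or deleted
    rclone sync --dry-run "$src" "$dst" || return 1
    printf 'Apply these changes? [y/N] ' >&2
    read -r answer
    if [ "$answer" = "y" ]; then
        # -P shows progress during the real transfer
        rclone sync -P "$src" "$dst"
    else
        echo "Nothing transferred."
    fi
}
```

Usage: sync_to_box /local/path/ remote:/remote/path/. Anything other than y at the prompt leaves the remote untouched.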

misc/backups.txt · Last modified: 2022/12/13 20:38 by statt
