This guide will demonstrate a couple of strategies for backing up and restoring VMs on the ThreeFold Grid. So far there isn't a built-in backup solution within Zos, which means we need to approach this from within the VM.
Here’s a brief overview:
- Works for full and micro VMs, deployed via the Dashboard or via other tools (that includes Dashboard applications, which are micro VMs)
- Backs up an entire VM, including installed software and all data
- Optionally uses a second VM to receive the data, for the case where the source VM's disk is more than 50% full
Prepare source VM
We're going to make a copy of the source VM while it is running. Generally this works just fine, but with one caveat:
If any write operations happen during the backup, it’s not guaranteed whether the old or new version of the file will be stored in the backup. This can cause data corruption in the backup (there’s no risk to the original data), especially for databases.
There are two ways to mitigate this:
- Stop services, like databases, that write to disk before making the backup
- Make a separate backup of any databases, using a supported method of database backup
It won't be possible to cover either of these exhaustively, since there are many possible permutations. The subsections below show how to view and stop services in typical cases. If you're not sure which apply to you, just work through them all and follow the instructions that do.
Backing up databases separately is beyond the scope of this guide. You should be able to find instructions in the documentation for your database or via a web search.
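That said, just to make the idea concrete, here's a rough sketch of what a separate dump might look like for two common databases. The database name, unit names, and output paths are placeholders, and your setup may well differ:
# MySQL / MariaDB example - dumps one database to a file
mysqldump --single-transaction mydatabase > /root/mydatabase.sql
# PostgreSQL example - runs the dump as the postgres user (assumes sudo is available)
sudo -u postgres pg_dump mydatabase > /root/mydatabase.sql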
Systemd
If your VM is a full VM, then it most likely has systemd as the init system. You can view services managed by systemd by running this command:
systemctl list-units --type=service --state=running --no-pager
If this results in an error that systemd isn't found, no problem: just skip ahead to the next section. Otherwise, have a look through the list for anything that might be writing to disk. Stop those services like this:
# For example, we can stop the MySQL database (the unit is usually just called mysql)
systemctl stop mysql
Whatever you do, don't stop ssh.service or sshd.service. You could lose the ability to connect to your VM over SSH.
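If you want to double check that a particular service actually stopped, systemctl can confirm it (mysql is just an example unit name here):
# Prints "inactive" and a non-zero exit code once the service is stopped
systemctl is-active mysql || echo "mysql is stopped"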
Ubuntu / Debian automatic updates
If your VM has systemd (full VM) and it’s an Ubuntu or Debian machine, there’s a chance you have automatic updates enabled. Needless to say, an automatic update is exactly the kind of thing we’d like to avoid during our backup.
To temporarily disable automatic updates, run this command:
dpkg-reconfigure unattended-upgrades
And then select “No” on the menu that pops up. To reenable later, just run the same command and select “Yes”.
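If you'd rather verify the setting directly, on typical Ubuntu and Debian installs it ends up in this file, where a value of "0" means disabled (the exact path can vary with your setup):
cat /etc/apt/apt.conf.d/20auto-upgrades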
Finally, let's double check that apt is not running:
pgrep apt || echo Good to go
If you see some numbers printed on screen, that means an apt process is running. Try waiting a while and checking again. If you see "Good to go" echoed back, then proceed.
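If you want to be extra cautious, you can widen the net to also catch dpkg and the unattended-upgrades worker. This may occasionally match an unrelated process, but a false positive just means waiting a little longer:
# Lists any matching processes with their full command line
pgrep -af 'apt|dpkg|unattended' || echo Good to go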
Zinit
Our ThreeFold micro VMs and application deployments come with zinit as the process manager. Check for zinit and list running services like this:
zinit list
Stop services like this:
# Web servers aren't as risky as databases, but it won't hurt to stop nginx
zinit stop nginx
Docker
Finally, check if Docker is installed and if any containers are running:
docker ps
The simplest way to proceed here is to just stop Docker while making the backup, which will also stop any containers:
# With systemd
systemctl stop docker.service
# With zinit
zinit stop dockerd
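Alternatively, if you'd prefer to leave the Docker daemon itself running, you could stop just the containers. This assumes you're fine with every running container being stopped at once:
# Stops all running containers (complains if there are none)
docker stop $(docker ps -q)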
Final check
After you've stopped (or at least tried to stop) all the services that might write data to disk and corrupt the backup, have a final look at the processes running on the machine:
# Shows non-kernel processes, for a fairly concise view
ps --ppid 2 -p 2 --deselect
If there are lingering processes that seem like they should have been stopped already, you can try the steps above again.
Backup time
Now we have a fork in the road. If your VM has adequate disk space you can keep things simple and make the backup to the VM’s own disk. Otherwise, you will need another machine to receive the backup (since life is already getting complicated in this case, I’ll assume that machine is always going to be a second VM).
Let’s have a look at the disk space situation:
# That's "h" for "human readable"
df -h
There's going to be a bit of noise in these results, such as tmpfs entries, which actually live in RAM rather than on disk. The general rule is to look at the Mounted on column for either a bare / (the root filesystem) or something starting with /mnt, which is a typical mount point.
The formula here is basically to add up the used disk space and see if there’s some place where all of it can fit (either under root or under some mount). That will be pretty simple if no disks are mounted—then the question is basically whether the root filesystem usage is less than 50%.
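For the simple case where nothing extra is mounted, a one-liner is enough to see whether root usage is under that 50% mark:
# Shows just the root filesystem
df -h /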
If this is all feeling a bit overwhelming, you can always opt to back up to a second VM. Then all that matters is that the second VM has enough space. If you plan to use a second VM, skip ahead to the rsync section. Otherwise, continue with tar.
Backing up with tar
Here we will create a compressed archive containing the entire backup. All commands below should be run as root.
First, make sure the pv utility is installed for progress monitoring:
apt update && apt install -y pv
Before running the long and scary looking command below, here's a quick rundown of what it does:
- Creates a tar archive of the whole filesystem
- Excludes system directories that are either generic or don't contain permanent data
- Excludes the backup file itself (to avoid an infinite loop)
- Pipes the output through pv to show progress
- Compresses the result with gzip
tar -c --exclude='/boot/*' --exclude='/dev/*' --exclude='/proc/*' --exclude='/sys/*' --exclude='/tmp/*' --exclude='/run/*' --exclude='/lost+found/' --exclude=/backup.tar.gz / | pv | gzip > /backup.tar.gz
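Once the command finishes, a quick sanity check doesn't hurt. gzip -t only verifies that the compressed stream is readable; it doesn't extract anything:
gzip -t /backup.tar.gz && echo "Archive looks intact"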
You'll now have a backup at /backup.tar.gz, which you can, for example, download to your local computer using scp or an SFTP client like Filezilla.
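For reference, the scp download run from your own computer could look something like this (the address is a placeholder for your VM's IP):
# Run this on your local machine, not on the VM
scp root@<VM IP address>:/backup.tar.gz .
Once you have copied the backup file elsewhere, you can remove it from the VM to free up the space: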
rm /backup.tar.gz
With that, you are done, until it’s time to restore. See instructions below for info on doing that.
Backing up with rsync
In this case, we will backup to a second VM over a network connection using rsync
. We’ll use this naming convention throughout:
- VM1 - the original VM that we are backing up
- VM2 - the new VM that’s receiving the backup
At this point, you’ll need to deploy your VM2. Here’s what I recommend:
- Micro or full VM at your preference
- Use “custom” capacity:
- 1 vCPU
- 1024 MB of RAM
- SSD big enough to receive all data from VM1 and hold the backup archive (go ahead and reserve 2x the stored data amount—this VM is only temporary so the cost is not important)
- Public IPv4 address (this is a simple way to get reliable communication between the VMs, but isn’t strictly required)
Install rsync
Both VMs will need rsync installed. We'll also make sure nano and pv are installed on VM2 while we're at it:
# VM1
apt update && apt install -y rsync
# VM2
apt update && apt install -y rsync nano pv
Establish SSH between VMs
Using rsync requires SSH connectivity between the two VMs. We'll generate a new SSH key on VM1 and add it to the authorized keys on VM2:
# VM1
ssh-keygen -t ed25519
# Hit enter to accept all defaults
cat ~/.ssh/id_ed25519.pub
Copy the key and paste it into the authorized keys file on VM2:
# VM2
nano ~/.ssh/authorized_keys
# Paste in the key, then ctrl-o, enter to save and ctrl-x to exit
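Before kicking off the transfer, it's worth a quick test from VM1 that the SSH connection actually works (substitute the real IP; you'll be asked to accept the host key the first time):
# VM1
ssh root@<VM2 IP address> echo "SSH between VMs works"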
Here’s a gif showing these steps, with VM1 on the left and VM2 on the right (click to enlarge):
Do the backup
The rsync command below is adapted from this Arch Linux wiki page. Briefly, it:
- Syncs all files from VM1's root to a folder on VM2, /backup
- Preserves file ownership and attributes
- Shows progress
- Excludes system directories that are either generic or don't contain permanent data
# VM1
# Substitute in the IP address of VM2, for example: root@1.2.3.4:/backup
rsync -aAXHv --exclude='/boot/*' --exclude='/dev/*' --exclude='/proc/*' --exclude='/sys/*' --exclude='/tmp/*' --exclude='/run/*' --exclude='/lost+found/' / root@<VM2 IP address>:/backup
Once that completes, we can again use tar to create a compressed version as a single file:
# VM2
tar -c -C /backup . | pv | gzip > /backup.tar.gz
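If you'd like some reassurance before downloading, you can peek at the archive's contents without extracting anything:
# VM2
tar -tzf /backup.tar.gz | head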
Now you can download the backup.tar.gz file to its final destination using scp or an SFTP client like Filezilla. Once you have the backup safely stored, you can decommission VM2.
Restoring the backup
As before, we have a fork in the road. The first and simpler case is restoring by extracting the tar archive directly into a new VM. This works in many but not all cases: due to some quirks, it can actually be impossible to restore the root filesystem into certain application deployments using tar. In that case, you can work around it by using a second VM and rsync again.
If you’re not sure, you can try the first method and if it fails, move on to the second. The symptom to look out for would be “disk quota exceeded” errors. In that case, destroy the VM you tried to recover into and start fresh.
Restoring with tar
First you will need to deploy the VM or application solution to restore the backup into. If you are restoring an application solution, follow the steps again to stop any running services. If you are restoring into a fresh VM then there’s no need to worry about this.
Upload and extract
Upload the backup.tar.gz into the root directory of the new VM, using the method of your choice. Then run:
cd /
tar -xf backup.tar.gz
# No tricks for showing progress here
# Just grab some coffee, or whatever, and hope for the best
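If you'd like a progress bar after all and don't mind installing pv on this VM too, the same extraction can be piped through it. This is just an optional variation of the command above, not an extra step:
# Optional: extract with a progress bar instead
apt update && apt install -y pv
pv /backup.tar.gz | tar -xzf - -C /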
Reenable auto updates
Don't forget to reenable automatic updates, if applicable (this applies to both the original VM and the restored VM):
dpkg-reconfigure unattended-upgrades # Choose "Yes"
Reboot
Finally, go ahead and reboot the VM. This serves two purposes. First, it will bring up all services and generally bring the machine to a “normal” state. Second, it will ensure that no issue blocking the VM from booting up was introduced along the way—better to discover that now than much later when the VM reboots unexpectedly due to the host node losing power.
# Full VM
reboot
# Micro VM
reboot -f
# If all else fails
echo b > /proc/sysrq-trigger
Cleanup
It will take a little while before the VM comes back and you can connect via SSH again. Have a look around and make sure everything looks normal, then clean up the backup archive:
rm /backup.tar.gz
Restoring with rsync
To set this up, we will need two VMs. Deploy them as follows:
- VM1 - deploy this VM with the same specs as the VM you backed up. If it was an application solution, deploy the same solution with the same specs
- VM2 - this is the temp VM for extracting and transferring the backup. Micro or full VM at your preference with “custom” capacity:
- 1 vCPU
- 1024 MB of RAM
- SSD big enough to hold the backup archive and its extracted contents (go ahead and reserve 2x the backup size—this VM is only temporary so the cost is not important)
- Public IPv4 address (this is a simple way to get reliable communication between the VMs, but isn’t strictly required)
Stop running services
If you are restoring an application solution, first follow the steps above again to make sure that all services are stopped.
Install rsync
Both VMs will need rsync installed. We'll also make sure nano is installed on VM2 while we're at it:
# VM1
apt update && apt install -y rsync
# VM2
apt update && apt install -y rsync nano
Establish SSH between VMs
This is exactly the same process as before. Just scroll up until you see the gif if you need to reference it.
Upload, extract, and transfer
Now upload the backup.tar.gz file to the root directory of VM2, using the method of your choice. When that's done, extract it like so:
# VM2
cd /
mkdir /backup
tar -C /backup -xf backup.tar.gz
Then, from VM1, initiate the transfer via rsync:
# VM1
# Substitute in the IP address of VM2, for example: root@1.2.3.4:/backup
rsync -aAXHv --exclude='/boot/*' --exclude='/dev/*' --exclude='/proc/*' --exclude='/sys/*' --exclude='/tmp/*' --exclude='/run/*' --exclude='/lost+found/' root@<VM2 IP address>:/backup/ /
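If you'd like to preview what would be written before letting it loose on the root filesystem, the same command with -n (--dry-run) added only prints what it would transfer:
# VM1, dry run only - nothing is actually written
rsync -aAXHvn --exclude='/boot/*' --exclude='/dev/*' --exclude='/proc/*' --exclude='/sys/*' --exclude='/tmp/*' --exclude='/run/*' --exclude='/lost+found/' root@<VM2 IP address>:/backup/ /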
Reenable auto updates
Don’t forget to reenable automatic updates, if applicable:
# VM1
dpkg-reconfigure unattended-upgrades # Choose "Yes"
Reboot
Ensure the restore was successful and bring the VM back up to a normal state by rebooting:
# Full VM
reboot
# Micro VM
reboot -f
# If all else fails
echo b > /proc/sysrq-trigger
Postlog
I hope this guide is clear and helpful, but if you have any questions, please do post them below. We should eventually get a backup feature built into Zos that's substantially easier to use than what I described here, but for now, at least we know it's possible.