How do you all go about backing up your data, on Linux?
I'm trying to find a good method of making periodic, incremental backups. I assume the most minimal approach would be a cron job that runs rsync periodically, but I'm curious what other solutions exist.
I'm interested in both command-line and GUI solutions.
Timeshift is a great tool for creating incremental backups. Basically it's a frontend for rsync, and it works great. If needed, you can also use it from the CLI.
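For reference, the CLI side looks roughly like this (a sketch; the comment text is just an example):
sudo timeshift --create --comments "pre-upgrade snapshot"
sudo timeshift --list
sudo timeshift --restore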
To be on topic as well: I use the restic + autorestic combo. Pretty simple; I made a repo with a small script that generates the config for different machines, and that's it. I store backups between machines and on B2.
I like rsnapshot, run from a cron job at various useful intervals. Backups are hardlinked and rotated, so disk usage eventually reaches a very slowly growing steady state.
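The cron side is just a couple of entries calling the retain levels defined in rsnapshot.conf; a sketch, assuming the stock "alpha" (frequent) and "beta" (daily) level names:
# crontab entries
0 */4 * * *  /usr/bin/rsnapshot alpha
30 3 * * *   /usr/bin/rsnapshot beta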
Exactly like you think. A cron job runs a periodic rsync of a handful of directories under /home. My OS is on a different drive that doesn't get backed up. My configs are in an Ansible repository hosted on my home server and backed up the same way.
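A minimal sketch of that setup (paths and schedule are placeholders):
# crontab entry: nightly rsync of selected /home directories to a mounted backup drive
0 2 * * * rsync -a --delete /home/user/documents /home/user/projects /mnt/backup/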
Kopia or Restic. Both do incremental, deduplicated backups and support many storage services.
Kopia provides a UI for the end user and has integrated scheduling. Restic is a powerful CLI tool that you build your backup system on, but usually one does not need more than a cron job for that. I use a set of custom systemd jobs and generators for my restic backups.
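For anyone new to restic, the cron-job version can be as small as this (a sketch; the repo path and password file are placeholders):
#!/bin/bash
# nightly-restic.sh: back up, then apply a retention policy
export RESTIC_REPOSITORY=/srv/restic-repo
export RESTIC_PASSWORD_FILE=/root/.restic-pass
restic backup /home /etc
restic forget --keep-daily 7 --keep-weekly 4 --prune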
Keep in mind that backups on local, constantly connected storage are hardly backups. When the machine fails hard, the backups are lost together with the original data. So Timeshift alone is not really a solution. Also: test your backups.
I rotate between a few computers. Everything is synced between them with syncthing and they all have automatic btrfs snapshots. So I have several physical points to roll back from.
For a worst case scenario everything is also synced offsite weekly to a pCloud share. I have a little script that mounts it with pcloudfs, encfs and then rsyncs any updates.
By the way, Syncthing is great if you need bi-directional sync.
It's not exactly what you're looking for (that would be something like Duplicacy?), but you should probably know about it, as it's a great tool.
Restic since 2018, both to locally hosted storage and to remote over SSH. I have "stuff I care about" and "stuff that can be relatively easily replaced" fairly well separated, so my filtering rules are not too complicated. I used duplicity for many years before that, and afbackup to DLT IV tapes prior to that.
I have scripts scheduled to run rsync on local machines, which save incremental backups to my NAS. The NAS in turn is incrementally backed up to a remote server with Borg.
Not all of my machines are on all the time so I also built in a routine which checks how old the last backup is, and only makes a new one if the previous backup is older than a set interval.
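The age check is just a few lines of shell; a sketch, with the stamp file and interval made up for illustration:
#!/bin/bash
STAMP=/var/lib/backup/last-run        # touched after each successful backup
MAX_AGE=$(( 24 * 60 * 60 ))           # one day, in seconds
now=$(date +%s)
last=$(stat -c %Y "$STAMP" 2>/dev/null || echo 0)
if (( now - last >= MAX_AGE )); then
    rsync -a /home/user/ /mnt/nas/backup/ && touch "$STAMP"
fi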
I also save a lot of my config files to a local git repo, the database of which is regularly dumped and backed up in the same way as above.
Git projects and system configs are on GitHub (see etckeeper); the rest is synced to my self-hosted Nextcloud instance using their desktop client. There I have periodic backups using Borg, for both the files and the Nextcloud database.
I use btrbk to send btrfs snapshots to a local NAS. Consistent backups with no downtime. The only annoyance (for me at least) is that both send and receive ends must use the same SELinux policy or labels won't match.
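For anyone who hasn't used it: btrbk essentially automates a snapshot-and-send cycle like the one below (a hand-rolled sketch; the paths, host, and snapshot names are placeholders):
# Read-only snapshot, then send it to the NAS
btrfs subvolume snapshot -r /home /home/.snapshots/home.20240102
# The first transfer is full; later ones pass -p <parent> for incremental sends
btrfs send -p /home/.snapshots/home.20240101 /home/.snapshots/home.20240102 \
| ssh nas btrfs receive /backup/home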
At the core it has always been rsync and cron. Sure, I add a NAS and things like rclone+Cryptomator to have extra copies of synchronized data (mostly documents and media files) spread around, but it's always rsync+cron at the core.
I use Syncthing on several devices to replicate the data I want to keep backups of: family photos, journals, important docs, etc. Works perfectly, and I run a relay node to give back to the community, since I am on an unlimited data connection.
I just run my own Nextcloud instance. Everything important is synced to that with the Nextcloud desktop client, and the server keeps a month's worth of backups on my NAS via rsync.
I use luckyBackup to mirror to an external drive, and I also use Duplicacy to back up two other separate drives at the same time. Have a read of the DataHoarder wiki on backups.
Machine A:
Syncthing replicates the data off-site to Machine B
Machine B:
RAIDz1 takes care of single-disk failure
ZFS doing regular snapshots
Syncthing receiving data from Machine A
Implications
Any single-disk hardware failure on machine A or B results in no data loss
Physical destruction of A won't affect B and the other way around
Any accidentally deleted or changed file can be recovered from a previous snapshot
Any ZFS corruption at A doesn't affect B because send/recv isn't used. The two filesystems are completely independent
Any malicious data destruction on A can be recovered from B even if it's replicated via snapshot at B. The reverse is also true. A malicious actor would have to have root access on both A and B in order to destroy the data and the snapshots on both machines to prevent recovery
Any data destruction caused by Syncthing can be recovered from snapshot at A or B
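The "regular snapshots" part can be as simple as a cron job; here is a minimal hand-rolled ZFS sketch (tools like sanoid or zfs-auto-snapshot do this more robustly; the dataset name is a placeholder):
#!/bin/bash
DATASET=tank/data
zfs snapshot "${DATASET}@auto-$(date +%Y%m%d-%H%M)"
# Keep only the newest 30 auto snapshots
zfs list -H -t snapshot -o name -s creation -r "$DATASET" \
| grep '@auto-' | head -n -30 \
| xargs -r -n 1 zfs destroy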
#!/usr/bin/ssh-agent /bin/bash
# chronicle.sh
# Get absolute directory chronicle.sh is in
REAL_PATH=$(cd "$(dirname "$0")" && pwd)
# Defaults
BACKUP_DEF_FILE="${REAL_PATH}/backup.conf"
CONF_FILE="${REAL_PATH}/chronicle.conf"
FAIL_IF_PRE_FAILS='0'
FIXPERMS='true'
FORCE='false'
LOG_DIR='/var/log/chronicle'
LOG_PREFIX='chronicle'
NAME='backup'
PID_FILE="${HOME}/chronicle/chronicle.pid"
RSYNC_OPTS="-qRrltH --perms --delete --delete-excluded"
SSH_KEYFILE="${HOME}/.ssh/id_rsa"
TIMESTAMP='date +%Y%m%d-%T'
# Set PID file for root user
[ $EUID = 0 ] && PID_FILE='/var/run/chronicle.pid'
# Print an error message and exit
ERROUT () {
TS="$(TS)"
echo "$TS $LOG_PREFIX (error): $1"
echo "$TS $LOG_PREFIX (error): Backup failed"
rm -f "$PID_FILE"
exit 1
}
# Usage output
USAGE () {
cat << EOF
USAGE chronicle.sh [ OPTIONS ]
OPTIONS
-f path configuration file (default: chronicle.conf)
-F force overwrite incomplete backup
-h display this help
EOF
}
# Timestamp
TS ()
{
if
echo $TIMESTAMP | grep tai64n &>/dev/null
then
echo "" | eval $TIMESTAMP
else
eval $TIMESTAMP
fi
}
# Logger function
# First positional parameter is message severity (notice|warn|error)
# The log message is passed as the remaining positional parameters
LOG () {
local TS="$(TS)"
msg_type="$1"
shift
msg="${@}"
echo "$TS chronicle ($msg_type): $msg"
}
# Logger function
# First positional parameter is message severity (notice|warn|error)
# The log message can be stdin or a HERE string
LOGPIPE () {
local TS="$(TS)"
msg_type="$1"
msg="$(cat -)"
echo "$TS chronicle ("$msg_type"): $msg"
}
# Process Options
while getopts ":d:f:Fmh" options; do
case $options in
d ) BACKUP_DEF_FILE="$OPTARG" ;;
f ) CONF_FILE="$OPTARG" ;;
F ) FORCE='true' ;;
m ) FIXPERMS='false' ;;
h ) USAGE; exit 0 ;;
* ) USAGE; exit 1 ;;
esac
done
# Ensure the configuration file exists
if
[ ! -f "$CONF_FILE" ]
then
ERROUT "Cannot find configuration file $CONF_FILE"
fi
# Read the config file
. "$CONF_FILE"
# Set the owner and mode for backup files
if [ $FIXPERMS = 'true' ]; then
#FIXVAR="--chown=${SSH_USER}:${SSH_USER} --chmod=D770,F660"
FIXVAR="--usermap=*:${SSH_USER} --groupmap=*:${SSH_USER} --chmod=D770,F660"
fi
# Set up logging
if [ "${LOG_DIR}x" = 'x' ]; then
ERROUT "(error): ${LOG_DIR} not specified"
fi
mkdir -p "$LOG_DIR"
LOGFILE="${LOG_DIR}/chronicle.log"
# Make all output go to the log file
exec >> "$LOGFILE" 2>&1
# Ensure the backup definitions file exists
if
[ ! -f "${BACKUP_DEF_FILE}" ]
then
ERROUT "Cannot find backup definitions file $BACKUP_DEF_FILE"
fi
# Check for essential variables
VARS='BACKUP_SERVER SSH_USER BACKUP_DIR BACKUP_QTY NAME TIMESTAMP'
for var in $VARS; do
if [ -z "${!var}" ]; then
ERROUT "${var} not specified"
fi
done
LOG notice "Backup started, keeping $BACKUP_QTY snapshots with name \"$NAME\""
# Export variables for use with external scripts
export SSH_USER RSYNC_USER BACKUP_SERVER BACKUP_DIR LOG_DIR NAME REAL_PATH
# Check for PID
if
[ -e "$PID_FILE" ]
then
LOG error "$PID_FILE exists"
LOG error 'Backup failed'
exit 1
fi
# Write PID
mkdir -p "$(dirname "$PID_FILE")"
touch "$PID_FILE"
# Add key to SSH agent
ssh-add "$SSH_KEYFILE" 2>&1 | LOGPIPE notice -
# enhance script readability
CONN="${SSH_USER}@${BACKUP_SERVER}"
# Make sure the SSH server is available
if
! ssh $CONN echo -n ''
then
ERROUT "$BACKUP_SERVER is unreachable"
fi
# Fail if ${NAME}.0.tmp is found on the backup server.
if
ssh ${CONN} [ -e "${BACKUP_DIR}/${NAME}.0.tmp" ] && [ "$FORCE" = 'false' ]
then
ERROUT "${NAME}.0.tmp exists, ensure backup data is in order on the server"
fi
# Try to create the destination directory if it does not already exist
if
ssh $CONN [ ! -d $BACKUP_DIR ]
then
if
ssh $CONN mkdir -p "$BACKUP_DIR" && \
ssh $CONN chown ${SSH_USER}:${SSH_USER} "$BACKUP_DIR"
then :
else
ERROUT "Cannot create $BACKUP_DIR"
fi
fi
# Create metadata directory
ssh $CONN mkdir -p "$BACKUP_DIR/chronicle_metadata"
#-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
# PRE_COMMAND
if
[ -n "$PRE_COMMAND" ]
then
LOG notice "Running ${PRE_COMMAND}"
if
$PRE_COMMAND
then
LOG notice "${PRE_COMMAND} complete"
else
LOG error "Execution of ${PRE_COMMAND} was not successful"
if [ "$FAIL_IF_PRE_FAILS" -eq 1 ]; then
ERROUT 'Command specified by PRE_COMMAND failed and FAIL_IF_PRE_FAILS enabled'
fi
fi
fi
#-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
# Backup
# Make a hard link copy of backup.0 to rsync with
if [ $FORCE = 'false' ]; then
ssh $CONN "[ -d ${BACKUP_DIR}/${NAME}.0 ] && cp -al ${BACKUP_DIR}/${NAME}.0 ${BACKUP_DIR}/${NAME}.0.tmp"
fi
while read -u 9 l; do
# Skip commented lines
if [[ "$l" =~ ^#.* ]]; then
continue
fi
# Skip sources that are not an absolute path
if [[ $l != /* ]]; then
LOG warn "$l is not an absolute path"
continue
fi
# Reduce whitespace to one tab
line=$(echo "$l" | tr -s '[:space:]' '\t')
# get the source
SOURCE=$(echo "$line" | cut -f1)
# get the exclusions
EXCLUSIONS=$(echo "$line" | cut -f2-)
# Format exclusions for the rsync command
unset exclude_line
if [ ! "$EXCLUSIONS" = '' ]; then
for each in $EXCLUSIONS; do
exclude_line="$exclude_line--exclude $each "
done
fi
LOG notice "Using SSH transport for $SOURCE"
# get directory metadata
PERMS="$(getfacl -pR "$SOURCE")"
# Copy metadata
ssh $CONN mkdir -p ${BACKUP_DIR}/chronicle_metadata/${SOURCE}
echo "$PERMS" | ssh $CONN -T "cat > ${BACKUP_DIR}/chronicle_metadata/${SOURCE}/metadata"
LOG debug "rsync $RSYNC_OPTS $exclude_line "$FIXVAR" "$SOURCE" \
"${SSH_USER}"@"$BACKUP_SERVER":"${BACKUP_DIR}/${NAME}.0.tmp""
rsync $RSYNC_OPTS $exclude_line $FIXVAR "$SOURCE" \
"${SSH_USER}"@"$BACKUP_SERVER":"${BACKUP_DIR}/${NAME}.0.tmp"
done 9< "${BACKUP_DEF_FILE}"
#-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
# Try to see if the backup succeeded
if
ssh $CONN [ ! -d "${BACKUP_DIR}/${NAME}.0.tmp" ]
then
ERROUT "${BACKUP_DIR}/${NAME}.0.tmp not found, no new backup created"
fi
# Fail if the temp directory is empty (list it on the server, test locally)
if
[ -z "$(ssh $CONN ls -A ${BACKUP_DIR}/${NAME}.0.tmp 2>/dev/null)" ]
then
ERROUT "No new backup created"
fi
#-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
# Rotate
# Number of oldest backup
X=$(( BACKUP_QTY - 1 ))
LOG notice 'Rotating previous backups'
# keep oldest directory temporarily in case rotation fails
ssh $CONN [ -d "${BACKUP_DIR}/${NAME}.${X}" ] && \
ssh $CONN mv "${BACKUP_DIR}/${NAME}.${X}" "${BACKUP_DIR}/${NAME}.${X}.tmp"
# Rotate previous backups
until [ $X -eq -1 ]; do
Y=$X
X=$(( X - 1 ))
ssh $CONN [ -d "${BACKUP_DIR}/${NAME}.${X}" ] && \
ssh $CONN mv "${BACKUP_DIR}/${NAME}.${X}" "${BACKUP_DIR}/${NAME}.${Y}"
[ $X -eq 0 ] && break
done
# Create "backup.0" directory
ssh $CONN mkdir -p "${BACKUP_DIR}/${NAME}.0"
# Get individual items in "backup.0.tmp" directory into "backup.0"
# so that items removed from backup definitions rotate out
while read -u 9 l; do
# Skip commented lines
if [[ "$l" =~ ^#.* ]]; then
continue
fi
# Skip invalid sources that are not an absolute path
if [[ $l != /* ]]; then
continue
fi
# Reduce whitespace to one tab
line=$(echo "$l" | tr -s '[:space:]' '\t')
source=$(echo "$line" | cut -f1)
source_basedir="$(dirname $source)"
ssh $CONN mkdir -p "${BACKUP_DIR}/${NAME}.0/${source_basedir}"
LOG debug "ssh $CONN cp -al "${BACKUP_DIR}/${NAME}.0.tmp${source}" "${BACKUP_DIR}/${NAME}.0${source_basedir}""
ssh $CONN cp -al "${BACKUP_DIR}/${NAME}.0.tmp${source}" "${BACKUP_DIR}/${NAME}.0${source_basedir}"
done 9< "${BACKUP_DEF_FILE}"
# Remove oldest backup
X=$(( BACKUP_QTY - 1 )) # Number of oldest backup
ssh $CONN rm -Rf "${BACKUP_DIR}/${NAME}.${X}.tmp"
# Set time stamp on backup directory
ssh $CONN touch -m "${BACKUP_DIR}/${NAME}.0"
# Delete temp copy of backup
ssh $CONN rm -Rf "${BACKUP_DIR}/${NAME}.0.tmp"
#-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
# Post Command
if
[ ! "${POST_COMMAND}x" = 'x' ]
then
LOG notice "Running ${POST_COMMAND}"
if
$POST_COMMAND
then
LOG notice "${POST_COMMAND} complete"
else
LOG warn "${POST_COMMAND} completed with errors"
fi
fi
# Delete PID file
rm -f "$PID_FILE"
# Log success message
LOG notice 'Backup completed successfully'
I use Restic, called from cron, with a password file containing a long randomly generated key.
I back up with Restic to a repository on a different local hard drive (not part of my main RAID array), with --exclude-caches as well as excluding lots of files that can easily be re-generated / re-installed / re-downloaded (so my backups are focused on important data). I make sure to include all important data, including /etc (and I also back up the output of dpkg --get-selections as part of my backup). I auto-prune my repository to apply a policy on how far back I keep (de-duplicated) Restic snapshots.
Once the backup completes, my script runs du -s on the backup and emails me if it is unexpectedly too big (e.g. I forgot to exclude some new massive file), otherwise it uses rclone sync to sync the archive from the local disk to Backblaze B2.
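The glue script is nothing fancy; roughly this (the threshold, paths, email address, and remote name are placeholders):
#!/bin/bash
REPO=/mnt/backupdisk/restic-repo
MAX_MB=50000
size_mb=$(du -sm "$REPO" | cut -f1)
if [ "$size_mb" -gt "$MAX_MB" ]; then
    echo "Backup repo is ${size_mb} MB" | mail -s "backup size alert" admin@example.com
else
    rclone sync "$REPO" b2:my-backup-bucket/restic-repo
fi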
I back up my password for B2 (in an encrypted password database) separately, along with the Restic decryption key. The restore procedure is: if the local hard drive is intact, restore with Restic from the last good snapshot in the local repository. If it is also destroyed, rclone sync the archive from Backblaze B2 to local, and then restore from that with Restic.
For Postgres databases I do something different (they aren't included in my Restic backups, except for config files): I back them up with pgbackrest to Backblaze B2, with archive_mode on and an archive_command that archives WALs to Backblaze. This allows me to do PITR recovery (back to a point in accordance with my pgbackrest retention policy).
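The WAL-archiving hookup looks roughly like this (the stanza name is a placeholder):
# postgresql.conf:
#   archive_mode = on
#   archive_command = 'pgbackrest --stanza=main archive-push %p'
# Scheduled backups, e.g. from cron:
pgbackrest --stanza=main --type=full backup
pgbackrest --stanza=main --type=diff backup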
For Docker containers, I create them with docker-compose, and keep the docker-compose.yml so I can easily re-create them. I avoid keeping state in volumes, and instead use volume mounts to a location on the host, and back up the contents for important state (or use PostgreSQL for state instead where the service supports it).
Periodic backup to external drive via Deja Dup.
Plus, I keep all important docs in Google Drive.
All photos are in Google Photos.
So it's really only my music that isn't in the cloud. But I might try uploading it to Drive as well one day.
I use Pika Backup, which uses Borg under the hood. It's pretty good, with amazing documentation. The main issue I have with it is that it's really finicky and kind of a pain to set up, even if it "just works" after that.
Most of my data is backed up to (or just stored on) a VPS in the first instance, and then I backup the VPS to a local NAS daily using rsnapshot (the NAS is just a few old hard drives attached to a Raspberry Pi until I can get something more robust). Very occasionally I'll back the NAS up to a separate drive. I also occasionally backup my laptop directly to a separate hard drive.
Not a particularly robust solution, but it gives me some peace of mind. I would like to build a better NAS that can support RAID, as I was never able to get it working with the Pi.
I use duplicity to a drive mounted off a Pi for local, tarsnap for remote. Both are command-line tools; tarsnap charges for their servers based on exact usage. (And thanks for the reminder; I'm due for another review of exactly what parts of which drives I'm backing up.)
Either an external hard drive or a pendrive. Just put one of those on a keychain and voila, a perfect backup solution that does not need internet access.
I use Duplicacy to encrypt and backup my data to OneDrive on a schedule. If Proton ever creates a Linux client for Drive, then I'll switch to that, but I'm not holding my breath.
Anything important I keep in my Dropbox folder, so I have a copy on my desktop, laptop, and in the cloud.
When I turn off my desktop, I use restic to back up my Dropbox folder to a local external hard drive, and then restic runs again to back up to Wasabi, which is a storage service like Amazon's S3.
Same exact process when I turn off my laptop, except sometimes I don't have my laptop's external HD plugged in, so that gets skipped.
So that's three local copies, two local backups, and two remote backup storage locations. Not bad.
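The power-off pass is conceptually just two restic runs against different repositories (a sketch; the paths, bucket, and credential handling are placeholders — Wasabi is S3-compatible, so restic's s3 backend works against it):
#!/bin/bash
# Local copy first, then the remote one
restic -r /mnt/external-hd/restic-repo backup ~/Dropbox
# AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY must be set for the s3 backend
restic -r s3:https://s3.wasabisys.com/my-backup-bucket backup ~/Dropbox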
Changes I might make:
add another remote location
rotate local physical backup device somewhere (that seems like a lot of work)
move to Nextcloud or Seafile instead of Dropbox
I used Seafile for a long time, but I couldn't keep it up, so I switched to Dropbox.
Good ol' fashioned rsync once a day to a remote server with ZFS and daily ZFS snapshots (rsync.net). Very fast, because it only needs to send changed/new files, and it has saved my hide several times when I needed to access deleted files or old versions of files from the ZFS snapshots.
Get a Mac and use Time Machine. Go all in on the ecosystem: phone, watch, iPad, TV. I resisted for years, but it's so good, man, and Apple silicon is just leaps beyond everything else.