Coming Back to Rsync

denaba

Explorer
Joined
Jan 12, 2014
Messages
59
Without going into the details: I once used Rsync and it was great, but then it broke. Since then I have used third-party backup software to copy files from the main box to a backup box. That has worked, but now that TrueNAS is under one roof I want to use Rsync again.

Question: since I already have a backup of everything (well, except maybe one or two files), will Rsync have enough "logic" to say, hey, all these files are already copied, so only copy the two missing ones? Or will Rsync create duplicate files, since it has no history of ever copying the files from the main box to the backup box?

I will need some commands, as I do not have my notes from back in the FreeNAS days, but I will post that in a different thread. For now I am interested to know what Rsync will do, given that there is no history of any file copying from one box to the other.

Thank you
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Whether RSync will copy over files that already exist on the destination depends on a combination of details. There is no "catalog" of files. RSync uses what it finds on the source path and destination path to determine if a file is to be copied.

For example, if RSync is doing time & date based comparisons (the default, which also compares file size), then if a file on the backup box has a different date than the source, the file is checked block by block between the source and backup box, and only the changes are copied over. The file's time & date stamp is then set to match the source, so the next time RSync won't have to check the file block by block.

RSync can also be told to do a block by block check anyway, regardless of time & date. In your case, this might be a good one-time refresh of the backup box. But it can take a long time, even though only the checksums are sent over the network, because both sides have to checksum the blocks of each file, which basically means reading every byte of every file.

As for command line options, I normally use this:
rsync -aAHSXxv --delete SOURCE DESTINATION
The actual SOURCE and DESTINATION syntax can vary depending on how the network connection is made. Examples:

RSync run from source machine to remote via SSH:
rsync -aAHSXxv --delete /mnt/POOL/DATASET/ backup_box:/DESTINATION_PATH/

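RSync run from source machine to remote via RSync protocol (here the first path component after the host name is the module name defined on the remote RSync daemon):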
rsync -aAHSXxv --delete /mnt/POOL/DATASET/ rsync://backup_box/DESTINATION_PATH/

Some RSync options I found did not work with FreeBSD / TrueNAS Core, but I don't remember which ones they were. RSync options are plentiful and too complex to go into in more detail here.
 
Joined
Oct 22, 2019
Messages
3,641
I'd also add the --inplace option if the destination being written to is ZFS.

This works well with block-based CoW filesystems, minimizing the space used up by snapshots. (Otherwise, any file that has been modified will be written as a brand new file, with completely unrelated records, and the old version of the file will be held in its entirety by the snapshot. With "--inplace", only the old versions of the modified sections are held by the snapshot.)
 

denaba

Explorer
Joined
Jan 12, 2014
Messages
59
Thank you for the details. Another question. My structure is:
/mnt/tank/media/ (4 folders)

Q1 - Would it be better to run Rsync on the media folder, or on each individual folder within the media folder? By the latter I mean separate schedules.

Q2 - If using the individual folder method, I would then set the paths like this (using one of the folders within media as an example):
Box 1 (push) /mnt/tank/media/SD
Box 2 (pull) /mnt/tank/backup/SD

The SD folders contain files

Q3 - I have schedules for the SMART tests and also for the boot and pool scrubs. I assume I should make sure Rsync does not run at those times?
 
Joined
Oct 22, 2019
Messages
3,641
"Would it be better to run Rsync on the media folder, or on each individual folder within the media folder? By the latter I mean separate schedules."
Personal preference. It is simpler to have a single sync task that takes care of "Media" and everything within it.


"I have schedules for the SMART tests and also for the boot and pool scrubs. I assume I should make sure Rsync does not run at those times?"
Correct. It's best to keep heavy I/O activity away from the times when your drives run their internal SMART tests. (The same goes for scrubs: schedule them during low-activity times.)
 

denaba

Explorer
Joined
Jan 12, 2014
Messages
59
Thank you Arwen and winnielinnie for your help. One more question.

I was setting up the push box, and under the task settings there was a field I was unfamiliar with from previous usage, called "Remote Module Name". If I remember correctly, just entering the IP of my box 2 used to be enough, so I am not sure what to enter here.
 
Joined
Oct 22, 2019
Messages
3,641
It depends on whether you're using Rsync over SSH or Rsync via the "daemon" on the other end. In the daemon case, the other end will have a module with a specific name, and that name is what goes in this field.
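On the command line, the two modes look roughly like this. It is a sketch only: the host name, module name and paths are hypothetical placeholders, and the script just prints the commands rather than running them, since they need a reachable second box.

```shell
#!/bin/sh
# All names below are hypothetical placeholders.
HOST="backup_box"
MODULE="backup"          # a module name that would be defined on the remote daemon
SRC="/mnt/tank/media/"

# SSH mode: the destination is a filesystem path on the remote box; no module is involved.
SSH_CMD="rsync -a ${SRC} ${HOST}:/mnt/tank/backup/"

# Daemon (module) mode: the destination is host::module, or rsync://host/module/.
MODULE_CMD="rsync -a ${SRC} ${HOST}::${MODULE}/"

echo "$SSH_CMD"
echo "$MODULE_CMD"
```

In module mode the remote path is decided by the module's definition on the daemon side, which is why the task asks for a module name instead of a path.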
 