Emile.Belcourt
Dabbler
- Joined: Feb 16, 2021
- Messages: 23
Hello all,
I've been testing TrueNAS Scale on a non-critical auxiliary data replication node and have noticed something odd that has me a bit concerned.
I use Zabbix to monitor disk activity on our primary systems and have noticed a change since moving to Scale on one of our nodes, which is a replication source; the target is the one performing a "pull" replication via SSH. Before Scale, the overnight replication would finish in a couple of hours, but now it takes so long that it runs into the next day's daily replication.
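For anyone reading along, my understanding is that a "pull" task like this is roughly equivalent to the target running something like the following over SSH (purely illustrative; the user, hostname, pool, dataset and snapshot names are placeholders, not my actual configuration):
# Run from the target: ask the source for an incremental stream and receive it into the local pool
ssh replication@source-node "zfs send -i tank/data@auto-2021-02-15 tank/data@auto-2021-02-16" | zfs recv -F backup/data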
Here are the metrics:
(it says TB/s but it's actually GB/s; something is up with Scale's reporting)
I can't see anything obvious, and I have the same replication configuration as before. Here are the details and notes:
- The two nodes are attached directly with a 1GbE link with MTU 9000
- Source node has 12x 4TB SAS3 drives in a RAID 10 configuration, E5-2620v3, 128GB DDR4 ECC, NVMe read cache + Optane SLOG. The zpool is encrypted
- Target node has 4x 8TB SATA drives in RAIDZ1
- Replication configuration example for one of five replication tasks (one per dataset, each with its own retention lifetime) on the auxiliary server:
As far as I can tell from my notes, all settings are the same. However, I initially had to enable "synchronise destination snapshots with source" to get replication to a nested dataset working, but whether that option is on or off, replication still takes an extended length of time.
The total size of the data on the source server is about 8TB, and it looks like each replication cycle is literally copying every byte over: 8TB over a 1GbE connection takes roughly 17-19 hours, and there are dips lasting a couple of hours which appear to be when it's "done".
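One check I plan to run, in case it helps diagnose this (dataset and snapshot names below are placeholders for my actual ones), is a dry-run send between the last two snapshots to see how much data an incremental pass should actually move:
# Dry run (-n) with verbose output (-v): prints the estimated stream size without sending anything
zfs send -n -v -i tank/data@auto-2021-02-15 tank/data@auto-2021-02-16
If the estimated incremental size comes back small but the task still runs for ~17 hours, that would suggest it isn't sending only the changed blocks.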
My question is whether something has changed with the settings that means it's now doing a full replication of every byte rather than an incremental (changed-block) replication?
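One theory I want to rule out (commands are illustrative; the pool/dataset names are placeholders) is that the source and target no longer share a common snapshot. My understanding is that ZFS can only send an incremental stream when both sides hold the same base snapshot, so if retention has pruned it on either side, the task would fall back to a full send:
# On the source:
zfs list -t snapshot -o name,creation tank/data
# On the target:
zfs list -t snapshot -o name,creation backup/data
# The most recent snapshot name present in both lists would be the incremental base.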
I hope I'm articulating the issue correctly, but I'm worried I'm slamming the disks unnecessarily now!
Many thanks,
Emile