Huge file copy fails every time

mbit

Cadet
Joined
Aug 12, 2021
Messages
3
Hi,
We are running TrueNAS Core 12.0-U5 with eleven 8TB WD Red drives in a RAIDZ2 configuration.
Intel(R) Xeon(R) CPU E3-1220 V2 @ 3.10GHz
32GiB ECC RAM
No matter what settings we try (compression on/off, sector size, atime on/off), we cannot copy a 20TB vhdx file to the SMB shared folder. The copy process always fails at 85.7%. (Dedup is always off.)
We can copy smaller vhdx files to the same share, and we can copy the huge vhdx file from a Windows server to Windows storage.
We also tried writing a huge file locally using dd and stopped it after about 24TB had been written.
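
For anyone who wants to repeat the local write test in a form that reports where it stopped, here is a minimal sketch. It is not the dd command we ran; the function name `write_test_file` and the example path are made up for illustration. It writes zeros in fixed-size chunks and returns how many bytes were handed to the file before any error. Note that zeros compress to almost nothing when compression is enabled on the dataset, so the test is more meaningful with compression off.

```python
import os


def write_test_file(path, total_bytes, chunk_size=1 << 20):
    """Write total_bytes of zeros to path in chunk_size pieces.

    Returns the number of bytes passed to write() before an error (if any).
    Because Python buffers writes, the count at failure is approximate.
    """
    chunk = bytes(chunk_size)  # a buffer of zero bytes
    written = 0
    try:
        with open(path, "wb") as f:
            while written < total_bytes:
                n = min(chunk_size, total_bytes - written)
                f.write(chunk[:n])
                written += n
    except OSError as exc:
        print(f"write failed after about {written} bytes: {exc}")
    return written


if __name__ == "__main__":
    # Hypothetical path; a small size so the demo finishes quickly.
    demo_path = os.path.join(os.getcwd(), "write_test.bin")
    print(write_test_file(demo_path, 8 * (1 << 20)))
    os.remove(demo_path)
```

To probe a suspected size limit, the total would be set above the point where the SMB copy fails, and the returned byte count compared with the size the file actually has on disk.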

Syslog shows no entry for the time the copy fails, and robocopy with debug logging reports no error while copying.
Currently we are reverting a test system to FreeNAS to check whether the error happens there too.
Any idea how to proceed from here?
Thx
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Would you be so kind as to provide us with details about your hardware configuration? Stalls are a classic symptom of problems with your I/O subsystem, such as not using an IT-mode HBA, or using SMR HDDs, etc., but since you've failed to provide any information about your mainboard, your I/O controllers (both disk and network), the type of hard drives in use, etc., it's a bit hard to look for common problems.

It's a little odd for it to stop so exactly though.

Also, vhdx files are virtual disk files, and RAIDZ2 is inappropriate for most virtual disk applications. Please refer to the following article:

https://www.truenas.com/community/threads/the-path-to-success-for-block-storage.81165/

If you are merely archiving vhdx files, then that's fine.
 

AlexGG

Contributor
Joined
Dec 13, 2018
Messages
171
I wonder if 85.7% of 20TB happens to be exactly 16 TiB. While it seems to be a simple calculation, it depends much on the exact definition of TB.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700
using SMR HDDs,
OP mentions 8TB WD REDs... no REDs of that size have ever been SMR, so can be assumed CMR.

It all sounds to me like a timeout or maximum bytes per session of some kind... maybe a SAMBA setting/limitation somewhere. @anodos ?
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,553
OP mentions 8TB WD REDs... no REDs of that size have ever been SMR, so can be assumed CMR.

It all sounds to me like a timeout or maximum bytes per session of some kind... maybe a SAMBA setting/limitation somewhere. @anodos ?
There's no such limit on the Samba side. When you boil things down to the nuts and bolts, the SMB session opens the file and then performs aio_write() to the fd. In overload situations (failing with EAGAIN), we fall back to synchronous pwrite(). The user can try disabling AIO in Samba (aio write size = 0) and see if the situation improves. This will force the synchronous write path.

87.5% may mean something special in Windows GUI land. It doesn't necessarily mean that 7/8ths of the file has been copied. Maybe the last 12.5% (1/8) is there to indicate some additional checking. You might want to double-check other metrics.
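
To make the write path above concrete, here is a rough sketch of the "try async, fall back to synchronous pwrite() on EAGAIN" pattern. This is not Samba's code: Python has no POSIX aio_write(), so `submit_async_write` is a hypothetical stand-in that simply refuses with EAGAIN when told the queue is full, and `write_with_fallback` is an illustrative name. Only `os.pwrite()` is the real system call.

```python
import errno
import os


def submit_async_write(fd, data, offset, queue_full):
    """Hypothetical stand-in for aio_write(): fails with EAGAIN when overloaded."""
    if queue_full:
        raise BlockingIOError(errno.EAGAIN, "async queue full")
    # A real implementation would queue the request; the sketch writes directly.
    return os.pwrite(fd, data, offset)


def write_with_fallback(fd, data, offset, queue_full=False):
    """Try the async path first; on EAGAIN fall back to a synchronous pwrite()."""
    try:
        return "async", submit_async_write(fd, data, offset, queue_full)
    except OSError as exc:
        if exc.errno != errno.EAGAIN:
            raise  # any other error is a real failure, not overload
        return "sync", os.pwrite(fd, data, offset)
```

Setting aio write size = 0 corresponds to always taking the second branch, which is why it is a cheap way to rule the async path in or out.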
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
I wonder if 85.7% of 20TB happens to be exactly 16 TiB. While it seems to be a simple calculation, it depends much on the exact definition of TB.
20 TB * 0.909 TiB/TB = 18.18 TiB
So 85.7% of 20 TB is 18.18 TiB * 0.857 = 15.58 TiB, which is short of 16 TiB.
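
The same arithmetic without the rounded 0.909 factor, as a small sketch. It assumes the "20TB" in the original post means decimal terabytes (10^12 bytes); the names `tb_to_tib` and `tib_to_tb` are just for illustration.

```python
TB = 10**12   # decimal terabyte, in bytes
TIB = 2**40   # binary tebibyte, in bytes


def tb_to_tib(tb):
    """Convert decimal terabytes to tebibytes."""
    return tb * TB / TIB


def tib_to_tb(tib):
    """Convert tebibytes to decimal terabytes."""
    return tib * TIB / TB


file_tib = tb_to_tib(20)          # about 18.19 TiB
stopped_at = file_tib * 0.857     # about 15.59 TiB, below 16 TiB

# File size at which 85.7% would land exactly on 16 TiB
needed_tib = 16 / 0.857           # about 18.67 TiB
needed_tb = tib_to_tb(needed_tib)  # about 20.53 TB

print(f"{file_tib:.2f} TiB total, stopped at {stopped_at:.2f} TiB")
print(f"16 TiB would be 85.7% of a {needed_tb:.2f} TB file")
```

So the 16 TiB theory only works if the file is really about 20.5 TB rather than an even 20 TB, which is worth checking against the exact byte count of the vhdx.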
 

mbit

Cadet
Joined
Aug 12, 2021
Messages
3
Thanks for your replies.
The systems are only used for backing up/archiving the vhdx files.

The error occurs on two different systems:
1) HBA Broadcom BRC SAS 9305-16i (due to availability)
Intel(R) Xeon(R) CPU E3-1220 V2 @ 3.10GHz
32GB ECC RAM
Supermicro X9SCL/X9SCM
Intel 82599ES 10GBit SFP+ with DAC
using WD80EFAX

2) HP Microserver Gen8 using the onboard controller
Intel(R) Xeon(R) CPU E3-1220L V2 @ 2.30GHz
16GB RAM
4x WD80EFZX
Intel X540-AT2

We removed the pool on system 2 and did a fresh install of FreeNAS 11.3 there, only to see the copy job fail at 91.1%, but this time with an "ix down" log entry. We are trying to figure out what happened.
Those entries do not show up in any TrueNAS logs.
 

mbit

Cadet
Joined
Aug 12, 2021
Messages
3
After reverting to FreeNAS, we stumbled across a random network error. When we started a second copy job containing nothing other than the huge vhdx file, it worked.

After that, we downgraded our main NAS (system 1) and stumbled across a nice Samba bug:
We resolved it as described in the other post: to fix the issue, add protocol = SMB2_02 to Services -> SMB in TrueNAS.
Currently this is working. Yay.

Our third system is still on TrueNAS 12, but it is an HP Microserver Gen10.
On this system we also applied the modification to the SMB service (who knows) and are currently testing the huge copy. We are waiting for the copy job to finish, which will take another 9 to 12 hours.
 