System crash on SMB file copy

Status
Not open for further replies.

hardlivinlow

Dabbler
Joined
May 31, 2017
Messages
30
Hello all,

New to the forum.

I have a fresh FreeNAS box. I upgraded from my Supermicro to a Dell R710 with a PERC H310 HBA. The system crashes when I copy to an SMB share: the copy runs at a full 115MB/s, drops to 0, and then the server crashes. I have tried copying to the RAIDZ1 pool and to the 500GB stripe, which is a different brand of drive. Same result. I have also tried different files and different client machines.

I have a feeling it is a driver or firmware issue with the PERC H310. Where can I pull logs for the system crash, and/or see what FreeNAS sees as my HBA? Any ideas what may be causing this?

I flashed the HBA to the LSI IT-mode firmware following the steps in the guide below, and everything was successful to my knowledge.
https://techmattr.wordpress.com/201...-flashing-to-it-mode-dell-perc-h200-and-h310/

The HBA is installed in an x8 PCIe slot, not the Dell storage card slot.

Networking is a LAGG of 3 gigabit NICs to a Cisco 4948. It worked like a champ on my Supermicro.

Hardware specs:
Dell R710
12GB RAM (16GB more on order)
2x Xeon X5560
5x Samsung R54/32MB C - SATA II - 2TB HDD in RAIDZ1
1x WD 500GB Enterprise drive in a stripe for testing
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Remove the lagg and run the test again?

Sent from my Nexus 5X using Tapatalk
 

hardlivinlow

Dabbler
Joined
May 31, 2017
Messages
30
Removed the LAGG, used a single onboard NIC, and tried a different port on the switch. The issue remains. These Dell servers have Broadcom NICs onboard. Are there any known issues with FreeNAS and this brand of NIC?

nas.jpg
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
What do you mean by crash? Like it reboots?

 

hardlivinlow

Dabbler
Joined
May 31, 2017
Messages
30
Yes. The file transfer hangs, and when I went to look at the monitor output on the server it was scrolling text so fast you can't read it. Then it just reboots and comes back up. Looks like some sort of dump.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
How are your system temps? How big is your power supply? What happens if you just dd a file locally on the NAS? Does it crash? Seems like hardware failure to me.

 

hardlivinlow

Dabbler
Joined
May 31, 2017
Messages
30
Temps are good. I'm using one 530-watt PSU. The drives were moved over from the Supermicro.

I just wanted to see if there was anything obviously wrong with the setup. I'll run diagnostics and reinstall FreeNAS to see if that changes anything. I may also try putting in an Intel 4-port NIC if none of the above works. I'll report back with results later on.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Intel is always better. The Broadcom could cause problems.

 

hardlivinlow

Dabbler
Joined
May 31, 2017
Messages
30
Added more RAM: 28GB now. Ran diagnostics, including a memory test; everything passed without a hitch. Installed a 4-port Intel gigabit card and disabled all the onboard NICs. Downloaded a fresh copy of the ISO, made a new USB installer, and did a fresh install of FreeNAS-9.10.2-U4 to a new 16GB flash drive. Tried a single connection, then set the LAGG back up. Tried a different port on the switch. Still, when I copy any big file to the server, it crashes, dumps, and reboots. I'm unable to do a share to share copy because I can't get any files on the shares. So I can't test that. All system firmware and the BIOS are up to date. I'm running out of options. Anyone have any more ideas?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
I'm unable to do a share to share copy because I can't get any files on the shares.
dd if=/dev/random of=/poolname/some/path/filename or something similar. Add a size argument (bs= and count=) or CTRL-C it once a decent size is reached.

Anything conspicuous in /var/log?
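
A bounded version of the dd test above might look like the sketch below. The function name local_write_test and its arguments are made up for illustration, and the target path is a placeholder for wherever the pool is actually mounted. Random data is used so that dataset compression, if enabled, cannot shrink the write to almost nothing.

```shell
#!/bin/sh
# Sketch: write a bounded amount of random data straight to a dataset,
# taking SMB and the network out of the picture.
# local_write_test is a hypothetical helper name, not a FreeNAS tool.
local_write_test() {
    target="$1"        # file to create, somewhere under the pool's mountpoint
    mib="${2:-1024}"   # number of 1 MiB blocks to write (default: 1 GiB)
    # bs is given in bytes so the command does not depend on suffix handling.
    dd if=/dev/urandom of="$target" bs=1048576 count="$mib"
}

# Example with the placeholder path from the post:
#   local_write_test /poolname/some/path/filename 4096
```

If the box panics during a local write like this, the network stack and Samba are unlikely to be the whole story; if it survives, they become more interesting.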
 

hardlivinlow

Dabbler
Joined
May 31, 2017
Messages
30
Here is the logfile around the crash.

Code:
May 31 20:18:50 freenas notifier: 33+0 records in
May 31 20:18:50 freenas notifier: 32+1 records out																				
May 31 20:18:50 freenas notifier: 33644544 bytes transferred in 0.292106 secs (115179264 bytes/sec)								
May 31 20:18:54 freenas GEOM: da3: the primary GPT table is corrupt or invalid.													
May 31 20:18:54 freenas GEOM: da3: using the secondary instead -- recovery strongly advised.										
May 31 20:18:54 freenas notifier: 32+0 records in																				  
May 31 20:18:54 freenas notifier: 32+0 records out																				
May 31 20:18:54 freenas notifier: 33554432 bytes transferred in 1.487693 secs (22554674 bytes/sec)								
May 31 20:18:55 freenas notifier: dd: /dev/da3: short write on character device													
May 31 20:18:55 freenas notifier: dd: /dev/da3: end of device																	  
May 31 20:18:55 freenas notifier: 33+0 records in																				  
May 31 20:18:55 freenas notifier: 32+1 records out																				
May 31 20:18:55 freenas notifier: 33644544 bytes transferred in 0.252268 secs (133368219 bytes/sec)								
May 31 20:18:59 freenas GEOM: da4: the primary GPT table is corrupt or invalid.													
May 31 20:18:59 freenas GEOM: da4: using the secondary instead -- recovery strongly advised.										
May 31 20:18:59 freenas notifier: 32+0 records in																				  
May 31 20:18:59 freenas notifier: 32+0 records out																				
May 31 20:18:59 freenas notifier: 33554432 bytes transferred in 1.497406 secs (22408373 bytes/sec)								
May 31 20:18:59 freenas notifier: dd: /dev/da4: short write on character device													
May 31 20:18:59 freenas notifier: dd: /dev/da4: end of device																	  
May 31 20:18:59 freenas notifier: 33+0 records in																				  
May 31 20:18:59 freenas notifier: 32+1 records out																				
May 31 20:18:59 freenas notifier: 33644544 bytes transferred in 0.288836 secs (116483208 bytes/sec)								
May 31 20:19:03 freenas GEOM: da5: the primary GPT table is corrupt or invalid.													
May 31 20:19:03 freenas GEOM: da5: using the secondary instead -- recovery strongly advised.										
May 31 20:19:03 freenas notifier: 32+0 records in																				  
May 31 20:19:03 freenas notifier: 32+0 records out																				
May 31 20:19:03 freenas notifier: 33554432 bytes transferred in 1.522776 secs (22035043 bytes/sec)								
May 31 20:19:03 freenas notifier: dd: /dev/da5: short write on character device													
May 31 20:19:03 freenas notifier: dd: /dev/da5: end of device																	  
May 31 20:19:03 freenas notifier: 33+0 records in																				  
May 31 20:19:03 freenas notifier: 32+1 records out																				
May 31 20:19:03 freenas notifier: 33644544 bytes transferred in 0.287583 secs (116990680 bytes/sec)								
May 31 20:19:06 freenas ZFS: vdev state changed, pool_guid=6555470622872566251 vdev_guid=5976954755807732738						
May 31 20:19:06 freenas ZFS: vdev state changed, pool_guid=6555470622872566251 vdev_guid=7240164681082129914						
May 31 20:19:06 freenas ZFS: vdev state changed, pool_guid=6555470622872566251 vdev_guid=16484351082831648765					  
May 31 20:19:06 freenas ZFS: vdev state changed, pool_guid=6555470622872566251 vdev_guid=4184686427352321786						
May 31 20:19:06 freenas ZFS: vdev state changed, pool_guid=6555470622872566251 vdev_guid=1028878345902712155						
May 31 20:19:11 freenas savecore: reboot after panic: sbdrop																		
May 31 20:19:11 freenas savecore: writing compressed core to /data/crash/textdump.tar.0.gz										
May 31 20:19:11 freenas notifier: savecore: reboot after panic: sbdrop															
May 31 20:19:11 freenas notifier: savecore: writing compressed core to /data/crash/textdump.tar.0.gz								
May 31 20:19:11 freenas notifier: /data/crash/vmcore.0 not found



I see the HBA in the logs as
May 31 20:30:55 freenas mps0: <Avago Technologies (LSI) SAS2008> port 0xdc00-0xdcff mem 0xdfcb0000-0xdfcbffff,0xdfcc0000-0xdfcfffff
May 31 20:30:55 freenas mps0: Firmware: 20.00.07.00, Driver: 21.01.00.00-fbsd
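
For anyone wanting to pull only the crash-related entries out of a log like this, a rough sketch follows. The helper names crash_lines and unpack_textdump are invented for this example, and /tmp/textdump is an arbitrary scratch directory. The archive path is the one savecore reports in the log above. A FreeBSD textdump archive normally holds small text files such as panic.txt and msgbuf.txt, but check what is actually in yours.

```shell
#!/bin/sh
# Sketch: show only the savecore, GEOM and ZFS messages from a syslog file.
# crash_lines is a hypothetical helper name.
crash_lines() {
    logfile="${1:-/var/log/messages}"
    grep -E 'savecore: |GEOM: |ZFS: ' "$logfile"
}

# Sketch: unpack a gzip-compressed textdump archive into a scratch directory
# and list the files it contained. unpack_textdump is a hypothetical helper name.
unpack_textdump() {
    archive="$1"                  # e.g. /data/crash/textdump.tar.0.gz
    dest="${2:-/tmp/textdump}"    # arbitrary scratch location
    mkdir -p "$dest" && tar -xzf "$archive" -C "$dest" && ls "$dest"
}

# Example:
#   crash_lines /var/log/messages
#   unpack_textdump /data/crash/textdump.tar.0.gz
```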
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
Looks like you have disk partition problems that need to be addressed. Review these two threads:

https://forums.freenas.org/index.php?threads/gpt-rejected-may-not-be-recoverable.43813/
https://forums.freenas.org/index.php?threads/unable-to-format-disk.46484/

The cleanest way to proceed would be to destroy your RAIDZ1 pool; then destroy all partitions on the disks; then re-create the pool. Starting over with a 'Clean Slate', so to speak.

You may be able to clean up the disks with gpart destroy -F /dev/[disk ID]; by zeroing the first and last 100GB or so of space with dd; or by running badblocks. All of these approaches are described in the two threads above.
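
As a rough sketch of the gpart and dd approach, the helper below only prints the commands rather than running them, since they destroy data. The name wipe_plan and its three arguments are made up for this example. The media size in bytes has to be supplied by hand (on FreeBSD it should be the third field of diskinfo output for the disk), and how much to zero at each end is left to the caller.

```shell
#!/bin/sh
# Sketch: print (not run) the commands that would clear partition metadata
# at both ends of a disk. wipe_plan is a hypothetical helper; nothing is
# written unless the printed lines are executed by hand.
wipe_plan() {
    dev="$1"      # device node, e.g. /dev/da3
    bytes="$2"    # media size of the device in bytes
    mib="$3"      # how many MiB to zero at each end
    total_mib=$((bytes / 1048576))
    # Block offset where the tail region starts. Any partial MiB at the very
    # end is covered because the last dd runs until the device ends.
    tail_start=$((total_mib - mib))
    echo "gpart destroy -F $dev"
    echo "dd if=/dev/zero of=$dev bs=1048576 count=$mib"
    echo "dd if=/dev/zero of=$dev bs=1048576 seek=$tail_start"
}

# Example (prints three command lines for review):
#   wipe_plan /dev/da3 2000398934016 32
```

The last dd has no count, so it stops with an "end of device" message once it reaches the end of the disk. Zeroing the end matters because GPT keeps a backup table there, which is what GEOM falls back to when the primary is damaged.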

Good luck!
 

hardlivinlow

Dabbler
Joined
May 31, 2017
Messages
30
So I destroyed the pool and wiped all the disks via the GUI. Ran
Code:
dd if=/dev/zero of=/dev/da1 bs=512 count=1
on all disks, rebooted, and set the pool up again. Tried writing to the new pool. Boom... crash again. Checked /var/log/messages; nothing odd listed like before. Opened a brand new WD enterprise HDD, stuck it in bay 6, created a stripe, and added a dataset and then a share. Tried copying to it. Same result: crash. o_O
 

hardlivinlow

Dabbler
Joined
May 31, 2017
Messages
30
I just ordered another Perc H310 this time with a different part number. Hopefully a different revision. I'm gonna try that and see if it's the HBA causing this mess.
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
Drat! Well... it was worth a shot and at least we've gone a long way towards eliminating partition problems as the root cause; thanks for going to the trouble.

It really is behaving like an intermittent hardware problem -- our favorite kind! :oops:

What kind of diagnostics did you run? MemTest for RAM? Mersenne prime or Passmark for the CPU?

Did you burn in the hard drives? I wrote a disk burn-in script you can access via this thread in Resources: "Github repository for FreeNAS scripts, including disk burnin". Warning: it takes a long, long time to burnin hard drives.

This may be a bug, but with a new system and all it's really hard to say; especially since there aren't any similar bug reports. We need to try to find and eliminate any hardware glitches.
 

hardlivinlow

Dabbler
Joined
May 31, 2017
Messages
30
I ran the diag that's embedded in the Lifecycle Controller. Ran a quick memory test and then an express diag on the rest of the hardware. I may let an extended test run overnight and see what happens, or try other software to test.

I never burned these disks in. They have been in my old 2U Supermicro for a good 3 years that I know of. They all worked flawlessly before. The only reason for my upgrade is that those Supermicros are loud and use a ton of power.

I have a gut feeling it's got something to do with that PERC. It only happens on writes to disk. The system runs flawlessly with everything else. When you start flashing firmware onto hardware it wasn't designed for, you're asking for headaches. The guide I used flashes the 20.00.07.00 firmware. I remember reading in some comments on a post somewhere in the lost internet that there are different versions of the P20 firmware and that some of those versions may cause issues. This is the 07 version. So in the meantime I'm going to research more on this ordeal.
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
Yeah, it may very well be the HBA...

FWIW, version P20.00.07.00 is reportedly the 'Golden' version -- I run it on my IBM M1015, all 3 of my LSI 9210s, and all 4 of my Dell H200 HBAs with no problems whatsoever.
 

hardlivinlow

Dabbler
Joined
May 31, 2017
Messages
30
Going back over my notes, I found something from when I flashed my card and re-flashed my SAS address. The guide lists the command as

Code:
  • s2fp19.exe -o -sasadd 500xxxxxxxxxxxxx (replace this address with the one you wrote down in the first steps).


I didn't add the 500 to the address. I just entered the SAS address I wrote down, without the 500 in front of it. Is that required? If so, that might be why this is happening.
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
Well... that might be the problem. I dunno. But it certainly wouldn't hurt to go ahead and assign the card a valid SAS Address.

You can check the SAS Address of your card(s) from a FreeNAS shell with the sas2flash program. If you only have one card: sas2flash -list, if you have multiple cards: sas2flash -listall.
 

hardlivinlow

Dabbler
Joined
May 31, 2017
Messages
30
Code:
Adapter Selected is a LSI SAS: SAS2008(B2)

        Controller Number              : 0
        Controller                     : SAS2008(B2)
        PCI Address                    : 00:0a:00:00
        SAS Address                    : 5d4ae52-0-b191-xxxx
        NVDATA Version (Default)       : 14.01.00.08
        NVDATA Version (Persistent)    : 14.01.00.08
        Firmware Product ID            : 0x2213 (IT)
        Firmware Version               : 20.00.07.00
        NVDATA Vendor                  : LSI
        NVDATA Product ID              : SAS9211-8i
        BIOS Version                   : N/A
        UEFI BSD Version               : N/A
        FCODE Version                  : N/A
        Board Name                     : SAS9211-8i
        Board Assembly                 : N/A
        Board Tracer Number            : N/A

        Finished Processing Commands Successfully.
        Exiting SAS2Flash.


This is the output. I don't know whether that looks normal for the SAS address string. I took out the last 4 digits. I looked at more guides and they have the 500 before the 16 digit address. It could be some kind of identifier.
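
One way to sanity-check the string is sketched below. It rests on an assumption that is not from the guide: a SAS address is a 64-bit value, so 16 hex digits once the dashes are stripped, and registered-format addresses begin with the digit 5. The helper name check_sas_address is made up, and the address in the example has hypothetical zeros in place of the four digits I removed.

```shell
#!/bin/sh
# Sketch: normalise a SAS address as printed by sas2flash (dashes included)
# and check its general shape. check_sas_address is a hypothetical helper.
# Assumption: 16 hex digits in total, with a leading 5.
check_sas_address() {
    # Strip spaces and dashes, then fold to lower case.
    addr=$(printf '%s' "$1" | tr -d ' -' | tr 'A-F' 'a-f')
    case "$addr" in
        *[!0-9a-f]*) echo "not hex: $addr"; return 1 ;;
    esac
    if [ "${#addr}" -ne 16 ]; then
        echo "expected 16 hex digits, got ${#addr}: $addr"
        return 1
    fi
    case "$addr" in
        5*) echo "ok: $addr" ;;
        *)  echo "unexpected leading digit: $addr"; return 1 ;;
    esac
}

# Example with made-up final digits:
#   check_sas_address 5d4ae52-0-b191-0000
```

Counted without the dashes, the string in the output above has 16 positions and starts with 5, so under that assumption it at least has the expected shape.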
 