To pull an SSD or not to pull an SSD

Mercutio

Fatwah on Western Digital
Joined
Jan 17, 2002
Messages
21,593
Location
I am omnipresent
I have a Hyper-V host with four Samsung 850 Pro drives in it, each home to a guest OS install and nothing else. There's also an Intel SSD with the host OS install, but the host is perfectly well behaved and in fact I think it's only ever been restarted for Windows Updates maybe five times, total.

Two of the guests have low-use databases on them. Both of those machines are acting "weird" (full disk backups fail, the databases detach, etc.) on an irregular basis, and the errors in question suggest to me a disk-based cause.

Samsung doesn't have an official diagnostic app for these drives, but they've been in place for over a year and according to wear leveling, the misbehaving VMs are both on drives at 99% health, having each written less than 1 TB of data. BUT Samsung Magician (on the Hyper-V host) says three of the four Samsung drives in the system, in spite of having "Good" status by SMART data, have extremely high, failed Unrecoverable Read Error rates.

Checks with CrystalDiskInfo and WinDFT do not corroborate this and show all drives and SMART data (both by raw and interpreted values) as good.

I can't find anything wrong with any of these guys at a software level and it really bothers me that Samsung's kinda-sorta vendor tool is showing problems when nothing else is.

Given that, does it make sense to replace those drives?
 

ddrueding

Fixture
Joined
Feb 4, 2002
Messages
19,522
Location
Horsens, Denmark
If disk backups are failing, you need to pull them. I might try to pull these, erase them every way you can think of, and then restore back. Or just replace them and hold onto these for a less critical project in the future.
 

Mercutio

Fatwah on Western Digital
Joined
Jan 17, 2002
Messages
21,593
Location
I am omnipresent
Ordinarily, I'd pull drives at the first sign of problems, but here's a case where my diagnostic tools don't agree and I can't create a reproducible issue. SSDs have odd fail states and this could just be something I don't recognize, which is why I'm third-guessing at this point.
 

Stereodude

Not really a
Joined
Jan 22, 2002
Messages
10,865
Location
Michigan
Can you pull the drives, connect them via SATA to another PC, and run some diagnostic programs on them that way to see what they tell you?
 

Mercutio

Fatwah on Western Digital
Joined
Jan 17, 2002
Messages
21,593
Location
I am omnipresent
Not really. This is production stuff. It's low-use/low volume, but I can't predict when it's being used, either. Right now it's more of an annoyance than a major problem, so I'd rather not put a bunch of billable hours (and the argument over getting them paid) into dealing with it unless I'm sure I have to.
 

Howell

Storage? I am Storage!
Joined
Feb 24, 2003
Messages
4,740
Location
Chattanooga, TN
Do you have enough free space on the drives to run a disk exerciser at the host level? Is Magician the tool to run to get an RMA code?
 

Mercutio

Fatwah on Western Digital
Joined
Jan 17, 2002
Messages
21,593
Location
I am omnipresent
1. No. 2. Looks like. And EVERYTHING says the drives are OK by SMART settings. It's only when I go look at details in the Samsung Software that I see a problem.
 

CougTek

Hairy Aussie
Joined
Jan 21, 2002
Messages
8,726
Location
Québec, Québec
Did you, by any chance, configure replication on the VMs on the Samsung SSDs? If you did, deactivate it and check if the issue persists. VM replication still doesn't work well on Hyper-V 2012 R2. It created weird issues when I enabled it on some VMs in my main cluster, more than a year ago. I had VMs refusing to migrate from one host to another in the same cluster and sometimes I had to force a shutdown, then move or modify the VM before it worked again.
 

LunarMist

I can't believe I'm a Fixture
Joined
Feb 1, 2003
Messages
16,665
Location
USA
I thought those 850 Pro drives were for consumer desktop use. :scratch: I'm curious why you would use them especially when you were complaining so much about Samsung quality and service.
 

mubs

Storage? I am Storage!
Joined
Nov 22, 2002
Messages
4,908
Location
Somewhere in time.
Considering the drives are Samsung, and Magician is created by Samsung, I'd trust their sw more than third party sw.
 

Mercutio

Fatwah on Western Digital
Joined
Jan 17, 2002
Messages
21,593
Location
I am omnipresent
LunarMist said:
I thought those 850 Pro drives were for consumer desktop use. :scratch: I'm curious why you would use them especially when you were complaining so much about Samsung quality and service.

As I said, this is all low-use stuff, legacy systems (2 of the drives haven't written one full terabyte of data since they've been deployed, including swap space) that are still irregularly needed. Sometimes, we wind up with parts we don't particularly like because they're what is most readily available for purchase; I'd still rather throw my stuff on consumer SSDs than on Enterprise 10krpm SAS drives, for example.

Having now tried four different SMART tools besides Samsung's, they all read a five hex-digit Failed Read Count on those suspect drives according to the raw value, and all the drives report Good health according to SMART values, even in the Samsung software. Checking other SSDs I have, almost all of them report 0 Failed Reads according to raw data. So I guess a couple hundred thousand times that a drive didn't manage to read a memory cell is no big deal?
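For anyone puzzling over how a drive can show a huge raw count and still read "Good", here is a rough sketch of how SMART tools typically judge an attribute. The health status comes from the normalized value compared against the vendor's threshold, while the raw value is a vendor-defined counter that tools often just display in hex. The attribute name, normalized value, threshold and hex string below are made-up illustrations, not readings from the drives in this thread.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    """One SMART attribute as a typical tool displays it (illustrative only)."""
    name: str
    value: int       # normalized current value, set by the drive firmware
    worst: int       # lowest normalized value the drive has recorded
    threshold: int   # vendor-defined failure threshold
    raw_hex: str     # raw counter as a hex string, e.g. "30D4F"

    @property
    def raw(self) -> int:
        # Convert the hex display into a plain decimal count.
        return int(self.raw_hex, 16)

    @property
    def status(self) -> str:
        # Conventional rule: the attribute only trips when the normalized
        # value falls to or below the threshold. The raw count plays no
        # direct part in this decision.
        return "FAILED" if self.value <= self.threshold else "Good"


# Hypothetical reading: five hex digits of raw errors, normalized value still high.
attr = SmartAttribute(
    name="Uncorrectable Error Count",  # assumed attribute name
    value=99,
    worst=99,
    threshold=10,                      # assumed threshold
    raw_hex="30D4F",
)

print(f"{attr.name}: raw={attr.raw} normalized={attr.value} "
      f"threshold={attr.threshold} status={attr.status}")
```

How the firmware maps the raw counter onto the normalized value is up to the vendor, which would explain every generic tool agreeing on "Good" while a vendor tool that looks at the raw count complains.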
 

Howell

Storage? I am Storage!
Joined
Feb 24, 2003
Messages
4,740
Location
Chattanooga, TN
I guess there is no way to tell when in their lives the bad reads happened, or reset the counter to see if they are continuing?
 

Mercutio

Fatwah on Western Digital
Joined
Jan 17, 2002
Messages
21,593
Location
I am omnipresent
There was no way to reset the counter given the software available, but after watching the drives for a couple weeks, one of the suspect drives had its bad read count increase by 97 (from a base value of a little over 200000) and the other had no change over a two week time period. All SMART reporting still says all the drives are fine. So SOMETHING is happening that Samsung's software calls a bad read. I'm going to pull both drives next time I get a maintenance window for it, which won't be until April.
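Since the counter can't be reset, the next best thing is to record the raw value periodically and look at the change. A minimal sketch of that bookkeeping follows. The dates and base counts are placeholders I invented for illustration; only the +97 change on one drive and the zero change on the other, over roughly two weeks, come from the readings above.

```python
from datetime import date


def error_growth(readings):
    """Return (delta, per_day) between the first and last reading.

    readings: list of (date, raw_count) tuples, oldest first.
    """
    (first_day, first_count) = readings[0]
    (last_day, last_count) = readings[-1]
    days = (last_day - first_day).days
    delta = last_count - first_count
    # Avoid dividing by zero when both readings were taken on the same day.
    per_day = delta / days if days > 0 else 0.0
    return delta, per_day


# Placeholder dates and base counts (hypothetical), two weeks apart.
suspect_a = [(date(2000, 1, 1), 200015), (date(2000, 1, 15), 200112)]
suspect_b = [(date(2000, 1, 1), 200500), (date(2000, 1, 15), 200500)]

for label, readings in (("drive A", suspect_a), ("drive B", suspect_b)):
    delta, per_day = error_growth(readings)
    print(f"{label}: +{delta} bad reads, about {per_day:.1f} per day")
```

A count that keeps climbing points at an ongoing problem, while a flat one suggests the errors all happened at some earlier point in the drive's life.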
 