Power-Loss-Protected SSDs Tested

Bozo

Storage? I am Storage!
Joined
Feb 12, 2002
Messages
4,396
Location
Twilight Zone
There is no mention of whether write cache was enabled or disabled on the drives.
What operating system was being used?
 

CougTek

Hairy Aussie
Joined
Jan 21, 2002
Messages
8,726
Location
Québec, Québec
He wrote that he used the drive under a Linux distribution of some sort.

That was a very interesting read. Thanks for sharing. I'd really like to know if the drives using the LAMD controller (Seagate 600 series and Corsair Neutron GTX) would show the same data corruption as the other non-Intel drives he's tested. I bet they wouldn't.
 

Stereodude

Not really a
Joined
Jan 22, 2002
Messages
10,865
Location
Michigan
I have the same question about this report as the last one. Where was power cut? Were the mains switched, or just the power to the drive while the rest of the system stayed powered?

The report says that no drive failed when the mains were switched with the OS up and booted... If you have to cut power to just the SSD while the rest of the system stays up, then frankly, is the test useful? (I'm not sure whether this was done.) I guess I'd want to see what happens when system (not drive) power is cut while the system is writing to the drive (maybe this is what was done, it's not clear). I'd also like to see what spinning HDDs do as a comparison point.
 

Bozo

Storage? I am Storage!
Joined
Feb 12, 2002
Messages
4,396
Location
Twilight Zone
If write cache was disabled, then what is the point of the test? If write cache is disabled on a spinning hard drive, there is a 99% chance you are going to lose or corrupt the data too (been there, done that, paid the price). Even Windows warns you that you can lose data if you disable write caching.
I don't see the point of the test.
 

Stereodude

Not really a
Joined
Jan 22, 2002
Messages
10,865
Location
Michigan
Don't you mean enabled? When the write caches are disabled, the data is written directly to the disk and the system waits for that to happen before moving on. The chances of data loss or corruption are much smaller.
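
To make the "system waits for that to happen" part concrete, here is a minimal Python sketch of a write that blocks until the OS reports its buffers flushed. The helper name `durable_write` and the demo file name are made up for illustration. Note the limit: `os.fsync` only asks the OS to push the data to the device; whether the drive has really committed it to media rather than to its own cache is up to the drive, which is the whole question in this thread.

```python
import os
import tempfile

def durable_write(path, data):
    """Write data, then block until the OS says it has been flushed.

    Hypothetical helper for illustration; returns the number of bytes written.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        written = 0
        while written < len(data):
            # os.write may write fewer bytes than asked, so loop until done.
            written += os.write(fd, data[written:])
        # Ask the OS to flush its own buffers for this file to the device.
        os.fsync(fd)
    finally:
        os.close(fd)
    return written

# Small demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.bin")
    demo_written = durable_write(demo_path, b"sensor reading 42\n")
    with open(demo_path, "rb") as fh:
        demo_readback = fh.read()
```

A drive with a volatile write cache and no power-loss protection can still acknowledge the flush and lose the data afterwards, so a call like this narrows the window rather than closing it.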
 

Bozo

Storage? I am Storage!
Joined
Feb 12, 2002
Messages
4,396
Location
Twilight Zone
Correct. I was looking at "Turn off Windows write cache buffer flushing......" which also can lose data.
 

Chewy509

Wotty wot wot.
Joined
Nov 8, 2006
Messages
3,327
Location
Gold Coast Hinterland, Australia
Having had more time to read the report over (and a few similar reports/papers on the IEEE/ACM websites), there appears to be a fundamental issue with some SSD designs, in that they make no guarantee that the underlying storage will remain consistent in the event of a power failure. SSD models targeted at Enterprise mostly highlight power-loss protection, but some do not.

The linked /. article and paper specifically tested SSDs that specify they have circuitry designed to handle sudden power loss. What the report states is that even when SSDs claim to have power-loss protection (typically in the form of a supercap, which can provide enough power to flush the onboard RAM cache and update the FTTs in NAND), it's certainly not that well tested and/or implemented. (I would also like to point out that the SSDs tested all claimed to have this functionality and were below a certain price point.)

Looking around, I can't find any mention of LAMD-based SSDs offering power-loss protection in their datasheets either (sorry Coug, those models might do well in the test, but there are no guarantees). (FYI, the Seagate 600 Pro does list power-loss protection; the non-Pro does not.) The initial report (and this isn't mentioned in a clear manner) was looking for SSDs to be used in an embedded, field-deployed Linux computer for sensor gathering, not for general desktop use, so the usage patterns will be different.

What the report highlights is that some SSDs will suffer failure in the event of sudden power loss (as the power loss will stuff up the FTTs (Flash Translation Tables) used for NAND cell to LBA mapping), and that some SSDs are not capable of handling large amounts of concurrent reads/writes without either a significant loss of performance or, in some cases (and OCZ appears to be at fault here), firmware errors that cause catastrophic failure. The FTT corruption is significant, as many controllers use compression and/or encryption on the NAND, which means an otherwise perfectly working unit has lost all of its data... (There are some reports on /. that some drives store their firmware in the NAND flash also used for data, and corrupt FTTs cause the firmware to be overwritten with user data, causing catastrophic failure.)

As far as I'm concerned, if the primary non-volatile storage cannot guarantee against data loss for all completed write operations as issued by the Operating System** in the event of power-loss or under heavy concurrent read/write operations, then that device has no place being used by the general consumer public... This is clearly the case for some SSDs on the market, and the report linked to, certainly highlights that.

**Note: even though write caching may be off, metadata, like updating the FTTs may still be in progress in the background within the SSD as this is 100% transparent to the OS...
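
For anyone wanting to check the "all completed write operations" guarantee themselves, here is a rough Python sketch of one way to verify it after a power cut. This is an assumption about how such a check could be built, not the method used in the report: the record layout, the names `make_record`, `count_intact_records` and `lost_acknowledged_writes`, and the 512-byte record size are all invented for illustration. The writer would append numbered, checksummed records and note how many were acknowledged (for example, after a flush returned); after power is restored, the verifier counts how many survived intact.

```python
import struct
import zlib

HEADER = struct.Struct("<QI")              # sequence number, CRC32 of payload
PAYLOAD_SIZE = 500
RECORD_SIZE = HEADER.size + PAYLOAD_SIZE   # 12 + 500 = 512 bytes

def make_record(seq):
    """Build one fixed-size record with a payload derived from its sequence number."""
    payload = bytes((seq + i) % 256 for i in range(PAYLOAD_SIZE))
    return HEADER.pack(seq, zlib.crc32(payload)) + payload

def count_intact_records(blob):
    """Count leading records that are in sequence and pass their CRC check."""
    good = 0
    for offset in range(0, len(blob) - RECORD_SIZE + 1, RECORD_SIZE):
        seq, crc = HEADER.unpack_from(blob, offset)
        payload = blob[offset + HEADER.size:offset + RECORD_SIZE]
        if seq != good or zlib.crc32(payload) != crc:
            break  # first torn, missing, or corrupted record ends the valid run
        good += 1
    return good

def lost_acknowledged_writes(blob, acknowledged):
    """How many records the writer was told were safe but are not readable."""
    return max(0, acknowledged - count_intact_records(blob))
```

Any non-zero result from `lost_acknowledged_writes` means the stack below the application (OS, controller, or drive) acknowledged a write it did not keep. A drive that dies outright, as described above, would not even get this far.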
 

jtr1962

Storage? I am Storage!
Joined
Jan 25, 2002
Messages
4,174
Location
Flushing, New York
Maybe it's just me, but it certainly doesn't seem to be an insurmountable issue to mount a supercap on the SSD controller board to power the SSD for the few seconds it might take to complete write operations after a power loss.
 

Chewy509

Wotty wot wot.
Joined
Nov 8, 2006
Messages
3,327
Location
Gold Coast Hinterland, Australia
Maybe it's just me, but it certainly doesn't seem to be an insurmountable issue to mount a supercap on the SSD controller board to power the SSD for the few seconds it might take to complete write operations after a power loss.
Agree 100%, and most Enterprise targeted models already do... but most consumer orientated SSDs do not (eg the ones that most people purchase), all to save a few dollars per unit and/or to justify the higher price on enterprise models.
 

time

Storage? I am Storage!
Joined
Jan 18, 2002
Messages
4,932
Location
Brisbane, Oz
I admit I didn't read the 'paper' that thoroughly, because it rapidly became crystal clear that there was almost a complete absence of scientific rigor. There is plenty of hearsay and innuendo, with unwarranted conclusions drawn from unrealistic tests.

Rather than losing data through power interruptions, it became clear that the OCZ units had actually crapped out during "heavy concurrent read/write operations". So the whole premise for this stunt was flawed.

I can certainly see why some people pegged him as an "Intel shill". He repeatedly asserts that Intel is the *only* brand to consider, even though the two products he refers to are practically the only ones to even have 'power-loss protection'. For example, the 335 doesn't have it and the X25-E didn't either.

He also condemns the Crucial M4, but it doesn't even have a 'super-capacitor'. There's a world of difference between writing firmware to minimize disruption problems and actually having the necessary hardware. That's why people were complaining that he should have tested the M500, which *does* claim to have power-loss protection. There's been no satisfactory answer to why he chose to test the obsolete model for this and other brands.

And then there's why didn't he test Samsung or SanDisk, the biggest OEM brands? And then there's the nature of the test, which frankly would probably always fail with an OS like Windows or one of the more user-friendly Linux flavors. Power failure every 9 to 25 seconds - really? (Happy to be corrected on this, I wasted a couple hours of my life on this the other night and may well have missed something).

(There are some reports on /. that some drives store their firmware in the NAND flash also used for data, and corrupt FTTs cause the firmware to be overwritten with user data, causing catastrophic failure.)

And for all I know they may even be true, but it's complete hearsay and may well turn out to be utter bollocks.

As far as I'm concerned, if the primary non-volatile storage cannot guarantee against data loss for all completed write operations as issued by the Operating System** in the event of power-loss or under heavy concurrent read/write operations, then that device has no place being used by the general consumer public...

How many different ways would you like this statement to be shot down? You're a (former) IT infrastructure support guy - how many RAID controllers have battery backup? If you pull the plug on an active MS Exchange server (substitute advanced product of your choice) without a UPS, what are your chances of losing the odd datum, or even having to spend hours trying to rebuild the sod? And presumably elevator seeking can't be used on your planet?

There are other, genuine research papers available (that don't promote a particular company) that canvass the issue. I certainly think it's a concern, but frankly it's well down my list of possible data loss worries.

BTW, his CV mentions that he managed to "increase response times" on some project. Well done that man.
 

CougTek

Hairy Aussie
Joined
Jan 21, 2002
Messages
8,726
Location
Québec, Québec
And presumably elevator seeking can't be used on your planet?
That's uncalled for. Please be nice to Chewy. It would be a great loss for us if someone pissed him off over some trivial technical debate. Behave Aussies, behave.

Whether the process was rigorous or (apparently) not, pointing out the issue of data loss in case of sudden power loss or extensive concurrent read/write operations on certain models under certain conditions is helpful to some of us. Looking for power-loss protection on SSDs I'll put in servers will now be on my short list. I wasn't aware of the potential problem before.
 

Chewy509

Wotty wot wot.
Joined
Nov 8, 2006
Messages
3,327
Location
Gold Coast Hinterland, Australia
Rather than losing data through power interruptions, it became clear that the OCZ units had actually crapped out during "heavy concurrent read/write operations". So the whole premise for this stunt was flawed.
How is the premise flawed? He initially suspected the OCZ units were failing due to power loss, however through testing found the real root cause for their failure. He continued to test his initial suspicion on further units to ensure that they actually did what they were supposed to do...

I can certainly see why some people pegged him as an "Intel shill". He repeatedly asserts that Intel is the *only* brand to consider, even though the two products he refers to are practically the only ones to even have 'power-loss protection'. For example, the 335 doesn't have it and the X25-E didn't either.
The only brand to consider if looking for power-loss protection, as from his testing, they were the only brand to pass his tests.

He also condemns the Crucial M4, but it doesn't even have a 'super-capacitor'. There's a world of difference between writing firmware to minimize disruption problems and actually having the necessary hardware. That's why people were complaining that he should have tested the M500, which *does* claim to have power-loss protection. There's been no satisfactory answer to why he chose to test the obsolete model for this and other brands.
The requirements for the units under test were: power-loss protection, and below a certain price point from his supplier. Maybe (and I can't confirm this) the M500 was out of the price bracket, and the only units in the required price bracket were older/obsolete units?

And then there's why didn't he test Samsung or SanDisk, the biggest OEM brands?
I agree on this... maybe it's related to above: How many of those units have power-loss protection that were within the pricing bracket and available through his supplier?

And then there's the nature of the test, which frankly would probably always fail with an OS like Windows or one of the more user-friendly Linux flavors.
Modern filesystems like NTFS, ext4, JFS and XFS are all journaling filesystems (ZFS gets a similar result through copy-on-write), so at most a sudden power loss should result in some lost user data, but no filesystem corruption.

Power failure every 9 to 25 seconds - really? (Happy to be corrected on this, I wasted a couple hours of my life on this the other night and may well have missed something).
If you're trying to test how something handles a certain situation (which may have timing variables and race conditions) you test it as often as possible in the shortest time possible to get meaningful results. By cycling the power every 9-25 seconds, he was trying to determine how successful the power loss protection is... Yes, it's not going to happen in the real-world, but if his tests show that the unit may fail on a 1/900 chance due to sudden power loss (eg it failed on the 900th test), the 1/900 chance result is meaningful.
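
As a back-of-the-envelope illustration of why a 1/900 figure matters, here is a small Python sketch. It assumes each power cycle is an independent trial with the same failure probability, which is a simplification, and a single failure on the 900th cycle is only a rough estimate of that probability. The function names are made up for illustration.

```python
import math

def p_at_least_one_failure(p_per_cycle, cycles):
    """Chance of seeing one or more failures in `cycles` independent power cuts."""
    return 1.0 - (1.0 - p_per_cycle) ** cycles

def cycles_for_confidence(p_per_cycle, confidence):
    """Smallest number of power cuts giving at least `confidence` chance of a failure."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_per_cycle))

p = 1 / 900
after_900 = p_at_least_one_failure(p, 900)   # roughly 0.63
needed_95 = cycles_for_confidence(p, 0.95)   # somewhat under 2,700 cycles
```

So with a true rate of 1/900, a run of 900 cycles has only about a 63% chance of catching a failure at all, which is why cycling power every few seconds for thousands of iterations is a reasonable way to surface a rare fault.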

How many different ways would you like this statement to be shot down? You're a (former) IT infrastructure support guy - how many RAID controllers have battery backup?
Sadly, not enough...

If you pull the plug on an active MS Exchange server (substitute advanced product of your choice) without a UPS, what are your chances of losing the odd datum, or even having to spend hours trying to rebuild the sod?
Granted, while the filesystem may be OK, there will be some data loss in the files used by the application, which, yes, will result in having to fix that application (but that's what backups and disaster recovery plans are for).
But the test wasn't for filesystem corruption, but for device failure... Losing 1-2 sectors of information isn't that bad compared to losing the entire device... Scenario: I have a zpool/RAID of 20x 15K SAS HDDs, and I lose 1-2 sectors of information due to power loss, versus I have a zpool of 10x SSDs, and I just lost half of them (as the FTTs on all of them corrupted)...

There are other, genuine research papers available (that don't promote a particular company) that canvass the issue. I certainly think it's a concern, but frankly it's well down my list of possible data loss worries.
And those other research papers have similar findings... (There is a lot of marketing bullshit being spewed by companies, and you need to test if they truly live up to their claims).

In the real world, you're not likely to suffer a sudden power loss, thanks to a number of factors (eg, battery on a laptop/tablet, UPS, battery-backed RAID controller, etc). Therefore, I agree, the chance of losing data this way is very small compared to other causes of data loss. However, if you're planning to deploy SSDs, you need to weigh the risk of this sort of power loss happening against using traditional HDDs, where sudden power loss is almost certain not to kill the drive completely... (sure, a few lost sectors, but not the whole drive).

PS. happy new year! Wishing everyone a safe and joyful 2014.
 

Stereodude

Not really a
Joined
Jan 22, 2002
Messages
10,865
Location
Michigan
How is the premise flawed? He initially suspected the OCZ units were failing due to power loss, however through testing found the real root cause for their failure. He continued to test his initial suspicion on further units to ensure that they actually did what they were supposed to do...
It's like testing a bunch of screwdrivers to see which one makes the best hammer. Unless you routinely need to use a screwdriver as a hammer the whole thing is basically pointless.
 