2009-08-25 00:09:39

by Theodore Ts'o

Subject: Re: [patch] ext2/3: document conditions when reliable operation is possible

On Tue, Aug 25, 2009 at 01:00:36AM +0200, Pavel Machek wrote:
> Then to answer your question... ext2. You expect to run fsck after
> unclean shutdown, and you expect to have to solve some problems with
> it. So the way ext2 deals with the flash media actually matches what
> the user expects. (*)

But if the 256k hole is in data blocks, fsck won't find a problem,
even with ext2.

And if the 256k hole is the inode table, you will *still* suffer
massive data loss. Fsck will tell you how badly screwed you are, but
it doesn't "fix" the disk; most users don't consider questions of the
form "directory entry <precious-thesis-data> points to trashed inode,
may I delete directory entry?" as being terribly helpful. :-/

> OTOH in ext3 case you expect consistent filesystem after unplug; and
> you don't get that.

You don't get a consistent filesystem with ext2, either. And if your
claim is that several hundred lines of fsck output detailing the
filesystem's destruction somehow makes things all better, I suspect
most users would disagree with you.

In any case, depending on where the flash was writing at the time of
the unplug, the data corruption could be silent anyway.

Maybe this came as a surprise to you, but anyone who has used a
CompactFlash card in a digital camera knows that you ***have*** to wait
until the LED has gone out before trying to eject the flash card. I
remember seeing, on various digital photography forums, all sorts of
horror stories from professional photographers about how they lost an
important wedding day's worth of pictures, with the attendant
commercial loss. It tends to be the sort of mistake that digital
photographers only make once.

(It's worse with people using digital SLRs shooting in raw mode,
since it can take upwards of 30 seconds to write out a 12-30MB
raw image, and if you eject at the wrong time, you can trash the
contents of the entire CF card; in the worst case, the Flash
Translation Layer data can get corrupted, and the card is completely
ruined; you can't even reformat it at the filesystem level, but have
to get a special Windows program from the CF manufacturer to --maybe--
reset the FTL layer. Early CF cards were especially vulnerable to
this; more recent CF cards are better, but it's a known failure mode
of CF cards.)

- Ted


2009-08-25 09:42:51

by Pavel Machek

Subject: Re: [patch] ext2/3: document conditions when reliable operation is possible

On Mon 2009-08-24 20:08:42, Theodore Tso wrote:
> On Tue, Aug 25, 2009 at 01:00:36AM +0200, Pavel Machek wrote:
> > Then to answer your question... ext2. You expect to run fsck after
> > unclean shutdown, and you expect to have to solve some problems with
> > it. So the way ext2 deals with the flash media actually matches what
> > the user expects. (*)
>
> But if the 256k hole is in data blocks, fsck won't find a problem,
> even with ext2.

True.

> And if the 256k hole is the inode table, you will *still* suffer
> massive data loss. Fsck will tell you how badly screwed you are, but
> it doesn't "fix" the disk; most users don't consider questions of the
> form "directory entry <precious-thesis-data> points to trashed inode,
> may I delete directory entry?" as being terribly helpful. :-/

Well, it will fix the disk in the end. And no, "directory entry
<precious-thesis-data> points to trashed inode, may I delete directory
entry?" is not _terribly_ helpful, but it is slightly helpful, and
people actually expect that from ext2.

> Maybe this came as a surprise to you, but anyone who has used a
> CompactFlash card in a digital camera knows that you ***have*** to wait
> until the LED has gone out before trying to eject the flash card. I
> remember seeing, on various digital photography forums, all sorts of
> horror stories from professional photographers about how they lost an
> important wedding day's worth of pictures, with the attendant
> commercial loss. It tends to be the sort of mistake that digital
> photographers only make once.

It actually does come as a surprise to me. Well, yes and no: I know
that digital cameras use VFAT, so pulling the CF card out of one may do
bad things unless I run fsck.vfat afterwards. But if a digital camera
were using ext3, I'd expect the card to be safely pullable at any time.

Would an IBM Microdrive make any difference there?

Anyway, it was news to me. Rather than claiming that "everyone knows"
(when clearly very few people really understand all the details), can
we simply document it?
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(Czech, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2009-08-27 03:35:12

by Rob Landley

Subject: Re: [patch] ext2/3: document conditions when reliable operation is possible

On Monday 24 August 2009 19:08:42 Theodore Tso wrote:
> And if your
> claim is that several hundred lines of fsck output detailing the
> filesystem's destruction somehow makes things all better, I suspect
> most users would disagree with you.

Suppose a small office makes nightly backups to an offsite server via
rsync. If a thunderstorm goes by, causing their system to reboot twice
in a 15-minute period, would they rather notice the filesystem
corruption immediately upon reboot, or only after the next rsync has
propagated it into the backup?

> In any case, depending on where the flash was writing at the time of
> the unplug, the data corruption could be silent anyway.

Yup. Hopefully btrfs will cope less badly? They keep talking about
checksumming extents...
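
(To spell the idea out: store a checksum next to each extent when it is
written and verify it on every read, so a silently zeroed or torn extent
turns into a hard read error instead of quietly wrong data. A toy sketch
of the principle in Python, using crc32 over fixed-size blocks; the real
btrfs on-disk format and its crc32c are of course a different beast:

import zlib

BLOCK = 4096  # checksum granularity; btrfs actually checksums per extent

def write_blocks(f, payload):
    # Store a crc32 of each block right next to the block itself.
    for i in range(0, len(payload), BLOCK):
        chunk = payload[i:i + BLOCK].ljust(BLOCK, b'\0')
        f.write(zlib.crc32(chunk).to_bytes(4, 'little'))
        f.write(chunk)

def read_blocks(f):
    # Verify on read: a torn or silently-zeroed block raises an error
    # instead of being returned as if it were valid data.
    out = []
    while True:
        stored = f.read(4)
        if not stored:
            break
        chunk = f.read(BLOCK)
        if zlib.crc32(chunk) != int.from_bytes(stored, 'little'):
            raise IOError("checksum mismatch: corrupted block")
        out.append(chunk)
    return b''.join(out)

The filesystem can then at least report the damage, or repair it from a
redundant copy, instead of handing garbage back to the application.)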

> Maybe this came as a surprise to you, but anyone who has used a
> CompactFlash card in a digital camera knows that you ***have*** to wait
> until the LED has gone out before trying to eject the flash card.

I doubt the cupholder crowd is going to stop treating USB sticks as magical
any time soon, but I also wonder how many of them even remember Linux _exists_
anymore.

> I
> remember seeing, on various digital photography forums, all sorts of
> horror stories from professional photographers about how they lost an
> important wedding day's worth of pictures, with the attendant
> commercial loss. It tends to be the sort of mistake that digital
> photographers only make once.

Professionals have horror stories about this issue, therefore documenting it
is _less_ important?

Ok...

Rob
--
Latency is more important than throughput. It's that simple. - Linus Torvalds

2009-08-27 08:46:53

by David Woodhouse

Subject: Re: [patch] ext2/3: document conditions when reliable operation is possible

On Mon, 2009-08-24 at 20:08 -0400, Theodore Tso wrote:
>
> (It's worse with people using digital SLRs shooting in raw mode,
> since it can take upwards of 30 seconds to write out a 12-30MB
> raw image, and if you eject at the wrong time, you can trash the
> contents of the entire CF card; in the worst case, the Flash
> Translation Layer data can get corrupted, and the card is completely
> ruined; you can't even reformat it at the filesystem level, but have
> to get a special Windows program from the CF manufacturer to --maybe--
> reset the FTL layer.

This just goes to show why having this "translation layer" done in
firmware on the device itself is a _bad_ idea. We're much better off
when we have full access to the underlying flash and the OS can actually
see what's going on. That way, we can actually debug, fix and recover
from such problems.

> Early CF cards were especially vulnerable to
> this; more recent CF cards are better, but it's a known failure mode
> of CF cards.)

It's a known failure mode of _everything_ that uses flash to pretend to
be a block device. As I see it, there are no SSD devices which don't
lose data; there are only SSD devices which haven't lost your data
_yet_.

There's no fundamental reason why it should be this way; it just is.

(I'm kind of hoping that the shiny new expensive ones that everyone's
talking about right now, that I shouldn't really be slagging off, are
actually OK. But they're still new, and I'm certainly not trusting them
with my own data _quite_ yet.)

--
dwmw2


2009-08-28 14:46:42

by David Lang

Subject: Re: [patch] ext2/3: document conditions when reliable operation is possible

On Thu, 27 Aug 2009, David Woodhouse wrote:

> On Mon, 2009-08-24 at 20:08 -0400, Theodore Tso wrote:
>>
>> (It's worse with people using digital SLRs shooting in raw mode,
>> since it can take upwards of 30 seconds to write out a 12-30MB
>> raw image, and if you eject at the wrong time, you can trash the
>> contents of the entire CF card; in the worst case, the Flash
>> Translation Layer data can get corrupted, and the card is completely
>> ruined; you can't even reformat it at the filesystem level, but have
>> to get a special Windows program from the CF manufacturer to --maybe--
>> reset the FTL layer.
>
> This just goes to show why having this "translation layer" done in
> firmware on the device itself is a _bad_ idea. We're much better off
> when we have full access to the underlying flash and the OS can actually
> see what's going on. That way, we can actually debug, fix and recover
> from such problems.
>
>> Early CF cards were especially vulnerable to
>> this; more recent CF cards are better, but it's a known failure mode
>> of CF cards.)
>
> It's a known failure mode of _everything_ that uses flash to pretend to
> be a block device. As I see it, there are no SSD devices which don't
> lose data; there are only SSD devices which haven't lost your data
> _yet_.
>
> There's no fundamental reason why it should be this way; it just is.
>
> (I'm kind of hoping that the shiny new expensive ones that everyone's
> talking about right now, that I shouldn't really be slagging off, are
> actually OK. But they're still new, and I'm certainly not trusting them
> with my own data _quite_ yet.)

so what sort of test would be needed to identify if a device has this
problem?

people can do ad-hoc tests by pulling the devices in use and then checking
the entire device, but something better should be available.

it seems to me that there are two things needed to define the tests.

1. a predictable write load so that it's easy to detect data getting lost

2. some statistical analysis to decide how many device pulls are needed
(under the write load defined in #1) to make the odds high that the
problem will be revealed.

with this we could have people (or businesses; I think the tech
hardware sites would jump on this given some sort of accepted test)
test various devices and report whether the test detects unrelated data
being lost

for USB devices there may be a way to use the power management functions
to cut power to the device without requiring it to be physically pulled;
if that works (even if only on some specific chipsets), it would
drastically speed up the testing
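
To make #1 and #2 concrete, something along these lines would do (a
rough Python sketch, not an existing tool; the device path /dev/sdX and
the record layout are just placeholders): fill the device with numbered,
checksummed records, pull it partway through, then scan the whole device
afterwards. One torn record right at the write frontier is expected and
harmless; checksum failures scattered elsewhere are exactly the
"unrelated data being lost" case.

import os, struct, sys, zlib

RECORD = 4096                  # one record per 4 KiB block
MAGIC = 0x1337F00D             # marks blocks written by this tool

def make_record(seq, run_id):
    body = struct.pack('<II', seq, run_id) * ((RECORD - 12) // 8)
    body = body.ljust(RECORD - 12, b'\0')
    # 12-byte header: magic, sequence number, crc32 of the body
    return struct.pack('<III', MAGIC, seq, zlib.crc32(body)) + body

def write_load(dev, run_id):
    # Phase 1: run this and yank the device somewhere in the middle.
    with open(dev, 'wb', buffering=0) as f:
        seq = 0
        try:
            while True:
                f.write(make_record(seq, run_id))
                if seq % 256 == 0:
                    os.fsync(f.fileno())   # push data out to the device
                seq += 1
        except OSError:        # device full, or it just got pulled
            pass

def verify(dev):
    # Phase 2: replug and scan everything, counting records whose body
    # no longer matches the crc they were written with.
    damaged = 0
    with open(dev, 'rb', buffering=0) as f:
        blockno = 0
        while True:
            blk = f.read(RECORD)
            if len(blk) < RECORD:
                break
            magic, seq, crc = struct.unpack_from('<III', blk)
            if magic == MAGIC and zlib.crc32(blk[12:]) != crc:
                print("block %d (seq %d) is damaged" % (blockno, seq))
                damaged += 1
            blockno += 1
    print("%d damaged blocks total" % damaged)

if __name__ == '__main__':
    # e.g.  pulltest.py write /dev/sdX 1   then later   pulltest.py verify /dev/sdX
    if sys.argv[1] == 'write':
        write_load(sys.argv[2], int(sys.argv[3]))
    else:
        verify(sys.argv[2])

Repeating the pull during phase 1 and counting damaged blocks in phase 2
would give the raw numbers for the statistics in #2.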

David Lang

2009-08-29 10:09:09

by Pavel Machek

Subject: Re: [patch] ext2/3: document conditions when reliable operation is possible

On Fri 2009-08-28 07:46:42, [email protected] wrote:
> On Thu, 27 Aug 2009, David Woodhouse wrote:
>
>> On Mon, 2009-08-24 at 20:08 -0400, Theodore Tso wrote:
>>>
>>> (It's worse with people using digital SLRs shooting in raw mode,
>>> since it can take upwards of 30 seconds to write out a 12-30MB
>>> raw image, and if you eject at the wrong time, you can trash the
>>> contents of the entire CF card; in the worst case, the Flash
>>> Translation Layer data can get corrupted, and the card is completely
>>> ruined; you can't even reformat it at the filesystem level, but have
>>> to get a special Windows program from the CF manufacturer to --maybe--
>>> reset the FTL layer.
>>
>> This just goes to show why having this "translation layer" done in
>> firmware on the device itself is a _bad_ idea. We're much better off
>> when we have full access to the underlying flash and the OS can actually
>> see what's going on. That way, we can actually debug, fix and recover
>> from such problems.
>>
>>> Early CF cards were especially vulnerable to
>>> this; more recent CF cards are better, but it's a known failure mode
>>> of CF cards.)
>>
>> It's a known failure mode of _everything_ that uses flash to pretend to
>> be a block device. As I see it, there are no SSD devices which don't
>> lose data; there are only SSD devices which haven't lost your data
>> _yet_.
>>
>> There's no fundamental reason why it should be this way; it just is.
>>
>> (I'm kind of hoping that the shiny new expensive ones that everyone's
>> talking about right now, that I shouldn't really be slagging off, are
>> actually OK. But they're still new, and I'm certainly not trusting them
>> with my own data _quite_ yet.)
>
> so what sort of test would be needed to identify if a device has this
> problem?
>
> people can do ad-hoc tests by pulling the devices in use and then
> checking the entire device, but something better should be available.
>
> it seems to me that there are two things needed to define the tests.
>
> 1. a predictable write load so that it's easy to detect data getting lost
>
> 2. some statistical analysis to decide how many device pulls are needed
> (under the write load defined in #1) to make the odds high that the
> problem will be revealed.

It's simpler than that. It usually breaks after the third unplug or so.

> for USB devices there may be a way to use the power management functions
> to cut power to the device without requiring it to be physically pulled;
> if that works (even if only on some specific chipsets), it would
> drastically speed up the testing

This is really so easy to reproduce that such a speedup is not
necessary. Just try the scripts :-).
Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(Czech, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2009-08-29 16:27:32

by David Lang

Subject: Re: [patch] ext2/3: document conditions when reliable operation is possible

On Sat, 29 Aug 2009, Pavel Machek wrote:

> On Fri 2009-08-28 07:46:42, [email protected] wrote:
>>
>>
>> so what sort of test would be needed to identify if a device has this
>> problem?
>>
>> people can do ad-hoc tests by pulling the devices in use and then
>> checking the entire device, but something better should be available.
>>
>> it seems to me that there are two things needed to define the tests.
>>
>> 1. a predictable write load so that it's easy to detect data getting lost
>>
>> 2. some statistical analysis to decide how many device pulls are needed
>> (under the write load defined in #1) to make the odds high that the
>> problem will be revealed.
>
> It's simpler than that. It usually breaks after the third unplug or so.
>
>> for USB devices there may be a way to use the power management functions
>> to cut power to the device without requiring it to be physically pulled;
>> if that works (even if only on some specific chipsets), it would
>> drastically speed up the testing
>
> This is really so easy to reproduce that such a speedup is not
> necessary. Just try the scripts :-).

so if it doesn't get corrupted after 5 unplugs, does that mean that
particular device doesn't have a problem, or does it just mean you got
lucky?

would 10 successful unplugs mean that it's safe?

what about 20?

we need to get this beyond anecdotal-evidence mode, to something that
(even if not perfect, since you can get 100 'heads' in a row with an
honest coin) gives you pretty good assurance that a particular device
is either good or bad.
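
To put rough numbers on the coin-flip point (a back-of-the-envelope
model that treats every unplug as an independent trial with some fixed
per-unplug failure probability p, which real devices won't exactly
satisfy):

# If a device survives n unplugs with no corruption, the largest
# per-unplug failure probability p still consistent with that result at
# confidence 1 - alpha is the p solving (1 - p)**n = alpha.
def worst_case_p(n, alpha=0.05):
    return 1.0 - alpha ** (1.0 / n)

for n in (5, 10, 20, 100):
    print("%3d clean unplugs: p could still be ~%.0f%%"
          % (n, 100 * worst_case_p(n)))

by that model, 5 clean pulls only rule out (at 95% confidence) failure
rates above roughly 45% per unplug, 20 clean pulls above roughly 14%,
and even 100 clean pulls only get you down to about 3%. so no small
fixed number of unplugs really means "safe"; it just bounds how badly
the device can be misbehaving.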

David Lang

2009-08-29 21:33:52

by Pavel Machek

Subject: Re: [patch] ext2/3: document conditions when reliable operation is possible

Hi!

>> This is really so easy to reproduce that such a speedup is not
>> necessary. Just try the scripts :-).
>
> so if it doesn't get corrupted after 5 unplugs, does that mean that
> particular device doesn't have a problem, or does it just mean you got
> lucky?
>
> would 10 successful unplugs mean that it's safe?
>
> what about 20?

I'd say 20 means it's safe.
Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(Czech, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html