2001-10-03 16:17:31

by Stephen C. Tweedie

Subject: Re: ReiserFS data corruption in very simple configuration

Hi,

On Mon, Oct 01, 2001 at 07:27:31PM +0400, Hans Reiser wrote:
> This is the meaning of metadata journaling: that writes in progress at the time
> of the crash may write garbage, but you won't need to fsck. You can get this
> behaviour with other filesystems like FFS also. If you cannot accept those
> terms of service, you might use ext3 with data journaling on, but then your
> performance will be far worse.

ext3 with ordered data writes has performance nearly up to the level
of the fast-and-loose writeback mode for most workloads, and still
avoids ever exposing stale disk blocks after a crash.

Sure, it's a tradeoff, but there are positions between the two
extremes (totally unordered data writes, and totally journaled data
writes) which offer a good compromise here.
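A toy model may help make the ordered-versus-writeback distinction concrete. This is a sketch under stated assumptions, not ext3 code: the dictionary standing in for the disk, the block number and the function name `crash_after_first_write` are all hypothetical, and the crash is simplified to "only the first of two writes survives". In writeback mode the order of data and metadata writes is unconstrained, so the model picks the unlucky order. In ordered mode the data block is forced out before the metadata commit.

```python
# Toy model only (hypothetical, not ext3 code).
STALE = b"old secret"   # leftover contents of a previously deleted file

def crash_after_first_write(mode):
    """Append one block to an empty file, then crash after the first of the
    two disk writes (data block, metadata commit) has landed. Returns the
    file contents a reader would see after journal recovery."""
    disk = {7: STALE}        # block 7 is free but still holds stale data
    block_map = []           # committed metadata: blocks owned by the file

    def write_data():
        disk[7] = b"new data"

    def commit_metadata():
        block_map.append(7)  # the file now claims block 7

    if mode == "writeback":
        # No ordering constraint: metadata may reach disk before its data.
        writes = [commit_metadata, write_data]
    elif mode == "ordered":
        # Data blocks are flushed before the transaction commits.
        writes = [write_data, commit_metadata]
    else:
        raise ValueError(mode)

    writes[0]()              # the crash prevents the second write
    return [disk[b] for b in block_map]
```

In the writeback run the file ends up pointing at block 7 while it still holds `b"old secret"`; in the ordered run the crash leaves the file empty, so no stale block is exposed.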

Cheers,
Stephen


2001-10-03 20:16:34

by Pascal Schmidt

Subject: Re: ReiserFS data corruption in very simple configuration

On Wed, 3 Oct 2001, Stephen C. Tweedie wrote:

> ext3 with ordered data writes has performance nearly up to the level
> of the fast-and-loose writeback mode for most workloads, and still
> avoids ever exposing stale disk blocks after a crash.

What if the machine crashes with parts of the data blocks written to
disk, before the commit block gets submitted to the drive?

The journal will tell us that the write transaction hasn't finished, but
that doesn't mean that no data blocks made it to disk, right? Granted, we
won't expose stale disk blocks, but there is still a mix of new and old
file data in this situation. I assume e2fsck will warn about this?

--
Ciao, Pascal

-<[ [email protected], netmail 2:241/215.72, home http://cobol.cjb.net/) ]>-

2001-10-04 11:01:48

by Stephen C. Tweedie

Subject: Re: ReiserFS data corruption in very simple configuration

Hi,

On Wed, Oct 03, 2001 at 10:06:58PM +0200, Pascal Schmidt wrote:
> On Wed, 3 Oct 2001, Stephen C. Tweedie wrote:
>
> > ext3 with ordered data writes has performance nearly up to the level
> > of the fast-and-loose writeback mode for most workloads, and still
> > avoids ever exposing stale disk blocks after a crash.
> What if the machine crashes with parts of the data blocks written to
> disk, before the commit block gets submitted to the drive?

In most cases, users write data by extending off the end of a file.
Only in a few cases (such as databases) do you ever write into the
middle of an existing file. Even overwriting an existing file is done
by first truncating the file and then extending it again.

If you crash during such an extend, then the data blocks may have been
partially written, but the extend will not have been, so the
incompletely-written data blocks will not be part of any file.
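The append case can be modelled the same way. Again this is a hypothetical sketch, not ext3 code: `append_then_crash`, the block numbers and the string contents are invented for illustration. Partial data writes may reach the newly allocated blocks, but until the commit block is on disk the file's committed block map still covers only the old blocks, so after recovery a reader sees the pre-append file.

```python
# Toy ordered-mode model only (hypothetical, not ext3 code).
def append_then_crash(n_written, committed=False):
    """Append two blocks to a two-block file, then crash.

    n_written: how many of the new data blocks reached disk.
    committed: whether the transaction's commit block reached disk.
    Ordered mode only writes the commit after all data blocks are on disk.
    """
    disk = {0: "old-0", 1: "old-1", 2: "stale", 3: "stale"}
    block_map = [0, 1]                   # committed metadata: file owns 0, 1
    new_data = {2: "new-2", 3: "new-3"}  # the append allocates blocks 2, 3
    if committed and n_written < len(new_data):
        raise ValueError("ordered mode never commits before its data is written")
    for blk in list(new_data)[:n_written]:
        disk[blk] = new_data[blk]        # partial data writes may land...
    if committed:
        block_map = block_map + list(new_data)  # ...but only a commit extends the file
    return [disk[b] for b in block_map]  # what a reader sees after recovery
```

Whether zero or one of the new blocks made it out before the crash, the result is `["old-0", "old-1"]`; only a completed commit makes the new blocks part of the file.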

The *only* way to get mis-ordered data blocks in ordered mode after a
crash is if you are overwriting in the middle of an existing file. In
such a case there is no absolute guarantee about write ordering unless
you use fsync() or O_SYNC to force writes in a particular order.
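For the mid-file case, the fsync() approach could look roughly like this on a POSIX system. `overwrite_in_order` is a hypothetical helper; it relies only on the standard `os.open`, `os.pwrite`, `os.fsync` and `os.close` calls. Calling fsync() after each overwrite means the next write is not issued until the previous one has been pushed to stable storage (assuming the drive honours cache flushes), at the cost of one synchronous flush per update. Opening the file with `os.O_SYNC` instead would make every `pwrite` synchronous, with a similar effect.

```python
import os
import tempfile

def overwrite_in_order(path, updates):
    """Apply (offset, data) overwrites inside an existing file so that each
    one is on stable storage before the next is issued."""
    fd = os.open(path, os.O_WRONLY)  # no O_TRUNC: keep the existing contents
    try:
        for offset, data in updates:
            os.pwrite(fd, data, offset)  # overwrite in the middle of the file
            os.fsync(fd)                 # wait until this write is durable
    finally:
        os.close(fd)

# Usage: a 16-byte file, overwritten at offsets 4 and 8.
fd, path = tempfile.mkstemp()
os.write(fd, b"A" * 16)
os.close(fd)
overwrite_in_order(path, [(4, b"BBBB"), (8, b"CCCC")])
with open(path, "rb") as f:
    result = f.read()
os.remove(path)
```

The file ends up as `b"AAAABBBBCCCCAAAA"`, and a crash at any point can at worst lose the later update, never leave it on disk without the earlier one.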

In journaled data mode, even mid-file overwrites will be strictly
ordered after a crash.
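The difference between ordered and journaled data mode for mid-file overwrites can be sketched with a similar toy model. It is hypothetical and deliberately simplified: `overwrite_then_crash` and the block contents are invented, and ordered mode's freedom to reorder in-place writes is reduced to one fixed unlucky order. After a crash, ordered mode can show the second update without the first, while journaled data mode only ever shows a prefix of the committed updates.

```python
# Toy model only (hypothetical, not ext3 code).
def overwrite_then_crash(mode, completed):
    """Overwrite block 1 and then block 2 of a three-block file, each in its
    own transaction, and crash after `completed` of the two updates have
    become durable. Returns the file contents after recovery."""
    disk = ["A0", "A1", "A2"]
    updates = [(1, "B1"), (2, "B2")]  # program order: block 1, then block 2
    if mode == "ordered":
        # In-place data writes may be reordered on their way to the disk;
        # here the later update happens to land first.
        landed = updates[::-1][:completed]
    elif mode == "journal":
        # Data is logged with the transaction and recovery replays committed
        # transactions in commit order, so only a prefix can survive.
        landed = updates[:completed]
    else:
        raise ValueError(mode)
    for blk, data in landed:
        disk[blk] = data
    return disk
```

With one update completed, ordered mode yields `["A0", "A1", "B2"]` (the later write without the earlier one), while journaled mode yields `["A0", "B1", "A2"]`.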

Cheers,
Stephen