Message-ID: <3DCC252F.65C0F70B@digeo.com>
Date: Fri, 08 Nov 2002 12:57:19 -0800
From: Andrew Morton <akpm@digeo.com>
MIME-Version: 1.0
To: Ross Biro <rossb@google.com>
CC: linux-kernel@vger.kernel.org
Subject: Re: [BUG] Failed writes marked clean?
References: <3DCC1EB5.4020303@google.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2524
Lines: 54

Ross Biro wrote:
> 
> Perhaps I'm reading the code incorrectly, but in kernel versions 2.4.18
> and 2.5.46 it looks to me like in the case of a write, ll_rw_block
> always clears the dirty bit.  In the event of an error, nothing resets
> the dirty bit and the uptodate flag is cleared.  This means that if the
> same block needs to be read again, the buffer cache will see that the
> buffer is not uptodate and attempt to read the old contents of the
> buffer off of the device.  If the read succeeds the kernel ends up
> corrupting data.

That's correct, for metadata.  It may not be fully accurate for
file data, where the page state comes into play as well.

The handling of IO errors is very weird, especially for writes, and
it is poorly tested.  It needs a big revamp and testing.
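
To make the sequence described above concrete, here is a minimal
user-space sketch of it.  It is a toy model, not kernel code: the
toy_buffer and toy_disk types and the toy_write() and toy_read()
helpers are hypothetical names invented for illustration, and they only
mimic the dirty/uptodate transitions as described in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define BLOCK_SIZE 8

/* Toy stand-in for a buffer head; NOT the kernel's struct buffer_head. */
struct toy_buffer {
    bool dirty;
    bool uptodate;
    char data[BLOCK_SIZE];   /* in-memory contents */
};

/* Toy stand-in for the on-disk copy of the block. */
struct toy_disk {
    bool fail_writes;
    char block[BLOCK_SIZE];
};

/*
 * Models the write path as described: the dirty bit is cleared at
 * submission, and a failed write clears uptodate without restoring dirty.
 */
static int toy_write(struct toy_buffer *bh, struct toy_disk *disk)
{
    assert(bh->uptodate);            /* only valid data gets written out */
    bh->dirty = false;
    if (disk->fail_writes) {
        bh->uptodate = false;
        return -1;
    }
    memcpy(disk->block, bh->data, BLOCK_SIZE);
    return 0;
}

/* Models a later lookup: a buffer that is not uptodate is refilled from disk. */
static void toy_read(struct toy_buffer *bh, const struct toy_disk *disk)
{
    if (!bh->uptodate) {
        memcpy(bh->data, disk->block, BLOCK_SIZE);
        bh->uptodate = true;
    }
}
```

In this model, a failed toy_write() followed by toy_read() leaves the
buffer clean, uptodate and holding the old on-disk contents, so the new
data is silently lost.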

> It seems to me that a better solution would be to mark the buffer as
> dirty and uptodate and then attempt to propagate the error as far back
> as possible.  Ideally something can be done to correct the problem at a
> higher level.  Before I dive in and attempt to do something about this,
> I wanted to make sure I was not missing anything important.  So am I
> full of it, or could this really be a problem?
> 

Well, before going and changing stuff, we need to decide what to
change it _to_.  What do we want to happen if there's a read error?
And a write error?

For reads, it makes sense for the page/buffer to be left not uptodate,
and return an error.

For write errors, marking the page/buffer not uptodate doesn't make
a lot of sense.  Marking it clean makes sense if we're not going to retry
the write.  Marking it dirty, uptodate and unmapped would make sense
if we want to go and try a different part of the disk.  But it
doesn't make sense if the whole disk is dead.

Also, think about what a write error _means_.  Unless the disk is truly
ancient, it means that the device has run out of alternate space for
the block, or all writes are failing.  i.e., it is a serious failure.

So perhaps the appropriate strategy on write errors is to mark the
device readonly and to drop all write data on the floor.  That means
clean+mapped+uptodate.

So yes, I think I agree with myself.  Write errors should leave the
page/buffer clean, uptodate, mapped, PageError (whatever the latter
means...)
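
As a rough illustration of that end state, here is a small user-space
sketch.  Everything in it is an assumption made for the example: the
MODEL_* flags, the model_buffer and model_device types, and
model_end_write() are hypothetical and do not correspond to real kernel
interfaces.  MODEL_ERROR merely stands in for PageError.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state bits for the model; not the kernel's BH_* or PG_* flags. */
enum {
    MODEL_DIRTY    = 1 << 0,
    MODEL_UPTODATE = 1 << 1,
    MODEL_MAPPED   = 1 << 2,
    MODEL_ERROR    = 1 << 3,   /* stands in for PageError */
};

struct model_device { bool read_only; };
struct model_buffer { unsigned state; };

/*
 * Write completion under the policy proposed above: the buffer ends up
 * clean, uptodate and mapped whether or not the write succeeded.  On
 * failure the error is recorded and the device is marked read-only, so
 * the write data is dropped rather than retried.
 */
static void model_end_write(struct model_buffer *b,
                            struct model_device *dev, bool ok)
{
    assert(b->state & MODEL_MAPPED);   /* only mapped buffers are written */
    b->state &= ~MODEL_DIRTY;          /* no retry: leave it clean */
    b->state |= MODEL_UPTODATE;        /* the in-memory copy is still valid */
    if (!ok) {
        b->state |= MODEL_ERROR;
        dev->read_only = true;
    }
}
```

Because the buffer stays uptodate, a later read of the same block is
satisfied from memory instead of fetching stale contents from the
device.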