2002-04-13 23:00:14

by Stephen C. Tweedie

[permalink] [raw]
Subject: [RFC] Patch: aliasing bug in blockdev-in-pagecache?

Hi all,

I think there's a data-corruption possible in both ext2 and ext3 (and
in fact any filesystem which uses bread) if user space is reading from
the filesystem's buffered block device during live fs activity.

I've recently been seeing an ext3 assert failure where the filesystem
comes across a buffer which is unexpectedly locked, while running a
dump(8) on the live filesystem. Now, I do _not_ want to start another
debate about whether it's sane to do that or not! But the mounted
kernel filesystem should still survive if the user is only reading
from the buffered device.

The problem turns out to be in block_read_full_page(). The page cache
IO code assumes that IO is synchronised at the page level, but any
kernel code using getblk() (or sb_read() or related functions) to
access the buffers at the same time will bypass the page locking.

So if block_read_full_page() encounters a page which bread() is
already reading into cache, it will see the page unlocked but the
buffer_head locked and !uptodate. However, it ignores the bh lock,
seeing only that the buffer is !uptodate, and so it then locks the bh
itself and submits a read IO.

Unfortunately, if the bread() wins the race for the wait_on_buffer
after the initial IO, we can already have started to modify the buffer
contents by the time that block_read_full_page() starts the new IO.
So we read stale contents from disk on top of the modified contents in
cache.

To solve this, we really do need to have block_read_full_page() test
the uptodate state under protection of the buffer_head lock. We
already go through 3 stages in block_read_full_page(): gather the
buffers needing IO, then lock them, then submit the IO. To be safe,
we need a final test for buffer_uptodate() *after* we have locked the
required buffers.

I've verified that the scenario above is definitely happening, and I'm
currently testing the patch below. I'm using 2.4 for the testing
right now, but 2.5 seems to have the same problem at first glance.

Comments?

--Stephen


--- fs/buffer.c.~1~ Fri Apr 12 17:59:09 2002
+++ fs/buffer.c Sat Apr 13 21:09:36 2002
@@ -1902,9 +1902,14 @@
}

/* Stage 3: start the IO */
- for (i = 0; i < nr; i++)
- submit_bh(READ, arr[i]);
-
+ for (i = 0; i < nr; i++) {
+ struct buffer_head * bh = arr[i];
+ if (buffer_uptodate(bh))
+ end_buffer_io_async(bh, 1);
+ else
+ submit_bh(READ, bh);
+ }
+
return 0;
}


2002-04-13 23:16:39

by Andrew Morton

[permalink] [raw]
Subject: Re: [RFC] Patch: aliasing bug in blockdev-in-pagecache?

"Stephen C. Tweedie" wrote:
>
> ...
> --- fs/buffer.c.~1~ Fri Apr 12 17:59:09 2002
> +++ fs/buffer.c Sat Apr 13 21:09:36 2002
> @@ -1902,9 +1902,14 @@
> }
>
> /* Stage 3: start the IO */
> - for (i = 0; i < nr; i++)
> - submit_bh(READ, arr[i]);
> -
> + for (i = 0; i < nr; i++) {
> + struct buffer_head * bh = arr[i];
> + if (buffer_uptodate(bh))
> + end_buffer_io_async(bh, 1);
> + else
> + submit_bh(READ, bh);
> + }
> +
> return 0;
> }
>

Agree. I have debug checks for this in the 2.5 code and
they do not come out in normal use, so no probs there.

I'm starting to tighten all of this up in the patches
which I'm working on. Move the debug checks out of
ll_rw_block and into submit_bh. submit_bh will complain about

- writing non-uptodate buffer
- reading uptodate buffer
- reading dirty buffer
- reading or writing unmapped buffer
- anything else I can think of.

This means that a number of callers need to be changed
to set the bits correctly, which is pointless work.

But really, I think we need that formality - we keep on
making obscure mistakes in this area and the costs are
high.

-

2002-04-14 12:59:18

by Alexander Viro

[permalink] [raw]
Subject: Re: [RFC] Patch: aliasing bug in blockdev-in-pagecache?



On Sat, 13 Apr 2002, Stephen C. Tweedie wrote:

> To solve this, we really do need to have block_read_full_page() test
> the uptodate state under protection of the buffer_head lock. We
> already go through 3 stages in block_read_full_page(): gather the
> buffers needing IO, then lock them, then submit the IO. To be safe,
> we need a final test for buffer_uptodate() *after* we have locked the
> required buffers.

Ouch.

I suspect that correct fix is to do that test in submit_bh() itself
(and remove it from ll_rw_block()). IMO it's cleaner than messing
with all callers out there... Linus?

2002-04-14 13:09:12

by Stephen C. Tweedie

[permalink] [raw]
Subject: Re: [RFC] Patch: aliasing bug in blockdev-in-pagecache?

Hi,

On Sun, Apr 14, 2002 at 08:59:15AM -0400, Alexander Viro wrote:

> On Sat, 13 Apr 2002, Stephen C. Tweedie wrote:
>
> > To solve this, we really do need to have block_read_full_page() test
> > the uptodate state under protection of the buffer_head lock. We
> > already go through 3 stages in block_read_full_page(): gather the
> > buffers needing IO, then lock them, then submit the IO. To be safe,
> > we need a final test for buffer_uptodate() *after* we have locked the
> > required buffers.
>
> Ouch.
>
> I suspect that correct fix is to do that test in submit_bh() itself
> (and remove it from ll_rw_block()). IMO it's cleaner than messing
> with all callers out there... Linus?

Actually, if we move the test to submit_bh(), we _do_ need to mess
with all callers.

submit_bh() is currently an unconditional demand to perform a given IO
regardless of buffer state, so lots of callers (ext3 journal writes,
soft raid mirror writes etc) call it with private buffer_heads which
just don't have any persistent dirty / uptodate state. Changing
submit_bh() still means we need to audit all callers to make sure that
they set up those state flags correctly even for private bh'es.

Cheers,
Stephen

2002-04-16 18:02:00

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: [RFC] Patch: aliasing bug in blockdev-in-pagecache?

On Sat, Apr 13, 2002 at 11:59:48PM +0100, Stephen C. Tweedie wrote:
> --- fs/buffer.c.~1~ Fri Apr 12 17:59:09 2002
> +++ fs/buffer.c Sat Apr 13 21:09:36 2002
> @@ -1902,9 +1902,14 @@
> }
>
> /* Stage 3: start the IO */
> - for (i = 0; i < nr; i++)
> - submit_bh(READ, arr[i]);
> -
> + for (i = 0; i < nr; i++) {
> + struct buffer_head * bh = arr[i];
> + if (buffer_uptodate(bh))
> + end_buffer_io_async(bh, 1);
> + else
> + submit_bh(READ, bh);
> + }
> +
> return 0;
> }

looks fine from my part too, thanks!

Andrea