LinuxLists.cc - [patch] fix race in __block_prepare

2005-04-21 06:18:29

by Nick Piggin

[permalink] [raw]

Subject: [patch] fix race in __block_prepare_write (again)

... I somehow didn't send it to Andrew last time.

--
SUSE Labs, Novell Inc.

Attachments:

__block_prepare_write-bug.patch (1.12 kB)

2005-04-21 07:01:32

by Anton Altaparmakov

[permalink] [raw]

Subject: Re: [patch] fix race in __block_prepare_write (again)

Hi,

On Thu, 21 Apr 2005, Nick Piggin wrote:
> ... I somehow didn't send it to Andrew last time.
>
> Fix a race where __block_prepare_write can leak out an in-flight
> read against a bh if get_block returns an error. This can lead to
> the page becoming unlocked while the buffer is locked and the read
> still in flight. __mpage_writepage BUGs on this condition.
[snip]
> --- linux-2.6.orig/fs/buffer.c 2005-04-21 11:55:17.549614278
+1000
> +++ linux-2.6/fs/buffer.c 2005-04-21 15:55:41.483826075 +1000
> @@ -1988,6 +1988,7 @@
> *wait_bh++=bh;
> }
> }
> +out:
> /*
> * If we issued read requests - let them complete.
> */
> @@ -1996,8 +1997,9 @@
> if (!buffer_uptodate(*wait_bh))
> return -EIO;
> }
> - return 0;
> -out:
> + if (!err)
> + return err;
> +
> /*
> * Zero out any newly allocated blocks to avoid exposing stale
> * data. If BH_New is set, we know that the block was newly

Any reason why you left the goto out? It would be IMO much cleaner to
remove the label "out" altogether and replace the single "goto out" with a
"break" (which is fine since the goto happens inside the for loop
immediately after which you place the label.)

Best regards,

Anton
--
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/

2005-04-21 07:10:21

by Anton Altaparmakov

[permalink] [raw]

Subject: Re: [patch] fix race in __block_prepare_write (again)

And one more thing...

On Thu, 2005-04-21 at 08:01 +0100, Anton Altaparmakov wrote:
> On Thu, 21 Apr 2005, Nick Piggin wrote:
> > ... I somehow didn't send it to Andrew last time.
> >
> > Fix a race where __block_prepare_write can leak out an in-flight
> > read against a bh if get_block returns an error. This can lead to
> > the page becoming unlocked while the buffer is locked and the read
> > still in flight. __mpage_writepage BUGs on this condition.
> [snip]
> > --- linux-2.6.orig/fs/buffer.c 2005-04-21 11:55:17.549614278
> +1000
> > +++ linux-2.6/fs/buffer.c 2005-04-21 15:55:41.483826075 +1000
> > @@ -1988,6 +1988,7 @@
> > *wait_bh++=bh;
> > }
> > }
> > +out:
> > /*
> > * If we issued read requests - let them complete.
> > */
> > @@ -1996,8 +1997,9 @@
> > if (!buffer_uptodate(*wait_bh))
> > return -EIO;

This return is now wrong after your patch. It should be "err = -EIO;"
otherwise you do not zero newly allocated blocks and thus risk exposing
stale data on buffer i/o errors.

> > }
> > - return 0;
> > -out:
> > + if (!err)
> > + return err;
> > +
> > /*
> > * Zero out any newly allocated blocks to avoid exposing stale
> > * data. If BH_New is set, we know that the block was newly
>
> Any reason why you left the goto out? It would be IMO much cleaner to
> remove the label "out" altogether and replace the single "goto out" with a
> "break" (which is fine since the goto happens inside the for loop
> immediately after which you place the label.)

Best regards,

Anton
--
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/

2005-04-21 07:13:37

by Nick Piggin

[permalink] [raw]

Subject: Re: [patch] fix race in __block_prepare_write (again)

On Thu, 2005-04-21 at 08:01 +0100, Anton Altaparmakov wrote:

> Any reason why you left the goto out? It would be IMO much cleaner to
> remove the label "out" altogether and replace the single "goto out" with a
> "break" (which is fine since the goto happens inside the for loop
> immediately after which you place the label.)
>

No reason at all ;)

--
SUSE Labs, Novell Inc.

2005-04-21 07:21:29

by Nick Piggin

[permalink] [raw]

Subject: Re: [patch] fix race in __block_prepare_write (again)

On Thu, 2005-04-21 at 08:10 +0100, Anton Altaparmakov wrote:
> And one more thing...
>
> On Thu, 2005-04-21 at 08:01 +0100, Anton Altaparmakov wrote:
> > On Thu, 21 Apr 2005, Nick Piggin wrote:
> > > ... I somehow didn't send it to Andrew last time.
> > >
> > > Fix a race where __block_prepare_write can leak out an in-flight
> > > read against a bh if get_block returns an error. This can lead to
> > > the page becoming unlocked while the buffer is locked and the read
> > > still in flight. __mpage_writepage BUGs on this condition.
> > [snip]
> > > --- linux-2.6.orig/fs/buffer.c 2005-04-21 11:55:17.549614278
> > +1000
> > > +++ linux-2.6/fs/buffer.c 2005-04-21 15:55:41.483826075 +1000
> > > @@ -1988,6 +1988,7 @@
> > > *wait_bh++=bh;
> > > }
> > > }
> > > +out:
> > > /*
> > > * If we issued read requests - let them complete.
> > > */
> > > @@ -1996,8 +1997,9 @@
> > > if (!buffer_uptodate(*wait_bh))
> > > return -EIO;
>
> This return is now wrong after your patch. It should be "err = -EIO;"
> otherwise you do not zero newly allocated blocks and thus risk exposing
> stale data on buffer i/o errors.
>

Hmm yeah I should have been more careful. But isn't that another bug? I
mean, wasn't that wrong *before* my patch as well?

It was, right? Because not only might it return without having waited
for all in-flight buffers, but it also didn't zero the blocks on errors?

Attachments:

__block_prepare_write-bug.patch (1.32 kB)

2005-04-21 08:43:57

by Anton Altaparmakov

[permalink] [raw]

Subject: Re: [patch] fix race in __block_prepare_write (again)

On Thu, 2005-04-21 at 17:20 +1000, Nick Piggin wrote:
> On Thu, 2005-04-21 at 08:10 +0100, Anton Altaparmakov wrote:
> > And one more thing...
> >
> > On Thu, 2005-04-21 at 08:01 +0100, Anton Altaparmakov wrote:
> > > On Thu, 21 Apr 2005, Nick Piggin wrote:
> > > > ... I somehow didn't send it to Andrew last time.
> > > >
> > > > Fix a race where __block_prepare_write can leak out an in-flight
> > > > read against a bh if get_block returns an error. This can lead to
> > > > the page becoming unlocked while the buffer is locked and the read
> > > > still in flight. __mpage_writepage BUGs on this condition.
> > > [snip]
> > > > --- linux-2.6.orig/fs/buffer.c 2005-04-21 11:55:17.549614278
> > > +1000
> > > > +++ linux-2.6/fs/buffer.c 2005-04-21 15:55:41.483826075 +1000
> > > > @@ -1988,6 +1988,7 @@
> > > > *wait_bh++=bh;
> > > > }
> > > > }
> > > > +out:
> > > > /*
> > > > * If we issued read requests - let them complete.
> > > > */
> > > > @@ -1996,8 +1997,9 @@
> > > > if (!buffer_uptodate(*wait_bh))
> > > > return -EIO;
> >
> > This return is now wrong after your patch. It should be "err = -EIO;"
> > otherwise you do not zero newly allocated blocks and thus risk exposing
> > stale data on buffer i/o errors.
> >
>
> Hmm yeah I should have been more careful. But isn't that another bug? I
> mean, wasn't that wrong *before* my patch as well?
>
> It was, right? Because not only might it return without having waited
> for all in-flight buffers, but it also didn't zero the blocks on errors?

I agree with you. It was a bug. There are a lot more bugs in the
generic write code paths. I have been analysing the code quite
thoroughly because I am reimplementing it in NTFS and am shocked that a
number of bugs in the generic file write code paths have gone unnoticed
for ages (I guess since they only affect seldom traversed code paths).
When I have the time I will be cooking up patches but it might be a
while. And perhaps someone else will fix them before I get to them so
here are a couple of examples off the top of my head...

mm/filemap.c::file_buffered_write():

- It calls fault_in_pages_readable() which is completely bogus if
@nr_segs > 1. It needs to be replaced by a to be written
"fault_in_pages_readable_iovec()".

- It increments @buf even in the iovec case thus @buf can point to
random memory really quickly (in the iovec case) and then it calls
fault_in_pages_readable() on this random memory. Ouch...

Best regards,

Anton
--
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/

2005-04-22 15:16:45

by Nikita Danilov

[permalink] [raw]

Subject: Re: [patch] fix race in __block_prepare_write (again)

Anton Altaparmakov writes:

[...]

>
> mm/filemap.c::file_buffered_write():
>
> - It calls fault_in_pages_readable() which is completely bogus if
> @nr_segs > 1. It needs to be replaced by a to be written
> "fault_in_pages_readable_iovec()".

Which will be only marginally less bogus, because page(s) can be evicted
from the memory between fault_in_pages_readable*() and
__grab_cache_page() anyway.

[...]

> Best regards,
>
> Anton

Nikita.

2005-04-22 15:53:50

by Anton Altaparmakov

[permalink] [raw]

Subject: Re: [patch] fix race in __block_prepare_write (again)

On Fri, 2005-04-22 at 19:10 +0400, Nikita Danilov wrote:
> Anton Altaparmakov writes:
> > mm/filemap.c::file_buffered_write():
> >
> > - It calls fault_in_pages_readable() which is completely bogus if
> > @nr_segs > 1. It needs to be replaced by a to be written
> > "fault_in_pages_readable_iovec()".
>
> Which will be only marginally less bogus, because page(s) can be evicted
> from the memory between fault_in_pages_readable*() and
> __grab_cache_page() anyway.

That is true. But it does make the race condition smaller.

A better approach would be to lock the pages into memory via set page
reserved or something. Of course they will need unmarking straight
after and we would need to be careful to not unmark pages that were
marked reserved to start with.

Comments?

Best regards,

Anton
--
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/

2005-04-24 21:25:19

by Andrew Morton

[permalink] [raw]

Subject: Re: [patch] fix race in __block_prepare_write (again)

Anton Altaparmakov <[email protected]> wrote:
>
> mm/filemap.c::file_buffered_write():
>
> - It calls fault_in_pages_readable() which is completely bogus if
> @nr_segs > 1. It needs to be replaced by a to be written
> "fault_in_pages_readable_iovec()".
>
> - It increments @buf even in the iovec case thus @buf can point to
> random memory really quickly (in the iovec case) and then it calls
> fault_in_pages_readable() on this random memory. Ouch...

hmm, yes. Like this?

diff -puN mm/filemap.c~generic_file_buffered_write-fixes mm/filemap.c
--- 25/mm/filemap.c~generic_file_buffered_write-fixes 2005-04-24 14:18:58.445943000 -0700
+++ 25-akpm/mm/filemap.c 2005-04-24 14:20:21.995241576 -0700
@@ -1944,7 +1944,7 @@ generic_file_buffered_write(struct kiocb
buf = iov->iov_base + written;
else {
filemap_set_next_iovec(&cur_iov, &iov_base, written);
- buf = iov->iov_base + iov_base;
+ buf = cur_iov->iov_base + iov_base;
}

do {
@@ -2002,9 +2002,11 @@ generic_file_buffered_write(struct kiocb
count -= status;
pos += status;
buf += status;
- if (unlikely(nr_segs > 1))
+ if (unlikely(nr_segs > 1)) {
filemap_set_next_iovec(&cur_iov,
&iov_base, status);
+ buf = cur_iov->iov_base + iov_base;
+ }
}
}
if (unlikely(copied != bytes))
_

2005-04-25 07:45:46

by Anton Altaparmakov

[permalink] [raw]

Subject: Re: [patch] fix race in __block_prepare_write (again)

On Sun, 2005-04-24 at 14:24 -0700, Andrew Morton wrote:
> Anton Altaparmakov <[email protected]> wrote:
> >
> > mm/filemap.c::file_buffered_write():
> >
> > - It calls fault_in_pages_readable() which is completely bogus if
> > @nr_segs > 1. It needs to be replaced by a to be written
> > "fault_in_pages_readable_iovec()".
> >
> > - It increments @buf even in the iovec case thus @buf can point to
> > random memory really quickly (in the iovec case) and then it calls
> > fault_in_pages_readable() on this random memory. Ouch...
>
> hmm, yes. Like this?

Yes, certainly a big improvement over what is there at the moment.

> diff -puN mm/filemap.c~generic_file_buffered_write-fixes mm/filemap.c
> --- 25/mm/filemap.c~generic_file_buffered_write-fixes 2005-04-24 14:18:58.445943000 -0700
> +++ 25-akpm/mm/filemap.c 2005-04-24 14:20:21.995241576 -0700
> @@ -1944,7 +1944,7 @@ generic_file_buffered_write(struct kiocb
> buf = iov->iov_base + written;
> else {
> filemap_set_next_iovec(&cur_iov, &iov_base, written);
> - buf = iov->iov_base + iov_base;
> + buf = cur_iov->iov_base + iov_base;
> }
>
> do {
> @@ -2002,9 +2002,11 @@ generic_file_buffered_write(struct kiocb
> count -= status;
> pos += status;
> buf += status;
> - if (unlikely(nr_segs > 1))
> + if (unlikely(nr_segs > 1)) {
> filemap_set_next_iovec(&cur_iov,
> &iov_base, status);
> + buf = cur_iov->iov_base + iov_base;
> + }
> }
> }
> if (unlikely(copied != bytes))

Best regards,

Anton
--
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/