LinuxLists.cc - [PATCH] SMP race in ext2

[permalink] [raw]

Subject: Re: [PATCH] SMP race in ext2 - metadata corruption.

On Thu, Apr 26, 2001 at 11:45:47AM -0400, Alexander Viro wrote:
> Ext2 does getblk+wait_on_buffer for new metadata blocks before
> filling them with zeroes. While that is enough for single-processor,
> on SMP we have the following race:
>
> getblk gives us unlocked, non-uptodate bh
> wait_on_buffer() does nothing
> read from device locks it and starts IO
> we zero it out.
> on-disk data overwrites our zeroes.
> we mark it dirty
> bdflush writes the old data (_not_ zeroes) back to disk.
>
> Result: crap in metadata block. Proposed fix: lock_buffer()/unlock_buffer()
> around memset()/mark_buffer_uptodate() instead of wait_on_buffer() before
> them.
>
> Patch against 2.4.4-pre7 follows. Please, apply.

correct. I bet other fs are affected as well btw.

Andrea

2001-04-26 18:26:06

[permalink] [raw]

Subject: Re: [PATCH] SMP race in ext2 - metadata corruption.

On Thu, 26 Apr 2001, Andrea Arcangeli wrote:

> correct. I bet other fs are affected as well btw.

If only... block_read() vs. block_write() has the same race. I'm going
through the list of all wait_on_buffer() users right now.

2001-04-26 18:52:32

[permalink] [raw]

Subject: Re: [PATCH] SMP race in ext2 - metadata corruption.

On Thu, Apr 26, 2001 at 11:45:47AM -0400, Alexander Viro wrote:
>
> Ext2 does getblk+wait_on_buffer for new metadata blocks before
> filling them with zeroes. While that is enough for single-processor,
> on SMP we have the following race:
>
> getblk gives us unlocked, non-uptodate bh
> wait_on_buffer() does nothing
> read from device locks it and starts IO

I see the race, but I don't see how you can actually trigger it.

Exactly _who_ does the "read from device" part? Somebody doing a
"fsck" while the filesystem is mounted read-write and actively written
to? Yeah, you'd get disk corruption that way, but you'll get it regardless
of this bug.

There's nothing else that should be using that block at that stage. And if
there were, that would be a bug in itself, as far as I can tell. We've
just allocated it, and we're the only and exclusive owners of that block
on the disk. Anybody else who touches it is seriously broken.

Now, I don't disagree with your patch (it's just obviously cleaner to lock
it properly), but I don't think this is a real bug. I suspect that even
the wait-on-buffer is not strictly necessary: it's probably there to make
sure old write-backs have completed, but that doesn't really matter
either.

We used to have "breada()" do physical read-ahead that could have
triggered this, but we've long since gotten rid of that.

Or am I overlooking something?

Linus

2001-04-26 19:06:13

by Chris Mason

[permalink] [raw]

Subject: Re: [PATCH] SMP race in ext2 - metadata corruption.

On Thursday, April 26, 2001 02:24:26 PM -0400 Alexander Viro
<[email protected]> wrote:

>
>
> On Thu, 26 Apr 2001, Andrea Arcangeli wrote:
>
>> correct. I bet other fs are affected as well btw.
>
> If only... block_read() vs. block_write() has the same race. I'm going
> through the list of all wait_on_buffer() users right now.
>

Looks like reiserfs has it too when allocating tree blocks, but it should
be harder to hit. The fix should be small but it will take me a bit to
make sure it doesn't affect the rest of the balancing code.

-chris

2001-04-26 19:10:03

[permalink] [raw]

Subject: Re: [PATCH] SMP race in ext2 - metadata corruption.

On Thu, 26 Apr 2001, Linus Torvalds wrote:

> I see the race, but I don't see how you can actually trigger it.
>
> Exactly _who_ does the "read from device" part? Somebody doing a
> "fsck" while the filesystem is mounted read-write and actively written
> to? Yeah, you'd get disk corruption that way, but you'll get it regardless
> of this bug.

> There's nothing else that should be using that block at that stage. And if
> there were, that would be a bug in itself, as far as I can tell. We've
> just allocated it, and we're the only and exclusive owners of that block
> on the disk. Anybody else who touches it is seriously broken.

> Now, I don't disagree with your patch (it's just obviously cleaner to lock
> it properly), but I don't think this is a real bug. I suspect that even
> the wait-on-buffer is not strictly necessary: it's probably there to make
> sure old write-backs have completed, but that doesn't really matter
> either.
>
> We used to have "breada()" do physical read-ahead that could have
> triggered this, but we've long since gotten rid of that.
>
> Or am I overlooking something?

Somebody doing dd(1) _from_ that disk. Sure, he's bound to get crap.
But I really don't think that opening device for read should be able
to affect its contents in any way.

BTW, same race exists between block_read() and block_write(). And that
one is even more obviously wrong:

xterm A: xterm B:

dd if=/dev/hda of=/dev/hdb dd if=/dev/hdb of=/dev/null

result: some blocks on hdb retaining their old contents.

IMO "no matter what you read, you don't affect the contents" is a good
general principle. Sure, you can get crap if you read in the middle of
write. That's expected and sane. However, the final contents of file
depends only on the things done by writers.

Al

2001-04-26 19:19:44

[permalink] [raw]

Subject: Re: [PATCH] SMP race in ext2 - metadata corruption.

On Thu, 26 Apr 2001, I wrote:

> On Thu, 26 Apr 2001, Linus Torvalds wrote:
>
> > I see the race, but I don't see how you can actually trigger it.
> >
> > Exactly _who_ does the "read from device" part? Somebody doing a
> > "fsck" while the filesystem is mounted read-write and actively written
> > to? Yeah, you'd get disk corruption that way, but you'll get it regardless
> > of this bug.

OK, I think I've a better explanation now:

Suppose /dev/hda1 is owned by root.disks and permissions are 640.
It is mounted read-write.

Process foo belongs to pfy.staff. PFY is included into disks, but doesn't
have root. I claim that he should be unable to cause fs corruption on
/dev/hda1.

Currently foo _can_ cause such corruption, even though it has nothing
resembling write permissions for device in question.

IMO it is wrong. I'm not saying that it's a real security problem. I'm
not saying that PFY is not idiot or that his actions make any sense.
However, I think that situation when he can do that without write
access to device is just plain wrong.

Does the above make sense?
Al

2001-04-26 19:22:26

[permalink] [raw]

Subject: Re: [PATCH] SMP race in ext2 - metadata corruption.

On Thu, Apr 26, 2001 at 11:49:14AM -0700, Linus Torvalds wrote:
>
> On Thu, Apr 26, 2001 at 11:45:47AM -0400, Alexander Viro wrote:
> >
> > Ext2 does getblk+wait_on_buffer for new metadata blocks before
> > filling them with zeroes. While that is enough for single-processor,
> > on SMP we have the following race:
> >
> > getblk gives us unlocked, non-uptodate bh
> > wait_on_buffer() does nothing
> > read from device locks it and starts IO
>
> I see the race, but I don't see how you can actually trigger it.
>
> Exactly _who_ does the "read from device" part? Somebody doing a

/sbin/dump

> the wait-on-buffer is not strictly necessary: it's probably there to make

maybe not but I need to check some more bit to be sure.

Andrea

2001-04-26 19:35:46

[permalink] [raw]

Subject: Re: [PATCH] SMP race in ext2 - metadata corruption.

On Thu, 26 Apr 2001, Andrea Arcangeli wrote:

> > the wait-on-buffer is not strictly necessary: it's probably there to make
>
> maybe not but I need to check some more bit to be sure.

Same scenario, but with read-in-progress started before we do getblk(). BTW,
old writeback is harmless - we will overwrite anyway. And _that_ can happen
without direct access to device - truncate() doesn't terminate writeout of
the indirect blocks it frees (IMO it should, but that's another story).

2001-04-26 19:45:15

[permalink] [raw]

Subject: Re: [PATCH] SMP race in ext2 - metadata corruption.

On Thu, Apr 26, 2001 at 09:15:57PM +0200, Andrea Arcangeli wrote:
> maybe not but I need to check some more bit to be sure.

yes we probably don't need it for fs against fs in 2.4 because we make
the new metadata block visible to a reader (splice) only after they're
all uptodate and all directory operations are serialized by the vfs
(with the i_zombie).

Andrea

2001-04-26 19:51:06

[permalink] [raw]

Subject: Re: [PATCH] SMP race in ext2 - metadata corruption.

On Thu, Apr 26, 2001 at 03:34:00PM -0400, Alexander Viro wrote:
> Same scenario, but with read-in-progress started before we do getblk(). BTW,

how can the read in progress see a branch that we didn't spliced yet? We
clear and mark uptodate the new part of the branch before it's visible
to any reader no? then in splice we write the key into the where->p and
the branch become visible to the readers but by that time the reader
won't start I/O because the buffer are just uptodate. I only had a short
look now and to verify Ingo's fix, so maybe I overlooked something.

> without direct access to device - truncate() doesn't terminate writeout of
> the indirect blocks it frees (IMO it should, but that's another story).

If the block is under I/O or dirty that's another story, the only issue
here is when the buffer block is new and not uptodate.

Andrea

2001-04-26 19:57:16

[permalink] [raw]

Subject: Re: [PATCH] SMP race in ext2 - metadata corruption.

On Thu, 26 Apr 2001, Andrea Arcangeli wrote:

> On Thu, Apr 26, 2001 at 03:34:00PM -0400, Alexander Viro wrote:
> > Same scenario, but with read-in-progress started before we do getblk(). BTW,
>
> how can the read in progress see a branch that we didn't spliced yet? We

fd = open("/dev/hda1", O_RDONLY);
read(fd, buf, sizeof(buf));

2001-04-26 20:11:55

[permalink] [raw]

Subject: Re: [PATCH] SMP race in ext2 - metadata corruption.

On Thu, 26 Apr 2001, Alexander Viro wrote:
> On Thu, 26 Apr 2001, Andrea Arcangeli wrote:
> >
> > how can the read in progress see a branch that we didn't spliced yet? We
>
> fd = open("/dev/hda1", O_RDONLY);
> read(fd, buf, sizeof(buf));

Note that I think all these arguments are fairly bogus. Doing things like
"dump" on a live filesystem is stupid and dangerous (in my opinion it is
stupid and dangerous to use "dump" at _all_, but that's a whole 'nother
discussion in itself), and there really are no valid uses for opening a
block device that is already mounted. More importantly, I don't think
anybody actually does.

The fact that you _can_ do so makes the patch valid, and I do agree with
Al on the "least surprise" issue. I've already applied the patch, in fact.
But the fact is that nobody should ever do the thing that could cause
problems.

Linus

2001-04-26 20:12:45

[permalink] [raw]

Subject: Re: [PATCH] SMP race in ext2 - metadata corruption.

On Thu, Apr 26, 2001 at 03:17:54PM -0400, Alexander Viro wrote:
>
>
> On Thu, 26 Apr 2001, I wrote:
>
> > On Thu, 26 Apr 2001, Linus Torvalds wrote:
> >
> > > I see the race, but I don't see how you can actually trigger it.
> > >
> > > Exactly _who_ does the "read from device" part? Somebody doing a
> > > "fsck" while the filesystem is mounted read-write and actively written
> > > to? Yeah, you'd get disk corruption that way, but you'll get it regardless
> > > of this bug.
>
> OK, I think I've a better explanation now:
>
> Suppose /dev/hda1 is owned by root.disks and permissions are 640.
> It is mounted read-write.
>
> Process foo belongs to pfy.staff. PFY is included into disks, but doesn't
> have root. I claim that he should be unable to cause fs corruption on
> /dev/hda1.
>
> Currently foo _can_ cause such corruption, even though it has nothing
> resembling write permissions for device in question.
>
> IMO it is wrong. I'm not saying that it's a real security problem. I'm
> not saying that PFY is not idiot or that his actions make any sense.
> However, I think that situation when he can do that without write
> access to device is just plain wrong.
>
> Does the above make sense?

Sure. And as said `dump` has the same issues.

Andrea

2001-04-26 20:44:47

by Richard B. Johnson

[permalink] [raw]

Subject: Re: [PATCH] SMP race in ext2 - metadata corruption.

On Thu, 26 Apr 2001, Alexander Viro wrote:

>
>
> On Thu, 26 Apr 2001, Andrea Arcangeli wrote:
>
> > > the wait-on-buffer is not strictly necessary: it's probably there to make
> >
> > maybe not but I need to check some more bit to be sure.
>
> Same scenario, but with read-in-progress started before we do getblk(). BTW,
> old writeback is harmless - we will overwrite anyway. And _that_ can happen
> without direct access to device - truncate() doesn't terminate writeout of
> the indirect blocks it frees (IMO it should, but that's another story).
>

This seems to be the problem reported about a year ago, but never fixed.
It exists, even in early kernels.

mke2fs /dev/fd0
mount /dev/fd0 /mnt
cp stuff /mnt

lilo -C - <<EOF
boot = /dev/fd0
map = /mnt/map
backup = /dev/null
install=/mnt/boot.b
image=/mnt/vmlinuz
initrd=/mnt/initrd
root=/dev/ram0
EOF

umount /dev/fd0
cp /dev/fd0 raw.bin

The disk image, raw.bin, does NOT contain the image of the floppy.
Most of boot stuff added by lilo is missing. It will eventually
get there, but it's not there now, even though the floppy was
un-mounted!

A work-around was to do:

ioctl(fd, FDFLUSH, NULL);

... from a program before copying the image.

Cheers,
Dick Johnson

Penguin : Linux version 2.4.1 on an i686 machine (799.53 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.

2001-04-26 20:44:07

[permalink] [raw]

Subject: Re: [PATCH] SMP race in ext2 - metadata corruption.

On Thu, 26 Apr 2001, Andrea Arcangeli wrote:
>
> What I'm saying above is that even without the wait_on_buffer ext2 can
> screwup itself because the splice happens after the buffer are just all
> uptodate so any "reader" (I mean any reader through ext2 not through
> block_dev) will never try to do a bread on that blocks before they're
> just zeroed and uptodate.

I assume you meant "..can _not_ screw up itself..", otherwise the rest of
the sentence doesn't seem to make much sense.

Linus

2001-04-26 20:51:17

[permalink] [raw]

Subject: Re: [PATCH] SMP race in ext2 - metadata corruption.

On Thu, 26 Apr 2001, Linus Torvalds wrote:

> Note that I think all these arguments are fairly bogus. Doing things like
> "dump" on a live filesystem is stupid and dangerous (in my opinion it is
> stupid and dangerous to use "dump" at _all_, but that's a whole 'nother
> discussion in itself), and there really are no valid uses for opening a
> block device that is already mounted. More importantly, I don't think
> anybody actually does.

Agreed.

> The fact that you _can_ do so makes the patch valid, and I do agree with
> Al on the "least surprise" issue. I've already applied the patch, in fact.
> But the fact is that nobody should ever do the thing that could cause
> problems.

IMO the real issue is in fuzzy rules for use of wait_on_buffer(). There is
one case when it's 100% correct - when we had done ll_rw_block() on that
bh at earlier point and want to make sure that it's completed. And there
is a lot of uses that are kinda-sorta correct for UP, but break on SMP.
unmap_buffer() was similar to that race. So are races in minix, sysvfs and
ufs. So is one in block_write() and here the problem is quite real - it's
not as idiotic as device/mounted fs races.

Basically, all legitimate cases are ones where we would be very unhappy
about buffer being not uptodate afterwards.
getblk(); if (!buffer_uptodate) wait_on_buffer();
is not in that class. It _is_ OK on UP as long as we don't block, but on
SMP it doesn't guarantee anything - buffer_head can be in any state
when we are done. IMO all such places require fixing.

How about adding
if (!buffer_uptodate(bh)) {
printk(KERN_ERR "IO error or racy use of wait_on_buffer()");
show_task(current);
}
in the end of wait_on_buffer() for a while?
Al

2001-04-26 20:57:17

[permalink] [raw]

Subject: Re: [PATCH] SMP race in ext2 - metadata corruption.

On Thu, 26 Apr 2001, Richard B. Johnson wrote:

> The disk image, raw.bin, does NOT contain the image of the floppy.
> Most of boot stuff added by lilo is missing. It will eventually
> get there, but it's not there now, even though the floppy was
> un-mounted!

I doubt that you can reproduce that on anything remotely recent.
All buffers are flushed when last user closes device.

2001-04-26 21:01:48

[permalink] [raw]

Subject: Re: [PATCH] SMP race in ext2 - metadata corruption.

On Thu, Apr 26, 2001 at 01:08:25PM -0700, Linus Torvalds wrote:
> But the fact is that nobody should ever do the thing that could cause
> problems.

dump in 2.4 also gets uncoherent view of the data which make things even
worse than in 2.2 (to change that we should hash in the buffer hashtable
all the bh overlapped in the pagecache and no I'm not suggesting that
relax). The only reason it has a chance to work with ext2 is because
ext2 is very dumb and it misses an inode map and in turn inodes are at a
predictable location on disk so it cannot run totally out of control.

Andrea

2001-04-26 21:01:48

[permalink] [raw]

Subject: Re: [PATCH] SMP race in ext2 - metadata corruption.

On Thu, Apr 26, 2001 at 03:55:19PM -0400, Alexander Viro wrote:
>
>
> On Thu, 26 Apr 2001, Andrea Arcangeli wrote:
>
> > On Thu, Apr 26, 2001 at 03:34:00PM -0400, Alexander Viro wrote:
> > > Same scenario, but with read-in-progress started before we do getblk(). BTW,
> >
> > how can the read in progress see a branch that we didn't spliced yet? We
>
> fd = open("/dev/hda1", O_RDONLY);
> read(fd, buf, sizeof(buf));

You misunderstood the context of what I said, I perfectly know the race
you are talking about, I was answering Linus's question "the
wait_on_buffer isn't even necessary to protect ext2 against ext2". You
are talking about the other race that is "ext2" against "block_dev", and
I obviously agree on that one since the first place as I immediatly
answered you "correct".

What I'm saying above is that even without the wait_on_buffer ext2 can
screwup itself because the splice happens after the buffer are just all
uptodate so any "reader" (I mean any reader through ext2 not through
block_dev) will never try to do a bread on that blocks before they're
just zeroed and uptodate.

Andrea

2001-04-26 21:01:44

[permalink] [raw]

Subject: Re: [PATCH] SMP race in ext2 - metadata corruption.

On Thu, Apr 26, 2001 at 10:11:09PM +0200, Andrea Arcangeli wrote:
> On Thu, Apr 26, 2001 at 03:55:19PM -0400, Alexander Viro wrote:
> >
> >
> > On Thu, 26 Apr 2001, Andrea Arcangeli wrote:
> >
> > > On Thu, Apr 26, 2001 at 03:34:00PM -0400, Alexander Viro wrote:
> > > > Same scenario, but with read-in-progress started before we do getblk(). BTW,
> > >
> > > how can the read in progress see a branch that we didn't spliced yet? We
> >
> > fd = open("/dev/hda1", O_RDONLY);
> > read(fd, buf, sizeof(buf));
>
> You misunderstood the context of what I said, I perfectly know the race
> you are talking about, I was answering Linus's question "the
> wait_on_buffer isn't even necessary to protect ext2 against ext2". You
> are talking about the other race that is "ext2" against "block_dev", and
> I obviously agree on that one since the first place as I immediatly
> answered you "correct".
>
> What I'm saying above is that even without the wait_on_buffer ext2 can
^^^ "cannot" of course
> screwup itself because the splice happens after the buffer are just all
> uptodate so any "reader" (I mean any reader through ext2 not through
> block_dev) will never try to do a bread on that blocks before they're
> just zeroed and uptodate.
>
> Andrea

Andrea

2001-04-26 21:13:37

[permalink] [raw]

Subject: Re: [PATCH] SMP race in ext2 - metadata corruption.

On Thu, Apr 26, 2001 at 04:49:20PM -0400, Alexander Viro wrote:
> getblk(); if (!buffer_uptodate) wait_on_buffer();
> is not in that class. It _is_ OK on UP as long as we don't block, but on
> SMP it doesn't guarantee anything - buffer_head can be in any state
> when we are done. IMO all such places require fixing.

Yes, actually it's probably ok for most of other "fs" against "fs" cases
because those fses still hold the big lock while handling metadata but
they should really use the lock_buffer way if they want to protect
against the block_dev accesses too.

> How about adding
> if (!buffer_uptodate(bh)) {
> printk(KERN_ERR "IO error or racy use of wait_on_buffer()");
> show_task(current);
> }
> in the end of wait_on_buffer() for a while?

At the _top_ of wait_on_buffer would be better then at the end.

Andrea

2001-04-26 21:16:42

[permalink] [raw]

Subject: Re: [PATCH] SMP race in ext2 - metadata corruption.

On Thu, Apr 26, 2001 at 01:26:15PM -0700, Linus Torvalds wrote:
>
>
> On Thu, 26 Apr 2001, Andrea Arcangeli wrote:
> >
> > What I'm saying above is that even without the wait_on_buffer ext2 can
> > screwup itself because the splice happens after the buffer are just all
> > uptodate so any "reader" (I mean any reader through ext2 not through
> > block_dev) will never try to do a bread on that blocks before they're
> > just zeroed and uptodate.
>
> I assume you meant "..can _not_ screw up itself..", otherwise the rest of

yes, it was a typo sorry.

Andrea

2001-04-26 21:23:17

by Andrzej Krzysztofowicz

[permalink] [raw]

Subject: Re: [PATCH] SMP race in ext2 - metadata corruption.

>
> Note that I think all these arguments are fairly bogus. Doing things like
> "dump" on a live filesystem is stupid and dangerous (in my opinion it is
> stupid and dangerous to use "dump" at _all_, but that's a whole 'nother
> discussion in itself), and there really are no valid uses for opening a
> block device that is already mounted. More importantly, I don't think
> anybody actually does.

I know a few people that often do:

dd if=/dev/hda1 of=/dev/hdc1
e2fsck /dev/hdc1

to make an "exact" copy of a currently working system.

Maybe it is stupid, but they do.
Fortunately, their systems are not SMP...

Andrzej

2001-04-26 23:27:15

[permalink] [raw]

Subject: Re: [PATCH] SMP race in ext2 - metadata corruption.

On Thu, 26 Apr 2001, Andrea Arcangeli wrote:

> > How about adding
> > if (!buffer_uptodate(bh)) {
> > printk(KERN_ERR "IO error or racy use of wait_on_buffer()");
> > show_task(current);
> > }
> > in the end of wait_on_buffer() for a while?
>
> At the _top_ of wait_on_buffer would be better then at the end.

In that case ll_rw_block() + wait_on_buffer() (absolutely legitimate
combination) will scream at you.

2001-04-27 00:10:44

[permalink] [raw]

Subject: Re: [PATCH] SMP race in ext2 - metadata corruption.

On Thu, Apr 26, 2001 at 07:25:23PM -0400, Alexander Viro wrote:
>
>
> On Thu, 26 Apr 2001, Andrea Arcangeli wrote:
>
> > > How about adding
> > > if (!buffer_uptodate(bh)) {
> > > printk(KERN_ERR "IO error or racy use of wait_on_buffer()");
> > > show_task(current);
> > > }
> > > in the end of wait_on_buffer() for a while?
> >
> > At the _top_ of wait_on_buffer would be better then at the end.
>
> In that case ll_rw_block() + wait_on_buffer() (absolutely legitimate
> combination) will scream at you.

--- 2.4.4pre7/include/linux/locks.h Thu Apr 26 05:22:11 2001
+++ 2.4.4pre7aa1/include/linux/locks.h Fri Apr 27 01:52:31 2001
@@ -18,6 +18,11 @@
{
if (test_bit(BH_Lock, &bh->b_state))
__wait_on_buffer(bh);
+ else if (!buffer_uptodate(bh)) {
+ __label__ here;
+ here:
+ printk(KERN_ERR "IO error or racy use of wait_on_buffer() from %p\n", &&here);
+ }
}

extern inline void lock_buffer(struct buffer_head * bh)

Andrea

2001-04-27 00:23:04

[permalink] [raw]

Subject: Re: [PATCH] SMP race in ext2 - metadata corruption.

On Fri, 27 Apr 2001, Andrea Arcangeli wrote:
> + __label__ here;
> + here:
> + printk(KERN_ERR "IO error or racy use of wait_on_buffer() from %p\n", &&here);

Detail nit: don't do this. Use "current_text_addr()" instead. Simpler to
read, and gcc will actually do the right thing wrt inlining etc.

Linus

2001-04-27 00:42:06

[permalink] [raw]

Subject: Re: [PATCH] SMP race in ext2 - metadata corruption.

On Thu, Apr 26, 2001 at 05:19:26PM -0700, Linus Torvalds wrote:
> Detail nit: don't do this. Use "current_text_addr()" instead. Simpler to
> read, and gcc will actually do the right thing wrt inlining etc.

Agreed, thanks for the info.

Andrea

2001-04-27 11:45:22

by Richard B. Johnson

[permalink] [raw]

Subject: Re: [PATCH] SMP race in ext2 - metadata corruption.

On Thu, 26 Apr 2001, Alexander Viro wrote:

>
>
> On Thu, 26 Apr 2001, Richard B. Johnson wrote:
>
> > The disk image, raw.bin, does NOT contain the image of the floppy.
> > Most of boot stuff added by lilo is missing. It will eventually
> > get there, but it's not there now, even though the floppy was
> > un-mounted!
>
> I doubt that you can reproduce that on anything remotely recent.
> All buffers are flushed when last user closes device.
>
2.4.3

Buffers are not flushed (actually written) to disk. The floppy continues
to be written for 20 to 30 seconds after `umount` returns to
the shell. A program like `cp` , accessing the raw device during this time
does not get what will eventually be written.

Cheers,
Dick Johnson

Penguin : Linux version 2.4.1 on an i686 machine (799.53 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.

2001-04-27 13:05:35

by Vojtech Pavlik

[permalink] [raw]

Subject: Re: [PATCH] SMP race in ext2 - metadata corruption.

On Thu, Apr 26, 2001 at 01:08:25PM -0700, Linus Torvalds wrote:

> On Thu, 26 Apr 2001, Alexander Viro wrote:
> > On Thu, 26 Apr 2001, Andrea Arcangeli wrote:
> > >
> > > how can the read in progress see a branch that we didn't spliced yet? We
> >
> > fd = open("/dev/hda1", O_RDONLY);
> > read(fd, buf, sizeof(buf));
>
> Note that I think all these arguments are fairly bogus. Doing things like
> "dump" on a live filesystem is stupid and dangerous (in my opinion it is
> stupid and dangerous to use "dump" at _all_, but that's a whole 'nother
> discussion in itself), and there really are no valid uses for opening a
> block device that is already mounted. More importantly, I don't think
> anybody actually does.

Actually this is done quite often, even on mounted fs's:

hdparm -t /dev/hda

> The fact that you _can_ do so makes the patch valid, and I do agree with
> Al on the "least surprise" issue. I've already applied the patch, in fact.
> But the fact is that nobody should ever do the thing that could cause
> problems.

--
Vojtech Pavlik
SuSE Labs

2001-04-27 13:25:46

[permalink] [raw]

Subject: Re: [PATCH] SMP race in ext2 - metadata corruption.

On Fri, 27 Apr 2001, Vojtech Pavlik wrote:

> Actually this is done quite often, even on mounted fs's:
>
> hdparm -t /dev/hda

You would need either hdparm -t /dev/hda<something> or mounting the
whole /dev/hda.

Buffer cache for the disk is unrelated to buffer cache for parititions.

2001-04-27 14:30:36

by Andi Kleen

[permalink] [raw]

Subject: Re: [PATCH] SMP race in ext2 - metadata corruption.

On Thu, Apr 26, 2001 at 01:08:25PM -0700, Linus Torvalds wrote:
> Note that I think all these arguments are fairly bogus. Doing things like
> "dump" on a live filesystem is stupid and dangerous (in my opinion it is
> stupid and dangerous to use "dump" at _all_, but that's a whole 'nother
> discussion in itself), and there really are no valid uses for opening a
> block device that is already mounted. More importantly, I don't think
> anybody actually does.

You can use LVM snapshot volumes to do it safely.

-Andi

2001-04-27 14:33:16

by Ville Herva

[permalink] [raw]

Subject: Re: [PATCH] SMP race in ext2 - metadata corruption.

On Fri, Apr 27, 2001 at 09:23:57AM -0400, you [Alexander Viro] claimed:
>
>
> On Fri, 27 Apr 2001, Vojtech Pavlik wrote:
>
> > Actually this is done quite often, even on mounted fs's:
> >
> > hdparm -t /dev/hda
>
> You would need either hdparm -t /dev/hda<something> or mounting the
> whole /dev/hda.
>
> Buffer cache for the disk is unrelated to buffer cache for parititions.

Well, I for one have been running

hdparm -t /dev/md0
or
time head -c 1000m /dev/md0 > /dev/null

while /dev/md0 was mounted without realizing that this could be "stupid" or
that it could eat my data.

/dev/md0 on /backup-versioned type ext2 (rw)

I often cat(1) or head(1) partitions or devices (even mounted ones) if I
need dummy randomish test data for compression or tape drives (that I've
been having trouble with).

BTW: is 2.2 affected? 2.0?

-- v --

[email protected]

2001-04-27 16:53:10