2011-02-26 20:40:20

by Amir Goldstein

[permalink] [raw]
Subject: [PATCH] ext3: skip orphan cleanup on rocompat fs

Orphan cleanup is currently executed even on readonly mount.
It deletes inodes and frees blocks, which could be very bad for some
RO_COMPAT features, HAS_SNAPSHOT to name one.

Now the damage is done, because current stock kernels may
corrupt future fs with readonly compatible features,
when doing orphan cleanup.

This patch skips the orphan cleanup if readonly compatible features
would prevent the fs from being mounted (or remounted) readwrite.

Signed-off-by: Amir Goldstein <[email protected]>
---
fs/ext3/super.c | 7 +++++++
1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/fs/ext3/super.c b/fs/ext3/super.c
index 85c8cc8..6cdd575 100644
--- a/fs/ext3/super.c
+++ b/fs/ext3/super.c
@@ -1464,6 +1464,13 @@ static void ext3_orphan_cleanup (struct super_block * sb,
return;
}

+ /* Check if feature set allows readwrite operations */
+ if (EXT3_HAS_RO_COMPAT_FEATURE(sb, ~EXT3_FEATURE_RO_COMPAT_SUPP)) {
+ ext3_msg(sb, KERN_INFO, "Skipping orphan cleanup on readonly-"
+ "compatible fs");
+ return;
+ }
+
if (EXT3_SB(sb)->s_mount_state & EXT3_ERROR_FS) {
if (es->s_last_orphan)
jbd_debug(1, "Errors on filesystem, "
--
1.7.0.4


2011-02-28 10:14:56

by Rogier Wolff

[permalink] [raw]
Subject: Re: [PATCH] ext3: skip orphan cleanup on rocompat fs


On Sat, Feb 26, 2011 at 10:40:19PM +0200, Amir Goldstein wrote:
> This patch skips the orphan cleanup if readonly compatible features
> would prevent the fs from being mounted (or remounted) readwrite.

I use the "mount readonly" option to, for instance, view/check the
filesystem to determine wether or not I need to fsck first. I use the
"readonly" feature to prevent the mounting to be a mistake-prone
situation. It prevents e.g. applications from dropping temporary files
in my current directory.

Every time fsck or such a cleanup does something, there is the option
of the cleanup or fixup being wrong. When you honour the "readonly"
request from the user, the careful user can go back to the situation
where he/she started.

If the cleanup/fixup is really neccesary, do so in in-core buffers of
the filesystem. Write the infrastructure that allows us to have dirty
buffers that MAY NOT (yet?!?) be written to the device. This will also
solve the problem of journal recovery on readonly mount of a root
filesystem. when it has been fscked, and it's remounted rw, we can
remove the ban on the writeback of the dirty buffers.

So I stronly disagree with your patch: It should not only prevent the
cleanup when writing is not allows due to ro-compat situation, but
also when requested by the user.


Roger.

------------------

Back in the old days I was still using minix. Linux didn't exist or
wasn't usable enough. I had something that needed removing, so I typed
rm -rf *, thinking I was in the directory that needed removing. I
wasn't! There went my (modified) kernel tree! It took me some three
seconds to find, verify and execute the solution: Powerswitch.

In such an incident, cleaning up inodes that you think were deleted
anyway removes information about files that may need recovery. Imagine
that accidentally my big database file was unlinked. Ooops. But the
database server is still keeping the file open. Phew. We can continue
to run until we find a solution.... So the inode is now orphaned. But
we can recover it with some filesystem magic. Maybe not by answering
yes to fsck questions, but it is recoverable without dataloss, right?

Then the power goes out... Ooops. Instead of two more days to get
everything ready for the recovery we have to do it NOW. There goes.
Boot from CD, let's just mount the partition readonly to get access to
our tools and binaries that may facilitate the recovery of our
database. BAM! Away goes the nicely allocated inode. (ext2 used to
just mark the inode as not in use, ext3 cleans it up so that we lose
the pointers to the 7 datablocks, the indirect block and double
indirect block!) Now we will have to guess where the file started etc
etc.

The principle is: Do as I say. That keeps things predictable. If you
try to outguess the user, it will be horribly wrong every once in a
while. You're right. In 99% of the cases, the system just
crashed/rebooted while some temporary files were still open, but
already deleted. And in 99% of the cases, the system will boot,
perform an automatic fsck and eventually remount rw. So writing those
orphaned inodes back early doesn't make a real difference.

--
** [email protected] ** http://www.BitWizard.nl/ ** +31-15-2600998 **
** Delftechpark 26 2628 XH Delft, The Netherlands. KVK: 27239233 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
Q: It doesn't work. A: Look buddy, doesn't work is an ambiguous statement.
Does it sit on the couch all day? Is it unemployed? Please be specific!
Define 'it' and what it isn't doing. --------- Adapted from lxrbot FAQ

2011-02-28 13:10:26

by Amir Goldstein

[permalink] [raw]
Subject: Re: [PATCH] ext3: skip orphan cleanup on rocompat fs

On Mon, Feb 28, 2011 at 12:14 PM, Rogier Wolff <[email protected]> wrote:
>
> On Sat, Feb 26, 2011 at 10:40:19PM +0200, Amir Goldstein wrote:
>> This patch skips the orphan cleanup if readonly compatible features
>> would prevent the fs from being mounted (or remounted) readwrite.
>
> I use the "mount readonly" option to, for instance, view/check the
> filesystem to determine wether or not I need to fsck first. I use the
> "readonly" feature to prevent the mounting to be a mistake-prone
> situation. It prevents e.g. applications from dropping temporary files
> in my current directory.
>
> Every time fsck or such a cleanup does something, there is the option
> of the cleanup or fixup being wrong. When you honour the "readonly"
> request from the user, the careful user can go back to the situation
> where he/she started.
>
> If the cleanup/fixup is really neccesary, do so in in-core buffers of
> the filesystem. Write the infrastructure that allows us to have dirty
> buffers that MAY NOT (yet?!?) be written to the device. This will also
> solve the problem of journal recovery on readonly mount of a root
> filesystem. when it has been fscked, and it's remounted rw, we can
> remove the ban on the writeback of the dirty buffers.
>
> So I stronly disagree with your patch: It should not only prevent the
> cleanup when writing is not allows due to ro-compat situation, but
> also when requested by the user.
>

I am not disagreeing with your disagreement, but implementing what
you desire has more implications than my patch, so I rather that my
patch gets merged, because it is mostly-harmless and we can continue
the discussion, whether or not file systems need to honor the readonly
mount option to the word in parallel.

Amir.

2011-02-28 18:09:52

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH] ext3: skip orphan cleanup on rocompat fs

On Sat 26-02-11 22:40:19, Amir Goldstein wrote:
> Orphan cleanup is currently executed even on readonly mount.
> It deletes inodes and frees blocks, which could be very bad for some
> RO_COMPAT features, HAS_SNAPSHOT to name one.
>
> Now the damage is done, because current stock kernels may
> corrupt future fs with readonly compatible features,
> when doing orphan cleanup.
>
> This patch skips the orphan cleanup if readonly compatible features
> would prevent the fs from being mounted (or remounted) readwrite.
Thanks Amir. I actually took the version Ted committed to ext4 and ported
it for ext3. Anyway, the patch is in my tree.

Honza

--
Jan Kara <[email protected]>
SUSE Labs, CR

2011-02-28 18:22:03

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH] ext3: skip orphan cleanup on rocompat fs

On Mon 28-02-11 11:14:55, Rogier Wolff wrote:
>
> On Sat, Feb 26, 2011 at 10:40:19PM +0200, Amir Goldstein wrote:
> > This patch skips the orphan cleanup if readonly compatible features
> > would prevent the fs from being mounted (or remounted) readwrite.
>
> I use the "mount readonly" option to, for instance, view/check the
> filesystem to determine wether or not I need to fsck first. I use the
> "readonly" feature to prevent the mounting to be a mistake-prone
> situation. It prevents e.g. applications from dropping temporary files
> in my current directory.
>
> Every time fsck or such a cleanup does something, there is the option
> of the cleanup or fixup being wrong. When you honour the "readonly"
> request from the user, the careful user can go back to the situation
> where he/she started.
>
> If the cleanup/fixup is really neccesary, do so in in-core buffers of
Mounting (even read-only) without recovering the journal will give you a
view of a corrupted filesystem. Usually not what you want (although I agree
with you that there are occasions where this *is* what you want).

> the filesystem. Write the infrastructure that allows us to have dirty
> buffers that MAY NOT (yet?!?) be written to the device. This will also
> solve the problem of journal recovery on readonly mount of a root
> filesystem. when it has been fscked, and it's remounted rw, we can
> remove the ban on the writeback of the dirty buffers.
Yes, this would be a nice feature but noone ever got to implementing it.
You are welcome to contribute it :).

> So I stronly disagree with your patch: It should not only prevent the
> cleanup when writing is not allows due to ro-compat situation, but
> also when requested by the user.
As Amir said, the patch is trivial and a clear improvement. So it goes
in.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2011-02-28 18:23:09

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH] ext3: skip orphan cleanup on rocompat fs

On Mon, Feb 28, 2011 at 11:14:55AM +0100, Rogier Wolff wrote:
> If the cleanup/fixup is really neccesary, do so in in-core buffers of
> the filesystem. Write the infrastructure that allows us to have dirty
> buffers that MAY NOT (yet?!?) be written to the device. This will also
> solve the problem of journal recovery on readonly mount of a root
> filesystem. when it has been fscked, and it's remounted rw, we can
> remove the ban on the writeback of the dirty buffers.

That would be an interesting and useful thing to add, but it's not a
trivial change. There is the risk that if the journal is very large,
and system memory is very, that there might not be enough memory to
hold all of the dirty buffers (or it might handicap the machine).
That shouldn't be an issue on reasonably configured machines, but as
we know, not all file servers are reasonably configured (see previous
discussions about tdb :-)

If someone want to work on this as a project, that would be great, but
to be honest it's not high on my priority list at the moment.

- Ted

2011-02-28 18:49:34

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH] ext3: skip orphan cleanup on rocompat fs

On Mon, Feb 28, 2011 at 07:22:01PM +0100, Jan Kara wrote:
> As Amir said, the patch is trivial and a clear improvement. So it goes
> in.

You might want to look at the improvements I made to the ext4 version
of the patch, in terms of a slightly better error message and commit
description, and make similar changes to the ext3 version of the
patch.

See: http://article.gmane.org/gmane.comp.file-systems.ext4/23620

- Ted

2011-02-28 19:05:15

by Eric Sandeen

[permalink] [raw]
Subject: Re: [PATCH] ext3: skip orphan cleanup on rocompat fs

On 2/28/11 12:22 PM, Jan Kara wrote:
> On Mon 28-02-11 11:14:55, Rogier Wolff wrote:
>>
>> On Sat, Feb 26, 2011 at 10:40:19PM +0200, Amir Goldstein wrote:
>>> This patch skips the orphan cleanup if readonly compatible features
>>> would prevent the fs from being mounted (or remounted) readwrite.
>>
>> I use the "mount readonly" option to, for instance, view/check the
>> filesystem to determine wether or not I need to fsck first. I use the
>> "readonly" feature to prevent the mounting to be a mistake-prone
>> situation. It prevents e.g. applications from dropping temporary files
>> in my current directory.
>>
>> Every time fsck or such a cleanup does something, there is the option
>> of the cleanup or fixup being wrong. When you honour the "readonly"
>> request from the user, the careful user can go back to the situation
>> where he/she started.
>>
>> If the cleanup/fixup is really neccesary, do so in in-core buffers of
> Mounting (even read-only) without recovering the journal will give you a
> view of a corrupted filesystem. Usually not what you want (although I agree
> with you that there are occasions where this *is* what you want).

Also, you can tell ext3/4 -not- to recover the journal, or, you can mark the
device RO before mount. So while not the default behavior, it is at least
possible.

-Eric

2011-02-28 19:32:40

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH] ext3: skip orphan cleanup on rocompat fs

On Mon 28-02-11 13:49:29, Ted Ts'o wrote:
> On Mon, Feb 28, 2011 at 07:22:01PM +0100, Jan Kara wrote:
> > As Amir said, the patch is trivial and a clear improvement. So it goes
> > in.
>
> You might want to look at the improvements I made to the ext4 version
> of the patch, in terms of a slightly better error message and commit
> description, and make similar changes to the ext3 version of the
> patch.
>
> See: http://article.gmane.org/gmane.comp.file-systems.ext4/23620
Yes, I took your version and ported it to ext3 in the end :).

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2011-03-24 10:34:06

by Amir Goldstein

[permalink] [raw]
Subject: Re: [PATCH] ext3: skip orphan cleanup on rocompat fs

Hi,

Please apply commit ce654b37 to any stable kernel out there.

The relevant ext4 patch is in Ted's tree and has not reached mainline yet.
Ted, can you please add CC: stable to my patch.

Thanks,
Amir.

On Mon, Feb 28, 2011 at 8:09 PM, Jan Kara <[email protected]> wrote:
> On Sat 26-02-11 22:40:19, Amir Goldstein wrote:
>> Orphan cleanup is currently executed even on readonly mount.
>> It deletes inodes and frees blocks, which could be very bad for some
>> RO_COMPAT features, HAS_SNAPSHOT to name one.
>>
>> Now the damage is done, because current stock kernels may
>> corrupt future fs with readonly compatible features,
>> when doing orphan cleanup.
>>
>> This patch skips the orphan cleanup if readonly compatible features
>> would prevent the fs from being mounted (or remounted) readwrite.
> ?Thanks Amir. I actually took the version Ted committed to ext4 and ported
> it for ext3. Anyway, the patch is in my tree.
>
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?Honza
>
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR
>

2011-03-24 16:07:52

by Greg KH

[permalink] [raw]
Subject: Re: [stable] [PATCH] ext3: skip orphan cleanup on rocompat fs

On Thu, Mar 24, 2011 at 12:34:05PM +0200, Amir Goldstein wrote:
> Hi,
>
> Please apply commit ce654b37 to any stable kernel out there.

Now applied, thanks.

greg k-h