in ext3_orphan_cleanup (same for ext4) we do:
if (EXT3_SB(sb)->s_mount_state & EXT3_ERROR_FS) {
if (es->s_last_orphan)
jbd_debug(1, "Errors on filesystem, "
"clearing orphan list.\n");
es->s_last_orphan = 0;
jbd_debug(1, "Skipping orphan recovery on fs with errors.\n");
return;
}
I can sort of understand not processing the orphan inode list if the
fs is already known to be potentially corrupted, but actually clearing
the list seems to go too far. This means that a subsequent e2fsck
will find even more problems as a result of the orphan list not being
available.
It's been this way for a while though, so the original reason for the
behavior may be lost. Does anyone know?
I've been alerted to a somewhat odd behavior where a filesystem with
an orphan inode list *and* in error state behaves differently if:
1) e2fsck -p is done: e2fsck fixes things and exits happily
vs.
2) mount is done first, then e2fsck -p: due to the orphan inode
list being gone, enough errors are found that e2fsck exits with
UNEXPECTED INCONSISTENCY.
The 2nd case above has the tendency to halt the boot process, which
is unfortunate.
The situation might be improved by at least not clearing the orphan
inode list when the fs is mounted readonly. What do folks think?
Thanks,
-Eric
When we have a filesystem with an orphan inode list *and* in error
state, things behave differently if:
1) e2fsck -p is done prior to mount: e2fsck fixes things and exits
happily (barring other significant problems)
vs.
2) mount is done first, then e2fsck -p: due to the orphan inode
list removal, more errors are found and e2fsck exits with
UNEXPECTED INCONSISTENCY.
The 2nd case above, on the root filesystem, has the tendency to halt
the boot process, which is unfortunate.
The situation can be improved by not clearing the orphan
inode list when the fs is mounted readonly.
Signed-off-by: Eric Sandeen <[email protected]>
---
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 2d51cd9..2e1ea01 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -2165,10 +2165,12 @@ static void ext4_orphan_cleanup(struct super_block *sb,
}
if (EXT4_SB(sb)->s_mount_state & EXT4_ERROR_FS) {
- if (es->s_last_orphan)
+ /* don't clear list on RO mount w/ errors */
+ if (es->s_last_orphan && !(s_flags & MS_RDONLY)) {
jbd_debug(1, "Errors on filesystem, "
"clearing orphan list.\n");
- es->s_last_orphan = 0;
+ es->s_last_orphan = 0;
+ }
jbd_debug(1, "Skipping orphan recovery on fs with errors.\n");
return;
}
When we have a filesystem with an orphan inode list *and* in error
state, things behave differently if:
1) e2fsck -p is done prior to mount: e2fsck fixes things and exits
happily (barring other significant problems)
vs.
2) mount is done first, then e2fsck -p: due to the orphan inode
list removal, more errors are found and e2fsck exits with
UNEXPECTED INCONSISTENCY.
The 2nd case above, on the root filesystem, has the tendency to halt
the boot process, which is unfortunate.
The situation can be improved by not clearing the orphan
inode list when the fs is mounted readonly.
Signed-off-by: Eric Sandeen <[email protected]>
---
diff --git a/fs/ext3/super.c b/fs/ext3/super.c
index ff9bcdc..485b4fa 100644
--- a/fs/ext3/super.c
+++ b/fs/ext3/super.c
@@ -1490,10 +1490,12 @@ static void ext3_orphan_cleanup (struct super_block * sb,
}
if (EXT3_SB(sb)->s_mount_state & EXT3_ERROR_FS) {
- if (es->s_last_orphan)
+ /* don't clear list on RO mount w/ errors */
+ if (es->s_last_orphan && !(s_flags & MS_RDONLY)) {
jbd_debug(1, "Errors on filesystem, "
"clearing orphan list.\n");
- es->s_last_orphan = 0;
+ es->s_last_orphan = 0;
+ }
jbd_debug(1, "Skipping orphan recovery on fs with errors.\n");
return;
}
On 2012-08-27, at 1:27 PM, Eric Sandeen wrote:
> When we have a filesystem with an orphan inode list *and* in error
> state, things behave differently if:
>
> 1) e2fsck -p is done prior to mount: e2fsck fixes things and exits
> happily (barring other significant problems)
>
> vs.
>
> 2) mount is done first, then e2fsck -p: due to the orphan inode
> list removal, more errors are found and e2fsck exits with
> UNEXPECTED INCONSISTENCY.
>
> The 2nd case above, on the root filesystem, has the tendency to halt
> the boot process, which is unfortunate.
I think the reasoning is that if the filesystem is corrupted, then
processing the orphan list may introduce further corruption. If one
has to run a full e2fsck run anyway, then there is no benefit to be
had from processing the orphan list in advance, and a potential
downside (e.g. corrupt inode in the list pointing to some valid inode
and causing it to be deleted).
That said, it depends on how robust the orphan handling code is -
if it won't get confused and delete an in-use inode (i.e. dtime == 0)
then it probably is OK. I wouldn't trust the inode bitmaps to determine
if the inode is in use or not, only whether it is referenced by some
directory.
That said, no value in trying to clear the orphan list on a read-only fs,
so I think you patch is OK.
Acked-by: Andreas Dilger <[email protected]>
> The situation can be improved by not clearing the orphan
> inode list when the fs is mounted readonly.
>
> Signed-off-by: Eric Sandeen <[email protected]>
> ---
>
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 2d51cd9..2e1ea01 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -2165,10 +2165,12 @@ static void ext4_orphan_cleanup(struct super_block *sb,
> }
>
> if (EXT4_SB(sb)->s_mount_state & EXT4_ERROR_FS) {
> - if (es->s_last_orphan)
> + /* don't clear list on RO mount w/ errors */
> + if (es->s_last_orphan && !(s_flags & MS_RDONLY)) {
> jbd_debug(1, "Errors on filesystem, "
> "clearing orphan list.\n");
> - es->s_last_orphan = 0;
> + es->s_last_orphan = 0;
> + }
> jbd_debug(1, "Skipping orphan recovery on fs with errors.\n");
> return;
> }
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
Cheers, Andreas
On 8/27/12 6:31 PM, Andreas Dilger wrote:
> On 2012-08-27, at 1:27 PM, Eric Sandeen wrote:
>> When we have a filesystem with an orphan inode list *and* in error
>> state, things behave differently if:
>>
>> 1) e2fsck -p is done prior to mount: e2fsck fixes things and exits
>> happily (barring other significant problems)
>>
>> vs.
>>
>> 2) mount is done first, then e2fsck -p: due to the orphan inode
>> list removal, more errors are found and e2fsck exits with
>> UNEXPECTED INCONSISTENCY.
>>
>> The 2nd case above, on the root filesystem, has the tendency to halt
>> the boot process, which is unfortunate.
>
> I think the reasoning is that if the filesystem is corrupted, then
> processing the orphan list may introduce further corruption. If one
> has to run a full e2fsck run anyway, then there is no benefit to be
> had from processing the orphan list in advance, and a potential
> downside (e.g. corrupt inode in the list pointing to some valid inode
> and causing it to be deleted).
>
> That said, it depends on how robust the orphan handling code is -
> if it won't get confused and delete an in-use inode (i.e. dtime == 0)
> then it probably is OK. I wouldn't trust the inode bitmaps to determine
> if the inode is in use or not, only whether it is referenced by some
> directory.
What's interesting, though, is that e2fsck trusts the orphan inode list
even in the ERROR_FS case. Seems inconsistent with the kernel, I guess,
although e2fsck will only be processing it, not adding to it... *shrug*
> That said, no value in trying to clear the orphan list on a read-only fs,
> so I think you patch is OK.
>
> Acked-by: Andreas Dilger <[email protected]>
Thanks,
-Eric
>> The situation can be improved by not clearing the orphan
>> inode list when the fs is mounted readonly.
>>
>> Signed-off-by: Eric Sandeen <[email protected]>
>> ---
>>
>> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
>> index 2d51cd9..2e1ea01 100644
>> --- a/fs/ext4/super.c
>> +++ b/fs/ext4/super.c
>> @@ -2165,10 +2165,12 @@ static void ext4_orphan_cleanup(struct super_block *sb,
>> }
>>
>> if (EXT4_SB(sb)->s_mount_state & EXT4_ERROR_FS) {
>> - if (es->s_last_orphan)
>> + /* don't clear list on RO mount w/ errors */
>> + if (es->s_last_orphan && !(s_flags & MS_RDONLY)) {
>> jbd_debug(1, "Errors on filesystem, "
>> "clearing orphan list.\n");
>> - es->s_last_orphan = 0;
>> + es->s_last_orphan = 0;
>> + }
>> jbd_debug(1, "Skipping orphan recovery on fs with errors.\n");
>> return;
>> }
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
> Cheers, Andreas
>
>
>
>
>
On Mon 27-08-12 14:30:40, Eric Sandeen wrote:
> When we have a filesystem with an orphan inode list *and* in error
> state, things behave differently if:
>
> 1) e2fsck -p is done prior to mount: e2fsck fixes things and exits
> happily (barring other significant problems)
>
> vs.
>
> 2) mount is done first, then e2fsck -p: due to the orphan inode
> list removal, more errors are found and e2fsck exits with
> UNEXPECTED INCONSISTENCY.
>
> The 2nd case above, on the root filesystem, has the tendency to halt
> the boot process, which is unfortunate.
>
> The situation can be improved by not clearing the orphan
> inode list when the fs is mounted readonly.
Yeah, makes sense. I've added the patch to my tree. Thanks.
Honza
>
> Signed-off-by: Eric Sandeen <[email protected]>
> ---
>
> diff --git a/fs/ext3/super.c b/fs/ext3/super.c
> index ff9bcdc..485b4fa 100644
> --- a/fs/ext3/super.c
> +++ b/fs/ext3/super.c
> @@ -1490,10 +1490,12 @@ static void ext3_orphan_cleanup (struct super_block * sb,
> }
>
> if (EXT3_SB(sb)->s_mount_state & EXT3_ERROR_FS) {
> - if (es->s_last_orphan)
> + /* don't clear list on RO mount w/ errors */
> + if (es->s_last_orphan && !(s_flags & MS_RDONLY)) {
> jbd_debug(1, "Errors on filesystem, "
> "clearing orphan list.\n");
> - es->s_last_orphan = 0;
> + es->s_last_orphan = 0;
> + }
> jbd_debug(1, "Skipping orphan recovery on fs with errors.\n");
> return;
> }
>
--
Jan Kara <[email protected]>
SUSE Labs, CR
On 8/28/12 3:02 AM, Jan Kara wrote:
> On Mon 27-08-12 14:30:40, Eric Sandeen wrote:
>> When we have a filesystem with an orphan inode list *and* in error
>> state, things behave differently if:
>>
>> 1) e2fsck -p is done prior to mount: e2fsck fixes things and exits
>> happily (barring other significant problems)
>>
>> vs.
>>
>> 2) mount is done first, then e2fsck -p: due to the orphan inode
>> list removal, more errors are found and e2fsck exits with
>> UNEXPECTED INCONSISTENCY.
>>
>> The 2nd case above, on the root filesystem, has the tendency to halt
>> the boot process, which is unfortunate.
>>
>> The situation can be improved by not clearing the orphan
>> inode list when the fs is mounted readonly.
> Yeah, makes sense. I've added the patch to my tree. Thanks.
>
> Honza
After a little more investigation, I'm now wondering if this is really
worth doing.
e2fsck zaps the orphan list just like the kernel does:
* If the filesystem contains errors, don't run the orphan
* list, since the orphan list can't be trusted; and we're
* going to be running a full e2fsck run anyway...
and my 1) and 2) differences above were due to testing an older version
of e2fsck which didn't properly propagate the error flag. (Sorry...)
Since upstream e2fsck will _also_ ignore the orphan inode list, there's
probably no great reason for preserving it on a readonly mount after all,
unless it's just to minimize changes when mounting RO (which may be a
sufficient reason, I suppose). So feel free to take it or leave it,
I guess.
Thanks,
-Eric
On 8/27/12 2:12 PM, Eric Sandeen wrote:
> in ext3_orphan_cleanup (same for ext4) we do:
>
> if (EXT3_SB(sb)->s_mount_state & EXT3_ERROR_FS) {
> if (es->s_last_orphan)
> jbd_debug(1, "Errors on filesystem, "
> "clearing orphan list.\n");
> es->s_last_orphan = 0;
> jbd_debug(1, "Skipping orphan recovery on fs with errors.\n");
> return;
> }
>
> I can sort of understand not processing the orphan inode list if the
> fs is already known to be potentially corrupted, but actually clearing
> the list seems to go too far. This means that a subsequent e2fsck
> will find even more problems as a result of the orphan list not being
> available.
>
> It's been this way for a while though, so the original reason for the
> behavior may be lost. Does anyone know?
>
> I've been alerted to a somewhat odd behavior where a filesystem with
> an orphan inode list *and* in error state behaves differently if:
>
> 1) e2fsck -p is done: e2fsck fixes things and exits happily
>
> vs.
>
> 2) mount is done first, then e2fsck -p: due to the orphan inode
> list being gone, enough errors are found that e2fsck exits with
> UNEXPECTED INCONSISTENCY.
>
> The 2nd case above has the tendency to halt the boot process, which
> is unfortunate.
Just for posterity, replying to this first email rather than just down-thread.
I was testing a version of e2fsck which was missing one or both of these fixes (sorry):
63b3913dbc0bc7cdf8a63f3bdb0c8d7d605e9a40 e2fsck: correctly propagate error from journal to superblock
6d75685e2b76f4099589ad33732cf59f279b5d65 e2fsck: handle an already recovered journal with a non-zero s_error field
which are present in 1.42.4. With error state properly propagated, e2fsck *also* junks the orphan inode list, and stops the preen pass:
/* Deal with inodes that were part of corrupted orphan linked
list (latch question) */
{ PR_1_ORPHAN_LIST_REFUGEES,
N_("@is that were part of a corrupted orphan linked list found. "),
PROMPT_FIX, 0 },
So there is no inconsistency here between kernel & e2fsck behavior; neither trusts the orphan list in this case. I guess the only remaining question is whether it's really necessary to stop the preen pass, but I suppose it is.
-Eric
On Tue 04-09-12 13:51:57, Eric Sandeen wrote:
> On 8/28/12 3:02 AM, Jan Kara wrote:
> > On Mon 27-08-12 14:30:40, Eric Sandeen wrote:
> >> When we have a filesystem with an orphan inode list *and* in error
> >> state, things behave differently if:
> >>
> >> 1) e2fsck -p is done prior to mount: e2fsck fixes things and exits
> >> happily (barring other significant problems)
> >>
> >> vs.
> >>
> >> 2) mount is done first, then e2fsck -p: due to the orphan inode
> >> list removal, more errors are found and e2fsck exits with
> >> UNEXPECTED INCONSISTENCY.
> >>
> >> The 2nd case above, on the root filesystem, has the tendency to halt
> >> the boot process, which is unfortunate.
> >>
> >> The situation can be improved by not clearing the orphan
> >> inode list when the fs is mounted readonly.
> > Yeah, makes sense. I've added the patch to my tree. Thanks.
> >
> > Honza
>
> After a little more investigation, I'm now wondering if this is really
> worth doing.
>
> e2fsck zaps the orphan list just like the kernel does:
>
> * If the filesystem contains errors, don't run the orphan
> * list, since the orphan list can't be trusted; and we're
> * going to be running a full e2fsck run anyway...
>
> and my 1) and 2) differences above were due to testing an older version
> of e2fsck which didn't properly propagate the error flag. (Sorry...)
>
> Since upstream e2fsck will _also_ ignore the orphan inode list, there's
> probably no great reason for preserving it on a readonly mount after all,
> unless it's just to minimize changes when mounting RO (which may be a
> sufficient reason, I suppose). So feel free to take it or leave it,
> I guess.
Since I've already pushed this to Linus and minimizing changes on RO
filesystem makes sense anyway, I'll leave the patch in... Thanks for the
update.
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Mon, Aug 27, 2012 at 02:27:32PM -0500, Eric Sandeen wrote:
> When we have a filesystem with an orphan inode list *and* in error
> state, things behave differently if:
>
> 1) e2fsck -p is done prior to mount: e2fsck fixes things and exits
> happily (barring other significant problems)
>
> vs.
>
> 2) mount is done first, then e2fsck -p: due to the orphan inode
> list removal, more errors are found and e2fsck exits with
> UNEXPECTED INCONSISTENCY.
>
> The 2nd case above, on the root filesystem, has the tendency to halt
> the boot process, which is unfortunate.
>
> The situation can be improved by not clearing the orphan
> inode list when the fs is mounted readonly.
>
> Signed-off-by: Eric Sandeen <[email protected]>
I've applied this commit since I agree with Jan's observation that if
the file system is mounted read-only, we should try to minimize
changes to it if it contains errors. I have modified the commit
description though:
ext4: don't clear orphan list on ro mount with errors
From: Eric Sandeen <[email protected]>
If the file system contains errors and it is being mounted read-only,
don't clear the orphan list. We should minimize changes to the file
system if it is mounted read-only.
Signed-off-by: Eric Sandeen <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
- Ted
On 9/26/12 10:32 PM, Theodore Ts'o wrote:
> On Mon, Aug 27, 2012 at 02:27:32PM -0500, Eric Sandeen wrote:
>> When we have a filesystem with an orphan inode list *and* in error
>> state, things behave differently if:
>>
>> 1) e2fsck -p is done prior to mount: e2fsck fixes things and exits
>> happily (barring other significant problems)
>>
>> vs.
>>
>> 2) mount is done first, then e2fsck -p: due to the orphan inode
>> list removal, more errors are found and e2fsck exits with
>> UNEXPECTED INCONSISTENCY.
>>
>> The 2nd case above, on the root filesystem, has the tendency to halt
>> the boot process, which is unfortunate.
>>
>> The situation can be improved by not clearing the orphan
>> inode list when the fs is mounted readonly.
>>
>> Signed-off-by: Eric Sandeen <[email protected]>
>
> I've applied this commit since I agree with Jan's observation that if
> the file system is mounted read-only, we should try to minimize
> changes to it if it contains errors. I have modified the commit
> description though:
Fair enough, thanks.
-Eric
> ext4: don't clear orphan list on ro mount with errors
>
> From: Eric Sandeen <[email protected]>
>
> If the file system contains errors and it is being mounted read-only,
> don't clear the orphan list. We should minimize changes to the file
> system if it is mounted read-only.
>
> Signed-off-by: Eric Sandeen <[email protected]>
> Signed-off-by: "Theodore Ts'o" <[email protected]>
>
> - Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>