2009-12-11 14:07:00

by Surbhi Palande

[permalink] [raw]
Subject: [PATCH] replaced BUG() with return -EIO from ext4_ext_get_blocks

This patch fixes the upstream bug# 14286. When the address of an extent
corresponding to a given block is NULL and the tree is being traversed for
fetching such an address, a -EIO should be reported instead of a BUG(). This
situation should normally not occur. However if it does, then the system
should be rendered usable.

Signed-off-by: Surbhi Palande <[email protected]>
---
fs/ext4/extents.c | 7 ++++++-
1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 3a7928f..51f87f3 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3190,7 +3190,12 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
* this situation is possible, though, _during_ tree modification;
* this is why assert can't be put in ext4_ext_find_extent()
*/
- BUG_ON(path[depth].p_ext == NULL && depth != 0);
+ if (path[depth].p_ext == NULL && depth != 0) {
+ err = -EIO;
+ printk(KERN_ERR "\n ext4 fs error in %s,%s,%s while reading a block ", \
+ __FILE__, __func__, __LINE__);
+ goto out2;
+ }
eh = path[depth].p_hdr;

ex = path[depth].p_ext;
--
1.6.3.3



2009-12-11 18:07:36

by Frank Mayhar

[permalink] [raw]
Subject: Re: [PATCH] replaced BUG() with return -EIO from ext4_ext_get_blocks

On Fri, 2009-12-11 at 16:06 +0200, Surbhi Palande wrote:
> This patch fixes the upstream bug# 14286. When the address of an extent
> corresponding to a given block is NULL and the tree is being traversed for
> fetching such an address, a -EIO should be reported instead of a BUG(). This
> situation should normally not occur. However if it does, then the system
> should be rendered usable.
>
> Signed-off-by: Surbhi Palande <[email protected]>
> ---
> fs/ext4/extents.c | 7 ++++++-
> 1 files changed, 6 insertions(+), 1 deletions(-)
>
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 3a7928f..51f87f3 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -3190,7 +3190,12 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
> * this situation is possible, though, _during_ tree modification;
> * this is why assert can't be put in ext4_ext_find_extent()
> */
> - BUG_ON(path[depth].p_ext == NULL && depth != 0);
> + if (path[depth].p_ext == NULL && depth != 0) {
> + err = -EIO;
> + printk(KERN_ERR "\n ext4 fs error in %s,%s,%s while reading a block ", \
> + __FILE__, __func__, __LINE__);
> + goto out2;
> + }
> eh = path[depth].p_hdr;
>
> ex = path[depth].p_ext;

As it happens, I fixed this locally but, as I considered it a band-aid
fix at the time, I never pushed it to you guys. My version of the fix
is below; it dumps more information before returning the EIO. I don't
have a strong preference between the two versions but I would like to
see the commentary included and the extra information is often useful
during debugging.

Signed-off-by: Frank Mayhar <[email protected]>

fs/ext4/extents.c | 18 +++++++++++++++++-
1 files changed, 17 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index f7bdd55..7aa0bf6 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2859,8 +2859,24 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
* consistent leaf must not be empty;
* this situation is possible, though, _during_ tree modification;
* this is why assert can't be put in ext4_ext_find_extent()
+ *
+ * We don't want to panic in this case, as it can lead to a crash
+ * loop; instead we want to catch the error and abort.
*/
- BUG_ON(path[depth].p_ext == NULL && depth != 0);
+ if (path[depth].p_ext == NULL && depth != 0) {
+ printk(KERN_ERR
+ "EXT4-fs (%s): corrupt extent node ino %lu iblock %d"
+ " depth %d pblock %lld\n",
+ inode->i_sb->s_id, inode->i_ino, iblock, depth,
+ path[depth].p_block);
+ if (!path[depth].p_hdr)
+ path[depth].p_hdr = ext_block_hdr(path[depth].p_bh);
+ printk(KERN_ERR "EXT4-fs (%s): eh_entries %d eh_max %d\n",
+ inode->i_sb->s_id, path[depth].p_hdr->eh_entries,
+ path[depth].p_hdr->eh_max);
+ err = -EIO;
+ goto out2;
+ }
eh = path[depth].p_hdr;

ex = path[depth].p_ext;
--
Frank Mayhar <[email protected]>
Google, Inc.


2009-12-11 20:02:59

by Eric Sandeen

[permalink] [raw]
Subject: Re: [PATCH] replaced BUG() with return -EIO from ext4_ext_get_blocks

Frank Mayhar wrote:
> On Fri, 2009-12-11 at 16:06 +0200, Surbhi Palande wrote:
>> This patch fixes the upstream bug# 14286. When the address of an extent
>> corresponding to a given block is NULL and the tree is being traversed for
>> fetching such an address, a -EIO should be reported instead of a BUG(). This
>> situation should normally not occur. However if it does, then the system
>> should be rendered usable.
>>
>> Signed-off-by: Surbhi Palande <[email protected]>
>> ---
>> fs/ext4/extents.c | 7 ++++++-
>> 1 files changed, 6 insertions(+), 1 deletions(-)
>>
>> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
>> index 3a7928f..51f87f3 100644
>> --- a/fs/ext4/extents.c
>> +++ b/fs/ext4/extents.c
>> @@ -3190,7 +3190,12 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
>> * this situation is possible, though, _during_ tree modification;
>> * this is why assert can't be put in ext4_ext_find_extent()
>> */
>> - BUG_ON(path[depth].p_ext == NULL && depth != 0);
>> + if (path[depth].p_ext == NULL && depth != 0) {
>> + err = -EIO;
>> + printk(KERN_ERR "\n ext4 fs error in %s,%s,%s while reading a block ", \
>> + __FILE__, __func__, __LINE__);
>> + goto out2;
>> + }
>> eh = path[depth].p_hdr;
>>
>> ex = path[depth].p_ext;
>
> As it happens, I fixed this locally but, as I considered it a band-aid
> fix at the time, I never pushed it to you guys. My version of the fix
> is below; it dumps more information before returning the EIO. I don't
> have a strong preference between the two versions but I would like to
> see the commentary included and the extra information is often useful
> during debugging.

My first thought was that this was a bandaid too, but I guess it can
come about due to on-disk corruption for any reason, so it should
be handled gracefully, and I suppose this approach seems fine.

I think the original plan was that a BUG() should be primarily for things
that would arise from a programming error, though EIO & graceful shutdown
even for that class of errors would probably be best.

Since this is catching on-disk corruption, though, it'd be better to call
ext4_error() and let the mount-time error-handling policy decide what to do.

I like having more info but below seems awfully wordy ;) Maybe the first
printk would suffice, and switching it to an ext4_error() would be best,
I think.

-Eric


> Signed-off-by: Frank Mayhar <[email protected]>
>
> fs/ext4/extents.c | 18 +++++++++++++++++-
> 1 files changed, 17 insertions(+), 1 deletions(-)
>
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index f7bdd55..7aa0bf6 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -2859,8 +2859,24 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
> * consistent leaf must not be empty;
> * this situation is possible, though, _during_ tree modification;
> * this is why assert can't be put in ext4_ext_find_extent()
> + *
> + * We don't want to panic in this case, as it can lead to a crash
> + * loop; instead we want to catch the error and abort.
> */
> - BUG_ON(path[depth].p_ext == NULL && depth != 0);
> + if (path[depth].p_ext == NULL && depth != 0) {
> + printk(KERN_ERR
> + "EXT4-fs (%s): corrupt extent node ino %lu iblock %d"
> + " depth %d pblock %lld\n",
> + inode->i_sb->s_id, inode->i_ino, iblock, depth,
> + path[depth].p_block);
> + if (!path[depth].p_hdr)
> + path[depth].p_hdr = ext_block_hdr(path[depth].p_bh);
> + printk(KERN_ERR "EXT4-fs (%s): eh_entries %d eh_max %d\n",
> + inode->i_sb->s_id, path[depth].p_hdr->eh_entries,
> + path[depth].p_hdr->eh_max);
> + err = -EIO;
> + goto out2;
> + }
> eh = path[depth].p_hdr;
>
> ex = path[depth].p_ext;


2009-12-11 20:22:20

by Frank Mayhar

[permalink] [raw]
Subject: Re: [PATCH] replaced BUG() with return -EIO from ext4_ext_get_blocks

On Fri, 2009-12-11 at 14:02 -0600, Eric Sandeen wrote:
> My first thought was that this was a bandaid too, but I guess it can
> come about due to on-disk corruption for any reason, so it should
> be handled gracefully, and I suppose this approach seems fine.

That's why we've been running with it, yes.

> Since this is catching on-disk corruption, though, it'd be better to call
> ext4_error() and let the mount-time error-handling policy decide what to do.
>
> I like having more info but below seems awfully wordy ;) Maybe the first
> printk would suffice, and switching it to an ext4_error() would be best,
> I think.

Okay, I'll rework the patch a bit and resubmit it.
--
Frank Mayhar <[email protected]>
Google, Inc.


2009-12-11 20:31:51

by Eric Sandeen

[permalink] [raw]
Subject: Re: [PATCH] replaced BUG() with return -EIO from ext4_ext_get_blocks

Frank Mayhar wrote:
> On Fri, 2009-12-11 at 14:02 -0600, Eric Sandeen wrote:
>> My first thought was that this was a bandaid too, but I guess it can
>> come about due to on-disk corruption for any reason, so it should
>> be handled gracefully, and I suppose this approach seems fine.
>
> That's why we've been running with it, yes.

now if this is coming about as the result of a programming error, we'd
better sort that out ;) Do you have any reason to believe that the
corruption a hardware or admin issue, vs. an actual bug somewhere?

>> Since this is catching on-disk corruption, though, it'd be better to call
>> ext4_error() and let the mount-time error-handling policy decide what to do.
>>
>> I like having more info but below seems awfully wordy ;) Maybe the first
>> printk would suffice, and switching it to an ext4_error() would be best,
>> I think.
>
> Okay, I'll rework the patch a bit and resubmit it.


Thanks!

The amount of info printed is probably just a judgement call; for a developer,
printing out the inode & iblock is enough 'cause we can then just go use
debugfs & look at it. For a bug report, perhaps more info would be useful
because that one set of printks may be all we'll get ... up to you.

Maybe we should think about a generic "print corrupted inode information"
infrastructure that could be reused ...

Thanks,
-Eric

2009-12-11 21:25:50

by Frank Mayhar

[permalink] [raw]
Subject: Re: [PATCH] replaced BUG() with return -EIO from ext4_ext_get_blocks

On Fri, 2009-12-11 at 14:31 -0600, Eric Sandeen wrote:
> Frank Mayhar wrote:
> > On Fri, 2009-12-11 at 14:02 -0600, Eric Sandeen wrote:
> >> My first thought was that this was a bandaid too, but I guess it can
> >> come about due to on-disk corruption for any reason, so it should
> >> be handled gracefully, and I suppose this approach seems fine.
> >
> > That's why we've been running with it, yes.
>
> now if this is coming about as the result of a programming error, we'd
> better sort that out ;) Do you have any reason to believe that the
> corruption a hardware or admin issue, vs. an actual bug somewhere?

In fact the original corruption was due to an ext4 bug. After we fixed
the bug, though, we had a problem with machines going into crash loops.
Analysis revealed that the machines in question had corruptions caused
by the (now fixed) bug but the nature of the corruptions were causing us
to hit this case. Every time the system came back up it would start
processing again only to run into the same corruption. And, of course,
the corruption wasn't being found and fixed by fsck. All we could do
here was change this path to return EIO in this case; rebuilding the
file systems was out of the question at the time since they didn't
belong to us.

Philosophically, I prefer returning EIO here to doing a BUG_ON
regardless, since it also makes the code more robust in the face of
possible hardware errors.

> The amount of info printed is probably just a judgement call; for a developer,
> printing out the inode & iblock is enough 'cause we can then just go use
> debugfs & look at it. For a bug report, perhaps more info would be useful
> because that one set of printks may be all we'll get ... up to you.

Yeah, I tend to lean toward "print everything I can think of" in
situations like this because you never know just what you might need...
But that's mostly for the hardcore testing cycle and not for production
use.

> Maybe we should think about a generic "print corrupted inode information"
> infrastructure that could be reused ...

That would probably be useful in the long run, yes.
--
Frank Mayhar <[email protected]>
Google, Inc.


2009-12-11 22:11:30

by Surbhi Palande

[permalink] [raw]
Subject: Re: [PATCH] replaced BUG() with return -EIO from ext4_ext_get_blocks

Hi Frank,

Frank Mayhar wrote:
> On Fri, 2009-12-11 at 14:02 -0600, Eric Sandeen wrote:
>
>> My first thought was that this was a bandaid too, but I guess it can
>> come about due to on-disk corruption for any reason, so it should
>> be handled gracefully, and I suppose this approach seems fine.
>>
>
> That's why we've been running with it, yes.
>
>
>> Since this is catching on-disk corruption, though, it'd be better to call
>> ext4_error() and let the mount-time error-handling policy decide what to do.
>>
>> I like having more info but below seems awfully wordy ;) Maybe the first
>> printk would suffice, and switching it to an ext4_error() would be best,
>> I think.
>>
>
> Okay, I'll rework the patch a bit and resubmit it.
>
Since I published this patch first against a reported upstream bug, I
will work on the comments and resubmit the patch!

Thanks for your comments !

Warm Regards,
Surbhi.

2009-12-11 22:17:12

by Frank Mayhar

[permalink] [raw]
Subject: Re: [PATCH] replaced BUG() with return -EIO from ext4_ext_get_blocks

On Sat, 2009-12-12 at 00:11 +0200, Surbhi Palande wrote:
> Hi Frank,
>
> Frank Mayhar wrote:
> > On Fri, 2009-12-11 at 14:02 -0600, Eric Sandeen wrote:
> >
> >> My first thought was that this was a bandaid too, but I guess it can
> >> come about due to on-disk corruption for any reason, so it should
> >> be handled gracefully, and I suppose this approach seems fine.
> >>
> >
> > That's why we've been running with it, yes.
> >
> >
> >> Since this is catching on-disk corruption, though, it'd be better to call
> >> ext4_error() and let the mount-time error-handling policy decide what to do.
> >>
> >> I like having more info but below seems awfully wordy ;) Maybe the first
> >> printk would suffice, and switching it to an ext4_error() would be best,
> >> I think.
> >>
> >
> > Okay, I'll rework the patch a bit and resubmit it.
> >
> Since I published this patch first against a reported upstream bug, I
> will work on the comments and resubmit the patch!

Okay, go for it, thanks.
--
Frank Mayhar <[email protected]>
Google, Inc.


2009-12-13 01:27:32

by Surbhi Palande

[permalink] [raw]
Subject: [PATCH v2] replaced BUG() with return -EIO from ext4_ext_get_blocks

This patch fixes the upstream bug# 14286. When the address of an extent
corresponding to a valid block is corrupted, a -EIO should be reported instead
of a BUG(). This situation would not normally not occur. If however it does,
then the system should not panic directly but depending on the mount time options
appropriate action should be taken. If the mount options so permit, the I/O
should be gracefully aborted by returning a -EIO.

Signed-off-by: Surbhi Palande <[email protected]>
---
fs/ext4/extents.c | 11 ++++++++++-
1 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 3a7928f..d59aa12 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3189,8 +3189,17 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
* consistent leaf must not be empty;
* this situation is possible, though, _during_ tree modification;
* this is why assert can't be put in ext4_ext_find_extent()
+ *
+ * We don't want to panic in this case, try and abort I/O gracefully
*/
- BUG_ON(path[depth].p_ext == NULL && depth != 0);
+ if (path[depth].p_ext == NULL && depth != 0) {
+ err = -EIO;
+ ext4_error(inode->i_sb, __func__,
+ "bad extent address "
+ "inode: %lu, iblock: %d, depth: %d",
+ inode->i_ino, iblock, depth);
+ goto out2;
+ }
eh = path[depth].p_hdr;

ex = path[depth].p_ext;
--
1.6.3.3


2009-12-14 15:04:56

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH v2] replaced BUG() with return -EIO from ext4_ext_get_blocks

Surbhi,

Thanks for the patch, and since I believe this is your first patch
submission, welcome to the ext4 developer's community!

I made a few stylistic changes to your patch before adding it to the
ext4 patch queue. I removed the comment "We don't want to panic..."
since the explanation is not going to help someone reading the code in
the future (comments referring to non-existent code, such as why we're
not going to call BUG_ON are best placed in the commit description,
since when someone looks at the code a year from now, they won't see
the missing BUG_ON :-).

(BTW, the comment to which you added the "We don't want to panic.."
aside is itself kind of nasty:

/*
* consistent leaf must not be empty;
* this situation is possible, though, _during_ tree modification;
* this is why assert can't be put in ext4_ext_find_extent()
*/

This comment really should be made in the code body of
ext4_ext_find_extent(), with an comment above that function indicating
that callers who care should be making this check. A code clean up
for another day, along with auditing all of the *other* callers of
ext4_ext_find_extent() that they are making this check when
appropriate...)

I also made some minor whitespace changes with respect to argument
alignment, and in the commit description I used the term "Kernel BZ"
and included the URL instead of using the term "upstream bug #", since
that's commonly used terminology for commits in the kernel tree.
(Upstream makes sense if the patch lives in a distro tree, but it's
less clear when it's actually in the upstream repository, since the
"upstream" developers generally don't refer to themselves or their
project in those terms.)

Thanks, regards,

- Ted

commit 9f31d1227c3f2611405fc29e092c61a56b37da33
Author: Surbhi Palande <[email protected]>
Date: Mon Dec 14 09:53:52 2009 -0500

ext4: replace BUG() with return -EIO in ext4_ext_get_blocks

This patch fixes the Kernel BZ #14286. When the address of an extent
corresponding to a valid block is corrupted, a -EIO should be reported
instead of a BUG(). This situation should not normally not occur
except in the case of a corrupted filesystem. If however it does,
then the system should not panic directly but depending on the mount
time options appropriate action should be taken. If the mount options
so permit, the I/O should be gracefully aborted by returning a -EIO.

http://bugzilla.kernel.org/show_bug.cgi?id=14286

Signed-off-by: Surbhi Palande <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 3a7928f..8fd6c56 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3190,7 +3190,13 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
* this situation is possible, though, _during_ tree modification;
* this is why assert can't be put in ext4_ext_find_extent()
*/
- BUG_ON(path[depth].p_ext == NULL && depth != 0);
+ if (path[depth].p_ext == NULL && depth != 0) {
+ ext4_error(inode->i_sb, __func__, "bad extent address "
+ "inode: %lu, iblock: %d, depth: %d",
+ inode->i_ino, iblock, depth);
+ err = -EIO;
+ goto out2;
+ }
eh = path[depth].p_hdr;

ex = path[depth].p_ext;