LinuxLists.cc - [PATCH] ext4: catch integer overflow in ext4_cache

2020-07-13 12:59:03

Subject: [PATCH] ext4: catch integer overflow in ext4_cache_extents

From: Wolfgang Frisch <[email protected]>

When extent tree is corrupted we can hit BUG_ON in
ext4_es_cache_extent(). Check for this and abort caching instead of
crashing the machine.

Signed-off-by: Wolfgang Frisch <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
---
fs/ext4/extents.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 221f240eae60..e76d00fda104 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -471,6 +471,10 @@ static void ext4_cache_extents(struct inode *inode,
ext4_lblk_t lblk = le32_to_cpu(ex->ee_block);
int len = ext4_ext_get_actual_len(ex);

+ /* Corrupted extent tree? Stop caching... */
+ if (lblk + len < lblk || lblk + len > EXT4_MAX_LOGICAL_BLOCK)
+ return;
+
if (prev && (prev != lblk))
ext4_es_cache_extent(inode, prev, lblk - prev, ~0,
EXTENT_STATUS_HOLE);
--
2.16.4

2020-07-13 13:46:13

by Ritesh Harjani

[permalink] [raw]

Subject: Re: [PATCH] ext4: catch integer overflow in ext4_cache_extents

On 7/13/20 6:28 PM, Jan Kara wrote:
> From: Wolfgang Frisch <[email protected]>
>
> When extent tree is corrupted we can hit BUG_ON in
> ext4_es_cache_extent(). Check for this and abort caching instead of
> crashing the machine.

Was it intentionally made corrupted by crafting a corrupted disk image?
Are there more such logic in place which checks for such corruption at
other places? Maybe a background over the issue which you saw may help.
Also how did it recover out of it?

Do you think it make sense to still emit a WARN_ON() here and then
return which warns that this could possibly a corrupted extent
entry? (maybe WARN_ON_ONCE() or via some ratelimiting if multiple extent
entries are corrupted for that inode).

-ritesh

2020-07-14 12:34:43

by Jan Kara

[permalink] [raw]

Subject: Re: [PATCH] ext4: catch integer overflow in ext4_cache_extents

On Mon 13-07-20 19:14:47, Ritesh Harjani wrote:
>
>
> On 7/13/20 6:28 PM, Jan Kara wrote:
> > From: Wolfgang Frisch <[email protected]>
> >
> > When extent tree is corrupted we can hit BUG_ON in
> > ext4_es_cache_extent(). Check for this and abort caching instead of
> > crashing the machine.
>
> Was it intentionally made corrupted by crafting a corrupted disk image?

I'm not sure how Wolfgang hit the issue. I'd expect some fs image
fuzzing... Wolfgang?

> Are there more such logic in place which checks for such corruption at other
> places?

That's a good question. But now that I'm looking at it ext4_ext_check()
should actually catch a corruption like this. It is only the path in
ext4_find_extent()->ext4_cache_extents() that can face the issue so
probably instead of a fix in ext4_cache_extents() we should rather add more
careful extent info checks for the extents contained directly in the inode.
I'll look into it.

> Maybe a background over the issue which you saw may help.
> Also how did it recover out of it?

e2fsck I suppose :)

> Do you think it make sense to still emit a WARN_ON() here and then
> return which warns that this could possibly a corrupted extent
> entry? (maybe WARN_ON_ONCE() or via some ratelimiting if multiple extent
> entries are corrupted for that inode).

No, WARN is definitely wrong in this case. We could call ext4_error() if we
wanted. That would make sence although I've decided not to add it to
the original Wolfgang's fix since this is more like a failing readahead.
But OTOH it's metadata corruption that's unlikely to go away so I can be
easily convinced to put ext4_error() there :).

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2020-07-15 12:25:12

by Jan Kara

[permalink] [raw]

Subject: Re: [PATCH] ext4: catch integer overflow in ext4_cache_extents

On Tue 14-07-20 14:31:22, Jan Kara wrote:
> On Mon 13-07-20 19:14:47, Ritesh Harjani wrote:
> >
> >
> > On 7/13/20 6:28 PM, Jan Kara wrote:
> > > From: Wolfgang Frisch <[email protected]>
> > >
> > > When extent tree is corrupted we can hit BUG_ON in
> > > ext4_es_cache_extent(). Check for this and abort caching instead of
> > > crashing the machine.
> >
> > Was it intentionally made corrupted by crafting a corrupted disk image?
>
> I'm not sure how Wolfgang hit the issue. I'd expect some fs image
> fuzzing... Wolfgang?
>
> > Are there more such logic in place which checks for such corruption at other
> > places?
>
> That's a good question. But now that I'm looking at it ext4_ext_check()
> should actually catch a corruption like this. It is only the path in
> ext4_find_extent()->ext4_cache_extents() that can face the issue so
> probably instead of a fix in ext4_cache_extents() we should rather add more
> careful extent info checks for the extents contained directly in the inode.
> I'll look into it.

I was checking this more and indeed the problem can actually happen only
with the journal inode because that is special-cased when checking extent
tree. I'll send a new series that fixes this in a cleaner way.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2020-07-15 17:30:17

by Wolfgang Frisch

[permalink] [raw]

Subject: Re: [PATCH] ext4: catch integer overflow in ext4_cache_extents

On 14/07/2020 14.31, Jan Kara wrote:
> On Mon 13-07-20 19:14:47, Ritesh Harjani wrote:
>> On 7/13/20 6:28 PM, Jan Kara wrote:
>>> From: Wolfgang Frisch <[email protected]>
>>>
>>> When extent tree is corrupted we can hit BUG_ON in
>>> ext4_es_cache_extent(). Check for this and abort caching instead of
>>> crashing the machine.
>>
>> Was it intentionally made corrupted by crafting a corrupted disk image?
>
> I'm not sure how Wolfgang hit the issue. I'd expect some fs image
> fuzzing... Wolfgang?The bug was discovered accidentally when we tried to use a physically
defective USB flash drive. It was possible to partition and format the
device but the subsequent mount operation unearthed this problem.

As it was impossible to image the defective drive entirely, I used
blktrace(8) to gather a minimal list of read operations executed by
mount . A script then copied just the necessary blocks from the device
into a sparse QCOW2 image that is attached to the original bug report [1].

>> Are there more such logic in place which checks for such corruption at other
>> places?
>
> That's a good question. But now that I'm looking at it ext4_ext_check()
> should actually catch a corruption like this. It is only the path in
> ext4_find_extent()->ext4_cache_extents() that can face the issue so
> probably instead of a fix in ext4_cache_extents() we should rather add more
> careful extent info checks for the extents contained directly in the inode.
> I'll look into it.

That sounds very reasonable. I see you already implemented it!

Best regards
Wolfgang

[1] https://bugzilla.suse.com/show_bug.cgi?id=1173485

--
Wolfgang Frisch <[email protected]>
Security Engineer
OpenPGP fingerprint: A2E6 B7D4 53E9 544F BC13 D26B D9B3 56BD 4D4A 2D15
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nuremberg, Germany
(HRB 36809, AG Nürnberg)
Managing Director: Felix Imendörffer

Attachments:

signature.asc (849.00 B)
OpenPGP digital signature