On 9/20/23 5:41 AM, Eric Whitney wrote:
> * Muhammad Usama Anjum <[email protected]>:
>> Syzbot has hit the following bug on current and all older kernels:
>> BUG: KASAN: out-of-bounds in ext4_ext_rm_leaf fs/ext4/extents.c:2736 [inline]
>> BUG: KASAN: out-of-bounds in ext4_ext_remove_space+0x2482/0x4d90 fs/ext4/extents.c:2958
>> Read of size 18446744073709551508 at addr ffff888073aea078 by task syz-executor420/6443
>>
>> On investigation, I found that eh->eh_entries is zero, ex refers to the
>> last entry, and EXT_LAST_EXTENT(eh) refers to the header itself, one slot
>> before the first entry. Hence EXT_LAST_EXTENT(eh) - ex becomes negative
>> and causes the out-of-bounds read.
>>
>> element: FFFF8882F8F0D06C <----- ex
>> element: FFFF8882F8F0D060
>> element: FFFF8882F8F0D054
>> element: FFFF8882F8F0D048
>> element: FFFF8882F8F0D03C
>> element: FFFF8882F8F0D030
>> element: FFFF8882F8F0D024
>> element: FFFF8882F8F0D018
>> element: FFFF8882F8F0D00C <------ EXT_FIRST_EXTENT(eh)
>> header: FFFF8882F8F0D000 <------ EXT_LAST_EXTENT(eh) and eh
>>
>> Cc: [email protected]
>> Reported-by: [email protected]
>> Closes: https://groups.google.com/g/syzkaller-bugs/c/G6zS-LKgDW0/m/63MgF6V7BAAJ
>> Fixes: d583fb87a3ff ("ext4: punch out extents")
>> Signed-off-by: Muhammad Usama Anjum <[email protected]>
>> ---
>> This patch only fixes the local issue. There may be a bigger bug: why
>> is ex set to the last entry if eh->eh_entries is 0? If any ext4
>> developer wants to look at the bug, please don't hesitate.
>> ---
>> fs/ext4/extents.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
>> index e4115d338f101..7b7779b4cb87f 100644
>> --- a/fs/ext4/extents.c
>> +++ b/fs/ext4/extents.c
>> @@ -2726,7 +2726,7 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
>>  		 * If the extent was completely released,
>>  		 * we need to remove it from the leaf
>>  		 */
>> -		if (num == 0) {
>> +		if (num == 0 && eh->eh_entries) {
>>  			if (end != EXT_MAX_BLOCKS - 1) {
>>  				/*
>>  				 * For hole punching, we need to scoot all the
>> --
>> 2.40.1
>>
>
> Hi:
>
> First, thanks for taking the time to look at this.
Thank you for replying and for the pointers. I need to start looking at the
problem from the first warning up to the bug itself, which may be difficult
until I debug it more carefully and learn at least the basics of ext4.
>
> I'm suspicious that syzbot may be fuzzing an extent header or other extent
> tree components. As you noticed, eh_entries and ex appear to be inconsistent.
> Also, note the long series of corrupted file system reports in the console log
> occurring before the KASAN bug - ext4 had been detecting and rejecting bad
> data up to that point. The file system on the disk image provided by syzbot
> indicates that metadata checksumming was enabled (and it fscks cleanly).
> That should have caught a corrupted extent header or inode, but perhaps
> there's a problem.
>
> The console log indicates that the problem occurred on inode #16. Does the
> information you've provided above come from testing you did on inode #16
> (looks like the name was /bin/base64)?
I haven't been able to analyze the problem more broadly. There must be
something bigger going wrong here.
>
> By any chance, have you found a simpler reproducer than what syzbot provides?
Not yet; the bug only reproduces after a while. I'll try to come up with a
better reproducer if I can.
>
> Thanks,
> Eric
>
>
--
BR,
Muhammad Usama Anjum
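
The size in the KASAN report is consistent with the pointer layout quoted in the patch description. The sketch below is a userspace model, not the kernel code: it assumes the copy length is computed as (EXT_LAST_EXTENT(eh) - ex) * sizeof(struct ext4_extent), that header and extent entries are both 12 bytes (as the addresses in the dump suggest), and it uses slot indices instead of real pointers. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define EXTENT_SIZE 12 /* bytes; the dump shows 12-byte steps for header and entries */

/*
 * Slot index that EXT_LAST_EXTENT() would point at, counting the first
 * extent as slot 0. With eh_entries == 0 this is -1, i.e. the header.
 */
int64_t last_extent_index(uint16_t eh_entries)
{
	return (int64_t)eh_entries - 1;
}

/* Length handed to the copy when no check on eh_entries is made. */
uint64_t unguarded_move_len(int64_t ex_index, uint16_t eh_entries)
{
	assert(ex_index >= 0);
	int64_t slots = last_extent_index(eh_entries) - ex_index;

	/* A negative slot count wraps to a huge unsigned length. */
	return (uint64_t)(slots * EXTENT_SIZE);
}

/* Model of the proposed check: skip the copy when the leaf is empty. */
uint64_t guarded_move_len(int64_t ex_index, uint16_t eh_entries)
{
	if (eh_entries == 0)
		return 0;
	return unguarded_move_len(ex_index, eh_entries);
}
```

With ex at slot 8 (address ...D06C) and eh_entries == 0, the slot count is -9, so the length is -108 bytes, which wraps to 18446744073709551508, the size KASAN reported. The guard only covers the empty-leaf case; it does not explain why ex and eh_entries disagree in the first place.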
* Muhammad Usama Anjum <[email protected]>:
> Not yet; the bug only reproduces after a while. I'll try to come up with a
> better reproducer if I can.
>
My suggestion would be to first determine whether syzbot has disabled
metadata checksumming by the point in time when the problem occurs (or
whether temporarily modifying ext4 to make it impossible to disable
metadata checksumming also makes it impossible to reproduce the failure).
It may have done this as part of its test. If so, this becomes a very low
priority bug for ext4, and you could avoid the effort to find a reproducer.
Eric
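
One way to check the on-disk state is to read the metadata_csum feature bit straight from the superblock of an image or device snapshot. The sketch below is a minimal userspace helper with hypothetical function names. It assumes the standard ext4 layout: primary superblock at byte offset 1024, s_magic at 0x38, s_feature_ro_compat at 0x64, and RO_COMPAT_METADATA_CSUM equal to 0x400. It only inspects what is on disk and says nothing about the in-memory state of a mounted file system.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define SB_OFFSET               1024u   /* primary superblock starts 1 KiB in */
#define SB_SIZE                 1024u
#define SB_MAGIC_OFF            0x38u   /* s_magic, little-endian 16-bit */
#define SB_RO_COMPAT_OFF        0x64u   /* s_feature_ro_compat, little-endian 32-bit */
#define EXT4_MAGIC              0xEF53u
#define RO_COMPAT_METADATA_CSUM 0x0400u

static uint16_t le16_at(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32_at(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 if metadata_csum is set, 0 if clear, -1 if the magic is wrong. */
int sb_has_metadata_csum(const uint8_t *sb)
{
	assert(sb != NULL);
	if (le16_at(sb + SB_MAGIC_OFF) != EXT4_MAGIC)
		return -1;
	return (le32_at(sb + SB_RO_COMPAT_OFF) & RO_COMPAT_METADATA_CSUM) ? 1 : 0;
}

/* Same result for an image file; -1 also covers open and read failures. */
int image_has_metadata_csum(const char *path)
{
	uint8_t sb[SB_SIZE];
	int ret = -1;
	FILE *f = fopen(path, "rb");

	if (!f)
		return -1;
	if (fseek(f, (long)SB_OFFSET, SEEK_SET) == 0 &&
	    fread(sb, 1, SB_SIZE, f) == SB_SIZE)
		ret = sb_has_metadata_csum(sb);
	fclose(f);
	return ret;
}
```

Running image_has_metadata_csum() on the syzbot image before the test starts, and again on a snapshot taken when the failure occurs, would show whether the bit was cleared in between.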