2022-12-15 13:24:24

by Tudor Ambarus

Subject: BUG: unable to handle kernel paging request in z_erofs_decompress_queue

Hi, Gao, Chao, Yue, Jeffle, all,

Syzbot reported a bug at [1] that is reproducible in upstream kernel
since
commit 47e4937a4a7c ("erofs: move erofs out of staging")

and up to (inclusively)
commit 2bfab9c0edac ("erofs: record the longest decompressed size in
this round")

The first commit that makes this bug go away is:
commit 267f2492c8f7 ("erofs: introduce multi-reference pclusters
(fully-referenced)")
However, this commit looks like new feature support and not like an
explicit bug fix.

I'd like to fix the LTS kernels. I'm happy to try any suggestions or do
some tests. Please let me know if the bug rings a bell.

Thanks,
ta

[1]
https://syzkaller.appspot.com/bug?id=a9b56d324d0de9233ad80633826fac76836d792a


2022-12-15 13:57:35

by Tudor Ambarus

Subject: Re: BUG: unable to handle kernel paging request in z_erofs_decompress_queue



On 15.12.2022 14:58, Tudor Ambarus wrote:
> Hi, Gao, Chao, Yue, Jeffle, all,
>
> Syzbot reported a bug at [1] that is reproducible in upstream kernel
> since
>   commit 47e4937a4a7c ("erofs: move erofs out of staging")
>
> and up to (inclusively)
>   commit 2bfab9c0edac ("erofs: record the longest decompressed size in
> this round")
>
> The first commit that makes this bug go away is:
>   commit 267f2492c8f7 ("erofs: introduce multi-reference pclusters
> (fully-referenced)")
> Although, this commit looks like new support and not like an explicit
> bug fix.
>
> I'd like to fix the lts kernels. I'm happy to try any suggestions or do
> some tests. Please let me know if the bug rings a bell.
>

There's something else that may help. I enabled CONFIG_EROFS_FS_DEBUG
while at
commit 2bfab9c0edac ("erofs: record the longest decompressed size in
this round")
and I got the following: https://termbin.com/4bm8

Cheers,
ta

> [1]
> https://syzkaller.appspot.com/bug?id=a9b56d324d0de9233ad80633826fac76836d792a

2022-12-15 14:35:16

by Gao Xiang

Subject: Re: BUG: unable to handle kernel paging request in z_erofs_decompress_queue

Hi Tudor,

On Thu, Dec 15, 2022 at 02:58:10PM +0200, Tudor Ambarus wrote:
> Hi, Gao, Chao, Yue, Jeffle, all,
>
> Syzbot reported a bug at [1] that is reproducible in upstream kernel
> since
> commit 47e4937a4a7c ("erofs: move erofs out of staging")
>
> and up to (inclusively)
> commit 2bfab9c0edac ("erofs: record the longest decompressed size in this
> round")
>
> The first commit that makes this bug go away is:
> commit 267f2492c8f7 ("erofs: introduce multi-reference pclusters
> (fully-referenced)")
> Although, this commit looks like new support and not like an explicit
> bug fix.
>
> I'd like to fix the lts kernels. I'm happy to try any suggestions or do
> some tests. Please let me know if the bug rings a bell.

Thanks for your report. I will try to find time to look at this over the
weekend. Just from your description, I guess that something could be
wrong with several compressed extents pointing to the same
blocks (i.e. the same pcluster). Prior to commit 267f2492c8f7, such an
image was always considered corrupted.

Anyway, I will look into that and see if there are alternative ways
to fix this rather than backporting the whole multi-reference pcluster
feature. Yet I think there is no need to worry, since such an image is
heavily crafted and should not be used as a normal image.

Thanks,
Gao Xiang

>
> Thanks,
> ta
>
> [1]
> https://syzkaller.appspot.com/bug?id=a9b56d324d0de9233ad80633826fac76836d792a

2022-12-20 16:05:00

by Tudor Ambarus

Subject: Re: BUG: unable to handle kernel paging request in z_erofs_decompress_queue

Hi, Gao,

On 15.12.2022 16:24, Gao Xiang wrote:
> Hi Tudor,
>
> On Thu, Dec 15, 2022 at 02:58:10PM +0200, Tudor Ambarus wrote:
>> Hi, Gao, Chao, Yue, Jeffle, all,
>>
>> Syzbot reported a bug at [1] that is reproducible in upstream kernel
>> since
>> commit 47e4937a4a7c ("erofs: move erofs out of staging")
>>
>> and up to (inclusively)
>> commit 2bfab9c0edac ("erofs: record the longest decompressed size in this
>> round")
>>
>> The first commit that makes this bug go away is:
>> commit 267f2492c8f7 ("erofs: introduce multi-reference pclusters
>> (fully-referenced)")
>> Although, this commit looks like new support and not like an explicit
>> bug fix.
>>
>> I'd like to fix the lts kernels. I'm happy to try any suggestions or do
>> some tests. Please let me know if the bug rings a bell.
>
> Thanks for your report. I will try to seek time to look at this this
> weekend. But just from your description, I guess that there could be
> something wrong on several compressed extents pointing to the same
> blocks (i.e. the same pcluster). But prior to commit 267f2492c8f7, such
> image is always considered as corrupted images.
>
> Anyway, I will look into that and see if there could be alternative ways
> to fix this rather than backport the whole multi-reference pcluster
> feature. Yet I think there is no need to worry, since such an image is
> heavily crafted and should not be used as a normal image.

I guess backporting the multi-reference pcluster feature is not an
option for stable, since only fixes are accepted. If you think it is worth
fixing the problem without adding new support, I can dive into it.
Let me know what you think.

Thanks,
ta

>
> Thanks,
> Gao Xiang
>
>>
>> Thanks,
>> ta
>>
>> [1]
>> https://syzkaller.appspot.com/bug?id=a9b56d324d0de9233ad80633826fac76836d792a

2022-12-21 02:51:02

by Gao Xiang

Subject: Re: BUG: unable to handle kernel paging request in z_erofs_decompress_queue

On Tue, Dec 20, 2022 at 05:42:21PM +0200, Tudor Ambarus wrote:
> Hi, Gao,
>
> On 15.12.2022 16:24, Gao Xiang wrote:
> > Hi Tudor,
> >
> > On Thu, Dec 15, 2022 at 02:58:10PM +0200, Tudor Ambarus wrote:
> > > Hi, Gao, Chao, Yue, Jeffle, all,
> > >
> > > Syzbot reported a bug at [1] that is reproducible in upstream kernel
> > > since
> > > commit 47e4937a4a7c ("erofs: move erofs out of staging")
> > >
> > > and up to (inclusively)
> > > commit 2bfab9c0edac ("erofs: record the longest decompressed size in this
> > > round")
> > >
> > > The first commit that makes this bug go away is:
> > > commit 267f2492c8f7 ("erofs: introduce multi-reference pclusters
> > > (fully-referenced)")
> > > Although, this commit looks like new support and not like an explicit
> > > bug fix.
> > >
> > > I'd like to fix the lts kernels. I'm happy to try any suggestions or do
> > > some tests. Please let me know if the bug rings a bell.
> >
> > Thanks for your report. I will try to seek time to look at this this
> > weekend. But just from your description, I guess that there could be
> > something wrong on several compressed extents pointing to the same
> > blocks (i.e. the same pcluster). But prior to commit 267f2492c8f7, such
> > image is always considered as corrupted images.
> >
> > Anyway, I will look into that and see if there could be alternative ways
> > to fix this rather than backport the whole multi-reference pcluster
> > feature. Yet I think there is no need to worry, since such an image is
> > heavily crafted and should not be used as a normal image.
>
> I guess to backport the multi-reference pcluster feature is not an
> option for stable - just fixes are accepted. If you think it is worth
> fixing the problem without adding new support, I can dive into it.
> Let me know what you think.

Thanks. I have been quite busy these days, partly because EROFS is not
the only part of my work.

Even though I only have some wild guesses so far, if you are interested,
I think you could use dump.erofs or filefrag -v to dump the extents of
the related inode (assuming that is the root inode) and first see if
anything looks strange.

That would help me work out what could lead to this issue.

Thanks,
Gao Xiang
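
A minimal sketch of how a `filefrag -v` dump could be screened for the
situation guessed at above, i.e. several extents pointing to the same
physical blocks. It assumes the usual `filefrag -v` row layout
(ext, logical range, physical range, length) and assumes that two
extents with the same physical start block refer to the same pcluster.
The helper names and the SAMPLE text are hypothetical and are not
output from the syzbot image.

```python
import re
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Extent(NamedTuple):
    index: int
    logical_start: int
    logical_end: int
    physical_start: int
    physical_end: int


# Matches extent rows of `filefrag -v`; the header row has no leading
# number and is therefore skipped.
ROW = re.compile(
    r"^\s*(\d+):\s*(\d+)\.\.\s*(\d+):\s*(\d+)\.\.\s*(\d+):\s*(\d+):"
)


def parse_extents(text: str) -> List[Extent]:
    """Collect every extent row found in the given filefrag -v text."""
    extents = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            idx, ls, le, ps, pe, _length = map(int, m.groups())
            extents.append(Extent(idx, ls, le, ps, pe))
    return extents


def extents_sharing_start(extents: List[Extent]) -> Dict[int, List[int]]:
    """Map each physical start block used by 2+ extents to their indexes."""
    groups = defaultdict(list)
    for e in extents:
        groups[e.physical_start].append(e.index)
    return {start: idxs for start, idxs in groups.items() if len(idxs) > 1}


# Hypothetical dump: extents 0 and 1 both start at physical block 10.
SAMPLE = """\
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..       3:         10..        13:      4:             encoded
   1:        4..       7:         10..        13:      4:         14: encoded
   2:        8..      11:         14..        17:      4:             last,encoded,eof
"""

if __name__ == "__main__":
    print(extents_sharing_start(parse_extents(SAMPLE)))
```

Only the physical start is compared here, because for compressed
extents the printed length is the logical one, so the printed physical
end may overstate how many blocks the pcluster really occupies.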