2020-09-28 18:05:15

by Nick Terrell

Subject: Re: PROBLEM: zstd bzImage decompression fails for some x86_32 config on 5.9-rc1



> On Sep 28, 2020, at 1:55 AM, Feng Tang <[email protected]> wrote:
>
> Hi Nick,
>
> 0day has found some kernel decompression failures since 5.9-rc1 (X86_32
> build), and they could be related to the ZSTD code, though initially we bisected
> to some other commits.
>
> The error messages are:
>
> early console in setup code
> Wrong EFI loader signature.
> early console in extract_kernel
> input_data: 0x046f50b4
> input_len: 0x01ebbeb6
> output: 0x01000000
> output_len: 0x04fc535c
> kernel_total_size: 0x055f5000
> needed_size: 0x055f5000
>
> Decompressing Linux...
>
> ZSTD-compressed data is corrupt
>
> This can be reproduced by compiling the kernel with the attached config
> and booting it with QEMU.
>
> We suspect it is related to the kernel size, as we only see it on big
> kernels. Some more info:
>
> * If we remove a lot of kernel config options to build a much smaller kernel,
> it boots fine
>
> * If we change the zstd algorithm from zstd22 to zstd19, the kernel
> boots fine with the patch below
>
> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> index 3962f59..8fe71ba 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -147,7 +147,7 @@ $(obj)/vmlinux.bin.lzo: $(vmlinux.bin.all-y) FORCE
> $(obj)/vmlinux.bin.zst: $(vmlinux.bin.all-y) FORCE
> - $(call if_changed,zstd22)
> + $(call if_changed,zstd)
>
>
> Please let me know if you need more info, and sorry for the late report;
> we have only just tracked it down to this point.

Thanks for the report, I will look into it today.

Best,
Nick

> Thanks,
> Feng
>
>
>
> <zstd_x86_32.config>


2020-09-29 05:18:57

by Nick Terrell

Subject: Re: PROBLEM: zstd bzImage decompression fails for some x86_32 config on 5.9-rc1



> On Sep 28, 2020, at 11:02 AM, Nick Terrell <[email protected]> wrote:
>
>
>
>> On Sep 28, 2020, at 1:55 AM, Feng Tang <[email protected]> wrote:
>>
>> Hi Nick,
>>
>> 0day has found some kernel decompression failures since 5.9-rc1 (X86_32
>> build), and they could be related to the ZSTD code, though initially we bisected
>> to some other commits.
>>
>> The error messages are:
>>
>> early console in setup code
>> Wrong EFI loader signature.
>> early console in extract_kernel
>> input_data: 0x046f50b4
>> input_len: 0x01ebbeb6
>> output: 0x01000000
>> output_len: 0x04fc535c
>> kernel_total_size: 0x055f5000
>> needed_size: 0x055f5000
>>
>> Decompressing Linux...
>>
>> ZSTD-compressed data is corrupt
>>
>> This can be reproduced by compiling the kernel with the attached config
>> and booting it with QEMU.
>>
>> We suspect it is related to the kernel size, as we only see it on big
>> kernels. Some more info:
>>
>> * If we remove a lot of kernel config options to build a much smaller kernel,
>> it boots fine
>>
>> * If we change the zstd algorithm from zstd22 to zstd19, the kernel
>> boots fine with the patch below
>>
>> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
>> index 3962f59..8fe71ba 100644
>> --- a/arch/x86/boot/compressed/Makefile
>> +++ b/arch/x86/boot/compressed/Makefile
>> @@ -147,7 +147,7 @@ $(obj)/vmlinux.bin.lzo: $(vmlinux.bin.all-y) FORCE
>> $(obj)/vmlinux.bin.zst: $(vmlinux.bin.all-y) FORCE
>> - $(call if_changed,zstd22)
>> + $(call if_changed,zstd)
>>
>>
>> Please let me know if you need more info, and sorry for the late report;
>> we have only just tracked it down to this point.
>
> Thanks for the report, I will look into it today.

CC: Petr Malat

I’ve successfully reproduced the failure and found the issue. It turns out that
this patch [0] from Petr Malat fixes it. As I mentioned in that thread, his
fix corresponds to this upstream commit [1].

Can we get Petr's patch merged into v5.9?

This bug only happens when the window size is > 8 MB. A non-kernel workaround
would be to compress the kernel at level 19, which uses an 8 MB window size,
instead of level 22, which uses a 128 MB window size.

The reason it only shows up for large kernels is that the code is only buggy
when an offset > 8 MB is used, so a kernel <= 8 MB can't trigger the bug.
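
As a rough illustration of the reasoning above, here is a minimal Python sketch
(not the zstd or kernel code). It assumes the window sizes quoted in this
message (8 MB at level 19, 128 MB at level 22), treats the output_len value from
the report as the uncompressed image size, and uses hypothetical helper names
(max_match_offset, can_trigger_bug). A match offset can exceed neither the
window size nor the amount of data produced, so the largest possible offset is
the smaller of the two.

```python
MIB = 1 << 20

# Assumption: window sizes as stated in the message (not read from zstd itself).
WINDOW_SIZE = {19: 8 * MIB, 22: 128 * MIB}

# Per the message, only offsets strictly above 8 MB reach the buggy path.
BUG_THRESHOLD = 8 * MIB


def max_match_offset(uncompressed_len: int, level: int) -> int:
    """Upper bound on a match offset: limited by both window and data size."""
    return min(uncompressed_len, WINDOW_SIZE[level])


def can_trigger_bug(uncompressed_len: int, level: int) -> bool:
    """True if an offset above the 8 MB threshold is possible at all."""
    return max_match_offset(uncompressed_len, level) > BUG_THRESHOLD


# output_len from the extract_kernel debug output in the report (about 79.8 MiB).
output_len = 0x04FC535C

for level in (19, 22):
    print(f"level {level}: can trigger = {can_trigger_bug(output_len, level)}")
```

With the reported output_len this prints False for level 19 and True for level
22, which matches the observation that switching to level 19 avoids the
failure. A kernel of 8 MB or less yields False at either level.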

Best,
Nick

[0] https://lkml.org/lkml/2020/9/14/94
[1] https://github.com/facebook/zstd/commit/8a5c0c98ae5a7884694589d7a69bc99011add94d

> Best,
> Nick
>
>> Thanks,
>> Feng
>>
>>
>>
>> <zstd_x86_32.config>

2020-09-29 05:48:59

by Feng Tang

Subject: Re: PROBLEM: zstd bzImage decompression fails for some x86_32 config on 5.9-rc1

On Tue, Sep 29, 2020 at 05:15:38AM +0000, Nick Terrell wrote:
>
>
> > On Sep 28, 2020, at 11:02 AM, Nick Terrell <[email protected]> wrote:
> >
> >
> >
> >> On Sep 28, 2020, at 1:55 AM, Feng Tang <[email protected]> wrote:
> >>
> >> Hi Nick,
> >>
> > >> 0day has found some kernel decompression failures since 5.9-rc1 (X86_32
> > >> build), and they could be related to the ZSTD code, though initially we bisected
> > >> to some other commits.
> >>
> >> The error messages are:
> >> Decompressing Linux...
> >>
> >> ZSTD-compressed data is corrupt
> >>
> > >> This can be reproduced by compiling the kernel with the attached config
> > >> and booting it with QEMU.
> > >>
> > >> We suspect it is related to the kernel size, as we only see it on big
> > >> kernels. Some more info:
> > >>
> > >> * If we remove a lot of kernel config options to build a much smaller kernel,
> > >> it boots fine
> > >>
> > >> * If we change the zstd algorithm from zstd22 to zstd19, the kernel
> > >> boots fine with the patch below
> >>
> > >> Please let me know if you need more info, and sorry for the late report;
> > >> we have only just tracked it down to this point.
> >
> > Thanks for the report, I will look into it today.
>
> CC: Petr Malat
>
> I’ve successfully reproduced the failure and found the issue. It turns out that
> this patch [0] from Petr Malat fixes it. As I mentioned in that thread, his
> fix corresponds to this upstream commit [1].

Glad to know there is already a fix.

> Can we get Petr's patch merged into v5.9?
>
> This bug only happens when the window size is > 8 MB. A non-kernel workaround
> would be to compress the kernel at level 19, which uses an 8 MB window size,
> instead of level 22, which uses a 128 MB window size.
>
> The reason it only shows up for large kernels is that the code is only buggy
> when an offset > 8 MB is used, so a kernel <= 8 MB can't trigger the bug.
>
> Best,
> Nick
>
> [0] https://lkml.org/lkml/2020/9/14/94

With this patch, all the failing cases on my side now boot fine.

Tested-by: Feng Tang <[email protected]>

Thanks,
Feng

> [1] https://github.com/facebook/zstd/commit/8a5c0c98ae5a7884694589d7a69bc99011add94d


2020-10-03 18:51:21

by Sedat Dilek

Subject: Re: PROBLEM: zstd bzImage decompression fails for some x86_32 config on 5.9-rc1

On Tue, Sep 29, 2020 at 7:47 AM Feng Tang <[email protected]> wrote:
>
> On Tue, Sep 29, 2020 at 05:15:38AM +0000, Nick Terrell wrote:
> >
> >
> > > On Sep 28, 2020, at 11:02 AM, Nick Terrell <[email protected]> wrote:
> > >
> > >
> > >
> > >> On Sep 28, 2020, at 1:55 AM, Feng Tang <[email protected]> wrote:
> > >>
> > >> Hi Nick,
> > >>
> > > >> 0day has found some kernel decompression failures since 5.9-rc1 (X86_32
> > > >> build), and they could be related to the ZSTD code, though initially we bisected
> > > >> to some other commits.
> > >>
> > >> The error messages are:
> > >> Decompressing Linux...
> > >>
> > >> ZSTD-compressed data is corrupt
> > >>
> > > >> This can be reproduced by compiling the kernel with the attached config
> > > >> and booting it with QEMU.
> > > >>
> > > >> We suspect it is related to the kernel size, as we only see it on big
> > > >> kernels. Some more info:
> > > >>
> > > >> * If we remove a lot of kernel config options to build a much smaller kernel,
> > > >> it boots fine
> > > >>
> > > >> * If we change the zstd algorithm from zstd22 to zstd19, the kernel
> > > >> boots fine with the patch below
> > >>
> > > >> Please let me know if you need more info, and sorry for the late report;
> > > >> we have only just tracked it down to this point.
> > >
> > > Thanks for the report, I will look into it today.
> >
> > CC: Petr Malat
> >
> > I’ve successfully reproduced the failure and found the issue. It turns out that
> > this patch [0] from Petr Malat fixes it. As I mentioned in that thread, his
> > fix corresponds to this upstream commit [1].
>
> Glad to know there is already a fix.
>
> > Can we get Petr's patch merged into v5.9?
> >
> > This bug only happens when the window size is > 8 MB. A non-kernel workaround
> > would be to compress the kernel at level 19, which uses an 8 MB window size,
> > instead of level 22, which uses a 128 MB window size.
> >
> > The reason it only shows up for large kernels is that the code is only buggy
> > when an offset > 8 MB is used, so a kernel <= 8 MB can't trigger the bug.
> >
> > Best,
> > Nick
> >
> > [0] https://lkml.org/lkml/2020/9/14/94
>
> With this patch, all the failing cases on my side now boot fine.
>
> Tested-by: Feng Tang <[email protected]>
>

I applied this patch to check whether it is OK on x86 64-bit as well: yes, it is.

Feel free to add my:

Tested-by: Sedat Dilek <[email protected]>

- Sedat -

> Thanks,
> Feng
>
> > [1] https://github.com/facebook/zstd/commit/8a5c0c98ae5a7884694589d7a69bc99011add94d
>
>