2023-09-15 05:55:40

by Cabiddu, Giovanni

Subject: Re: Linux 6.1.52 regression: Intel QAT kernel panic (memory corruption)

On Thu, Sep 14, 2023 at 10:27:22PM -0700, Kyle Sanderson wrote:
> Hello Intel QAT Maintainers,
>
> It looks like QAT has regressed again. The present symptom is just
> straight up memory corruption. I was running Canonical 6.1.0-1017-oem
> and it doesn't happen, with 6.1.0-1020-oem and 6.1.0-1021-oem it does.
> I don't know what these map to upstream, however with NixOS installed
> the same corruption failure occurs on 6.1.52. The stack traces give
> illegal instructions and all kinds of badness across all modules when
> the device is simply present on the system, resulting in a hung
> system, or a multitude of processes crashing and the system failing to
> start. Disabling the device in the system BIOS results in a working
> system, and no extreme corruption. kmem_cache_alloc_node is the common
> fixture in the traces (I don't have a serial line), but I suspect
> that's not where the problem is. The corruption this time happens
> without block crypto being involved, and simply booting the installer
> from a USB stick.
This is probably related to [1].
Versions from 6.1.39 to 6.1.52 are affected. Fixed in v6.1.53.

[1] https://www.spinics.net/lists/stable/msg678947.html
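
To check whether a given machine falls inside the range quoted above, a minimal sketch along these lines may help. It assumes the kernel reports an upstream-style release string (e.g. "6.1.52"); the names parse_release and is_affected are hypothetical helpers made up for this illustration, not part of any existing tool.

```python
import platform
import re

# Range taken from the message above: 6.1.39 through 6.1.52 are affected,
# 6.1.53 carries the fix.
AFFECTED_FIRST = (6, 1, 39)
AFFECTED_LAST = (6, 1, 52)


def parse_release(release):
    """Return (major, minor, patch) from a release string such as '6.1.52'.

    Hypothetical helper: anything after the numeric prefix is ignored,
    and a missing patch level is treated as 0.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def is_affected(release):
    """True if the release lies within the affected 6.1.y range."""
    return AFFECTED_FIRST <= parse_release(release) <= AFFECTED_LAST


if __name__ == "__main__":
    running = platform.release()
    print(running, "affected" if is_affected(running) else "not in affected range")
```

Distribution kernels such as 6.1.0-1021-oem do not encode the upstream stable patch level in the release string, so this check would not flag them; for those, the distribution's changelog is the place to confirm which upstream version was merged.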

Regards,

--
Giovanni


2023-10-05 14:00:01

by Kyle Sanderson

Subject: Re: Linux 6.1.52 regression: Intel QAT kernel panic (memory corruption)

On Thu, Sep 14, 2023 at 10:55 PM Giovanni Cabiddu
<[email protected]> wrote:
>
> On Thu, Sep 14, 2023 at 10:27:22PM -0700, Kyle Sanderson wrote:
> > Hello Intel QAT Maintainers,
> >
> > It looks like QAT has regressed again. The present symptom is just
> > straight up memory corruption. I was running Canonical 6.1.0-1017-oem
> > and it doesn't happen, with 6.1.0-1020-oem and 6.1.0-1021-oem it does.
> > I don't know what these map to upstream, however with NixOS installed
> > the same corruption failure occurs on 6.1.52.
> This is probably related to [1].
> Versions from 6.1.39 to 6.1.52 are affected. Fixed in v6.1.53.
>
> [1] https://www.spinics.net/lists/stable/msg678947.html
>
> Regards,
>
> --
> Giovanni

Thank you Giovanni - that appears to have been it. Ubuntu
6.1.0-1023-oem (v6.1.53) no longer reproduces the issue.

K.
