2023-09-15 11:59:32

by Kyle Sanderson

Subject: Linux 6.1.52 regression: Intel QAT kernel panic (memory corruption)

Hello Intel QAT Maintainers,

It looks like QAT has regressed again. The present symptom is outright
memory corruption. Canonical 6.1.0-1017-oem does not show the problem;
6.1.0-1020-oem and 6.1.0-1021-oem do. I don't know what these map to
upstream, but with NixOS installed the same corruption occurs on
6.1.52. The stack traces show illegal instructions and all kinds of
badness across all modules when the device is simply present on the
system, resulting in either a hung system or a multitude of processes
crashing and the system failing to start. Disabling the device in the
system BIOS results in a working system with no corruption.
kmem_cache_alloc_node is the common fixture in the traces (I don't
have a serial line), but I suspect that's not where the problem is.
This time the corruption happens without block crypto being involved,
simply on booting the installer from a USB stick.

I genuinely do not understand how this keeps breaking in these
critical contexts, completely hosing the machine. I was under the
impression the device had been disabled on this box the entire time
since the last run-in with this, but it must have gotten flipped back
on when I did a firmware update back in December 2022.

Kyle.