Hi, folks. Please CC me on replies, I'm not subscribed to the list. The
downstream bug report for this is
https://bugzilla.redhat.com/show_bug.cgi?id=2274770 .
I maintain Fedora's openQA instance - https://openqa.fedoraproject.org/
(openQA is an automated testing system which runs jobs on qemu VMs,
inputting keyboard and mouse events via VNC, and monitoring results via
screenshots and the serial console).
We have several tests that involve doing an install of Fedora with root
storage encrypted and then booting it. Some of these install enough
packages for us to hit the 'graphical' mode of plymouth (the bootsplash
manager thingy), so we see a graphical passphrase prompt like
https://openqa.fedoraproject.org/tests/2642868#step/_graphical_wait_login/3
; some are minimal installs, so we see a text prompt like
https://openqa.fedoraproject.org/tests/2642845#step/disk_guided_encrypted_postinstall/1
.
Recently I switched up our configuration so most of these tests run on
UEFI VMs (previously they mostly ran on BIOS VMs). When I did that, the
tests that hit the graphical prompt started failing frequently on Fedora
Rawhide. The tests that hit the text prompt do not seem to be affected.
At first I figured this was caused by a plymouth change, but some
testing indicates it's actually related to kernel version: it seems to
have been introduced in kernel 6.9. Fedora 40 uses kernel 6.8, so tests
on F40 are not usually affected by this, but I engineered some runs of
an affected test on an F40 install with kernel 6.9, and they hit the bug.
So to summarize, we hit the bug when all the following conditions are met:
* Running on UEFI qemu-kvm VM
* Graphical passphrase prompt encountered on boot
* Running kernel 6.9
When it sees the passphrase prompt, the test system types the correct
password. When the bug happens, this input seems to simply be ignored -
plymouth does not echo dots back to the screen representing the typed
characters, and on hitting enter the system does not attempt to proceed
with decryption. (Unfortunately this also means we don't get any logs
from the failure, as the test system needs a booted system to be able to
upload any logs).
Looking at results from the last month and a half, the bug happens on
about 30% of the tests run.
I have reproduced this manually in a similar VM, but did not yet manage
to reproduce it on hardware (which is unfortunate, as it'd make it
somewhat easier to attempt some kind of bisect).
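Since the failure is intermittent, a bisect would have to boot each candidate kernel several times before calling it "good". Here is a rough sketch of that arithmetic. It assumes the roughly 30% per-boot failure rate above carries over to every affected kernel and that boots are independent. Both are assumptions, and the function name and the 5% threshold are purely illustrative.

```python
import math


def boots_needed(failure_rate: float, max_false_good: float = 0.05) -> int:
    """Return the smallest n with (1 - failure_rate) ** n <= max_false_good.

    failure_rate: assumed chance that one boot of an affected kernel
        shows the bug (about 0.3 going by the openQA results).
    max_false_good: accepted risk of marking an affected kernel "good"
        because every test boot happened to work (hypothetical threshold).
    """
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    if not 0.0 < max_false_good < 1.0:
        raise ValueError("max_false_good must be strictly between 0 and 1")
    # Chance of n clean boots in a row on an affected kernel is
    # (1 - failure_rate) ** n; solve for n and round up.
    return math.ceil(math.log(max_false_good) / math.log(1.0 - failure_rate))


if __name__ == "__main__":
    print(boots_needed(0.3))        # clean boots needed per bisect step
    print(boots_needed(0.3, 0.01))  # same, with a stricter threshold
```

Under those assumptions a kernel would need about nine clean boots in a row before being marked good, which is why reproducing this outside openQA matters so much for bisecting.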
The earliest build I can say for sure the bug happened with is
kernel-6.9.0-0.rc0.20240322git8e938e398669.14.fc41 .
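For anyone who wants to map that build back to an upstream commit, here is a minimal sketch that pulls the snapshot date and abbreviated commit hash out of the package name. It assumes the `YYYYMMDDgit<hash>` pattern seen in the string above holds for other snapshot builds too. The regular expression and the helper name are illustrative, not part of any Fedora tooling.

```python
import re
from datetime import date, datetime
from typing import Optional, Tuple

# Assumed pattern: ".<8-digit date>git<abbreviated hex commit>." inside
# the release field, as in the build name quoted above.
SNAPSHOT_RE = re.compile(r"\.(?P<date>\d{8})git(?P<commit>[0-9a-f]{7,40})\.")


def parse_snapshot(build: str) -> Optional[Tuple[date, str]]:
    """Return (snapshot date, abbreviated commit), or None if absent."""
    match = SNAPSHOT_RE.search(build)
    if match is None:
        return None
    snapshot_date = datetime.strptime(match["date"], "%Y%m%d").date()
    return snapshot_date, match["commit"]


if __name__ == "__main__":
    build = "kernel-6.9.0-0.rc0.20240322git8e938e398669.14.fc41"
    print(parse_snapshot(build))
```

The extracted hash could then serve as a known-bad starting point for a bisect against v6.8.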
--
Adam Williamson (he/him/his)
Fedora QA
Fedora Chat: @adamwill:fedora.im | Mastodon: @[email protected]
https://www.happyassassin.net
On Fri, 2024-05-24 at 09:08 -0700, Adam Williamson wrote:
> Hi, folks. Please CC me on replies, I'm not subscribed to the list.
> The
> downstream bug report for this is
> https://bugzilla.redhat.com/show_bug.cgi?id=2274770 .
1) FYI : I see same thing booting dell xps 13 9320 laptop (no VM) on
mainline 6.10-rc1 (started sometime early 6.9 as far as I recall)
bios 2.11.0
cpu i9-12900K / Raptor Lake-P [Iris Xe Graphics]
Boot using systemd-boot
dmesg (trimmed) attached.
2) Note there is a crash later in boot (in mei_csi_probe). Assume this
crash is a separate issue.
gene
On Mon, 2024-05-27 at 11:18 -0400, Genes Lists wrote:
>
> cpu i9-12900K / Raptor Lake-P [Iris Xe Graphics]
>
Sorry - wrong cpu - the correct cpu is:
13th Gen Intel(R) Core(TM) i7-1360P
Raptor Lake-P [Iris Xe Graphics] (rev 04)
--
Gene
[CCing the regression list, as it should be in the loop for regressions:
https://docs.kernel.org/admin-guide/reporting-regressions.html]
Side note: a bug report just to the LKML (e.g. w/o any subsystem lists
or maintainers) is unlikely to gain traction; I considered adding the
DRM folks, but let's try one thing first:
On 27.05.24 17:18, Genes Lists wrote:
> On Fri, 2024-05-24 at 09:08 -0700, Adam Williamson wrote:
>> Hi, folks. Please CC me on replies, I'm not subscribed to the list.
>> The
>> downstream bug report for this is
>> https://bugzilla.redhat.com/show_bug.cgi?id=2274770 .
>
> 1) FYI : I see same thing booting dell xps 13 9320 laptop (no VM) on
> mainline 6.10-rc1 (started sometime early 6.9 as far as I recall)
>
> bios 2.11.0
>
> cpu i9-12900K / Raptor Lake-P [Iris Xe Graphics]
> Boot using systemd-boot
>
> dmesg (trimmed) attached.
Does this happen every boot or only sometimes? Could you maybe upload
the full dmesg from a boot where things worked and one where only the
text UI came up? It's just a shot in the dark, but maybe that will tell
us where the root of the problem might be.
> 2) Note there is a crash later in boot (in mei_csi_probe). Assume this
> crash is a separate issue.
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.
P.S.: let me add this to the regression tracking
#regzbot report: /
#regzbot introduced: v6.8..v6.9
#regzbot summary: Intermittent inability to type in graphical Plymouth
on UEFI VMs since kernel 6.9
On Wed, 2024-05-29 at 15:01 +0200, Linux regression tracking (Thorsten
Leemhuis) wrote:
> >
> > cpu i9-12900K / Raptor Lake-P [Iris Xe Graphics]
Sorry, this should be: 13th Gen Intel(R) Core(TM) i7-1360P
> >
>
> Does this happen every boot or only sometimes? Could you maybe upload
> the full dmesg from a boot where things worked and one where only the
For me it is every boot - the first few key strokes are accepted but no
asterisks are displayed - and it works fine even though fewer asterisks
are displayed than characters typed.
full dmesg attached.
>
> P.S.: let me add this to the regression tracking
>
> #regzbot report: /
> #regzbot introduced: v6.8..v6.9
> #regzbot summary: Intermittent inability to type in graphical
> Plymouth
> on UEFI VMs since kernel 6.9
>
Thank you.
--
Gene
On 29.05.24 15:35, Genes Lists wrote:
> On Wed, 2024-05-29 at 15:01 +0200, Linux regression tracking (Thorsten
> Leemhuis) wrote:
>>>
>>> cpu i9-12900K / Raptor Lake-P [Iris Xe Graphics]
> Sorry, this should be: 13th Gen Intel(R) Core(TM) i7-1360P
>>>
>>
>> Does this happen every boot or only sometimes? Could you maybe upload
>> the full dmesg from a boot where things worked and one where only the
> For me it is every boot
Ahh, good to know. Would you be able to bisect the problem? That would
help tremendously!
https://docs.kernel.org/admin-guide/verify-bugs-and-bisect-regressions.html
> - the first few key strokes are accepted but no
> asterisks are displayed - and it works fine even though fewer asterisks
> are displayed than characters typed.
Strange.
> full dmesg attached.
Do you by chance also have a dmesg at hand for a boot where everything
worked normally?
Adam, do you maybe have dmesg output for the affected cases somewhere?
>> P.S.: let me add this to the regression tracking
>>
>> #regzbot report: /
>> #regzbot introduced: v6.8..v6.9
>> #regzbot summary: Intermittent inability to type in graphical
>> Plymouth
>> on UEFI VMs since kernel 6.9
>
> Thank you.
np; but without a bisection or at least locating the subsystem that is
causing this we might not get any further. :-/
Ciao, Thorsten
On 2024-05-29 06:35, Genes Lists wrote:
> On Wed, 2024-05-29 at 15:01 +0200, Linux regression tracking (Thorsten
> Leemhuis) wrote:
>>>
>>> cpu i9-12900K / Raptor Lake-P [Iris Xe Graphics]
>
> Sorry, this should be: 13th Gen Intel(R) Core(TM) i7-1360P
>
>
>>>
>>
>> Does this happen every boot or only sometimes? Could you maybe upload
>> the full dmesg from a boot where things worked and one where only the
>
> For me it is every boot - the first few key strokes are accepted but no
> asterisks are displayed - and it works fine even though fewer asterisks
> are displayed than characters typed.
That sounds different from my case. In openQA (and the one time I saw it
live), the keystrokes do not appear to have any effect - no dots are
echoed at all, and hitting enter does not submit the passphrase.
I have no idea where to send emails reporting kernel bugs. It's a very
difficult world to penetrate if you're not already in it. A proper bug
tracker would make things much easier.
--
Adam Williamson (he/him/his)
Fedora QA
Fedora Chat: @adamwill:fedora.im | Mastodon: @[email protected]
https://www.happyassassin.net
On 29.05.24 17:09, Adam Williamson wrote:
> On 2024-05-29 06:35, Genes Lists wrote:
>> On Wed, 2024-05-29 at 15:01 +0200, Linux regression tracking (Thorsten
>> Leemhuis) wrote:
>>>>
>>>> cpu i9-12900K / Raptor Lake-P [Iris Xe Graphics]
>>
>> Sorry, this should be: 13th Gen Intel(R) Core(TM) i7-1360P
>>>
>>> Does this happen every boot or only sometimes? Could you maybe upload
>>> the full dmesg from a boot where things worked and one where only the
>>
>> For me it is every boot - the first few key strokes are accepted but no
>> asterisks are displayed - and it works fine even though fewer asterisks
>> are displayed than characters typed.
>
> That sounds different from my case.
Hmm, bummer. That might have made things a lot easier...
> In openQA (and the one time I saw it
> live), the keystrokes do not appear to have any effect - no dots are
> echoed at all, and hitting enter does not submit the passphrase.
And no dmesg for working and non-working I suppose? Argh. :-/
> I have no idea where to send emails reporting kernel bugs. It's a very
> difficult world to penetrate
I totally agree so far.
> if you're not already in it.
Up to a point that's what I'm here for. But right now I'm a bit
uncertain who to involve. The input folks? The drm maintainers? But
without a bit more data I doubt any of them will take a closer look at
the problem.
> A proper bug tracker would make things much easier.
Not really I'd say, as the problem is the same here: someone needs to
triage bugs and assign them to developers that are willing to look into
them.
Ciao, Thorsten
On Wed, 2024-05-29 at 16:04 +0200, Linux regression tracking (Thorsten
Leemhuis) wrote:
>
> np; but without a bisecting or at least locating the subsystem that
>
Yep. I will set up luks + plymouth on a (different) machine first
instead of my primary laptop. If that reproduces the issue, then
bisect should be quite doable. Will take a little time but will work on
it soon.
--
Gene
On Wed, 2024-05-29 at 14:17 -0400, Genes Lists wrote:
> >
>
> Yep. I will set up luks + plymouth on a (different) machine first
> instead of my primary laptop.
Unfortunately the second (older) machine works fine. I rebooted
it multiple times with mainline and 6.9.2 and it worked as
expected every time.
So this will be more difficult and take longer but I will try and find
time to do a bisect on my primary laptop which shows the symptoms.
Gene
On 2024-05-29 10:12, Linux regression tracking (Thorsten Leemhuis) wrote:
> On 29.05.24 17:09, Adam Williamson wrote:
>> On 2024-05-29 06:35, Genes Lists wrote:
>>> On Wed, 2024-05-29 at 15:01 +0200, Linux regression tracking (Thorsten
>>> Leemhuis) wrote:
>>>>>
>>>>> cpu i9-12900K / Raptor Lake-P [Iris Xe Graphics]
>>>
>>> Sorry, this should be: 13th Gen Intel(R) Core(TM) i7-1360P
>>>>
>>>> Does this happen every boot or only sometimes? Could you maybe upload
>>>> the full dmesg from a boot where things worked and one where only the
>>>
>>> For me it is every boot - the first few key strokes are accepted but no
>>> asterisks are displayed - and it works fine even though fewer asterisks
>>> are displayed than characters typed.
>>
>> That sounds different from my case.
>
> Hmm, bummer. That might have made things a lot easier...
Well, it turns out I can reproduce this fairly easily in a VM on my
regular laptop, which helps a bit. I've attached some logs to the
downstream bug at https://bugzilla.redhat.com/show_bug.cgi?id=2274770 ,
and Ray is taking a look at it ATM. More eyes appreciated.
--
Adam Williamson (he/him/his)
Fedora QA
Fedora Chat: @adamwill:fedora.im | Mastodon: @[email protected]
https://www.happyassassin.net