Hi All,
Yesterday I got a new Lenovo ThinkPad X1 Yoga Gen 7 laptop. Since I plan
to make this my new day-to-day laptop, I have copied over the entire
rootfs, /home, etc. from my current laptop to avoid having to tweak
everything to my liking again.
This meant I had an initramfs generated for the other laptop, which should
be fine since both are Intel machines, and the old 5.19.y initramfs images
worked fine. But 6.0.0 crashed with what seems like random memory
corruption (list integrity checks failing) until I regenerated the initrd ...
Comparing the old vs regenerated initrds showed no relevant differences,
which made me think this is a CPU ucode issue (the ucode is prefixed
to the initrd for early microcode loading).
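For reference, early microcode loading expects an uncompressed cpio archive at the very start of the initrd, containing kernel/x86/microcode/GenuineIntel.bin on Intel systems. A minimal Python sketch for checking whether an image carries such a prefix is below. The function names are made up for illustration, it only understands the "newc" cpio format, and it does not tell you whether the microcode inside matches the CPU the image is booted on.

```python
HEADER_LEN = 110  # "070701" magic followed by 13 fields of 8 hex digits each
MICROCODE_PATH = "kernel/x86/microcode/GenuineIntel.bin"


def _align4(n: int) -> int:
    """Round n up to the next multiple of 4 (newc pads names and data)."""
    return (n + 3) & ~3


def early_cpio_entries(image: bytes) -> list[str]:
    """List entry names in an uncompressed newc cpio at the start of image.

    Returns an empty list if the image does not start with a newc header,
    e.g. because it starts directly with a compressed initramfs.
    """
    names = []
    offset = 0
    while image[offset:offset + 6] == b"070701":
        header = image[offset:offset + HEADER_LEN]
        if len(header) < HEADER_LEN:
            break
        try:
            filesize = int(header[54:62], 16)   # 7th field
            namesize = int(header[94:102], 16)  # 12th field, includes the NUL
        except ValueError:
            break
        name_start = offset + HEADER_LEN
        name = image[name_start:name_start + namesize - 1].decode("ascii", "replace")
        if name == "TRAILER!!!":
            break
        names.append(name)
        data_start = _align4(name_start + namesize)
        offset = _align4(data_start + filesize)
    return names


def has_intel_microcode(image: bytes) -> bool:
    """True if the early cpio prefix contains the Intel microcode blob path."""
    return MICROCODE_PATH in early_cpio_entries(image)
```

To use it, read the initrd file in binary mode and pass its contents to has_intel_microcode(). A False result only means no Intel microcode prefix was found; a True result says nothing about which CPU models the blob covers.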
After some tests I have the following observations with 6.0.0:
1. The least stable is the old initrd (so with the wrong
ucode prefixed); this crashes before ever reaching gdm.
I believe that this is caused by late microcode loading
kicking in in this case (I thought that was being removed?),
and doing late microcode loading on the i7-1260P with its
mix of P + E cores seems to seriously mess things up.
2. Slightly more stable, lasting at least a few minutes
before crashing, is using dis_ucode_ldr.
3. Using nomodeset seems to stabilize things even with
the old initrd with the wrong microcode prefixed.
4. 5.19, with an old initrd and with normal modesetting
enabled, works fine, so in a way this is a 6.0.0 regression.
5. Using 6.0 with the new initrd with the new microcode
seems mostly stable, although sometimes this seems to
hang very early during boot, esp. if a previous boot
crashed and I have not run this for a long time yet.
6. After crashes it seems to be necessary to power-cycle
the machine to get things back in working condition.
With 6.0 the following WARN triggers:
drivers/gpu/drm/i915/display/intel_bios.c:477:
drm_WARN(&i915->drm, min_size == 0,
"Block %d min_size is zero\n", section_id);
Since nomodeset helps, this might be quite relevant. In 5.19.13
this does not happen, but I'm not sure if 5.19 has this check
at all.
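To see which block the warning refers to, the kernel log can be searched for the format string quoted above. A minimal Python sketch follows; the function name is hypothetical, and the sample log line in the comment is illustrative only (the prefix before "Block" depends on the system).

```python
import re

# Matches the message produced by the drm_WARN format string
# "Block %d min_size is zero\n" and captures the section_id.
WARN_RE = re.compile(r"Block (\d+) min_size is zero")


def zero_min_size_blocks(log_text: str) -> list[int]:
    """Return the distinct section_id values reported in log_text, sorted."""
    return sorted({int(m.group(1)) for m in WARN_RE.finditer(log_text)})


# Illustrative input only, not copied from a real log:
#   "[    2.1] i915 0000:00:02.0: [drm] Block 42 min_size is zero"
```

The text passed in would typically be the output of dmesg or journalctl -k, captured to a string.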
There is a 2022/10/07 BIOS update available from Lenovo which includes
a CPU microcode update. I have not applied this yet in case
people want to investigate this further first.
Regards,
Hans
Hi,
On 10/13/22 22:33, Hans de Goede wrote:
> Hi All,
>
> Yesterday I got a new Lenovo ThinkPad X1 yoga gen 7 laptop, since I plan
> to make this my new day to day laptop I have copied over the entire
> rootfs, /home, etc. from my current laptop to avoid having to tweak
> everything to my liking again.
>
> This meant I had an initramfs generated for the other laptop. Which should
> be fine since both are Intel machines and the old 5.19.y initramfs-es
> worked fine. But 6.0.0 crashed with what seems like random memory
> corruption (list integrity checks failing) until I regenerated the initrd ...
>
> Comparing the old vs regenerated initrds showed no relevant differences,
> which made me think this is a CPU ucode issue (which is pre-fixed
> to the initrd for early microcode loading).
>
> After some tests I have the following obeservations with 6.0.0:
>
> 1. The least stable is the old initrd (so with the wrong
> ucode prefixed) this crashes before ever reaching gdm.
> I believe that this is caused by late microcode loading
> kicking in in this case (I though that was being removed?)
> and doing load microcode loading on the i7-1260P with its
> mix of P + E cores seems to seriously mess things up.
>
> 2. Slightly more stable, lasting at least a few minutes
> before crashing is using dis_ucode_ldr
>
> 3. Using nomodeset seems to stabilize things even with
> the old initrd with the wrong microcode prefixed
>
> 4. 5.19, with an old initrd and with normal modesetting
> enabled works fine, so in a way this is a 6.0.0 regression
>
> 5. Using 6.0 with the new initrd with the new microcode
> seems mostly stable, although sometimes this seems to
> hang very early during boot, esp. if a previous boot
> crashed and I have not run this for a long time yet.
>
> 6. After crashes it seems to be necessary to powercycle
> the machine to get things back in working condition.
>
>
> With 6.0 the following WARN triggers:
> drivers/gpu/drm/i915/display/intel_bios.c:477:
>
> drm_WARN(&i915->drm, min_size == 0,
> "Block %d min_size is zero\n", section_id);
>
> Since nomodeset helps this might be quite relevant, in 5.19.13
> this does not happen, but I'm not sure if 5.19 has this check
> at all.
>
>
> There is a 2022/10/07 BIOS update which includes a CPU microcode
> update available from Lenovo, I have not applied this yet in case
> people want to investigate this further first.
A quick update on this: the microcode being in the initrd or not
seems to be a bit of a red herring. Yesterday the machine crashed
twice at boot with 6.0.0 with an initrd which did correctly have
the alderlake microcode cpio archive prefixed.
With 5.19, by contrast, it boots correctly every time. I will try to
make some time to git bisect this sometime next week. I expect
this is an i915 issue though, since 6.0.0 with nomodeset on
the cmdline does seem to boot successfully every time.
Regards,
Hans
+ Jani and Ville for the intel_bios.c warn - no idea if that is relevant.
Hi,
On 15/10/2022 15:25, Hans de Goede wrote:
> Hi,
>
> On 10/13/22 22:33, Hans de Goede wrote:
>> Hi All,
>>
>> Yesterday I got a new Lenovo ThinkPad X1 yoga gen 7 laptop, since I plan
>> to make this my new day to day laptop I have copied over the entire
>> rootfs, /home, etc. from my current laptop to avoid having to tweak
>> everything to my liking again.
>>
>> This meant I had an initramfs generated for the other laptop. Which should
>> be fine since both are Intel machines and the old 5.19.y initramfs-es
>> worked fine. But 6.0.0 crashed with what seems like random memory
>> corruption (list integrity checks failing) until I regenerated the initrd ...
>>
>> Comparing the old vs regenerated initrds showed no relevant differences,
>> which made me think this is a CPU ucode issue (which is pre-fixed
>> to the initrd for early microcode loading).
>>
>> After some tests I have the following obeservations with 6.0.0:
>>
>> 1. The least stable is the old initrd (so with the wrong
>> ucode prefixed) this crashes before ever reaching gdm.
>> I believe that this is caused by late microcode loading
>> kicking in in this case (I though that was being removed?)
>> and doing load microcode loading on the i7-1260P with its
>> mix of P + E cores seems to seriously mess things up.
>>
>> 2. Slightly more stable, lasting at least a few minutes
>> before crashing is using dis_ucode_ldr
>>
>> 3. Using nomodeset seems to stabilize things even with
>> the old initrd with the wrong microcode prefixed
>>
>> 4. 5.19, with an old initrd and with normal modesetting
>> enabled works fine, so in a way this is a 6.0.0 regression
>>
>> 5. Using 6.0 with the new initrd with the new microcode
>> seems mostly stable, although sometimes this seems to
>> hang very early during boot, esp. if a previous boot
>> crashed and I have not run this for a long time yet.
>>
>> 6. After crashes it seems to be necessary to powercycle
>> the machine to get things back in working condition.
>>
>>
>> With 6.0 the following WARN triggers:
>> drivers/gpu/drm/i915/display/intel_bios.c:477:
>>
>> drm_WARN(&i915->drm, min_size == 0,
>> "Block %d min_size is zero\n", section_id);
>>
>> Since nomodeset helps this might be quite relevant, in 5.19.13
>> this does not happen, but I'm not sure if 5.19 has this check
>> at all.
>>
>>
>> There is a 2022/10/07 BIOS update which includes a CPU microcode
>> update available from Lenovo, I have not applied this yet in case
>> people want to investigate this further first.
>
> A quick update on this, the microcode being in the initrd or not
> seems to be a bit of a red herring. Yesterday the machine crashed
> twice at boot with 6.0.0 with an initrd which did correctly have
> the alderlake microcode cpio archive prefixed.
>
> Where as with 5.19 it boots correctly everytime. I will try to
> make some time to git bisect this sometime next week. I expect
> this is an i915 issue though since 6.0.0 with nomodeset on
> the cmdline does seem to boot successfully every time.
Maybe try with KASAN to see if it catches something before random list
corruption starts happening?
Regards,
Tvrtko
On Thu, 13 Oct 2022, Hans de Goede wrote:
> With 6.0 the following WARN triggers:
> drivers/gpu/drm/i915/display/intel_bios.c:477:
>
> drm_WARN(&i915->drm, min_size == 0,
> "Block %d min_size is zero\n", section_id);
What's the value of section_id that gets printed?
BR,
Jani.
--
Jani Nikula, Intel Open Source Graphics Center
Hi,
On 10/17/22 10:30, Jani Nikula wrote:
> On Thu, 13 Oct 2022, Hans de Goede wrote:
>> With 6.0 the following WARN triggers:
>> drivers/gpu/drm/i915/display/intel_bios.c:477:
>>
>> drm_WARN(&i915->drm, min_size == 0,
>> "Block %d min_size is zero\n", section_id);
>
> What's the value of section_id that gets printed?
It is 42.
Regards,
Hans
On Mon, 17 Oct 2022, Jani Nikula wrote:
> On Thu, 13 Oct 2022, Hans de Goede wrote:
>> With 6.0 the following WARN triggers:
>> drivers/gpu/drm/i915/display/intel_bios.c:477:
>>
>> drm_WARN(&i915->drm, min_size == 0,
>> "Block %d min_size is zero\n", section_id);
>
> What's the value of section_id that gets printed?
I'm guessing this is [1] fixed by commit d3a7051841f0 ("drm/i915/bios:
Use hardcoded fp_timing size for generating LFP data pointers") in
v6.1-rc1.
I don't think this is the root cause for your issues, but I wonder if
you could try v6.1-rc1 or drm-tip and see if we've fixed the other stuff
already too?
BR,
Jani.
[1] https://gitlab.freedesktop.org/drm/intel/-/issues/6592
--
Jani Nikula, Intel Open Source Graphics Center
Hi,
On 10/17/22 10:39, Jani Nikula wrote:
> On Mon, 17 Oct 2022, Jani Nikula wrote:
>> On Thu, 13 Oct 2022, Hans de Goede wrote:
>>> With 6.0 the following WARN triggers:
>>> drivers/gpu/drm/i915/display/intel_bios.c:477:
>>>
>>> drm_WARN(&i915->drm, min_size == 0,
>>> "Block %d min_size is zero\n", section_id);
>>
>> What's the value of section_id that gets printed?
>
> I'm guessing this is [1] fixed by commit d3a7051841f0 ("drm/i915/bios:
> Use hardcoded fp_timing size for generating LFP data pointers") in
> v6.1-rc1.
>
> I don't think this is the root cause for your issues, but I wonder if
> you could try v6.1-rc1 or drm-tip and see if we've fixed the other stuff
> already too?
6.1-rc1 indeed does not trigger the drm_WARN and for now (couple of
reboots, running for 5 minutes now) it seems stable. 6.0.0 usually
crashed during boot (but not always).
Do you think it would be worthwhile to try 6.0.0 with d3a7051841f0 ?
Any other commits which I can try before I go down the bisect route ?
(I'm assuming this will also affect other users, so we really need
to fix this for 6.0.x before it starts hitting Arch + Fedora users)
Regards,
Hans
> [1] https://gitlab.freedesktop.org/drm/intel/-/issues/6592
CCing the regression mailing list, as it should be in the loop for all
regressions, as explained here:
https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html
On 17.10.22 12:48, Hans de Goede wrote:
> On 10/17/22 10:39, Jani Nikula wrote:
>> On Mon, 17 Oct 2022, Jani Nikula wrote:
>>> On Thu, 13 Oct 2022, Hans de Goede wrote:
>>>> With 6.0 the following WARN triggers:
>>>> drivers/gpu/drm/i915/display/intel_bios.c:477:
>>>>
>>>> drm_WARN(&i915->drm, min_size == 0,
>>>> "Block %d min_size is zero\n", section_id);
>>>
>>> What's the value of section_id that gets printed?
>>
>> I'm guessing this is [1] fixed by commit d3a7051841f0 ("drm/i915/bios:
>> Use hardcoded fp_timing size for generating LFP data pointers") in
>> v6.1-rc1.
>>
>> I don't think this is the root cause for your issues, but I wonder if
>> you could try v6.1-rc1 or drm-tip and see if we've fixed the other stuff
>> already too?
>
> 6.1-rc1 indeed does not trigger the drm_WARN and for now (couple of
> reboots, running for 5 minutes now) it seems stable. 6.0.0 usually
> crashed during boot (but not always).
>
> Do you think it would be worthwhile to try 6.0.0 with d3a7051841f0 ?
>
> Any other commits which I can try before I go down the bisect route ?
>
> (I'm assuming this will also affect other users, so we really need
> to fix this for 6.0.x
+1
> before it starts hitting Arch + Fedora users)
FWIW, I heard both openSUSE Tumbleweed and Arch switched to 6.0.y in the
past few days already.
Ciao, Thorsten
On Mon, 17 Oct 2022, Hans de Goede wrote:
> Hi,
>
> On 10/17/22 10:39, Jani Nikula wrote:
>> On Mon, 17 Oct 2022, Jani Nikula wrote:
>>> On Thu, 13 Oct 2022, Hans de Goede wrote:
>>>> With 6.0 the following WARN triggers:
>>>> drivers/gpu/drm/i915/display/intel_bios.c:477:
>>>>
>>>> drm_WARN(&i915->drm, min_size == 0,
>>>> "Block %d min_size is zero\n", section_id);
>>>
>>> What's the value of section_id that gets printed?
>>
>> I'm guessing this is [1] fixed by commit d3a7051841f0 ("drm/i915/bios:
>> Use hardcoded fp_timing size for generating LFP data pointers") in
>> v6.1-rc1.
>>
>> I don't think this is the root cause for your issues, but I wonder if
>> you could try v6.1-rc1 or drm-tip and see if we've fixed the other stuff
>> already too?
>
> 6.1-rc1 indeed does not trigger the drm_WARN and for now (couple of
> reboots, running for 5 minutes now) it seems stable. 6.0.0 usually
> crashed during boot (but not always).
>
> Do you think it would be worthwhile to try 6.0.0 with d3a7051841f0 ?
My guess is that d3a7051841f0 is a red herring. Sure, it's a warning
splat that would be nice to get fixed in v6.0, but I doubt it has
relevance to the problems you're seeing.
Cc: Ville, your thoughts?
> Any other commits which I can try before I go down the bisect route ?
Seems pretty vague I'm afraid. I know it's painful, but likely bisect is
the fastest way to pinpoint the issue and get at the root cause.
Also, filing a bug at [1] would help us get more attention.
BR,
Jani.
[1] https://gitlab.freedesktop.org/drm/intel/issues/new
>
> (I'm assuming this will also affect other users, so we really need
> to fix this for 6.0.x before it starts hitting Arch + Fedora users)
>
> Regards,
>
> Hans
>
>
>
>> [1] https://gitlab.freedesktop.org/drm/intel/-/issues/6592
>
--
Jani Nikula, Intel Open Source Graphics Center
Hi,
On 10/17/22 13:19, Thorsten Leemhuis wrote:
> CCing the regression mailing list, as it should be in the loop for all
> regressions, as explained here:
> https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html
Yes, sorry about that. I meant to Cc the regressions list, not you personally,
but the auto-completion picked the wrong address-book entry
(and I did not notice this).
> On 17.10.22 12:48, Hans de Goede wrote:
>> On 10/17/22 10:39, Jani Nikula wrote:
>>> On Mon, 17 Oct 2022, Jani Nikula wrote:
>>>> On Thu, 13 Oct 2022, Hans de Goede wrote:
>>>>> With 6.0 the following WARN triggers:
>>>>> drivers/gpu/drm/i915/display/intel_bios.c:477:
>>>>>
>>>>> drm_WARN(&i915->drm, min_size == 0,
>>>>> "Block %d min_size is zero\n", section_id);
>>>>
>>>> What's the value of section_id that gets printed?
>>>
>>> I'm guessing this is [1] fixed by commit d3a7051841f0 ("drm/i915/bios:
>>> Use hardcoded fp_timing size for generating LFP data pointers") in
>>> v6.1-rc1.
>>>
>>> I don't think this is the root cause for your issues, but I wonder if
>>> you could try v6.1-rc1 or drm-tip and see if we've fixed the other stuff
>>> already too?
>>
>> 6.1-rc1 indeed does not trigger the drm_WARN and for now (couple of
>> reboots, running for 5 minutes now) it seems stable. 6.0.0 usually
>> crashed during boot (but not always).
>>
>> Do you think it would be worthwhile to try 6.0.0 with d3a7051841f0 ?
So I have been trying 6.0.0 with d3a7051841f0, doing a whole bunch of
reboots + general use, and that seems stable. Then I reverted it and
the very first boot of the kernel with the revert broke again, so I'm
pretty sure that d3a7051841f0 fixes things.
So d3a7051841f0 seems to do more than just fix the WARN().
So let's try to get d3a7051841f0 added to the official stable series
ASAP (I just noticed that Mark Pearson from Lenovo has already added it
to Fedora's 6.0.2 build).
Regards,
Hans
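Since 6.0.0 crashed on most boots but not all, a run of clean boots only counts as evidence of a fix (or of a "good" bisect step) if such a run would be unlikely on a broken kernel. A rough Python sketch of that reasoning is below. It assumes boots are independent and that the per-boot crash rate is known; that rate is a hypothetical input, not something measured in this thread, and the function names are made up for illustration.

```python
import math


def chance_all_boots_clean(crash_rate: float, boots: int) -> float:
    """Probability that a broken kernel survives `boots` boots in a row."""
    if not 0.0 <= crash_rate <= 1.0:
        raise ValueError("crash_rate must be between 0 and 1")
    if boots < 0:
        raise ValueError("boots must not be negative")
    return (1.0 - crash_rate) ** boots


def boots_needed(crash_rate: float, max_false_good: float) -> int:
    """Smallest number of clean boots so that a broken kernel would pass
    them all with probability at most max_false_good."""
    if not 0.0 < crash_rate < 1.0:
        raise ValueError("crash_rate must be strictly between 0 and 1")
    if not 0.0 < max_false_good < 1.0:
        raise ValueError("max_false_good must be strictly between 0 and 1")
    return math.ceil(math.log(max_false_good) / math.log(1.0 - crash_rate))
```

For example, with an assumed crash rate of 0.5 per boot, ten clean boots in a row would happen by chance roughly 0.1% of the time, and seven clean boots are enough to push the chance of wrongly marking a broken kernel as good below 1%.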