Subject: Linux regressions report for mainline [2024-02-25]

Hi Linus, things look mostly normal from my point of view and a few
fixes for tracked issues are heading your way today or next week.
Nevertheless let me mention one issue where I fear that it might not
be fixed before the release:

* Decreased network outgoing speed due to irq sharing that started with
f977f4c9301c8a ("xhci: add handler for only one interrupt line"); the
report is already 22 days old and the discussion is slowly progressing
with no fix in sight (tglx was only brought in recently). Will prod
tomorrow, maybe that will help.
https://lore.kernel.org/lkml/CABXGCsNnUfCCYVSb_-j-a-cAdONu1r6Fe8p2OtQ5op_wskOfpw@mail.gmail.com/
https://lore.kernel.org/lkml/[email protected]/

Johan Hovold also deals with multiple issues affecting Lenovo ThinkPad
X13s, but he sent out patch series to fix some or all of those[1], so
with a bit of luck those issues will soon be fixed as well.
https://lore.kernel.org/lkml/[email protected]/

[1]
https://lore.kernel.org/lkml/[email protected]/
https://lore.kernel.org/lkml/[email protected]/

Ciao, Thorsten

---

Hi, this is regzbot, the Linux kernel regression tracking bot.

Currently I'm aware of 8 regressions in linux-mainline. Find the
current status below and the latest on the web:

https://linux-regtracking.leemhuis.info/regzbot/mainline/

Bye bye, hope to see you soon for the next report.
Regzbot (on behalf of Thorsten Leemhuis)


======================================================
current cycle (v6.7.. aka v6.8-rc), culprit identified
======================================================


MXSFB error: -ENODEV: Cannot connect bridge
-------------------------------------------
https://linux-regtracking.leemhuis.info/regzbot/regression/lore/34yzygh3mbwpqr2re7nxmhyxy3s7qmqy4vhxvoyxnoguktriur@z66m7gvpqlia/
https://lore.kernel.org/lkml/34yzygh3mbwpqr2re7nxmhyxy3s7qmqy4vhxvoyxnoguktriur@z66m7gvpqlia/
https://lore.kernel.org/regressions/20240214110822.GA81133@francesco-nb/

By Hiago De Franco and Francesco Dolcini; 16 days ago; 7 activities, latest 0 days ago.
Introduced in edbbae7fba49 (v6.8-rc1)

Recent activities from: Shawn Guo (1), Francesco Dolcini (1)

One patch associated with this regression:
* Re: MXSFB error: -ENODEV: Cannot connect bridge
https://lore.kernel.org/lkml/20240214105223.GA78582@francesco-nb/
11 days ago, by Francesco Dolcini

Noteworthy links:
* [PATCH v1] ARM: dts: imx7: remove DSI port endpoints
https://lore.kernel.org/lkml/[email protected]/
9 days ago, by Francesco Dolcini; thread monitored.


sched/cpufreq: reduced maximum CPU frequency is ignored.
--------------------------------------------------------
https://linux-regtracking.leemhuis.info/regzbot/regression/lore/[email protected]/
https://lore.kernel.org/lkml/002f01da5ba0%2449cbf810%24dd63e830%[email protected]/

By Doug Smythies; 15 days ago; 15 activities, latest 0 days ago.
Introduced in 9c0b4bb7f630 (v6.8-rc1)

Fix incoming:
* cpufreq: intel_pstate: fix pstate limits enforcement for adjust_perf call back
https://lore.kernel.org/regressions/[email protected]/


usb: typec: boot issues on rk3399-roc-pc due to revert
------------------------------------------------------
https://linux-regtracking.leemhuis.info/regzbot/regression/lore/[email protected]/
https://lore.kernel.org/lkml/[email protected]/

By Mark Brown; 16 days ago; 12 activities, latest 2 days ago.
Introduced in b717dfbf73e8 (v6.8-rc3)

Fix incoming:
* usb: typec: tpcm: Fix issues with power being removed during reset
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=master&id=69f89168b310878be82d7d97bc0d22068ad858c0


[ *NEW* ] drm/msm/dp: runtime PM cause internal eDP display on the Lenovo ThinkPad X13s to not always show up on boot (2/2 regressions)
---------------------------------------------------------------------------------------------------------------------------------------
https://linux-regtracking.leemhuis.info/regzbot/regression/lore/[email protected]/
https://lore.kernel.org/regressions/[email protected]/

By Johan Hovold; 2 days ago; 5 activities, latest 2 days ago.
Introduced in 5814b8bf086a (v6.8-rc1)

Recent activities from: Johan Hovold (3), Linux regression tracking
#adding (Thorsten Leemhuis) (1), Abhinav Kumar (1)

Noteworthy links:
* drm/msm: Second DisplayPort regression in 6.8-rc1
https://lore.kernel.org/lkml/[email protected]/
6 days ago, by Johan Hovold; thread monitored.


`lis3lv02d_i2c_suspend()` causes `unbalanced disables for regulator-dummy` and `Failed to disable Vdd_IO: -EIO`
---------------------------------------------------------------------------------------------------------------
https://linux-regtracking.leemhuis.info/regzbot/regression/lore/[email protected]/
https://lore.kernel.org/lkml/[email protected]/

By Paul Menzel; 23 days ago; 10 activities, latest 3 days ago.
Introduced in 2f189493ae32 (v6.8-rc1)

Recent activities from: Hans de Goede (6)

Noteworthy links:
* [PATCH regression fix] misc: lis3lv02d_i2c: Fix regulators getting en-/dis-abled twice on suspend/resume
https://lore.kernel.org/lkml/[email protected]/
4 days ago, by Hans de Goede; thread monitored.


[ *NEW* ] irq/net/usb: performance decrease now that network device and xhci share IRQs
---------------------------------------------------------------------------------------
https://linux-regtracking.leemhuis.info/regzbot/regression/lore/CABXGCsNnUfCCYVSb_-j-a-cAdONu1r6Fe8p2OtQ5op_wskOfpw@mail.gmail.com/
https://lore.kernel.org/lkml/CABXGCsNnUfCCYVSb_-j-a-cAdONu1r6Fe8p2OtQ5op_wskOfpw@mail.gmail.com/

By Mikhail Gavrilov; 22 days ago; 21 activities, latest 3 days ago.
Introduced in f977f4c9301c (v6.8-rc1)

Recent activities from: Randy Dunlap (2), Mikhail Gavrilov (2), Mathias
Nyman (1)


drm as well as soc: qcom: internal eDP display on the Lenovo ThinkPad X13s does not always show up on boot
----------------------------------------------------------------------------------------------------------
https://linux-regtracking.leemhuis.info/regzbot/regression/lore/[email protected]/
https://lore.kernel.org/lkml/[email protected]/

By Johan Hovold; 12 days ago; 13 activities, latest 4 days ago.
Introduced in 2bcca96abfbf (v6.8-rc1)

Fix incoming:
* soc: qcom: pmic_glink_altmode: fix drm bridge use-after-free
https://lore.kernel.org/regressions/[email protected]/


sc7180-trogdor-lazor image corruption regression for USB-C DP Alt Mode ([PATCH 0/2] Add param for the highest bank bit)
-----------------------------------------------------------------------------------------------------------------------
https://linux-regtracking.leemhuis.info/regzbot/regression/lore/[email protected]/
https://lore.kernel.org/regressions/[email protected]/

By Leonard Lausen; 7 days ago; 4 activities, latest 6 days ago.
Introduced in 8814455a0e54 (v6.8-rc1)

One patch associated with this regression:
* Re: [REGRESSION] sc7180-trogdor-lazor image corruption regression for USB-C DP Alt Mode ([PATCH 0/2] Add param for the highest bank bit)
https://lore.kernel.org/regressions/[email protected]/
7 days ago, by Dmitry Baryshkov


=============
End of report
=============

All regressions marked '[ *NEW* ]' were added since the previous report,
which can be found here:
https://lore.kernel.org/r/[email protected]

Thanks for your attention, have a nice day!

Regzbot, your hard working Linux kernel regression tracking robot


P.S.: Wanna know more about regzbot or how to use it to track regressions
for your subsystem? Then check out the getting started guide or the
reference documentation:

https://gitlab.com/knurd42/regzbot/-/blob/main/docs/getting_started.md
https://gitlab.com/knurd42/regzbot/-/blob/main/docs/reference.md

The short version: if you see a regression report you want to see
tracked, just send a reply to the report where you Cc
[email protected] with a line like this:

#regzbot introduced: v5.13..v5.14-rc1

If you want to fix a tracked regression, just do what is expected
anyway: add a 'Link:' tag with the url to the report, e.g.:

Link: https://lore.kernel.org/all/[email protected]/
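
As a rough illustration of the two conventions above (this is not part
of regzbot itself, and the helper names are made up for this sketch),
the two lines could be generated like this:

```python
# Hypothetical helpers: only the "#regzbot introduced:" command and the
# "Link:" tag format are taken from the text above.

def regzbot_introduced_line(version_range: str) -> str:
    """Return the command line to put into a reply to the report."""
    return f"#regzbot introduced: {version_range.strip()}"


def link_tag(report_url: str) -> str:
    """Return the 'Link:' tag to add to the commit message of a fix."""
    return f"Link: {report_url.strip()}"


if __name__ == "__main__":
    print(regzbot_introduced_line("v5.13..v5.14-rc1"))
    # The URL is a placeholder, not a real report.
    print(link_tag("https://example.org/report/"))
```

The reply carrying the first line still needs regzbot's address in Cc,
as described above; the sketch only formats the text.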


Subject: Re: Linux regressions report for mainline [2024-02-25]

On 25.02.24 14:21, Regzbot (on behalf of Thorsten Leemhuis) wrote:
> Hi Linus, things look mostly normal from my point of view and a few
> fixes for tracked issues are heading your way today or next week.
> Nevertheless let me mention one issue where I fear that it might not
> be fixed before the release:
>
> * Decreased network outgoing speed due to irq sharing that started with
> [...]

Sorry, forgot something: there is a patch to fix a ntfs3 build problem
that was posted 10+ days ago[1] that didn't get any reaction from the
ntfs3 maintainer at all. Given the history of occasional slow responses
for that subsystem I thought I'd let you know in case you want to pick
the fix up directly; but if you do, consider using v2 of the patch[2].

[1] https://lore.kernel.org/all/[email protected]/
[2] https://lore.kernel.org/all/[email protected]/

Ciao, Thorsten

2024-02-26 17:34:52

by Linus Torvalds

Subject: Re: Linux regressions report for mainline [2024-02-25]

On Sun, 25 Feb 2024 at 06:21, Linux regression tracking (Thorsten
Leemhuis) <[email protected]> wrote:
>
> Sorry, forgot something: there is a patch to fix a ntfs3 build problem
> that was posted 10+ days ago[1] that didn't get any reaction from the
> ntfs3 maintainer at all. Given the history of occasional slow responses
> for that subsystem I thought I'd let you know in case you want to pick
> the fix up directly; but if you do, consider using v2 of the patch[2].

Ack. Picked up directly.

Linus

2024-02-27 10:49:01

by Johan Hovold

Subject: Re: Linux regressions report for mainline [2024-02-25]

On Sun, Feb 25, 2024 at 01:21:46PM +0000, Regzbot (on behalf of Thorsten Leemhuis) wrote:

> Johan Hovold also deals with multiple issues affecting Lenovo ThinkPad
> X13s, but he sent out patch series to fix some or all of those[1], so
> with a bit of luck those issues will soon be fixed as well.
> https://lore.kernel.org/lkml/[email protected]/

> [1]
> https://lore.kernel.org/lkml/[email protected]/

This series addresses a use-after-free in the PMIC glink driver caused
by DRM bridge changes in rc1 and which can result in the internal
display not showing up on boot.

The DRM/SoC fixes here have now been merged to drm-misc for 6.8.

> https://lore.kernel.org/lkml/[email protected]/

This series is unrelated to the DRM regressions and deals with PCIe ASPM
issues.

> [ *NEW* ] drm/msm/dp: runtime PM cause internal eDP display on the Lenovo ThinkPad X13s to not always show up on boot (2/2 regressions)
> ---------------------------------------------------------------------------------------------------------------------------------------
> https://linux-regtracking.leemhuis.info/regzbot/regression/lore/[email protected]/
> https://lore.kernel.org/regressions/[email protected]/
>
> By Johan Hovold; 2 days ago; 5 activities, latest 2 days ago.
> Introduced in 5814b8bf086a (v6.8-rc1)

> Noteworthy links:
> * drm/msm: Second DisplayPort regression in 6.8-rc1
> https://lore.kernel.org/lkml/[email protected]/
> 6 days ago, by Johan Hovold; thread monitored.

This regression is the most severe one as it triggers hard resets on
boot occasionally (I just tried updating the regzbot title).

This turned out to be due to a long-standing issue in a Qualcomm PM
domain driver. Bjorn A posted a fix over night which addresses this:

https://lore.kernel.org/linux-arm-msm/[email protected]/T/#u

But also with these fixes, there are still a couple of regressions
related to the Qualcomm DRM runtime PM rework in 6.8-rc1. I'll send
separate reports to track those.

Johan

2024-02-27 11:57:02

by Thorsten Leemhuis

Subject: Re: Linux regressions report for mainline [2024-02-25]

On 26.02.24 18:33, Linus Torvalds wrote:
> On Sun, 25 Feb 2024 at 06:21, Linux regression tracking (Thorsten
> Leemhuis) <[email protected]> wrote:
>>
>> Sorry, forgot something: there is a patch to fix a ntfs3 build problem
>> that was posted 10+ days ago[1] that didn't get any reaction from the
>> ntfs3 maintainer at all. Given the history of occasional slow responses
>> for that subsystem I thought I'd let you know in case you want to pick
>> the fix up directly; but if you do, consider using v2 of the patch[2].
>
> Ack. Picked up directly.

Thx!

BTW, let me quickly mention two somewhat tricky regressions where I'm
unsure if they are handled how you want them to be handled.

* Multiple users were changing the minimum power limit of their Radeon
graphics cards to reduce the power consumption. Since 1958946858a62b
("drm/amd/pm: Support for getting power1_cap_min value") [v6.7-rc1] they
are unable to go as low as before, as amdgpu now respects a lower-bound
power limit ignored earlier. For details see:
https://lore.kernel.org/regressions/[email protected]/
https://gitlab.freedesktop.org/drm/amd/-/issues/3183
https://gitlab.freedesktop.org/drm/amd/-/issues/3137
https://gitlab.freedesktop.org/drm/amd/-/issues/2992#note_2247003

There was the idea to introduce a module parameter (see the lore
discussion linked above) to allow users to do what they were able to do
before. The amdgpu developers don't want to go down that path.


* Mikhail Gavrilov reported decreased network outgoing performance
caused by f977f4c9301c8a ("xhci: add handler for only one interrupt
line") [v6.8-rc1]: down from ~97-110MB/sec to 66-70MB/sec. Turns out
this is caused by the network device and xhci (USB) drivers now sharing
interrupts:
https://lore.kernel.org/lkml/CABXGCsNnUfCCYVSb_-j-a-cAdONu1r6Fe8p2OtQ5op_wskOfpw@mail.gmail.com/
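
For readers who want to check whether two devices share an interrupt
line on their own machine, here is a minimal sketch that scans text in
the /proc/interrupts layout for lines with more than one handler. The
function name, the sample text and the device names in it are invented
for illustration and are not taken from the report.

```python
def shared_irqs(interrupts_text: str) -> dict[str, list[str]]:
    """Map IRQ number -> handler names for lines with several handlers."""
    shared = {}
    for line in interrupts_text.splitlines():
        irq, sep, rest = line.partition(":")
        irq = irq.strip()
        # Skip the CPU header row and named counters such as NMI or LOC.
        if not sep or not irq.isdigit():
            continue
        # Handlers on a shared line are listed separated by ", ".
        pieces = rest.split(", ")
        first_fields = pieces[0].split()
        if len(pieces) < 2 or not first_fields:
            continue
        # The first handler is the last whitespace-separated field
        # before the first comma; the rest follow the commas.
        shared[irq] = [first_fields[-1]] + [p.strip() for p in pieces[1:]]
    return shared


# Invented sample in the usual /proc/interrupts layout.
SAMPLE = """\
           CPU0       CPU1
 16:        100        200   IO-APIC   16-fasteoi   i801_smbus
 18:      90000      80000   IO-APIC   18-fasteoi   xhci_hcd, enp5s0
NMI:          0          0   Non-maskable interrupts
"""

if __name__ == "__main__":
    print(shared_irqs(SAMPLE))
```

On a live system the same function could be fed the contents of
/proc/interrupts instead of the sample text.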

Mathias Nyman said that "Mikhail got unlucky" and tglx called it an
"unfortunate coincidence"; neither sees a need for a fix.


In both cases I can, up to a point, totally understand the point of view
of the developers that handle the situation; at the same time I was
unsure if those situations are handled as you want them to be handled.
That's why I brought them up here. If I don't hear anything from you
I'll assume everything is fine the way it is and will stop tracking both
regressions.

Ciao, Thorsten

Subject: Re: Linux regressions report for mainline [2024-02-25]

On 27.02.24 11:20, Johan Hovold wrote:
> On Sun, Feb 25, 2024 at 01:21:46PM +0000, Regzbot (on behalf of Thorsten Leemhuis) wrote:
>
>> Johan Hovold also deals with multiple issues affecting Lenovo ThinkPad
>> X13s, but he sent out patch series to fix some or all of those[1], so
>> with a bit of luck those issues will soon be fixed as well.
>> https://lore.kernel.org/lkml/[email protected]/
>
>> [1]
>> https://lore.kernel.org/lkml/[email protected]/
>
> This series addresses a use-after-free in the PMIC glink driver caused
> [...]

Thx for the updates. Due to the various issues I got a bit lost here,
but seems you are at least somewhat on top of this and fixes are in sight.

> (I just tried updating the regzbot title).

Great (and yes, that worked).

> But also with these fixes, there are still a couple of regressions
> related to the Qualcomm DRM runtime PM rework in 6.8-rc1. I'll send
> separate reports to track those.

Thx again.

Ciao, Thorsten

Subject: Lenovo ThinkPad X13s regressions (was Re: Linux regressions report for mainline [2024-02-25])

[dropping Linus from CC, we can add him back later when needed]

On 27.02.24 11:20, Johan Hovold wrote:
> On Sun, Feb 25, 2024 at 01:21:46PM +0000, Regzbot (on behalf of Thorsten Leemhuis) wrote:
>
>> Johan Hovold also deals with multiple issues affecting Lenovo ThinkPad
>> X13s, but he sent out patch series to fix some or all of those[1], so
>> with a bit of luck those issues will soon be fixed as well.
>> https://lore.kernel.org/lkml/[email protected]/

As 6.8 final might be just five days away, could you please help me out
with a short status update wrt. unresolved regressions from your side if
you have a minute? It's easy to get lost in all those issues. :-/ :-D

>> [1]
>> https://lore.kernel.org/lkml/[email protected]/
>
> This series addresses a use-after-free in the PMIC glink driver caused
> by DRM bridge changes in rc1 and which can result in the internal
> display not showing up on boot.
>
> The DRM/SoC fixes here have now been merged to drm-misc for 6.8.

What about the others from that series? Can they wait till 6.9? Or are
they on track for 6.8?

> [...]

> But also with these fixes, there are still a couple of regressions
> related to the Qualcomm DRM runtime PM rework in 6.8-rc1. I'll send
> separate reports to track those.

Any decision yet if they are going to be reverted for now?

Am I right assuming those would fix
https://lore.kernel.org/lkml/[email protected]/
which did not get even a single reply?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

2024-03-05 14:12:37

by Johan Hovold

Subject: Re: Lenovo ThinkPad X13s regressions (was Re: Linux regressions report for mainline [2024-02-25])

[ +CC: Vinod, Bjorn, Abhinav ]

On Tue, Mar 05, 2024 at 10:33:39AM +0100, Linux regression tracking (Thorsten Leemhuis) wrote:
> [dropping Linus from CC, we can add him back later when needed]
>
> On 27.02.24 11:20, Johan Hovold wrote:
> > On Sun, Feb 25, 2024 at 01:21:46PM +0000, Regzbot (on behalf of Thorsten Leemhuis) wrote:
> >
> >> Johan Hovold also deals with multiple issues affecting Lenovo ThinkPad
> >> X13s, but he sent out patch series to fix some or all of those[1], so
> >> with a bit of luck those issues will soon be fixed as well.
> >> https://lore.kernel.org/lkml/[email protected]/
>
> As 6.8 final might be just five days away, could you please help me out
> with a short status update wrt. unresolved regressions from your side if
> you have a minute? It's easy to get lost in all those issues. :-/ :-D

Heh. Indeed. It's been a rough cycle. :)

> >> [1]
> >> https://lore.kernel.org/lkml/[email protected]/
> >
> > This series addresses a use-after-free in the PMIC glink driver caused
> > by DRM bridge changes in rc1 and which can result in the internal
> > display not showing up on boot.
> >
> > The DRM/SoC fixes here have now been merged to drm-misc for 6.8.
>
> What about the others from that series? Can they wait till 6.9? Or are
> they on track for 6.8?

Vinod, the PHY maintainer, just told me he will try to get them into
6.8.

> > But also with these fixes, there are still a couple of regressions
> > related to the Qualcomm DRM runtime PM rework in 6.8-rc1. I'll send
> > separate reports to track those.
>
> Any decision yet if they are going to be reverted for now?
>
> Am I right assuming those would fix
> https://lore.kernel.org/lkml/[email protected]/
> which did not get even a single reply?

That was the hope, but I've managed to trigger a reset on disconnect
once also with the runtime PM series reverted.

One of the patches from that series has already been reverted (to fix
the VT console hotplug regression) and there is some indication that
that was sufficient to address the issue with hotplug not being detected
in X/Wayland too. I'm waiting for confirmation from some users that have
not been able to use their external displays at all since 6.8-rc1, but
it does seem to fix the X/Wayland issues I could reproduce here.

But either way, the reset on disconnect is still there, and has since
been reproduced by Bjorn also on another Qualcomm platform without a
hypervisor so that we've now got a call stack. I've heard that Abhinav
is looking into that, but I don't know if there's any chance to have a
fix ready this week.

Johan

Subject: Re: Lenovo ThinkPad X13s regressions (was Re: Linux regressions report for mainline [2024-02-25])

On 05.03.24 14:51, Johan Hovold wrote:
> [ +CC: Vinod, Bjorn, Abhinav ]
>
> On Tue, Mar 05, 2024 at 10:33:39AM +0100, Linux regression tracking (Thorsten Leemhuis) wrote:
>> [dropping Linus from CC, we can add him back later when needed]
>>
>> On 27.02.24 11:20, Johan Hovold wrote:
>>> On Sun, Feb 25, 2024 at 01:21:46PM +0000, Regzbot (on behalf of Thorsten Leemhuis) wrote:
>>>
>>>> Johan Hovold also deals with multiple issues affecting Lenovo ThinkPad
>>>> X13s, but he sent out patch series to fix some or all of those[1], so
>>>> with a bit of luck those issues will soon be fixed as well.
>>>> https://lore.kernel.org/lkml/[email protected]/
>> As 6.8 final might be just five days away, could you please help me out
>> with a short status update wrt. unresolved regressions from your side if
>> you have a minute? It's easy to get lost in all those issues. :-/ :-D
> Heh. Indeed. It's been a rough cycle. :)

:D

>>>> [1]
>>>> https://lore.kernel.org/lkml/[email protected]/
>>>
>>> This series addresses a use-after-free in the PMIC glink driver caused
>>> by DRM bridge changes in rc1 and which can result in the internal
>>> display not showing up on boot.
>>> The DRM/SoC fixes here have now been merged to drm-misc for 6.8.
>> What about the others from that series? Can they wait till 6.9? Or are
>> they on track for 6.8?
> Vinod, the PHY maintainer, just told me he will try to get them into
> 6.8.

Ahh, good.

>>> But also with these fixes, there are still a couple of regressions
>>> related to the Qualcomm DRM runtime PM rework in 6.8-rc1. I'll send
>>> separate reports to track those.
>
>> Any decision yet if they are going to be reverted for now?
>>
>> Am I right assuming those would fix
>> https://lore.kernel.org/lkml/[email protected]/
>> which did not get even a single reply?
>
> That was the hope, but I've managed to trigger a reset on disconnect
> once also with the runtime PM series reverted.

Ohh. So did the PM series increase the chance of hitting this? Because
if not, then...

> One of the patches from that series has already been reverted (to fix
> the VT console hotplug regression) and there is some indication that
> that was sufficient to address the issue with hotplug not being detected
> in X/Wayland too. I'm waiting for confirmation from some users that have
> not been able to use their external displays at all since 6.8-rc1, but
> it does seem to fix the X/Wayland issues I could reproduce here.
>
> But either way, the reset on disconnect is still there, and has since
> been reproduced by Bjorn also on another Qualcomm platform without a
> hypervisor so that we've now got a call stack. I've heard that Abhinav
> is looking into that, but I don't know if there's any chance to have a
> fix ready this week.

...this sounds (please correct me if I'm wrong) like on Sunday the
situation likely will be "the problem is basically in 6.7.y already, so
there is not much we can do for 6.8 and reverting or delaying the
release is unneeded" -- unless of course a fix comes in reach during
this week.

Ciao, Thorsten

2024-03-05 15:19:56

by Johan Hovold

Subject: Re: Lenovo ThinkPad X13s regressions (was Re: Linux regressions report for mainline [2024-02-25])

On Tue, Mar 05, 2024 at 03:50:13PM +0100, Linux regression tracking (Thorsten Leemhuis) wrote:
> On 05.03.24 14:51, Johan Hovold wrote:
> > On Tue, Mar 05, 2024 at 10:33:39AM +0100, Linux regression tracking (Thorsten Leemhuis) wrote:
> >> [dropping Linus from CC, we can add him back later when needed]
> >>
> >> On 27.02.24 11:20, Johan Hovold wrote:

> >>> But also with these fixes, there are still a couple of regressions
> >>> related to the Qualcomm DRM runtime PM rework in 6.8-rc1. I'll send
> >>> separate reports to track those.
> >
> >> Any decision yet if they are going to be reverted for now?
> >>
> >> Am I right assuming those would fix
> >> https://lore.kernel.org/lkml/[email protected]/
> >> which did not get even a single reply?
> >
> > That was the hope, but I've managed to trigger a reset on disconnect
> > once also with the runtime PM series reverted.
>
> Ohh. So did the PM series increase the chance of hitting this? Because
> if not, then...

What we know is that some change in 6.8-rc1 either introduced or
increased the chances of hitting the disconnect resets, while the
runtime PM series (and patch which has now been reverted) broke hotplug
detect.

> > One of the patches from that series has already been reverted (to fix
> > the VT console hotplug regression) and there is some indication that
> > that was sufficient to address the issue with hotplug not being detected
> > in X/Wayland too. I'm waiting for confirmation from some users that have
> > not been able to use their external displays at all since 6.8-rc1, but
> > it does seem to fix the X/Wayland issues I could reproduce here.
> >
> > But either way, the reset on disconnect is still there, and has since
> > been reproduced by Bjorn also on another Qualcomm platform without a
> > hypervisor so that we've now got a call stack. I've heard that Abhinav
> > is looking into that, but I don't know if there's any chance to have a
> > fix ready this week.
>
> ...this sounds (please correct me if I'm wrong) like on Sunday the
> situation likely will be "the problem is basically in 6.7.y already, so
> there is not much we can do for 6.8 and reverting or delaying the
> release is unneeded" -- unless of course a fix comes in reach during
> this week.

Yes, unless Abhinav and Bjorn can pinpoint the change that makes us hit
this since 6.8-rc1 and revert that change (or come up with some
temporary band-aid).

It is also possible that we're dealing with more than one bug here, since
we're seeing resets both on disconnect and when stopping X some time
after a disconnect.

Johan

2024-03-06 04:20:06

by Bjorn Andersson

Subject: Re: Lenovo ThinkPad X13s regressions (was Re: Linux regressions report for mainline [2024-02-25])

On Tue, Mar 05, 2024 at 02:51:44PM +0100, Johan Hovold wrote:
> [ +CC: Vinod, Bjorn, Abhinav ]
>
> On Tue, Mar 05, 2024 at 10:33:39AM +0100, Linux regression tracking (Thorsten Leemhuis) wrote:
> > [dropping Linus from CC, we can add him back later when needed]
> >
> > On 27.02.24 11:20, Johan Hovold wrote:
> > > On Sun, Feb 25, 2024 at 01:21:46PM +0000, Regzbot (on behalf of Thorsten Leemhuis) wrote:
> > >
> > >> Johan Hovold also deals with multiple issues affecting Lenovo ThinkPad
> > >> X13s, but he sent out patch series to fix some or all of those[1], so
> > >> with a bit of luck those issues will soon be fixed as well.
> > >> https://lore.kernel.org/lkml/[email protected]/
> >
> > As 6.8 final might be just five days away, could you please help me out
> > with a short status update wrt. unresolved regressions from your side if
> > you have a minute? It's easy to get lost in all those issues. :-/ :-D
>
> Heh. Indeed. It's been a rough cycle. :)
>
> > >> [1]
> > >> https://lore.kernel.org/lkml/[email protected]/
> > >
> > > This series addresses a use-after-free in the PMIC glink driver caused
> > > by DRM bridge changes in rc1 and which can result in the internal
> > > display not showing up on boot.
> > >
> > > The DRM/SoC fixes here have now been merged to drm-misc for 6.8.
> >
> > What about the others from that series? Can they wait till 6.9? Or are
> > they on track for 6.8?
>
> Vinod, the PHY maintainer, just told me he will try to get them into
> 6.8.
>
> > > But also with these fixes, there are still a couple of regressions
> > > related to the Qualcomm DRM runtime PM rework in 6.8-rc1. I'll send
> > > separate reports to track those.
> >
> > Any decision yet if they are going to be reverted for now?
> >
> > Am I right assuming those would fix
> > https://lore.kernel.org/lkml/[email protected]/
> > which did not get even a single reply?
>
> That was the hope, but I've managed to trigger a reset on disconnect
> once also with the runtime PM series reverted.
>
> One of the patches from that series has already been reverted (to fix
> the VT console hotplug regression) and there is some indication that
> that was sufficient to address the issue with hotplug not being detected
> in X/Wayland too. I'm waiting for confirmation from some users that have
> not been able to use their external displays at all since 6.8-rc1, but
> it does seem to fix the X/Wayland issues I could reproduce here.
>

I bumped my X13s to v6.8-rc7 earlier today and took it for a spin.

I was successfully able to plug/unplug my main display both in fbcon and
Wayland (sway) a number of times, I was able to boot with external
display connected and have it show up in fbcon and then survive into
sway. I tried suspending (echo mem > /sys/power/state) and got back from
that state a few times without problems.

Mixing connection/disconnection with being in suspended state was less
successful and I was able to crash the machine twice here - but I can't
say this worked before... (As previously we would not have eDP after
suspending with external display).

So, things are looking much better with -rc7, but of course, my test
scope is limited.

Regards,
Bjorn

> But either way, the reset on disconnect is still there, and has since
> been reproduced by Bjorn also on another Qualcomm platform without a
> hypervisor so that we've now got a call stack. I've heard that Abhinav
> is looking into that, but I don't know if there's any chance to have a
> fix ready this week.
>
> Johan

2024-03-08 12:55:50

by Johan Hovold

Subject: Re: Lenovo ThinkPad X13s regressions (was Re: Linux regressions report for mainline [2024-02-25])

On Tue, Mar 05, 2024 at 08:19:47PM -0800, Bjorn Andersson wrote:
> On Tue, Mar 05, 2024 at 02:51:44PM +0100, Johan Hovold wrote:
> > On Tue, Mar 05, 2024 at 10:33:39AM +0100, Linux regression tracking (Thorsten Leemhuis) wrote:

> > > Any decision yet if they are going to be reverted for now?
> > >
> > > Am I right assuming those would fix
> > > https://lore.kernel.org/lkml/[email protected]/
> > > which did not get even a single reply?
> >
> > That was the hope, but I've managed to trigger a reset on disconnect
> > once also with the runtime PM series reverted.

I have not been able to reproduce the reset with the series reverted,
and after reviewing the code in question it seems unlikely that I ever
did so.

> > One of the patches from that series has already been reverted (to fix
> > the VT console hotplug regression) and there is some indication that
> > that was sufficient to address the issue with hotplug not being detected
> > in X/Wayland too. I'm waiting for confirmation from some users that have
> > not been able to use their external displays at all since 6.8-rc1, but
> > it does seem to fix the X/Wayland issues I could reproduce here.

> I bumped my X13s to v6.8-rc7 earlier today and took it for a spin.
>
> I was successfully able to plug/unplug my main display both in fbcon and
> Wayland (sway) a number of times, I was able to boot with external
> display connected and have it show up in fbcon and then survive into
> sway. I tried suspending (echo mem > /sys/power/state) and got back from
> that state a few times without problems.
>
> Mixing connection/disconnection with being in suspended state was less
> successful and I was able to crash the machine twice here - but I can't
> say this worked before... (As previously we would not have eDP after
> suspending with external display).
>
> So, things are looking much better with -rc7, but of course, my test
> scope is limited.

Thanks for confirming. The revert in rc7 seems to help with the hotplug
detect issues I could reproduce too. And for some reason, I can no
longer seem to reproduce the reset either, possibly due to unrelated
changes in timing (e.g. as I don't see it after reapplying the reverted
patch either).

I just spent some more time on this driver and sent a follow-up report
here:

https://lore.kernel.org/lkml/[email protected]/

It seems quite clear to me that the reset-on-disconnect regression has
been introduced by the runtime PM series and I don't currently see how
the hotplug notification revert in rc7 could have fixed it.

Johan