Some userspace presumes that the first connected connector is the main
display, where it's supposed to display e.g. the login screen. For
laptops, this should be the main panel.

This patch calls drm_helper_move_panel_connectors_to_head() after
drm_bridge_connector_init() to make sure eDP stays at the head of the
connected connector list. This fixes unexpected corruption that appears
on the eDP panel if eDP is not placed at the head of the connected
connector list.

Signed-off-by: Kuogee Hsieh <[email protected]>
---
 drivers/gpu/drm/msm/dp/dp_drm.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/msm/dp/dp_drm.c b/drivers/gpu/drm/msm/dp/dp_drm.c
index ce0ec3a..2d18884 100644
--- a/drivers/gpu/drm/msm/dp/dp_drm.c
+++ b/drivers/gpu/drm/msm/dp/dp_drm.c
@@ -136,5 +136,7 @@ struct drm_connector *dp_drm_connector_init(struct msm_dp *dp_display)
 
 	drm_connector_attach_encoder(connector, dp_display->encoder);
 
+	drm_helper_move_panel_connectors_to_head(dp_display->drm_dev);
+
 	return connector;
 }
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
On 28 June 2022 18:20:06 GMT+03:00, Kuogee Hsieh <[email protected]> wrote:
>Some userspace presumes that the first connected connector is the main
>display, where it's supposed to display e.g. the login screen. For
>laptops, this should be the main panel.
>
>This patch calls drm_helper_move_panel_connectors_to_head() after
>drm_bridge_connector_init() to make sure eDP stays at the head of the
>connected connector list. This fixes unexpected corruption that appears
>on the eDP panel if eDP is not placed at the head of the connected
>connector list.
The change itself is a good fix anyway. (And I'd ack it.) However, I would like to understand why it fixes the corruption issue. What if we have eDP and DSI, with DSI ending up before the eDP? Would we see the issue?
Also, could you please describe the kind of corruption you are observing?
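
On the eDP-plus-DSI question, here is a minimal userspace sketch of what the reorder amounts to. It assumes (from memory of the DRM modeset helper, not checked against this tree) that the helper treats LVDS, eDP and DSI connectors as panels and keeps their relative order. The model_* names and the fixed-size array are hypothetical stand-ins, not kernel code.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for DRM connector types; not the kernel definitions. */
enum model_type { MODEL_DP, MODEL_HDMI, MODEL_EDP, MODEL_DSI, MODEL_LVDS };

struct model_connector {
	const char *name;
	enum model_type type;
};

/* Assumption: LVDS, eDP and DSI all count as panel connectors. */
static int model_is_panel(enum model_type t)
{
	return t == MODEL_EDP || t == MODEL_DSI || t == MODEL_LVDS;
}

/*
 * Stable partition: panel connectors first, everything else after them,
 * with the relative order inside each group left unchanged.
 * This sketch handles at most 16 connectors and ignores longer lists.
 */
static void model_move_panels_to_head(struct model_connector *list, size_t n)
{
	struct model_connector tmp[16];
	size_t out = 0, i;

	if (n > 16)
		return;

	for (i = 0; i < n; i++)
		if (model_is_panel(list[i].type))
			tmp[out++] = list[i];
	for (i = 0; i < n; i++)
		if (!model_is_panel(list[i].type))
			tmp[out++] = list[i];

	memcpy(list, tmp, n * sizeof(*list));
}
```

Under those assumptions a list of DP-1, DSI-1, eDP-1 becomes DSI-1, eDP-1, DP-1: the DSI connector would still sit ahead of eDP, so the helper alone would not decide which of two panels comes first.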
Hi,
On Tue, Jun 28, 2022 at 1:14 PM Dmitry Baryshkov
<[email protected]> wrote:
> The change itself is a good fix anyway. (And I'd ack it.) However, I would like to understand why it fixes the corruption issue. What if we have eDP and DSI, with DSI ending up before the eDP? Would we see the issue?
> Also, could you please describe the kind of corruption you are observing?

I've spent a whole bunch of time poking at this and in the end my
conclusion is this:

1. The glitchiness seems to be a result of the Chrome OS userspace
somehow telling the kernel to do something wrong.

2. I believe (though I have no proof other than Kuogee's patch fixing
things) that the Chrome OS userspace is simply confused by the eDP
connector being second. This would imply that Kuogee's patch is
actually the right one.

3. It would be ideal if the Chrome OS userspace were fixed to handle
this, but it's an area of code that I've never looked at. It also
seems terribly low priority to fix since apparently other OSes have
similar problems (seems like this code was originally added by
Red Hat?)

Specifically, I tested with a similar but "persistent" glitch that I
reproduced. The glitch Kuogee was digging into was a transitory glitch
on the eDP (internal) display when you plugged in a DP (external)
display. It would show up for a frame or two and then be fixed. I can
get a similar-looking glitch (vertical black and white bars) that
persists by doing these steps on a Chrome OS device (and Chrome OS
kernel):

a) Observe screen looks good.
b) Observe DP not connected.
c) Plug in DP.
d) See transitory glitch on screen, then it all looks fine.
e) set_power_policy --ac_screen_dim_delay=5 --ac_screen_off_delay=10
f) Wait for screen to turn off.
g) Unplug DP.
h) Hit key on keyboard to wake device.
i) See the glitch.
j) Within 5 seconds: set_power_policy --ac_screen_dim_delay=5000
--ac_screen_off_delay=10000

Once I'm in the persistent glitch:

* The "screenshot" command in Chrome OS shows corruption. Not exactly
black and white bars, but the image produced has distinct bands of
garbage.

* I can actually toggle between VT2 and the main screen (VT1). Note
that VT1/VT2 are not quite the normal Linux managed solution--I
believe they're handled by frecon. In any case, when I switch to VT2
it looks normal (I can see the login prompt). Then back to VT1 and the
vertical bars glitch. Back to VT2 and it's normal. Back to VT1 and the
glitch again. This implies (especially with the extra evidence of
screenshot) that the display controller hardware is all fine and that
it's the underlying data that's somehow messed up.

When I pick Kuogee's patch then this "persistent" glitch goes away
just like the transitory one does.

I'm going to go ahead and do:

Reviewed-by: Douglas Anderson <[email protected]>
Tested-by: Douglas Anderson <[email protected]>

On Wed, Jun 29, 2022 at 5:36 PM Doug Anderson <[email protected]> wrote:
> Once I'm in the persistent glitch:
>
> * The "screenshot" command in Chrome OS shows corruption. Not exactly
> black and white bars, but the image produced has distinct bands of
> garbage.
>
> * I can actually toggle between VT2 and the main screen (VT1). Note
> that VT1/VT2 are not quite the normal Linux managed solution--I
> believe they're handled by frecon. In any case, when I switch to VT2
> it looks normal (I can see the login prompt). Then back to VT1 and the
> vertical bars glitch. Back to VT2 and it's normal. Back to VT1 and the
> glitch again. This implies (especially with the extra evidence of
> screenshot) that the display controller hardware is all fine and that
> it's the underlying data that's somehow messed up.
fwiw, from looking at this a bit w/ Doug, I think the "glitch" is
simply an un-rendered buffer being interpreted by the display
controller as UBWC (because userspace tells it to).
BR,
-R
On 30 June 2022 04:57:35 GMT+03:00, Rob Clark <[email protected]> wrote:
>fwiw, from looking at this a bit w/ Doug, I think the "glitch" is
>simply an un-rendered buffer being interpreted by the display
>controller as UBWC (because userspace tells it to).
Thanks for the description. I think the userspace code should be fixed too, but this patch can go in on its own.
Reviewed-by: Dmitry Baryshkov <[email protected]>
--
With best wishes
Dmitry
On 30/06/2022 09:14, Dmitry Baryshkov wrote:
> Thanks for the description. I think the userspace code should be fixed too, but this patch can go in on its own.
>
> Reviewed-by: Dmitry Baryshkov <[email protected]>
After some time (please excuse me) musing over the code, and even
picking up the commit for the merge branch, I understood what I did
not like about this change. It moves all panel connectors (generic
code) from within the DP-specific driver.

I'd like to retract my R-b. Please move this call to msm_drm_init().
Calling this function somewhere after ->kms_init() would make sure
that all panel connectors are close to the top of the list, whichever
MDP/DPU driver is used and whichever actual interface is bound to this
panel.
--
With best wishes
Dmitry
On 7/4/2022 11:14 AM, Dmitry Baryshkov wrote:
> After some time (please excuse me) musing over the code, and even
> picking up the commit for the merge branch, I understood what I did
> not like about this change. It moves all panel connectors (generic
> code) from within the DP-specific driver.
>
> I'd like to retract my R-b. Please move this call to msm_drm_init().
> Calling this function somewhere after ->kms_init() would make sure
> that all panel connectors are close to the top of the list, whichever
> MDP/DPU driver is used and whichever actual interface is bound to this
> panel.
>
Ah, true. Just to add: it should be called after kms_init() but before
drm_dev_register().
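
To illustrate why the placement matters, here is a toy model of the init sequence. It is only a sketch under stated assumptions: the seq_* names are hypothetical stand-ins for ->kms_init(), the reorder helper and drm_dev_register(), and it simplifies by assuming userspace looks at the connector list the moment the device is registered.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a device; none of this is the msm driver's code. */
struct seq_dev {
	const char *connectors[2];
	const char *first_seen;	/* first connector userspace saw at registration */
};

/* Stand-in for ->kms_init(): the external DP connector is created first here. */
static void seq_kms_init(struct seq_dev *dev)
{
	dev->connectors[0] = "DP-1";
	dev->connectors[1] = "eDP-1";
	dev->first_seen = NULL;
}

/* Stand-in for the reorder helper: bring the eDP connector to the head. */
static void seq_move_panels(struct seq_dev *dev)
{
	if (strncmp(dev->connectors[0], "eDP", 3) != 0 &&
	    strncmp(dev->connectors[1], "eDP", 3) == 0) {
		const char *tmp = dev->connectors[0];

		dev->connectors[0] = dev->connectors[1];
		dev->connectors[1] = tmp;
	}
}

/* Stand-in for drm_dev_register(): from here on userspace can enumerate. */
static void seq_dev_register(struct seq_dev *dev)
{
	dev->first_seen = dev->connectors[0];
}
```

Calling seq_kms_init(), seq_move_panels(), seq_dev_register() in that order leaves first_seen pointing at eDP-1. Swapping the last two calls reorders the list only after the model's userspace has already seen DP-1 first, which is the situation the suggested placement avoids.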