dp_bridge_disable() is the first step toward tearing down main link.
Its major function is to start transmitting idle pattern to replace
video stream. This patch will check hpd_state to make sure main link
is enabled before commit changes of main link's configuration to
push idle pattern out to avoid system crashing due to main link clock
is disabled while access main link registers.
SError Interrupt on CPU7, code 0x00000000be000411 -- SError
CPU: 7 PID: 3878 Comm: Xorg Not tainted 5.19.0-stb-cbq #19
Hardware name: Google Lazor (rev3 - 8) (DT)
pstate: a04000c9 (NzCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __cmpxchg_case_acq_32+0x14/0x2c
lr : do_raw_spin_lock+0xa4/0xdc
sp : ffffffc01092b6a0
x29: ffffffc01092b6a0 x28: 0000000000000028 x27: 0000000000000038
x26: 0000000000000004 x25: ffffffd2973dce48 x24: 0000000000000000
x23: 00000000ffffffff x22: 00000000ffffffff x21: ffffffd2978d0008
x20: ffffffd2978d0008 x19: ffffff80ff759fc0 x18: 0000000000000000
x17: 004800a501260460 x16: 0441043b04600438 x15: 04380000089807d0
x14: 07b0089807800780 x13: 0000000000000000 x12: 0000000000000000
x11: 0000000000000438 x10: 00000000000007d0 x9 : ffffffd2973e09e4
x8 : ffffff8092d53300 x7 : ffffff808902e8b8 x6 : 0000000000000001
x5 : ffffff808902e880 x4 : 0000000000000000 x3 : ffffff80ff759fc0
x2 : 0000000000000001 x1 : 0000000000000000 x0 : ffffff80ff759fc0
Kernel panic - not syncing: Asynchronous SError Interrupt
CPU: 7 PID: 3878 Comm: Xorg Not tainted 5.19.0-stb-cbq #19
Hardware name: Google Lazor (rev3 - 8) (DT)
Call trace:
dump_backtrace.part.0+0xbc/0xe4
show_stack+0x24/0x70
dump_stack_lvl+0x68/0x84
dump_stack+0x18/0x34
panic+0x14c/0x32c
nmi_panic+0x58/0x7c
arm64_serror_panic+0x78/0x84
do_serror+0x40/0x64
el1h_64_error_handler+0x30/0x48
el1h_64_error+0x68/0x6c
__cmpxchg_case_acq_32+0x14/0x2c
_raw_spin_lock_irqsave+0x38/0x4c
lock_timer_base+0x40/0x78
__mod_timer+0xf4/0x25c
schedule_timeout+0xd4/0xfc
__wait_for_common+0xac/0x140
wait_for_completion_timeout+0x2c/0x54
dp_ctrl_push_idle+0x40/0x88
dp_bridge_disable+0x24/0x30
drm_atomic_bridge_chain_disable+0x90/0xbc
drm_atomic_helper_commit_modeset_disables+0x198/0x444
msm_atomic_commit_tail+0x1d0/0x374
commit_tail+0x80/0x108
drm_atomic_helper_commit+0x118/0x11c
drm_atomic_commit+0xb4/0xe0
drm_client_modeset_commit_atomic+0x184/0x224
drm_client_modeset_commit_locked+0x58/0x160
drm_client_modeset_commit+0x3c/0x64
__drm_fb_helper_restore_fbdev_mode_unlocked+0x98/0xac
drm_fb_helper_set_par+0x74/0x80
drm_fb_helper_hotplug_event+0xdc/0xe0
__drm_fb_helper_restore_fbdev_mode_unlocked+0x7c/0xac
drm_fb_helper_restore_fbdev_mode_unlocked+0x20/0x2c
drm_fb_helper_lastclose+0x20/0x2c
drm_lastclose+0x44/0x6c
drm_release+0x88/0xd4
__fput+0x104/0x220
____fput+0x1c/0x28
task_work_run+0x8c/0x100
do_exit+0x450/0x8d0
do_group_exit+0x40/0xac
__wake_up_parent+0x0/0x38
invoke_syscall+0x84/0x11c
el0_svc_common.constprop.0+0xb8/0xe4
do_el0_svc+0x8c/0xb8
el0_svc+0x2c/0x54
el0t_64_sync_handler+0x120/0x1c0
el0t_64_sync+0x190/0x194
SMP: stopping secondary CPUs
Kernel Offset: 0x128e800000 from 0xffffffc008000000
PHYS_OFFSET: 0x80000000
CPU features: 0x800,00c2a015,19801c82
Memory Limit: none
Changes in v3:
-- correct Reported-by
-- add call stack trace
Changes in v2:
-- changes Fixes patch
-- fix eported-by
-- add Closes tag
Fixes: 375a126090b9 ("drm/msm/dp: tear down main link at unplug handle immediately")
Reported-by: Leonard Lausen <[email protected]>
Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/17
Signed-off-by: Kuogee Hsieh <[email protected]>
---
drivers/gpu/drm/msm/dp/dp_display.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/gpu/drm/msm/dp/dp_display.c b/drivers/gpu/drm/msm/dp/dp_display.c
index b36f8b6..678289a 100644
--- a/drivers/gpu/drm/msm/dp/dp_display.c
+++ b/drivers/gpu/drm/msm/dp/dp_display.c
@@ -1729,10 +1729,20 @@ void dp_bridge_disable(struct drm_bridge *drm_bridge)
struct msm_dp_bridge *dp_bridge = to_dp_bridge(drm_bridge);
struct msm_dp *dp = dp_bridge->dp_display;
struct dp_display_private *dp_display;
+ u32 state;
dp_display = container_of(dp, struct dp_display_private, dp_display);
+ mutex_lock(&dp_display->event_mutex);
+
+ state = dp_display->hpd_state;
+ if (state != ST_DISCONNECT_PENDING && state != ST_CONNECTED) {
+ mutex_unlock(&dp_display->event_mutex);
+ return;
+ }
+
dp_ctrl_push_idle(dp_display->ctrl);
+ mutex_unlock(&dp_display->event_mutex);
}
void dp_bridge_post_disable(struct drm_bridge *drm_bridge)
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
Quoting Kuogee Hsieh (2022-08-10 12:25:51)
> diff --git a/drivers/gpu/drm/msm/dp/dp_display.c b/drivers/gpu/drm/msm/dp/dp_display.c
> index b36f8b6..678289a 100644
> --- a/drivers/gpu/drm/msm/dp/dp_display.c
> +++ b/drivers/gpu/drm/msm/dp/dp_display.c
> @@ -1729,10 +1729,20 @@ void dp_bridge_disable(struct drm_bridge *drm_bridge)
> struct msm_dp_bridge *dp_bridge = to_dp_bridge(drm_bridge);
> struct msm_dp *dp = dp_bridge->dp_display;
> struct dp_display_private *dp_display;
> + u32 state;
>
> dp_display = container_of(dp, struct dp_display_private, dp_display);
>
> + mutex_lock(&dp_display->event_mutex);
> +
> + state = dp_display->hpd_state;
> + if (state != ST_DISCONNECT_PENDING && state != ST_CONNECTED) {
It's concerning that we have to check this at all. Are we still
interjecting into the disable path when the cable is disconnected?
> + mutex_unlock(&dp_display->event_mutex);
> + return;
> + }
> +
> dp_ctrl_push_idle(dp_display->ctrl);
> + mutex_unlock(&dp_display->event_mutex);
On 8/10/2022 3:22 PM, Stephen Boyd wrote:
> Quoting Kuogee Hsieh (2022-08-10 12:25:51)
>> diff --git a/drivers/gpu/drm/msm/dp/dp_display.c b/drivers/gpu/drm/msm/dp/dp_display.c
>> index b36f8b6..678289a 100644
>> --- a/drivers/gpu/drm/msm/dp/dp_display.c
>> +++ b/drivers/gpu/drm/msm/dp/dp_display.c
>> @@ -1729,10 +1729,20 @@ void dp_bridge_disable(struct drm_bridge *drm_bridge)
>> struct msm_dp_bridge *dp_bridge = to_dp_bridge(drm_bridge);
>> struct msm_dp *dp = dp_bridge->dp_display;
>> struct dp_display_private *dp_display;
>> + u32 state;
>>
>> dp_display = container_of(dp, struct dp_display_private, dp_display);
>>
>> + mutex_lock(&dp_display->event_mutex);
>> +
>> + state = dp_display->hpd_state;
>> + if (state != ST_DISCONNECT_PENDING && state != ST_CONNECTED) {
> It's concerning that we have to check this at all. Are we still
> interjecting into the disable path when the cable is disconnected?
yes,
The problem is not from cable disconnected.
There is a corner case that this function is called at drm shutdown
(drm_release).
At that time, mainlink is not enabled, hence dp_ctrl_push_idle() will
cause system crash.
>> + mutex_unlock(&dp_display->event_mutex);
>> + return;
>> + }
>> +
>> dp_ctrl_push_idle(dp_display->ctrl);
>> + mutex_unlock(&dp_display->event_mutex);
Quoting Kuogee Hsieh (2022-08-10 16:57:51)
>
> On 8/10/2022 3:22 PM, Stephen Boyd wrote:
> > Quoting Kuogee Hsieh (2022-08-10 12:25:51)
> >> diff --git a/drivers/gpu/drm/msm/dp/dp_display.c b/drivers/gpu/drm/msm/dp/dp_display.c
> >> index b36f8b6..678289a 100644
> >> --- a/drivers/gpu/drm/msm/dp/dp_display.c
> >> +++ b/drivers/gpu/drm/msm/dp/dp_display.c
> >> @@ -1729,10 +1729,20 @@ void dp_bridge_disable(struct drm_bridge *drm_bridge)
> >> struct msm_dp_bridge *dp_bridge = to_dp_bridge(drm_bridge);
> >> struct msm_dp *dp = dp_bridge->dp_display;
> >> struct dp_display_private *dp_display;
> >> + u32 state;
> >>
> >> dp_display = container_of(dp, struct dp_display_private, dp_display);
> >>
> >> + mutex_lock(&dp_display->event_mutex);
> >> +
> >> + state = dp_display->hpd_state;
> >> + if (state != ST_DISCONNECT_PENDING && state != ST_CONNECTED) {
> > It's concerning that we have to check this at all. Are we still
> > interjecting into the disable path when the cable is disconnected?
>
> yes,
>
> The problem is not from cable disconnected.
>
> There is a corner case that this function is called at drm shutdown
> (drm_release).
>
> At that time, mainlink is not enabled, hence dp_ctrl_push_idle() will
> cause system crash.
The mainlink is only disabled when the cable is disconnected though?
Let me put it this way, if we have to check that the state is
"connected" or "disconnected pending" in the disable path then there's
an issue where this driver is being called in unexpected ways. This
driver is fighting the drm core each time there's a state check. We
really need to get rid of the state tracking entirely, and make sure
that the drm core is calling into the driver at the right time, i.e.
bridge disable is only called when the mainlink is enabled, etc.
Hi Stephen
On 8/10/2022 5:09 PM, Stephen Boyd wrote:
> Quoting Kuogee Hsieh (2022-08-10 16:57:51)
>>
>> On 8/10/2022 3:22 PM, Stephen Boyd wrote:
>>> Quoting Kuogee Hsieh (2022-08-10 12:25:51)
>>>> diff --git a/drivers/gpu/drm/msm/dp/dp_display.c b/drivers/gpu/drm/msm/dp/dp_display.c
>>>> index b36f8b6..678289a 100644
>>>> --- a/drivers/gpu/drm/msm/dp/dp_display.c
>>>> +++ b/drivers/gpu/drm/msm/dp/dp_display.c
>>>> @@ -1729,10 +1729,20 @@ void dp_bridge_disable(struct drm_bridge *drm_bridge)
>>>> struct msm_dp_bridge *dp_bridge = to_dp_bridge(drm_bridge);
>>>> struct msm_dp *dp = dp_bridge->dp_display;
>>>> struct dp_display_private *dp_display;
>>>> + u32 state;
>>>>
>>>> dp_display = container_of(dp, struct dp_display_private, dp_display);
>>>>
>>>> + mutex_lock(&dp_display->event_mutex);
>>>> +
>>>> + state = dp_display->hpd_state;
>>>> + if (state != ST_DISCONNECT_PENDING && state != ST_CONNECTED) {
>>> It's concerning that we have to check this at all. Are we still
>>> interjecting into the disable path when the cable is disconnected?
>>
>> yes,
>>
>> The problem is not from cable disconnected.
>>
>> There is a corner case that this function is called at drm shutdown
>> (drm_release).
>>
>> At that time, mainlink is not enabled, hence dp_ctrl_push_idle() will
>> cause system crash.
>
> The mainlink is only disabled when the cable is disconnected though?
>
> Let me put it this way, if we have to check that the state is
> "connected" or "disconnected pending" in the disable path then there's
> an issue where this driver is being called in unexpected ways. This
> driver is fighting the drm core each time there's a state check. We
> really need to get rid of the state tracking entirely, and make sure
> that the drm core is calling into the driver at the right time, i.e.
> bridge disable is only called when the mainlink is enabled, etc.
So if link training failed, we do not send a uevent to usermode and will
bail out here:
rc = dp_ctrl_on_link(dp->ctrl);
if (rc) {
DRM_ERROR("failed to complete DP link training\n");
goto end;
}
So this commit is not coming from usermode but from the drm_release() path.
Even then, you do have a valid point. DRM framework should not have
caused the disable path to happen without an enable.
I went through the stack mentioned in the issue.
Lets see this part of the stack:
dp_ctrl_push_idle+0x40/0x88
dp_bridge_disable+0x24/0x30
drm_atomic_bridge_chain_disable+0x90/0xbc
drm_atomic_helper_commit_modeset_disables+0x198/0x444
msm_atomic_commit_tail+0x1d0/0x374
In drm_atomic_helper_commit_modeset_disables(), we call disable_outputs().
AFAICT, this is the only place which has a protection to not call the
disable() flow if it was not enabled here:
https://gitlab.freedesktop.org/drm/msm/-/blob/msm-next/drivers/gpu/drm/drm_atomic_helper.c#L1063
But that function is only checking crtc_state->active. Thats set by the
usermode:
https://gitlab.freedesktop.org/drm/msm/-/blob/msm-next/drivers/gpu/drm/drm_atomic_uapi.c#L407
Now, if usermode sets that to true and then crashed it can bypass this
check and we will crash in the location kuogee is trying to fix.
From the issue mentioned in
https://gitlab.freedesktop.org/drm/msm/-/issues/17, the reporter did
mention the usermode crashed.
So this is my tentative analysis of whats happening here.
Ideally yes, we should have been protected by the location mentioned
above in disable_outputs() but looks to me due to the above hypothesis
its getting bypassed.
Thanks
Abhinav
On 8/10/2022 6:00 PM, Abhinav Kumar wrote:
> Hi Stephen
>
> On 8/10/2022 5:09 PM, Stephen Boyd wrote:
>> Quoting Kuogee Hsieh (2022-08-10 16:57:51)
>>>
>>> On 8/10/2022 3:22 PM, Stephen Boyd wrote:
>>>> Quoting Kuogee Hsieh (2022-08-10 12:25:51)
>>>>> diff --git a/drivers/gpu/drm/msm/dp/dp_display.c
>>>>> b/drivers/gpu/drm/msm/dp/dp_display.c
>>>>> index b36f8b6..678289a 100644
>>>>> --- a/drivers/gpu/drm/msm/dp/dp_display.c
>>>>> +++ b/drivers/gpu/drm/msm/dp/dp_display.c
>>>>> @@ -1729,10 +1729,20 @@ void dp_bridge_disable(struct drm_bridge
>>>>> *drm_bridge)
>>>>> struct msm_dp_bridge *dp_bridge = to_dp_bridge(drm_bridge);
>>>>> struct msm_dp *dp = dp_bridge->dp_display;
>>>>> struct dp_display_private *dp_display;
>>>>> + u32 state;
>>>>>
>>>>> dp_display = container_of(dp, struct dp_display_private,
>>>>> dp_display);
>>>>>
>>>>> + mutex_lock(&dp_display->event_mutex);
>>>>> +
>>>>> + state = dp_display->hpd_state;
>>>>> + if (state != ST_DISCONNECT_PENDING && state !=
>>>>> ST_CONNECTED) {
>>>> It's concerning that we have to check this at all. Are we still
>>>> interjecting into the disable path when the cable is disconnected?
>>>
>>> yes,
>>>
>>> The problem is not from cable disconnected.
>>>
>>> There is a corner case that this function is called at drm shutdown
>>> (drm_release).
>>>
>>> At that time, mainlink is not enabled, hence dp_ctrl_push_idle() will
>>> cause system crash.
>>
>> The mainlink is only disabled when the cable is disconnected though?
>>
>> Let me put it this way, if we have to check that the state is
>> "connected" or "disconnected pending" in the disable path then there's
>> an issue where this driver is being called in unexpected ways. This
>> driver is fighting the drm core each time there's a state check. We
>> really need to get rid of the state tracking entirely, and make sure
>> that the drm core is calling into the driver at the right time, i.e.
>> bridge disable is only called when the mainlink is enabled, etc.
>
> So if link training failed, we do not send a uevent to usermode and
> will bail out here:
>
> rc = dp_ctrl_on_link(dp->ctrl);
> if (rc) {
> DRM_ERROR("failed to complete DP link training\n");
> goto end;
> }
>
> So this commit is not coming from usermode but from the drm_release()
> path.
>
> Even then, you do have a valid point. DRM framework should not have
> caused the disable path to happen without an enable.
>
> I went through the stack mentioned in the issue.
>
> Lets see this part of the stack:
>
> dp_ctrl_push_idle+0x40/0x88
> dp_bridge_disable+0x24/0x30
> drm_atomic_bridge_chain_disable+0x90/0xbc
> drm_atomic_helper_commit_modeset_disables+0x198/0x444
> msm_atomic_commit_tail+0x1d0/0x374
>
> In drm_atomic_helper_commit_modeset_disables(), we call
> disable_outputs().
>
> AFAICT, this is the only place which has a protection to not call the
> disable() flow if it was not enabled here:
>
> https://gitlab.freedesktop.org/drm/msm/-/blob/msm-next/drivers/gpu/drm/drm_atomic_helper.c#L1063
>
>
> But that function is only checking crtc_state->active. Thats set by
> the usermode:
>
> https://gitlab.freedesktop.org/drm/msm/-/blob/msm-next/drivers/gpu/drm/drm_atomic_uapi.c#L407
>
>
> Now, if usermode sets that to true and then crashed it can bypass this
> check and we will crash in the location kuogee is trying to fix.
>
> From the issue mentioned in
> https://gitlab.freedesktop.org/drm/msm/-/issues/17, the reporter did
> mention the usermode crashed.
>
> So this is my tentative analysis of whats happening here.
>
> Ideally yes, we should have been protected by the location mentioned
> above in disable_outputs() but looks to me due to the above hypothesis
> its getting bypassed.
>
> Thanks
>
> Abhinav
>
>
Ii sound likes that there is a hole either at user space or drm.
But that should not cause dp_bridge_disable() at dp driver to crash.
Therefore it is properly to check hdp_state condition at
dp_bridge_disable() to prevent it from crashing.
Quoting Kuogee Hsieh (2022-08-11 08:20:01)
>
> On 8/10/2022 6:00 PM, Abhinav Kumar wrote:
> >
> > Even then, you do have a valid point. DRM framework should not have
> > caused the disable path to happen without an enable.
> >
> > I went through the stack mentioned in the issue.
> >
> > Lets see this part of the stack:
> >
> > dp_ctrl_push_idle+0x40/0x88
> > dp_bridge_disable+0x24/0x30
> > drm_atomic_bridge_chain_disable+0x90/0xbc
> > drm_atomic_helper_commit_modeset_disables+0x198/0x444
> > msm_atomic_commit_tail+0x1d0/0x374
> >
> > In drm_atomic_helper_commit_modeset_disables(), we call
> > disable_outputs().
> >
> > AFAICT, this is the only place which has a protection to not call the
> > disable() flow if it was not enabled here:
> >
> > https://gitlab.freedesktop.org/drm/msm/-/blob/msm-next/drivers/gpu/drm/drm_atomic_helper.c#L1063
> >
> >
> > But that function is only checking crtc_state->active. Thats set by
> > the usermode:
> >
> > https://gitlab.freedesktop.org/drm/msm/-/blob/msm-next/drivers/gpu/drm/drm_atomic_uapi.c#L407
> >
> >
> > Now, if usermode sets that to true and then crashed it can bypass this
> > check and we will crash in the location kuogee is trying to fix.
That seems bad, no? We don't want userspace to be able to crash and then
be able to call the disable path when enable never succeeded.
> >
> > From the issue mentioned in
> > https://gitlab.freedesktop.org/drm/msm/-/issues/17, the reporter did
> > mention the usermode crashed.
> >
> > So this is my tentative analysis of whats happening here.
> >
> > Ideally yes, we should have been protected by the location mentioned
> > above in disable_outputs() but looks to me due to the above hypothesis
> > its getting bypassed.
Can you fix the problem there? Not fixing it means that every driver out
there has to develop the same "fix", when it could be fixed in the core
one time.
Ideally drivers are simple. They configure the hardware for what the
function pointer is asking for. State management and things like that
should be pushed into the core framework so that we don't have to
duplicate that multiple times.
> >
> > Thanks
> >
> > Abhinav
> >
> >
> Ii sound likes that there is a hole either at user space or drm.
>
> But that should not cause dp_bridge_disable() at dp driver to crash.
Agreed.
>
> Therefore it is properly to check hdp_state condition at
> dp_bridge_disable() to prevent it from crashing.
>
Disagree. Userspace shouldn't be able to get drm into a wedged state.
On 8/15/2022 10:08 AM, Stephen Boyd wrote:
> Quoting Kuogee Hsieh (2022-08-11 08:20:01)
>> On 8/10/2022 6:00 PM, Abhinav Kumar wrote:
>>> Even then, you do have a valid point. DRM framework should not have
>>> caused the disable path to happen without an enable.
>>>
>>> I went through the stack mentioned in the issue.
>>>
>>> Lets see this part of the stack:
>>>
>>> dp_ctrl_push_idle+0x40/0x88
>>> dp_bridge_disable+0x24/0x30
>>> drm_atomic_bridge_chain_disable+0x90/0xbc
>>> drm_atomic_helper_commit_modeset_disables+0x198/0x444
>>> msm_atomic_commit_tail+0x1d0/0x374
>>>
>>> In drm_atomic_helper_commit_modeset_disables(), we call
>>> disable_outputs().
>>>
>>> AFAICT, this is the only place which has a protection to not call the
>>> disable() flow if it was not enabled here:
>>>
>>> https://gitlab.freedesktop.org/drm/msm/-/blob/msm-next/drivers/gpu/drm/drm_atomic_helper.c#L1063
>>>
>>>
>>> But that function is only checking crtc_state->active. Thats set by
>>> the usermode:
>>>
>>> https://gitlab.freedesktop.org/drm/msm/-/blob/msm-next/drivers/gpu/drm/drm_atomic_uapi.c#L407
>>>
>>>
>>> Now, if usermode sets that to true and then crashed it can bypass this
>>> check and we will crash in the location kuogee is trying to fix.
> That seems bad, no? We don't want userspace to be able to crash and then
> be able to call the disable path when enable never succeeded.
>
>>> From the issue mentioned in
>>> https://gitlab.freedesktop.org/drm/msm/-/issues/17, the reporter did
>>> mention the usermode crashed.
>>>
>>> So this is my tentative analysis of whats happening here.
>>>
>>> Ideally yes, we should have been protected by the location mentioned
>>> above in disable_outputs() but looks to me due to the above hypothesis
>>> its getting bypassed.
> Can you fix the problem there? Not fixing it means that every driver out
> there has to develop the same "fix", when it could be fixed in the core
> one time.
The reporter is running GNOME Display Manager (gdm3) instead of chrome.
We can not duplicate this problem locally. Hence we can not confirm this
is the root cause of this bug or not.
Do you know who is more proper to investigate more into this?
> Ideally drivers are simple. They configure the hardware for what the
> function pointer is asking for. State management and things like that
> should be pushed into the core framework so that we don't have to
> duplicate that multiple times.
>
>>> Thanks
>>>
>>> Abhinav
>>>
>>>
>> Ii sound likes that there is a hole either at user space or drm.
>>
>> But that should not cause dp_bridge_disable() at dp driver to crash.
> Agreed.
>
>> Therefore it is properly to check hdp_state condition at
>> dp_bridge_disable() to prevent it from crashing.
>>
> Disagree. Userspace shouldn't be able to get drm into a wedged state.
not sure for this.
I agree if this is simple driver such as i2c which does not need to
maintain any states during operation.
but mdp/dp is far more complexity. we really do not want to have any
crash happen at mdp/dp in the filed.
would you please reconsider our point to add this hdp_state checking
here to prevent any crash happen.
Hi Stephen
On 8/15/2022 10:08 AM, Stephen Boyd wrote:
> Quoting Kuogee Hsieh (2022-08-11 08:20:01)
>>
>> On 8/10/2022 6:00 PM, Abhinav Kumar wrote:
>>>
>>> Even then, you do have a valid point. DRM framework should not have
>>> caused the disable path to happen without an enable.
>>>
>>> I went through the stack mentioned in the issue.
>>>
>>> Lets see this part of the stack:
>>>
>>> dp_ctrl_push_idle+0x40/0x88
>>> dp_bridge_disable+0x24/0x30
>>> drm_atomic_bridge_chain_disable+0x90/0xbc
>>> drm_atomic_helper_commit_modeset_disables+0x198/0x444
>>> msm_atomic_commit_tail+0x1d0/0x374
>>>
>>> In drm_atomic_helper_commit_modeset_disables(), we call
>>> disable_outputs().
>>>
>>> AFAICT, this is the only place which has a protection to not call the
>>> disable() flow if it was not enabled here:
>>>
>>> https://gitlab.freedesktop.org/drm/msm/-/blob/msm-next/drivers/gpu/drm/drm_atomic_helper.c#L1063
>>>
>>>
>>> But that function is only checking crtc_state->active. Thats set by
>>> the usermode:
>>>
>>> https://gitlab.freedesktop.org/drm/msm/-/blob/msm-next/drivers/gpu/drm/drm_atomic_uapi.c#L407
>>>
>>>
>>> Now, if usermode sets that to true and then crashed it can bypass this
>>> check and we will crash in the location kuogee is trying to fix.
>
> That seems bad, no? We don't want userspace to be able to crash and then
> be able to call the disable path when enable never succeeded.
>
>>>
>>> From the issue mentioned in
>>> https://gitlab.freedesktop.org/drm/msm/-/issues/17, the reporter did
>>> mention the usermode crashed.
>>>
>>> So this is my tentative analysis of whats happening here.
>>>
>>> Ideally yes, we should have been protected by the location mentioned
>>> above in disable_outputs() but looks to me due to the above hypothesis
>>> its getting bypassed.
>
> Can you fix the problem there? Not fixing it means that every driver out
> there has to develop the same "fix", when it could be fixed in the core
> one time.
>
As per discussion on IRC with Rob, we have pushed another fix for this
issue
https://lore.kernel.org/all/[email protected]/.
So, we can drop this one in favor of the other.
Thanks
Abhinav
> Ideally drivers are simple. They configure the hardware for what the
> function pointer is asking for. State management and things like that
> should be pushed into the core framework so that we don't have to
> duplicate that multiple times.
>
>>>
>>> Thanks
>>>
>>> Abhinav
>>>
>>>
>> Ii sound likes that there is a hole either at user space or drm.
>>
>> But that should not cause dp_bridge_disable() at dp driver to crash.
>
> Agreed.
>
>>
>> Therefore it is properly to check hdp_state condition at
>> dp_bridge_disable() to prevent it from crashing.
>>
>
> Disagree. Userspace shouldn't be able to get drm into a wedged state.