Subject: Re: [PATCH] drm/mediatek: Add spinlock for setting vblank event in atomic_begin

Il 22/08/23 15:26, Jason-JH.Lin ha scritto:
> Add spinlock protection to avoid race condition on vblank event
> between mtk_drm_crtc_atomic_begin() and mtk_drm_finish_page_flip().
>

Hello Jason,

Can you please provide more information about this race condition?
(check below)

> Fixes: 119f5173628a ("drm/mediatek: Add DRM Driver for Mediatek SoC MT8173.")
> Signed-off-by: Jason-JH.Lin <[email protected]>
> ---
> drivers/gpu/drm/mediatek/mtk_drm_crtc.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/drivers/gpu/drm/mediatek/mtk_drm_crtc.c b/drivers/gpu/drm/mediatek/mtk_drm_crtc.c
> index d40142842f85..128a672fe3c9 100644
> --- a/drivers/gpu/drm/mediatek/mtk_drm_crtc.c
> +++ b/drivers/gpu/drm/mediatek/mtk_drm_crtc.c
> @@ -746,6 +746,9 @@ static void mtk_drm_crtc_atomic_begin(struct drm_crtc *crtc,
> crtc);
> struct mtk_crtc_state *mtk_crtc_state = to_mtk_crtc_state(crtc_state);
> struct mtk_drm_crtc *mtk_crtc = to_mtk_crtc(crtc);
> + unsigned long flags;
> +
> + spin_lock_irqsave(&crtc->dev->event_lock, flags);
>
> if (mtk_crtc->event && mtk_crtc_state->base.event)
> DRM_ERROR("new event while there is still a pending event\n");
> @@ -756,6 +759,8 @@ static void mtk_drm_crtc_atomic_begin(struct drm_crtc *crtc,

...because my suspect is that what creates the race condition in this function is
the unlocked *assignment* to mtk_crtc->event, not the rest.

If I'm right, you don't need to unconditionally spinlock at the beginning of this
function hence ever-so-slightly improving performance compared to this version.

Can you please try this one and check if this *also* solves the issue?

if (mtk_crtc_state->base.event) {
mtk_crtc_state->base.event->pipe = drm_crtc_index(crtc);
WARN_ON(drm_crtc_vblank_get(crtc) != 0);

spin_lock_irqsave(&crtc->dev->event_lock, flags);
mtk_crtc->event = mtk_crtc_state->base.event;
spin_lock_irqrestore(&crtc->dev->event_lock, flags);

mtk_crtc_state->base.event = NULL;
}

P.S.: I'd try that myself, but I can't seem to reproduce the issue.

Regards,
Angelo


2023-09-18 11:48:54

by Fei Shao

[permalink] [raw]
Subject: Re: [PATCH] drm/mediatek: Add spinlock for setting vblank event in atomic_begin

Hi Angelo,

On Wed, Sep 13, 2023 at 4:35 PM AngeloGioacchino Del Regno
<[email protected]> wrote:
>
> Il 22/08/23 15:26, Jason-JH.Lin ha scritto:
> > Add spinlock protection to avoid race condition on vblank event
> > between mtk_drm_crtc_atomic_begin() and mtk_drm_finish_page_flip().
> >
>
> Hello Jason,
>
> Can you please provide more information about this race condition?
> (check below)
>
> > Fixes: 119f5173628a ("drm/mediatek: Add DRM Driver for Mediatek SoC MT8173.")
> > Signed-off-by: Jason-JH.Lin <[email protected]>
> > ---
> > drivers/gpu/drm/mediatek/mtk_drm_crtc.c | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/mediatek/mtk_drm_crtc.c b/drivers/gpu/drm/mediatek/mtk_drm_crtc.c
> > index d40142842f85..128a672fe3c9 100644
> > --- a/drivers/gpu/drm/mediatek/mtk_drm_crtc.c
> > +++ b/drivers/gpu/drm/mediatek/mtk_drm_crtc.c
> > @@ -746,6 +746,9 @@ static void mtk_drm_crtc_atomic_begin(struct drm_crtc *crtc,
> > crtc);
> > struct mtk_crtc_state *mtk_crtc_state = to_mtk_crtc_state(crtc_state);
> > struct mtk_drm_crtc *mtk_crtc = to_mtk_crtc(crtc);
> > + unsigned long flags;
> > +
> > + spin_lock_irqsave(&crtc->dev->event_lock, flags);
> >
> > if (mtk_crtc->event && mtk_crtc_state->base.event)
> > DRM_ERROR("new event while there is still a pending event\n");
> > @@ -756,6 +759,8 @@ static void mtk_drm_crtc_atomic_begin(struct drm_crtc *crtc,
>
> ...because my suspect is that what creates the race condition in this function is
> the unlocked *assignment* to mtk_crtc->event, not the rest.
>
> If I'm right, you don't need to unconditionally spinlock at the beginning of this
> function hence ever-so-slightly improving performance compared to this version.
>
> Can you please try this one and check if this *also* solves the issue?
>
> if (mtk_crtc_state->base.event) {
> mtk_crtc_state->base.event->pipe = drm_crtc_index(crtc);
> WARN_ON(drm_crtc_vblank_get(crtc) != 0);
>
> spin_lock_irqsave(&crtc->dev->event_lock, flags);
> mtk_crtc->event = mtk_crtc_state->base.event;
> spin_lock_irqrestore(&crtc->dev->event_lock, flags);
>
> mtk_crtc_state->base.event = NULL;
> }
>
> P.S.: I'd try that myself, but I can't seem to reproduce the issue.

I'm still able to reproduce it so I gave it a try, and this approach
also seems to fix the issue. :)
FWIW, the way I reproduce that is to toggle the night light mode on
and off repeatedly through the UI panel while playing YouTube videos
on my device.

Jason, can you post a new version with Angelo's suggestion?

Regards,
Fei

>
> Regards,
> Angelo
>

2023-09-19 11:33:08

by Jason-JH.Lin

[permalink] [raw]
Subject: Re: [PATCH] drm/mediatek: Add spinlock for setting vblank event in atomic_begin

Hi Angelo

Thanks for the reviews.

Hi Fei,

Thanks for the testing.

On Mon, 2023-09-18 at 18:47 +0800, Fei Shao wrote:
>
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
> Hi Angelo,
>
> On Wed, Sep 13, 2023 at 4:35 PM AngeloGioacchino Del Regno
> <[email protected]> wrote:
> >
> > Il 22/08/23 15:26, Jason-JH.Lin ha scritto:
> > > Add spinlock protection to avoid race condition on vblank event
> > > between mtk_drm_crtc_atomic_begin() and
> mtk_drm_finish_page_flip().
> > >
> >
> > Hello Jason,
> >
> > Can you please provide more information about this race condition?
> > (check below)
> >
> > > Fixes: 119f5173628a ("drm/mediatek: Add DRM Driver for Mediatek
> SoC MT8173.")
> > > Signed-off-by: Jason-JH.Lin <[email protected]>
> > > ---
> > > drivers/gpu/drm/mediatek/mtk_drm_crtc.c | 5 +++++
> > > 1 file changed, 5 insertions(+)
> > >
> > > diff --git a/drivers/gpu/drm/mediatek/mtk_drm_crtc.c
> b/drivers/gpu/drm/mediatek/mtk_drm_crtc.c
> > > index d40142842f85..128a672fe3c9 100644
> > > --- a/drivers/gpu/drm/mediatek/mtk_drm_crtc.c
> > > +++ b/drivers/gpu/drm/mediatek/mtk_drm_crtc.c
> > > @@ -746,6 +746,9 @@ static void mtk_drm_crtc_atomic_begin(struct
> drm_crtc *crtc,
> >
> >
> crtc);
> > > struct mtk_crtc_state *mtk_crtc_state =
> to_mtk_crtc_state(crtc_state);
> > > struct mtk_drm_crtc *mtk_crtc = to_mtk_crtc(crtc);
> > > + unsigned long flags;
> > > +
> > > + spin_lock_irqsave(&crtc->dev->event_lock, flags);
> > >
> > > if (mtk_crtc->event && mtk_crtc_state->base.event)
> > > DRM_ERROR("new event while there is still a pending
> event\n");
> > > @@ -756,6 +759,8 @@ static void mtk_drm_crtc_atomic_begin(struct
> drm_crtc *crtc,
> >
> > ...because my suspect is that what creates the race condition in
> this function is
> > the unlocked *assignment* to mtk_crtc->event, not the rest.
> >
> > If I'm right, you don't need to unconditionally spinlock at the
> beginning of this
> > function hence ever-so-slightly improving performance compared to
> this version.
> >
> > Can you please try this one and check if this *also* solves the
> issue?
> >
> > if (mtk_crtc_state->base.event) {
> > mtk_crtc_state->base.event->pipe =
> drm_crtc_index(crtc);
> > WARN_ON(drm_crtc_vblank_get(crtc) != 0);
> >
> > spin_lock_irqsave(&crtc->dev->event_lock, flags);
> > mtk_crtc->event = mtk_crtc_state->base.event;
> > spin_lock_irqrestore(&crtc->dev->event_lock,
> flags);
> >
> > mtk_crtc_state->base.event = NULL;
> > }
> >
> > P.S.: I'd try that myself, but I can't seem to reproduce the issue.
>
I can't reproduce the system hang issue reported from customer by
toggling the night light mode in UI panel either.
But I can see the error message "new event while there is still a
pending event" when I was reproducing the issue.

Here is the debug info from "Wei-Shun <[email protected]>":

From the kernel tracing:
99342.377173: mtk_crtc_ddp_irq <-mtk_disp_ovl_irq_handler
99342.377226:
drm_crtc_send_vblank_event <-mtk_crtc_ddp_irq
99342.393831:
mtk_crtc_ddp_irq <-mtk_disp_ovl_irq_handler
99342.393887:
drm_crtc_send_vblank_event <-mtk_crtc_ddp_irq
99342.410469:
mtk_crtc_ddp_irq <-mtk_disp_ovl_irq_handler
99342.410519:
drm_crtc_send_vblank_event <-mtk_crtc_ddp_irq
99342.427094:
mtk_crtc_ddp_irq <-mtk_disp_ovl_irq_handler
99342.443831:
mtk_crtc_ddp_irq <-mtk_disp_ovl_irq_handler
99342.460495:
mtk_crtc_ddp_irq <-mtk_disp_ovl_irq_handler
99342.477157:
mtk_crtc_ddp_irq <-mtk_disp_ovl_irq_handler
99342.493841:
mtk_crtc_ddp_irq <-mtk_disp_ovl_irq_handler
99342.510453:
mtk_crtc_ddp_irq <-mtk_disp_ovl_irq_handler

Every mtk_crtc_ddp_irq should come with a drm_crtc_send_vblank_event.
However, when the issue happens, the mtk_crtc_ddp_irq keeps firing but
no drm_crtc_send_vblank_event. I suspect this is the main reason
causing flip_done timeout.

In mtk_drm_crtc.c, mtk_crtc->config_updating and mtk_crtc-
>pending_needs_vblank are the conditions that may impact the vblank
event.

static void mtk_drm_finish_page_flip(struct mtk_drm_crtc *mtk_crtc)
{
struct drm_crtc *crtc = &mtk_crtc->base;
unsigned long flags;

drm_crtc_handle_vblank(&mtk_crtc->base);

spin_lock_irqsave(&crtc->dev->event_lock, flags);
==> if (!mtk_crtc->config_updating && mtk_crtc->pending_needs_vblank) {
mtk_drm_crtc_finish_page_flip(mtk_crtc);
mtk_crtc->pending_needs_vblank = false;
}
spin_unlock_irqrestore(&crtc->dev->event_lock, flags);
}


There are 3 lines are called in mtk_drm_crtc_finish_page_flip():
drm_crtc_send_vblank_event(crtc, mtk_crtc->event);
drm_crtc_vblank_put(crtc);
mtk_crtc->event = NULL;

So I want to protect these 3 things to avoid them encountering race
conditions:
mtk_crtc_state->base.event //will be provided on atomic_commit complete
drm_crtc_vblank_get(crtc)
mtk_crtc->event


I have tried to protect this line only:

spin_lock_irqsave(&crtc->dev->event_lock, flags);
mtk_crtc->event = mtk_crtc_state->base.event;
spin_lock_irqrestore(&crtc->dev->event_lock,flags);

I can still see the error message "new event while there is still a
pending event" when I toggled the night light mode.

Maybe that is mtk_crtc->event here:
if (mtk_crtc->event && mtk_crtc_state->base.event)
haven't set to NULL by mtk_drm_crtc_finish_page_flip().

But that not a problem because we will protect mtk_crtc->event here:
mtk_crtc->event = mtk_crtc_state->base.event;


So should we remove that error message since it doesn't help and may
cause the confuse after we fix this issue?


> I'm still able to reproduce it so I gave it a try, and this approach
> also seems to fix the issue. :)
> FWIW, the way I reproduce that is to toggle the night light mode on
> and off repeatedly through the UI panel while playing YouTube videos
> on my device.
>
> Jason, can you post a new version with Angelo's suggestion?
>

OK, I can take Angelo's suggestion. Thanks!
But I am wondering if we should remove this log?
DRM_ERROR("new event while there is still a pending event\n");


Regards,
Jason-JH.Lin

> Regards,
> Fei
>
> >
> > Regards,
> > Angelo
> >
>

Subject: Re: [PATCH] drm/mediatek: Add spinlock for setting vblank event in atomic_begin

Il 19/09/23 09:37, Jason-JH Lin (林睿祥) ha scritto:
> Hi Angelo
>
> Thanks for the reviews.
>
> Hi Fei,
>
> Thanks for the testing.
>
> On Mon, 2023-09-18 at 18:47 +0800, Fei Shao wrote:
>>
>> External email : Please do not click links or open attachments until
>> you have verified the sender or the content.
>> Hi Angelo,
>>
>> On Wed, Sep 13, 2023 at 4:35 PM AngeloGioacchino Del Regno
>> <[email protected]> wrote:
>>>
>>> Il 22/08/23 15:26, Jason-JH.Lin ha scritto:
>>>> Add spinlock protection to avoid race condition on vblank event
>>>> between mtk_drm_crtc_atomic_begin() and
>> mtk_drm_finish_page_flip().
>>>>
>>>
>>> Hello Jason,
>>>
>>> Can you please provide more information about this race condition?
>>> (check below)
>>>
>>>> Fixes: 119f5173628a ("drm/mediatek: Add DRM Driver for Mediatek
>> SoC MT8173.")
>>>> Signed-off-by: Jason-JH.Lin <[email protected]>
>>>> ---
>>>> drivers/gpu/drm/mediatek/mtk_drm_crtc.c | 5 +++++
>>>> 1 file changed, 5 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/mediatek/mtk_drm_crtc.c
>> b/drivers/gpu/drm/mediatek/mtk_drm_crtc.c
>>>> index d40142842f85..128a672fe3c9 100644
>>>> --- a/drivers/gpu/drm/mediatek/mtk_drm_crtc.c
>>>> +++ b/drivers/gpu/drm/mediatek/mtk_drm_crtc.c
>>>> @@ -746,6 +746,9 @@ static void mtk_drm_crtc_atomic_begin(struct
>> drm_crtc *crtc,
>>>
>>>
>> crtc);
>>>> struct mtk_crtc_state *mtk_crtc_state =
>> to_mtk_crtc_state(crtc_state);
>>>> struct mtk_drm_crtc *mtk_crtc = to_mtk_crtc(crtc);
>>>> + unsigned long flags;
>>>> +
>>>> + spin_lock_irqsave(&crtc->dev->event_lock, flags);
>>>>
>>>> if (mtk_crtc->event && mtk_crtc_state->base.event)
>>>> DRM_ERROR("new event while there is still a pending
>> event\n");
>>>> @@ -756,6 +759,8 @@ static void mtk_drm_crtc_atomic_begin(struct
>> drm_crtc *crtc,
>>>
>>> ...because my suspect is that what creates the race condition in
>> this function is
>>> the unlocked *assignment* to mtk_crtc->event, not the rest.
>>>
>>> If I'm right, you don't need to unconditionally spinlock at the
>> beginning of this
>>> function hence ever-so-slightly improving performance compared to
>> this version.
>>>
>>> Can you please try this one and check if this *also* solves the
>> issue?
>>>
>>> if (mtk_crtc_state->base.event) {
>>> mtk_crtc_state->base.event->pipe =
>> drm_crtc_index(crtc);
>>> WARN_ON(drm_crtc_vblank_get(crtc) != 0);
>>>
>>> spin_lock_irqsave(&crtc->dev->event_lock, flags);
>>> mtk_crtc->event = mtk_crtc_state->base.event;
>>> spin_lock_irqrestore(&crtc->dev->event_lock,
>> flags);
>>>
>>> mtk_crtc_state->base.event = NULL;
>>> }
>>>
>>> P.S.: I'd try that myself, but I can't seem to reproduce the issue.
>>
> I can't reproduce the system hang issue reported from customer by
> toggling the night light mode in UI panel either.
> But I can see the error message "new event while there is still a
> pending event" when I was reproducing the issue.
>
> Here is the debug info from "Wei-Shun <[email protected]>":
>
> From the kernel tracing:
> 99342.377173: mtk_crtc_ddp_irq <-mtk_disp_ovl_irq_handler
> 99342.377226:
> drm_crtc_send_vblank_event <-mtk_crtc_ddp_irq
> 99342.393831:
> mtk_crtc_ddp_irq <-mtk_disp_ovl_irq_handler
> 99342.393887:
> drm_crtc_send_vblank_event <-mtk_crtc_ddp_irq
> 99342.410469:
> mtk_crtc_ddp_irq <-mtk_disp_ovl_irq_handler
> 99342.410519:
> drm_crtc_send_vblank_event <-mtk_crtc_ddp_irq
> 99342.427094:
> mtk_crtc_ddp_irq <-mtk_disp_ovl_irq_handler
> 99342.443831:
> mtk_crtc_ddp_irq <-mtk_disp_ovl_irq_handler
> 99342.460495:
> mtk_crtc_ddp_irq <-mtk_disp_ovl_irq_handler
> 99342.477157:
> mtk_crtc_ddp_irq <-mtk_disp_ovl_irq_handler
> 99342.493841:
> mtk_crtc_ddp_irq <-mtk_disp_ovl_irq_handler
> 99342.510453:
> mtk_crtc_ddp_irq <-mtk_disp_ovl_irq_handler
>
> Every mtk_crtc_ddp_irq should come with a drm_crtc_send_vblank_event.
> However, when the issue happens, the mtk_crtc_ddp_irq keeps firing but
> no drm_crtc_send_vblank_event. I suspect this is the main reason
> causing flip_done timeout.
>
> In mtk_drm_crtc.c, mtk_crtc->config_updating and mtk_crtc-
>> pending_needs_vblank are the conditions that may impact the vblank
> event.
>
> static void mtk_drm_finish_page_flip(struct mtk_drm_crtc *mtk_crtc)
> {
> struct drm_crtc *crtc = &mtk_crtc->base;
> unsigned long flags;
>
> drm_crtc_handle_vblank(&mtk_crtc->base);
>
> spin_lock_irqsave(&crtc->dev->event_lock, flags);
> ==> if (!mtk_crtc->config_updating && mtk_crtc->pending_needs_vblank) {
> mtk_drm_crtc_finish_page_flip(mtk_crtc);
> mtk_crtc->pending_needs_vblank = false;
> }
> spin_unlock_irqrestore(&crtc->dev->event_lock, flags);
> }
>
>
> There are 3 lines are called in mtk_drm_crtc_finish_page_flip():
> drm_crtc_send_vblank_event(crtc, mtk_crtc->event);
> drm_crtc_vblank_put(crtc);
> mtk_crtc->event = NULL;
>
> So I want to protect these 3 things to avoid them encountering race
> conditions:
> mtk_crtc_state->base.event //will be provided on atomic_commit complete
> drm_crtc_vblank_get(crtc)
> mtk_crtc->event
>
>
> I have tried to protect this line only:
>
> spin_lock_irqsave(&crtc->dev->event_lock, flags);
> mtk_crtc->event = mtk_crtc_state->base.event;
> spin_lock_irqrestore(&crtc->dev->event_lock,flags);
>
> I can still see the error message "new event while there is still a
> pending event" when I toggled the night light mode.
>
> Maybe that is mtk_crtc->event here:
> if (mtk_crtc->event && mtk_crtc_state->base.event)
> haven't set to NULL by mtk_drm_crtc_finish_page_flip().
>
> But that not a problem because we will protect mtk_crtc->event here:
> mtk_crtc->event = mtk_crtc_state->base.event;
>
>
> So should we remove that error message since it doesn't help and may
> cause the confuse after we fix this issue?
>
>
>> I'm still able to reproduce it so I gave it a try, and this approach
>> also seems to fix the issue. :)
>> FWIW, the way I reproduce that is to toggle the night light mode on
>> and off repeatedly through the UI panel while playing YouTube videos
>> on my device.
>>
>> Jason, can you post a new version with Angelo's suggestion?
>>
>
> OK, I can take Angelo's suggestion. Thanks!
> But I am wondering if we should remove this log?
> DRM_ERROR("new event while there is still a pending event\n");
>

Thanks for the analysis.

Please don't remove that line, this indicates that there might be another issue
somewhere else, and someone will tackle that later.

Regards,
Angelo

>
> Regards,
> Jason-JH.Lin
>
>> Regards,
>> Fei
>>
>>>
>>> Regards,
>>> Angelo
>>>
>>