2021-09-30 13:50:46

by Enric Balletbo Serra

Subject: Re: [v2 PATCH 1/3] drm/mediatek: Fix crash at using pkt->cl->chan in cmdq_pkt_finalize

Hi Chun-Kuang,

On Thu, 30 Sep 2021 at 15:11, Chun-Kuang Hu <[email protected]> wrote:
>
> Hi, Enric:
>
> On Thu, 30 Sep 2021 at 3:12 PM, Enric Balletbo Serra <[email protected]> wrote:
> >
> > Hi Jason,
> >
> >
> > On Thu, 30 Sep 2021 at 4:47, jason-jh.lin <[email protected]> wrote:
> > >
> > > Because mtk_drm_crtc_create_pkt didn't assign pkt->cl, it will
> > > crash at using pkt->cl->chan in cmdq_pkt_finalize.
> > >
> > > So add struct cmdq_client and let mtk_drm_crtc instance define
> > > cmdq_client as:
> > >
> > > struct mtk_drm_crtc {
> > > /* client instance data */
> > > struct cmdq_client cmdq_client;
> > > };
> > >
> > > and in rx_callback function can use pkt->cl to get
> > > struct cmdq_client.
> > >
> > > Fixes: f4be17cd5b14 ("drm/mediatek: Remove struct cmdq_client")
> >
> > Looking at this patchset looks like you're fixing the above commit by
> > reintroducing the 'struct cmdq_client' again, which makes the above
> > commit as a non-sense commit. That's confusing and not clear. I'm
> > wondering if it wouldn't be more clear if you can just revert that
> > patch. Then if there are more changes that need to be done do it with
> > a follow up patch and really explain why these changes are needed.
>
> The patch f4be17cd5b14 ("drm/mediatek: Remove struct cmdq_client")
> does two things. One is to remove struct cmdq_client, another one is
> to embed cmdq_cl

Then it should have been two patches. One change per patch really
helps, especially when something breaks and you try to bisect it.

> in mtk_drm_crtc (This means the pointer of cmdq_cl could be used to
> find the pointer of mtk_drm_crtc). The correct way to fix that patch
> is to remove the access to cmdq_client in cmdq_pkt_finalize(), but
> that would be a long term process. The simple way is to revert that
> patch, but the other patches depend on embedding cmdq_cl in
> mtk_drm_crtc. So this patch just revert the removing of struct
> cmdq_client but keep embedding cmdq_cl in mtk_drm_crtc.
>

Yes, I know. I ran into that while bisecting and ended up reverting
the full series in my local tree, even though I had figured out that
the problem was this specific patch.

The following series landed during the -rc1 cycle and broke the Acer
Chromebook R13:
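
For context, the embedding described in the quoted paragraph (using the
pointer to cmdq_cl to find the pointer to mtk_drm_crtc) relies on the
kernel's container_of() pattern. Below is a minimal user-space sketch of
that idea. The structs are simplified stand-ins, the "index" field,
"last_crtc_index" and "demo_rx_callback" are hypothetical names made up
for the illustration, and the macro is a local approximation of the
kernel one, so none of this should be read as the actual driver code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * User-space stand-in for the kernel's container_of(); the kernel macro
 * additionally type-checks its arguments.
 */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-in: the real struct mbox_client has more fields. */
struct mbox_client {
	void (*rx_callback)(struct mbox_client *cl, void *mssg);
};

/* Simplified stand-in: "index" is a hypothetical field for this demo. */
struct mtk_drm_crtc {
	int index;
	struct mbox_client cmdq_cl;	/* embedded by value, not a pointer */
};

/* Records which CRTC the callback resolved to, so the result is observable. */
static int last_crtc_index = -1;

/* The callback only receives the client pointer; recover the owner from it. */
static void demo_rx_callback(struct mbox_client *cl, void *mssg)
{
	struct mtk_drm_crtc *mtk_crtc;

	(void)mssg;
	assert(cl != NULL);
	mtk_crtc = container_of(cl, struct mtk_drm_crtc, cmdq_cl);
	last_crtc_index = mtk_crtc->index;
}
```

Calling demo_rx_callback(&crtc.cmdq_cl, NULL) on a struct mtk_drm_crtc
whose index is 1 leaves last_crtc_index set to 1. The lookup is only
valid because cmdq_cl is embedded by value; with a pointer member the
offset arithmetic would not lead back to the containing structure.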

9efb16c2fdd6 ("drm/mediatek: Clear pending flag when cmdq packet is done")
bc9241be73d9 ("drm/mediatek: Add cmdq_handle in mtk_crtc")
8cdcb3653424 ("drm/mediatek: Detect CMDQ execution timeout")
f4be17cd5b14 ("drm/mediatek: Remove struct cmdq_client")
c1ec54b7b5af ("drm/mediatek: Use mailbox rx_callback instead of cmdq_task_cb")

Apart from being a pain to bisect and introducing different
behaviours between patches, all of the above commits have a follow-up
patch (see [1] and [2]) as a fix for the landed series. That makes me
think they were not stable enough. As we're in the rc phase, and as
you said this is not the correct way to fix it, and the landed patches
seem more like a cleanup than a fix for a real problem, I'd consider
just reverting the full series and resubmitting it for the next
release with these fixes squashed. IMO that will also help to not miss
anything when someone backports all this to the stable versions, and
make the history easier to understand.

Just my 5 cents. In any case, I can confirm that applying the full
series solves the current problems that I have with my Acer Chromebook
R13.

Thanks,
Enric

[1] https://patchwork.kernel.org/project/linux-mediatek/list/?series=555383
[2] https://patchwork.kernel.org/project/linux-mediatek/list/?series=554767



> Regards,
> Chun-Kuang.
>
> >
> > Thanks,
> > Enric
> >
> >
> > > Signed-off-by: jason-jh.lin <[email protected]>
> > > ---
> > > drivers/gpu/drm/mediatek/mtk_drm_crtc.c | 73 +++++++++++++------------
> > > 1 file changed, 38 insertions(+), 35 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/mediatek/mtk_drm_crtc.c b/drivers/gpu/drm/mediatek/mtk_drm_crtc.c
> > > index 5f81489fc60c..411d99fcbb8f 100644
> > > --- a/drivers/gpu/drm/mediatek/mtk_drm_crtc.c
> > > +++ b/drivers/gpu/drm/mediatek/mtk_drm_crtc.c
> > > @@ -52,8 +52,7 @@ struct mtk_drm_crtc {
> > > bool pending_async_planes;
> > >
> > > #if IS_REACHABLE(CONFIG_MTK_CMDQ)
> > > - struct mbox_client cmdq_cl;
> > > - struct mbox_chan *cmdq_chan;
> > > + struct cmdq_client cmdq_client;
> > > struct cmdq_pkt cmdq_handle;
> > > u32 cmdq_event;
> > > u32 cmdq_vblank_cnt;
> > > @@ -227,8 +226,8 @@ struct mtk_ddp_comp *mtk_drm_ddp_comp_for_plane(struct drm_crtc *crtc,
> > > }
> > >
> > > #if IS_REACHABLE(CONFIG_MTK_CMDQ)
> > > -static int mtk_drm_cmdq_pkt_create(struct mbox_chan *chan, struct cmdq_pkt *pkt,
> > > - size_t size)
> > > +static int mtk_drm_cmdq_pkt_create(struct cmdq_client *client, struct cmdq_pkt *pkt,
> > > + size_t size)
> > > {
> > > struct device *dev;
> > > dma_addr_t dma_addr;
> > > @@ -239,8 +238,9 @@ static int mtk_drm_cmdq_pkt_create(struct mbox_chan *chan, struct cmdq_pkt *pkt,
> > > return -ENOMEM;
> > > }
> > > pkt->buf_size = size;
> > > + pkt->cl = (void *)client;
> > >
> > > - dev = chan->mbox->dev;
> > > + dev = client->chan->mbox->dev;
> > > dma_addr = dma_map_single(dev, pkt->va_base, pkt->buf_size,
> > > DMA_TO_DEVICE);
> > > if (dma_mapping_error(dev, dma_addr)) {
> > > @@ -255,9 +255,11 @@ static int mtk_drm_cmdq_pkt_create(struct mbox_chan *chan, struct cmdq_pkt *pkt,
> > > return 0;
> > > }
> > >
> > > -static void mtk_drm_cmdq_pkt_destroy(struct mbox_chan *chan, struct cmdq_pkt *pkt)
> > > +static void mtk_drm_cmdq_pkt_destroy(struct cmdq_pkt *pkt)
> > > {
> > > - dma_unmap_single(chan->mbox->dev, pkt->pa_base, pkt->buf_size,
> > > + struct cmdq_client *client = (struct cmdq_client *)pkt->cl;
> > > +
> > > + dma_unmap_single(client->chan->mbox->dev, pkt->pa_base, pkt->buf_size,
> > > DMA_TO_DEVICE);
> > > kfree(pkt->va_base);
> > > kfree(pkt);
> > > @@ -265,8 +267,9 @@ static void mtk_drm_cmdq_pkt_destroy(struct mbox_chan *chan, struct cmdq_pkt *pk
> > >
> > > static void ddp_cmdq_cb(struct mbox_client *cl, void *mssg)
> > > {
> > > - struct mtk_drm_crtc *mtk_crtc = container_of(cl, struct mtk_drm_crtc, cmdq_cl);
> > > struct cmdq_cb_data *data = mssg;
> > > + struct cmdq_client *cmdq_cl = container_of(cl, struct cmdq_client, client);
> > > + struct mtk_drm_crtc *mtk_crtc = container_of(cmdq_cl, struct mtk_drm_crtc, cmdq_client);
> > > struct mtk_crtc_state *state;
> > > unsigned int i;
> > >
> > > @@ -299,7 +302,7 @@ static void ddp_cmdq_cb(struct mbox_client *cl, void *mssg)
> > > }
> > >
> > > mtk_crtc->cmdq_vblank_cnt = 0;
> > > - mtk_drm_cmdq_pkt_destroy(mtk_crtc->cmdq_chan, data->pkt);
> > > + mtk_drm_cmdq_pkt_destroy(data->pkt);
> > > }
> > > #endif
> > >
> > > @@ -550,24 +553,24 @@ static void mtk_drm_crtc_update_config(struct mtk_drm_crtc *mtk_crtc,
> > > mtk_mutex_release(mtk_crtc->mutex);
> > > }
> > > #if IS_REACHABLE(CONFIG_MTK_CMDQ)
> > > - if (mtk_crtc->cmdq_chan) {
> > > - mbox_flush(mtk_crtc->cmdq_chan, 2000);
> > > + if (mtk_crtc->cmdq_client.chan) {
> > > + mbox_flush(mtk_crtc->cmdq_client.chan, 2000);
> > > cmdq_handle->cmd_buf_size = 0;
> > > cmdq_pkt_clear_event(cmdq_handle, mtk_crtc->cmdq_event);
> > > cmdq_pkt_wfe(cmdq_handle, mtk_crtc->cmdq_event, false);
> > > mtk_crtc_ddp_config(crtc, cmdq_handle);
> > > cmdq_pkt_finalize(cmdq_handle);
> > > - dma_sync_single_for_device(mtk_crtc->cmdq_chan->mbox->dev,
> > > - cmdq_handle->pa_base,
> > > - cmdq_handle->cmd_buf_size,
> > > - DMA_TO_DEVICE);
> > > + dma_sync_single_for_device(mtk_crtc->cmdq_client.chan->mbox->dev,
> > > + cmdq_handle->pa_base,
> > > + cmdq_handle->cmd_buf_size,
> > > + DMA_TO_DEVICE);
> > > /*
> > > * CMDQ command should execute in next vblank,
> > > * If it fail to execute in next 2 vblank, timeout happen.
> > > */
> > > mtk_crtc->cmdq_vblank_cnt = 2;
> > > - mbox_send_message(mtk_crtc->cmdq_chan, cmdq_handle);
> > > - mbox_client_txdone(mtk_crtc->cmdq_chan, 0);
> > > + mbox_send_message(mtk_crtc->cmdq_client.chan, cmdq_handle);
> > > + mbox_client_txdone(mtk_crtc->cmdq_client.chan, 0);
> > > }
> > > #endif
> > > mtk_crtc->config_updating = false;
> > > @@ -581,7 +584,7 @@ static void mtk_crtc_ddp_irq(void *data)
> > > struct mtk_drm_private *priv = crtc->dev->dev_private;
> > >
> > > #if IS_REACHABLE(CONFIG_MTK_CMDQ)
> > > - if (!priv->data->shadow_register && !mtk_crtc->cmdq_chan)
> > > + if (!priv->data->shadow_register && !mtk_crtc->cmdq_client.chan)
> > > mtk_crtc_ddp_config(crtc, NULL);
> > > else if (mtk_crtc->cmdq_vblank_cnt > 0 && --mtk_crtc->cmdq_vblank_cnt == 0)
> > > DRM_ERROR("mtk_crtc %d CMDQ execute command timeout!\n",
> > > @@ -924,20 +927,20 @@ int mtk_drm_crtc_create(struct drm_device *drm_dev,
> > > mutex_init(&mtk_crtc->hw_lock);
> > >
> > > #if IS_REACHABLE(CONFIG_MTK_CMDQ)
> > > - mtk_crtc->cmdq_cl.dev = mtk_crtc->mmsys_dev;
> > > - mtk_crtc->cmdq_cl.tx_block = false;
> > > - mtk_crtc->cmdq_cl.knows_txdone = true;
> > > - mtk_crtc->cmdq_cl.rx_callback = ddp_cmdq_cb;
> > > - mtk_crtc->cmdq_chan =
> > > - mbox_request_channel(&mtk_crtc->cmdq_cl,
> > > - drm_crtc_index(&mtk_crtc->base));
> > > - if (IS_ERR(mtk_crtc->cmdq_chan)) {
> > > + mtk_crtc->cmdq_client.client.dev = mtk_crtc->mmsys_dev;
> > > + mtk_crtc->cmdq_client.client.tx_block = false;
> > > + mtk_crtc->cmdq_client.client.knows_txdone = true;
> > > + mtk_crtc->cmdq_client.client.rx_callback = ddp_cmdq_cb;
> > > + mtk_crtc->cmdq_client.chan =
> > > + mbox_request_channel(&mtk_crtc->cmdq_client.client,
> > > + drm_crtc_index(&mtk_crtc->base));
> > > + if (IS_ERR(mtk_crtc->cmdq_client.chan)) {
> > > dev_dbg(dev, "mtk_crtc %d failed to create mailbox client, writing register by CPU now\n",
> > > drm_crtc_index(&mtk_crtc->base));
> > > - mtk_crtc->cmdq_chan = NULL;
> > > + mtk_crtc->cmdq_client.chan = NULL;
> > > }
> > >
> > > - if (mtk_crtc->cmdq_chan) {
> > > + if (mtk_crtc->cmdq_client.chan) {
> > > ret = of_property_read_u32_index(priv->mutex_node,
> > > "mediatek,gce-events",
> > > drm_crtc_index(&mtk_crtc->base),
> > > @@ -945,17 +948,17 @@ int mtk_drm_crtc_create(struct drm_device *drm_dev,
> > > if (ret) {
> > > dev_dbg(dev, "mtk_crtc %d failed to get mediatek,gce-events property\n",
> > > drm_crtc_index(&mtk_crtc->base));
> > > - mbox_free_channel(mtk_crtc->cmdq_chan);
> > > - mtk_crtc->cmdq_chan = NULL;
> > > + mbox_free_channel(mtk_crtc->cmdq_client.chan);
> > > + mtk_crtc->cmdq_client.chan = NULL;
> > > } else {
> > > - ret = mtk_drm_cmdq_pkt_create(mtk_crtc->cmdq_chan,
> > > - &mtk_crtc->cmdq_handle,
> > > - PAGE_SIZE);
> > > + ret = mtk_drm_cmdq_pkt_create(&mtk_crtc->cmdq_client,
> > > + &mtk_crtc->cmdq_handle,
> > > + PAGE_SIZE);
> > > if (ret) {
> > > dev_dbg(dev, "mtk_crtc %d failed to create cmdq packet\n",
> > > drm_crtc_index(&mtk_crtc->base));
> > > - mbox_free_channel(mtk_crtc->cmdq_chan);
> > > - mtk_crtc->cmdq_chan = NULL;
> > > + mbox_free_channel(mtk_crtc->cmdq_client.chan);
> > > + mtk_crtc->cmdq_client.chan = NULL;
> > > }
> > > }
> > > }
> > > --
> > > 2.18.0
> > >


2021-10-01 18:40:10

by Chun-Kuang Hu

Subject: Re: [v2 PATCH 1/3] drm/mediatek: Fix crash at using pkt->cl->chan in cmdq_pkt_finalize

Hi, Enric:

On Thu, 30 Sep 2021 at 9:48 PM, Enric Balletbo Serra <[email protected]> wrote:
>
> Hi Chun-Kuang,
>
> On Thu, 30 Sep 2021 at 15:11, Chun-Kuang Hu <[email protected]> wrote:
> >
> > Hi, Enric:
> >
> > On Thu, 30 Sep 2021 at 3:12 PM, Enric Balletbo Serra <[email protected]> wrote:
> > >
> > > Hi Jason,
> > >
> > >
> > > On Thu, 30 Sep 2021 at 4:47, jason-jh.lin <[email protected]> wrote:
> > > >
> > > > Because mtk_drm_crtc_create_pkt didn't assign pkt->cl, it will
> > > > crash at using pkt->cl->chan in cmdq_pkt_finalize.
> > > >
> > > > So add struct cmdq_client and let mtk_drm_crtc instance define
> > > > cmdq_client as:
> > > >
> > > > struct mtk_drm_crtc {
> > > > /* client instance data */
> > > > struct cmdq_client cmdq_client;
> > > > };
> > > >
> > > > and in rx_callback function can use pkt->cl to get
> > > > struct cmdq_client.
> > > >
> > > > Fixes: f4be17cd5b14 ("drm/mediatek: Remove struct cmdq_client")
> > >
> > > Looking at this patchset looks like you're fixing the above commit by
> > > reintroducing the 'struct cmdq_client' again, which makes the above
> > > commit as a non-sense commit. That's confusing and not clear. I'm
> > > wondering if it wouldn't be more clear if you can just revert that
> > > patch. Then if there are more changes that need to be done do it with
> > > a follow up patch and really explain why these changes are needed.
> >
> > The patch f4be17cd5b14 ("drm/mediatek: Remove struct cmdq_client")
> > does two things. One is to remove struct cmdq_client, another one is
> > to embed cmdq_cl
>
> Then it should have been two patches. One change per patch really
> helps, especially when something breaks and you try to bisect it.
>
> > in mtk_drm_crtc (This means the pointer of cmdq_cl could be used to
> > find the pointer of mtk_drm_crtc). The correct way to fix that patch
> > is to remove the access to cmdq_client in cmdq_pkt_finalize(), but
> > that would be a long term process. The simple way is to revert that
> > patch, but the other patches depend on embedding cmdq_cl in
> > mtk_drm_crtc. So this patch just revert the removing of struct
> > cmdq_client but keep embedding cmdq_cl in mtk_drm_crtc.
> >
>
> Yes, I know. I ran into that while bisecting and ended up reverting
> the full series in my local tree, even though I had figured out that
> the problem was this specific patch.
>
> The following series landed during the -rc1 cycle and broke the Acer
> Chromebook R13:
>
> 9efb16c2fdd6 ("drm/mediatek: Clear pending flag when cmdq packet is done")
> bc9241be73d9 ("drm/mediatek: Add cmdq_handle in mtk_crtc")
> 8cdcb3653424 ("drm/mediatek: Detect CMDQ execution timeout")
> f4be17cd5b14 ("drm/mediatek: Remove struct cmdq_client")
> c1ec54b7b5af ("drm/mediatek: Use mailbox rx_callback instead of cmdq_task_cb")
>
> Apart from being a pain to bisect and introducing different
> behaviours between patches, all of the above commits have a follow-up
> patch (see [1] and [2]) as a fix for the landed series. That makes me
> think they were not stable enough. As we're in the rc phase, and as
> you said this is not the correct way to fix it, and the landed patches
> seem more like a cleanup than a fix for a real problem, I'd consider
> just reverting the full series and resubmitting it for the next
> release with these fixes squashed. IMO that will also help to not miss
> anything when someone backports all this to the stable versions, and
> make the history easier to understand.
>
> Just my 5 cents. In any case, I can confirm that applying the full
> series solves the current problems that I have with my Acer Chromebook
> R13.

OK, that series depends on a WARN_ON fix in the mailbox driver and
needs a better solution in the cmdq helper, so let's revert that
series first. Would you like to send the revert patches, or shall I
send them and let you test?

Regards,
Chun-Kuang.

>
> Thanks,
> Enric
>
> [1] https://patchwork.kernel.org/project/linux-mediatek/list/?series=555383
> [2] https://patchwork.kernel.org/project/linux-mediatek/list/?series=554767
>
>
>
> > Regards,
> > Chun-Kuang.
> >
> > >
> > > Thanks,
> > > Enric
> > >
> > >

2021-10-06 11:06:51

by Enric Balletbo Serra

Subject: Re: [v2 PATCH 1/3] drm/mediatek: Fix crash at using pkt->cl->chan in cmdq_pkt_finalize

Hi Chun-Kuang,

On Fri, 1 Oct 2021 at 17:52, Chun-Kuang Hu <[email protected]> wrote:
>
> Hi, Enric:
>
> On Thu, 30 Sep 2021 at 9:48 PM, Enric Balletbo Serra <[email protected]> wrote:
> >
> > Hi Chun-Kuang,
> >
> > On Thu, 30 Sep 2021 at 15:11, Chun-Kuang Hu <[email protected]> wrote:
> > >
> > > Hi, Enric:
> > >
> > > On Thu, 30 Sep 2021 at 3:12 PM, Enric Balletbo Serra <[email protected]> wrote:
> > > >
> > > > Hi Jason,
> > > >
> > > >
> > > > On Thu, 30 Sep 2021 at 4:47, jason-jh.lin <[email protected]> wrote:
> > > > >
> > > > > Because mtk_drm_crtc_create_pkt didn't assign pkt->cl, it will
> > > > > crash at using pkt->cl->chan in cmdq_pkt_finalize.
> > > > >
> > > > > So add struct cmdq_client and let mtk_drm_crtc instance define
> > > > > cmdq_client as:
> > > > >
> > > > > struct mtk_drm_crtc {
> > > > > /* client instance data */
> > > > > struct cmdq_client cmdq_client;
> > > > > };
> > > > >
> > > > > and in rx_callback function can use pkt->cl to get
> > > > > struct cmdq_client.
> > > > >
> > > > > Fixes: f4be17cd5b14 ("drm/mediatek: Remove struct cmdq_client")
> > > >
> > > > Looking at this patchset looks like you're fixing the above commit by
> > > > reintroducing the 'struct cmdq_client' again, which makes the above
> > > > commit as a non-sense commit. That's confusing and not clear. I'm
> > > > wondering if it wouldn't be more clear if you can just revert that
> > > > patch. Then if there are more changes that need to be done do it with
> > > > a follow up patch and really explain why these changes are needed.
> > >
> > > The patch f4be17cd5b14 ("drm/mediatek: Remove struct cmdq_client")
> > > does two things. One is to remove struct cmdq_client, another one is
> > > to embed cmdq_cl
> >
> > Then it should have been two patches. One change per patch really
> > helps, especially when something breaks and you try to bisect it.
> >
> > > in mtk_drm_crtc (This means the pointer of cmdq_cl could be used to
> > > find the pointer of mtk_drm_crtc). The correct way to fix that patch
> > > is to remove the access to cmdq_client in cmdq_pkt_finalize(), but
> > > that would be a long term process. The simple way is to revert that
> > > patch, but the other patches depend on embedding cmdq_cl in
> > > mtk_drm_crtc. So this patch just revert the removing of struct
> > > cmdq_client but keep embedding cmdq_cl in mtk_drm_crtc.
> > >
> >
> > Yes, I know. I ran into that while bisecting and ended up reverting
> > the full series in my local tree, even though I had figured out that
> > the problem was this specific patch.
> >
> > The following series landed during the -rc1 cycle and broke the Acer
> > Chromebook R13:
> >
> > 9efb16c2fdd6 ("drm/mediatek: Clear pending flag when cmdq packet is done")
> > bc9241be73d9 ("drm/mediatek: Add cmdq_handle in mtk_crtc")
> > 8cdcb3653424 ("drm/mediatek: Detect CMDQ execution timeout")
> > f4be17cd5b14 ("drm/mediatek: Remove struct cmdq_client")
> > c1ec54b7b5af ("drm/mediatek: Use mailbox rx_callback instead of cmdq_task_cb")
> >
> > Apart from being a pain to bisect and introducing different
> > behaviours between patches, all of the above commits have a follow-up
> > patch (see [1] and [2]) as a fix for the landed series. That makes me
> > think they were not stable enough. As we're in the rc phase, and as
> > you said this is not the correct way to fix it, and the landed patches
> > seem more like a cleanup than a fix for a real problem, I'd consider
> > just reverting the full series and resubmitting it for the next
> > release with these fixes squashed. IMO that will also help to not miss
> > anything when someone backports all this to the stable versions, and
> > make the history easier to understand.
> >
> > Just my 5 cents. In any case, I can confirm that applying the full
> > series solves the current problems that I have with my Acer Chromebook
> > R13.
>
> OK, that series depends on a WARN_ON fix in the mailbox driver and
> needs a better solution in the cmdq helper, so let's revert that
> series first. Would you like to send the revert patches, or shall I
> send them and let you test?
>

I'll let you send the revert patches :-)

Thanks,
Enric

> Regards,
> Chun-Kuang.
>
> >
> > Thanks,
> > Enric
> >
> > [1] https://patchwork.kernel.org/project/linux-mediatek/list/?series=555383
> > [2] https://patchwork.kernel.org/project/linux-mediatek/list/?series=554767
> >
> >
> >
> > > Regards,
> > > Chun-Kuang.
> > >
> > > >
> > > > Thanks,
> > > > Enric
> > > >
> > > >