In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with
mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run
and mtk_jpeg_enc_device_run may be called to start the
work.
If we remove the module which will call mtk_jpeg_remove
to make cleanup, there may be a unfinished work. The
possible sequence is as follows, which will cause a
typical UAF bug.
Fix it by canceling the work before cleanup in the mtk_jpeg_remove
CPU0 CPU1
|mtk_jpeg_job_timeout_work
mtk_jpeg_remove |
v4l2_m2m_release |
kfree(m2m_dev); |
|
| v4l2_m2m_get_curr_priv
| m2m_dev->curr_ctx //use
Fixes: b2f0d2724ba4 ("[media] vcodec: mediatek: Add Mediatek JPEG Decoder Driver")
Signed-off-by: Zheng Wang <[email protected]>
---
- v2: use cancel_delayed_work_sync instead of cancel_delayed_work suggested by Kyrie.
---
drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
index 0051f372a66c..6069ecf420b0 100644
--- a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
+++ b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
@@ -1816,6 +1816,7 @@ static void mtk_jpeg_remove(struct platform_device *pdev)
{
struct mtk_jpeg_dev *jpeg = platform_get_drvdata(pdev);
+ cancel_delayed_work_sync(&jpeg->job_timeout_work);
pm_runtime_disable(&pdev->dev);
video_unregister_device(jpeg->vdev);
v4l2_m2m_release(jpeg->m2m_dev);
--
2.25.1
On 07/07/2023 11:24, Zheng Wang wrote:
> In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with
> mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run
> and mtk_jpeg_enc_device_run may be called to start the
> work.
> If we remove the module which will call mtk_jpeg_remove
> to make cleanup, there may be a unfinished work. The
> possible sequence is as follows, which will cause a
> typical UAF bug.
>
> Fix it by canceling the work before cleanup in the mtk_jpeg_remove
>
> CPU0 CPU1
>
> |mtk_jpeg_job_timeout_work
> mtk_jpeg_remove |
> v4l2_m2m_release |
> kfree(m2m_dev); |
> |
> | v4l2_m2m_get_curr_priv
> | m2m_dev->curr_ctx //use
Reviewed-by: Alexandre Mergnat <[email protected]>
--
Regards,
Alexandre
Hi,
This issue has not been resolved for a long time. Is there anyone who can help?
Best regards,
Zheng
Alexandre Mergnat <[email protected]> 于2023年7月7日周五 22:11写道:
>
>
>
> On 07/07/2023 11:24, Zheng Wang wrote:
> > In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with
> > mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run
> > and mtk_jpeg_enc_device_run may be called to start the
> > work.
> > If we remove the module which will call mtk_jpeg_remove
> > to make cleanup, there may be a unfinished work. The
> > possible sequence is as follows, which will cause a
> > typical UAF bug.
> >
> > Fix it by canceling the work before cleanup in the mtk_jpeg_remove
> >
> > CPU0 CPU1
> >
> > |mtk_jpeg_job_timeout_work
> > mtk_jpeg_remove |
> > v4l2_m2m_release |
> > kfree(m2m_dev); |
> > |
> > | v4l2_m2m_get_curr_priv
> > | m2m_dev->curr_ctx //use
>
> Reviewed-by: Alexandre Mergnat <[email protected]>
>
> --
> Regards,
> Alexandre
Friendly ping
Zheng Hacker <[email protected]> 于2023年7月16日周日 00:08写道:
>
> Hi,
>
> This issue has not been resolved for a long time. Is there anyone who can help?
>
> Best regards,
> Zheng
>
> Alexandre Mergnat <[email protected]> 于2023年7月7日周五 22:11写道:
> >
> >
> >
> > On 07/07/2023 11:24, Zheng Wang wrote:
> > > In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with
> > > mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run
> > > and mtk_jpeg_enc_device_run may be called to start the
> > > work.
> > > If we remove the module which will call mtk_jpeg_remove
> > > to make cleanup, there may be a unfinished work. The
> > > possible sequence is as follows, which will cause a
> > > typical UAF bug.
> > >
> > > Fix it by canceling the work before cleanup in the mtk_jpeg_remove
> > >
> > > CPU0 CPU1
> > >
> > > |mtk_jpeg_job_timeout_work
> > > mtk_jpeg_remove |
> > > v4l2_m2m_release |
> > > kfree(m2m_dev); |
> > > |
> > > | v4l2_m2m_get_curr_priv
> > > | m2m_dev->curr_ctx //use
> >
> > Reviewed-by: Alexandre Mergnat <[email protected]>
> >
> > --
> > Regards,
> > Alexandre
On 18/07/2023 05:07, Zheng Hacker wrote:
> Friendly ping
>
> Zheng Hacker <[email protected]> 于2023年7月16日周日 00:08写道:
>>
>> Hi,
>>
>> This issue has not been resolved for a long time. Is there anyone who can help?
>>
>> Best regards,
>> Zheng
>>
>> Alexandre Mergnat <[email protected]> 于2023年7月7日周五 22:11写道:
>>>
>>>
>>>
>>> On 07/07/2023 11:24, Zheng Wang wrote:
>>>> In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with
>>>> mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run
>>>> and mtk_jpeg_enc_device_run may be called to start the
>>>> work.
>>>> If we remove the module which will call mtk_jpeg_remove
>>>> to make cleanup, there may be a unfinished work. The
>>>> possible sequence is as follows, which will cause a
>>>> typical UAF bug.
>>>>
>>>> Fix it by canceling the work before cleanup in the mtk_jpeg_remove
>>>>
>>>> CPU0 CPU1
>>>>
>>>> |mtk_jpeg_job_timeout_work
>>>> mtk_jpeg_remove |
>>>> v4l2_m2m_release |
>>>> kfree(m2m_dev); |
>>>> |
>>>> | v4l2_m2m_get_curr_priv
>>>> | m2m_dev->curr_ctx //use
>>>
>>> Reviewed-by: Alexandre Mergnat <[email protected]>
>>>
>>> --
>>> Regards,
>>> Alexandre
Hi Zheng,
If you asking me to merge patch, sorry but I can't, I'm just a reviewer.
I invite you to ping the maintainers directly:
Bin Liu <[email protected]> (supporter:MEDIATEK JPEG DRIVER)
Mauro Carvalho Chehab <[email protected]> (maintainer:MEDIA INPUT
INFRASTRUCTURE (V4L/DVB))
Matthias Brugger <[email protected]> (maintainer:ARM/Mediatek SoC
support)
Otherwise, I misunderstood what you asking me. If so, can you rephrase
your question please?
--
Regards,
Alexandre
Thanks dor your kind reply. I'll try to connect them later.
Best regards,
Zheng
Alexandre Mergnat <[email protected]> 于2023年7月19日周三 18:17写道:
>
>
>
> On 18/07/2023 05:07, Zheng Hacker wrote:
> > Friendly ping
> >
> > Zheng Hacker <[email protected]> 于2023年7月16日周日 00:08写道:
> >>
> >> Hi,
> >>
> >> This issue has not been resolved for a long time. Is there anyone who can help?
> >>
> >> Best regards,
> >> Zheng
> >>
> >> Alexandre Mergnat <[email protected]> 于2023年7月7日周五 22:11写道:
> >>>
> >>>
> >>>
> >>> On 07/07/2023 11:24, Zheng Wang wrote:
> >>>> In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with
> >>>> mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run
> >>>> and mtk_jpeg_enc_device_run may be called to start the
> >>>> work.
> >>>> If we remove the module which will call mtk_jpeg_remove
> >>>> to make cleanup, there may be a unfinished work. The
> >>>> possible sequence is as follows, which will cause a
> >>>> typical UAF bug.
> >>>>
> >>>> Fix it by canceling the work before cleanup in the mtk_jpeg_remove
> >>>>
> >>>> CPU0 CPU1
> >>>>
> >>>> |mtk_jpeg_job_timeout_work
> >>>> mtk_jpeg_remove |
> >>>> v4l2_m2m_release |
> >>>> kfree(m2m_dev); |
> >>>> |
> >>>> | v4l2_m2m_get_curr_priv
> >>>> | m2m_dev->curr_ctx //use
> >>>
> >>> Reviewed-by: Alexandre Mergnat <[email protected]>
> >>>
> >>> --
> >>> Regards,
> >>> Alexandre
>
> Hi Zheng,
>
> If you asking me to merge patch, sorry but I can't, I'm just a reviewer.
> I invite you to ping the maintainers directly:
>
> Bin Liu <[email protected]> (supporter:MEDIATEK JPEG DRIVER)
> Mauro Carvalho Chehab <[email protected]> (maintainer:MEDIA INPUT
> INFRASTRUCTURE (V4L/DVB))
> Matthias Brugger <[email protected]> (maintainer:ARM/Mediatek SoC
> support)
>
> Otherwise, I misunderstood what you asking me. If so, can you rephrase
> your question please?
>
> --
> Regards,
> Alexandre
On Fri, Jul 7, 2023 at 5:25 PM Zheng Wang <[email protected]> wrote:
>
> In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with
> mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run
> and mtk_jpeg_enc_device_run may be called to start the
> work.
> If we remove the module which will call mtk_jpeg_remove
> to make cleanup, there may be a unfinished work. The
> possible sequence is as follows, which will cause a
> typical UAF bug.
>
> Fix it by canceling the work before cleanup in the mtk_jpeg_remove
>
> CPU0 CPU1
>
> |mtk_jpeg_job_timeout_work
> mtk_jpeg_remove |
> v4l2_m2m_release |
> kfree(m2m_dev); |
> |
> | v4l2_m2m_get_curr_priv
> | m2m_dev->curr_ctx //use
> Fixes: b2f0d2724ba4 ("[media] vcodec: mediatek: Add Mediatek JPEG Decoder Driver")
> Signed-off-by: Zheng Wang <[email protected]>
Reviewed-by: Chen-Yu Tsai <[email protected]>
Il 07/07/23 11:24, Zheng Wang ha scritto:
> In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with
> mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run
> and mtk_jpeg_enc_device_run may be called to start the
> work.
> If we remove the module which will call mtk_jpeg_remove
> to make cleanup, there may be a unfinished work. The
> possible sequence is as follows, which will cause a
> typical UAF bug.
>
> Fix it by canceling the work before cleanup in the mtk_jpeg_remove
>
> CPU0 CPU1
>
> |mtk_jpeg_job_timeout_work
> mtk_jpeg_remove |
> v4l2_m2m_release |
> kfree(m2m_dev); |
> |
> | v4l2_m2m_get_curr_priv
> | m2m_dev->curr_ctx //use
> Fixes: b2f0d2724ba4 ("[media] vcodec: mediatek: Add Mediatek JPEG Decoder Driver")
> Signed-off-by: Zheng Wang <[email protected]>
Reviewed-by: AngeloGioacchino Del Regno <[email protected]>
Hello,
Is there any follow-up about this issue?
Thanks,
Zheng
AngeloGioacchino Del Regno <[email protected]>
于2023年7月20日周四 15:45写道:
>
> Il 07/07/23 11:24, Zheng Wang ha scritto:
> > In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with
> > mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run
> > and mtk_jpeg_enc_device_run may be called to start the
> > work.
> > If we remove the module which will call mtk_jpeg_remove
> > to make cleanup, there may be a unfinished work. The
> > possible sequence is as follows, which will cause a
> > typical UAF bug.
> >
> > Fix it by canceling the work before cleanup in the mtk_jpeg_remove
> >
> > CPU0 CPU1
> >
> > |mtk_jpeg_job_timeout_work
> > mtk_jpeg_remove |
> > v4l2_m2m_release |
> > kfree(m2m_dev); |
> > |
> > | v4l2_m2m_get_curr_priv
> > | m2m_dev->curr_ctx //use
> > Fixes: b2f0d2724ba4 ("[media] vcodec: mediatek: Add Mediatek JPEG Decoder Driver")
> > Signed-off-by: Zheng Wang <[email protected]>
>
> Reviewed-by: AngeloGioacchino Del Regno <[email protected]>
>
>
Hi Dmitry,
The patch is on the stable queue. Could you please take a look at my
analysis? Thanks for your help.
Best regards,
Zheng
Zheng Hacker <[email protected]> 于2023年9月5日周二 12:24写道:
>
> Friendly ping.
>
> Zheng Hacker <[email protected]> 于2023年8月31日周四 16:18写道:
> >
> > Dmitry Osipenko <[email protected]> 于2023年8月28日周一 10:04写道:
> > >
> > > On 8/24/23 11:20, Zheng Hacker wrote:
> > > > Dmitry Osipenko <[email protected]> 于2023年8月23日周三 02:51写道:
> > > >
> > > >>
> > > >> Hello Zheng,
> > > >>
> > > >> On 7/7/23 12:24, Zheng Wang wrote:
> > > >>> In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with
> > > >>> mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run
> > > >>> and mtk_jpeg_enc_device_run may be called to start the
> > > >>> work.
> > > >>> If we remove the module which will call mtk_jpeg_remove
> > > >>> to make cleanup, there may be a unfinished work. The
> > > >>> possible sequence is as follows, which will cause a
> > > >>> typical UAF bug.
> > > >>>
> > > >>> Fix it by canceling the work before cleanup in the mtk_jpeg_remove
> > > >>>
> > > >>> CPU0 CPU1
> > > >>>
> > > >>> |mtk_jpeg_job_timeout_work
> > > >>> mtk_jpeg_remove |
> > > >>> v4l2_m2m_release |
> > > >>> kfree(m2m_dev); |
> > > >>> |
> > > >>> | v4l2_m2m_get_curr_priv
> > > >>> | m2m_dev->curr_ctx //use
> > > >>> Fixes: b2f0d2724ba4 ("[media] vcodec: mediatek: Add Mediatek JPEG Decoder Driver")
> > > >>> Signed-off-by: Zheng Wang <[email protected]>
> > > >>> ---
> > > >>> - v2: use cancel_delayed_work_sync instead of cancel_delayed_work suggested by Kyrie.
> > > >>> ---
> > > >>> drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c | 1 +
> > > >>> 1 file changed, 1 insertion(+)
> > > >>>
> > > >>> diff --git a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> > > >>> index 0051f372a66c..6069ecf420b0 100644
> > > >>> --- a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> > > >>> +++ b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> > > >>> @@ -1816,6 +1816,7 @@ static void mtk_jpeg_remove(struct platform_device *pdev)
> > > >>> {
> > > >>> struct mtk_jpeg_dev *jpeg = platform_get_drvdata(pdev);
> > > >>>
> > > >>> + cancel_delayed_work_sync(&jpeg->job_timeout_work);
> > > >>> pm_runtime_disable(&pdev->dev);
> > > >>> video_unregister_device(jpeg->vdev);
> > > >>> v4l2_m2m_release(jpeg->m2m_dev);
> > > >>
> > > >> AFAICS, there is a fundamental problem here. The job_timeout_work uses
> > > >> v4l2_m2m_get_curr_priv() and at the time when driver module is unloaded,
> > > >> all the v4l contexts must be closed and released. Hence the
> > > >> v4l2_m2m_get_curr_priv() shall return NULL and crash the kernel when
> > > >> work is executed before cancel_delayed_work_sync().
> > > >>
> > > >
> > > > Hi Dmitry,
> > > >
> > > > Thanks for your reply. I think you're right. As m2m_dev is freed in
> > > > v4l2_m2m_release,
> > > > the invoking in v4l2_m2m_get_curr_priv might cause either UAF or null
> > > > pointer dereference
> > > > bug. I am sure that context is closed when we invoke mtk_jpeg_remove.
> > > > But I'm not sure if
> > > > context is released when mtk_jpegdec_timeout_work running.
> > > >
> > > >> At the time when mtk_jpeg_remove() is invoked, there shall be no
> > > >> job_timeout_work running in background because all jobs should be
> > > >> completed before context is released. If you'll look at
> > > >> v4l2_m2m_cancel_job(), you can see that it waits for the task completion
> > > >> before closing context.
> > > >
> > > > Yes, so I think the better way is to put the cancel_delayed_work_sync
> > > > invoking into
> > > > v4l2_m2m_ctx_release function?
> > >
> > > The v4l2_m2m_ctx_release() already should wait for the job_timeout_work
> > > completion or for the interrupt fire. Apparently it doesn't work in
> > > yours case. You'll need to debug why v4l job or job_timeout_work is
> > > running after v4l2_m2m_ctx_release(), it shouldn't happen.
> > >
> >
> > Yes, v4l2_m2m_cancel_job waits for m2m_ctx->job_flags to be ~TRANS_RUNNING,
> > the mtk_jpeg_job_timeout_work will finally invoke v4l2_m2m_job_finish
> > to trigger that.
> >
> > However, this is not the only path to call v4l2_m2m_job_finish. Here
> > is a invoking chain:
> > v4l_streamon
> > ->v4l2_m2m_ioctl_streamon
> > ->v4l2_m2m_streamon
> > ->v4l2_m2m_try_schedule
> > ->v4l2_m2m_try_run
> > ->mtk_jpeg_dec_device_run
> > ->schedule_delayed_work(&jpeg->job_timeout_work...
> > ->error path goto dec_end
> > ->v4l2_m2m_job_finish
> >
> > In some specific situation, it starts the worker and also calls
> > v4l2_m2m_job_finish, which might
> > make v4l2_m2m_cancel_job continues.
> >
> > > The interrupt handler cancels job_timeout_work, you shouldn't need to
> > > flush the work.
> >
> > It will, but as I said, there might be an early invocation chain to
> > start the work.(Not very sure)
> >
> > >
> > > Technically, interrupt handler may race with job_timeout_work, but the
> > > timeout is set to 1 second and in practice should be difficult to
> > > trigger the race. The interrupt handler needs to be threaded, it should
> > > use cancel_delayed_work_sync() and check the return value of this function.
> > >
> >
> > Yes, it's better to use cancel_delayed_work_sync here.
> >
> > > >>
> > > >> You shouldn't be able to remove driver module while it has active/opened
> > > >> v4l contexts. If you can do that, then this is yours bug that needs to
> > > >> be fixed.
> > > >>
> > > >> In addition to this all, the job_timeout_work is initialized only for
> > > >> the single-core JPEG device. I'd expect this patch should crash
> > > >> multi-core JPEG devices.
> > > >>
> > > >
> > > > I think that's true. As I'm not familiar with the code here. Could you
> > > > please give me some advice about the patch?
> > >
> > > We'll need to understand why v4l2_m2m_ctx_release() doesn't work as
> > > expected before thinking about the patch.
> > >
> > > --
> > > Best regards,
> > > Dmitry
> > >
On 9/19/23 21:24, Dmitry Osipenko wrote:
> On 8/31/23 11:18, Zheng Hacker wrote:
>>> The v4l2_m2m_ctx_release() already should wait for the job_timeout_work
>>> completion or for the interrupt fire. Apparently it doesn't work in
>>> yours case. You'll need to debug why v4l job or job_timeout_work is
>>> running after v4l2_m2m_ctx_release(), it shouldn't happen.
>>>
>> Yes, v4l2_m2m_cancel_job waits for m2m_ctx->job_flags to be ~TRANS_RUNNING,
>> the mtk_jpeg_job_timeout_work will finally invoke v4l2_m2m_job_finish
>> to trigger that.
>>
>> However, this is not the only path to call v4l2_m2m_job_finish. Here
>> is a invoking chain:
>> v4l_streamon
>> ->v4l2_m2m_ioctl_streamon
>> ->v4l2_m2m_streamon
>> ->v4l2_m2m_try_schedule
>> ->v4l2_m2m_try_run
>> ->mtk_jpeg_dec_device_run
>> ->schedule_delayed_work(&jpeg->job_timeout_work...
>> ->error path goto dec_end
>> ->v4l2_m2m_job_finish
>>
>> In some specific situation, it starts the worker and also calls
>> v4l2_m2m_job_finish, which might
>> make v4l2_m2m_cancel_job continues.
>
> Then the error path should cancel the job_timeout_work, or better job
s/job/timeout work/
> needs to be run after the dec/enc has been started and not before.
>
> Looking further at the code, I'm confused by this hunk:
>
> mtk_jpeg_dec_start(comp_jpeg[hw_id]->reg_base);
> v4l2_m2m_job_finish(jpeg->m2m_dev, ctx->fh.m2m_ctx);
>
> The job should be marked as finished when h/w has finished processing
> the job and not right after the job has been started. So the job is
> always completed and mtk_jpeg_job_timeout_work() doesn't work as
> expected, am I missing something?
>
--
Best regards,
Dmitry
On 8/31/23 11:18, Zheng Hacker wrote:
>> The v4l2_m2m_ctx_release() already should wait for the job_timeout_work
>> completion or for the interrupt fire. Apparently it doesn't work in
>> yours case. You'll need to debug why v4l job or job_timeout_work is
>> running after v4l2_m2m_ctx_release(), it shouldn't happen.
>>
> Yes, v4l2_m2m_cancel_job waits for m2m_ctx->job_flags to be ~TRANS_RUNNING,
> the mtk_jpeg_job_timeout_work will finally invoke v4l2_m2m_job_finish
> to trigger that.
>
> However, this is not the only path to call v4l2_m2m_job_finish. Here
> is a invoking chain:
> v4l_streamon
> ->v4l2_m2m_ioctl_streamon
> ->v4l2_m2m_streamon
> ->v4l2_m2m_try_schedule
> ->v4l2_m2m_try_run
> ->mtk_jpeg_dec_device_run
> ->schedule_delayed_work(&jpeg->job_timeout_work...
> ->error path goto dec_end
> ->v4l2_m2m_job_finish
>
> In some specific situation, it starts the worker and also calls
> v4l2_m2m_job_finish, which might
> make v4l2_m2m_cancel_job continues.
Then the error path should cancel the job_timeout_work, or better job
needs to be run after the dec/enc has been started and not before.
Looking further at the code, I'm confused by this hunk:
mtk_jpeg_dec_start(comp_jpeg[hw_id]->reg_base);
v4l2_m2m_job_finish(jpeg->m2m_dev, ctx->fh.m2m_ctx);
The job should be marked as finished when h/w has finished processing
the job and not right after the job has been started. So the job is
always completed and mtk_jpeg_job_timeout_work() doesn't work as
expected, am I missing something?
--
Best regards,
Dmitry
Dmitry Osipenko <[email protected]> 于2023年9月20日周三 02:24写道:
>
> On 8/31/23 11:18, Zheng Hacker wrote:
> >> The v4l2_m2m_ctx_release() already should wait for the job_timeout_work
> >> completion or for the interrupt fire. Apparently it doesn't work in
> >> yours case. You'll need to debug why v4l job or job_timeout_work is
> >> running after v4l2_m2m_ctx_release(), it shouldn't happen.
> >>
> > Yes, v4l2_m2m_cancel_job waits for m2m_ctx->job_flags to be ~TRANS_RUNNING,
> > the mtk_jpeg_job_timeout_work will finally invoke v4l2_m2m_job_finish
> > to trigger that.
> >
> > However, this is not the only path to call v4l2_m2m_job_finish. Here
> > is a invoking chain:
> > v4l_streamon
> > ->v4l2_m2m_ioctl_streamon
> > ->v4l2_m2m_streamon
> > ->v4l2_m2m_try_schedule
> > ->v4l2_m2m_try_run
> > ->mtk_jpeg_dec_device_run
> > ->schedule_delayed_work(&jpeg->job_timeout_work...
> > ->error path goto dec_end
> > ->v4l2_m2m_job_finish
> >
> > In some specific situation, it starts the worker and also calls
> > v4l2_m2m_job_finish, which might
> > make v4l2_m2m_cancel_job continues.
>
> Then the error path should cancel the job_timeout_work, or better job
> needs to be run after the dec/enc has been started and not before.
>
Hi,
Sorry for my late reply for I just went on a long vacation.
Get it. I'll write another patch and change the summary to the lack of
canceling job in error path.
> Looking further at the code, I'm confused by this hunk:
>
> mtk_jpeg_dec_start(comp_jpeg[hw_id]->reg_base);
> v4l2_m2m_job_finish(jpeg->m2m_dev, ctx->fh.m2m_ctx);
>
> The job should be marked as finished when h/w has finished processing
> the job and not right after the job has been started. So the job is
> always completed and mtk_jpeg_job_timeout_work() doesn't work as
> expected, am I missing something?
After reading the code I still don't know. I didn't see any function
like mtk_jpeg_dec_end. The same thing
happens on mtk_jpeg_enc_start. I think I'd better fix the first
problem and wait for someone familiar with
the second part.
Best regards,
Zheng
>
> --
> Best regards,
> Dmitry
>
On 10/8/23 12:13, Zheng Hacker wrote:
> Dmitry Osipenko <[email protected]> 于2023年9月20日周三 02:24写道:
>>
>> On 8/31/23 11:18, Zheng Hacker wrote:
>>>> The v4l2_m2m_ctx_release() already should wait for the job_timeout_work
>>>> completion or for the interrupt fire. Apparently it doesn't work in
>>>> yours case. You'll need to debug why v4l job or job_timeout_work is
>>>> running after v4l2_m2m_ctx_release(), it shouldn't happen.
>>>>
>>> Yes, v4l2_m2m_cancel_job waits for m2m_ctx->job_flags to be ~TRANS_RUNNING,
>>> the mtk_jpeg_job_timeout_work will finally invoke v4l2_m2m_job_finish
>>> to trigger that.
>>>
>>> However, this is not the only path to call v4l2_m2m_job_finish. Here
>>> is a invoking chain:
>>> v4l_streamon
>>> ->v4l2_m2m_ioctl_streamon
>>> ->v4l2_m2m_streamon
>>> ->v4l2_m2m_try_schedule
>>> ->v4l2_m2m_try_run
>>> ->mtk_jpeg_dec_device_run
>>> ->schedule_delayed_work(&jpeg->job_timeout_work...
>>> ->error path goto dec_end
>>> ->v4l2_m2m_job_finish
>>>
>>> In some specific situation, it starts the worker and also calls
>>> v4l2_m2m_job_finish, which might
>>> make v4l2_m2m_cancel_job continues.
>>
>> Then the error path should cancel the job_timeout_work, or better job
>> needs to be run after the dec/enc has been started and not before.
>>
>
> Hi,
>
> Sorry for my late reply for I just went on a long vacation.
>
> Get it. I'll write another patch and change the summary to the lack of
> canceling job in error path.
>
>> Looking further at the code, I'm confused by this hunk:
>>
>> mtk_jpeg_dec_start(comp_jpeg[hw_id]->reg_base);
>> v4l2_m2m_job_finish(jpeg->m2m_dev, ctx->fh.m2m_ctx);
>>
>> The job should be marked as finished when h/w has finished processing
>> the job and not right after the job has been started. So the job is
>> always completed and mtk_jpeg_job_timeout_work() doesn't work as
>> expected, am I missing something?
>
> After reading the code I still don't know. I didn't see any function
> like mtk_jpeg_dec_end. The same thing
> happens on mtk_jpeg_enc_start. I think I'd better fix the first
> problem and wait for someone familiar with
> the second part.
I missed that the code mentioned above is related to the multi-core hw version, while you care about single-core. Nevertheless, the multi-core device_run() looks incorrect,
So, the error code paths need to be corrected. Please try to revert yours fix and test this change:
diff --git a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
index 0051f372a66c..fd3b0587fcad 100644
--- a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
+++ b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
@@ -1254,9 +1254,6 @@ static void mtk_jpegdec_worker(struct work_struct *work)
v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx);
v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx);
- schedule_delayed_work(&comp_jpeg[hw_id]->job_timeout_work,
- msecs_to_jiffies(MTK_JPEG_HW_TIMEOUT_MSEC));
-
mtk_jpeg_set_dec_src(ctx, &src_buf->vb2_buf, &bs);
if (mtk_jpeg_set_dec_dst(ctx,
&jpeg_src_buf->dec_param,
@@ -1266,6 +1263,9 @@ static void mtk_jpegdec_worker(struct work_struct *work)
goto setdst_end;
}
+ schedule_delayed_work(&comp_jpeg[hw_id]->job_timeout_work,
+ msecs_to_jiffies(MTK_JPEG_HW_TIMEOUT_MSEC));
+
spin_lock_irqsave(&comp_jpeg[hw_id]->hw_lock, flags);
ctx->total_frame_num++;
mtk_jpeg_dec_reset(comp_jpeg[hw_id]->reg_base);
@@ -1330,13 +1330,13 @@ static void mtk_jpeg_dec_device_run(void *priv)
if (ret < 0)
goto dec_end;
- schedule_delayed_work(&jpeg->job_timeout_work,
- msecs_to_jiffies(MTK_JPEG_HW_TIMEOUT_MSEC));
-
mtk_jpeg_set_dec_src(ctx, &src_buf->vb2_buf, &bs);
if (mtk_jpeg_set_dec_dst(ctx, &jpeg_src_buf->dec_param, &dst_buf->vb2_buf, &fb))
goto dec_end;
+ schedule_delayed_work(&jpeg->job_timeout_work,
+ msecs_to_jiffies(MTK_JPEG_HW_TIMEOUT_MSEC));
+
spin_lock_irqsave(&jpeg->hw_lock, flags);
mtk_jpeg_dec_reset(jpeg->reg_base);
mtk_jpeg_dec_set_config(jpeg->reg_base,
--
Best regards,
Dmitry
Thanks for your patch. I think this should fix the problem. As I have
no experience in reverting, can I submit the patch with your fix as
well as reverting my fix?
Best regards,
Zheng
Dmitry Osipenko <[email protected]> 于2023年10月20日周五 03:56写道:
>
> On 10/8/23 12:13, Zheng Hacker wrote:
> > Dmitry Osipenko <[email protected]> 于2023年9月20日周三 02:24写道:
> >>
> >> On 8/31/23 11:18, Zheng Hacker wrote:
> >>>> The v4l2_m2m_ctx_release() already should wait for the job_timeout_work
> >>>> completion or for the interrupt fire. Apparently it doesn't work in
> >>>> yours case. You'll need to debug why v4l job or job_timeout_work is
> >>>> running after v4l2_m2m_ctx_release(), it shouldn't happen.
> >>>>
> >>> Yes, v4l2_m2m_cancel_job waits for m2m_ctx->job_flags to be ~TRANS_RUNNING,
> >>> the mtk_jpeg_job_timeout_work will finally invoke v4l2_m2m_job_finish
> >>> to trigger that.
> >>>
> >>> However, this is not the only path to call v4l2_m2m_job_finish. Here
> >>> is a invoking chain:
> >>> v4l_streamon
> >>> ->v4l2_m2m_ioctl_streamon
> >>> ->v4l2_m2m_streamon
> >>> ->v4l2_m2m_try_schedule
> >>> ->v4l2_m2m_try_run
> >>> ->mtk_jpeg_dec_device_run
> >>> ->schedule_delayed_work(&jpeg->job_timeout_work...
> >>> ->error path goto dec_end
> >>> ->v4l2_m2m_job_finish
> >>>
> >>> In some specific situation, it starts the worker and also calls
> >>> v4l2_m2m_job_finish, which might
> >>> make v4l2_m2m_cancel_job continues.
> >>
> >> Then the error path should cancel the job_timeout_work, or better job
> >> needs to be run after the dec/enc has been started and not before.
> >>
> >
> > Hi,
> >
> > Sorry for my late reply for I just went on a long vacation.
> >
> > Get it. I'll write another patch and change the summary to the lack of
> > canceling job in error path.
> >
> >> Looking further at the code, I'm confused by this hunk:
> >>
> >> mtk_jpeg_dec_start(comp_jpeg[hw_id]->reg_base);
> >> v4l2_m2m_job_finish(jpeg->m2m_dev, ctx->fh.m2m_ctx);
> >>
> >> The job should be marked as finished when h/w has finished processing
> >> the job and not right after the job has been started. So the job is
> >> always completed and mtk_jpeg_job_timeout_work() doesn't work as
> >> expected, am I missing something?
> >
> > After reading the code I still don't know. I didn't see any function
> > like mtk_jpeg_dec_end. The same thing
> > happens on mtk_jpeg_enc_start. I think I'd better fix the first
> > problem and wait for someone familiar with
> > the second part.
>
> I missed that the code mentioned above is related to the multi-core hw version, while you care about single-core. Nevertheless, the multi-core device_run() looks incorrect,
>
> So, the error code paths need to be corrected. Please try to revert yours fix and test this change:
>
> diff --git a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> index 0051f372a66c..fd3b0587fcad 100644
> --- a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> +++ b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> @@ -1254,9 +1254,6 @@ static void mtk_jpegdec_worker(struct work_struct *work)
> v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx);
> v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx);
>
> - schedule_delayed_work(&comp_jpeg[hw_id]->job_timeout_work,
> - msecs_to_jiffies(MTK_JPEG_HW_TIMEOUT_MSEC));
> -
> mtk_jpeg_set_dec_src(ctx, &src_buf->vb2_buf, &bs);
> if (mtk_jpeg_set_dec_dst(ctx,
> &jpeg_src_buf->dec_param,
> @@ -1266,6 +1263,9 @@ static void mtk_jpegdec_worker(struct work_struct *work)
> goto setdst_end;
> }
>
> + schedule_delayed_work(&comp_jpeg[hw_id]->job_timeout_work,
> + msecs_to_jiffies(MTK_JPEG_HW_TIMEOUT_MSEC));
> +
> spin_lock_irqsave(&comp_jpeg[hw_id]->hw_lock, flags);
> ctx->total_frame_num++;
> mtk_jpeg_dec_reset(comp_jpeg[hw_id]->reg_base);
> @@ -1330,13 +1330,13 @@ static void mtk_jpeg_dec_device_run(void *priv)
> if (ret < 0)
> goto dec_end;
>
> - schedule_delayed_work(&jpeg->job_timeout_work,
> - msecs_to_jiffies(MTK_JPEG_HW_TIMEOUT_MSEC));
> -
> mtk_jpeg_set_dec_src(ctx, &src_buf->vb2_buf, &bs);
> if (mtk_jpeg_set_dec_dst(ctx, &jpeg_src_buf->dec_param, &dst_buf->vb2_buf, &fb))
> goto dec_end;
>
> + schedule_delayed_work(&jpeg->job_timeout_work,
> + msecs_to_jiffies(MTK_JPEG_HW_TIMEOUT_MSEC));
> +
> spin_lock_irqsave(&jpeg->hw_lock, flags);
> mtk_jpeg_dec_reset(jpeg->reg_base);
> mtk_jpeg_dec_set_config(jpeg->reg_base,
>
> --
> Best regards,
> Dmitry
>