2023-03-06 06:28:17

by Zheng Wang

[permalink] [raw]
Subject: [RESEND PATCH] media: mtk-jpeg: Fix use after free bug due to uncanceled work

In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with
mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run
and mtk_jpeg_enc_device_run may be called to start the
work.
If we remove the module which will call mtk_jpeg_remove
to make cleanup, there may be a unfinished work. The
possible sequence is as follows, which will cause a
typical UAF bug.

Fix it by canceling the work before cleanup in the mtk_jpeg_remove

CPU0 CPU1

|mtk_jpeg_job_timeout_work
mtk_jpeg_remove |
v4l2_m2m_release |
kfree(m2m_dev); |
|
| v4l2_m2m_get_curr_priv
| m2m_dev->curr_ctx //use

Signed-off-by: Zheng Wang <[email protected]>
---
drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
index 969516a940ba..364513e7897e 100644
--- a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
+++ b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
@@ -1793,7 +1793,7 @@ static int mtk_jpeg_probe(struct platform_device *pdev)
static int mtk_jpeg_remove(struct platform_device *pdev)
{
struct mtk_jpeg_dev *jpeg = platform_get_drvdata(pdev);
-
+ cancel_delayed_work(&jpeg->job_timeout_work);
pm_runtime_disable(&pdev->dev);
video_unregister_device(jpeg->vdev);
v4l2_m2m_release(jpeg->m2m_dev);
--
2.25.1



2023-03-07 09:28:04

by Zheng Hacker

[permalink] [raw]
Subject: Re: [RESEND PATCH] media: mtk-jpeg: Fix use after free bug due to uncanceled work

Hi,

Is there anyone who can help with this? I can provide more details
like invoking chain if needed.

Thanks,
Zheng

Zheng Wang <[email protected]> 于2023年3月6日周一 14:28写道:
>
> In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with
> mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run
> and mtk_jpeg_enc_device_run may be called to start the
> work.
> If we remove the module which will call mtk_jpeg_remove
> to make cleanup, there may be a unfinished work. The
> possible sequence is as follows, which will cause a
> typical UAF bug.
>
> Fix it by canceling the work before cleanup in the mtk_jpeg_remove
>
> CPU0 CPU1
>
> |mtk_jpeg_job_timeout_work
> mtk_jpeg_remove |
> v4l2_m2m_release |
> kfree(m2m_dev); |
> |
> | v4l2_m2m_get_curr_priv
> | m2m_dev->curr_ctx //use
>
> Signed-off-by: Zheng Wang <[email protected]>
> ---
> drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> index 969516a940ba..364513e7897e 100644
> --- a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> +++ b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> @@ -1793,7 +1793,7 @@ static int mtk_jpeg_probe(struct platform_device *pdev)
> static int mtk_jpeg_remove(struct platform_device *pdev)
> {
> struct mtk_jpeg_dev *jpeg = platform_get_drvdata(pdev);
> -
> + cancel_delayed_work(&jpeg->job_timeout_work);
> pm_runtime_disable(&pdev->dev);
> video_unregister_device(jpeg->vdev);
> v4l2_m2m_release(jpeg->m2m_dev);
> --
> 2.25.1
>

Subject: Re: [RESEND PATCH] media: mtk-jpeg: Fix use after free bug due to uncanceled work

Il 07/03/23 10:27, Zheng Hacker ha scritto:
> Hi,
>
> Is there anyone who can help with this? I can provide more details
> like invoking chain if needed.
>

Providing more details is always good. Please do.

Meanwhile, adding Irui Wang to the loop: he's doing mtk-jpeg.

Regards,
Angelo

> Thanks,
> Zheng
>
> Zheng Wang <[email protected]> 于2023年3月6日周一 14:28写道:
>>
>> In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with
>> mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run
>> and mtk_jpeg_enc_device_run may be called to start the
>> work.
>> If we remove the module which will call mtk_jpeg_remove
>> to make cleanup, there may be a unfinished work. The
>> possible sequence is as follows, which will cause a
>> typical UAF bug.
>>
>> Fix it by canceling the work before cleanup in the mtk_jpeg_remove
>>
>> CPU0 CPU1
>>
>> |mtk_jpeg_job_timeout_work
>> mtk_jpeg_remove |
>> v4l2_m2m_release |
>> kfree(m2m_dev); |
>> |
>> | v4l2_m2m_get_curr_priv
>> | m2m_dev->curr_ctx //use
>>
>> Signed-off-by: Zheng Wang <[email protected]>
>> ---
>> drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
>> index 969516a940ba..364513e7897e 100644
>> --- a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
>> +++ b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
>> @@ -1793,7 +1793,7 @@ static int mtk_jpeg_probe(struct platform_device *pdev)
>> static int mtk_jpeg_remove(struct platform_device *pdev)
>> {
>> struct mtk_jpeg_dev *jpeg = platform_get_drvdata(pdev);
>> -
>> + cancel_delayed_work(&jpeg->job_timeout_work);
>> pm_runtime_disable(&pdev->dev);
>> video_unregister_device(jpeg->vdev);
>> v4l2_m2m_release(jpeg->m2m_dev);
>> --
>> 2.25.1
>>




2023-03-07 10:13:08

by Irui Wang (王瑞)

[permalink] [raw]
Subject: Re: [RESEND PATCH] media: mtk-jpeg: Fix use after free bug due to uncanceled work

Dear Angelo and Zheng,

Thanks for your patch and comments.

Dear Kyrie,

Please help to check this, thanks.

Best Regards

On Tue, 2023-03-07 at 10:49 +0100, AngeloGioacchino Del Regno wrote:
> Il 07/03/23 10:27, Zheng Hacker ha scritto:
> > Hi,
> >
> > Is there anyone who can help with this? I can provide more details
> > like invoking chain if needed.
> >
>
> Providing more details is always good. Please do.
>
> Meanwhile, adding Irui Wang to the loop: he's doing mtk-jpeg.
>
> Regards,
> Angelo
>
> > Thanks,
> > Zheng
> >
> > Zheng Wang <[email protected]> 于2023年3月6日周一 14:28写道:
> > >
> > > In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with
> > > mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run
> > > and mtk_jpeg_enc_device_run may be called to start the
> > > work.
> > > If we remove the module which will call mtk_jpeg_remove
> > > to make cleanup, there may be a unfinished work. The
> > > possible sequence is as follows, which will cause a
> > > typical UAF bug.
> > >
> > > Fix it by canceling the work before cleanup in the
> > > mtk_jpeg_remove
> > >
> > > CPU0 CPU1
> > >
> > > |mtk_jpeg_job_timeout_work
> > > mtk_jpeg_remove |
> > > v4l2_m2m_release |
> > > kfree(m2m_dev); |
> > > |
> > > | v4l2_m2m_get_curr_priv
> > > | m2m_dev->curr_ctx //use
> > >
> > > Signed-off-by: Zheng Wang <[email protected]>
> > > ---
> > > drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> > > b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> > > index 969516a940ba..364513e7897e 100644
> > > --- a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> > > +++ b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> > > @@ -1793,7 +1793,7 @@ static int mtk_jpeg_probe(struct
> > > platform_device *pdev)
> > > static int mtk_jpeg_remove(struct platform_device *pdev)
> > > {
> > > struct mtk_jpeg_dev *jpeg = platform_get_drvdata(pdev);
> > > -
> > > + cancel_delayed_work(&jpeg->job_timeout_work);
> > > pm_runtime_disable(&pdev->dev);
> > > video_unregister_device(jpeg->vdev);
> > > v4l2_m2m_release(jpeg->m2m_dev);
> > > --
> > > 2.25.1
> > >
>
>
>

2023-03-07 15:06:10

by Zheng Hacker

[permalink] [raw]
Subject: Re: [RESEND PATCH] media: mtk-jpeg: Fix use after free bug due to uncanceled work

The timer function was inited in mtk_jpeg_probe with
mtk_jpeg_job_timeout_work function.
And the worker is started in mtk_jpeg_dec_device_run.
There are two functions (mtk_jpeg_enc_irq and mtk_jpeg_dec_irq) which
may cancel the worker.
They are used as IRQ handler function which is saved as function
pointer in a variable.
In mtk_jpeg_probe, they are registered by devm_request_irq:

ret = devm_request_irq(&pdev->dev,
jpeg_irq,
jpeg->variant->irq_handler,
0,
pdev->name, jpeg);
if (ret) {
dev_err(&pdev->dev, "Failed to request jpeg_irq %d (%d)\n",
jpeg_irq, ret);
return ret;
}

However, if we remove the module without triigering the irq, the
worker will never be removed.

As for the schedule, mtk_jpeg_dec_device_run and
mtk_jpeg_enc_device_run will start the worker.
The schedule invoking is quite complicated. As far as I know, the
invoking chain is as follows:

v4l2_m2m_init->v4l2_m2m_device_run_work->v4l2_m2m_try_run

the v4l2_m2m_device_run_work is also a worker which is inited in
v4l2_m2m_init and started in
v4l2_m2m_schedule_next_job.

Before calling remove function, the mtk_jpeg_release was invoked to
release the related resource.

v4l2_m2m_cancel_job will cancel the job by calling
m2m_dev->m2m_ops->job_abort(m2m_ctx->priv).

But this will only cancel the current queue by
list_del(&m2m_dev->curr_ctx->queue);

I think this can not cancel the posted task mentioned before. So I
think if mtk_jpeg_job_timeout_work

is working on, and use jpeg->m2m_dev after freeing it in
mtk_jpeg_remove, it wll cause UAF bug.

static int mtk_jpeg_release(struct file *file)
{
struct mtk_jpeg_dev *jpeg = video_drvdata(file);
struct mtk_jpeg_ctx *ctx = mtk_jpeg_fh_to_ctx(file->private_data);

mutex_lock(&jpeg->lock);
v4l2_ctrl_handler_free(&ctx->ctrl_hdl);
[1] v4l2_m2m_ctx_release(ctx->fh.m2m_ctx);
v4l2_fh_del(&ctx->fh);
v4l2_fh_exit(&ctx->fh);
kfree(ctx);
mutex_unlock(&jpeg->lock);
return 0;
}

void v4l2_m2m_ctx_release(struct v4l2_m2m_ctx *m2m_ctx)
{
/* wait until the current context is dequeued from job_queue */
[2] v4l2_m2m_cancel_job(m2m_ctx);

vb2_queue_release(&m2m_ctx->cap_q_ctx.q);
vb2_queue_release(&m2m_ctx->out_q_ctx.q);

kfree(m2m_ctx);
}

Note that, all of this is static analysis, which may be false positive.
Feel free to tell me if there is something I've missed.

Regard,
Zheng

Zheng Hacker <[email protected]> 于2023年3月7日周二 17:27写道:
>
> Hi,
>
> Is there anyone who can help with this? I can provide more details
> like invoking chain if needed.
>
> Thanks,
> Zheng
>
> Zheng Wang <[email protected]> 于2023年3月6日周一 14:28写道:
> >
> > In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with
> > mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run
> > and mtk_jpeg_enc_device_run may be called to start the
> > work.
> > If we remove the module which will call mtk_jpeg_remove
> > to make cleanup, there may be a unfinished work. The
> > possible sequence is as follows, which will cause a
> > typical UAF bug.
> >
> > Fix it by canceling the work before cleanup in the mtk_jpeg_remove
> >
> > CPU0 CPU1
> >
> > |mtk_jpeg_job_timeout_work
> > mtk_jpeg_remove |
> > v4l2_m2m_release |
> > kfree(m2m_dev); |
> > |
> > | v4l2_m2m_get_curr_priv
> > | m2m_dev->curr_ctx //use
> >
> > Signed-off-by: Zheng Wang <[email protected]>
> > ---
> > drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> > index 969516a940ba..364513e7897e 100644
> > --- a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> > +++ b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> > @@ -1793,7 +1793,7 @@ static int mtk_jpeg_probe(struct platform_device *pdev)
> > static int mtk_jpeg_remove(struct platform_device *pdev)
> > {
> > struct mtk_jpeg_dev *jpeg = platform_get_drvdata(pdev);
> > -
> > + cancel_delayed_work(&jpeg->job_timeout_work);
> > pm_runtime_disable(&pdev->dev);
> > video_unregister_device(jpeg->vdev);
> > v4l2_m2m_release(jpeg->m2m_dev);
> > --
> > 2.25.1
> >

2023-03-07 15:10:47

by Zheng Hacker

[permalink] [raw]
Subject: Re: [RESEND PATCH] media: mtk-jpeg: Fix use after free bug due to uncanceled work

The timer function was set in mtk_jpeg_probe with
mtk_jpeg_job_timeout_work function.
And the worker is started in mtk_jpeg_dec_device_run.
There are two functions (mtk_jpeg_enc_irq and mtk_jpeg_dec_irq) which
may cancel the worker.
They are used as IRQ handler function which is saved as function
pointer in a variable.
In mtk_jpeg_probe, they are registered by devm_request_irq:

ret = devm_request_irq(&pdev->dev,
jpeg_irq,
jpeg->variant->irq_handler,
0,
pdev->name, jpeg);
if (ret) {
dev_err(&pdev->dev, "Failed to request jpeg_irq %d (%d)\n",
jpeg_irq, ret);
return ret;
}

However, if we remove the module without triggering the irq, the
worker will never be removed.

As for the schedule, mtk_jpeg_dec_device_run and
mtk_jpeg_enc_device_run will start the worker.
The schedule invoking is quite complicated. As far as I know, the
invoking chain is as follows:

v4l2_m2m_init->v4l2_m2m_device_run_work->v4l2_m2m_try_run

the v4l2_m2m_device_run_work function is also a worker which is set in
v4l2_m2m_init and started in
v4l2_m2m_schedule_next_job.

Before calling remove function, the mtk_jpeg_release was invoked to
release the related resource.

v4l2_m2m_cancel_job will cancel the job by calling
m2m_dev->m2m_ops->job_abort(m2m_ctx->priv).

But this will only cancel the current queue by
list_del(&m2m_dev->curr_ctx->queue);

I think this can not cancel the posted task mentioned before. So I
think if mtk_jpeg_job_timeout_work

is working on, and using jpeg->m2m_dev after freeing it in
mtk_jpeg_remove, it will cause a UAF bug.

static int mtk_jpeg_release(struct file *file)
{
struct mtk_jpeg_dev *jpeg = video_drvdata(file);
struct mtk_jpeg_ctx *ctx = mtk_jpeg_fh_to_ctx(file->private_data);

mutex_lock(&jpeg->lock);
v4l2_ctrl_handler_free(&ctx->ctrl_hdl);
[1] v4l2_m2m_ctx_release(ctx->fh.m2m_ctx);
v4l2_fh_del(&ctx->fh);
v4l2_fh_exit(&ctx->fh);
kfree(ctx);
mutex_unlock(&jpeg->lock);
return 0;
}

void v4l2_m2m_ctx_release(struct v4l2_m2m_ctx *m2m_ctx)
{
/* wait until the current context is dequeued from job_queue */
[2] v4l2_m2m_cancel_job(m2m_ctx);

vb2_queue_release(&m2m_ctx->cap_q_ctx.q);
vb2_queue_release(&m2m_ctx->out_q_ctx.q);

kfree(m2m_ctx);
}

Note that all of this is static analysis, which may be false positive.
Feel free to tell me if there is something I've missed.

Regard,
Zheng

Irui Wang (王瑞) <[email protected]> 于2023年3月7日周二 18:23写道:
>
> Dear Angelo and Zheng,
>
> Thanks for your patch and comments.
>
> Dear Kyrie,
>
> Please help to check this, thanks.
>
> Best Regards
>
> On Tue, 2023-03-07 at 10:49 +0100, AngeloGioacchino Del Regno wrote:
> > Il 07/03/23 10:27, Zheng Hacker ha scritto:
> > > Hi,
> > >
> > > Is there anyone who can help with this? I can provide more details
> > > like invoking chain if needed.
> > >
> >
> > Providing more details is always good. Please do.
> >
> > Meanwhile, adding Irui Wang to the loop: he's doing mtk-jpeg.
> >
> > Regards,
> > Angelo
> >
> > > Thanks,
> > > Zheng
> > >
> > > Zheng Wang <[email protected]> 于2023年3月6日周一 14:28写道:
> > > >
> > > > In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with
> > > > mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run
> > > > and mtk_jpeg_enc_device_run may be called to start the
> > > > work.
> > > > If we remove the module which will call mtk_jpeg_remove
> > > > to make cleanup, there may be a unfinished work. The
> > > > possible sequence is as follows, which will cause a
> > > > typical UAF bug.
> > > >
> > > > Fix it by canceling the work before cleanup in the
> > > > mtk_jpeg_remove
> > > >
> > > > CPU0 CPU1
> > > >
> > > > |mtk_jpeg_job_timeout_work
> > > > mtk_jpeg_remove |
> > > > v4l2_m2m_release |
> > > > kfree(m2m_dev); |
> > > > |
> > > > | v4l2_m2m_get_curr_priv
> > > > | m2m_dev->curr_ctx //use
> > > >
> > > > Signed-off-by: Zheng Wang <[email protected]>
> > > > ---
> > > > drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c | 2 +-
> > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> > > > b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> > > > index 969516a940ba..364513e7897e 100644
> > > > --- a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> > > > +++ b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> > > > @@ -1793,7 +1793,7 @@ static int mtk_jpeg_probe(struct
> > > > platform_device *pdev)
> > > > static int mtk_jpeg_remove(struct platform_device *pdev)
> > > > {
> > > > struct mtk_jpeg_dev *jpeg = platform_get_drvdata(pdev);
> > > > -
> > > > + cancel_delayed_work(&jpeg->job_timeout_work);
> > > > pm_runtime_disable(&pdev->dev);
> > > > video_unregister_device(jpeg->vdev);
> > > > v4l2_m2m_release(jpeg->m2m_dev);
> > > > --
> > > > 2.25.1
> > > >
> >
> >
> >

2023-03-08 01:41:47

by Kyrie Wu (吴晗)

[permalink] [raw]
Subject: Re: [RESEND PATCH] media: mtk-jpeg: Fix use after free bug due to uncanceled work

On Tue, 2023-03-07 at 23:03 +0800, Zheng Hacker wrote:
> The timer function was set in mtk_jpeg_probe with
> mtk_jpeg_job_timeout_work function.
> And the worker is started in mtk_jpeg_dec_device_run.
> There are two functions (mtk_jpeg_enc_irq and mtk_jpeg_dec_irq) which
> may cancel the worker.
> They are used as IRQ handler function which is saved as function
> pointer in a variable.
> In mtk_jpeg_probe, they are registered by devm_request_irq:
>
> ret = devm_request_irq(&pdev->dev,
> jpeg_irq,
> jpeg->variant->irq_handler,
> 0,
> pdev->name, jpeg);
> if (ret) {
> dev_err(&pdev->dev, "Failed to request jpeg_irq %d (%d)\n",
> jpeg_irq, ret);
> return ret;
> }
>
> However, if we remove the module without triggering the irq, the
> worker will never be removed.
>
> As for the schedule, mtk_jpeg_dec_device_run and
> mtk_jpeg_enc_device_run will start the worker.
> The schedule invoking is quite complicated. As far as I know, the
> invoking chain is as follows:
>
> v4l2_m2m_init->v4l2_m2m_device_run_work->v4l2_m2m_try_run
>
> the v4l2_m2m_device_run_work function is also a worker which is set
> in
> v4l2_m2m_init and started in
> v4l2_m2m_schedule_next_job.
>
> Before calling remove function, the mtk_jpeg_release was invoked to
> release the related resource.
>
> v4l2_m2m_cancel_job will cancel the job by calling
> m2m_dev->m2m_ops->job_abort(m2m_ctx->priv).
>
> But this will only cancel the current queue by
> list_del(&m2m_dev->curr_ctx->queue);
>
> I think this can not cancel the posted task mentioned before. So I
> think if mtk_jpeg_job_timeout_work
>
> is working on, and using jpeg->m2m_dev after freeing it in
> mtk_jpeg_remove, it will cause a UAF bug.
>
> static int mtk_jpeg_release(struct file *file)
> {
> struct mtk_jpeg_dev *jpeg = video_drvdata(file);
> struct mtk_jpeg_ctx *ctx = mtk_jpeg_fh_to_ctx(file->private_data);
>
> mutex_lock(&jpeg->lock);
> v4l2_ctrl_handler_free(&ctx->ctrl_hdl);
> [1] v4l2_m2m_ctx_release(ctx->fh.m2m_ctx);
> v4l2_fh_del(&ctx->fh);
> v4l2_fh_exit(&ctx->fh);
> kfree(ctx);
> mutex_unlock(&jpeg->lock);
> return 0;
> }
>
> void v4l2_m2m_ctx_release(struct v4l2_m2m_ctx *m2m_ctx)
> {
> /* wait until the current context is dequeued from job_queue */
> [2] v4l2_m2m_cancel_job(m2m_ctx);
>
> vb2_queue_release(&m2m_ctx->cap_q_ctx.q);
> vb2_queue_release(&m2m_ctx->out_q_ctx.q);
>
> kfree(m2m_ctx);
> }
>
> Note that all of this is static analysis, which may be false
> positive.
> Feel free to tell me if there is something I've missed.
>
> Regard,
> Zheng

Dear Zheng,

You set up an application scenario:
cpu1 is using the mtk-jpeg driver and timeout work has been scheduled.
At the same time cpu0 wanted to remove the mtk-jpeg driver, which
caused the UAF bug.
I wonder if such an irrational application scenario could exist. This
scenario, as you described, not only leads to the problems you
mentioned, but also to output&capture memory leaks and unreleased
resources, such as spinlock.
Typically, if we want to remove the driver, we firstly do stop
streaming, which ensures that the worker has been canceled.
I agree with your changes from the perspective of strengthening the
robustness of the driver code.

Regards,
Kyrie.
>
> Irui Wang (王瑞) <[email protected]> 于2023年3月7日周二 18:23写道:
> >
> > Dear Angelo and Zheng,
> >
> > Thanks for your patch and comments.
> >
> > Dear Kyrie,
> >
> > Please help to check this, thanks.
> >
> > Best Regards
> >
> > On Tue, 2023-03-07 at 10:49 +0100, AngeloGioacchino Del Regno
> > wrote:
> > > Il 07/03/23 10:27, Zheng Hacker ha scritto:
> > > > Hi,
> > > >
> > > > Is there anyone who can help with this? I can provide more
> > > > details
> > > > like invoking chain if needed.
> > > >
> > >
> > > Providing more details is always good. Please do.
> > >
> > > Meanwhile, adding Irui Wang to the loop: he's doing mtk-jpeg.
> > >
> > > Regards,
> > > Angelo
> > >
> > > > Thanks,
> > > > Zheng
> > > >
> > > > Zheng Wang <[email protected]> 于2023年3月6日周一 14:28写道:
> > > > >
> > > > > In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with
> > > > > mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run
> > > > > and mtk_jpeg_enc_device_run may be called to start the
> > > > > work.
> > > > > If we remove the module which will call mtk_jpeg_remove
> > > > > to make cleanup, there may be a unfinished work. The
> > > > > possible sequence is as follows, which will cause a
> > > > > typical UAF bug.
> > > > >
> > > > > Fix it by canceling the work before cleanup in the
> > > > > mtk_jpeg_remove
> > > > >
> > > > > CPU0 CPU1
> > > > >
> > > > > |mtk_jpeg_job_timeout_work
> > > > > mtk_jpeg_remove |
> > > > > v4l2_m2m_release |
> > > > > kfree(m2m_dev); |
> > > > > |
> > > > > | v4l2_m2m_get_curr_priv
> > > > > | m2m_dev->curr_ctx //use
> > > > >
> > > > > Signed-off-by: Zheng Wang <[email protected]>
> > > > > ---
> > > > > drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c | 2 +-
> > > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > > >
> > > > > diff --git
> > > > > a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> > > > > b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> > > > > index 969516a940ba..364513e7897e 100644
> > > > > --- a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> > > > > +++ b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> > > > > @@ -1793,7 +1793,7 @@ static int mtk_jpeg_probe(struct
> > > > > platform_device *pdev)
> > > > > static int mtk_jpeg_remove(struct platform_device *pdev)
> > > > > {
> > > > > struct mtk_jpeg_dev *jpeg =
> > > > > platform_get_drvdata(pdev);
> > > > > -
> > > > > + cancel_delayed_work(&jpeg->job_timeout_work);
> > > > > pm_runtime_disable(&pdev->dev);
> > > > > video_unregister_device(jpeg->vdev);
> > > > > v4l2_m2m_release(jpeg->m2m_dev);
> > > > > --
> > > > > 2.25.1
> > > > >
> > >
> > >
> > >

2023-03-08 02:20:54

by Zheng Hacker

[permalink] [raw]
Subject: Re: [RESEND PATCH] media: mtk-jpeg: Fix use after free bug due to uncanceled work

Hi Kyrie,

Thank you for your careful analysis and response. I still have some
areas that I don't quite understand and would like to ask for
clarification. That is, how do the function pointers for stop
streaming, initialized as mtk_jpeg_enc_stop_streaming and
mtk_jpeg_dec_stop_streaming, ensure that the worker is canceled? I
would greatly appreciate your response.

Best regards,
Zheng

Kyrie Wu (吴晗) <[email protected]> 于2023年3月8日周三 10:02写道:
>
> On Tue, 2023-03-07 at 23:03 +0800, Zheng Hacker wrote:
> > The timer function was set in mtk_jpeg_probe with
> > mtk_jpeg_job_timeout_work function.
> > And the worker is started in mtk_jpeg_dec_device_run.
> > There are two functions (mtk_jpeg_enc_irq and mtk_jpeg_dec_irq) which
> > may cancel the worker.
> > They are used as IRQ handler function which is saved as function
> > pointer in a variable.
> > In mtk_jpeg_probe, they are registered by devm_request_irq:
> >
> > ret = devm_request_irq(&pdev->dev,
> > jpeg_irq,
> > jpeg->variant->irq_handler,
> > 0,
> > pdev->name, jpeg);
> > if (ret) {
> > dev_err(&pdev->dev, "Failed to request jpeg_irq %d (%d)\n",
> > jpeg_irq, ret);
> > return ret;
> > }
> >
> > However, if we remove the module without triggering the irq, the
> > worker will never be removed.
> >
> > As for the schedule, mtk_jpeg_dec_device_run and
> > mtk_jpeg_enc_device_run will start the worker.
> > The schedule invoking is quite complicated. As far as I know, the
> > invoking chain is as follows:
> >
> > v4l2_m2m_init->v4l2_m2m_device_run_work->v4l2_m2m_try_run
> >
> > the v4l2_m2m_device_run_work function is also a worker which is set
> > in
> > v4l2_m2m_init and started in
> > v4l2_m2m_schedule_next_job.
> >
> > Before calling remove function, the mtk_jpeg_release was invoked to
> > release the related resource.
> >
> > v4l2_m2m_cancel_job will cancel the job by calling
> > m2m_dev->m2m_ops->job_abort(m2m_ctx->priv).
> >
> > But this will only cancel the current queue by
> > list_del(&m2m_dev->curr_ctx->queue);
> >
> > I think this can not cancel the posted task mentioned before. So I
> > think if mtk_jpeg_job_timeout_work
> >
> > is working on, and using jpeg->m2m_dev after freeing it in
> > mtk_jpeg_remove, it will cause a UAF bug.
> >
> > static int mtk_jpeg_release(struct file *file)
> > {
> > struct mtk_jpeg_dev *jpeg = video_drvdata(file);
> > struct mtk_jpeg_ctx *ctx = mtk_jpeg_fh_to_ctx(file->private_data);
> >
> > mutex_lock(&jpeg->lock);
> > v4l2_ctrl_handler_free(&ctx->ctrl_hdl);
> > [1] v4l2_m2m_ctx_release(ctx->fh.m2m_ctx);
> > v4l2_fh_del(&ctx->fh);
> > v4l2_fh_exit(&ctx->fh);
> > kfree(ctx);
> > mutex_unlock(&jpeg->lock);
> > return 0;
> > }
> >
> > void v4l2_m2m_ctx_release(struct v4l2_m2m_ctx *m2m_ctx)
> > {
> > /* wait until the current context is dequeued from job_queue */
> > [2] v4l2_m2m_cancel_job(m2m_ctx);
> >
> > vb2_queue_release(&m2m_ctx->cap_q_ctx.q);
> > vb2_queue_release(&m2m_ctx->out_q_ctx.q);
> >
> > kfree(m2m_ctx);
> > }
> >
> > Note that all of this is static analysis, which may be false
> > positive.
> > Feel free to tell me if there is something I've missed.
> >
> > Regard,
> > Zheng
>
> Dear Zheng,
>
> You set up an application scenario:
> cpu1 is using the mtk-jpeg driver and timeout work has been scheduled.
> At the same time cpu0 wanted to remove the mtk-jpeg driver, which
> caused the UAF bug.
> I wonder if such an irrational application scenario could exist. This
> scenario, as you described, not only leads to the problems you
> mentioned, but also to output&capture memory leaks and unreleased
> resources, such as spinlock.
> Typically, if we want to remove the driver, we firstly do stop
> streaming, which ensures that the worker has been canceled.
> I agree with your changes from the perspective of strengthening the
> robustness of the driver code.
>
> Regards,
> Kyrie.
> >
> > Irui Wang (王瑞) <[email protected]> 于2023年3月7日周二 18:23写道:
> > >
> > > Dear Angelo and Zheng,
> > >
> > > Thanks for your patch and comments.
> > >
> > > Dear Kyrie,
> > >
> > > Please help to check this, thanks.
> > >
> > > Best Regards
> > >
> > > On Tue, 2023-03-07 at 10:49 +0100, AngeloGioacchino Del Regno
> > > wrote:
> > > > Il 07/03/23 10:27, Zheng Hacker ha scritto:
> > > > > Hi,
> > > > >
> > > > > Is there anyone who can help with this? I can provide more
> > > > > details
> > > > > like invoking chain if needed.
> > > > >
> > > >
> > > > Providing more details is always good. Please do.
> > > >
> > > > Meanwhile, adding Irui Wang to the loop: he's doing mtk-jpeg.
> > > >
> > > > Regards,
> > > > Angelo
> > > >
> > > > > Thanks,
> > > > > Zheng
> > > > >
> > > > > Zheng Wang <[email protected]> 于2023年3月6日周一 14:28写道:
> > > > > >
> > > > > > In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with
> > > > > > mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run
> > > > > > and mtk_jpeg_enc_device_run may be called to start the
> > > > > > work.
> > > > > > If we remove the module which will call mtk_jpeg_remove
> > > > > > to make cleanup, there may be a unfinished work. The
> > > > > > possible sequence is as follows, which will cause a
> > > > > > typical UAF bug.
> > > > > >
> > > > > > Fix it by canceling the work before cleanup in the
> > > > > > mtk_jpeg_remove
> > > > > >
> > > > > > CPU0 CPU1
> > > > > >
> > > > > > |mtk_jpeg_job_timeout_work
> > > > > > mtk_jpeg_remove |
> > > > > > v4l2_m2m_release |
> > > > > > kfree(m2m_dev); |
> > > > > > |
> > > > > > | v4l2_m2m_get_curr_priv
> > > > > > | m2m_dev->curr_ctx //use
> > > > > >
> > > > > > Signed-off-by: Zheng Wang <[email protected]>
> > > > > > ---
> > > > > > drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c | 2 +-
> > > > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git
> > > > > > a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> > > > > > b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> > > > > > index 969516a940ba..364513e7897e 100644
> > > > > > --- a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> > > > > > +++ b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> > > > > > @@ -1793,7 +1793,7 @@ static int mtk_jpeg_probe(struct
> > > > > > platform_device *pdev)
> > > > > > static int mtk_jpeg_remove(struct platform_device *pdev)
> > > > > > {
> > > > > > struct mtk_jpeg_dev *jpeg =
> > > > > > platform_get_drvdata(pdev);
> > > > > > -
> > > > > > + cancel_delayed_work(&jpeg->job_timeout_work);
> > > > > > pm_runtime_disable(&pdev->dev);
> > > > > > video_unregister_device(jpeg->vdev);
> > > > > > v4l2_m2m_release(jpeg->m2m_dev);
> > > > > > --
> > > > > > 2.25.1
> > > > > >
> > > >
> > > >
> > > >

2023-03-08 03:32:35

by Kyrie Wu (吴晗)

[permalink] [raw]
Subject: Re: [RESEND PATCH] media: mtk-jpeg: Fix use after free bug due to uncanceled work

On Wed, 2023-03-08 at 10:20 +0800, Zheng Hacker wrote:
> Hi Kyrie,
>
> Thank you for your careful analysis and response. I still have some
> areas that I don't quite understand and would like to ask for
> clarification. That is, how do the function pointers for stop
> streaming, initialized as mtk_jpeg_enc_stop_streaming and
> mtk_jpeg_dec_stop_streaming, ensure that the worker is canceled? I
> would greatly appreciate your response.
>
> Best regards,
> Zheng

Dear zheng,

For stop streaming, what I mean is that stoppping jpeg decoding or
encoding.
Ok, let me introduce the sw flow of stop streaming:
Firstly, the app will call v4l2_m2m_ioctl_streamoff, which will call
v4l2_m2m_cancel_job, if it finds a job running(as you note, cpu1 is
running), it will wait event, the event is wake up by
v4l2_m2m_job_finish function. And v4l2_m2m_job_finish is called by jpeg
dec/enc irq handler, which means that the waitting would result mtk hw
to finish dec/enc, irq will occur and irq handler would cancel timeout
worker. The follow is shown as blow.
v4l2_m2m_ioctl_streamoff
v4l2_m2m_cancel_job mtk_jpeg_enc_irq/mtk_jpeg_dec_irq
wait evnet <------ wake up ------v4l2_m2m_job_finish
cancel timeout work

As mentioned above, if it is normal stop streaming action, there will
be no happen that the timeout worker does not canceled.

But if mtk_jpeg_remove is called directly without above flow, it would
cause lots of issues.

Regards,
Kyrie.
>
> Kyrie Wu (吴晗) <[email protected]> 于2023年3月8日周三 10:02写道:
> >
> > On Tue, 2023-03-07 at 23:03 +0800, Zheng Hacker wrote:
> > > The timer function was set in mtk_jpeg_probe with
> > > mtk_jpeg_job_timeout_work function.
> > > And the worker is started in mtk_jpeg_dec_device_run.
> > > There are two functions (mtk_jpeg_enc_irq and mtk_jpeg_dec_irq)
> > > which
> > > may cancel the worker.
> > > They are used as IRQ handler function which is saved as function
> > > pointer in a variable.
> > > In mtk_jpeg_probe, they are registered by devm_request_irq:
> > >
> > > ret = devm_request_irq(&pdev->dev,
> > > jpeg_irq,
> > > jpeg->variant->irq_handler,
> > > 0,
> > > pdev->name, jpeg);
> > > if (ret) {
> > > dev_err(&pdev->dev, "Failed to request jpeg_irq %d (%d)\n",
> > > jpeg_irq, ret);
> > > return ret;
> > > }
> > >
> > > However, if we remove the module without triggering the irq, the
> > > worker will never be removed.
> > >
> > > As for the schedule, mtk_jpeg_dec_device_run and
> > > mtk_jpeg_enc_device_run will start the worker.
> > > The schedule invoking is quite complicated. As far as I know, the
> > > invoking chain is as follows:
> > >
> > > v4l2_m2m_init->v4l2_m2m_device_run_work->v4l2_m2m_try_run
> > >
> > > the v4l2_m2m_device_run_work function is also a worker which is
> > > set
> > > in
> > > v4l2_m2m_init and started in
> > > v4l2_m2m_schedule_next_job.
> > >
> > > Before calling remove function, the mtk_jpeg_release was invoked
> > > to
> > > release the related resource.
> > >
> > > v4l2_m2m_cancel_job will cancel the job by calling
> > > m2m_dev->m2m_ops->job_abort(m2m_ctx->priv).
> > >
> > > But this will only cancel the current queue by
> > > list_del(&m2m_dev->curr_ctx->queue);
> > >
> > > I think this can not cancel the posted task mentioned before. So
> > > I
> > > think if mtk_jpeg_job_timeout_work
> > >
> > > is working on, and using jpeg->m2m_dev after freeing it in
> > > mtk_jpeg_remove, it will cause a UAF bug.
> > >
> > > static int mtk_jpeg_release(struct file *file)
> > > {
> > > struct mtk_jpeg_dev *jpeg = video_drvdata(file);
> > > struct mtk_jpeg_ctx *ctx = mtk_jpeg_fh_to_ctx(file-
> > > >private_data);
> > >
> > > mutex_lock(&jpeg->lock);
> > > v4l2_ctrl_handler_free(&ctx->ctrl_hdl);
> > > [1] v4l2_m2m_ctx_release(ctx->fh.m2m_ctx);
> > > v4l2_fh_del(&ctx->fh);
> > > v4l2_fh_exit(&ctx->fh);
> > > kfree(ctx);
> > > mutex_unlock(&jpeg->lock);
> > > return 0;
> > > }
> > >
> > > void v4l2_m2m_ctx_release(struct v4l2_m2m_ctx *m2m_ctx)
> > > {
> > > /* wait until the current context is dequeued from job_queue */
> > > [2] v4l2_m2m_cancel_job(m2m_ctx);
> > >
> > > vb2_queue_release(&m2m_ctx->cap_q_ctx.q);
> > > vb2_queue_release(&m2m_ctx->out_q_ctx.q);
> > >
> > > kfree(m2m_ctx);
> > > }
> > >
> > > Note that all of this is static analysis, which may be false
> > > positive.
> > > Feel free to tell me if there is something I've missed.
> > >
> > > Regard,
> > > Zheng
> >
> > Dear Zheng,
> >
> > You set up an application scenario:
> > cpu1 is using the mtk-jpeg driver and timeout work has been
> > scheduled.
> > At the same time cpu0 wanted to remove the mtk-jpeg driver, which
> > caused the UAF bug.
> > I wonder if such an irrational application scenario could exist.
> > This
> > scenario, as you described, not only leads to the problems you
> > mentioned, but also to output&capture memory leaks and unreleased
> > resources, such as spinlock.
> > Typically, if we want to remove the driver, we firstly do stop
> > streaming, which ensures that the worker has been canceled.
> > I agree with your changes from the perspective of strengthening the
> > robustness of the driver code.
> >
> > Regards,
> > Kyrie.
> > >
> > > Irui Wang (王瑞) <[email protected]> 于2023年3月7日周二 18:23写道:
> > > >
> > > > Dear Angelo and Zheng,
> > > >
> > > > Thanks for your patch and comments.
> > > >
> > > > Dear Kyrie,
> > > >
> > > > Please help to check this, thanks.
> > > >
> > > > Best Regards
> > > >
> > > > On Tue, 2023-03-07 at 10:49 +0100, AngeloGioacchino Del Regno
> > > > wrote:
> > > > > Il 07/03/23 10:27, Zheng Hacker ha scritto:
> > > > > > Hi,
> > > > > >
> > > > > > Is there anyone who can help with this? I can provide more
> > > > > > details
> > > > > > like invoking chain if needed.
> > > > > >
> > > > >
> > > > > Providing more details is always good. Please do.
> > > > >
> > > > > Meanwhile, adding Irui Wang to the loop: he's doing mtk-jpeg.
> > > > >
> > > > > Regards,
> > > > > Angelo
> > > > >
> > > > > > Thanks,
> > > > > > Zheng
> > > > > >
> > > > > > Zheng Wang <[email protected]> 于2023年3月6日周一 14:28写道:
> > > > > > >
> > > > > > > In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with
> > > > > > > mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run
> > > > > > > and mtk_jpeg_enc_device_run may be called to start the
> > > > > > > work.
> > > > > > > If we remove the module which will call mtk_jpeg_remove
> > > > > > > to make cleanup, there may be a unfinished work. The
> > > > > > > possible sequence is as follows, which will cause a
> > > > > > > typical UAF bug.
> > > > > > >
> > > > > > > Fix it by canceling the work before cleanup in the
> > > > > > > mtk_jpeg_remove
> > > > > > >
> > > > > > > CPU0 CPU1
> > > > > > >
> > > > > > > |mtk_jpeg_job_timeout_work
> > > > > > > mtk_jpeg_remove |
> > > > > > > v4l2_m2m_release |
> > > > > > > kfree(m2m_dev); |
> > > > > > > |
> > > > > > > | v4l2_m2m_get_curr_priv
> > > > > > > | m2m_dev->curr_ctx //use
> > > > > > >
> > > > > > > Signed-off-by: Zheng Wang <[email protected]>
> > > > > > > ---
> > > > > > > drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c |
> > > > > > > 2 +-
> > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > >
> > > > > > > diff --git
> > > > > > > a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> > > > > > > b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> > > > > > > index 969516a940ba..364513e7897e 100644
> > > > > > > ---
> > > > > > > a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> > > > > > > +++
> > > > > > > b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> > > > > > > @@ -1793,7 +1793,7 @@ static int mtk_jpeg_probe(struct
> > > > > > > platform_device *pdev)
> > > > > > > static int mtk_jpeg_remove(struct platform_device
> > > > > > > *pdev)
> > > > > > > {
> > > > > > > struct mtk_jpeg_dev *jpeg =
> > > > > > > platform_get_drvdata(pdev);
> > > > > > > -
> > > > > > > + cancel_delayed_work(&jpeg->job_timeout_work);
> > > > > > > pm_runtime_disable(&pdev->dev);
> > > > > > > video_unregister_device(jpeg->vdev);
> > > > > > > v4l2_m2m_release(jpeg->m2m_dev);
> > > > > > > --
> > > > > > > 2.25.1
> > > > > > >
> > > > >
> > > > >
> > > > >

2023-03-08 04:33:09

by Zheng Hacker

[permalink] [raw]
Subject: Re: [RESEND PATCH] media: mtk-jpeg: Fix use after free bug due to uncanceled work

Hi Kyrie,

Thanks for your quick response and detailed information. I understand
more about the design with your explanation. As you say, there
shouldn't be any issues with the normal flow. And if the attacker use
some tricks to enhance the race window and call remove function
directly as [1] mentioned, this might trigger uaf bug unexpectedly.

Besides, I found another code path which may call v4l2_m2m_cancel_job
and finally make the situation looks safe. The invoking chain is

mtk_jpeg_release
v4l2_m2m_ctx_release
v4l2_m2m_cancel_job

If the attacker have to call mtk_jpeg_release function before the
remove function, it might be harmless. But I'm not sure about that.

[1] https://www.usenix.org/system/files/sec21-lee-yoochan.pdf

Best regards,
Zheng

Kyrie Wu (吴晗) <[email protected]> 于2023年3月8日周三 11:32写道:
>
> On Wed, 2023-03-08 at 10:20 +0800, Zheng Hacker wrote:
> > Hi Kyrie,
> >
> > Thank you for your careful analysis and response. I still have some
> > areas that I don't quite understand and would like to ask for
> > clarification. That is, how do the function pointers for stop
> > streaming, initialized as mtk_jpeg_enc_stop_streaming and
> > mtk_jpeg_dec_stop_streaming, ensure that the worker is canceled? I
> > would greatly appreciate your response.
> >
> > Best regards,
> > Zheng
>
> Dear zheng,
>
> For stop streaming, what I mean is that stoppping jpeg decoding or
> encoding.
> Ok, let me introduce the sw flow of stop streaming:
> Firstly, the app will call v4l2_m2m_ioctl_streamoff, which will call
> v4l2_m2m_cancel_job, if it finds a job running(as you note, cpu1 is
> running), it will wait event, the event is wake up by
> v4l2_m2m_job_finish function. And v4l2_m2m_job_finish is called by jpeg
> dec/enc irq handler, which means that the waitting would result mtk hw
> to finish dec/enc, irq will occur and irq handler would cancel timeout
> worker. The follow is shown as blow.
> v4l2_m2m_ioctl_streamoff
> v4l2_m2m_cancel_job mtk_jpeg_enc_irq/mtk_jpeg_dec_irq
> wait evnet <------ wake up ------v4l2_m2m_job_finish
> cancel timeout work
>
> As mentioned above, if it is normal stop streaming action, there will
> be no happen that the timeout worker does not canceled.
>
> But if mtk_jpeg_remove is called directly without above flow, it would
> cause lots of issues.
>
> Regards,
> Kyrie.
> >
> > Kyrie Wu (吴晗) <[email protected]> 于2023年3月8日周三 10:02写道:
> > >
> > > On Tue, 2023-03-07 at 23:03 +0800, Zheng Hacker wrote:
> > > > The timer function was set in mtk_jpeg_probe with
> > > > mtk_jpeg_job_timeout_work function.
> > > > And the worker is started in mtk_jpeg_dec_device_run.
> > > > There are two functions (mtk_jpeg_enc_irq and mtk_jpeg_dec_irq)
> > > > which
> > > > may cancel the worker.
> > > > They are used as IRQ handler function which is saved as function
> > > > pointer in a variable.
> > > > In mtk_jpeg_probe, they are registered by devm_request_irq:
> > > >
> > > > ret = devm_request_irq(&pdev->dev,
> > > > jpeg_irq,
> > > > jpeg->variant->irq_handler,
> > > > 0,
> > > > pdev->name, jpeg);
> > > > if (ret) {
> > > > dev_err(&pdev->dev, "Failed to request jpeg_irq %d (%d)\n",
> > > > jpeg_irq, ret);
> > > > return ret;
> > > > }
> > > >
> > > > However, if we remove the module without triggering the irq, the
> > > > worker will never be removed.
> > > >
> > > > As for the schedule, mtk_jpeg_dec_device_run and
> > > > mtk_jpeg_enc_device_run will start the worker.
> > > > The schedule invoking is quite complicated. As far as I know, the
> > > > invoking chain is as follows:
> > > >
> > > > v4l2_m2m_init->v4l2_m2m_device_run_work->v4l2_m2m_try_run
> > > >
> > > > the v4l2_m2m_device_run_work function is also a worker which is
> > > > set
> > > > in
> > > > v4l2_m2m_init and started in
> > > > v4l2_m2m_schedule_next_job.
> > > >
> > > > Before calling remove function, the mtk_jpeg_release was invoked
> > > > to
> > > > release the related resource.
> > > >
> > > > v4l2_m2m_cancel_job will cancel the job by calling
> > > > m2m_dev->m2m_ops->job_abort(m2m_ctx->priv).
> > > >
> > > > But this will only cancel the current queue by
> > > > list_del(&m2m_dev->curr_ctx->queue);
> > > >
> > > > I think this can not cancel the posted task mentioned before. So
> > > > I
> > > > think if mtk_jpeg_job_timeout_work
> > > >
> > > > is working on, and using jpeg->m2m_dev after freeing it in
> > > > mtk_jpeg_remove, it will cause a UAF bug.
> > > >
> > > > static int mtk_jpeg_release(struct file *file)
> > > > {
> > > > struct mtk_jpeg_dev *jpeg = video_drvdata(file);
> > > > struct mtk_jpeg_ctx *ctx = mtk_jpeg_fh_to_ctx(file-
> > > > >private_data);
> > > >
> > > > mutex_lock(&jpeg->lock);
> > > > v4l2_ctrl_handler_free(&ctx->ctrl_hdl);
> > > > [1] v4l2_m2m_ctx_release(ctx->fh.m2m_ctx);
> > > > v4l2_fh_del(&ctx->fh);
> > > > v4l2_fh_exit(&ctx->fh);
> > > > kfree(ctx);
> > > > mutex_unlock(&jpeg->lock);
> > > > return 0;
> > > > }
> > > >
> > > > void v4l2_m2m_ctx_release(struct v4l2_m2m_ctx *m2m_ctx)
> > > > {
> > > > /* wait until the current context is dequeued from job_queue */
> > > > [2] v4l2_m2m_cancel_job(m2m_ctx);
> > > >
> > > > vb2_queue_release(&m2m_ctx->cap_q_ctx.q);
> > > > vb2_queue_release(&m2m_ctx->out_q_ctx.q);
> > > >
> > > > kfree(m2m_ctx);
> > > > }
> > > >
> > > > Note that all of this is static analysis, which may be false
> > > > positive.
> > > > Feel free to tell me if there is something I've missed.
> > > >
> > > > Regard,
> > > > Zheng
> > >
> > > Dear Zheng,
> > >
> > > You set up an application scenario:
> > > cpu1 is using the mtk-jpeg driver and timeout work has been
> > > scheduled.
> > > At the same time cpu0 wanted to remove the mtk-jpeg driver, which
> > > caused the UAF bug.
> > > I wonder if such an irrational application scenario could exist.
> > > This
> > > scenario, as you described, not only leads to the problems you
> > > mentioned, but also to output&capture memory leaks and unreleased
> > > resources, such as spinlock.
> > > Typically, if we want to remove the driver, we firstly do stop
> > > streaming, which ensures that the worker has been canceled.
> > > I agree with your changes from the perspective of strengthening the
> > > robustness of the driver code.
> > >
> > > Regards,
> > > Kyrie.
> > > >
> > > > Irui Wang (王瑞) <[email protected]> 于2023年3月7日周二 18:23写道:
> > > > >
> > > > > Dear Angelo and Zheng,
> > > > >
> > > > > Thanks for your patch and comments.
> > > > >
> > > > > Dear Kyrie,
> > > > >
> > > > > Please help to check this, thanks.
> > > > >
> > > > > Best Regards
> > > > >
> > > > > On Tue, 2023-03-07 at 10:49 +0100, AngeloGioacchino Del Regno
> > > > > wrote:
> > > > > > Il 07/03/23 10:27, Zheng Hacker ha scritto:
> > > > > > > Hi,
> > > > > > >
> > > > > > > Is there anyone who can help with this? I can provide more
> > > > > > > details
> > > > > > > like invoking chain if needed.
> > > > > > >
> > > > > >
> > > > > > Providing more details is always good. Please do.
> > > > > >
> > > > > > Meanwhile, adding Irui Wang to the loop: he's doing mtk-jpeg.
> > > > > >
> > > > > > Regards,
> > > > > > Angelo
> > > > > >
> > > > > > > Thanks,
> > > > > > > Zheng
> > > > > > >
> > > > > > > Zheng Wang <[email protected]> 于2023年3月6日周一 14:28写道:
> > > > > > > >
> > > > > > > > In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with
> > > > > > > > mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run
> > > > > > > > and mtk_jpeg_enc_device_run may be called to start the
> > > > > > > > work.
> > > > > > > > If we remove the module which will call mtk_jpeg_remove
> > > > > > > > to make cleanup, there may be a unfinished work. The
> > > > > > > > possible sequence is as follows, which will cause a
> > > > > > > > typical UAF bug.
> > > > > > > >
> > > > > > > > Fix it by canceling the work before cleanup in the
> > > > > > > > mtk_jpeg_remove
> > > > > > > >
> > > > > > > > CPU0 CPU1
> > > > > > > >
> > > > > > > > |mtk_jpeg_job_timeout_work
> > > > > > > > mtk_jpeg_remove |
> > > > > > > > v4l2_m2m_release |
> > > > > > > > kfree(m2m_dev); |
> > > > > > > > |
> > > > > > > > | v4l2_m2m_get_curr_priv
> > > > > > > > | m2m_dev->curr_ctx //use
> > > > > > > >
> > > > > > > > Signed-off-by: Zheng Wang <[email protected]>
> > > > > > > > ---
> > > > > > > > drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c |
> > > > > > > > 2 +-
> > > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > >
> > > > > > > > diff --git
> > > > > > > > a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> > > > > > > > b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> > > > > > > > index 969516a940ba..364513e7897e 100644
> > > > > > > > ---
> > > > > > > > a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> > > > > > > > +++
> > > > > > > > b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> > > > > > > > @@ -1793,7 +1793,7 @@ static int mtk_jpeg_probe(struct
> > > > > > > > platform_device *pdev)
> > > > > > > > static int mtk_jpeg_remove(struct platform_device
> > > > > > > > *pdev)
> > > > > > > > {
> > > > > > > > struct mtk_jpeg_dev *jpeg =
> > > > > > > > platform_get_drvdata(pdev);
> > > > > > > > -
> > > > > > > > + cancel_delayed_work(&jpeg->job_timeout_work);
> > > > > > > > pm_runtime_disable(&pdev->dev);
> > > > > > > > video_unregister_device(jpeg->vdev);
> > > > > > > > v4l2_m2m_release(jpeg->m2m_dev);
> > > > > > > > --
> > > > > > > > 2.25.1
> > > > > > > >
> > > > > >
> > > > > >
> > > > > >

2023-03-08 06:10:59

by Zheng Hacker

[permalink] [raw]
Subject: Re: [RESEND PATCH] media: mtk-jpeg: Fix use after free bug due to uncanceled work

Hi Kyrie,

After reviewing the code, I found anothe possible code path. As I am
not familiar with the module. It has high possibility it's wrong.
Could please help me check this? Very much appreciated for your
valuable time.

In summary, mtk_jpegdec_worker was set in mtk_jpeg_open and started in
mtk_jpeg_multicore_dec_device_run, which made it running on cpu1.
Inside the mtk_jpeg_multicore_dec_device_run, it will call
schedule_delayed_work to start the timeout_work, which will make it
running on cpu2. Meanwhile, we can call
mtk_jpeg_release to cancel the job. But there might be a race between
mtk_jpegdec_worker and v4l2_m2m_cancel_job. It may call
v4l2_m2m_job_finish too early to wake up the event.
The remove will go on, the other race is as described earlier.

cpu0 cpu1 cpu2
(1)->device_run
mtk_jpeg_multicore
_dec_device_run
queue_work
(jpeg->workqueue,
&ctx->jpeg_work);
(2)mtk_jpegdec_worker
(3)mtk_jpeg_release
v4l2_m2m_cancel_job
wait event

schedule_delayed_work
(4)mtk_jpeg_job_timeout_work
(5)v4l2_m2m_job_finish
wake up
(6)mtk_jpeg_remove
v4l2_m2m_release
kfree(m2m_dev)
(7)v4l2_m2m_get_curr_priv

Kyrie Wu (吴晗) <[email protected]> 于2023年3月8日周三 11:32写道:
>
> On Wed, 2023-03-08 at 10:20 +0800, Zheng Hacker wrote:
> > Hi Kyrie,
> >
> > Thank you for your careful analysis and response. I still have some
> > areas that I don't quite understand and would like to ask for
> > clarification. That is, how do the function pointers for stop
> > streaming, initialized as mtk_jpeg_enc_stop_streaming and
> > mtk_jpeg_dec_stop_streaming, ensure that the worker is canceled? I
> > would greatly appreciate your response.
> >
> > Best regards,
> > Zheng
>
> Dear zheng,
>
> For stop streaming, what I mean is that stoppping jpeg decoding or
> encoding.
> Ok, let me introduce the sw flow of stop streaming:
> Firstly, the app will call v4l2_m2m_ioctl_streamoff, which will call
> v4l2_m2m_cancel_job, if it finds a job running(as you note, cpu1 is
> running), it will wait event, the event is wake up by
> v4l2_m2m_job_finish function. And v4l2_m2m_job_finish is called by jpeg
> dec/enc irq handler, which means that the waitting would result mtk hw
> to finish dec/enc, irq will occur and irq handler would cancel timeout
> worker. The follow is shown as blow.
> v4l2_m2m_ioctl_streamoff
> v4l2_m2m_cancel_job mtk_jpeg_enc_irq/mtk_jpeg_dec_irq
> wait evnet <------ wake up ------v4l2_m2m_job_finish
> cancel timeout work
>
> As mentioned above, if it is normal stop streaming action, there will
> be no happen that the timeout worker does not canceled.
>
> But if mtk_jpeg_remove is called directly without above flow, it would
> cause lots of issues.
>
> Regards,
> Kyrie.
> >
> > Kyrie Wu (吴晗) <[email protected]> 于2023年3月8日周三 10:02写道:
> > >
> > > On Tue, 2023-03-07 at 23:03 +0800, Zheng Hacker wrote:
> > > > The timer function was set in mtk_jpeg_probe with
> > > > mtk_jpeg_job_timeout_work function.
> > > > And the worker is started in mtk_jpeg_dec_device_run.
> > > > There are two functions (mtk_jpeg_enc_irq and mtk_jpeg_dec_irq)
> > > > which
> > > > may cancel the worker.
> > > > They are used as IRQ handler function which is saved as function
> > > > pointer in a variable.
> > > > In mtk_jpeg_probe, they are registered by devm_request_irq:
> > > >
> > > > ret = devm_request_irq(&pdev->dev,
> > > > jpeg_irq,
> > > > jpeg->variant->irq_handler,
> > > > 0,
> > > > pdev->name, jpeg);
> > > > if (ret) {
> > > > dev_err(&pdev->dev, "Failed to request jpeg_irq %d (%d)\n",
> > > > jpeg_irq, ret);
> > > > return ret;
> > > > }
> > > >
> > > > However, if we remove the module without triggering the irq, the
> > > > worker will never be removed.
> > > >
> > > > As for the schedule, mtk_jpeg_dec_device_run and
> > > > mtk_jpeg_enc_device_run will start the worker.
> > > > The schedule invoking is quite complicated. As far as I know, the
> > > > invoking chain is as follows:
> > > >
> > > > v4l2_m2m_init->v4l2_m2m_device_run_work->v4l2_m2m_try_run
> > > >
> > > > the v4l2_m2m_device_run_work function is also a worker which is
> > > > set
> > > > in
> > > > v4l2_m2m_init and started in
> > > > v4l2_m2m_schedule_next_job.
> > > >
> > > > Before calling remove function, the mtk_jpeg_release was invoked
> > > > to
> > > > release the related resource.
> > > >
> > > > v4l2_m2m_cancel_job will cancel the job by calling
> > > > m2m_dev->m2m_ops->job_abort(m2m_ctx->priv).
> > > >
> > > > But this will only cancel the current queue by
> > > > list_del(&m2m_dev->curr_ctx->queue);
> > > >
> > > > I think this can not cancel the posted task mentioned before. So
> > > > I
> > > > think if mtk_jpeg_job_timeout_work
> > > >
> > > > is working on, and using jpeg->m2m_dev after freeing it in
> > > > mtk_jpeg_remove, it will cause a UAF bug.
> > > >
> > > > static int mtk_jpeg_release(struct file *file)
> > > > {
> > > > struct mtk_jpeg_dev *jpeg = video_drvdata(file);
> > > > struct mtk_jpeg_ctx *ctx = mtk_jpeg_fh_to_ctx(file-
> > > > >private_data);
> > > >
> > > > mutex_lock(&jpeg->lock);
> > > > v4l2_ctrl_handler_free(&ctx->ctrl_hdl);
> > > > [1] v4l2_m2m_ctx_release(ctx->fh.m2m_ctx);
> > > > v4l2_fh_del(&ctx->fh);
> > > > v4l2_fh_exit(&ctx->fh);
> > > > kfree(ctx);
> > > > mutex_unlock(&jpeg->lock);
> > > > return 0;
> > > > }
> > > >
> > > > void v4l2_m2m_ctx_release(struct v4l2_m2m_ctx *m2m_ctx)
> > > > {
> > > > /* wait until the current context is dequeued from job_queue */
> > > > [2] v4l2_m2m_cancel_job(m2m_ctx);
> > > >
> > > > vb2_queue_release(&m2m_ctx->cap_q_ctx.q);
> > > > vb2_queue_release(&m2m_ctx->out_q_ctx.q);
> > > >
> > > > kfree(m2m_ctx);
> > > > }
> > > >
> > > > Note that all of this is static analysis, which may be false
> > > > positive.
> > > > Feel free to tell me if there is something I've missed.
> > > >
> > > > Regard,
> > > > Zheng
> > >
> > > Dear Zheng,
> > >
> > > You set up an application scenario:
> > > cpu1 is using the mtk-jpeg driver and timeout work has been
> > > scheduled.
> > > At the same time cpu0 wanted to remove the mtk-jpeg driver, which
> > > caused the UAF bug.
> > > I wonder if such an irrational application scenario could exist.
> > > This
> > > scenario, as you described, not only leads to the problems you
> > > mentioned, but also to output&capture memory leaks and unreleased
> > > resources, such as spinlock.
> > > Typically, if we want to remove the driver, we firstly do stop
> > > streaming, which ensures that the worker has been canceled.
> > > I agree with your changes from the perspective of strengthening the
> > > robustness of the driver code.
> > >
> > > Regards,
> > > Kyrie.
> > > >
> > > > Irui Wang (王瑞) <[email protected]> 于2023年3月7日周二 18:23写道:
> > > > >
> > > > > Dear Angelo and Zheng,
> > > > >
> > > > > Thanks for your patch and comments.
> > > > >
> > > > > Dear Kyrie,
> > > > >
> > > > > Please help to check this, thanks.
> > > > >
> > > > > Best Regards
> > > > >
> > > > > On Tue, 2023-03-07 at 10:49 +0100, AngeloGioacchino Del Regno
> > > > > wrote:
> > > > > > Il 07/03/23 10:27, Zheng Hacker ha scritto:
> > > > > > > Hi,
> > > > > > >
> > > > > > > Is there anyone who can help with this? I can provide more
> > > > > > > details
> > > > > > > like invoking chain if needed.
> > > > > > >
> > > > > >
> > > > > > Providing more details is always good. Please do.
> > > > > >
> > > > > > Meanwhile, adding Irui Wang to the loop: he's doing mtk-jpeg.
> > > > > >
> > > > > > Regards,
> > > > > > Angelo
> > > > > >
> > > > > > > Thanks,
> > > > > > > Zheng
> > > > > > >
> > > > > > > Zheng Wang <[email protected]> 于2023年3月6日周一 14:28写道:
> > > > > > > >
> > > > > > > > In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with
> > > > > > > > mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run
> > > > > > > > and mtk_jpeg_enc_device_run may be called to start the
> > > > > > > > work.
> > > > > > > > If we remove the module which will call mtk_jpeg_remove
> > > > > > > > to make cleanup, there may be a unfinished work. The
> > > > > > > > possible sequence is as follows, which will cause a
> > > > > > > > typical UAF bug.
> > > > > > > >
> > > > > > > > Fix it by canceling the work before cleanup in the
> > > > > > > > mtk_jpeg_remove
> > > > > > > >
> > > > > > > > CPU0 CPU1
> > > > > > > >
> > > > > > > > |mtk_jpeg_job_timeout_work
> > > > > > > > mtk_jpeg_remove |
> > > > > > > > v4l2_m2m_release |
> > > > > > > > kfree(m2m_dev); |
> > > > > > > > |
> > > > > > > > | v4l2_m2m_get_curr_priv
> > > > > > > > | m2m_dev->curr_ctx //use
> > > > > > > >
> > > > > > > > Signed-off-by: Zheng Wang <[email protected]>
> > > > > > > > ---
> > > > > > > > drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c |
> > > > > > > > 2 +-
> > > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > >
> > > > > > > > diff --git
> > > > > > > > a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> > > > > > > > b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> > > > > > > > index 969516a940ba..364513e7897e 100644
> > > > > > > > ---
> > > > > > > > a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> > > > > > > > +++
> > > > > > > > b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> > > > > > > > @@ -1793,7 +1793,7 @@ static int mtk_jpeg_probe(struct
> > > > > > > > platform_device *pdev)
> > > > > > > > static int mtk_jpeg_remove(struct platform_device
> > > > > > > > *pdev)
> > > > > > > > {
> > > > > > > > struct mtk_jpeg_dev *jpeg =
> > > > > > > > platform_get_drvdata(pdev);
> > > > > > > > -
> > > > > > > > + cancel_delayed_work(&jpeg->job_timeout_work);
> > > > > > > > pm_runtime_disable(&pdev->dev);
> > > > > > > > video_unregister_device(jpeg->vdev);
> > > > > > > > v4l2_m2m_release(jpeg->m2m_dev);
> > > > > > > > --
> > > > > > > > 2.25.1
> > > > > > > >
> > > > > >
> > > > > >
> > > > > >

2023-03-08 07:39:33

by Kyrie Wu (吴晗)

[permalink] [raw]
Subject: Re: [RESEND PATCH] media: mtk-jpeg: Fix use after free bug due to uncanceled work

On Wed, 2023-03-08 at 14:10 +0800, Zheng Hacker wrote:
> Hi Kyrie,
>
> After reviewing the code, I found anothe possible code path. As I am
> not familiar with the module. It has high possibility it's wrong.
> Could please help me check this? Very much appreciated for your
> valuable time.
>
> In summary, mtk_jpegdec_worker was set in mtk_jpeg_open and started
> in
> mtk_jpeg_multicore_dec_device_run, which made it running on cpu1.
> Inside the mtk_jpeg_multicore_dec_device_run, it will call
> schedule_delayed_work to start the timeout_work, which will make it
> running on cpu2. Meanwhile, we can call
> mtk_jpeg_release to cancel the job. But there might be a race between
> mtk_jpegdec_worker and v4l2_m2m_cancel_job. It may call
> v4l2_m2m_job_finish too early to wake up the event.
> The remove will go on, the other race is as described earlier.
>
> cpu0 cpu1 cpu2
> (1)->device_run
> mtk_jpeg_multicore
> _dec_device_run
> queue_work
> (jpeg->workqueue,
> &ctx->jpeg_work);
> (2)mtk_jpegdec_worker
> (3)mtk_jpeg_release
> v4l2_m2m_cancel_job
> wait event
>
> schedule_delayed_work
> (4)mtk_jpeg_job_timeout_w
> ork
> (5)v4l2_m2m_job_finish
> wake up
> (6)mtk_jpeg_remove
> v4l2_m2m_release
> kfree(m2m_dev)
> (7)v4l2_m2m_get_curr_priv
>
Dear zheng,

The mtk_jpeg_multicore_dec_device_run function is used for multi-hw
jpeg decoding. Instead of scheduling mtk_jpeg_job_timeout_work,
mtk_jpegdec_worker is scheduled in this function.

The mtk_jpeg_dec_device_run function is used for single hw jpeg
decoding, which schedules mtk_jpeg_job_timeout_work.

A driver is either a single hw driver or a multi-hw driver and cannot
represent both at the same time.
mtk_jpeg_job_timeout_work and mtk_jpegdec_worker cannot be scheduled at
the same time.
So mtk_jpeg_job_timeout_work calls v4l2_m2m_job_finish would not cause
competition between the mtk_jpegdec_worker and v4l2_m2m_cancel_job.

Regards,
Kyrie.


> Kyrie Wu (吴晗) <[email protected]> 于2023年3月8日周三 11:32写道:
> >
> > On Wed, 2023-03-08 at 10:20 +0800, Zheng Hacker wrote:
> > > Hi Kyrie,
> > >
> > > Thank you for your careful analysis and response. I still have
> > > some
> > > areas that I don't quite understand and would like to ask for
> > > clarification. That is, how do the function pointers for stop
> > > streaming, initialized as mtk_jpeg_enc_stop_streaming and
> > > mtk_jpeg_dec_stop_streaming, ensure that the worker is canceled?
> > > I
> > > would greatly appreciate your response.
> > >
> > > Best regards,
> > > Zheng
> >
> > Dear zheng,
> >
> > For stop streaming, what I mean is that stoppping jpeg decoding or
> > encoding.
> > Ok, let me introduce the sw flow of stop streaming:
> > Firstly, the app will call v4l2_m2m_ioctl_streamoff, which will
> > call
> > v4l2_m2m_cancel_job, if it finds a job running(as you note, cpu1 is
> > running), it will wait event, the event is wake up by
> > v4l2_m2m_job_finish function. And v4l2_m2m_job_finish is called by
> > jpeg
> > dec/enc irq handler, which means that the waitting would result mtk
> > hw
> > to finish dec/enc, irq will occur and irq handler would cancel
> > timeout
> > worker. The follow is shown as blow.
> > v4l2_m2m_ioctl_streamoff
> > v4l2_m2m_cancel_job mtk_jpeg_enc_irq/mtk_jpeg_dec
> > _irq
> > wait evnet <------ wake up ------v4l2_m2m_job_finish
> > cancel timeout work
> >
> > As mentioned above, if it is normal stop streaming action, there
> > will
> > be no happen that the timeout worker does not canceled.
> >
> > But if mtk_jpeg_remove is called directly without above flow, it
> > would
> > cause lots of issues.
> >
> > Regards,
> > Kyrie.
> > >
> > > Kyrie Wu (吴晗) <[email protected]> 于2023年3月8日周三 10:02写道:
> > > >
> > > > On Tue, 2023-03-07 at 23:03 +0800, Zheng Hacker wrote:
> > > > > The timer function was set in mtk_jpeg_probe with
> > > > > mtk_jpeg_job_timeout_work function.
> > > > > And the worker is started in mtk_jpeg_dec_device_run.
> > > > > There are two functions (mtk_jpeg_enc_irq and
> > > > > mtk_jpeg_dec_irq)
> > > > > which
> > > > > may cancel the worker.
> > > > > They are used as IRQ handler function which is saved as
> > > > > function
> > > > > pointer in a variable.
> > > > > In mtk_jpeg_probe, they are registered by devm_request_irq:
> > > > >
> > > > > ret = devm_request_irq(&pdev->dev,
> > > > > jpeg_irq,
> > > > > jpeg->variant->irq_handler,
> > > > > 0,
> > > > > pdev->name, jpeg);
> > > > > if (ret) {
> > > > > dev_err(&pdev->dev, "Failed to request jpeg_irq %d
> > > > > (%d)\n",
> > > > > jpeg_irq, ret);
> > > > > return ret;
> > > > > }
> > > > >
> > > > > However, if we remove the module without triggering the irq,
> > > > > the
> > > > > worker will never be removed.
> > > > >
> > > > > As for the schedule, mtk_jpeg_dec_device_run and
> > > > > mtk_jpeg_enc_device_run will start the worker.
> > > > > The schedule invoking is quite complicated. As far as I know,
> > > > > the
> > > > > invoking chain is as follows:
> > > > >
> > > > > v4l2_m2m_init->v4l2_m2m_device_run_work->v4l2_m2m_try_run
> > > > >
> > > > > the v4l2_m2m_device_run_work function is also a worker which
> > > > > is
> > > > > set
> > > > > in
> > > > > v4l2_m2m_init and started in
> > > > > v4l2_m2m_schedule_next_job.
> > > > >
> > > > > Before calling remove function, the mtk_jpeg_release was
> > > > > invoked
> > > > > to
> > > > > release the related resource.
> > > > >
> > > > > v4l2_m2m_cancel_job will cancel the job by calling
> > > > > m2m_dev->m2m_ops->job_abort(m2m_ctx->priv).
> > > > >
> > > > > But this will only cancel the current queue by
> > > > > list_del(&m2m_dev->curr_ctx->queue);
> > > > >
> > > > > I think this can not cancel the posted task mentioned before.
> > > > > So
> > > > > I
> > > > > think if mtk_jpeg_job_timeout_work
> > > > >
> > > > > is working on, and using jpeg->m2m_dev after freeing it in
> > > > > mtk_jpeg_remove, it will cause a UAF bug.
> > > > >
> > > > > static int mtk_jpeg_release(struct file *file)
> > > > > {
> > > > > struct mtk_jpeg_dev *jpeg = video_drvdata(file);
> > > > > struct mtk_jpeg_ctx *ctx = mtk_jpeg_fh_to_ctx(file-
> > > > > > private_data);
> > > > >
> > > > > mutex_lock(&jpeg->lock);
> > > > > v4l2_ctrl_handler_free(&ctx->ctrl_hdl);
> > > > > [1] v4l2_m2m_ctx_release(ctx->fh.m2m_ctx);
> > > > > v4l2_fh_del(&ctx->fh);
> > > > > v4l2_fh_exit(&ctx->fh);
> > > > > kfree(ctx);
> > > > > mutex_unlock(&jpeg->lock);
> > > > > return 0;
> > > > > }
> > > > >
> > > > > void v4l2_m2m_ctx_release(struct v4l2_m2m_ctx *m2m_ctx)
> > > > > {
> > > > > /* wait until the current context is dequeued from
> > > > > job_queue */
> > > > > [2] v4l2_m2m_cancel_job(m2m_ctx);
> > > > >
> > > > > vb2_queue_release(&m2m_ctx->cap_q_ctx.q);
> > > > > vb2_queue_release(&m2m_ctx->out_q_ctx.q);
> > > > >
> > > > > kfree(m2m_ctx);
> > > > > }
> > > > >
> > > > > Note that all of this is static analysis, which may be false
> > > > > positive.
> > > > > Feel free to tell me if there is something I've missed.
> > > > >
> > > > > Regard,
> > > > > Zheng
> > > >
> > > > Dear Zheng,
> > > >
> > > > You set up an application scenario:
> > > > cpu1 is using the mtk-jpeg driver and timeout work has been
> > > > scheduled.
> > > > At the same time cpu0 wanted to remove the mtk-jpeg driver,
> > > > which
> > > > caused the UAF bug.
> > > > I wonder if such an irrational application scenario could
> > > > exist.
> > > > This
> > > > scenario, as you described, not only leads to the problems you
> > > > mentioned, but also to output&capture memory leaks and
> > > > unreleased
> > > > resources, such as spinlock.
> > > > Typically, if we want to remove the driver, we firstly do stop
> > > > streaming, which ensures that the worker has been canceled.
> > > > I agree with your changes from the perspective of strengthening
> > > > the
> > > > robustness of the driver code.
> > > >
> > > > Regards,
> > > > Kyrie.
> > > > >
> > > > > Irui Wang (王瑞) <[email protected]> 于2023年3月7日周二 18:23写道:
> > > > > >
> > > > > > Dear Angelo and Zheng,
> > > > > >
> > > > > > Thanks for your patch and comments.
> > > > > >
> > > > > > Dear Kyrie,
> > > > > >
> > > > > > Please help to check this, thanks.
> > > > > >
> > > > > > Best Regards
> > > > > >
> > > > > > On Tue, 2023-03-07 at 10:49 +0100, AngeloGioacchino Del
> > > > > > Regno
> > > > > > wrote:
> > > > > > > Il 07/03/23 10:27, Zheng Hacker ha scritto:
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > Is there anyone who can help with this? I can provide
> > > > > > > > more
> > > > > > > > details
> > > > > > > > like invoking chain if needed.
> > > > > > > >
> > > > > > >
> > > > > > > Providing more details is always good. Please do.
> > > > > > >
> > > > > > > Meanwhile, adding Irui Wang to the loop: he's doing mtk-
> > > > > > > jpeg.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Angelo
> > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Zheng
> > > > > > > >
> > > > > > > > Zheng Wang <[email protected]> 于2023年3月6日周一 14:28写道:
> > > > > > > > >
> > > > > > > > > In mtk_jpeg_probe, &jpeg->job_timeout_work is bound
> > > > > > > > > with
> > > > > > > > > mtk_jpeg_job_timeout_work. Then
> > > > > > > > > mtk_jpeg_dec_device_run
> > > > > > > > > and mtk_jpeg_enc_device_run may be called to start
> > > > > > > > > the
> > > > > > > > > work.
> > > > > > > > > If we remove the module which will call
> > > > > > > > > mtk_jpeg_remove
> > > > > > > > > to make cleanup, there may be a unfinished work. The
> > > > > > > > > possible sequence is as follows, which will cause a
> > > > > > > > > typical UAF bug.
> > > > > > > > >
> > > > > > > > > Fix it by canceling the work before cleanup in the
> > > > > > > > > mtk_jpeg_remove
> > > > > > > > >
> > > > > > > > > CPU0 CPU1
> > > > > > > > >
> > > > > > > > > |mtk_jpeg_job_timeout_work
> > > > > > > > > mtk_jpeg_remove |
> > > > > > > > > v4l2_m2m_release |
> > > > > > > > > kfree(m2m_dev); |
> > > > > > > > > |
> > > > > > > > > | v4l2_m2m_get_curr_priv
> > > > > > > > > | m2m_dev->curr_ctx //use
> > > > > > > > >
> > > > > > > > > Signed-off-by: Zheng Wang <[email protected]>
> > > > > > > > > ---
> > > > > > > > >
> > > > > > > > > drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> > > > > > > > > |
> > > > > > > > > 2 +-
> > > > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > > >
> > > > > > > > > diff --git
> > > > > > > > > a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.
> > > > > > > > > c
> > > > > > > > > b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.
> > > > > > > > > c
> > > > > > > > > index 969516a940ba..364513e7897e 100644
> > > > > > > > > ---
> > > > > > > > > a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.
> > > > > > > > > c
> > > > > > > > > +++
> > > > > > > > > b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.
> > > > > > > > > c
> > > > > > > > > @@ -1793,7 +1793,7 @@ static int
> > > > > > > > > mtk_jpeg_probe(struct
> > > > > > > > > platform_device *pdev)
> > > > > > > > > static int mtk_jpeg_remove(struct platform_device
> > > > > > > > > *pdev)
> > > > > > > > > {
> > > > > > > > > struct mtk_jpeg_dev *jpeg =
> > > > > > > > > platform_get_drvdata(pdev);
> > > > > > > > > -
> > > > > > > > > + cancel_delayed_work(&jpeg->job_timeout_work);
> > > > > > > > > pm_runtime_disable(&pdev->dev);
> > > > > > > > > video_unregister_device(jpeg->vdev);
> > > > > > > > > v4l2_m2m_release(jpeg->m2m_dev);
> > > > > > > > > --
> > > > > > > > > 2.25.1
> > > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >

2023-03-08 09:11:46

by Zheng Hacker

[permalink] [raw]
Subject: Re: [RESEND PATCH] media: mtk-jpeg: Fix use after free bug due to uncanceled work

Dear Kyrie,

Sorry for my misunderstanding. I've seen the timeout worker is bound
with mtk_jpegdec_timeout_work rather than mtk_jpeg_job_timeout_work.So
the competition won't happen.

Back to the beginning scene, I still don't know if it's a UAF issue or
not. The normal schedule is very strong to me. If we can call
mtk_jpeg_remove directly without calling mtk_jpeg_release,
The UAF is still possible. Otherwis, I think it's safe here.

Best regards,
Zheng

Kyrie Wu (吴晗) <[email protected]> 于2023年3月8日周三 15:39写道:
>
> On Wed, 2023-03-08 at 14:10 +0800, Zheng Hacker wrote:
> > Hi Kyrie,
> >
> > After reviewing the code, I found anothe possible code path. As I am
> > not familiar with the module. It has high possibility it's wrong.
> > Could please help me check this? Very much appreciated for your
> > valuable time.
> >
> > In summary, mtk_jpegdec_worker was set in mtk_jpeg_open and started
> > in
> > mtk_jpeg_multicore_dec_device_run, which made it running on cpu1.
> > Inside the mtk_jpeg_multicore_dec_device_run, it will call
> > schedule_delayed_work to start the timeout_work, which will make it
> > running on cpu2. Meanwhile, we can call
> > mtk_jpeg_release to cancel the job. But there might be a race between
> > mtk_jpegdec_worker and v4l2_m2m_cancel_job. It may call
> > v4l2_m2m_job_finish too early to wake up the event.
> > The remove will go on, the other race is as described earlier.
> >
> > cpu0 cpu1 cpu2
> > (1)->device_run
> > mtk_jpeg_multicore
> > _dec_device_run
> > queue_work
> > (jpeg->workqueue,
> > &ctx->jpeg_work);
> > (2)mtk_jpegdec_worker
> > (3)mtk_jpeg_release
> > v4l2_m2m_cancel_job
> > wait event
> >
> > schedule_delayed_work
> > (4)mtk_jpeg_job_timeout_w
> > ork
> > (5)v4l2_m2m_job_finish
> > wake up
> > (6)mtk_jpeg_remove
> > v4l2_m2m_release
> > kfree(m2m_dev)
> > (7)v4l2_m2m_get_curr_priv
> >
> Dear zheng,
>
> The mtk_jpeg_multicore_dec_device_run function is used for multi-hw
> jpeg decoding. Instead of scheduling mtk_jpeg_job_timeout_work,
> mtk_jpegdec_worker is scheduled in this function.
>
> The mtk_jpeg_dec_device_run function is used for single hw jpeg
> decoding, which schedules mtk_jpeg_job_timeout_work.
>
> A driver is either a single hw driver or a multi-hw driver and cannot
> represent both at the same time.
> mtk_jpeg_job_timeout_work and mtk_jpegdec_worker cannot be scheduled at
> the same time.
> So mtk_jpeg_job_timeout_work calls v4l2_m2m_job_finish would not cause
> competition between the mtk_jpegdec_worker and v4l2_m2m_cancel_job.
>
> Regards,
> Kyrie.
>
>
> > Kyrie Wu (吴晗) <[email protected]> 于2023年3月8日周三 11:32写道:
> > >
> > > On Wed, 2023-03-08 at 10:20 +0800, Zheng Hacker wrote:
> > > > Hi Kyrie,
> > > >
> > > > Thank you for your careful analysis and response. I still have
> > > > some
> > > > areas that I don't quite understand and would like to ask for
> > > > clarification. That is, how do the function pointers for stop
> > > > streaming, initialized as mtk_jpeg_enc_stop_streaming and
> > > > mtk_jpeg_dec_stop_streaming, ensure that the worker is canceled?
> > > > I
> > > > would greatly appreciate your response.
> > > >
> > > > Best regards,
> > > > Zheng
> > >
> > > Dear zheng,
> > >
> > > For stop streaming, what I mean is that stoppping jpeg decoding or
> > > encoding.
> > > Ok, let me introduce the sw flow of stop streaming:
> > > Firstly, the app will call v4l2_m2m_ioctl_streamoff, which will
> > > call
> > > v4l2_m2m_cancel_job, if it finds a job running(as you note, cpu1 is
> > > running), it will wait event, the event is wake up by
> > > v4l2_m2m_job_finish function. And v4l2_m2m_job_finish is called by
> > > jpeg
> > > dec/enc irq handler, which means that the waitting would result mtk
> > > hw
> > > to finish dec/enc, irq will occur and irq handler would cancel
> > > timeout
> > > worker. The follow is shown as blow.
> > > v4l2_m2m_ioctl_streamoff
> > > v4l2_m2m_cancel_job mtk_jpeg_enc_irq/mtk_jpeg_dec
> > > _irq
> > > wait evnet <------ wake up ------v4l2_m2m_job_finish
> > > cancel timeout work
> > >
> > > As mentioned above, if it is normal stop streaming action, there
> > > will
> > > be no happen that the timeout worker does not canceled.
> > >
> > > But if mtk_jpeg_remove is called directly without above flow, it
> > > would
> > > cause lots of issues.
> > >
> > > Regards,
> > > Kyrie.
> > > >
> > > > Kyrie Wu (吴晗) <[email protected]> 于2023年3月8日周三 10:02写道:
> > > > >
> > > > > On Tue, 2023-03-07 at 23:03 +0800, Zheng Hacker wrote:
> > > > > > The timer function was set in mtk_jpeg_probe with
> > > > > > mtk_jpeg_job_timeout_work function.
> > > > > > And the worker is started in mtk_jpeg_dec_device_run.
> > > > > > There are two functions (mtk_jpeg_enc_irq and
> > > > > > mtk_jpeg_dec_irq)
> > > > > > which
> > > > > > may cancel the worker.
> > > > > > They are used as IRQ handler function which is saved as
> > > > > > function
> > > > > > pointer in a variable.
> > > > > > In mtk_jpeg_probe, they are registered by devm_request_irq:
> > > > > >
> > > > > > ret = devm_request_irq(&pdev->dev,
> > > > > > jpeg_irq,
> > > > > > jpeg->variant->irq_handler,
> > > > > > 0,
> > > > > > pdev->name, jpeg);
> > > > > > if (ret) {
> > > > > > dev_err(&pdev->dev, "Failed to request jpeg_irq %d
> > > > > > (%d)\n",
> > > > > > jpeg_irq, ret);
> > > > > > return ret;
> > > > > > }
> > > > > >
> > > > > > However, if we remove the module without triggering the irq,
> > > > > > the
> > > > > > worker will never be removed.
> > > > > >
> > > > > > As for the schedule, mtk_jpeg_dec_device_run and
> > > > > > mtk_jpeg_enc_device_run will start the worker.
> > > > > > The schedule invoking is quite complicated. As far as I know,
> > > > > > the
> > > > > > invoking chain is as follows:
> > > > > >
> > > > > > v4l2_m2m_init->v4l2_m2m_device_run_work->v4l2_m2m_try_run
> > > > > >
> > > > > > the v4l2_m2m_device_run_work function is also a worker which
> > > > > > is
> > > > > > set
> > > > > > in
> > > > > > v4l2_m2m_init and started in
> > > > > > v4l2_m2m_schedule_next_job.
> > > > > >
> > > > > > Before calling remove function, the mtk_jpeg_release was
> > > > > > invoked
> > > > > > to
> > > > > > release the related resource.
> > > > > >
> > > > > > v4l2_m2m_cancel_job will cancel the job by calling
> > > > > > m2m_dev->m2m_ops->job_abort(m2m_ctx->priv).
> > > > > >
> > > > > > But this will only cancel the current queue by
> > > > > > list_del(&m2m_dev->curr_ctx->queue);
> > > > > >
> > > > > > I think this can not cancel the posted task mentioned before.
> > > > > > So
> > > > > > I
> > > > > > think if mtk_jpeg_job_timeout_work
> > > > > >
> > > > > > is working on, and using jpeg->m2m_dev after freeing it in
> > > > > > mtk_jpeg_remove, it will cause a UAF bug.
> > > > > >
> > > > > > static int mtk_jpeg_release(struct file *file)
> > > > > > {
> > > > > > struct mtk_jpeg_dev *jpeg = video_drvdata(file);
> > > > > > struct mtk_jpeg_ctx *ctx = mtk_jpeg_fh_to_ctx(file-
> > > > > > > private_data);
> > > > > >
> > > > > > mutex_lock(&jpeg->lock);
> > > > > > v4l2_ctrl_handler_free(&ctx->ctrl_hdl);
> > > > > > [1] v4l2_m2m_ctx_release(ctx->fh.m2m_ctx);
> > > > > > v4l2_fh_del(&ctx->fh);
> > > > > > v4l2_fh_exit(&ctx->fh);
> > > > > > kfree(ctx);
> > > > > > mutex_unlock(&jpeg->lock);
> > > > > > return 0;
> > > > > > }
> > > > > >
> > > > > > void v4l2_m2m_ctx_release(struct v4l2_m2m_ctx *m2m_ctx)
> > > > > > {
> > > > > > /* wait until the current context is dequeued from
> > > > > > job_queue */
> > > > > > [2] v4l2_m2m_cancel_job(m2m_ctx);
> > > > > >
> > > > > > vb2_queue_release(&m2m_ctx->cap_q_ctx.q);
> > > > > > vb2_queue_release(&m2m_ctx->out_q_ctx.q);
> > > > > >
> > > > > > kfree(m2m_ctx);
> > > > > > }
> > > > > >
> > > > > > Note that all of this is static analysis, which may be false
> > > > > > positive.
> > > > > > Feel free to tell me if there is something I've missed.
> > > > > >
> > > > > > Regard,
> > > > > > Zheng
> > > > >
> > > > > Dear Zheng,
> > > > >
> > > > > You set up an application scenario:
> > > > > cpu1 is using the mtk-jpeg driver and timeout work has been
> > > > > scheduled.
> > > > > At the same time cpu0 wanted to remove the mtk-jpeg driver,
> > > > > which
> > > > > caused the UAF bug.
> > > > > I wonder if such an irrational application scenario could
> > > > > exist.
> > > > > This
> > > > > scenario, as you described, not only leads to the problems you
> > > > > mentioned, but also to output&capture memory leaks and
> > > > > unreleased
> > > > > resources, such as spinlock.
> > > > > Typically, if we want to remove the driver, we firstly do stop
> > > > > streaming, which ensures that the worker has been canceled.
> > > > > I agree with your changes from the perspective of strengthening
> > > > > the
> > > > > robustness of the driver code.
> > > > >
> > > > > Regards,
> > > > > Kyrie.
> > > > > >
> > > > > > Irui Wang (王瑞) <[email protected]> 于2023年3月7日周二 18:23写道:
> > > > > > >
> > > > > > > Dear Angelo and Zheng,
> > > > > > >
> > > > > > > Thanks for your patch and comments.
> > > > > > >
> > > > > > > Dear Kyrie,
> > > > > > >
> > > > > > > Please help to check this, thanks.
> > > > > > >
> > > > > > > Best Regards
> > > > > > >
> > > > > > > On Tue, 2023-03-07 at 10:49 +0100, AngeloGioacchino Del
> > > > > > > Regno
> > > > > > > wrote:
> > > > > > > > Il 07/03/23 10:27, Zheng Hacker ha scritto:
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > Is there anyone who can help with this? I can provide
> > > > > > > > > more
> > > > > > > > > details
> > > > > > > > > like invoking chain if needed.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Providing more details is always good. Please do.
> > > > > > > >
> > > > > > > > Meanwhile, adding Irui Wang to the loop: he's doing mtk-
> > > > > > > > jpeg.
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Angelo
> > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Zheng
> > > > > > > > >
> > > > > > > > > Zheng Wang <[email protected]> 于2023年3月6日周一 14:28写道:
> > > > > > > > > >
> > > > > > > > > > In mtk_jpeg_probe, &jpeg->job_timeout_work is bound
> > > > > > > > > > with
> > > > > > > > > > mtk_jpeg_job_timeout_work. Then
> > > > > > > > > > mtk_jpeg_dec_device_run
> > > > > > > > > > and mtk_jpeg_enc_device_run may be called to start
> > > > > > > > > > the
> > > > > > > > > > work.
> > > > > > > > > > If we remove the module which will call
> > > > > > > > > > mtk_jpeg_remove
> > > > > > > > > > to make cleanup, there may be a unfinished work. The
> > > > > > > > > > possible sequence is as follows, which will cause a
> > > > > > > > > > typical UAF bug.
> > > > > > > > > >
> > > > > > > > > > Fix it by canceling the work before cleanup in the
> > > > > > > > > > mtk_jpeg_remove
> > > > > > > > > >
> > > > > > > > > > CPU0 CPU1
> > > > > > > > > >
> > > > > > > > > > |mtk_jpeg_job_timeout_work
> > > > > > > > > > mtk_jpeg_remove |
> > > > > > > > > > v4l2_m2m_release |
> > > > > > > > > > kfree(m2m_dev); |
> > > > > > > > > > |
> > > > > > > > > > | v4l2_m2m_get_curr_priv
> > > > > > > > > > | m2m_dev->curr_ctx //use
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Zheng Wang <[email protected]>
> > > > > > > > > > ---
> > > > > > > > > >
> > > > > > > > > > drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> > > > > > > > > > |
> > > > > > > > > > 2 +-
> > > > > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > > > >
> > > > > > > > > > diff --git
> > > > > > > > > > a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.
> > > > > > > > > > c
> > > > > > > > > > b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.
> > > > > > > > > > c
> > > > > > > > > > index 969516a940ba..364513e7897e 100644
> > > > > > > > > > ---
> > > > > > > > > > a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.
> > > > > > > > > > c
> > > > > > > > > > +++
> > > > > > > > > > b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.
> > > > > > > > > > c
> > > > > > > > > > @@ -1793,7 +1793,7 @@ static int
> > > > > > > > > > mtk_jpeg_probe(struct
> > > > > > > > > > platform_device *pdev)
> > > > > > > > > > static int mtk_jpeg_remove(struct platform_device
> > > > > > > > > > *pdev)
> > > > > > > > > > {
> > > > > > > > > > struct mtk_jpeg_dev *jpeg =
> > > > > > > > > > platform_get_drvdata(pdev);
> > > > > > > > > > -
> > > > > > > > > > + cancel_delayed_work(&jpeg->job_timeout_work);
> > > > > > > > > > pm_runtime_disable(&pdev->dev);
> > > > > > > > > > video_unregister_device(jpeg->vdev);
> > > > > > > > > > v4l2_m2m_release(jpeg->m2m_dev);
> > > > > > > > > > --
> > > > > > > > > > 2.25.1
> > > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >

2023-03-09 07:32:44

by Zheng Hacker

[permalink] [raw]
Subject: Re: [RESEND PATCH] media: mtk-jpeg: Fix use after free bug due to uncanceled work

Dear Kyrie,

Guenter gave some advice about this issue in [1]. As we are not
familiar with the code here, could you please see if the bug really
exists?

[1] https://lore.kernel.org/all/[email protected]/

Best regards,
Zheng



Zheng Hacker <[email protected]> 于2023年3月8日周三 17:11写道:
>
> Dear Kyrie,
>
> Sorry for my misunderstanding. I've seen the timeout worker is bound
> with mtk_jpegdec_timeout_work rather than mtk_jpeg_job_timeout_work.So
> the competition won't happen.
>
> Back to the beginning scene, I still don't know if it's a UAF issue or
> not. The normal schedule is very strong to me. If we can call
> mtk_jpeg_remove directly without calling mtk_jpeg_release,
> The UAF is still possible. Otherwis, I think it's safe here.
>
> Best regards,
> Zheng
>
> Kyrie Wu (吴晗) <[email protected]> 于2023年3月8日周三 15:39写道:
> >
> > On Wed, 2023-03-08 at 14:10 +0800, Zheng Hacker wrote:
> > > Hi Kyrie,
> > >
> > > After reviewing the code, I found anothe possible code path. As I am
> > > not familiar with the module. It has high possibility it's wrong.
> > > Could please help me check this? Very much appreciated for your
> > > valuable time.
> > >
> > > In summary, mtk_jpegdec_worker was set in mtk_jpeg_open and started
> > > in
> > > mtk_jpeg_multicore_dec_device_run, which made it running on cpu1.
> > > Inside the mtk_jpeg_multicore_dec_device_run, it will call
> > > schedule_delayed_work to start the timeout_work, which will make it
> > > running on cpu2. Meanwhile, we can call
> > > mtk_jpeg_release to cancel the job. But there might be a race between
> > > mtk_jpegdec_worker and v4l2_m2m_cancel_job. It may call
> > > v4l2_m2m_job_finish too early to wake up the event.
> > > The remove will go on, the other race is as described earlier.
> > >
> > > cpu0 cpu1 cpu2
> > > (1)->device_run
> > > mtk_jpeg_multicore
> > > _dec_device_run
> > > queue_work
> > > (jpeg->workqueue,
> > > &ctx->jpeg_work);
> > > (2)mtk_jpegdec_worker
> > > (3)mtk_jpeg_release
> > > v4l2_m2m_cancel_job
> > > wait event
> > >
> > > schedule_delayed_work
> > > (4)mtk_jpeg_job_timeout_w
> > > ork
> > > (5)v4l2_m2m_job_finish
> > > wake up
> > > (6)mtk_jpeg_remove
> > > v4l2_m2m_release
> > > kfree(m2m_dev)
> > > (7)v4l2_m2m_get_curr_priv
> > >
> > Dear zheng,
> >
> > The mtk_jpeg_multicore_dec_device_run function is used for multi-hw
> > jpeg decoding. Instead of scheduling mtk_jpeg_job_timeout_work,
> > mtk_jpegdec_worker is scheduled in this function.
> >
> > The mtk_jpeg_dec_device_run function is used for single hw jpeg
> > decoding, which schedules mtk_jpeg_job_timeout_work.
> >
> > A driver is either a single hw driver or a multi-hw driver and cannot
> > represent both at the same time.
> > mtk_jpeg_job_timeout_work and mtk_jpegdec_worker cannot be scheduled at
> > the same time.
> > So mtk_jpeg_job_timeout_work calls v4l2_m2m_job_finish would not cause
> > competition between the mtk_jpegdec_worker and v4l2_m2m_cancel_job.
> >
> > Regards,
> > Kyrie.
> >
> >
> > > Kyrie Wu (吴晗) <[email protected]> 于2023年3月8日周三 11:32写道:
> > > >
> > > > On Wed, 2023-03-08 at 10:20 +0800, Zheng Hacker wrote:
> > > > > Hi Kyrie,
> > > > >
> > > > > Thank you for your careful analysis and response. I still have
> > > > > some
> > > > > areas that I don't quite understand and would like to ask for
> > > > > clarification. That is, how do the function pointers for stop
> > > > > streaming, initialized as mtk_jpeg_enc_stop_streaming and
> > > > > mtk_jpeg_dec_stop_streaming, ensure that the worker is canceled?
> > > > > I
> > > > > would greatly appreciate your response.
> > > > >
> > > > > Best regards,
> > > > > Zheng
> > > >
> > > > Dear zheng,
> > > >
> > > > For stop streaming, what I mean is that stoppping jpeg decoding or
> > > > encoding.
> > > > Ok, let me introduce the sw flow of stop streaming:
> > > > Firstly, the app will call v4l2_m2m_ioctl_streamoff, which will
> > > > call
> > > > v4l2_m2m_cancel_job, if it finds a job running(as you note, cpu1 is
> > > > running), it will wait event, the event is wake up by
> > > > v4l2_m2m_job_finish function. And v4l2_m2m_job_finish is called by
> > > > jpeg
> > > > dec/enc irq handler, which means that the waitting would result mtk
> > > > hw
> > > > to finish dec/enc, irq will occur and irq handler would cancel
> > > > timeout
> > > > worker. The follow is shown as blow.
> > > > v4l2_m2m_ioctl_streamoff
> > > > v4l2_m2m_cancel_job mtk_jpeg_enc_irq/mtk_jpeg_dec
> > > > _irq
> > > > wait evnet <------ wake up ------v4l2_m2m_job_finish
> > > > cancel timeout work
> > > >
> > > > As mentioned above, if it is normal stop streaming action, there
> > > > will
> > > > be no happen that the timeout worker does not canceled.
> > > >
> > > > But if mtk_jpeg_remove is called directly without above flow, it
> > > > would
> > > > cause lots of issues.
> > > >
> > > > Regards,
> > > > Kyrie.
> > > > >
> > > > > Kyrie Wu (吴晗) <[email protected]> 于2023年3月8日周三 10:02写道:
> > > > > >
> > > > > > On Tue, 2023-03-07 at 23:03 +0800, Zheng Hacker wrote:
> > > > > > > The timer function was set in mtk_jpeg_probe with
> > > > > > > mtk_jpeg_job_timeout_work function.
> > > > > > > And the worker is started in mtk_jpeg_dec_device_run.
> > > > > > > There are two functions (mtk_jpeg_enc_irq and
> > > > > > > mtk_jpeg_dec_irq)
> > > > > > > which
> > > > > > > may cancel the worker.
> > > > > > > They are used as IRQ handler function which is saved as
> > > > > > > function
> > > > > > > pointer in a variable.
> > > > > > > In mtk_jpeg_probe, they are registered by devm_request_irq:
> > > > > > >
> > > > > > > ret = devm_request_irq(&pdev->dev,
> > > > > > > jpeg_irq,
> > > > > > > jpeg->variant->irq_handler,
> > > > > > > 0,
> > > > > > > pdev->name, jpeg);
> > > > > > > if (ret) {
> > > > > > > dev_err(&pdev->dev, "Failed to request jpeg_irq %d
> > > > > > > (%d)\n",
> > > > > > > jpeg_irq, ret);
> > > > > > > return ret;
> > > > > > > }
> > > > > > >
> > > > > > > However, if we remove the module without triggering the irq,
> > > > > > > the
> > > > > > > worker will never be removed.
> > > > > > >
> > > > > > > As for the schedule, mtk_jpeg_dec_device_run and
> > > > > > > mtk_jpeg_enc_device_run will start the worker.
> > > > > > > The schedule invoking is quite complicated. As far as I know,
> > > > > > > the
> > > > > > > invoking chain is as follows:
> > > > > > >
> > > > > > > v4l2_m2m_init->v4l2_m2m_device_run_work->v4l2_m2m_try_run
> > > > > > >
> > > > > > > the v4l2_m2m_device_run_work function is also a worker which
> > > > > > > is
> > > > > > > set
> > > > > > > in
> > > > > > > v4l2_m2m_init and started in
> > > > > > > v4l2_m2m_schedule_next_job.
> > > > > > >
> > > > > > > Before calling remove function, the mtk_jpeg_release was
> > > > > > > invoked
> > > > > > > to
> > > > > > > release the related resource.
> > > > > > >
> > > > > > > v4l2_m2m_cancel_job will cancel the job by calling
> > > > > > > m2m_dev->m2m_ops->job_abort(m2m_ctx->priv).
> > > > > > >
> > > > > > > But this will only cancel the current queue by
> > > > > > > list_del(&m2m_dev->curr_ctx->queue);
> > > > > > >
> > > > > > > I think this can not cancel the posted task mentioned before.
> > > > > > > So
> > > > > > > I
> > > > > > > think if mtk_jpeg_job_timeout_work
> > > > > > >
> > > > > > > is working on, and using jpeg->m2m_dev after freeing it in
> > > > > > > mtk_jpeg_remove, it will cause a UAF bug.
> > > > > > >
> > > > > > > static int mtk_jpeg_release(struct file *file)
> > > > > > > {
> > > > > > > struct mtk_jpeg_dev *jpeg = video_drvdata(file);
> > > > > > > struct mtk_jpeg_ctx *ctx = mtk_jpeg_fh_to_ctx(file-
> > > > > > > > private_data);
> > > > > > >
> > > > > > > mutex_lock(&jpeg->lock);
> > > > > > > v4l2_ctrl_handler_free(&ctx->ctrl_hdl);
> > > > > > > [1] v4l2_m2m_ctx_release(ctx->fh.m2m_ctx);
> > > > > > > v4l2_fh_del(&ctx->fh);
> > > > > > > v4l2_fh_exit(&ctx->fh);
> > > > > > > kfree(ctx);
> > > > > > > mutex_unlock(&jpeg->lock);
> > > > > > > return 0;
> > > > > > > }
> > > > > > >
> > > > > > > void v4l2_m2m_ctx_release(struct v4l2_m2m_ctx *m2m_ctx)
> > > > > > > {
> > > > > > > /* wait until the current context is dequeued from
> > > > > > > job_queue */
> > > > > > > [2] v4l2_m2m_cancel_job(m2m_ctx);
> > > > > > >
> > > > > > > vb2_queue_release(&m2m_ctx->cap_q_ctx.q);
> > > > > > > vb2_queue_release(&m2m_ctx->out_q_ctx.q);
> > > > > > >
> > > > > > > kfree(m2m_ctx);
> > > > > > > }
> > > > > > >
> > > > > > > Note that all of this is static analysis, which may be false
> > > > > > > positive.
> > > > > > > Feel free to tell me if there is something I've missed.
> > > > > > >
> > > > > > > Regard,
> > > > > > > Zheng
> > > > > >
> > > > > > Dear Zheng,
> > > > > >
> > > > > > You set up an application scenario:
> > > > > > cpu1 is using the mtk-jpeg driver and timeout work has been
> > > > > > scheduled.
> > > > > > At the same time cpu0 wanted to remove the mtk-jpeg driver,
> > > > > > which
> > > > > > caused the UAF bug.
> > > > > > I wonder if such an irrational application scenario could
> > > > > > exist.
> > > > > > This
> > > > > > scenario, as you described, not only leads to the problems you
> > > > > > mentioned, but also to output&capture memory leaks and
> > > > > > unreleased
> > > > > > resources, such as spinlock.
> > > > > > Typically, if we want to remove the driver, we firstly do stop
> > > > > > streaming, which ensures that the worker has been canceled.
> > > > > > I agree with your changes from the perspective of strengthening
> > > > > > the
> > > > > > robustness of the driver code.
> > > > > >
> > > > > > Regards,
> > > > > > Kyrie.
> > > > > > >
> > > > > > > Irui Wang (王瑞) <[email protected]> 于2023年3月7日周二 18:23写道:
> > > > > > > >
> > > > > > > > Dear Angelo and Zheng,
> > > > > > > >
> > > > > > > > Thanks for your patch and comments.
> > > > > > > >
> > > > > > > > Dear Kyrie,
> > > > > > > >
> > > > > > > > Please help to check this, thanks.
> > > > > > > >
> > > > > > > > Best Regards
> > > > > > > >
> > > > > > > > On Tue, 2023-03-07 at 10:49 +0100, AngeloGioacchino Del
> > > > > > > > Regno
> > > > > > > > wrote:
> > > > > > > > > Il 07/03/23 10:27, Zheng Hacker ha scritto:
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > Is there anyone who can help with this? I can provide
> > > > > > > > > > more
> > > > > > > > > > details
> > > > > > > > > > like invoking chain if needed.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Providing more details is always good. Please do.
> > > > > > > > >
> > > > > > > > > Meanwhile, adding Irui Wang to the loop: he's doing mtk-
> > > > > > > > > jpeg.
> > > > > > > > >
> > > > > > > > > Regards,
> > > > > > > > > Angelo
> > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Zheng
> > > > > > > > > >
> > > > > > > > > > Zheng Wang <[email protected]> 于2023年3月6日周一 14:28写道:
> > > > > > > > > > >
> > > > > > > > > > > In mtk_jpeg_probe, &jpeg->job_timeout_work is bound
> > > > > > > > > > > with
> > > > > > > > > > > mtk_jpeg_job_timeout_work. Then
> > > > > > > > > > > mtk_jpeg_dec_device_run
> > > > > > > > > > > and mtk_jpeg_enc_device_run may be called to start
> > > > > > > > > > > the
> > > > > > > > > > > work.
> > > > > > > > > > > If we remove the module which will call
> > > > > > > > > > > mtk_jpeg_remove
> > > > > > > > > > > to make cleanup, there may be a unfinished work. The
> > > > > > > > > > > possible sequence is as follows, which will cause a
> > > > > > > > > > > typical UAF bug.
> > > > > > > > > > >
> > > > > > > > > > > Fix it by canceling the work before cleanup in the
> > > > > > > > > > > mtk_jpeg_remove
> > > > > > > > > > >
> > > > > > > > > > > CPU0 CPU1
> > > > > > > > > > >
> > > > > > > > > > > |mtk_jpeg_job_timeout_work
> > > > > > > > > > > mtk_jpeg_remove |
> > > > > > > > > > > v4l2_m2m_release |
> > > > > > > > > > > kfree(m2m_dev); |
> > > > > > > > > > > |
> > > > > > > > > > > | v4l2_m2m_get_curr_priv
> > > > > > > > > > > | m2m_dev->curr_ctx //use
> > > > > > > > > > >
> > > > > > > > > > > Signed-off-by: Zheng Wang <[email protected]>
> > > > > > > > > > > ---
> > > > > > > > > > >
> > > > > > > > > > > drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> > > > > > > > > > > |
> > > > > > > > > > > 2 +-
> > > > > > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > > > > >
> > > > > > > > > > > diff --git
> > > > > > > > > > > a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.
> > > > > > > > > > > c
> > > > > > > > > > > b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.
> > > > > > > > > > > c
> > > > > > > > > > > index 969516a940ba..364513e7897e 100644
> > > > > > > > > > > ---
> > > > > > > > > > > a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.
> > > > > > > > > > > c
> > > > > > > > > > > +++
> > > > > > > > > > > b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.
> > > > > > > > > > > c
> > > > > > > > > > > @@ -1793,7 +1793,7 @@ static int
> > > > > > > > > > > mtk_jpeg_probe(struct
> > > > > > > > > > > platform_device *pdev)
> > > > > > > > > > > static int mtk_jpeg_remove(struct platform_device
> > > > > > > > > > > *pdev)
> > > > > > > > > > > {
> > > > > > > > > > > struct mtk_jpeg_dev *jpeg =
> > > > > > > > > > > platform_get_drvdata(pdev);
> > > > > > > > > > > -
> > > > > > > > > > > + cancel_delayed_work(&jpeg->job_timeout_work);
> > > > > > > > > > > pm_runtime_disable(&pdev->dev);
> > > > > > > > > > > video_unregister_device(jpeg->vdev);
> > > > > > > > > > > v4l2_m2m_release(jpeg->m2m_dev);
> > > > > > > > > > > --
> > > > > > > > > > > 2.25.1
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >

2023-03-14 02:50:08

by Kyrie Wu (吴晗)

[permalink] [raw]
Subject: Re: [RESEND PATCH] media: mtk-jpeg: Fix use after free bug due to uncanceled work

On Thu, 2023-03-09 at 15:31 +0800, Zheng Hacker wrote:
> Dear Kyrie,
>
> Guenter gave some advice about this issue in [1]. As we are not
> familiar with the code here, could you please see if the bug really
> exists?
>
> [1]
> https://lore.kernel.org/all/[email protected]/
>
> Best regards,
> Zheng
Dear zheng,

I have no more questions for your patch, thank you for your
contribution to mtk jpeg driver.

Regards,
Kyrie.
>
>
>
> Zheng Hacker <[email protected]> 于2023年3月8日周三 17:11写道:
> >
> > Dear Kyrie,
> >
> > Sorry for my misunderstanding. I've seen the timeout worker is
> > bound
> > with mtk_jpegdec_timeout_work rather than
> > mtk_jpeg_job_timeout_work.So
> > the competition won't happen.
> >
> > Back to the beginning scene, I still don't know if it's a UAF issue
> > or
> > not. The normal schedule is very strong to me. If we can call
> > mtk_jpeg_remove directly without calling mtk_jpeg_release,
> > The UAF is still possible. Otherwis, I think it's safe here.
> >
> > Best regards,
> > Zheng
> >
> > Kyrie Wu (吴晗) <[email protected]> 于2023年3月8日周三 15:39写道:
> > >
> > > On Wed, 2023-03-08 at 14:10 +0800, Zheng Hacker wrote:
> > > > Hi Kyrie,
> > > >
> > > > After reviewing the code, I found anothe possible code path. As
> > > > I am
> > > > not familiar with the module. It has high possibility it's
> > > > wrong.
> > > > Could please help me check this? Very much appreciated for your
> > > > valuable time.
> > > >
> > > > In summary, mtk_jpegdec_worker was set in mtk_jpeg_open and
> > > > started
> > > > in
> > > > mtk_jpeg_multicore_dec_device_run, which made it running on
> > > > cpu1.
> > > > Inside the mtk_jpeg_multicore_dec_device_run, it will call
> > > > schedule_delayed_work to start the timeout_work, which will
> > > > make it
> > > > running on cpu2. Meanwhile, we can call
> > > > mtk_jpeg_release to cancel the job. But there might be a race
> > > > between
> > > > mtk_jpegdec_worker and v4l2_m2m_cancel_job. It may call
> > > > v4l2_m2m_job_finish too early to wake up the event.
> > > > The remove will go on, the other race is as described earlier.
> > > >
> > > > cpu0 cpu1 cpu2
> > > > (1)->device_run
> > > > mtk_jpeg_multicore
> > > > _dec_device_run
> > > > queue_work
> > > > (jpeg->workqueue,
> > > > &ctx->jpeg_work);
> > > > (2)mtk_jpegdec_worker
> > > > (3)mtk_jpeg_release
> > > > v4l2_m2m_cancel_job
> > > > wait event
> > > >
> > > > schedule_delayed_work
> > > > (4)mtk_jpeg_job_tim
> > > > eout_w
> > > > ork
> > > > (5)v4l2_m2m_job_finish
> > > > wake up
> > > > (6)mtk_jpeg_remove
> > > > v4l2_m2m_release
> > > > kfree(m2m_dev)
> > > > (7)v4l2_m2m_get_cur
> > > > r_priv
> > > >
> > >
> > > Dear zheng,
> > >
> > > The mtk_jpeg_multicore_dec_device_run function is used for multi-
> > > hw
> > > jpeg decoding. Instead of scheduling mtk_jpeg_job_timeout_work,
> > > mtk_jpegdec_worker is scheduled in this function.
> > >
> > > The mtk_jpeg_dec_device_run function is used for single hw jpeg
> > > decoding, which schedules mtk_jpeg_job_timeout_work.
> > >
> > > A driver is either a single hw driver or a multi-hw driver and
> > > cannot
> > > represent both at the same time.
> > > mtk_jpeg_job_timeout_work and mtk_jpegdec_worker cannot be
> > > scheduled at
> > > the same time.
> > > So mtk_jpeg_job_timeout_work calls v4l2_m2m_job_finish would not
> > > cause
> > > competition between the mtk_jpegdec_worker and
> > > v4l2_m2m_cancel_job.
> > >
> > > Regards,
> > > Kyrie.
> > >
> > >
> > > > Kyrie Wu (吴晗) <[email protected]> 于2023年3月8日周三 11:32写道:
> > > > >
> > > > > On Wed, 2023-03-08 at 10:20 +0800, Zheng Hacker wrote:
> > > > > > Hi Kyrie,
> > > > > >
> > > > > > Thank you for your careful analysis and response. I still
> > > > > > have
> > > > > > some
> > > > > > areas that I don't quite understand and would like to ask
> > > > > > for
> > > > > > clarification. That is, how do the function pointers for
> > > > > > stop
> > > > > > streaming, initialized as mtk_jpeg_enc_stop_streaming and
> > > > > > mtk_jpeg_dec_stop_streaming, ensure that the worker is
> > > > > > canceled?
> > > > > > I
> > > > > > would greatly appreciate your response.
> > > > > >
> > > > > > Best regards,
> > > > > > Zheng
> > > > >
> > > > > Dear zheng,
> > > > >
> > > > > For stop streaming, what I mean is that stoppping jpeg
> > > > > decoding or
> > > > > encoding.
> > > > > Ok, let me introduce the sw flow of stop streaming:
> > > > > Firstly, the app will call v4l2_m2m_ioctl_streamoff, which
> > > > > will
> > > > > call
> > > > > v4l2_m2m_cancel_job, if it finds a job running(as you note,
> > > > > cpu1 is
> > > > > running), it will wait event, the event is wake up by
> > > > > v4l2_m2m_job_finish function. And v4l2_m2m_job_finish is
> > > > > called by
> > > > > jpeg
> > > > > dec/enc irq handler, which means that the waitting would
> > > > > result mtk
> > > > > hw
> > > > > to finish dec/enc, irq will occur and irq handler would
> > > > > cancel
> > > > > timeout
> > > > > worker. The follow is shown as blow.
> > > > > v4l2_m2m_ioctl_streamoff
> > > > > v4l2_m2m_cancel_job mtk_jpeg_enc_irq/mtk_jp
> > > > > eg_dec
> > > > > _irq
> > > > > wait evnet <------ wake up ------v4l2_m2m_job_finish
> > > > > cancel timeout work
> > > > >
> > > > > As mentioned above, if it is normal stop streaming action,
> > > > > there
> > > > > will
> > > > > be no happen that the timeout worker does not canceled.
> > > > >
> > > > > But if mtk_jpeg_remove is called directly without above flow,
> > > > > it
> > > > > would
> > > > > cause lots of issues.
> > > > >
> > > > > Regards,
> > > > > Kyrie.
> > > > > >
> > > > > > Kyrie Wu (吴晗) <[email protected]> 于2023年3月8日周三 10:02写道:
> > > > > > >
> > > > > > > On Tue, 2023-03-07 at 23:03 +0800, Zheng Hacker wrote:
> > > > > > > > The timer function was set in mtk_jpeg_probe with
> > > > > > > > mtk_jpeg_job_timeout_work function.
> > > > > > > > And the worker is started in mtk_jpeg_dec_device_run.
> > > > > > > > There are two functions (mtk_jpeg_enc_irq and
> > > > > > > > mtk_jpeg_dec_irq)
> > > > > > > > which
> > > > > > > > may cancel the worker.
> > > > > > > > They are used as IRQ handler function which is saved as
> > > > > > > > function
> > > > > > > > pointer in a variable.
> > > > > > > > In mtk_jpeg_probe, they are registered by
> > > > > > > > devm_request_irq:
> > > > > > > >
> > > > > > > > ret = devm_request_irq(&pdev->dev,
> > > > > > > > jpeg_irq,
> > > > > > > > jpeg->variant->irq_handler,
> > > > > > > > 0,
> > > > > > > > pdev->name, jpeg);
> > > > > > > > if (ret) {
> > > > > > > > dev_err(&pdev->dev, "Failed to request jpeg_irq
> > > > > > > > %d
> > > > > > > > (%d)\n",
> > > > > > > > jpeg_irq, ret);
> > > > > > > > return ret;
> > > > > > > > }
> > > > > > > >
> > > > > > > > However, if we remove the module without triggering the
> > > > > > > > irq,
> > > > > > > > the
> > > > > > > > worker will never be removed.
> > > > > > > >
> > > > > > > > As for the schedule, mtk_jpeg_dec_device_run and
> > > > > > > > mtk_jpeg_enc_device_run will start the worker.
> > > > > > > > The schedule invoking is quite complicated. As far as I
> > > > > > > > know,
> > > > > > > > the
> > > > > > > > invoking chain is as follows:
> > > > > > > >
> > > > > > > > v4l2_m2m_init->v4l2_m2m_device_run_work-
> > > > > > > > >v4l2_m2m_try_run
> > > > > > > >
> > > > > > > > the v4l2_m2m_device_run_work function is also a worker
> > > > > > > > which
> > > > > > > > is
> > > > > > > > set
> > > > > > > > in
> > > > > > > > v4l2_m2m_init and started in
> > > > > > > > v4l2_m2m_schedule_next_job.
> > > > > > > >
> > > > > > > > Before calling remove function, the mtk_jpeg_release
> > > > > > > > was
> > > > > > > > invoked
> > > > > > > > to
> > > > > > > > release the related resource.
> > > > > > > >
> > > > > > > > v4l2_m2m_cancel_job will cancel the job by calling
> > > > > > > > m2m_dev->m2m_ops->job_abort(m2m_ctx->priv).
> > > > > > > >
> > > > > > > > But this will only cancel the current queue by
> > > > > > > > list_del(&m2m_dev->curr_ctx->queue);
> > > > > > > >
> > > > > > > > I think this can not cancel the posted task mentioned
> > > > > > > > before.
> > > > > > > > So
> > > > > > > > I
> > > > > > > > think if mtk_jpeg_job_timeout_work
> > > > > > > >
> > > > > > > > is working on, and using jpeg->m2m_dev after freeing it
> > > > > > > > in
> > > > > > > > mtk_jpeg_remove, it will cause a UAF bug.
> > > > > > > >
> > > > > > > > static int mtk_jpeg_release(struct file *file)
> > > > > > > > {
> > > > > > > > struct mtk_jpeg_dev *jpeg = video_drvdata(file);
> > > > > > > > struct mtk_jpeg_ctx *ctx = mtk_jpeg_fh_to_ctx(file-
> > > > > > > > > private_data);
> > > > > > > >
> > > > > > > > mutex_lock(&jpeg->lock);
> > > > > > > > v4l2_ctrl_handler_free(&ctx->ctrl_hdl);
> > > > > > > > [1] v4l2_m2m_ctx_release(ctx->fh.m2m_ctx);
> > > > > > > > v4l2_fh_del(&ctx->fh);
> > > > > > > > v4l2_fh_exit(&ctx->fh);
> > > > > > > > kfree(ctx);
> > > > > > > > mutex_unlock(&jpeg->lock);
> > > > > > > > return 0;
> > > > > > > > }
> > > > > > > >
> > > > > > > > void v4l2_m2m_ctx_release(struct v4l2_m2m_ctx *m2m_ctx)
> > > > > > > > {
> > > > > > > > /* wait until the current context is dequeued from
> > > > > > > > job_queue */
> > > > > > > > [2] v4l2_m2m_cancel_job(m2m_ctx);
> > > > > > > >
> > > > > > > > vb2_queue_release(&m2m_ctx->cap_q_ctx.q);
> > > > > > > > vb2_queue_release(&m2m_ctx->out_q_ctx.q);
> > > > > > > >
> > > > > > > > kfree(m2m_ctx);
> > > > > > > > }
> > > > > > > >
> > > > > > > > Note that all of this is static analysis, which may be
> > > > > > > > false
> > > > > > > > positive.
> > > > > > > > Feel free to tell me if there is something I've missed.
> > > > > > > >
> > > > > > > > Regard,
> > > > > > > > Zheng
> > > > > > >
> > > > > > > Dear Zheng,
> > > > > > >
> > > > > > > You set up an application scenario:
> > > > > > > cpu1 is using the mtk-jpeg driver and timeout work has
> > > > > > > been
> > > > > > > scheduled.
> > > > > > > At the same time cpu0 wanted to remove the mtk-jpeg
> > > > > > > driver,
> > > > > > > which
> > > > > > > caused the UAF bug.
> > > > > > > I wonder if such an irrational application scenario could
> > > > > > > exist.
> > > > > > > This
> > > > > > > scenario, as you described, not only leads to the
> > > > > > > problems you
> > > > > > > mentioned, but also to output&capture memory leaks and
> > > > > > > unreleased
> > > > > > > resources, such as spinlock.
> > > > > > > Typically, if we want to remove the driver, we firstly do
> > > > > > > stop
> > > > > > > streaming, which ensures that the worker has been
> > > > > > > canceled.
> > > > > > > I agree with your changes from the perspective of
> > > > > > > strengthening
> > > > > > > the
> > > > > > > robustness of the driver code.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Kyrie.
> > > > > > > >
> > > > > > > > Irui Wang (王瑞) <[email protected]> 于2023年3月7日周二
> > > > > > > > 18:23写道:
> > > > > > > > >
> > > > > > > > > Dear Angelo and Zheng,
> > > > > > > > >
> > > > > > > > > Thanks for your patch and comments.
> > > > > > > > >
> > > > > > > > > Dear Kyrie,
> > > > > > > > >
> > > > > > > > > Please help to check this, thanks.
> > > > > > > > >
> > > > > > > > > Best Regards
> > > > > > > > >
> > > > > > > > > On Tue, 2023-03-07 at 10:49 +0100, AngeloGioacchino
> > > > > > > > > Del
> > > > > > > > > Regno
> > > > > > > > > wrote:
> > > > > > > > > > Il 07/03/23 10:27, Zheng Hacker ha scritto:
> > > > > > > > > > > Hi,
> > > > > > > > > > >
> > > > > > > > > > > Is there anyone who can help with this? I can
> > > > > > > > > > > provide
> > > > > > > > > > > more
> > > > > > > > > > > details
> > > > > > > > > > > like invoking chain if needed.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Providing more details is always good. Please do.
> > > > > > > > > >
> > > > > > > > > > Meanwhile, adding Irui Wang to the loop: he's doing
> > > > > > > > > > mtk-
> > > > > > > > > > jpeg.
> > > > > > > > > >
> > > > > > > > > > Regards,
> > > > > > > > > > Angelo
> > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Zheng
> > > > > > > > > > >
> > > > > > > > > > > Zheng Wang <[email protected]> 于2023年3月6日周一
> > > > > > > > > > > 14:28写道:
> > > > > > > > > > > >
> > > > > > > > > > > > In mtk_jpeg_probe, &jpeg->job_timeout_work is
> > > > > > > > > > > > bound
> > > > > > > > > > > > with
> > > > > > > > > > > > mtk_jpeg_job_timeout_work. Then
> > > > > > > > > > > > mtk_jpeg_dec_device_run
> > > > > > > > > > > > and mtk_jpeg_enc_device_run may be called to
> > > > > > > > > > > > start
> > > > > > > > > > > > the
> > > > > > > > > > > > work.
> > > > > > > > > > > > If we remove the module which will call
> > > > > > > > > > > > mtk_jpeg_remove
> > > > > > > > > > > > to make cleanup, there may be a unfinished
> > > > > > > > > > > > work. The
> > > > > > > > > > > > possible sequence is as follows, which will
> > > > > > > > > > > > cause a
> > > > > > > > > > > > typical UAF bug.
> > > > > > > > > > > >
> > > > > > > > > > > > Fix it by canceling the work before cleanup in
> > > > > > > > > > > > the
> > > > > > > > > > > > mtk_jpeg_remove
> > > > > > > > > > > >
> > > > > > > > > > > > CPU0 CPU1
> > > > > > > > > > > >
> > > > > > > > > > > > |mtk_jpeg_job_timeout_work
> > > > > > > > > > > > mtk_jpeg_remove |
> > > > > > > > > > > > v4l2_m2m_release |
> > > > > > > > > > > > kfree(m2m_dev); |
> > > > > > > > > > > > |
> > > > > > > > > > > > | v4l2_m2m_get_curr_priv
> > > > > > > > > > > > | m2m_dev->curr_ctx
> > > > > > > > > > > > //use
> > > > > > > > > > > >
> > > > > > > > > > > > Signed-off-by: Zheng Wang <[email protected]>
> > > > > > > > > > > > ---
> > > > > > > > > > > >
> > > > > > > > > > > > drivers/media/platform/mediatek/jpeg/mtk_jpeg_c
> > > > > > > > > > > > ore.c
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > 2 +-
> > > > > > > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > > > > > >
> > > > > > > > > > > > diff --git
> > > > > > > > > > > > a/drivers/media/platform/mediatek/jpeg/mtk_jpeg
> > > > > > > > > > > > _core.
> > > > > > > > > > > > c
> > > > > > > > > > > > b/drivers/media/platform/mediatek/jpeg/mtk_jpeg
> > > > > > > > > > > > _core.
> > > > > > > > > > > > c
> > > > > > > > > > > > index 969516a940ba..364513e7897e 100644
> > > > > > > > > > > > ---
> > > > > > > > > > > > a/drivers/media/platform/mediatek/jpeg/mtk_jpeg
> > > > > > > > > > > > _core.
> > > > > > > > > > > > c
> > > > > > > > > > > > +++
> > > > > > > > > > > > b/drivers/media/platform/mediatek/jpeg/mtk_jpeg
> > > > > > > > > > > > _core.
> > > > > > > > > > > > c
> > > > > > > > > > > > @@ -1793,7 +1793,7 @@ static int
> > > > > > > > > > > > mtk_jpeg_probe(struct
> > > > > > > > > > > > platform_device *pdev)
> > > > > > > > > > > > static int mtk_jpeg_remove(struct
> > > > > > > > > > > > platform_device
> > > > > > > > > > > > *pdev)
> > > > > > > > > > > > {
> > > > > > > > > > > > struct mtk_jpeg_dev *jpeg =
> > > > > > > > > > > > platform_get_drvdata(pdev);
> > > > > > > > > > > > -
> > > > > > > > > > > > + cancel_delayed_work(&jpeg-
> > > > > > > > > > > > >job_timeout_work);
> > > > > > > > > > > > pm_runtime_disable(&pdev->dev);
> > > > > > > > > > > > video_unregister_device(jpeg->vdev);
> > > > > > > > > > > > v4l2_m2m_release(jpeg->m2m_dev);
> > > > > > > > > > > > --
> > > > > > > > > > > > 2.25.1
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >

2023-03-14 03:38:25

by Zheng Hacker

[permalink] [raw]
Subject: Re: [RESEND PATCH] media: mtk-jpeg: Fix use after free bug due to uncanceled work

Kyrie Wu (吴晗) <[email protected]> 于2023年3月14日周二 10:49写道:
>
> On Thu, 2023-03-09 at 15:31 +0800, Zheng Hacker wrote:
> > Dear Kyrie,
> >
> > Guenter gave some advice about this issue in [1]. As we are not
> > familiar with the code here, could you please see if the bug really
> > exists?
> >
> > [1]
> > https://lore.kernel.org/all/[email protected]/
> >
> > Best regards,
> > Zheng
> Dear zheng,
>
> I have no more questions for your patch, thank you for your
> contribution to mtk jpeg driver.
>
Thanks for your detailed review and explanation about the code. I have
learned a lot
from it. Have a nice day :)

Best regards,
Zheng

> >
> >
> >
> > Zheng Hacker <[email protected]> 于2023年3月8日周三 17:11写道:
> > >
> > > Dear Kyrie,
> > >
> > > Sorry for my misunderstanding. I've seen the timeout worker is
> > > bound
> > > with mtk_jpegdec_timeout_work rather than
> > > mtk_jpeg_job_timeout_work.So
> > > the competition won't happen.
> > >
> > > Back to the beginning scene, I still don't know if it's a UAF issue
> > > or
> > > not. The normal schedule is very strong to me. If we can call
> > > mtk_jpeg_remove directly without calling mtk_jpeg_release,
> > > The UAF is still possible. Otherwis, I think it's safe here.
> > >
> > > Best regards,
> > > Zheng
> > >
> > > Kyrie Wu (吴晗) <[email protected]> 于2023年3月8日周三 15:39写道:
> > > >
> > > > On Wed, 2023-03-08 at 14:10 +0800, Zheng Hacker wrote:
> > > > > Hi Kyrie,
> > > > >
> > > > > After reviewing the code, I found anothe possible code path. As
> > > > > I am
> > > > > not familiar with the module. It has high possibility it's
> > > > > wrong.
> > > > > Could please help me check this? Very much appreciated for your
> > > > > valuable time.
> > > > >
> > > > > In summary, mtk_jpegdec_worker was set in mtk_jpeg_open and
> > > > > started
> > > > > in
> > > > > mtk_jpeg_multicore_dec_device_run, which made it running on
> > > > > cpu1.
> > > > > Inside the mtk_jpeg_multicore_dec_device_run, it will call
> > > > > schedule_delayed_work to start the timeout_work, which will
> > > > > make it
> > > > > running on cpu2. Meanwhile, we can call
> > > > > mtk_jpeg_release to cancel the job. But there might be a race
> > > > > between
> > > > > mtk_jpegdec_worker and v4l2_m2m_cancel_job. It may call
> > > > > v4l2_m2m_job_finish too early to wake up the event.
> > > > > The remove will go on, the other race is as described earlier.
> > > > >
> > > > > cpu0 cpu1 cpu2
> > > > > (1)->device_run
> > > > > mtk_jpeg_multicore
> > > > > _dec_device_run
> > > > > queue_work
> > > > > (jpeg->workqueue,
> > > > > &ctx->jpeg_work);
> > > > > (2)mtk_jpegdec_worker
> > > > > (3)mtk_jpeg_release
> > > > > v4l2_m2m_cancel_job
> > > > > wait event
> > > > >
> > > > > schedule_delayed_work
> > > > > (4)mtk_jpeg_job_tim
> > > > > eout_w
> > > > > ork
> > > > > (5)v4l2_m2m_job_finish
> > > > > wake up
> > > > > (6)mtk_jpeg_remove
> > > > > v4l2_m2m_release
> > > > > kfree(m2m_dev)
> > > > > (7)v4l2_m2m_get_cur
> > > > > r_priv
> > > > >
> > > >
> > > > Dear zheng,
> > > >
> > > > The mtk_jpeg_multicore_dec_device_run function is used for multi-
> > > > hw
> > > > jpeg decoding. Instead of scheduling mtk_jpeg_job_timeout_work,
> > > > mtk_jpegdec_worker is scheduled in this function.
> > > >
> > > > The mtk_jpeg_dec_device_run function is used for single hw jpeg
> > > > decoding, which schedules mtk_jpeg_job_timeout_work.
> > > >
> > > > A driver is either a single hw driver or a multi-hw driver and
> > > > cannot
> > > > represent both at the same time.
> > > > mtk_jpeg_job_timeout_work and mtk_jpegdec_worker cannot be
> > > > scheduled at
> > > > the same time.
> > > > So mtk_jpeg_job_timeout_work calls v4l2_m2m_job_finish would not
> > > > cause
> > > > competition between the mtk_jpegdec_worker and
> > > > v4l2_m2m_cancel_job.
> > > >
> > > > Regards,
> > > > Kyrie.
> > > >
> > > >
> > > > > Kyrie Wu (吴晗) <[email protected]> 于2023年3月8日周三 11:32写道:
> > > > > >
> > > > > > On Wed, 2023-03-08 at 10:20 +0800, Zheng Hacker wrote:
> > > > > > > Hi Kyrie,
> > > > > > >
> > > > > > > Thank you for your careful analysis and response. I still
> > > > > > > have
> > > > > > > some
> > > > > > > areas that I don't quite understand and would like to ask
> > > > > > > for
> > > > > > > clarification. That is, how do the function pointers for
> > > > > > > stop
> > > > > > > streaming, initialized as mtk_jpeg_enc_stop_streaming and
> > > > > > > mtk_jpeg_dec_stop_streaming, ensure that the worker is
> > > > > > > canceled?
> > > > > > > I
> > > > > > > would greatly appreciate your response.
> > > > > > >
> > > > > > > Best regards,
> > > > > > > Zheng
> > > > > >
> > > > > > Dear zheng,
> > > > > >
> > > > > > For stop streaming, what I mean is that stoppping jpeg
> > > > > > decoding or
> > > > > > encoding.
> > > > > > Ok, let me introduce the sw flow of stop streaming:
> > > > > > Firstly, the app will call v4l2_m2m_ioctl_streamoff, which
> > > > > > will
> > > > > > call
> > > > > > v4l2_m2m_cancel_job, if it finds a job running(as you note,
> > > > > > cpu1 is
> > > > > > running), it will wait event, the event is wake up by
> > > > > > v4l2_m2m_job_finish function. And v4l2_m2m_job_finish is
> > > > > > called by
> > > > > > jpeg
> > > > > > dec/enc irq handler, which means that the waitting would
> > > > > > result mtk
> > > > > > hw
> > > > > > to finish dec/enc, irq will occur and irq handler would
> > > > > > cancel
> > > > > > timeout
> > > > > > worker. The follow is shown as blow.
> > > > > > v4l2_m2m_ioctl_streamoff
> > > > > > v4l2_m2m_cancel_job mtk_jpeg_enc_irq/mtk_jp
> > > > > > eg_dec
> > > > > > _irq
> > > > > > wait evnet <------ wake up ------v4l2_m2m_job_finish
> > > > > > cancel timeout work
> > > > > >
> > > > > > As mentioned above, if it is normal stop streaming action,
> > > > > > there
> > > > > > will
> > > > > > be no happen that the timeout worker does not canceled.
> > > > > >
> > > > > > But if mtk_jpeg_remove is called directly without above flow,
> > > > > > it
> > > > > > would
> > > > > > cause lots of issues.
> > > > > >
> > > > > > Regards,
> > > > > > Kyrie.
> > > > > > >
> > > > > > > Kyrie Wu (吴晗) <[email protected]> 于2023年3月8日周三 10:02写道:
> > > > > > > >
> > > > > > > > On Tue, 2023-03-07 at 23:03 +0800, Zheng Hacker wrote:
> > > > > > > > > The timer function was set in mtk_jpeg_probe with
> > > > > > > > > mtk_jpeg_job_timeout_work function.
> > > > > > > > > And the worker is started in mtk_jpeg_dec_device_run.
> > > > > > > > > There are two functions (mtk_jpeg_enc_irq and
> > > > > > > > > mtk_jpeg_dec_irq)
> > > > > > > > > which
> > > > > > > > > may cancel the worker.
> > > > > > > > > They are used as IRQ handler function which is saved as
> > > > > > > > > function
> > > > > > > > > pointer in a variable.
> > > > > > > > > In mtk_jpeg_probe, they are registered by
> > > > > > > > > devm_request_irq:
> > > > > > > > >
> > > > > > > > > ret = devm_request_irq(&pdev->dev,
> > > > > > > > > jpeg_irq,
> > > > > > > > > jpeg->variant->irq_handler,
> > > > > > > > > 0,
> > > > > > > > > pdev->name, jpeg);
> > > > > > > > > if (ret) {
> > > > > > > > > dev_err(&pdev->dev, "Failed to request jpeg_irq
> > > > > > > > > %d
> > > > > > > > > (%d)\n",
> > > > > > > > > jpeg_irq, ret);
> > > > > > > > > return ret;
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > However, if we remove the module without triggering the
> > > > > > > > > irq,
> > > > > > > > > the
> > > > > > > > > worker will never be removed.
> > > > > > > > >
> > > > > > > > > As for the schedule, mtk_jpeg_dec_device_run and
> > > > > > > > > mtk_jpeg_enc_device_run will start the worker.
> > > > > > > > > The schedule invoking is quite complicated. As far as I
> > > > > > > > > know,
> > > > > > > > > the
> > > > > > > > > invoking chain is as follows:
> > > > > > > > >
> > > > > > > > > v4l2_m2m_init->v4l2_m2m_device_run_work-
> > > > > > > > > >v4l2_m2m_try_run
> > > > > > > > >
> > > > > > > > > the v4l2_m2m_device_run_work function is also a worker
> > > > > > > > > which
> > > > > > > > > is
> > > > > > > > > set
> > > > > > > > > in
> > > > > > > > > v4l2_m2m_init and started in
> > > > > > > > > v4l2_m2m_schedule_next_job.
> > > > > > > > >
> > > > > > > > > Before calling remove function, the mtk_jpeg_release
> > > > > > > > > was
> > > > > > > > > invoked
> > > > > > > > > to
> > > > > > > > > release the related resource.
> > > > > > > > >
> > > > > > > > > v4l2_m2m_cancel_job will cancel the job by calling
> > > > > > > > > m2m_dev->m2m_ops->job_abort(m2m_ctx->priv).
> > > > > > > > >
> > > > > > > > > But this will only cancel the current queue by
> > > > > > > > > list_del(&m2m_dev->curr_ctx->queue);
> > > > > > > > >
> > > > > > > > > I think this can not cancel the posted task mentioned
> > > > > > > > > before.
> > > > > > > > > So
> > > > > > > > > I
> > > > > > > > > think if mtk_jpeg_job_timeout_work
> > > > > > > > >
> > > > > > > > > is working on, and using jpeg->m2m_dev after freeing it
> > > > > > > > > in
> > > > > > > > > mtk_jpeg_remove, it will cause a UAF bug.
> > > > > > > > >
> > > > > > > > > static int mtk_jpeg_release(struct file *file)
> > > > > > > > > {
> > > > > > > > > struct mtk_jpeg_dev *jpeg = video_drvdata(file);
> > > > > > > > > struct mtk_jpeg_ctx *ctx = mtk_jpeg_fh_to_ctx(file-
> > > > > > > > > > private_data);
> > > > > > > > >
> > > > > > > > > mutex_lock(&jpeg->lock);
> > > > > > > > > v4l2_ctrl_handler_free(&ctx->ctrl_hdl);
> > > > > > > > > [1] v4l2_m2m_ctx_release(ctx->fh.m2m_ctx);
> > > > > > > > > v4l2_fh_del(&ctx->fh);
> > > > > > > > > v4l2_fh_exit(&ctx->fh);
> > > > > > > > > kfree(ctx);
> > > > > > > > > mutex_unlock(&jpeg->lock);
> > > > > > > > > return 0;
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > void v4l2_m2m_ctx_release(struct v4l2_m2m_ctx *m2m_ctx)
> > > > > > > > > {
> > > > > > > > > /* wait until the current context is dequeued from
> > > > > > > > > job_queue */
> > > > > > > > > [2] v4l2_m2m_cancel_job(m2m_ctx);
> > > > > > > > >
> > > > > > > > > vb2_queue_release(&m2m_ctx->cap_q_ctx.q);
> > > > > > > > > vb2_queue_release(&m2m_ctx->out_q_ctx.q);
> > > > > > > > >
> > > > > > > > > kfree(m2m_ctx);
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > Note that all of this is static analysis, which may be
> > > > > > > > > false
> > > > > > > > > positive.
> > > > > > > > > Feel free to tell me if there is something I've missed.
> > > > > > > > >
> > > > > > > > > Regard,
> > > > > > > > > Zheng
> > > > > > > >
> > > > > > > > Dear Zheng,
> > > > > > > >
> > > > > > > > You set up an application scenario:
> > > > > > > > cpu1 is using the mtk-jpeg driver and timeout work has
> > > > > > > > been
> > > > > > > > scheduled.
> > > > > > > > At the same time cpu0 wanted to remove the mtk-jpeg
> > > > > > > > driver,
> > > > > > > > which
> > > > > > > > caused the UAF bug.
> > > > > > > > I wonder if such an irrational application scenario could
> > > > > > > > exist.
> > > > > > > > This
> > > > > > > > scenario, as you described, not only leads to the
> > > > > > > > problems you
> > > > > > > > mentioned, but also to output&capture memory leaks and
> > > > > > > > unreleased
> > > > > > > > resources, such as spinlock.
> > > > > > > > Typically, if we want to remove the driver, we firstly do
> > > > > > > > stop
> > > > > > > > streaming, which ensures that the worker has been
> > > > > > > > canceled.
> > > > > > > > I agree with your changes from the perspective of
> > > > > > > > strengthening
> > > > > > > > the
> > > > > > > > robustness of the driver code.
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Kyrie.
> > > > > > > > >
> > > > > > > > > Irui Wang (王瑞) <[email protected]> 于2023年3月7日周二
> > > > > > > > > 18:23写道:
> > > > > > > > > >
> > > > > > > > > > Dear Angelo and Zheng,
> > > > > > > > > >
> > > > > > > > > > Thanks for your patch and comments.
> > > > > > > > > >
> > > > > > > > > > Dear Kyrie,
> > > > > > > > > >
> > > > > > > > > > Please help to check this, thanks.
> > > > > > > > > >
> > > > > > > > > > Best Regards
> > > > > > > > > >
> > > > > > > > > > On Tue, 2023-03-07 at 10:49 +0100, AngeloGioacchino
> > > > > > > > > > Del
> > > > > > > > > > Regno
> > > > > > > > > > wrote:
> > > > > > > > > > > Il 07/03/23 10:27, Zheng Hacker ha scritto:
> > > > > > > > > > > > Hi,
> > > > > > > > > > > >
> > > > > > > > > > > > Is there anyone who can help with this? I can
> > > > > > > > > > > > provide
> > > > > > > > > > > > more
> > > > > > > > > > > > details
> > > > > > > > > > > > like invoking chain if needed.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Providing more details is always good. Please do.
> > > > > > > > > > >
> > > > > > > > > > > Meanwhile, adding Irui Wang to the loop: he's doing
> > > > > > > > > > > mtk-
> > > > > > > > > > > jpeg.
> > > > > > > > > > >
> > > > > > > > > > > Regards,
> > > > > > > > > > > Angelo
> > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > Zheng
> > > > > > > > > > > >
> > > > > > > > > > > > Zheng Wang <[email protected]> 于2023年3月6日周一
> > > > > > > > > > > > 14:28写道:
> > > > > > > > > > > > >
> > > > > > > > > > > > > In mtk_jpeg_probe, &jpeg->job_timeout_work is
> > > > > > > > > > > > > bound
> > > > > > > > > > > > > with
> > > > > > > > > > > > > mtk_jpeg_job_timeout_work. Then
> > > > > > > > > > > > > mtk_jpeg_dec_device_run
> > > > > > > > > > > > > and mtk_jpeg_enc_device_run may be called to
> > > > > > > > > > > > > start
> > > > > > > > > > > > > the
> > > > > > > > > > > > > work.
> > > > > > > > > > > > > If we remove the module which will call
> > > > > > > > > > > > > mtk_jpeg_remove
> > > > > > > > > > > > > to make cleanup, there may be a unfinished
> > > > > > > > > > > > > work. The
> > > > > > > > > > > > > possible sequence is as follows, which will
> > > > > > > > > > > > > cause a
> > > > > > > > > > > > > typical UAF bug.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Fix it by canceling the work before cleanup in
> > > > > > > > > > > > > the
> > > > > > > > > > > > > mtk_jpeg_remove
> > > > > > > > > > > > >
> > > > > > > > > > > > > CPU0 CPU1
> > > > > > > > > > > > >
> > > > > > > > > > > > > |mtk_jpeg_job_timeout_work
> > > > > > > > > > > > > mtk_jpeg_remove |
> > > > > > > > > > > > > v4l2_m2m_release |
> > > > > > > > > > > > > kfree(m2m_dev); |
> > > > > > > > > > > > > |
> > > > > > > > > > > > > | v4l2_m2m_get_curr_priv
> > > > > > > > > > > > > | m2m_dev->curr_ctx
> > > > > > > > > > > > > //use
> > > > > > > > > > > > >
> > > > > > > > > > > > > Signed-off-by: Zheng Wang <[email protected]>
> > > > > > > > > > > > > ---
> > > > > > > > > > > > >
> > > > > > > > > > > > > drivers/media/platform/mediatek/jpeg/mtk_jpeg_c
> > > > > > > > > > > > > ore.c
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > 2 +-
> > > > > > > > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > > > > > > >
> > > > > > > > > > > > > diff --git
> > > > > > > > > > > > > a/drivers/media/platform/mediatek/jpeg/mtk_jpeg
> > > > > > > > > > > > > _core.
> > > > > > > > > > > > > c
> > > > > > > > > > > > > b/drivers/media/platform/mediatek/jpeg/mtk_jpeg
> > > > > > > > > > > > > _core.
> > > > > > > > > > > > > c
> > > > > > > > > > > > > index 969516a940ba..364513e7897e 100644
> > > > > > > > > > > > > ---
> > > > > > > > > > > > > a/drivers/media/platform/mediatek/jpeg/mtk_jpeg
> > > > > > > > > > > > > _core.
> > > > > > > > > > > > > c
> > > > > > > > > > > > > +++
> > > > > > > > > > > > > b/drivers/media/platform/mediatek/jpeg/mtk_jpeg
> > > > > > > > > > > > > _core.
> > > > > > > > > > > > > c
> > > > > > > > > > > > > @@ -1793,7 +1793,7 @@ static int
> > > > > > > > > > > > > mtk_jpeg_probe(struct
> > > > > > > > > > > > > platform_device *pdev)
> > > > > > > > > > > > > static int mtk_jpeg_remove(struct
> > > > > > > > > > > > > platform_device
> > > > > > > > > > > > > *pdev)
> > > > > > > > > > > > > {
> > > > > > > > > > > > > struct mtk_jpeg_dev *jpeg =
> > > > > > > > > > > > > platform_get_drvdata(pdev);
> > > > > > > > > > > > > -
> > > > > > > > > > > > > + cancel_delayed_work(&jpeg-
> > > > > > > > > > > > > >job_timeout_work);
> > > > > > > > > > > > > pm_runtime_disable(&pdev->dev);
> > > > > > > > > > > > > video_unregister_device(jpeg->vdev);
> > > > > > > > > > > > > v4l2_m2m_release(jpeg->m2m_dev);
> > > > > > > > > > > > > --
> > > > > > > > > > > > > 2.25.1
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >

2023-05-12 01:58:07

by Kyrie Wu (吴晗)

[permalink] [raw]
Subject: Re: [RESEND PATCH] media: mtk-jpeg: Fix use after free bug due to uncanceled work

On Tue, 2023-03-14 at 11:37 +0800, Zheng Hacker wrote:
> Kyrie Wu (吴晗) <[email protected]> 于2023年3月14日周二 10:49写道:
> >
> > On Thu, 2023-03-09 at 15:31 +0800, Zheng Hacker wrote:
> > > Dear Kyrie,
> > >
> > > Guenter gave some advice about this issue in [1]. As we are not
> > > familiar with the code here, could you please see if the bug
> > > really
> > > exists?
> > >
> > > [1]
> > >
https://lore.kernel.org/all/[email protected]/
> > >
> > > Best regards,
> > > Zheng
> >
> > Dear zheng,
> >
> > I have no more questions for your patch, thank you for your
> > contribution to mtk jpeg driver.
> >
>
> Thanks for your detailed review and explanation about the code. I
> have
> learned a lot
> from it. Have a nice day :)
>
> Best regards,
> Zheng
Dear Zheng,

I think that cancel_delayed_work, added in remove function, should
change to cancel_delayed_work_sync to solve UAF. The corresponding
operation flow is shown as follow:
CPU0 CPU1
|mtk_jpeg_job_timeout_work
mtk_jpeg_remove |
cancel_delayed_work |
v4l2_m2m_release |
kfree(m2m_dev); |
|
| v4l2_m2m_get_curr_priv
| m2m_dev->curr_ctx //use

If cancel_delayed_work is called in async mode like above, and the
m2m_dev is freed. But it is used in mtk_jpeg_job_timeout_work, this
calling may cause UAF.

Thanks.
>
> > >
> > >
> > >
> > > Zheng Hacker <[email protected]> 于2023年3月8日周三 17:11写道:
> > > >
> > > > Dear Kyrie,
> > > >
> > > > Sorry for my misunderstanding. I've seen the timeout worker is
> > > > bound
> > > > with mtk_jpegdec_timeout_work rather than
> > > > mtk_jpeg_job_timeout_work.So
> > > > the competition won't happen.
> > > >
> > > > Back to the beginning scene, I still don't know if it's a UAF
> > > > issue
> > > > or
> > > > not. The normal schedule is very strong to me. If we can call
> > > > mtk_jpeg_remove directly without calling mtk_jpeg_release,
> > > > The UAF is still possible. Otherwis, I think it's safe here.
> > > >
> > > > Best regards,
> > > > Zheng
> > > >
> > > > Kyrie Wu (吴晗) <[email protected]> 于2023年3月8日周三 15:39写道:
> > > > >
> > > > > On Wed, 2023-03-08 at 14:10 +0800, Zheng Hacker wrote:
> > > > > > Hi Kyrie,
> > > > > >
> > > > > > After reviewing the code, I found anothe possible code
> > > > > > path. As
> > > > > > I am
> > > > > > not familiar with the module. It has high possibility it's
> > > > > > wrong.
> > > > > > Could please help me check this? Very much appreciated for
> > > > > > your
> > > > > > valuable time.
> > > > > >
> > > > > > In summary, mtk_jpegdec_worker was set in mtk_jpeg_open and
> > > > > > started
> > > > > > in
> > > > > > mtk_jpeg_multicore_dec_device_run, which made it running on
> > > > > > cpu1.
> > > > > > Inside the mtk_jpeg_multicore_dec_device_run, it will call
> > > > > > schedule_delayed_work to start the timeout_work, which
> > > > > > will
> > > > > > make it
> > > > > > running on cpu2. Meanwhile, we can call
> > > > > > mtk_jpeg_release to cancel the job. But there might be a
> > > > > > race
> > > > > > between
> > > > > > mtk_jpegdec_worker and v4l2_m2m_cancel_job. It may call
> > > > > > v4l2_m2m_job_finish too early to wake up the event.
> > > > > > The remove will go on, the other race is as described
> > > > > > earlier.
> > > > > >
> > > > > > cpu0 cpu1 cpu2
> > > > > > (1)->device_run
> > > > > > mtk_jpeg_multicore
> > > > > > _dec_device_run
> > > > > > queue_work
> > > > > > (jpeg->workqueue,
> > > > > > &ctx->jpeg_work);
> > > > > > (2)mtk_jpegdec_worker
> > > > > > (3)mtk_jpeg_release
> > > > > > v4l2_m2m_cancel_job
> > > > > > wait event
> > > > > >
> > > > > > schedule_delayed_work
> > > > > > (4)mtk_jpeg_job
> > > > > > _tim
> > > > > > eout_w
> > > > > > ork
> > > > > > (5)v4l2_m2m_job_finish
> > > > > > wake up
> > > > > > (6)mtk_jpeg_remove
> > > > > > v4l2_m2m_release
> > > > > > kfree(m2m_dev)
> > > > > > (7)v4l2_m2m_get
> > > > > > _cur
> > > > > > r_priv
> > > > > >
> > > > >
> > > > > Dear zheng,
> > > > >
> > > > > The mtk_jpeg_multicore_dec_device_run function is used for
> > > > > multi-
> > > > > hw
> > > > > jpeg decoding. Instead of scheduling
> > > > > mtk_jpeg_job_timeout_work,
> > > > > mtk_jpegdec_worker is scheduled in this function.
> > > > >
> > > > > The mtk_jpeg_dec_device_run function is used for single hw
> > > > > jpeg
> > > > > decoding, which schedules mtk_jpeg_job_timeout_work.
> > > > >
> > > > > A driver is either a single hw driver or a multi-hw driver
> > > > > and
> > > > > cannot
> > > > > represent both at the same time.
> > > > > mtk_jpeg_job_timeout_work and mtk_jpegdec_worker cannot be
> > > > > scheduled at
> > > > > the same time.
> > > > > So mtk_jpeg_job_timeout_work calls v4l2_m2m_job_finish would
> > > > > not
> > > > > cause
> > > > > competition between the mtk_jpegdec_worker and
> > > > > v4l2_m2m_cancel_job.
> > > > >
> > > > > Regards,
> > > > > Kyrie.
> > > > >
> > > > >
> > > > > > Kyrie Wu (吴晗) <[email protected]> 于2023年3月8日周三 11:32写道:
> > > > > > >
> > > > > > > On Wed, 2023-03-08 at 10:20 +0800, Zheng Hacker wrote:
> > > > > > > > Hi Kyrie,
> > > > > > > >
> > > > > > > > Thank you for your careful analysis and response. I
> > > > > > > > still
> > > > > > > > have
> > > > > > > > some
> > > > > > > > areas that I don't quite understand and would like to
> > > > > > > > ask
> > > > > > > > for
> > > > > > > > clarification. That is, how do the function pointers
> > > > > > > > for
> > > > > > > > stop
> > > > > > > > streaming, initialized as mtk_jpeg_enc_stop_streaming
> > > > > > > > and
> > > > > > > > mtk_jpeg_dec_stop_streaming, ensure that the worker is
> > > > > > > > canceled?
> > > > > > > > I
> > > > > > > > would greatly appreciate your response.
> > > > > > > >
> > > > > > > > Best regards,
> > > > > > > > Zheng
> > > > > > >
> > > > > > > Dear zheng,
> > > > > > >
> > > > > > > For stop streaming, what I mean is that stoppping jpeg
> > > > > > > decoding or
> > > > > > > encoding.
> > > > > > > Ok, let me introduce the sw flow of stop streaming:
> > > > > > > Firstly, the app will call v4l2_m2m_ioctl_streamoff,
> > > > > > > which
> > > > > > > will
> > > > > > > call
> > > > > > > v4l2_m2m_cancel_job, if it finds a job running(as you
> > > > > > > note,
> > > > > > > cpu1 is
> > > > > > > running), it will wait event, the event is wake up by
> > > > > > > v4l2_m2m_job_finish function. And v4l2_m2m_job_finish is
> > > > > > > called by
> > > > > > > jpeg
> > > > > > > dec/enc irq handler, which means that the waitting would
> > > > > > > result mtk
> > > > > > > hw
> > > > > > > to finish dec/enc, irq will occur and irq handler would
> > > > > > > cancel
> > > > > > > timeout
> > > > > > > worker. The follow is shown as blow.
> > > > > > > v4l2_m2m_ioctl_streamoff
> > > > > > > v4l2_m2m_cancel_job mtk_jpeg_enc_irq/mt
> > > > > > > k_jp
> > > > > > > eg_dec
> > > > > > > _irq
> > > > > > > wait evnet <------ wake up ------
> > > > > > > v4l2_m2m_job_finish
> > > > > > > cancel timeout
> > > > > > > work
> > > > > > >
> > > > > > > As mentioned above, if it is normal stop streaming
> > > > > > > action,
> > > > > > > there
> > > > > > > will
> > > > > > > be no happen that the timeout worker does not canceled.
> > > > > > >
> > > > > > > But if mtk_jpeg_remove is called directly without above
> > > > > > > flow,
> > > > > > > it
> > > > > > > would
> > > > > > > cause lots of issues.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Kyrie.
> > > > > > > >
> > > > > > > > Kyrie Wu (吴晗) <[email protected]> 于2023年3月8日周三
> > > > > > > > 10:02写道:
> > > > > > > > >
> > > > > > > > > On Tue, 2023-03-07 at 23:03 +0800, Zheng Hacker
> > > > > > > > > wrote:
> > > > > > > > > > The timer function was set in mtk_jpeg_probe with
> > > > > > > > > > mtk_jpeg_job_timeout_work function.
> > > > > > > > > > And the worker is started in
> > > > > > > > > > mtk_jpeg_dec_device_run.
> > > > > > > > > > There are two functions (mtk_jpeg_enc_irq and
> > > > > > > > > > mtk_jpeg_dec_irq)
> > > > > > > > > > which
> > > > > > > > > > may cancel the worker.
> > > > > > > > > > They are used as IRQ handler function which is
> > > > > > > > > > saved as
> > > > > > > > > > function
> > > > > > > > > > pointer in a variable.
> > > > > > > > > > In mtk_jpeg_probe, they are registered by
> > > > > > > > > > devm_request_irq:
> > > > > > > > > >
> > > > > > > > > > ret = devm_request_irq(&pdev->dev,
> > > > > > > > > > jpeg_irq,
> > > > > > > > > > jpeg->variant->irq_handler,
> > > > > > > > > > 0,
> > > > > > > > > > pdev->name, jpeg);
> > > > > > > > > > if (ret) {
> > > > > > > > > > dev_err(&pdev->dev, "Failed to request
> > > > > > > > > > jpeg_irq
> > > > > > > > > > %d
> > > > > > > > > > (%d)\n",
> > > > > > > > > > jpeg_irq, ret);
> > > > > > > > > > return ret;
> > > > > > > > > > }
> > > > > > > > > >
> > > > > > > > > > However, if we remove the module without triggering
> > > > > > > > > > the
> > > > > > > > > > irq,
> > > > > > > > > > the
> > > > > > > > > > worker will never be removed.
> > > > > > > > > >
> > > > > > > > > > As for the schedule, mtk_jpeg_dec_device_run and
> > > > > > > > > > mtk_jpeg_enc_device_run will start the worker.
> > > > > > > > > > The schedule invoking is quite complicated. As far
> > > > > > > > > > as I
> > > > > > > > > > know,
> > > > > > > > > > the
> > > > > > > > > > invoking chain is as follows:
> > > > > > > > > >
> > > > > > > > > > v4l2_m2m_init->v4l2_m2m_device_run_work-
> > > > > > > > > > > v4l2_m2m_try_run
> > > > > > > > > >
> > > > > > > > > > the v4l2_m2m_device_run_work function is also a
> > > > > > > > > > worker
> > > > > > > > > > which
> > > > > > > > > > is
> > > > > > > > > > set
> > > > > > > > > > in
> > > > > > > > > > v4l2_m2m_init and started in
> > > > > > > > > > v4l2_m2m_schedule_next_job.
> > > > > > > > > >
> > > > > > > > > > Before calling remove function,
> > > > > > > > > > the mtk_jpeg_release
> > > > > > > > > > was
> > > > > > > > > > invoked
> > > > > > > > > > to
> > > > > > > > > > release the related resource.
> > > > > > > > > >
> > > > > > > > > > v4l2_m2m_cancel_job will cancel the job by calling
> > > > > > > > > > m2m_dev->m2m_ops->job_abort(m2m_ctx->priv).
> > > > > > > > > >
> > > > > > > > > > But this will only cancel the current queue by
> > > > > > > > > > list_del(&m2m_dev->curr_ctx->queue);
> > > > > > > > > >
> > > > > > > > > > I think this can not cancel the posted task
> > > > > > > > > > mentioned
> > > > > > > > > > before.
> > > > > > > > > > So
> > > > > > > > > > I
> > > > > > > > > > think if mtk_jpeg_job_timeout_work
> > > > > > > > > >
> > > > > > > > > > is working on, and using jpeg->m2m_dev after
> > > > > > > > > > freeing it
> > > > > > > > > > in
> > > > > > > > > > mtk_jpeg_remove, it will cause a UAF bug.
> > > > > > > > > >
> > > > > > > > > > static int mtk_jpeg_release(struct file *file)
> > > > > > > > > > {
> > > > > > > > > > struct mtk_jpeg_dev *jpeg = video_drvdata(file);
> > > > > > > > > > struct mtk_jpeg_ctx *ctx =
> > > > > > > > > > mtk_jpeg_fh_to_ctx(file-
> > > > > > > > > > > private_data);
> > > > > > > > > >
> > > > > > > > > > mutex_lock(&jpeg->lock);
> > > > > > > > > > v4l2_ctrl_handler_free(&ctx->ctrl_hdl);
> > > > > > > > > > [1] v4l2_m2m_ctx_release(ctx->fh.m2m_ctx);
> > > > > > > > > > v4l2_fh_del(&ctx->fh);
> > > > > > > > > > v4l2_fh_exit(&ctx->fh);
> > > > > > > > > > kfree(ctx);
> > > > > > > > > > mutex_unlock(&jpeg->lock);
> > > > > > > > > > return 0;
> > > > > > > > > > }
> > > > > > > > > >
> > > > > > > > > > void v4l2_m2m_ctx_release(struct v4l2_m2m_ctx
> > > > > > > > > > *m2m_ctx)
> > > > > > > > > > {
> > > > > > > > > > /* wait until the current context is dequeued
> > > > > > > > > > from
> > > > > > > > > > job_queue */
> > > > > > > > > > [2] v4l2_m2m_cancel_job(m2m_ctx);
> > > > > > > > > >
> > > > > > > > > > vb2_queue_release(&m2m_ctx->cap_q_ctx.q);
> > > > > > > > > > vb2_queue_release(&m2m_ctx->out_q_ctx.q);
> > > > > > > > > >
> > > > > > > > > > kfree(m2m_ctx);
> > > > > > > > > > }
> > > > > > > > > >
> > > > > > > > > > Note that all of this is static analysis, which may
> > > > > > > > > > be
> > > > > > > > > > false
> > > > > > > > > > positive.
> > > > > > > > > > Feel free to tell me if there is something I've
> > > > > > > > > > missed.
> > > > > > > > > >
> > > > > > > > > > Regard,
> > > > > > > > > > Zheng
> > > > > > > > >
> > > > > > > > > Dear Zheng,
> > > > > > > > >
> > > > > > > > > You set up an application scenario:
> > > > > > > > > cpu1 is using the mtk-jpeg driver and timeout work
> > > > > > > > > has
> > > > > > > > > been
> > > > > > > > > scheduled.
> > > > > > > > > At the same time cpu0 wanted to remove the mtk-jpeg
> > > > > > > > > driver,
> > > > > > > > > which
> > > > > > > > > caused the UAF bug.
> > > > > > > > > I wonder if such an irrational application scenario
> > > > > > > > > could
> > > > > > > > > exist.
> > > > > > > > > This
> > > > > > > > > scenario, as you described, not only leads to the
> > > > > > > > > problems you
> > > > > > > > > mentioned, but also to output&capture memory leaks
> > > > > > > > > and
> > > > > > > > > unreleased
> > > > > > > > > resources, such as spinlock.
> > > > > > > > > Typically, if we want to remove the driver, we
> > > > > > > > > firstly do
> > > > > > > > > stop
> > > > > > > > > streaming, which ensures that the worker has been
> > > > > > > > > canceled.
> > > > > > > > > I agree with your changes from the perspective of
> > > > > > > > > strengthening
> > > > > > > > > the
> > > > > > > > > robustness of the driver code.
> > > > > > > > >
> > > > > > > > > Regards,
> > > > > > > > > Kyrie.
> > > > > > > > > >
> > > > > > > > > > Irui Wang (王瑞) <[email protected]>
> > > > > > > > > > 于2023年3月7日周二
> > > > > > > > > > 18:23写道:
> > > > > > > > > > >
> > > > > > > > > > > Dear Angelo and Zheng,
> > > > > > > > > > >
> > > > > > > > > > > Thanks for your patch and comments.
> > > > > > > > > > >
> > > > > > > > > > > Dear Kyrie,
> > > > > > > > > > >
> > > > > > > > > > > Please help to check this, thanks.
> > > > > > > > > > >
> > > > > > > > > > > Best Regards
> > > > > > > > > > >
> > > > > > > > > > > On Tue, 2023-03-07 at 10:49 +0100,
> > > > > > > > > > > AngeloGioacchino
> > > > > > > > > > > Del
> > > > > > > > > > > Regno
> > > > > > > > > > > wrote:
> > > > > > > > > > > > Il 07/03/23 10:27, Zheng Hacker ha scritto:
> > > > > > > > > > > > > Hi,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Is there anyone who can help with this? I can
> > > > > > > > > > > > > provide
> > > > > > > > > > > > > more
> > > > > > > > > > > > > details
> > > > > > > > > > > > > like invoking chain if needed.
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Providing more details is always good. Please
> > > > > > > > > > > > do.
> > > > > > > > > > > >
> > > > > > > > > > > > Meanwhile, adding Irui Wang to the loop: he's
> > > > > > > > > > > > doing
> > > > > > > > > > > > mtk-
> > > > > > > > > > > > jpeg.
> > > > > > > > > > > >
> > > > > > > > > > > > Regards,
> > > > > > > > > > > > Angelo
> > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > Zheng
> > > > > > > > > > > > >
> > > > > > > > > > > > > Zheng Wang <[email protected]> 于2023年3月6日周一
> > > > > > > > > > > > > 14:28写道:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > In mtk_jpeg_probe, &jpeg->job_timeout_work
> > > > > > > > > > > > > > is
> > > > > > > > > > > > > > bound
> > > > > > > > > > > > > > with
> > > > > > > > > > > > > > mtk_jpeg_job_timeout_work. Then
> > > > > > > > > > > > > > mtk_jpeg_dec_device_run
> > > > > > > > > > > > > > and mtk_jpeg_enc_device_run may be called
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > start
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > work.
> > > > > > > > > > > > > > If we remove the module which will call
> > > > > > > > > > > > > > mtk_jpeg_remove
> > > > > > > > > > > > > > to make cleanup, there may be a unfinished
> > > > > > > > > > > > > > work. The
> > > > > > > > > > > > > > possible sequence is as follows, which will
> > > > > > > > > > > > > > cause a
> > > > > > > > > > > > > > typical UAF bug.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Fix it by canceling the work before cleanup
> > > > > > > > > > > > > > in
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > mtk_jpeg_remove
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > CPU0 CPU1
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > |mtk_jpeg_job_timeout_
> > > > > > > > > > > > > > work
> > > > > > > > > > > > > > mtk_jpeg_remove |
> > > > > > > > > > > > > > v4l2_m2m_release |
> > > > > > > > > > > > > > kfree(m2m_dev); |
> > > > > > > > > > > > > > |
> > > > > > > > > > > > > > |
> > > > > > > > > > > > > > v4l2_m2m_get_curr_priv
> > > > > > > > > > > > > > | m2m_dev->curr_ctx
> > > > > > > > > > > > > > //use
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Signed-off-by: Zheng Wang <
> > > > > > > > > > > > > > [email protected]>
> > > > > > > > > > > > > > ---
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > drivers/media/platform/mediatek/jpeg/mtk_jp
> > > > > > > > > > > > > > eg_c
> > > > > > > > > > > > > > ore.c
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 2 +-
> > > > > > > > > > > > > > 1 file changed, 1 insertion(+), 1
> > > > > > > > > > > > > > deletion(-)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > diff --git
> > > > > > > > > > > > > > a/drivers/media/platform/mediatek/jpeg/mtk_
> > > > > > > > > > > > > > jpeg
> > > > > > > > > > > > > > _core.
> > > > > > > > > > > > > > c
> > > > > > > > > > > > > > b/drivers/media/platform/mediatek/jpeg/mtk_
> > > > > > > > > > > > > > jpeg
> > > > > > > > > > > > > > _core.
> > > > > > > > > > > > > > c
> > > > > > > > > > > > > > index 969516a940ba..364513e7897e 100644
> > > > > > > > > > > > > > ---
> > > > > > > > > > > > > > a/drivers/media/platform/mediatek/jpeg/mtk_
> > > > > > > > > > > > > > jpeg
> > > > > > > > > > > > > > _core.
> > > > > > > > > > > > > > c
> > > > > > > > > > > > > > +++
> > > > > > > > > > > > > > b/drivers/media/platform/mediatek/jpeg/mtk_
> > > > > > > > > > > > > > jpeg
> > > > > > > > > > > > > > _core.
> > > > > > > > > > > > > > c
> > > > > > > > > > > > > > @@ -1793,7 +1793,7 @@ static int
> > > > > > > > > > > > > > mtk_jpeg_probe(struct
> > > > > > > > > > > > > > platform_device *pdev)
> > > > > > > > > > > > > > static int mtk_jpeg_remove(struct
> > > > > > > > > > > > > > platform_device
> > > > > > > > > > > > > > *pdev)
> > > > > > > > > > > > > > {
> > > > > > > > > > > > > > struct mtk_jpeg_dev *jpeg =
> > > > > > > > > > > > > > platform_get_drvdata(pdev);
> > > > > > > > > > > > > > -
> > > > > > > > > > > > > > + cancel_delayed_work(&jpeg-
> > > > > > > > > > > > > > > job_timeout_work);
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > pm_runtime_disable(&pdev->dev);
> > > > > > > > > > > > > > video_unregister_device(jpeg-
> > > > > > > > > > > > > > >vdev);
> > > > > > > > > > > > > > v4l2_m2m_release(jpeg->m2m_dev);
> > > > > > > > > > > > > > --
> > > > > > > > > > > > > > 2.25.1
> > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >

2023-05-15 03:03:47

by Zheng Hacker

[permalink] [raw]
Subject: Re: [RESEND PATCH] media: mtk-jpeg: Fix use after free bug due to uncanceled work

Hi Kyrie,

I think your analysis is right, I'll make the change in the next version.

Best regards,
Zheng

Kyrie Wu (吴晗) <[email protected]> 于2023年5月12日周五 09:24写道:
>
> On Tue, 2023-03-14 at 11:37 +0800, Zheng Hacker wrote:
> > Kyrie Wu (吴晗) <[email protected]> 于2023年3月14日周二 10:49写道:
> > >
> > > On Thu, 2023-03-09 at 15:31 +0800, Zheng Hacker wrote:
> > > > Dear Kyrie,
> > > >
> > > > Guenter gave some advice about this issue in [1]. As we are not
> > > > familiar with the code here, could you please see if the bug
> > > > really
> > > > exists?
> > > >
> > > > [1]
> > > >
> https://lore.kernel.org/all/[email protected]/
> > > >
> > > > Best regards,
> > > > Zheng
> > >
> > > Dear zheng,
> > >
> > > I have no more questions for your patch, thank you for your
> > > contribution to mtk jpeg driver.
> > >
> >
> > Thanks for your detailed review and explanation about the code. I
> > have
> > learned a lot
> > from it. Have a nice day :)
> >
> > Best regards,
> > Zheng
> Dear Zheng,
>
> I think that cancel_delayed_work, added in remove function, should
> change to cancel_delayed_work_sync to solve UAF. The corresponding
> operation flow is shown as follow:
> CPU0 CPU1
> |mtk_jpeg_job_timeout_work
> mtk_jpeg_remove |
> cancel_delayed_work |
> v4l2_m2m_release |
> kfree(m2m_dev); |
> |
> | v4l2_m2m_get_curr_priv
> | m2m_dev->curr_ctx //use
>
> If cancel_delayed_work is called in async mode like above, and the
> m2m_dev is freed. But it is used in mtk_jpeg_job_timeout_work, this
> calling may cause UAF.
>
> Thanks.
> >
> > > >
> > > >
> > > >
> > > > Zheng Hacker <[email protected]> 于2023年3月8日周三 17:11写道:
> > > > >
> > > > > Dear Kyrie,
> > > > >
> > > > > Sorry for my misunderstanding. I've seen the timeout worker is
> > > > > bound
> > > > > with mtk_jpegdec_timeout_work rather than
> > > > > mtk_jpeg_job_timeout_work.So
> > > > > the competition won't happen.
> > > > >
> > > > > Back to the beginning scene, I still don't know if it's a UAF
> > > > > issue
> > > > > or
> > > > > not. The normal schedule is very strong to me. If we can call
> > > > > mtk_jpeg_remove directly without calling mtk_jpeg_release,
> > > > > The UAF is still possible. Otherwis, I think it's safe here.
> > > > >
> > > > > Best regards,
> > > > > Zheng
> > > > >
> > > > > Kyrie Wu (吴晗) <[email protected]> 于2023年3月8日周三 15:39写道:
> > > > > >
> > > > > > On Wed, 2023-03-08 at 14:10 +0800, Zheng Hacker wrote:
> > > > > > > Hi Kyrie,
> > > > > > >
> > > > > > > After reviewing the code, I found anothe possible code
> > > > > > > path. As
> > > > > > > I am
> > > > > > > not familiar with the module. It has high possibility it's
> > > > > > > wrong.
> > > > > > > Could please help me check this? Very much appreciated for
> > > > > > > your
> > > > > > > valuable time.
> > > > > > >
> > > > > > > In summary, mtk_jpegdec_worker was set in mtk_jpeg_open and
> > > > > > > started
> > > > > > > in
> > > > > > > mtk_jpeg_multicore_dec_device_run, which made it running on
> > > > > > > cpu1.
> > > > > > > Inside the mtk_jpeg_multicore_dec_device_run, it will call
> > > > > > > schedule_delayed_work to start the timeout_work, which
> > > > > > > will
> > > > > > > make it
> > > > > > > running on cpu2. Meanwhile, we can call
> > > > > > > mtk_jpeg_release to cancel the job. But there might be a
> > > > > > > race
> > > > > > > between
> > > > > > > mtk_jpegdec_worker and v4l2_m2m_cancel_job. It may call
> > > > > > > v4l2_m2m_job_finish too early to wake up the event.
> > > > > > > The remove will go on, the other race is as described
> > > > > > > earlier.
> > > > > > >
> > > > > > > cpu0 cpu1 cpu2
> > > > > > > (1)->device_run
> > > > > > > mtk_jpeg_multicore
> > > > > > > _dec_device_run
> > > > > > > queue_work
> > > > > > > (jpeg->workqueue,
> > > > > > > &ctx->jpeg_work);
> > > > > > > (2)mtk_jpegdec_worker
> > > > > > > (3)mtk_jpeg_release
> > > > > > > v4l2_m2m_cancel_job
> > > > > > > wait event
> > > > > > >
> > > > > > > schedule_delayed_work
> > > > > > > (4)mtk_jpeg_job
> > > > > > > _tim
> > > > > > > eout_w
> > > > > > > ork
> > > > > > > (5)v4l2_m2m_job_finish
> > > > > > > wake up
> > > > > > > (6)mtk_jpeg_remove
> > > > > > > v4l2_m2m_release
> > > > > > > kfree(m2m_dev)
> > > > > > > (7)v4l2_m2m_get
> > > > > > > _cur
> > > > > > > r_priv
> > > > > > >
> > > > > >
> > > > > > Dear zheng,
> > > > > >
> > > > > > The mtk_jpeg_multicore_dec_device_run function is used for
> > > > > > multi-
> > > > > > hw
> > > > > > jpeg decoding. Instead of scheduling
> > > > > > mtk_jpeg_job_timeout_work,
> > > > > > mtk_jpegdec_worker is scheduled in this function.
> > > > > >
> > > > > > The mtk_jpeg_dec_device_run function is used for single hw
> > > > > > jpeg
> > > > > > decoding, which schedules mtk_jpeg_job_timeout_work.
> > > > > >
> > > > > > A driver is either a single hw driver or a multi-hw driver
> > > > > > and
> > > > > > cannot
> > > > > > represent both at the same time.
> > > > > > mtk_jpeg_job_timeout_work and mtk_jpegdec_worker cannot be
> > > > > > scheduled at
> > > > > > the same time.
> > > > > > So mtk_jpeg_job_timeout_work calls v4l2_m2m_job_finish would
> > > > > > not
> > > > > > cause
> > > > > > competition between the mtk_jpegdec_worker and
> > > > > > v4l2_m2m_cancel_job.
> > > > > >
> > > > > > Regards,
> > > > > > Kyrie.
> > > > > >
> > > > > >
> > > > > > > Kyrie Wu (吴晗) <[email protected]> 于2023年3月8日周三 11:32写道:
> > > > > > > >
> > > > > > > > On Wed, 2023-03-08 at 10:20 +0800, Zheng Hacker wrote:
> > > > > > > > > Hi Kyrie,
> > > > > > > > >
> > > > > > > > > Thank you for your careful analysis and response. I
> > > > > > > > > still
> > > > > > > > > have
> > > > > > > > > some
> > > > > > > > > areas that I don't quite understand and would like to
> > > > > > > > > ask
> > > > > > > > > for
> > > > > > > > > clarification. That is, how do the function pointers
> > > > > > > > > for
> > > > > > > > > stop
> > > > > > > > > streaming, initialized as mtk_jpeg_enc_stop_streaming
> > > > > > > > > and
> > > > > > > > > mtk_jpeg_dec_stop_streaming, ensure that the worker is
> > > > > > > > > canceled?
> > > > > > > > > I
> > > > > > > > > would greatly appreciate your response.
> > > > > > > > >
> > > > > > > > > Best regards,
> > > > > > > > > Zheng
> > > > > > > >
> > > > > > > > Dear zheng,
> > > > > > > >
> > > > > > > > For stop streaming, what I mean is that stoppping jpeg
> > > > > > > > decoding or
> > > > > > > > encoding.
> > > > > > > > Ok, let me introduce the sw flow of stop streaming:
> > > > > > > > Firstly, the app will call v4l2_m2m_ioctl_streamoff,
> > > > > > > > which
> > > > > > > > will
> > > > > > > > call
> > > > > > > > v4l2_m2m_cancel_job, if it finds a job running(as you
> > > > > > > > note,
> > > > > > > > cpu1 is
> > > > > > > > running), it will wait event, the event is wake up by
> > > > > > > > v4l2_m2m_job_finish function. And v4l2_m2m_job_finish is
> > > > > > > > called by
> > > > > > > > jpeg
> > > > > > > > dec/enc irq handler, which means that the waitting would
> > > > > > > > result mtk
> > > > > > > > hw
> > > > > > > > to finish dec/enc, irq will occur and irq handler would
> > > > > > > > cancel
> > > > > > > > timeout
> > > > > > > > worker. The follow is shown as blow.
> > > > > > > > v4l2_m2m_ioctl_streamoff
> > > > > > > > v4l2_m2m_cancel_job mtk_jpeg_enc_irq/mt
> > > > > > > > k_jp
> > > > > > > > eg_dec
> > > > > > > > _irq
> > > > > > > > wait evnet <------ wake up ------
> > > > > > > > v4l2_m2m_job_finish
> > > > > > > > cancel timeout
> > > > > > > > work
> > > > > > > >
> > > > > > > > As mentioned above, if it is normal stop streaming
> > > > > > > > action,
> > > > > > > > there
> > > > > > > > will
> > > > > > > > be no happen that the timeout worker does not canceled.
> > > > > > > >
> > > > > > > > But if mtk_jpeg_remove is called directly without above
> > > > > > > > flow,
> > > > > > > > it
> > > > > > > > would
> > > > > > > > cause lots of issues.
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Kyrie.
> > > > > > > > >
> > > > > > > > > Kyrie Wu (吴晗) <[email protected]> 于2023年3月8日周三
> > > > > > > > > 10:02写道:
> > > > > > > > > >
> > > > > > > > > > On Tue, 2023-03-07 at 23:03 +0800, Zheng Hacker
> > > > > > > > > > wrote:
> > > > > > > > > > > The timer function was set in mtk_jpeg_probe with
> > > > > > > > > > > mtk_jpeg_job_timeout_work function.
> > > > > > > > > > > And the worker is started in
> > > > > > > > > > > mtk_jpeg_dec_device_run.
> > > > > > > > > > > There are two functions (mtk_jpeg_enc_irq and
> > > > > > > > > > > mtk_jpeg_dec_irq)
> > > > > > > > > > > which
> > > > > > > > > > > may cancel the worker.
> > > > > > > > > > > They are used as IRQ handler function which is
> > > > > > > > > > > saved as
> > > > > > > > > > > function
> > > > > > > > > > > pointer in a variable.
> > > > > > > > > > > In mtk_jpeg_probe, they are registered by
> > > > > > > > > > > devm_request_irq:
> > > > > > > > > > >
> > > > > > > > > > > ret = devm_request_irq(&pdev->dev,
> > > > > > > > > > > jpeg_irq,
> > > > > > > > > > > jpeg->variant->irq_handler,
> > > > > > > > > > > 0,
> > > > > > > > > > > pdev->name, jpeg);
> > > > > > > > > > > if (ret) {
> > > > > > > > > > > dev_err(&pdev->dev, "Failed to request
> > > > > > > > > > > jpeg_irq
> > > > > > > > > > > %d
> > > > > > > > > > > (%d)\n",
> > > > > > > > > > > jpeg_irq, ret);
> > > > > > > > > > > return ret;
> > > > > > > > > > > }
> > > > > > > > > > >
> > > > > > > > > > > However, if we remove the module without triggering
> > > > > > > > > > > the
> > > > > > > > > > > irq,
> > > > > > > > > > > the
> > > > > > > > > > > worker will never be removed.
> > > > > > > > > > >
> > > > > > > > > > > As for the schedule, mtk_jpeg_dec_device_run and
> > > > > > > > > > > mtk_jpeg_enc_device_run will start the worker.
> > > > > > > > > > > The schedule invoking is quite complicated. As far
> > > > > > > > > > > as I
> > > > > > > > > > > know,
> > > > > > > > > > > the
> > > > > > > > > > > invoking chain is as follows:
> > > > > > > > > > >
> > > > > > > > > > > v4l2_m2m_init->v4l2_m2m_device_run_work-
> > > > > > > > > > > > v4l2_m2m_try_run
> > > > > > > > > > >
> > > > > > > > > > > the v4l2_m2m_device_run_work function is also a
> > > > > > > > > > > worker
> > > > > > > > > > > which
> > > > > > > > > > > is
> > > > > > > > > > > set
> > > > > > > > > > > in
> > > > > > > > > > > v4l2_m2m_init and started in
> > > > > > > > > > > v4l2_m2m_schedule_next_job.
> > > > > > > > > > >
> > > > > > > > > > > Before calling remove function,
> > > > > > > > > > > the mtk_jpeg_release
> > > > > > > > > > > was
> > > > > > > > > > > invoked
> > > > > > > > > > > to
> > > > > > > > > > > release the related resource.
> > > > > > > > > > >
> > > > > > > > > > > v4l2_m2m_cancel_job will cancel the job by calling
> > > > > > > > > > > m2m_dev->m2m_ops->job_abort(m2m_ctx->priv).
> > > > > > > > > > >
> > > > > > > > > > > But this will only cancel the current queue by
> > > > > > > > > > > list_del(&m2m_dev->curr_ctx->queue);
> > > > > > > > > > >
> > > > > > > > > > > I think this can not cancel the posted task
> > > > > > > > > > > mentioned
> > > > > > > > > > > before.
> > > > > > > > > > > So
> > > > > > > > > > > I
> > > > > > > > > > > think if mtk_jpeg_job_timeout_work
> > > > > > > > > > >
> > > > > > > > > > > is working on, and using jpeg->m2m_dev after
> > > > > > > > > > > freeing it
> > > > > > > > > > > in
> > > > > > > > > > > mtk_jpeg_remove, it will cause a UAF bug.
> > > > > > > > > > >
> > > > > > > > > > > static int mtk_jpeg_release(struct file *file)
> > > > > > > > > > > {
> > > > > > > > > > > struct mtk_jpeg_dev *jpeg = video_drvdata(file);
> > > > > > > > > > > struct mtk_jpeg_ctx *ctx =
> > > > > > > > > > > mtk_jpeg_fh_to_ctx(file-
> > > > > > > > > > > > private_data);
> > > > > > > > > > >
> > > > > > > > > > > mutex_lock(&jpeg->lock);
> > > > > > > > > > > v4l2_ctrl_handler_free(&ctx->ctrl_hdl);
> > > > > > > > > > > [1] v4l2_m2m_ctx_release(ctx->fh.m2m_ctx);
> > > > > > > > > > > v4l2_fh_del(&ctx->fh);
> > > > > > > > > > > v4l2_fh_exit(&ctx->fh);
> > > > > > > > > > > kfree(ctx);
> > > > > > > > > > > mutex_unlock(&jpeg->lock);
> > > > > > > > > > > return 0;
> > > > > > > > > > > }
> > > > > > > > > > >
> > > > > > > > > > > void v4l2_m2m_ctx_release(struct v4l2_m2m_ctx
> > > > > > > > > > > *m2m_ctx)
> > > > > > > > > > > {
> > > > > > > > > > > /* wait until the current context is dequeued
> > > > > > > > > > > from
> > > > > > > > > > > job_queue */
> > > > > > > > > > > [2] v4l2_m2m_cancel_job(m2m_ctx);
> > > > > > > > > > >
> > > > > > > > > > > vb2_queue_release(&m2m_ctx->cap_q_ctx.q);
> > > > > > > > > > > vb2_queue_release(&m2m_ctx->out_q_ctx.q);
> > > > > > > > > > >
> > > > > > > > > > > kfree(m2m_ctx);
> > > > > > > > > > > }
> > > > > > > > > > >
> > > > > > > > > > > Note that all of this is static analysis, which may
> > > > > > > > > > > be
> > > > > > > > > > > false
> > > > > > > > > > > positive.
> > > > > > > > > > > Feel free to tell me if there is something I've
> > > > > > > > > > > missed.
> > > > > > > > > > >
> > > > > > > > > > > Regard,
> > > > > > > > > > > Zheng
> > > > > > > > > >
> > > > > > > > > > Dear Zheng,
> > > > > > > > > >
> > > > > > > > > > You set up an application scenario:
> > > > > > > > > > cpu1 is using the mtk-jpeg driver and timeout work
> > > > > > > > > > has
> > > > > > > > > > been
> > > > > > > > > > scheduled.
> > > > > > > > > > At the same time cpu0 wanted to remove the mtk-jpeg
> > > > > > > > > > driver,
> > > > > > > > > > which
> > > > > > > > > > caused the UAF bug.
> > > > > > > > > > I wonder if such an irrational application scenario
> > > > > > > > > > could
> > > > > > > > > > exist.
> > > > > > > > > > This
> > > > > > > > > > scenario, as you described, not only leads to the
> > > > > > > > > > problems you
> > > > > > > > > > mentioned, but also to output&capture memory leaks
> > > > > > > > > > and
> > > > > > > > > > unreleased
> > > > > > > > > > resources, such as spinlock.
> > > > > > > > > > Typically, if we want to remove the driver, we
> > > > > > > > > > firstly do
> > > > > > > > > > stop
> > > > > > > > > > streaming, which ensures that the worker has been
> > > > > > > > > > canceled.
> > > > > > > > > > I agree with your changes from the perspective of
> > > > > > > > > > strengthening
> > > > > > > > > > the
> > > > > > > > > > robustness of the driver code.
> > > > > > > > > >
> > > > > > > > > > Regards,
> > > > > > > > > > Kyrie.
> > > > > > > > > > >
> > > > > > > > > > > Irui Wang (王瑞) <[email protected]>
> > > > > > > > > > > 于2023年3月7日周二
> > > > > > > > > > > 18:23写道:
> > > > > > > > > > > >
> > > > > > > > > > > > Dear Angelo and Zheng,
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks for your patch and comments.
> > > > > > > > > > > >
> > > > > > > > > > > > Dear Kyrie,
> > > > > > > > > > > >
> > > > > > > > > > > > Please help to check this, thanks.
> > > > > > > > > > > >
> > > > > > > > > > > > Best Regards
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, 2023-03-07 at 10:49 +0100,
> > > > > > > > > > > > AngeloGioacchino
> > > > > > > > > > > > Del
> > > > > > > > > > > > Regno
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > Il 07/03/23 10:27, Zheng Hacker ha scritto:
> > > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Is there anyone who can help with this? I can
> > > > > > > > > > > > > > provide
> > > > > > > > > > > > > > more
> > > > > > > > > > > > > > details
> > > > > > > > > > > > > > like invoking chain if needed.
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Providing more details is always good. Please
> > > > > > > > > > > > > do.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Meanwhile, adding Irui Wang to the loop: he's
> > > > > > > > > > > > > doing
> > > > > > > > > > > > > mtk-
> > > > > > > > > > > > > jpeg.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > Angelo
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > Zheng
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Zheng Wang <[email protected]> 于2023年3月6日周一
> > > > > > > > > > > > > > 14:28写道:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > In mtk_jpeg_probe, &jpeg->job_timeout_work
> > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > bound
> > > > > > > > > > > > > > > with
> > > > > > > > > > > > > > > mtk_jpeg_job_timeout_work. Then
> > > > > > > > > > > > > > > mtk_jpeg_dec_device_run
> > > > > > > > > > > > > > > and mtk_jpeg_enc_device_run may be called
> > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > start
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > work.
> > > > > > > > > > > > > > > If we remove the module which will call
> > > > > > > > > > > > > > > mtk_jpeg_remove
> > > > > > > > > > > > > > > to make cleanup, there may be a unfinished
> > > > > > > > > > > > > > > work. The
> > > > > > > > > > > > > > > possible sequence is as follows, which will
> > > > > > > > > > > > > > > cause a
> > > > > > > > > > > > > > > typical UAF bug.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Fix it by canceling the work before cleanup
> > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > mtk_jpeg_remove
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > CPU0 CPU1
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > |mtk_jpeg_job_timeout_
> > > > > > > > > > > > > > > work
> > > > > > > > > > > > > > > mtk_jpeg_remove |
> > > > > > > > > > > > > > > v4l2_m2m_release |
> > > > > > > > > > > > > > > kfree(m2m_dev); |
> > > > > > > > > > > > > > > |
> > > > > > > > > > > > > > > |
> > > > > > > > > > > > > > > v4l2_m2m_get_curr_priv
> > > > > > > > > > > > > > > | m2m_dev->curr_ctx
> > > > > > > > > > > > > > > //use
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Signed-off-by: Zheng Wang <
> > > > > > > > > > > > > > > [email protected]>
> > > > > > > > > > > > > > > ---
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > drivers/media/platform/mediatek/jpeg/mtk_jp
> > > > > > > > > > > > > > > eg_c
> > > > > > > > > > > > > > > ore.c
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 2 +-
> > > > > > > > > > > > > > > 1 file changed, 1 insertion(+), 1
> > > > > > > > > > > > > > > deletion(-)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > diff --git
> > > > > > > > > > > > > > > a/drivers/media/platform/mediatek/jpeg/mtk_
> > > > > > > > > > > > > > > jpeg
> > > > > > > > > > > > > > > _core.
> > > > > > > > > > > > > > > c
> > > > > > > > > > > > > > > b/drivers/media/platform/mediatek/jpeg/mtk_
> > > > > > > > > > > > > > > jpeg
> > > > > > > > > > > > > > > _core.
> > > > > > > > > > > > > > > c
> > > > > > > > > > > > > > > index 969516a940ba..364513e7897e 100644
> > > > > > > > > > > > > > > ---
> > > > > > > > > > > > > > > a/drivers/media/platform/mediatek/jpeg/mtk_
> > > > > > > > > > > > > > > jpeg
> > > > > > > > > > > > > > > _core.
> > > > > > > > > > > > > > > c
> > > > > > > > > > > > > > > +++
> > > > > > > > > > > > > > > b/drivers/media/platform/mediatek/jpeg/mtk_
> > > > > > > > > > > > > > > jpeg
> > > > > > > > > > > > > > > _core.
> > > > > > > > > > > > > > > c
> > > > > > > > > > > > > > > @@ -1793,7 +1793,7 @@ static int
> > > > > > > > > > > > > > > mtk_jpeg_probe(struct
> > > > > > > > > > > > > > > platform_device *pdev)
> > > > > > > > > > > > > > > static int mtk_jpeg_remove(struct
> > > > > > > > > > > > > > > platform_device
> > > > > > > > > > > > > > > *pdev)
> > > > > > > > > > > > > > > {
> > > > > > > > > > > > > > > struct mtk_jpeg_dev *jpeg =
> > > > > > > > > > > > > > > platform_get_drvdata(pdev);
> > > > > > > > > > > > > > > -
> > > > > > > > > > > > > > > + cancel_delayed_work(&jpeg-
> > > > > > > > > > > > > > > > job_timeout_work);
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > pm_runtime_disable(&pdev->dev);
> > > > > > > > > > > > > > > video_unregister_device(jpeg-
> > > > > > > > > > > > > > > >vdev);
> > > > > > > > > > > > > > > v4l2_m2m_release(jpeg->m2m_dev);
> > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > 2.25.1
> > > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >

2023-06-15 02:56:36

by Kyrie Wu (吴晗)

[permalink] [raw]
Subject: Re: [RESEND PATCH] media: mtk-jpeg: Fix use after free bug due to uncanceled work

On Mon, 2023-05-15 at 10:57 +0800, Zheng Hacker wrote:
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>
>
> Hi Kyrie,
>
> I think your analysis is right, I'll make the change in the next
> version.
>
> Best regards,
> Zheng
>
Dear Zheng,

Could you please update the next version at any convenient time?

Thanks.

Bset Regards,
Kyrie.

> Kyrie Wu (吴晗) <[email protected]> 于2023年5月12日周五 09:24写道:
> >
> > On Tue, 2023-03-14 at 11:37 +0800, Zheng Hacker wrote:
> > > Kyrie Wu (吴晗) <[email protected]> 于2023年3月14日周二 10:49写道:
> > > >
> > > > On Thu, 2023-03-09 at 15:31 +0800, Zheng Hacker wrote:
> > > > > Dear Kyrie,
> > > > >
> > > > > Guenter gave some advice about this issue in [1]. As we are
> > > > > not
> > > > > familiar with the code here, could you please see if the bug
> > > > > really
> > > > > exists?
> > > > >
> > > > > [1]
> > > > >
> >
> >
https://lore.kernel.org/all/[email protected]/
> > > > >
> > > > > Best regards,
> > > > > Zheng
> > > >
> > > > Dear zheng,
> > > >
> > > > I have no more questions for your patch, thank you for your
> > > > contribution to mtk jpeg driver.
> > > >
> > >
> > > Thanks for your detailed review and explanation about the code. I
> > > have
> > > learned a lot
> > > from it. Have a nice day :)
> > >
> > > Best regards,
> > > Zheng
> >
> > Dear Zheng,
> >
> > I think that cancel_delayed_work, added in remove function, should
> > change to cancel_delayed_work_sync to solve UAF. The corresponding
> > operation flow is shown as follow:
> > CPU0 CPU1
> > |mtk_jpeg_job_timeout_work
> > mtk_jpeg_remove |
> > cancel_delayed_work |
> > v4l2_m2m_release |
> > kfree(m2m_dev); |
> > |
> > | v4l2_m2m_get_curr_priv
> > | m2m_dev->curr_ctx //use
> >
> > If cancel_delayed_work is called in async mode like above, and the
> > m2m_dev is freed. But it is used in mtk_jpeg_job_timeout_work, this
> > calling may cause UAF.
> >
> > Thanks.
> > >
> > > > >
> > > > >
> > > > >
> > > > > Zheng Hacker <[email protected]> 于2023年3月8日周三 17:11写道:
> > > > > >
> > > > > > Dear Kyrie,
> > > > > >
> > > > > > Sorry for my misunderstanding. I've seen the timeout worker
> > > > > > is
> > > > > > bound
> > > > > > with mtk_jpegdec_timeout_work rather than
> > > > > > mtk_jpeg_job_timeout_work.So
> > > > > > the competition won't happen.
> > > > > >
> > > > > > Back to the beginning scene, I still don't know if it's a
> > > > > > UAF
> > > > > > issue
> > > > > > or
> > > > > > not. The normal schedule is very strong to me. If we can
> > > > > > call
> > > > > > mtk_jpeg_remove directly without calling mtk_jpeg_release,
> > > > > > The UAF is still possible. Otherwis, I think it's safe
> > > > > > here.
> > > > > >
> > > > > > Best regards,
> > > > > > Zheng
> > > > > >
> > > > > > Kyrie Wu (吴晗) <[email protected]> 于2023年3月8日周三 15:39写道:
> > > > > > >
> > > > > > > On Wed, 2023-03-08 at 14:10 +0800, Zheng Hacker wrote:
> > > > > > > > Hi Kyrie,
> > > > > > > >
> > > > > > > > After reviewing the code, I found anothe possible code
> > > > > > > > path. As
> > > > > > > > I am
> > > > > > > > not familiar with the module. It has high possibility
> > > > > > > > it's
> > > > > > > > wrong.
> > > > > > > > Could please help me check this? Very much appreciated
> > > > > > > > for
> > > > > > > > your
> > > > > > > > valuable time.
> > > > > > > >
> > > > > > > > In summary, mtk_jpegdec_worker was set in mtk_jpeg_open
> > > > > > > > and
> > > > > > > > started
> > > > > > > > in
> > > > > > > > mtk_jpeg_multicore_dec_device_run, which made it
> > > > > > > > running on
> > > > > > > > cpu1.
> > > > > > > > Inside the mtk_jpeg_multicore_dec_device_run, it will
> > > > > > > > call
> > > > > > > > schedule_delayed_work to start the timeout_work, which
> > > > > > > > will
> > > > > > > > make it
> > > > > > > > running on cpu2. Meanwhile, we can call
> > > > > > > > mtk_jpeg_release to cancel the job. But there might be
> > > > > > > > a
> > > > > > > > race
> > > > > > > > between
> > > > > > > > mtk_jpegdec_worker and v4l2_m2m_cancel_job. It may call
> > > > > > > > v4l2_m2m_job_finish too early to wake up the event.
> > > > > > > > The remove will go on, the other race is as described
> > > > > > > > earlier.
> > > > > > > >
> > > > > > > > cpu0 cpu1 cpu2
> > > > > > > > (1)->device_run
> > > > > > > > mtk_jpeg_multicore
> > > > > > > > _dec_device_run
> > > > > > > > queue_work
> > > > > > > > (jpeg->workqueue,
> > > > > > > > &ctx->jpeg_work);
> > > > > > > > (2)mtk_jpegdec_worker
> > > > > > > > (3)mtk_jpeg_release
> > > > > > > > v4l2_m2m_cancel_job
> > > > > > > > wait event
> > > > > > > >
> > > > > > > > schedule_delayed_work
> > > > > > > > (4)mtk_jpeg
> > > > > > > > _job
> > > > > > > > _tim
> > > > > > > > eout_w
> > > > > > > > ork
> > > > > > > > (5)v4l2_m2m_job_finish
> > > > > > > > wake up
> > > > > > > > (6)mtk_jpeg_remove
> > > > > > > > v4l2_m2m_release
> > > > > > > > kfree(m2m_dev)
> > > > > > > > (7)v4l2_m2m
> > > > > > > > _get
> > > > > > > > _cur
> > > > > > > > r_priv
> > > > > > > >
> > > > > > >
> > > > > > > Dear zheng,
> > > > > > >
> > > > > > > The mtk_jpeg_multicore_dec_device_run function is used
> > > > > > > for
> > > > > > > multi-
> > > > > > > hw
> > > > > > > jpeg decoding. Instead of scheduling
> > > > > > > mtk_jpeg_job_timeout_work,
> > > > > > > mtk_jpegdec_worker is scheduled in this function.
> > > > > > >
> > > > > > > The mtk_jpeg_dec_device_run function is used for single
> > > > > > > hw
> > > > > > > jpeg
> > > > > > > decoding, which schedules mtk_jpeg_job_timeout_work.
> > > > > > >
> > > > > > > A driver is either a single hw driver or a multi-hw
> > > > > > > driver
> > > > > > > and
> > > > > > > cannot
> > > > > > > represent both at the same time.
> > > > > > > mtk_jpeg_job_timeout_work and mtk_jpegdec_worker cannot
> > > > > > > be
> > > > > > > scheduled at
> > > > > > > the same time.
> > > > > > > So mtk_jpeg_job_timeout_work calls v4l2_m2m_job_finish
> > > > > > > would
> > > > > > > not
> > > > > > > cause
> > > > > > > competition between the mtk_jpegdec_worker and
> > > > > > > v4l2_m2m_cancel_job.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Kyrie.
> > > > > > >
> > > > > > >
> > > > > > > > Kyrie Wu (吴晗) <[email protected]> 于2023年3月8日周三
> > > > > > > > 11:32写道:
> > > > > > > > >
> > > > > > > > > On Wed, 2023-03-08 at 10:20 +0800, Zheng Hacker
> > > > > > > > > wrote:
> > > > > > > > > > Hi Kyrie,
> > > > > > > > > >
> > > > > > > > > > Thank you for your careful analysis and response. I
> > > > > > > > > > still
> > > > > > > > > > have
> > > > > > > > > > some
> > > > > > > > > > areas that I don't quite understand and would like
> > > > > > > > > > to
> > > > > > > > > > ask
> > > > > > > > > > for
> > > > > > > > > > clarification. That is, how do the function
> > > > > > > > > > pointers
> > > > > > > > > > for
> > > > > > > > > > stop
> > > > > > > > > > streaming, initialized as
> > > > > > > > > > mtk_jpeg_enc_stop_streaming
> > > > > > > > > > and
> > > > > > > > > > mtk_jpeg_dec_stop_streaming, ensure that the worker
> > > > > > > > > > is
> > > > > > > > > > canceled?
> > > > > > > > > > I
> > > > > > > > > > would greatly appreciate your response.
> > > > > > > > > >
> > > > > > > > > > Best regards,
> > > > > > > > > > Zheng
> > > > > > > > >
> > > > > > > > > Dear zheng,
> > > > > > > > >
> > > > > > > > > For stop streaming, what I mean is that stoppping
> > > > > > > > > jpeg
> > > > > > > > > decoding or
> > > > > > > > > encoding.
> > > > > > > > > Ok, let me introduce the sw flow of stop streaming:
> > > > > > > > > Firstly, the app will call v4l2_m2m_ioctl_streamoff,
> > > > > > > > > which
> > > > > > > > > will
> > > > > > > > > call
> > > > > > > > > v4l2_m2m_cancel_job, if it finds a job running(as you
> > > > > > > > > note,
> > > > > > > > > cpu1 is
> > > > > > > > > running), it will wait event, the event is wake up by
> > > > > > > > > v4l2_m2m_job_finish function. And v4l2_m2m_job_finish
> > > > > > > > > is
> > > > > > > > > called by
> > > > > > > > > jpeg
> > > > > > > > > dec/enc irq handler, which means that the waitting
> > > > > > > > > would
> > > > > > > > > result mtk
> > > > > > > > > hw
> > > > > > > > > to finish dec/enc, irq will occur and irq handler
> > > > > > > > > would
> > > > > > > > > cancel
> > > > > > > > > timeout
> > > > > > > > > worker. The follow is shown as blow.
> > > > > > > > > v4l2_m2m_ioctl_streamoff
> > > > > > > > > v4l2_m2m_cancel_job mtk_jpeg_enc_ir
> > > > > > > > > q/mt
> > > > > > > > > k_jp
> > > > > > > > > eg_dec
> > > > > > > > > _irq
> > > > > > > > > wait evnet <------ wake up ------
> > > > > > > > > v4l2_m2m_job_finish
> > > > > > > > > cancel
> > > > > > > > > timeout
> > > > > > > > > work
> > > > > > > > >
> > > > > > > > > As mentioned above, if it is normal stop streaming
> > > > > > > > > action,
> > > > > > > > > there
> > > > > > > > > will
> > > > > > > > > be no happen that the timeout worker does not
> > > > > > > > > canceled.
> > > > > > > > >
> > > > > > > > > But if mtk_jpeg_remove is called directly without
> > > > > > > > > above
> > > > > > > > > flow,
> > > > > > > > > it
> > > > > > > > > would
> > > > > > > > > cause lots of issues.
> > > > > > > > >
> > > > > > > > > Regards,
> > > > > > > > > Kyrie.
> > > > > > > > > >
> > > > > > > > > > Kyrie Wu (吴晗) <[email protected]> 于2023年3月8日周三
> > > > > > > > > > 10:02写道:
> > > > > > > > > > >
> > > > > > > > > > > On Tue, 2023-03-07 at 23:03 +0800, Zheng Hacker
> > > > > > > > > > > wrote:
> > > > > > > > > > > > The timer function was set in mtk_jpeg_probe
> > > > > > > > > > > > with
> > > > > > > > > > > > mtk_jpeg_job_timeout_work function.
> > > > > > > > > > > > And the worker is started in
> > > > > > > > > > > > mtk_jpeg_dec_device_run.
> > > > > > > > > > > > There are two functions (mtk_jpeg_enc_irq and
> > > > > > > > > > > > mtk_jpeg_dec_irq)
> > > > > > > > > > > > which
> > > > > > > > > > > > may cancel the worker.
> > > > > > > > > > > > They are used as IRQ handler function which is
> > > > > > > > > > > > saved as
> > > > > > > > > > > > function
> > > > > > > > > > > > pointer in a variable.
> > > > > > > > > > > > In mtk_jpeg_probe, they are registered by
> > > > > > > > > > > > devm_request_irq:
> > > > > > > > > > > >
> > > > > > > > > > > > ret = devm_request_irq(&pdev->dev,
> > > > > > > > > > > > jpeg_irq,
> > > > > > > > > > > > jpeg->variant->irq_handler,
> > > > > > > > > > > > 0,
> > > > > > > > > > > > pdev->name, jpeg);
> > > > > > > > > > > > if (ret) {
> > > > > > > > > > > > dev_err(&pdev->dev, "Failed to request
> > > > > > > > > > > > jpeg_irq
> > > > > > > > > > > > %d
> > > > > > > > > > > > (%d)\n",
> > > > > > > > > > > > jpeg_irq, ret);
> > > > > > > > > > > > return ret;
> > > > > > > > > > > > }
> > > > > > > > > > > >
> > > > > > > > > > > > However, if we remove the module without
> > > > > > > > > > > > triggering
> > > > > > > > > > > > the
> > > > > > > > > > > > irq,
> > > > > > > > > > > > the
> > > > > > > > > > > > worker will never be removed.
> > > > > > > > > > > >
> > > > > > > > > > > > As for the schedule, mtk_jpeg_dec_device_run
> > > > > > > > > > > > and
> > > > > > > > > > > > mtk_jpeg_enc_device_run will start the worker.
> > > > > > > > > > > > The schedule invoking is quite complicated. As
> > > > > > > > > > > > far
> > > > > > > > > > > > as I
> > > > > > > > > > > > know,
> > > > > > > > > > > > the
> > > > > > > > > > > > invoking chain is as follows:
> > > > > > > > > > > >
> > > > > > > > > > > > v4l2_m2m_init->v4l2_m2m_device_run_work-
> > > > > > > > > > > > > v4l2_m2m_try_run
> > > > > > > > > > > >
> > > > > > > > > > > > the v4l2_m2m_device_run_work function is also a
> > > > > > > > > > > > worker
> > > > > > > > > > > > which
> > > > > > > > > > > > is
> > > > > > > > > > > > set
> > > > > > > > > > > > in
> > > > > > > > > > > > v4l2_m2m_init and started in
> > > > > > > > > > > > v4l2_m2m_schedule_next_job.
> > > > > > > > > > > >
> > > > > > > > > > > > Before calling remove function,
> > > > > > > > > > > > the mtk_jpeg_release
> > > > > > > > > > > > was
> > > > > > > > > > > > invoked
> > > > > > > > > > > > to
> > > > > > > > > > > > release the related resource.
> > > > > > > > > > > >
> > > > > > > > > > > > v4l2_m2m_cancel_job will cancel the job by
> > > > > > > > > > > > calling
> > > > > > > > > > > > m2m_dev->m2m_ops->job_abort(m2m_ctx->priv).
> > > > > > > > > > > >
> > > > > > > > > > > > But this will only cancel the current queue by
> > > > > > > > > > > > list_del(&m2m_dev->curr_ctx->queue);
> > > > > > > > > > > >
> > > > > > > > > > > > I think this can not cancel the posted task
> > > > > > > > > > > > mentioned
> > > > > > > > > > > > before.
> > > > > > > > > > > > So
> > > > > > > > > > > > I
> > > > > > > > > > > > think if mtk_jpeg_job_timeout_work
> > > > > > > > > > > >
> > > > > > > > > > > > is working on, and using jpeg->m2m_dev after
> > > > > > > > > > > > freeing it
> > > > > > > > > > > > in
> > > > > > > > > > > > mtk_jpeg_remove, it will cause a UAF bug.
> > > > > > > > > > > >
> > > > > > > > > > > > static int mtk_jpeg_release(struct file *file)
> > > > > > > > > > > > {
> > > > > > > > > > > > struct mtk_jpeg_dev *jpeg =
> > > > > > > > > > > > video_drvdata(file);
> > > > > > > > > > > > struct mtk_jpeg_ctx *ctx =
> > > > > > > > > > > > mtk_jpeg_fh_to_ctx(file-
> > > > > > > > > > > > > private_data);
> > > > > > > > > > > >
> > > > > > > > > > > > mutex_lock(&jpeg->lock);
> > > > > > > > > > > > v4l2_ctrl_handler_free(&ctx->ctrl_hdl);
> > > > > > > > > > > > [1] v4l2_m2m_ctx_release(ctx->fh.m2m_ctx);
> > > > > > > > > > > > v4l2_fh_del(&ctx->fh);
> > > > > > > > > > > > v4l2_fh_exit(&ctx->fh);
> > > > > > > > > > > > kfree(ctx);
> > > > > > > > > > > > mutex_unlock(&jpeg->lock);
> > > > > > > > > > > > return 0;
> > > > > > > > > > > > }
> > > > > > > > > > > >
> > > > > > > > > > > > void v4l2_m2m_ctx_release(struct v4l2_m2m_ctx
> > > > > > > > > > > > *m2m_ctx)
> > > > > > > > > > > > {
> > > > > > > > > > > > /* wait until the current context is dequeued
> > > > > > > > > > > > from
> > > > > > > > > > > > job_queue */
> > > > > > > > > > > > [2] v4l2_m2m_cancel_job(m2m_ctx);
> > > > > > > > > > > >
> > > > > > > > > > > > vb2_queue_release(&m2m_ctx->cap_q_ctx.q);
> > > > > > > > > > > > vb2_queue_release(&m2m_ctx->out_q_ctx.q);
> > > > > > > > > > > >
> > > > > > > > > > > > kfree(m2m_ctx);
> > > > > > > > > > > > }
> > > > > > > > > > > >
> > > > > > > > > > > > Note that all of this is static analysis, which
> > > > > > > > > > > > may
> > > > > > > > > > > > be
> > > > > > > > > > > > false
> > > > > > > > > > > > positive.
> > > > > > > > > > > > Feel free to tell me if there is something I've
> > > > > > > > > > > > missed.
> > > > > > > > > > > >
> > > > > > > > > > > > Regard,
> > > > > > > > > > > > Zheng
> > > > > > > > > > >
> > > > > > > > > > > Dear Zheng,
> > > > > > > > > > >
> > > > > > > > > > > You set up an application scenario:
> > > > > > > > > > > cpu1 is using the mtk-jpeg driver and timeout
> > > > > > > > > > > work
> > > > > > > > > > > has
> > > > > > > > > > > been
> > > > > > > > > > > scheduled.
> > > > > > > > > > > At the same time cpu0 wanted to remove the mtk-
> > > > > > > > > > > jpeg
> > > > > > > > > > > driver,
> > > > > > > > > > > which
> > > > > > > > > > > caused the UAF bug.
> > > > > > > > > > > I wonder if such an irrational application
> > > > > > > > > > > scenario
> > > > > > > > > > > could
> > > > > > > > > > > exist.
> > > > > > > > > > > This
> > > > > > > > > > > scenario, as you described, not only leads to the
> > > > > > > > > > > problems you
> > > > > > > > > > > mentioned, but also to output&capture memory
> > > > > > > > > > > leaks
> > > > > > > > > > > and
> > > > > > > > > > > unreleased
> > > > > > > > > > > resources, such as spinlock.
> > > > > > > > > > > Typically, if we want to remove the driver, we
> > > > > > > > > > > firstly do
> > > > > > > > > > > stop
> > > > > > > > > > > streaming, which ensures that the worker has been
> > > > > > > > > > > canceled.
> > > > > > > > > > > I agree with your changes from the perspective of
> > > > > > > > > > > strengthening
> > > > > > > > > > > the
> > > > > > > > > > > robustness of the driver code.
> > > > > > > > > > >
> > > > > > > > > > > Regards,
> > > > > > > > > > > Kyrie.
> > > > > > > > > > > >
> > > > > > > > > > > > Irui Wang (王瑞) <[email protected]>
> > > > > > > > > > > > 于2023年3月7日周二
> > > > > > > > > > > > 18:23写道:
> > > > > > > > > > > > >
> > > > > > > > > > > > > Dear Angelo and Zheng,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks for your patch and comments.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Dear Kyrie,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Please help to check this, thanks.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Best Regards
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, 2023-03-07 at 10:49 +0100,
> > > > > > > > > > > > > AngeloGioacchino
> > > > > > > > > > > > > Del
> > > > > > > > > > > > > Regno
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > Il 07/03/23 10:27, Zheng Hacker ha scritto:
> > > > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Is there anyone who can help with this? I
> > > > > > > > > > > > > > > can
> > > > > > > > > > > > > > > provide
> > > > > > > > > > > > > > > more
> > > > > > > > > > > > > > > details
> > > > > > > > > > > > > > > like invoking chain if needed.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Providing more details is always good.
> > > > > > > > > > > > > > Please
> > > > > > > > > > > > > > do.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Meanwhile, adding Irui Wang to the loop:
> > > > > > > > > > > > > > he's
> > > > > > > > > > > > > > doing
> > > > > > > > > > > > > > mtk-
> > > > > > > > > > > > > > jpeg.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > > Angelo
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > Zheng
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Zheng Wang <[email protected]>
> > > > > > > > > > > > > > > 于2023年3月6日周一
> > > > > > > > > > > > > > > 14:28写道:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > In mtk_jpeg_probe, &jpeg-
> > > > > > > > > > > > > > > > >job_timeout_work
> > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > bound
> > > > > > > > > > > > > > > > with
> > > > > > > > > > > > > > > > mtk_jpeg_job_timeout_work. Then
> > > > > > > > > > > > > > > > mtk_jpeg_dec_device_run
> > > > > > > > > > > > > > > > and mtk_jpeg_enc_device_run may be
> > > > > > > > > > > > > > > > called
> > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > start
> > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > work.
> > > > > > > > > > > > > > > > If we remove the module which will call
> > > > > > > > > > > > > > > > mtk_jpeg_remove
> > > > > > > > > > > > > > > > to make cleanup, there may be a
> > > > > > > > > > > > > > > > unfinished
> > > > > > > > > > > > > > > > work. The
> > > > > > > > > > > > > > > > possible sequence is as follows, which
> > > > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > > cause a
> > > > > > > > > > > > > > > > typical UAF bug.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Fix it by canceling the work before
> > > > > > > > > > > > > > > > cleanup
> > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > mtk_jpeg_remove
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > CPU0 CPU1
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > |mtk_jpeg_job_time
> > > > > > > > > > > > > > > > out_
> > > > > > > > > > > > > > > > work
> > > > > > > > > > > > > > > > mtk_jpeg_remove |
> > > > > > > > > > > > > > > > v4l2_m2m_release |
> > > > > > > > > > > > > > > > kfree(m2m_dev); |
> > > > > > > > > > > > > > > > |
> > > > > > > > > > > > > > > > |
> > > > > > > > > > > > > > > > v4l2_m2m_get_curr_priv
> > > > > > > > > > > > > > > > | m2m_dev-
> > > > > > > > > > > > > > > > >curr_ctx
> > > > > > > > > > > > > > > > //use
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Signed-off-by: Zheng Wang <
> > > > > > > > > > > > > > > > [email protected]>
> > > > > > > > > > > > > > > > ---
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > drivers/media/platform/mediatek/jpeg/mt
> > > > > > > > > > > > > > > > k_jp
> > > > > > > > > > > > > > > > eg_c
> > > > > > > > > > > > > > > > ore.c
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 2 +-
> > > > > > > > > > > > > > > > 1 file changed, 1 insertion(+), 1
> > > > > > > > > > > > > > > > deletion(-)
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > diff --git
> > > > > > > > > > > > > > > > a/drivers/media/platform/mediatek/jpeg/
> > > > > > > > > > > > > > > > mtk_
> > > > > > > > > > > > > > > > jpeg
> > > > > > > > > > > > > > > > _core.
> > > > > > > > > > > > > > > > c
> > > > > > > > > > > > > > > > b/drivers/media/platform/mediatek/jpeg/
> > > > > > > > > > > > > > > > mtk_
> > > > > > > > > > > > > > > > jpeg
> > > > > > > > > > > > > > > > _core.
> > > > > > > > > > > > > > > > c
> > > > > > > > > > > > > > > > index 969516a940ba..364513e7897e 100644
> > > > > > > > > > > > > > > > ---
> > > > > > > > > > > > > > > > a/drivers/media/platform/mediatek/jpeg/
> > > > > > > > > > > > > > > > mtk_
> > > > > > > > > > > > > > > > jpeg
> > > > > > > > > > > > > > > > _core.
> > > > > > > > > > > > > > > > c
> > > > > > > > > > > > > > > > +++
> > > > > > > > > > > > > > > > b/drivers/media/platform/mediatek/jpeg/
> > > > > > > > > > > > > > > > mtk_
> > > > > > > > > > > > > > > > jpeg
> > > > > > > > > > > > > > > > _core.
> > > > > > > > > > > > > > > > c
> > > > > > > > > > > > > > > > @@ -1793,7 +1793,7 @@ static int
> > > > > > > > > > > > > > > > mtk_jpeg_probe(struct
> > > > > > > > > > > > > > > > platform_device *pdev)
> > > > > > > > > > > > > > > > static int mtk_jpeg_remove(struct
> > > > > > > > > > > > > > > > platform_device
> > > > > > > > > > > > > > > > *pdev)
> > > > > > > > > > > > > > > > {
> > > > > > > > > > > > > > > > struct mtk_jpeg_dev *jpeg =
> > > > > > > > > > > > > > > > platform_get_drvdata(pdev);
> > > > > > > > > > > > > > > > -
> > > > > > > > > > > > > > > > + cancel_delayed_work(&jpeg-
> > > > > > > > > > > > > > > > > job_timeout_work);
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > pm_runtime_disable(&pdev-
> > > > > > > > > > > > > > > > >dev);
> > > > > > > > > > > > > > > > video_unregister_device(jpeg-
> > > > > > > > > > > > > > > > > vdev);
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > v4l2_m2m_release(jpeg-
> > > > > > > > > > > > > > > > >m2m_dev);
> > > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > > 2.25.1
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >

2023-06-15 03:07:03

by Zheng Hacker

[permalink] [raw]
Subject: Re: [RESEND PATCH] media: mtk-jpeg: Fix use after free bug due to uncanceled work

Hi Kyrie,

Thanks for your kind reminder. I'll write the patch as soon as I can.

Best regards,
Zheng Wang

Kyrie Wu (吴晗) <[email protected]> 于2023年6月15日周四 10:40写道:
>
> On Mon, 2023-05-15 at 10:57 +0800, Zheng Hacker wrote:
> > External email : Please do not click links or open attachments until
> > you have verified the sender or the content.
> >
> >
> > Hi Kyrie,
> >
> > I think your analysis is right, I'll make the change in the next
> > version.
> >
> > Best regards,
> > Zheng
> >
> Dear Zheng,
>
> Could you please update the next version at any convenient time?
>
> Thanks.
>
> Bset Regards,
> Kyrie.
>
> > Kyrie Wu (吴晗) <[email protected]> 于2023年5月12日周五 09:24写道:
> > >
> > > On Tue, 2023-03-14 at 11:37 +0800, Zheng Hacker wrote:
> > > > Kyrie Wu (吴晗) <[email protected]> 于2023年3月14日周二 10:49写道:
> > > > >
> > > > > On Thu, 2023-03-09 at 15:31 +0800, Zheng Hacker wrote:
> > > > > > Dear Kyrie,
> > > > > >
> > > > > > Guenter gave some advice about this issue in [1]. As we are
> > > > > > not
> > > > > > familiar with the code here, could you please see if the bug
> > > > > > really
> > > > > > exists?
> > > > > >
> > > > > > [1]
> > > > > >
> > >
> > >
> https://lore.kernel.org/all/[email protected]/
> > > > > >
> > > > > > Best regards,
> > > > > > Zheng
> > > > >
> > > > > Dear zheng,
> > > > >
> > > > > I have no more questions for your patch, thank you for your
> > > > > contribution to mtk jpeg driver.
> > > > >
> > > >
> > > > Thanks for your detailed review and explanation about the code. I
> > > > have
> > > > learned a lot
> > > > from it. Have a nice day :)
> > > >
> > > > Best regards,
> > > > Zheng
> > >
> > > Dear Zheng,
> > >
> > > I think that cancel_delayed_work, added in remove function, should
> > > change to cancel_delayed_work_sync to solve UAF. The corresponding
> > > operation flow is shown as follow:
> > > CPU0 CPU1
> > > |mtk_jpeg_job_timeout_work
> > > mtk_jpeg_remove |
> > > cancel_delayed_work |
> > > v4l2_m2m_release |
> > > kfree(m2m_dev); |
> > > |
> > > | v4l2_m2m_get_curr_priv
> > > | m2m_dev->curr_ctx //use
> > >
> > > If cancel_delayed_work is called in async mode like above, and the
> > > m2m_dev is freed. But it is used in mtk_jpeg_job_timeout_work, this
> > > calling may cause UAF.
> > >
> > > Thanks.
> > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > Zheng Hacker <[email protected]> 于2023年3月8日周三 17:11写道:
> > > > > > >
> > > > > > > Dear Kyrie,
> > > > > > >
> > > > > > > Sorry for my misunderstanding. I've seen the timeout worker
> > > > > > > is
> > > > > > > bound
> > > > > > > with mtk_jpegdec_timeout_work rather than
> > > > > > > mtk_jpeg_job_timeout_work.So
> > > > > > > the competition won't happen.
> > > > > > >
> > > > > > > Back to the beginning scene, I still don't know if it's a
> > > > > > > UAF
> > > > > > > issue
> > > > > > > or
> > > > > > > not. The normal schedule is very strong to me. If we can
> > > > > > > call
> > > > > > > mtk_jpeg_remove directly without calling mtk_jpeg_release,
> > > > > > > The UAF is still possible. Otherwis, I think it's safe
> > > > > > > here.
> > > > > > >
> > > > > > > Best regards,
> > > > > > > Zheng
> > > > > > >
> > > > > > > Kyrie Wu (吴晗) <[email protected]> 于2023年3月8日周三 15:39写道:
> > > > > > > >
> > > > > > > > On Wed, 2023-03-08 at 14:10 +0800, Zheng Hacker wrote:
> > > > > > > > > Hi Kyrie,
> > > > > > > > >
> > > > > > > > > After reviewing the code, I found anothe possible code
> > > > > > > > > path. As
> > > > > > > > > I am
> > > > > > > > > not familiar with the module. It has high possibility
> > > > > > > > > it's
> > > > > > > > > wrong.
> > > > > > > > > Could please help me check this? Very much appreciated
> > > > > > > > > for
> > > > > > > > > your
> > > > > > > > > valuable time.
> > > > > > > > >
> > > > > > > > > In summary, mtk_jpegdec_worker was set in mtk_jpeg_open
> > > > > > > > > and
> > > > > > > > > started
> > > > > > > > > in
> > > > > > > > > mtk_jpeg_multicore_dec_device_run, which made it
> > > > > > > > > running on
> > > > > > > > > cpu1.
> > > > > > > > > Inside the mtk_jpeg_multicore_dec_device_run, it will
> > > > > > > > > call
> > > > > > > > > schedule_delayed_work to start the timeout_work, which
> > > > > > > > > will
> > > > > > > > > make it
> > > > > > > > > running on cpu2. Meanwhile, we can call
> > > > > > > > > mtk_jpeg_release to cancel the job. But there might be
> > > > > > > > > a
> > > > > > > > > race
> > > > > > > > > between
> > > > > > > > > mtk_jpegdec_worker and v4l2_m2m_cancel_job. It may call
> > > > > > > > > v4l2_m2m_job_finish too early to wake up the event.
> > > > > > > > > The remove will go on, the other race is as described
> > > > > > > > > earlier.
> > > > > > > > >
> > > > > > > > > cpu0 cpu1 cpu2
> > > > > > > > > (1)->device_run
> > > > > > > > > mtk_jpeg_multicore
> > > > > > > > > _dec_device_run
> > > > > > > > > queue_work
> > > > > > > > > (jpeg->workqueue,
> > > > > > > > > &ctx->jpeg_work);
> > > > > > > > > (2)mtk_jpegdec_worker
> > > > > > > > > (3)mtk_jpeg_release
> > > > > > > > > v4l2_m2m_cancel_job
> > > > > > > > > wait event
> > > > > > > > >
> > > > > > > > > schedule_delayed_work
> > > > > > > > > (4)mtk_jpeg
> > > > > > > > > _job
> > > > > > > > > _tim
> > > > > > > > > eout_w
> > > > > > > > > ork
> > > > > > > > > (5)v4l2_m2m_job_finish
> > > > > > > > > wake up
> > > > > > > > > (6)mtk_jpeg_remove
> > > > > > > > > v4l2_m2m_release
> > > > > > > > > kfree(m2m_dev)
> > > > > > > > > (7)v4l2_m2m
> > > > > > > > > _get
> > > > > > > > > _cur
> > > > > > > > > r_priv
> > > > > > > > >
> > > > > > > >
> > > > > > > > Dear zheng,
> > > > > > > >
> > > > > > > > The mtk_jpeg_multicore_dec_device_run function is used
> > > > > > > > for
> > > > > > > > multi-
> > > > > > > > hw
> > > > > > > > jpeg decoding. Instead of scheduling
> > > > > > > > mtk_jpeg_job_timeout_work,
> > > > > > > > mtk_jpegdec_worker is scheduled in this function.
> > > > > > > >
> > > > > > > > The mtk_jpeg_dec_device_run function is used for single
> > > > > > > > hw
> > > > > > > > jpeg
> > > > > > > > decoding, which schedules mtk_jpeg_job_timeout_work.
> > > > > > > >
> > > > > > > > A driver is either a single hw driver or a multi-hw
> > > > > > > > driver
> > > > > > > > and
> > > > > > > > cannot
> > > > > > > > represent both at the same time.
> > > > > > > > mtk_jpeg_job_timeout_work and mtk_jpegdec_worker cannot
> > > > > > > > be
> > > > > > > > scheduled at
> > > > > > > > the same time.
> > > > > > > > So mtk_jpeg_job_timeout_work calls v4l2_m2m_job_finish
> > > > > > > > would
> > > > > > > > not
> > > > > > > > cause
> > > > > > > > competition between the mtk_jpegdec_worker and
> > > > > > > > v4l2_m2m_cancel_job.
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Kyrie.
> > > > > > > >
> > > > > > > >
> > > > > > > > > Kyrie Wu (吴晗) <[email protected]> 于2023年3月8日周三
> > > > > > > > > 11:32写道:
> > > > > > > > > >
> > > > > > > > > > On Wed, 2023-03-08 at 10:20 +0800, Zheng Hacker
> > > > > > > > > > wrote:
> > > > > > > > > > > Hi Kyrie,
> > > > > > > > > > >
> > > > > > > > > > > Thank you for your careful analysis and response. I
> > > > > > > > > > > still
> > > > > > > > > > > have
> > > > > > > > > > > some
> > > > > > > > > > > areas that I don't quite understand and would like
> > > > > > > > > > > to
> > > > > > > > > > > ask
> > > > > > > > > > > for
> > > > > > > > > > > clarification. That is, how do the function
> > > > > > > > > > > pointers
> > > > > > > > > > > for
> > > > > > > > > > > stop
> > > > > > > > > > > streaming, initialized as
> > > > > > > > > > > mtk_jpeg_enc_stop_streaming
> > > > > > > > > > > and
> > > > > > > > > > > mtk_jpeg_dec_stop_streaming, ensure that the worker
> > > > > > > > > > > is
> > > > > > > > > > > canceled?
> > > > > > > > > > > I
> > > > > > > > > > > would greatly appreciate your response.
> > > > > > > > > > >
> > > > > > > > > > > Best regards,
> > > > > > > > > > > Zheng
> > > > > > > > > >
> > > > > > > > > > Dear zheng,
> > > > > > > > > >
> > > > > > > > > > For stop streaming, what I mean is that stoppping
> > > > > > > > > > jpeg
> > > > > > > > > > decoding or
> > > > > > > > > > encoding.
> > > > > > > > > > Ok, let me introduce the sw flow of stop streaming:
> > > > > > > > > > Firstly, the app will call v4l2_m2m_ioctl_streamoff,
> > > > > > > > > > which
> > > > > > > > > > will
> > > > > > > > > > call
> > > > > > > > > > v4l2_m2m_cancel_job, if it finds a job running(as you
> > > > > > > > > > note,
> > > > > > > > > > cpu1 is
> > > > > > > > > > running), it will wait event, the event is wake up by
> > > > > > > > > > v4l2_m2m_job_finish function. And v4l2_m2m_job_finish
> > > > > > > > > > is
> > > > > > > > > > called by
> > > > > > > > > > jpeg
> > > > > > > > > > dec/enc irq handler, which means that the waitting
> > > > > > > > > > would
> > > > > > > > > > result mtk
> > > > > > > > > > hw
> > > > > > > > > > to finish dec/enc, irq will occur and irq handler
> > > > > > > > > > would
> > > > > > > > > > cancel
> > > > > > > > > > timeout
> > > > > > > > > > worker. The follow is shown as blow.
> > > > > > > > > > v4l2_m2m_ioctl_streamoff
> > > > > > > > > > v4l2_m2m_cancel_job mtk_jpeg_enc_ir
> > > > > > > > > > q/mt
> > > > > > > > > > k_jp
> > > > > > > > > > eg_dec
> > > > > > > > > > _irq
> > > > > > > > > > wait evnet <------ wake up ------
> > > > > > > > > > v4l2_m2m_job_finish
> > > > > > > > > > cancel
> > > > > > > > > > timeout
> > > > > > > > > > work
> > > > > > > > > >
> > > > > > > > > > As mentioned above, if it is normal stop streaming
> > > > > > > > > > action,
> > > > > > > > > > there
> > > > > > > > > > will
> > > > > > > > > > be no happen that the timeout worker does not
> > > > > > > > > > canceled.
> > > > > > > > > >
> > > > > > > > > > But if mtk_jpeg_remove is called directly without
> > > > > > > > > > above
> > > > > > > > > > flow,
> > > > > > > > > > it
> > > > > > > > > > would
> > > > > > > > > > cause lots of issues.
> > > > > > > > > >
> > > > > > > > > > Regards,
> > > > > > > > > > Kyrie.
> > > > > > > > > > >
> > > > > > > > > > > Kyrie Wu (吴晗) <[email protected]> 于2023年3月8日周三
> > > > > > > > > > > 10:02写道:
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, 2023-03-07 at 23:03 +0800, Zheng Hacker
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > The timer function was set in mtk_jpeg_probe
> > > > > > > > > > > > > with
> > > > > > > > > > > > > mtk_jpeg_job_timeout_work function.
> > > > > > > > > > > > > And the worker is started in
> > > > > > > > > > > > > mtk_jpeg_dec_device_run.
> > > > > > > > > > > > > There are two functions (mtk_jpeg_enc_irq and
> > > > > > > > > > > > > mtk_jpeg_dec_irq)
> > > > > > > > > > > > > which
> > > > > > > > > > > > > may cancel the worker.
> > > > > > > > > > > > > They are used as IRQ handler function which is
> > > > > > > > > > > > > saved as
> > > > > > > > > > > > > function
> > > > > > > > > > > > > pointer in a variable.
> > > > > > > > > > > > > In mtk_jpeg_probe, they are registered by
> > > > > > > > > > > > > devm_request_irq:
> > > > > > > > > > > > >
> > > > > > > > > > > > > ret = devm_request_irq(&pdev->dev,
> > > > > > > > > > > > > jpeg_irq,
> > > > > > > > > > > > > jpeg->variant->irq_handler,
> > > > > > > > > > > > > 0,
> > > > > > > > > > > > > pdev->name, jpeg);
> > > > > > > > > > > > > if (ret) {
> > > > > > > > > > > > > dev_err(&pdev->dev, "Failed to request
> > > > > > > > > > > > > jpeg_irq
> > > > > > > > > > > > > %d
> > > > > > > > > > > > > (%d)\n",
> > > > > > > > > > > > > jpeg_irq, ret);
> > > > > > > > > > > > > return ret;
> > > > > > > > > > > > > }
> > > > > > > > > > > > >
> > > > > > > > > > > > > However, if we remove the module without
> > > > > > > > > > > > > triggering
> > > > > > > > > > > > > the
> > > > > > > > > > > > > irq,
> > > > > > > > > > > > > the
> > > > > > > > > > > > > worker will never be removed.
> > > > > > > > > > > > >
> > > > > > > > > > > > > As for the schedule, mtk_jpeg_dec_device_run
> > > > > > > > > > > > > and
> > > > > > > > > > > > > mtk_jpeg_enc_device_run will start the worker.
> > > > > > > > > > > > > The schedule invoking is quite complicated. As
> > > > > > > > > > > > > far
> > > > > > > > > > > > > as I
> > > > > > > > > > > > > know,
> > > > > > > > > > > > > the
> > > > > > > > > > > > > invoking chain is as follows:
> > > > > > > > > > > > >
> > > > > > > > > > > > > v4l2_m2m_init->v4l2_m2m_device_run_work-
> > > > > > > > > > > > > > v4l2_m2m_try_run
> > > > > > > > > > > > >
> > > > > > > > > > > > > the v4l2_m2m_device_run_work function is also a
> > > > > > > > > > > > > worker
> > > > > > > > > > > > > which
> > > > > > > > > > > > > is
> > > > > > > > > > > > > set
> > > > > > > > > > > > > in
> > > > > > > > > > > > > v4l2_m2m_init and started in
> > > > > > > > > > > > > v4l2_m2m_schedule_next_job.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Before calling remove function,
> > > > > > > > > > > > > the mtk_jpeg_release
> > > > > > > > > > > > > was
> > > > > > > > > > > > > invoked
> > > > > > > > > > > > > to
> > > > > > > > > > > > > release the related resource.
> > > > > > > > > > > > >
> > > > > > > > > > > > > v4l2_m2m_cancel_job will cancel the job by
> > > > > > > > > > > > > calling
> > > > > > > > > > > > > m2m_dev->m2m_ops->job_abort(m2m_ctx->priv).
> > > > > > > > > > > > >
> > > > > > > > > > > > > But this will only cancel the current queue by
> > > > > > > > > > > > > list_del(&m2m_dev->curr_ctx->queue);
> > > > > > > > > > > > >
> > > > > > > > > > > > > I think this can not cancel the posted task
> > > > > > > > > > > > > mentioned
> > > > > > > > > > > > > before.
> > > > > > > > > > > > > So
> > > > > > > > > > > > > I
> > > > > > > > > > > > > think if mtk_jpeg_job_timeout_work
> > > > > > > > > > > > >
> > > > > > > > > > > > > is working on, and using jpeg->m2m_dev after
> > > > > > > > > > > > > freeing it
> > > > > > > > > > > > > in
> > > > > > > > > > > > > mtk_jpeg_remove, it will cause a UAF bug.
> > > > > > > > > > > > >
> > > > > > > > > > > > > static int mtk_jpeg_release(struct file *file)
> > > > > > > > > > > > > {
> > > > > > > > > > > > > struct mtk_jpeg_dev *jpeg =
> > > > > > > > > > > > > video_drvdata(file);
> > > > > > > > > > > > > struct mtk_jpeg_ctx *ctx =
> > > > > > > > > > > > > mtk_jpeg_fh_to_ctx(file-
> > > > > > > > > > > > > > private_data);
> > > > > > > > > > > > >
> > > > > > > > > > > > > mutex_lock(&jpeg->lock);
> > > > > > > > > > > > > v4l2_ctrl_handler_free(&ctx->ctrl_hdl);
> > > > > > > > > > > > > [1] v4l2_m2m_ctx_release(ctx->fh.m2m_ctx);
> > > > > > > > > > > > > v4l2_fh_del(&ctx->fh);
> > > > > > > > > > > > > v4l2_fh_exit(&ctx->fh);
> > > > > > > > > > > > > kfree(ctx);
> > > > > > > > > > > > > mutex_unlock(&jpeg->lock);
> > > > > > > > > > > > > return 0;
> > > > > > > > > > > > > }
> > > > > > > > > > > > >
> > > > > > > > > > > > > void v4l2_m2m_ctx_release(struct v4l2_m2m_ctx
> > > > > > > > > > > > > *m2m_ctx)
> > > > > > > > > > > > > {
> > > > > > > > > > > > > /* wait until the current context is dequeued
> > > > > > > > > > > > > from
> > > > > > > > > > > > > job_queue */
> > > > > > > > > > > > > [2] v4l2_m2m_cancel_job(m2m_ctx);
> > > > > > > > > > > > >
> > > > > > > > > > > > > vb2_queue_release(&m2m_ctx->cap_q_ctx.q);
> > > > > > > > > > > > > vb2_queue_release(&m2m_ctx->out_q_ctx.q);
> > > > > > > > > > > > >
> > > > > > > > > > > > > kfree(m2m_ctx);
> > > > > > > > > > > > > }
> > > > > > > > > > > > >
> > > > > > > > > > > > > Note that all of this is static analysis, which
> > > > > > > > > > > > > may
> > > > > > > > > > > > > be
> > > > > > > > > > > > > false
> > > > > > > > > > > > > positive.
> > > > > > > > > > > > > Feel free to tell me if there is something I've
> > > > > > > > > > > > > missed.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Regard,
> > > > > > > > > > > > > Zheng
> > > > > > > > > > > >
> > > > > > > > > > > > Dear Zheng,
> > > > > > > > > > > >
> > > > > > > > > > > > You set up an application scenario:
> > > > > > > > > > > > cpu1 is using the mtk-jpeg driver and timeout
> > > > > > > > > > > > work
> > > > > > > > > > > > has
> > > > > > > > > > > > been
> > > > > > > > > > > > scheduled.
> > > > > > > > > > > > At the same time cpu0 wanted to remove the mtk-
> > > > > > > > > > > > jpeg
> > > > > > > > > > > > driver,
> > > > > > > > > > > > which
> > > > > > > > > > > > caused the UAF bug.
> > > > > > > > > > > > I wonder if such an irrational application
> > > > > > > > > > > > scenario
> > > > > > > > > > > > could
> > > > > > > > > > > > exist.
> > > > > > > > > > > > This
> > > > > > > > > > > > scenario, as you described, not only leads to the
> > > > > > > > > > > > problems you
> > > > > > > > > > > > mentioned, but also to output&capture memory
> > > > > > > > > > > > leaks
> > > > > > > > > > > > and
> > > > > > > > > > > > unreleased
> > > > > > > > > > > > resources, such as spinlock.
> > > > > > > > > > > > Typically, if we want to remove the driver, we
> > > > > > > > > > > > firstly do
> > > > > > > > > > > > stop
> > > > > > > > > > > > streaming, which ensures that the worker has been
> > > > > > > > > > > > canceled.
> > > > > > > > > > > > I agree with your changes from the perspective of
> > > > > > > > > > > > strengthening
> > > > > > > > > > > > the
> > > > > > > > > > > > robustness of the driver code.
> > > > > > > > > > > >
> > > > > > > > > > > > Regards,
> > > > > > > > > > > > Kyrie.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Irui Wang (王瑞) <[email protected]>
> > > > > > > > > > > > > 于2023年3月7日周二
> > > > > > > > > > > > > 18:23写道:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Dear Angelo and Zheng,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks for your patch and comments.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Dear Kyrie,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Please help to check this, thanks.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Best Regards
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Tue, 2023-03-07 at 10:49 +0100,
> > > > > > > > > > > > > > AngeloGioacchino
> > > > > > > > > > > > > > Del
> > > > > > > > > > > > > > Regno
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > Il 07/03/23 10:27, Zheng Hacker ha scritto:
> > > > > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Is there anyone who can help with this? I
> > > > > > > > > > > > > > > > can
> > > > > > > > > > > > > > > > provide
> > > > > > > > > > > > > > > > more
> > > > > > > > > > > > > > > > details
> > > > > > > > > > > > > > > > like invoking chain if needed.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Providing more details is always good.
> > > > > > > > > > > > > > > Please
> > > > > > > > > > > > > > > do.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Meanwhile, adding Irui Wang to the loop:
> > > > > > > > > > > > > > > he's
> > > > > > > > > > > > > > > doing
> > > > > > > > > > > > > > > mtk-
> > > > > > > > > > > > > > > jpeg.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > > > Angelo
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > Zheng
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Zheng Wang <[email protected]>
> > > > > > > > > > > > > > > > 于2023年3月6日周一
> > > > > > > > > > > > > > > > 14:28写道:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > In mtk_jpeg_probe, &jpeg-
> > > > > > > > > > > > > > > > > >job_timeout_work
> > > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > bound
> > > > > > > > > > > > > > > > > with
> > > > > > > > > > > > > > > > > mtk_jpeg_job_timeout_work. Then
> > > > > > > > > > > > > > > > > mtk_jpeg_dec_device_run
> > > > > > > > > > > > > > > > > and mtk_jpeg_enc_device_run may be
> > > > > > > > > > > > > > > > > called
> > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > start
> > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > work.
> > > > > > > > > > > > > > > > > If we remove the module which will call
> > > > > > > > > > > > > > > > > mtk_jpeg_remove
> > > > > > > > > > > > > > > > > to make cleanup, there may be a
> > > > > > > > > > > > > > > > > unfinished
> > > > > > > > > > > > > > > > > work. The
> > > > > > > > > > > > > > > > > possible sequence is as follows, which
> > > > > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > > > cause a
> > > > > > > > > > > > > > > > > typical UAF bug.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Fix it by canceling the work before
> > > > > > > > > > > > > > > > > cleanup
> > > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > mtk_jpeg_remove
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > CPU0 CPU1
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > |mtk_jpeg_job_time
> > > > > > > > > > > > > > > > > out_
> > > > > > > > > > > > > > > > > work
> > > > > > > > > > > > > > > > > mtk_jpeg_remove |
> > > > > > > > > > > > > > > > > v4l2_m2m_release |
> > > > > > > > > > > > > > > > > kfree(m2m_dev); |
> > > > > > > > > > > > > > > > > |
> > > > > > > > > > > > > > > > > |
> > > > > > > > > > > > > > > > > v4l2_m2m_get_curr_priv
> > > > > > > > > > > > > > > > > | m2m_dev-
> > > > > > > > > > > > > > > > > >curr_ctx
> > > > > > > > > > > > > > > > > //use
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Signed-off-by: Zheng Wang <
> > > > > > > > > > > > > > > > > [email protected]>
> > > > > > > > > > > > > > > > > ---
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > drivers/media/platform/mediatek/jpeg/mt
> > > > > > > > > > > > > > > > > k_jp
> > > > > > > > > > > > > > > > > eg_c
> > > > > > > > > > > > > > > > > ore.c
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > 2 +-
> > > > > > > > > > > > > > > > > 1 file changed, 1 insertion(+), 1
> > > > > > > > > > > > > > > > > deletion(-)
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > diff --git
> > > > > > > > > > > > > > > > > a/drivers/media/platform/mediatek/jpeg/
> > > > > > > > > > > > > > > > > mtk_
> > > > > > > > > > > > > > > > > jpeg
> > > > > > > > > > > > > > > > > _core.
> > > > > > > > > > > > > > > > > c
> > > > > > > > > > > > > > > > > b/drivers/media/platform/mediatek/jpeg/
> > > > > > > > > > > > > > > > > mtk_
> > > > > > > > > > > > > > > > > jpeg
> > > > > > > > > > > > > > > > > _core.
> > > > > > > > > > > > > > > > > c
> > > > > > > > > > > > > > > > > index 969516a940ba..364513e7897e 100644
> > > > > > > > > > > > > > > > > ---
> > > > > > > > > > > > > > > > > a/drivers/media/platform/mediatek/jpeg/
> > > > > > > > > > > > > > > > > mtk_
> > > > > > > > > > > > > > > > > jpeg
> > > > > > > > > > > > > > > > > _core.
> > > > > > > > > > > > > > > > > c
> > > > > > > > > > > > > > > > > +++
> > > > > > > > > > > > > > > > > b/drivers/media/platform/mediatek/jpeg/
> > > > > > > > > > > > > > > > > mtk_
> > > > > > > > > > > > > > > > > jpeg
> > > > > > > > > > > > > > > > > _core.
> > > > > > > > > > > > > > > > > c
> > > > > > > > > > > > > > > > > @@ -1793,7 +1793,7 @@ static int
> > > > > > > > > > > > > > > > > mtk_jpeg_probe(struct
> > > > > > > > > > > > > > > > > platform_device *pdev)
> > > > > > > > > > > > > > > > > static int mtk_jpeg_remove(struct
> > > > > > > > > > > > > > > > > platform_device
> > > > > > > > > > > > > > > > > *pdev)
> > > > > > > > > > > > > > > > > {
> > > > > > > > > > > > > > > > > struct mtk_jpeg_dev *jpeg =
> > > > > > > > > > > > > > > > > platform_get_drvdata(pdev);
> > > > > > > > > > > > > > > > > -
> > > > > > > > > > > > > > > > > + cancel_delayed_work(&jpeg-
> > > > > > > > > > > > > > > > > > job_timeout_work);
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > pm_runtime_disable(&pdev-
> > > > > > > > > > > > > > > > > >dev);
> > > > > > > > > > > > > > > > > video_unregister_device(jpeg-
> > > > > > > > > > > > > > > > > > vdev);
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > v4l2_m2m_release(jpeg-
> > > > > > > > > > > > > > > > > >m2m_dev);
> > > > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > > > 2.25.1
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >

2023-06-25 03:16:25

by Zheng Hacker

[permalink] [raw]
Subject: Re: [RESEND PATCH] media: mtk-jpeg: Fix use after free bug due to uncanceled work

Hello,

I sent the patch-v2, but found some mail was rejected due to Network
error. Here is the newest patch link:

https://lore.kernel.org/all/[email protected]/


Zheng Hacker <[email protected]> 于2023年6月15日周四 10:58写道:
>
> Hi Kyrie,
>
> Thanks for your kind reminder. I'll write the patch as soon as I can.
>
> Best regards,
> Zheng Wang
>
> Kyrie Wu (吴晗) <[email protected]> 于2023年6月15日周四 10:40写道:
> >
> > On Mon, 2023-05-15 at 10:57 +0800, Zheng Hacker wrote:
> > > External email : Please do not click links or open attachments until
> > > you have verified the sender or the content.
> > >
> > >
> > > Hi Kyrie,
> > >
> > > I think your analysis is right, I'll make the change in the next
> > > version.
> > >
> > > Best regards,
> > > Zheng
> > >
> > Dear Zheng,
> >
> > Could you please update the next version at any convenient time?
> >
> > Thanks.
> >
> > Bset Regards,
> > Kyrie.
> >
> > > Kyrie Wu (吴晗) <[email protected]> 于2023年5月12日周五 09:24写道:
> > > >
> > > > On Tue, 2023-03-14 at 11:37 +0800, Zheng Hacker wrote:
> > > > > Kyrie Wu (吴晗) <[email protected]> 于2023年3月14日周二 10:49写道:
> > > > > >
> > > > > > On Thu, 2023-03-09 at 15:31 +0800, Zheng Hacker wrote:
> > > > > > > Dear Kyrie,
> > > > > > >
> > > > > > > Guenter gave some advice about this issue in [1]. As we are
> > > > > > > not
> > > > > > > familiar with the code here, could you please see if the bug
> > > > > > > really
> > > > > > > exists?
> > > > > > >
> > > > > > > [1]
> > > > > > >
> > > >
> > > >
> > https://lore.kernel.org/all/[email protected]/
> > > > > > >
> > > > > > > Best regards,
> > > > > > > Zheng
> > > > > >
> > > > > > Dear zheng,
> > > > > >
> > > > > > I have no more questions for your patch, thank you for your
> > > > > > contribution to mtk jpeg driver.
> > > > > >
> > > > >
> > > > > Thanks for your detailed review and explanation about the code. I
> > > > > have
> > > > > learned a lot
> > > > > from it. Have a nice day :)
> > > > >
> > > > > Best regards,
> > > > > Zheng
> > > >
> > > > Dear Zheng,
> > > >
> > > > I think that cancel_delayed_work, added in remove function, should
> > > > change to cancel_delayed_work_sync to solve UAF. The corresponding
> > > > operation flow is shown as follow:
> > > > CPU0 CPU1
> > > > |mtk_jpeg_job_timeout_work
> > > > mtk_jpeg_remove |
> > > > cancel_delayed_work |
> > > > v4l2_m2m_release |
> > > > kfree(m2m_dev); |
> > > > |
> > > > | v4l2_m2m_get_curr_priv
> > > > | m2m_dev->curr_ctx //use
> > > >
> > > > If cancel_delayed_work is called in async mode like above, and the
> > > > m2m_dev is freed. But it is used in mtk_jpeg_job_timeout_work, this
> > > > calling may cause UAF.
> > > >
> > > > Thanks.
> > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Zheng Hacker <[email protected]> 于2023年3月8日周三 17:11写道:
> > > > > > > >
> > > > > > > > Dear Kyrie,
> > > > > > > >
> > > > > > > > Sorry for my misunderstanding. I've seen the timeout worker
> > > > > > > > is
> > > > > > > > bound
> > > > > > > > with mtk_jpegdec_timeout_work rather than
> > > > > > > > mtk_jpeg_job_timeout_work.So
> > > > > > > > the competition won't happen.
> > > > > > > >
> > > > > > > > Back to the beginning scene, I still don't know if it's a
> > > > > > > > UAF
> > > > > > > > issue
> > > > > > > > or
> > > > > > > > not. The normal schedule is very strong to me. If we can
> > > > > > > > call
> > > > > > > > mtk_jpeg_remove directly without calling mtk_jpeg_release,
> > > > > > > > The UAF is still possible. Otherwis, I think it's safe
> > > > > > > > here.
> > > > > > > >
> > > > > > > > Best regards,
> > > > > > > > Zheng
> > > > > > > >
> > > > > > > > Kyrie Wu (吴晗) <[email protected]> 于2023年3月8日周三 15:39写道:
> > > > > > > > >
> > > > > > > > > On Wed, 2023-03-08 at 14:10 +0800, Zheng Hacker wrote:
> > > > > > > > > > Hi Kyrie,
> > > > > > > > > >
> > > > > > > > > > After reviewing the code, I found anothe possible code
> > > > > > > > > > path. As
> > > > > > > > > > I am
> > > > > > > > > > not familiar with the module. It has high possibility
> > > > > > > > > > it's
> > > > > > > > > > wrong.
> > > > > > > > > > Could please help me check this? Very much appreciated
> > > > > > > > > > for
> > > > > > > > > > your
> > > > > > > > > > valuable time.
> > > > > > > > > >
> > > > > > > > > > In summary, mtk_jpegdec_worker was set in mtk_jpeg_open
> > > > > > > > > > and
> > > > > > > > > > started
> > > > > > > > > > in
> > > > > > > > > > mtk_jpeg_multicore_dec_device_run, which made it
> > > > > > > > > > running on
> > > > > > > > > > cpu1.
> > > > > > > > > > Inside the mtk_jpeg_multicore_dec_device_run, it will
> > > > > > > > > > call
> > > > > > > > > > schedule_delayed_work to start the timeout_work, which
> > > > > > > > > > will
> > > > > > > > > > make it
> > > > > > > > > > running on cpu2. Meanwhile, we can call
> > > > > > > > > > mtk_jpeg_release to cancel the job. But there might be
> > > > > > > > > > a
> > > > > > > > > > race
> > > > > > > > > > between
> > > > > > > > > > mtk_jpegdec_worker and v4l2_m2m_cancel_job. It may call
> > > > > > > > > > v4l2_m2m_job_finish too early to wake up the event.
> > > > > > > > > > The remove will go on, the other race is as described
> > > > > > > > > > earlier.
> > > > > > > > > >
> > > > > > > > > > cpu0 cpu1 cpu2
> > > > > > > > > > (1)->device_run
> > > > > > > > > > mtk_jpeg_multicore
> > > > > > > > > > _dec_device_run
> > > > > > > > > > queue_work
> > > > > > > > > > (jpeg->workqueue,
> > > > > > > > > > &ctx->jpeg_work);
> > > > > > > > > > (2)mtk_jpegdec_worker
> > > > > > > > > > (3)mtk_jpeg_release
> > > > > > > > > > v4l2_m2m_cancel_job
> > > > > > > > > > wait event
> > > > > > > > > >
> > > > > > > > > > schedule_delayed_work
> > > > > > > > > > (4)mtk_jpeg
> > > > > > > > > > _job
> > > > > > > > > > _tim
> > > > > > > > > > eout_w
> > > > > > > > > > ork
> > > > > > > > > > (5)v4l2_m2m_job_finish
> > > > > > > > > > wake up
> > > > > > > > > > (6)mtk_jpeg_remove
> > > > > > > > > > v4l2_m2m_release
> > > > > > > > > > kfree(m2m_dev)
> > > > > > > > > > (7)v4l2_m2m
> > > > > > > > > > _get
> > > > > > > > > > _cur
> > > > > > > > > > r_priv
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Dear zheng,
> > > > > > > > >
> > > > > > > > > The mtk_jpeg_multicore_dec_device_run function is used
> > > > > > > > > for
> > > > > > > > > multi-
> > > > > > > > > hw
> > > > > > > > > jpeg decoding. Instead of scheduling
> > > > > > > > > mtk_jpeg_job_timeout_work,
> > > > > > > > > mtk_jpegdec_worker is scheduled in this function.
> > > > > > > > >
> > > > > > > > > The mtk_jpeg_dec_device_run function is used for single
> > > > > > > > > hw
> > > > > > > > > jpeg
> > > > > > > > > decoding, which schedules mtk_jpeg_job_timeout_work.
> > > > > > > > >
> > > > > > > > > A driver is either a single hw driver or a multi-hw
> > > > > > > > > driver
> > > > > > > > > and
> > > > > > > > > cannot
> > > > > > > > > represent both at the same time.
> > > > > > > > > mtk_jpeg_job_timeout_work and mtk_jpegdec_worker cannot
> > > > > > > > > be
> > > > > > > > > scheduled at
> > > > > > > > > the same time.
> > > > > > > > > So mtk_jpeg_job_timeout_work calls v4l2_m2m_job_finish
> > > > > > > > > would
> > > > > > > > > not
> > > > > > > > > cause
> > > > > > > > > competition between the mtk_jpegdec_worker and
> > > > > > > > > v4l2_m2m_cancel_job.
> > > > > > > > >
> > > > > > > > > Regards,
> > > > > > > > > Kyrie.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > Kyrie Wu (吴晗) <[email protected]> 于2023年3月8日周三
> > > > > > > > > > 11:32写道:
> > > > > > > > > > >
> > > > > > > > > > > On Wed, 2023-03-08 at 10:20 +0800, Zheng Hacker
> > > > > > > > > > > wrote:
> > > > > > > > > > > > Hi Kyrie,
> > > > > > > > > > > >
> > > > > > > > > > > > Thank you for your careful analysis and response. I
> > > > > > > > > > > > still
> > > > > > > > > > > > have
> > > > > > > > > > > > some
> > > > > > > > > > > > areas that I don't quite understand and would like
> > > > > > > > > > > > to
> > > > > > > > > > > > ask
> > > > > > > > > > > > for
> > > > > > > > > > > > clarification. That is, how do the function
> > > > > > > > > > > > pointers
> > > > > > > > > > > > for
> > > > > > > > > > > > stop
> > > > > > > > > > > > streaming, initialized as
> > > > > > > > > > > > mtk_jpeg_enc_stop_streaming
> > > > > > > > > > > > and
> > > > > > > > > > > > mtk_jpeg_dec_stop_streaming, ensure that the worker
> > > > > > > > > > > > is
> > > > > > > > > > > > canceled?
> > > > > > > > > > > > I
> > > > > > > > > > > > would greatly appreciate your response.
> > > > > > > > > > > >
> > > > > > > > > > > > Best regards,
> > > > > > > > > > > > Zheng
> > > > > > > > > > >
> > > > > > > > > > > Dear zheng,
> > > > > > > > > > >
> > > > > > > > > > > For stop streaming, what I mean is that stoppping
> > > > > > > > > > > jpeg
> > > > > > > > > > > decoding or
> > > > > > > > > > > encoding.
> > > > > > > > > > > Ok, let me introduce the sw flow of stop streaming:
> > > > > > > > > > > Firstly, the app will call v4l2_m2m_ioctl_streamoff,
> > > > > > > > > > > which
> > > > > > > > > > > will
> > > > > > > > > > > call
> > > > > > > > > > > v4l2_m2m_cancel_job, if it finds a job running(as you
> > > > > > > > > > > note,
> > > > > > > > > > > cpu1 is
> > > > > > > > > > > running), it will wait event, the event is wake up by
> > > > > > > > > > > v4l2_m2m_job_finish function. And v4l2_m2m_job_finish
> > > > > > > > > > > is
> > > > > > > > > > > called by
> > > > > > > > > > > jpeg
> > > > > > > > > > > dec/enc irq handler, which means that the waitting
> > > > > > > > > > > would
> > > > > > > > > > > result mtk
> > > > > > > > > > > hw
> > > > > > > > > > > to finish dec/enc, irq will occur and irq handler
> > > > > > > > > > > would
> > > > > > > > > > > cancel
> > > > > > > > > > > timeout
> > > > > > > > > > > worker. The follow is shown as blow.
> > > > > > > > > > > v4l2_m2m_ioctl_streamoff
> > > > > > > > > > > v4l2_m2m_cancel_job mtk_jpeg_enc_ir
> > > > > > > > > > > q/mt
> > > > > > > > > > > k_jp
> > > > > > > > > > > eg_dec
> > > > > > > > > > > _irq
> > > > > > > > > > > wait evnet <------ wake up ------
> > > > > > > > > > > v4l2_m2m_job_finish
> > > > > > > > > > > cancel
> > > > > > > > > > > timeout
> > > > > > > > > > > work
> > > > > > > > > > >
> > > > > > > > > > > As mentioned above, if it is normal stop streaming
> > > > > > > > > > > action,
> > > > > > > > > > > there
> > > > > > > > > > > will
> > > > > > > > > > > be no happen that the timeout worker does not
> > > > > > > > > > > canceled.
> > > > > > > > > > >
> > > > > > > > > > > But if mtk_jpeg_remove is called directly without
> > > > > > > > > > > above
> > > > > > > > > > > flow,
> > > > > > > > > > > it
> > > > > > > > > > > would
> > > > > > > > > > > cause lots of issues.
> > > > > > > > > > >
> > > > > > > > > > > Regards,
> > > > > > > > > > > Kyrie.
> > > > > > > > > > > >
> > > > > > > > > > > > Kyrie Wu (吴晗) <[email protected]> 于2023年3月8日周三
> > > > > > > > > > > > 10:02写道:
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, 2023-03-07 at 23:03 +0800, Zheng Hacker
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > The timer function was set in mtk_jpeg_probe
> > > > > > > > > > > > > > with
> > > > > > > > > > > > > > mtk_jpeg_job_timeout_work function.
> > > > > > > > > > > > > > And the worker is started in
> > > > > > > > > > > > > > mtk_jpeg_dec_device_run.
> > > > > > > > > > > > > > There are two functions (mtk_jpeg_enc_irq and
> > > > > > > > > > > > > > mtk_jpeg_dec_irq)
> > > > > > > > > > > > > > which
> > > > > > > > > > > > > > may cancel the worker.
> > > > > > > > > > > > > > They are used as IRQ handler function which is
> > > > > > > > > > > > > > saved as
> > > > > > > > > > > > > > function
> > > > > > > > > > > > > > pointer in a variable.
> > > > > > > > > > > > > > In mtk_jpeg_probe, they are registered by
> > > > > > > > > > > > > > devm_request_irq:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > ret = devm_request_irq(&pdev->dev,
> > > > > > > > > > > > > > jpeg_irq,
> > > > > > > > > > > > > > jpeg->variant->irq_handler,
> > > > > > > > > > > > > > 0,
> > > > > > > > > > > > > > pdev->name, jpeg);
> > > > > > > > > > > > > > if (ret) {
> > > > > > > > > > > > > > dev_err(&pdev->dev, "Failed to request
> > > > > > > > > > > > > > jpeg_irq
> > > > > > > > > > > > > > %d
> > > > > > > > > > > > > > (%d)\n",
> > > > > > > > > > > > > > jpeg_irq, ret);
> > > > > > > > > > > > > > return ret;
> > > > > > > > > > > > > > }
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > However, if we remove the module without
> > > > > > > > > > > > > > triggering
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > irq,
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > worker will never be removed.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > As for the schedule, mtk_jpeg_dec_device_run
> > > > > > > > > > > > > > and
> > > > > > > > > > > > > > mtk_jpeg_enc_device_run will start the worker.
> > > > > > > > > > > > > > The schedule invoking is quite complicated. As
> > > > > > > > > > > > > > far
> > > > > > > > > > > > > > as I
> > > > > > > > > > > > > > know,
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > invoking chain is as follows:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > v4l2_m2m_init->v4l2_m2m_device_run_work-
> > > > > > > > > > > > > > > v4l2_m2m_try_run
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > the v4l2_m2m_device_run_work function is also a
> > > > > > > > > > > > > > worker
> > > > > > > > > > > > > > which
> > > > > > > > > > > > > > is
> > > > > > > > > > > > > > set
> > > > > > > > > > > > > > in
> > > > > > > > > > > > > > v4l2_m2m_init and started in
> > > > > > > > > > > > > > v4l2_m2m_schedule_next_job.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Before calling remove function,
> > > > > > > > > > > > > > the mtk_jpeg_release
> > > > > > > > > > > > > > was
> > > > > > > > > > > > > > invoked
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > release the related resource.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > v4l2_m2m_cancel_job will cancel the job by
> > > > > > > > > > > > > > calling
> > > > > > > > > > > > > > m2m_dev->m2m_ops->job_abort(m2m_ctx->priv).
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > But this will only cancel the current queue by
> > > > > > > > > > > > > > list_del(&m2m_dev->curr_ctx->queue);
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I think this can not cancel the posted task
> > > > > > > > > > > > > > mentioned
> > > > > > > > > > > > > > before.
> > > > > > > > > > > > > > So
> > > > > > > > > > > > > > I
> > > > > > > > > > > > > > think if mtk_jpeg_job_timeout_work
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > is working on, and using jpeg->m2m_dev after
> > > > > > > > > > > > > > freeing it
> > > > > > > > > > > > > > in
> > > > > > > > > > > > > > mtk_jpeg_remove, it will cause a UAF bug.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > static int mtk_jpeg_release(struct file *file)
> > > > > > > > > > > > > > {
> > > > > > > > > > > > > > struct mtk_jpeg_dev *jpeg =
> > > > > > > > > > > > > > video_drvdata(file);
> > > > > > > > > > > > > > struct mtk_jpeg_ctx *ctx =
> > > > > > > > > > > > > > mtk_jpeg_fh_to_ctx(file-
> > > > > > > > > > > > > > > private_data);
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > mutex_lock(&jpeg->lock);
> > > > > > > > > > > > > > v4l2_ctrl_handler_free(&ctx->ctrl_hdl);
> > > > > > > > > > > > > > [1] v4l2_m2m_ctx_release(ctx->fh.m2m_ctx);
> > > > > > > > > > > > > > v4l2_fh_del(&ctx->fh);
> > > > > > > > > > > > > > v4l2_fh_exit(&ctx->fh);
> > > > > > > > > > > > > > kfree(ctx);
> > > > > > > > > > > > > > mutex_unlock(&jpeg->lock);
> > > > > > > > > > > > > > return 0;
> > > > > > > > > > > > > > }
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > void v4l2_m2m_ctx_release(struct v4l2_m2m_ctx
> > > > > > > > > > > > > > *m2m_ctx)
> > > > > > > > > > > > > > {
> > > > > > > > > > > > > > /* wait until the current context is dequeued
> > > > > > > > > > > > > > from
> > > > > > > > > > > > > > job_queue */
> > > > > > > > > > > > > > [2] v4l2_m2m_cancel_job(m2m_ctx);
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > vb2_queue_release(&m2m_ctx->cap_q_ctx.q);
> > > > > > > > > > > > > > vb2_queue_release(&m2m_ctx->out_q_ctx.q);
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > kfree(m2m_ctx);
> > > > > > > > > > > > > > }
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Note that all of this is static analysis, which
> > > > > > > > > > > > > > may
> > > > > > > > > > > > > > be
> > > > > > > > > > > > > > false
> > > > > > > > > > > > > > positive.
> > > > > > > > > > > > > > Feel free to tell me if there is something I've
> > > > > > > > > > > > > > missed.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Regard,
> > > > > > > > > > > > > > Zheng
> > > > > > > > > > > > >
> > > > > > > > > > > > > Dear Zheng,
> > > > > > > > > > > > >
> > > > > > > > > > > > > You set up an application scenario:
> > > > > > > > > > > > > cpu1 is using the mtk-jpeg driver and timeout
> > > > > > > > > > > > > work
> > > > > > > > > > > > > has
> > > > > > > > > > > > > been
> > > > > > > > > > > > > scheduled.
> > > > > > > > > > > > > At the same time cpu0 wanted to remove the mtk-
> > > > > > > > > > > > > jpeg
> > > > > > > > > > > > > driver,
> > > > > > > > > > > > > which
> > > > > > > > > > > > > caused the UAF bug.
> > > > > > > > > > > > > I wonder if such an irrational application
> > > > > > > > > > > > > scenario
> > > > > > > > > > > > > could
> > > > > > > > > > > > > exist.
> > > > > > > > > > > > > This
> > > > > > > > > > > > > scenario, as you described, not only leads to the
> > > > > > > > > > > > > problems you
> > > > > > > > > > > > > mentioned, but also to output&capture memory
> > > > > > > > > > > > > leaks
> > > > > > > > > > > > > and
> > > > > > > > > > > > > unreleased
> > > > > > > > > > > > > resources, such as spinlock.
> > > > > > > > > > > > > Typically, if we want to remove the driver, we
> > > > > > > > > > > > > firstly do
> > > > > > > > > > > > > stop
> > > > > > > > > > > > > streaming, which ensures that the worker has been
> > > > > > > > > > > > > canceled.
> > > > > > > > > > > > > I agree with your changes from the perspective of
> > > > > > > > > > > > > strengthening
> > > > > > > > > > > > > the
> > > > > > > > > > > > > robustness of the driver code.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > Kyrie.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Irui Wang (王瑞) <[email protected]>
> > > > > > > > > > > > > > 于2023年3月7日周二
> > > > > > > > > > > > > > 18:23写道:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Dear Angelo and Zheng,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks for your patch and comments.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Dear Kyrie,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Please help to check this, thanks.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Best Regards
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Tue, 2023-03-07 at 10:49 +0100,
> > > > > > > > > > > > > > > AngeloGioacchino
> > > > > > > > > > > > > > > Del
> > > > > > > > > > > > > > > Regno
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > Il 07/03/23 10:27, Zheng Hacker ha scritto:
> > > > > > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Is there anyone who can help with this? I
> > > > > > > > > > > > > > > > > can
> > > > > > > > > > > > > > > > > provide
> > > > > > > > > > > > > > > > > more
> > > > > > > > > > > > > > > > > details
> > > > > > > > > > > > > > > > > like invoking chain if needed.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Providing more details is always good.
> > > > > > > > > > > > > > > > Please
> > > > > > > > > > > > > > > > do.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Meanwhile, adding Irui Wang to the loop:
> > > > > > > > > > > > > > > > he's
> > > > > > > > > > > > > > > > doing
> > > > > > > > > > > > > > > > mtk-
> > > > > > > > > > > > > > > > jpeg.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > > > > Angelo
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > Zheng
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Zheng Wang <[email protected]>
> > > > > > > > > > > > > > > > > 于2023年3月6日周一
> > > > > > > > > > > > > > > > > 14:28写道:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > In mtk_jpeg_probe, &jpeg-
> > > > > > > > > > > > > > > > > > >job_timeout_work
> > > > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > bound
> > > > > > > > > > > > > > > > > > with
> > > > > > > > > > > > > > > > > > mtk_jpeg_job_timeout_work. Then
> > > > > > > > > > > > > > > > > > mtk_jpeg_dec_device_run
> > > > > > > > > > > > > > > > > > and mtk_jpeg_enc_device_run may be
> > > > > > > > > > > > > > > > > > called
> > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > start
> > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > work.
> > > > > > > > > > > > > > > > > > If we remove the module which will call
> > > > > > > > > > > > > > > > > > mtk_jpeg_remove
> > > > > > > > > > > > > > > > > > to make cleanup, there may be a
> > > > > > > > > > > > > > > > > > unfinished
> > > > > > > > > > > > > > > > > > work. The
> > > > > > > > > > > > > > > > > > possible sequence is as follows, which
> > > > > > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > > > > cause a
> > > > > > > > > > > > > > > > > > typical UAF bug.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Fix it by canceling the work before
> > > > > > > > > > > > > > > > > > cleanup
> > > > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > mtk_jpeg_remove
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > CPU0 CPU1
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > |mtk_jpeg_job_time
> > > > > > > > > > > > > > > > > > out_
> > > > > > > > > > > > > > > > > > work
> > > > > > > > > > > > > > > > > > mtk_jpeg_remove |
> > > > > > > > > > > > > > > > > > v4l2_m2m_release |
> > > > > > > > > > > > > > > > > > kfree(m2m_dev); |
> > > > > > > > > > > > > > > > > > |
> > > > > > > > > > > > > > > > > > |
> > > > > > > > > > > > > > > > > > v4l2_m2m_get_curr_priv
> > > > > > > > > > > > > > > > > > | m2m_dev-
> > > > > > > > > > > > > > > > > > >curr_ctx
> > > > > > > > > > > > > > > > > > //use
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Signed-off-by: Zheng Wang <
> > > > > > > > > > > > > > > > > > [email protected]>
> > > > > > > > > > > > > > > > > > ---
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > drivers/media/platform/mediatek/jpeg/mt
> > > > > > > > > > > > > > > > > > k_jp
> > > > > > > > > > > > > > > > > > eg_c
> > > > > > > > > > > > > > > > > > ore.c
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > 2 +-
> > > > > > > > > > > > > > > > > > 1 file changed, 1 insertion(+), 1
> > > > > > > > > > > > > > > > > > deletion(-)
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > diff --git
> > > > > > > > > > > > > > > > > > a/drivers/media/platform/mediatek/jpeg/
> > > > > > > > > > > > > > > > > > mtk_
> > > > > > > > > > > > > > > > > > jpeg
> > > > > > > > > > > > > > > > > > _core.
> > > > > > > > > > > > > > > > > > c
> > > > > > > > > > > > > > > > > > b/drivers/media/platform/mediatek/jpeg/
> > > > > > > > > > > > > > > > > > mtk_
> > > > > > > > > > > > > > > > > > jpeg
> > > > > > > > > > > > > > > > > > _core.
> > > > > > > > > > > > > > > > > > c
> > > > > > > > > > > > > > > > > > index 969516a940ba..364513e7897e 100644
> > > > > > > > > > > > > > > > > > ---
> > > > > > > > > > > > > > > > > > a/drivers/media/platform/mediatek/jpeg/
> > > > > > > > > > > > > > > > > > mtk_
> > > > > > > > > > > > > > > > > > jpeg
> > > > > > > > > > > > > > > > > > _core.
> > > > > > > > > > > > > > > > > > c
> > > > > > > > > > > > > > > > > > +++
> > > > > > > > > > > > > > > > > > b/drivers/media/platform/mediatek/jpeg/
> > > > > > > > > > > > > > > > > > mtk_
> > > > > > > > > > > > > > > > > > jpeg
> > > > > > > > > > > > > > > > > > _core.
> > > > > > > > > > > > > > > > > > c
> > > > > > > > > > > > > > > > > > @@ -1793,7 +1793,7 @@ static int
> > > > > > > > > > > > > > > > > > mtk_jpeg_probe(struct
> > > > > > > > > > > > > > > > > > platform_device *pdev)
> > > > > > > > > > > > > > > > > > static int mtk_jpeg_remove(struct
> > > > > > > > > > > > > > > > > > platform_device
> > > > > > > > > > > > > > > > > > *pdev)
> > > > > > > > > > > > > > > > > > {
> > > > > > > > > > > > > > > > > > struct mtk_jpeg_dev *jpeg =
> > > > > > > > > > > > > > > > > > platform_get_drvdata(pdev);
> > > > > > > > > > > > > > > > > > -
> > > > > > > > > > > > > > > > > > + cancel_delayed_work(&jpeg-
> > > > > > > > > > > > > > > > > > > job_timeout_work);
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > pm_runtime_disable(&pdev-
> > > > > > > > > > > > > > > > > > >dev);
> > > > > > > > > > > > > > > > > > video_unregister_device(jpeg-
> > > > > > > > > > > > > > > > > > > vdev);
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > v4l2_m2m_release(jpeg-
> > > > > > > > > > > > > > > > > > >m2m_dev);
> > > > > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > > > > 2.25.1
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >