2023-07-12 22:50:20

by Rob Clark

[permalink] [raw]
Subject: [PATCH] drm/msm: Fix hw_fence error path cleanup

From: Rob Clark <[email protected]>

In an error path where the submit is free'd without the job being run,
the hw_fence pointer is simply a kzalloc'd block of memory. In this
case we should just kfree() it, rather than trying to decrement it's
reference count. Fortunately we can tell that this is the case by
checking for a zero refcount, since if the job was run, the submit would
be holding a reference to the hw_fence.

Fixes: f94e6a51e17c ("drm/msm: Pre-allocate hw_fence")
Signed-off-by: Rob Clark <[email protected]>
---
drivers/gpu/drm/msm/msm_fence.c | 6 ++++++
drivers/gpu/drm/msm/msm_gem_submit.c | 14 +++++++++++++-
2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/msm_fence.c b/drivers/gpu/drm/msm/msm_fence.c
index 96599ec3eb78..1a5d4f1c8b42 100644
--- a/drivers/gpu/drm/msm/msm_fence.c
+++ b/drivers/gpu/drm/msm/msm_fence.c
@@ -191,6 +191,12 @@ msm_fence_init(struct dma_fence *fence, struct msm_fence_context *fctx)

f->fctx = fctx;

+ /*
+ * Until this point, the fence was just some pre-allocated memory,
+ * no-one should have taken a reference to it yet.
+ */
+ WARN_ON(kref_read(&fence->refcount));
+
dma_fence_init(&f->base, &msm_fence_ops, &fctx->spinlock,
fctx->context, ++fctx->last_fence);
}
diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
index 3f1aa4de3b87..9d66498cdc04 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -86,7 +86,19 @@ void __msm_gem_submit_destroy(struct kref *kref)
}

dma_fence_put(submit->user_fence);
- dma_fence_put(submit->hw_fence);
+
+ /*
+ * If the submit is freed before msm_job_run(), then hw_fence is
+ * just some pre-allocated memory, not a reference counted fence.
+ * Once the job runs and the hw_fence is initialized, it will
+ * have a refcount of at least one, since the submit holds a ref
+ * to the hw_fence.
+ */
+ if (kref_read(&submit->hw_fence->refcount) == 0) {
+ kfree(submit->hw_fence);
+ } else {
+ dma_fence_put(submit->hw_fence);
+ }

put_pid(submit->pid);
msm_submitqueue_put(submit->queue);
--
2.41.0



2023-07-13 20:28:50

by Dmitry Baryshkov

[permalink] [raw]
Subject: Re: [PATCH] drm/msm: Fix hw_fence error path cleanup

On 13/07/2023 01:25, Rob Clark wrote:
> From: Rob Clark <[email protected]>
>
> In an error path where the submit is free'd without the job being run,
> the hw_fence pointer is simply a kzalloc'd block of memory. In this
> case we should just kfree() it, rather than trying to decrement it's
> reference count. Fortunately we can tell that this is the case by
> checking for a zero refcount, since if the job was run, the submit would
> be holding a reference to the hw_fence.
>
> Fixes: f94e6a51e17c ("drm/msm: Pre-allocate hw_fence")
> Signed-off-by: Rob Clark <[email protected]>
> ---
> drivers/gpu/drm/msm/msm_fence.c | 6 ++++++
> drivers/gpu/drm/msm/msm_gem_submit.c | 14 +++++++++++++-
> 2 files changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/msm/msm_fence.c b/drivers/gpu/drm/msm/msm_fence.c
> index 96599ec3eb78..1a5d4f1c8b42 100644
> --- a/drivers/gpu/drm/msm/msm_fence.c
> +++ b/drivers/gpu/drm/msm/msm_fence.c
> @@ -191,6 +191,12 @@ msm_fence_init(struct dma_fence *fence, struct msm_fence_context *fctx)
>
> f->fctx = fctx;
>
> + /*
> + * Until this point, the fence was just some pre-allocated memory,
> + * no-one should have taken a reference to it yet.
> + */
> + WARN_ON(kref_read(&fence->refcount));

It this really correct to return a refcounted object with 0 refcount
(I'm looking at submit_create() / msm_fence_alloc() )? Maybe it would be
better to move dma_fence_get() to msm_fence_alloc() ? But don't
immediately see, which one should be moved.

> +
> dma_fence_init(&f->base, &msm_fence_ops, &fctx->spinlock,
> fctx->context, ++fctx->last_fence);
> }
> diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
> index 3f1aa4de3b87..9d66498cdc04 100644
> --- a/drivers/gpu/drm/msm/msm_gem_submit.c
> +++ b/drivers/gpu/drm/msm/msm_gem_submit.c
> @@ -86,7 +86,19 @@ void __msm_gem_submit_destroy(struct kref *kref)
> }
>
> dma_fence_put(submit->user_fence);
> - dma_fence_put(submit->hw_fence);
> +
> + /*
> + * If the submit is freed before msm_job_run(), then hw_fence is
> + * just some pre-allocated memory, not a reference counted fence.
> + * Once the job runs and the hw_fence is initialized, it will
> + * have a refcount of at least one, since the submit holds a ref
> + * to the hw_fence.
> + */
> + if (kref_read(&submit->hw_fence->refcount) == 0) {
> + kfree(submit->hw_fence);
> + } else {
> + dma_fence_put(submit->hw_fence);
> + }
>
> put_pid(submit->pid);
> msm_submitqueue_put(submit->queue);

--
With best wishes
Dmitry


2023-07-13 20:50:25

by Rob Clark

[permalink] [raw]
Subject: Re: [PATCH] drm/msm: Fix hw_fence error path cleanup

On Thu, Jul 13, 2023 at 1:03 PM Dmitry Baryshkov
<[email protected]> wrote:
>
> On 13/07/2023 01:25, Rob Clark wrote:
> > From: Rob Clark <[email protected]>
> >
> > In an error path where the submit is free'd without the job being run,
> > the hw_fence pointer is simply a kzalloc'd block of memory. In this
> > case we should just kfree() it, rather than trying to decrement it's
> > reference count. Fortunately we can tell that this is the case by
> > checking for a zero refcount, since if the job was run, the submit would
> > be holding a reference to the hw_fence.
> >
> > Fixes: f94e6a51e17c ("drm/msm: Pre-allocate hw_fence")
> > Signed-off-by: Rob Clark <[email protected]>
> > ---
> > drivers/gpu/drm/msm/msm_fence.c | 6 ++++++
> > drivers/gpu/drm/msm/msm_gem_submit.c | 14 +++++++++++++-
> > 2 files changed, 19 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/msm/msm_fence.c b/drivers/gpu/drm/msm/msm_fence.c
> > index 96599ec3eb78..1a5d4f1c8b42 100644
> > --- a/drivers/gpu/drm/msm/msm_fence.c
> > +++ b/drivers/gpu/drm/msm/msm_fence.c
> > @@ -191,6 +191,12 @@ msm_fence_init(struct dma_fence *fence, struct msm_fence_context *fctx)
> >
> > f->fctx = fctx;
> >
> > + /*
> > + * Until this point, the fence was just some pre-allocated memory,
> > + * no-one should have taken a reference to it yet.
> > + */
> > + WARN_ON(kref_read(&fence->refcount));
>
> It this really correct to return a refcounted object with 0 refcount
> (I'm looking at submit_create() / msm_fence_alloc() )? Maybe it would be
> better to move dma_fence_get() to msm_fence_alloc() ? But don't
> immediately see, which one should be moved.

The issue is that we can't really initialize the fence until
msm_job_run(), when it is known what order the fence would be
signaled. But we don't want to do any allocations in msm_job_run()
because that could trigger the shrinker, which could need to wait
until jobs complete to release memory, forming a deadlock.

BR,
-R

> > +
> > dma_fence_init(&f->base, &msm_fence_ops, &fctx->spinlock,
> > fctx->context, ++fctx->last_fence);
> > }
> > diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
> > index 3f1aa4de3b87..9d66498cdc04 100644
> > --- a/drivers/gpu/drm/msm/msm_gem_submit.c
> > +++ b/drivers/gpu/drm/msm/msm_gem_submit.c
> > @@ -86,7 +86,19 @@ void __msm_gem_submit_destroy(struct kref *kref)
> > }
> >
> > dma_fence_put(submit->user_fence);
> > - dma_fence_put(submit->hw_fence);
> > +
> > + /*
> > + * If the submit is freed before msm_job_run(), then hw_fence is
> > + * just some pre-allocated memory, not a reference counted fence.
> > + * Once the job runs and the hw_fence is initialized, it will
> > + * have a refcount of at least one, since the submit holds a ref
> > + * to the hw_fence.
> > + */
> > + if (kref_read(&submit->hw_fence->refcount) == 0) {
> > + kfree(submit->hw_fence);
> > + } else {
> > + dma_fence_put(submit->hw_fence);
> > + }
> >
> > put_pid(submit->pid);
> > msm_submitqueue_put(submit->queue);
>
> --
> With best wishes
> Dmitry
>