2022-06-15 16:32:22

by Rob Clark

Subject: [PATCH] drm/msm: Fix fence rollover issue

From: Rob Clark <[email protected]>

And while we are at it, let's start the fence counter close to the
rollover point so that if issues slip in, they are more obvious.

Signed-off-by: Rob Clark <[email protected]>
---
drivers/gpu/drm/msm/msm_fence.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_fence.c b/drivers/gpu/drm/msm/msm_fence.c
index 3df255402a33..a35a6746c7cd 100644
--- a/drivers/gpu/drm/msm/msm_fence.c
+++ b/drivers/gpu/drm/msm/msm_fence.c
@@ -28,6 +28,14 @@ msm_fence_context_alloc(struct drm_device *dev, volatile uint32_t *fenceptr,
fctx->fenceptr = fenceptr;
spin_lock_init(&fctx->spinlock);

+ /*
+ * Start out close to the 32b fence rollover point, so we can
+ * catch bugs with fence comparisons.
+ */
+ fctx->last_fence = 0xffffff00;
+ fctx->completed_fence = fctx->last_fence;
+ *fctx->fenceptr = fctx->last_fence;
+
return fctx;
}

@@ -46,11 +54,12 @@ bool msm_fence_completed(struct msm_fence_context *fctx, uint32_t fence)
(int32_t)(*fctx->fenceptr - fence) >= 0;
}

-/* called from workqueue */
+/* called from irq handler */
void msm_update_fence(struct msm_fence_context *fctx, uint32_t fence)
{
spin_lock(&fctx->spinlock);
- fctx->completed_fence = max(fence, fctx->completed_fence);
+ if (fence_after(fence, fctx->completed_fence))
+ fctx->completed_fence = fence;
spin_unlock(&fctx->spinlock);
}

--
2.36.1
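
For readers following the thread, here is a minimal standalone sketch of why the max() update gets stuck once the 32-bit counter wraps, and why a signed-difference comparison does not. The helper names (seqno_after, update_with_max, update_with_after) are hypothetical and are not part of the driver; seqno_after is assumed to have the same shape as the comparison visible in msm_fence_completed() above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if seqno a is newer than seqno b, even
 * across a 32-bit wrap. Valid while the two values are less than
 * 2^31 apart. Relies on the usual two's-complement conversion to
 * int32_t, as the kernel does.
 */
static inline bool seqno_after(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) > 0;
}

/* Old behaviour: plain unsigned max(), which ignores wraparound. */
static inline uint32_t update_with_max(uint32_t completed, uint32_t fence)
{
	return fence > completed ? fence : completed;
}

/* New behaviour: only advance if the fence is newer, wrap-aware. */
static inline uint32_t update_with_after(uint32_t completed, uint32_t fence)
{
	return seqno_after(fence, completed) ? fence : completed;
}

/* Small self-check that callers can run once at startup. */
static inline void seqno_selfcheck(void)
{
	/* 0x5 was issued after 0xfffffffe, just past the wrap. */
	assert(update_with_max(0xfffffffeu, 0x5u) == 0xfffffffeu);   /* stuck */
	assert(update_with_after(0xfffffffeu, 0x5u) == 0x5u);        /* advances */
}
```

With the unsigned max(), completed_fence stays at the pre-wrap value and never advances again; the wrap-aware variant moves on to the post-wrap fence and still ignores a stale, older fence.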


2022-06-16 08:39:28

by Dmitry Baryshkov

Subject: Re: [PATCH] drm/msm: Fix fence rollover issue

On 15/06/2022 19:24, Rob Clark wrote:
> From: Rob Clark <[email protected]>
>
> And while we are at it, let's start the fence counter close to the
> rollover point so that if issues slip in, they are more obvious.
>
> Signed-off-by: Rob Clark <[email protected]>

Should it also have

Fixes: fde5de6cb461 ("drm/msm: move fence code to it's own file")

Or maybe

Fixes: 5f3aee4ceb5b ("drm/msm: Handle fence rollover")

Otherwise:

Reviewed: Dmitry Baryshkov <[email protected]>


> ---
> drivers/gpu/drm/msm/msm_fence.c | 13 +++++++++++--
> 1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/msm_fence.c b/drivers/gpu/drm/msm/msm_fence.c
> index 3df255402a33..a35a6746c7cd 100644
> --- a/drivers/gpu/drm/msm/msm_fence.c
> +++ b/drivers/gpu/drm/msm/msm_fence.c
> @@ -28,6 +28,14 @@ msm_fence_context_alloc(struct drm_device *dev, volatile uint32_t *fenceptr,
> fctx->fenceptr = fenceptr;
> spin_lock_init(&fctx->spinlock);
>
> + /*
> + * Start out close to the 32b fence rollover point, so we can
> + * catch bugs with fence comparisons.
> + */
> + fctx->last_fence = 0xffffff00;
> + fctx->completed_fence = fctx->last_fence;
> + *fctx->fenceptr = fctx->last_fence;

This looks like a debugging hack. But probably it's fine to have it, as
it wouldn't cause any side effects.

> +
> return fctx;
> }
>
> @@ -46,11 +54,12 @@ bool msm_fence_completed(struct msm_fence_context *fctx, uint32_t fence)
> (int32_t)(*fctx->fenceptr - fence) >= 0;
> }
>
> -/* called from workqueue */
> +/* called from irq handler */
> void msm_update_fence(struct msm_fence_context *fctx, uint32_t fence)
> {
> spin_lock(&fctx->spinlock);
> - fctx->completed_fence = max(fence, fctx->completed_fence);
> + if (fence_after(fence, fctx->completed_fence))
> + fctx->completed_fence = fence;
> spin_unlock(&fctx->spinlock);
> }
>


--
With best wishes
Dmitry

2022-06-16 14:31:45

by Rob Clark

Subject: Re: [PATCH] drm/msm: Fix fence rollover issue

On Thu, Jun 16, 2022 at 1:27 AM Dmitry Baryshkov
<[email protected]> wrote:
>
> On 15/06/2022 19:24, Rob Clark wrote:
> > From: Rob Clark <[email protected]>
> >
> > And while we are at it, let's start the fence counter close to the
> > rollover point so that if issues slip in, they are more obvious.
> >
> > Signed-off-by: Rob Clark <[email protected]>
>
> Should it also have
>
> Fixes: fde5de6cb461 ("drm/msm: move fence code to it's own file")
>
> Or maybe
>
> Fixes: 5f3aee4ceb5b ("drm/msm: Handle fence rollover")

arguably it fixes the first commit that added GPU support (and
finishes up a couple spots that the above commit missed)

I guess I could use the fixes tag just to indicate how far back it
would be reasonable to backport to stable branches.

> Otherwise:
>
> Reviewed: Dmitry Baryshkov <[email protected]>
>
>
> > ---
> > drivers/gpu/drm/msm/msm_fence.c | 13 +++++++++++--
> > 1 file changed, 11 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/msm/msm_fence.c b/drivers/gpu/drm/msm/msm_fence.c
> > index 3df255402a33..a35a6746c7cd 100644
> > --- a/drivers/gpu/drm/msm/msm_fence.c
> > +++ b/drivers/gpu/drm/msm/msm_fence.c
> > @@ -28,6 +28,14 @@ msm_fence_context_alloc(struct drm_device *dev, volatile uint32_t *fenceptr,
> > fctx->fenceptr = fenceptr;
> > spin_lock_init(&fctx->spinlock);
> >
> > + /*
> > + * Start out close to the 32b fence rollover point, so we can
> > + * catch bugs with fence comparisons.
> > + */
> > + fctx->last_fence = 0xffffff00;
> > + fctx->completed_fence = fctx->last_fence;
> > + *fctx->fenceptr = fctx->last_fence;
>
> This looks like a debugging hack. But probably it's fine to have it, as
> it wouldn't cause any side effects.

I was originally going to add a modparam or kconfig to enable this..
but then thought, if there is a bug and things are to go wrong, it's
best for that to happen ASAP rather than after 200-400 days of
uptime.. bugs in the latter case can be rather hard to reproduce ;-)

IIRC the kernel does something similar with jiffies to ensure the
rollover point is hit quickly
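
As a rough illustration of that comparison (a sketch, not kernel code): the kernel defines INITIAL_JIFFIES as -300*HZ truncated to 32 bits, so the 32-bit jiffies value wraps about five minutes after boot. The names below (HZ_ASSUMED, initial_jiffies32, ticks_until_wrap) are made up for this example, and HZ=250 is just an assumed config value.

```c
#include <stdint.h>

/* Assumed tick rate for the example; real kernels use CONFIG_HZ. */
#define HZ_ASSUMED 250

/* Mirrors the shape of INITIAL_JIFFIES: -300*HZ as a 32-bit value. */
static inline uint32_t initial_jiffies32(void)
{
	return (uint32_t)(-300 * HZ_ASSUMED);
}

/* Number of increments before a 32-bit counter at 'start' wraps to 0. */
static inline uint32_t ticks_until_wrap(uint32_t start)
{
	return (uint32_t)(0u - start);
}
```

By the same arithmetic, starting the fence counter at 0xffffff00 leaves 0x100 (256) increments before it rolls over, so any broken comparison shows up almost immediately.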

BR,
-R

> > +
> > return fctx;
> > }
> >
> > @@ -46,11 +54,12 @@ bool msm_fence_completed(struct msm_fence_context *fctx, uint32_t fence)
> > (int32_t)(*fctx->fenceptr - fence) >= 0;
> > }
> >
> > -/* called from workqueue */
> > +/* called from irq handler */
> > void msm_update_fence(struct msm_fence_context *fctx, uint32_t fence)
> > {
> > spin_lock(&fctx->spinlock);
> > - fctx->completed_fence = max(fence, fctx->completed_fence);
> > + if (fence_after(fence, fctx->completed_fence))
> > + fctx->completed_fence = fence;
> > spin_unlock(&fctx->spinlock);
> > }
> >
>
>
> --
> With best wishes
> Dmitry