2023-05-31 08:11:28

by Johan Hovold

Subject: [PATCH] drm/msm/a6xx: fix uninitialised lock in init error path

A recent commit started taking the GMU lock in the GPU destroy path,
which on GPU initialisation failure is called before the GMU and its
lock have been initialised.

Make sure that the GMU has been initialised before taking the lock in
a6xx_destroy() and drop the now redundant check from a6xx_gmu_remove().

Fixes: 4cd15a3e8b36 ("drm/msm/a6xx: Make GPU destroy a bit safer")
Cc: [email protected] # 6.3
Cc: Douglas Anderson <[email protected]>
Signed-off-by: Johan Hovold <[email protected]>
---
drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 3 ---
drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 9 ++++++---
2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
index e16b4b3f8535..105ccf17041f 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
@@ -1472,9 +1472,6 @@ void a6xx_gmu_remove(struct a6xx_gpu *a6xx_gpu)
 	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;
 	struct platform_device *pdev = to_platform_device(gmu->dev);

-	if (!gmu->initialized)
-		return;
-
 	pm_runtime_force_suspend(gmu->dev);

 	/*
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 9fb214f150dd..ee47b95a0205 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -1684,6 +1684,7 @@ static void a6xx_destroy(struct msm_gpu *gpu)
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
 	struct a6xx_gpu *a6xx_gpu = to_a6xx_gpu(adreno_gpu);
+	struct a6xx_gmu *gmu = &a6xx_gpu->gmu;

 	if (a6xx_gpu->sqe_bo) {
 		msm_gem_unpin_iova(a6xx_gpu->sqe_bo, gpu->aspace);
@@ -1697,9 +1698,11 @@ static void a6xx_destroy(struct msm_gpu *gpu)

 	a6xx_llc_slices_destroy(a6xx_gpu);

-	mutex_lock(&a6xx_gpu->gmu.lock);
-	a6xx_gmu_remove(a6xx_gpu);
-	mutex_unlock(&a6xx_gpu->gmu.lock);
+	if (gmu->initialized) {
+		mutex_lock(&gmu->lock);
+		a6xx_gmu_remove(a6xx_gpu);
+		mutex_unlock(&gmu->lock);
+	}

 	adreno_gpu_cleanup(adreno_gpu);

--
2.39.3
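
For readers who want to see the pattern outside the driver, here is a minimal user-space sketch of the guard the patch adds. It is not the driver code: `struct gmu_model`, `struct gpu_model` and the `*_model_*` helpers are hypothetical stand-ins, a pthread mutex takes the place of the kernel mutex, and the `removals` counter exists only so the behaviour can be checked.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for struct a6xx_gmu: only the lock and the flag. */
struct gmu_model {
	pthread_mutex_t lock;	/* valid only after gmu_model_init() */
	bool initialized;
};

/* Hypothetical stand-in for struct a6xx_gpu. */
struct gpu_model {
	struct gmu_model gmu;
	int removals;		/* illustration only: counts remove calls */
};

/* Models a6xx_gmu_init(): the lock becomes usable here and not before. */
static int gmu_model_init(struct gpu_model *gpu)
{
	if (pthread_mutex_init(&gpu->gmu.lock, NULL) != 0)
		return -1;
	gpu->gmu.initialized = true;
	return 0;
}

/* Models a6xx_gmu_remove() after the patch: the caller has already
 * checked the flag and holds the lock, so there is no check here. */
static void gmu_model_remove(struct gpu_model *gpu)
{
	gpu->removals++;
	gpu->gmu.initialized = false;
}

/* Models a6xx_destroy(): test the flag before touching the lock, so an
 * early init failure never locks a mutex that was never initialised. */
static void gpu_model_destroy(struct gpu_model *gpu)
{
	struct gmu_model *gmu = &gpu->gmu;

	if (gmu->initialized) {
		pthread_mutex_lock(&gmu->lock);
		gmu_model_remove(gpu);
		pthread_mutex_unlock(&gmu->lock);
		/* User-space cleanup only; no counterpart in the patch. */
		pthread_mutex_destroy(&gmu->lock);
	}
}
```

Calling `gpu_model_destroy()` on a zeroed `struct gpu_model` that never reached `gmu_model_init()` skips the lock entirely, which corresponds to the init error path described in the commit message.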



2023-05-31 14:34:28

by Doug Anderson

Subject: Re: [PATCH] drm/msm/a6xx: fix uninitialised lock in init error path

Hi,

On Wed, May 31, 2023 at 1:00 AM Johan Hovold <[email protected]> wrote:
>
> A recent commit started taking the GMU lock in the GPU destroy path,
> which on GPU initialisation failure is called before the GMU and its
> lock have been initialised.
>
> Make sure that the GMU has been initialised before taking the lock in
> a6xx_destroy() and drop the now redundant check from a6xx_gmu_remove().
>
> Fixes: 4cd15a3e8b36 ("drm/msm/a6xx: Make GPU destroy a bit safer")
> Cc: [email protected] # 6.3
> Cc: Douglas Anderson <[email protected]>
> Signed-off-by: Johan Hovold <[email protected]>
> ---
> drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 3 ---
> drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 9 ++++++---
> 2 files changed, 6 insertions(+), 6 deletions(-)

I think Dmitry already posted a patch 1.5 months ago to fix this.

https://lore.kernel.org/r/[email protected]

Can you confirm that works for you?

-Doug

2023-06-01 09:13:45

by Johan Hovold

Subject: Re: [PATCH] drm/msm/a6xx: fix uninitialised lock in init error path

On Wed, May 31, 2023 at 07:22:49AM -0700, Doug Anderson wrote:
> Hi,
>
> On Wed, May 31, 2023 at 1:00 AM Johan Hovold <[email protected]> wrote:
> >
> > A recent commit started taking the GMU lock in the GPU destroy path,
> > which on GPU initialisation failure is called before the GMU and its
> > lock have been initialised.
> >
> > Make sure that the GMU has been initialised before taking the lock in
> > a6xx_destroy() and drop the now redundant check from a6xx_gmu_remove().
> >
> > Fixes: 4cd15a3e8b36 ("drm/msm/a6xx: Make GPU destroy a bit safer")
> > Cc: [email protected] # 6.3
> > Cc: Douglas Anderson <[email protected]>
> > Signed-off-by: Johan Hovold <[email protected]>
> > ---
> > drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 3 ---
> > drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 9 ++++++---
> > 2 files changed, 6 insertions(+), 6 deletions(-)
>
> I think Dmitry already posted a patch 1.5 months ago to fix this.
>
> https://lore.kernel.org/r/[email protected]

Bah, I checked if Bjorn had hit this with his recent A690 v3 series and
posted a fix, but did not look further than that.

> Can you confirm that works for you?

That looks like it would work too, but I think I prefer my version which
keeps the initialisation of the GMU struct in a6xx_gmu_init().

Dmitry or Rob, could you see to it that either version gets merged soon
so that we don't end up with even more people having to debug and fix
the same issue?

Johan
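
The trade-off discussed above can be sketched in user space. The thread does not quote the alternative patch, so the following rests on an assumption: that it makes the lock valid earlier, when the GPU object is set up, rather than inside the GMU init function. All names (`struct gpu_early`, `gpu_early_alloc()` and so on) are hypothetical, and a pthread mutex stands in for the kernel mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model types; not the driver's structures. */
struct gmu_early {
	pthread_mutex_t lock;	/* valid as soon as the object exists */
	bool initialized;
};

struct gpu_early {
	struct gmu_early gmu;
	int removals;		/* illustration only: counts real removals */
};

/* Assumed alternative: initialise the lock when the GPU object is created. */
static struct gpu_early *gpu_early_alloc(void)
{
	struct gpu_early *gpu = calloc(1, sizeof(*gpu));

	if (!gpu)
		return NULL;
	if (pthread_mutex_init(&gpu->gmu.lock, NULL) != 0) {
		free(gpu);
		return NULL;
	}
	return gpu;
}

/* Later GMU setup only flips the flag; the lock is already valid. */
static void gmu_early_init(struct gpu_early *gpu)
{
	gpu->gmu.initialized = true;
}

/* The remove helper keeps its own check, evaluated under the lock. */
static void gmu_early_remove(struct gpu_early *gpu)
{
	if (!gpu->gmu.initialized)
		return;
	gpu->removals++;
	gpu->gmu.initialized = false;
}

/* Destroy may take the lock unconditionally. Returns the number of
 * removals performed so callers can check what happened. */
static int gpu_early_destroy(struct gpu_early *gpu)
{
	int removals;

	pthread_mutex_lock(&gpu->gmu.lock);
	gmu_early_remove(gpu);
	removals = gpu->removals;
	pthread_mutex_unlock(&gpu->gmu.lock);

	pthread_mutex_destroy(&gpu->gmu.lock);
	free(gpu);
	return removals;
}
```

Under that assumption both variants avoid locking an uninitialised mutex. The difference is where the lock's lifetime begins: in the sketch above it starts with the enclosing object, whereas the posted patch keeps all GMU state, including the lock, set up in one place and guards the teardown instead.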