2024-04-05 15:32:45

by Erico Nunes

[permalink] [raw]
Subject: [PATCH v2 0/3] drm/lima: fix devfreq refcount imbalance for job timeouts

v1 reference:
https://patchwork.freedesktop.org/series/131902/

Changes v1 -> v2:
- Split synchronize_irq of pp bcast irq change into (new) patch 2.

Erico Nunes (3):
drm/lima: add mask irq callback to gp and pp
drm/lima: include pp bcast irq in timeout handler check
drm/lima: mask irqs in timeout path before hard reset

drivers/gpu/drm/lima/lima_bcast.c | 12 ++++++++++++
drivers/gpu/drm/lima/lima_bcast.h | 3 +++
drivers/gpu/drm/lima/lima_gp.c | 8 ++++++++
drivers/gpu/drm/lima/lima_pp.c | 18 ++++++++++++++++++
drivers/gpu/drm/lima/lima_sched.c | 9 +++++++++
drivers/gpu/drm/lima/lima_sched.h | 1 +
6 files changed, 51 insertions(+)

--
2.44.0



2024-04-05 15:32:59

by Erico Nunes

[permalink] [raw]
Subject: [PATCH v2 2/3] drm/lima: include pp bcast irq in timeout handler check

In commit 53cb55b20208 ("drm/lima: handle spurious timeouts due to high
irq latency") a check was added to detect an unexpectedly high interrupt
latency timeout.
With further investigation it was noted that on Mali-450 the pp bcast
irq may also be a trigger of race conditions against the timeout
handler, so add it to this check too.

Signed-off-by: Erico Nunes <[email protected]>
---
drivers/gpu/drm/lima/lima_sched.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
index 00b19adfc888..66841503a618 100644
--- a/drivers/gpu/drm/lima/lima_sched.c
+++ b/drivers/gpu/drm/lima/lima_sched.c
@@ -422,6 +422,8 @@ static enum drm_gpu_sched_stat lima_sched_timedout_job(struct drm_sched_job *job
*/
for (i = 0; i < pipe->num_processor; i++)
synchronize_irq(pipe->processor[i]->irq);
+ if (pipe->bcast_processor)
+ synchronize_irq(pipe->bcast_processor->irq);

if (dma_fence_is_signaled(task->fence)) {
DRM_WARN("%s unexpectedly high interrupt latency\n", lima_ip_name(ip));
--
2.44.0


2024-04-05 15:32:58

by Erico Nunes

[permalink] [raw]
Subject: [PATCH v2 3/3] drm/lima: mask irqs in timeout path before hard reset

There is a race condition in which a rendering job might take just long
enough to trigger the drm sched job timeout handler but also still
complete before the hard reset is done by the timeout handler.
This runs into race conditions not expected by the timeout handler.
In some very specific cases it currently may result in a refcount
imbalance on lima_pm_idle, with a stack dump such as:

[10136.669170] WARNING: CPU: 0 PID: 0 at drivers/gpu/drm/lima/lima_devfreq.c:205 lima_devfreq_record_idle+0xa0/0xb0
..
[10136.669459] pc : lima_devfreq_record_idle+0xa0/0xb0
..
[10136.669628] Call trace:
[10136.669634] lima_devfreq_record_idle+0xa0/0xb0
[10136.669646] lima_sched_pipe_task_done+0x5c/0xb0
[10136.669656] lima_gp_irq_handler+0xa8/0x120
[10136.669666] __handle_irq_event_percpu+0x48/0x160
[10136.669679] handle_irq_event+0x4c/0xc0

We can prevent that race condition entirely by masking the irqs at the
beginning of the timeout handler, at which point we give up on waiting
for that job entirely.
The irqs will be enabled again at the next hard reset which is already
done as a recovery by the timeout handler.

Signed-off-by: Erico Nunes <[email protected]>
Reviewed-by: Qiang Yu <[email protected]>
---
drivers/gpu/drm/lima/lima_sched.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
index 66841503a618..bbf3f8feab94 100644
--- a/drivers/gpu/drm/lima/lima_sched.c
+++ b/drivers/gpu/drm/lima/lima_sched.c
@@ -430,6 +430,13 @@ static enum drm_gpu_sched_stat lima_sched_timedout_job(struct drm_sched_job *job
return DRM_GPU_SCHED_STAT_NOMINAL;
}

+ /*
+ * The task might still finish while this timeout handler runs.
+ * To prevent a race condition on its completion, mask all irqs
+ * on the running core until the next hard reset completes.
+ */
+ pipe->task_mask_irq(pipe);
+
if (!pipe->error)
DRM_ERROR("%s job timeout\n", lima_ip_name(ip));

--
2.44.0


2024-04-05 15:33:29

by Erico Nunes

[permalink] [raw]
Subject: [PATCH v2 1/3] drm/lima: add mask irq callback to gp and pp

This is needed because we want to reset those devices in device-agnostic
code such as lima_sched.
In particular, masking irqs will be useful before a hard reset to
prevent race conditions.

Signed-off-by: Erico Nunes <[email protected]>
---
drivers/gpu/drm/lima/lima_bcast.c | 12 ++++++++++++
drivers/gpu/drm/lima/lima_bcast.h | 3 +++
drivers/gpu/drm/lima/lima_gp.c | 8 ++++++++
drivers/gpu/drm/lima/lima_pp.c | 18 ++++++++++++++++++
drivers/gpu/drm/lima/lima_sched.h | 1 +
5 files changed, 42 insertions(+)

diff --git a/drivers/gpu/drm/lima/lima_bcast.c b/drivers/gpu/drm/lima/lima_bcast.c
index fbc43f243c54..6d000504e1a4 100644
--- a/drivers/gpu/drm/lima/lima_bcast.c
+++ b/drivers/gpu/drm/lima/lima_bcast.c
@@ -43,6 +43,18 @@ void lima_bcast_suspend(struct lima_ip *ip)

}

+int lima_bcast_mask_irq(struct lima_ip *ip)
+{
+ bcast_write(LIMA_BCAST_BROADCAST_MASK, 0);
+ bcast_write(LIMA_BCAST_INTERRUPT_MASK, 0);
+ return 0;
+}
+
+int lima_bcast_reset(struct lima_ip *ip)
+{
+ return lima_bcast_hw_init(ip);
+}
+
int lima_bcast_init(struct lima_ip *ip)
{
int i;
diff --git a/drivers/gpu/drm/lima/lima_bcast.h b/drivers/gpu/drm/lima/lima_bcast.h
index 465ee587bceb..cd08841e4787 100644
--- a/drivers/gpu/drm/lima/lima_bcast.h
+++ b/drivers/gpu/drm/lima/lima_bcast.h
@@ -13,4 +13,7 @@ void lima_bcast_fini(struct lima_ip *ip);

void lima_bcast_enable(struct lima_device *dev, int num_pp);

+int lima_bcast_mask_irq(struct lima_ip *ip);
+int lima_bcast_reset(struct lima_ip *ip);
+
#endif
diff --git a/drivers/gpu/drm/lima/lima_gp.c b/drivers/gpu/drm/lima/lima_gp.c
index 6b354e2fb61d..e15295071533 100644
--- a/drivers/gpu/drm/lima/lima_gp.c
+++ b/drivers/gpu/drm/lima/lima_gp.c
@@ -233,6 +233,13 @@ static void lima_gp_task_mmu_error(struct lima_sched_pipe *pipe)
lima_sched_pipe_task_done(pipe);
}

+static void lima_gp_task_mask_irq(struct lima_sched_pipe *pipe)
+{
+ struct lima_ip *ip = pipe->processor[0];
+
+ gp_write(LIMA_GP_INT_MASK, 0);
+}
+
static int lima_gp_task_recover(struct lima_sched_pipe *pipe)
{
struct lima_ip *ip = pipe->processor[0];
@@ -365,6 +372,7 @@ int lima_gp_pipe_init(struct lima_device *dev)
pipe->task_error = lima_gp_task_error;
pipe->task_mmu_error = lima_gp_task_mmu_error;
pipe->task_recover = lima_gp_task_recover;
+ pipe->task_mask_irq = lima_gp_task_mask_irq;

return 0;
}
diff --git a/drivers/gpu/drm/lima/lima_pp.c b/drivers/gpu/drm/lima/lima_pp.c
index d0d2db0ef1ce..a4a2ffe6527c 100644
--- a/drivers/gpu/drm/lima/lima_pp.c
+++ b/drivers/gpu/drm/lima/lima_pp.c
@@ -429,6 +429,9 @@ static void lima_pp_task_error(struct lima_sched_pipe *pipe)

lima_pp_hard_reset(ip);
}
+
+ if (pipe->bcast_processor)
+ lima_bcast_reset(pipe->bcast_processor);
}

static void lima_pp_task_mmu_error(struct lima_sched_pipe *pipe)
@@ -437,6 +440,20 @@ static void lima_pp_task_mmu_error(struct lima_sched_pipe *pipe)
lima_sched_pipe_task_done(pipe);
}

+static void lima_pp_task_mask_irq(struct lima_sched_pipe *pipe)
+{
+ int i;
+
+ for (i = 0; i < pipe->num_processor; i++) {
+ struct lima_ip *ip = pipe->processor[i];
+
+ pp_write(LIMA_PP_INT_MASK, 0);
+ }
+
+ if (pipe->bcast_processor)
+ lima_bcast_mask_irq(pipe->bcast_processor);
+}
+
static struct kmem_cache *lima_pp_task_slab;
static int lima_pp_task_slab_refcnt;

@@ -468,6 +485,7 @@ int lima_pp_pipe_init(struct lima_device *dev)
pipe->task_fini = lima_pp_task_fini;
pipe->task_error = lima_pp_task_error;
pipe->task_mmu_error = lima_pp_task_mmu_error;
+ pipe->task_mask_irq = lima_pp_task_mask_irq;

return 0;
}
diff --git a/drivers/gpu/drm/lima/lima_sched.h b/drivers/gpu/drm/lima/lima_sched.h
index 6bd4f3b70109..85b23ba901d5 100644
--- a/drivers/gpu/drm/lima/lima_sched.h
+++ b/drivers/gpu/drm/lima/lima_sched.h
@@ -80,6 +80,7 @@ struct lima_sched_pipe {
void (*task_error)(struct lima_sched_pipe *pipe);
void (*task_mmu_error)(struct lima_sched_pipe *pipe);
int (*task_recover)(struct lima_sched_pipe *pipe);
+ void (*task_mask_irq)(struct lima_sched_pipe *pipe);

struct work_struct recover_work;
};
--
2.44.0


2024-04-06 07:32:58

by Qiang Yu

[permalink] [raw]
Subject: Re: [PATCH v2 0/3] drm/lima: fix devfreq refcount imbalance for job timeouts

Serial is Reviewed-by: Qiang Yu <[email protected]>

On Fri, Apr 5, 2024 at 11:31 PM Erico Nunes <[email protected]> wrote:
>
> v1 reference:
> https://patchwork.freedesktop.org/series/131902/
>
> Changes v1 -> v2:
> - Split synchronize_irq of pp bcast irq change into (new) patch 2.
>
> Erico Nunes (3):
> drm/lima: add mask irq callback to gp and pp
> drm/lima: include pp bcast irq in timeout handler check
> drm/lima: mask irqs in timeout path before hard reset
>
> drivers/gpu/drm/lima/lima_bcast.c | 12 ++++++++++++
> drivers/gpu/drm/lima/lima_bcast.h | 3 +++
> drivers/gpu/drm/lima/lima_gp.c | 8 ++++++++
> drivers/gpu/drm/lima/lima_pp.c | 18 ++++++++++++++++++
> drivers/gpu/drm/lima/lima_sched.c | 9 +++++++++
> drivers/gpu/drm/lima/lima_sched.h | 1 +
> 6 files changed, 51 insertions(+)
>
> --
> 2.44.0
>

2024-04-15 01:18:57

by Qiang Yu

[permalink] [raw]
Subject: Re: [PATCH v2 0/3] drm/lima: fix devfreq refcount imbalance for job timeouts

applied to drm-misc-next

On Sat, Apr 6, 2024 at 3:32 PM Qiang Yu <[email protected]> wrote:
>
> Serial is Reviewed-by: Qiang Yu <[email protected]>
>
> On Fri, Apr 5, 2024 at 11:31 PM Erico Nunes <[email protected]> wrote:
> >
> > v1 reference:
> > https://patchwork.freedesktop.org/series/131902/
> >
> > Changes v1 -> v2:
> > - Split synchronize_irq of pp bcast irq change into (new) patch 2.
> >
> > Erico Nunes (3):
> > drm/lima: add mask irq callback to gp and pp
> > drm/lima: include pp bcast irq in timeout handler check
> > drm/lima: mask irqs in timeout path before hard reset
> >
> > drivers/gpu/drm/lima/lima_bcast.c | 12 ++++++++++++
> > drivers/gpu/drm/lima/lima_bcast.h | 3 +++
> > drivers/gpu/drm/lima/lima_gp.c | 8 ++++++++
> > drivers/gpu/drm/lima/lima_pp.c | 18 ++++++++++++++++++
> > drivers/gpu/drm/lima/lima_sched.c | 9 +++++++++
> > drivers/gpu/drm/lima/lima_sched.h | 1 +
> > 6 files changed, 51 insertions(+)
> >
> > --
> > 2.44.0
> >