2024-01-17 03:13:33

by Erico Nunes

Subject: [PATCH v1 5/6] drm/lima: remove guilty drm_sched context handling

Currently, the only effect of marking the context as guilty is that an
application which hits a single timeout has its rendering context
stopped entirely: all jobs submitted later to the guilty context are
dropped.

Lima runs on hardware that is fairly underpowered by modern standards,
so it is not entirely unreasonable for a rendering job to time out
occasionally due to high system load or an overly demanding application
stack. In this case it is generally preferable to report the error but
try to keep the application running.

Other similar embedded GPU drivers don't make use of the guilty context
flag. Now that the lima timeout recovery handling has received
reliability improvements, drop the guilty context handling so that the
application can keep running in this case.

Signed-off-by: Erico Nunes <[email protected]>
---
 drivers/gpu/drm/lima/lima_ctx.c   | 2 +-
 drivers/gpu/drm/lima/lima_ctx.h   | 1 -
 drivers/gpu/drm/lima/lima_sched.c | 5 ++---
 drivers/gpu/drm/lima/lima_sched.h | 3 +--
 4 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/lima/lima_ctx.c b/drivers/gpu/drm/lima/lima_ctx.c
index 8389f2d7d021..0e668fc1e0f9 100644
--- a/drivers/gpu/drm/lima/lima_ctx.c
+++ b/drivers/gpu/drm/lima/lima_ctx.c
@@ -19,7 +19,7 @@ int lima_ctx_create(struct lima_device *dev, struct lima_ctx_mgr *mgr, u32 *id)
 	kref_init(&ctx->refcnt);
 
 	for (i = 0; i < lima_pipe_num; i++) {
-		err = lima_sched_context_init(dev->pipe + i, ctx->context + i, &ctx->guilty);
+		err = lima_sched_context_init(dev->pipe + i, ctx->context + i);
 		if (err)
 			goto err_out0;
 	}
diff --git a/drivers/gpu/drm/lima/lima_ctx.h b/drivers/gpu/drm/lima/lima_ctx.h
index 74e2be09090f..5b1063ce968b 100644
--- a/drivers/gpu/drm/lima/lima_ctx.h
+++ b/drivers/gpu/drm/lima/lima_ctx.h
@@ -13,7 +13,6 @@ struct lima_ctx {
 	struct kref refcnt;
 	struct lima_device *dev;
 	struct lima_sched_context context[lima_pipe_num];
-	atomic_t guilty;
 
 	/* debug info */
 	char pname[TASK_COMM_LEN];
diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
index 9449b81bcd5b..496c79713fe8 100644
--- a/drivers/gpu/drm/lima/lima_sched.c
+++ b/drivers/gpu/drm/lima/lima_sched.c
@@ -154,13 +154,12 @@ void lima_sched_task_fini(struct lima_sched_task *task)
 }
 
 int lima_sched_context_init(struct lima_sched_pipe *pipe,
-			    struct lima_sched_context *context,
-			    atomic_t *guilty)
+			    struct lima_sched_context *context)
 {
 	struct drm_gpu_scheduler *sched = &pipe->base;
 
 	return drm_sched_entity_init(&context->base, DRM_SCHED_PRIORITY_NORMAL,
-				     &sched, 1, guilty);
+				     &sched, 1, NULL);
 }
 
 void lima_sched_context_fini(struct lima_sched_pipe *pipe,
diff --git a/drivers/gpu/drm/lima/lima_sched.h b/drivers/gpu/drm/lima/lima_sched.h
index 34050facb110..677e908b53f8 100644
--- a/drivers/gpu/drm/lima/lima_sched.h
+++ b/drivers/gpu/drm/lima/lima_sched.h
@@ -93,8 +93,7 @@ int lima_sched_task_init(struct lima_sched_task *task,
 void lima_sched_task_fini(struct lima_sched_task *task);
 
 int lima_sched_context_init(struct lima_sched_pipe *pipe,
-			    struct lima_sched_context *context,
-			    atomic_t *guilty);
+			    struct lima_sched_context *context);
 void lima_sched_context_fini(struct lima_sched_pipe *pipe,
 			     struct lima_sched_context *context);
 struct dma_fence *lima_sched_context_queue_task(struct lima_sched_task *task);
--
2.43.0



2024-01-17 18:28:42

by Vasily Khoruzhick

Subject: Re: [PATCH v1 5/6] drm/lima: remove guilty drm_sched context handling

On Tue, Jan 16, 2024 at 7:12 PM Erico Nunes <[email protected]> wrote:
>
> Currently, the only effect of marking the context as guilty is that an
> application which hits a single timeout has its rendering context
> stopped entirely: all jobs submitted later to the guilty context are
> dropped.
>
> Lima runs on hardware that is fairly underpowered by modern standards,
> so it is not entirely unreasonable for a rendering job to time out
> occasionally due to high system load or an overly demanding application
> stack. In this case it is generally preferable to report the error but
> try to keep the application running.
>
> Other similar embedded GPU drivers don't make use of the guilty context
> flag. Now that the lima timeout recovery handling has received
> reliability improvements, drop the guilty context handling so that the
> application can keep running in this case.
>
> Signed-off-by: Erico Nunes <[email protected]>
Reviewed-by: Vasily Khoruzhick <[email protected]>
