LinuxLists.cc - [PATCH v1] gpu: host1x: Cancel only job that actually got stuck

2018-08-05 17:03:45

Subject: [PATCH v1] gpu: host1x: Cancel only job that actually got stuck

Host1x doesn't have information about jobs inter-dependency, that is
something that will become available once host1x will get a proper
jobs scheduler implementation. Currently a hang job causes other unrelated
jobs to be canceled, that is a relic from downstream driver which is
irrelevant to upstream. Let's cancel only the hanging job and not to touch
other jobs in queue.

Signed-off-by: Dmitry Osipenko <[email protected]>
---
drivers/gpu/host1x/cdma.c | 30 ++++++------------------------
1 file changed, 6 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/host1x/cdma.c b/drivers/gpu/host1x/cdma.c
index 91df51e631b2..4d94af4a315f 100644
--- a/drivers/gpu/host1x/cdma.c
+++ b/drivers/gpu/host1x/cdma.c
@@ -348,13 +348,11 @@ void host1x_cdma_update_sync_queue(struct host1x_cdma *cdma,
}

/*
- * Walk the sync_queue, first incrementing with the CPU syncpts that
- * are partially executed (the first buffer) or fully skipped while
- * still in the current context (slots are also NOP-ed).
+ * Increment with CPU the remaining syncpts of a partially executed job.
*
- * At the point contexts are interleaved, syncpt increments must be
- * done inline with the pushbuffer from a GATHER buffer to maintain
- * the order (slots are modified to be a GATHER of syncpt incrs).
+ * Syncpt increments must be done inline with the pushbuffer from a
+ * GATHER buffer to maintain the order (slots are modified to be a
+ * GATHER of syncpt incrs).
*
* Note: save in restart_addr the location where the timed out buffer
* started in the PB, so we can start the refetch from there (with the
@@ -370,12 +368,8 @@ void host1x_cdma_update_sync_queue(struct host1x_cdma *cdma,
else
restart_addr = cdma->last_pos;

- /* do CPU increments as long as this context continues */
- list_for_each_entry_from(job, &cdma->sync_queue, list) {
- /* different context, gets us out of this loop */
- if (job->client != cdma->timeout.client)
- break;
-
+ /* do CPU increments for the remaining syncpts */
+ if (job) {
/* won't need a timeout when replayed */
job->timeout = 0;

@@ -388,20 +382,8 @@ void host1x_cdma_update_sync_queue(struct host1x_cdma *cdma,
host1x_hw_cdma_timeout_cpu_incr(host1x, cdma, job->first_get,
syncpt_incrs, job->syncpt_end,
job->num_slots);
-
- syncpt_val += syncpt_incrs;
}

- /*
- * The following sumbits from the same client may be dependent on the
- * failed submit and therefore they may fail. Force a small timeout
- * to make the queue cleanup faster.
- */
-
- list_for_each_entry_from(job, &cdma->sync_queue, list)
- if (job->client == cdma->timeout.client)
- job->timeout = min_t(unsigned int, job->timeout, 500);
-
dev_dbg(dev, "%s: finished sync_queue modification\n", __func__);

/* roll back DMAGET and start up channel again */
--
2.18.0

2018-08-06 10:06:20

by Mikko Perttunen

[permalink] [raw]

Subject: Re: [PATCH v1] gpu: host1x: Cancel only job that actually got stuck

Reviewed-by: Mikko Perttunen <[email protected]>

On 05.08.2018 20:01, Dmitry Osipenko wrote:
> Host1x doesn't have information about jobs inter-dependency, that is
> something that will become available once host1x will get a proper
> jobs scheduler implementation. Currently a hang job causes other unrelated
> jobs to be canceled, that is a relic from downstream driver which is
> irrelevant to upstream. Let's cancel only the hanging job and not to touch
> other jobs in queue.
>
> Signed-off-by: Dmitry Osipenko <[email protected]>
> ---
> drivers/gpu/host1x/cdma.c | 30 ++++++------------------------
> 1 file changed, 6 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/gpu/host1x/cdma.c b/drivers/gpu/host1x/cdma.c
> index 91df51e631b2..4d94af4a315f 100644
> --- a/drivers/gpu/host1x/cdma.c
> +++ b/drivers/gpu/host1x/cdma.c
> @@ -348,13 +348,11 @@ void host1x_cdma_update_sync_queue(struct host1x_cdma *cdma,
> }
>
> /*
> - * Walk the sync_queue, first incrementing with the CPU syncpts that
> - * are partially executed (the first buffer) or fully skipped while
> - * still in the current context (slots are also NOP-ed).
> + * Increment with CPU the remaining syncpts of a partially executed job.
> *
> - * At the point contexts are interleaved, syncpt increments must be
> - * done inline with the pushbuffer from a GATHER buffer to maintain
> - * the order (slots are modified to be a GATHER of syncpt incrs).
> + * Syncpt increments must be done inline with the pushbuffer from a
> + * GATHER buffer to maintain the order (slots are modified to be a
> + * GATHER of syncpt incrs).
> *
> * Note: save in restart_addr the location where the timed out buffer
> * started in the PB, so we can start the refetch from there (with the
> @@ -370,12 +368,8 @@ void host1x_cdma_update_sync_queue(struct host1x_cdma *cdma,
> else
> restart_addr = cdma->last_pos;
>
> - /* do CPU increments as long as this context continues */
> - list_for_each_entry_from(job, &cdma->sync_queue, list) {
> - /* different context, gets us out of this loop */
> - if (job->client != cdma->timeout.client)
> - break;
> -
> + /* do CPU increments for the remaining syncpts */
> + if (job) {
> /* won't need a timeout when replayed */
> job->timeout = 0;
>
> @@ -388,20 +382,8 @@ void host1x_cdma_update_sync_queue(struct host1x_cdma *cdma,
> host1x_hw_cdma_timeout_cpu_incr(host1x, cdma, job->first_get,
> syncpt_incrs, job->syncpt_end,
> job->num_slots);
> -
> - syncpt_val += syncpt_incrs;
> }
>
> - /*
> - * The following sumbits from the same client may be dependent on the
> - * failed submit and therefore they may fail. Force a small timeout
> - * to make the queue cleanup faster.
> - */
> -
> - list_for_each_entry_from(job, &cdma->sync_queue, list)
> - if (job->client == cdma->timeout.client)
> - job->timeout = min_t(unsigned int, job->timeout, 500);
> -
> dev_dbg(dev, "%s: finished sync_queue modification\n", __func__);
>
> /* roll back DMAGET and start up channel again */
>