2022-09-05 16:36:38

by Arvind Yadav

[permalink] [raw]
Subject: [PATCH v2 0/4] dma-buf: To check enable signaling before signaled

TTM, GEM, DRM or the core DMA-buf framework are needs
to enable software signaling before the fence is signaled.
The core DMA-buf framework software can forget to call
enable_signaling before the fence is signaled. It means
framework code can forget to call dma_fence_enable_sw_signaling()
before calling dma_fence_is_signaled(). To avoid this scenario
on debug kernel, check the DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT bit
status before checking the MA_FENCE_FLAG_SIGNALED_BIT bit status
to confirm that software signaling is enabled.


Arvind Yadav (4):
[PATCH v2 1/4] drm/sched: Enable signaling for finished fence
[PATCH v2 2/4] dma-buf: enable signaling for the stub fence on debug
[PATCH v2 3/4] dma-buf: enable signaling for selftest fence on debug
[PATCH v2 4/4] dma-buf: Check status of enable-signaling bit on debug

drivers/dma-buf/dma-fence.c | 7 ++++
drivers/dma-buf/st-dma-fence-chain.c | 8 +++++
drivers/dma-buf/st-dma-fence-unwrap.c | 44 ++++++++++++++++++++++++++
drivers/dma-buf/st-dma-fence.c | 25 ++++++++++++++-
drivers/dma-buf/st-dma-resv.c | 20 ++++++++++++
drivers/gpu/drm/scheduler/sched_main.c | 2 ++
include/linux/dma-fence.h | 5 +++
7 files changed, 110 insertions(+), 1 deletion(-)

--
2.25.1


2022-09-05 16:41:52

by Arvind Yadav

[permalink] [raw]
Subject: [PATCH v2 2/4] dma-buf: enable signaling for the stub fence on debug

Here's on debug enabling software signaling for the stub fence
which is always signaled. This fence should enable software
signaling otherwise the AMD GPU scheduler will cause a GPU reset
due to a GPU scheduler cleanup activity timeout.

Signed-off-by: Arvind Yadav <[email protected]>
---

Changes in v1 :
1- Addressing Christian's comment to remove unnecessary callback.
2- Replacing CONFIG_DEBUG_WW_MUTEX_SLOWPATH instead of CONFIG_DEBUG_FS.
3- The version of this patch is also changed and previously
it was [PATCH 3/4]

---
drivers/dma-buf/dma-fence.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 066400ed8841..2378b12538c4 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -27,6 +27,10 @@ EXPORT_TRACEPOINT_SYMBOL(dma_fence_signaled);
static DEFINE_SPINLOCK(dma_fence_stub_lock);
static struct dma_fence dma_fence_stub;

+#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
+static bool __dma_fence_enable_signaling(struct dma_fence *fence);
+#endif
+
/*
* fence context counter: each execution context should have its own
* fence context, this allows checking if fences belong to the same
@@ -136,6 +140,9 @@ struct dma_fence *dma_fence_get_stub(void)
&dma_fence_stub_ops,
&dma_fence_stub_lock,
0, 0);
+#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
+ __dma_fence_enable_signaling(&dma_fence_stub);
+#endif
dma_fence_signal_locked(&dma_fence_stub);
}
spin_unlock(&dma_fence_stub_lock);
--
2.25.1

2022-09-05 16:45:44

by Arvind Yadav

[permalink] [raw]
Subject: [PATCH v2 1/4] drm/sched: Enable signaling for finished fence

Here's enabling software signaling for finished fence.

Signed-off-by: Arvind Yadav <[email protected]>
---

Changes in v1 :
1- Addressing Christian's comment to remove CONFIG_DEBUG_FS check from
this patch.
2- The version of this patch is also changed and previously
it was [PATCH 2/4]

---
drivers/gpu/drm/scheduler/sched_main.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index e0ab14e0fb6b..fe72de0e2911 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -962,6 +962,8 @@ static int drm_sched_main(void *param)
/* Drop for original kref_init of the fence */
dma_fence_put(fence);

+ dma_fence_enable_sw_signaling(&s_fence->finished);
+
r = dma_fence_add_callback(fence, &sched_job->cb,
drm_sched_job_done_cb);
if (r == -ENOENT)
--
2.25.1

2022-09-05 16:49:43

by Arvind Yadav

[permalink] [raw]
Subject: [PATCH v2 3/4] dma-buf: enable signaling for selftest fence on debug

Here's on debug enabling software signaling for selftest.

Signed-off-by: Arvind Yadav <[email protected]>
---

Changes in v1 :
1- Addressing Christian's comment to remove unnecessary callback.
2- Replacing CONFIG_DEBUG_WW_MUTEX_SLOWPATH instead of CONFIG_DEBUG_FS.
3- The version of this patch is also changed and previously
it was [PATCH 4/4]

---
drivers/dma-buf/st-dma-fence-chain.c | 8 +++++
drivers/dma-buf/st-dma-fence-unwrap.c | 44 +++++++++++++++++++++++++++
drivers/dma-buf/st-dma-fence.c | 25 ++++++++++++++-
drivers/dma-buf/st-dma-resv.c | 20 ++++++++++++
4 files changed, 96 insertions(+), 1 deletion(-)

diff --git a/drivers/dma-buf/st-dma-fence-chain.c b/drivers/dma-buf/st-dma-fence-chain.c
index 8ce1ea59d31b..d3070f8a393c 100644
--- a/drivers/dma-buf/st-dma-fence-chain.c
+++ b/drivers/dma-buf/st-dma-fence-chain.c
@@ -87,6 +87,10 @@ static int sanitycheck(void *arg)
if (!chain)
err = -ENOMEM;

+#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
+ dma_fence_enable_sw_signaling(chain);
+#endif
+
dma_fence_signal(f);
dma_fence_put(f);

@@ -143,6 +147,10 @@ static int fence_chains_init(struct fence_chains *fc, unsigned int count,
}

fc->tail = fc->chains[i];
+
+#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
+ dma_fence_enable_sw_signaling(fc->chains[i]);
+#endif
}

fc->chain_length = i;
diff --git a/drivers/dma-buf/st-dma-fence-unwrap.c b/drivers/dma-buf/st-dma-fence-unwrap.c
index 4105d5ea8dde..b76cdd9ee0c7 100644
--- a/drivers/dma-buf/st-dma-fence-unwrap.c
+++ b/drivers/dma-buf/st-dma-fence-unwrap.c
@@ -102,6 +102,10 @@ static int sanitycheck(void *arg)
if (!f)
return -ENOMEM;

+#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
+ dma_fence_enable_sw_signaling(f);
+#endif
+
array = mock_array(1, f);
if (!array)
return -ENOMEM;
@@ -124,12 +128,20 @@ static int unwrap_array(void *arg)
if (!f1)
return -ENOMEM;

+#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
+ dma_fence_enable_sw_signaling(f1);
+#endif
+
f2 = mock_fence();
if (!f2) {
dma_fence_put(f1);
return -ENOMEM;
}

+#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
+ dma_fence_enable_sw_signaling(f2);
+#endif
+
array = mock_array(2, f1, f2);
if (!array)
return -ENOMEM;
@@ -164,12 +176,20 @@ static int unwrap_chain(void *arg)
if (!f1)
return -ENOMEM;

+#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
+ dma_fence_enable_sw_signaling(f1);
+#endif
+
f2 = mock_fence();
if (!f2) {
dma_fence_put(f1);
return -ENOMEM;
}

+#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
+ dma_fence_enable_sw_signaling(f2);
+#endif
+
chain = mock_chain(f1, f2);
if (!chain)
return -ENOMEM;
@@ -204,12 +224,20 @@ static int unwrap_chain_array(void *arg)
if (!f1)
return -ENOMEM;

+#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
+ dma_fence_enable_sw_signaling(f1);
+#endif
+
f2 = mock_fence();
if (!f2) {
dma_fence_put(f1);
return -ENOMEM;
}

+#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
+ dma_fence_enable_sw_signaling(f2);
+#endif
+
array = mock_array(2, f1, f2);
if (!array)
return -ENOMEM;
@@ -248,12 +276,20 @@ static int unwrap_merge(void *arg)
if (!f1)
return -ENOMEM;

+#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
+ dma_fence_enable_sw_signaling(f1);
+#endif
+
f2 = mock_fence();
if (!f2) {
err = -ENOMEM;
goto error_put_f1;
}

+#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
+ dma_fence_enable_sw_signaling(f2);
+#endif
+
f3 = dma_fence_unwrap_merge(f1, f2);
if (!f3) {
err = -ENOMEM;
@@ -296,10 +332,18 @@ static int unwrap_merge_complex(void *arg)
if (!f1)
return -ENOMEM;

+#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
+ dma_fence_enable_sw_signaling(f1);
+#endif
+
f2 = mock_fence();
if (!f2)
goto error_put_f1;

+#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
+ dma_fence_enable_sw_signaling(f2);
+#endif
+
f3 = dma_fence_unwrap_merge(f1, f2);
if (!f3)
goto error_put_f2;
diff --git a/drivers/dma-buf/st-dma-fence.c b/drivers/dma-buf/st-dma-fence.c
index c8a12d7ad71a..b7880d8374db 100644
--- a/drivers/dma-buf/st-dma-fence.c
+++ b/drivers/dma-buf/st-dma-fence.c
@@ -101,7 +101,9 @@ static int sanitycheck(void *arg)
f = mock_fence();
if (!f)
return -ENOMEM;
-
+#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
+ dma_fence_enable_sw_signaling(f);
+#endif
dma_fence_signal(f);
dma_fence_put(f);

@@ -117,6 +119,9 @@ static int test_signaling(void *arg)
if (!f)
return -ENOMEM;

+#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
+ dma_fence_enable_sw_signaling(f);
+#endif
if (dma_fence_is_signaled(f)) {
pr_err("Fence unexpectedly signaled on creation\n");
goto err_free;
@@ -190,6 +195,9 @@ static int test_late_add_callback(void *arg)
if (!f)
return -ENOMEM;

+#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
+ dma_fence_enable_sw_signaling(f);
+#endif
dma_fence_signal(f);

if (!dma_fence_add_callback(f, &cb.cb, simple_callback)) {
@@ -282,6 +290,9 @@ static int test_status(void *arg)
if (!f)
return -ENOMEM;

+#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
+ dma_fence_enable_sw_signaling(f);
+#endif
if (dma_fence_get_status(f)) {
pr_err("Fence unexpectedly has signaled status on creation\n");
goto err_free;
@@ -308,6 +319,9 @@ static int test_error(void *arg)
if (!f)
return -ENOMEM;

+#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
+ dma_fence_enable_sw_signaling(f);
+#endif
dma_fence_set_error(f, -EIO);

if (dma_fence_get_status(f)) {
@@ -337,6 +351,9 @@ static int test_wait(void *arg)
if (!f)
return -ENOMEM;

+#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
+ dma_fence_enable_sw_signaling(f);
+#endif
if (dma_fence_wait_timeout(f, false, 0) != -ETIME) {
pr_err("Wait reported complete before being signaled\n");
goto err_free;
@@ -379,6 +396,9 @@ static int test_wait_timeout(void *arg)
if (!wt.f)
return -ENOMEM;

+#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
+ dma_fence_enable_sw_signaling(wt.f);
+#endif
if (dma_fence_wait_timeout(wt.f, false, 1) != -ETIME) {
pr_err("Wait reported complete before being signaled\n");
goto err_free;
@@ -458,6 +478,9 @@ static int thread_signal_callback(void *arg)
break;
}

+#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
+ dma_fence_enable_sw_signaling(f1);
+#endif
rcu_assign_pointer(t->fences[t->id], f1);
smp_wmb();

diff --git a/drivers/dma-buf/st-dma-resv.c b/drivers/dma-buf/st-dma-resv.c
index 813779e3c9be..bd7ef58f8b24 100644
--- a/drivers/dma-buf/st-dma-resv.c
+++ b/drivers/dma-buf/st-dma-resv.c
@@ -45,6 +45,10 @@ static int sanitycheck(void *arg)
if (!f)
return -ENOMEM;

+#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
+ dma_fence_enable_sw_signaling(f);
+#endif
+
dma_fence_signal(f);
dma_fence_put(f);

@@ -69,6 +73,10 @@ static int test_signaling(void *arg)
if (!f)
return -ENOMEM;

+#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
+ dma_fence_enable_sw_signaling(f);
+#endif
+
dma_resv_init(&resv);
r = dma_resv_lock(&resv, NULL);
if (r) {
@@ -114,6 +122,10 @@ static int test_for_each(void *arg)
if (!f)
return -ENOMEM;

+#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
+ dma_fence_enable_sw_signaling(f);
+#endif
+
dma_resv_init(&resv);
r = dma_resv_lock(&resv, NULL);
if (r) {
@@ -173,6 +185,10 @@ static int test_for_each_unlocked(void *arg)
if (!f)
return -ENOMEM;

+#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
+ dma_fence_enable_sw_signaling(f);
+#endif
+
dma_resv_init(&resv);
r = dma_resv_lock(&resv, NULL);
if (r) {
@@ -244,6 +260,10 @@ static int test_get_fences(void *arg)
if (!f)
return -ENOMEM;

+#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
+ dma_fence_enable_sw_signaling(f);
+#endif
+
dma_resv_init(&resv);
r = dma_resv_lock(&resv, NULL);
if (r) {
--
2.25.1

2022-09-05 17:27:48

by Arvind Yadav

[permalink] [raw]
Subject: [PATCH v2 4/4] dma-buf: Check status of enable-signaling bit on debug

The core DMA-buf framework needs to enable signaling
before the fence is signaled. The core DMA-buf framework
can forget to enable signaling before the fence is signaled.
To avoid this scenario on the debug kernel, check the
DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT status bit before checking
the signaling bit status to confirm that enable_signaling
is enabled.

Signed-off-by: Arvind Yadav <[email protected]>
---

Changes in v1 :
1- Addressing Christian's comment to replace
CONFIG_DEBUG_WW_MUTEX_SLOWPATH instead of CONFIG_DEBUG_FS.
2- As per Christian's comment moving this patch at last so
The version of this patch is also changed and previously
it was [PATCH 1/4]


---
include/linux/dma-fence.h | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
index 775cdc0b4f24..ba1ddc14c5d4 100644
--- a/include/linux/dma-fence.h
+++ b/include/linux/dma-fence.h
@@ -428,6 +428,11 @@ dma_fence_is_signaled_locked(struct dma_fence *fence)
static inline bool
dma_fence_is_signaled(struct dma_fence *fence)
{
+#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
+ if (!test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags))
+ return false;
+#endif
+
if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
return true;

--
2.25.1

2022-09-06 07:11:41

by Christian König

[permalink] [raw]
Subject: Re: [PATCH v2 1/4] drm/sched: Enable signaling for finished fence

Am 05.09.22 um 18:34 schrieb Arvind Yadav:
> Here's enabling software signaling for finished fence.
>
> Signed-off-by: Arvind Yadav <[email protected]>
> ---
>
> Changes in v1 :
> 1- Addressing Christian's comment to remove CONFIG_DEBUG_FS check from
> this patch.
> 2- The version of this patch is also changed and previously
> it was [PATCH 2/4]
>
> ---
> drivers/gpu/drm/scheduler/sched_main.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index e0ab14e0fb6b..fe72de0e2911 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -962,6 +962,8 @@ static int drm_sched_main(void *param)
> /* Drop for original kref_init of the fence */
> dma_fence_put(fence);
>
> + dma_fence_enable_sw_signaling(&s_fence->finished);

Ok, this makes it a lot clearer. Previously I though that we have some
bug in dma_fence_add_callback().

This is essentially the wrong place to call this, the finished fence
should be enabled by the caller and not here.

There is also another problem in dma_fence_enable_sw_signaling(), it
returns early when the fence is already signaled:

        if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
                return;

Please remove that one first.

Thanks,
Christian.


> +
> r = dma_fence_add_callback(fence, &sched_job->cb,
> drm_sched_job_done_cb);
> if (r == -ENOENT)

2022-09-06 07:38:20

by Christian König

[permalink] [raw]
Subject: Re: [PATCH v2 2/4] dma-buf: enable signaling for the stub fence on debug



Am 05.09.22 um 18:35 schrieb Arvind Yadav:
> Here's on debug enabling software signaling for the stub fence
> which is always signaled. This fence should enable software
> signaling otherwise the AMD GPU scheduler will cause a GPU reset
> due to a GPU scheduler cleanup activity timeout.
>
> Signed-off-by: Arvind Yadav <[email protected]>
> ---
>
> Changes in v1 :
> 1- Addressing Christian's comment to remove unnecessary callback.
> 2- Replacing CONFIG_DEBUG_WW_MUTEX_SLOWPATH instead of CONFIG_DEBUG_FS.
> 3- The version of this patch is also changed and previously
> it was [PATCH 3/4]
>
> ---
> drivers/dma-buf/dma-fence.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index 066400ed8841..2378b12538c4 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -27,6 +27,10 @@ EXPORT_TRACEPOINT_SYMBOL(dma_fence_signaled);
> static DEFINE_SPINLOCK(dma_fence_stub_lock);
> static struct dma_fence dma_fence_stub;
>
> +#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
> +static bool __dma_fence_enable_signaling(struct dma_fence *fence);
> +#endif
> +

I would rename the function to something like
dma_fence_enable_signaling_locked().

And please don't add any #ifdef if it isn't absolutely necessary. This
makes the code pretty fragile.

> /*
> * fence context counter: each execution context should have its own
> * fence context, this allows checking if fences belong to the same
> @@ -136,6 +140,9 @@ struct dma_fence *dma_fence_get_stub(void)
> &dma_fence_stub_ops,
> &dma_fence_stub_lock,
> 0, 0);
> +#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
> + __dma_fence_enable_signaling(&dma_fence_stub);
> +#endif

Alternatively in this particular case you could just set the bit
manually here since this is part of the dma_fence code anyway.

Christian.

> dma_fence_signal_locked(&dma_fence_stub);
> }
> spin_unlock(&dma_fence_stub_lock);

2022-09-06 07:44:04

by Christian König

[permalink] [raw]
Subject: Re: [PATCH v2 3/4] dma-buf: enable signaling for selftest fence on debug

Am 05.09.22 um 18:35 schrieb Arvind Yadav:
> Here's on debug enabling software signaling for selftest.

Please drop all the #ifdefs, apart from that looks pretty good to me.

Christian.

>
> Signed-off-by: Arvind Yadav <[email protected]>
> ---
>
> Changes in v1 :
> 1- Addressing Christian's comment to remove unnecessary callback.
> 2- Replacing CONFIG_DEBUG_WW_MUTEX_SLOWPATH instead of CONFIG_DEBUG_FS.
> 3- The version of this patch is also changed and previously
> it was [PATCH 4/4]
>
> ---
> drivers/dma-buf/st-dma-fence-chain.c | 8 +++++
> drivers/dma-buf/st-dma-fence-unwrap.c | 44 +++++++++++++++++++++++++++
> drivers/dma-buf/st-dma-fence.c | 25 ++++++++++++++-
> drivers/dma-buf/st-dma-resv.c | 20 ++++++++++++
> 4 files changed, 96 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/dma-buf/st-dma-fence-chain.c b/drivers/dma-buf/st-dma-fence-chain.c
> index 8ce1ea59d31b..d3070f8a393c 100644
> --- a/drivers/dma-buf/st-dma-fence-chain.c
> +++ b/drivers/dma-buf/st-dma-fence-chain.c
> @@ -87,6 +87,10 @@ static int sanitycheck(void *arg)
> if (!chain)
> err = -ENOMEM;
>
> +#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
> + dma_fence_enable_sw_signaling(chain);
> +#endif
> +
> dma_fence_signal(f);
> dma_fence_put(f);
>
> @@ -143,6 +147,10 @@ static int fence_chains_init(struct fence_chains *fc, unsigned int count,
> }
>
> fc->tail = fc->chains[i];
> +
> +#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
> + dma_fence_enable_sw_signaling(fc->chains[i]);
> +#endif
> }
>
> fc->chain_length = i;
> diff --git a/drivers/dma-buf/st-dma-fence-unwrap.c b/drivers/dma-buf/st-dma-fence-unwrap.c
> index 4105d5ea8dde..b76cdd9ee0c7 100644
> --- a/drivers/dma-buf/st-dma-fence-unwrap.c
> +++ b/drivers/dma-buf/st-dma-fence-unwrap.c
> @@ -102,6 +102,10 @@ static int sanitycheck(void *arg)
> if (!f)
> return -ENOMEM;
>
> +#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
> + dma_fence_enable_sw_signaling(f);
> +#endif
> +
> array = mock_array(1, f);
> if (!array)
> return -ENOMEM;
> @@ -124,12 +128,20 @@ static int unwrap_array(void *arg)
> if (!f1)
> return -ENOMEM;
>
> +#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
> + dma_fence_enable_sw_signaling(f1);
> +#endif
> +
> f2 = mock_fence();
> if (!f2) {
> dma_fence_put(f1);
> return -ENOMEM;
> }
>
> +#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
> + dma_fence_enable_sw_signaling(f2);
> +#endif
> +
> array = mock_array(2, f1, f2);
> if (!array)
> return -ENOMEM;
> @@ -164,12 +176,20 @@ static int unwrap_chain(void *arg)
> if (!f1)
> return -ENOMEM;
>
> +#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
> + dma_fence_enable_sw_signaling(f1);
> +#endif
> +
> f2 = mock_fence();
> if (!f2) {
> dma_fence_put(f1);
> return -ENOMEM;
> }
>
> +#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
> + dma_fence_enable_sw_signaling(f2);
> +#endif
> +
> chain = mock_chain(f1, f2);
> if (!chain)
> return -ENOMEM;
> @@ -204,12 +224,20 @@ static int unwrap_chain_array(void *arg)
> if (!f1)
> return -ENOMEM;
>
> +#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
> + dma_fence_enable_sw_signaling(f1);
> +#endif
> +
> f2 = mock_fence();
> if (!f2) {
> dma_fence_put(f1);
> return -ENOMEM;
> }
>
> +#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
> + dma_fence_enable_sw_signaling(f2);
> +#endif
> +
> array = mock_array(2, f1, f2);
> if (!array)
> return -ENOMEM;
> @@ -248,12 +276,20 @@ static int unwrap_merge(void *arg)
> if (!f1)
> return -ENOMEM;
>
> +#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
> + dma_fence_enable_sw_signaling(f1);
> +#endif
> +
> f2 = mock_fence();
> if (!f2) {
> err = -ENOMEM;
> goto error_put_f1;
> }
>
> +#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
> + dma_fence_enable_sw_signaling(f2);
> +#endif
> +
> f3 = dma_fence_unwrap_merge(f1, f2);
> if (!f3) {
> err = -ENOMEM;
> @@ -296,10 +332,18 @@ static int unwrap_merge_complex(void *arg)
> if (!f1)
> return -ENOMEM;
>
> +#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
> + dma_fence_enable_sw_signaling(f1);
> +#endif
> +
> f2 = mock_fence();
> if (!f2)
> goto error_put_f1;
>
> +#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
> + dma_fence_enable_sw_signaling(f2);
> +#endif
> +
> f3 = dma_fence_unwrap_merge(f1, f2);
> if (!f3)
> goto error_put_f2;
> diff --git a/drivers/dma-buf/st-dma-fence.c b/drivers/dma-buf/st-dma-fence.c
> index c8a12d7ad71a..b7880d8374db 100644
> --- a/drivers/dma-buf/st-dma-fence.c
> +++ b/drivers/dma-buf/st-dma-fence.c
> @@ -101,7 +101,9 @@ static int sanitycheck(void *arg)
> f = mock_fence();
> if (!f)
> return -ENOMEM;
> -
> +#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
> + dma_fence_enable_sw_signaling(f);
> +#endif
> dma_fence_signal(f);
> dma_fence_put(f);
>
> @@ -117,6 +119,9 @@ static int test_signaling(void *arg)
> if (!f)
> return -ENOMEM;
>
> +#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
> + dma_fence_enable_sw_signaling(f);
> +#endif
> if (dma_fence_is_signaled(f)) {
> pr_err("Fence unexpectedly signaled on creation\n");
> goto err_free;
> @@ -190,6 +195,9 @@ static int test_late_add_callback(void *arg)
> if (!f)
> return -ENOMEM;
>
> +#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
> + dma_fence_enable_sw_signaling(f);
> +#endif
> dma_fence_signal(f);
>
> if (!dma_fence_add_callback(f, &cb.cb, simple_callback)) {
> @@ -282,6 +290,9 @@ static int test_status(void *arg)
> if (!f)
> return -ENOMEM;
>
> +#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
> + dma_fence_enable_sw_signaling(f);
> +#endif
> if (dma_fence_get_status(f)) {
> pr_err("Fence unexpectedly has signaled status on creation\n");
> goto err_free;
> @@ -308,6 +319,9 @@ static int test_error(void *arg)
> if (!f)
> return -ENOMEM;
>
> +#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
> + dma_fence_enable_sw_signaling(f);
> +#endif
> dma_fence_set_error(f, -EIO);
>
> if (dma_fence_get_status(f)) {
> @@ -337,6 +351,9 @@ static int test_wait(void *arg)
> if (!f)
> return -ENOMEM;
>
> +#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
> + dma_fence_enable_sw_signaling(f);
> +#endif
> if (dma_fence_wait_timeout(f, false, 0) != -ETIME) {
> pr_err("Wait reported complete before being signaled\n");
> goto err_free;
> @@ -379,6 +396,9 @@ static int test_wait_timeout(void *arg)
> if (!wt.f)
> return -ENOMEM;
>
> +#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
> + dma_fence_enable_sw_signaling(wt.f);
> +#endif
> if (dma_fence_wait_timeout(wt.f, false, 1) != -ETIME) {
> pr_err("Wait reported complete before being signaled\n");
> goto err_free;
> @@ -458,6 +478,9 @@ static int thread_signal_callback(void *arg)
> break;
> }
>
> +#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
> + dma_fence_enable_sw_signaling(f1);
> +#endif
> rcu_assign_pointer(t->fences[t->id], f1);
> smp_wmb();
>
> diff --git a/drivers/dma-buf/st-dma-resv.c b/drivers/dma-buf/st-dma-resv.c
> index 813779e3c9be..bd7ef58f8b24 100644
> --- a/drivers/dma-buf/st-dma-resv.c
> +++ b/drivers/dma-buf/st-dma-resv.c
> @@ -45,6 +45,10 @@ static int sanitycheck(void *arg)
> if (!f)
> return -ENOMEM;
>
> +#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
> + dma_fence_enable_sw_signaling(f);
> +#endif
> +
> dma_fence_signal(f);
> dma_fence_put(f);
>
> @@ -69,6 +73,10 @@ static int test_signaling(void *arg)
> if (!f)
> return -ENOMEM;
>
> +#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
> + dma_fence_enable_sw_signaling(f);
> +#endif
> +
> dma_resv_init(&resv);
> r = dma_resv_lock(&resv, NULL);
> if (r) {
> @@ -114,6 +122,10 @@ static int test_for_each(void *arg)
> if (!f)
> return -ENOMEM;
>
> +#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
> + dma_fence_enable_sw_signaling(f);
> +#endif
> +
> dma_resv_init(&resv);
> r = dma_resv_lock(&resv, NULL);
> if (r) {
> @@ -173,6 +185,10 @@ static int test_for_each_unlocked(void *arg)
> if (!f)
> return -ENOMEM;
>
> +#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
> + dma_fence_enable_sw_signaling(f);
> +#endif
> +
> dma_resv_init(&resv);
> r = dma_resv_lock(&resv, NULL);
> if (r) {
> @@ -244,6 +260,10 @@ static int test_get_fences(void *arg)
> if (!f)
> return -ENOMEM;
>
> +#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
> + dma_fence_enable_sw_signaling(f);
> +#endif
> +
> dma_resv_init(&resv);
> r = dma_resv_lock(&resv, NULL);
> if (r) {

2022-09-06 09:44:12

by Christian König

[permalink] [raw]
Subject: Re: [PATCH v2 4/4] dma-buf: Check status of enable-signaling bit on debug

Am 05.09.22 um 18:35 schrieb Arvind Yadav:
> The core DMA-buf framework needs to enable signaling
> before the fence is signaled. The core DMA-buf framework
> can forget to enable signaling before the fence is signaled.

This sentence is a bit confusing. I'm not a native speaker of English
either, but I suggest something like:

"Fence signaling must be enable to make sure that the
dma_fence_is_signaled() function ever returns true."

> To avoid this scenario on the debug kernel, check the
> DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT status bit before checking
> the signaling bit status to confirm that enable_signaling
> is enabled.

This describes the implementation, but we should rather describe the
background of the change. The implementation should be obvious.
Something like this maybe:

"
Since drivers and implementations sometimes mess this up enforce correct
behavior when DEBUG_WW_MUTEX_SLOWPATH is used during debugging.

This should make any implementations bugs resulting in not signaled
fences much more obvious.
"

>
> Signed-off-by: Arvind Yadav <[email protected]>

With the improved commit message this patch is Reviewed-by: Christian
König <[email protected]>

Regards,
Christian.

> ---
>
> Changes in v1 :
> 1- Addressing Christian's comment to replace
> CONFIG_DEBUG_WW_MUTEX_SLOWPATH instead of CONFIG_DEBUG_FS.
> 2- As per Christian's comment moving this patch at last so
> The version of this patch is also changed and previously
> it was [PATCH 1/4]
>
>
> ---
> include/linux/dma-fence.h | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
> index 775cdc0b4f24..ba1ddc14c5d4 100644
> --- a/include/linux/dma-fence.h
> +++ b/include/linux/dma-fence.h
> @@ -428,6 +428,11 @@ dma_fence_is_signaled_locked(struct dma_fence *fence)
> static inline bool
> dma_fence_is_signaled(struct dma_fence *fence)
> {
> +#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
> + if (!test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags))
> + return false;
> +#endif
> +
> if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
> return true;
>

2022-09-06 10:34:23

by Tvrtko Ursulin

[permalink] [raw]
Subject: Re: [PATCH v2 4/4] dma-buf: Check status of enable-signaling bit on debug


On 06/09/2022 09:39, Christian König wrote:
> Am 05.09.22 um 18:35 schrieb Arvind Yadav:
>> The core DMA-buf framework needs to enable signaling
>> before the fence is signaled. The core DMA-buf framework
>> can forget to enable signaling before the fence is signaled.
>
> This sentence is a bit confusing. I'm not a native speaker of English
> either, but I suggest something like:
>
> "Fence signaling must be enable to make sure that the
> dma_fence_is_signaled() function ever returns true."
>
>> To avoid this scenario on the debug kernel, check the
>> DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT status bit before checking
>> the signaling bit status to confirm that enable_signaling
>> is enabled.
>
> This describes the implementation, but we should rather describe the
> background of the change. The implementation should be obvious.
> Something like this maybe:
>
> "
> Since drivers and implementations sometimes mess this up enforce correct
> behavior when DEBUG_WW_MUTEX_SLOWPATH is used during debugging.
>
> This should make any implementations bugs resulting in not signaled
> fences much more obvious.
> "

I think I follow the idea but am not sure coupling (well "coupling".. not really, but cross-contaminating in a way) dma-fence.c with a foreign and effectively unrelated concept of a ww mutex is the best way.

Instead, how about a dma-buf specific debug kconfig option?

Condition would then be, according to my understanding of the rules and expectations, along the lines of:

diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
index 775cdc0b4f24..147a9df2c9d0 100644
--- a/include/linux/dma-fence.h
+++ b/include/linux/dma-fence.h
@@ -428,6 +428,17 @@ dma_fence_is_signaled_locked(struct dma_fence *fence)
static inline bool
dma_fence_is_signaled(struct dma_fence *fence)
{
+#ifdef CONFIG_DEBUG_DMAFENCE
+ /*
+ * Implementations not providing the enable_signaling callback are
+ * required to always have signaling enabled or fences are not
+ * guaranteed to ever signal.
+ */
+ if (!fence->ops->enable_signaling &&
+ !test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags))
+ return false;
+#endif
+
if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
return true;

Thoughts?

Regards,

Tvrtko

>
>>
>> Signed-off-by: Arvind Yadav <[email protected]>
>
> With the improved commit message this patch is Reviewed-by: Christian
> König <[email protected]>
>
> Regards,
> Christian.
>
>> ---
>>
>> Changes in v1 :
>> 1- Addressing Christian's comment to replace
>> CONFIG_DEBUG_WW_MUTEX_SLOWPATH instead of CONFIG_DEBUG_FS.
>> 2- As per Christian's comment moving this patch at last so
>> The version of this patch is also changed and previously
>> it was [PATCH 1/4]
>>
>>
>> ---
>>   include/linux/dma-fence.h | 5 +++++
>>   1 file changed, 5 insertions(+)
>>
>> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
>> index 775cdc0b4f24..ba1ddc14c5d4 100644
>> --- a/include/linux/dma-fence.h
>> +++ b/include/linux/dma-fence.h
>> @@ -428,6 +428,11 @@ dma_fence_is_signaled_locked(struct dma_fence
>> *fence)
>>   static inline bool
>>   dma_fence_is_signaled(struct dma_fence *fence)
>>   {
>> +#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
>> +    if (!test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags))
>> +        return false;
>> +#endif
>> +
>>       if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>>           return true;
>

2022-09-06 10:49:57

by Christian König

[permalink] [raw]
Subject: Re: [PATCH v2 4/4] dma-buf: Check status of enable-signaling bit on debug

Am 06.09.22 um 12:20 schrieb Tvrtko Ursulin:
>
> On 06/09/2022 09:39, Christian König wrote:
>> Am 05.09.22 um 18:35 schrieb Arvind Yadav:
>>> The core DMA-buf framework needs to enable signaling
>>> before the fence is signaled. The core DMA-buf framework
>>> can forget to enable signaling before the fence is signaled.
>>
>> This sentence is a bit confusing. I'm not a native speaker of English
>> either, but I suggest something like:
>>
>> "Fence signaling must be enable to make sure that the
>> dma_fence_is_signaled() function ever returns true."
>>
>>> To avoid this scenario on the debug kernel, check the
>>> DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT status bit before checking
>>> the signaling bit status to confirm that enable_signaling
>>> is enabled.
>>
>> This describes the implementation, but we should rather describe the
>> background of the change. The implementation should be obvious.
>> Something like this maybe:
>>
>> "
>> Since drivers and implementations sometimes mess this up enforce
>> correct behavior when DEBUG_WW_MUTEX_SLOWPATH is used during debugging.
>>
>> This should make any implementations bugs resulting in not signaled
>> fences much more obvious.
>> "
>
> I think I follow the idea but am not sure coupling (well "coupling"..
> not really, but cross-contaminating in a way) dma-fence.c with a
> foreign and effectively unrelated concept of a ww mutex is the best way.
>
> Instead, how about a dma-buf specific debug kconfig option?

Yeah, I was thinking about that as well.

The slowpath config option was just at hand because when you want to
test the slowpath you want to test this here as well.

>
> Condition would then be, according to my understanding of the rules
> and expectations, along the lines of:
>
> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
> index 775cdc0b4f24..147a9df2c9d0 100644
> --- a/include/linux/dma-fence.h
> +++ b/include/linux/dma-fence.h
> @@ -428,6 +428,17 @@ dma_fence_is_signaled_locked(struct dma_fence
> *fence)
>  static inline bool
>  dma_fence_is_signaled(struct dma_fence *fence)
>  {
> +#ifdef CONFIG_DEBUG_DMAFENCE
> +       /*
> +        * Implementations not providing the enable_signaling callback
> are
> +        * required to always have signaling enabled or fences are not
> +        * guaranteed to ever signal.
> +        */

Well that comment is a bit misleading. The intention of the extra check
is to find bugs in the frontend and not the backend.

In other words somewhere in the drm_syncobj code we have a
dma_fence_is_signaled() call without matching
dma_fence_enable_sw_signaling().

Regards,
Christian.

> + if (!fence->ops->enable_signaling &&
> +           !test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags))
> +               return false;
> +#endif
> +
>         if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>                 return true;
>
> Thoughts?
>
> Regards,
>
> Tvrtko
>
>>
>>>
>>> Signed-off-by: Arvind Yadav <[email protected]>
>>
>> With the improved commit message this patch is Reviewed-by: Christian
>> König <[email protected]>
>>
>> Regards,
>> Christian.
>>
>>> ---
>>>
>>> Changes in v1 :
>>> 1- Addressing Christian's comment to replace
>>> CONFIG_DEBUG_WW_MUTEX_SLOWPATH instead of CONFIG_DEBUG_FS.
>>> 2- As per Christian's comment moving this patch at last so
>>> The version of this patch is also changed and previously
>>> it was [PATCH 1/4]
>>>
>>>
>>> ---
>>>   include/linux/dma-fence.h | 5 +++++
>>>   1 file changed, 5 insertions(+)
>>>
>>> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
>>> index 775cdc0b4f24..ba1ddc14c5d4 100644
>>> --- a/include/linux/dma-fence.h
>>> +++ b/include/linux/dma-fence.h
>>> @@ -428,6 +428,11 @@ dma_fence_is_signaled_locked(struct dma_fence
>>> *fence)
>>>   static inline bool
>>>   dma_fence_is_signaled(struct dma_fence *fence)
>>>   {
>>> +#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
>>> +    if (!test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags))
>>> +        return false;
>>> +#endif
>>> +
>>>       if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>>>           return true;
>>

2022-09-06 11:34:47

by Tvrtko Ursulin

[permalink] [raw]
Subject: Re: [PATCH v2 4/4] dma-buf: Check status of enable-signaling bit on debug


On 06/09/2022 11:43, Christian König wrote:
> Am 06.09.22 um 12:20 schrieb Tvrtko Ursulin:
>>
>> On 06/09/2022 09:39, Christian König wrote:
>>> Am 05.09.22 um 18:35 schrieb Arvind Yadav:
>>>> The core DMA-buf framework needs to enable signaling
>>>> before the fence is signaled. The core DMA-buf framework
>>>> can forget to enable signaling before the fence is signaled.
>>>
>>> This sentence is a bit confusing. I'm not a native speaker of English
>>> either, but I suggest something like:
>>>
>>> "Fence signaling must be enable to make sure that the
>>> dma_fence_is_signaled() function ever returns true."
>>>
>>>> To avoid this scenario on the debug kernel, check the
>>>> DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT status bit before checking
>>>> the signaling bit status to confirm that enable_signaling
>>>> is enabled.
>>>
>>> This describes the implementation, but we should rather describe the
>>> background of the change. The implementation should be obvious.
>>> Something like this maybe:
>>>
>>> "
>>> Since drivers and implementations sometimes mess this up enforce
>>> correct behavior when DEBUG_WW_MUTEX_SLOWPATH is used during debugging.
>>>
>>> This should make any implementations bugs resulting in not signaled
>>> fences much more obvious.
>>> "
>>
>> I think I follow the idea but am not sure coupling (well "coupling"..
>> not really, but cross-contaminating in a way) dma-fence.c with a
>> foreign and effectively unrelated concept of a ww mutex is the best way.
>>
>> Instead, how about a dma-buf specific debug kconfig option?
>
> Yeah, I was thinking about that as well.

Cool, lets see about the specifics below and then decide.

> The slowpath config option was just at hand because when you want to
> test the slowpath you want to test this here as well.
>
>>
>> Condition would then be, according to my understanding of the rules
>> and expectations, along the lines of:
>>
>> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
>> index 775cdc0b4f24..147a9df2c9d0 100644
>> --- a/include/linux/dma-fence.h
>> +++ b/include/linux/dma-fence.h
>> @@ -428,6 +428,17 @@ dma_fence_is_signaled_locked(struct dma_fence
>> *fence)
>>  static inline bool
>>  dma_fence_is_signaled(struct dma_fence *fence)
>>  {
>> +#ifdef CONFIG_DEBUG_DMAFENCE
>> +       /*
>> +        * Implementations not providing the enable_signaling callback
>> are
>> +        * required to always have signaling enabled or fences are not
>> +        * guaranteed to ever signal.
>> +        */
>
> Well that comment is a bit misleading. The intention of the extra check
> is to find bugs in the frontend and not the backend.

By backend you mean drivers/dma-buf/dma-fence.c and by front end driver specific implementations? Or simply users for dma-fence?

Could be that I got confused.. I was reading these two:


* This callback is optional. If this callback is not present, then the
* driver must always have signaling enabled.
*/
bool (*enable_signaling)(struct dma_fence *fence);

And dma_fence_is_signaled:

* Returns true if the fence was already signaled, false if not. Since this
* function doesn't enable signaling, it is not guaranteed to ever return
* true if dma_fence_add_callback(), dma_fence_wait() or
* dma_fence_enable_sw_signaling() haven't been called before.

Right, I think I did get confused, apologies. What I was thinking was probably two separate conditions:

static inline bool
dma_fence_is_signaled(struct dma_fence *fence)
{
+#ifdef CONFIG_DEBUG_DMAFENCE
+ if (WARN_ON_ONCE(!fence->ops->enable_signaling &&
+ !test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags)))
+ return false;
+
+ if (!test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags))
+ return false;
+#endif
+
if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
return true;

Not sure "is signaled" is the best place for the first one or that it should definitely be added.

Regards,

Tvrtko

> In other words somewhere in the drm_syncobj code we have a
> dma_fence_is_signaled() call without matching
> dma_fence_enable_sw_signaling().
>
> Regards,
> Christian.
>
>> + if (!fence->ops->enable_signaling &&
>> +           !test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags))
>> +               return false;
>> +#endif
>> +
>>         if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>>                 return true;
>>
>> Thoughts?
>>
>> Regards,
>>
>> Tvrtko
>>
>>>
>>>>
>>>> Signed-off-by: Arvind Yadav <[email protected]>
>>>
>>> With the improved commit message this patch is Reviewed-by: Christian
>>> König <[email protected]>
>>>
>>> Regards,
>>> Christian.
>>>
>>>> ---
>>>>
>>>> Changes in v1 :
>>>> 1- Addressing Christian's comment to replace
>>>> CONFIG_DEBUG_WW_MUTEX_SLOWPATH instead of CONFIG_DEBUG_FS.
>>>> 2- As per Christian's comment moving this patch at last so
>>>> The version of this patch is also changed and previously
>>>> it was [PATCH 1/4]
>>>>
>>>>
>>>> ---
>>>>   include/linux/dma-fence.h | 5 +++++
>>>>   1 file changed, 5 insertions(+)
>>>>
>>>> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
>>>> index 775cdc0b4f24..ba1ddc14c5d4 100644
>>>> --- a/include/linux/dma-fence.h
>>>> +++ b/include/linux/dma-fence.h
>>>> @@ -428,6 +428,11 @@ dma_fence_is_signaled_locked(struct dma_fence
>>>> *fence)
>>>>   static inline bool
>>>>   dma_fence_is_signaled(struct dma_fence *fence)
>>>>   {
>>>> +#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
>>>> +    if (!test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags))
>>>> +        return false;
>>>> +#endif
>>>> +
>>>>       if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>>>>           return true;
>>>
>

2022-09-06 11:52:46

by Tvrtko Ursulin

[permalink] [raw]
Subject: Re: [PATCH v2 4/4] dma-buf: Check status of enable-signaling bit on debug


On 06/09/2022 12:21, Tvrtko Ursulin wrote:
>
> On 06/09/2022 11:43, Christian König wrote:
>> Am 06.09.22 um 12:20 schrieb Tvrtko Ursulin:
>>>
>>> On 06/09/2022 09:39, Christian König wrote:
>>>> Am 05.09.22 um 18:35 schrieb Arvind Yadav:
>>>>> The core DMA-buf framework needs to enable signaling
>>>>> before the fence is signaled. The core DMA-buf framework
>>>>> can forget to enable signaling before the fence is signaled.
>>>>
>>>> This sentence is a bit confusing. I'm not a native speaker of
>>>> English either, but I suggest something like:
>>>>
>>>> "Fence signaling must be enable to make sure that the
>>>> dma_fence_is_signaled() function ever returns true."
>>>>
>>>>> To avoid this scenario on the debug kernel, check the
>>>>> DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT status bit before checking
>>>>> the signaling bit status to confirm that enable_signaling
>>>>> is enabled.
>>>>
>>>> This describes the implementation, but we should rather describe the
>>>> background of the change. The implementation should be obvious.
>>>> Something like this maybe:
>>>>
>>>> "
>>>> Since drivers and implementations sometimes mess this up enforce
>>>> correct behavior when DEBUG_WW_MUTEX_SLOWPATH is used during debugging.
>>>>
>>>> This should make any implementations bugs resulting in not signaled
>>>> fences much more obvious.
>>>> "
>>>
>>> I think I follow the idea but am not sure coupling (well "coupling"..
>>> not really, but cross-contaminating in a way) dma-fence.c with a
>>> foreign and effectively unrelated concept of a ww mutex is the best way.
>>>
>>> Instead, how about a dma-buf specific debug kconfig option?
>>
>> Yeah, I was thinking about that as well.
>
> Cool, lets see about the specifics below and then decide.
>
>> The slowpath config option was just at hand because when you want to
>> test the slowpath you want to test this here as well.
>>
>>>
>>> Condition would then be, according to my understanding of the rules
>>> and expectations, along the lines of:
>>>
>>> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
>>> index 775cdc0b4f24..147a9df2c9d0 100644
>>> --- a/include/linux/dma-fence.h
>>> +++ b/include/linux/dma-fence.h
>>> @@ -428,6 +428,17 @@ dma_fence_is_signaled_locked(struct dma_fence
>>> *fence)
>>>  static inline bool
>>>  dma_fence_is_signaled(struct dma_fence *fence)
>>>  {
>>> +#ifdef CONFIG_DEBUG_DMAFENCE
>>> +       /*
>>> +        * Implementations not providing the enable_signaling
>>> callback are
>>> +        * required to always have signaling enabled or fences are not
>>> +        * guaranteed to ever signal.
>>> +        */
>>
>> Well that comment is a bit misleading. The intention of the extra
>> check is to find bugs in the frontend and not the backend.
>
> By backend you mean drivers/dma-buf/dma-fence.c and by front end driver
> specific implementations? Or simply users for dma-fence?
>
> Could be that I got confused.. I was reading these two:
>
>
>      * This callback is optional. If this callback is not present, then
> the
>      * driver must always have signaling enabled.
>      */
>     bool (*enable_signaling)(struct dma_fence *fence);
>
> And dma_fence_is_signaled:
>
>  * Returns true if the fence was already signaled, false if not. Since
> this
>  * function doesn't enable signaling, it is not guaranteed to ever return
>  * true if dma_fence_add_callback(), dma_fence_wait() or
>  * dma_fence_enable_sw_signaling() haven't been called before.
>
> Right, I think I did get confused, apologies. What I was thinking was
> probably two separate conditions:
>
>  static inline bool
>  dma_fence_is_signaled(struct dma_fence *fence)
>  {
> +#ifdef CONFIG_DEBUG_DMAFENCE
> +       if (WARN_ON_ONCE(!fence->ops->enable_signaling &&
> +                        !test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
> &fence->flags)))
> +               return false;
> +
> +       if (!test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags))
> +               return false;
> +#endif
> +

Or simpler OFC:

if (!test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags)) {
WARN_ON_ONCE(!fence->ops->enable_signaling);
return false;
}

You could also run this core change past i915 CI to see if it catches anything. Just send to our trybot and see what happens? With the debug option enabled of course. Hope it won't be painful.

Regards,

Tvrtko

>         if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>                 return true;
>
> Not sure "is signaled" is the best place for the first one or that it
> should definitely be added.
>
> Regards,
>
> Tvrtko
>
>> In other words somewhere in the drm_syncobj code we have a
>> dma_fence_is_signaled() call without matching
>> dma_fence_enable_sw_signaling().
>>
>> Regards,
>> Christian.
>>
>>> + if (!fence->ops->enable_signaling &&
>>> +           !test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags))
>>> +               return false;
>>> +#endif
>>> +
>>>         if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>>>                 return true;
>>>
>>> Thoughts?
>>>
>>> Regards,
>>>
>>> Tvrtko
>>>
>>>>
>>>>>
>>>>> Signed-off-by: Arvind Yadav <[email protected]>
>>>>
>>>> With the improved commit message this patch is Reviewed-by:
>>>> Christian König <[email protected]>
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> ---
>>>>>
>>>>> Changes in v1 :
>>>>> 1- Addressing Christian's comment to replace
>>>>> CONFIG_DEBUG_WW_MUTEX_SLOWPATH instead of CONFIG_DEBUG_FS.
>>>>> 2- As per Christian's comment moving this patch at last so
>>>>> The version of this patch is also changed and previously
>>>>> it was [PATCH 1/4]
>>>>>
>>>>>
>>>>> ---
>>>>>   include/linux/dma-fence.h | 5 +++++
>>>>>   1 file changed, 5 insertions(+)
>>>>>
>>>>> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
>>>>> index 775cdc0b4f24..ba1ddc14c5d4 100644
>>>>> --- a/include/linux/dma-fence.h
>>>>> +++ b/include/linux/dma-fence.h
>>>>> @@ -428,6 +428,11 @@ dma_fence_is_signaled_locked(struct dma_fence
>>>>> *fence)
>>>>>   static inline bool
>>>>>   dma_fence_is_signaled(struct dma_fence *fence)
>>>>>   {
>>>>> +#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
>>>>> +    if (!test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags))
>>>>> +        return false;
>>>>> +#endif
>>>>> +
>>>>>       if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>>>>>           return true;
>>>>
>>

2022-09-06 11:57:01

by Christian König

[permalink] [raw]
Subject: Re: [PATCH v2 4/4] dma-buf: Check status of enable-signaling bit on debug

Am 06.09.22 um 13:21 schrieb Tvrtko Ursulin:
>
> On 06/09/2022 11:43, Christian König wrote:
>> Am 06.09.22 um 12:20 schrieb Tvrtko Ursulin:
>>>
>>> On 06/09/2022 09:39, Christian König wrote:
>>>> Am 05.09.22 um 18:35 schrieb Arvind Yadav:
>>>>> The core DMA-buf framework needs to enable signaling
>>>>> before the fence is signaled. The core DMA-buf framework
>>>>> can forget to enable signaling before the fence is signaled.
>>>>
>>>> This sentence is a bit confusing. I'm not a native speaker of
>>>> English either, but I suggest something like:
>>>>
>>>> "Fence signaling must be enable to make sure that the
>>>> dma_fence_is_signaled() function ever returns true."
>>>>
>>>>> To avoid this scenario on the debug kernel, check the
>>>>> DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT status bit before checking
>>>>> the signaling bit status to confirm that enable_signaling
>>>>> is enabled.
>>>>
>>>> This describes the implementation, but we should rather describe
>>>> the background of the change. The implementation should be obvious.
>>>> Something like this maybe:
>>>>
>>>> "
>>>> Since drivers and implementations sometimes mess this up enforce
>>>> correct behavior when DEBUG_WW_MUTEX_SLOWPATH is used during
>>>> debugging.
>>>>
>>>> This should make any implementations bugs resulting in not signaled
>>>> fences much more obvious.
>>>> "
>>>
>>> I think I follow the idea but am not sure coupling (well
>>> "coupling".. not really, but cross-contaminating in a way)
>>> dma-fence.c with a foreign and effectively unrelated concept of a ww
>>> mutex is the best way.
>>>
>>> Instead, how about a dma-buf specific debug kconfig option?
>>
>> Yeah, I was thinking about that as well.
>
> Cool, lets see about the specifics below and then decide.
>
>> The slowpath config option was just at hand because when you want to
>> test the slowpath you want to test this here as well.
>>
>>>
>>> Condition would then be, according to my understanding of the rules
>>> and expectations, along the lines of:
>>>
>>> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
>>> index 775cdc0b4f24..147a9df2c9d0 100644
>>> --- a/include/linux/dma-fence.h
>>> +++ b/include/linux/dma-fence.h
>>> @@ -428,6 +428,17 @@ dma_fence_is_signaled_locked(struct dma_fence
>>> *fence)
>>>  static inline bool
>>>  dma_fence_is_signaled(struct dma_fence *fence)
>>>  {
>>> +#ifdef CONFIG_DEBUG_DMAFENCE
>>> +       /*
>>> +        * Implementations not providing the enable_signaling
>>> callback are
>>> +        * required to always have signaling enabled or fences are not
>>> +        * guaranteed to ever signal.
>>> +        */
>>
>> Well that comment is a bit misleading. The intention of the extra
>> check is to find bugs in the frontend and not the backend.
>
> By backend you mean drivers/dma-buf/dma-fence.c and by front end
> driver specific implementations? Or simply users for dma-fence?

Backend are the driver specific implementation of the dma_fence_ops.

Middleware is the framework in drivers/dma-buf.

Frontend is the users of the dma_fence interface, e.g. either drivers or
components (drm_syncobj, drm_scheduler etc...).

>
> Could be that I got confused.. I was reading these two:
>
>
>      * This callback is optional. If this callback is not present,
> then the
>      * driver must always have signaling enabled.
>      */
>     bool (*enable_signaling)(struct dma_fence *fence);
>
> And dma_fence_is_signaled:
>
>  * Returns true if the fence was already signaled, false if not. Since
> this
>  * function doesn't enable signaling, it is not guaranteed to ever return
>  * true if dma_fence_add_callback(), dma_fence_wait() or
>  * dma_fence_enable_sw_signaling() haven't been called before.
>
> Right, I think I did get confused, apologies. What I was thinking was
> probably two separate conditions:
>
>  static inline bool
>  dma_fence_is_signaled(struct dma_fence *fence)
>  {
> +#ifdef CONFIG_DEBUG_DMAFENCE
> +       if (WARN_ON_ONCE(!fence->ops->enable_signaling &&
> + !test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags)))
> +               return false;
> +
> +       if (!test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags))
> +               return false;
> +#endif
> +
>         if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>                 return true;
>
> Not sure "is signaled" is the best place for the first one or that it
> should definitely be added.

I was thinking about adding this WARN_ON() as well, but then decided
against it.

There are a couple of cases where calling dma_fence_is_signaled()
without previously calling dma_fence_enable_sw_signaling() is valid
because it is done just for optimization purposes and we guarantee that
dma_fence_enable_sw_signaling() is called just a little bit later.

But yes, in general it's the same idea I already had as well.

Regards,
Christian.

>
> Regards,
>
> Tvrtko
>
>> In other words somewhere in the drm_syncobj code we have a
>> dma_fence_is_signaled() call without matching
>> dma_fence_enable_sw_signaling().
>>
>> Regards,
>> Christian.
>>
>>> + if (!fence->ops->enable_signaling &&
>>> +           !test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags))
>>> +               return false;
>>> +#endif
>>> +
>>>         if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>>>                 return true;
>>>
>>> Thoughts?
>>>
>>> Regards,
>>>
>>> Tvrtko
>>>
>>>>
>>>>>
>>>>> Signed-off-by: Arvind Yadav <[email protected]>
>>>>
>>>> With the improved commit message this patch is Reviewed-by:
>>>> Christian König <[email protected]>
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> ---
>>>>>
>>>>> Changes in v1 :
>>>>> 1- Addressing Christian's comment to replace
>>>>> CONFIG_DEBUG_WW_MUTEX_SLOWPATH instead of CONFIG_DEBUG_FS.
>>>>> 2- As per Christian's comment moving this patch at last so
>>>>> The version of this patch is also changed and previously
>>>>> it was [PATCH 1/4]
>>>>>
>>>>>
>>>>> ---
>>>>>   include/linux/dma-fence.h | 5 +++++
>>>>>   1 file changed, 5 insertions(+)
>>>>>
>>>>> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
>>>>> index 775cdc0b4f24..ba1ddc14c5d4 100644
>>>>> --- a/include/linux/dma-fence.h
>>>>> +++ b/include/linux/dma-fence.h
>>>>> @@ -428,6 +428,11 @@ dma_fence_is_signaled_locked(struct dma_fence
>>>>> *fence)
>>>>>   static inline bool
>>>>>   dma_fence_is_signaled(struct dma_fence *fence)
>>>>>   {
>>>>> +#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
>>>>> +    if (!test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags))
>>>>> +        return false;
>>>>> +#endif
>>>>> +
>>>>>       if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>>>>>           return true;
>>>>
>>

2022-09-06 21:04:09

by Andrey Grodzovsky

[permalink] [raw]
Subject: Re: [PATCH v2 1/4] drm/sched: Enable signaling for finished fence


On 2022-09-06 02:34, Christian König wrote:
> Am 05.09.22 um 18:34 schrieb Arvind Yadav:
>> Here's enabling software signaling for finished fence.
>>
>> Signed-off-by: Arvind Yadav <[email protected]>
>> ---
>>
>> Changes in v1 :
>> 1- Addressing Christian's comment to remove CONFIG_DEBUG_FS check from
>> this patch.
>> 2- The version of this patch is also changed and previously
>> it was [PATCH 2/4]
>>
>> ---
>>   drivers/gpu/drm/scheduler/sched_main.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>> b/drivers/gpu/drm/scheduler/sched_main.c
>> index e0ab14e0fb6b..fe72de0e2911 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -962,6 +962,8 @@ static int drm_sched_main(void *param)
>>               /* Drop for original kref_init of the fence */
>>               dma_fence_put(fence);
>>   + dma_fence_enable_sw_signaling(&s_fence->finished);
>
> Ok, this makes it a lot clearer. Previously I though that we have some
> bug in dma_fence_add_callback().
>
> This is essentially the wrong place to call this, the finished fence
> should be enabled by the caller and not here.
>
> There is also another problem in dma_fence_enable_sw_signaling(), it
> returns early when the fence is already signaled:
>
>         if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>                 return;
>
> Please remove that one first.


Why we even need this explicit call if dma_fence_add_callback calls
__dma_fence_enable_signaling anyway ?

Andrey


>
> Thanks,
> Christian.
>
>
>> +
>>               r = dma_fence_add_callback(fence, &sched_job->cb,
>>                              drm_sched_job_done_cb);
>>               if (r == -ENOENT)
>

2022-09-07 06:55:37

by Christian König

[permalink] [raw]
Subject: Re: [PATCH v2 1/4] drm/sched: Enable signaling for finished fence

Am 06.09.22 um 21:55 schrieb Andrey Grodzovsky:
>
> On 2022-09-06 02:34, Christian König wrote:
>> Am 05.09.22 um 18:34 schrieb Arvind Yadav:
>>> Here's enabling software signaling for finished fence.
>>>
>>> Signed-off-by: Arvind Yadav <[email protected]>
>>> ---
>>>
>>> Changes in v1 :
>>> 1- Addressing Christian's comment to remove CONFIG_DEBUG_FS check from
>>> this patch.
>>> 2- The version of this patch is also changed and previously
>>> it was [PATCH 2/4]
>>>
>>> ---
>>>   drivers/gpu/drm/scheduler/sched_main.c | 2 ++
>>>   1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>> index e0ab14e0fb6b..fe72de0e2911 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>> @@ -962,6 +962,8 @@ static int drm_sched_main(void *param)
>>>               /* Drop for original kref_init of the fence */
>>>               dma_fence_put(fence);
>>>   + dma_fence_enable_sw_signaling(&s_fence->finished);
>>
>> Ok, this makes it a lot clearer. Previously I though that we have
>> some bug in dma_fence_add_callback().
>>
>> This is essentially the wrong place to call this, the finished fence
>> should be enabled by the caller and not here.
>>
>> There is also another problem in dma_fence_enable_sw_signaling(), it
>> returns early when the fence is already signaled:
>>
>>         if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>>                 return;
>>
>> Please remove that one first.
>
>
> Why we even need this explicit call if dma_fence_add_callback calls
> __dma_fence_enable_signaling anyway ?

Two different fence objects.

The dma_fence_add_callback() is done on the hw fence we get in return of
submitting the job.

The dma_fence_enable_sw_signaling() here is done on the finished fence
we use to signal the completion externally.

Key point is the finished fence should be used by the frontend drivers
which uses the scheduler and not by the scheduler itself.

Christian.

>
> Andrey
>
>
>>
>> Thanks,
>> Christian.
>>
>>
>>> +
>>>               r = dma_fence_add_callback(fence, &sched_job->cb,
>>>                              drm_sched_job_done_cb);
>>>               if (r == -ENOENT)
>>

2022-09-07 16:46:05

by Andrey Grodzovsky

[permalink] [raw]
Subject: Re: [PATCH v2 1/4] drm/sched: Enable signaling for finished fence


On 2022-09-07 02:37, Christian König wrote:
> Am 06.09.22 um 21:55 schrieb Andrey Grodzovsky:
>>
>> On 2022-09-06 02:34, Christian König wrote:
>>> Am 05.09.22 um 18:34 schrieb Arvind Yadav:
>>>> Here's enabling software signaling for finished fence.
>>>>
>>>> Signed-off-by: Arvind Yadav <[email protected]>
>>>> ---
>>>>
>>>> Changes in v1 :
>>>> 1- Addressing Christian's comment to remove CONFIG_DEBUG_FS check from
>>>> this patch.
>>>> 2- The version of this patch is also changed and previously
>>>> it was [PATCH 2/4]
>>>>
>>>> ---
>>>>   drivers/gpu/drm/scheduler/sched_main.c | 2 ++
>>>>   1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>>> index e0ab14e0fb6b..fe72de0e2911 100644
>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>> @@ -962,6 +962,8 @@ static int drm_sched_main(void *param)
>>>>               /* Drop for original kref_init of the fence */
>>>>               dma_fence_put(fence);
>>>>   + dma_fence_enable_sw_signaling(&s_fence->finished);
>>>
>>> Ok, this makes it a lot clearer. Previously I though that we have
>>> some bug in dma_fence_add_callback().
>>>
>>> This is essentially the wrong place to call this, the finished fence
>>> should be enabled by the caller and not here.
>>>
>>> There is also another problem in dma_fence_enable_sw_signaling(), it
>>> returns early when the fence is already signaled:
>>>
>>>         if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>>>                 return;
>>>
>>> Please remove that one first.
>>
>>
>> Why we even need this explicit call if dma_fence_add_callback calls
>> __dma_fence_enable_signaling anyway ?
>
> Two different fence objects.
>
> The dma_fence_add_callback() is done on the hw fence we get in return
> of submitting the job.
>
> The dma_fence_enable_sw_signaling() here is done on the finished fence
> we use to signal the completion externally.
>
> Key point is the finished fence should be used by the frontend drivers
> which uses the scheduler and not by the scheduler itself.
>
> Christian.


Oh, so we need to explicitly call this because dma_fence_add_callback is
not always used for finished fence right ?

Yea, then it makes sense that the client needs to manage this since each
one has his own logic what to do with it.

Andrey


>
>>
>> Andrey
>>
>>
>>>
>>> Thanks,
>>> Christian.
>>>
>>>
>>>> +
>>>>               r = dma_fence_add_callback(fence, &sched_job->cb,
>>>>                              drm_sched_job_done_cb);
>>>>               if (r == -ENOENT)
>>>
>

2022-09-09 16:45:18

by Yadav, Arvind

[permalink] [raw]
Subject: Re: [PATCH v2 2/4] dma-buf: enable signaling for the stub fence on debug


On 9/6/2022 12:39 PM, Christian König wrote:
>
>
> Am 05.09.22 um 18:35 schrieb Arvind Yadav:
>> Here's on debug enabling software signaling for the stub fence
>> which is always signaled. This fence should enable software
>> signaling otherwise the AMD GPU scheduler will cause a GPU reset
>> due to a GPU scheduler cleanup activity timeout.
>>
>> Signed-off-by: Arvind Yadav <[email protected]>
>> ---
>>
>> Changes in v1 :
>> 1- Addressing Christian's comment to remove unnecessary callback.
>> 2- Replacing CONFIG_DEBUG_WW_MUTEX_SLOWPATH instead of CONFIG_DEBUG_FS.
>> 3- The version of this patch is also changed and previously
>> it was [PATCH 3/4]
>>
>> ---
>>   drivers/dma-buf/dma-fence.c | 7 +++++++
>>   1 file changed, 7 insertions(+)
>>
>> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
>> index 066400ed8841..2378b12538c4 100644
>> --- a/drivers/dma-buf/dma-fence.c
>> +++ b/drivers/dma-buf/dma-fence.c
>> @@ -27,6 +27,10 @@ EXPORT_TRACEPOINT_SYMBOL(dma_fence_signaled);
>>   static DEFINE_SPINLOCK(dma_fence_stub_lock);
>>   static struct dma_fence dma_fence_stub;
>>   +#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
>> +static bool __dma_fence_enable_signaling(struct dma_fence *fence);
>> +#endif
>> +
>
> I would rename the function to something like
> dma_fence_enable_signaling_locked().
>
> And please don't add any #ifdef if it isn't absolutely necessary. This
> makes the code pretty fragile.
>
>>   /*
>>    * fence context counter: each execution context should have its own
>>    * fence context, this allows checking if fences belong to the same
>> @@ -136,6 +140,9 @@ struct dma_fence *dma_fence_get_stub(void)
>>                      &dma_fence_stub_ops,
>>                      &dma_fence_stub_lock,
>>                      0, 0);
>> +#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
>> +        __dma_fence_enable_signaling(&dma_fence_stub);
>> +#endif
>
> Alternatively in this particular case you could just set the bit
> manually here since this is part of the dma_fence code anyway.
>
> Christian.
>
As per per review comment. I will set the bit manually.

~arvind

>> dma_fence_signal_locked(&dma_fence_stub);
>>       }
>>       spin_unlock(&dma_fence_stub_lock);
>