2022-09-05 11:01:44

by Arvind Yadav

[permalink] [raw]
Subject: [PATCH 0/4] dma-buf: To check enable signaling before signaled

TTM, GEM, DRM or the core DMA-buf framework are needs
to enable software signaling before the fence is signaled.
The core DMA-buf framework software can forget to call
enable_signaling before the fence is signaled. It means
framework code can forget to call dma_fence_enable_sw_signaling()
before calling dma_fence_is_signaled(). To avoid this scenario
on debug kernel, check the DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT bit
status before checking the MA_FENCE_FLAG_SIGNALED_BIT bit status
to confirm that software signaling is enabled.


Arvind Yadav (4):
dma-buf: Check status of enable-signaling bit on debug
drm/sched: Add callback and enable signaling on debug
dma-buf: Add callback and enable signaling on debug
dma-buf: Add callback and enable signaling on debug

drivers/dma-buf/dma-fence.c | 17 ++++++++
drivers/dma-buf/st-dma-fence-chain.c | 17 ++++++++
drivers/dma-buf/st-dma-fence-unwrap.c | 54 +++++++++++++++++++++++++
drivers/dma-buf/st-dma-fence.c | 34 +++++++++++++++-
drivers/dma-buf/st-dma-resv.c | 30 ++++++++++++++
drivers/gpu/drm/scheduler/sched_fence.c | 12 ++++++
drivers/gpu/drm/scheduler/sched_main.c | 4 +-
include/linux/dma-fence.h | 5 +++
8 files changed, 171 insertions(+), 2 deletions(-)

--
2.25.1


2022-09-05 11:03:52

by Arvind Yadav

[permalink] [raw]
Subject: [PATCH 1/4] dma-buf: Check status of enable-signaling bit on debug

The core DMA-buf framework needs to enable signaling
before the fence is signaled. The core DMA-buf framework
can forget to enable signaling before the fence is signaled.
To avoid this scenario on the debug kernel, check the
DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT status bit before checking
the signaling bit status to confirm that enable_signaling
is enabled.

Signed-off-by: Arvind Yadav <[email protected]>
---
include/linux/dma-fence.h | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
index 775cdc0b4f24..60c0e935c0b5 100644
--- a/include/linux/dma-fence.h
+++ b/include/linux/dma-fence.h
@@ -428,6 +428,11 @@ dma_fence_is_signaled_locked(struct dma_fence *fence)
static inline bool
dma_fence_is_signaled(struct dma_fence *fence)
{
+#ifdef CONFIG_DEBUG_FS
+ if (!test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags))
+ return false;
+#endif
+
if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
return true;

--
2.25.1

2022-09-05 11:30:21

by Arvind Yadav

[permalink] [raw]
Subject: [PATCH 2/4] drm/sched: Add callback and enable signaling on debug

Here's on debug adding an enable_signaling callback for finished
fences and enabling software signaling for finished fence.

Signed-off-by: Arvind Yadav <[email protected]>
---
drivers/gpu/drm/scheduler/sched_fence.c | 12 ++++++++++++
drivers/gpu/drm/scheduler/sched_main.c | 4 +++-
2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
index 7fd869520ef2..ebd26cdb79a0 100644
--- a/drivers/gpu/drm/scheduler/sched_fence.c
+++ b/drivers/gpu/drm/scheduler/sched_fence.c
@@ -122,16 +122,28 @@ static void drm_sched_fence_release_finished(struct dma_fence *f)

dma_fence_put(&fence->scheduled);
}
+#ifdef CONFIG_DEBUG_FS
+static bool drm_sched_enable_signaling(struct dma_fence *f)
+{
+ return true;
+}
+#endif

static const struct dma_fence_ops drm_sched_fence_ops_scheduled = {
.get_driver_name = drm_sched_fence_get_driver_name,
.get_timeline_name = drm_sched_fence_get_timeline_name,
+#ifdef CONFIG_DEBUG_FS
+ .enable_signaling = drm_sched_enable_signaling,
+#endif
.release = drm_sched_fence_release_scheduled,
};

static const struct dma_fence_ops drm_sched_fence_ops_finished = {
.get_driver_name = drm_sched_fence_get_driver_name,
.get_timeline_name = drm_sched_fence_get_timeline_name,
+#ifdef CONFIG_DEBUG_FS
+ .enable_signaling = drm_sched_enable_signaling,
+#endif
.release = drm_sched_fence_release_finished,
};

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index e0ab14e0fb6b..140e3d8646e2 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -961,7 +961,9 @@ static int drm_sched_main(void *param)
s_fence->parent = dma_fence_get(fence);
/* Drop for original kref_init of the fence */
dma_fence_put(fence);
-
+#ifdef CONFIG_DEBUG_FS
+ dma_fence_enable_sw_signaling(&s_fence->finished);
+#endif
r = dma_fence_add_callback(fence, &sched_job->cb,
drm_sched_job_done_cb);
if (r == -ENOENT)
--
2.25.1

2022-09-05 11:56:13

by Arvind Yadav

[permalink] [raw]
Subject: [PATCH 3/4] dma-buf: Add callback and enable signaling on debug

Here's on debug adding an enable_signaling callback for the stub
fences and enabling software signaling for the stub fence which
is always signaled. This fence should enable software signaling
otherwise the AMD GPU scheduler will cause a GPU reset due to a
GPU scheduler cleanup activity timeout.

Signed-off-by: Arvind Yadav <[email protected]>
---
drivers/dma-buf/dma-fence.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 066400ed8841..0a67af945ef8 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -27,6 +27,10 @@ EXPORT_TRACEPOINT_SYMBOL(dma_fence_signaled);
static DEFINE_SPINLOCK(dma_fence_stub_lock);
static struct dma_fence dma_fence_stub;

+#ifdef CONFIG_DEBUG_FS
+static bool __dma_fence_enable_signaling(struct dma_fence *fence);
+#endif
+
/*
* fence context counter: each execution context should have its own
* fence context, this allows checking if fences belong to the same
@@ -116,9 +120,19 @@ static const char *dma_fence_stub_get_name(struct dma_fence *fence)
return "stub";
}

+#ifdef CONFIG_DEBUG_FS
+static bool dma_fence_stub_enable_signaling(struct dma_fence *f)
+{
+ return true;
+}
+#endif
+
static const struct dma_fence_ops dma_fence_stub_ops = {
.get_driver_name = dma_fence_stub_get_name,
.get_timeline_name = dma_fence_stub_get_name,
+#ifdef CONFIG_DEBUG_FS
+ .enable_signaling = dma_fence_stub_enable_signaling,
+#endif
};

/**
@@ -136,6 +150,9 @@ struct dma_fence *dma_fence_get_stub(void)
&dma_fence_stub_ops,
&dma_fence_stub_lock,
0, 0);
+#ifdef CONFIG_DEBUG_FS
+ __dma_fence_enable_signaling(&dma_fence_stub);
+#endif
dma_fence_signal_locked(&dma_fence_stub);
}
spin_unlock(&dma_fence_stub_lock);
--
2.25.1

2022-09-05 11:57:05

by Christian König

[permalink] [raw]
Subject: Re: [PATCH 1/4] dma-buf: Check status of enable-signaling bit on debug

Am 05.09.22 um 12:56 schrieb Arvind Yadav:
> The core DMA-buf framework needs to enable signaling
> before the fence is signaled. The core DMA-buf framework
> can forget to enable signaling before the fence is signaled.
> To avoid this scenario on the debug kernel, check the
> DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT status bit before checking
> the signaling bit status to confirm that enable_signaling
> is enabled.

You might want to put this patch at the end of the series to avoid
breaking the kernel in between.

>
> Signed-off-by: Arvind Yadav <[email protected]>
> ---
> include/linux/dma-fence.h | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
> index 775cdc0b4f24..60c0e935c0b5 100644
> --- a/include/linux/dma-fence.h
> +++ b/include/linux/dma-fence.h
> @@ -428,6 +428,11 @@ dma_fence_is_signaled_locked(struct dma_fence *fence)
> static inline bool
> dma_fence_is_signaled(struct dma_fence *fence)
> {
> +#ifdef CONFIG_DEBUG_FS

CONFIG_DEBUG_FS is certainly wrong, probably better to check for
CONFIG_DEBUG_WW_MUTEX_SLOWPATH here.

Apart from that looks good to me,
Christian.

> + if (!test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags))
> + return false;
> +#endif
> +
> if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
> return true;
>

2022-09-05 11:57:34

by Arvind Yadav

[permalink] [raw]
Subject: [PATCH 4/4] dma-buf: Add callback and enable signaling on debug

Here's on debug adding an enable_signaling callback for fences
and enabling software signaling for selftest.

Signed-off-by: Arvind Yadav <[email protected]>
---
drivers/dma-buf/st-dma-fence-chain.c | 17 +++++++++
drivers/dma-buf/st-dma-fence-unwrap.c | 54 +++++++++++++++++++++++++++
drivers/dma-buf/st-dma-fence.c | 34 ++++++++++++++++-
drivers/dma-buf/st-dma-resv.c | 30 +++++++++++++++
4 files changed, 134 insertions(+), 1 deletion(-)

diff --git a/drivers/dma-buf/st-dma-fence-chain.c b/drivers/dma-buf/st-dma-fence-chain.c
index 8ce1ea59d31b..ffbb24d6a890 100644
--- a/drivers/dma-buf/st-dma-fence-chain.c
+++ b/drivers/dma-buf/st-dma-fence-chain.c
@@ -32,6 +32,12 @@ static const char *mock_name(struct dma_fence *f)
{
return "mock";
}
+#ifdef CONFIG_DEBUG_FS
+static bool mock_fence_enable_signaling(struct dma_fence *f)
+{
+ return true;
+}
+#endif

static void mock_fence_release(struct dma_fence *f)
{
@@ -41,6 +47,9 @@ static void mock_fence_release(struct dma_fence *f)
static const struct dma_fence_ops mock_ops = {
.get_driver_name = mock_name,
.get_timeline_name = mock_name,
+#ifdef CONFIG_DEBUG_FS
+ .enable_signaling = mock_fence_enable_signaling,
+#endif
.release = mock_fence_release,
};

@@ -87,6 +96,10 @@ static int sanitycheck(void *arg)
if (!chain)
err = -ENOMEM;

+#ifdef CONFIG_DEBUG_FS
+ dma_fence_enable_sw_signaling(chain);
+#endif
+
dma_fence_signal(f);
dma_fence_put(f);

@@ -143,6 +156,10 @@ static int fence_chains_init(struct fence_chains *fc, unsigned int count,
}

fc->tail = fc->chains[i];
+
+#ifdef CONFIG_DEBUG_FS
+ dma_fence_enable_sw_signaling(fc->chains[i]);
+#endif
}

fc->chain_length = i;
diff --git a/drivers/dma-buf/st-dma-fence-unwrap.c b/drivers/dma-buf/st-dma-fence-unwrap.c
index 4105d5ea8dde..b43c57559ead 100644
--- a/drivers/dma-buf/st-dma-fence-unwrap.c
+++ b/drivers/dma-buf/st-dma-fence-unwrap.c
@@ -23,9 +23,19 @@ static const char *mock_name(struct dma_fence *f)
return "mock";
}

+#ifdef CONFIG_DEBUG_FS
+static bool mock_fence_enable_signaling(struct dma_fence *f)
+{
+ return true;
+}
+#endif
+
static const struct dma_fence_ops mock_ops = {
.get_driver_name = mock_name,
.get_timeline_name = mock_name,
+#ifdef CONFIG_DEBUG_FS
+ .enable_signaling = mock_fence_enable_signaling,
+#endif
};

static struct dma_fence *mock_fence(void)
@@ -102,6 +112,10 @@ static int sanitycheck(void *arg)
if (!f)
return -ENOMEM;

+#ifdef CONFIG_DEBUG_FS
+ dma_fence_enable_sw_signaling(f);
+#endif
+
array = mock_array(1, f);
if (!array)
return -ENOMEM;
@@ -124,12 +138,20 @@ static int unwrap_array(void *arg)
if (!f1)
return -ENOMEM;

+#ifdef CONFIG_DEBUG_FS
+ dma_fence_enable_sw_signaling(f1);
+#endif
+
f2 = mock_fence();
if (!f2) {
dma_fence_put(f1);
return -ENOMEM;
}

+#ifdef CONFIG_DEBUG_FS
+ dma_fence_enable_sw_signaling(f2);
+#endif
+
array = mock_array(2, f1, f2);
if (!array)
return -ENOMEM;
@@ -164,12 +186,20 @@ static int unwrap_chain(void *arg)
if (!f1)
return -ENOMEM;

+#ifdef CONFIG_DEBUG_FS
+ dma_fence_enable_sw_signaling(f1);
+#endif
+
f2 = mock_fence();
if (!f2) {
dma_fence_put(f1);
return -ENOMEM;
}

+#ifdef CONFIG_DEBUG_FS
+ dma_fence_enable_sw_signaling(f2);
+#endif
+
chain = mock_chain(f1, f2);
if (!chain)
return -ENOMEM;
@@ -204,12 +234,20 @@ static int unwrap_chain_array(void *arg)
if (!f1)
return -ENOMEM;

+#ifdef CONFIG_DEBUG_FS
+ dma_fence_enable_sw_signaling(f1);
+#endif
+
f2 = mock_fence();
if (!f2) {
dma_fence_put(f1);
return -ENOMEM;
}

+#ifdef CONFIG_DEBUG_FS
+ dma_fence_enable_sw_signaling(f2);
+#endif
+
array = mock_array(2, f1, f2);
if (!array)
return -ENOMEM;
@@ -248,12 +286,20 @@ static int unwrap_merge(void *arg)
if (!f1)
return -ENOMEM;

+#ifdef CONFIG_DEBUG_FS
+ dma_fence_enable_sw_signaling(f1);
+#endif
+
f2 = mock_fence();
if (!f2) {
err = -ENOMEM;
goto error_put_f1;
}

+#ifdef CONFIG_DEBUG_FS
+ dma_fence_enable_sw_signaling(f2);
+#endif
+
f3 = dma_fence_unwrap_merge(f1, f2);
if (!f3) {
err = -ENOMEM;
@@ -296,10 +342,18 @@ static int unwrap_merge_complex(void *arg)
if (!f1)
return -ENOMEM;

+#ifdef CONFIG_DEBUG_FS
+ dma_fence_enable_sw_signaling(f1);
+#endif
+
f2 = mock_fence();
if (!f2)
goto error_put_f1;

+#ifdef CONFIG_DEBUG_FS
+ dma_fence_enable_sw_signaling(f2);
+#endif
+
f3 = dma_fence_unwrap_merge(f1, f2);
if (!f3)
goto error_put_f2;
diff --git a/drivers/dma-buf/st-dma-fence.c b/drivers/dma-buf/st-dma-fence.c
index c8a12d7ad71a..4adb763f4509 100644
--- a/drivers/dma-buf/st-dma-fence.c
+++ b/drivers/dma-buf/st-dma-fence.c
@@ -27,6 +27,12 @@ static const char *mock_name(struct dma_fence *f)
{
return "mock";
}
+#ifdef CONFIG_DEBUG_FS
+static bool mock_fence_enable_signaling(struct dma_fence *f)
+{
+ return true;
+}
+#endif

static void mock_fence_release(struct dma_fence *f)
{
@@ -77,6 +83,9 @@ static const struct dma_fence_ops mock_ops = {
.get_driver_name = mock_name,
.get_timeline_name = mock_name,
.wait = mock_wait,
+#ifdef CONFIG_DEBUG_FS
+ .enable_signaling = mock_fence_enable_signaling,
+#endif
.release = mock_fence_release,
};

@@ -101,7 +110,9 @@ static int sanitycheck(void *arg)
f = mock_fence();
if (!f)
return -ENOMEM;
-
+#ifdef CONFIG_DEBUG_FS
+ dma_fence_enable_sw_signaling(f);
+#endif
dma_fence_signal(f);
dma_fence_put(f);

@@ -117,6 +128,9 @@ static int test_signaling(void *arg)
if (!f)
return -ENOMEM;

+#ifdef CONFIG_DEBUG_FS
+ dma_fence_enable_sw_signaling(f);
+#endif
if (dma_fence_is_signaled(f)) {
pr_err("Fence unexpectedly signaled on creation\n");
goto err_free;
@@ -190,6 +204,9 @@ static int test_late_add_callback(void *arg)
if (!f)
return -ENOMEM;

+#ifdef CONFIG_DEBUG_FS
+ dma_fence_enable_sw_signaling(f);
+#endif
dma_fence_signal(f);

if (!dma_fence_add_callback(f, &cb.cb, simple_callback)) {
@@ -282,6 +299,9 @@ static int test_status(void *arg)
if (!f)
return -ENOMEM;

+#ifdef CONFIG_DEBUG_FS
+ dma_fence_enable_sw_signaling(f);
+#endif
if (dma_fence_get_status(f)) {
pr_err("Fence unexpectedly has signaled status on creation\n");
goto err_free;
@@ -308,6 +328,9 @@ static int test_error(void *arg)
if (!f)
return -ENOMEM;

+#ifdef CONFIG_DEBUG_FS
+ dma_fence_enable_sw_signaling(f);
+#endif
dma_fence_set_error(f, -EIO);

if (dma_fence_get_status(f)) {
@@ -337,6 +360,9 @@ static int test_wait(void *arg)
if (!f)
return -ENOMEM;

+#ifdef CONFIG_DEBUG_FS
+ dma_fence_enable_sw_signaling(f);
+#endif
if (dma_fence_wait_timeout(f, false, 0) != -ETIME) {
pr_err("Wait reported complete before being signaled\n");
goto err_free;
@@ -379,6 +405,9 @@ static int test_wait_timeout(void *arg)
if (!wt.f)
return -ENOMEM;

+#ifdef CONFIG_DEBUG_FS
+ dma_fence_enable_sw_signaling(wt.f);
+#endif
if (dma_fence_wait_timeout(wt.f, false, 1) != -ETIME) {
pr_err("Wait reported complete before being signaled\n");
goto err_free;
@@ -458,6 +487,9 @@ static int thread_signal_callback(void *arg)
break;
}

+#ifdef CONFIG_DEBUG_FS
+ dma_fence_enable_sw_signaling(f1);
+#endif
rcu_assign_pointer(t->fences[t->id], f1);
smp_wmb();

diff --git a/drivers/dma-buf/st-dma-resv.c b/drivers/dma-buf/st-dma-resv.c
index 813779e3c9be..fa4e7b09d54c 100644
--- a/drivers/dma-buf/st-dma-resv.c
+++ b/drivers/dma-buf/st-dma-resv.c
@@ -18,9 +18,19 @@ static const char *fence_name(struct dma_fence *f)
return "selftest";
}

+#ifdef CONFIG_DEBUG_FS
+static bool fence_enable_signaling(struct dma_fence *f)
+{
+ return true;
+}
+#endif
+
static const struct dma_fence_ops fence_ops = {
.get_driver_name = fence_name,
.get_timeline_name = fence_name,
+#ifdef CONFIG_DEBUG_FS
+ .enable_signaling = fence_enable_signaling,
+#endif
};

static struct dma_fence *alloc_fence(void)
@@ -45,6 +55,10 @@ static int sanitycheck(void *arg)
if (!f)
return -ENOMEM;

+#ifdef CONFIG_DEBUG_FS
+ dma_fence_enable_sw_signaling(f);
+#endif
+
dma_fence_signal(f);
dma_fence_put(f);

@@ -69,6 +83,10 @@ static int test_signaling(void *arg)
if (!f)
return -ENOMEM;

+#ifdef CONFIG_DEBUG_FS
+ dma_fence_enable_sw_signaling(f);
+#endif
+
dma_resv_init(&resv);
r = dma_resv_lock(&resv, NULL);
if (r) {
@@ -114,6 +132,10 @@ static int test_for_each(void *arg)
if (!f)
return -ENOMEM;

+#ifdef CONFIG_DEBUG_FS
+ dma_fence_enable_sw_signaling(f);
+#endif
+
dma_resv_init(&resv);
r = dma_resv_lock(&resv, NULL);
if (r) {
@@ -173,6 +195,10 @@ static int test_for_each_unlocked(void *arg)
if (!f)
return -ENOMEM;

+#ifdef CONFIG_DEBUG_FS
+ dma_fence_enable_sw_signaling(f);
+#endif
+
dma_resv_init(&resv);
r = dma_resv_lock(&resv, NULL);
if (r) {
@@ -244,6 +270,10 @@ static int test_get_fences(void *arg)
if (!f)
return -ENOMEM;

+#ifdef CONFIG_DEBUG_FS
+ dma_fence_enable_sw_signaling(f);
+#endif
+
dma_resv_init(&resv);
r = dma_resv_lock(&resv, NULL);
if (r) {
--
2.25.1

2022-09-05 12:26:27

by Christian König

[permalink] [raw]
Subject: Re: [PATCH 3/4] dma-buf: Add callback and enable signaling on debug



Am 05.09.22 um 12:56 schrieb Arvind Yadav:
> Here's on debug adding an enable_signaling callback for the stub
> fences and enabling software signaling for the stub fence which
> is always signaled. This fence should enable software signaling
> otherwise the AMD GPU scheduler will cause a GPU reset due to a
> GPU scheduler cleanup activity timeout.
>
> Signed-off-by: Arvind Yadav <[email protected]>
> ---
> drivers/dma-buf/dma-fence.c | 17 +++++++++++++++++
> 1 file changed, 17 insertions(+)
>
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index 066400ed8841..0a67af945ef8 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -27,6 +27,10 @@ EXPORT_TRACEPOINT_SYMBOL(dma_fence_signaled);
> static DEFINE_SPINLOCK(dma_fence_stub_lock);
> static struct dma_fence dma_fence_stub;
>
> +#ifdef CONFIG_DEBUG_FS
> +static bool __dma_fence_enable_signaling(struct dma_fence *fence);
> +#endif
> +
> /*
> * fence context counter: each execution context should have its own
> * fence context, this allows checking if fences belong to the same
> @@ -116,9 +120,19 @@ static const char *dma_fence_stub_get_name(struct dma_fence *fence)
> return "stub";
> }
>
> +#ifdef CONFIG_DEBUG_FS
> +static bool dma_fence_stub_enable_signaling(struct dma_fence *f)
> +{
> + return true;
> +}
> +#endif

Again, adding the callback is unnecessary.

> +
> static const struct dma_fence_ops dma_fence_stub_ops = {
> .get_driver_name = dma_fence_stub_get_name,
> .get_timeline_name = dma_fence_stub_get_name,
> +#ifdef CONFIG_DEBUG_FS
> + .enable_signaling = dma_fence_stub_enable_signaling,
> +#endif
> };
>
> /**
> @@ -136,6 +150,9 @@ struct dma_fence *dma_fence_get_stub(void)
> &dma_fence_stub_ops,
> &dma_fence_stub_lock,
> 0, 0);
> +#ifdef CONFIG_DEBUG_FS
> + __dma_fence_enable_signaling(&dma_fence_stub);
> +#endif
> dma_fence_signal_locked(&dma_fence_stub);
> }
> spin_unlock(&dma_fence_stub_lock);

2022-09-05 12:26:58

by Christian König

[permalink] [raw]
Subject: Re: [PATCH 2/4] drm/sched: Add callback and enable signaling on debug



Am 05.09.22 um 12:56 schrieb Arvind Yadav:
> Here's on debug adding an enable_signaling callback for finished
> fences and enabling software signaling for finished fence.
>
> Signed-off-by: Arvind Yadav <[email protected]>
> ---
> drivers/gpu/drm/scheduler/sched_fence.c | 12 ++++++++++++
> drivers/gpu/drm/scheduler/sched_main.c | 4 +++-
> 2 files changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
> index 7fd869520ef2..ebd26cdb79a0 100644
> --- a/drivers/gpu/drm/scheduler/sched_fence.c
> +++ b/drivers/gpu/drm/scheduler/sched_fence.c
> @@ -122,16 +122,28 @@ static void drm_sched_fence_release_finished(struct dma_fence *f)
>
> dma_fence_put(&fence->scheduled);
> }
> +#ifdef CONFIG_DEBUG_FS
> +static bool drm_sched_enable_signaling(struct dma_fence *f)
> +{
> + return true;
> +}
> +#endif
>
> static const struct dma_fence_ops drm_sched_fence_ops_scheduled = {
> .get_driver_name = drm_sched_fence_get_driver_name,
> .get_timeline_name = drm_sched_fence_get_timeline_name,
> +#ifdef CONFIG_DEBUG_FS
> + .enable_signaling = drm_sched_enable_signaling,
> +#endif
> .release = drm_sched_fence_release_scheduled,
> };
>
> static const struct dma_fence_ops drm_sched_fence_ops_finished = {
> .get_driver_name = drm_sched_fence_get_driver_name,
> .get_timeline_name = drm_sched_fence_get_timeline_name,
> +#ifdef CONFIG_DEBUG_FS
> + .enable_signaling = drm_sched_enable_signaling,
> +#endif

Adding the callback should not be necessary.

> .release = drm_sched_fence_release_finished,
> };
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index e0ab14e0fb6b..140e3d8646e2 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -961,7 +961,9 @@ static int drm_sched_main(void *param)
> s_fence->parent = dma_fence_get(fence);
> /* Drop for original kref_init of the fence */
> dma_fence_put(fence);

Uff, not related to your patch but that looks wrong to me. The reference
can only be dropped after the call to dma_fence_add_callback().

> -
> +#ifdef CONFIG_DEBUG_FS
> + dma_fence_enable_sw_signaling(&s_fence->finished);
> +#endif

This should always be called, independent of the config options set.

Christian.

> r = dma_fence_add_callback(fence, &sched_job->cb,
> drm_sched_job_done_cb);
> if (r == -ENOENT)

2022-09-05 14:01:55

by Yadav, Arvind

[permalink] [raw]
Subject: Re: [PATCH 1/4] dma-buf: Check status of enable-signaling bit on debug


On 9/5/2022 4:51 PM, Christian König wrote:
> Am 05.09.22 um 12:56 schrieb Arvind Yadav:
>> The core DMA-buf framework needs to enable signaling
>> before the fence is signaled. The core DMA-buf framework
>> can forget to enable signaling before the fence is signaled.
>> To avoid this scenario on the debug kernel, check the
>> DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT status bit before checking
>> the signaling bit status to confirm that enable_signaling
>> is enabled.
>
> You might want to put this patch at the end of the series to avoid
> breaking the kernel in between.
>
sure, I will add this patch at end.
>>
>> Signed-off-by: Arvind Yadav <[email protected]>
>> ---
>>   include/linux/dma-fence.h | 5 +++++
>>   1 file changed, 5 insertions(+)
>>
>> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
>> index 775cdc0b4f24..60c0e935c0b5 100644
>> --- a/include/linux/dma-fence.h
>> +++ b/include/linux/dma-fence.h
>> @@ -428,6 +428,11 @@ dma_fence_is_signaled_locked(struct dma_fence
>> *fence)
>>   static inline bool
>>   dma_fence_is_signaled(struct dma_fence *fence)
>>   {
>> +#ifdef CONFIG_DEBUG_FS
>
> CONFIG_DEBUG_FS is certainly wrong, probably better to check for
> CONFIG_DEBUG_WW_MUTEX_SLOWPATH here.
>
> Apart from that looks good to me,
> Christian.
>
I will use CONFIG_DEBUG_WW_MUTEX_SLOWPATH instead of CONFIG_DEBUG_FS

~arvind

>> +    if (!test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags))
>> +        return false;
>> +#endif
>> +
>>       if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>>           return true;
>

2022-09-05 14:08:09

by Yadav, Arvind

[permalink] [raw]
Subject: Re: [PATCH 3/4] dma-buf: Add callback and enable signaling on debug


On 9/5/2022 4:56 PM, Christian König wrote:
>
>
> Am 05.09.22 um 12:56 schrieb Arvind Yadav:
>> Here's on debug adding an enable_signaling callback for the stub
>> fences and enabling software signaling for the stub fence which
>> is always signaled. This fence should enable software signaling
>> otherwise the AMD GPU scheduler will cause a GPU reset due to a
>> GPU scheduler cleanup activity timeout.
>>
>> Signed-off-by: Arvind Yadav <[email protected]>
>> ---
>>   drivers/dma-buf/dma-fence.c | 17 +++++++++++++++++
>>   1 file changed, 17 insertions(+)
>>
>> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
>> index 066400ed8841..0a67af945ef8 100644
>> --- a/drivers/dma-buf/dma-fence.c
>> +++ b/drivers/dma-buf/dma-fence.c
>> @@ -27,6 +27,10 @@ EXPORT_TRACEPOINT_SYMBOL(dma_fence_signaled);
>>   static DEFINE_SPINLOCK(dma_fence_stub_lock);
>>   static struct dma_fence dma_fence_stub;
>>   +#ifdef CONFIG_DEBUG_FS
>> +static bool __dma_fence_enable_signaling(struct dma_fence *fence);
>> +#endif
>> +
>>   /*
>>    * fence context counter: each execution context should have its own
>>    * fence context, this allows checking if fences belong to the same
>> @@ -116,9 +120,19 @@ static const char
>> *dma_fence_stub_get_name(struct dma_fence *fence)
>>           return "stub";
>>   }
>>   +#ifdef CONFIG_DEBUG_FS
>> +static bool dma_fence_stub_enable_signaling(struct dma_fence *f)
>> +{
>> +    return true;
>> +}
>> +#endif
>
> Again, adding the callback is unnecessary.

sure, I will remove callback from here and other patch also.

~arvind

>
>> +
>>   static const struct dma_fence_ops dma_fence_stub_ops = {
>>       .get_driver_name = dma_fence_stub_get_name,
>>       .get_timeline_name = dma_fence_stub_get_name,
>> +#ifdef CONFIG_DEBUG_FS
>> +    .enable_signaling =  dma_fence_stub_enable_signaling,
>> +#endif
>>   };
>>     /**
>> @@ -136,6 +150,9 @@ struct dma_fence *dma_fence_get_stub(void)
>>                      &dma_fence_stub_ops,
>>                      &dma_fence_stub_lock,
>>                      0, 0);
>> +#ifdef CONFIG_DEBUG_FS
>> +        __dma_fence_enable_signaling(&dma_fence_stub);
>> +#endif
>>           dma_fence_signal_locked(&dma_fence_stub);
>>       }
>>       spin_unlock(&dma_fence_stub_lock);
>

2022-09-05 14:19:46

by Yadav, Arvind

[permalink] [raw]
Subject: Re: [PATCH 2/4] drm/sched: Add callback and enable signaling on debug


On 9/5/2022 4:55 PM, Christian König wrote:
>
>
> Am 05.09.22 um 12:56 schrieb Arvind Yadav:
>> Here's on debug adding an enable_signaling callback for finished
>> fences and enabling software signaling for finished fence.
>>
>> Signed-off-by: Arvind Yadav <[email protected]>
>> ---
>>   drivers/gpu/drm/scheduler/sched_fence.c | 12 ++++++++++++
>>   drivers/gpu/drm/scheduler/sched_main.c  |  4 +++-
>>   2 files changed, 15 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_fence.c
>> b/drivers/gpu/drm/scheduler/sched_fence.c
>> index 7fd869520ef2..ebd26cdb79a0 100644
>> --- a/drivers/gpu/drm/scheduler/sched_fence.c
>> +++ b/drivers/gpu/drm/scheduler/sched_fence.c
>> @@ -122,16 +122,28 @@ static void
>> drm_sched_fence_release_finished(struct dma_fence *f)
>>         dma_fence_put(&fence->scheduled);
>>   }
>> +#ifdef CONFIG_DEBUG_FS
>> +static bool drm_sched_enable_signaling(struct dma_fence *f)
>> +{
>> +    return true;
>> +}
>> +#endif
>>     static const struct dma_fence_ops drm_sched_fence_ops_scheduled = {
>>       .get_driver_name = drm_sched_fence_get_driver_name,
>>       .get_timeline_name = drm_sched_fence_get_timeline_name,
>> +#ifdef CONFIG_DEBUG_FS
>> +    .enable_signaling = drm_sched_enable_signaling,
>> +#endif
>>       .release = drm_sched_fence_release_scheduled,
>>   };
>>     static const struct dma_fence_ops drm_sched_fence_ops_finished = {
>>       .get_driver_name = drm_sched_fence_get_driver_name,
>>       .get_timeline_name = drm_sched_fence_get_timeline_name,
>> +#ifdef CONFIG_DEBUG_FS
>> +    .enable_signaling = drm_sched_enable_signaling,
>> +#endif
>
> Adding the callback should not be necessary.
sure, I will remove this change.
>
>>       .release = drm_sched_fence_release_finished,
>>   };
>>   diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>> b/drivers/gpu/drm/scheduler/sched_main.c
>> index e0ab14e0fb6b..140e3d8646e2 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -961,7 +961,9 @@ static int drm_sched_main(void *param)
>>               s_fence->parent = dma_fence_get(fence);
>>               /* Drop for original kref_init of the fence */
>>               dma_fence_put(fence);
>
> Uff, not related to your patch but that looks wrong to me. The
> reference can only be dropped after the call to dma_fence_add_callback().
>
Shall I take care with this patch or I will submit separate one ?
>> -
>> +#ifdef CONFIG_DEBUG_FS
>> + dma_fence_enable_sw_signaling(&s_fence->finished);
>> +#endif
>
> This should always be called, independent of the config options set.
>
> Christian.

sure, I will remove the Config check.

~arvind

>
>>               r = dma_fence_add_callback(fence, &sched_job->cb,
>>                              drm_sched_job_done_cb);
>>               if (r == -ENOENT)
>

2022-09-05 15:14:54

by Yadav, Arvind

[permalink] [raw]
Subject: Re: [PATCH 2/4] drm/sched: Add callback and enable signaling on debug


On 9/5/2022 7:16 PM, Yadav, Arvind wrote:
>
> On 9/5/2022 4:55 PM, Christian König wrote:
>>
>>
>> Am 05.09.22 um 12:56 schrieb Arvind Yadav:
>>> Here's on debug adding an enable_signaling callback for finished
>>> fences and enabling software signaling for finished fence.
>>>
>>> Signed-off-by: Arvind Yadav <[email protected]>
>>> ---
>>>   drivers/gpu/drm/scheduler/sched_fence.c | 12 ++++++++++++
>>>   drivers/gpu/drm/scheduler/sched_main.c  |  4 +++-
>>>   2 files changed, 15 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/sched_fence.c
>>> b/drivers/gpu/drm/scheduler/sched_fence.c
>>> index 7fd869520ef2..ebd26cdb79a0 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_fence.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_fence.c
>>> @@ -122,16 +122,28 @@ static void
>>> drm_sched_fence_release_finished(struct dma_fence *f)
>>>         dma_fence_put(&fence->scheduled);
>>>   }
>>> +#ifdef CONFIG_DEBUG_FS
>>> +static bool drm_sched_enable_signaling(struct dma_fence *f)
>>> +{
>>> +    return true;
>>> +}
>>> +#endif
>>>     static const struct dma_fence_ops drm_sched_fence_ops_scheduled = {
>>>       .get_driver_name = drm_sched_fence_get_driver_name,
>>>       .get_timeline_name = drm_sched_fence_get_timeline_name,
>>> +#ifdef CONFIG_DEBUG_FS
>>> +    .enable_signaling = drm_sched_enable_signaling,
>>> +#endif
>>>       .release = drm_sched_fence_release_scheduled,
>>>   };
>>>     static const struct dma_fence_ops drm_sched_fence_ops_finished = {
>>>       .get_driver_name = drm_sched_fence_get_driver_name,
>>>       .get_timeline_name = drm_sched_fence_get_timeline_name,
>>> +#ifdef CONFIG_DEBUG_FS
>>> +    .enable_signaling = drm_sched_enable_signaling,
>>> +#endif
>>
>> Adding the callback should not be necessary.
> sure, I will remove this change.
>>
>>>       .release = drm_sched_fence_release_finished,
>>>   };
>>>   diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>> index e0ab14e0fb6b..140e3d8646e2 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>> @@ -961,7 +961,9 @@ static int drm_sched_main(void *param)
>>>               s_fence->parent = dma_fence_get(fence);
>>>               /* Drop for original kref_init of the fence */
>>>               dma_fence_put(fence);
>>
>> Uff, not related to your patch but that looks wrong to me. The
>> reference can only be dropped after the call to
>> dma_fence_add_callback().
>>
> Shall I take care with this patch or I will submit separate one ?

This changes was recently added by  Andrey Grodzovsky (commit :
45ecaea73) to fix the negative refcount.

>>> -
>>> +#ifdef CONFIG_DEBUG_FS
>>> + dma_fence_enable_sw_signaling(&s_fence->finished);
>>> +#endif
>>
>> This should always be called, independent of the config options set.
>>
>> Christian.
>
> sure, I will remove the Config check.
>
> ~arvind
>
>>
>>>               r = dma_fence_add_callback(fence, &sched_job->cb,
>>>                              drm_sched_job_done_cb);
>>>               if (r == -ENOENT)
>>

2022-09-05 15:36:14

by Christian König

[permalink] [raw]
Subject: Re: [PATCH 2/4] drm/sched: Add callback and enable signaling on debug

Am 05.09.22 um 15:46 schrieb Yadav, Arvind:
> On 9/5/2022 4:55 PM, Christian König wrote:
>> [SNIP]
>>
>> Am 05.09.22 um 12:56 schrieb Arvind Yadav:
>>>       .release = drm_sched_fence_release_finished,
>>>   };
>>>   diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>> index e0ab14e0fb6b..140e3d8646e2 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>> @@ -961,7 +961,9 @@ static int drm_sched_main(void *param)
>>>               s_fence->parent = dma_fence_get(fence);
>>>               /* Drop for original kref_init of the fence */
>>>               dma_fence_put(fence);
>>
>> Uff, not related to your patch but that looks wrong to me. The
>> reference can only be dropped after the call to
>> dma_fence_add_callback().
>>
> Shall I take care with this patch or I will submit separate one ?


Separate one. It's probably no big deal since we grab another reference
right before, but still quite some broken coding.

Thanks,
Christian.

>
>>> -
>>> +#ifdef CONFIG_DEBUG_FS
>>> + dma_fence_enable_sw_signaling(&s_fence->finished);
>>> +#endif
>>
>> This should always be called, independent of the config options set.
>>
>> Christian.
>
> sure, I will remove the Config check.
>
> ~arvind
>
>>
>>>               r = dma_fence_add_callback(fence, &sched_job->cb,
>>>                              drm_sched_job_done_cb);
>>>               if (r == -ENOENT)
>>

2022-09-05 16:43:19

by Tvrtko Ursulin

[permalink] [raw]
Subject: Re: [PATCH 1/4] dma-buf: Check status of enable-signaling bit on debug


On 05/09/2022 12:21, Christian König wrote:
> Am 05.09.22 um 12:56 schrieb Arvind Yadav:
>> The core DMA-buf framework needs to enable signaling
>> before the fence is signaled. The core DMA-buf framework
>> can forget to enable signaling before the fence is signaled.
>> To avoid this scenario on the debug kernel, check the
>> DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT status bit before checking
>> the signaling bit status to confirm that enable_signaling
>> is enabled.
>
> You might want to put this patch at the end of the series to avoid
> breaking the kernel in between.
>
>>
>> Signed-off-by: Arvind Yadav <[email protected]>
>> ---
>>   include/linux/dma-fence.h | 5 +++++
>>   1 file changed, 5 insertions(+)
>>
>> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
>> index 775cdc0b4f24..60c0e935c0b5 100644
>> --- a/include/linux/dma-fence.h
>> +++ b/include/linux/dma-fence.h
>> @@ -428,6 +428,11 @@ dma_fence_is_signaled_locked(struct dma_fence
>> *fence)
>>   static inline bool
>>   dma_fence_is_signaled(struct dma_fence *fence)
>>   {
>> +#ifdef CONFIG_DEBUG_FS
>
> CONFIG_DEBUG_FS is certainly wrong, probably better to check for
> CONFIG_DEBUG_WW_MUTEX_SLOWPATH here.
>
> Apart from that looks good to me,

What's the full story in this series - I'm afraid the cover letter does not make it clear to a casual reader like myself? Where does the difference between debug and non debug kernel come from?

And how do the proposed changes relate to the following kerneldoc excerpt:

* Since many implementations can call dma_fence_signal() even when before
* @enable_signaling has been called there's a race window, where the
* dma_fence_signal() might result in the final fence reference being
* released and its memory freed. To avoid this, implementations of this
* callback should grab their own reference using dma_fence_get(), to be
* released when the fence is signalled (through e.g. the interrupt
* handler).
*
* This callback is optional. If this callback is not present, then the
* driver must always have signaling enabled.

Is it now an error, or should be impossible condition, for "is signaled" to return true _unless_ signaling has been enabled?

If the statement (in a later patch) is signalling should always be explicitly enabled by the callers of dma_fence_add_callback, then what about the existing call to __dma_fence_enable_signaling from dma_fence_add_callback?

Or if the rules are changing shouldn't kerneldoc be updated as part of the series?

Regards,

Tvrtko

> Christian.
>
>> +    if (!test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags))
>> +        return false;
>> +#endif
>> +
>>       if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>>           return true;
>

2022-09-05 18:39:25

by Christian König

[permalink] [raw]
Subject: Re: [PATCH 1/4] dma-buf: Check status of enable-signaling bit on debug

Am 05.09.22 um 18:39 schrieb Tvrtko Ursulin:
>
> On 05/09/2022 12:21, Christian König wrote:
>> Am 05.09.22 um 12:56 schrieb Arvind Yadav:
>>> The core DMA-buf framework needs to enable signaling
>>> before the fence is signaled. The core DMA-buf framework
>>> can forget to enable signaling before the fence is signaled.
>>> To avoid this scenario on the debug kernel, check the
>>> DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT status bit before checking
>>> the signaling bit status to confirm that enable_signaling
>>> is enabled.
>>
>> You might want to put this patch at the end of the series to avoid
>> breaking the kernel in between.
>>
>>>
>>> Signed-off-by: Arvind Yadav <[email protected]>
>>> ---
>>>   include/linux/dma-fence.h | 5 +++++
>>>   1 file changed, 5 insertions(+)
>>>
>>> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
>>> index 775cdc0b4f24..60c0e935c0b5 100644
>>> --- a/include/linux/dma-fence.h
>>> +++ b/include/linux/dma-fence.h
>>> @@ -428,6 +428,11 @@ dma_fence_is_signaled_locked(struct dma_fence
>>> *fence)
>>>   static inline bool
>>>   dma_fence_is_signaled(struct dma_fence *fence)
>>>   {
>>> +#ifdef CONFIG_DEBUG_FS
>>
>> CONFIG_DEBUG_FS is certainly wrong, probably better to check for
>> CONFIG_DEBUG_WW_MUTEX_SLOWPATH here.
>>
>> Apart from that looks good to me,
>
> What's the full story in this series - I'm afraid the cover letter
> does not make it clear to a casual reader like myself? Where does the
> difference between debug and non debug kernel come from?

We have a bug that the drm_sync file doesn't properly enable signaling
leading to an igt test failure.

>
> And how do the proposed changes relate to the following kerneldoc
> excerpt:
>
>      * Since many implementations can call dma_fence_signal() even
> when before
>      * @enable_signaling has been called there's a race window, where the
>      * dma_fence_signal() might result in the final fence reference being
>      * released and its memory freed. To avoid this, implementations
> of this
>      * callback should grab their own reference using dma_fence_get(),
> to be
>      * released when the fence is signalled (through e.g. the interrupt
>      * handler).
>      *
>      * This callback is optional. If this callback is not present,
> then the
>      * driver must always have signaling enabled.
>
> Is it now an error, or should be impossible condition, for "is
> signaled" to return true _unless_ signaling has been enabled?

That's neither an error nor impossible. For debugging we just never
return signaled from the dma_fence_is_signaled() function when signaling
was not enabled before.

I also plan to remove the return value from the enable_signaling
callback. That was just not very well designed.

>
> If the statement (in a later patch) is signalling should always be
> explicitly enabled by the callers of dma_fence_add_callback, then what
> about the existing call to __dma_fence_enable_signaling from
> dma_fence_add_callback?

Oh, good point. That sounds like we have some bug in the core dma_fence
code as well.

Calls to dma_fence_add_callback() and dma_fence_wait() should enable
signaling implicitly and don't need an extra call for that.

Only dma_fence_is_signaled() needs this explicit enabling of signaling
through dma_fence_enable_sw_signaling().

>
> Or if the rules are changing shouldn't kerneldoc be updated as part of
> the series?

I think the kerneldoc is just a bit misleading. The point is that when
you need to call dma_fence_enable_sw_signaling() you must hold a
reference to the fence object.

But that's true for all the dma_fence_* functions. The race described in
the comment is just nonsense because you need to hold that reference anyway.

Regards,
Christian.

>
> Regards,
>
> Tvrtko
>
>> Christian.
>>
>>> +    if (!test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags))
>>> +        return false;
>>> +#endif
>>> +
>>>       if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>>>           return true;
>>