2022-08-19 20:14:40

by Karol Herbst

[permalink] [raw]
Subject: [PATCH] nouveau: explicitly wait on the fence in nouveau_bo_move_m2mf

It is a bit unlcear to us why that's helping, but it does and unbreaks
suspend/resume on a lot of GPUs without any known drawbacks.

Cc: [email protected] # v5.15+
Closes: https://gitlab.freedesktop.org/drm/nouveau/-/issues/156
Signed-off-by: Karol Herbst <[email protected]>
---
drivers/gpu/drm/nouveau/nouveau_bo.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 35bb0bb3fe61..126b3c6e12f9 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -822,6 +822,15 @@ nouveau_bo_move_m2mf(struct ttm_buffer_object *bo, int evict,
if (ret == 0) {
ret = nouveau_fence_new(chan, false, &fence);
if (ret == 0) {
+ /* TODO: figure out a better solution here
+ *
+ * wait on the fence here explicitly as going through
+ * ttm_bo_move_accel_cleanup somehow doesn't seem to do it.
+ *
+ * Without this the operation can timeout and we'll fallback to a
+ * software copy, which might take several minutes to finish.
+ */
+ nouveau_fence_wait(fence, false, false);
ret = ttm_bo_move_accel_cleanup(bo,
&fence->base,
evict, false,
--
2.37.1


2022-08-22 21:39:29

by Lyude Paul

[permalink] [raw]
Subject: Re: [PATCH] nouveau: explicitly wait on the fence in nouveau_bo_move_m2mf

Reviewed-by: Lyude Paul <[email protected]>

On Fri, 2022-08-19 at 22:09 +0200, Karol Herbst wrote:
> It is a bit unlcear to us why that's helping, but it does and unbreaks
> suspend/resume on a lot of GPUs without any known drawbacks.
>
> Cc: [email protected] # v5.15+
> Closes: https://gitlab.freedesktop.org/drm/nouveau/-/issues/156
> Signed-off-by: Karol Herbst <[email protected]>
> ---
> drivers/gpu/drm/nouveau/nouveau_bo.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
> index 35bb0bb3fe61..126b3c6e12f9 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_bo.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
> @@ -822,6 +822,15 @@ nouveau_bo_move_m2mf(struct ttm_buffer_object *bo, int evict,
> if (ret == 0) {
> ret = nouveau_fence_new(chan, false, &fence);
> if (ret == 0) {
> + /* TODO: figure out a better solution here
> + *
> + * wait on the fence here explicitly as going through
> + * ttm_bo_move_accel_cleanup somehow doesn't seem to do it.
> + *
> + * Without this the operation can timeout and we'll fallback to a
> + * software copy, which might take several minutes to finish.
> + */
> + nouveau_fence_wait(fence, false, false);
> ret = ttm_bo_move_accel_cleanup(bo,
> &fence->base,
> evict, false,

--
Cheers,
Lyude Paul (she/her)
Software Engineer at Red Hat

2022-09-20 10:48:53

by Salvatore Bonaccorso

[permalink] [raw]
Subject: Re: [PATCH] nouveau: explicitly wait on the fence in nouveau_bo_move_m2mf

Hi,

On Fri, Aug 19, 2022 at 10:09:28PM +0200, Karol Herbst wrote:
> It is a bit unlcear to us why that's helping, but it does and unbreaks
> suspend/resume on a lot of GPUs without any known drawbacks.
>
> Cc: [email protected] # v5.15+
> Closes: https://gitlab.freedesktop.org/drm/nouveau/-/issues/156
> Signed-off-by: Karol Herbst <[email protected]>
> ---
> drivers/gpu/drm/nouveau/nouveau_bo.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
> index 35bb0bb3fe61..126b3c6e12f9 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_bo.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
> @@ -822,6 +822,15 @@ nouveau_bo_move_m2mf(struct ttm_buffer_object *bo, int evict,
> if (ret == 0) {
> ret = nouveau_fence_new(chan, false, &fence);
> if (ret == 0) {
> + /* TODO: figure out a better solution here
> + *
> + * wait on the fence here explicitly as going through
> + * ttm_bo_move_accel_cleanup somehow doesn't seem to do it.
> + *
> + * Without this the operation can timeout and we'll fallback to a
> + * software copy, which might take several minutes to finish.
> + */
> + nouveau_fence_wait(fence, false, false);
> ret = ttm_bo_move_accel_cleanup(bo,
> &fence->base,
> evict, false,
> --
> 2.37.1
>
>

While this is marked for 5.15+ only, a user in Debian was seeing the
suspend issue as well on 5.10.y and did confirm the commit fixes the
issue as well in the 5.10.y series:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=989705#69

Karol, Lyude, should that as well be picked for 5.10.y?

Regards,
Salvatore

2022-09-20 11:42:56

by Karol Herbst

[permalink] [raw]
Subject: Re: [PATCH] nouveau: explicitly wait on the fence in nouveau_bo_move_m2mf

On Tue, Sep 20, 2022 at 12:42 PM Salvatore Bonaccorso <[email protected]> wrote:
>
> Hi,
>
> On Fri, Aug 19, 2022 at 10:09:28PM +0200, Karol Herbst wrote:
> > It is a bit unlcear to us why that's helping, but it does and unbreaks
> > suspend/resume on a lot of GPUs without any known drawbacks.
> >
> > Cc: [email protected] # v5.15+
> > Closes: https://gitlab.freedesktop.org/drm/nouveau/-/issues/156
> > Signed-off-by: Karol Herbst <[email protected]>
> > ---
> > drivers/gpu/drm/nouveau/nouveau_bo.c | 9 +++++++++
> > 1 file changed, 9 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
> > index 35bb0bb3fe61..126b3c6e12f9 100644
> > --- a/drivers/gpu/drm/nouveau/nouveau_bo.c
> > +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
> > @@ -822,6 +822,15 @@ nouveau_bo_move_m2mf(struct ttm_buffer_object *bo, int evict,
> > if (ret == 0) {
> > ret = nouveau_fence_new(chan, false, &fence);
> > if (ret == 0) {
> > + /* TODO: figure out a better solution here
> > + *
> > + * wait on the fence here explicitly as going through
> > + * ttm_bo_move_accel_cleanup somehow doesn't seem to do it.
> > + *
> > + * Without this the operation can timeout and we'll fallback to a
> > + * software copy, which might take several minutes to finish.
> > + */
> > + nouveau_fence_wait(fence, false, false);
> > ret = ttm_bo_move_accel_cleanup(bo,
> > &fence->base,
> > evict, false,
> > --
> > 2.37.1
> >
> >
>
> While this is marked for 5.15+ only, a user in Debian was seeing the
> suspend issue as well on 5.10.y and did confirm the commit fixes the
> issue as well in the 5.10.y series:
>
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=989705#69
>
> Karol, Lyude, should that as well be picked for 5.10.y?
>

mhh from the original report 5.10 was fine, but maybe something got
backported and it broke it? I'll try to do some testing on my machine
and see what I can figure out, but it could also be a debian only
issue at this point.

> Regards,
> Salvatore
>

2022-09-20 12:26:45

by Salvatore Bonaccorso

[permalink] [raw]
Subject: Re: [PATCH] nouveau: explicitly wait on the fence in nouveau_bo_move_m2mf

Hi,

On Tue, Sep 20, 2022 at 01:36:32PM +0200, Karol Herbst wrote:
> On Tue, Sep 20, 2022 at 12:42 PM Salvatore Bonaccorso <[email protected]> wrote:
> >
> > Hi,
> >
> > On Fri, Aug 19, 2022 at 10:09:28PM +0200, Karol Herbst wrote:
> > > It is a bit unlcear to us why that's helping, but it does and unbreaks
> > > suspend/resume on a lot of GPUs without any known drawbacks.
> > >
> > > Cc: [email protected] # v5.15+
> > > Closes: https://gitlab.freedesktop.org/drm/nouveau/-/issues/156
> > > Signed-off-by: Karol Herbst <[email protected]>
> > > ---
> > > drivers/gpu/drm/nouveau/nouveau_bo.c | 9 +++++++++
> > > 1 file changed, 9 insertions(+)
> > >
> > > diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
> > > index 35bb0bb3fe61..126b3c6e12f9 100644
> > > --- a/drivers/gpu/drm/nouveau/nouveau_bo.c
> > > +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
> > > @@ -822,6 +822,15 @@ nouveau_bo_move_m2mf(struct ttm_buffer_object *bo, int evict,
> > > if (ret == 0) {
> > > ret = nouveau_fence_new(chan, false, &fence);
> > > if (ret == 0) {
> > > + /* TODO: figure out a better solution here
> > > + *
> > > + * wait on the fence here explicitly as going through
> > > + * ttm_bo_move_accel_cleanup somehow doesn't seem to do it.
> > > + *
> > > + * Without this the operation can timeout and we'll fallback to a
> > > + * software copy, which might take several minutes to finish.
> > > + */
> > > + nouveau_fence_wait(fence, false, false);
> > > ret = ttm_bo_move_accel_cleanup(bo,
> > > &fence->base,
> > > evict, false,
> > > --
> > > 2.37.1
> > >
> > >
> >
> > While this is marked for 5.15+ only, a user in Debian was seeing the
> > suspend issue as well on 5.10.y and did confirm the commit fixes the
> > issue as well in the 5.10.y series:
> >
> > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=989705#69
> >
> > Karol, Lyude, should that as well be picked for 5.10.y?
> >
>
> mhh from the original report 5.10 was fine, but maybe something got
> backported and it broke it? I'll try to do some testing on my machine
> and see what I can figure out, but it could also be a debian only
> issue at this point.

Right, this is a possiblity, thanks for looking into it!

Computer Enthusiastic, can you verify the problem as well in a
non-Debian patched upstream kernel directly from the 5.10.y series
(latest 5.10.144) and verify the fix there?

Regards,
Salvatore

2022-09-30 21:25:44

by Computer Enthusiastic

[permalink] [raw]
Subject: Re: [PATCH] nouveau: explicitly wait on the fence in nouveau_bo_move_m2mf

Hello,

Il giorno mar 20 set 2022 alle ore 13:59 Salvatore Bonaccorso
<[email protected]> ha scritto:
[..]
> Computer Enthusiastic, can you verify the problem as well in a
> non-Debian patched upstream kernel directly from the 5.10.y series
> (latest 5.10.144) and verify the fix there?
>
> Regards,
> Salvatore

I've tested the vanilla kernel 5.10.145 (it was the latest one week
ago) without Debian kernel patches, but using the kernel config file
from the latest kernel for Debian Stable:
- without the Karol's patch: it always fails both suspend to ram and
hibernate to disk with the usual behavior (a very long time to suspend
or hibernate, then it fails on resume with a garbled screen)
- with the Karol's patch: it succeeds both suspend and hibernate and
it correctly resumes afterwards.

The kernel was tested using the following graphic adapter:
Graphics: Device-1: NVIDIA G96CM [GeForce 9600M GT] driver: nouveau v: kernel
Device-2: Suyin Acer HD Crystal Eye webcam type: USB driver:
uvcvideo
Display: x11 server: X.Org 1.20.11 driver: loaded:
modesetting unloaded: fbdev,vesa
resolution: 1280x800~60Hz
OpenGL: renderer: NV96 v: 3.3 Mesa 20.3.5

Therefore, 5.10.y series of the kernel need to be patched to work
correctly at least with the aforementioned graphic card.

The script I used to compile the kernel are attached for further
reference and verification.

Hope that helps.


Attachments:
vanilla-kernel-build-5.10.145 (572.00 B)
vanilla-kernel-build-5.10.145-patched (781.00 B)
Download all attachments

2022-11-19 05:38:35

by Computer Enthusiastic

[permalink] [raw]
Subject: Re: [Nouveau] [PATCH] nouveau: explicitly wait on the fence in nouveau_bo_move_m2mf

Hello,

Il giorno ven 19 ago 2022 alle ore 22:09 Karol Herbst
<[email protected]> ha scritto:
>
> It is a bit unlcear to us why that's helping, but it does and unbreaks
> suspend/resume on a lot of GPUs without any known drawbacks.
>
> Cc: [email protected] # v5.15+
> Closes: https://gitlab.freedesktop.org/drm/nouveau/-/issues/156
> Signed-off-by: Karol Herbst <[email protected]>
> ---
> drivers/gpu/drm/nouveau/nouveau_bo.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
> index 35bb0bb3fe61..126b3c6e12f9 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_bo.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
> @@ -822,6 +822,15 @@ nouveau_bo_move_m2mf(struct ttm_buffer_object *bo, int evict,
> if (ret == 0) {
> ret = nouveau_fence_new(chan, false, &fence);
> if (ret == 0) {
> + /* TODO: figure out a better solution here
> + *
> + * wait on the fence here explicitly as going through
> + * ttm_bo_move_accel_cleanup somehow doesn't seem to do it.
> + *
> + * Without this the operation can timeout and we'll fallback to a
> + * software copy, which might take several minutes to finish.
> + */
> + nouveau_fence_wait(fence, false, false);
> ret = ttm_bo_move_accel_cleanup(bo,
> &fence->base,
> evict, false,
> --
> 2.37.1
>

Could it be possible to make land the aforementioned patch to the
5.10.x kernel version ? It is currently for >= 5.15.x kernel version
only.

Thanks.