[adding a bunch of lists and people, as well as Timur Tabi, who authored
the culprit]
Sid Pranjale, thx for the report. FWIW, I'm just replying to add this to
regression tracking to ensure it does not fall through the cracks.
Nevertheless, let me mention two things while at it:
On 29.02.24 18:58, Sid Pranjale wrote:
> Nouveau deallocates a few buffers post GPU init which are required for GPU suspend/resume to function correctly.
> This is likely not as big an issue on systems where the NVGPU is the only GPU, but on multi-GPU set ups it leads to a regression where the kernel module errors and results in a system-wide rendering freeze.
These lines are too long; the body of a commit message should be
wrapped at 75 columns. See Documentation/process/submitting-patches.rst
for details.
> This commit addresses that regression by moving the two buffers required for suspend and resume to be deallocated at driver unload instead of post init.
>
> Fixes: 042b5f8 ("drm/nouveau: fix several DMA buffer leaks")
And that should use at least the first 12 characters of the commit ID:
Fixes: 042b5f83841fbf ("drm/nouveau: fix several DMA buffer leaks")
> Signed-off-by: Sid Pranjale <[email protected]>
> ---
> drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
> index a64c81385..a73a5b589 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
> @@ -1054,8 +1054,6 @@ r535_gsp_postinit(struct nvkm_gsp *gsp)
>  	/* Release the DMA buffers that were needed only for boot and init */
>  	nvkm_gsp_mem_dtor(gsp, &gsp->boot.fw);
>  	nvkm_gsp_mem_dtor(gsp, &gsp->libos);
> -	nvkm_gsp_mem_dtor(gsp, &gsp->rmargs);
> -	nvkm_gsp_mem_dtor(gsp, &gsp->wpr_meta);
>
>  	return ret;
>  }
> @@ -2163,6 +2161,8 @@ r535_gsp_dtor(struct nvkm_gsp *gsp)
>
>  	r535_gsp_dtor_fws(gsp);
>
> +	nvkm_gsp_mem_dtor(gsp, &gsp->rmargs);
> +	nvkm_gsp_mem_dtor(gsp, &gsp->wpr_meta);
>  	nvkm_gsp_mem_dtor(gsp, &gsp->shm.mem);
>  	nvkm_gsp_mem_dtor(gsp, &gsp->loginit);
>  	nvkm_gsp_mem_dtor(gsp, &gsp->logintr);
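For anyone skimming the thread, the buffer lifetime problem described in
the patch can be modeled with a small userspace toy. This is only a
sketch and not nouveau code: every toy_* name below is hypothetical,
calloc()/free() stand in for the DMA allocations, and the "resume" step
merely checks that the two buffers still exist. It assumes, as the patch
description states, that rmargs and wpr_meta are needed again on
suspend/resume.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a DMA buffer (not struct nvkm_gsp_mem). */
struct toy_mem {
	void *data;
	size_t size;
};

/* Hypothetical stand-in for the driver state touched by the patch. */
struct toy_gsp {
	struct toy_mem boot_only; /* needed for boot and init only */
	struct toy_mem rmargs;    /* assumed to be needed on suspend/resume */
	struct toy_mem wpr_meta;  /* assumed to be needed on suspend/resume */
};

static int toy_mem_ctor(struct toy_mem *mem, size_t size)
{
	mem->data = calloc(1, size);
	mem->size = mem->data ? size : 0;
	return mem->data ? 0 : -1;
}

/* Safe to call twice in this toy: free(NULL) is a no-op. */
static void toy_mem_dtor(struct toy_mem *mem)
{
	free(mem->data);
	mem->data = NULL;
	mem->size = 0;
}

/* free_early == true models the behaviour before the patch. */
static void toy_postinit(struct toy_gsp *gsp, bool free_early)
{
	toy_mem_dtor(&gsp->boot_only);
	if (free_early) {
		toy_mem_dtor(&gsp->rmargs);
		toy_mem_dtor(&gsp->wpr_meta);
	}
}

/* Resume only works if both buffers survived postinit. */
static int toy_resume(const struct toy_gsp *gsp)
{
	return (gsp->rmargs.data && gsp->wpr_meta.data) ? 0 : -1;
}

/* Models driver unload: release whatever is still allocated. */
static void toy_dtor(struct toy_gsp *gsp)
{
	toy_mem_dtor(&gsp->boot_only);
	toy_mem_dtor(&gsp->rmargs);
	toy_mem_dtor(&gsp->wpr_meta);
}

/* Returns 0 if resume works, -1 if it fails, -2 on allocation failure. */
int toy_simulate(bool free_early)
{
	struct toy_gsp gsp = {0};
	int ret;

	if (toy_mem_ctor(&gsp.boot_only, 64) ||
	    toy_mem_ctor(&gsp.rmargs, 64) ||
	    toy_mem_ctor(&gsp.wpr_meta, 64)) {
		toy_dtor(&gsp);
		return -2;
	}

	toy_postinit(&gsp, free_early);
	ret = toy_resume(&gsp);
	toy_dtor(&gsp);
	return ret;
}
```

In this model toy_simulate(true) fails the resume step because both
buffers are gone after postinit, while toy_simulate(false) keeps them
alive until toy_dtor() and succeeds, which is the ordering the patch
moves to.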
To be sure the issue doesn't fall through the cracks unnoticed, I'm
adding it to regzbot, the Linux kernel regression tracking bot:
#regzbot ^introduced 042b5f83841fbf
#regzbot title drm/nouveau: rendering freezes with multi-GPU setup
#regzbot ignore-activity
This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.
Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See the page linked in the
footer for details.
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.