I've recently been seeing some unexplained GSP errors on my RTX 6000 from
failed aux transactions:
[ 132.915867] nouveau 0000:1f:00.0: gsp: cli:0xc1d00002 obj:0x00730000
ctrl cmd:0x00731341 failed: 0x0000ffff
While the cause of these is not yet clear, these messages made me notice
that the aux transactions causing these transactions were succeeding - not
failing. As it turns out, this is because we're currently not returning the
correct variable when r535_dp_aux_xfer() hits an error - causing us to
never propagate GSP errors for failed aux transactions to userspace.
So, let's fix that.
Fixes: 4ae3a20102b2 ("nouveau/gsp: don't free ctrl messages on errors")
Signed-off-by: Lyude Paul <[email protected]>
---
drivers/gpu/drm/nouveau/nvkm/engine/disp/r535.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/disp/r535.c b/drivers/gpu/drm/nouveau/nvkm/engine/disp/r535.c
index 6a0a4d3b8902d..027867c2a8c5b 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/disp/r535.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/disp/r535.c
@@ -1080,7 +1080,7 @@ r535_dp_aux_xfer(struct nvkm_outp *outp, u8 type, u32 addr, u8 *data, u8 *psize)
ret = nvkm_gsp_rm_ctrl_push(&disp->rm.objcom, &ctrl, sizeof(*ctrl));
if (ret) {
nvkm_gsp_rm_ctrl_done(&disp->rm.objcom, ctrl);
- return PTR_ERR(ctrl);
+ return ret;
}
memcpy(data, ctrl->data, size);
--
2.43.0
Reviewed-by: Dave Airlie <[email protected]>
On Sat, Mar 16, 2024 at 7:21 AM Lyude Paul <[email protected]> wrote:
>
> I've recently been seeing some unexplained GSP errors on my RTX 6000 from
> failed aux transactions:
>
> [ 132.915867] nouveau 0000:1f:00.0: gsp: cli:0xc1d00002 obj:0x00730000
> ctrl cmd:0x00731341 failed: 0x0000ffff
>
> While the cause of these is not yet clear, these messages made me notice
> that the aux transactions causing these transactions were succeeding - not
> failing. As it turns out, this is because we're currently not returning the
> correct variable when r535_dp_aux_xfer() hits an error - causing us to
> never propagate GSP errors for failed aux transactions to userspace.
>
> So, let's fix that.
>
> Fixes: 4ae3a20102b2 ("nouveau/gsp: don't free ctrl messages on errors")
> Signed-off-by: Lyude Paul <[email protected]>
> ---
> drivers/gpu/drm/nouveau/nvkm/engine/disp/r535.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/disp/r535.c b/drivers/gpu/drm/nouveau/nvkm/engine/disp/r535.c
> index 6a0a4d3b8902d..027867c2a8c5b 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/engine/disp/r535.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/engine/disp/r535.c
> @@ -1080,7 +1080,7 @@ r535_dp_aux_xfer(struct nvkm_outp *outp, u8 type, u32 addr, u8 *data, u8 *psize)
> ret = nvkm_gsp_rm_ctrl_push(&disp->rm.objcom, &ctrl, sizeof(*ctrl));
> if (ret) {
> nvkm_gsp_rm_ctrl_done(&disp->rm.objcom, ctrl);
> - return PTR_ERR(ctrl);
> + return ret;
> }
>
> memcpy(data, ctrl->data, size);
> --
> 2.43.0
>