2021-06-24 11:21:12

by Jiri Kosina

[permalink] [raw]
Subject: [PATCH] drm/amdgpu: Fix resource leak on probe error path

From: Jiri Kosina <[email protected]>

This reverts commit 4192f7b5768912ceda82be2f83c87ea7181f9980.

It is not true (as stated in the reverted commit changelog) that we never
unmap the BAR on failure; it actually does happen properly on
amdgpu_driver_load_kms() -> amdgpu_driver_unload_kms() ->
amdgpu_device_fini() error path.

What's worse, this commit actually completely breaks resource freeing on
probe failure (like e.g. failure to load microcode), as
amdgpu_driver_unload_kms() notices adev->rmmio being NULL and bails too
early, leaving all the resources that'd normally be freed in
amdgpu_acpi_fini() and amdgpu_device_fini() still hanging around, leading
to all sorts of oopses when someone tries to, for example, access the
sysfs and procfs resources which are still around while the driver is
gone.

Fixes: 4192f7b57689 ("drm/amdgpu: unmap register bar on device init failure")
Reported-by: Vojtech Pavlik <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 57ec108b5972..0f1c0e17a587 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3414,13 +3414,13 @@ int amdgpu_device_init(struct amdgpu_device *adev,
r = amdgpu_device_get_job_timeout_settings(adev);
if (r) {
dev_err(adev->dev, "invalid lockup_timeout parameter syntax\n");
- goto failed_unmap;
+ return r;
}

/* early init functions */
r = amdgpu_device_ip_early_init(adev);
if (r)
- goto failed_unmap;
+ return r;

/* doorbell bar mapping and doorbell index init*/
amdgpu_device_doorbell_init(adev);
@@ -3646,10 +3646,6 @@ int amdgpu_device_init(struct amdgpu_device *adev,
failed:
amdgpu_vf_error_trans_all(adev);

-failed_unmap:
- iounmap(adev->rmmio);
- adev->rmmio = NULL;
-
return r;
}

--
2.12.3



2021-07-01 09:20:36

by Jiri Kosina

[permalink] [raw]
Subject: Re: [PATCH] drm/amdgpu: Fix resource leak on probe error path

On Thu, 24 Jun 2021, Jiri Kosina wrote:

> From: Jiri Kosina <[email protected]>
>
> This reverts commit 4192f7b5768912ceda82be2f83c87ea7181f9980.
>
> It is not true (as stated in the reverted commit changelog) that we never
> unmap the BAR on failure; it actually does happen properly on
> amdgpu_driver_load_kms() -> amdgpu_driver_unload_kms() ->
> amdgpu_device_fini() error path.
>
> What's worse, this commit actually completely breaks resource freeing on
> probe failure (like e.g. failure to load microcode), as
> amdgpu_driver_unload_kms() notices adev->rmmio being NULL and bails too
> early, leaving all the resources that'd normally be freed in
> amdgpu_acpi_fini() and amdgpu_device_fini() still hanging around, leading
> to all sorts of oopses when someone tries to, for example, access the
> sysfs and procfs resources which are still around while the driver is
> gone.
>
> Fixes: 4192f7b57689 ("drm/amdgpu: unmap register bar on device init failure")
> Reported-by: Vojtech Pavlik <[email protected]>
> Signed-off-by: Jiri Kosina <[email protected]>

Friendly ping on this one (as it's easily triggerable in the wild by just
missing proper firwmare).

Thanks.

> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++------
> 1 file changed, 2 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 57ec108b5972..0f1c0e17a587 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3414,13 +3414,13 @@ int amdgpu_device_init(struct amdgpu_device *adev,
> r = amdgpu_device_get_job_timeout_settings(adev);
> if (r) {
> dev_err(adev->dev, "invalid lockup_timeout parameter syntax\n");
> - goto failed_unmap;
> + return r;
> }
>
> /* early init functions */
> r = amdgpu_device_ip_early_init(adev);
> if (r)
> - goto failed_unmap;
> + return r;
>
> /* doorbell bar mapping and doorbell index init*/
> amdgpu_device_doorbell_init(adev);
> @@ -3646,10 +3646,6 @@ int amdgpu_device_init(struct amdgpu_device *adev,
> failed:
> amdgpu_vf_error_trans_all(adev);
>
> -failed_unmap:
> - iounmap(adev->rmmio);
> - adev->rmmio = NULL;
> -
> return r;
> }
>
> --
> 2.12.3
>
>

--
Jiri Kosina
SUSE Labs

2021-07-01 14:10:58

by Alex Deucher

[permalink] [raw]
Subject: Re: [PATCH] drm/amdgpu: Fix resource leak on probe error path

Applied. Thanks!

Alex

On Thu, Jul 1, 2021 at 4:32 AM Jiri Kosina <[email protected]> wrote:
>
> On Thu, 24 Jun 2021, Jiri Kosina wrote:
>
> > From: Jiri Kosina <[email protected]>
> >
> > This reverts commit 4192f7b5768912ceda82be2f83c87ea7181f9980.
> >
> > It is not true (as stated in the reverted commit changelog) that we never
> > unmap the BAR on failure; it actually does happen properly on
> > amdgpu_driver_load_kms() -> amdgpu_driver_unload_kms() ->
> > amdgpu_device_fini() error path.
> >
> > What's worse, this commit actually completely breaks resource freeing on
> > probe failure (like e.g. failure to load microcode), as
> > amdgpu_driver_unload_kms() notices adev->rmmio being NULL and bails too
> > early, leaving all the resources that'd normally be freed in
> > amdgpu_acpi_fini() and amdgpu_device_fini() still hanging around, leading
> > to all sorts of oopses when someone tries to, for example, access the
> > sysfs and procfs resources which are still around while the driver is
> > gone.
> >
> > Fixes: 4192f7b57689 ("drm/amdgpu: unmap register bar on device init failure")
> > Reported-by: Vojtech Pavlik <[email protected]>
> > Signed-off-by: Jiri Kosina <[email protected]>
>
> Friendly ping on this one (as it's easily triggerable in the wild by just
> missing proper firwmare).
>
> Thanks.
>
> > ---
> > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++------
> > 1 file changed, 2 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index 57ec108b5972..0f1c0e17a587 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -3414,13 +3414,13 @@ int amdgpu_device_init(struct amdgpu_device *adev,
> > r = amdgpu_device_get_job_timeout_settings(adev);
> > if (r) {
> > dev_err(adev->dev, "invalid lockup_timeout parameter syntax\n");
> > - goto failed_unmap;
> > + return r;
> > }
> >
> > /* early init functions */
> > r = amdgpu_device_ip_early_init(adev);
> > if (r)
> > - goto failed_unmap;
> > + return r;
> >
> > /* doorbell bar mapping and doorbell index init*/
> > amdgpu_device_doorbell_init(adev);
> > @@ -3646,10 +3646,6 @@ int amdgpu_device_init(struct amdgpu_device *adev,
> > failed:
> > amdgpu_vf_error_trans_all(adev);
> >
> > -failed_unmap:
> > - iounmap(adev->rmmio);
> > - adev->rmmio = NULL;
> > -
> > return r;
> > }
> >
> > --
> > 2.12.3
> >
> >
>
> --
> Jiri Kosina
> SUSE Labs
>