2023-03-10 07:41:23

by 李真能

Subject: [PATCH] drm/amdgpu: resove reboot exception for si oland

During reboot tests on an arm64 platform, the boot may fail.

The error messages are as follows:
[ 6.996395][ 7] [ T295] [drm:amdgpu_device_ip_late_init [amdgpu]] *ERROR* late_init of IP block <si_dpm> failed -22
[ 7.006919][ 7] [ T295] amdgpu 0000:04:00.0: amdgpu_device_ip_late_init failed
[ 7.014224][ 7] [ T295] amdgpu 0000:04:00.0: Fatal error during GPU init
---
drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c | 3 ---
1 file changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
index d6d9e3b1b2c0..dee51c757ac0 100644
--- a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
+++ b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
@@ -7632,9 +7632,6 @@ static int si_dpm_late_init(void *handle)
 	if (!adev->pm.dpm_enabled)
 		return 0;
 
-	ret = si_set_temperature_range(adev);
-	if (ret)
-		return ret;
 #if 0 //TODO ?
 	si_dpm_powergate_uvd(adev, true);
 #endif
--
2.25.1
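
The log above reports -22, which is -EINVAL on Linux, coming out of the si_dpm late_init hook and aborting GPU init. A minimal sketch of that propagation pattern follows. It is not the amdgpu code: `struct ip_block`, `run_late_init`, `ok_late_init` and `failing_late_init` are hypothetical names used only for illustration.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for one IP block's late_init hook. */
typedef int (*late_init_fn)(void *handle);

struct ip_block {
	const char *name;
	late_init_fn late_init;
};

/* Models a hook that rejects its input, as si_dpm did in the log. */
static int failing_late_init(void *handle)
{
	(void)handle;
	return -EINVAL; /* -22 on Linux */
}

/* Models a hook that succeeds. */
static int ok_late_init(void *handle)
{
	(void)handle;
	return 0;
}

/*
 * Run each block's late_init in order and stop at the first error.
 * On failure, *failed receives the index of the offending block and
 * the error code is returned unchanged to the caller.
 */
static int run_late_init(const struct ip_block *blocks, size_t n,
			 size_t *failed)
{
	for (size_t i = 0; i < n; i++) {
		int ret = blocks[i].late_init(NULL);

		if (ret) {
			*failed = i;
			return ret;
		}
	}
	return 0;
}
```

In this model a single failing hook is enough to fail the whole sequence, which matches the "Fatal error during GPU init" line in the log.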




2023-03-10 08:19:01

by Chen, Guchun

Subject: RE: [PATCH] drm/amdgpu: resove reboot exception for si oland


> -----Original Message-----
> From: amd-gfx <[email protected]> On Behalf Of
> Zhenneng Li
> Sent: Friday, March 10, 2023 3:40 PM
> To: Deucher, Alexander <[email protected]>
> Cc: David Airlie <[email protected]>; Pan, Xinhui <[email protected]>;
> [email protected]; [email protected]; Zhenneng Li
> <[email protected]>; [email protected]; Daniel Vetter
> <[email protected]>; Koenig, Christian <[email protected]>
> Subject: [PATCH] drm/amdgpu: resove reboot exception for si oland
>
> During reboot test on arm64 platform, it may failure on boot.
>
> The error message are as follows:
> [ 6.996395][ 7] [ T295] [drm:amdgpu_device_ip_late_init [amdgpu]]
> *ERROR*
> late_init of IP block <si_dpm> failed -22
> [ 7.006919][ 7] [ T295] amdgpu 0000:04:00.0: amdgpu_device_ip_late_init
> failed
> [ 7.014224][ 7] [ T295] amdgpu 0000:04:00.0: Fatal error during GPU init
> ---
> drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c | 3 ---
> 1 file changed, 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> index d6d9e3b1b2c0..dee51c757ac0 100644
> --- a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> +++ b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> @@ -7632,9 +7632,6 @@ static int si_dpm_late_init(void *handle)
> if (!adev->pm.dpm_enabled)
> return 0;
>
> - ret = si_set_temperature_range(adev);
> - if (ret)
> - return ret;

si_set_temperature_range should be platform agnostic. Can you please elaborate more?

Regards,
Guchun

> #if 0 //TODO ?
> si_dpm_powergate_uvd(adev, true);
> #endif
> --
> 2.25.1


2023-03-10 15:33:58

by Alex Deucher

Subject: Re: [PATCH] drm/amdgpu: resove reboot exception for si oland

On Fri, Mar 10, 2023 at 3:18 AM Chen, Guchun <[email protected]> wrote:
>
>
> > -----Original Message-----
> > From: amd-gfx <[email protected]> On Behalf Of
> > Zhenneng Li
> > Sent: Friday, March 10, 2023 3:40 PM
> > To: Deucher, Alexander <[email protected]>
> > Cc: David Airlie <[email protected]>; Pan, Xinhui <[email protected]>;
> > [email protected]; [email protected]; Zhenneng Li
> > <[email protected]>; [email protected]; Daniel Vetter
> > <[email protected]>; Koenig, Christian <[email protected]>
> > Subject: [PATCH] drm/amdgpu: resove reboot exception for si oland
> >
> > During reboot test on arm64 platform, it may failure on boot.
> >
> > The error message are as follows:
> > [ 6.996395][ 7] [ T295] [drm:amdgpu_device_ip_late_init [amdgpu]]
> > *ERROR*
> > late_init of IP block <si_dpm> failed -22
> > [ 7.006919][ 7] [ T295] amdgpu 0000:04:00.0: amdgpu_device_ip_late_init
> > failed
> > [ 7.014224][ 7] [ T295] amdgpu 0000:04:00.0: Fatal error during GPU init
> > ---
> > drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c | 3 ---
> > 1 file changed, 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> > b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> > index d6d9e3b1b2c0..dee51c757ac0 100644
> > --- a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> > +++ b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> > @@ -7632,9 +7632,6 @@ static int si_dpm_late_init(void *handle)
> > if (!adev->pm.dpm_enabled)
> > return 0;
> >
> > - ret = si_set_temperature_range(adev);
> > - if (ret)
> > - return ret;
>
> si_set_temperature_range should be platform agnostic. Can you please elaborate more?
>

Yes. Not setting this means we won't get thermal interrupts. We
shouldn't skip this.

Alex


> Regards,
> Guchun
>
> > #if 0 //TODO ?
> > si_dpm_powergate_uvd(adev, true);
> > #endif
> > --
> > 2.25.1
>
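
The objection above can be illustrated with a small model. This is a sketch under stated assumptions, not the driver code: `struct mock_dev` and the `mock_*` functions are hypothetical, and the range check that returns -EINVAL is an assumed failure mode, since the thread does not establish which check actually fails.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified device state; not the real struct amdgpu_device. */
struct mock_dev {
	bool dpm_enabled;
	int min_temp;           /* millidegrees Celsius */
	int max_temp;           /* millidegrees Celsius */
	bool thermal_irq_armed; /* set once thresholds are programmed */
};

/*
 * Stand-in for si_set_temperature_range(): validate the range, then
 * arm the thermal interrupt thresholds. The validation is an assumption.
 */
static int mock_set_temperature_range(struct mock_dev *dev)
{
	if (dev->max_temp < dev->min_temp)
		return -EINVAL;
	dev->thermal_irq_armed = true;
	return 0;
}

/* late_init before the patch: a failure in range setup is fatal. */
static int mock_late_init(struct mock_dev *dev)
{
	if (!dev->dpm_enabled)
		return 0;
	return mock_set_temperature_range(dev);
}

/* late_init with the three lines removed, as the patch proposes. */
static int mock_late_init_patched(struct mock_dev *dev)
{
	(void)dev;
	return 0;
}
```

In this model the patched variant always returns 0, so init no longer fails, but `thermal_irq_armed` stays false. That is the trade-off raised in the reply: the error disappears along with the thermal interrupt setup.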

2023-03-13 01:05:15

by 李真能

Subject: Re: [PATCH] drm/amdgpu: resove reboot exception for si oland

This bug was first reported here:

https://lore.kernel.org/lkml/[email protected]/

I modified the patch according to the mailing list discussion, and I ran
the reboot test tens of thousands of times on about 10 arm64 machines;
no failures were reported.

On 2023/3/10 16:18, Chen, Guchun wrote:
>> -----Original Message-----
>> From: amd-gfx <[email protected]> On Behalf Of
>> Zhenneng Li
>> Sent: Friday, March 10, 2023 3:40 PM
>> To: Deucher, Alexander <[email protected]>
>> Cc: David Airlie <[email protected]>; Pan, Xinhui <[email protected]>;
>> [email protected]; [email protected]; Zhenneng Li
>> <[email protected]>; [email protected]; Daniel Vetter
>> <[email protected]>; Koenig, Christian <[email protected]>
>> Subject: [PATCH] drm/amdgpu: resove reboot exception for si oland
>>
>> During reboot test on arm64 platform, it may failure on boot.
>>
>> The error message are as follows:
>> [ 6.996395][ 7] [ T295] [drm:amdgpu_device_ip_late_init [amdgpu]]
>> *ERROR*
>> late_init of IP block <si_dpm> failed -22
>> [ 7.006919][ 7] [ T295] amdgpu 0000:04:00.0: amdgpu_device_ip_late_init
>> failed
>> [ 7.014224][ 7] [ T295] amdgpu 0000:04:00.0: Fatal error during GPU init
>> ---
>> drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c | 3 ---
>> 1 file changed, 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
>> b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
>> index d6d9e3b1b2c0..dee51c757ac0 100644
>> --- a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
>> +++ b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
>> @@ -7632,9 +7632,6 @@ static int si_dpm_late_init(void *handle)
>> if (!adev->pm.dpm_enabled)
>> return 0;
>>
>> - ret = si_set_temperature_range(adev);
>> - if (ret)
>> - return ret;
> si_set_temperature_range should be platform agnostic. Can you please elaborate more?
>
> Regards,
> Guchun
>
>> #if 0 //TODO ?
>> si_dpm_powergate_uvd(adev, true);
>> #endif
>> --
>> 2.25.1