2021-09-14 02:03:32

by Mario Limonciello

Subject: [PATCH] platform/x86: amd-pmc: Increase the response register timeout

There have been reports of approximately a 0.9%-1.7% failure rate from
SMU communication timeouts during s0i3 entry on some OEM designs.
Currently amd-pmc polls the response register every 100us for up to 20ms.

However, the GPU driver also communicates with the SMU using a
mailbox register, which it polls every 1us for up to 2000ms.
In the GPU driver this was increased by commit 055162645a40 ("drm/amd/pm:
increase time out value when sending msg to SMU").

Increase the maximum timeout used by amd-pmc to 2000ms to match this
behavior. This has been shown to improve the stability for machines
that randomly have failures.

Cc: [email protected]
Reported-by: Julian Sikorski <[email protected]>
BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1629
Signed-off-by: Mario Limonciello <[email protected]>
---
drivers/platform/x86/amd-pmc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/platform/x86/amd-pmc.c b/drivers/platform/x86/amd-pmc.c
index 3481479a2942..d6a7c896ac86 100644
--- a/drivers/platform/x86/amd-pmc.c
+++ b/drivers/platform/x86/amd-pmc.c
@@ -71,7 +71,7 @@
#define AMD_CPU_ID_YC 0x14B5

#define PMC_MSG_DELAY_MIN_US 100
-#define RESPONSE_REGISTER_LOOP_MAX 200
+#define RESPONSE_REGISTER_LOOP_MAX 20000

#define SOC_SUBSYSTEM_IP_MAX 12
#define DELAY_MIN_US 2000
--
2.25.1
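
For readers following the arithmetic: the worst-case wait is the loop count
multiplied by the per-poll delay, so 200 x 100us = 20ms before the patch and
20000 x 100us = 2000ms after it. The following is a minimal userspace sketch
of that relationship, not the driver's actual code. Only
PMC_MSG_DELAY_MIN_US and the two loop counts come from the patch;
poll_budget_ms(), poll_response(), struct fake_smu and fake_smu_ready() are
hypothetical names introduced purely for illustration, and the sketch counts
polls instead of sleeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values taken from the patch. */
#define PMC_MSG_DELAY_MIN_US		100
#define RESPONSE_REGISTER_LOOP_OLD	200	/* before the patch */
#define RESPONSE_REGISTER_LOOP_NEW	20000	/* after the patch */

/* Worst-case wait in milliseconds for a given poll count and delay. */
static inline uint32_t poll_budget_ms(uint32_t loops, uint32_t delay_us)
{
	return (loops * delay_us) / 1000;
}

/* Hypothetical stand-in for "is the SMU response register non-zero?". */
typedef bool (*response_ready_fn)(void *ctx);

/*
 * Poll until the response is ready or max_loops attempts are used up.
 * Returns the number of polls performed on success, or -1 on timeout.
 * A real driver would delay between attempts; this sketch only counts.
 */
static inline long poll_response(response_ready_fn ready, void *ctx,
				 uint32_t max_loops)
{
	assert(ready != NULL);

	for (uint32_t i = 0; i < max_loops; i++) {
		if (ready(ctx))
			return (long)i + 1;
		/* real code would wait PMC_MSG_DELAY_MIN_US here */
	}
	return -1;
}

/* Hypothetical fake SMU that becomes ready after a fixed number of polls. */
struct fake_smu {
	uint32_t polls_until_ready;
	uint32_t polls_seen;
};

static inline bool fake_smu_ready(void *ctx)
{
	struct fake_smu *smu = ctx;

	return ++smu->polls_seen >= smu->polls_until_ready;
}
```

With these helpers, a fake SMU that needs 500 polls (50ms at 100us per poll)
times out under the old limit of 200 but succeeds under the new limit of
20000, which is the kind of slow response the larger timeout is meant to
tolerate.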


2021-09-14 08:40:26

by Shyam Sundar S K

Subject: Re: [PATCH] platform/x86: amd-pmc: Increase the response register timeout



On 9/14/2021 7:31 AM, Mario Limonciello wrote:
> There have been reports of approximately a 0.9%-1.7% failure rate from
> SMU communication timeouts during s0i3 entry on some OEM designs.
> Currently amd-pmc polls the response register every 100us for up to 20ms.
>
> However, the GPU driver also communicates with the SMU using a
> mailbox register, which it polls every 1us for up to 2000ms.
> In the GPU driver this was increased by commit 055162645a40 ("drm/amd/pm:
> increase time out value when sending msg to SMU").
>
> Increase the maximum timeout used by amd-pmc to 2000ms to match this
> behavior. This has been shown to improve the stability for machines
> that randomly have failures.
>
> Cc: [email protected]
> Reported-by: Julian Sikorski <[email protected]>
> BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1629
> Signed-off-by: Mario Limonciello <[email protected]>
> ---
> drivers/platform/x86/amd-pmc.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/platform/x86/amd-pmc.c b/drivers/platform/x86/amd-pmc.c
> index 3481479a2942..d6a7c896ac86 100644
> --- a/drivers/platform/x86/amd-pmc.c
> +++ b/drivers/platform/x86/amd-pmc.c
> @@ -71,7 +71,7 @@
> #define AMD_CPU_ID_YC 0x14B5
>
> #define PMC_MSG_DELAY_MIN_US 100
> -#define RESPONSE_REGISTER_LOOP_MAX 200
> +#define RESPONSE_REGISTER_LOOP_MAX 20000
>
> #define SOC_SUBSYSTEM_IP_MAX 12
> #define DELAY_MIN_US 2000
>

Looks good to me.

Acked-by: Shyam Sundar S K <[email protected]>

2021-09-14 10:38:32

by Hans de Goede

Subject: Re: [PATCH] platform/x86: amd-pmc: Increase the response register timeout

Hi,

On 9/14/21 10:38 AM, Shyam Sundar S K wrote:
>
>
> On 9/14/2021 7:31 AM, Mario Limonciello wrote:
>> There have been reports of approximately a 0.9%-1.7% failure rate from
>> SMU communication timeouts during s0i3 entry on some OEM designs.
>> Currently amd-pmc polls the response register every 100us for up to 20ms.
>>
>> However, the GPU driver also communicates with the SMU using a
>> mailbox register, which it polls every 1us for up to 2000ms.
>> In the GPU driver this was increased by commit 055162645a40 ("drm/amd/pm:
>> increase time out value when sending msg to SMU").
>>
>> Increase the maximum timeout used by amd-pmc to 2000ms to match this
>> behavior. This has been shown to improve the stability for machines
>> that randomly have failures.
>>
>> Cc: [email protected]
>> Reported-by: Julian Sikorski <[email protected]>
>> BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1629
>> Signed-off-by: Mario Limonciello <[email protected]>
>> ---
>> drivers/platform/x86/amd-pmc.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/platform/x86/amd-pmc.c b/drivers/platform/x86/amd-pmc.c
>> index 3481479a2942..d6a7c896ac86 100644
>> --- a/drivers/platform/x86/amd-pmc.c
>> +++ b/drivers/platform/x86/amd-pmc.c
>> @@ -71,7 +71,7 @@
>> #define AMD_CPU_ID_YC 0x14B5
>>
>> #define PMC_MSG_DELAY_MIN_US 100
>> -#define RESPONSE_REGISTER_LOOP_MAX 200
>> +#define RESPONSE_REGISTER_LOOP_MAX 20000
>>
>> #define SOC_SUBSYSTEM_IP_MAX 12
>> #define DELAY_MIN_US 2000
>>
>
> Looks good to me.
>
> Acked-by: Shyam Sundar S K <[email protected]>

Thank you for your patch. I've applied it to my review-hans
branch:
https://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86.git/log/?h=review-hans

I've also added it to the fixes branch so that it will be included
in my next 5.15 fixes pull-req to Linus.

Once I've run some tests on this branch the patches there will be
added to the platform-drivers-x86/for-next branch and eventually
will be included in the pdx86 pull-request to Linus for the next
merge-window.

Regards,

Hans