2023-10-05 19:09:03

by Mario Limonciello

Subject: [PATCH v2 0/2] Fix Navi3x boot and hotplug problems

On some OEM systems, multiple Navi3x dGPUs are triggering RAS errors
and BACO errors.

These errors come from elements of the OEM system that weren't part of
the original test environment. This series addresses those problems.

NOTE: Although this series touches two subsystems, I would prefer to
take this all through DRM because there is a workaround in
amd-staging-drm-next that I would like to be reverted at the same
time as picking up the fix.

v1->v2:
* Drop _PR3 patch from series; it was cherry-picked and is on its way
to 6.6-rcX already.
* Rather than changing global policy, fix the problematic power supply
driver.
v1: https://lore.kernel.org/linux-pm/[email protected]/

Mario Limonciello (2):
usb: typec: ucsi: Use GET_CAPABILITY attributes data to set power
supply scope
Revert "drm/amd/pm: workaround for the wrong ac power detection on smu
13.0.0"

drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c | 3 ++-
drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c | 1 +
drivers/usb/typec/ucsi/psy.c | 10 ++++++++++
3 files changed, 13 insertions(+), 1 deletion(-)

--
2.34.1


2023-10-05 19:09:07

by Mario Limonciello

Subject: [PATCH v2 1/2] usb: typec: ucsi: Use GET_CAPABILITY attributes data to set power supply scope

On some OEM systems, adding a W7900 dGPU triggers RAS errors and hangs
at a black screen on startup. This issue occurs only if `ucsi_acpi` has
loaded before `amdgpu` has loaded. The reason for this failure is that
`amdgpu` uses power_supply_is_system_supplied() to determine if running
on AC or DC power at startup. If this value is reported incorrectly the
dGPU will also be programmed incorrectly and trigger errors.

power_supply_is_system_supplied() reports the wrong value because UCSI
power supplies provided as part of the system don't properly report the
scope as "DEVICE" scope (not powering the system).
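
To make the failure mode concrete, here is a minimal sketch of how a single offline supply with an unknown scope can flip the AC/DC decision. It is a simplified model of the decision that power_supply_is_system_supplied() makes, not the kernel code itself; every `ex_`/`EX_` name is a hypothetical stand-in, and the assumption is that the machine has no battery and nothing plugged into the Type-C port.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration, not the kernel's definitions. */
enum ex_scope { EX_SCOPE_UNKNOWN, EX_SCOPE_SYSTEM, EX_SCOPE_DEVICE };
enum ex_type { EX_TYPE_BATTERY, EX_TYPE_MAINS, EX_TYPE_USB };

struct ex_supply {
	enum ex_type type;
	enum ex_scope scope;
	bool online;
};

/*
 * Simplified model: returns true when the system is treated as running
 * on external (AC) power.
 */
bool ex_is_system_supplied(const struct ex_supply *s, size_t n)
{
	size_t counted = 0;

	for (size_t i = 0; i < n; i++) {
		/* Supplies that only power a device are ignored entirely. */
		if (s[i].scope == EX_SCOPE_DEVICE)
			continue;

		counted++;

		/* Any online non-battery supply means external power. */
		if (s[i].type != EX_TYPE_BATTERY && s[i].online)
			return true;
	}

	/* With no system-relevant supplies at all, assume external power. */
	return counted == 0;
}
```

In this model, an offline Type-C supply with `EX_SCOPE_UNKNOWN` is counted and makes the function return false (treated as DC), while the same supply marked `EX_SCOPE_DEVICE` is skipped and the function returns true (treated as AC).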

In order to fix this issue check the capabilities reported from the UCSI
power supply to ensure that it supports charging a battery and that it can
be powered by AC. Mark the scope accordingly.

Fixes: a7fbfd44c020 ("usb: typec: ucsi: Mark dGPUs as DEVICE scope")
Link: https://www.intel.com/content/www/us/en/products/docs/io/universal-serial-bus/usb-type-c-ucsi-spec.html p28
Signed-off-by: Mario Limonciello <[email protected]>
---
Cc: Kai-Heng Feng <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: Richard Gong <[email protected]>
---
drivers/usb/typec/ucsi/psy.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/drivers/usb/typec/ucsi/psy.c b/drivers/usb/typec/ucsi/psy.c
index 384b42267f1f..b35c6e07911e 100644
--- a/drivers/usb/typec/ucsi/psy.c
+++ b/drivers/usb/typec/ucsi/psy.c
@@ -37,6 +37,15 @@ static int ucsi_psy_get_scope(struct ucsi_connector *con,
 	struct device *dev = con->ucsi->dev;

 	device_property_read_u8(dev, "scope", &scope);
+	if (scope == POWER_SUPPLY_SCOPE_UNKNOWN) {
+		u32 mask = UCSI_CAP_ATTR_POWER_AC_SUPPLY |
+			   UCSI_CAP_ATTR_BATTERY_CHARGING;
+
+		if (con->ucsi->cap.attributes & mask)
+			scope = POWER_SUPPLY_SCOPE_SYSTEM;
+		else
+			scope = POWER_SUPPLY_SCOPE_DEVICE;
+	}
 	val->intval = scope;
 	return 0;
 }
--
2.34.1
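
For readers who want to try the fallback outside the kernel, the following is a standalone sketch of the decision the hunk above adds. The `EX_`/`ex_` names and the bit positions are illustrative assumptions, not the definitions from the kernel headers; only the shape of the logic follows the patch.

```c
#include <stdint.h>

/* Hypothetical stand-ins; the bit positions are chosen for illustration. */
#define EX_CAP_ATTR_BATTERY_CHARGING	(1u << 1)
#define EX_CAP_ATTR_POWER_AC_SUPPLY	(1u << 8)

enum ex_psy_scope {
	EX_PSY_SCOPE_UNKNOWN,
	EX_PSY_SCOPE_SYSTEM,
	EX_PSY_SCOPE_DEVICE,
};

/*
 * Keep a scope that firmware reported explicitly; otherwise derive one
 * from the capability attributes.
 */
enum ex_psy_scope ex_resolve_scope(enum ex_psy_scope fw_scope,
				   uint32_t cap_attributes)
{
	const uint32_t mask = EX_CAP_ATTR_POWER_AC_SUPPLY |
			      EX_CAP_ATTR_BATTERY_CHARGING;

	if (fw_scope != EX_PSY_SCOPE_UNKNOWN)
		return fw_scope;

	/* Either attribute bit being set is enough to select SYSTEM. */
	if (cap_attributes & mask)
		return EX_PSY_SCOPE_SYSTEM;

	return EX_PSY_SCOPE_DEVICE;
}
```

Note that, as written, the bitwise test passes when either attribute is set, so a supply that advertises only one of the two capabilities is still treated as SYSTEM scope in this sketch.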

2023-10-05 19:09:10

by Mario Limonciello

Subject: [PATCH v2 2/2] Revert "drm/amd/pm: workaround for the wrong ac power detection on smu 13.0.0"

This reverts commit 0e5e1a84f0b8c814d502a135824244127fed8f23.

Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Mario Limonciello <[email protected]>
---
drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c | 3 ++-
drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c | 1 +
2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
index 08cb9f8ce64e..9b62b45ebb7f 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
@@ -1026,7 +1026,8 @@ static int smu_v13_0_process_pending_interrupt(struct smu_context *smu)
 {
 	int ret = 0;

-	if (smu_cmn_feature_is_enabled(smu, SMU_FEATURE_ACDC_BIT))
+	if (smu->dc_controlled_by_gpio &&
+	    smu_cmn_feature_is_enabled(smu, SMU_FEATURE_ACDC_BIT))
 		ret = smu_v13_0_allow_ih_interrupt(smu);

 	return ret;
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c
index 07df5be063e2..0fb6be11a0cc 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c
@@ -2662,6 +2662,7 @@ static const struct pptable_funcs smu_v13_0_0_ppt_funcs = {
 	.enable_mgpu_fan_boost = smu_v13_0_0_enable_mgpu_fan_boost,
 	.get_power_limit = smu_v13_0_0_get_power_limit,
 	.set_power_limit = smu_v13_0_set_power_limit,
+	.set_power_source = smu_v13_0_set_power_source,
 	.get_power_profile_mode = smu_v13_0_0_get_power_profile_mode,
 	.set_power_profile_mode = smu_v13_0_0_set_power_profile_mode,
 	.run_btc = smu_v13_0_run_btc,
--
2.34.1

2023-10-05 19:12:44

by Greg Kroah-Hartman

Subject: Re: [PATCH v2 2/2] Revert "drm/amd/pm: workaround for the wrong ac power detection on smu 13.0.0"

On Thu, Oct 05, 2023 at 12:52:30PM -0500, Mario Limonciello wrote:
> This reverts commit 0e5e1a84f0b8c814d502a135824244127fed8f23.
>
> Reviewed-by: Alex Deucher <[email protected]>
> Signed-off-by: Mario Limonciello <[email protected]>

No explanation as to why this needs to be reverted? And does this need
to be backported anywhere?

thanks,

greg k-h

2023-10-05 19:13:24

by Greg Kroah-Hartman

Subject: Re: [PATCH v2 1/2] usb: typec: ucsi: Use GET_CAPABILITY attributes data to set power supply scope

On Thu, Oct 05, 2023 at 12:52:29PM -0500, Mario Limonciello wrote:
> On some OEM systems, adding a W7900 dGPU triggers RAS errors and hangs
> at a black screen on startup. This issue occurs only if `ucsi_acpi` has
> loaded before `amdgpu` has loaded. The reason for this failure is that
> `amdgpu` uses power_supply_is_system_supplied() to determine if running
> on AC or DC power at startup. If this value is reported incorrectly the
> dGPU will also be programmed incorrectly and trigger errors.
>
> power_supply_is_system_supplied() reports the wrong value because UCSI
> power supplies provided as part of the system don't properly report the
> scope as "DEVICE" scope (not powering the system).
>
> In order to fix this issue check the capabilities reported from the UCSI
> power supply to ensure that it supports charging a battery and that it can
> be powered by AC. Mark the scope accordingly.
>
> Fixes: a7fbfd44c020 ("usb: typec: ucsi: Mark dGPUs as DEVICE scope")
> Link: https://www.intel.com/content/www/us/en/products/docs/io/universal-serial-bus/usb-type-c-ucsi-spec.html p28
> Signed-off-by: Mario Limonciello <[email protected]>
> ---
> Cc: Kai-Heng Feng <[email protected]>
> Cc: Alex Deucher <[email protected]>
> Cc: Richard Gong <[email protected]>
> ---
> drivers/usb/typec/ucsi/psy.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/drivers/usb/typec/ucsi/psy.c b/drivers/usb/typec/ucsi/psy.c
> index 384b42267f1f..b35c6e07911e 100644
> --- a/drivers/usb/typec/ucsi/psy.c
> +++ b/drivers/usb/typec/ucsi/psy.c
> @@ -37,6 +37,15 @@ static int ucsi_psy_get_scope(struct ucsi_connector *con,
>  	struct device *dev = con->ucsi->dev;
>
>  	device_property_read_u8(dev, "scope", &scope);
> +	if (scope == POWER_SUPPLY_SCOPE_UNKNOWN) {
> +		u32 mask = UCSI_CAP_ATTR_POWER_AC_SUPPLY |
> +			   UCSI_CAP_ATTR_BATTERY_CHARGING;
> +
> +		if (con->ucsi->cap.attributes & mask)
> +			scope = POWER_SUPPLY_SCOPE_SYSTEM;
> +		else
> +			scope = POWER_SUPPLY_SCOPE_DEVICE;
> +	}
>  	val->intval = scope;
>  	return 0;
>  }
> --
> 2.34.1
>
>

Hi,

This is the friendly patch-bot of Greg Kroah-Hartman. You have sent him
a patch that has triggered this response. He used to manually respond
to these common problems, but in order to save his sanity (he kept
writing the same thing over and over, yet to different people), I was
created. Hopefully you will not take offence and will fix the problem
in your patch and resubmit it so that it can be accepted into the Linux
kernel tree.

You are receiving this message because of the following common error(s)
as indicated below:

- You have marked a patch with a "Fixes:" tag for a commit that is in an
older released kernel, yet you do not have a cc: stable line in the
signed-off-by area at all, which means that the patch will not be
applied to any older kernel releases. To properly fix this, please
follow the documented rules in the
Documentation/process/stable-kernel-rules.rst file for how to resolve
this.

If you wish to discuss this problem further, or you have questions about
how to resolve this issue, please feel free to respond to this email and
Greg will reply once he has dug out from the pending patches received
from other developers.

thanks,

greg k-h's patch email bot

2023-10-05 19:16:04

by Mario Limonciello

Subject: Re: [PATCH v2 2/2] Revert "drm/amd/pm: workaround for the wrong ac power detection on smu 13.0.0"

On 10/5/2023 14:12, Greg Kroah-Hartman wrote:
> On Thu, Oct 05, 2023 at 12:52:30PM -0500, Mario Limonciello wrote:
>> This reverts commit 0e5e1a84f0b8c814d502a135824244127fed8f23.
>>
>> Reviewed-by: Alex Deucher <[email protected]>
>> Signed-off-by: Mario Limonciello <[email protected]>
>
> No explanation as to why this needs to be reverted? And does this need
> to be backported anywhere?
>
> thanks,
>
> greg k-h

No need to be backported anywhere. The commit is only in
amd-staging-drm-next right now.

I think it's up to whether Alex includes the workaround commit in the
final 6.7 pull request. If he does, then yeah this could use a larger
write up to explain why it went in and out.

I was sort of thinking we could land both commits in amd-staging-drm-next
and then when Alex did the pull request the workaround commit just
wouldn't be part of the 6.7 PR since it's a no-op with the revert.

2023-10-05 19:17:39

by Alex Deucher

Subject: Re: [PATCH v2 2/2] Revert "drm/amd/pm: workaround for the wrong ac power detection on smu 13.0.0"

On Thu, Oct 5, 2023 at 3:13 PM Greg Kroah-Hartman
<[email protected]> wrote:
>
> On Thu, Oct 05, 2023 at 12:52:30PM -0500, Mario Limonciello wrote:
> > This reverts commit 0e5e1a84f0b8c814d502a135824244127fed8f23.
> >
> > Reviewed-by: Alex Deucher <[email protected]>
> > Signed-off-by: Mario Limonciello <[email protected]>
>
> No explanation as to why this needs to be reverted? And does this need
> to be backported anywhere?

This patch ultimately never went upstream, but there was some
confusion about whether it did or not. It can be ignored.

Alex

2023-10-05 20:35:24

by Sebastian Reichel

Subject: Re: [PATCH v2 1/2] usb: typec: ucsi: Use GET_CAPABILITY attributes data to set power supply scope

Hi,

On Thu, Oct 05, 2023 at 12:52:29PM -0500, Mario Limonciello wrote:
> On some OEM systems, adding a W7900 dGPU triggers RAS errors and hangs
> at a black screen on startup. This issue occurs only if `ucsi_acpi` has
> loaded before `amdgpu` has loaded. The reason for this failure is that
> `amdgpu` uses power_supply_is_system_supplied() to determine if running
> on AC or DC power at startup. If this value is reported incorrectly the
> dGPU will also be programmed incorrectly and trigger errors.
>
> power_supply_is_system_supplied() reports the wrong value because UCSI
> power supplies provided as part of the system don't properly report the
> scope as "DEVICE" scope (not powering the system).
>
> In order to fix this issue check the capabilities reported from the UCSI
> power supply to ensure that it supports charging a battery and that it can
> be powered by AC. Mark the scope accordingly.
>
> Fixes: a7fbfd44c020 ("usb: typec: ucsi: Mark dGPUs as DEVICE scope")
> Link: https://www.intel.com/content/www/us/en/products/docs/io/universal-serial-bus/usb-type-c-ucsi-spec.html p28
> Signed-off-by: Mario Limonciello <[email protected]>
> ---
> Cc: Kai-Heng Feng <[email protected]>
> Cc: Alex Deucher <[email protected]>
> Cc: Richard Gong <[email protected]>
> ---
> drivers/usb/typec/ucsi/psy.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/drivers/usb/typec/ucsi/psy.c b/drivers/usb/typec/ucsi/psy.c
> index 384b42267f1f..b35c6e07911e 100644
> --- a/drivers/usb/typec/ucsi/psy.c
> +++ b/drivers/usb/typec/ucsi/psy.c
> @@ -37,6 +37,15 @@ static int ucsi_psy_get_scope(struct ucsi_connector *con,
>  	struct device *dev = con->ucsi->dev;
>
>  	device_property_read_u8(dev, "scope", &scope);
> +	if (scope == POWER_SUPPLY_SCOPE_UNKNOWN) {
> +		u32 mask = UCSI_CAP_ATTR_POWER_AC_SUPPLY |
> +			   UCSI_CAP_ATTR_BATTERY_CHARGING;
> +
> +		if (con->ucsi->cap.attributes & mask)
> +			scope = POWER_SUPPLY_SCOPE_SYSTEM;
> +		else
> +			scope = POWER_SUPPLY_SCOPE_DEVICE;
> +	}
>  	val->intval = scope;
>  	return 0;
>  }

Reviewed-by: Sebastian Reichel <[email protected]>

-- Sebastian

