On some OEM systems multiple navi3x dGPUs are triggering RAS errors
and BACO errors.

These errors come from elements of the OEM system that weren't part of
the original test environment. This series addresses those problems.

NOTE: Although this series touches two subsystems, I would prefer to
take this all through DRM because there is a workaround in linux-next
that I would like to be reverted at the same time as picking up the first
two patches.

Mario Limonciello (3):
  drm/amd: Fix detection of _PR3 on the PCIe root port
  power: supply: Don't count 'unknown' scope power supplies
  Revert "drm/amd/pm: workaround for the wrong ac power detection on smu
    13.0.0"

 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c           | 2 +-
 drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c       | 3 ++-
 drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c | 1 +
 drivers/power/supply/power_supply_core.c             | 2 +-
 4 files changed, 5 insertions(+), 3 deletions(-)

--
2.34.1
On 9/28/2023 13:00, Alex Deucher wrote:
> On Thu, Sep 28, 2023 at 12:41 PM Mario Limonciello
> <[email protected]> wrote:
>>
>> On some OEM systems multiple navi3x dGPUs are triggering RAS errors
>> and BACO errors.
>>
>> These errors come from elements of the OEM system that weren't part of
>> the original test environment. This series addresses those problems.
>>
>> NOTE: Although this series touches two subsystems, I would prefer to
>> take this all through DRM because there is a workaround in linux-next
>> that I would like to be reverted at the same time as picking up the first
>> two patches.
>
> FWIW, the workaround is not in linux-next yet. At the time I thought
> it was already fixed by the fixes in ucsi and power supply when we
> first encountered this.
I looked yesterday and I did see it there, but I think it was
specifically because it had merged the amd-staging-drm-next tree.
It's not there today.
If Sebastian is OK, I'd still rather keep it all together so that people
testing amd-staging-drm-next get the fixes.
>
> Alex
>
>>
>> Mario Limonciello (3):
>> drm/amd: Fix detection of _PR3 on the PCIe root port
>> power: supply: Don't count 'unknown' scope power supplies
>> Revert "drm/amd/pm: workaround for the wrong ac power detection on smu
>> 13.0.0"
>>
>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
>> drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c | 3 ++-
>> drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c | 1 +
>> drivers/power/supply/power_supply_core.c | 2 +-
>> 4 files changed, 5 insertions(+), 3 deletions(-)
>>
>> --
>> 2.34.1
>>
On Thu, Sep 28, 2023 at 12:41 PM Mario Limonciello
<[email protected]> wrote:
>
> On some OEM systems multiple navi3x dGPUs are triggering RAS errors
> and BACO errors.
>
> These errors come from elements of the OEM system that weren't part of
> the original test environment. This series addresses those problems.
>
> NOTE: Although this series touches two subsystems, I would prefer to
> take this all through DRM because there is a workaround in linux-next
> that I would like to be reverted at the same time as picking up the first
> two patches.
FWIW, the workaround is not in linux-next yet. At the time I thought
it was already fixed by the fixes in ucsi and power supply when we
first encountered this.
Alex
>
> Mario Limonciello (3):
> drm/amd: Fix detection of _PR3 on the PCIe root port
> power: supply: Don't count 'unknown' scope power supplies
> Revert "drm/amd/pm: workaround for the wrong ac power detection on smu
> 13.0.0"
>
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
> drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c | 3 ++-
> drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c | 1 +
> drivers/power/supply/power_supply_core.c | 2 +-
> 4 files changed, 5 insertions(+), 3 deletions(-)
>
> --
> 2.34.1
>