2020-08-24 10:58:31

by Joerg Roedel

Subject: [PATCH 0/2] iommu/amd: Fix IOMMUv2 devices when SME is active

From: Joerg Roedel <[email protected]>

Hi,

Some IOMMUv2 capable devices do not work correctly when SME is
active, because their DMA mask does not include the encryption bit, so
they cannot DMA to encrypted memory directly.

The IOMMU can step in here, but the AMD IOMMU driver puts IOMMUv2
capable devices into an identity mapped domain. Fix that by not
forcing an identity mapped domain on devices when SME is active, and
by forbidding use of their IOMMUv2 functionality.

Please review.

Thanks,

Joerg

Joerg Roedel (2):
iommu/amd: Do not force direct mapping when SME is active
iommu/amd: Do not use IOMMUv2 functionality when SME is active

drivers/iommu/amd/iommu.c | 7 ++++++-
drivers/iommu/amd/iommu_v2.c | 7 +++++++
2 files changed, 13 insertions(+), 1 deletion(-)

--
2.28.0
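
The default-domain decision described in the cover letter can be pictured roughly as follows. This is a minimal user-space sketch, not the driver code: `pick_default_domain`, `struct dev_state` and the enum values are hypothetical names, and the `sme_active` parameter stands in for the kernel's global memory-encryption check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's domain types. */
enum domain_type {
	DOMAIN_DEFAULT,		/* let the IOMMU core choose, typically a remapping domain */
	DOMAIN_IDENTITY,	/* 1:1 mapping: DMA address equals physical address */
};

/* Hypothetical per-device state; only the field needed for this sketch. */
struct dev_state {
	bool iommu_v2_capable;
};

/*
 * Choose the default domain for a device.
 *
 * An IOMMUv2 capable device is only forced into an identity mapped domain
 * while memory encryption is inactive. With SME active the device is left
 * to the default, so the IOMMU can remap DMA for a device whose DMA mask
 * does not cover the encryption bit.
 */
enum domain_type pick_default_domain(const struct dev_state *dev, bool sme_active)
{
	assert(dev != NULL);

	if (dev->iommu_v2_capable && !sme_active)
		return DOMAIN_IDENTITY;

	return DOMAIN_DEFAULT;
}
```

With this shape, the only behavioural change is for IOMMUv2 capable devices on systems where SME is active; all other cases keep their previous default.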


2020-08-26 15:23:29

by Felix Kuehling

Subject: Re: [PATCH 0/2] iommu/amd: Fix IOMMUv2 devices when SME is active

[+Ray]


Thanks for the heads up. Currently KFD won't work on APUs when IOMMUv2
is disabled. But Ray is working on fallbacks that will allow KFD to work
on APUs even without IOMMUv2, similar to our dGPUs. Along with changes
in ROCm user mode, those fallbacks are necessary for making ROCm on APUs
generally useful.


How common is SME on typical PCs or laptops that would use AMD APUs?


Alex, do you know if anyone has tested amdgpu on an APU with SME
enabled? Is this considered something we support?


Thanks,
  Felix



2020-08-26 15:27:59

by Deucher, Alexander

Subject: RE: [PATCH 0/2] iommu/amd: Fix IOMMUv2 devices when SME is active

[AMD Public Use]

+ Christian

> -----Original Message-----
> From: Kuehling, Felix <[email protected]>
> Sent: Wednesday, August 26, 2020 11:22 AM
> To: Deucher, Alexander <[email protected]>; Joerg Roedel
> <[email protected]>; [email protected]; Huang, Ray
> <[email protected]>
> Cc: [email protected]; Lendacky, Thomas <[email protected]>;
> Suthikulpanit, Suravee <[email protected]>; linux-
> [email protected]
> Subject: Re: [PATCH 0/2] iommu/amd: Fix IOMMUv2 devices when SME is
> active
>
> [+Ray]
>
>
> Thanks for the heads up. Currently KFD won't work on APUs when IOMMUv2
> is disabled. But Ray is working on fallbacks that will allow KFD to work on
> APUs even without IOMMUv2, similar to our dGPUs. Along with changes in
> ROCm user mode, those fallbacks are necessary for making ROCm on APUs
> generally useful.
>
>
> How common is SME on typical PCs or laptops that would use AMD APUs?

I think the hw supports it, but as far as I know it's not formally productized on client parts.

>
>
> Alex, do you know if anyone has tested amdgpu on an APU with SME
> enabled? Is this considered something we support?

It's not something we've tested. I'm not even sure the GPU portion of APUs will work properly without an identity mapping. SME should work properly with dGPUs however, so this is a proper fix for them. We don't use the IOMMUv2 path on dGPUs at all.

Alex


2020-08-28 13:48:32

by Jörg Rödel

Subject: Re: [PATCH 0/2] iommu/amd: Fix IOMMUv2 devices when SME is active

On Wed, Aug 26, 2020 at 03:25:58PM +0000, Deucher, Alexander wrote:
> > Alex, do you know if anyone has tested amdgpu on an APU with SME
> > enabled? Is this considered something we support?
>
> It's not something we've tested. I'm not even sure the GPU portion of
> APUs will work properly without an identity mapping. SME should work
> properly with dGPUs however, so this is a proper fix for them. We
> don't use the IOMMUv2 path on dGPUs at all.

Is it possible to make the IOMMUv2 paths optional on iGPUs as well when
SME is active (or better, when the GPU is not identity mapped)?

Regards,

Joerg

2020-08-28 13:57:23

by Felix Kuehling

Subject: Re: [PATCH 0/2] iommu/amd: Fix IOMMUv2 devices when SME is active

On 2020-08-28 at 9:46 a.m., [email protected] wrote:
> On Wed, Aug 26, 2020 at 03:25:58PM +0000, Deucher, Alexander wrote:
>>> Alex, do you know if anyone has tested amdgpu on an APU with SME
>>> enabled? Is this considered something we support?
>> It's not something we've tested. I'm not even sure the GPU portion of
>> APUs will work properly without an identity mapping. SME should work
>> properly with dGPUs however, so this is a proper fix for them. We
>> don't use the IOMMUv2 path on dGPUs at all.
> Is it possible to make the IOMMUv2 paths optional on iGPUs as well when
> SME is active (or better, when the GPU is not identity mapped)?

Yes, we're working on this. IOMMUv2 is only needed for KFD. It's not
needed for graphics. And we're making it optional for KFD as well.

The question Alex and I raised here is more general. We may have some
assumptions in the amdgpu driver that are broken when the framebuffer is
not identity mapped. This would break the iGPU in a more general sense,
regardless of KFD and IOMMUv2. In that case, we don't really need to
worry about breaking KFD because we have a much bigger problem.

Regards,
  Felix


>
> Regards,
>
> Joerg

2020-08-28 15:13:17

by Deucher, Alexander

Subject: RE: [PATCH 0/2] iommu/amd: Fix IOMMUv2 devices when SME is active

[AMD Public Use]

> -----Original Message-----
> From: Kuehling, Felix <[email protected]>
> Sent: Friday, August 28, 2020 9:55 AM
> To: [email protected]; Deucher, Alexander <[email protected]>
> Cc: Joerg Roedel <[email protected]>; [email protected];
> Huang, Ray <[email protected]>; Koenig, Christian
> <[email protected]>; Lendacky, Thomas
> <[email protected]>; Suthikulpanit, Suravee
> <[email protected]>; [email protected]
> Subject: Re: [PATCH 0/2] iommu/amd: Fix IOMMUv2 devices when SME is
> active
>
> On 2020-08-28 at 9:46 a.m., [email protected] wrote:
> > On Wed, Aug 26, 2020 at 03:25:58PM +0000, Deucher, Alexander wrote:
> >>> Alex, do you know if anyone has tested amdgpu on an APU with SME
> >>> enabled? Is this considered something we support?
> >> It's not something we've tested. I'm not even sure the GPU portion
> >> of APUs will work properly without an identity mapping. SME should
> >> work properly with dGPUs however, so this is a proper fix for them.
> >> We don't use the IOMMUv2 path on dGPUs at all.
> > Is it possible to make the IOMMUv2 paths optional on iGPUs as well
> > when SME is active (or better, when the GPU is not identity mapped)?
>
> Yes, we're working on this. IOMMUv2 is only needed for KFD. It's not needed
> for graphics. And we're making it optional for KFD as well.
>
> The question Alex and I raised here is more general. We may have some
> assumptions in the amdgpu driver that are broken when the framebuffer is
> not identity mapped. This would break the iGPU in a more general sense,
> regardless of KFD and IOMMUv2. In that case, we don't really need to worry
> about breaking KFD because we have a much bigger problem.

There are hw bugs on Raven and probably Carrizo/Stoney where they need a 1:1 mapping to avoid problems in some corner cases with the displays. Other GPUs should be fine. The VID is 0x1002 and the DIDs are 0x15dd and 0x15d8 for Raven variants and 0x9870, 0x9874, 0x9875, 0x9876, 0x9877 and 0x98e4 for Carrizo and Stoney. As long as we preserve the 1:1 mapping for those ASICs, we should be fine.
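
A lookup over the IDs listed above might look like the following. It is only an illustrative user-space sketch: the function name `needs_identity_mapping`, the table name and the `VID_AMD` macro are made up here, and nothing in the thread says the check would be implemented this way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VID_AMD 0x1002	/* PCI vendor ID given in the mail above */

/* PCI device IDs named in the mail above as needing a 1:1 mapping. */
static const uint16_t identity_map_dids[] = {
	0x15dd, 0x15d8,					/* Raven variants */
	0x9870, 0x9874, 0x9875, 0x9876, 0x9877, 0x98e4,	/* Carrizo and Stoney */
};

/* Guard against accidentally dropping or duplicating an entry. */
static_assert(sizeof(identity_map_dids) / sizeof(identity_map_dids[0]) == 8,
	      "expected the eight device IDs listed in the mail");

/* Return true if the given PCI vendor/device pair is in the list. */
bool needs_identity_mapping(uint16_t vid, uint16_t did)
{
	size_t i;

	if (vid != VID_AMD)
		return false;

	for (i = 0; i < sizeof(identity_map_dids) / sizeof(identity_map_dids[0]); i++) {
		if (identity_map_dids[i] == did)
			return true;
	}

	return false;
}
```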

Alex

>
> Regards,
>   Felix
>
>
> >
> > Regards,
> >
> > Joerg

2020-08-28 15:29:50

by Jörg Rödel

Subject: Re: [PATCH 0/2] iommu/amd: Fix IOMMUv2 devices when SME is active

Hi Felix,

On Fri, Aug 28, 2020 at 09:54:59AM -0400, Felix Kuehling wrote:
> Yes, we're working on this. IOMMUv2 is only needed for KFD. It's not
> needed for graphics. And we're making it optional for KFD as well.

Okay, then KFD should fail gracefully, because it can't initialize the
device's IOMMUv2 functionality.
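
A rough model of that graceful failure, as a user-space sketch only: `iommu_v2_init_device` and `compute_probe` are hypothetical names, the `-ENODEV` return value is an assumption, and `sme_active` stands in for the kernel's memory-encryption check.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the IOMMUv2 setup call: it refuses to set up
 * IOMMUv2 functionality while memory encryption is active.
 */
int iommu_v2_init_device(bool sme_active)
{
	if (sme_active)
		return -ENODEV;	/* assumed error code */

	return 0;
}

/*
 * Hypothetical caller: the probe itself always succeeds. If IOMMUv2 setup
 * fails, compute support is simply left disabled instead of failing the
 * whole device.
 */
int compute_probe(bool sme_active, bool *compute_enabled)
{
	assert(compute_enabled != NULL);

	*compute_enabled = (iommu_v2_init_device(sme_active) == 0);

	return 0;
}
```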


Regards,

Joerg

2020-08-28 15:32:55

by Jörg Rödel

Subject: Re: [PATCH 0/2] iommu/amd: Fix IOMMUv2 devices when SME is active

On Fri, Aug 28, 2020 at 03:11:32PM +0000, Deucher, Alexander wrote:
> There are hw bugs on Raven and probably Carrizo/Stoney where they need
> 1:1 mapping to avoid bugs in some corner cases with the displays.
> Other GPUs should be fine. The VID is 0x1002 and the DIDs are 0x15dd
> and 0x15d8 for raven variants and 0x9870, 0x9874, 0x9875, 0x9876,
> 0x9877 and 0x98e4 for carrizo and stoney. As long as we
> preserve the 1:1 mapping for those asics, we should be fine.

Okay, Stoney at least has no Zen-based CPU, so no support for memory
encryption anyway. How about Raven, is it paired with a Zen CPU?

Regards,

Joerg

2020-08-28 15:48:55

by Deucher, Alexander

Subject: RE: [PATCH 0/2] iommu/amd: Fix IOMMUv2 devices when SME is active

[AMD Public Use]

> -----Original Message-----
> From: [email protected] <[email protected]>
> Sent: Friday, August 28, 2020 11:30 AM
> To: Deucher, Alexander <[email protected]>
> Cc: Kuehling, Felix <[email protected]>; Joerg Roedel
> <[email protected]>; [email protected]; Huang, Ray
> <[email protected]>; Koenig, Christian <[email protected]>;
> Lendacky, Thomas <[email protected]>; Suthikulpanit, Suravee
> <[email protected]>; [email protected]
> Subject: Re: [PATCH 0/2] iommu/amd: Fix IOMMUv2 devices when SME is
> active
>
> On Fri, Aug 28, 2020 at 03:11:32PM +0000, Deucher, Alexander wrote:
> > There are hw bugs on Raven and probably Carrizo/Stoney where they need
> > 1:1 mapping to avoid bugs in some corner cases with the displays.
> > Other GPUs should be fine. The VID is 0x1002 and the DIDs are 0x15dd
> > and 0x15d8 for raven variants and 0x9870, 0x9874, 0x9875, 0x9876,
> > 0x9877 and 0x98e4 for carrizo and stoney. As long as we preserve the
> > 1:1 mapping for those asics, we should be fine.
>
> Okay, Stoney at least has no Zen-based CPU, so no support for memory
> encryption anyway. How about Raven, is it paired with a Zen CPU?

Ah, right. So CZ and ST (Carrizo and Stoney) are not an issue. Raven is paired with Zen-based CPUs.

Thanks,

Alex

2020-09-04 10:08:20

by Joerg Roedel

Subject: Re: [PATCH 0/2] iommu/amd: Fix IOMMUv2 devices when SME is active

On Fri, Aug 28, 2020 at 03:47:07PM +0000, Deucher, Alexander wrote:
> Ah, right, So CZ and ST are not an issue. Raven is paired with Zen based CPUs.

Okay, so for the Raven case, can you add code to the amdgpu driver which
makes it fail to initialize on Raven when SME is active? There is a
global checking function for that, so that shouldn't be hard to do.

Regards,

Joerg

2020-09-06 16:12:11

by Deucher, Alexander

Subject: RE: [PATCH 0/2] iommu/amd: Fix IOMMUv2 devices when SME is active

[AMD Official Use Only - Internal Distribution Only]

> -----Original Message-----
> From: Joerg Roedel <[email protected]>
> Sent: Friday, September 4, 2020 6:06 AM
> To: Deucher, Alexander <[email protected]>
> Cc: [email protected]; Kuehling, Felix <[email protected]>;
> [email protected]; Huang, Ray <[email protected]>;
> Koenig, Christian <[email protected]>; Lendacky, Thomas
> <[email protected]>; Suthikulpanit, Suravee
> <[email protected]>; [email protected]
> Subject: Re: [PATCH 0/2] iommu/amd: Fix IOMMUv2 devices when SME is
> active
>
> On Fri, Aug 28, 2020 at 03:47:07PM +0000, Deucher, Alexander wrote:
> > Ah, right, So CZ and ST are not an issue. Raven is paired with Zen based
> CPUs.
>
> Okay, so for the Raven case, can you add code to the amdgpu driver which
> makes it fail to initialize on Raven when SME is active? There is a global
> checking function for that, so that shouldn't be hard to do.
>

Sure. How about the attached patch?

Alex


Attachments:
0001-drm-amdgpu-Fail-to-load-on-RAVEN-if-SME-is-active.patch (1.40 kB)

2020-09-07 10:46:22

by Joerg Roedel

Subject: Re: [PATCH 0/2] iommu/amd: Fix IOMMUv2 devices when SME is active

On Sun, Sep 06, 2020 at 04:08:58PM +0000, Deucher, Alexander wrote:
> From f479b9da353c2547c26ebac8930a5dcd9a134eb7 Mon Sep 17 00:00:00 2001
> From: Alex Deucher <[email protected]>
> Date: Sun, 6 Sep 2020 12:05:12 -0400
> Subject: [PATCH] drm/amdgpu: Fail to load on RAVEN if SME is active
>
> Due to hardware bugs, scatter/gather display on raven requires
> a 1:1 IOMMU mapping, however, SME (System Memory Encryption)
> requires an indirect IOMMU mapping because the encryption bit
> is beyond the DMA mask of the chip. As such, the two are
> incompatible.
>
> Signed-off-by: Alex Deucher <[email protected]>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 12e16445df7c..d87d37c25329 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -1102,6 +1102,16 @@ static int amdgpu_pci_probe(struct pci_dev *pdev,
>  		return -ENODEV;
>  	}
>
> +	/* Due to hardware bugs, S/G Display on raven requires a 1:1 IOMMU mapping,
> +	 * however, SME requires an indirect IOMMU mapping because the encryption
> +	 * bit is beyond the DMA mask of the chip.
> +	 */
> +	if (mem_encrypt_active() && ((flags & AMD_ASIC_MASK) == CHIP_RAVEN)) {
> +		dev_info(&pdev->dev,
> +			 "SME is not compatible with RAVEN\n");
> +		return -ENOTSUPP;
> +	}
> +
>  #ifdef CONFIG_DRM_AMDGPU_SI
>  	if (!amdgpu_si_support) {
>  		switch (flags & AMD_ASIC_MASK) {
> --
> 2.25.4
>

Looks good to me, thanks.

Acked-by: Joerg Roedel <[email protected]>

2020-09-07 11:46:04

by Christian König

Subject: Re: [PATCH 0/2] iommu/amd: Fix IOMMUv2 devices when SME is active

On 07.09.20 at 12:44, Joerg Roedel wrote:
> On Sun, Sep 06, 2020 at 04:08:58PM +0000, Deucher, Alexander wrote:
>> From f479b9da353c2547c26ebac8930a5dcd9a134eb7 Mon Sep 17 00:00:00 2001
>> From: Alex Deucher <[email protected]>
>> Date: Sun, 6 Sep 2020 12:05:12 -0400
>> Subject: [PATCH] drm/amdgpu: Fail to load on RAVEN if SME is active
>>
>> Due to hardware bugs, scatter/gather display on raven requires
>> a 1:1 IOMMU mapping, however, SME (System Memory Encryption)
>> requires an indirect IOMMU mapping because the encryption bit
>> is beyond the DMA mask of the chip. As such, the two are
>> incompatible.
>>
>> Signed-off-by: Alex Deucher <[email protected]>
>> ---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 10 ++++++++++
>> 1 file changed, 10 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> index 12e16445df7c..d87d37c25329 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> @@ -1102,6 +1102,16 @@ static int amdgpu_pci_probe(struct pci_dev *pdev,
>>  		return -ENODEV;
>>  	}
>>
>> +	/* Due to hardware bugs, S/G Display on raven requires a 1:1 IOMMU mapping,
>> +	 * however, SME requires an indirect IOMMU mapping because the encryption
>> +	 * bit is beyond the DMA mask of the chip.
>> +	 */
>> +	if (mem_encrypt_active() && ((flags & AMD_ASIC_MASK) == CHIP_RAVEN)) {
>> +		dev_info(&pdev->dev,
>> +			 "SME is not compatible with RAVEN\n");
>> +		return -ENOTSUPP;
>> +	}
>> +
>>  #ifdef CONFIG_DRM_AMDGPU_SI
>>  	if (!amdgpu_si_support) {
>>  		switch (flags & AMD_ASIC_MASK) {
>> --
>> 2.25.4
>>
> Looks good to me, thanks.
>
> Acked-by: Joerg Roedel <[email protected]>

This is really unfortunate, but I don't see any other solution either.

Reviewed-by: Christian König <[email protected]>

2020-09-08 03:43:21

by Felix Kuehling

Subject: Re: [PATCH 0/2] iommu/amd: Fix IOMMUv2 devices when SME is active

On 2020-09-06 at 12:08 p.m., Deucher, Alexander wrote:
> [AMD Official Use Only - Internal Distribution Only]
>
>> -----Original Message-----
>> From: Joerg Roedel <[email protected]>
>> Sent: Friday, September 4, 2020 6:06 AM
>> To: Deucher, Alexander <[email protected]>
>> Cc: [email protected]; Kuehling, Felix <[email protected]>;
>> [email protected]; Huang, Ray <[email protected]>;
>> Koenig, Christian <[email protected]>; Lendacky, Thomas
>> <[email protected]>; Suthikulpanit, Suravee
>> <[email protected]>; [email protected]
>> Subject: Re: [PATCH 0/2] iommu/amd: Fix IOMMUv2 devices when SME is
>> active
>>
>> On Fri, Aug 28, 2020 at 03:47:07PM +0000, Deucher, Alexander wrote:
>>> Ah, right, So CZ and ST are not an issue. Raven is paired with Zen based
>> CPUs.
>>
>> Okay, so for the Raven case, can you add code to the amdgpu driver which
>> makes it fail to initialize on Raven when SME is active? There is a global
>> checking function for that, so that shouldn't be hard to do.
>>
> Sure. How about the attached patch?

The patch is

Acked-by: Felix Kuehling <[email protected]>

Thanks,
  Felix


>
> Alex
>