Dear Tom, dear Linux folks,
Selecting the symbol `AMD_MEM_ENCRYPT` – as done in Debian 5.13.9-1~exp1
[1] – also selects `AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT`, as it defaults
to yes, causing boot failures on AMD Raven systems. On the MSI B350M
MORTAR with AMD Ryzen 3 2200G, Linux logs the message below, and the
AMDGPU graphics driver, despite being loaded, does not work; the
framebuffer driver is used instead.
    [   19.679824] amdgpu 0000:26:00.0: amdgpu: SME is not compatible with RAVEN
It even causes black screens on other systems as reported to the Debian
bug tracking system *Black screen on AMD Ryzen based systems (AMDGPU
related when AMD Secure Memory Encryption not disabled --
mem_encrypt=off)* [2].
Should the default be changed?
Kind regards,
Paul
[1]:
https://salsa.debian.org/kernel-team/linux/-/blob/master/debian/changelog#L1138
[2]: https://bugs.debian.org/994453
On Tue, Oct 05, 2021 at 04:29:41PM +0200, Paul Menzel wrote:
> Selecting the symbol `AMD_MEM_ENCRYPT` – as
> done in Debian 5.13.9-1~exp1 [1] – also selects
> `AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT`, as it defaults to yes,
I'm assuming that "selecting" is done automatically: alldefconfig,
olddefconfig?
Because CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT only depends on
CONFIG_AMD_MEM_ENCRYPT and the former can be disabled in oldconfig or
menuconfig etc.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Tue, Oct 5, 2021 at 10:29 AM Paul Menzel <[email protected]> wrote:
>
> Dear Tom, dear Linux folks,
>
>
> Selecting the symbol `AMD_MEM_ENCRYPT` – as done in Debian 5.13.9-1~exp1
> [1] – also selects `AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT`, as it defaults
> to yes, causing boot failures on AMD Raven systems. On the MSI B350M
> MORTAR with AMD Ryzen 3 2200G, Linux logs and the AMDGPU graphics
> driver, despite being loaded, does not work, and the framebuffer driver
> is used instead.
>
> [ 19.679824] amdgpu 0000:26:00.0: amdgpu: SME is not compatible
> with RAVEN
>
> It even causes black screens on other systems as reported to the Debian
> bug tracking system *Black screen on AMD Ryzen based systems (AMDGPU
> related when AMD Secure Memory Encryption not disabled --
> mem_encrypt=off)* [2].
It's not incompatible per se, but SME requires the IOMMU to be enabled
because the C bit used for encryption is beyond the dma_mask of most
devices. If the C bit is not set, the en/decryption for DMA doesn't
occur. So you need the IOMMU to be enabled in remapping mode to use SME
with most devices. Raven has further requirements in that it requires
IOMMUv2 functionality to support some features, which currently use a
direct mapping in the IOMMU, and hence the C bit is not properly
handled.
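
To make the mask argument concrete, here is a minimal Python sketch (not
kernel code). It assumes the C bit sits at bit 47, which matches the 48-bit
DMA mask figure that comes up in this thread; the real position is reported
by the CPU and may differ. `dma_mask` and `reachable` are hypothetical
helper names.

```python
C_BIT = 47  # assumed C-bit position; the real value is reported by the CPU


def dma_mask(bits: int) -> int:
    """Return the address mask of a device that can drive `bits` address bits."""
    return (1 << bits) - 1


def reachable(addr: int, mask: int) -> bool:
    """True if every set bit of `addr` fits within the device's DMA mask."""
    return (addr & ~mask) == 0


phys = 0x1234_5000               # an arbitrary physical address
encrypted = phys | (1 << C_BIT)  # the same address with the C bit set

for bits in (32, 40, 44, 48):
    # A device can only target the encrypted address if its mask covers the C bit.
    print(bits, reachable(encrypted, dma_mask(bits)))
```

Under that assumption only the 48-bit mask can express the address with the
C bit set, which is why narrower devices need the IOMMU to remap for them.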
Alex
>
> Should the default be changed?
>
>
> Kind regards,
>
> Paul
>
>
> [1]:
> https://salsa.debian.org/kernel-team/linux/-/blob/master/debian/changelog#L1138
> [2]: https://bugs.debian.org/994453
Dear Borislav,
Thank you for your reply.
On 05.10.21 at 16:38, Borislav Petkov wrote:
> On Tue, Oct 05, 2021 at 04:29:41PM +0200, Paul Menzel wrote:
>> Selecting the symbol `AMD_MEM_ENCRYPT` – as
>> done in Debian 5.13.9-1~exp1 [1] – also selects
>> `AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT`, as it defaults to yes,
>
> I'm assuming that "selecting" is done automatically: alldefconfig,
> olddefconfig?
>
> Because CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT only depends on
> CONFIG_AMD_MEM_ENCRYPT and former can be disabled in oldconfig or
> menuconfig etc.
Sorry for being unclear. Distributions want to enable support for that
feature, but as long as it breaks systems, it should be opt-in via the
Linux kernel command line, and not opt-out. Also the Kconfig help texts
do not mention anything about these problems, and the AMDGPU log message
is of level info and not error. It would be even better if the message
also stated how to disable SME (`mem_encrypt=off`).
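
The opt-in versus opt-out distinction can be sketched in a few lines of
Python. This is a simplified model, not the kernel's implementation: it
ignores whether the hardware supports SME at all, and letting the last
`mem_encrypt=` occurrence win is an assumption. `sme_active` is a
hypothetical name.

```python
def sme_active(cmdline: str, active_by_default: bool) -> bool:
    """Resolve the SME state: an explicit mem_encrypt= option overrides
    the build-time default (AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT)."""
    state = active_by_default
    for arg in cmdline.split():
        if arg == "mem_encrypt=on":
            state = True
        elif arg == "mem_encrypt=off":
            state = False
    return state


# Opt-out (default y): SME is on unless the user disables it.
print(sme_active("root=/dev/sda1 quiet", active_by_default=True))
# Opt-in (default n): SME is off unless the user asks for it.
print(sme_active("root=/dev/sda1 quiet", active_by_default=False))
```

With the default set to y, an affected user has to know about
`mem_encrypt=off` before the system becomes usable; with the default set to
n, only users who want SME need to act.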
Kind regards,
Paul
On Tue, Oct 05, 2021 at 10:48:15AM -0400, Alex Deucher wrote:
> It's not incompatible per se, but SME requires the IOMMU be enabled
> because the C bit used for encryption is beyond the dma_mask of most
> devices. If the C bit is not set, the en/decryption for DMA doesn't
> occur. So you need IOMMU to be enabled in remapping mode to use SME
> with most devices. Raven has further requirements in that it requires
> IOMMUv2 functionality to support some features which currently uses a
> direct mapping in the IOMMU and hence the C bit is not properly
> handled.
So lemme ask you this: do Raven-containing systems exist out there which
don't have IOMMUv2 functionality and which can cause boot failures when
SME is enabled in the kernel .config?
IOW, can we handle this at boot time properly, i.e., disable SME if we
detect Raven or IOMMUv2 support is missing?
If not, then we really will have to change the default.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Wed, Oct 6, 2021 at 5:42 AM Borislav Petkov <[email protected]> wrote:
>
> On Tue, Oct 05, 2021 at 10:48:15AM -0400, Alex Deucher wrote:
> > It's not incompatible per se, but SME requires the IOMMU be enabled
> > because the C bit used for encryption is beyond the dma_mask of most
> > devices. If the C bit is not set, the en/decryption for DMA doesn't
> > occur. So you need IOMMU to be enabled in remapping mode to use SME
> > with most devices. Raven has further requirements in that it requires
> > IOMMUv2 functionality to support some features which currently uses a
> > direct mapping in the IOMMU and hence the C bit is not properly
> > handled.
>
> So lemme ask you this: do Raven-containing systems exist out there which
> don't have IOMMUv2 functionality and which can cause boot failures when
> SME is enabled in the kernel .config?
There could be some OEM systems that disable the IOMMU on the platform
and don't provide a switch in the bios to enable it. The GPU driver
will still work in that case, it will just not be able to enable KFD
support for ROCm compute. SME won't work for most devices in that
case however since most devices have a DMA mask too small to handle
the C bit for encryption. SME should be dependent on IOMMU being
enabled.
>
> IOW, can we handle this at boot time properly, i.e., disable SME if we
> detect Raven or IOMMUv2 support is missing?
>
> If not, then we really will have to change the default.
I'm not an SME expert, but I thought that that was already the case.
We just added the error condition in the GPU driver to prevent the
driver from loading when the user forced SME on. IIRC, there were
users that cared more about SME than graphics support.
Alex
>
> Thx.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
On Wed, Oct 06, 2021 at 09:23:22AM -0400, Alex Deucher wrote:
> There could be some OEM systems that disable the IOMMU on the platform
> and don't provide a switch in the bios to enable it. The GPU driver
> will still work in that case, it will just not be able to enable KFD
> support for ROCm compute. SME won't work for most devices in that
> case however since most devices have a DMA mask too small to handle
> the C bit for encryption. SME should be dependent on IOMMU being
> enabled.
Yeah, I'd let you hash this out with Tom.
> I'm not an SME expert, but I thought that that was already the case.
Yeah, I think Paul wants this:
---
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b79e88ee6627..e94c2df7a043 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1518,7 +1518,6 @@ config AMD_MEM_ENCRYPT
 
 config AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT
 	bool "Activate AMD Secure Memory Encryption (SME) by default"
-	default y
 	depends on AMD_MEM_ENCRYPT
 	help
 	  Say yes to have system memory encrypted by default if running on
---
The reason we did this is so that you don't have to supply
mem_encrypt=on on the cmdline, but we didn't anticipate any such fun
with some devices.
> We just added the error condition in the GPU driver to prevent the
> driver from loading when the user forced SME on. IIRC, there were
> users that cared more about SME than graphics support.
Well, it's a distro kernel so we should at least try to make everyone
happy. :)
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 10/6/21 8:23 AM, Alex Deucher wrote:
> On Wed, Oct 6, 2021 at 5:42 AM Borislav Petkov <[email protected]> wrote:
>>
>> On Tue, Oct 05, 2021 at 10:48:15AM -0400, Alex Deucher wrote:
>>> It's not incompatible per se, but SME requires the IOMMU be enabled
>>> because the C bit used for encryption is beyond the dma_mask of most
>>> devices. If the C bit is not set, the en/decryption for DMA doesn't
>>> occur. So you need IOMMU to be enabled in remapping mode to use SME
>>> with most devices. Raven has further requirements in that it requires
>>> IOMMUv2 functionality to support some features which currently uses a
>>> direct mapping in the IOMMU and hence the C bit is not properly
>>> handled.
>>
>> So lemme ask you this: do Raven-containing systems exist out there which
>> don't have IOMMUv2 functionality and which can cause boot failures when
>> SME is enabled in the kernel .config?
>
> There could be some OEM systems that disable the IOMMU on the platform
> and don't provide a switch in the bios to enable it. The GPU driver
> will still work in that case, it will just not be able to enable KFD
> support for ROCm compute. SME won't work for most devices in that
> case however since most devices have a DMA mask too small to handle
> the C bit for encryption. SME should be dependent on IOMMU being
> enabled.
That's not completely true. If the IOMMU is not enabled (off or in
passthrough mode), then the DMA api will check the DMA mask and use
SWIOTLB to bounce the DMA if the device doesn't support DMA at the
position where the c-bit is located (see force_dma_unencrypted() in
arch/x86/mm/mem_encrypt.c).
To avoid bounce buffering, though, commit 2cc13bb4f59f was introduced to
disable passthrough mode when SME is active (unless iommu=pt was
explicitly specified).
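
As a rough model of the two points above, the following Python sketch shows
which path a device's DMA would take. It is an illustration under stated
assumptions, not the kernel logic: the mode names, the function names and
the C-bit position of 47 are all hypothetical choices made for this example.

```python
def effective_iommu_mode(default_mode: str, sme_active: bool, pt_requested: bool) -> str:
    """Model of the behaviour attributed to commit 2cc13bb4f59f in this thread.

    default_mode is one of 'off', 'passthrough', 'remap'.
    """
    if default_mode == "off":
        return "off"                  # nothing to switch if the IOMMU is disabled
    if pt_requested:
        return "passthrough"          # iommu=pt given explicitly: honoured
    if sme_active and default_mode == "passthrough":
        return "remap"                # default passthrough is dropped under SME
    return default_mode


def dma_path(iommu_mode: str, sme_active: bool, mask_bits: int, c_bit: int = 47) -> str:
    """Pick how a device's DMA is serviced in this simplified model."""
    if iommu_mode == "remap":
        return "iommu-remap"          # the IOMMU translates, so the mask is not an issue
    if sme_active and mask_bits <= c_bit:
        return "swiotlb-bounce"       # device cannot drive the C bit: bounce via SWIOTLB
    return "direct"


mode = effective_iommu_mode("passthrough", sme_active=True, pt_requested=False)
print(mode, dma_path(mode, sme_active=True, mask_bits=40))
print(dma_path("off", sme_active=True, mask_bits=40))
```

The second call is the problematic case for this thread: with the IOMMU off,
a device with a narrow mask falls back to bounce buffering.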
Thanks,
Tom
>
>>
>> IOW, can we handle this at boot time properly, i.e., disable SME if we
>> detect Raven or IOMMUv2 support is missing?
>>
>> If not, then we really will have to change the default.
>
> I'm not an SME expert, but I thought that that was already the case.
> We just added the error condition in the GPU driver to prevent the
> driver from loading when the user forced SME on. IIRC, there were
> users that cared more about SME than graphics support.
>
> Alex
>
>>
>> Thx.
>>
>> --
>> Regards/Gruss,
>> Boris.
>>
>> https://people.kernel.org/tglx/notes-about-netiquette
Ok,
so I sat down and wrote something and tried to capture all the stuff we
talked about so that it is clear in the future why we did it.
Thoughts?
---
From: Borislav Petkov <[email protected]>
Date: Wed, 6 Oct 2021 19:34:55 +0200
Subject: [PATCH] x86/Kconfig: Do not enable AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT
automatically
This Kconfig option was added initially so that memory encryption is
enabled by default on machines which support it.
However, Raven-class GPUs, a.o., cannot handle DMA masks which are
shorter than the bit position of the encryption, aka C-bit. For that,
those devices need to have the IOMMU present.
If the IOMMU is disabled or in passthrough mode, though, the kernel
would switch to SWIOTLB bounce-buffering for those transfers.
In order to avoid that,
2cc13bb4f59f ("iommu: Disable passthrough mode when SME is active")
disables the default IOMMU passthrough mode so that devices for which
the default 256K DMA is insufficient, can use the IOMMU instead.
However 2, there are cases where the IOMMU is disabled in the BIOS, etc,
think the usual hardware folk "oops, I dropped the ball there" cases.
Which means, it can happen that there are systems out there with devices
which need the IOMMU to function properly with SME enabled but the IOMMU
won't necessarily be enabled.
So in order for those devices to function, drop the "default y" for
the SME by default on option so that users who want to have SME, will
need to either enable it in their config or use "mem_encrypt=on" on the
kernel command line.
Fixes: 7744ccdbc16f ("x86/mm: Add Secure Memory Encryption (SME) support")
Reported-by: Paul Menzel <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
arch/x86/Kconfig | 1 -
1 file changed, 1 deletion(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 8055da49f1c0..6a336b1f3f28 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1525,7 +1525,6 @@ config AMD_MEM_ENCRYPT
 
 config AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT
 	bool "Activate AMD Secure Memory Encryption (SME) by default"
-	default y
 	depends on AMD_MEM_ENCRYPT
 	help
 	  Say yes to have system memory encrypted by default if running on
--
2.29.2
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Wed, Oct 6, 2021 at 1:48 PM Borislav Petkov <[email protected]> wrote:
>
> Ok,
>
> so I sat down and wrote something and tried to capture all the stuff we
> so talked about that it is clear in the future why we did it.
>
> Thoughts?
>
> ---
> From: Borislav Petkov <[email protected]>
> Date: Wed, 6 Oct 2021 19:34:55 +0200
> Subject: [PATCH] x86/Kconfig: Do not enable AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT
> automatically
>
> This Kconfig option was added initially so that memory encryption is
> enabled by default on machines which support it.
>
> However, Raven-class GPUs, a.o., cannot handle DMA masks which are
> shorter than the bit position of the encryption, aka C-bit. For that,
> those devices need to have the IOMMU present.
This is not limited to Raven. All GPUs (and quite a few other
devices) have a limited DMA mask. AMD GPUs have between 32 and 48
bits of DMA depending on what generation the hardware is. So to
support SME, you either need swiotlb with bounce buffers or you need
IOMMU in remapping mode. The limitation with Raven is that if you want
to use it with the IOMMU enabled it requires the IOMMU to be set up in
passthrough mode to support IOMMUv2 functionality for compute support
and due to other hardware limitations on the display side. So for all
GPUs except raven, just having IOMMU enabled in remapping mode is
fine. GPUs from other vendors would likely run into similar
limitations. Raven just has further limitations.
>
> If the IOMMU is disabled or in passthrough mode, though, the kernel
> would switch to SWIOTLB bounce-buffering for those transfers.
>
> In order to avoid that,
>
> 2cc13bb4f59f ("iommu: Disable passthrough mode when SME is active")
>
> disables the default IOMMU passthrough mode so that devices for which
> the default 256K DMA is insufficient, can use the IOMMU instead.
>
> However 2, there are cases where the IOMMU is disabled in the BIOS, etc,
> think the usual hardware folk "oops, I dropped the ball there" cases.
>
> Which means, it can happen that there are systems out there with devices
> which need the IOMMU to function properly with SME enabled but the IOMMU
> won't necessarily be enabled.
>
> So in order for those devices to function, drop the "default y" for
> the SME by default on option so that users who want to have SME, will
> need to either enable it in their config or use "mem_encrypt=on" on the
> kernel command line.
Another option would be to enable SME by default on Epyc platforms,
but disabled by default on client APU platforms or even just raven.
Other than these comments, looks fine to me.
Alex
>
> Fixes: 7744ccdbc16f ("x86/mm: Add Secure Memory Encryption (SME) support")
> Reported-by: Paul Menzel <[email protected]>
> Signed-off-by: Borislav Petkov <[email protected]>
> Cc: <[email protected]>
> Link: https://lkml.kernel.org/r/[email protected]
> ---
> arch/x86/Kconfig | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 8055da49f1c0..6a336b1f3f28 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1525,7 +1525,6 @@ config AMD_MEM_ENCRYPT
>
> config AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT
> bool "Activate AMD Secure Memory Encryption (SME) by default"
> - default y
> depends on AMD_MEM_ENCRYPT
> help
> Say yes to have system memory encrypted by default if running on
> --
> 2.29.2
>
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
And just another general comment, swiotlb + bounce buffers isn't
really useful on GPUs. You may have 10-100s of MBs of memory mapped
long term into the GPU's address space for random access. E.g., you
may have buffers in system memory that the display hardware is
actively scanning out of. For GPUs you should really only enable SME
if IOMMU is enabled in remapping mode. But that is probably beyond
the discussion here.
Alex
On Wed, Oct 6, 2021 at 2:10 PM Alex Deucher <[email protected]> wrote:
>
> On Wed, Oct 6, 2021 at 1:48 PM Borislav Petkov <[email protected]> wrote:
> >
> > Ok,
> >
> > so I sat down and wrote something and tried to capture all the stuff we
> > so talked about that it is clear in the future why we did it.
> >
> > Thoughts?
> >
> > ---
> > From: Borislav Petkov <[email protected]>
> > Date: Wed, 6 Oct 2021 19:34:55 +0200
> > Subject: [PATCH] x86/Kconfig: Do not enable AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT
> > automatically
> >
> > This Kconfig option was added initially so that memory encryption is
> > enabled by default on machines which support it.
> >
> > However, Raven-class GPUs, a.o., cannot handle DMA masks which are
> > shorter than the bit position of the encryption, aka C-bit. For that,
> > those devices need to have the IOMMU present.
>
> This is not limited to Raven. All GPUs (and quite a few other
> devices) have a limited DMA mask. AMD GPUs have between 32 and 48
> bits of DMA depending on what generation the hardware is. So to
> support SME, you either need swiotlb with bounce buffers or you need
> IOMMU in remapping mode. The limitation with Raven is that if you want
> to use it with the IOMMU enabled it requires the IOMMU to be set up in
> passthrough mode to support IOMMUv2 functionality for compute support
> and due to other hardware limitations on the display side. So for all
> GPUs except raven, just having IOMMU enabled in remapping mode is
> fine. GPUs from other vendors would likely run into similar
> limitations. Raven just has further limitations.
>
>
> >
> > If the IOMMU is disabled or in passthrough mode, though, the kernel
> > would switch to SWIOTLB bounce-buffering for those transfers.
> >
> > In order to avoid that,
> >
> > 2cc13bb4f59f ("iommu: Disable passthrough mode when SME is active")
> >
> > disables the default IOMMU passthrough mode so that devices for which
> > the default 256K DMA is insufficient, can use the IOMMU instead.
> >
> > However 2, there are cases where the IOMMU is disabled in the BIOS, etc,
> > think the usual hardware folk "oops, I dropped the ball there" cases.
> >
> > Which means, it can happen that there are systems out there with devices
> > which need the IOMMU to function properly with SME enabled but the IOMMU
> > won't necessarily be enabled.
> >
> > So in order for those devices to function, drop the "default y" for
> > the SME by default on option so that users who want to have SME, will
> > need to either enable it in their config or use "mem_encrypt=on" on the
> > kernel command line.
>
> Another option would be to enable SME by default on Epyc platforms,
> but disabled by default on client APU platforms or even just raven.
>
> Other than these comments, looks fine to me.
>
> Alex
>
> >
> > Fixes: 7744ccdbc16f ("x86/mm: Add Secure Memory Encryption (SME) support")
> > Reported-by: Paul Menzel <[email protected]>
> > Signed-off-by: Borislav Petkov <[email protected]>
> > Cc: <[email protected]>
> > Link: https://lkml.kernel.org/r/[email protected]
> > ---
> > arch/x86/Kconfig | 1 -
> > 1 file changed, 1 deletion(-)
> >
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index 8055da49f1c0..6a336b1f3f28 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -1525,7 +1525,6 @@ config AMD_MEM_ENCRYPT
> >
> > config AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT
> > bool "Activate AMD Secure Memory Encryption (SME) by default"
> > - default y
> > depends on AMD_MEM_ENCRYPT
> > help
> > Say yes to have system memory encrypted by default if running on
> > --
> > 2.29.2
> >
> >
> > --
> > Regards/Gruss,
> > Boris.
> >
> > https://people.kernel.org/tglx/notes-about-netiquette
On Wed, Oct 06, 2021 at 02:10:30PM -0400, Alex Deucher wrote:
> This is not limited to Raven.
That's what the innocuous "a.o." wanted to state. :)
> All GPUs (and quite a few other
> devices) have a limited DMA mask. AMD GPUs have between 32 and 48
> bits of DMA depending on what generation the hardware is. So to
> support SME, you either need swiotlb with bounce buffers or you need
> IOMMU in remapping mode. The limitation with Raven is that if you want
> to use it with the IOMMU enabled it requires the IOMMU to be set up in
> passthrough mode to support IOMMUv2 functionality for compute support
> and due to other hardware limitations on the display side. So for all
> GPUs except raven, just having IOMMU enabled in remapping mode is
> fine. GPUs from other vendors would likely run into similar
> limitations. Raven just has further limitations.
Hmm, and in passthrough mode it would use bounce buffers when SME is
enabled. And when those 256K are not enough, it would fail there too,
even with IOMMUv2. At least this is how it looks from here.
Dunno, it feels like doing GPU compute and SME does not go hand-in-hand
real smoothly currently but that probably doesn't matter all too much
for both user camps. But that's just me with a hunch.
> Another option would be to enable SME by default on Epyc platforms,
> but disabled by default on client APU platforms or even just raven.
Thing is, we don't know at SME init time - very early during boot -
whether we're Epyc or client. Can we find that out reliably from the hw?
And even if we do, that's still not accurate enough - we wanna know
whether the IOMMU works.
So I guess we're all left to the user to decide. But I'm always open
to suggestions for solving things in sw and not requiring any user
interaction.
> Other than these comments, looks fine to me.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Wed, Oct 6, 2021 at 2:21 PM Borislav Petkov <[email protected]> wrote:
>
> On Wed, Oct 06, 2021 at 02:10:30PM -0400, Alex Deucher wrote:
> > This is not limited to Raven.
>
> That's what the innocuous "a.o." wanted to state. :)
Whoops, my eyes passed right over that.
>
> > All GPUs (and quite a few other
> > devices) have a limited DMA mask. AMD GPUs have between 32 and 48
> > bits of DMA depending on what generation the hardware is. So to
> > support SME, you either need swiotlb with bounce buffers or you need
> > IOMMU in remapping mode. The limitation with Raven is that if you want
> > to use it with the IOMMU enabled it requires the IOMMU to be set up in
> > passthrough mode to support IOMMUv2 functionality for compute support
> > and due to other hardware limitations on the display side. So for all
> > GPUs except raven, just having IOMMU enabled in remapping mode is
> > fine. GPUs from other vendors would likely run into similar
> > limitations. Raven just has further limitations.
>
> Hmm, and in passthrough mode it would use bounce buffers when SME is
> enabled. And when those 256K are not enough, it would fail there too,
> even with IOMMUv2. At least this is how it looks from here.
>
> Dunno, it feels like doing GPU compute and SME does not go hand-in-hand
> real smoothly currently but that probably doesn't matter all too much
> for both user camps. But that's just me with a hunch.
Well, this limitation only applies to Raven which is an integrated GPU
in client parts. SME was initially productized on server parts so
there was not a lot of concern given to interactions with integrated
graphics at the time. This has since been fixed in newer integrated
graphics. dGPUs work fine as long as the IOMMU is in remapping mode
to handle the C bit.
>
> > Another option would be to enable SME by default on Epyc platforms,
> > but disabled by default on client APU platforms or even just raven.
>
> Thing is, we don't know at SME init time - very early during boot -
> whether we're Epyc or client. Can we find that out reliably from the hw?
>
From the x86 model and family info? I think Raven has different
families from other Zen based CPUs.
> And even if we do, that's still not accurate enough - we wanna know
> whether the IOMMU works.
Right.
>
> So I guess we're all left to the user to decide. But I'm always open
> to suggestions for solving things in sw and not requiring any user
> interaction.
@Tom Lendacky Any ideas?
Alex
>
> > Other than these comments, looks fine to me.
>
> Thx.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
On Wed, Oct 06, 2021 at 02:21:40PM -0400, Alex Deucher wrote:
> And just another general comment, swiotlb + bounce buffers isn't
> really useful on GPUs. You may have 10-100s of MBs of memory mapped
> long term into the GPU's address space for random access. E.g., you
> may have buffers in system memory that the display hardware is
> actively scanning out of. For GPUs you should really only enable SME
> if IOMMU is enabled in remapping mode. But that is probably beyond
> the discussion here.
Right, but insights into how these things work (or don't work) together
are always welcome. And yes, as 2cc13bb4f59f says:
"... The bounce buffer
code has an upper limit of 256kb for the size of DMA
allocations, which is too small for certain devices and
causes them to fail."
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Wed, Oct 06, 2021 at 02:36:56PM -0400, Alex Deucher wrote:
> From the x86 model and family info? I think Raven has different
> families from other Zen based CPUs.
Yeah, I'd like to avoid a f/m/s mapping table, if possible. Those things
should be a last resort and they always need adjustment when new models
pop up.
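
For reference, any f/m/s check would start from decoding the CPUID leaf 1
EAX value. The Python sketch below shows that decoding only; it is not a
proposal for a mapping table. `decode_fms` is a hypothetical name, and the
sample value 0x00810F10 is an assumption about what a Raven-class part
reports, used here purely for illustration.

```python
def decode_fms(eax: int) -> tuple[int, int, int]:
    """Decode (family, model, stepping) from the CPUID leaf 1 EAX value."""
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    # The extended family is added only when the base family is 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family
    # The extended model supplies the upper nibble for families 0x6 and 0xF.
    model = (ext_model << 4) | base_model if base_family in (0x6, 0xF) else base_model
    return family, model, stepping


family, model, stepping = decode_fms(0x00810F10)  # assumed sample value
print(hex(family), hex(model), stepping)
```

A table keyed on these values would need a new entry for every affected
model, which is the maintenance cost mentioned above.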
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 10/6/21 1:36 PM, Alex Deucher wrote:
> On Wed, Oct 6, 2021 at 2:21 PM Borislav Petkov <[email protected]> wrote:
>> On Wed, Oct 06, 2021 at 02:10:30PM -0400, Alex Deucher wrote:
>
> From the x86 model and family info? I think Raven has different
> families from other Zen based CPUs.
>
>> And even if we do, that's still not accurate enough - we wanna know
>> whether the IOMMU works.
>
> Right.
>
>>
>> So I guess we're all left to the user to decide. But I'm always open
>> to suggestions for solving things in sw and not requiring any user
>> interaction.
>
> @Tom Lendacky Any ideas?
I think user decision is probably going to be best. We have to enable and
encrypt the kernel for SME so early in boot that, short of
family/model/stepping checks, there's not a lot that we can do.
Thanks,
Tom
>
> Alex
>
>>
>>> Other than these comments, looks fine to me.
>>
>> Thx.
>>
>> --
>> Regards/Gruss,
>> Boris.
>>
>> https://people.kernel.org/tglx/notes-about-netiquette
On 06.10.21 at 21:32, Borislav Petkov wrote:
> On Wed, Oct 06, 2021 at 02:21:40PM -0400, Alex Deucher wrote:
>> And just another general comment, swiotlb + bounce buffers isn't
>> really useful on GPUs. You may have 10-100s of MBs of memory mapped
>> long term into the GPU's address space for random access. E.g., you
>> may have buffers in system memory that the display hardware is
>> actively scanning out of. For GPUs you should really only enable SME
>> if IOMMU is enabled in remapping mode. But that is probably beyond
>> the discussion here.
> Right, but insights into how these things work (or don't work) together
> are always welcome. And yes, as 2cc13bb4f59f says:
>
> "... The bounce buffer
> code has an upper limit of 256kb for the size of DMA
> allocations, which is too small for certain devices and
> causes them to fail."
To make the matter even worse, bounce buffers don't work with APIs like
Vulkan and some OpenGL/OpenCL extensions.
In those APIs or extensions the assumption is that you can malloc()
memory in userspace, give the pointer to the kernel driver and have
coherent access with your device and the CPU at the same time.
In other words you don't even get the chance to bounce the buffers
between CPU and device access because they are accessed by both at the
same time.
Regards,
Christian.
Dear Borislav,
On 06.10.21 at 19:48, Borislav Petkov wrote:
> Ok,
>
> so I sat down and wrote something and tried to capture all the stuff we
> so talked about that it is clear in the future why we did it.
>
> Thoughts?
>
> ---
> From: Borislav Petkov <[email protected]>
> Date: Wed, 6 Oct 2021 19:34:55 +0200
> Subject: [PATCH] x86/Kconfig: Do not enable AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT automatically
>
> This Kconfig option was added initially so that memory encryption is
> enabled by default on machines which support it.
>
> However, Raven-class GPUs, a.o., cannot handle DMA masks which are
> shorter than the bit position of the encryption, aka C-bit. For that,
> those devices need to have the IOMMU present.
>
> If the IOMMU is disabled or in passthrough mode, though, the kernel
> would switch to SWIOTLB bounce-buffering for those transfers.
>
> In order to avoid that,
>
> 2cc13bb4f59f ("iommu: Disable passthrough mode when SME is active")
>
> disables the default IOMMU passthrough mode so that devices for which
> the default 256K DMA is insufficient, can use the IOMMU instead.
>
> However 2, there are cases where the IOMMU is disabled in the BIOS, etc,
> think the usual hardware folk "oops, I dropped the ball there" cases.
>
> Which means, it can happen that there are systems out there with devices
> which need the IOMMU to function properly with SME enabled but the IOMMU
> won't necessarily be enabled.
>
> So in order for those devices to function, drop the "default y" for
> the SME by default on option so that users who want to have SME, will
> need to either enable it in their config or use "mem_encrypt=on" on the
> kernel command line.
>
> Fixes: 7744ccdbc16f ("x86/mm: Add Secure Memory Encryption (SME) support")
> Reported-by: Paul Menzel <[email protected]>
> Signed-off-by: Borislav Petkov <[email protected]>
> Cc: <[email protected]>
> Link: https://lkml.kernel.org/r/[email protected]
> ---
> arch/x86/Kconfig | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 8055da49f1c0..6a336b1f3f28 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1525,7 +1525,6 @@ config AMD_MEM_ENCRYPT
>
> config AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT
> bool "Activate AMD Secure Memory Encryption (SME) by default"
> - default y
> depends on AMD_MEM_ENCRYPT
> help
> Say yes to have system memory encrypted by default if running on
>
I think, the IOMMU is enabled on the MSI B350M MORTAR, but otherwise,
yes this looks fine. The help text could also be updated to mention
problems with AMD Raven devices.
Kind regards,
Paul
On 10/11/21 8:11 AM, Borislav Petkov wrote:
> On Mon, Oct 11, 2021 at 03:05:33PM +0200, Paul Menzel wrote:
>> I think, the IOMMU is enabled on the MSI B350M MORTAR, but otherwise, yes
>> this looks fine. The help text could also be updated to mention problems
>> with AMD Raven devices.
>
> This is not only about Raven GPUs but, as Alex explained, pretty much
> about every device which doesn't support a 48 bit DMA mask. I'll expand
> that aspect in the changelog.
In general, non-GPU devices that don't support a 48-bit DMA mask work fine
(assuming they have set their DMA mask appropriately). It really depends
on whether SWIOTLB will be able to satisfy the memory requirements of the
driver when the IOMMU is not enabled or in passthrough mode. Since GPU
devices need/use a lot of memory, that becomes a problem.
Thanks,
Tom
On Mon, Oct 11, 2021 at 03:05:33PM +0200, Paul Menzel wrote:
> I think, the IOMMU is enabled on the MSI B350M MORTAR, but otherwise, yes
> this looks fine. The help text could also be updated to mention problems
> with AMD Raven devices.
This is not only about Raven GPUs but, as Alex explained, pretty much
about every device which doesn't support a 48 bit DMA mask. I'll expand
that aspect in the changelog.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
Dear Tom,
Am 11.10.21 um 15:27 schrieb Tom Lendacky:
> On 10/11/21 8:11 AM, Borislav Petkov wrote:
>> On Mon, Oct 11, 2021 at 03:05:33PM +0200, Paul Menzel wrote:
>>> I think, the IOMMU is enabled on the MSI B350M MORTAR, but otherwise,
>>> yes
>>> this looks fine. The help text could also be updated to mention problems
>>> with AMD Raven devices.
>>
>> This is not only about Raven GPUs but, as Alex explained, pretty much
>> about every device which doesn't support a 48 bit DMA mask. I'll expand
>> that aspect in the changelog.
>
> In general, non-GPU devices that don't support a 48-bit DMA mask work
> fine (assuming they have set their DMA mask appropriately). It really
> depends on whether SWIOTLB will be able to satisfy the memory
> requirements of the driver when the IOMMU is not enabled or in
> passthrough mode. Since GPU devices need/use a lot of memory, that
> becomes a problem.
How can I check that?
Kind regards,
Paul
On 10/11/21 8:52 AM, Paul Menzel wrote:
> Dear Tom,
>
>
> Am 11.10.21 um 15:27 schrieb Tom Lendacky:
>> On 10/11/21 8:11 AM, Borislav Petkov wrote:
>>> On Mon, Oct 11, 2021 at 03:05:33PM +0200, Paul Menzel wrote:
>>>> I think, the IOMMU is enabled on the MSI B350M MORTAR, but otherwise, yes
>>>> this looks fine. The help text could also be updated to mention problems
>>>> with AMD Raven devices.
>>>
>>> This is not only about Raven GPUs but, as Alex explained, pretty much
>>> about every device which doesn't support a 48 bit DMA mask. I'll expand
>>> that aspect in the changelog.
>>
>> In general, non-GPU devices that don't support a 48-bit DMA mask work
>> fine (assuming they have set their DMA mask appropriately). It really
>> depends on whether SWIOTLB will be able to satisfy the memory
>> requirements of the driver when the IOMMU is not enabled or in
>> passthrough mode. Since GPU devices need/use a lot of memory, that
>> becomes a problem.
>
> How can I check that?
How can you check what? 32-bit DMA devices? GPUs? I need a bit more
information...
Thanks,
Tom
>
>
> Kind regards,
>
> Paul
Dear Tom,
Am 11.10.21 um 15:58 schrieb Tom Lendacky:
> On 10/11/21 8:52 AM, Paul Menzel wrote:
>> Am 11.10.21 um 15:27 schrieb Tom Lendacky:
>>> On 10/11/21 8:11 AM, Borislav Petkov wrote:
>>>> On Mon, Oct 11, 2021 at 03:05:33PM +0200, Paul Menzel wrote:
>>>>> I think, the IOMMU is enabled on the MSI B350M MORTAR, but
>>>>> otherwise, yes
>>>>> this looks fine. The help text could also be updated to mention
>>>>> problems
>>>>> with AMD Raven devices.
>>>>
>>>> This is not only about Raven GPUs but, as Alex explained, pretty much
>>>> about every device which doesn't support a 48 bit DMA mask. I'll expand
>>>> that aspect in the changelog.
>>>
>>> In general, non-GPU devices that don't support a 48-bit DMA mask work
>>> fine (assuming they have set their DMA mask appropriately). It really
>>> depends on whether SWIOTLB will be able to satisfy the memory
>>> requirements of the driver when the IOMMU is not enabled or in
>>> passthrough mode. Since GPU devices need/use a lot of memory, that
>>> becomes a problem.
>>
>> How can I check that?
>
> How can you check what? 32-bit DMA devices? GPUs? I need a bit more
> information...
How can I check why MEM_ENCRYPT is not working on my device despite the
IOMMU being enabled?
Kind regards,
Paul
On 10/11/21 9:21 AM, Paul Menzel wrote:
> Dear Tom,
>
>
> Am 11.10.21 um 15:58 schrieb Tom Lendacky:
>> On 10/11/21 8:52 AM, Paul Menzel wrote:
>
>>> Am 11.10.21 um 15:27 schrieb Tom Lendacky:
>>>> On 10/11/21 8:11 AM, Borislav Petkov wrote:
>>>>> On Mon, Oct 11, 2021 at 03:05:33PM +0200, Paul Menzel wrote:
>>>>>> I think, the IOMMU is enabled on the MSI B350M MORTAR, but
>>>>>> otherwise, yes
>>>>>> this looks fine. The help text could also be updated to mention
>>>>>> problems
>>>>>> with AMD Raven devices.
>>>>>
>>>>> This is not only about Raven GPUs but, as Alex explained, pretty much
>>>>> about every device which doesn't support a 48 bit DMA mask. I'll expand
>>>>> that aspect in the changelog.
>>>>
>>>> In general, non-GPU devices that don't support a 48-bit DMA mask work
>>>> fine (assuming they have set their DMA mask appropriately). It really
>>>> depends on whether SWIOTLB will be able to satisfy the memory
>>>> requirements of the driver when the IOMMU is not enabled or in
>>>> passthrough mode. Since GPU devices need/use a lot of memory, that
>>>> becomes a problem.
>>>
>>> How can I check that?
>>
>> How can you check what? 32-bit DMA devices? GPUs? I need a bit more
>> information...
>
> How can I check, why MEM_ENCRYPT is not working on my device despite the
> IOMMU being enabled.
I believe Alex already explained that. The log message in your original
report comes from commit:
ea68573d408f ("drm/amdgpu: Fail to load on RAVEN if SME is active")
Thanks,
Tom
>
>
> Kind regards,
>
> Paul
On Mon, Oct 11, 2021 at 10:21 AM Paul Menzel <[email protected]> wrote:
>
> Dear Tom,
>
>
> Am 11.10.21 um 15:58 schrieb Tom Lendacky:
> > On 10/11/21 8:52 AM, Paul Menzel wrote:
>
> >> Am 11.10.21 um 15:27 schrieb Tom Lendacky:
> >>> On 10/11/21 8:11 AM, Borislav Petkov wrote:
> >>>> On Mon, Oct 11, 2021 at 03:05:33PM +0200, Paul Menzel wrote:
> >>>>> I think, the IOMMU is enabled on the MSI B350M MORTAR, but
> >>>>> otherwise, yes
> >>>>> this looks fine. The help text could also be updated to mention
> >>>>> problems
> >>>>> with AMD Raven devices.
> >>>>
> >>>> This is not only about Raven GPUs but, as Alex explained, pretty much
> >>>> about every device which doesn't support a 48 bit DMA mask. I'll expand
> >>>> that aspect in the changelog.
> >>>
> >>> In general, non-GPU devices that don't support a 48-bit DMA mask work
> >>> fine (assuming they have set their DMA mask appropriately). It really
> >>> depends on whether SWIOTLB will be able to satisfy the memory
> >>> requirements of the driver when the IOMMU is not enabled or in
> >>> passthrough mode. Since GPU devices need/use a lot of memory, that
> >>> becomes a problem.
> >>
> >> How can I check that?
> >
> > How can you check what? 32-bit DMA devices? GPUs? I need a bit more
> > information...
>
> How can I check, why MEM_ENCRYPT is not working on my device despite the
> IOMMU being enabled.
I think there are several potential problem cases:
1. Device is in passthrough mode in the IOMMU and the device has a
limited DMA mask. This could be due to a hardware requirement (e.g.,
IOMMUv2 functionality) or a hardware/platform requirement (e.g., ACPI
IOMMU tables define passthrough for a specific device or memory
region). This is the case for Raven.
2. Device driver bug (e.g., driver not using the DMA API properly)
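
As a rough aid for case 1, the following Python sketch lists which devices
sit in an IOMMU group whose default domain is passthrough. It assumes the
running kernel exposes /sys/kernel/iommu_groups/<group>/type and reports
passthrough as "identity"; the helper names are made up for illustration,
and the sketch says nothing about case 2.

```python
from pathlib import Path


def iommu_group_types(root="/sys/kernel/iommu_groups"):
    """Return {device name: default domain type} read from sysfs.

    Assumption: each group directory holds a 'type' file and a
    'devices' directory, as on recent kernels. An empty dict means
    no IOMMU groups are exposed (IOMMU off or unsupported).
    """
    result = {}
    base = Path(root)
    if not base.is_dir():
        return result
    for group in sorted(base.iterdir(), key=lambda p: p.name):
        type_file = group / "type"
        devices = group / "devices"
        if not type_file.is_file() or not devices.is_dir():
            continue
        domain_type = type_file.read_text().strip()
        for dev in sorted(devices.iterdir()):
            result[dev.name] = domain_type
    return result


def passthrough_devices(types):
    """Devices whose group type is 'identity', i.e. IOMMU passthrough."""
    return sorted(dev for dev, t in types.items() if t == "identity")
```

Calling passthrough_devices(iommu_group_types()) on the affected machine
and looking for the GPU's PCI address (0000:26:00.0 in the original report)
would show whether that device bypasses IOMMU translation.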
Alex
>
>
> Kind regards,
>
> Paul
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: 711885906b5c2df90746a51f4cd674f1ab9fbb1d
Gitweb: https://git.kernel.org/tip/711885906b5c2df90746a51f4cd674f1ab9fbb1d
Author: Borislav Petkov <[email protected]>
AuthorDate: Wed, 06 Oct 2021 19:34:55 +02:00
Committer: Borislav Petkov <[email protected]>
CommitterDate: Mon, 11 Oct 2021 19:14:22 +02:00
x86/Kconfig: Do not enable AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT automatically
This Kconfig option was added initially so that memory encryption is
enabled by default on machines which support it.
However, devices which have DMA masks that are less than the bit
position of the encryption bit, aka C-bit, require the use of an IOMMU
or the use of SWIOTLB.
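
The mask comparison described above can be sketched in Python as follows.
This is an illustration, not the kernel's code: the C-bit position of 47 is
an assumption chosen to match the 48-bit DMA mask mentioned in this thread
(the real value is reported by CPUID leaf 0x8000001F), and the function
names are hypothetical.

```python
def dma_bit_mask(bits):
    """All-ones mask of the given width, like the kernel's DMA_BIT_MASK(n)."""
    return (1 << bits) - 1


def needs_iommu_or_swiotlb(dev_mask_bits, c_bit_pos=47):
    """True if a device cannot address memory with the C-bit set.

    An encrypted DMA address has bit c_bit_pos set, so the device's
    mask must reach beyond the bits below the C-bit. c_bit_pos=47 is
    an assumed value, not read from hardware.
    """
    below_c_bit = dma_bit_mask(c_bit_pos)  # covers bits 0..c_bit_pos-1
    return dma_bit_mask(dev_mask_bits) <= below_c_bit


# Hypothetical devices: a 32-bit and a 44-bit DMA mask fall short,
# while a 48-bit or 64-bit mask covers the assumed C-bit position.
for width in (32, 44, 48, 64):
    print(width, needs_iommu_or_swiotlb(width))
```

Under these assumptions, any device whose mask is 47 bits or narrower ends
up depending on the IOMMU or on SWIOTLB bounce buffers.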
If the IOMMU is disabled or in passthrough mode, the kernel would switch
to SWIOTLB bounce-buffering for those transfers.
In order to avoid that,
2cc13bb4f59f ("iommu: Disable passthrough mode when SME is active")
disables the default IOMMU passthrough mode so that devices for which the
default 256K DMA is insufficient, can use the IOMMU instead.
However 2, there are cases where the IOMMU is disabled in the BIOS, etc.
(think the usual hardware folk "oops, I dropped the ball there" cases) or a
driver doesn't properly use the DMA APIs or a device has a firmware or
hardware bug, e.g.:
ea68573d408f ("drm/amdgpu: Fail to load on RAVEN if SME is active")
However 3, in the above GPU use case, there are APIs like Vulkan and
some OpenGL/OpenCL extensions which are under the assumption that
user-allocated memory can be passed in to the kernel driver and both the
GPU and CPU can do coherent and concurrent access to the same memory.
That cannot work with SWIOTLB bounce buffers, of course.
So, in order for those devices to function, drop the "default y" for the
SME by default active option so that users who want to have SME enabled,
will need to either enable it in their config or use "mem_encrypt=on" on
the kernel command line.
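
The resulting behaviour can be modelled with a small Python sketch: the
Kconfig option supplies the default, and a "mem_encrypt=" parameter on the
command line overrides it. The function name is hypothetical, and letting
the last occurrence win is an assumption of this sketch. It only models
what is requested and ignores whether the hardware supports SME.

```python
def sme_requested(cmdline, active_by_default):
    """Return True if SME is requested for this boot.

    cmdline           -- kernel command line as one string
    active_by_default -- value of AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT
    """
    state = active_by_default
    for token in cmdline.split():
        if token == "mem_encrypt=on":
            state = True
        elif token == "mem_encrypt=off":
            state = False
    return state


# With the "default y" dropped, SME stays off unless asked for.
print(sme_requested("root=/dev/sda1 quiet", active_by_default=False))
print(sme_requested("root=/dev/sda1 mem_encrypt=on", active_by_default=False))
```

The contents of /proc/cmdline can be passed as the first argument to apply
the same check to a running system.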
[ tlendacky: Generalize commit message. ]
Fixes: 7744ccdbc16f ("x86/mm: Add Secure Memory Encryption (SME) support")
Reported-by: Paul Menzel <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Acked-by: Tom Lendacky <[email protected]>
Cc: <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
arch/x86/Kconfig | 1 -
1 file changed, 1 deletion(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index bd70e8a..d9830e7 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1525,7 +1525,6 @@ config AMD_MEM_ENCRYPT
 
 config AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT
 	bool "Activate AMD Secure Memory Encryption (SME) by default"
-	default y
 	depends on AMD_MEM_ENCRYPT
 	help
 	  Say yes to have system memory encrypted by default if running on