2022-03-31 02:45:28

by Alex Xu (Hello71)

Subject: "dma-mapping: remove CONFIG_DMA_REMAP" causes AMD SME boot fail

Hi,

After a recent kernel update, booting one of my machines causes it to
hang on a black screen. Pressing Lock keys on the USB keyboard does not
turn on the indicators, and the machine does not appear on the Ethernet
network. I don't have a serial port on this machine. I didn't try
netconsole, but I suspect it won't work.

Setting mem_encrypt=0 seems to resolve the issue. Reverting f5ff79fddf0e
("dma-mapping: remove CONFIG_DMA_REMAP") also appears to resolve the
issue.

The machine in question has an AMD Ryzen 5 1600 and ASRock B450 Pro4.

Cheers,
Alex.


2022-03-31 02:55:22

by Christoph Hellwig

Subject: Re: "dma-mapping: remove CONFIG_DMA_REMAP" causes AMD SME boot fail

On Wed, Mar 30, 2022 at 01:51:07PM -0400, Alex Xu (Hello71) wrote:
> Hi,
>
> After a recent kernel update, booting one of my machines causes it to
> hang on a black screen. Pressing Lock keys on the USB keyboard does not
> turn on the indicators, and the machine does not appear on the Ethernet
> network. I don't have a serial port on this machine. I didn't try
> netconsole, but I suspect it won't work.
>
> Setting mem_encrypt=0 seems to resolve the issue. Reverting f5ff79fddf0e
> ("dma-mapping: remove CONFIG_DMA_REMAP") also appears to resolve the
> issue.
>
> The machine in question has an AMD Ryzen 5 1600 and ASRock B450 Pro4.

This looks like something in the AMD IOMMU code or its users can't
deal with vmalloc addresses. I'll start looking for a culprit ASAP.

2022-03-31 03:41:20

by Alex Xu (Hello71)

Subject: Re: "dma-mapping: remove CONFIG_DMA_REMAP" causes AMD SME boot fail

Excerpts from Christoph Hellwig's message of March 30, 2022 2:01 pm:
> Can you try this patch, which is a bit of a hack?
>
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index 50d209939c66c..61997c2ee0a17 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -28,7 +28,8 @@ bool force_dma_unencrypted(struct device *dev)
>  	 * device does not support DMA to addresses that include the
>  	 * encryption mask.
>  	 */
> -	if (cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT)) {
> +	if (cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT) &&
> +	    !get_dma_ops(dev)) {
>  		u64 dma_enc_mask = DMA_BIT_MASK(__ffs64(sme_me_mask));
>  		u64 dma_dev_mask = min_not_zero(dev->coherent_dma_mask,
>  						dev->bus_dma_limit);
>

This seems to work for me.

Cheers,
Alex.

2022-03-31 04:35:17

by Christoph Hellwig

Subject: Re: "dma-mapping: remove CONFIG_DMA_REMAP" causes AMD SME boot fail

Can you try this patch, which is a bit of a hack?

diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 50d209939c66c..61997c2ee0a17 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -28,7 +28,8 @@ bool force_dma_unencrypted(struct device *dev)
 	 * device does not support DMA to addresses that include the
 	 * encryption mask.
 	 */
-	if (cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT)) {
+	if (cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT) &&
+	    !get_dma_ops(dev)) {
 		u64 dma_enc_mask = DMA_BIT_MASK(__ffs64(sme_me_mask));
 		u64 dma_dev_mask = min_not_zero(dev->coherent_dma_mask,
 						dev->bus_dma_limit);
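
For readers following along, the check this hunk touches can be modeled in user space. This is only a sketch under stated assumptions: `model_dev`, `model_force_dma_unencrypted` and the helper functions are hypothetical stand-ins for the kernel's types and macros, the final `<=` comparison lies outside the quoted hunk and is assumed here, and the guest-encryption branch of the real function is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the kernel's DMA_BIT_MASK(n): the low n bits set. */
static uint64_t dma_bit_mask(unsigned int n)
{
	return n >= 64 ? ~UINT64_C(0) : (UINT64_C(1) << n) - 1;
}

/* Stand-in for __ffs64(): index of the lowest set bit. */
static unsigned int ffs64(uint64_t word)
{
	unsigned int bit = 0;

	assert(word != 0); /* undefined for zero, as in the kernel */
	while (!(word & 1)) {
		word >>= 1;
		bit++;
	}
	return bit;
}

/* Stand-in for min_not_zero(): the smaller value, ignoring zeros. */
static uint64_t min_not_zero_u64(uint64_t a, uint64_t b)
{
	if (a == 0)
		return b;
	if (b == 0)
		return a;
	return a < b ? a : b;
}

/* Hypothetical model of the struct device fields the check reads. */
struct model_dev {
	uint64_t coherent_dma_mask;
	uint64_t bus_dma_limit;
	bool has_dma_ops; /* models get_dma_ops(dev) != NULL */
};

/*
 * Model of the host-encryption branch. sme_me_mask == 0 models
 * cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT) being false.
 * skip_if_dma_ops selects the patched condition from the hunk above.
 */
static bool model_force_dma_unencrypted(const struct model_dev *dev,
					uint64_t sme_me_mask,
					bool skip_if_dma_ops)
{
	uint64_t dma_enc_mask, dma_dev_mask;

	if (sme_me_mask == 0)
		return false;
	if (skip_if_dma_ops && dev->has_dma_ops)
		return false;

	dma_enc_mask = dma_bit_mask(ffs64(sme_me_mask));
	dma_dev_mask = min_not_zero_u64(dev->coherent_dma_mask,
					dev->bus_dma_limit);

	/* Assumed: the device cannot address the encryption bit. */
	return dma_dev_mask <= dma_enc_mask;
}
```

Taking bit 47 as an example encryption bit (an illustrative value, not one stated in this thread), a device with a 32-bit coherent mask is forced to unencrypted DMA by the unpatched condition, while the patched condition skips the check for any device that has DMA ops installed.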

2022-03-31 05:15:32

by Christoph Hellwig

Subject: Re: "dma-mapping: remove CONFIG_DMA_REMAP" causes AMD SME boot fail

On Wed, Mar 30, 2022 at 03:17:20PM -0400, Alex Xu (Hello71) wrote:
> Excerpts from Christoph Hellwig's message of March 30, 2022 2:01 pm:
> > Can you try this patch, which is a bit of a hack?
> >
> > diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> > index 50d209939c66c..61997c2ee0a17 100644
> > --- a/arch/x86/mm/mem_encrypt.c
> > +++ b/arch/x86/mm/mem_encrypt.c
> > @@ -28,7 +28,8 @@ bool force_dma_unencrypted(struct device *dev)
> >  	 * device does not support DMA to addresses that include the
> >  	 * encryption mask.
> >  	 */
> > -	if (cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT)) {
> > +	if (cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT) &&
> > +	    !get_dma_ops(dev)) {
> >  		u64 dma_enc_mask = DMA_BIT_MASK(__ffs64(sme_me_mask));
> >  		u64 dma_dev_mask = min_not_zero(dev->coherent_dma_mask,
> >  						dev->bus_dma_limit);
> >
>
> This seems to work for me.

Ok, I'll try to come up with a less hacky version and will start a
discussion with the AMD folks that know memory encryption better.
Thanks for the report and testing already!

2022-03-31 08:25:10

by Thorsten Leemhuis

Subject: Re: "dma-mapping: remove CONFIG_DMA_REMAP" causes AMD SME boot fail

[TLDR: I'm adding the regression report below to regzbot, the Linux
kernel regression tracking bot; all text you find below is compiled from
a few template paragraphs you might have encountered already
in similar mails.]

Hi, this is your Linux kernel regression tracker. Sending this just to
the lists, as it's already handled.

On 30.03.22 19:51, Alex Xu (Hello71) wrote:
>
> After a recent kernel update, booting one of my machines causes it to
> hang on a black screen. Pressing Lock keys on the USB keyboard does not
> turn on the indicators, and the machine does not appear on the Ethernet
> network. I don't have a serial port on this machine. I didn't try
> netconsole, but I suspect it won't work.
>
> Setting mem_encrypt=0 seems to resolve the issue. Reverting f5ff79fddf0e
> ("dma-mapping: remove CONFIG_DMA_REMAP") also appears to resolve the
> issue.
>
> The machine in question has an AMD Ryzen 5 1600 and ASRock B450 Pro4.

To be sure the issue below doesn't fall through the cracks unnoticed, I'm
adding it to regzbot, my Linux kernel regression tracking bot:

#regzbot ^introduced f5ff79fddf0e
#regzbot title dma: "dma-mapping: remove CONFIG_DMA_REMAP" causes AMD SME boot fail
#regzbot ignore-activity

If it turns out this isn't a regression, feel free to remove it from the
tracking by sending a reply to this thread containing a paragraph like
"#regzbot invalid: reason why this is invalid" (without the quotes).

Ciao, Thorsten

2022-04-16 01:44:19

by Thorsten Leemhuis

Subject: Re: "dma-mapping: remove CONFIG_DMA_REMAP" causes AMD SME boot fail #forregzbot

TWIMC: this mail is primarily sent for documentation purposes and for
regzbot, my Linux kernel regression tracking bot. These mails usually
contain '#forregzbot' in the subject, to make them easy to spot and filter.

#regzbot fixed-by: 4fe87e818ea492ade079cc0

On 31.03.22 08:51, Thorsten Leemhuis wrote:
> [TLDR: I'm adding the regression report below to regzbot, the Linux
> kernel regression tracking bot; all text you find below is compiled from
> a few template paragraphs you might have encountered already
> in similar mails.]
>
> Hi, this is your Linux kernel regression tracker. Sending this just to
> the lists, as it's already handled.
>
> On 30.03.22 19:51, Alex Xu (Hello71) wrote:
>>
>> After a recent kernel update, booting one of my machines causes it to
>> hang on a black screen. Pressing Lock keys on the USB keyboard does not
>> turn on the indicators, and the machine does not appear on the Ethernet
>> network. I don't have a serial port on this machine. I didn't try
>> netconsole, but I suspect it won't work.
>>
>> Setting mem_encrypt=0 seems to resolve the issue. Reverting f5ff79fddf0e
>> ("dma-mapping: remove CONFIG_DMA_REMAP") also appears to resolve the
>> issue.
>>
>> The machine in question has an AMD Ryzen 5 1600 and ASRock B450 Pro4.
>
> To be sure the issue below doesn't fall through the cracks unnoticed, I'm
> adding it to regzbot, my Linux kernel regression tracking bot:
>
> #regzbot ^introduced f5ff79fddf0e
> #regzbot title dma: "dma-mapping: remove CONFIG_DMA_REMAP" causes AMD SME boot fail
> #regzbot ignore-activity
>
> If it turns out this isn't a regression, feel free to remove it from the
> tracking by sending a reply to this thread containing a paragraph like
> "#regzbot invalid: reason why this is invalid" (without the quotes).
>
> Ciao, Thorsten