2020-05-06 14:55:37

by Uwe Kleine-König

Subject: Failure to shutdown/reboot with intel_iommu=on

Hello,

On my Lenovo T460p I cannot shutdown and reboot when the iommu is
enabled. This is using linux 5.2.7 as provided by Debian, 5.6.4 has the
same problem. Suspend/resume also fails; I suspect this is the same
issue.

When requesting power off the kernel messages just end with:

sd 0:0:0:0: [sda] Synchronizing SCSI cache
sd 0:0:0:0: [sda] Stopping disk
e1000e: EEE TX LPI TIMER: 00000011
ACPI: Preparing to enter system sleep state S5
reboot: Power down
acpi_power_off called

(photo at https://www.kleine-koenig.org/tmp/uklsiommu.jpg in case I
mistyped something. Full dmesg and lspci -vvv at
https://www.kleine-koenig.org/tmp/uklsiommu.tar.gz with and without
iommu enabled.)

With the iommu disabled (CONFIG_INTEL_IOMMU_DEFAULT_ON unset or
intel_iommu=off on cmdline) the machine just works as expected
(including working suspend/resume).
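
For comparing the two configurations, here is a minimal C sketch that classifies the intel_iommu= parameter on a kernel command line (for example the contents of /proc/cmdline). The names classify_intel_iommu and enum iommu_arg are made up for this illustration and are not kernel interfaces. It only recognises "on" and "off" as the first comma-separated option, and it assumes the last occurrence wins. When the parameter is absent, the default depends on CONFIG_INTEL_IOMMU_DEFAULT_ON as described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of the intel_iommu= kernel parameter. */
enum iommu_arg { IOMMU_ARG_ABSENT, IOMMU_ARG_ON, IOMMU_ARG_OFF, IOMMU_ARG_OTHER };

/*
 * Scan a whitespace-separated command line and classify the last
 * intel_iommu= token. Only the first comma-separated option of the
 * value is inspected; anything other than "on"/"off" is IOMMU_ARG_OTHER.
 */
enum iommu_arg classify_intel_iommu(const char *cmdline)
{
    static const char key[] = "intel_iommu=";
    const size_t keylen = sizeof(key) - 1;
    enum iommu_arg result = IOMMU_ARG_ABSENT;
    const char *p = cmdline;

    assert(cmdline != NULL);

    while (*p) {
        /* Skip separators, then measure the next token. */
        p += strspn(p, " \t\n");
        size_t toklen = strcspn(p, " \t\n");

        if (toklen >= keylen && strncmp(p, key, keylen) == 0) {
            const char *val = p + keylen;
            size_t vallen = toklen - keylen;
            size_t optlen = 0;

            /* Length of the first comma-separated option. */
            while (optlen < vallen && val[optlen] != ',')
                optlen++;

            if (optlen == 2 && strncmp(val, "on", 2) == 0)
                result = IOMMU_ARG_ON;
            else if (optlen == 3 && strncmp(val, "off", 3) == 0)
                result = IOMMU_ARG_OFF;
            else
                result = IOMMU_ARG_OTHER;
        }
        p += toklen;
    }
    return result;
}
```

Feeding it the command line of the failing boot and of the working boot is a quick way to confirm which configuration a given dmesg belongs to.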

I already talked to tglx on irc but unfortunately no new insights
resulted from that.

Any ideas how to fix or continue debugging the issue?

Best regards
Uwe



2020-05-08 15:10:47

by Jörg Rödel

Subject: Re: Failure to shutdown/reboot with intel_iommu=on

+ Baolu, Maintainer of Intel IOMMU

Baolu, does that ring any bells?

On Wed, May 06, 2020 at 04:46:02PM +0200, Uwe Kleine-König wrote:
> Hello,
>
> On my Lenovo T460p I cannot shutdown and reboot when the iommu is
> enabled. This is using linux 5.2.7 as provided by Debian, 5.6.4 has the
> same problem. Suspend/resume also fails; I suspect this is the same
> issue.
>
> When requesting power off the kernel messages just end with:
>
> sd 0:0:0:0: [sda] Synchronizing SCSI cache
> sd 0:0:0:0: [sda] Stopping disk
> e1000e: EEE TX LPI TIMER: 00000011
> ACPI: Preparing to enter system sleep state S5
> reboot: Power down
> acpi_power_off called
>
> (photo at https://www.kleine-koenig.org/tmp/uklsiommu.jpg in case I
> mistyped something. Full dmesg and lspci -vvv at
> https://www.kleine-koenig.org/tmp/uklsiommu.tar.gz with and without
> iommu enabled.)
>
> With the iommu disabled (CONFIG_INTEL_IOMMU_DEFAULT_ON unset or
> intel_iommu=off on cmdline) the machine just works as expected
> (including working suspend/resume).
>
> I already talked to tglx on irc but unfortunately no new insights
> resulted from that.
>
> Any ideas how to fix or continue debugging the issue?
>
> Best regards
> Uwe


2020-05-09 02:05:45

by Lu Baolu

Subject: Re: Failure to shutdown/reboot with intel_iommu=on

Hi Uwe,

Have you tried commenting out intel_disable_iommus() in
intel_iommu_shutdown()?

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 0182cff2c7ac..532e62600f95 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4928,8 +4928,10 @@ void intel_iommu_shutdown(void)
        for_each_iommu(iommu, drhd)
                iommu_disable_protect_mem_regions(iommu);

+#if 0
        /* Make sure the IOMMUs are switched off */
        intel_disable_iommus();
+#endif

        up_write(&dmar_global_lock);
 }

Best regards,
baolu

On 5/8/20 11:07 PM, Joerg Roedel wrote:
> + Baolu, Maintainer of Intel IOMMU
>
> Baolu, does that ring any bells?
>
> On Wed, May 06, 2020 at 04:46:02PM +0200, Uwe Kleine-König wrote:
>> Hello,
>>
>> On my Lenovo T460p I cannot shutdown and reboot when the iommu is
>> enabled. This is using linux 5.2.7 as provided by Debian, 5.6.4 has the
>> same problem. Suspend/resume also fails; I suspect this is the same
>> issue.
>>
>> When requesting power off the kernel messages just end with:
>>
>> sd 0:0:0:0: [sda] Synchronizing SCSI cache
>> sd 0:0:0:0: [sda] Stopping disk
>> e1000e: EEE TX LPI TIMER: 00000011
>> ACPI: Preparing to enter system sleep state S5
>> reboot: Power down
>> acpi_power_off called
>>
>> (photo at https://www.kleine-koenig.org/tmp/uklsiommu.jpg in case I
>> mistyped something. Full dmesg and lspci -vvv at
>> https://www.kleine-koenig.org/tmp/uklsiommu.tar.gz with and without
>> iommu enabled.)
>>
>> With the iommu disabled (CONFIG_INTEL_IOMMU_DEFAULT_ON unset or
>> intel_iommu=off on cmdline) the machine just works as expected
>> (including working suspend/resume).
>>
>> I already talked to tglx on irc but unfortunately no new insights
>> resulted from that.
>>
>> Any ideas how to fix or continue debugging the issue?
>>
>> Best regards
>> Uwe
>
>

2020-05-11 13:45:54

by Lenny Szubowicz

Subject: Re: Failure to shutdown/reboot with intel_iommu=on

On 5/8/20 11:07 AM, Joerg Roedel wrote:
> + Baolu, Maintainer of Intel IOMMU
>
> Baolu, does that ring any bells?
>
> On Wed, May 06, 2020 at 04:46:02PM +0200, Uwe Kleine-König wrote:
>> Hello,
>>
>> On my Lenovo T460p I cannot shutdown and reboot when the iommu is
>> enabled. This is using linux 5.2.7 as provided by Debian, 5.6.4 has the
>> same problem. Suspend/resume also fails; I suspect this is the same
>> issue.
>>
>> When requesting power off the kernel messages just end with:
>>
>> sd 0:0:0:0: [sda] Synchronizing SCSI cache
>> sd 0:0:0:0: [sda] Stopping disk
>> e1000e: EEE TX LPI TIMER: 00000011
>> ACPI: Preparing to enter system sleep state S5
>> reboot: Power down
>> acpi_power_off called
>>
>> (photo at https://www.kleine-koenig.org/tmp/uklsiommu.jpg in case I
>> mistyped something. Full dmesg and lspci -vvv at
>> https://www.kleine-koenig.org/tmp/uklsiommu.tar.gz with and without
>> iommu enabled.)
>>
>> With the iommu disabled (CONFIG_INTEL_IOMMU_DEFAULT_ON unset or
>> intel_iommu=off on cmdline) the machine just works as expected
>> (including working suspend/resume).
>>
>> I already talked to tglx on irc but unfortunately no new insights
>> resulted from that.
>>
>> Any ideas how to fix or continue debugging the issue?
>>
>> Best regards
>> Uwe
>
>

I suspect that you have TPM 2.x functionality enabled in the BIOS/firmware.

Unless you are actually using the TPM, try setting it to TPM 1.2 mode.
I've seen an incompatibility on other Lenovo laptops between using the
IOMMU, TPM 2.x implementation in firmware, and shutdown/suspend.

-Lenny.

2020-05-11 20:01:31

by Uwe Kleine-König

Subject: Re: Failure to shutdown/reboot with intel_iommu=on

On 5/9/20 3:58 AM, Lu Baolu wrote:
> Hi Uwe,
>
> Have you tried commenting out intel_disable_iommus() in
> intel_iommu_shutdown()?
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 0182cff2c7ac..532e62600f95 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -4928,8 +4928,10 @@ void intel_iommu_shutdown(void)
>         for_each_iommu(iommu, drhd)
>                 iommu_disable_protect_mem_regions(iommu);
>
> +#if 0
>         /* Make sure the IOMMUs are switched off */
>         intel_disable_iommus();
> +#endif
>
>         up_write(&dmar_global_lock);
>  }

I just tested that and it didn't help. The machine still hangs with the
same symptoms as reported before.

Best regards
Uwe



2020-05-11 20:02:51

by Uwe Kleine-König

Subject: Re: Failure to shutdown/reboot with intel_iommu=on

Hello Lenny,

On 5/11/20 3:43 PM, Lenny Szubowicz wrote:
> I suspect that you have TPM 2.x functionality enabled in the BIOS/firmware.

Indeed.

> Unless you are actually using the TPM, try setting it to TPM 1.2 mode.
> I've seen an incompatibility on other Lenovo laptops between using the
> IOMMU, TPM 2.x implementation in firmware, and shutdown/suspend.

When setting it to TPM 1.2 reboot works again. Didn't test poweroff and
suspend/resume yet.

Best regards
Uwe



2020-05-11 20:21:01

by Uwe Kleine-König

Subject: Re: Failure to shutdown/reboot with intel_iommu=on

Hello again,

On Mon, May 11, 2020 at 09:59:31PM +0200, Uwe Kleine-König wrote:
> On 5/9/20 3:58 AM, Lu Baolu wrote:
> > Hi Uwe,
> >
> > Have you tried commenting out intel_disable_iommus() in
> > intel_iommu_shutdown()?
> >
> > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> > index 0182cff2c7ac..532e62600f95 100644
> > --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -4928,8 +4928,10 @@ void intel_iommu_shutdown(void)
> >         for_each_iommu(iommu, drhd)
> >                 iommu_disable_protect_mem_regions(iommu);
> >
> > +#if 0
> >         /* Make sure the IOMMUs are switched off */
> >         intel_disable_iommus();
> > +#endif
> >
> >         up_write(&dmar_global_lock);
> >  }
>
> I just tested that and it didn't help. The machine still hangs with the
> same symptoms as reported before.

I patched the file a bit differently:

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index ef0a5246700e..b76acae6a6ac 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4922,16 +4922,24 @@ void intel_iommu_shutdown(void)
        if (no_iommu || dmar_disabled)
                return;

+       pr_warn("%s:%d\n", __func__, __LINE__);
        down_write(&dmar_global_lock);

+       pr_warn("%s:%d\n", __func__, __LINE__);
        /* Disable PMRs explicitly here. */
-       for_each_iommu(iommu, drhd)
+       for_each_iommu(iommu, drhd) {
+               pr_warn("%s:%d\n", __func__, __LINE__);
                iommu_disable_protect_mem_regions(iommu);
+               pr_warn("%s:%d\n", __func__, __LINE__);
+       }

+       pr_warn("%s:%d\n", __func__, __LINE__);
        /* Make sure the IOMMUs are switched off */
        intel_disable_iommus();

+       pr_warn("%s:%d\n", __func__, __LINE__);
        up_write(&dmar_global_lock);
+       pr_warn("%s:%d\n", __func__, __LINE__);
 }

static inline struct intel_iommu *dev_to_intel_iommu(struct device *dev)

and the output shows that the for_each_iommu loop runs twice and the
last pr_warn is reached, too. So the hang doesn't occur in
intel_iommu_shutdown() but later.

I don't know enough about x86 and iommus to judge what that means or
even if this was a useful test.

Best regards
Uwe



2020-05-12 13:38:31

by Jörg Rödel

Subject: Re: Failure to shutdown/reboot with intel_iommu=on

On Mon, May 11, 2020 at 09:43:11AM -0400, Lenny Szubowicz wrote:
> I suspect that you have TPM 2.x functionality enabled in the BIOS/firmware.
>
> Unless you are actually using the TPM, try setting it to TPM 1.2 mode.
> I've seen an incompatibility on other Lenovo laptops between using the
> IOMMU, TPM 2.x implementation in firmware, and shutdown/suspend.

Interesting, has this been debugged further into the TPM code?


Joerg

2020-05-12 20:03:43

by Lenny Szubowicz

Subject: Re: Failure to shutdown/reboot with intel_iommu=on

On 5/12/20 9:34 AM, Joerg Roedel wrote:
> On Mon, May 11, 2020 at 09:43:11AM -0400, Lenny Szubowicz wrote:
>> I suspect that you have TPM 2.x functionality enabled in the BIOS/firmware.
>>
>> Unless you are actually using the TPM, try setting it to TPM 1.2 mode.
>> I've seen an incompatibility on other Lenovo laptops between using the
>> IOMMU, TPM 2.x implementation in firmware, and shutdown/suspend.
>
> Interesting, has this been debugged further into the TPM code?
>
>
> Joerg
>

I believe the problem is in the Lenovo firmware and not in the kernel.

There are essentially two problems:
1. TPM 2.0 doesn't work when the IOMMU is enabled
2. Suspend/shutdown hangs when problem 1 is encountered on boot

Lenovo's firmware implementation of TPM 2.0 functionality on some of their
laptops uses DMA. When you ask the kernel to enable the IOMMU, this DMA
access is correctly blocked by the IOMMU hardware. If you look at your
dmesg log from when you have TPM 2.0 and the IOMMU enabled, there are
TPM timeout messages that indicate the inability to initialize and use
the TPM capability.

The hang on shutdown or S3 suspend appears to be in firmware, i.e.
after the kernel has transferred control back to the firmware.
It makes no difference if the kernel actively shuts down the IOMMU
before transferring control to the firmware on a suspend or shutdown.
The hang still occurs.

My guess is that the firmware wants to do some TPM related processing
on shutdown and suspend and can't handle the TPM state that exists
due to the startup failure. But that's just a guess. I don't know
what the firmware is actually doing.

Some Lenovo laptops provide an ACPI DMAR RMRR that identifies the memory
range that the kernel should open up for permissible DMA access
for this purpose. Unfortunately, the PCI device that performs these
DMA operations is hidden from the kernel by the BIOS. Given that the
associated PCI device is hidden, the Linux kernel does not act upon
the associated DMAR RMRR.
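
To check whether a given machine's firmware publishes such an RMRR, the raw DMAR table can be walked by hand. The following is a minimal sketch, assuming the table layout from the VT-d specification: a 48-byte table header followed by remapping structures that each start with a 16-bit type and a 16-bit length, with type 1 being an RMRR that carries segment, base and limit at offsets 6, 8 and 16. The names dmar_find_rmrrs and struct rmrr_region are invented for this example and are not kernel functions. The device scope entries that follow each RMRR are not decoded here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DMAR_HEADER_LEN 48u  /* 36-byte ACPI header + width, flags, 10 reserved */
#define DMAR_TYPE_RMRR  1u   /* Reserved Memory Region Reporting structure */
#define RMRR_FIXED_LEN  24u  /* RMRR fields before the device scope entries */

/* Hypothetical result record for one RMRR structure. */
struct rmrr_region {
    uint16_t segment;
    uint64_t base;
    uint64_t limit;
};

/* Little-endian field readers; ACPI tables are little-endian. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint64_t rd64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/*
 * Walk a raw DMAR table and copy up to `max` RMRR regions into `out`.
 * Returns the number of RMRR structures found, or -1 if the table is
 * malformed. `out` may be NULL to only count.
 */
int dmar_find_rmrrs(const uint8_t *tbl, size_t len,
                    struct rmrr_region *out, size_t max)
{
    assert(tbl != NULL);
    if (len < DMAR_HEADER_LEN || memcmp(tbl, "DMAR", 4) != 0)
        return -1;

    /* The header's Length field bounds the walk; reject it if it
     * is too small or exceeds the buffer we were given. */
    size_t end = rd32(tbl + 4);
    if (end < DMAR_HEADER_LEN || end > len)
        return -1;

    int found = 0;
    size_t off = DMAR_HEADER_LEN;
    while (off + 4 <= end) {
        uint16_t type = rd16(tbl + off);
        uint16_t slen = rd16(tbl + off + 2);

        if (slen < 4 || slen > end - off)
            return -1;
        if (type == DMAR_TYPE_RMRR) {
            if (slen < RMRR_FIXED_LEN)
                return -1;
            if (out != NULL && (size_t)found < max) {
                out[found].segment = rd16(tbl + off + 6);
                out[found].base = rd64(tbl + off + 8);
                out[found].limit = rd64(tbl + off + 16);
            }
            found++;
        }
        off += slen;
    }
    return found;
}
```

On Linux the table is typically readable by root at /sys/firmware/acpi/tables/DMAR, so reading that file into a buffer and passing it to the function above lists the reserved ranges the firmware expects to stay accessible.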

-Lenny.


2020-05-12 21:15:23

by Jörg Rödel

Subject: Re: Failure to shutdown/reboot with intel_iommu=on

Hi Lenny,

On Tue, May 12, 2020 at 04:00:26PM -0400, Lenny Szubowicz wrote:
> Some Lenovo laptops provide an ACPI DMAR RMRR that identifies the memory
> range that the kernel should open up for permissible DMA access
> for this purpose. Unfortunately, the PCI device that performs these
> DMA operations is hidden from the kernel by the BIOS. Given that the
> associated PCI device is hidden, the Linux kernel does not act upon
> the associated DMAR RMRR.

That sounds awful. We should teach the VT-d driver to set up RMRR
mappings for request IDs which are not present as a PCI device, to
fix the laptops which provide such an RMRR.
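
As a rough illustration of the selection step only, and not the VT-d driver's actual code: a PCI request ID packs bus, device and function into 16 bits, so the RMRR targets that have no visible PCI device can be found by comparing the request IDs named in RMRR device scopes against those of enumerated devices. The function names below are hypothetical, and the sketch assumes each device scope has already been resolved to a single bus/device/function, which ignores paths that go through bridges.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Pack bus (8 bits), device (5 bits) and function (3 bits) into a request ID. */
uint16_t pci_request_id(uint8_t bus, uint8_t dev, uint8_t fn)
{
    assert(dev < 32 && fn < 8);
    return (uint16_t)((bus << 8) | (dev << 3) | fn);
}

/* Linear search; fine for the handful of entries a DMAR table holds. */
static bool id_present(uint16_t id, const uint16_t *present, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (present[i] == id)
            return true;
    return false;
}

/*
 * Hypothetical helper: collect the request IDs named by RMRR device
 * scopes that do not match any enumerated PCI device. Up to `max` IDs
 * are stored in `out` (which may be NULL); the return value is the
 * total number of hidden IDs.
 */
size_t rmrr_hidden_ids(const uint16_t *rmrr_ids, size_t n_rmrr,
                       const uint16_t *present, size_t n_present,
                       uint16_t *out, size_t max)
{
    size_t hidden = 0;

    for (size_t i = 0; i < n_rmrr; i++) {
        if (id_present(rmrr_ids[i], present, n_present))
            continue;
        if (out != NULL && hidden < max)
            out[hidden] = rmrr_ids[i];
        hidden++;
    }
    return hidden;
}
```

Each ID returned this way would be a candidate for an identity mapping of the RMRR's base..limit range, which is the part that would have to live in the driver itself.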

For the others, is the region the TPM talks to via DMA known so that we
can add a quirk?


Joerg