A kdump kernel boot regression was reported on an HPE system. Bisection
points to commit 387caf0b759ac43 ("iommu/amd: Treat per-device
exclusion ranges as r/w unity-mapped regions") as the culprit. Reverting
it fixes the failure.

With the commit applied, the kdump kernel always prints the error message
below, and the AMD IOMMU then cannot function normally during kdump kernel
bootup.

~~~~~~~~~
AMD-Vi: [Firmware Bug]: IVRS invalid checksum

Why commit 387caf0b759ac43 causes this has not been determined yet.
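
For context, the "IVRS invalid checksum" message refers to the standard ACPI table checksum rule: all bytes of a table, checksum byte included, must sum to zero modulo 256. A minimal user-space sketch of that rule follows. The helper names acpi_table_sum() and check_table_checksum() are hypothetical and are not the driver's own functions.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Sum every byte of an ACPI table modulo 256 (uint8_t arithmetic wraps).
 * A well-formed table, checksum byte included, sums to zero.
 */
static uint8_t acpi_table_sum(const uint8_t *table, size_t length)
{
	uint8_t sum = 0;

	for (size_t i = 0; i < length; i++)
		sum += table[i];

	return sum;
}

/* Hypothetical helper: returns 0 if the table sums to zero, -1 otherwise. */
static int check_table_checksum(const uint8_t *table, size_t length)
{
	if (acpi_table_sum(table, length) != 0) {
		fprintf(stderr, "invalid checksum\n");
		return -1;
	}

	return 0;
}
```

Under this rule, a single changed byte in the table seen by the kdump kernel is enough to make the check fail.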

The commit log links to a discussion thread. In that thread, Adrian said
the fix is for a system whose BIOS is already broken, and Joerg suggested
two options. Option 2) was eventually taken. Maybe option 1) would be the
right approach?

1) Bail out and disable the IOMMU as the BIOS screwed up
2) Treat per-device exclusion ranges just as r/w unity-mapped
regions.
https://lists.linuxfoundation.org/pipermail/iommu/2019-November/040117.html
Signed-off-by: Baoquan He <[email protected]>
---
drivers/iommu/amd/init.c | 21 +++++++++++++--------
1 file changed, 13 insertions(+), 8 deletions(-)
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 9aa1eae26634..bbe7ceae5949 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1109,17 +1109,22 @@ static int __init add_early_maps(void)
  */
 static void __init set_device_exclusion_range(u16 devid, struct ivmd_header *m)
 {
+	struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
+
 	if (!(m->flags & IVMD_FLAG_EXCL_RANGE))
 		return;

-	/*
-	 * Treat per-device exclusion ranges as r/w unity-mapped regions
-	 * since some buggy BIOSes might lead to the overwritten exclusion
-	 * range (exclusion_start and exclusion_length members). This
-	 * happens when there are multiple exclusion ranges (IVMD entries)
-	 * defined in ACPI table.
-	 */
-	m->flags = (IVMD_FLAG_IW | IVMD_FLAG_IR | IVMD_FLAG_UNITY_MAP);
+	if (iommu) {
+		/*
+		 * We only can configure exclusion ranges per IOMMU, not
+		 * per device. But we can enable the exclusion range per
+		 * device. This is done here
+		 */
+		set_dev_entry_bit(devid, DEV_ENTRY_EX);
+		iommu->exclusion_start = m->range_start;
+		iommu->exclusion_length = m->range_length;
+	}
+
 }

 /*
--
2.17.2
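
The comment removed by this patch gives the motivation for commit 387caf0b759ac43: the exclusion range is stored once per IOMMU, so when the ACPI table defines several IVMD exclusion entries, each assignment overwrites the previous range. A minimal sketch of that effect follows. The structures and set_exclusion_range() are simplified, hypothetical stand-ins rather than kernel code, and the addresses in the usage note are invented for illustration.

```c
#include <stdint.h>

/* Simplified stand-in for an IVMD entry; only the fields used here. */
struct ivmd_entry {
	uint64_t range_start;
	uint64_t range_length;
};

/* Simplified stand-in for the per-IOMMU state: a single range slot. */
struct iommu_model {
	uint64_t exclusion_start;
	uint64_t exclusion_length;
};

/*
 * Mirrors the two assignments restored by the patch: the IOMMU keeps
 * one exclusion range, so a later entry replaces an earlier one.
 */
static void set_exclusion_range(struct iommu_model *iommu,
				const struct ivmd_entry *m)
{
	iommu->exclusion_start = m->range_start;
	iommu->exclusion_length = m->range_length;
}
```

Calling set_exclusion_range() first with an entry at 0x1000 and then with an entry at 0x8000 leaves only the 0x8000 range recorded, which is the overwrite that the removed comment describes.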
Forgot to CC Jerry; adding him now.
On 09/23/20 at 10:26am, Baoquan He wrote:
> A regression failure of kdump kernel boot was reported on a HPE system.
> Bisect points at commit 387caf0b759ac43 ("iommu/amd: Treat per-device
> exclusion ranges as r/w unity-mapped regions") as criminal. Reverting it
> fix the failure.
>
> With the commit, kdump kernel will always print below error message, then
> naturally AMD iommu can't function normally during kdump kernel bootup.
>
> ~~~~~~~~~
> AMD-Vi: [Firmware Bug]: IVRS invalid checksum
>
> Why commit 387caf0b759ac43 causing it haven't been made clear.
Hi Joerg, Adrian
We have only one machine that can reproduce the issue, an HPE gen10-01.
If any logs or other information are needed, please let me know and I
can attach them here.
Thanks
Baoquan
Hi Baoquan,
> -----Original Message-----
> From: Baoquan He <[email protected]>
> Sent: Wednesday, September 23, 2020 10:33 AM
> To: [email protected]; Adrian Huang12 <[email protected]>
> Cc: [email protected]; [email protected];
> [email protected]
> Subject: [External] Re: [PATCH] Revert "iommu/amd: Treat per-device exclusion
> ranges as r/w unity-mapped regions"
>
> Forgot CC-ing Jerry, add him.
>
> On 09/23/20 at 10:26am, Baoquan He wrote:
> > A regression failure of kdump kernel boot was reported on a HPE system.
> > Bisect points at commit 387caf0b759ac43 ("iommu/amd: Treat per-device
> > exclusion ranges as r/w unity-mapped regions") as criminal. Reverting
> > it fix the failure.
> >
> > With the commit, kdump kernel will always print below error message,
> > then naturally AMD iommu can't function normally during kdump kernel
> bootup.
> >
> > ~~~~~~~~~
> > AMD-Vi: [Firmware Bug]: IVRS invalid checksum
> >
> > Why commit 387caf0b759ac43 causing it haven't been made clear.
>
> Hi Joerg, Adrian
>
> We only have one machine which can reproduce the issue, it's a gen10-01 of
> HPE. If any log or info are needed, please let me know, I can attach here.
Could you please provide the following info?
1. The boot logs of both the system kernel and the kdump kernel, with the
   kernel parameter 'amd_iommu_dump' appended
2. The ACPI tables (# acpidump > acpi-table); please send the resulting
   file 'acpi-table'.
-- Adrian
On Wed, Sep 23, 2020 at 10:26:55AM +0800, Baoquan He wrote:
> A regression failure of kdump kernel boot was reported on a HPE system.
> Bisect points at commit 387caf0b759ac43 ("iommu/amd: Treat per-device
> exclusion ranges as r/w unity-mapped regions") as criminal. Reverting it
> fix the failure.
>
> With the commit, kdump kernel will always print below error message, then
> naturally AMD iommu can't function normally during kdump kernel bootup.
>
> ~~~~~~~~~
> AMD-Vi: [Firmware Bug]: IVRS invalid checksum
>
> Why commit 387caf0b759ac43 causing it haven't been made clear.
I think this should be debugged further. In future IOMMUs the exclusion
range feature will no longer be available (the MMIO fields get re-used
for SNP), so starting to use exclusion ranges again is not going to work.
Regards,
Joerg