Accessing guest physical addresses at 0xFFFD_0000_0000 and above causes
a failure on AMD processors because those addresses are reserved by
HyperTransport (this is not documented). Avoid selftests failures
by reserving those guest physical addresses.
Fixes: ef4c9f4f6546 ("KVM: selftests: Fix 32-bit truncation of vm_get_max_gfn()")
Cc: [email protected]
Cc: David Matlack <[email protected]>
Reported-by: Maxim Levitsky <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
tools/testing/selftests/kvm/lib/kvm_util.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 10a8ed691c66..d995cc9836ee 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -309,6 +309,12 @@ struct kvm_vm *vm_create(enum vm_guest_mode mode, uint64_t phy_pages, int perm)
/* Limit physical addresses to PA-bits. */
vm->max_gfn = ((1ULL << vm->pa_bits) >> vm->page_shift) - 1;
+#ifdef __x86_64__
+ /* Avoid reserved HyperTransport region on AMD processors. */
+ if (vm->pa_bits == 48)
+ vm->max_gfn = 0xfffcfffff;
+#endif
+
/* Allocate and setup memory for guest. */
vm->vpages_mapped = sparsebit_alloc();
if (phy_pages != 0)
--
2.27.0
On Thu, 2021-08-05 at 06:54 -0400, Paolo Bonzini wrote:
> Accessing guest physical addresses at 0xFFFD_0000_0000 and above causes
> a failure on AMD processors because those addresses are reserved by
> HyperTransport (this is not documented). Avoid selftests failures
> by reserving those guest physical addresses.
>
> Fixes: ef4c9f4f6546 ("KVM: selftests: Fix 32-bit truncation of vm_get_max_gfn()")
> Cc: [email protected]
> Cc: David Matlack <[email protected]>
> Reported-by: Maxim Levitsky <[email protected]>
> Signed-off-by: Paolo Bonzini <[email protected]>
> ---
> tools/testing/selftests/kvm/lib/kvm_util.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
> index 10a8ed691c66..d995cc9836ee 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -309,6 +309,12 @@ struct kvm_vm *vm_create(enum vm_guest_mode mode, uint64_t phy_pages, int perm)
> /* Limit physical addresses to PA-bits. */
> vm->max_gfn = ((1ULL << vm->pa_bits) >> vm->page_shift) - 1;
>
> +#ifdef __x86_64__
> + /* Avoid reserved HyperTransport region on AMD processors. */
> + if (vm->pa_bits == 48)
> + vm->max_gfn = 0xfffcfffff;
> +#endif
> +
> /* Allocate and setup memory for guest. */
> vm->vpages_mapped = sparsebit_alloc();
> if (phy_pages != 0)
I probably would have restricted this workaround to AMD vendor string,
but I don't mind this to be like that as well at least for now.
Reviewed-by: Maxim Levitsky <[email protected]>
Best regards,
Maxim Levitsky
On 8/5/21 11:54 AM, Paolo Bonzini wrote:
> Accessing guest physical addresses at 0xFFFD_0000_0000 and above causes
> a failure on AMD processors because those addresses are reserved by
> HyperTransport (this is not documented).
Oh, but it's actually documented in the AMD IOMMU manual [0] (and AMD IOMMU in linux do
mark it as a reserved IOVA region i.e. HT_RANGE_START..HT_RANGE_END). And it's usually
marked as a reserved type in E820. At least on the machines I've seen.
See manual section '2.1.2 IOMMU Logical Topology':
"Special address controls in Table 3 are interpreted against untranslated guest physical
addressess (GPA) that lack a PASID TLP prefix."
Base Address Top Address Use
FD_0000_0000h FD_F7FF_FFFFh Reserved interrupt address space
FD_F800_0000h FD_F8FF_FFFFh Interrupt/EOI IntCtl
FD_F900_0000h FD_F90F_FFFFh Legacy PIC IACK
FD_F910_0000h FD_F91F_FFFFh System Management
FD_F920_0000h FD_FAFF_FFFFh Reserved Page Tables
FD_FB00_0000h FD_FBFF_FFFFh Address Translation
FD_FC00_0000h FD_FDFF_FFFFh I/O Space
FD_FE00_0000h FD_FFFF_FFFFh Configuration
FE_0000_0000h FE_1FFF_FFFFh Extended Configuration/Device Messages
FE_2000_0000h FF_FFFF_FFFFh Reserved
It covers the range starting that address you fixed up ... up to 1Tb, fwiw.
You mark it ~1010G as max gfn so shouldn't be a problem.
[0] https://www.amd.com/system/files/TechDocs/48882_IOMMU.pdf
> Avoid selftests failures
> by reserving those guest physical addresses.
>
> Fixes: ef4c9f4f6546 ("KVM: selftests: Fix 32-bit truncation of vm_get_max_gfn()")
> Cc: [email protected]
> Cc: David Matlack <[email protected]>
> Reported-by: Maxim Levitsky <[email protected]>
> Signed-off-by: Paolo Bonzini <[email protected]>
> ---
> tools/testing/selftests/kvm/lib/kvm_util.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
> index 10a8ed691c66..d995cc9836ee 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -309,6 +309,12 @@ struct kvm_vm *vm_create(enum vm_guest_mode mode, uint64_t phy_pages, int perm)
> /* Limit physical addresses to PA-bits. */
> vm->max_gfn = ((1ULL << vm->pa_bits) >> vm->page_shift) - 1;
>
> +#ifdef __x86_64__
> + /* Avoid reserved HyperTransport region on AMD processors. */
> + if (vm->pa_bits == 48)
> + vm->max_gfn = 0xfffcfffff;
> +#endif
> +
Not sure if it's worth the trouble having a macro with the same name as AMD iommu like:
#define HT_RANGE_START (0xfd00000000ULL)
#define MAX_GFN (HT_RANGE_START - 1ULL)
#ifdef __x86_64__
/* Avoid reserved HyperTransport region on AMD processors. */
if (vm->pa_bits == 48)
vm->max_gfn = MAX_GFN;
#endif
It's a detail, but *perhaps* would help people grepping around it.
Also, not sure if checking against AMD cpuid vendor is worth, considering this is
a limitation only on AMD.
> /* Allocate and setup memory for guest. */
> vm->vpages_mapped = sparsebit_alloc();
> if (phy_pages != 0)
>
On 06/08/21 12:57, Joao Martins wrote:
> Base Address Top Address Use
>
> FD_0000_0000h FD_F7FF_FFFFh Reserved interrupt address space
> FD_F800_0000h FD_F8FF_FFFFh Interrupt/EOI IntCtl
> FD_F900_0000h FD_F90F_FFFFh Legacy PIC IACK
> FD_F910_0000h FD_F91F_FFFFh System Management
> FD_F920_0000h FD_FAFF_FFFFh Reserved Page Tables
> FD_FB00_0000h FD_FBFF_FFFFh Address Translation
> FD_FC00_0000h FD_FDFF_FFFFh I/O Space
> FD_FE00_0000h FD_FFFF_FFFFh Configuration
> FE_0000_0000h FE_1FFF_FFFFh Extended Configuration/Device Messages
> FE_2000_0000h FF_FFFF_FFFFh Reserved
The problems we're seeing are with FFFD_0000_0000h. How does the IOMMU
interpret bits 40-47 of the address?
Paolo
On Fri, 2021-08-06 at 11:57 +0100, Joao Martins wrote:
>
> On 8/5/21 11:54 AM, Paolo Bonzini wrote:
> > Accessing guest physical addresses at 0xFFFD_0000_0000 and above causes
> > a failure on AMD processors because those addresses are reserved by
> > HyperTransport (this is not documented).
>
> Oh, but it's actually documented in the AMD IOMMU manual [0] (and AMD IOMMU in linux do
> mark it as a reserved IOVA region i.e. HT_RANGE_START..HT_RANGE_END). And it's usually
> marked as a reserved type in E820. At least on the machines I've seen.
>
> See manual section '2.1.2 IOMMU Logical Topology':
>
> "Special address controls in Table 3 are interpreted against untranslated guest physical
> addressess (GPA) that lack a PASID TLP prefix."
>
> Base Address Top Address Use
>
> FD_0000_0000h FD_F7FF_FFFFh Reserved interrupt address space
> FD_F800_0000h FD_F8FF_FFFFh Interrupt/EOI IntCtl
> FD_F900_0000h FD_F90F_FFFFh Legacy PIC IACK
> FD_F910_0000h FD_F91F_FFFFh System Management
> FD_F920_0000h FD_FAFF_FFFFh Reserved Page Tables
> FD_FB00_0000h FD_FBFF_FFFFh Address Translation
> FD_FC00_0000h FD_FDFF_FFFFh I/O Space
> FD_FE00_0000h FD_FFFF_FFFFh Configuration
> FE_0000_0000h FE_1FFF_FFFFh Extended Configuration/Device Messages
> FE_2000_0000h FF_FFFF_FFFFh Reserved
>
> It covers the range starting that address you fixed up ... up to 1Tb, fwiw.
>
> You mark it ~1010G as max gfn so shouldn't be a problem.
>
> [0] https://www.amd.com/system/files/TechDocs/48882_IOMMU.pdf
>
> > Avoid selftests failures
> > by reserving those guest physical addresses.
> >
> > Fixes: ef4c9f4f6546 ("KVM: selftests: Fix 32-bit truncation of vm_get_max_gfn()")
> > Cc: [email protected]
> > Cc: David Matlack <[email protected]>
> > Reported-by: Maxim Levitsky <[email protected]>
> > Signed-off-by: Paolo Bonzini <[email protected]>
> > ---
> > tools/testing/selftests/kvm/lib/kvm_util.c | 6 ++++++
> > 1 file changed, 6 insertions(+)
> >
> > diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
> > index 10a8ed691c66..d995cc9836ee 100644
> > --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> > +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> > @@ -309,6 +309,12 @@ struct kvm_vm *vm_create(enum vm_guest_mode mode, uint64_t phy_pages, int perm)
> > /* Limit physical addresses to PA-bits. */
> > vm->max_gfn = ((1ULL << vm->pa_bits) >> vm->page_shift) - 1;
> >
> > +#ifdef __x86_64__
> > + /* Avoid reserved HyperTransport region on AMD processors. */
> > + if (vm->pa_bits == 48)
> > + vm->max_gfn = 0xfffcfffff;
> > +#endif
> > +
>
> Not sure if it's worth the trouble having a macro with the same name as AMD iommu like:
>
> #define HT_RANGE_START (0xfd00000000ULL)
> #define MAX_GFN (HT_RANGE_START - 1ULL)
>
> #ifdef __x86_64__
> /* Avoid reserved HyperTransport region on AMD processors. */
> if (vm->pa_bits == 48)
> vm->max_gfn = MAX_GFN;
> #endif
I guess now that we know that it is documented, it is worth it,
to remove '== 48' check and add check for an AMD cpu, and add reference
to this manual.
I am mentioning the 48 bit check because I have seen that AMD just recently
posted 5 level NPT support, so I guess CPUs which > 48 bit max physical address
are also probably on horison.
And long term solution for this I guess is to add these areas to a blacklist
and avoid them.
Best regards,
Maxim Levitsky
>
> It's a detail, but *perhaps* would help people grepping around it.
>
> Also, not sure if checking against AMD cpuid vendor is worth, considering this is
> a limitation only on AMD.
>
>
> > /* Allocate and setup memory for guest. */
> > vm->vpages_mapped = sparsebit_alloc();
> > if (phy_pages != 0)
> >
On 09/08/21 12:00, Joao Martins wrote:
> [0]https://developer.amd.com/wp-content/resources/56323-PUB_0.78.pdf
>
> 1286 Spurious #GP May Occur When Hypervisor Running on
> Another Hypervisor
>
> Description
>
> The processor may incorrectly generate a #GP fault if a hypervisor running on a hypervisor
> attempts to access the following secure memory areas:
>
> • The reserved memory address region starting at FFFD_0000_0000h and extending up to
> FFFF_FFFF_FFFFh.
> • ASEG and TSEG memory regions for SMM (System Management Mode)
> • MMIO APIC Space
This errata took a few months to debug so we're quite familiar with it
:) but I only knew about the ASEG/TSEG/APIC cases.
So this HyperTransport region is not related to this issue, but the
errata does point out that FFFD_0000_0000h and upwards is special in guests.
The Xen folks also had to deal with it only a couple months ago
(https://yhbt.net/lore/all/[email protected]/):
From "Open-Source Register Reference for AMD Family 17h Processors
(PUB)":
https://developer.amd.com/wp-content/resources/56255_3_03.PDF
"The processor defines a reserved memory address region starting at
FFFD_0000_0000h and extending up to FFFF_FFFF_FFFFh."
It's still doesn't say that it's at the top of physical address space
although I understand that's how it's now implemented. The official
document doesn't confirm it will move along with physical address space
extension.
[...]
1) On parts with <40 bits, its fully hidden from software
2) Before Fam17h, it was always 12G just below 1T, even if there was
more RAM above this location
3) On Fam17h and later, it is variable based on SME, and is either
just below 2^48 (no encryption) or 2^43 (encryption)
> It's
> interesting that fn8000_000A EDX[28] is part of the reserved bits from that CPUID leaf.
It's only been defined after AMD deemed that the errata was not fixable
in current generation processors); it's X86_FEATURE_SVME_ADDR_CHK now.
I'll update the patch based on the findings from the Xen team.
Paolo
On 8/9/21 10:03 AM, Paolo Bonzini wrote:
> On 06/08/21 12:57, Joao Martins wrote:
>> Base Address Top Address Use
>>
>> FD_0000_0000h FD_F7FF_FFFFh Reserved interrupt address space
>> FD_F800_0000h FD_F8FF_FFFFh Interrupt/EOI IntCtl
>> FD_F900_0000h FD_F90F_FFFFh Legacy PIC IACK
>> FD_F910_0000h FD_F91F_FFFFh System Management
>> FD_F920_0000h FD_FAFF_FFFFh Reserved Page Tables
>> FD_FB00_0000h FD_FBFF_FFFFh Address Translation
>> FD_FC00_0000h FD_FDFF_FFFFh I/O Space
>> FD_FE00_0000h FD_FFFF_FFFFh Configuration
>> FE_0000_0000h FE_1FFF_FFFFh Extended Configuration/Device Messages
>> FE_2000_0000h FF_FFFF_FFFFh Reserved
>
> The problems we're seeing are with FFFD_0000_0000h. How does the IOMMU
> interpret bits 40-47 of the address?
The IOMMU supposedly makes an invalid device request iommu fault event when touching the
reserved interrupt address space. But that presumes performing DMA from/to device with
said IOVA=fdXXXXXXXX (doesn't matter if it's guest or baremetal).
Oh wait. You asked fffd******** not 00fd******** which then the spec quote I gave doesn't
document what you're aiming for. Ugh, sorry if I mislead you and Maxim. My mind eliminated
the starting 0xff when reading the address given the similarity.
But I've seen that address range somewhere. This errata might help [0]? But anyways, it
doesn't explain why how this is interpreted, just that it is 'reserved'. It doesn't look
to be HyperTransport address range either as that would be the 1010..1023G range (close to
1T) as documented by the IOMMU manual, not the 256T boundary (fffd********). It's
interesting that fn8000_000A EDX[28] is part of the reserved bits from that CPUID leaf.
[0] https://developer.amd.com/wp-content/resources/56323-PUB_0.78.pdf
1286 Spurious #GP May Occur When Hypervisor Running on
Another Hypervisor
Description
The processor may incorrectly generate a #GP fault if a hypervisor running on a hypervisor
attempts to access the following secure memory areas:
• The reserved memory address region starting at FFFD_0000_0000h and extending up to
FFFF_FFFF_FFFFh.
• ASEG and TSEG memory regions for SMM (System Management Mode)
• MMIO APIC Space
Potential Effect on System
Software running a hypervisor on a hypervisor may encounter an unexpected #GP fault.
Suggested Workaround
If CPUID bit fn8000_000A EDX[28] = 0b, then:
• Hypervisor software can trap #GP faults that potentially have this issue and ignore #GP
faults that were
erroneously generated.
If CPUID bit fn8000_000A EDX[28] = 1b, then the issue has been fixed and no workaround is
necessary.
Fix Planned
No fix planned
On Mon, Aug 09, 2021, Paolo Bonzini wrote:
> So this HyperTransport region is not related to this issue, but the errata
> does point out that FFFD_0000_0000h and upwards is special in guests.
>
> The Xen folks also had to deal with it only a couple months ago
> (https://yhbt.net/lore/all/[email protected]/):
>
> From "Open-Source Register Reference for AMD Family 17h Processors (PUB)":
> https://developer.amd.com/wp-content/resources/56255_3_03.PDF
>
> "The processor defines a reserved memory address region starting at
> FFFD_0000_0000h and extending up to FFFF_FFFF_FFFFh."
>
> It's still doesn't say that it's at the top of physical address space
> although I understand that's how it's now implemented. The official
> document doesn't confirm it will move along with physical address space
> extension.
>
> [...]
>
> 1) On parts with <40 bits, its fully hidden from software
> 2) Before Fam17h, it was always 12G just below 1T, even if there was
> more RAM above this location
> 3) On Fam17h and later, it is variable based on SME, and is either
> just below 2^48 (no encryption) or 2^43 (encryption)
>
> > It's interesting that fn8000_000A EDX[28] is part of the reserved bits from
> > that CPUID leaf.
>
> It's only been defined after AMD deemed that the errata was not fixable in
> current generation processors); it's X86_FEATURE_SVME_ADDR_CHK now.
>
> I'll update the patch based on the findings from the Xen team.
So, about that update... :-)