2022-04-14 19:52:11

by Peter Xu

Subject: [PATCH] kvm: selftests: Fix cut-off of addr_gva2gpa lookup

Our QE team reported test failure on access_tracking_perf_test:

Testing guest mode: PA-bits:ANY, VA-bits:48, 4K pages
guest physical test memory offset: 0x3fffbffff000

Populating memory : 0.684014577s
Writing to populated memory : 0.006230175s
Reading from populated memory : 0.004557805s
==== Test Assertion Failure ====
lib/kvm_util.c:1411: false
pid=125806 tid=125809 errno=4 - Interrupted system call
1 0x0000000000402f7c: addr_gpa2hva at kvm_util.c:1411
2 (inlined by) addr_gpa2hva at kvm_util.c:1405
3 0x0000000000401f52: lookup_pfn at access_tracking_perf_test.c:98
4 (inlined by) mark_vcpu_memory_idle at access_tracking_perf_test.c:152
5 (inlined by) vcpu_thread_main at access_tracking_perf_test.c:232
6 0x00007fefe9ff81ce: ?? ??:0
7 0x00007fefe9c64d82: ?? ??:0
No vm physical memory at 0xffbffff000

And I can easily reproduce it on an Intel(R) Xeon(R) CPU E5-2630 with 46-bit
PA.

It turns out that the address translation for clearing idle page tracking
returned wrong result, in which addr_gva2gpa()'s last step should have
treated "pte[index[0]].pfn" to be a 32bit value. In above case the GPA
address 0x3fffbffff000 got cut-off into 0xffbffff000, then it caused
further lookup failure in the gpa2hva mapping.

I didn't yet check any other test that may fail too on some hosts, but
logically any test using addr_gva2gpa() could suffer.

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2075036
Signed-off-by: Peter Xu <[email protected]>
---
tools/testing/selftests/kvm/lib/x86_64/processor.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
index 9f000dfb5594..6c356fb4a9bf 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
@@ -587,7 +587,7 @@ vm_paddr_t addr_gva2gpa(struct kvm_vm *vm, vm_vaddr_t gva)
if (!pte[index[0]].present)
goto unmapped_gva;

- return (pte[index[0]].pfn * vm->page_size) + (gva & 0xfffu);
+ return ((vm_paddr_t)pte[index[0]].pfn * vm->page_size) + (gva & 0xfffu);

unmapped_gva:
TEST_FAIL("No mapping for vm virtual address, gva: 0x%lx", gva);
--
2.32.0


2022-04-15 03:50:15

by Paolo Bonzini

Subject: Re: [PATCH] kvm: selftests: Fix cut-off of addr_gva2gpa lookup

On 4/14/22 17:05, Peter Xu wrote:
>
> So sadly it's only gcc that's not working properly with the bitfields.. at
> least in my minimum test here.

Apparently both behaviors are allowed.

Paolo

2022-04-15 07:24:40

by Peter Xu

Subject: Re: [PATCH] kvm: selftests: Fix cut-off of addr_gva2gpa lookup

On Thu, Apr 14, 2022 at 03:01:04PM -0700, Jim Mattson wrote:
> On Thu, Apr 14, 2022 at 2:36 PM Peter Xu <[email protected]> wrote:
> >
> > On Thu, Apr 14, 2022 at 04:14:22PM +0200, Paolo Bonzini wrote:
> > > On 4/14/22 15:56, Sean Christopherson wrote:
> > > > > - return (pte[index[0]].pfn * vm->page_size) + (gva & 0xfffu);
> > > > > + return ((vm_paddr_t)pte[index[0]].pfn * vm->page_size) + (gva & 0xfffu);
> > > > This is but one of many paths that can get burned by pfn being 40 bits. The
> > > > most backport friendly fix is probably to add a pfn=>gpa helper and use that to
> > > > place the myriad "pfn * vm->page_size" instances.
> > > >
> > > > For a true long term solution, my vote is to do away with the bit field struct
> > > > and use #define'd masks and whatnot.
> > >
> > > Yes, bitfields larger than 32 bits are a mess.
> >
> > It's very interesting to know this..
>
> I don't think the undefined behavior is restricted to extended bit-fields.
> Even for regular bit-fields, the C99 spec says, "A bit-field shall have a
> type that is a qualified or unqualified version of _Bool, signed int,
> unsigned int, or some other implementation-defined type." One might assume
> that even the permissive final clause refers to fundamental language types,
> but I suppose "n-bit integer" meets the strict definition of a "type," for
> arbitrary values of n.

Fair enough.

I just noticed such a behavior actually makes some sense: consider A*B where
A is the bitfield and B is an int (=32 bits; page_size in the test case, or a
constant that will also be treated as int/uint).

The smaller operand is simply extended to the size of the bigger one: a
40-bit * 32-bit multiply goes into a 40-bit result, which needs proper
masking if calculated in RAX, while e.g. a 20-bit * 32-bit multiply goes
into 32 bits, in which case no further masking is needed.

Thanks,

--
Peter Xu

2022-04-15 12:38:42

by Sean Christopherson

Subject: Re: [PATCH] kvm: selftests: Fix cut-off of addr_gva2gpa lookup

On Wed, Apr 13, 2022, Peter Xu wrote:
> Our QE team reported test failure on access_tracking_perf_test:
>
> Testing guest mode: PA-bits:ANY, VA-bits:48, 4K pages
> guest physical test memory offset: 0x3fffbffff000
>
> Populating memory : 0.684014577s
> Writing to populated memory : 0.006230175s
> Reading from populated memory : 0.004557805s
> ==== Test Assertion Failure ====
> lib/kvm_util.c:1411: false
> pid=125806 tid=125809 errno=4 - Interrupted system call
> 1 0x0000000000402f7c: addr_gpa2hva at kvm_util.c:1411
> 2 (inlined by) addr_gpa2hva at kvm_util.c:1405
> 3 0x0000000000401f52: lookup_pfn at access_tracking_perf_test.c:98
> 4 (inlined by) mark_vcpu_memory_idle at access_tracking_perf_test.c:152
> 5 (inlined by) vcpu_thread_main at access_tracking_perf_test.c:232
> 6 0x00007fefe9ff81ce: ?? ??:0
> 7 0x00007fefe9c64d82: ?? ??:0
> No vm physical memory at 0xffbffff000
>
> And I can easily reproduce it with a Intel(R) Xeon(R) CPU E5-2630 with 46
> bits PA.
>
> It turns out that the address translation for clearing idle page tracking
> returned wrong result, in which addr_gva2gpa()'s last step should have

"should have" is very misleading, that makes it sound like the address was
intentionally truncated. Or did you mean "should have been treated as 64-bit
value"?

> treated "pte[index[0]].pfn" to be a 32bit value.

It didn't get treated as a 32-bit value, it got treated as a 40-bit value, because
the pfn is stored as 40 bits.

struct pageTableEntry {
	uint64_t present:1;
	uint64_t writable:1;
	uint64_t user:1;
	uint64_t write_through:1;
	uint64_t cache_disable:1;
	uint64_t accessed:1;
	uint64_t dirty:1;
	uint64_t reserved_07:1;
	uint64_t global:1;
	uint64_t ignored_11_09:3;
	uint64_t pfn:40; <================
	uint64_t ignored_62_52:11;
	uint64_t execute_disable:1;
};

> In above case the GPA
> address 0x3fffbffff000 got cut-off into 0xffbffff000, then it caused
> further lookup failure in the gpa2hva mapping.
>
> I didn't yet check any other test that may fail too on some hosts, but
> logically any test using addr_gva2gpa() could suffer.
>
> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2075036
> Signed-off-by: Peter Xu <[email protected]>
> ---
> tools/testing/selftests/kvm/lib/x86_64/processor.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
> index 9f000dfb5594..6c356fb4a9bf 100644
> --- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
> +++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
> @@ -587,7 +587,7 @@ vm_paddr_t addr_gva2gpa(struct kvm_vm *vm, vm_vaddr_t gva)
> if (!pte[index[0]].present)
> goto unmapped_gva;
>
> - return (pte[index[0]].pfn * vm->page_size) + (gva & 0xfffu);
> + return ((vm_paddr_t)pte[index[0]].pfn * vm->page_size) + (gva & 0xfffu);

This is but one of many paths that can get burned by pfn being 40 bits. The
most backport friendly fix is probably to add a pfn=>gpa helper and use that to
place the myriad "pfn * vm->page_size" instances.

For a true long term solution, my vote is to do away with the bit field struct
and use #define'd masks and whatnot.

2022-04-16 00:10:33

by Peter Xu

Subject: Re: [PATCH] kvm: selftests: Fix cut-off of addr_gva2gpa lookup

On Thu, Apr 14, 2022 at 01:56:12PM +0000, Sean Christopherson wrote:
> On Wed, Apr 13, 2022, Peter Xu wrote:
> > Our QE team reported test failure on access_tracking_perf_test:
> >
> > Testing guest mode: PA-bits:ANY, VA-bits:48, 4K pages
> > guest physical test memory offset: 0x3fffbffff000
> >
> > Populating memory : 0.684014577s
> > Writing to populated memory : 0.006230175s
> > Reading from populated memory : 0.004557805s
> > ==== Test Assertion Failure ====
> > lib/kvm_util.c:1411: false
> > pid=125806 tid=125809 errno=4 - Interrupted system call
> > 1 0x0000000000402f7c: addr_gpa2hva at kvm_util.c:1411
> > 2 (inlined by) addr_gpa2hva at kvm_util.c:1405
> > 3 0x0000000000401f52: lookup_pfn at access_tracking_perf_test.c:98
> > 4 (inlined by) mark_vcpu_memory_idle at access_tracking_perf_test.c:152
> > 5 (inlined by) vcpu_thread_main at access_tracking_perf_test.c:232
> > 6 0x00007fefe9ff81ce: ?? ??:0
> > 7 0x00007fefe9c64d82: ?? ??:0
> > No vm physical memory at 0xffbffff000
> >
> > And I can easily reproduce it with a Intel(R) Xeon(R) CPU E5-2630 with 46
> > bits PA.
> >
> > It turns out that the address translation for clearing idle page tracking
> > returned wrong result, in which addr_gva2gpa()'s last step should have
>
> "should have" is very misleading, that makes it sound like the address was
> intentionally truncated. Or did you mean "should have been treated as 64-bit
> value"?

No, I was simply being lazy yesterday as it was late, sorry. Obviously I
should have held off the patch until this morning, because I did plan to look
into this in more detail.

So sadly it's only gcc that's not handling the bitfields properly, at least
in my minimal test here.

---8<---
$ cat a.c
#include <stdio.h>

struct test1 {
	unsigned long a:24;
	unsigned long b:40;
};

int main(void)
{
	struct test1 val;

	val.b = 0x123456789a;
	printf("0x%lx\n", val.b * 16);
	return 0;
}
$ gcc -o a.gcc a.c
$ clang -o a.clang a.c
$ ./a.gcc
0x23456789a0
$ ./a.clang
0x123456789a0
$ objdump -d a.gcc | grep -A20 -w main
0000000000401126 <main>:
401126: 55 push %rbp
401127: 48 89 e5 mov %rsp,%rbp
40112a: 48 83 ec 10 sub $0x10,%rsp
40112e: 48 8b 45 f8 mov -0x8(%rbp),%rax
401132: 25 ff ff ff 00 and $0xffffff,%eax
401137: 48 89 c2 mov %rax,%rdx
40113a: 48 b8 00 00 00 9a 78 movabs $0x123456789a000000,%rax
401141: 56 34 12
401144: 48 09 d0 or %rdx,%rax
401147: 48 89 45 f8 mov %rax,-0x8(%rbp)
40114b: 48 8b 45 f8 mov -0x8(%rbp),%rax
40114f: 48 c1 e8 18 shr $0x18,%rax
401153: 48 c1 e0 04 shl $0x4,%rax
401157: 48 ba ff ff ff ff ff movabs $0xffffffffff,%rdx
40115e: 00 00 00
401161: 48 21 d0 and %rdx,%rax <-------------------
401164: 48 89 c6 mov %rax,%rsi
401167: bf 10 20 40 00 mov $0x402010,%edi
40116c: b8 00 00 00 00 mov $0x0,%eax
401171: e8 ba fe ff ff callq 401030 <printf@plt>
$ objdump -d a.clang | grep -A20 -w main
0000000000401130 <main>:
401130: 55 push %rbp
401131: 48 89 e5 mov %rsp,%rbp
401134: 48 83 ec 10 sub $0x10,%rsp
401138: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp)
40113f: 48 8b 45 f0 mov -0x10(%rbp),%rax
401143: 48 25 ff ff ff 00 and $0xffffff,%rax
401149: 48 b9 00 00 00 9a 78 movabs $0x123456789a000000,%rcx
401150: 56 34 12
401153: 48 09 c8 or %rcx,%rax
401156: 48 89 45 f0 mov %rax,-0x10(%rbp)
40115a: 48 8b 75 f0 mov -0x10(%rbp),%rsi
40115e: 48 c1 ee 18 shr $0x18,%rsi
401162: 48 c1 e6 04 shl $0x4,%rsi
401166: 48 bf 10 20 40 00 00 movabs $0x402010,%rdi
40116d: 00 00 00
401170: b0 00 mov $0x0,%al
401172: e8 b9 fe ff ff callq 401030 <printf@plt>
401177: 31 c0 xor %eax,%eax
401179: 48 83 c4 10 add $0x10,%rsp
40117d: 5d pop %rbp
---8<---

>
> > treated "pte[index[0]].pfn" to be a 32bit value.
>
> It didn't get treated as a 32-bit value, it got treated as a 40-bit value, because
> the pfn is stored as 40 bits.
>
> struct pageTableEntry {
> 	uint64_t present:1;
> 	uint64_t writable:1;
> 	uint64_t user:1;
> 	uint64_t write_through:1;
> 	uint64_t cache_disable:1;
> 	uint64_t accessed:1;
> 	uint64_t dirty:1;
> 	uint64_t reserved_07:1;
> 	uint64_t global:1;
> 	uint64_t ignored_11_09:3;
> 	uint64_t pfn:40; <================
> 	uint64_t ignored_62_52:11;
> 	uint64_t execute_disable:1;
> };
>
> > In above case the GPA
> > address 0x3fffbffff000 got cut-off into 0xffbffff000, then it caused
> > further lookup failure in the gpa2hva mapping.
> >
> > I didn't yet check any other test that may fail too on some hosts, but
> > logically any test using addr_gva2gpa() could suffer.
> >
> > Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2075036
> > Signed-off-by: Peter Xu <[email protected]>
> > ---
> > tools/testing/selftests/kvm/lib/x86_64/processor.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
> > index 9f000dfb5594..6c356fb4a9bf 100644
> > --- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
> > +++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
> > @@ -587,7 +587,7 @@ vm_paddr_t addr_gva2gpa(struct kvm_vm *vm, vm_vaddr_t gva)
> > if (!pte[index[0]].present)
> > goto unmapped_gva;
> >
> > - return (pte[index[0]].pfn * vm->page_size) + (gva & 0xfffu);
> > + return ((vm_paddr_t)pte[index[0]].pfn * vm->page_size) + (gva & 0xfffu);
>
> This is but one of many paths that can get burned by pfn being 40 bits. The
> most backport friendly fix is probably to add a pfn=>gpa helper and use that to
> place the myriad "pfn * vm->page_size" instances.

Yes, I'll respin.

>
> For a true long term solution, my vote is to do away with the bit field struct
> and use #define'd masks and whatnot.

I agree.

Thanks,

--
Peter Xu

2022-04-16 01:08:17

by Peter Xu

Subject: Re: [PATCH] kvm: selftests: Fix cut-off of addr_gva2gpa lookup

On Thu, Apr 14, 2022 at 04:14:22PM +0200, Paolo Bonzini wrote:
> On 4/14/22 15:56, Sean Christopherson wrote:
> > > - return (pte[index[0]].pfn * vm->page_size) + (gva & 0xfffu);
> > > + return ((vm_paddr_t)pte[index[0]].pfn * vm->page_size) + (gva & 0xfffu);
> > This is but one of many paths that can get burned by pfn being 40 bits. The
> > most backport friendly fix is probably to add a pfn=>gpa helper and use that to
> > place the myriad "pfn * vm->page_size" instances.
> >
> > For a true long term solution, my vote is to do away with the bit field struct
> > and use #define'd masks and whatnot.
>
> Yes, bitfields larger than 32 bits are a mess.

It's very interesting to know this..

I just tried it with a <32-bit bitfield, and indeed gcc behaves differently,
doing the calculation in 32 bits (eax) rather than 64 bits (rax).

The odd part is that for >=32 bits gcc emits an extra masking instruction,
which does not exist for the <32-bit bitfield.

---8<---
#include <stdio.h>

struct test1 {
	unsigned long a:${X};
	unsigned long b:10;
};

int main(void)
{
	struct test1 val;

	val.a = 0x1234;
	printf("0x%lx\n", val.a * 16);
	return 0;
}
---8<---
---8<---

When X=20:

0000000000401126 <main>:
401126: 55 push %rbp
401127: 48 89 e5 mov %rsp,%rbp
40112a: 48 83 ec 10 sub $0x10,%rsp
40112e: 8b 45 f8 mov -0x8(%rbp),%eax
401131: 25 00 00 f0 ff and $0xfff00000,%eax
401136: 0d 34 12 00 00 or $0x1234,%eax
40113b: 89 45 f8 mov %eax,-0x8(%rbp)
40113e: 8b 45 f8 mov -0x8(%rbp),%eax
401141: 25 ff ff 0f 00 and $0xfffff,%eax
401146: c1 e0 04 shl $0x4,%eax <----------- calculation (no further masking)
401149: 89 c6 mov %eax,%esi
40114b: bf 10 20 40 00 mov $0x402010,%edi
401150: b8 00 00 00 00 mov $0x0,%eax
401155: e8 d6 fe ff ff callq 401030 <printf@plt>

When X=40:

0000000000401126 <main>:
401126: 55 push %rbp
401127: 48 89 e5 mov %rsp,%rbp
40112a: 48 83 ec 10 sub $0x10,%rsp
40112e: 48 8b 45 f8 mov -0x8(%rbp),%rax
401132: 48 ba 00 00 00 00 00 movabs $0xffffff0000000000,%rdx
401139: ff ff ff
40113c: 48 21 d0 and %rdx,%rax
40113f: 48 0d 34 12 00 00 or $0x1234,%rax
401145: 48 89 45 f8 mov %rax,-0x8(%rbp)
401149: 48 b8 ff ff ff ff ff movabs $0xffffffffff,%rax
401150: 00 00 00
401153: 48 23 45 f8 and -0x8(%rbp),%rax
401157: 48 c1 e0 04 shl $0x4,%rax <------------ calculation
40115b: 48 ba ff ff ff ff ff movabs $0xffffffffff,%rdx
401162: 00 00 00
401165: 48 21 d0 and %rdx,%rax <------------ masking (again)
401168: 48 89 c6 mov %rax,%rsi
40116b: bf 10 20 40 00 mov $0x402010,%edi
401170: b8 00 00 00 00 mov $0x0,%eax
401175: e8 b6 fe ff ff callq 401030 <printf@plt>

That feels a bit less consistent to me, compared to clang, where the behavior
at least stays the same regardless of the bitfield's width.

Thanks,

--
Peter Xu

2022-04-16 01:37:38

by Jim Mattson

Subject: Re: [PATCH] kvm: selftests: Fix cut-off of addr_gva2gpa lookup

On Thu, Apr 14, 2022 at 2:36 PM Peter Xu <[email protected]> wrote:
>
> On Thu, Apr 14, 2022 at 04:14:22PM +0200, Paolo Bonzini wrote:
> > On 4/14/22 15:56, Sean Christopherson wrote:
> > > > - return (pte[index[0]].pfn * vm->page_size) + (gva & 0xfffu);
> > > > + return ((vm_paddr_t)pte[index[0]].pfn * vm->page_size) + (gva & 0xfffu);
> > > This is but one of many paths that can get burned by pfn being 40 bits. The
> > > most backport friendly fix is probably to add a pfn=>gpa helper and use that to
> > > place the myriad "pfn * vm->page_size" instances.
> > >
> > > For a true long term solution, my vote is to do away with the bit field struct
> > > and use #define'd masks and whatnot.
> >
> > Yes, bitfields larger than 32 bits are a mess.
>
> It's very interesting to know this..

I don't think the undefined behavior is restricted to extended bit-fields.
Even for regular bit-fields, the C99 spec says, "A bit-field shall have a
type that is a qualified or unqualified version of _Bool, signed int,
unsigned int, or some other implementation-defined type." One might assume
that even the permissive final clause refers to fundamental language types,
but I suppose "n-bit integer" meets the strict definition of a "type," for
arbitrary values of n.

2022-04-16 02:07:36

by Paolo Bonzini

Subject: Re: [PATCH] kvm: selftests: Fix cut-off of addr_gva2gpa lookup

On 4/14/22 15:56, Sean Christopherson wrote:
>> - return (pte[index[0]].pfn * vm->page_size) + (gva & 0xfffu);
>> + return ((vm_paddr_t)pte[index[0]].pfn * vm->page_size) + (gva & 0xfffu);
> This is but one of many paths that can get burned by pfn being 40 bits. The
> most backport friendly fix is probably to add a pfn=>gpa helper and use that to
> place the myriad "pfn * vm->page_size" instances.
>
> For a true long term solution, my vote is to do away with the bit field struct
> and use #define'd masks and whatnot.

Yes, bitfields larger than 32 bits are a mess.

Paolo