2024-01-11 13:59:18

by Vitaly Kuznetsov

[permalink] [raw]
Subject: [PATCH] KVM: selftests: Compare wall time from xen shinfo against KVM_GET_CLOCK

xen_shinfo_test is observed to be flaky failing sporadically with
"VM time too old". With min_ts/max_ts debug print added:

Wall clock (v 3269818) 1704906491.986255664
Time info 1: v 1282712 tsc 33530585736 time 14014430025 mul 3587552223 shift 4294967295 flags 1
Time info 2: v 1282712 tsc 33530585736 time 14014430025 mul 3587552223 shift 4294967295 flags 1
min_ts: 1704906491.986312153
max_ts: 1704906506.001006963
==== Test Assertion Failure ====
x86_64/xen_shinfo_test.c:1003: cmp_timespec(&min_ts, &vm_ts) <= 0
pid=32724 tid=32724 errno=4 - Interrupted system call
1 0x00000000004030ad: main at xen_shinfo_test.c:1003
2 0x00007fca6b23feaf: ?? ??:0
3 0x00007fca6b23ff5f: ?? ??:0
4 0x0000000000405e04: _start at ??:?
VM time too old

The test compares wall clock data from shinfo (which is the output of
kvm_get_wall_clock_epoch()) against clock_gettime(CLOCK_REALTIME) in the
host system before the VM is created. In the example above, it compares

shinfo: 1704906491.986255664 vs min_ts: 1704906491.986312153

and fails as the later is greater than the former. While this sounds like
a sane test, it doesn't pass reality check: kvm_get_wall_clock_epoch()
calculates guest's epoch (realtime when the guest was created) by
subtracting kvmclock from the current realtime and the calculation happens
when shinfo is setup. The problem is that kvmclock is a raw clock and
realtime clock is affected by NTP. This means that if realtime ticks with a
slightly reduced frequency, "guest's epoch" calculated by
kvm_get_wall_clock_epoch() will actually tick backwards! This is not a big
issue from guest's perspective as the guest can't really observe this but
this epoch can't be compared with a fixed clock_gettime() on the host.

Replace the check with comparing wall clock data from shinfo to
KVM_GET_CLOCK. The later gives both realtime and kvmclock so guest's epoch
can be calculated by subtraction. Note, the computed epoch may still differ
a few nanoseconds from shinfo as different TSC is used and there are
rounding errors but 100 nanoseconds margin should be enough to cover
it (famous last words).

Reported-by: Jan Richter <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
.../selftests/kvm/x86_64/xen_shinfo_test.c | 36 ++++++++-----------
1 file changed, 14 insertions(+), 22 deletions(-)

diff --git a/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c b/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c
index 9ec9ab60b63e..5e1ad243d95d 100644
--- a/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c
+++ b/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c
@@ -375,20 +375,6 @@ static void guest_code(void)
GUEST_SYNC(TEST_DONE);
}

-static int cmp_timespec(struct timespec *a, struct timespec *b)
-{
- if (a->tv_sec > b->tv_sec)
- return 1;
- else if (a->tv_sec < b->tv_sec)
- return -1;
- else if (a->tv_nsec > b->tv_nsec)
- return 1;
- else if (a->tv_nsec < b->tv_nsec)
- return -1;
- else
- return 0;
-}
-
static struct vcpu_info *vinfo;
static struct kvm_vcpu *vcpu;

@@ -425,7 +411,6 @@ static void *juggle_shinfo_state(void *arg)

int main(int argc, char *argv[])
{
- struct timespec min_ts, max_ts, vm_ts;
struct kvm_xen_hvm_attr evt_reset;
struct kvm_vm *vm;
pthread_t thread;
@@ -443,8 +428,6 @@ int main(int argc, char *argv[])
bool do_eventfd_tests = !!(xen_caps & KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL);
bool do_evtchn_tests = do_eventfd_tests && !!(xen_caps & KVM_XEN_HVM_CONFIG_EVTCHN_SEND);

- clock_gettime(CLOCK_REALTIME, &min_ts);
-
vm = vm_create_with_one_vcpu(&vcpu, guest_code);

/* Map a region for the shared_info page */
@@ -969,7 +952,6 @@ int main(int argc, char *argv[])
vm_ioctl(vm, KVM_XEN_HVM_SET_ATTR, &evt_reset);

alarm(0);
- clock_gettime(CLOCK_REALTIME, &max_ts);

/*
* Just a *really* basic check that things are being put in the
@@ -978,11 +960,16 @@ int main(int argc, char *argv[])
*/
struct pvclock_wall_clock *wc;
struct pvclock_vcpu_time_info *ti, *ti2;
+ struct kvm_clock_data kcdata;
+ long long delta;

wc = addr_gpa2hva(vm, SHINFO_REGION_GPA + 0xc00);
ti = addr_gpa2hva(vm, SHINFO_REGION_GPA + 0x40 + 0x20);
ti2 = addr_gpa2hva(vm, PVTIME_ADDR);

+ vm_ioctl(vm, KVM_GET_CLOCK, &kcdata);
+ delta = (wc->sec * NSEC_PER_SEC + wc->nsec) - (kcdata.realtime - kcdata.clock);
+
if (verbose) {
printf("Wall clock (v %d) %d.%09d\n", wc->version, wc->sec, wc->nsec);
printf("Time info 1: v %u tsc %" PRIu64 " time %" PRIu64 " mul %u shift %u flags %x\n",
@@ -991,14 +978,19 @@ int main(int argc, char *argv[])
printf("Time info 2: v %u tsc %" PRIu64 " time %" PRIu64 " mul %u shift %u flags %x\n",
ti2->version, ti2->tsc_timestamp, ti2->system_time, ti2->tsc_to_system_mul,
ti2->tsc_shift, ti2->flags);
+ printf("KVM_GET_CLOCK realtime: %lld.%09lld\n", kcdata.realtime / NSEC_PER_SEC,
+ kcdata.realtime % NSEC_PER_SEC);
+ printf("KVM_GET_CLOCK clock: %lld.%09lld\n", kcdata.clock / NSEC_PER_SEC,
+ kcdata.clock % NSEC_PER_SEC);
}

- vm_ts.tv_sec = wc->sec;
- vm_ts.tv_nsec = wc->nsec;
TEST_ASSERT(wc->version && !(wc->version & 1),
"Bad wallclock version %x", wc->version);
- TEST_ASSERT(cmp_timespec(&min_ts, &vm_ts) <= 0, "VM time too old");
- TEST_ASSERT(cmp_timespec(&max_ts, &vm_ts) >= 0, "VM time too new");
+
+ TEST_ASSERT(llabs(delta) < 100,
+ "Guest's epoch from shinfo %d.%09d differs from KVM_GET_CLOCK %lld.%lld",
+ wc->sec, wc->nsec, (kcdata.realtime - kcdata.clock) / NSEC_PER_SEC,
+ (kcdata.realtime - kcdata.clock) % NSEC_PER_SEC);

TEST_ASSERT(ti->version && !(ti->version & 1),
"Bad time_info version %x", ti->version);
--
2.43.0



2024-01-29 09:05:11

by Vitaly Kuznetsov

[permalink] [raw]
Subject: Re: [PATCH] KVM: selftests: Compare wall time from xen shinfo against KVM_GET_CLOCK

Vitaly Kuznetsov <[email protected]> writes:

> xen_shinfo_test is observed to be flaky failing sporadically with
> "VM time too old". With min_ts/max_ts debug print added:
>
> Wall clock (v 3269818) 1704906491.986255664
> Time info 1: v 1282712 tsc 33530585736 time 14014430025 mul 3587552223 shift 4294967295 flags 1
> Time info 2: v 1282712 tsc 33530585736 time 14014430025 mul 3587552223 shift 4294967295 flags 1
> min_ts: 1704906491.986312153
> max_ts: 1704906506.001006963
> ==== Test Assertion Failure ====
> x86_64/xen_shinfo_test.c:1003: cmp_timespec(&min_ts, &vm_ts) <= 0
> pid=32724 tid=32724 errno=4 - Interrupted system call
> 1 0x00000000004030ad: main at xen_shinfo_test.c:1003
> 2 0x00007fca6b23feaf: ?? ??:0
> 3 0x00007fca6b23ff5f: ?? ??:0
> 4 0x0000000000405e04: _start at ??:?
> VM time too old
>
> The test compares wall clock data from shinfo (which is the output of
> kvm_get_wall_clock_epoch()) against clock_gettime(CLOCK_REALTIME) in the
> host system before the VM is created. In the example above, it compares
>
> shinfo: 1704906491.986255664 vs min_ts: 1704906491.986312153
>
> and fails as the later is greater than the former. While this sounds like
> a sane test, it doesn't pass reality check: kvm_get_wall_clock_epoch()
> calculates guest's epoch (realtime when the guest was created) by
> subtracting kvmclock from the current realtime and the calculation happens
> when shinfo is setup. The problem is that kvmclock is a raw clock and
> realtime clock is affected by NTP. This means that if realtime ticks with a
> slightly reduced frequency, "guest's epoch" calculated by
> kvm_get_wall_clock_epoch() will actually tick backwards! This is not a big
> issue from guest's perspective as the guest can't really observe this but
> this epoch can't be compared with a fixed clock_gettime() on the host.
>
> Replace the check with comparing wall clock data from shinfo to
> KVM_GET_CLOCK. The later gives both realtime and kvmclock so guest's epoch
> can be calculated by subtraction. Note, the computed epoch may still differ
> a few nanoseconds from shinfo as different TSC is used and there are
> rounding errors but 100 nanoseconds margin should be enough to cover
> it (famous last words).
>
> Reported-by: Jan Richter <[email protected]>
> Signed-off-by: Vitaly Kuznetsov <[email protected]>
> ---
> .../selftests/kvm/x86_64/xen_shinfo_test.c | 36 ++++++++-----------
> 1 file changed, 14 insertions(+), 22 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c b/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c
> index 9ec9ab60b63e..5e1ad243d95d 100644
> --- a/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c
> +++ b/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c
> @@ -375,20 +375,6 @@ static void guest_code(void)
> GUEST_SYNC(TEST_DONE);
> }
>
> -static int cmp_timespec(struct timespec *a, struct timespec *b)
> -{
> - if (a->tv_sec > b->tv_sec)
> - return 1;
> - else if (a->tv_sec < b->tv_sec)
> - return -1;
> - else if (a->tv_nsec > b->tv_nsec)
> - return 1;
> - else if (a->tv_nsec < b->tv_nsec)
> - return -1;
> - else
> - return 0;
> -}
> -
> static struct vcpu_info *vinfo;
> static struct kvm_vcpu *vcpu;
>
> @@ -425,7 +411,6 @@ static void *juggle_shinfo_state(void *arg)
>
> int main(int argc, char *argv[])
> {
> - struct timespec min_ts, max_ts, vm_ts;
> struct kvm_xen_hvm_attr evt_reset;
> struct kvm_vm *vm;
> pthread_t thread;
> @@ -443,8 +428,6 @@ int main(int argc, char *argv[])
> bool do_eventfd_tests = !!(xen_caps & KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL);
> bool do_evtchn_tests = do_eventfd_tests && !!(xen_caps & KVM_XEN_HVM_CONFIG_EVTCHN_SEND);
>
> - clock_gettime(CLOCK_REALTIME, &min_ts);
> -
> vm = vm_create_with_one_vcpu(&vcpu, guest_code);
>
> /* Map a region for the shared_info page */
> @@ -969,7 +952,6 @@ int main(int argc, char *argv[])
> vm_ioctl(vm, KVM_XEN_HVM_SET_ATTR, &evt_reset);
>
> alarm(0);
> - clock_gettime(CLOCK_REALTIME, &max_ts);
>
> /*
> * Just a *really* basic check that things are being put in the
> @@ -978,11 +960,16 @@ int main(int argc, char *argv[])
> */
> struct pvclock_wall_clock *wc;
> struct pvclock_vcpu_time_info *ti, *ti2;
> + struct kvm_clock_data kcdata;
> + long long delta;
>
> wc = addr_gpa2hva(vm, SHINFO_REGION_GPA + 0xc00);
> ti = addr_gpa2hva(vm, SHINFO_REGION_GPA + 0x40 + 0x20);
> ti2 = addr_gpa2hva(vm, PVTIME_ADDR);
>
> + vm_ioctl(vm, KVM_GET_CLOCK, &kcdata);
> + delta = (wc->sec * NSEC_PER_SEC + wc->nsec) - (kcdata.realtime - kcdata.clock);
> +
> if (verbose) {
> printf("Wall clock (v %d) %d.%09d\n", wc->version, wc->sec, wc->nsec);
> printf("Time info 1: v %u tsc %" PRIu64 " time %" PRIu64 " mul %u shift %u flags %x\n",
> @@ -991,14 +978,19 @@ int main(int argc, char *argv[])
> printf("Time info 2: v %u tsc %" PRIu64 " time %" PRIu64 " mul %u shift %u flags %x\n",
> ti2->version, ti2->tsc_timestamp, ti2->system_time, ti2->tsc_to_system_mul,
> ti2->tsc_shift, ti2->flags);
> + printf("KVM_GET_CLOCK realtime: %lld.%09lld\n", kcdata.realtime / NSEC_PER_SEC,
> + kcdata.realtime % NSEC_PER_SEC);
> + printf("KVM_GET_CLOCK clock: %lld.%09lld\n", kcdata.clock / NSEC_PER_SEC,
> + kcdata.clock % NSEC_PER_SEC);
> }
>
> - vm_ts.tv_sec = wc->sec;
> - vm_ts.tv_nsec = wc->nsec;
> TEST_ASSERT(wc->version && !(wc->version & 1),
> "Bad wallclock version %x", wc->version);
> - TEST_ASSERT(cmp_timespec(&min_ts, &vm_ts) <= 0, "VM time too old");
> - TEST_ASSERT(cmp_timespec(&max_ts, &vm_ts) >= 0, "VM time too new");
> +
> + TEST_ASSERT(llabs(delta) < 100,
> + "Guest's epoch from shinfo %d.%09d differs from KVM_GET_CLOCK %lld.%lld",
> + wc->sec, wc->nsec, (kcdata.realtime - kcdata.clock) / NSEC_PER_SEC,
> + (kcdata.realtime - kcdata.clock) % NSEC_PER_SEC);
>
> TEST_ASSERT(ti->version && !(ti->version & 1),
> "Bad time_info version %x", ti->version);

Ping?

--
Vitaly


2024-01-29 16:06:28

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH] KVM: selftests: Compare wall time from xen shinfo against KVM_GET_CLOCK

On Mon, Jan 29, 2024, Vitaly Kuznetsov wrote:
> Vitaly Kuznetsov <[email protected]> writes:
>
> > xen_shinfo_test is observed to be flaky failing sporadically with
> > "VM time too old". With min_ts/max_ts debug print added:
> >
> > Wall clock (v 3269818) 1704906491.986255664
> > Time info 1: v 1282712 tsc 33530585736 time 14014430025 mul 3587552223 shift 4294967295 flags 1
> > Time info 2: v 1282712 tsc 33530585736 time 14014430025 mul 3587552223 shift 4294967295 flags 1
> > min_ts: 1704906491.986312153
> > max_ts: 1704906506.001006963
> > ==== Test Assertion Failure ====
> > x86_64/xen_shinfo_test.c:1003: cmp_timespec(&min_ts, &vm_ts) <= 0
> > pid=32724 tid=32724 errno=4 - Interrupted system call
> > 1 0x00000000004030ad: main at xen_shinfo_test.c:1003
> > 2 0x00007fca6b23feaf: ?? ??:0
> > 3 0x00007fca6b23ff5f: ?? ??:0
> > 4 0x0000000000405e04: _start at ??:?
> > VM time too old
> >
> > The test compares wall clock data from shinfo (which is the output of
> > kvm_get_wall_clock_epoch()) against clock_gettime(CLOCK_REALTIME) in the
> > host system before the VM is created. In the example above, it compares
> >
> > shinfo: 1704906491.986255664 vs min_ts: 1704906491.986312153
> >
> > and fails as the later is greater than the former. While this sounds like
> > a sane test, it doesn't pass reality check: kvm_get_wall_clock_epoch()
> > calculates guest's epoch (realtime when the guest was created) by
> > subtracting kvmclock from the current realtime and the calculation happens
> > when shinfo is setup. The problem is that kvmclock is a raw clock and
> > realtime clock is affected by NTP. This means that if realtime ticks with a
> > slightly reduced frequency, "guest's epoch" calculated by
> > kvm_get_wall_clock_epoch() will actually tick backwards! This is not a big
> > issue from guest's perspective as the guest can't really observe this but
> > this epoch can't be compared with a fixed clock_gettime() on the host.
> >
> > Replace the check with comparing wall clock data from shinfo to
> > KVM_GET_CLOCK. The later gives both realtime and kvmclock so guest's epoch
> > can be calculated by subtraction. Note, the computed epoch may still differ
> > a few nanoseconds from shinfo as different TSC is used and there are
> > rounding errors but 100 nanoseconds margin should be enough to cover
> > it (famous last words).
> >
> > Reported-by: Jan Richter <[email protected]>
> > Signed-off-by: Vitaly Kuznetsov <[email protected]>
> > ---

David, any objection?

2024-01-31 01:02:50

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH] KVM: selftests: Compare wall time from xen shinfo against KVM_GET_CLOCK

On Thu, 11 Jan 2024 14:59:01 +0100, Vitaly Kuznetsov wrote:
> xen_shinfo_test is observed to be flaky failing sporadically with
> "VM time too old". With min_ts/max_ts debug print added:
>
> Wall clock (v 3269818) 1704906491.986255664
> Time info 1: v 1282712 tsc 33530585736 time 14014430025 mul 3587552223 shift 4294967295 flags 1
> Time info 2: v 1282712 tsc 33530585736 time 14014430025 mul 3587552223 shift 4294967295 flags 1
> min_ts: 1704906491.986312153
> max_ts: 1704906506.001006963
> ==== Test Assertion Failure ====
> x86_64/xen_shinfo_test.c:1003: cmp_timespec(&min_ts, &vm_ts) <= 0
> pid=32724 tid=32724 errno=4 - Interrupted system call
> 1 0x00000000004030ad: main at xen_shinfo_test.c:1003
> 2 0x00007fca6b23feaf: ?? ??:0
> 3 0x00007fca6b23ff5f: ?? ??:0
> 4 0x0000000000405e04: _start at ??:?
> VM time too old
>
> [...]

Applied to kvm-x86 selftests. David, please holler if you disagree with
any of the changes. They look sane to me, but clock stuff ain't my forte.

Thanks!

[1/1] KVM: selftests: Compare wall time from xen shinfo against KVM_GET_CLOCK
https://github.com/kvm-x86/linux/commit/a0868e7c5575

--
https://github.com/kvm-x86/linux/tree/next

2024-01-31 16:34:51

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH] KVM: selftests: Compare wall time from xen shinfo against KVM_GET_CLOCK

Sorry for delayed response.

On Thu, 2024-01-11 at 14:59 +0100, Vitaly Kuznetsov wrote:
>
> +       vm_ioctl(vm, KVM_GET_CLOCK, &kcdata);
> +       delta = (wc->sec * NSEC_PER_SEC + wc->nsec) - (kcdata.realtime - kcdata.clock);

I think you need to check for KVM_CLOCK_REALTIME in the flags here,
don't you? It might not always be set.

And also, nobody should ever be using KVM_CLOCK_REALTIME as it stands
at the moment. It's fundamentally broken because it should always have
used CLOCK_TAI not CLOCK_REALTIME.

I'm in the process of fixing that up as an incremental ABI change,
putting the TAI offset into one of the spare pad fields in the
kvm_clock_data and adding a new KVM_CLOCK_TAI_OFFSET flag.

> + TEST_ASSERT(llabs(delta) < 100,
> + "Guest's epoch from shinfo %d.%09d differs from KVM_GET_CLOCK %lld.%lld",
> + wc->sec, wc->nsec, (kcdata.realtime - kcdata.clock) / NSEC_PER_SEC,
> + (kcdata.realtime - kcdata.clock) % NSEC_PER_SEC);
>

> Replace the check with comparing wall clock data from shinfo to
> KVM_GET_CLOCK. The later gives both realtime and kvmclock so guest's epoch
> can be calculated by subtraction. Note, the computed epoch may still differ
> a few nanoseconds from shinfo as different TSC is used and there are
> rounding errors but 100 nanoseconds margin should be enough to cover
> it (famous last words).

Aren't those just bugs? Surely this *should* be cycle-perfect, if the
host has a stable TSC?

But I suppose this isn't the test case for that. This is just testing
that the Xen shinfo is doing basically the right thing.

And we're going to need a few more fixes like this WIP before we get
get to cycle perfection from the horrid mess that is our kvmclock code:
https://git.infradead.org/?p=users/dwmw2/linux.git;a=commitdiff;h=bc557e5ee4


Attachments:
smime.p7s (5.83 kB)

2024-02-01 10:20:56

by Vitaly Kuznetsov

[permalink] [raw]
Subject: Re: [PATCH] KVM: selftests: Compare wall time from xen shinfo against KVM_GET_CLOCK

David Woodhouse <[email protected]> writes:

> Sorry for delayed response.
>
> On Thu, 2024-01-11 at 14:59 +0100, Vitaly Kuznetsov wrote:
>>
>> +       vm_ioctl(vm, KVM_GET_CLOCK, &kcdata);
>> +       delta = (wc->sec * NSEC_PER_SEC + wc->nsec) - (kcdata.realtime - kcdata.clock);
>
> I think you need to check for KVM_CLOCK_REALTIME in the flags here,
> don't you? It might not always be set.

Good suggestion; while this shouldn't be a common use-case on x86_64, it
is possible when TSC is *not* the clocksource used on the host. I guess
we can just skip the sub-test with a message in this case.

>
> And also, nobody should ever be using KVM_CLOCK_REALTIME as it stands
> at the moment. It's fundamentally broken because it should always have
> used CLOCK_TAI not CLOCK_REALTIME.
>
> I'm in the process of fixing that up as an incremental ABI change,
> putting the TAI offset into one of the spare pad fields in the
> kvm_clock_data and adding a new KVM_CLOCK_TAI_OFFSET flag.
>
>> + TEST_ASSERT(llabs(delta) < 100,
>> + "Guest's epoch from shinfo %d.%09d differs from KVM_GET_CLOCK %lld.%lld",
>> + wc->sec, wc->nsec, (kcdata.realtime - kcdata.clock) / NSEC_PER_SEC,
>> + (kcdata.realtime - kcdata.clock) % NSEC_PER_SEC);
>>
>
>> Replace the check with comparing wall clock data from shinfo to
>> KVM_GET_CLOCK. The later gives both realtime and kvmclock so guest's epoch
>> can be calculated by subtraction. Note, the computed epoch may still differ
>> a few nanoseconds from shinfo as different TSC is used and there are
>> rounding errors but 100 nanoseconds margin should be enough to cover
>> it (famous last words).
>
> Aren't those just bugs? Surely this *should* be cycle-perfect, if the
> host has a stable TSC?
>
> But I suppose this isn't the test case for that. This is just testing
> that the Xen shinfo is doing basically the right thing.
>
> And we're going to need a few more fixes like this WIP before we get
> get to cycle perfection from the horrid mess that is our kvmclock code:
> https://git.infradead.org/?p=users/dwmw2/linux.git;a=commitdiff;h=bc557e5ee4

(Some time ago I was considering suggesting we switch to Hypet-V TSC page
clocksource for Linux guests to avoid fixing the mess :-) It's becoming
less and less relevant as with modern hardware which supports TSC
scaling (and which doesn't support famous TSC bugs like TSC divergence
across sockets :-) passing through raw TSC to guests is becoming more
and more popular.)

Regarding this fix, what's your decree? My intention here is to
basically get rid on xen_shinfo flakyness without reducing test
coverage. I.e. we still want to test that shinfo data is somewhat
sane. Later, when you get your 'cycle perfection' and 'TAI' patches in,
we can always adjust the test accordingly. In case you agree, I can
re-send this with KVM_CLOCK_REALTIME check added.

--
Vitaly


2024-02-01 17:05:56

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH] KVM: selftests: Compare wall time from xen shinfo against KVM_GET_CLOCK

On Tue, Jan 30, 2024, Sean Christopherson wrote:
> On Thu, 11 Jan 2024 14:59:01 +0100, Vitaly Kuznetsov wrote:
> > xen_shinfo_test is observed to be flaky failing sporadically with
> > "VM time too old". With min_ts/max_ts debug print added:
> >
> > Wall clock (v 3269818) 1704906491.986255664
> > Time info 1: v 1282712 tsc 33530585736 time 14014430025 mul 3587552223 shift 4294967295 flags 1
> > Time info 2: v 1282712 tsc 33530585736 time 14014430025 mul 3587552223 shift 4294967295 flags 1
> > min_ts: 1704906491.986312153
> > max_ts: 1704906506.001006963
> > ==== Test Assertion Failure ====
> > x86_64/xen_shinfo_test.c:1003: cmp_timespec(&min_ts, &vm_ts) <= 0
> > pid=32724 tid=32724 errno=4 - Interrupted system call
> > 1 0x00000000004030ad: main at xen_shinfo_test.c:1003
> > 2 0x00007fca6b23feaf: ?? ??:0
> > 3 0x00007fca6b23ff5f: ?? ??:0
> > 4 0x0000000000405e04: _start at ??:?
> > VM time too old
> >
> > [...]
>
> Applied to kvm-x86 selftests. David, please holler if you disagree with
> any of the changes. They look sane to me, but clock stuff ain't my forte.
>
> Thanks!
>
> [1/1] KVM: selftests: Compare wall time from xen shinfo against KVM_GET_CLOCK
> https://github.com/kvm-x86/linux/commit/a0868e7c5575

FYI, I've dropped this as it looks like there will be a new version.

2024-02-01 18:03:11

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH] KVM: selftests: Compare wall time from xen shinfo against KVM_GET_CLOCK

On Thu, 2024-02-01 at 11:19 +0100, Vitaly Kuznetsov wrote:
> David Woodhouse <[email protected]> writes:
>
> > Sorry for delayed response.
> >
> > On Thu, 2024-01-11 at 14:59 +0100, Vitaly Kuznetsov wrote:
> > >
> > > +       vm_ioctl(vm, KVM_GET_CLOCK, &kcdata);
> > > +       delta = (wc->sec * NSEC_PER_SEC + wc->nsec) - (kcdata.realtime - kcdata.clock);
> >
> > I think you need to check for KVM_CLOCK_REALTIME in the flags here,
> > don't you? It might not always be set.
>
> Good suggestion; while this shouldn't be a common use-case on x86_64, it
> is possible when TSC is *not* the clocksource used on the host. I guess
> we can just skip the sub-test with a message in this case.
>
> >
> > And also, nobody should ever be using KVM_CLOCK_REALTIME as it stands
> > at the moment. It's fundamentally broken because it should always have
> > used CLOCK_TAI not CLOCK_REALTIME.
> >
> > I'm in the process of fixing that up as an incremental ABI change,
> > putting the TAI offset into one of the spare pad fields in the
> > kvm_clock_data and adding a new KVM_CLOCK_TAI_OFFSET flag.
> >
> > > +       TEST_ASSERT(llabs(delta) < 100,
> > > +                   "Guest's epoch from shinfo %d.%09d differs from KVM_GET_CLOCK %lld.%lld",
> > > +                   wc->sec, wc->nsec, (kcdata.realtime - kcdata.clock) / NSEC_PER_SEC,
> > > +                   (kcdata.realtime - kcdata.clock) % NSEC_PER_SEC);
> > >
> >
> > > Replace the check with comparing wall clock data from shinfo to
> > > KVM_GET_CLOCK. The later gives both realtime and kvmclock so guest's epoch
> > > can be calculated by subtraction. Note, the computed epoch may still differ
> > > a few nanoseconds from shinfo as different TSC is used and there are
> > > rounding errors but 100 nanoseconds margin should be enough to cover
> > > it (famous last words).
> >
> > Aren't those just bugs? Surely this *should* be cycle-perfect, if the
> > host has a stable TSC?
> >
> > But I suppose this isn't the test case for that. This is just testing
> > that the Xen shinfo is doing basically the right thing.
> >
> > And we're going to need a few more fixes like this WIP before we get
> > get to cycle perfection from the horrid mess that is our kvmclock code:
> > https://git.infradead.org/?p=users/dwmw2/linux.git;a=commitdiff;h=bc557e5ee4
>
> (Some time ago I was considering suggesting we switch to Hypet-V TSC page
> clocksource for Linux guests to avoid fixing the mess :-) It's becoming
> less and less relevant as with modern hardware which supports TSC
> scaling (and which doesn't support famous TSC bugs like TSC divergence
> across sockets :-) passing through raw TSC to guests is becoming more
> and more popular.)

Except for migration, which is kind of hosed. And live update too.
Where you are only doing a kexec on the *same* host and your time
source is the *same* TSC, there's no excuse for anything less than
cycle perfection. But KVM is a long way from that :)

> Regarding this fix, what's your decree? My intention here is to
> basically get rid on xen_shinfo flakyness without reducing test
> coverage. I.e. we still want to test that shinfo data is somewhat
> sane. Later, when you get your 'cycle perfection' and 'TAI' patches in,
> we can always adjust the test accordingly. In case you agree, I can
> re-send this with KVM_CLOCK_REALTIME check added.

I think just the KVM_CLOCK_REALTIME check is all we need. No need for
*this* test to be pedantic about the clock precision; mostly it's just
checking that something that looks a *bit* like a time value is written
to the right place in guest memory.

So forget TAI, accept a slop of over 1s, with a snarky comment that the
API using CLOCK_REALTIME should never have been accepted into the
kernel in the first place. (I mean that; we don't want people cutting
and pasting it as an example).


Attachments:
smime.p7s (5.83 kB)

2024-02-02 23:28:07

by Vitaly Kuznetsov

[permalink] [raw]
Subject: Re: [PATCH] KVM: selftests: Compare wall time from xen shinfo against KVM_GET_CLOCK

Sean Christopherson <[email protected]> writes:

> On Tue, Jan 30, 2024, Sean Christopherson wrote:
>> On Thu, 11 Jan 2024 14:59:01 +0100, Vitaly Kuznetsov wrote:
>> > xen_shinfo_test is observed to be flaky failing sporadically with
>> > "VM time too old". With min_ts/max_ts debug print added:
>> >
>> > Wall clock (v 3269818) 1704906491.986255664
>> > Time info 1: v 1282712 tsc 33530585736 time 14014430025 mul 3587552223 shift 4294967295 flags 1
>> > Time info 2: v 1282712 tsc 33530585736 time 14014430025 mul 3587552223 shift 4294967295 flags 1
>> > min_ts: 1704906491.986312153
>> > max_ts: 1704906506.001006963
>> > ==== Test Assertion Failure ====
>> > x86_64/xen_shinfo_test.c:1003: cmp_timespec(&min_ts, &vm_ts) <= 0
>> > pid=32724 tid=32724 errno=4 - Interrupted system call
>> > 1 0x00000000004030ad: main at xen_shinfo_test.c:1003
>> > 2 0x00007fca6b23feaf: ?? ??:0
>> > 3 0x00007fca6b23ff5f: ?? ??:0
>> > 4 0x0000000000405e04: _start at ??:?
>> > VM time too old
>> >
>> > [...]
>>
>> Applied to kvm-x86 selftests. David, please holler if you disagree with
>> any of the changes. They look sane to me, but clock stuff ain't my forte.
>>
>> Thanks!
>>
>> [1/1] KVM: selftests: Compare wall time from xen shinfo against KVM_GET_CLOCK
>> https://github.com/kvm-x86/linux/commit/a0868e7c5575
>
> FYI, I've dropped this as it looks like there will be a new version.

Thanks, indeed, I'll do do changes suggested by David in v2 but as I'm
traveling this weekend (FOSDEM calls!) please expect my posting Tuesday+
next week.

--
Vitaly