2014-11-05 19:46:49

by David Matlack

[permalink] [raw]
Subject: [PATCH] kvm: x86: add trace event for pvclock updates

The new trace event records:
* the id of vcpu being updated
* the pvclock_vcpu_time_info struct being written to guest memory

This is useful for debugging pvclock bugs, such as the bug fixed by
"[PATCH] kvm: x86: Fix kvm clock versioning.".

Signed-off-by: David Matlack <[email protected]>
---
arch/x86/kvm/trace.h | 37 +++++++++++++++++++++++++++++++++++++
arch/x86/kvm/x86.c | 2 ++
2 files changed, 39 insertions(+)

diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 6b06ab8..c2a34bb 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -5,6 +5,7 @@
#include <asm/vmx.h>
#include <asm/svm.h>
#include <asm/clocksource.h>
+#include <asm/pvclock-abi.h>

#undef TRACE_SYSTEM
#define TRACE_SYSTEM kvm
@@ -877,6 +878,42 @@ TRACE_EVENT(kvm_ple_window,
#define trace_kvm_ple_window_shrink(vcpu_id, new, old) \
trace_kvm_ple_window(false, vcpu_id, new, old)

+TRACE_EVENT(kvm_pvclock_update,
+ TP_PROTO(unsigned int vcpu_id, struct pvclock_vcpu_time_info *pvclock),
+ TP_ARGS(vcpu_id, pvclock),
+
+ TP_STRUCT__entry(
+ __field( unsigned int, vcpu_id )
+ __field( __u32, version )
+ __field( __u64, tsc_timestamp )
+ __field( __u64, system_time )
+ __field( __u32, tsc_to_system_mul )
+ __field( __s8, tsc_shift )
+ __field( __u8, flags )
+ ),
+
+ TP_fast_assign(
+ __entry->vcpu_id = vcpu_id;
+ __entry->version = pvclock->version;
+ __entry->tsc_timestamp = pvclock->tsc_timestamp;
+ __entry->system_time = pvclock->system_time;
+ __entry->tsc_to_system_mul = pvclock->tsc_to_system_mul;
+ __entry->tsc_shift = pvclock->tsc_shift;
+ __entry->flags = pvclock->flags;
+ ),
+
+ TP_printk("vcpu_id %u, pvclock { version %u, tsc_timestamp 0x%llx, "
+ "system_time 0x%llx, tsc_to_system_mul 0x%x, tsc_shift %d, "
+ "flags 0x%x }",
+ __entry->vcpu_id,
+ __entry->version,
+ __entry->tsc_timestamp,
+ __entry->system_time,
+ __entry->tsc_to_system_mul,
+ __entry->tsc_shift,
+ __entry->flags)
+);
+
#endif /* _TRACE_KVM_H */

#undef TRACE_INCLUDE_PATH
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0033df3..0f533df 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1662,6 +1662,8 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)

vcpu->hv_clock.flags = pvclock_flags;

+ trace_kvm_pvclock_update(v->vcpu_id, &vcpu->hv_clock);
+
kvm_write_guest_cached(v->kvm, &vcpu->pv_time,
&vcpu->hv_clock,
sizeof(vcpu->hv_clock));
--
2.1.0.rc2.206.gedb03e5


2014-11-06 10:53:19

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH] kvm: x86: add trace event for pvclock updates



On 05/11/2014 20:46, David Matlack wrote:
> The new trace event records:
> * the id of vcpu being updated
> * the pvclock_vcpu_time_info struct being written to guest memory
>
> This is useful for debugging pvclock bugs, such as the bug fixed by
> "[PATCH] kvm: x86: Fix kvm clock versioning.".
>
> Signed-off-by: David Matlack <[email protected]>
> ---
> arch/x86/kvm/trace.h | 37 +++++++++++++++++++++++++++++++++++++
> arch/x86/kvm/x86.c | 2 ++
> 2 files changed, 39 insertions(+)
>
> diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
> index 6b06ab8..c2a34bb 100644
> --- a/arch/x86/kvm/trace.h
> +++ b/arch/x86/kvm/trace.h
> @@ -5,6 +5,7 @@
> #include <asm/vmx.h>
> #include <asm/svm.h>
> #include <asm/clocksource.h>
> +#include <asm/pvclock-abi.h>
>
> #undef TRACE_SYSTEM
> #define TRACE_SYSTEM kvm
> @@ -877,6 +878,42 @@ TRACE_EVENT(kvm_ple_window,
> #define trace_kvm_ple_window_shrink(vcpu_id, new, old) \
> trace_kvm_ple_window(false, vcpu_id, new, old)
>
> +TRACE_EVENT(kvm_pvclock_update,
> + TP_PROTO(unsigned int vcpu_id, struct pvclock_vcpu_time_info *pvclock),
> + TP_ARGS(vcpu_id, pvclock),
> +
> + TP_STRUCT__entry(
> + __field( unsigned int, vcpu_id )
> + __field( __u32, version )
> + __field( __u64, tsc_timestamp )
> + __field( __u64, system_time )
> + __field( __u32, tsc_to_system_mul )
> + __field( __s8, tsc_shift )
> + __field( __u8, flags )
> + ),
> +
> + TP_fast_assign(
> + __entry->vcpu_id = vcpu_id;
> + __entry->version = pvclock->version;
> + __entry->tsc_timestamp = pvclock->tsc_timestamp;
> + __entry->system_time = pvclock->system_time;
> + __entry->tsc_to_system_mul = pvclock->tsc_to_system_mul;
> + __entry->tsc_shift = pvclock->tsc_shift;
> + __entry->flags = pvclock->flags;
> + ),
> +
> + TP_printk("vcpu_id %u, pvclock { version %u, tsc_timestamp 0x%llx, "
> + "system_time 0x%llx, tsc_to_system_mul 0x%x, tsc_shift %d, "
> + "flags 0x%x }",
> + __entry->vcpu_id,
> + __entry->version,
> + __entry->tsc_timestamp,
> + __entry->system_time,
> + __entry->tsc_to_system_mul,
> + __entry->tsc_shift,
> + __entry->flags)
> +);
> +
> #endif /* _TRACE_KVM_H */
>
> #undef TRACE_INCLUDE_PATH
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 0033df3..0f533df 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1662,6 +1662,8 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
>
> vcpu->hv_clock.flags = pvclock_flags;
>
> + trace_kvm_pvclock_update(v->vcpu_id, &vcpu->hv_clock);
> +
> kvm_write_guest_cached(v->kvm, &vcpu->pv_time,
> &vcpu->hv_clock,
> sizeof(vcpu->hv_clock));
>

Applied, thanks.

Paolo

2014-11-11 01:19:09

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: [PATCH] kvm: x86: add trace event for pvclock updates

On Wed, Nov 05, 2014 at 11:46:42AM -0800, David Matlack wrote:
> The new trace event records:
> * the id of vcpu being updated
> * the pvclock_vcpu_time_info struct being written to guest memory
>
> This is useful for debugging pvclock bugs, such as the bug fixed by
> "[PATCH] kvm: x86: Fix kvm clock versioning.".
>
> Signed-off-by: David Matlack <[email protected]>

So you actually hit that bug in practice? Can you describe the
scenario?

2014-11-12 18:00:57

by David Matlack

[permalink] [raw]
Subject: Re: [PATCH] kvm: x86: add trace event for pvclock updates

On 11/10 11:18 PM, Marcelo Tosatti wrote:
> On Wed, Nov 05, 2014 at 11:46:42AM -0800, David Matlack wrote:
> > The new trace event records:
> > * the id of vcpu being updated
> > * the pvclock_vcpu_time_info struct being written to guest memory
> >
> > This is useful for debugging pvclock bugs, such as the bug fixed by
> > "[PATCH] kvm: x86: Fix kvm clock versioning.".
> >
> > Signed-off-by: David Matlack <[email protected]>
>
> So you actually hit that bug in practice? Can you describe the
> scenario?

We noticed guests running stress workloads would occasionally get stuck
on the far side of a save/restore. Inspecting the guest state revealed
arch/x86/kernel/pvclock.c:last_value was stuck at a value like
8020566108469899263, despite TSC and pvclock looking sane.

Since these guests ran without PVCLOCK_TSC_STABLE_BIT set in their
CPUID, they were stuck with this large time value until real time
caught up (in about 250 years :).

We've been unable to reproduce the bug with "kvm: x86: Fix kvm clock
versioning." so we didn't invest in catching the overflow in the act,
but a likely explanation is the guest gets save/restore-ed while
computing the pvclock delta:

u64 delta = __native_read_tsc() - src->tsc_timestamp;

causing the subtraction to underflow and delta to be huge.

>
>