2011-05-12 13:37:00

by Dhaval Giani

Subject: [PATCH] kvm: log directly from the guest to the host kvm buffer

Hi,

As part of some of the work for my project, I have been looking at
tracing some of the events in the guest from inside the host. In
my use case, I have been looking to correlate the time of a network
packet's arrival in the guest with the corresponding time in the host.
ftrace makes such arbitrary use quite simple, so I went ahead and
extended this functionality via a hypercall. There are still a few
issues with this patch.

1. For some reason, the first time the hypercall is called, it works
just fine, but the second invocation refuses to happen. I am still
clueless about it. (and am looking for hints :-) )
2. I am not very sure if I got the demarcation between the guest and
the host code fine or not. Someone more experienced than me should take
a look at the code as well :-)
3. This adds a new paravirt call.
4. This has been implemented just for x86 as of now. If there is enough
interest, I will look to make it more generic to be used across other
architectures. However, it is quite easy to do the same.
5. It still does not have all the fancy ftrace features, but again,
depending on the interest, I can add all those in.
6. Create a config option for this feature.

I think such a feature is useful for debugging purposes and might make
sense to carry upstream.

Thanks,
Dhaval

---
kvm: log directly from the guest to the host kvm buffer

Add a new hypercall, kvm_pv_ftrace(), which logs a message into the
host's ftrace buffer. To use it, guest code calls the kvm_ftrace
macro. This is still very early code and does not fully work.
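For illustration, a guest call site would look something like this
(hypothetical example; the format string and arguments are made up):

	kvm_ftrace("eth0: packet received, len=%d\n", len);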

Signed-off-by: Dhaval Giani <[email protected]>

---
arch/x86/include/asm/kvm_para.h |   11 +++++++++++
arch/x86/kernel/kvm.c           |   22 ++++++++++++++++++++++
arch/x86/kvm/x86.c              |   18 ++++++++++++++++++
include/linux/kvm_para.h        |    1 +
4 files changed, 52 insertions(+)

Index: linux-2.6/arch/x86/kvm/x86.c
===================================================================
--- linux-2.6.orig/arch/x86/kvm/x86.c
+++ linux-2.6/arch/x86/kvm/x86.c
@@ -4832,6 +4832,21 @@ int kvm_hv_hypercall(struct kvm_vcpu *vc
return 1;
}

+int kvm_pv_ftrace(struct kvm_vcpu *vcpu, unsigned long ip, gpa_t addr)
+{
+	int ret;
+	char *fmt = (char *) kzalloc(PAGE_SIZE, GFP_KERNEL);
+
+	ret = kvm_read_guest(vcpu->kvm, addr, fmt, PAGE_SIZE);
+
+	trace_printk("KVM instance %p: VCPU %d, IP %lu: %s",
+			vcpu->kvm, vcpu->vcpu_id, ip, fmt);
+
+	kfree(fmt);
+
+	return 0;
+}
+
int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
{
unsigned long nr, a0, a1, a2, a3, ret;
@@ -4868,6 +4883,9 @@ int kvm_emulate_hypercall(struct kvm_vcp
case KVM_HC_MMU_OP:
r = kvm_pv_mmu_op(vcpu, a0, hc_gpa(vcpu, a1, a2), &ret);
break;
+	case KVM_HC_FTRACE:
+		ret = kvm_pv_ftrace(vcpu, a0, hc_gpa(vcpu, a1, a2));
+		break;
default:
ret = -KVM_ENOSYS;
break;
Index: linux-2.6/include/linux/kvm_para.h
===================================================================
--- linux-2.6.orig/include/linux/kvm_para.h
+++ linux-2.6/include/linux/kvm_para.h
@@ -19,6 +19,7 @@
#define KVM_HC_MMU_OP 2
#define KVM_HC_FEATURES 3
#define KVM_HC_PPC_MAP_MAGIC_PAGE 4
+#define KVM_HC_FTRACE 5

/*
* hypercalls use architecture specific
Index: linux-2.6/arch/x86/kernel/kvm.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/kvm.c
+++ linux-2.6/arch/x86/kernel/kvm.c
@@ -274,6 +274,28 @@ static void kvm_mmu_op(void *buffer, uns
} while (len);
}

+int kvm_ftrace_printk(unsigned long ip, const char *fmt, ...)
+{
+	char *buffer = kzalloc(PAGE_SIZE, GFP_KERNEL);
+	int ret;
+	unsigned long a1, a2;
+	va_list args;
+	int i;
+
+	va_start(args, fmt);
+	i = vsnprintf(buffer, PAGE_SIZE, fmt, args);
+	va_end(args);
+
+	a1 = __pa(buffer);
+	a2 = 0;
+
+	ret = kvm_hypercall3(KVM_HC_FTRACE, ip, a1, a2);
+
+	kfree(buffer);
+	return ret;
+}
+EXPORT_SYMBOL(kvm_ftrace_printk);
+
static void mmu_queue_flush(struct kvm_para_state *state)
{
if (state->mmu_queue_len) {
Index: linux-2.6/arch/x86/include/asm/kvm_para.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/kvm_para.h
+++ linux-2.6/arch/x86/include/asm/kvm_para.h
@@ -21,6 +21,10 @@
*/
#define KVM_FEATURE_CLOCKSOURCE2 3
#define KVM_FEATURE_ASYNC_PF 4
+/*
+ * Experimental: lets the guest log directly into the host's ftrace buffer.
+ */
+#define KVM_FEATURE_FTRACE 5

/* The last 8 bits are used to indicate how to interpret the flags field
* in pvclock structure. If no bits are set, all flags are ignored.
@@ -188,6 +192,13 @@ static inline u32 kvm_read_and_reset_pf_
}
#endif

+int kvm_ftrace_printk(unsigned long ip, const char *fmt, ...);
+
+#define kvm_ftrace(fmt, args...) \
+	do { \
+		kvm_ftrace_printk(_THIS_IP_, fmt, ##args); \
+	} while (0)
+
#endif /* __KERNEL__ */

#endif /* _ASM_X86_KVM_PARA_H */



2011-05-12 15:13:23

by Avi Kivity

Subject: Re: [PATCH] kvm: log directly from the guest to the host kvm buffer

On 05/12/2011 04:36 PM, Dhaval Giani wrote:
> Hi,
>
> As part of some of the work for my project, I have been looking at
> tracing some of the events in the guest from inside the host. In
> my use case, I have been looking to correlate the time of a network
> packet's arrival in the guest with the corresponding time in the host.
> ftrace makes such arbitrary use quite simple, so I went ahead and
> extended this functionality via a hypercall. There are still a few
> issues with this patch.
>
> 1. For some reason, the first time the hypercall is called, it works
> just fine, but the second invocation refuses to happen. I am still
> clueless about it. (and am looking for hints :-) )
> 2. I am not very sure if I got the demarcation between the guest and
> the host code fine or not. Someone more experienced than me should take
> a look at the code as well :-)
> 3. This adds a new paravirt call.
> 4. This has been implemented just for x86 as of now. If there is enough
> interest, I will look to make it more generic to be used across other
> architectures. However, it is quite easy to do the same.
> 5. It still does not have all the fancy ftrace features, but again,
> depending on the interest, I can add all those in.
> 6. Create a config option for this feature.
>
> I think such a feature is useful for debugging purposes and might make
> sense to carry upstream.

I guess it could help things like virtio/vhost development and profiling.


I think that one hypercall per trace is too expensive. Tracing is meant
to be lightweight! I think the guest can log to a buffer, which is
flushed on overflow or when a vmexit occurs. That gives us automatic
serialization between a vcpu and the cpu it runs on, but not between a
vcpu and a different host cpu.
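Roughly something like this on the guest side, perhaps (a very rough
sketch under my assumptions; struct kvm_trace_buf, KVM_HC_FTRACE_FLUSH
and the entry layout are all made up, and the vmexit-time flush hook is
not shown):

#include <linux/kernel.h>
#include <linux/percpu.h>
#include <linux/irqflags.h>
#include <asm/kvm_para.h>
#include <asm/timex.h>

#define KVM_HC_FTRACE_FLUSH	6	/* made-up hypercall number */
#define KVM_TRACE_ENTRIES	16

struct kvm_trace_entry {
	u64		guest_tsc;	/* for later guest/host time correlation */
	unsigned long	ip;
	char		msg[100];
};

struct kvm_trace_buf {
	unsigned int		n;
	struct kvm_trace_entry	e[KVM_TRACE_ENTRIES];
};

static DEFINE_PER_CPU(struct kvm_trace_buf, kvm_trace_buf);

/* One hypercall flushes a whole batch of records instead of one per event. */
static void kvm_trace_flush(struct kvm_trace_buf *buf)
{
	if (!buf->n)
		return;
	kvm_hypercall3(KVM_HC_FTRACE_FLUSH, __pa(buf), buf->n, 0);
	buf->n = 0;
}

int kvm_ftrace_printk(unsigned long ip, const char *fmt, ...)
{
	struct kvm_trace_buf *buf;
	struct kvm_trace_entry *e;
	unsigned long flags;
	va_list args;

	/* irqs off: the per-cpu buffer is only touched by this cpu. */
	local_irq_save(flags);
	buf = this_cpu_ptr(&kvm_trace_buf);
	e = &buf->e[buf->n++];
	e->ip = ip;
	e->guest_tsc = get_cycles();
	va_start(args, fmt);
	vscnprintf(e->msg, sizeof(e->msg), fmt, args);
	va_end(args);

	/* Flush on overflow; a vmexit-time hook would flush as well. */
	if (buf->n == KVM_TRACE_ENTRIES)
		kvm_trace_flush(buf);
	local_irq_restore(flags);

	return 0;
}

The host-side handler would then kvm_read_guest() the whole array and
emit one trace entry per record.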

>
> +int kvm_pv_ftrace(struct kvm_vcpu *vcpu, unsigned long ip, gpa_t addr)
> +{
> +	int ret;
> +	char *fmt = (char *) kzalloc(PAGE_SIZE, GFP_KERNEL);
> +
> +	ret = kvm_read_guest(vcpu->kvm, addr, fmt, PAGE_SIZE);
> +
> +	trace_printk("KVM instance %p: VCPU %d, IP %lu: %s",
> +			vcpu->kvm, vcpu->vcpu_id, ip, fmt);
> +
> +	kfree(fmt);
> +
> +	return 0;
> +}

A kmalloc and printf seem expensive here. I'd prefer to log the
arguments and format descriptor instead. Similarly the guest should
pass unformatted parameters.

> +int kvm_ftrace_printk(unsigned long ip, const char *fmt, ...)
> +{
> +	char *buffer = kzalloc(PAGE_SIZE, GFP_KERNEL);
> +	int ret;
> +	unsigned long a1, a2;
> +	va_list args;
> +	int i;
> +
> +	va_start(args, fmt);
> +	i = vsnprintf(buffer, PAGE_SIZE, fmt, args);
> +	va_end(args);
> +
> +	a1 = __pa(buffer);
> +	a2 = 0;
> +
> +	ret = kvm_hypercall3(KVM_HC_FTRACE, ip, a1, a2);
> +
> +	kfree(buffer);
> +	return ret;
> +}
> +EXPORT_SYMBOL(kvm_ftrace_printk);
> +
> static void mmu_queue_flush(struct kvm_para_state *state)
> {

--
error compiling committee.c: too many arguments to function

2011-05-12 15:39:51

by Dhaval Giani

Subject: Re: [PATCH] kvm: log directly from the guest to the host kvm buffer

On Thu, May 12, 2011 at 5:13 PM, Avi Kivity <[email protected]> wrote:
> On 05/12/2011 04:36 PM, Dhaval Giani wrote:
>>
>> Hi,
>>
>> As part of some of the work for my project, I have been looking at
>> tracing some of the events in the guest from inside the host. In
>> my use case, I have been looking to correlate the time of a network
>> packet's arrival in the guest with the corresponding time in the host.
>> ftrace makes such arbitrary use quite simple, so I went ahead and
>> extended this functionality via a hypercall. There are still a few
>> issues with this patch.
>>
>> 1. For some reason, the first time the hypercall is called, it works
>> just fine, but the second invocation refuses to happen. I am still
>> clueless about it. (and am looking for hints :-) )
>> 2. I am not very sure if I got the demarcation between the guest and
>> the host code fine or not. Someone more experienced than me should take
>> a look at the code as well :-)
>> 3. This adds a new paravirt call.
>> 4. This has been implemented just for x86 as of now. If there is enough
>> interest, I will look to make it more generic to be used across other
>> architectures. However, it is quite easy to do the same.
>> 5. It still does not have all the fancy ftrace features, but again,
>> depending on the interest, I can add all those in.
>> 6. Create a config option for this feature.
>>
>> I think such a feature is useful for debugging purposes and might make
>> sense to carry upstream.
>
> I guess it could help things like virtio/vhost development and profiling.
>

Exactly what I am using it for.

>
> I think that one hypercall per trace is too expensive. Tracing is meant to
> be lightweight! I think the guest can log to a buffer, which is flushed on
> overflow or when a vmexit occurs. That gives us automatic serialization
> between a vcpu and the cpu it runs on, but not between a vcpu and a
> different host cpu.
>

hmm. So, basically, log all of these events, and then send them to the
host either on an exit, or when your buffer fills up. There is one
problem with this approach though. One of the reasons I wanted this
approach was because I wanted to correlate the guest and the host
times (which is why I kept it synchronous). I lose that
information with what you suggest. However, I see your point about the
overhead. I will think about this a bit more.

>>
>> +int kvm_pv_ftrace(struct kvm_vcpu *vcpu, unsigned long ip, gpa_t addr)
>> +{
>> +	int ret;
>> +	char *fmt = (char *) kzalloc(PAGE_SIZE, GFP_KERNEL);
>> +
>> +	ret = kvm_read_guest(vcpu->kvm, addr, fmt, PAGE_SIZE);
>> +
>> +	trace_printk("KVM instance %p: VCPU %d, IP %lu: %s",
>> +			vcpu->kvm, vcpu->vcpu_id, ip, fmt);
>> +
>> +	kfree(fmt);
>> +
>> +	return 0;
>> +}
>
> A kmalloc and printf seem expensive here. I'd prefer to log the arguments
> and format descriptor instead. Similarly the guest should pass unformatted
> parameters.
>
>> +int kvm_ftrace_printk(unsigned long ip, const char *fmt, ...)
>>

the trace_printk is actually quite cheap (IIRC), but I guess Steve is
the best person to let us know about that. We can avoid the kzalloc
overhead though.

Dhaval

2011-05-12 15:43:04

by Avi Kivity

Subject: Re: [PATCH] kvm: log directly from the guest to the host kvm buffer

On 05/12/2011 06:39 PM, Dhaval Giani wrote:
> >
> > I think that one hypercall per trace is too expensive. Tracing is meant to
> > be lightweight! I think the guest can log to a buffer, which is flushed on
> > overflow or when a vmexit occurs. That gives us automatic serialization
> > between a vcpu and the cpu it runs on, but not between a vcpu and a
> > different host cpu.
> >
>
> hmm. So, basically, log all of these events, and then send them to the
> host either on an exit, or when your buffer fills up. There is one
> problem with this approach though. One of the reasons I wanted this
> approach was because I wanted to correlate the guest and the host
> times (which is why I kept it synchronous). I lose that
> information with what you suggest. However, I see your point about the
> overhead. I will think about this a bit more.

You might use kvmclock to get a zero-exit (but not zero-cost) time which
can be correlated.

Another option is to use xadd on a shared memory area to have a global
counter incremented. However that can be slow on large machines, and is
hard to do securely with multiple guests.
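Something like this for the xadd variant, I suppose (a sketch only; how
the page gets shared between guest and host is not shown, and the names
are invented):

/*
 * Both guest and host map the same page and call this on a counter in
 * it; lock xadd hands out globally ordered sequence numbers that can be
 * stamped into trace records on either side, with no vmexit.
 */
static inline unsigned long global_seq_next(unsigned long *counter)
{
	unsigned long inc = 1;

	asm volatile("lock; xadd %0, %1"
		     : "+r" (inc), "+m" (*counter)
		     : : "memory");
	return inc;	/* previous value of *counter */
}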


--
error compiling committee.c: too many arguments to function

2011-05-13 00:00:07

by Eric Northup

Subject: Re: [PATCH] kvm: log directly from the guest to the host kvm buffer

On Thu, May 12, 2011 at 8:42 AM, Avi Kivity <[email protected]> wrote:
>
> On 05/12/2011 06:39 PM, Dhaval Giani wrote:
>>
>> >
>> > I think that one hypercall per trace is too expensive. Tracing is meant to
>> > be lightweight! I think the guest can log to a buffer, which is flushed on
>> > overflow or when a vmexit occurs. That gives us automatic serialization
>> > between a vcpu and the cpu it runs on, but not between a vcpu and a
>> > different host cpu.
>> >
>>
>> hmm. So, basically, log all of these events, and then send them to the
>> host either on an exit, or when your buffer fills up. There is one
>> problem with this approach though. One of the reasons I wanted this
>> approach was because I wanted to correlate the guest and the host
>> times (which is why I kept it synchronous). I lose that
>> information with what you suggest. However, I see your point about the
>> overhead. I will think about this a bit more.
>
> You might use kvmclock to get a zero-exit (but not zero-cost) time which can be correlated.
>
> Another option is to use xadd on a shared memory area to have a global counter incremented. However that can be slow on large machines, and is hard to do securely with multiple guests.

If the guest puts guest TSC into the buffer with each event, KVM can
convert guest->host time when it drains the buffers on the next
vmexit. That's enough information to do an offline correlation of
guest and host events.
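The host-side conversion would then just undo the vcpu's TSC offset when
it drains the buffer. A sketch (the tsc_offset accessor is hypothetical;
KVM keeps the offset in the VMCS/VMCB and I'm not aware of a generic
getter for it):

static u64 kvm_guest_tsc_to_host(struct kvm_vcpu *vcpu, u64 guest_tsc)
{
	/*
	 * guest_tsc = host_tsc + tsc_offset, so subtract the offset this
	 * vcpu runs with. vcpu->arch.tsc_offset is assumed here and may
	 * not exist under that name.
	 */
	return guest_tsc - vcpu->arch.tsc_offset;
}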

2011-05-17 15:50:39

by Steven Rostedt

Subject: Re: [PATCH] kvm: log directly from the guest to the host kvm buffer

On Thu, 2011-05-12 at 17:39 +0200, Dhaval Giani wrote:

> >>
> >> +int kvm_pv_ftrace(struct kvm_vcpu *vcpu, unsigned long ip, gpa_t addr)
> >> +{
> >> +	int ret;
> >> +	char *fmt = (char *) kzalloc(PAGE_SIZE, GFP_KERNEL);
> >> +
> >> +	ret = kvm_read_guest(vcpu->kvm, addr, fmt, PAGE_SIZE);
> >> +
> >> +	trace_printk("KVM instance %p: VCPU %d, IP %lu: %s",
> >> +			vcpu->kvm, vcpu->vcpu_id, ip, fmt);
> >> +
> >> +	kfree(fmt);
> >> +
> >> +	return 0;
> >> +}
> >
> > A kmalloc and printf seem expensive here. I'd prefer to log the arguments
> > and format descriptor instead. Similarly the guest should pass unformatted
> > parameters.+int kvm_ftrace_printk(unsigned long ip, const char *fmt, ...)
> >>
>
> the trace_printk is actually quite cheap (IIRC), but I guess Steve is
> the best person to let us know about that. We can avoid the kzalloc
> overhead though.

You could replace the kzalloc with a static page. Perhaps, one page per
CPU. Then you just need to disable interrupts (unless this is never
called by an interrupt) to get the information into the page, without
any allocation.
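i.e. roughly this on the guest side (a sketch of that suggestion,
untested; the struct wrapper is only there to make the per-cpu accessors
tidy):

#include <linux/kernel.h>
#include <linux/percpu.h>
#include <linux/irqflags.h>
#include <asm/kvm_para.h>

struct kvm_ftrace_page {
	char buf[PAGE_SIZE];
};

static DEFINE_PER_CPU(struct kvm_ftrace_page, kvm_ftrace_page);

int kvm_ftrace_printk(unsigned long ip, const char *fmt, ...)
{
	struct kvm_ftrace_page *p;
	unsigned long flags;
	va_list args;
	int ret;

	/* No allocation: format into this cpu's page with interrupts off. */
	local_irq_save(flags);
	p = this_cpu_ptr(&kvm_ftrace_page);
	va_start(args, fmt);
	vscnprintf(p->buf, PAGE_SIZE, fmt, args);
	va_end(args);

	ret = kvm_hypercall3(KVM_HC_FTRACE, ip, __pa(p->buf), 0);
	local_irq_restore(flags);

	return ret;
}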

trace_printk() does basically the same thing, but uses a single page
with locks. I've thought about replacing that with per cpu pages too,
but haven't gotten around to it.

I also agree that you should just write to some buffer and when it
overflows, flush it.

-- Steve