Date: Mon, 10 Jun 2013 11:04:24 -0300
From: Marcelo Tosatti
To: Gleb Natapov
Cc: Yoshihiro YUNOMAE, linux-kernel@vger.kernel.org, "H. Peter Anvin", David Sharp, Steven Rostedt, Hidehiro Kawai, Ingo Molnar, yrl.pp-manager.tt@hitachi.com, Masami Hiramatsu, Thomas Gleixner
Subject: Re: Re: Re: [PATCH V2 1/1] kvm/vmx: Add a tracepoint write_tsc_offset
Message-ID: <20130610140424.GA25632@amt.cnet>
References: <20130604083616.22713.24922.stgit@yunodevel> <20130604083619.22713.25360.stgit@yunodevel> <20130606002322.GA24351@amt.cnet> <20130606113305.GB4725@redhat.com> <51B16E0E.5020208@hitachi.com> <20130609111442.GP4725@redhat.com> <51B59CC2.3060707@hitachi.com> <20130610100505.GB4725@redhat.com>
In-Reply-To: <20130610100505.GB4725@redhat.com>

On Mon, Jun 10, 2013 at 01:05:05PM +0300, Gleb Natapov wrote:
> On Mon, Jun 10, 2013 at 06:30:42PM +0900, Yoshihiro YUNOMAE wrote:
> > Hi Gleb,
> >
> > (2013/06/09 20:14), Gleb Natapov wrote:
> > >On Fri, Jun 07, 2013 at 02:22:22PM +0900, Yoshihiro YUNOMAE wrote:
> > >>(2013/06/06 20:33), Gleb Natapov wrote:
> > >>>On Wed, Jun 05, 2013 at 09:23:22PM -0300, Marcelo Tosatti wrote:
> > >>>>On Tue, Jun 04, 2013 at 05:36:19PM +0900, Yoshihiro YUNOMAE wrote:
> > >>>>>Add a tracepoint write_tsc_offset for tracing TSC offset change.
> > >>>>>We want to merge ftrace's trace data of guest OSs and the host OS using
> > >>>>>TSC for timestamp in chronological order. We need "TSC offset" values for
> > >>>>>each guest when merge those because the TSC value on a guest is always the
> > >>>>>host TSC plus guest's TSC offset. If we get the TSC offset values, we can
> > >>>>>calculate the host TSC value for each guest events from the TSC offset and
> > >>>>>the event TSC value. The host TSC values of the guest events are used when we
> > >>>>>want to merge trace data of guests and the host in chronological order.
> > >>>>>(Note: the trace_clock of both the host and the guest must be set x86-tsc in
> > >>>>>this case)
> > >>>>>
> > >>>>>TSC offset is stored in the VMCS by vmx_write_tsc_offset() or
> > >>>>>vmx_adjust_tsc_offset(). KVM executes the former function when a guest boots.
> > >>>>>The latter function is executed when kvm clock is updated. Only host can read
> > >>>>>TSC offset value from VMCS, so a host needs to output TSC offset value
> > >>>>>when TSC offset is changed.
> > >>>>>
> > >>>>>Since the TSC offset is not often changed, it could be overwritten by other
> > >>>>>frequent events while tracing. To avoid that, I recommend to use a special
> > >>>>>instance for getting this event:
> > >>>>>
> > >>>>>1. set a instance before booting a guest
> > >>>>> # cd /sys/kernel/debug/tracing/instances
> > >>>>> # mkdir tsc_offset
> > >>>>> # cd tsc_offset
> > >>>>> # echo x86-tsc > trace_clock
> > >>>>> # echo 1 > events/kvm/kvm_write_tsc_offset/enable
> > >>>>>
> > >>>>>2. boot a guest
> > >>>>>
> > >>>>>Signed-off-by: Yoshihiro YUNOMAE
> > >>>>>Cc: Marcelo Tosatti
> > >>>>>Cc: Gleb Natapov
> > >>>>>Cc: Thomas Gleixner
> > >>>>>Cc: Ingo Molnar
> > >>>>>Cc: "H. Peter Anvin"
> > >>>>>---
> > >>>>> arch/x86/kvm/trace.h | 18 ++++++++++++++++++
> > >>>>> arch/x86/kvm/vmx.c   |  3 +++
> > >>>>> arch/x86/kvm/x86.c   |  1 +
> > >>>>> 3 files changed, 22 insertions(+)
> > >>>>>
> > >>>>>diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
> > >>>>>index fe5e00e..9c22e39 100644
> > >>>>>--- a/arch/x86/kvm/trace.h
> > >>>>>+++ b/arch/x86/kvm/trace.h
> > >>>>>@@ -815,6 +815,24 @@ TRACE_EVENT(kvm_track_tsc,
> > >>>>> 		  __print_symbolic(__entry->host_clock, host_clocks))
> > >>>>> );
> > >>>>>
> > >>>>>+TRACE_EVENT(kvm_write_tsc_offset,
> > >>>>>+	TP_PROTO(__u64 previous_tsc_offset, __u64 next_tsc_offset),
> > >>>>>+	TP_ARGS(previous_tsc_offset, next_tsc_offset),
> > >>>>>+
> > >>>>>+	TP_STRUCT__entry(
> > >>>>>+		__field( __u64, previous_tsc_offset )
> > >>>>>+		__field( __u64, next_tsc_offset )
> > >>>>>+	),
> > >>>>>+
> > >>>>>+	TP_fast_assign(
> > >>>>>+		__entry->previous_tsc_offset = previous_tsc_offset;
> > >>>>>+		__entry->next_tsc_offset = next_tsc_offset;
> > >>>>>+	),
> > >>>>>+
> > >>>>>+	TP_printk("previous=%llu next=%llu",
> > >>>>>+		  __entry->previous_tsc_offset, __entry->next_tsc_offset)
> > >>>>>+);
> > >>>>>+
> > >>>>
> > >>>>Yoshihiro YUNOMAE,
> > >>>>
> > >>>>1) Why is previous_tsc_offset necessary?
> > >>
> > >>I was considering the situations where we did not enable
> > >>kvm_write_tsc_offset event before booting a guest or where we did not
> > >>use multiple buffers. Here, we will need another new I/F to get current
> > >>TSC offset of a given VCPU. For example, if kvm_write_tsc_offset is not
> > >>included in the host's trace data, we get the current TSC offset from
> > >>the new I/F and apply it to all guest events. On the other hand, if
> > >>kvm_write_tsc_offset event appears more than once, we apply the
> > >>previous offset to guest events before the first TSC offset change.
> > >>
> > >>Since we support only for using multiple buffers now, we don't need to
> > >>record previous TSC offset at this time. But I'm conscious that we have
> > >>to change the format of kvm_write_tsc_offset event when we support
> > >>those situations.
> > >>
> > >>>>2) The TSC offset traces should include vcpu number, so that its
> > >>>>possible to correlate traces of SMP guests (the tool should use
> > >>>>the individual vcpu tsc offsets when converting guests trace).
> > >>>>
> > >>>Why PID is not enough? No other trace, except kvm_entry, outputs vcpu id.
> > >>
> > >>As Gleb mentioned, a tool can understand TSC offset for each vcpu from
> > >>PID and vcpu number of kvm_entry. IMO, that is indirect way, so I would
> > >>be better off including vcpu number.
> > >>
> > >But doesn't the tool operates on vcpu's PID for all other events. I mean to
> > >figure out what vcpu an event belongs too during merge. Why tsc offset
> > >event is different?
> >
> > In vcpu_load()@virt/kvm/kvm_main.c, it seems that PID of the vcpu thread
> > can be changed. Are you familiar with this situation?
> Recommended way of using KVM API is to have dedicated thread per vcpu
> and this is how all known userspace implementations use it, but having
> one thread drive several vcpus (not simultaneously obviously) also
> works, but not recommended.
>
> > If the situation can be occurred, outputting vcpu number is better, I
> > think. If not occurred, as you say, we will be able to merge those data
> > without vcpu number in write_tsc_offset event.
> The thing is that all other traces that you want to merge do not contain
> vcpu number, only pid, so if the situation occurs how do you merge the
> data?

Guest traces contain vcpu number and not pid (because guest is unaware
of host PID).

> > However, when we
> > focus on output data of the write_tsc_offset event, it is difficult to
> > directly understand contents of the data if vcpu number information is
> > not included. So, including the information is useful, I think.
> >
> How your tool does it now?

It merges the guest trace with the host trace by converting the TSC
timestamps in the guest trace to host TSC using the tsc_offset information.

If the vcpu ID is not recorded in the tsc_offset trace, the tool must be
supplied with PID<->VCPU_id tuples for translation. That is an additional
step, and it makes the trace merge impossible when that information is not
available.
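
To make the conversion concrete, here is a rough sketch of what the merge
step could look like if the tsc_offset event carried a vcpu ID. It is not
the actual tool. The vcpu_id field is hypothetical (the posted patch does
not record it), the tuple layouts and function names are made up for
illustration, and it assumes a single stable offset per vcpu for the whole
trace. Offsets are treated as unsigned 64-bit values, as printed by %llu,
so the subtraction wraps modulo 2^64.

```python
import heapq
from dataclasses import dataclass

MASK64 = (1 << 64) - 1  # TSC values and offsets are 64-bit; arithmetic wraps


@dataclass(frozen=True)
class OffsetChange:
    """One kvm_write_tsc_offset event as seen in the host trace."""
    host_tsc: int     # host TSC timestamp of the event
    vcpu_id: int      # hypothetical field, not present in the posted patch
    next_offset: int  # next_tsc_offset as an unsigned 64-bit value


def guest_to_host_tsc(guest_tsc, tsc_offset):
    """guest TSC = host TSC + offset (mod 2**64), so host = guest - offset."""
    return (guest_tsc - tsc_offset) & MASK64


def latest_offsets(changes):
    """Map vcpu_id -> most recent offset (assumes one stable offset per vcpu)."""
    offsets = {}
    for change in sorted(changes, key=lambda c: c.host_tsc):
        offsets[change.vcpu_id] = change.next_offset
    return offsets


def merge_traces(host_events, guest_events, offsets):
    """Merge host and guest events in host-TSC order.

    host_events:  iterable of (host_tsc, text)
    guest_events: iterable of (guest_tsc, vcpu_id, text)
    Returns a list of (host_tsc, origin, text) tuples.
    """
    tagged_host = sorted((tsc, "host", text) for tsc, text in host_events)
    converted = sorted(
        (guest_to_host_tsc(tsc, offsets[vcpu]), "guest", text)
        for tsc, vcpu, text in guest_events
    )
    return list(heapq.merge(tagged_host, converted))
```

With the vcpu ID in the event, the lookup offsets[vcpu] needs nothing from
the host side beyond the tsc_offset events themselves. Without it, the
dictionary would have to be keyed by host PID, and the guest events (which
only know their vcpu number) could not be matched without a separate
PID<->VCPU_id table.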