Date: Fri, 7 Jun 2013 18:55:42 -0300
From: Marcelo Tosatti
To: Yoshihiro YUNOMAE
Cc: Gleb Natapov, linux-kernel@vger.kernel.org, "H. Peter Anvin",
 David Sharp, Steven Rostedt, Hidehiro Kawai, Ingo Molnar,
 yrl.pp-manager.tt@hitachi.com, Masami Hiramatsu, Thomas Gleixner
Subject: Re: Re: [PATCH V2 1/1] kvm/vmx: Add a tracepoint write_tsc_offset
Message-ID: <20130607215542.GA22131@amt.cnet>
References: <20130604083616.22713.24922.stgit@yunodevel>
 <20130604083619.22713.25360.stgit@yunodevel>
 <20130606002322.GA24351@amt.cnet>
 <20130606113305.GB4725@redhat.com>
 <51B16E0E.5020208@hitachi.com>
In-Reply-To: <51B16E0E.5020208@hitachi.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jun 07, 2013 at 02:22:22PM +0900, Yoshihiro YUNOMAE wrote:
> (2013/06/06 20:33), Gleb Natapov wrote:
> >On Wed, Jun 05, 2013 at 09:23:22PM -0300, Marcelo Tosatti wrote:
> >>On Tue, Jun 04, 2013 at 05:36:19PM +0900, Yoshihiro YUNOMAE wrote:
> >>>Add a tracepoint write_tsc_offset for tracing TSC offset change.
> >>>We want to merge ftrace's trace data of guest OSs and the host OS using
> >>>TSC for timestamp in chronological order. We need "TSC offset" values for
> >>>each guest when merge those because the TSC value on a guest is always the
> >>>host TSC plus guest's TSC offset. If we get the TSC offset values, we can
> >>>calculate the host TSC value for each guest events from the TSC offset and
> >>>the event TSC value. The host TSC values of the guest events are used when we
> >>>want to merge trace data of guests and the host in chronological order.
> >>>(Note: the trace_clock of both the host and the guest must be set x86-tsc in
> >>>this case)
> >>>
> >>>TSC offset is stored in the VMCS by vmx_write_tsc_offset() or
> >>>vmx_adjust_tsc_offset(). KVM executes the former function when a guest boots.
> >>>The latter function is executed when kvm clock is updated. Only host can read
> >>>TSC offset value from VMCS, so a host needs to output TSC offset value
> >>>when TSC offset is changed.
> >>>
> >>>Since the TSC offset is not often changed, it could be overwritten by other
> >>>frequent events while tracing. To avoid that, I recommend to use a special
> >>>instance for getting this event:
> >>>
> >>>1. set a instance before booting a guest
> >>> # cd /sys/kernel/debug/tracing/instances
> >>> # mkdir tsc_offset
> >>> # cd tsc_offset
> >>> # echo x86-tsc > trace_clock
> >>> # echo 1 > events/kvm/kvm_write_tsc_offset/enable
> >>>
> >>>2. boot a guest
> >>>
> >>>Signed-off-by: Yoshihiro YUNOMAE
> >>>Cc: Marcelo Tosatti
> >>>Cc: Gleb Natapov
> >>>Cc: Thomas Gleixner
> >>>Cc: Ingo Molnar
> >>>Cc: "H. Peter Anvin"
> >>>---
> >>> arch/x86/kvm/trace.h | 18 ++++++++++++++++++
> >>> arch/x86/kvm/vmx.c   |  3 +++
> >>> arch/x86/kvm/x86.c   |  1 +
> >>> 3 files changed, 22 insertions(+)
> >>>
> >>>diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
> >>>index fe5e00e..9c22e39 100644
> >>>--- a/arch/x86/kvm/trace.h
> >>>+++ b/arch/x86/kvm/trace.h
> >>>@@ -815,6 +815,24 @@ TRACE_EVENT(kvm_track_tsc,
> >>>           __print_symbolic(__entry->host_clock, host_clocks))
> >>> );
> >>>
> >>>+TRACE_EVENT(kvm_write_tsc_offset,
> >>>+    TP_PROTO(__u64 previous_tsc_offset, __u64 next_tsc_offset),
> >>>+    TP_ARGS(previous_tsc_offset, next_tsc_offset),
> >>>+
> >>>+    TP_STRUCT__entry(
> >>>+        __field( __u64, previous_tsc_offset )
> >>>+        __field( __u64, next_tsc_offset )
> >>>+    ),
> >>>+
> >>>+    TP_fast_assign(
> >>>+        __entry->previous_tsc_offset = previous_tsc_offset;
> >>>+        __entry->next_tsc_offset = next_tsc_offset;
> >>>+    ),
> >>>+
> >>>+    TP_printk("previous=%llu next=%llu",
> >>>+          __entry->previous_tsc_offset, __entry->next_tsc_offset)
> >>>+);
> >>>+
> >>
> >>Yoshihiro YUNOMAE,
> >>
> >>1) Why is previous_tsc_offset necessary?
>
> I was considering the situations where we did not enable
> kvm_write_tsc_offset event before booting a guest or where we did not
> use multiple buffers. Here, we will need another new I/F to get current
> TSC offset of a given VCPU. For example, if kvm_write_tsc_offset is not
> included in the host's trace data, we get the current TSC offset from
> the new I/F and apply it to all guest events. On the other hand, if
> kvm_write_tsc_offset event appears more than once, we apply the
> previous offset to guest events before the first TSC offset change.

OK.

> Since we support only for using multiple buffers now, we don't need to
> record previous TSC offset at this time. But I'm conscious that we have
> to change the format of kvm_write_tsc_offset event when we support
> those situations.

OK, feel free to keep prev_tsc_offset.

> >>2) The TSC offset traces should include vcpu number, so that its
> >>possible to correlate traces of SMP guests (the tool should use
> >>the individual vcpu tsc offsets when converting guests trace).
> >>
> >Why PID is not enough? No other trace, except kvm_entry, outputs vcpu id.
>
> As Gleb mentioned, a tool can understand TSC offset for each vcpu from
> PID and vcpu number of kvm_entry. IMO, that is indirect way, so I would
> be better off including vcpu number.

Yes.

> >>3) Please add traces for svm.c.
>
> Sure, I'll add the tracepoint for SVM.

OK, no further comments for now.
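
For reference, the conversion described in the quoted patch description (the guest TSC is the host TSC plus the guest's TSC offset) can be sketched in user space as below. This is only an illustration under assumptions, not the merge tool discussed in the thread: the names guest_to_host_tsc and merge_traces, the (tsc, text) event tuples, and the use of a single offset for all guest events are hypothetical. It also assumes the offset is read as the unsigned 64-bit value the tracepoint prints with %llu, so the subtraction is done modulo 2**64.

```python
MASK64 = (1 << 64) - 1  # TSC values and TSC offsets are 64 bits wide


def guest_to_host_tsc(guest_tsc: int, tsc_offset: int) -> int:
    """Invert guest_tsc = host_tsc + tsc_offset (mod 2**64)."""
    return (guest_tsc - tsc_offset) & MASK64


def merge_traces(host_events, guest_events, tsc_offset):
    """Merge (tsc, text) event lists into one list ordered by host TSC.

    Assumes one offset applies to every guest event (hypothetical
    simplification: no offset change and a single vcpu).
    """
    tagged_host = [(tsc, "host", text) for tsc, text in host_events]
    converted = [
        (guest_to_host_tsc(tsc, tsc_offset), "guest", text)
        for tsc, text in guest_events
    ]
    return sorted(tagged_host + converted)


if __name__ == "__main__":
    # A negative offset of -1000 as it would appear when printed unsigned.
    offset = (-1000) & MASK64
    host = [(5000, "kvm_entry"), (5300, "kvm_exit")]
    guest = [(4100, "sched_switch"), (4200, "sys_enter")]
    for tsc, origin, text in merge_traces(host, guest, offset):
        print(tsc, origin, text)
```

Handling several kvm_write_tsc_offset events, or per-vcpu offsets as discussed above, would require picking the offset valid for each guest event rather than one fixed value.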