From: Yoshihiro YUNOMAE
Date: Tue, 20 Nov 2012 19:36:33 +0900
To: Marcelo Tosatti
Cc: Steven Rostedt, David Sharp, "H. Peter Anvin",
    linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Joerg Roedel,
    Hidehiro Kawai, Ingo Molnar, Avi Kivity,
    yrl.pp-manager.tt@hitachi.com, Masami Hiramatsu, Thomas Gleixner
Subject: Re: Re: Re: [RFC PATCH 0/2] kvm/vmx: Output TSC offset
Message-ID: <50AB5D31.7060305@hitachi.com>

Hi Marcelo,

Sorry for the late reply.

(2012/11/17 4:15), Marcelo Tosatti wrote:
> On Wed, Nov 14, 2012 at 05:26:10PM +0900, Yoshihiro YUNOMAE wrote:
>> Thank you for commenting on my patch set.
>>
>> (2012/11/14 11:31), Steven Rostedt wrote:
>>> On Tue, 2012-11-13 at 18:03 -0800, David Sharp wrote:
>>>> On Tue, Nov 13, 2012 at 6:00 PM, Steven Rostedt wrote:
>>>>> On Wed, 2012-11-14 at 10:36 +0900, Yoshihiro YUNOMAE wrote:
>>>>>
>>>>>> To merge the data like previous pattern, we apply this patch set.
>>>>>> Then, we can get TSC offset of the guest as follows:
>>>>>>
>>>>>> $ dmesg | grep kvm
>>>>>> [   57.717180] kvm: (2687) write TSC offset 18446743360465545001, now clock ##
>>>>>>                      ^^^^                   ^^^^^^^^^^^^^^^^^^^^            |
>>>>>>                      PID                    TSC offset                      |
>>>>>>                                                            HOST TSC value --+
>>>>>>
>>>>>
>>>>> Using printk to export something like this is IMO a nasty hack.
>>>>>
>>>>> Can't we create a /sys or /proc file to export the same thing?
>>>>
>>>> Since the value changes over the course of the trace, and seems to be
>>>> part of the context of the trace, I think I'd include it as a
>>>> tracepoint.
>>>>
>>>
>>> I'm fine with that too.
>>
>> Using some tracepoint is a nice idea, but there is one problem. Here,
>> our discussion point is "the event which TSC offset is changed does not
>> frequently occur, but the buffer must keep the event data."
>>
>> There are two ideas for using tracepoint. First, we define new
>> tracepoint for changed TSC offset. This is simple and the overhead will
>> be low. However, this trace event stored in the buffer will be
>> overwritten by other trace events because this TSC offset event does
>> not frequently occur. Second, we add TSC offset information to the
>> tracepoint frequently occurred. For example, we assume that TSC offset
>> information is added to arguments of trace_kvm_exit().
>
> The TSC offset is in the host trace. So given a host trace with two TSC
> offset updates, how do you know which events in the guest trace
> (containing a number of events) refer to which tsc offset update?
>
> Unless i am missing something, you can't solve this easily (well, except
> exporting information to the guest that allows it to transform RDTSC ->
> host TSC value, which can be done via pvclock).

As you say, TSC offset events are in the host trace, but we don't need
to notify guests of TSC offset updates. The offset event outputs the
new TSC offset value and the current host TSC value, so we can
calculate the guest TSC (T1) at the event.
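As a rough sketch of this calculation (not part of the patch set), the
Python below assumes the usual VMX relation "guest TSC = host TSC + TSC
offset" modulo 2^64, with no TSC scaling. The function names are
hypothetical and chosen only for illustration. It also assumes that the
guest TSC does not move backward across offset updates, so that the T1
values are increasing.

```python
from bisect import bisect_right

MASK64 = (1 << 64) - 1  # TSC values and offsets are 64-bit and wrap around


def guest_tsc_at(host_tsc, tsc_offset):
    """Guest TSC (T1) at the moment the host TSC reads host_tsc.

    Assumes guest TSC = host TSC + offset (mod 2^64), no TSC scaling.
    """
    return (host_tsc + tsc_offset) & MASK64


def host_tsc_of(guest_tsc, tsc_offset):
    """Inverse conversion: host TSC for a guest TSC under one offset."""
    return (guest_tsc - tsc_offset) & MASK64


def convert_guest_events(guest_tscs, offset_updates):
    """Convert guest TSC timestamps to host TSC timestamps.

    offset_updates is a list of (host_tsc_at_update, new_offset) pairs in
    host-trace order. Each guest event uses the offset of the last update
    whose T1 is <= the event's guest TSC. Events recorded before the
    first known update cannot be converted and yield None.
    """
    t1s = [guest_tsc_at(h, off) for h, off in offset_updates]
    offsets = [off for _, off in offset_updates]
    converted = []
    for g in guest_tscs:
        i = bisect_right(t1s, g) - 1  # index of the offset in effect
        converted.append(None if i < 0 else host_tsc_of(g, offsets[i]))
    return converted
```

For example, the offset 18446743360465545001 from the dmesg line quoted
above is -713244006615 as a signed 64-bit value. With a made-up host TSC
of 713244007615 at the update, T1 is 1000, and a guest event stamped
2500 maps to host TSC 713244009115.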
Guest TSCs since T1 can be converted to host TSC using the TSC offset,
so we can integrate those trace data.

> Another issue as mentioned is lack of TSC synchronization in the host.
> Should you provide such a feature without the possibility of proper
> chronological order on systems with unsynchronized TSC?

I think we cannot support this TSC-based sorting feature on systems
with unsynchronized TSC. On such systems, TSC cannot reliably order
even the trace data of a single guest or of the host alone, let alone
the combined trace data of guests and the host.

Actually, if we want to output ftrace data in chronological order with
unsynchronized TSC, we use the "global" mode as the timestamp. The
global mode uses the wall clock with a TSC-based correction, so it
guarantees chronological order for the trace data of the guest or of
the host. If we use this mode to sort the trace data of guests and the
host together, we need to consider the clock difference between the
guest and the host as well as the timekeeping of each, so these issues
are difficult to solve. At least, I haven't come up with a good
solution.

We cannot sort the trace data of guests and the host in chronological
order with unsynchronized TSC, but if we can set the following
synchronization events for both guests and the host, we will know where
the two traces should line up. First, the guest and the host both use
the global mode as the ftrace timestamp. Next, a user on the guest
writes "1" to the synchronization I/F as the ID, and the
synchronization event "1" is recorded in the ring buffer of the guest.
The synchronization operation induces a hypercall, so the host can
handle the event. After the operation moves to the host, the host
records the event "1" in the ring buffer of the host. In the end, the
operation returns to the guest, and the synchronization is finished.

When we integrate the tracing data of the guest and the host, we
calculate the difference between the timestamps of the synchronization
events with the same ID. This value is a temporary "offset". We use it
to convert guest timestamps to host timestamps until the next
synchronization event. If the interval between synchronization events
is very short, we will not need to consider the timekeeping. Then, we
can sort the trace data in chronological order.

Would you comment on this, or do you have another idea?

Thanks,

--
Yoshihiro YUNOMAE
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: yoshihiro.yunomae.ez@hitachi.com