Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754492AbYLEIFK (ORCPT ); Fri, 5 Dec 2008 03:05:10 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1750852AbYLEIE5 (ORCPT ); Fri, 5 Dec 2008 03:04:57 -0500 Received: from smtp-out.google.com ([216.239.45.13]:19287 "EHLO smtp-out.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750839AbYLEIE5 (ORCPT ); Fri, 5 Dec 2008 03:04:57 -0500 DomainKey-Signature: a=rsa-sha1; s=beta; d=google.com; c=nofws; q=dns; h=mime-version:in-reply-to:references:date:message-id:subject:from:to: cc:content-type:content-transfer-encoding; b=TmPkZOhuMTW04g2wY+1o+1Nhyj5DWb59WdAYz6Yy2pTls9otmaQOpP3cJiYZhh+xI DPvfuI7ooQs90gRyWimig== MIME-Version: 1.0 In-Reply-To: <5df78e1d0812022130g7c5a004bnd729d0a8454f24ed@mail.gmail.com> References: <5df78e1d0812021626y68c7241ahfdba4d37c7840e10@mail.gmail.com> <1228278908.4886.49.camel@localhost.localdomain> <5df78e1d0812022130g7c5a004bnd729d0a8454f24ed@mail.gmail.com> Date: Fri, 5 Dec 2008 00:04:54 -0800 Message-ID: <5df78e1d0812050004q505cba8dnee39997a84d67847@mail.gmail.com> Subject: Re: [RFC PATCH 3/3] kernel tracing prototype From: Jiaying Zhang To: Steven Rostedt Cc: linux-kernel@vger.kernel.org, Michael Rubin , Martin Bligh , Michael Davidson , Andrew Morton Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5738 Lines: 153 To better answer the question why we want to implement a new kernel tracing prototype, here are some performance results we collected before with the tbench benchmark. We were considering to use the existing marker code then and had built a simple prototype that allows us to measure the performance overhead from different layers. - vanilla 2.6.26 kernel, CONFIG_MARKERS=n Throughput 759.352 MB/sec 4 - markers compiled in, tracing disabled Throughput 754.18 MB/sec 4 - tracing syscall entry/exit, use markers, not logging data to ring_buffer Throughput 715.68 MB/sec 4 - tracing syscall entry/exit, use markers, logging data to ring_buffer Throughput 654.056 MB/sec 4 - tracing syscall entry/exit, bypass markers, not logging data to ring_buffer Throughput 735.575 MB/sec 4 - tracing syscall entry/exit, bypass markers, logging data to ring_buffer Throughput 667.68 MB/sec 4 We found that the marker layer alone adds about 3% overhead while tracing is enabled. This overhead is mostly from the extra code and function calls added by the markers code. In our use case, we don't need several features in the current marker code, like the support for multiple probe functions and the use of va_arg to allow arbitrary trace format, but we have very strict performance requirements. So we implemented our own trace instrumentation code that has many similarities as the markers but is simpler and thus we think is easier to be made more efficient. We also found that logging data to the unified trace buffer adds about 10% overhead. After taking a close look, we found that there are two extra overheads. One is the extra function calls introduced by the unified trace buffer. That is why we are considering to add the inline trace write functions in the unified trace buffer. The other is the overhead on obtaining the timestamps. I saw that the unified trace buffer code uses sched_clock to obtain timestamps. For some reason, this function alone adds about 2~3% overhead on my testing machine. The above numbers were measured on one multi-core system. I am going to run more performance tests on more systems and will keep you updated. Jiaying On Tue, Dec 2, 2008 at 9:30 PM, Jiaying Zhang wrote: > > On Tue, Dec 2, 2008 at 8:35 PM, Steven Rostedt wrote: > > > > On Tue, 2008-12-02 at 16:26 -0800, Jiaying Zhang wrote: > >> Refer to the previous email > >> "[RFC PATCH 0/3] A couple of feature requests to the unified trace > >> buffer". > >> > >> A kernel tracing prototype that uses the unified trace buffer to keep > >> the collected trace data. > >> > > > >> Index: linux-2.6.26.2/include/linux/ktrace.h > >> =================================================================== > >> --- /dev/null 1970-01-01 00:00:00.000000000 +0000 > >> +++ linux-2.6.26.2/include/linux/ktrace.h 2008-11-25 > >> 11:32:37.000000000 -0800 > >> @@ -0,0 +1,73 @@ > >> +#ifndef _LINUX_KTRACE_H > >> +#define _LINUX_KTRACE_H > >> + > >> +#include > >> + > >> +struct kernel_trace; > >> + > >> +typedef void ktrace_probe_func(struct kernel_trace *, void *, > >> size_t); > >> + > >> +struct kernel_trace { > >> + const char *name; > >> + const char *format; /* format string describing variable > >> list */ > >> + u32 enabled:1, event_id:31; > >> + ktrace_probe_func *func; /* probe function */ > >> + struct list_head list; /* list head linked to the hash table > >> entry */ > >> +} __attribute__((aligned(8))); > >> + > >> +extern int ktrace_enabled; > >> + > >> +/* > >> + * Make sure the alignment of the structure in the __ktrace section > >> will > >> + * not add unwanted padding between the beginning of the section and > >> the > >> + * structure. Force alignment to the same alignment as the section > >> start. > >> + */ > >> + > >> +#define DEFINE_KTRACE_STRUCT(name) struct ktrace_struct_##name > >> + > >> +#ifdef CONFIG_KTRACE > >> +#define DO_TRACE(name, format, args...) > >> \ > >> + do { > >> \ > >> + static const char __kstrtab_##name[] > >> \ > >> + __attribute__((section("__ktrace_strings"))) > >> \ > >> + = #name "\0" format; > >> \ > >> + static struct kernel_trace __ktrace_##name > >> \ > >> + __attribute__((section("__ktrace"), aligned(8))) = > >> \ > >> + { __kstrtab_##name, &__kstrtab_##name[sizeof(#name)], > >> \ > >> + 0, 0, NULL, LIST_HEAD_INIT(__ktrace_##name.list) }; > >> \ > >> + __ktrace_check_format(format, ## args); > >> \ > >> + if (unlikely(ktrace_enabled) && > >> \ > >> + unlikely(__ktrace_##name.enabled)) > >> { \ > >> + struct ktrace_struct_##name karg = { args }; > >> \ > >> + (*__ktrace_##name.func) > >> \ > >> + (&__ktrace_##name, &karg, sizeof(karg)); > >> \ > >> + } > >> \ > > > > This looks like another form of markers/trace-points. Why not use them? > Yes. It borrows many ideas from the marker code. But we don't intend > to be as general as markers and lttng, which allows us to make many > simplifications and we think will give us performance benefits. > > Jiaying > > > >> + } while (0) > >> + > >> +#else /* !CONFIG_KTRACE */ > >> +#define DO_TRACE(name, format, args...) > >> +#endif > >> + > > > > -- Steve > > > > > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/