Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756362AbZCFTLT (ORCPT ); Fri, 6 Mar 2009 14:11:19 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1754532AbZCFTLA (ORCPT ); Fri, 6 Mar 2009 14:11:00 -0500 Received: from tomts10-srv.bellnexxia.net ([209.226.175.54]:53501 "EHLO tomts10-srv.bellnexxia.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754498AbZCFTK7 (ORCPT ); Fri, 6 Mar 2009 14:10:59 -0500 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ah0FAMoAsUlMQW1W/2dsb2JhbACBTtUOgVOCNQY Date: Fri, 6 Mar 2009 14:10:52 -0500 From: Mathieu Desnoyers To: Steven Rostedt Cc: linux-kernel@vger.kernel.org, Ingo Molnar , Andrew Morton , Peter Zijlstra , Frederic Weisbecker , Theodore Tso , Arjan van de Ven , Pekka Paalanen , Arnaldo Carvalho de Melo , "H. Peter Anvin" , Martin Bligh , "Frank Ch. Eigler" , Tom Zanussi , Masami Hiramatsu , KOSAKI Motohiro , Jason Baron , Christoph Hellwig , Jiaying Zhang , Eduard - Gabriel Munteanu , mrubin@google.com, md@google.com Subject: Re: [PATCH 0/5] [RFC] binary reading of ftrace ring buffers Message-ID: <20090306191052.GB16067@Krystal> References: <20090304024921.153061228@goodmis.org> <20090304170015.GB1150@Krystal> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Content-Disposition: inline In-Reply-To: X-Editor: vi X-Info: http://krystal.dyndns.org:8080 X-Operating-System: Linux/2.6.21.3-grsec (i686) X-Uptime: 14:05:16 up 6 days, 15:31, 2 users, load average: 1.16, 1.06, 0.93 User-Agent: Mutt/1.5.18 (2008-05-17) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 6158 Lines: 147 * Steven Rostedt (rostedt@goodmis.org) wrote: > > On Wed, 4 Mar 2009, Mathieu Desnoyers wrote: > > > > > > From previous patches, we have in include/trace/sched_event_types.h: > > > > > > #undef TRACE_SYSTEM > > > #define TRACE_SYSTEM sched > > > > > > TRACE_EVENT_FORMAT(sched_switch, > > > TPPROTO(struct rq *rq, struct task_struct *prev, > > > struct task_struct *next), > > > TPARGS(rq, prev, next), > > > TPFMT("task %s:%d ==> %s:%d", > > > prev->comm, prev->pid, next->comm, next->pid), > > > TRACE_STRUCT( > > > TRACE_FIELD(pid_t, prev_pid, prev->pid) > > > TRACE_FIELD(int, prev_prio, prev->prio) > > > TRACE_FIELD_SPECIAL(char next_comm[TASK_COMM_LEN], > > > next_comm, > > > TPCMD(memcpy(TRACE_ENTRY->next_comm, > > > next->comm, > > > TASK_COMM_LEN))) > > > TRACE_FIELD(pid_t, next_pid, next->pid) > > > TRACE_FIELD(int, next_prio, next->prio) > > > ), > > > TPRAWFMT("prev %d:%d ==> next %s:%d:%d") > > > ); > > > > > > > I fear that putting these user-visible format strings in tracepoint > > header files will create a big maintainability issue. > > > > I'll post the LTTng patchset in a jiffy, where the format string > > awareness is done in a tracer-specific module. I don't understand why > > Peter Z. is not yelling against your tracepoints modifications : they > > are actually presenting to userspace an interface that is meant to > > eventually change. > > > > I used a separate layer for format string presentation for this very > > purpose : I don't want to tie the kernel code instrumentation > > (tracepoints) to any kind of user-visible API. > > > > Hi Mathieu, > > Sorry for the late reply, I started to reply but was distracted, and sadly > forgot about it :-( (and I rebooted the box that had the reply window > open) > > > Just a couple of points to make. > > 1) The raw format string is a "hint" for userspace to read the buffers. > The offset and sizeof is the real data that userspace should use. The > print format can be just a way to help them out. But it is by no means > something that is expected to be constant. > > It is also used internal by ftrace because it needs a way to read the > "fast C style recording" records. It is better to print some text than to > have a bunch of hex print out in the screen. Say you have the C style > recording enabled and you take an oops. With ftrace_dumps_on_oops set, the > buffers will be printed to the console. This lets ftrace have a means to > print out the events. > > 2) The difference between this and markers is that the format string is > bound to the declaration in the trace header. Not in the code itself. The > maintainer should only see the trace point function call. It should only > look like a function call and not some funky printf string that the > maintainer might not care about. > > If the code changes and it breaks the format string, it is up to the > tracing maintainers to fix it. If the format string was embedded within > the code, then the maintainer would be burdoned with that duty. But here > the format string is in the include/trace directory and any breakage there > should definitely be fixed by a some tracing maintainer. Not the > maintainer of the subsystem. Unless of course that maintainer wants to fix > it ;-) > > > 3) TRACE_EVENT_FORMAT is an extension to TRACE_FORMAT. Developers may > choose to use the more simple, but a bit more expensive TRACE_FORMAT if > they choose to. Perhaps it may be more prudent to anyway. Recently > Peter Zijlstra was tracing locks and just wanted to trace the name. > There's really no difference in copying the name via a C style (strcpy) or > having a printf formatting do the work. Here is a case where just using > TRACE_FORMAT is a reasonable solution. > > 4) I actually avoided moving the format string into the tracing utility. > The tracing utility can ignore it if it wants. I orginially had the > headers included in the kernel/trace/events.c file and found that was too > cumbersome. I did not like the fact that I needed to go into some tracing > infrastructure to add an event. It was a burdon to me to add a new event > and I wrote the code! That's why I moved it all into the include/trace > directory. One stop shopping. > > A developer to add new events only needs to add the trace points in their > code and then update the files (perhaps add new ones) in include/trace. No > need to dig through the kernel source to find out where else they need to > go. > > > I see that you published your patch queue. I've looked at most of the > patches already. I'll make my comments shortly. > Hi Steve, I totally agree that we want something like a format string to identify the data we export. My disagreement is on the location where such identification belongs. I have the funny feeling that you are currently creating no less than full-blown callbacks expressed as a weird, custom, macros-based, header definition, and I think it's seriously wrong. To paraphrase Andrew Morton : we are writing in C. Let's keep it that way. This is why I strongly prefer to use the tracepoint callback -> marker format string approach rather that putting any sort of format string in the tracepoint header. If what you really worry about is to have the ability to "quickly" add debug-style instrumentation when you write code, then you do not want, nor should ever, use a tracepoint, because tracepoint location is not meant to be decided without careful consideration. Markers are there fore this : you can add a quick-and-dirty one-liner : trace_mark(group, event, "field_identifier1 %u f_ident2 %u", ...); In your source code and you are all set. I don't see the problem with such approach ? Mathieu > Thanks, > > -- Steve > -- Mathieu Desnoyers OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/