LinuxLists.cc - The emperor is naked: why *comprehensive* static markup belongs in mainline

2006-09-17 09:20:08

by Karim Yaghmour

Subject: The emperor is naked: why comprehensive static markup belongs in mainline

Time and again we've had this debate. In the past many claimed,
and many continue to claim, that the mainlining of static markup
of key kernel events (otherwise known as static
instrumentation or static tracing) is heresy. The following is
meant as a case-in-point rebuttal.

First, some historical context:
-------------------------------

I personally introduced the Linux Trace Toolkit in July 1999.
Subsequently, I initiated discussions with the IBM DProbes
team back in 2000 and thereafter implemented facilities for
enabling dynamically-inserted probes to route their events
through ltt -- all of which was functional as of November
2000. Further down the road, many efforts were made for mainlining
some of ltt's functionality, with little success. Fast forward
a few years, maintenance of the project has been passed to
Mathieu Desnoyers as of November of 2005. Mathieu inherited
nothing from the project but the name; his is an entire rewrite
of everything I had done.

[ Disclaimer: The following is *not* an attempt to push ltt
specifically. Rather, it is an argument for the inclusion of
*comprehensive* static markup, regardless of the underlying
tool. Whether the reader cares to take my word on this or not
isn't within my ability to influence as I write this. Hopefully
those who choose to continue reading will confirm my stated
goal. ]

Parallel to that, for various reasons which have been
documented elsewhere, a variety of projects of similar scope
or nature were initiated, some as outgrowths of existing
relevant components. Here are but a
few in no particular order: LKST, syscalltrack, SystemTap, LKET,
GKHI, evlog, kernel hooks, kprobes, relayfs, etc. LTT having
been the first to attempt mainlining, and miserably fail at it,
many of those involved in those other projects paid special
attention to LTT's fate on lkml -- and they were wise to do so.
Some of the criticism against LTT was entirely warranted: it
had many technical flaws -- simply because I was learning the
ropes of kernel development. But while technical flaws could
have been overcome with appropriate guidance, systematic
resistance to mainline static instrumentation could not.

There was/is also a slew of heavily-tailored subsystem-specific
and kernel-debugging/specialized tracing mechanisms that
flourished, died or, surprisingly, got mainlined: iotrace,
latency-trace, blktrace, ktrace, kft, and many others. Usually
some source grepping yields interesting discoveries in
mainline. The history of these has been entirely independent
from that of those other efforts mentioned above mainly in
that they were mostly developed by/for kernel developers.

The commonly held wisdom:
-------------------------

Now, orthodox Linux kernel development philosophy, in as far
as I've experienced it online and face-to-face with various
developers, has been that *any* form of static instrumentation
is to be avoided. And the single argument that has constantly
come back has always been that such instrumentation creates
unmaintainable bloat. Factor in that most developers, at
least the ones I spoke to while being a maintainer, could
only conceive of kernel tracing as they themselves had used
it (i.e. for kernel debugging) and you get an insurmountable
obstacle for anyone pushing for inclusion of such functionality.

[ This misconception was so profound that many initially labeled
ltt as a kernel debugging tool. Even educated observers from
reputable Linux news sources repeatedly mislabeled ltt. The
misconception went so far that prominent kernel developers
tried to use ltt or attempted helping others use ltt for kernel
debugging purposes, which it obviously wasn't much good at. ]

So what was the solution, I asked? And the answer was: none. I
was told I would likely have to maintain ltt out of tree
forever. But I don't give up easily and I figured time would
show purpose, namely that ordinary sysadmins and developers
actually need to understand the dynamic behavior of the
kernel they're using.

The "perfect" solution:
-----------------------

And sure enough, eventually, truth came knocking. And truth
had a name. It was called dtrace. All of a sudden, everybody
and his little sister insisted Linux should have an equivalent.
I'll spare the reader all the political stuff in between, but
I'll readily admit to this: ltt wasn't a dtrace substitute.
While it did target the right audience, it lacked the ability
to allow the user to arbitrarily control instrumentation at
runtime.

[ I've claimed in the past, not without some bitterness I
confess, that history might have been different had ltt been
given a chance to mainline earlier, thereby freeing time from
chasing kernel versions and onto more interesting endeavors,
but alternative historical possibilities aren't the topic of
this post. ]

Leading up to that, of course, the submitting of ltt patches
continued. And, of course, suggestions had already been made
to the effect that kprobes was the way to go instead of
static inlined calls. And my objections were the same then
as they are today: a) taking an int3 at every event is not
my idea of performance, b) I'd still have to keep chasing
kernels to make sure those events needed by ltt still work.
If I were to chase kernels, it might as well be in source.

But, regardless, the snapshot in time for anyone tasked with
coming up with a dtrace-equivalent for Linux was the
following: a) past attempts to mainline tracing have been
countered with remarkable ferocity, b) the most prominent
tracing project out there, ltt, seems to have an especially
bad reputation with kernel developers. So any sane being
concludes the following: a) we should start from a clean
slate and adopt the path of least resistance (i.e. the
bloody thing better not depend on anything static), b)
anybody blacklisted by kernel developers for attempting to
mainline tracing is to be avoided -- especially that Karim
guy, he doesn't, shall we say, seem to be too preoccupied
with offending prominent developers; we're going to spend
good money on this, and things better go smoothly from
here on.

[ Of course the above is my interpretation of things. I
could just be off by a mile or a thousand. Though ... ]

So off they went.

I know what I did last summer:
------------------------------

Frustrating as it was, I remained convinced that no matter
how much they try, they'll eventually come back to the
same point I was making: maintaining instrumentation outside
the kernel is a bitch.

And sure enough, once more, truth came knocking. After being
heckled at a BoF at OLS2005 for having suggested the
introduction of a markers infrastructure allowing developers
to identify important events, what do we have in OLS2006?

Well, we have one paper from a SystemTap developer discussing
that specific topic:
http://www.linuxsymposium.org/2006/view_abstract.php?content_key=17
And a BoF on none other than ... wait for it ... drumroll ...
"Divorcing Linux kernel analysis tools from kernel version":
http://www.linuxsymposium.org/2006/view_abstract.php?content_key=196

Obviously I attended both. Frank's presentation was not only
excellent, but the room it was given in was packed. And
most everybody in there seemed to agree: we need this marker
stuff. Good, I thought, that's progress in the right direction.

But the divorce bof the previous evening was priceless. Here
we have everybody that's been involved in some form of tracing
in the kernel over the past 5 years, and the whole atmosphere
is just surreal. The chair introduces the topic, and then, you'll
have to use your imagination a little to picture this, you've
got these puzzled looks on people's faces as they discuss
back and forth very seriously how they should solve these
maintenance issues they're encountering ... stuff like:
"well, yes, we've had this case when variable X changed,
and then our stuff didn't work no more" ... "yeah, plugged
this here, and that there" ... etc.

And I was sitting there mesmerized by the exchange between
these participants, going back and forth having this
discussion, who simply couldn't state the obvious. Of course,
I'm not usually shy to state my opinion and I called
bullshit by its name. Needless to say things went downhill
from there. This was like a scene from Harry Potter: the
one whose name you shall not pronounce. I mean, one would
have believed I was to shut up lest the dead rise from their
grave.

So that was last summer.

The *real* picture emerges:
---------------------------

And now, this week, we have this huge thread sparked by
... you guessed it ... the posting of an ltt patch to the
lkml. And again, the same arguments are put forth, the same
type of personal attacks are made, etc. But this time it's
different. It's different because those that did travel the
road kernel developers had requested be taken -- that of
exclusive reliance on dynamic instrumentation -- have
actually done enough of it that they know exactly the cost
of having to maintain dynamic instrumentation out of the
kernel. While I personally predicted this diagnosis 2 or 3
years ago, they've actually had to do the stuff.

And you can still feel the weight of Linux's twisted tracing
history on those of the dynamic instrumentation camp as they
post their comments. I mean, for me, this comment by Frank
speaks volumes on the fear instilled by past flamewars
on lkml about static instrumentation:

> This is the reason why I'm in favour of some lightweight event-marking
> facility: a way of catching those points where dynamic probing is not
> sufficiently fast or dependable.

[ The following is an arbitrary interpretation of Frank's
writing and I hope Frank won't be upset with my liberal
interpretation of his writing. For the record, I think
Frank is a great guy and while I've disagreed with him
in the past, I highly respect his technical abilities. ]

Now, you can imagine Frank writing this piece ... "must not
sound too uncompromising" ... "must insist on what kernel
developers like to see" ... "mention dynamic tracing" ...
I mean, look at the choice of words: "I'm in favour of
*some* *lightweight* event-marking facility", "... where
*dynamic probing* is not ..." Smart. Keep to accepted
orthodox principles, don't upset the natives.

Well, clearly, I for one have no fear of upsetting the
natives. What Frank is telling us here is that
maintaining "some" -- let me call it like that for now --
of his instrumentation out of tree is a bitch. But if
you really looked at it honestly, you would see that
mainlining of most of SystemTap's scripts would actually
result in SystemTap being a much more universally usable
tool -- i.e. no need to make sure your scripts work for
the kernel you're running on.

Why, in fact, that's exactly Jose's point of view. Who's
Jose? Well, just in case you weren't aware of his work,
Jose maintains LKET. What's LKET? An ltt-equivalent
that uses SystemTap to get its events. And what does
Jose say? Well I couldn't say it better than him:

> I agree with you here, I think is silly to claim dynamic instrumentation
> as a fix for the "constant maintainace overhead" of static trace point.
> Working on LKET, one of the biggest burdens that we've had is mantainig
> the probe points when something in the kernel changes enough to cause a
> breakage of the dynamic instrumentation. The solution to this is having
> the SystemTap tapsets maintained by the subsystems maintainers so that
> changes in the code can be applied to the dynamic instrumentation as
> well. This of course means that the subsystem maintainer would need to
> maintain two pieces of code instead of one. There are a lot of
> advantages to dynamic vs static instrumentation, but I don't think
> maintainace overhead is one of them.

Well, well, well. Here's a guy doing *exactly* what I was
asked to do a couple of years back. And what does he say?
"I think is silly to claim dynamic instrumentation as a
fix for the "constant maintainace overhead" of static trace
point."

And just in case you missed it the first time in his
paragraph, he repeats it *again* at the end:
" There are a lot of advantages to dynamic vs static
instrumentation, but I don't think maintainace overhead is
one of them."

But not content with Jose and Frank's first-hand experience
and testimonials about the cost of outside maintenance of
dynamically-inserted tracepoints, and obviously outright
dismissing the feedback from such heretics as Roman, Martin,
Mathieu, Tim, Karim and others, we have a continued barrage of
criticism from, shall we say, very orthodox kernel developers
who insist that the collective experience of the previously
mentioned people is simply misguided and that, as experienced
kernel developers, *they* know better.

Of course, I'm simplifying things a little. And in all
fairness there has been some conceding on the part of very
orthodox kernel developers that there may be in **very**
*special* cases the need for static instrumentation. Oh
boy, one almost reads those posts in glee -- imagine me
rubbing my hands -- thinking about the fate awaiting the
poor bastard that submits this first *special* case.
Boy is he going to have to prove how *special* that trace
point is.

That concession, however, still doesn't stop those very
same orthodox developers from continuing to insist that
somehow "dynamic tracing" is superior to "static tracing",
even though they have actually never had to maintain an
infrastructure based on either for the purpose of allowing
mainstream users to trace their kernels for *user* purposes.
And in all fairness some are pretty open about it.

So be it. I, for one, have no fear of calling things by
their name.

Why the emperor is naked:
-------------------------

Truth be told:

There is no justification why Mathieu should continue
chasing kernels to allow his users to utilize ltt on as
many kernel versions as possible.

There is no justification why the SystemTap team should
continue chasing kernels to make sure users can use
SystemTap on as many kernel versions as possible.

There is no justification why Jose should continue
chasing kernels to allow his users to use LKET on as
many kernel versions as possible.

There is, in fact, no justification why Jose, Frank,
and Mathieu aren't working on the same project.

There is no justification to any of this but the continued
*FEAR* by kernel developers that somehow their maintenance
workload is going to become unmanageable should anybody
get his way of adding static instrumentation into the
kernel. And no matter what personal *and* financial cost
this fear has had on various development teams, actual
*experience* from even those who have applied the most
outrageous of kernel developers' requirements is but
grudgingly and conditionally recognized. No value, of
course, being placed on the experience of those that
*didn't* follow the orthodox diktat -- say by pointing
out that ltt tracepoints did not vary over a 5-year timespan.

For the argument, as it is at this stage of the long
intertwined thread of this week, is that "dynamic tracing"
is superior to "static tracing" because, amongst other
things, "static tracing" requires more instrumentation
than "dynamic tracing". But that, as I said within said
thread, is a fallacy. The statement that "static tracing"
requires more instrumentation than "dynamic tracing" is
only true in as far as you ignore that there is a cost
for out-of-tree maintenance of scripts for use by probe
mechanisms. And as you've read earlier, those doing this
stuff tell us there *is* cost to this. Not only do they
say that, but they go as far as telling us that this
cost is *no different* than that involved in maintaining
static trace points. That, in itself, flies in the face
of all accepted orthodox principles on the topic of
mainlined static tracing.

And that is but the maintenance aspect; I won't even
start on the performance issue. Because the current party
line is that while the kprobes mechanism is slow: a) it's
fast enough for all applicable uses, b) there's this
great new mechanism we're working on called djprobes which
eliminates all of kprobes' performance limitations. Of
course you are asked to pay no attention to the man behind
the curtain: a) if there is justification to work on
djprobes, it's because kprobes is dog-slow, which even
those using it for systemtap readily acknowledge, b)
djprobes has been more or less "on its way" for a year or
two now, and that's for one single architecture.

Meanwhile, if any of those screaming at me ever bothered
listening, my claim has been rather simple (as taken from
an earlier email):

What is sufficient for tracing a given set of events by means
of binary editing *that-does-not-require-out-of-tree-maintenance*
can be made to be sufficient for the tracing of events using
direct inlined static calls. The *only* difference being that
binary editing allows further extension of the pool of events
of interest by means of outside specification of additional
interest points.

And that, therefore, if we accept the idea that static
markup is necessary, then what hides behind the marked up
code becomes utterly *irrelevant*.
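
To make that last claim concrete, here is a minimal sketch in C, under
loud assumptions: MARK(), struct mark_site, mark_record() and the
MARK_BACKEND_STATIC switch are all hypothetical names invented for this
illustration, not existing kernel or ltt interfaces. It only shows that
the marked-up function reads the same whichever backend sits behind the
markup.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical build switch: 1 = inlined static calls, 0 = markup only. */
#define MARK_BACKEND_STATIC 1

/* Hypothetical per-site record; not an existing kernel structure. */
struct mark_site {
    const char *name;     /* event name */
    int enabled;          /* set at runtime by whoever consumes the event */
    unsigned long hits;   /* events delivered while enabled */
    int last_arg;         /* last value seen, kept only for illustration */
};

/* Stand-in for a tracer's event sink. */
static void mark_record(struct mark_site *site, int arg)
{
    assert(site != NULL);
    site->hits++;
    site->last_arg = arg;
}

#if MARK_BACKEND_STATIC
/* Static backend: one flag test when disabled, a direct call when enabled. */
#define MARK(site, arg) \
    do { if ((site)->enabled) mark_record((site), (arg)); } while (0)
#else
/* Markup-only backend: no code emitted; a probe tool would use the site list. */
#define MARK(site, arg) do { (void)(site); (void)(arg); } while (0)
#endif

static struct mark_site func2_done = { "func2_done", 0, 0, 0 };

/* Placeholder for the real work being instrumented. */
static int func2(void)
{
    return 42;
}

/* The instrumented function reads the same under either backend. */
static int global_function(int arg1)
{
    int x = func2();

    MARK(&func2_done, arg1);
    return x + arg1;
}
```

Flipping MARK_BACKEND_STATIC to 0 leaves global_function() untouched in
source; only what the marker expands to changes.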

A proposal catering for orthodox fears:
---------------------------------------

Now here I am, 7 years after starting ltt, with all the stories
above, having passed on maintainership to someone else close
to a year ago, yet somehow I'm still around to ruin the party
for the naysayers and spend 4 days full-time addressing all
the misguided cruft I've encountered through the years in the
hope that someone somewhere will see the light and a unified
approach will emerge. For make no mistake, none of my
interventions were for profit or for ego -- both have long
been lost in the topic of ltt. This was on principle. If I
see BS I say BS, and this schizophrenic fear of static
instrumentation to which I've been a witness for the past
7 years is but a classic example of unjustified fears getting
out of hand.

Nevertheless, I persist and submit a proposal which I feel
addresses many, if not all, of the previous fears I've heard
voiced over the years. Yet, despite ample opportunity and
repeated requests, hardliners and observers alike refuse
to even comment on what I propose -- or on what's changed.
So, here again, yet another time, a proposal for a static
markup system:

> The plain function:
> int global_function(int arg1, int arg2, int arg3)
> {
>         ... [lots of code] ...
>
>         x = func2();
>
>         ... [lots of code] ...
> }
>
> The function with static markup:
> int global_function(int arg1, int arg2, int arg3)
> {
>         ... [lots of code] ...
>
>         x = func2(); /*T* @here:arg1,arg2,arg3 */
>
>         ... [lots of code] ...
> }
>
> The semantics are primitive at this stage, and they could definitely
> benefit from lkml input, but essentially we have a build-time parser
> that goes around the code and automagically does one of two things:
> a) create information for binary editors to use
> b) generate an alternative C file (foo-trace.c) with inlined static
> function calls.
>
> And there might be other possibilities I haven't thought of.
>
> This beats every argument I've seen to date on static instrumentation.
> Namely:
> - It isn't visually offensive: it's a comment.
> - It's not a maintenance drag: outdated comments are not alien.
> - It doesn't use weird function names or caps: it's a comment.
> - There is precedent: kerneldoc.
> And it does preserve most of the key things those who've asked for
> static markup are looking for. Namely:
> - Static instrumentation
> - Mainline maintainability
> - Contextualized variables
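
As a rough illustration of option a) above, a build-time pass could pull
the marker comments out of each source line along these lines. This is
only a sketch: parse_marker(), format_marker(), struct marker, the size
limits and the "file:line:args" record format are assumptions made up
for the example; only the /*T* @here:... */ syntax comes from the
proposal.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MARK_TAG "/*T* @here:"   /* marker syntax taken from the proposal */
#define MAX_ARGS 8               /* hypothetical limit for this sketch */
#define MAX_NAME 32              /* hypothetical limit for this sketch */

struct marker {
    int line;                       /* 1-based source line of the marker */
    int nargs;                      /* number of variables listed */
    char args[MAX_ARGS][MAX_NAME];  /* variable names, in order */
};

/*
 * Parse one source line. Returns 1 and fills *m if the line carries a
 * marker comment, 0 (leaving *m untouched) otherwise.
 */
static int parse_marker(const char *src, int line, struct marker *m)
{
    const char *p, *end;

    assert(src != NULL && m != NULL);

    p = strstr(src, MARK_TAG);
    if (p == NULL)
        return 0;
    p += strlen(MARK_TAG);

    end = strstr(p, "*/");
    if (end == NULL)
        return 0;                        /* unterminated comment: ignore */

    m->line = line;
    m->nargs = 0;

    while (p < end && m->nargs < MAX_ARGS) {
        size_t full, len;

        p += strspn(p, ", ");            /* skip separators and blanks */
        if (p >= end)
            break;
        full = strcspn(p, ", *");        /* name runs to a separator or the comment end */
        if (full == 0)
            break;
        len = full < MAX_NAME ? full : MAX_NAME - 1;   /* truncate long names */
        memcpy(m->args[m->nargs], p, len);
        m->args[m->nargs][len] = '\0';
        m->nargs++;
        p += full;
    }
    return 1;
}

/* Render a marker as "file:line:name1,name2,..." (made-up record format). */
static int format_marker(const struct marker *m, const char *file,
                         char *buf, size_t size)
{
    int i;
    int n = snprintf(buf, size, "%s:%d:", file, m->line);

    for (i = 0; i < m->nargs && n >= 0 && (size_t)n < size; i++)
        n += snprintf(buf + n, size - (size_t)n, "%s%s",
                      i ? "," : "", m->args[i]);
    return n;
}
```

Fed the marked-up line from the example, parse_marker() reports three
variables, and format_marker() turns them into one record a probe tool
or a code generator could consume. Lines without the marker are simply
skipped, which is why an outdated or missing comment costs nothing at
build time.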

To date, only one comment came in on this. And, amazingly, it
confirms everything I say above:
> This makes sense to me, when combined with kprobes.

Again, the misconception is so entrenched that, while being
positive, the feedback entirely misses the point that once
you agree on markup, the underlying mechanism is entirely
*irrelevant*.

'Nough said:
------------

Now, I really have to ask: How much time do we have to
continue wasting? If the collective feedback of those whose
combined work is considerable dictates a course of action --
even while that course of action is only begrudgingly accepted --
if explanations are given why existing processes allow for
vetting of unnecessary markup, and if proposals are made to
alleviate many of the entrenched fears, what further level of
proof will be sufficient to come to terms with the obvious?

Namely that *comprehensive* static markup belongs in
mainline and *nowhere* else.

Karim
--
President / Opersys Inc.
Embedded Linux Training and Expertise
http://www.opersys.com / 1.866.677.4546

2006-09-17 11:21:49

Subject: Re: The emperor is naked: why comprehensive static markup belongs in mainline

On Sun, Sep 17, 2006 at 05:40:59AM -0400, Karim Yaghmour wrote:
> Now, orthodox Linux kernel development philosophy, in as far
> as I've experienced it online and face-to-face with various
> developers, has been that *any* form of static instrumentation
> is to be avoided. And the single argument that has constantly
> come back has always been that such instrumentation creates
> unmaintainable bloat.

There are more arguments than this, but for some reason you choose to
ignore them and selectively interpret the maintenance one. The
maintenance thing is one part of it, some of the other issues are:

- Placing trace points where they will have an impact on
performance.

- You have a select user base that will tolerate rebuilding
their kernel and maintaining separate debug kernels to boot
when the need for tracing comes up, whereas most users will
not want to or be unable to do this.

Dynamic instrumentation solves some of these problems, but not all.
Taking an int3 on the event might not be your idea of performance in the
tracing case, but it's much more appealing than leaving static points
enabled in a running system, or having to switch between kernels
arbitrarily to get any work done.

As Ingo has also pointed out, there's plenty of room for optimization in
the kprobes case, and with djprobes on the way, this will be even more
marginalized. Why you choose to write this off is mind-boggling,
particularly since it goes to lowering the cost of dynamic
instrumentation, which seems to be one of your primary concerns.

> The "perfect" solution:
> -----------------------
>
> And sure enough, eventually, truth came knocking. And truth
> had a name. It was called dtrace. All of a sudden, everybody
> and his little sister insisted Linux should have an equivalent.
> I'll spare the reader all the political stuff in between, but
> I'll readily admit to this: ltt wasn't a dtrace substitute.
> While it did target the right audience, it lacked the ability
> to allow the user to arbitrarily control instrumentation at
> runtime.
>
So DTrace was the "perfect" solution because it did allow for dynamic
instrumentation, and ltt wasn't a substitute because it lacked it?
That's clearly the most compelling argument for static instrumentation
I've ever seen.

> Now, you can imagine Frank writing this piece ... "must not
> sound too uncompromising" ... "must insist on what kernel
> developers like to see" ... "mention dynamic tracing" ...
> I mean, look at the choice of words: "I'm in favour of
> *some* *lightweight* event-marking facility", "... where
> *dynamic probing* is not ..." Smart. Keep to accepted
> orthodox principles, don't upset the natives.
>
What exactly are you trying to prove with this? Yes, people aren't
opposed to a lightweight marker facility. Ingo made some suggestions
regarding that, and others (Andrew, Martin, etc.) have pointed out that
this would also be beneficial for certain use cases. I don't see anyone
violently opposed to lightweight markers, I see people violently opposed
to the ltt-centric breed of static instrumentation (and yes, I'm one of
them), let's not confuse the two.

This thread would be much better off talking about how to go about
implementing lightweight markers rather than spent on mindless rants.

> And what does Jose say? Well I couldn't say it better than him:
>
> > I agree with you here, I think is silly to claim dynamic instrumentation
> > as a fix for the "constant maintainace overhead" of static trace point.
> > Working on LKET, one of the biggest burdens that we've had is mantainig
> > the probe points when something in the kernel changes enough to cause a
> > breakage of the dynamic instrumentation. The solution to this is having
> > the SystemTap tapsets maintained by the subsystems maintainers so that
> > changes in the code can be applied to the dynamic instrumentation as
> > well. This of course means that the subsystem maintainer would need to
> > maintain two pieces of code instead of one. There are a lot of
> > advantages to dynamic vs static instrumentation, but I don't think
> > maintainace overhead is one of them.
>
> Well, well, well. Here's a guy doing *exactly* what I was
> asked to do a couple of years back. And what does he say?
> "I think is silly to claim dynamic instrumentation as a
> fix for the "constant maintainace overhead" of static trace
> point."
>
That's a pretty liberal interpretation of that paragraph. Comparatively
let's look at this:

> > Working on LKET, one of the biggest burdens that we've had is mantainig
> > the probe points when something in the kernel changes enough to cause a
> > breakage of the dynamic instrumentation.

Strange, that reads a lot like a maintenance burden to me, and the only
argument for alleviating the burden is by punting it off to subsystem
maintainers so they can sync up the probe points along with the code.

Markers may very well be the answer for this, but you can't
realistically sit there claiming that this is not a maintenance issue
when it's clearly been an issue for everyone involved. Shifting the
burden is one thing, and might be the answer if there's a consensus,
claiming that it's not there is ignoring reality.

> And just in case you missed it the first time in his
> paragraph, he repeats it *again* at the end:
> " There are a lot of advantages to dynamic vs static
> instrumentation, but I don't think maintainace overhead is
> one of them."
>
Easy to say when you aren't maintaining the trace points ;-)

> But not content with Jose and Frank's first-hand experience
> and testimonials about the cost of outside maintenance of
> dynamically-inserted tracepoint, and obviously outright
> dismissing the feedback from such heretics as Roman, Martin,
> Mathieu, Tim, Karim and others, we have a continued barrage of
> criticism from, shall we say, very orthodox kernel developers
> who insist that the collective experience of the previously
> mentioned people is simply misguided and that, as experienced
> kernel developers, *they* know better.
>
Have you considered that some of the suggestions being offered are aimed
at what's best for the kernel instead of what's best for LTT?

Feedback is one thing, saying "kprobes sucks because it's not available
on my architecture and I don't feel like porting it" is a rather
different beast.

> That concession, however, still doesn't stop those very
> same orthodox developers continuing to insist that
> somehow "dynamic tracing" is superior to "static tracing",
> even though they have actually never had to maintain an
> infrastructure based on either for the purpose of allowing
> mainstream users to trace their kernels for *user* purposes.
> And in all fairness some are pretty open about it.
>
And once these points are mainlined, who will be maintaining them I
wonder?

> For the argument, as it is at this stage of the long
> intertwined thread of this week, is that "dynamic tracing"
> is superior to "static tracing" because, amongst other
> things, "static tracing" requires more instrumentation
> than "dynamic tracing". But that, as I said within said
> thread, is a fallacy. The statement that "static tracing"
> requires more instrumentation than "dynamic tracing" is
> only true in as far as you ignore that there is a cost
> for out-of-tree maintenance of scripts for use by probe
> mechanisms. And as you've read earlier, those doing this
> stuff tell us there *is* cost to this. Not only do they
> say that, but they go as far as telling us that this
> cost is *no different* than that involved in maintaining
> static trace points. That, in itself, flies in the face
> of all accepted orthodox principles on the topic of
> mainlined static tracing.
>
Yes, if you want to do tracing, trace points have to be maintained. I
don't think this strikes anyone as being news. It's where they are
maintained, and at what cost to the rest of the system, that is
the issue.

> Nevertheless, I persist and submit a proposal which I feel
> addresses many, if not all, of the previous fears I've heard
> voiced over the years. Yet, while ample opportunity was
> given and repeated requests, hardliners and observers alike
> refuse to even comment on what I propose -- what's changed.
> So, here again, yet another time, a proposal for a static
> markup system:
>
The only issue with this is that the argument list has to be maintained
in two places. Personally I don't have any objections to something like
this, though. As long as the places where this happens are restricted to
useful points determined by subsystem maintainers, and the rest handled
by dynamic instrumentation. Otherwise you fall back into the "my tracepoint
is better than yours" fight and they start piling up again, even sans
overhead.

2006-09-17 14:44:49

Subject: Re: tracepoint maintainance models

* Paul Mundt wrote:

> What exactly are you trying to prove with this? Yes, people aren't
> opposed to a lightweight marker facility. Ingo made some suggestions
> regarding that, and others (Andrew, Martin, etc.) have pointed out
> that this would also be beneficial for certain use cases. I don't see
> anyone violently opposed to lightweight markers, I see people
> violently opposed to the ltt-centric breed of static instrumentation
> (and yes, I'm one of them), let's not confuse the two.

yes. The way i see this whole issue (and what i've been trying to argue
for a long time) is that with dynamic tracers we have a _continuum_ of
_tracepoint maintenance models_ that maintainers can choose from, each
of which gives the same "end-user experience":

- model #1: we could have all static markers in the main kernel
source. No dynamic markups at all.

- model #2: we could have the least intrusive markers in the main
kernel source, while the more intrusive ones would still be in the
upstream kernel, but in scripts/instrumentation/.

- model #3: we could have the 'hardest' markups in the source, and the
'easy' ones as dynamic markups in scripts/instrumentation/.

- model #4: we could have each and every tracepoint in
scripts/instrumentation/ - none in the main source.

Note that each model has a different maintenance tradeoff. In my
judgement model #2 is the one with the smallest total maintenance cost,
but we dont _have to_ make a hard decision about this here and now. Not
having to do a (potentially wrong) maintenance-model decision is always
good!

These tracepoint models arent even global, they can and should be
per-subsystem. A seldom changing subsystem could have all its markers
right embedded in the main kernel source. A subsystem under active
development will most likely not have many markers (because they are
just a hindrance when doing high-frequency updates).

The tracepoint model is not only per-subsystem, it can also change in
time. If a subsystem goes through heavy changes (due to a rewrite), it
might remove all of its static markups and move all the tracing
infrastructure into scripts. Once the rate of changes has 'cooled down',
the tracepoints can move back into the source again.

Furthermore, since there is no end-user visible impact of these "where
should the markups be" decisions, the decisions will be made on a pure
technical basis. Nobody will flame anyone about having a particular
static marker moved to a script, because it's only an implementation
(performance and maintenance micro-overhead) issue, not a functionality
issue. In fact, with dynamic tracers, an end-user visible breakage can
even be fixed _after the main kernel has been released, compiled and
booted on the end-user's system_. Systemtap scripts can be updated on
live systems. So there is very, very little maintenance pressure caused
by dynamic tracing.

On the other hand, if we accept static tracers into the mainline kernel,
we have to decide in favor of tracepoint-maintenance model #1
_FOREVER_. It will be a point of no return for a likely long time.
Moving a static tracepoint or even breaking it will cause end-user pain
that needs an _upstream kernel fix_. It needs a new stable kernel, etc.,
etc. It is very inflexible, and fundamentally so.

So my argument isnt "dynamic markup vs. static markup", my argument is:
"we shouldnt force the kernel to carry a 100% set of static markups
forever". We should allow maintainers to decide the 'mix' of static vs.
dynamic markups that they prefer in their subsystem.

And i might even be proven wrong in a few years, maybe all tracepoints
will be static markups in the source. I strongly doubt it, but still
it's a possibility, it wouldnt be the first time i'm wrong. In that case
we'd still have the same functionality (sans a few rarer arches that
dont have kprobes yet). But one thing is sure: if i'm just 20% right,
we'll be much worse off with all the static tracer dependencies.

> This thread would be much better off talking about how to go about
> implementing lightweight markers rather than spent on mindless rants.

i agree, as long as it's lightweight markers for _dynamic tracers_, so
that we keep our options open - as per the arguments above.

Ingo

2006-09-17 15:03:08

by Roman Zippel

[permalink] [raw]

Subject: Re: tracepoint maintainance models

Hi,

On Sun, 17 Sep 2006, Ingo Molnar wrote:

> > This thread would be much better off talking about how to go about
> > implementing lightweight markers rather than spent on mindless rants.
>
> i agree, as long as it's lightweight markers for _dynamic tracers_, so
> that we keep our options open - as per the arguments above.

Could you please explain, why we can't have markers which are usable by
any tracer?

bye, Roman

2006-09-17 15:18:45

[permalink] [raw]

Subject: Re: tracepoint maintainance models

* Roman Zippel <[email protected]> wrote:

> Hi,
>
> On Sun, 17 Sep 2006, Ingo Molnar wrote:
>
> > > This thread would be much better off talking about how to go about
> > > implementing lightweight markers rather than spent on mindless rants.
> >
> > i agree, as long as it's lightweight markers for _dynamic tracers_, so
> > that we keep our options open - as per the arguments above.
>
> Could you please explain, why we can't have markers which are usable
> by any tracer?

the main reason for that i explained in the portion of the email you
snipped:

> > On the other hand, if we accept static tracers into the mainline
> > kernel, we have to decide in favor of tracepoint-maintenance model
> > #1 _FOREVER_. It will be a point of no return for a likely long
> > time. Moving a static tracepoint or even breaking it will cause
> > end-user pain that needs an _upstream kernel fix_. It needs a new
> > stable kernel, etc., etc. It is very inflexible, and fundamentally
> > so.

of course it's easy to have static markup that is usable for both types
of tracers - but that is of little use. Static tracers also need the
guarantee of a _full set_ of static markups. It is that _guarantee_ of a
full set that i'm arguing against primarily. Without that guarantee it's
useless to have markups that can be used by static tracers as well: you
wont get a full set of tracepoints and the end-user will complain.
(partial static markups are of course still very useful to dynamic
tracers)

( furthermore, there are other reasons as well: i explained my position
in some of those replies that you did not want to "further delve
into". I'm happy to give you Message-IDs if you'd like to follow up on
them, there's no need to repeat them here. )

Ingo

2006-09-17 15:36:38

by Mathieu Desnoyers

[permalink] [raw]

Subject: Re: tracepoint maintainance models

Hi,

* Ingo Molnar ([email protected]) wrote:
> - model #2: we could have the least intrusive markers in the main
> kernel source, while the more intrusive ones would still be in the
> upstream kernel, but in scripts/instrumentation/.
>

Please define : marker intrusiveness. I think that this is not a single concept.
First, I think we have to look at intrusiveness from three different angles :

- Visual intrusiveness (hurts visually in the code)
- Compiled-in, but inactive intrusiveness
  - Modifies compiler optimisations when the marker is compiled in but no
    tracing is active.
  - Wastes a few cycles because it adds NOPs, jump, etc in a critical path
    when tracing is not active.
- Active tracing intrusiveness
  - Wastes too many cycles in a critical path when tracing is active.

The problem is that a static marker speeds up active tracing, while a
dynamic probe speeds up the case where tracing is inactive. The cost of a
dynamic probe can get so big that it often modifies the traced system more
than is acceptable. From this angle, I would be tempted to say that the
most intrusive instrumentation should be helped by markers, which means
accepting a very small performance impact (NOPs on modern CPUs are quite
fast) when tracing is not active in order to enable fast tracing of some
very high event rate kernel code paths.
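
To make the inactive versus active tradeoff concrete, here is a minimal
userspace sketch of a compiled-in marker. Everything in it (struct marker,
MARK(), hot_path(), count_probe()) is a hypothetical name invented for
illustration, not an existing kernel API. It also assumes a test-and-branch
on a probe pointer rather than the NOP patching mentioned above, and relies
on the GCC/Clang __builtin_expect extension.

```c
#include <stddef.h>

/* Hypothetical names throughout; this is not an existing kernel API. */
typedef void (*marker_probe_fn)(const char *name, long arg);

struct marker {
    const char *name;        /* event name, e.g. "sched_switch" */
    marker_probe_fn probe;   /* NULL while tracing is inactive */
};

/*
 * Inactive cost: one load and one normally-not-taken branch.
 * Active cost: an ordinary indirect call, with no trap or breakpoint.
 */
#define MARK(m, arg)                                      \
    do {                                                  \
        if (__builtin_expect((m)->probe != NULL, 0))      \
            (m)->probe((m)->name, (long)(arg));           \
    } while (0)

static long events_seen;     /* updated by the demo probe below */

/* Demo probe: just counts how many times the marker fired. */
static void count_probe(const char *name, long arg)
{
    (void)name;
    (void)arg;
    events_seen++;
}

static struct marker sched_marker = { "sched_switch", NULL };

/* Stand-in for a hot kernel path that carries one marker. */
static void hot_path(long pid)
{
    MARK(&sched_marker, pid);
}
```

Setting sched_marker.probe to count_probe turns tracing on, and setting it
back to NULL turns it off again; the marked code path itself never changes.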

> - model #3: we could have the 'hardest' markups in the source, and the
> 'easy' ones as dynamic markups in scripts/instrumentation/.
>
By "hardest", do you mean : where the data that is to be extracted is not easily
available due to compiler optimisations ?

> So my argument isnt "dynamic markup vs. static markup", my argument is:
> "we shouldnt force the kernel to carry a 100% set of static markups
> forever". We should allow maintainers to decide the 'mix' of static vs.
> dynamic markups that they prefer in their subsystem.
>
We completely agree on this last paragraph.

> i agree, as long as it's lightweight markers for _dynamic tracers_, so
> that we keep our options open - as per the arguments above.

But I also think that a marker mechanism should not only mark the code location
where the instrumentation is to be made, but also the information the probe is
interested in (providing compile-time data type verification and the address at
runtime). Doing otherwise would limit what could be provided to static markup
users.
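
One way such compile-time verification could look, sketched in userspace C:
the marker carries a printf-style format string, and the GCC/Clang "format"
function attribute makes the compiler check every argument against it. The
names trace_event() and last_event are hypothetical and chosen only for this
illustration; this is not a claim about how any existing tracer does it.

```c
#include <stdarg.h>
#include <stdio.h>

static char last_event[128];   /* most recently formatted event (demo only) */

/*
 * Hypothetical marker sink. The format attribute tells the compiler that
 * argument 2 is a printf-style format and that checking starts at
 * argument 3, so a mismatched argument type is diagnosed at build time
 * (with format warnings enabled, e.g. -Wall).
 */
__attribute__((format(printf, 2, 3)))
static int trace_event(const char *name, const char *fmt, ...)
{
    va_list ap;
    int n, m;

    n = snprintf(last_event, sizeof(last_event), "%s: ", name);
    if (n < 0 || (size_t)n >= sizeof(last_event))
        return -1;

    va_start(ap, fmt);
    m = vsnprintf(last_event + n, sizeof(last_event) - (size_t)n, fmt, ap);
    va_end(ap);

    return m < 0 ? -1 : n + m;
}
```

A call such as trace_event("sched_switch", "prev=%d next=%d", 12, 34) compiles
cleanly, while passing a pointer where %d is expected draws a compiler
warning, which is the kind of check a bare code-location marker cannot give.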

Mathieu

OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2006-09-17 17:18:54

by Roman Zippel

[permalink] [raw]

Subject: Re: tracepoint maintainance models

Hi,

On Sun, 17 Sep 2006, Ingo Molnar wrote:

> of course it's easy to have static markup that is usable for both types
> of tracers - but that is of little use. Static tracers also need the
> guarantee of a _full set_ of static markups. It is that _guarantee_ of a
> full set that i'm arguing against primarily. Without that guarantee it's
> useless to have markups that can be used by static tracers as well: you
> wont get a full set of tracepoints and the end-user will complain.
> (partial static markups are of course still very useful to dynamic
> tracers)

And yet again, you offer no proof at all and just work from assumptions.
You throw in some magic "_full set_" of markers and just assume any change
in that will completely break static tracers.
You just assume that we absolutely must make this "guarantee" for static
tracers, as if static tracers can't be updated at all.
You completely ignore that it might be possible to create some rules and
educate users that the amount of exported events can't be completely
static.
What is so special between users of dynamic and static tracers, that the
former will never complain, if some tracepoint doesn't work anymore?

Do you really think that users of static tracers are that stupid, that
they are not aware of its limitations? Of course they sometimes have to
maintain their own set of tracepoints (especially in the area of kernel
development). That still doesn't change the fact that _any_ trace user
will benefit from a base set of tracepoints, which you seem to think
can't exist.

bye, Roman

2006-09-17 20:21:55

by Nicholas Miell

[permalink] [raw]

Subject: Re: tracepoint maintainance models

On Sun, 2006-09-17 at 16:36 +0200, Ingo Molnar wrote:
> * Paul Mundt <[email protected]> wrote:
>
> > What exactly are you trying to prove with this? Yes, people aren't
> > opposed to a lightweight marker facility. Ingo made some suggestions
> > regarding that, and others (Andrew, Martin, etc.) have pointed out
> > that this would also be beneficial for certain use cases. I don't see
> > anyone violently opposed to lightweight markers, I see people
> > violently opposed to the ltt-centric breed of static instrumentation
> > (and yes, I'm one of them), let's not confuse the two.
>
> yes. The way i see this whole issue (and what i've been trying to argue
> for a long time) is that with dynamic tracers we have a _continuum_ of
> _tracepoint maintenance models_ that maintainers can choose from, each
> of which gives the same "end-user experience":

To inject some facts into this argument, I took a look at dtrace on a
Solaris LiveCD (Belenix 0.4.4, actually, and wow, their userspace
apps are as terrible as I've been led to believe.)

On my system, Solaris has 49 "real" static probes (with actual
documentation[1]). They are as follows:

io:::done                       proc:::lwp-start
io:::start                      proc:::signal-clear
io:::wait-done                  proc:::signal-discard
io:::wait-start                 proc:::signal-handle
lockstat:::adaptive-acquire     proc:::signal-send
lockstat:::adaptive-block       proc:::start
lockstat:::adaptive-release     sched:::change-pri
lockstat:::adaptive-spin        sched:::dequeue
lockstat:::rw-acquire           sched:::enqueue
lockstat:::rw-block             sched:::off-cpu
lockstat:::rw-downgrade         sched:::on-cpu
lockstat:::rw-release           sched:::preempt
lockstat:::rw-upgrade           sched:::remain-cpu
lockstat:::spin-acquire         sched:::schedctl-nopreempt
lockstat:::spin-release         sched:::schedctl-preempt
lockstat:::spin-spin            sched:::schedctl-yield
lockstat:::thread-spin          sched:::sleep
proc:::create                   sched:::surrender
proc:::exec                     sched:::tick
proc:::exec-failure             sched:::wakeup
proc:::exec-success             sdt:::callout-end
proc:::exit                     sdt:::callout-start
proc:::fault                    sdt:::interrupt-complete
proc:::lwp-create               sdt:::interrupt-start
proc:::lwp-exit

You'll note that these probes are all generic high-level concepts, some
of which occur at multiple places within the kernel (You can just trust
me on this, the dtrace -l output lists multiple function sites for the
provider:::name pair).

In addition to those 49 probes, there are 330 more documented probes
which fire whenever a statistical counter changes (most of them are SNMP
MIB counters, but there are also probes related to VM behavior,
filesystem activity, etc.). These are all hidden inside the pre-existing
counter update macros[2] and didn't increase the kernel maintenance
burden because the counters already had to be maintained (which is why I
don't consider them "real").

There are also 134 more undocumented driver-specific probes. Every probe
comes labeled with a stability indicator that looks something like
this:

8271 sdt zfs arc_evict_ghost arc-delete

    Probe Description Attributes
        Identifier Names: Private
        Data Semantics:   Private
        Dependency Class: Unknown

    Argument Attributes
        Identifier Names: Private
        Data Semantics:   Private
        Dependency Class: ISA

    Argument Types
        None

Which basically says that this undocumented probe is for private Sun use
and if you touch it and something breaks, you were warned and it's your
own damn fault[3]. (Obviously, the stable probes have different
labeling.) Also, given a D script, the dtrace command can spit out a
summary of that script's stability based on the probes it uses, which is
handy for judging the future compatibility of a script.
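
The "summary of a script's stability" idea boils down to taking the weakest
label among the probes a script touches. A minimal C sketch follows; the enum
ordering reflects my reading of the DTrace stability levels and should be
treated as an assumption, and script_stability() is a hypothetical helper,
not part of any dtrace interface.

```c
#include <stddef.h>

/* Stability levels, weakest first (ordering is an assumption, see above). */
enum stability {
    STAB_INTERNAL,
    STAB_PRIVATE,
    STAB_OBSOLETE,
    STAB_EXTERNAL,
    STAB_UNSTABLE,
    STAB_EVOLVING,
    STAB_STABLE,
    STAB_STANDARD
};

/*
 * A script is only as stable as the least stable probe it uses, so the
 * summary is simply the minimum over all of its probes. A script that
 * uses no probes at all is reported at the strongest level.
 */
static enum stability script_stability(const enum stability *probes, size_t n)
{
    enum stability min = STAB_STANDARD;

    for (size_t i = 0; i < n; i++) {
        if (probes[i] < min)
            min = probes[i];
    }
    return min;
}
```

A script mixing one Evolving probe, one Stable probe and one Private probe
like the zfs one above would therefore be summarised as Private.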

So, Solaris has a grand total of 513 statically defined probe points,
most of them hidden inside macros that were already there.

Then why is dtrace useful?

Because there are 48288 dynamically defined probes on function entry and
exit and another 454 dynamic syscall entry and exit probes.

This is the important part: In a dynamic tracing system, the number of
static probes necessary for the tracing system to be useful is
drastically, dramatically, absurdly lower than in a purely static
tracing system. Hell, you don't even need the static probes for it to be
useful, they're just a convenience for events which happen in multiple
places or a high-level name for a low-level implementation detail.

In order for the static tracing system to be as useful as the dynamic
system, all of those dynamically generated probe points would have to be
manually added to the kernel. The maintenance burden of this number of
probes is stupidly high. In reality, no static system would ever reach
that level of coverage.
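
For readers who want to check the arithmetic, the counts quoted in this
message can be tallied with a few lines of C. The struct and function names
are made up for the illustration; the numbers are the ones given above.

```c
/* Probe counts as quoted in this message (Belenix 0.4.4 LiveCD). */
struct probe_counts {
    int documented_static;   /* "real" documented static probes */
    int counter_static;      /* probes hidden in counter update macros */
    int driver_static;       /* undocumented driver-specific probes */
    int function_dynamic;    /* function entry and exit probes */
    int syscall_dynamic;     /* syscall entry and exit probes */
};

static const struct probe_counts solaris = { 49, 330, 134, 48288, 454 };

/* All statically defined probe points, whatever their documentation status. */
static int total_static(const struct probe_counts *c)
{
    return c->documented_static + c->counter_static + c->driver_static;
}

/* All dynamically generated probe points. */
static int total_dynamic(const struct probe_counts *c)
{
    return c->function_dynamic + c->syscall_dynamic;
}
```

total_static(&solaris) gives 513 and total_dynamic(&solaris) gives 48742, so
there are roughly 95 dynamic probes for every static one, and close to a
thousand for every one of the 49 documented static probes.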

[1] http://docs.sun.com/app/docs/doc/817-6223
[2] http://blogs.sun.com/tpenta/entry/dtrace_using_placing_sdt_probes
[3] http://docs.sun.com/app/docs/doc/817-6223/6mlkidlnp?a=view

--
Nicholas Miell <[email protected]>

2006-09-17 20:38:42

by Roman Zippel

[permalink] [raw]

Subject: Re: tracepoint maintainance models

Hi,

On Sun, 17 Sep 2006, Ingo Molnar wrote:

> Static tracers also need the
> guarantee of a _full set_ of static markups. It is that _guarantee_ of a
> full set that i'm arguing against primarily.

To those who are still reading this, let's fill this with a bit of
meaning (Ingo is unfortunately rather unspecific here):

What is this "_full set_ of static markups" needed/used by tracers?

A tracer can of course export all kinds of information, a lot of this
would only be interesting to a few users. Nevertheless there is a set of
information, which is interesting to many users. Let's take a minimal set
of just the information "schedule task from A to B", this information is
needed in many traces in order to understand what's going on in the
kernel.

Let's use this simple set to look at a few of the myths around static tracing,
which Ingo brings up over and over without really proving them.

Scheduling is one of the basic kernel functions, how the actual scheduling
is done is in a constant flux, but over the years it always ended up in a
call to switch_to(), so any kernel developer could easily maintain such a
tracepoint. The exported information is also so simple that it's easy to
guarantee that this information is available over many kernel versions to
come.

So what we have now is a minimal set of tracepoints, which is equally
useful to any tracer, which is easy to maintain and reasonably easy to
guarantee that it exists. Will there be any absolute guarantees, that this
set will exist forever? Of course not, but it's not needed, should there
be any change to it, it will very likely announce itself in a development
tree and userspace tools can adjust to it.
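
As an illustration of how small such a tracepoint can be, here is a userspace
C sketch of a "schedule task from A to B" event. The record layout, the ring
buffer and the function names are all hypothetical; a real kernel would place
the call next to its actual switch_to() invocation and would need per-CPU
buffers and proper locking, none of which is shown.

```c
#include <stdint.h>

/* Hypothetical event layout: just enough to say "switched from A to B". */
struct sched_switch_event {
    uint64_t timestamp;   /* caller-supplied clock value */
    int32_t  prev_pid;    /* task being switched out */
    int32_t  next_pid;    /* task being switched in */
};

#define TRACE_BUF_SIZE 8u /* power of two, so the index can be masked */

static struct sched_switch_event trace_buf[TRACE_BUF_SIZE];
static unsigned int trace_head;   /* total number of events written */

/* The tracepoint itself: record one event, overwriting the oldest slot. */
static void trace_sched_switch(uint64_t now, int32_t prev, int32_t next)
{
    struct sched_switch_event *ev =
        &trace_buf[trace_head & (TRACE_BUF_SIZE - 1u)];

    ev->timestamp = now;
    ev->prev_pid = prev;
    ev->next_pid = next;
    trace_head++;
}

/* Stand-in for the scheduler's final step before the actual switch. */
static void context_switch(uint64_t now, int32_t prev, int32_t next)
{
    trace_sched_switch(now, prev, next);
    /* a real kernel would perform the switch (switch_to()) here */
}
```

However the scheduling decision upstream of context_switch() is rewritten,
the single call and the three exported fields can stay the same.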

Static tracing of course has its limitations, it's of course not possible
to export every kind of information in a standard way via the standard
kernel, but nobody is asking for this. The kind of information requested
is very much like the one above, all that has been asked for is a _basic_
set of tracing information, which can be easily managed and is likely
available, and as I just proved such a set does exist, so why should we not
make it available to everyone?

Will this set satisfy everyone? Of course not, but anyone can easily add his
own trace points (statically or dynamically).

Will this set be only for the benefit of static tracers? No, this basic
set of traces is needed by all tools, so it's not ltt-centric at all. It
will help in the unification of the various trace tools, so that they can
share as much as possible.

So why again should this information only be available to dynamic tracers?

bye, Roman

2006-09-17 22:43:19

[permalink] [raw]

Subject: Re: tracepoint maintainance models

* Roman Zippel <[email protected]> wrote:

> Hi,
>
> On Sun, 17 Sep 2006, Ingo Molnar wrote:
>
> > Static tracers also need the
> > guarantee of a _full set_ of static markups. It is that _guarantee_ of a
> > full set that i'm arguing against primarily.
>
> To those who are still reading this, let's fill this with a bit of
> meaning (Ingo is unfortunately rather unspecific here):

Please make your own points instead of positing to "fill in" my points
with "a bit of meaning". My words can certainly speak for themselves,
and they do so in extensive specificity.

Ingo

2006-09-17 23:14:59

[permalink] [raw]

Subject: Re: tracepoint maintainance models

* Nicholas Miell <[email protected]> wrote:

> On my system, Solaris has 49 "real" static probes (with actual
> documentation[1]). They are as follows:

yeah, _some_ static markers are OK, as long as they are within a dynamic
tracing framework! (and are thus constantly "kept in check" by the easy
availability of dynamic probes)

what is being proposed here is entirely different from dprobes though:
Roman suggests that he doesnt want to implement kprobes on his arch, and
he wants LTT to remain an _all-static_ tracer. That's the point where i
beg to differ: static markers are fine (but they should be kept to a
minimum), but generic static /tracers/ need alot more than just a few
static markers to be meaningful.

So if we accepted static tracers into the kernel, we'd automatically
commit (for a long period of time) to a much larger body of static
markers - and i'm highly uncomfortable about that. (for the many reasons
outlined before)

Even if the LTT folks proposed to "compromise" to 50 tracepoints - users
of static tracers would likely _not_ be willing to compromise, so there
would be a constant (and I say unnecessary) battle going on for the
increase of the number of static markers. Static markers, if done for
static tracers, have "viral" (Roman: here i mean "auto-spreading", not
"disease") properties in that sense - they want to spread to alot larger
area of code than they start out from.

While if we only have a dynamic tracing framework (which is a mix of
static markers and dynamic probes) then pretty much the only user
pressure would be: "implement kprobes!". (which is already implemented
for 5 major arches and takes only between 500 and 1000 lines of per-arch
code for most of them.)

( furthermore, from what you've described it seems to me that
kprobes/kretprobes/djprobes+SystemTap is already more capable than
dprobes is - hence the number of static markers needed in Linux might
in fact be lower in the end than in Solaris. )

> This is the important part: In a dynamic tracing system, the number of
> static probes necessary for the tracing system to be useful is
> drastically, dramatically, absurdly lower than in a purely static
> tracing system. Hell, you don't even need the static probes for it to
> be useful, they're just a convenience for events which happen in
> multiple places or a high-level name for a low-level implementation
> detail.

yeah, precisely my point.

> In order for the static tracing system to be as useful as the dynamic
> system, all of those dynamically generated probe points would have to
> be manually added to the kernel. The maintenance burden of this number
> of probes is stupidly high. In reality, no static system would ever
> reach that level of coverage.

yeah, agreed.

Ingo

2006-09-17 23:36:15

[permalink] [raw]

Subject: Re: tracepoint maintainance models

* Roman Zippel <[email protected]> wrote:

> What is so special between users of dynamic and static tracers, that
> the former will never complain, if some tracepoint doesn't work
> anymore?

If by breakage you mean accidental regressions, i was not talking about
accidental breakages when i suggested that dynamic tracers would not see
them. The "breakage" i talked about, and which would cause regressions
to static tracer users but would not be noticed by dynamic tracer users
was:

_the moving of a static marker to a dynamic script_

(see <[email protected]>, my first paragraph there. Also see
<[email protected]> for the same topic.)

this breaks static tracers, but dynamic tracers remain unaffected,
because the dynamic probe (or the function attribute) still offers
equivalent functionality. Hence users of dynamic tracers still have the
same functionality - while users of static tracers see breakage. Ok?

If you meant accidental breakages, then of course users of both types of
tracers would be affected, but even in this case there's a more subtle
difference here, which i explained in <[email protected]>:

>> In fact, with dynamic tracers, an end-user visible breakage can even
>> be fixed _after the main kernel has been released, compiled and
>> booted on the end-user's system_. Systemtap scripts can be updated on
>> live systems. So there is very, very little maintenance pressure
>> caused by dynamic tracing.

i hope this explains.

Ingo

2006-09-17 23:50:04

[permalink] [raw]

Subject: Re: tracepoint maintainance models

* Roman Zippel <[email protected]> wrote:

> And yet again, you offer no proof at all and just work from
> assumptions. You throw in some magic "_full set_" of markers and just
> assume any change in that will completely break static tracers. [...]

i'm not sure i understand what you are trying to say here. Are you
saying that if i replaced half of the static markups with function
attributes (which would still provide equivalent functionality for
dynamic tracers), or if i replaced a few dozen static markups with
dynamic scripts (a change which would also be transparent to users of
dynamic tracers), that in this case static tracers would /not/ break?
[if yes then that would be the most puzzling suggestion ever posed in
this thread]

> You completely ignore that it might be possible to create some rules
> and educate users that the amount of exported events can't be
> completely static.

no serious trace user would accept it if for example half of their
static tracepoints would go away, because for example they were made
dynamic (or they were made function attributes).

that's the plain meaning of what i said. Were we to accept static
tracers, we'd be stuck with the full set of static tracepoints for a
long time, because users of static tracers would not accept a
significant reduction in the number of tracepoints. (even if those
"reduced" tracepoints were in fact just moved over to dynamic probes)

Was it truly confusing to you what i said? (in words that i thought were
more than clear) Please let me know and i'll try to formulate more
verbosely and more clearly when replying to you. This must be some
fundamental communication issue between you and me.

Ingo

2006-09-18 00:06:12

by Roman Zippel

[permalink] [raw]

Subject: Re: tracepoint maintainance models

Hi,

On Mon, 18 Sep 2006, Ingo Molnar wrote:

> what is being proposed here is entirely different from dprobes though:
> Roman suggests that he doesnt want to implement kprobes on his arch, and
> he wants LTT to remain an _all-static_ tracer. [...]
>
> Even if the LTT folks proposed to "compromise" to 50 tracepoints - users
> of static tracers would likely _not_ be willing to compromise, so there
> would be a constant (and I say unnecessary) battle going on for the
> increase of the number of static markers. Static markers, if done for
> static tracers, have "viral" (Roman: here i mean "auto-spreading", not
> "disease") properties in that sense - they want to spread to alot larger
> area of code than they start out from.

1. It's not that I don't want to, but I _can't_ implement kprobes and not
due to lack of skills, but lack of resources. (There is a subtle but
important difference.)
2. I don't want LTT to be "all static tracer" at all, I want it to be
usable as a static tracer, so that on archs where kprobes are available it
can use them of course. This puts your second paragraph in a new
perspective, since the userbase and thus the pressure for more and more
static tracepoints would be different.

bye, Roman

2006-09-18 00:10:24

by Nicholas Miell

[permalink] [raw]

Subject: Re: tracepoint maintainance models

On Mon, 2006-09-18 at 01:06 +0200, Ingo Molnar wrote:
> * Nicholas Miell <[email protected]> wrote:
>
> > On my system, Solaris has 49 "real" static probes (with actual
> > documentation[1]). They are as follows:
>
> yeah, _some_ static markers are OK, as long as they are within a dynamic
> tracing framework! (and are thus constantly "kept in check" by the easy
> availability of dynamic probes)
>
> what is being proposed here is entirely different from dprobes though:
> Roman suggests that he doesnt want to implement kprobes on his arch, and
> he wants LTT to remain an _all-static_ tracer. That's the point where i
> beg to differ: static markers are fine (but they should be kept to a
> minimum), but generic static /tracers/ need alot more than just a few
> static markers to be meaningful.

Anyone know what's hard about kprobes on m68k? Roman?

> So if we accepted static tracers into the kernel, we'd automatically
> commit (for a long period of time) to a much larger body of static
> markers - and i'm highly uncomfortable about that. (for the many reasons
> outlined before)
>
> Even if the LTT folks proposed to "compromise" to 50 tracepoints - users
> of static tracers would likely _not_ be willing to compromise, so there
> would be a constant (and I say unnecessary) battle going on for the
> increase of the number of static markers. Static markers, if done for
> static tracers, have "viral" (Roman: here i mean "auto-spreading", not
> "disease") properties in that sense - they want to spread to alot larger
> area of code than they start out from.
>
> While if we only have a dynamic tracing framework (which is a mix of
> static markers and dynamic probes) then pretty much the only user
> pressure would be: "implement kprobes!". (which is already implemented
> for 5 major arches and takes only between 500 and 1000 lines of per-arch
> code for most of them.)
>
> ( furthermore, from what you've described it seems to me that
> kprobes/kretprobes/djprobes+SystemTap is already more capable than
> dprobes is - hence the number of static markers needed in Linux might
> in fact be lower in the end than in Solaris. )

Most of what makes dtrace better than SystemTap right now is the polish
of the userspace tools, the extra features (pre-userspace tracing,
post-mortem trace buffer extraction, speculative tracing, userspace
tracing, ABI stability notations, etc.), the better runtime library for
scripts, and the fact that they've found everything that can't be traced
without crashing the kernel and marked it untraceable.

The D language itself may be quite limited (and hated because of that),
but it is clean and complete, which is something I can't say about
stap's language.

The existence of documentation really helps, too.

The actual probing mechanism itself is a very small part of what makes
dtrace good and SystemTap not there yet.

> > This is the important part: In a dynamic tracing system, the number of
> > static probes necessary for the tracing system to be useful is
> > drastically, dramatically, absurdly lower than in a purely static
> > tracing system. Hell, you don't even need the static probes for it to
> > be useful, they're just a convenience for events which happen in
> > multiple places or a high-level name for a low-level implementation
> > detail.
>
> yeah, precisely my point.
>

I should note that, despite being unneeded in a dynamic trace system, I
think the addition of static probe points is actually useful, and the
convenience they provide shouldn't be minimized. Obviously you're not
going to want to add static probe points for implementation-details that
are likely to change in the future (without noting that they're
implementation-specific and prone to change, anyway).

> > In order for the static tracing system to be as useful as the dynamic
> > system, all of those dynamically generated probe points would have to
> > be manually added to the kernel. The maintenance burden of this number
> > of probes is stupidly high. In reality, no static system would ever
> > reach that level of coverage.
>
> yeah, agreed.
>
> Ingo
--
Nicholas Miell <[email protected]>

2006-09-18 00:15:38

[permalink] [raw]

Subject: Re: tracepoint maintainance models

* Mathieu Desnoyers <[email protected]> wrote:

> * Ingo Molnar ([email protected]) wrote:
> > - model #2: we could have the least intrusive markers in the main
> > kernel source, while the more intrusive ones would still be in the
> > upstream kernel, but in scripts/instrumentation/.
> >
>
> Please define : marker intrusiveness. I think that this is not a single
> concept. First, I think we have to look at intrusiveness from three
> different angles :
>
> - Visual intrusiveness (hurts visually in the code)
> - Compiled-in, but inactive intrusiveness
>   - Modifies compiler optimisations when the marker is compiled in but no
>     tracing is active.
>   - Wastes a few cycles because it adds NOPs, jump, etc in a critical path
>     when tracing is not active.
> - Active tracing intrusiveness
>   - Wastes too many cycles in a critical path when tracing is active.

as the primary factor i'd add:

- Maintenance intrusiveness

but yes, agreed - with that addition this is a good summary of the
intrusiveness factors.

> The problem is that a static marker will speed up the active tracing
> while a dynamic probe will speed up the case where tracing is
> inactive. The problem is that the dynamic probe cost can get so big
> that it modifies the traced system often more than acceptable. [...]

do you base this opinion of yours on the kprobes+LTT experiment you did
yesterday? If yes then would it be possible for you to try the 3 patches
that i sent, and re-measure the impact of kprobes? The kprobes overhead
should go down a bit, it would be interesting to see by how much.

Also, when forming your opinion do you consider djprobes - which in
essence inserts a function call (and not an INT3) into the probed code?

> [...] Under this angle, I would be tempted to say that the most
> intrusive instrumentation should be helped by marker, which means
> accepting a very small performance impact (NOPs on modern CPUs are
> quite fast) when tracing is not active in order to enable fast tracing
> of some very high event rate kernel code paths.

i'm not fundamentally worried about the runtime impact of static probes,
as long as the impact is unmeasurable. I'm more worried about their
maintenance impact - so i want the option to move them to a dynamic
script, if the tracepoint is for example not frequent. (but being a
perfectionist i cannot completely forget about their runtime overhead
either)

> > - model #3: we could have the 'hardest' markups in the source, and the
> > 'easy' ones as dynamic markups in scripts/instrumentation/.
> >
> By "hardest", do you mean : where the data that is to be extracted is
> not easily available due to compiler optimisations ?

yeah - hard in the sense of dynamic probing.

> > i agree, as long as it's lightweight markers for _dynamic tracers_,
> > so that we keep our options open - as per the arguments above.
>
> But I also think that a marker mechanism should not only mark the code
> location where the instrumentation is to be made, but also the
> information the probe is interested into (provide compile-time data
> type verification and address at runtime). Doing otherwise would limit
> what could be provided to static markup users.

yeah. If you look at the API suggestions i made, they are such. There
can be differences though to 'static tracepoints used by static
tracers': for example there's no need to 'mark' a static variable,
because dynamic tracers have access to it - while a static tracer would
have to pass it into its trace-event function call.
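
One way a marker could carry both the code location and type-checked data is sketched below. It assumes GCC or Clang (for the `format` attribute); `MARK_FMT`, `mark_check_format`, `mark_emit` and `mark_log` are hypothetical names and not an API proposed in this thread.

```c
#include <stdarg.h>
#include <stdio.h>

/* Empty inline function whose only job is to let the compiler check the
 * argument types against the format string (GCC/Clang attribute). */
static inline __attribute__((format(printf, 1, 2)))
void mark_check_format(const char *fmt, ...)
{
    (void)fmt;
}

static char mark_log[128];  /* last rendered event, for the example */

/* Hypothetical back end: renders the event into mark_log. */
static void mark_emit(const char *name, const char *fmt, ...)
{
    va_list ap;
    int n = snprintf(mark_log, sizeof(mark_log), "%s: ", name);

    if (n < 0 || (size_t)n >= sizeof(mark_log))
        return;
    va_start(ap, fmt);
    vsnprintf(mark_log + n, sizeof(mark_log) - (size_t)n, fmt, ap);
    va_end(ap);
}

/* The marker names the event, describes the data, and passes it on. */
#define MARK_FMT(name, fmt, ...)                 \
    do {                                         \
        mark_check_format(fmt, __VA_ARGS__);     \
        mark_emit(#name, fmt, __VA_ARGS__);      \
    } while (0)

static int x = 7;

static void func(int a)
{
    /* Passing a pointer where %d expects an int would draw a -Wformat
     * warning at build time. */
    MARK_FMT(func_event, "a=%d x=%d", a, x);
}
```

The check function generates no code of its own; it exists so that a mismatch between the declared format and the marked variables is reported when the kernel is built rather than when a trace is read.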

Ingo

2006-09-18 00:18:26

by Roman Zippel

[permalink] [raw]

Subject: Re: tracepoint maintainance models

Hi,

On Mon, 18 Sep 2006, Ingo Molnar wrote:

> Was it truly confusing to you what i said?

Not really, it's pretty clear that you don't want to cut me (or any other
user of an arch which doesn't support kprobes) some slack, so that I can
make use of tracing. :-(

bye, Roman

2006-09-18 00:35:20

by Karim Yaghmour

[permalink] [raw]

Subject: Re: tracepoint maintainance models

I've been following the evolution of this thread from a distance and must
point out that this "static" vs. "dynamic" issue has been cleared up
numerous times already. Your persistence in further continuing this
prompts me to provide the casual reader -- as if anyone is still
reading any of this -- with at least a minimum of correct semantics.

Ingo Molnar wrote:
> what is being proposed here is entirely different from dprobes though:
> Roman suggests that he doesnt want to implement kprobes on his arch, and
> he wants LTT to remain an _all-static_ tracer. That's the point where i
> beg to differ: static markers are fine (but they should be kept to a
> minimum), but generic static /tracers/ need alot more than just a few
> static markers to be meaningful.
>
> So if we accepted static tracers into the kernel, we'd automatically
> commit (for a long period of time) to a much larger body of static
> markers - and i'm highly uncomfortable about that. (for the many reasons
> outlined before)
>
> Even if the LTT folks proposed to "compromise" to 50 tracepoints - users
> of static tracers would likely _not_ be willing to compromise, so there
> would be a constant (and I say unnecessary) battle going on for the
> increase of the number of static markers. Static markers, if done for
> static tracers, have "viral" (Roman: here i mean "auto-spreading", not
> "disease") properties in that sense - they want to spread to alot larger
> area of code than they start out from.

The distinction you make is not substantiated by the factual record. See
overlap between lket list of events and *old* ltt list of events:
http://sourceware.org/systemtap/man5/lket.5.html

There is, actually, no reason to believe that end-users of dynamic trace
infrastructures are any more tolerant to breakage than, say, those of
the *old* ltt. In fact, Jose's feedback as a maintainer is exactly the
opposite. So, then, is Jose's SystemTap-based LKET a "static tracer"?
Based on your logic I would have to conclude that it is indeed so. In
fact, if some of SystemTap's important scripts get mainlined, then so
will SystemTap be "static" according to your benchmark.

There are in fact at least three parts to this, and for the life of me
I can't find any trace of an explanation of why you choose to persist
in claiming that it's the same puddle: a) the markup, b) the mechanism,
c) the events list.

You have agreed, up to this point, that static markup is needed. Good,
let's build on that. What I, at least, have been advocating (and I
believe Roman wants the same thing) is that for a given marker A, the
person building that kernel gets to choose whether A resolves into
probe information or into a direct call to an inline function. Granted,
how either of these is implemented has not yet been figured out, and
neither need be implemented with the type of limitations known to such
things in the past, as has been explained to you many times.

And, in my opinion, this is as far as the discussion needs to go for
now: let the mechanism not be tied to the markup.
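
The idea of one markup with a build-time choice of mechanism can be sketched as follows. Everything here is an assumption made for illustration: the `MARK_USE_DIRECT_CALL` switch, the `marker_site` descriptor and `trace_direct()` are hypothetical, and a real implementation would place descriptors in a dedicated section for a dynamic prober to find.

```c
/* Hypothetical descriptor left behind for a dynamic prober. */
struct marker_site {
    const char *name;
    const char *file;
    int line;
};

static int direct_calls;  /* bumped only by the direct-call back end */

static inline void trace_direct(const char *name, long arg)
{
    (void)name;
    (void)arg;
    direct_calls++;
}

#ifdef MARK_USE_DIRECT_CALL
/* Back end 1: the marker becomes a direct inline call. */
#define MARK(name, arg) trace_direct(#name, (long)(arg))
#else
/* Back end 2: the marker only records where it is; no call is made. */
#define MARK(name, arg)                                        \
    do {                                                       \
        static const struct marker_site site = {               \
            #name, __FILE__, __LINE__ };                       \
        (void)site;                                            \
        (void)(arg);                                           \
    } while (0)
#endif

/* The instrumented code is identical under both back ends. */
static long do_work(long v)
{
    MARK(do_work_entry, v);
    return v + 1;
}
```

Building with `-DMARK_USE_DIRECT_CALL` turns the same source line into a direct call; without it, `do_work()` carries only a descriptor. The markup in `do_work()` does not change either way.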

But, you insist on going further and claim that a given trigger
mechanism implies a given dependency on a list of events. That is
where I'm completely lost. While it is true that the *old* ltt
rigidly implemented such a dependency, there is no *inherent* link
between mechanism and dependency on a given event list -- LKET being
a prime counter-example to your logic.

If you would care to actually investigate the *new* ltt, you would
actually see an entirely different picture, as has been clearly
explained to you by Mathieu. The thing is but an engine to deal
with the interpretation of large event streams -- I said it in
my other post: it inherits from my work but the name. It, in
fact, supports the injection of new event set definitions at
every new trace. Do you understand this fact? LTTng will allow
you to feed it definitions for the events you actually have in
a trace so that it can render them to you. IOW it will *not*
break because a static tracepoint went away. The importance of
such things, and the substantial interest from SystemTap folks
and LKET folks for such features of LTTng, is of course lost on
you because you're *convinced* that you're an expert on every
kind of tracing just because you personally implemented a couple
of basic highly-customized static tracers. It evades you that
some people out there may have actually put a lot more thought in
streamlining much of the irritants of basic tracing mechanisms.
The importance of this is lost on you: You are *convinced* that
what exists is only what you were personally able to make of
tracing mechanisms. I will let casual readers decide what this means
about everything you said up to this point, especially in the
light of the fact that you *never* showed at any Linux tracing
event, gathering or project which is represented by those projects
you so desperately wish to dictate the integration of.

For my part, I think it's much too early to discuss event lists.
In fact, no such thing was ever posted by Mathieu. The only
reason set event lists were discussed was discussion surrounding
the *old* ltt. And, if anything, the last few days should have made
it clear to the educated reader that, at this point in time, the
*old* ltt is but a case for academic study.

So please, of a) markup, b) mechanism, c) event list, let's
concentrate on a generic markup first.

And again, much of this has already been said elsewhere numerous
times. This is just a semantic addition to previous explanations.

Karim
--
President / Opersys Inc.
Embedded Linux Training and Expertise
http://www.opersys.com / 1.866.677.4546

2006-09-18 00:42:33

by Karim Yaghmour

[permalink] [raw]

Subject: Re: The emperor is naked: why comprehensive static markup belongs in mainline

Paul Mundt wrote:
> The only issue with this is that the argument list has to be maintained
> in two places.

Not necessarily. LTTng's genevent stuff could be intelligently used here.
Ideally markup is self-contained: it provides code location and context,
and provides any additional information required for postmortem
"rendering" of the event (i.e. how the event is displayed/analyzed).

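A rough sketch of what self-describing events could look like on the analysis side: the viewer renders each record from definitions that travel with the trace, and a record whose definition is absent is skipped rather than treated as an error. The structures and function names below are hypothetical and are not LTTng's genevent format.

```c
#include <stddef.h>
#include <stdio.h>

/* Definition shipped alongside the trace (hypothetical layout). */
struct event_def {
    int id;
    const char *name;
    const char *field;
};

/* One recorded event. */
struct event_rec {
    int id;
    long value;
};

static const struct event_def *
find_def(const struct event_def *defs, size_t ndefs, int id)
{
    for (size_t i = 0; i < ndefs; i++)
        if (defs[i].id == id)
            return &defs[i];
    return NULL;
}

/* Returns 1 and fills buf if the event has a definition, 0 otherwise. */
static int render_event(const struct event_def *defs, size_t ndefs,
                        const struct event_rec *rec,
                        char *buf, size_t len)
{
    const struct event_def *def = find_def(defs, ndefs, rec->id);

    if (!def)
        return 0;  /* unknown event: nothing to render, nothing breaks */
    snprintf(buf, len, "%s: %s=%ld", def->name, def->field, rec->value);
    return 1;
}
```

Because the definitions are data rather than code compiled into the viewer, adding or removing a marker changes what a trace contains without requiring a new viewer.
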
Karim
--
President / Opersys Inc.
Embedded Linux Training and Expertise
http://www.opersys.com / 1.866.677.4546

2006-09-18 00:43:34

by Roman Zippel

[permalink] [raw]

Subject: Re: tracepoint maintainance models

Hi,

On Sun, 17 Sep 2006, Nicholas Miell wrote:

> Anyone know what's hard about kprobes on m68k? Roman?

A limited kprobes hack wouldn't be that difficult (but would still
require more time than I have right now), although it would be barely
usable with a large number of traces.
Ingo might be able to optimize kprobes on his machine to nothing, but that
doesn't help me very much.

bye, Roman

2006-09-18 00:52:08

by Karim Yaghmour

[permalink] [raw]

Subject: Re: tracepoint maintainance models

Ingo Molnar wrote:
> yeah. If you look at the API suggestions i made, they are such. There
> can be differences though to 'static tracepoints used by static
> tracers': for example there's no need to 'mark' a static variable,
> because dynamic tracers have access to it - while a static tracer would
> have to pass it into its trace-event function call.

That has been your own personal experience of such things. Fortunately
by now you've provided to casual readers ample proof that such
experience is but limited and therefore misleading. The fact of the
matter is that *mechanisms* do not "magically" know what detail is
necessary for a given event or how to interpret it: only *markup* does
that.

Karim
--
President / Opersys Inc.
Embedded Linux Training and Expertise
http://www.opersys.com / 1.866.677.4546

2006-09-18 01:05:22

[permalink] [raw]

Subject: Re: tracepoint maintainance models

* Karim Yaghmour wrote:

> There is, actually, no reason to believe that end-users of dynamic
> trace infrastructures are any more tolerant to breakage than, say,
> those of the *old* ltt. [...]

are you saying that if i replaced half of the static markups with
function attributes (which would still provide equivalent functionality
to dynamic tracers), or if i removed a few dozen static markups with
dynamic scripts (which change too would be transparent to users of
dynamic tracers), that in this case users of static tracers would /not/
claim that tracing broke?

i fully understand that you can _teach_ the removal of static
tracepoints to LTT (and i'd expect no less from a tracer), but will
users accept the regression? I claim that they wont, and that's the
important issue. Frankly, i find it highly amusing that such seemingly
simple points have to be argued for such a long time. Is this really
necessary?

(since the rest of your mail seems to build on this premise, i'll wait
for your reply before replying to the rest.)

Ingo

2006-09-18 01:22:30

[permalink] [raw]

Subject: Re: tracepoint maintainance models

* Karim Yaghmour wrote:

> Ingo Molnar wrote:
> > yeah. If you look at the API suggestions i made, they are such. There
> > can be differences though to 'static tracepoints used by static
> > tracers': for example there's no need to 'mark' a static variable,
> > because dynamic tracers have access to it - while a static tracer would
> > have to pass it into its trace-event function call.
>
> That has been your own personal experience of such things. Fortunately
> by now you've provided to casual readers ample proof that such
> experience is but limited and therefore misleading. The fact of the
> matter is that *mechanisms* do not "magically" know what detail is
> necessary for a given event or how to interpret it: only *markup* does
> that.

Karim, i dont usually reply if you insult me (and you've grown a habit
of that lately ), but this one is almost parodic. To understand my
point, please consider this simple example of a static in-source markup,
to be used by a dynamic tracer:

static int x;

void func(int a)
{
    ...
    MARK(event, a);
    ...
}

if a dynamic tracer installs a probe into that MARK() spot, it will have
access to 'a', but it can also have access to 'x'. While a static
in-source markup for _static tracers_, if it also wanted to have the 'x'
information, would also have to add 'x' as a parameter:

MARK(event, a, x);

thus for example value of the variable 'x' would be passed to the
function that does the static tracing. For dynamic tracers no such
'parameter preparation' instructions would need to be generated by gcc.
(thus for example the runtime overhead would be lower for inactive
tracepoints)

hence, in this specific example, there is a real difference between the
markup needed for dynamic tracers, compared to the markup needed for
static tracers - to achieve the same end-result of passing (event,a,x)
to the tracer.
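
The difference can be made concrete with a small sketch. `MARK1`, `MARK2` and `trace_event()` are hypothetical stand-ins for the two forms of `MARK()` above; a dynamic probe reading `x` through its address is not modelled here.

```c
static int x = 5;

/* What the static tracer's event function last received. */
static int  last_nargs;
static long last_args[2];

static void trace_event(int nargs, long a0, long a1)
{
    last_nargs = nargs;
    last_args[0] = a0;
    last_args[1] = a1;
}

/* MARK(event, a): only 'a' is prepared and passed. */
#define MARK1(ev, a)    trace_event(1, (long)(a), 0L)
/* MARK(event, a, x): 'x' must also be loaded and passed. */
#define MARK2(ev, a, b) trace_event(2, (long)(a), (long)(b))

static void func_short(int a)
{
    MARK1(event, a);     /* a static tracer never sees x here */
}

static void func_full(int a)
{
    MARK2(event, a, x);  /* x reaches the tracer at the cost of a load */
}
```

With `func_short()` the trace function has no way to learn the value of `x`; with `func_full()` it does, and the compiler has to emit the extra argument setup at every call site.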

Ingo

2006-09-18 01:48:20

by Karim Yaghmour

[permalink] [raw]

Subject: Re: tracepoint maintainance models

Ingo Molnar wrote:
> are you saying that if i replaced half of the static markups with
> function attributes (which would still provide equivalent functionality
> to dynamic tracers), or if i removed a few dozen static markups with
> dynamic scripts (which change too would be transparent to users of
> dynamic tracers), that in this case users of static tracers would /not/
> claim that tracing broke?

Is this a 3-for-1 combo? Here's an answer to each:

1- static markup replaced by function attributes:

Verbatim answer from earlier email
> What is sufficient for tracing a given set of events by means
> of binary editing *that-does-not-require-out-of-tree-maintenance*
> can be made to be sufficient for the tracing of events using
^^^^^^^^^^^^^^
> direct inlined static calls.

But since you insist on nitpicking ... nothing precludes earlier
mentioned build-time script from being instructed to act in a
similar fashion with regards to generating alternate build files
as described earlier but with using function attributes as the
cue instead of static markup.

2- removed markups are not transparent to "static" tracers:

False. LTTng couldn't care less. Though you'd have a point if
we talked about the *old* ltt, but we aren't.

3- users of tracing will *only* complain if they're using
"static" tracers:

False. You've quite elegantly stated that users don't give a damn
about *mechanism*. So the potential for complaint is, therefore,
no different. More practically: LTTng/SystemTap/LKET is aimed at
the same crowd. There's no factual basis to support the claim
that users of LKET or SystemTap are less likely to complain about
broken tracing than users of LTTng. In fact, there is ample
factual evidence to the contrary. And, *again*, LTTng will not
*break* because of missing events simply because it's a
framework for the analysis of large event sets, which the *old*
ltt never was.

> i fully understand that you can _teach_ the removal of static
> tracepoints to LTT (and i'd expect no less from a tracer), but will
> users accept the regression? I claim that they wont, and that's the
> important issue.

You can claim to be Santa Claus if it makes you happy, but the
factual record does not support the claim that LKET or SystemTap
users are less likely to complain than LTTng users. Again, the
factual record actually supports quite the opposite. And yet *again*,
LTTng is a framework for the analysis of large event sets. It
does *not* need to be *taught* about the removal of anything,
it presents what information it does have. Any fixing of
existing analysis plugins that depend on given event sets, _if
at all required_, is no different from the requirement to fix
scripts which analyze a fixed set of events collected by
SystemTap. LTTng is, in fact, less susceptible to breakage
than LKET, which does depend on a given set of events, even
if it's using a "dynamic" trace mechanism.

> Frankly, i find it highly amusing that such seemingly
> simple points have to be argued for such a long time. Is this really
> necessary?

I'm afraid it is. And the reason I think it's necessary is that
you've been advertising yourself as an expert about everything
tracing, and *that* is: a) just untrue (and therefore misleading),
b) absolutely insulting to all those -- and forget me from the
picture -- who invested *considerable* resources into developing
any form of tracing for Linux, especially considering the
*extremely* tight framework given to them by kernel developers. If
you would care to ask *and* listen, Ingo, those people who you so
blatantly choose to ignore would be more than happy to present
their work and ideas to you. Just try it: ask *and* listen.

Karim
--
President / Opersys Inc.
Embedded Linux Training and Expertise
http://www.opersys.com / 1.866.677.4546

2006-09-18 01:52:50

by Theodore Ts'o

[permalink] [raw]

Subject: Re: tracepoint maintainance models

On Mon, Sep 18, 2006 at 02:05:19AM +0200, Roman Zippel wrote:
> 1. It's not that I don't want to, but I _can't_ implement kprobes and not
> due to lack of skills, but lack of resources. (There is a subtle but
> important difference.)

Um, given the amount of time you've spent trying to persuade us why
you can't implement kprobes for m68k, perhaps you would have
implemented it already if you had buckled down and started coding
instead of flaming about why everyone else should bend over backwards
just because you don't have time for your arch? :-)

- Ted

2006-09-18 02:11:41

by Karim Yaghmour

[permalink] [raw]

Subject: Re: tracepoint maintainance models

Ingo Molnar wrote:
> Karim, i dont usually reply if you insult me (and you've grown a habit
> of that lately ), but this one is almost parodic.

FWIW, Ingo, my own appreciation of events is that I've shown much more
restraint and patience with you than you'll ever acknowledge.

FWIW, Ingo, I have nothing against you personally. I've said it
before in unrelated threads and I'll say it again: I have a lot
of respect for your abilities, as a Linux user on a daily basis
I silently profit from immense contributions you have made time
and again.

FWIW, Ingo, I've been more than a good sport on other issues
where we've disagreed. Case-in-point: while I disagreed with you
on your choice to pursue preemption, I made it a point to
personally go out of my way to congratulate every single
preemption supporter I had disagreed with in the past at this
year's OLS: Thomas, Steven, Sven, Manas, etc. I didn't see you
personally, so here's a belated congratulations.

> MARK(event, a);
...
> MARK(event, a, x);

You assume these are mutually exclusive. Your argument can only
be made to be believable if people promoting direct inline
instrumentation were fascists -- which may be convenient for
some to believe. There is no reason why if the *default* inline
marker is insufficient that a user or developer cannot
circumvent it at runtime using a dynamic probe mechanism.

But if you look at the *facts*, you'll see that once a given
set of events is identified as being interesting, they usually
remain unchanged. Which is, in fact, the feedback given by
Jose's experience with LKET -- which, again, is based on
Systemtap.

For a given known-to-be-useful valid marker, information deficit
is the exception, not the rule.

> hence, in this specific example, there is a real difference between the
> markup needed for dynamic tracers, compared to the markup needed for
> static tracers - to achieve the same end-result of passing (event,a,x)
> to the tracer.

No. This is true only if you conceive that tool engineers
actually want to restrict themselves to obtaining events
from a given *mechanism*. And *that* is not substantiated
by any historical record. In fact, quite the opposite.
Even if you were to consider but the *old* ltt, here's
from previous correspondence:

> Subsequently, I initiated discussions with the IBM DProbes
> team back in 2000 and thereafter implemented facilities for
> enabling dynamically-inserted probes to route their events
> through ltt -- all of which was functional as of November
> 2000.

Karim
--
President / Opersys Inc.
Embedded Linux Training and Expertise
http://www.opersys.com / 1.866.677.4546

2006-09-18 02:43:47

by Mathieu Desnoyers

[permalink] [raw]

Subject: Re: tracepoint maintainance models

* Ingo Molnar wrote:
> Karim, i dont usually reply if you insult me (and you've grown a habit
> of that lately ), but this one is almost parodic. To understand my
> point, please consider this simple example of a static in-source markup,
> to be used by a dynamic tracer:
>
> static int x;
>
> void func(int a)
> {
> ...
> MARK(event, a);
> ...
> }
>
> if a dynamic tracer installs a probe into that MARK() spot, it will have
> access to 'a', but it can also have access to 'x'. While a static
> in-source markup for _static tracers_, if it also wanted to have the 'x'
> information, would also have to add 'x' as a parameter:
>
> MARK(event, a, x);
>

Hi,

If I may, if nothing marks the interest of the tracer in the "x" variable, what
happens when a kernel guru changes it for y (because it looks a lot better)? The
code will no longer compile when the markup marks the interest for x, whereas
your "dynamic tracer" markup will simply fail to find the information. My point
is that the markup of the interesting variables should follow code changes,
otherwise it will have to be constantly updated elsewhere (hmm? Documentation/,
anyone?)

I would say that not marking a static variable just because it is less visually
intrusive is not such a good thing to do. Just because we *can* doesn't mean we
*should*.
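
The maintenance argument can be illustrated with a toy example. Here the variable has already been renamed from `x` to `y`; `symtab`, `lookup_symbol()` and `mark_record()` are hypothetical stand-ins for a symbol lookup done by an out-of-tree probe and for an in-source marker back end.

```c
#include <stddef.h>
#include <string.h>

static int y = 11;  /* called 'x' before the hypothetical rename */

/* Toy symbol table, standing in for what a dynamic probe consults. */
struct sym {
    const char *name;
    const int *addr;
};

static const struct sym symtab[] = { { "y", &y } };

/* Name-based lookup: a stale name is only noticed at run time. */
static const int *lookup_symbol(const char *name)
{
    for (size_t i = 0; i < sizeof(symtab) / sizeof(symtab[0]); i++)
        if (strcmp(symtab[i].name, name) == 0)
            return symtab[i].addr;
    return NULL;
}

static long marked_a, marked_b;

static void mark_record(long a, long b)
{
    marked_a = a;
    marked_b = b;
}

#define MARK(ev, a, b) mark_record((long)(a), (long)(b))

static void func(int a)
{
    /* Writing MARK(event, a, x) here would now fail to compile, so the
     * rename and the markup update have to land together. */
    MARK(event, a, y);
}
```

A probe script that still asks for `"x"` gets `NULL` back from `lookup_symbol()` and finds out only when it runs, whereas the in-source marker cannot be left stale without breaking the build.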

Mathieu

OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2006-09-18 03:06:25

[permalink] [raw]

Subject: Re: tracepoint maintainance models

* Karim Yaghmour wrote:

> > MARK(event, a);
> ...
> > MARK(event, a, x);
>
> You assume these are mutually exclusive. [...]

Please dont put words into my mouth. No, i dont assume they are mutually
exclusive, did i ever claim that? But i very much still claim what my
point was, and which point you disputed (at the same time also insulting
me): that even if hell freezes over, a static tracer wont be able to
extract 'x' from the MARK(event, a) markup. You accused me unfairly, you
insulted me and i defended my point. In case you forgot, here again is
the incident, in its entirety, where i make this point and you falsely
dispute it:

> > There can be differences though to 'static tracepoints used by
> > static tracers': for example there's no need to 'mark' a static
> > variable, because dynamic tracers have access to it - while a static
> > tracer would have to pass it into its trace-event function call.
>
> That has been your own personal experience of such things. Fortunately
> by now you've provided to casual readers ample proof that such
> experience is but limited and therefore misleading. The fact of the
> matter is that *mechanisms* do not "magically" know what detail is
> necessary for a given event or how to interpret it: only *markup* does
> that.

Ingo

2006-09-18 03:30:00

[permalink] [raw]

Subject: Re: tracepoint maintainance models

Hi,

* Mathieu Desnoyers wrote:

> * Ingo Molnar wrote:
> > Karim, i dont usually reply if you insult me (and you've grown a habit
> > of that lately ), but this one is almost parodic. To understand my
> > point, please consider this simple example of a static in-source markup,
> > to be used by a dynamic tracer:
> >
> > static int x;
> >
> > void func(int a)
> > {
> > ...
> > MARK(event, a);
> > ...
> > }
> >
> > if a dynamic tracer installs a probe into that MARK() spot, it will have
> > access to 'a', but it can also have access to 'x'. While a static
> > in-source markup for _static tracers_, if it also wanted to have the 'x'
> > information, would also have to add 'x' as a parameter:
> >
> > MARK(event, a, x);
> >
>
> Hi,
>
> If I may, if nothing marks the interest of the tracer in the "x"
> variable, what happens when a kernel guru changes it for y (because it
> looks a lot better). The code will not compile anymore when the markup
> marks the interest for x, when your "dynamic tracer" markup will
> simply fail to find the information. My point is that the markup of
> the interesting variables should follow code changes, otherwise it
> will have to be constantly updated elsewhere (hmm ? Documentation/
> someone ?)

yeah - but it shows (as you have now recognized it too) that even static
markup for dynamic tracers _can_ be fundamentally different, just
because dynamic tracers have access to information that static tracers
dont.

(Karim still disputes it, and he is still wrong.)

> I would say that not marking a static variable just because it is less
> visually intrusive is a not such a good thing to do. That's not
> because we *can* that we *should*.

yeah. But obviously the (small but present) performance advantage is
there too, so it shouldnt be rejected out of hand. If a parameter is not
mentioned then it does not have to be prepared for function parameter
passing, etc. So it's 1-2 instructions less. So if this is in some
really stable area of code then it's a valid optimization.

Ingo

2006-09-18 03:33:26

by Karim Yaghmour

[permalink] [raw]

Subject: Re: tracepoint maintainance models

Ingo Molnar wrote:
> Plese dont put words into my mouth. No, i dont assume they are mutually
> exclusive, did i ever claim that? But i very much still claim what my
> point was, and which point you disputed (at the same time also insulting
> me): that even if hell freezes over, a static tracer wont be able to
> extract 'x' from the MARK(event, a) markup. You accused me unfairly, you
> insulted me and i defended my point. In case you forgot, here again is
> the incident, in its entirety, where i make this point and you falsely
> dispute it:

Is this a recursive thread? Because if it is, I might as well point to
my follow-up to your answer, and that's not going to get us anywhere.

By no stretch of the English language did I insult you. This is a
convenient fabrication which *I* could take as an insult. Calling
into question a person's expertise on a given topic is by no
means unheard of in the scientific discourse if said person
insists on pushing an agenda using said "expertise" as the
founding basis. So no, the emperor has no clothes in this case:
you have de-facto proven your own expertise in this field is
but very limited. Historically, and maybe you're an exception
to this, individuals in the scientific community that took insult
when their "expertise" was questioned, as I did in the snippet
you so conveniently highlight, were usually wrong. Real experts
don't need status to prove their point: they use facts.

The fact that X cannot be extracted from a statically defined
set containing (K,P,F,Y,Z) is high school mathematics at best.
Your insistence on such a theoretical example is, for me, but
further proof of your actual lack of *practical* experience.
Because those with actual *practical* experience, have presented
us with *facts* and empirical *results*, both highly prized in
the scientific discourse, that in-real-life, contrary to
Ingo's strawman constructions, users would benefit from having
access to events collected using a variety of *mechanisms*.

So, yes Ingo, you "wont be able to extract 'x' from the
MARK(event, a) markup" using just a "static" tracer. What such
emphasis on this statement on your part and utter refusal
to respond to very solidly constructed arguments on my part
while instead choosing to emphasize "moral" tort entails,
however, is an entirely separate issue altogether.

Karim
--
President / Opersys Inc.
Embedded Linux Training and Expertise
http://www.opersys.com / 1.866.677.4546

2006-09-18 03:39:06

[permalink] [raw]

Subject: Re: tracepoint maintainance models

* Karim Yaghmour wrote:

> > [...] if i removed a few dozen static markups with dynamic scripts
> > (which change too would be transparent to users of dynamic tracers),
> > that in this case users of static tracers would /not/ claim that
> > tracing broke?

> 2- removed markups are not transparent to "static" tracers:
>
> False. LTTng couldn't care less. [...]

Amazing! So the trace data provided by those removed static markups
(which were moved into dynamic scripts and are thus still fully
available to dynamic tracers) are still available to LTT users? How is
that possible, via quantum tunneling perhaps? ;-)

Ingo

2006-09-18 03:53:45

by Karim Yaghmour

[permalink] [raw]

Subject: Re: tracepoint maintainance models

Ingo Molnar wrote:
> Amazing! So the trace data provided by those removed static markups
> (which were moved into dynamic scripts and are thus still fully
> available to dynamic tracers) are still available to LTT users? How is
> that possible, via quantum tunneling perhaps? ;-)

Please run it one more time Mr. DJ:
> What is sufficient for tracing a given set of events by means
> of binary editing *that-does-not-require-out-of-tree-maintenance*
> can be made to be sufficient for the tracing of events using
^^^^^^^^^^^^^^
> direct inlined static calls.

Do I really need to explain this one to you? Do I?

Bahhh, ok, here we go:

Previously alluded to script can easily be made to read mainlined
dynamic scripts and generate alternate build files for the
designated source. Let me know if I need to expand on this.

You know what, let's cut to the chase. Go ahead and mainline
any infrastructure you think will be sufficient to make it
possible to maintain SystemTap's essentials _in the tree_. *Anything*
that you insert in there to make it possible to make *any*
dynamic tracer mainline can and likely will be used to obtain direct
static calls. The only way this doesn't work is if the dynamic
tracer folks have to continue maintaining their stuff out of tree.

See, not only is Karim evil, but so too are the facts. Even
when manipulated by Ingo, *mechanism* continues to be orthogonal
to *markup*. Now that's evil.

Karim
--
President / Opersys Inc.
Embedded Linux Training and Expertise
http://www.opersys.com / 1.866.677.4546

2006-09-18 03:52:33

by Theodore Ts'o

[permalink] [raw]

Subject: Re: tracepoint maintainance models

On Mon, Sep 18, 2006 at 05:30:27AM +0200, Ingo Molnar wrote:
> Amazing! So the trace data provided by those removed static markups
> (which were moved into dynamic scripts and are thus still fully
> available to dynamic tracers) are still available to LTT users? How is
> that possible, via quantum tunneling perhaps? ;-)

I *think* what Karim is trying to claim is that LTT also has some
dynamic capabilities, and isn't a pure static tracing system. But if
that's the case, I don't understand why LTT and SystemTap can't just
merge and play nice together....

- Ted

2006-09-18 04:03:35

by Karim Yaghmour

[permalink] [raw]

Subject: Re: tracepoint maintainance models

Theodore Tso wrote:
> I *think* what Karim is trying to claim is that LTT also has some
> dynamic capabilities, and isn't a pure static tracing system. But if
> that's the case, I don't understand why LTT and SystemTap can't just
> merge and play nice together....

That's been the thrust of my intervention here. There is already a
great deal of common ground between the respective teams. There are
historical "incidents", if we want to call them as such, which
prompted such separation. There is a common desire of interfacing,
and much talk has been done on the topic. From my point of view,
I think it's fair to say that the SystemTap folks have been
particularly wary of interfacing with ltt based mainly on its
controversial heritage. If the signal *and* endorsement from kernel
developers is that SystemTap and LTTng should "play nice together",
then, I think, everything is in place to accelerate that.

Karim
--
President / Opersys Inc.
Embedded Linux Training and Expertise
http://www.opersys.com / 1.866.677.4546

2006-09-18 04:18:22

[permalink] [raw]

Subject: Re: tracepoint maintainance models

* Karim Yaghmour wrote:

> By no stretch of the english language did I insult you. [...]

let me point it out then:

>> That has been your own personal experience of such things.
>> Fortunately by now you've provided to casual readers ample proof that
>> such experience is but limited and therefore misleading.

you wrote this disputing a point of mine that Mathieu acknowledged
meanwhile, and which you now acknowledge in this mail too:

> So, yes Ingo, you "wont be able to extract 'x' from the MARK(event, a)
> markup" using just a "static" tracer. [...]

and as i wrote to Mathieu, this difference in markup can have performance
impact on the code generated by gcc, so it can be of practical
relevance. Unfortunately i dont have much influence on the fact that it
takes so much time for you to understand and acknowledge such simple
points: in this highly trivial case it was 3(!) mail exchanges =B-)
Yuck! I should really heed others' advice that i should simply stop
replying to you ... but i have to admit that often it's so tempting.

Ingo

2006-09-18 04:18:26

[permalink] [raw]

Subject: Re: tracepoint maintainance models

* Karim Yaghmour <[email protected]> wrote:

> Ingo Molnar wrote:
> > Amazing! So the trace data provided by those removed static markups
> > (which were moved into dynamic scripts and are thus still fully
> > available to dynamic tracers) are still available to LTT users? How is
> > that possible, via quantum tunneling perhaps? ;-)

> Previously alluded to script can easily be made to read mainlined
> dynamic scripts and generate alternate build files for the designate
> source. Let me know if I need to expand on this.

That suggestion is so funny to me that i'll let it stand here in its
absurdity :) Did i get it right, you are suggesting for LTT to build a
full SystemTap interpreter, a script-to-C compiler, an embedded-C
script interpreter, just to be able to build-time generate the SystemTap
scripts back into the source code? Dont you realize that you've just
invented SystemTap, sans the ability to remove inactive code? ;)

I know a much easier method: a "static tracer" can do all of that (and
more), if you rename "SystemTap" to "static tracer" ;-)

Ingo

2006-09-18 04:20:08

[permalink] [raw]

Subject: Re: tracepoint maintainance models

* Theodore Tso <[email protected]> wrote:

> On Mon, Sep 18, 2006 at 05:30:27AM +0200, Ingo Molnar wrote:
> > Amazing! So the trace data provided by those removed static markups
> > (which were moved into dynamic scripts and are thus still fully
> > available to dynamic tracers) are still available to LTT users? How is
> > that possible, via quantum tunneling perhaps? ;-)
>
> I *think* what Karim is trying to claim is that LTT also has some
> dynamic capabilities, and isn't a pure static tracing system. But if
> that's the case, I don't understand why LTT and SystemTap can't just
> merge and play nice together....

oh, that merge was certainly my suggestion from the very beginning of
this "discussion". And no, LTT has no kprobe capabilities at the moment,
it is a pure static tracer, but i'm still hoping that Karim stops doing
this what i believe to be a pointless Don Quijote fight (which i believe
keeps Mathieu from making the right technical decision), and lets LTT
adapt to the times and integrate SystemTap. But hey, it's really his
problem and not mine in the first place. I certainly have my share of
fun and i even get to write code :)

Ingo

2006-09-18 04:22:13

by Karim Yaghmour

[permalink] [raw]

Subject: Re: tracepoint maintainance models

Ingo Molnar wrote:
> let me point it out then:

Recursive I tell you, recursive.

> Yuck! I should really heed others' advice that i should simply stop
> replying to you ... but i have to admit that often it's so tempting.

Good. So just make sure you reply to this email so that you can
have the last word.

FWIW, there's enough information floating around that those who
need to make up their mind have everything they need.

Karim
--
President / Opersys Inc.
Embedded Linux Training and Expertise
http://www.opersys.com / 1.866.677.4546

2006-09-18 04:26:55

by Mathieu Desnoyers

[permalink] [raw]

Subject: Re: tracepoint maintainance models

* Ingo Molnar ([email protected]) wrote:
> > > static int x;
> > >
> > > void func(int a)
> > > {
> > > ...
> > > MARK(event, a);
> > > ...
> > > }
> > >
> > > if a dynamic tracer installs a probe into that MARK() spot, it will have
> > > access to 'a', but it can also have access to 'x'. While a static
> > > in-source markup for _static tracers_, if it also wanted to have the 'x'
> > > information, would also have to add 'x' as a parameter:
> > >
> > > MARK(event, a, x);
> > >
> >
> > Hi,
> >
> > If I may, if nothing marks the interest of the tracer in the "x"
> > variable, what happens when a kernel guru changes it for y (because it
> > looks a lot better). The code will not compile anymore when the markup
> > marks the interest for x, when your "dynamic tracer" markup will
> > simply fail to find the information. My point is that the markup of
> > the interesting variables should follow code changes, otherwise it
> > will have to be constantly updated elsewhere (hmm ? Documentation/
> > someone ?)
>
> yeah - but it shows (as you have now recognized it too) that even static
> markup for dynamic tracers _can_ be fundamentally different, just
> because dynamic tracers have access to information that static tracers
> dont.
>
> (Karim still disputes it, and he is still wrong.)

The following example voids your example : there are ways to implement static
markers that *could* have access to those variables. (implementation detail)

int x = 5;

#define MARK(a) printk(a, x)

void func(int a)
{
...
MARK(a);
...
}

Mathieu

OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2006-09-18 04:36:43

by Karim Yaghmour

[permalink] [raw]

Subject: Re: tracepoint maintainance models

Ingo Molnar wrote:
> That suggestion is so funny to me that i'll let it stand here in its
> absurdity :) Did i get it right, you are suggesting for LTT to build a
> full SystemTap interpreter, an script-to-C compiler, an embedded-C
> script interpreter, just to be able to build-time generate the SystemTap
> scripts back into the source code? Dont you realize that you've just
> invented SystemTap, sans the ability to remove inactive code? ;)

Yes, an arbitrary factual fallacy for a change. I won't even get into
how trivial it would be to hack the SystemTap interpreter for the
purposes I state. Or any other part of your supposed argument for
that matter. Anyone seeking to implement what I outlined already has
plenty of information.

> I know a much easier method: a "static tracer" can do all of that (and
> more), if you rename "SystemTap" to "static tracer" ;-)

There is no point to debate further. You clearly have no intention of
having the decency to stand tall, make a man of yourself and
acknowledge that you were shown wrong. No matter what I put forward,
you're going to stubbornly reply and construct false arguments to
defend a now indefensible point of view -- all the while making those
snide remarks about the time you are wasting and all (that's a
classic, by the way, for presumed experts when losing face.)

Go back, Ingo, and read my earlier posts regarding what such attitude
has in terms of encouraging input from outsiders.

I, personally, have said everything that needed to be said. The record
is there if someone is looking for the answers. I only chose to come
back to make the following semantic distinction clear:
markup != mechanism != event list. And I've proven that, whether you'd
care to acknowledge it or not.

Karim
--
President / Opersys Inc.
Embedded Linux Training and Expertise
http://www.opersys.com / 1.866.677.4546

2006-09-18 04:41:05

[permalink] [raw]

Subject: Re: tracepoint maintainance models

* Karim Yaghmour <[email protected]> wrote:

> Theodore Tso wrote:
> > I *think* what Karim is trying to claim is that LTT also has some
> > dynamic capabilities, and isn't a pure static tracing system. But if
> > that's the case, I don't understand why LTT and SystemTap can't just
> > merge and play nice together....
>
> That's been the thrust of my intervention here. [...]

indeed, and i severely misunderstood your points in this regard. Now i
have re-read some of your earlier points, and in particular:

>> And finally, do realize that in 2000 I personally contacted the head
>> of the DProbes project IBM in order to foster common development,
>> following which ltt was effectively modified in order to allow
>> dynamic instrumentation of the kernel ...

and now i'm red faced - i was wrong about this fundamental aspect of
your position. Please accept my apologies!

so regarding the big picture we are largely on the same page in essence
i think - sub-issues notwithstanding :-) As long as LTT comes with a
facility that allows the painless moving of a static LTT markup to a
SystemTap script, that would come quite a bit closer to being acceptable
for upstream acceptance in my opinion.

The curious bit is: why doesnt LTT integrate SystemTap yet? Is it the
performance aspect? Some of the extensive hooking you do in LTT could be
alleviated to a great degree if you used dynamic probes. For example the
syscall entry hackery in LTT looks truly scary. I cannot understand that
someone who does tracing doesnt see the fundamental strength of
SystemTap - i think that in part must have led to my mistake of
assuming that you opposed SystemTap.

Ingo

2006-09-18 05:03:13

by Mathieu Desnoyers

[permalink] [raw]

Subject: LTTng and SystemTAP (Everyone who is scared to read this huge thread, skip to here)

* Ingo Molnar ([email protected]) wrote:
> so regarding the big picture we are largely on the same page in essence
> i think - sub-issues non-withstanding :-) As long as LTT comes with a
> facility that allows the painless moving of a static LTT markup to a
> SystemTap script, that would come quite a bit closer to being acceptable
> for upstream acceptance in my opinion.
>
> The curious bit is: why doesnt LTT integrate SystemTap yet? Is it the
> performance aspect?

Yes, for our needs, the performance impact of SystemTAP is too high. We are
totally open to integrating data coming from SystemTAP into our traces. Correct
me if I am wrong, but I think their project makes extensive use of strings in
the buffers. That is a non-compact, sub-optimal type, but it can do the job for
low-rate events. It also makes classification and identification of the
information less straightforward. Plus, running string-formatting code in a
critical code path does not give the kind of performance I am looking for.
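
To illustrate the size argument above, here is a minimal userspace sketch. The
record layout, the field names and both helper functions are assumptions made
up for this example; they are not the actual LTTng or SystemTap buffer
formats. It only compares how many bytes the same event occupies as formatted
text and as a fixed binary record.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical compact record; NOT the real LTTng or SystemTap layout. */
struct event_rec {
	uint32_t timestamp;
	uint16_t event_id;
	uint16_t cpu;
	uint32_t arg;
};

/* Log the event as text. Returns the number of bytes the text needs. */
static size_t log_as_string(char *buf, size_t size, uint32_t ts,
			    uint16_t id, uint16_t cpu, uint32_t arg)
{
	int n = snprintf(buf, size, "ts=%u event=%u cpu=%u arg=%u\n",
			 (unsigned)ts, (unsigned)id, (unsigned)cpu,
			 (unsigned)arg);
	return n < 0 ? 0 : (size_t)n;
}

/* Log the event as a fixed-size record. Returns the bytes written. */
static size_t log_as_binary(char *buf, size_t size, uint32_t ts,
			    uint16_t id, uint16_t cpu, uint32_t arg)
{
	struct event_rec rec = { ts, id, cpu, arg };

	if (size < sizeof(rec))
		return 0;
	memcpy(buf, &rec, sizeof(rec));	/* no format parsing on this path */
	return sizeof(rec);
}
```

For the values (1000000, 42, 1, 7) the binary record takes 12 bytes and the
text form 32, and the text form also pays for a pass through the format parser
on the logging path.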

> Some of the extensive hooking you do in LTT could be
> aleviated to a great degree if you used dynamic probes. For example the
> syscall entry hackery in LTT looks truly scary.

Yes, agreed. The last time I checked, I thought about moving this tracing code
to the syscall_trace_entry/exit (used for security hooks and ptrace if I
remember well). I just didn't have the time to do it yet.

> I cannot understand that
> someone who does tracing doesnt see the fundamental strength of
> SystemTap - i think that in part must have lead to my mistake of
> assuming that you opposed SystemTap.
>

Can you find a way to instrument it dynamically without the breakpoint cost ?
System calls are a highly critical path both in a system and for tracing.

Mathieu

OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2006-09-18 05:17:21

[permalink] [raw]

Subject: Re: tracepoint maintainance models

* Mathieu Desnoyers <[email protected]> wrote:

> The following example voids your example : there are ways to implement
> static markers that *could* have access to those variables.
> (implementation detail)
>
> int x = 5;
>
> #define MARK(a) printk(a, x)

but this is only hiding it syntactically, hence the same
parameter-access side-effect remains - while in the dynamic probe case
the variable is accessed within the probe - so the true effect on the
callsite is different. But, in terms of having access to the
information, you (and Karim) are right that the static tracer can access
it too.
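
A minimal userspace sketch of the point being conceded here, assuming a
printf-based stand-in for the tracer. The names record_event, marker_hits,
last_a and last_x are hypothetical and exist only for this illustration. The
macro hides 'x' from the argument list, yet its expansion still reads 'x' at
the call site:

```c
#include <stdio.h>

static int x = 5;		/* file-scope state the marker wants to record */
static int marker_hits;		/* hypothetical counter, demonstration only */
static int last_a, last_x;	/* last values seen by the marker */

/* Hypothetical stand-in for the tracer's logging routine. */
static void record_event(int a, int xval)
{
	marker_hits++;
	last_a = a;
	last_x = xval;
	printf("event: a=%d x=%d\n", a, xval);
}

/* 'x' is not a visible parameter, but the expansion still loads it. */
#define MARK(a) record_event((a), x)

static void func(int a)
{
	/* ... */
	MARK(a);	/* expands to record_event((a), x) */
	/* ... */
}
```

Calling func(7) logs a=7 x=5. If 'x' is renamed without updating the macro,
the build breaks, which is the maintenance property Mathieu described; the
read of 'x' still sits at the call site, which is the side-effect Ingo
described.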

Ingo

2006-09-18 05:16:46

by Karim Yaghmour

[permalink] [raw]

Subject: Re: tracepoint maintainance models

Ingo Molnar wrote:
> and now i'm red faced - i was wrong about this fundamental aspect of
> your position. Please accept my apologies!

Apologies accepted. Hopefully we can tone this thread down and
move on to more constructive implementation discussions.

> so regarding the big picture we are largely on the same page in essence
> i think - sub-issues non-withstanding :-) As long as LTT comes with a
> facility that allows the painless moving of a static LTT markup to a
> SystemTap script, that would come quite a bit closer to being acceptable
> for upstream acceptance in my opinion.

I don't think there's any impediment for that. In fact, the value
is not in the markup, but in the tools.

> The curious bit is: why doesnt LTT integrate SystemTap yet?

Performance aside, this is due to historic reasons which cannot,
unfortunately, be succinctly explained. The best I can do is refer
you to the topmost parent of this thread, lengthy as it may be. As I
told Ted, if the signal *and* endorsement is that LTT and SystemTap
should be complementary, then that is exactly what will happen.

It doesn't solve the performance problem, but even the SystemTap
folks are concerned by performance and would like to see some form
of static markup, so I think the LTTng and SystemTap efforts are
on the same page here.

> Is it the
> performance aspect? Some of the extensive hooking you do in LTT could be
> aleviated to a great degree if you used dynamic probes. For example the
> syscall entry hackery in LTT looks truly scary. I cannot understand that
> someone who does tracing doesnt see the fundamental strength of
> SystemTap - i think that in part must have lead to my mistake of
> assuming that you opposed SystemTap.

I am not opposed to SystemTap and neither do I fail to see its
fundamental strength. It's just a matter that a decision was made
at some point in time that SystemTap be developed separately *and*
independently from any existing tracing effort. Again, that
decision was based on what appeared to be good reasons for the
people in charge, and there's no point in further highlighting
differences.

I think what is important at this stage is that now that we have
an agreement on the need for some form of static markup, that
the developers of the various teams work together to come up
with an acceptable framework for all to use. And, ideally, this
effort should be spearheaded by someone who has enough knowledge
of the kernel's intricacies as to avoid any obvious pitfalls.
In that regard, you're likely the best person to take charge of
this.

Once markup is in place, much of the mechanics of either of
the existing *mechanisms* can then simply disappear in the
background without *any* impact on the rest of the developers.

Only then will there start to be constructive discussion as
to where best markup should be located and what mechanism
is typically most appropriate for that specific location.

All that being said, I would like to thank you for acknowledging
a misunderstanding on your part. Hopefully we can all set this
aside, and move forward on common goals.

Thanks,

Karim
--
President / Opersys Inc.
Embedded Linux Training and Expertise
http://www.opersys.com / 1.866.677.4546

2006-09-18 09:01:42

by Jes Sorensen

[permalink] [raw]

Subject: Re: tracepoint maintainance models

Roman Zippel wrote:
> Hi,
>
> On Mon, 18 Sep 2006, Ingo Molnar wrote:
>
>> Was it truly confusing to you what i said?
>
> Not really, it's pretty clear, that you don't want me (or any other user
> of an arch, which doesn't support kprobes) cut some slack, so that I can
> make use of tracing. :-(

Roman,

I have a PDF of the m68k instruction set sitting somewhere, do you want
me to forward you a copy so you can implement kprobe support for m68k?

Sorry, for the sarcasm, but that argument is just pointless, it doesn't
add any value that you keep repeating it!

Jes

2006-09-18 12:27:40

by Frank Ch. Eigler

[permalink] [raw]

Subject: Re: tracepoint maintainance models

Hi -

mingo wrote:

> [...]
> static int x;
> void func(int a)
> MARK(event, a);
>
> if a dynamic tracer installs a probe into that MARK() spot, it will
> have access to 'a', but it can also have access to 'x'. While a
> static in-source markup for _static tracers_, if it also wanted to
> have the 'x' information, would also have to add 'x' as a parameter:
> [...]

Without heroic measures taken by a static tracer type of tool, this
is correct.

> For dynamic tracers no such 'parameter preparation' instructions
> would need to be generated by gcc. (thus for example the runtime
> overhead would be lower for inactive tracepoints)

Any such additional code would be small, plus if properly marked up
with unlikely() and compiled with -freorder-blocks, it would all be
out-of-line. This small cost could be worth the added benefit of
systemtap being able to probe that point without debugging information
present, and avoiding its slow & deliberate way of accessing
target-side variables like $x. (The slow & deliberate part comes in
from the need to check any pointer dereferences involved.)
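
As a rough illustration of the unlikely() markup mentioned above, here is a
userspace sketch. The likely()/unlikely() macros are modeled on the kernel's
__builtin_expect() wrappers; tracing_enabled, events_logged, log_event and
this particular MARK() are hypothetical names for the example, not an
existing API. Whether the logging call really ends up out of line depends on
the compiler and its options.

```c
#include <stdio.h>

/* Branch hints in the style of the kernel's compiler.h (GCC/Clang). */
#define likely(c)	__builtin_expect(!!(c), 1)
#define unlikely(c)	__builtin_expect(!!(c), 0)

static int tracing_enabled;	/* hypothetical run-time switch */
static int events_logged;	/* demonstration counter only */

/* Hypothetical logging routine. */
static void log_event(const char *name, int value)
{
	events_logged++;
	printf("%s: %d\n", name, value);
}

/* The marker costs one predicted-not-taken test while tracing is off. */
#define MARK(name, value)					\
	do {							\
		if (unlikely(tracing_enabled))			\
			log_event(#name, (value));		\
	} while (0)

static int compute(int a)
{
	int result = a * 2;

	MARK(compute_done, result);
	return result;
}
```

With tracing_enabled left at 0, compute() only pays for the test of the flag;
setting it to 1 makes the same call site log the event.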

- FChE

2006-09-18 15:11:08

[permalink] [raw]

Subject: Re: tracepoint maintainance models

* Frank Ch. Eigler <[email protected]> wrote:

> > For dynamic tracers no such 'parameter preparation' instructions
> > would need to be generated by gcc. (thus for example the runtime
> > overhead would be lower for inactive tracepoints)
>
> Any such additional code would be small, plus if properly marked up
> with unlikely() and compiled with -freorder-blocks, it would all be
> out-of-line. This small cost could be worth the added benefit of
> systemtap being able to probe that point without debugging information
> present, and avoiding its slow & deliberate way of accessing
> target-side variables like $x. (The slow & deliberate part comes in
> from the need to check any pointer dereferences involved.)

yeah, agreed. It seems Mathieu agrees that more synergy between
SystemTap and LTTng is possible and desirable, so i think that's a good
basis to step forward: lets figure out an API for static markups.

The current LTTng static markup APIs have the following form and
distribution:

82 trace_kernel_trap_exit
35 trace_kernel_trap_entry
8 trace_real_syscall_exit
8 trace_real_syscall_entry
7 trace_kernel_arch_syscall_entry
6 trace_kernel_stack_dump
6 trace_kernel_arch_syscall_exit
5 trace_process_kernel_thread
5 trace_ipc_call
3 trace_process_stack_dump
3 trace_kernel_irq_exit
3 trace_kernel_irq_entry
3 trace_fs_write
3 trace_fs_read
2 trace_timer_expired
2 trace_locking_irq_save
2 trace_locking_irq_restore
2 trace_locking_irq_enable
2 trace_locking_irq_disable
2 trace_kernel_tasklet_exit
2 trace_kernel_tasklet_entry
2 trace_fs_seek
2 trace_fs_exec
2 t_log_event
1 trace_timer_softirq
1 trace_timer_set_timer
1 trace_timer_set_itimer
1 trace_statedump_enumerate_modules
1 trace_statedump_enumerate_interrupts
1 trace_socket_sendmsg
1 trace_socket_recvmsg
1 trace_socket_create
1 trace_socket_call
1 trace_real_syscall32_entry
1 trace_process_wakeup
1 trace_process_signal
1 trace_process_schedchange
1 trace_process_kernel_thread__
1 trace_network_packet_out
1 trace_network_packet_in
1 trace_network_ip_interface_dev_up
1 trace_network_ip_interface_dev_down
1 trace_memory_swap_out
1 trace_memory_swap_in
1 trace_memory_page_wait_start
1 trace_memory_page_wait_end
1 trace_memory_page_free
1 trace_memory_page_alloc
1 trace_kernel_soft_irq_exit
1 trace_kernel_soft_irq_entry
1 trace_ipc_shm_create
1 trace_ipc_sem_create
1 trace_ipc_msg_create
1 trace_fs_select
1 trace_fs_poll
1 trace_fs_open
1 trace_fs_ioctl
1 trace_fs_data_write
1 trace_fs_data_read
1 trace_fs_close
1 trace_fs_buf_wait_start
1 trace_fs_buf_wait_end

that's 235 markups (i'm sure the list has a few false positives, but
this is the rough histogram).

Right now the name and type of the event is encoded in the trace
function name, which i dont really like. I think markups are less
intrusive visually in the following form:

MARK(trace_fs_data_read, fd, count, len, buf);

but no strong feelings either way.

also, there should be only a single switch for markups: either all of
them are compiled in or none of them. That simplifies the support
picture and gets rid of some ugly #ifdefs. Distro kernels will likely
enable all of them, so there will be nice uniformity all across.
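
A sketch of what such a MARK() form with one global switch could look like,
written as a userspace program. CONFIG_MARKERS, marker_calls, fake_read and
the body of trace_fs_data_read are assumptions for illustration; nothing here
is an agreed-upon kernel API.

```c
#include <stdio.h>

/* Hypothetical single switch. Remove this line to compile all markers out. */
#define CONFIG_MARKERS 1

#ifdef CONFIG_MARKERS
/* The event name selects the probe function; the rest are its arguments. */
#define MARK(event, ...) event(__VA_ARGS__)
#else
#define MARK(event, ...) do { } while (0)
#endif

static int marker_calls;	/* demonstration counter only */

/* Hypothetical probe body; a real tracer would write to its buffers. */
static void trace_fs_data_read(int fd, size_t count, size_t len,
			       const void *buf)
{
	marker_calls++;
	printf("fs_data_read: fd=%d count=%zu len=%zu buf=%p\n",
	       fd, count, len, buf);
}

/* Stand-in for an instrumented read path. */
static long fake_read(int fd, void *buf, size_t count)
{
	size_t len = count / 2;	/* pretend half the request was served */

	MARK(trace_fs_data_read, fd, count, len, buf);
	return (long)len;
}
```

Because every marker goes through the same macro, one #ifdef in one header
decides whether all of them are present, and no call site needs its own
conditional.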

Ingo

2006-09-18 15:19:46

[permalink] [raw]

Subject: Re: LTTng and SystemTAP (Everyone who is scared to read this huge thread, skip to here)

* Mathieu Desnoyers <[email protected]> wrote:

> > Some of the extensive hooking you do in LTT could be aleviated to a
> > great degree if you used dynamic probes. For example the syscall
> > entry hackery in LTT looks truly scary.
>
> Yes, agreed. The last time I checked, I thought about moving this
> tracing code to the syscall_trace_entry/exit (used for security hooks
> and ptrace if I remember well). I just didn't have the time to do it
> yet.

correct, that's where all such things (auditing, seccomp, ptrace,
sigstop, freezing, etc.) hook into. Much (all?) of the current entry.S
hacks can go away in favor of a much easier .c patch to
do_syscall_trace() and this would reduce a significant portion of the
present intrusiveness of LTTng.

Ingo

2006-09-18 15:26:17

[permalink] [raw]

Subject: Re: tracepoint maintainance models

Ar Llu, 2006-09-18 am 17:02 +0200, ysgrifennodd Ingo Molnar:
> also, there should be only a single switch for markups: either all of
> them are compiled in or none of them. That simplifies the support
> picture and gets rid of some ugly #ifdefs. Distro kernels will likely
> enable all of thems, so there will be nice uniformity all across.

I think your implementation is questionable if it causes any kind of
jumps and conditions, even marked unlikely. Just put the needed data in
a separate section which can be used by the debugging tools. No need to
actually mess with the code for the usual cases.
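
One way to read this suggestion, sketched for userspace under stated
assumptions: GCC or Clang, an ELF target, and a linker such as GNU ld that
provides __start_<name>/__stop_<name> symbols for sections whose names are
valid C identifiers. The section name marker_ptrs, struct marker_info and the
helpers are hypothetical. The marker emits no instructions at the call site;
it only records a descriptor that a tool can look up.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical per-marker descriptor. */
struct marker_info {
	const char *name;
	const char *file;
	int line;
};

/* Provided by the linker for the "marker_ptrs" section (assumption above). */
extern const struct marker_info *__start_marker_ptrs[];
extern const struct marker_info *__stop_marker_ptrs[];

/* Records metadata only; no code is generated where MARK() is used. */
#define MARK(event)							\
	do {								\
		static const struct marker_info mark_info_##event =	\
			{ #event, __FILE__, __LINE__ };			\
		static const struct marker_info *mark_ptr_##event	\
			__attribute__((section("marker_ptrs"), used)) =	\
			&mark_info_##event;				\
	} while (0)

/* Example instrumented functions. */
int do_read(int fd)  { MARK(fs_read);  return fd; }
int do_write(int fd) { MARK(fs_write); return fd; }

/* Number of markers compiled into this program. */
static int count_markers(void)
{
	return (int)(__stop_marker_ptrs - __start_marker_ptrs);
}

/* Look a marker up by name, as a debugging tool might. */
static const struct marker_info *find_marker(const char *name)
{
	const struct marker_info **p;

	for (p = __start_marker_ptrs; p < __stop_marker_ptrs; p++)
		if (strcmp((*p)->name, name) == 0)
			return *p;
	return NULL;
}
```

Storing pointers in the section, rather than the descriptors themselves,
sidesteps any padding the compiler might add between larger objects. As the
follow-ups in this thread point out, location metadata alone does not
guarantee that the values of interest are still live at that point.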

Alan

2006-09-18 15:31:28

[permalink] [raw]

Subject: Re: tracepoint maintainance models

* Alan Cox <[email protected]> wrote:

> > also, there should be only a single switch for markups: either all
> > of them are compiled in or none of them. That simplifies the support
> > picture and gets rid of some ugly #ifdefs. Distro kernels will
> > likely enable all of thems, so there will be nice uniformity all
> > across.
>
> I think your implementation is questionable if it causes any kind of
> jumps and conditions, even marked unlikely. Just put the needed data
> in a seperate section which can be used by the debugging tools. No
> need to actually mess with the code for the usual cases.

yeah - but i think to make it easier for SystemTap to insert a
low-overhead probe there needs to be a 5-byte NOP inserted. There wont
be any function call or condition at that place. At most there will be
some minimal impact on the way gcc compiles the code in that function,
to make sure that the data is not optimized out and is available to
SystemTap. For example at the point of the probe gcc might already have
destroyed a register-passed function parameter. (but normally there
should not be any effect, because it's pointless to trace data that gcc
optimizes out.)

Ingo

2006-09-18 15:49:09

by Frank Ch. Eigler

[permalink] [raw]

Subject: Re: tracepoint maintainance models

Hi -

alan wrote:

> I think your implementation is questionable if it causes any kind of
> jumps and conditions, even marked unlikely. Just put the needed data in
> a seperate section which can be used by the debugging tools. [...]
> No need to actually mess with the code for the usual cases.

Trouble is that it is specifically the *unusual* cases that need
compiler assistance via static markers, otherwise we'd manage with
just k/djprobes & debuginfo type efforts.

- FChE

2006-09-18 15:51:36

[permalink] [raw]

Subject: Re: tracepoint maintainance models

* Frank Ch. Eigler <[email protected]> wrote:

> > I think your implementation is questionable if it causes any kind of
> > jumps and conditions, even marked unlikely. Just put the needed data
> > in a seperate section which can be used by the debugging tools.
> > [...] No need to actually mess with the code for the usual cases.
>
> Trouble is that it is specifically the *unusual* cases that need
> compiler assistance via static markers, otherwise we'd manage with
> just k/djprobes & debuginfo type efforts.

i think it's all fine as long as it's just a single 5-byte NOP that we
are inserting - because in the *usual* case the 'parameter access
side-effects' should have no effect. They will have an effect in the
*unusual* case though, but that's very much by design - and it's not a
performance issue because it's 1) unusual, 2) at most means a bit
different code organization by gcc. It very likely wont mean any extra
branches even in the unusual case. Or do i underestimate the scope of
the problem? ;-)

Ingo

2006-09-18 15:50:48

by Mathieu Desnoyers

[permalink] [raw]

Subject: Re: tracepoint maintainance models

Hi Ingo,

If it is less visually intrusive to declare markers as a macro, let's do it this
way. I have no preference, as long as both dynamic probes and static ones can
be hooked and it does not imply so much black magic that kernel developers
won't know what code will be called from the marker when tracing is enabled.

Back in 2005, I made a quick macro example that would benefit everybody.
Depending on config options, it expands to either:

- nothing
- a call to printk
- a call to a static tracer (either inline function or a real call)
- no operations (a 5 bytes site for an enhanced kprobe)

The 5 bytes of NOP used here are absolutely not the way to go: the djprobes
guys have much better alternatives.

Note that this example is a userspace program that can be trivially moved to a
kernel header (printf->printk, etc).

I also wanted to identify the trace point by a symbol, so it could be easily
found dynamically, but this part is not completed.

What are your thoughts about it ? (think of it as a proof of concept, and
search+replace MAGIC_TRACE for MARK). :)

Mathieu

----- BEGIN -----

/* ltt-macro.c
*
* Macro example for instrumentation
*
* Version 0.0.1
*
* Mathieu Desnoyers [email protected]
*
* This is released under the GPL v2 (or better) license.
*/

#include <stdio.h> /* printf(); becomes printk() once moved to the kernel */

/* This is an example of noop, get this from the current arch header */
#define GENERIC_NOP1 ".byte 0x90\n"

/* PUT THIS IN A INCLUDE/LINUX HEADER */

#define __stringify_1(x) #x //see include/linux/stringify.h
#define __stringify(x) __stringify_1(x)

#define KBUILD_BASENAME basename
#define KBUILD_MODNAME modulename

#define MAGIC_TRACE_SYM(event) \
char * __trace_symbol_##event =__stringify(KBUILD_MODNAME) "_" \
__stringify(KBUILD_BASENAME) "_" \
#event ;

/* With config menu mutual exclusion of choice */
#ifdef CONFIG_NOLOG
#define MAGIC_TRACE(event, format, args...)
#endif

#ifdef CONFIG_PRINTLOG
#define MAGIC_TRACE(event, format, args...) \
printf(format, ##args);
#endif

#ifdef CONFIG_TRACELOG
#define MAGIC_TRACE(event, format, args...) \
trace_##event( args );
#endif

#ifdef CONFIG_KPROBELOG
#define MAGIC_TRACE(event, format, args...) \
__asm__ ( GENERIC_NOP1 GENERIC_NOP1 GENERIC_NOP1 GENERIC_NOP1 GENERIC_NOP1 )
#endif

/* PUT THIS IN A HEADER NEAR THE .C FILE */
#ifdef CONFIG_TRACELOG
static inline void trace_eventname(int a, char *b)
{
/* log.... */
printf("Tracing event : first arg %d, second arg %s", a, b);
}
#endif

/* PUT THIS IN THE .C FILE */

MAGIC_TRACE_SYM(eventname);

int main()
{
int myint = 55;
char * mystring = "blah";

MAGIC_TRACE(eventname, "%d %s", myint, mystring);

printf("\n");

return 0;
}

------ END -----

OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2006-09-18 15:54:12

by Jose R. Santos

[permalink] [raw]

Subject: Re: The emperor is naked: why comprehensive static markup belongs in mainline

Karim Yaghmour wrote:
> Why, in fact, that's exactly Jose's point of view. Who's
> Jose? Well, just in case you weren't aware of his work,
> Jose maintains LKET. What's LKET? An ltt-equivalent
>

Small correction. Li GuangLei maintains LKET, I mostly oversee its
development and provide guidance to him and his team (and on occasion,
I like to cause trouble in mailing lists).

> that uses SystemTap to get its events. And what does
> Jose say? Well I couldn't say it better than him:
>
> > I agree with you here, I think is silly to claim dynamic instrumentation
> > as a fix for the "constant maintainace overhead" of static trace point.
> > Working on LKET, one of the biggest burdens that we've had is mantainig
> > the probe points when something in the kernel changes enough to cause a
> > breakage of the dynamic instrumentation. The solution to this is having
> > the SystemTap tapsets maintained by the subsystems maintainers so that
> > changes in the code can be applied to the dynamic instrumentation as
> > well. This of course means that the subsystem maintainer would need to
> > maintain two pieces of code instead of one. There are a lot of
> > advantages to dynamic vs static instrumentation, but I don't think
> > maintainace overhead is one of them.
>
> Well, well, well. Here's a guy doing *exactly* what I was
> asked to do a couple of years back. And what does he say?
> "I think is silly to claim dynamic instrumentation as a
> fix for the "constant maintainace overhead" of static trace
> point."
>

My point here was that someone still needs to maintain the tracepoints
regardless of where they are located. While I think that the challenges
of maintaining the tracepoints in kernel are less than maintaining them
out of kernel (either through dynamic or static tracepoints), the
maintenance overhead is still not zero for the subsystem maintainers.
> And just in case you missed it the first time in his
> paragraph, he repeats it *again* at the end:
> " There are a lot of advantages to dynamic vs static
> instrumentation, but I don't think maintainace overhead is
> one of them."
>

One thing I would like to add though, is that based on my experience
using event tracing tools, I say that the benefits of dynamic
instrumentation far outweigh its drawbacks.

> But not content with Jose and Frank's first-hand experience
> and testimonials about the cost of outside maintenance of
> dynamically-inserted tracepoint, and obviously outright
> dismissing the feedback from such heretics as Roman, Martin,
> Mathieu, Tim, Karim and others, we have a continued barrage of
> criticism from, shall we say, very orthodox kernel developers
> who insist that the collective experience of the previously
> mentioned people is simply misguided and that, as experienced
> kernel developers, *they* know better.
>

I think the problem here is that we haven't done a good job in educating
developers as to the value that event tracing the kernel has for
developers as well as sysadmins. For example, Frank has said to me in
the past that he does not see the value in just printing raw data out to
user-space the way LKET does. While he and the SystemTap folks have
not done anything specifically to block the inclusion of LKET into the
CVS tree, Frank's lack of vision of what I want to achieve with this tool
is partly a failure on my part.

> Why the emperor is naked:
> -------------------------
>
> Truth be told:
>
> There is no justification why Mathieu should continue
> chasing kernels to allow his users utilize ltt on as
> many kernel versions as possible.
>
> There is no justification why the SystemTap team should
> continue chasing kernels to make sure users can use
> SystemTap on as many kernel versions as possible.
>
> There is no justification why Jose should continue
> chasing kernels to allow his users to use LKET on as
> many kernel versions as possible.
>
> There is, in fact, no justification why Jose, Frank,
> and Mathieu aren't working on the same project.
>

In all honesty, I think it is time to kill LTTng, LKET and LKST and use
the experience gathered from these projects to create a new tool that
exploits all of the advantages of the previous tools. The attitude I
gathered from the OLS tracing bof was that while there was interest in
making tool A work with tool B, there was absolutely no interest in
saying "fuck tools A and B and let's create tool C". I've always
advocated towards this goal. I will be the first one to say "fuck my
tool, let's work on tool C". It is now up to Mathieu and Hiramatsu-san
to do the same. In my view, egos rather than technical issues are what
is slowing the adoption of an event tracer in Linux.

> There is no justification to any of this but the continued
> *FEAR* by kernel developers that somehow their maintenance
> workload is going to become unmanageable should anybody
> get his way of adding static instrumentation into the
> kernel. And no matter what personal *and* financial cost
> this fear has had on various development teams, actual
> *experience* from even those who have applied the most
> outrageous of kernel developers requirements is but
> grudgingly and conditionally recognized. No value, of
> course, being placed on the experience of those that
> *didn't* follow the orthodox diktat -- say by pointing
> out that ltt tracepoints did not vary on a 5 year timespan.
>

The fact that tracepoints did not vary in a 5 year timespan just proves
that the users of LTTng are very few. The truth is that there is no way
to have a trace tool that will have all the tracepoints needed to
diagnose every problem. If a static instrumentation mechanism were to
be included into the kernel, every user that had a useful static
tracepoint for their environment would want to push it into the kernel
in order to have their tracepoint available in distribution X and avoid
having to patch and recompile a kernel. This seems like the fear that
has been discussed on the thread and I think it's well justified. I know
I would like to push my tracepoints if the tool was available in
mainline kernels.

One of the things that we tried to do with LKET was not to predict what the
user would use the tool for. For this reason, the trace format and the
conversion tool were designed to be very dynamic.

> For the argument, as it is at this stage of the long
> intertwined thread of this week, is that "dynamic tracing"
> is superior to "static tracing" because, amongst other
> things, "static tracing" requires more instrumentation
> than "dynamic tracing". But that, as I said within said
> thread, is a fallacy. The statement that "static tracing"
> requires more instrumentation than "dynamic tracing" is
> only true in as far as you ignore that there is a cost
> for out-of-tree maintenance of scripts for use by probe
> mechanisms. And as you've read earlier, those doing this
> stuff tell us there *is* cost to this. Not only do they
> say that, but they go as far as telling us that this
> cost is *no different* than that involved in maintaining
> static trace points. That, in itself, flies in the face
> of all accepted orthodox principles on the topic of
> mainlined static tracing.
>

Out-of-tree maintenance of scripts is something that needs to
improve, especially when you need to insert probes in the middle of a
function.

> And that is but the maintenance aspect, I won't even
> start on the performance issue. Because the current party
> line is that while the kprobes mechanism is slow: a) it's
> fast enough for all applicable uses, b) there's this
> great new mechanism we're working on called djprobes which
> eliminates all of kprobes' performance limitations. Of
> course you are asked to pay no attention to the man behind
> the curtain: a) if there is justification to work on
> djprobes, it's because kprobes is dog-slow, which even
> those using it for systemtap readily acknowledge, b)
> djprobes has been more or less "on its way" for a year or
> two now, and that's for one single architecture.
>

I think that the performance issues should be better understood. Right
now, what causes most of the slowdowns in LKET is not kprobes
but rather exporting the data. Gui Jian has done some measurements
using benchmarks that do real work and found that overhead in most
cases is significantly less than 10%.

A better performance testing methodology needs to be defined in order to
justify your argument that kprobes are not suitable for the purposes
of event tracing. Something more elaborate than a simple "ping -f
localhost" would be useful.

Another thing that needs to be considered is how much overhead is
acceptable in order for the tool to be useful. I will argue that in
most cases, the overhead of kprobes will not inhibit the ability to
find problems.

-JRS

2006-09-18 15:56:10

Subject: Re: tracepoint maintainance models

Ar Llu, 2006-09-18 am 17:22 +0200, ysgrifennodd Ingo Molnar:
> yeah - but i think to make it easier for SystemTap to insert a
> low-overhead probe there needs to be a 5-byte NOP inserted. There wont
> be any function call or condition at that place. At most there will be
> some minimal impact on the way gcc compiles the code in that function,

And more L1 misses. It seems that this problem should be solved by
jprobes and your int3 optimisation work.

> SystemTap. For example at the point of the probe gcc might already have
> destroyed a register-passed function parameter.

So it's L1 misses, more register reloads and the like. Sounds more and
more like wasted clock cycles for debug. Most of these watchpoints will
run billions of times a day on millions of machines, none of which are
using any debugging. You are optimising the corner case (in the extreme
in fact). It's one thing to dump trace helper data into the kernel, it's
another when we all get to pay for it all the time when we don't need to
(or we compile it out at which point it offers nothing anyway).

Alan

2006-09-18 16:18:17

by Frank Ch. Eigler

Subject: Re: tracepoint maintainance models

Hi -

alan wrote:

> [...] So its L1 misses more register reloads and the like. Sounds
> more and more like wasted clock cycles for debug. [...]

But it's not just "for debug"! It is for system administrators,
end-users, developers.

> Its one thing to dump trace helper data into the kernel, its another
> when we all get to pay for it all the time when we don't need to
> [...]

Indeed, there will be some non-zero execution-time cost. We must be
willing to pay *something* in order to enable this functionality. One
question (still: http://lkml.org/lkml/2006/2/22/166) is trading
time/space cost; others include cross-platform vs. porting necessity;
robustness w.r.t. data-collection and control-flow preservation.

- FChE

2006-09-18 16:23:53

Subject: Re: tracepoint maintainance models

* Alan Cox <[email protected]> wrote:

> Ar Llu, 2006-09-18 am 17:22 +0200, ysgrifennodd Ingo Molnar:
> > yeah - but i think to make it easier for SystemTap to insert a
> > low-overhead probe there needs to be a 5-byte NOP inserted. There wont
> > be any function call or condition at that place. At most there will be
> > some minimal impact on the way gcc compiles the code in that function,
>
> And more L1 misses. It seems that this problem should be solved by
> jprobes and your int3 optimisation work.

Do you consider a single 5-byte NOP for a judiciously chosen 50 places
in the kernel unacceptable? Note that the argument has shifted from
static tracers to dynamic tracers: this _is_ about SystemTap: it adds
points to the kernel where we can _guarantee_ that a dynamic probe can
be inserted. In general there is no guarantee from gcc that any probe
can be inserted into a function (djprobes and int3 optimization
notwithstanding) and this is a real practical problem for SystemTap.
Frank can attest to that.

Ingo

2006-09-18 16:30:46

by Mathieu Desnoyers

Subject: MARKER mechanism, try 2

Hi Ingo,

I played a bit with my marker proof of concept; it now makes a lot more sense.
Here it is. Comments are welcome.

It supports 5 modes :

- marker becomes nothing
- marker calls printk
- marker calls a tracer
- marker puts a symbol (for kprobe)
- marker puts a symbol and 5 NOPS for a jump probe.

Mathieu

-----BEGIN-----

/* Macro example for instrumentation
 *
 * Version 0.0.2
 *
 * Mathieu Desnoyers [email protected]
 *
 * This is released under the GPL v2 (or better) license.
 */

#include <stdio.h>

/* This is an example of noop, get this from the current arch header */
#define GENERIC_NOP1 ".byte 0x90\n"

/* PUT THIS IN A INCLUDE/LINUX HEADER */

#define __stringify_1(x) #x //see include/linux/stringify.h
#define __stringify(x) __stringify_1(x)

#define KBUILD_BASENAME basename
#define KBUILD_MODNAME modulename

#define MARK_SYM(event) \
__asm__ ( "__mark_"__stringify(KBUILD_MODNAME)"_"__stringify(KBUILD_BASENAME)"_"#event":" )

/* With config menu mutual exclusion of choice.
 * Each variant is wrapped in do { } while (0) so that MARK(...); always
 * behaves as a single statement, even in an unbraced if/else.
 */
#ifdef CONFIG_NOLOG
#define MARK(event, format, args...) do { } while (0)
#endif

#ifdef CONFIG_PRINTLOG
#define MARK(event, format, args...) \
        do { printf(format, ##args); } while (0)
#endif

#ifdef CONFIG_TRACELOG
#define MARK(event, format, args...) \
        do { trace_##event( args ); } while (0)
#endif

#ifdef CONFIG_KPROBELOG
#define MARK(event, format, args...) \
        do { MARK_SYM(event); } while (0)
#endif

#ifdef CONFIG_JUMPPROBELOG
#define MARK(event, format, args...) \
        do { \
                MARK_SYM(event); \
                __asm__ ( GENERIC_NOP1 GENERIC_NOP1 GENERIC_NOP1 GENERIC_NOP1 GENERIC_NOP1 ); \
        } while (0)
#endif

/* PUT THIS IN A HEADER NEAR THE .C FILE */
#ifdef CONFIG_TRACELOG
static inline void trace_eventname(int a, char *b)
{
        /* log.... */
        printf("Tracing event : first arg %d, second arg %s", a, b);
}
#endif

/* PUT THIS IN THE .C FILE */

int main()
{
        int myint = 55;
        char *mystring = "blah";

        MARK(eventname, "%d %s", myint, mystring);

        printf("\n");

        return 0;
}

-----END-----

OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2006-09-18 16:37:15

Subject: Re: MARKER mechanism, try 2

* Mathieu Desnoyers <[email protected]> wrote:

> It supports 5 modes :
>
> - marker becomes nothing
> - marker calls printk
> - marker calls a tracer
> - marker puts a symbol (for kprobe)
> - marker puts a symbol and 5 NOPS for a jump probe.

just go for 'nothing' and the 5-NOP variant, and please implement
support for it from within LTT, via a kprobe - if you want me to support
this stuff for upstream inclusion. If we support any static tracer mode
and LTT does not support the kprobe mode then we are back to square 1
wrt. dependencies ...

Ingo

2006-09-18 16:39:46

Subject: Re: tracepoint maintainance models

Ar Llu, 2006-09-18 am 12:15 -0400, ysgrifennodd Frank Ch. Eigler:
> > [...] So its L1 misses more register reloads and the like. Sounds
> > more and more like wasted clock cycles for debug. [...]
>
> But it's not just "for debug"! It is for system administrators,
> end-users, developers.

It is for debug. System administrators and developers also do debug,
they may just use different tools. The percentage of schedule() calls
executed across every Linux box on the planet where debug is enabled is
so close to nil it's noise. Even with traces that won't change.

> Indeed, there will be some non-zero execution-time cost. We must be
> willing to pay *something* in order to enable this functionality.

There is an implementation which requires that no penalty be paid. Create a
new elf section which contains something like

[address to whack with int3]
[or info for jprobes to make better use]
[name for debug tools to find]
[line number in source to parse the gcc debug data]
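
A minimal userspace sketch of the out-of-line table idea above, assuming
a GNU toolchain on ELF (gcc or clang with GNU ld or a compatible linker).
It is not kernel code and not anyone's actual proposal: the struct
layout, the MARKER_INFO macro, the "markers" section name and the helper
functions are all hypothetical names chosen for illustration. Each marker
contributes one pointer to a dedicated section, so the instrumented code
path itself gains no instructions; a tool would walk the section to find
marker names and source lines.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record: what a debug tool would look up for one marker. */
struct marker_info {
        const char *name;   /* name for debug tools to find */
        const char *file;   /* source file containing the marker */
        int line;           /* source line, to match against debug data */
};

/*
 * Emit one record and store a pointer to it in the "markers" section.
 * Storing pointers (not the structs themselves) keeps the section a
 * densely packed array regardless of how the compiler aligns structs.
 */
#define MARKER_INFO(ident)                                              \
        static const struct marker_info marker_##ident =                \
                { #ident, __FILE__, __LINE__ };                         \
        static const struct marker_info *marker_ptr_##ident             \
                __attribute__((used, section("markers"))) = &marker_##ident

/* The linker defines these bounds for sections named like C identifiers. */
extern const struct marker_info *__start_markers[];
extern const struct marker_info *__stop_markers[];

MARKER_INFO(hook_sched);
MARKER_INFO(hook_page_fault);

/* Number of markers recorded in the section. */
size_t marker_count(void)
{
        return (size_t)(__stop_markers - __start_markers);
}

/* Look a marker up by name; returns NULL when it is not present. */
const struct marker_info *marker_find(const char *name)
{
        const struct marker_info **p;

        for (p = __start_markers; p < __stop_markers; p++) {
                assert(*p != NULL);
                if (strcmp((*p)->name, name) == 0)
                        return *p;
        }
        return NULL;
}
```

With this layout, marker_find("hook_sched") returns the record for that
marker. The sketch covers only the name and line-number fields of the
list; the probe address and the location of each argument are the parts
that depend on what the compiler can be made to emit.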

2006-09-18 16:40:42

Subject: Re: tracepoint maintainance models

Ar Llu, 2006-09-18 am 18:15 +0200, ysgrifennodd Ingo Molnar:
> Do you consider a single 5-byte NOP for a judiciously chosen 50 places
> in the kernel unacceptable? Note that the argument has shifted from

It's not necessary. The question about acceptability doesn't come up.

> static tracers to dynamic tracers: this _is_ about SystemTap: it adds
> points to the kernel where we can _guarantee_ that a dynamic probe can
> be inserted.

That already exists. You don't always know the address of the point.
Knowing where to stick the probe is out of line, shoving nops in the
code is an ugly unnecessary hack.

You can't really have it both ways - you argued that the performance
improvement for LTT static traces wasn't justification and pointed out
jprobes then optimised int3. Now if you want to do markup for awkward
tracepoints for kprobe use then the same rules seem to apply - jprobes
and the int3 optimisation mean you don't need to go shoving nops in code
paths that are used all the time.

Alan

2006-09-18 17:18:27

by Karim Yaghmour

Subject: Re: The emperor is naked: why comprehensive static markup belongs in mainline

Just one factual correction, the rest of your post I don't wish
to contest. In fact, your support for a unified tool is exactly
where I think things should go.

Jose R. Santos wrote:
> The fact that tracepoint did not vary in a 5 year timespan just proves
> that the users of LTTng are very few.

A rapid lookup will demonstrate that the *old* ltt, for which the
5 year mark was presented, was actually shipped by many distributions,
especially embedded ones.

Thanks,

Karim

2006-09-18 17:27:48

by Frank Ch. Eigler

Subject: Re: tracepoint maintainance models

Hi -

alan wrote:
> > > [...] So its L1 misses more register reloads and the like. Sounds
> > > more and more like wasted clock cycles for debug. [...]
> >
> > But it's not just "for debug"! It is for system administrators,
> > end-users, developers.
>
> It is for debug. System administrators and developers also do debug,
> they may just use different tools.

Then you're using the term so broadly as to lose specific meaning.

> The percentage of schedule() calls executed across every Linux box
> on the planet where debug is enabled is so close to nil it's
> noise. [...]

Unless one's worried about planetary-scale energy use, I see no point
in multiplying overheads by "every box on the planet".

> > Indeed, there will be some non-zero execution-time cost. We must be
> > willing to pay *something* in order to enable this functionality.
>
> There is an implementation which requires no penalty is paid. Create a
> new elf section which contains something like [...]

Unfortunately, cases in which this sort of out-of-band markup would be
sufficient are pretty much those exact same cases where it is not
necessary. Remember, the complex cases occur when the compiler munges
up control flow and data accessibility, so debuginfo cannot or does
not correctly place the probes and their data gathering compatriots.

- FChE

2006-09-18 17:44:09

Subject: Re: tracepoint maintainance models

Ar Llu, 2006-09-18 am 13:27 -0400, ysgrifennodd Frank Ch. Eigler:
> Unless one's worried about planetary-scale energy use, I see no point
> in multiplying overheads by "every box on the planet".

Because we are all paying for your debug stuff we aren't using. Systems
get slow and sucky by the death of a million cuts not by one stupid
action.

> Unfortunately, cases in which this sort of out-of-band markup would be
> sufficient are pretty much those exact same cases where it is not
> necessary. Remember, the complex cases occur when the compiler munges
> up control flow and data accessability, so debuginfo cannot or does
> not correctly place the probes and their data gathering compatriots.

Which, if I understand you right, you'd end up unmunging, reducing
performance by reducing the options gcc has to make that critical
code go fast, just so you know what register something is living in.

Alan

2006-09-18 17:47:47

by Mathieu Desnoyers

Subject: Re: MARKER mechanism, try 2

* Ingo Molnar ([email protected]) wrote:
>
> * Mathieu Desnoyers <[email protected]> wrote:
>
> > It supports 5 modes :
> >
> > - marker becomes nothing
> > - marker calls printk
> > - marker calls a tracer
> > - marker puts a symbol (for kprobe)
> > - marker puts a symbol and 5 NOPS for a jump probe.
>
> just go for 'nothing' and the 5-NOP variant, and please implement
> support for it from within LTT, via a kprobe - if you want me to support
> this stuff for upstream inclusion. If we support any static tracer mode
> and LTT does not support the kprobe mode then we are back to square 1
> wrt. dependencies ...
>

I am open to making LTTng support kprobes as a commodity (in fact, this point has
been on the LTTng project roadmap for almost a year). But in no way does it
solve the entire tracing problem. As an example, LTTng traces the page fault
handler, whereas kprobes just can't instrument it.

I keep thinking that a complete marker mechanism must have the ability to be
turned into function calls or inline functions when necessary.

Going further, we could think of a marker mechanism that would be aware of the
"difficulty" level of the probe, so that even if CONFIG_KPROBELOG is selected,
it would use a direct call or inlined function for probing the page fault
handler.

i.e. :

"normal" (nothing, kprobe, jumpprobe, printk or tracer)
MARK(eventname, "%d %s", myint, mystring);

"cannot be probed dynamically" (used in kprobes itself, page fault handler)
(only nothing or tracer)
MARK_NOPROBE(eventname, "%d %s", myint, mystring);

"cannot use printk" (used in scheduler, NMIs, wakeup, printk itself)
(nothing, kprobe, jumpprobe or tracer)
MARK_NOPRINT(eventname, "%d %s", myint, mystring);

Using the following table to select the mechanism :

Config/probe declaration | normal    | noprobe | noprint
---------------------------------------------------------
nothing                  | nothing   | nothing | nothing
kprobe                   | kprobe    | tracer  | kprobe
jumpprobe                | jumpprobe | tracer  | jumpprobe
printk                   | printk    | tracer  | kprobe
tracer                   | tracer    | tracer  | tracer

Therefore, selecting the "kprobe" configuration option would still let people
instrument the hardest paths while having mostly dynamic probes.
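
The selection table above can be sketched as a plain lookup, shown here
as ordinary C evaluated at run time purely for illustration. In the
scheme described, the choice would presumably be made by the preprocessor
at build time. The enum and function names are hypothetical; only the
table contents come from the message.

```c
#include <assert.h>

/* Build-time configuration choice (rows of the table). */
enum mark_config {
        CFG_NOTHING, CFG_KPROBE, CFG_JUMPPROBE, CFG_PRINTK, CFG_TRACER,
        CFG_COUNT
};

/* How the marker was declared (columns): MARK, MARK_NOPROBE, MARK_NOPRINT. */
enum mark_kind { KIND_NORMAL, KIND_NOPROBE, KIND_NOPRINT, KIND_COUNT };

/* Mechanism the marker finally expands to (table cells). */
enum mark_mech {
        MECH_NOTHING, MECH_KPROBE, MECH_JUMPPROBE, MECH_PRINTK, MECH_TRACER
};

static const enum mark_mech mech_table[CFG_COUNT][KIND_COUNT] = {
        /*                   normal          noprobe       noprint        */
        [CFG_NOTHING]   = { MECH_NOTHING,   MECH_NOTHING, MECH_NOTHING   },
        [CFG_KPROBE]    = { MECH_KPROBE,    MECH_TRACER,  MECH_KPROBE    },
        [CFG_JUMPPROBE] = { MECH_JUMPPROBE, MECH_TRACER,  MECH_JUMPPROBE },
        [CFG_PRINTK]    = { MECH_PRINTK,    MECH_TRACER,  MECH_KPROBE    },
        [CFG_TRACER]    = { MECH_TRACER,    MECH_TRACER,  MECH_TRACER    },
};

/* Return the mechanism used for a marker of the given kind. */
enum mark_mech select_mechanism(enum mark_config cfg, enum mark_kind kind)
{
        assert(cfg < CFG_COUNT && kind < KIND_COUNT);
        return mech_table[cfg][kind];
}
```

For example, select_mechanism(CFG_KPROBE, KIND_NOPROBE) yields
MECH_TRACER, which is the page fault handler case: a site that cannot be
probed dynamically falls back to a direct tracer call.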

Mathieu

OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2006-09-18 17:54:37

by Martin Bligh

Subject: Re: tracepoint maintainance models

Alan Cox wrote:
> Ar Llu, 2006-09-18 am 13:27 -0400, ysgrifennodd Frank Ch. Eigler:
>
>>Unless one's worried about planetary-scale energy use, I see no point
>>in multiplying overheads by "every box on the planet".
>
>
> Because we are all paying for your debug stuff we aren't using. Systems
> get slow and sucky by the death of a million cuts not by one stupid
> action.

Bear in mind that it could be CONFIG'ed out, so you can still do as you
choose. But for many people, the ability to get insight into their
application's interaction with the kernel and get several % performance
improvement by understanding their environment will outweigh the 0.01%
overhead of a few nops.

IME, most performance problems are not little tiny instruction-cycle
level things, they're huge sucking wounds that people just don't know
how to fix, or don't even know exist (such as "oops, I single-threaded
all my IO from my app").

M.

2006-09-18 18:06:05

by Frank Ch. Eigler

Subject: Re: tracepoint maintainance models

Hi -

alan wrote:

> > Unless one's worried about planetary-scale energy use, I see no point
> > in multiplying overheads by "every box on the planet".
>
> Because we are all paying for your debug stuff we aren't
> using. Systems get slow and sucky by the death of a million cuts not
> by one stupid action.

"slow and sucky" happens one machine at a time. One doesn't perceive
time that is "lost" by a random machine sitting in a hut somewhere
running a bit slower.

> > Unfortunately, cases in which this sort of out-of-band markup would be
> > sufficient are pretty much those exact same cases where it is not
> > necessary. Remember, the complex cases occur when the compiler munges
> > up control flow and data accessability, so debuginfo cannot or does
> > not correctly place the probes and their data gathering compatriots.
>
> Which if understand you right you'd end up unmunging and reducing
> performance for by reducing the options gcc has to make that critical
> code go fast just so you know what register something is living in.

Something like that, but not as drastic. The effect of a marker would
be to force the compiler to preserve a statement boundary and/or
preserve or recreate the values when the marker is active. It may
interfere with the otherwise optimized code somewhat, but the amount
depends on the details. For the most time-critical probes, we could
opt for the least powerful/disruptive markers.
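
One way to picture the effect described above, as a sketch only and not
the marker implementation under discussion: with GCC-style inline
assembly, an empty asm statement that takes a value as an input operand
forces the compiler to have that value available at that statement
without emitting any instruction for the asm itself. MARK_KEEP is a
hypothetical name.

```c
#include <assert.h>

/*
 * Empty asm with one input operand. The "g" constraint lets the compiler
 * keep the value in a register, in memory, or as a constant, so the marker
 * constrains the surrounding code as little as possible. The "memory"
 * clobber also stops memory accesses from being moved across this point.
 */
#define MARK_KEEP(value) \
        __asm__ __volatile__("" : : "g"(value) : "memory")

/* Sum 1..n, with a marker that needs the running total on each iteration. */
long sum_to(long n)
{
        long total = 0;
        long i;

        assert(n >= 0);
        for (i = 1; i <= n; i++) {
                total += i;
                MARK_KEEP(total);   /* total must be materialized here */
        }
        return total;
}
```

Without the marker a compiler is free to restructure this loop, for
example by vectorizing it or replacing it with a closed-form expression.
With it, the running total has to exist on every iteration, which is the
kind of interference with otherwise optimized code mentioned above.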

- FChE

2006-09-18 19:11:04

Subject: Re: tracepoint maintainance models

Alan Cox wrote:

>Ar Llu, 2006-09-18 am 12:15 -0400, ysgrifennodd Frank Ch. Eigler:
>
>
>>>[...] So its L1 misses more register reloads and the like. Sounds
>>>more and more like wasted clock cycles for debug. [...]
>>>
>>>
>>But it's not just "for debug"! It is for system administrators,
>>end-users, developers.
>>
>>
>
>It is for debug. System administrators and developers also do debug,
>they may just use different tools. The percentage of schedule() calls
>executed across every Linux box on the planet where debug is enabled is
>so close to nil its noise. Even with traces that won't change.
>
>

Precisely the reason this huge thread is arguing why we shouldn't be
including only a static marker mechanism in the kernel tree. We are using
a dynamic probe mechanism, which for the most part doesn't alter the
execution flow or prevent the compiler from making good optimizations, but
there are a few code paths, critical to understanding, where we are not
able to use this dynamic method and for which we need static markers. As
Martin pointed out, if one is critical about performance they can be
compiled out.

It is also important to note that the amount of $s lost by taking a long
time to find a solution to a problem, due to a lack of good debugging
tools, is also significant compared to the few additional clock cycles
machines spend due to these static markers.

>
>
>>Indeed, there will be some non-zero execution-time cost. We must be
>>willing to pay *something* in order to enable this functionality.
>>
>>
>
>There is an implementation which requires no penalty is paid. Create a
>new elf section which contains something like
>
> [address to whack with int3]
> [or info for jprobes to make better use]
> [name for debug tools to find]
> [line number in source to parse the gcc debug data]
>
>
I am not sure I quite understand the line number part of your proposal.
Does this proposal assume we have access to source code while generating
dynamic probes?

This still doesn't solve the problem of the compiler optimizing such that a
variable I would like to read in my probe is not available at the
probe point.

2006-09-18 19:17:47

Subject: Re: MARKER mechanism, try 2

Ar Llu, 2006-09-18 am 13:47 -0400, ysgrifennodd Mathieu Desnoyers:
> "cannot use printk" (used in scheduler, NMIs, wakeup, printk itself)
> (nothing, kprobe, jumpprobe or tracer)

Also sometimes in character drivers - if it's the console device and you
printk in the driver code you go boom.

2006-09-18 19:26:53

Subject: Re: tracepoint maintainance models

Ar Llu, 2006-09-18 am 12:10 -0700, ysgrifennodd Vara Prasad:
> I am not sure i quiet understand your line number part of the proposal.
> Does this proposal assume we have access to source code while generating
> dynamic probes?

It's one route - or we dump it into an ELF section in the binary.

> This still doesn't solve the problem of compiler optimizing such that a
> variable i would like to read in my probe not being available at the
> probe point.

Then what we really need by the sound of it is enough gcc smarts to do
something of the form

.section "debugbits"

.asciiz 'hook_sched'
.dword l1 # Address to probe
.word 1 # Argument count
.dword gcc_magic_whatregister("next"); [ reg num or memory ]
.dword gcc_magic_whataddress("next"); [ address if exists]

Can gcc do any of that for us today ?

2006-09-18 19:45:49

by Frank Ch. Eigler

Subject: Re: tracepoint maintainance models

Hi -

On Mon, Sep 18, 2006 at 08:49:40PM +0100, Alan Cox wrote:
> [...]
> Then what we really need by the sound of it is enough gcc smarts to do
> something of the form [...]
> .section "debugbits"
> [...]
> Can gcc do any of that for us today ?

This is not that different from what gcc does for DWARF. Trouble is,
there appear to exist optimization transformations which make such
data difficult or impossible to generate. (In particular, it is
unlikely to be easier to create specialized data like this if the
compiler can't be made to create first-class DWARF for the same probe
points / data values.)

- FChE

2006-09-18 20:13:28

by Michel Dagenais

Subject: Re: tracepoint maintainance models

> ... I don't understand why LTT and SystemTap can't just
> merge and play nice together....

For simple userland tasks, GDB might be all one needs. In other cases,
strace is less intrusive. Yet, on many occasions, the problem is not
reproducible under strace because the timing is changed and a better
mechanism is needed.

Similarly, most sysadmins will be delighted to dynamically activate a
few tracepoints on their live system and catch their problem. In
difficult cases (e.g. distributed application in a large cluster,
embedded systems, nasty problem in the interrupt routine of a device
driver) you need the tool with the lowest disturbance and you will be
ready to recompile and reboot if necessary. If kprobes can achieve this
lowest disturbance (i.e. superior to a static tracepoint in almost all
cases) life will be simpler for all of us.

It does not appear to be the case, however. There are a number of
contexts where kprobes cannot be set (e.g. NMI, m68k :-) and, despite
not having the same reentrancy, its performance is lower than LTTng.
Note that the kprobe performance has improved over the weekend and we
should all be glad of that!

I am looking forward to having the best possible tools, indeed converge
to "merge" and play nice, taking the best parts from each system
(dynamic tracepoints with SystemTap, static tracepoints if needed for
more critical areas, the efficient reentrant per cpu LTTng/Relay
recording infrastructure...).

2006-09-18 20:29:04

Subject: Re: tracepoint maintainance models

Alan Cox wrote:

>Ar Llu, 2006-09-18 am 12:10 -0700, ysgrifennodd Vara Prasad:
>
>
>>I am not sure i quiet understand your line number part of the proposal.
>>Does this proposal assume we have access to source code while generating
>>dynamic probes?
>>
>>
>
>Its one route - or we dump it into an ELF section in the binary.
>
>
Source code access is not a good solution but ELF section could work.

>
>
>>This still doesn't solve the problem of compiler optimizing such that a
>>variable i would like to read in my probe not being available at the
>>probe point.
>>
>>
>
>Then what we really need by the sound of it is enough gcc smarts to do
>something of the form
>
> .section "debugbits"
>
> .asciiz 'hook_sched'
> .dword l1 # Address to probe
> .word 1 # Argument count
> .dword gcc_magic_whatregister("next"); [ reg num or memory ]
> .dword gcc_magic_whataddress("next"); [ address if exists]
>
>
>Can gcc do any of that for us today ?
>
>
>
No, gcc doesn't do that today.

2006-09-19 12:59:00

by Christoph Hellwig

Subject: tracing - consensus building insteat of dogfights

I've been half-way through reading this thread after returning, and I must
say I'm rather annoyed that 80% of it is just Roman vs Ingo and Karim vs
Jes dogfights that run in circles. Let's try to find some majority opinion
and plans to move forward:

*) so far everyone but Roman seems to agree we want to support dynamic
tracing as an integral part of the tracing framework
*) most people seem to agree that we want some form of in-source annotation
instead of just external probes

so let's build on this rough consensus and decide on the next steps before
fighting the hard battles. I think those important steps are:

1) review and improve the lttng core tracing engine (without static traces
so far) and get it into mergeable shape. Make sure it works nicely
from *probe dynamic tracing handlers.
2) find a nice syntax for in-source tracing annotations, and implement a
backend for it using lttng and *probes.

We can fight the hard fight whether we want real static tracing and how
many annotations of what form were after we have those important building
blocks.

2006-09-19 13:25:11

by Karim Yaghmour

Subject: Re: tracing - consensus building insteat of dogfights

Christoph Hellwig wrote:
> I've been half-way through reading this thread after returning, and I must
> say I'm rather annoyed that 80% of it is just Roman vs Ingo and Karim vs
> Jes dogfights that run in circles. Let's try to find some majority optinion
> and plans to move forward:

Well, I believed such a consensus had been achieved and that somehow
Ingo and I had reached some friendly terms:

See this:
http://marc.theaimsgroup.com/?l=linux-kernel&m=115855453205733&w=2
And this:
http://marc.theaimsgroup.com/?l=linux-kernel&m=115855674320139&w=2

Then this:
http://marc.theaimsgroup.com/?l=linux-kernel&m=115855674231992&w=2

For me, the fighting could have stopped there, but then Ingo saw it
fitting to post this:
http://marc.theaimsgroup.com/?l=linux-kernel&m=115859172307516&w=2

I still don't get how Ingo goes from friendly and compromising to
then attempting the worst kind of character assassination I've seen
on lkml. The only way it makes sense is if Ingo's understanding was
that Jes' email, to which he is responding, was responding to an email
I had sent after telling Ingo that the fighting was over. And that
would be simple to understand; within this humongous thread, confusion
is more than likely. Though the inverse (friendly then angry) would
just make Ingo one of the most irrational persons I've come across.

My nitpicking in this last paragraph sounds absolutely silly, but it's
really quite important because if cleared up it would at least go to
show that one of two main protagonists in this issue had indeed
agreed to put disagreement aside. I've asked Ingo privately to
clear this up, but haven't gotten any response, maybe he was just
too angry and killfiled me since.

In any case, I'm more than happy to settle for the friendly end-
of-discussion terms.

Karim
--
President / Opersys Inc.
Embedded Linux Training and Expertise
http://www.opersys.com / 1.866.677.4546

2006-09-19 13:26:39

by Roman Zippel

Subject: Re: tracing - consensus building insteat of dogfights

Hi,

On Tue, 19 Sep 2006, Christoph Hellwig wrote:

> *) so far everyone but Roman seems to agree we want to support dynamic
> tracing as an integral part of the tracing framework

Actually I don't disagree at all, I'm sorry if I have been so easy to
misunderstand. All I'm asking for is to make static tracing possible if
reasonably possible. I know that pure static tracing will always be second
choice, but if we can _reasonably_ support it, why shouldn't we do it?

bye, Roman

2006-09-19 14:04:20

by Karim Yaghmour

Subject: Re: tracing - consensus building insteat of dogfights

typo ...

Karim Yaghmour wrote:
> My nitpicking in this last paragraph sounds absolutely silly, but it's
> really quite important because if cleared up it would at least go to
> show that one of two main protagonists in this issue had indeed
^^^^^^^^^^^^^^^
Makes more sense like this: "two of the main ..."

Karim
--
President / Opersys Inc.
Embedded Linux Training and Expertise
http://www.opersys.com / 1.866.677.4546

2006-09-23 15:50:12

by Mathieu Desnoyers

Subject: Re: LTTng and SystemTAP (Everyone who is scared to read this huge thread, skip to here)

* Ingo Molnar ([email protected]) wrote:
>
> * Mathieu Desnoyers <[email protected]> wrote:
>
> > > Some of the extensive hooking you do in LTT could be alleviated to a
> > > great degree if you used dynamic probes. For example the syscall
> > > entry hackery in LTT looks truly scary.
> >
> > Yes, agreed. The last time I checked, I thought about moving this
> > tracing code to the syscall_trace_entry/exit (used for security hooks
> > and ptrace if I remember well). I just didn't have the time to do it
> > yet.
>
> correct, that's where all such things (auditing, seccomp, ptrace,
> sigstop, freezing, etc.) hook into. Much (all?) of the current entry.S
> hacks can go away in favor of a much easier .c patch to
> do_syscall_trace() and this would reduce a significant portion of the
> present intrusiveness of LTTng.
>

Hi Ingo,

The only problem with do_syscall_trace is that it is only called at the
beginning of the system call. LTT also needs a marker at the end of the system
call to know when control returns to user space.

Any idea of a nice location (in C code preferably) for such a marker?

Mathieu

OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2006-10-06 05:33:19

by Steven Rostedt

Subject: Re: tracepoint maintenance models

Coming into this really late, and I'm still behind in reading this and
related threads, but I want to throw this idea out, and it's getting
late.

On Mon, 2006-09-18 at 13:28 -0700, Vara Prasad wrote:
> Alan Cox wrote:
>
> >
> >>This still doesn't solve the problem of compiler optimizing such that a
> >>variable i would like to read in my probe not being available at the
> >>probe point.
> >>
> >>
> >
> >Then what we really need by the sound of it is enough gcc smarts to do
> >something of the form
> >
> > .section "debugbits"
> >
> > .asciiz 'hook_sched'
> > .dword l1 # Address to probe
> > .word 1 # Argument count
> > .dword gcc_magic_whatregister("next"); [ reg num or memory ]
> > .dword gcc_magic_whataddress("next"); [ address if exists]
> >
> >
> >Can gcc do any of that for us today ?
> >
> >
> >
> No, gcc doesn't do that today.
>
>

---- cut here ----
#include <stdio.h>

#define MARK(label, var)                        \
        asm ("debug_" #label ":\n"              \
             ".section .data\n"                 \
             #label "_" #var ": xor %0,%0\n"    \
             ".previous" : : "r"(var))

static int func(int a)
{
        int y;
        int z;

        y = a;
        MARK(func, y);
        z = y+2;

        return z;
}

static void read_label(void)
{
        extern unsigned short regA;
        unsigned short *r = &regA;
        char *regs[] = {
                "A", "B", "C", "D", "DI", "BP", "SP", "CH"
        };
        int i;
        extern unsigned short func_y;
        extern unsigned long debug_func;

        asm (".section .data\n"
             "regA: xor %eax,%eax\n"
             "regB: xor %ebx,%ebx\n"
             "regC: xor %ecx,%ecx\n"
             "regD: xor %edx,%edx\n"
             "regDI: xor %edi,%edi\n"
             "regBP: xor %ebp,%ebp\n"
             "regSP: xor %esp,%esp\n"
             ".previous");

        for (i=0; i < 7; i++) {
                if (r[i] == func_y)
                        break;
        }
        if (i < 7)
                printf("func y is in reg %s at %p\n",
                       regs[i],
                       &debug_func);
        else
                printf("func y not found!\n");
}

int main (int argc, char **argv)
{
        int g;
        g = func(argc);
        read_label();
        return g;
}
---- end cut ----

$ gcc -O2 -o mark mark.c
$ ./mark
func y is in reg B at 0x80483ce

Now the question is, isn't MARK() in this code a non intrusive marker?

So couldn't a kprobe be set at "debug_func", letting us find which
register "y" is in without adding any overhead to the code being marked?

Obviously, this would need to be done specially for each arch.
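
As a rough sketch of the lookup step (not part of the test program
above): instead of comparing against a table of reference instructions,
the two bytes stored at the marker label can be decoded directly. This
assumes i386 and that the assembler emits the two-byte 0x31 (or 0x33)
encoding of "xor %reg,%reg". The names decode_xor_self() and
marker_reg_name() are made up for illustration.

```c
#include <stdint.h>

/* i386 register numbers as they appear in the ModRM byte. */
static const char *const reg_names[8] = {
        "eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"
};

/*
 * Decode a two-byte "xor %reg,%reg" marker instruction.
 * Returns the register number (0..7), or -1 if the bytes are not
 * a register-with-itself xor.
 */
static int decode_xor_self(const uint8_t *insn)
{
        uint8_t modrm = insn[1];
        int mod = modrm >> 6;
        int reg = (modrm >> 3) & 7;
        int rm = modrm & 7;

        /* 0x31 and 0x33 are the two 32-bit XOR opcodes taking a ModRM byte. */
        if (insn[0] != 0x31 && insn[0] != 0x33)
                return -1;
        /* mod == 3 means both operands are registers. */
        if (mod != 3 || reg != rm)
                return -1;
        return reg;
}

/* Convenience wrapper: name of the register, or "unknown". */
static const char *marker_reg_name(const uint8_t *insn)
{
        int reg = decode_xor_self(insn);

        return reg < 0 ? "unknown" : reg_names[reg];
}
```

With the run shown above, the bytes at func_y would be 0x31 0xdb, which
decodes to register 3, i.e. ebx. Decoding also covers esi, which the
seven-entry comparison table has no entry for.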

-- Steve

2006-10-06 13:03:20

by Frank Ch. Eigler

Subject: Re: tracepoint maintenance models

Hi -

On Fri, Oct 06, 2006 at 01:33:11AM -0400, Steven Rostedt wrote:
> Coming into this really late, and I'm still behind in reading this and
> related threads, but I want to throw this idea out, and it's getting
> late.
> [...]
> #define MARK(label, var) \
> asm ("debug_" #label ":\n" \
> ".section .data\n" \
> #label "_" #var ": xor %0,%0\n" \
> ".previous" : : "r"(var))
> [...]
> $ gcc -O2 -o mark mark.c
> $ ./mark
> func y is in reg B at 0x80483ce
> [...]

Clever.

> Now the question is, isn't MARK() in this code a non intrusive marker?

Not quite. The assembly code forces gcc to materialize the data that
it might already have inlined, and to borrow a register for the
duration. It's still a neat idea though.

- FChE

2006-10-06 14:23:31

by Steven Rostedt

Subject: Re: tracepoint maintenance models

On Fri, 6 Oct 2006, Frank Ch. Eigler wrote:

> Hi -
>
> On Fri, Oct 06, 2006 at 01:33:11AM -0400, Steven Rostedt wrote:
> > Coming into this really late, and I'm still behind in reading this and
> > related threads, but I want to throw this idea out, and it's getting
> > late.
> > [...]
> > #define MARK(label, var) \
> > asm ("debug_" #label ":\n" \
> > ".section .data\n" \
> > #label "_" #var ": xor %0,%0\n" \
> > ".previous" : : "r"(var))
> > [...]
> > $ gcc -O2 -o mark mark.c
> > $ ./mark
> > func y is in reg B at 0x80483ce
> > [...]
>
> Clever.
>
> > Now the question is, isn't MARK() in this code a non intrusive marker?
>
> Not quite. The assembly code forces gcc to materialize the data that
> it might already have inlined, and to borrow a register for the
> duration. It's still a neat idea though.

Thanks!

You're right, it is intrusive in that it changes how gcc can optimize
that section of code. But what I like about this idea is that it lets
us tell gcc that we want this variable inside a register, and then gcc
can do that for us and still optimize around that. We put no
more constraints on the code, except that we want some value in a
register at some given point of execution. This should only be done for
local variables that are not easily captured in a probe.

Of course with i386's limit on registers, it can put a little strain if we
want more than one variable. But x86_64 will soon be the norm, and the
added registers should help out a lot.

-- Steve

2006-10-06 23:17:09

by Jeremy Fitzhardinge

Subject: Re: tracepoint maintenance models

Steven Rostedt wrote:
> Coming into this really late, and I'm still behind in reading this and
> related threads, but I want to throw this idea out, and it's getting
> late.
>
> On Mon, 2006-09-18 at 13:28 -0700, Vara Prasad wrote:
>
>> Alan Cox wrote:
>>
>>
>>>> This still doesn't solve the problem of compiler optimizing such that a
>>>> variable i would like to read in my probe not being available at the
>>>> probe point.
>>>>
>>>>
>>>>
>>> Then what we really need by the sound of it is enough gcc smarts to do
>>> something of the form
>>>
>>> .section "debugbits"
>>>
>>> .asciiz 'hook_sched'
>>> .dword l1 # Address to probe
>>> .word 1 # Argument count
>>> .dword gcc_magic_whatregister("next"); [ reg num or memory ]
>>> .dword gcc_magic_whataddress("next"); [ address if exists]
>>>
>>>
>>> Can gcc do any of that for us today ?
>>>
>>>
>>>
>>>
>> No, gcc doesn't do that today.
>>
>>
>>
>
>
> ---- cut here ----
> #include <stdio.h>
>
> #define MARK(label, var) \
> asm ("debug_" #label ":\n" \
> ".section .data\n" \
> #label "_" #var ": xor %0,%0\n" \
> ".previous" : : "r"(var))
>

That's a nice idea. As Frank pointed out, it does force things into a
register. You could use "rm" as a constraint, so you can also get the
location wherever it exists. It will still force gcc into keeping the
value around at all, but presumably if it's interesting for a mark, it's
interesting to keep:

asm volatile("..." \
#label "_" #var ": mov %0,%%eax\n" \
".previous" : : "rm" (var))

and, aside from the naming issues, it could be a general expression
rather than a specific variable.

Of course, this requires a more complex addressing mode decoder, but it
does give gcc more flexibility. And in principle this is all redundant,
since DWARF should be able to encode all this too, and if you make use
of the variable as an asm argument, gcc really should be outputting the
debug info about it.
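
To give an idea of the extra decoding the "rm" constraint implies, here
is a minimal sketch for i386 only. It assumes the marker instruction is
the "mov %0,%%eax" from the snippet above, which gas typically encodes
as 0x89 /r when the operand is a register and as 0x8b /r when it is a
memory operand. The struct and function names are hypothetical, and
scaled-index and absolute addressing (including the 0xa1 short form)
are deliberately left unhandled.

```c
#include <stdint.h>

/* Where the marked value lives, as recovered from the marker instruction. */
enum loc_kind { LOC_UNKNOWN, LOC_REG, LOC_MEM };

struct var_loc {
        enum loc_kind kind;
        int reg;        /* LOC_REG: holding register; LOC_MEM: base register */
        int32_t disp;   /* LOC_MEM: displacement added to the base register */
};

/*
 * Decode "mov <operand>,%eax" and report where <operand> is.
 * Register numbers follow the i386 ModRM numbering
 * (0=eax 1=ecx 2=edx 3=ebx 4=esp 5=ebp 6=esi 7=edi).
 */
static struct var_loc decode_marker_mov(const uint8_t *insn)
{
        struct var_loc loc = { LOC_UNKNOWN, -1, 0 };
        uint8_t op = insn[0];
        uint8_t modrm = insn[1];
        uint8_t mod = modrm >> 6;
        uint8_t reg = (modrm >> 3) & 7;
        uint8_t rm = modrm & 7;
        const uint8_t *p = insn + 2;

        if (op == 0x89) {
                /* MOV r/m32, r32: source is the reg field, %eax is r/m. */
                if (mod == 3 && rm == 0) {
                        loc.kind = LOC_REG;
                        loc.reg = reg;
                }
                return loc;
        }
        if (op != 0x8b || reg != 0)     /* MOV r32, r/m32 with %eax as target */
                return loc;

        if (mod == 3) {                 /* register source, alternate encoding */
                loc.kind = LOC_REG;
                loc.reg = rm;
                return loc;
        }
        if (rm == 4) {                  /* a SIB byte follows the ModRM byte */
                uint8_t sib = *p++;

                if (((sib >> 3) & 7) != 4)      /* scaled index: not handled */
                        return loc;
                rm = sib & 7;                   /* base register */
        }
        if (mod == 0 && rm == 5)        /* absolute disp32, no base: not handled */
                return loc;

        loc.kind = LOC_MEM;
        loc.reg = rm;
        if (mod == 1)                   /* sign-extended 8-bit displacement */
                loc.disp = (int8_t)*p;
        else if (mod == 2)              /* little-endian 32-bit displacement */
                loc.disp = (int32_t)((uint32_t)p[0] | (uint32_t)p[1] << 8 |
                                     (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24);
        return loc;
}
```

For instance, the bytes 0x8b 0x45 0x08 ("mov 8(%ebp),%eax") decode to a
memory location at ebp+8, while 0x89 0xd8 ("mov %ebx,%eax") decodes to
register ebx. A probe handler would then read either the saved register
or the word at base+disp.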

J