2006-09-14 03:43:49

by Mathieu Desnoyers

Subject: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Hi,

Following advice Christoph gave me this summer, submitting a smaller,
easier-to-review patch should make everybody happier. Here is a stripped
down version of LTTng: I removed everything that would make reviewers
reluctant (especially the kernel instrumentation and the kernel state dump
module). I plan to release this "core" version every few LTTng releases
and post it to LKML.

Comments and reviews are very welcome.

See http://ltt.polymtl.ca > QUICKSTART for information about creating your own
instrumentation set.


Mathieu


OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68


2006-09-14 11:36:10

by Ingo Molnar

Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Mathieu Desnoyers <[email protected]> wrote:

> Following an advice Christoph gave me this summer, submitting a
> smaller, easier to review patch should make everybody happier. Here is
> a stripped down version of LTTng : I removed everything that would
> make the code review reluctant (especially kernel instrumentation and
> kernel state dump module). I plan to release this "core" version every
> few LTTng releases and post it to LKML.
>
> Comments and reviews are very welcome.

i have one very fundamental question: why should we do this
source-intrusive method of adding tracepoints instead of the dynamic,
unintrusive (and thus zero-overhead) KProbes+SystemTap method?

Ingo

2006-09-14 13:41:36

by Roman Zippel

Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Hi,

On Thu, 14 Sep 2006, Ingo Molnar wrote:

> i have one very fundamental question: why should we do this
> source-intrusive method of adding tracepoints instead of the dynamic,
> unintrusive (and thus zero-overhead) KProbes+SystemTap method?

Could you define "zero-overhead"?
Actual implementation aside, having a core set of tracepoints is far
more portable than KProbes.

bye, Roman

2006-09-14 14:04:14

by Ingo Molnar

Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Roman Zippel <[email protected]> wrote:

> On Thu, 14 Sep 2006, Ingo Molnar wrote:
>
> > i have one very fundamental question: why should we do this
> > source-intrusive method of adding tracepoints instead of the dynamic,
> > unintrusive (and thus zero-overhead) KProbes+SystemTap method?
>
> Could you define "zero-overhead"?

zero overhead when not used: not a single instruction added to the
kernel codepath that is to be traced, anywhere (which will be the case
on 99% of systems).

> Actual implementation aside having a core number of tracepoints is far
> more portable than KProbes.

the key point is that we want _zero_ "static tracepoints". Firstly,
static tracepoints are fundamentally limited:

- they can only be added at the source code level

- modifying them requires a reboot which is not practical in a
production environment

- there can only be a limited set of them, while many problems need
finegrained tracepoints tailored to the problem at hand

- conditional tracepoints are typically either nonexistent or very
limited.

But besides the usability problems, the most important problem is that
static tracepoints add a _constant maintenance overhead_ to the kernel.
I'm talking from first-hand experience: i wrote 'iotrace' (a static
tracer) in 1996 and have maintained it for many years, and even today
i'm maintaining a handful of tracepoints in the -rt kernel. I _don't_
want static tracepoints in the mainline kernel.

enter KProbes+SystemTap. It needs no changes at the source code level at
all, so there is no maintenance overhead to generic kernel code.
Tracepoints can be added and removed while the system is running. Trace
actions and filters can be added via a scripting language, so tracing is
as dynamic as it gets.

(check out http://lwn.net/Articles/198557/ if you have an lwn
subscription - it's subscriber-only for a few weeks)

Ingo

2006-09-14 14:34:08

by Roman Zippel

Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Hi,

On Thu, 14 Sep 2006, Ingo Molnar wrote:

> > On Thu, 14 Sep 2006, Ingo Molnar wrote:
> >
> > > i have one very fundamental question: why should we do this
> > > source-intrusive method of adding tracepoints instead of the dynamic,
> > > unintrusive (and thus zero-overhead) KProbes+SystemTap method?
> >
> > Could you define "zero-overhead"?
>
> zero overhead when not used: not a single instruction added to the
> kernel codepath that is to be traced, anywhere. (which will be the case
> on 99% of the systems)

Using alternatives, this could be near zero as well, and it will likely
have less overhead when it's actually used.

> > Actual implementation aside having a core number of tracepoints is far
> > more portable than KProbes.
>
> the key point is that we want _zero_ "static tracepoints". Firstly,
> static tracepoints are fundamentally limited:

BTW I don't mind KProbes as an option, but I have a huge problem with
making it the only option.

> But besides the usability problems, the most important problem is that
> static tracepoints add a _constant maintainance overhead_ to the kernel.
> I'm talking from first hand experience: i wrote 'iotrace' (a static
> tracer) in 1996 and have maintained it for many years, and even today
> i'm maintaining a handful of tracepoints in the -rt kernel. I _dont_
> want static tracepoints in the mainline kernel.

Even dynamic tracepoints have a maintenance overhead, and I doubt there
is much difference. The big problem is having to maintain them outside
the mainline kernel; that's why it's so important to get them into the
mainline kernel.
You didn't address my main issue at all - kprobes is only available for a
few archs...

bye, Roman

2006-09-14 15:02:28

by Mathieu Desnoyers

Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

* Ingo Molnar ([email protected]) wrote:
>
> * Mathieu Desnoyers <[email protected]> wrote:
>
> > Following an advice Christoph gave me this summer, submitting a
> > smaller, easier to review patch should make everybody happier. Here is
> > a stripped down version of LTTng : I removed everything that would
> > make the code review reluctant (especially kernel instrumentation and
> > kernel state dump module). I plan to release this "core" version every
> > few LTTng releases and post it to LKML.
> >
> > Comments and reviews are very welcome.
>
> i have one very fundamental question: why should we do this
> source-intrusive method of adding tracepoints instead of the dynamic,
> unintrusive (and thus zero-overhead) KProbes+SystemTap method?
>

Hi Ingo,

First, I never said that this tracing infrastructure was tied to static trace
points in any way. My goal is to provide a robust data serialisation mechanism
that could be used from both static and dynamic trace points.

Zero-overhead for static tracepoints can be achieved by compiling them out.
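
For illustration, here is a minimal userspace sketch of the "compile them
out" idea (the CONFIG_TRACE switch, the macro name and the printf backend
are hypothetical, not LTTng's actual API): when the option is off, the
tracepoint expands to nothing, so the instrumented code path carries zero
extra instructions.

#include <stdio.h>

/* Hypothetical switch: build with -DCONFIG_TRACE to enable tracing. */
#ifdef CONFIG_TRACE
#define trace_event(fmt, ...)  printf("trace: " fmt "\n", ##__VA_ARGS__)
#else
#define trace_event(fmt, ...)  do { } while (0)  /* compiled out: no code emitted */
#endif

int main(void)
{
        trace_event("entering main (arg %d)", 42);
        return 0;
}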

One problem with the KProbes approach is that it limits what can be instrumented
because of its performance impact when active: traps are very costly and can
rule out instrumenting frequently triggered code paths: scheduler switches, traps,
interrupts...

Also, a major issue with dynamic instrumentation is that it will never be useful
to kernel developers who keep current with the git HEAD. Dynamic instrumentation
has to be defined outside of the kernel tree and cannot follow the code changes
quickly enough to be useful to a developer unless he maintains his own
dynamic instrumentation.

I do not advocate a particular approach: I think that dynamic
instrumentation is very well suited to distributions which stick to a
particular kernel version for a long time. However, static probes can be very
useful to kernel developers, as they can follow the kernel HEAD because they
are part of the code.


Mathieu


OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2006-09-14 15:15:14

by Martin Bligh

Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Ingo Molnar wrote:
> * Mathieu Desnoyers <[email protected]> wrote:
>
>> Following an advice Christoph gave me this summer, submitting a
>> smaller, easier to review patch should make everybody happier. Here is
>> a stripped down version of LTTng : I removed everything that would
>> make the code review reluctant (especially kernel instrumentation and
>> kernel state dump module). I plan to release this "core" version every
>> few LTTng releases and post it to LKML.
>>
>> Comments and reviews are very welcome.
>
> i have one very fundamental question: why should we do this
> source-intrusive method of adding tracepoints instead of the dynamic,
> unintrusive (and thus zero-overhead) KProbes+SystemTap method?

Because:

1. Kprobes are more overhead when they *are* being used.
2. You can get zero overhead by CONFIG'ing things out.
3. (most importantly) it's a bitch to maintain tracepoints out-of-tree
on a rapidly moving kernel
4. I believe kprobes still doesn't have full access to local variables.


Now (3) is possibly solvable by putting the points in as no-ops (either
insert a few nops or just a marker entry in the symbol table?), but full
dynamic just isn't sustainable. What would be really nice is one trace
infrastructure that allowed both static and dynamic tracepoints, without
all the awk-style language crap that seems to come with systemtap.

M.

2006-09-14 15:19:09

by Mathieu Desnoyers

Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

* Ingo Molnar ([email protected]) wrote:
>
> * Roman Zippel <[email protected]> wrote:
>
> the key point is that we want _zero_ "static tracepoints". Firstly,
> static tracepoints are fundamentally limited:
>
> - they can only be added at the source code level
>
> - modifying them requires a reboot which is not practical in a
> production environment

Not for kernel modules: unload/load is enough.

> - there can only be a limited set of them, while many problems need
> finegrained tracepoints tailored to the problem at hand

Not true with dynamic facility loading: LTTng can register new events upon
module load/unload.

>
> - conditional tracepoints are typically either nonexistent or very
> limited.
>
Maybe, but it can be useful to have static instrumentation available for those
limited conditional tracepoints.
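
To illustrate what even a limited conditional static tracepoint could look
like, here is a small userspace sketch (all names are invented for
illustration; this is not LTTng's API): the condition is evaluated at the
instrumentation site, and the disabled cost is a single, predictably
not-taken branch.

#include <stdio.h>

static int trace_enabled;      /* toggled at runtime, e.g. via a /proc knob */
static int traced_pid = -1;    /* per-tracepoint filter */

/* Record the event only when tracing is on and the condition holds. */
#define TRACE_COND(cond, fmt, ...)                                   \
        do {                                                         \
                if (__builtin_expect(trace_enabled, 0) && (cond))    \
                        printf("trace: " fmt "\n", ##__VA_ARGS__);   \
        } while (0)

int main(void)
{
        int current_pid = 1234;

        trace_enabled = 1;
        traced_pid = 1234;

        /* Fires only for the pid selected by the filter. */
        TRACE_COND(current_pid == traced_pid, "scheduling pid %d", current_pid);
        return 0;
}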

> But besides the usability problems, the most important problem is that
> static tracepoints add a _constant maintainance overhead_ to the kernel.
> I'm talking from first hand experience: i wrote 'iotrace' (a static
> tracer) in 1996 and have maintained it for many years, and even today
> i'm maintaining a handful of tracepoints in the -rt kernel. I _dont_
> want static tracepoints in the mainline kernel.
>

If the trace points are modified along with the code by whoever makes the
original code changes, it lessens the maintenance overhead. Furthermore, if
there is a major change in a code path that requires rethinking the trace
points, the person introducing the change has the best knowledge of what to do
with the trace point. I think that trace point maintenance should be left to
subsystem maintainers, not be a centralised task done by distributions once in a
while.

Talking about experience, Karim has maintained the original LTT trace points,
which targeted key kernel events, for years without major trace point changes
between kernel versions. I think he has already proven that maintenance of
static trace points is not an issue.

However, I restate that my position is that both static and dynamic
instrumentation of the kernel are a necessity and that a tracer core should be
usable by both.


Mathieu


OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2006-09-14 15:27:23

by Michel Dagenais

Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

On Thu, 2006-14-09 at 16:33 +0200, Roman Zippel wrote:
> On Thu, 14 Sep 2006, Ingo Molnar wrote:
> > > On Thu, 14 Sep 2006, Ingo Molnar wrote:

> > > > i have one very fundamental question: why should we do this
> > > > source-intrusive method of adding tracepoints instead of the dynamic,
> > > > unintrusive (and thus zero-overhead) KProbes+SystemTap method?

> Using alternatives this could be near zero as well and it will likely
> have less overhead when it's actually used.

This is the crucial point. Using an INT3 at each dynamic tracepoint is
both costly and a larger perturbation on the system under study.
Static tracepoints can be implemented by various means, including a few
NOPs to reserve space which get patched dynamically for activation. They
may also be compiled out completely. By the way, there are quite a few
tracers already in device drivers in the kernel.
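
As a purely illustrative, x86_64-only sketch of that NOP-reservation idea
(not LTTng's actual implementation; the section and symbol names are
invented): each marker emits a 5-byte NOP and records the site address in
a dedicated section, so a patching engine could later overwrite the NOP
with a call to a trace handler when tracing is activated.

/* Build with gcc on x86_64/Linux. */
#include <stdio.h>

#define TRACE_MARK()                                                          \
        asm volatile("1:\n\t"                                                 \
                     ".byte 0x0f, 0x1f, 0x44, 0x00, 0x00\n\t" /* 5-byte NOP */\
                     ".pushsection __trace_markers, \"a\", @progbits\n\t"     \
                     ".quad 1b\n\t"                  /* remember this site */ \
                     ".popsection")

/* The linker provides these bounds for the orphan __trace_markers section. */
extern const void *__start___trace_markers[];
extern const void *__stop___trace_markers[];

int main(void)
{
        TRACE_MARK();   /* costs a single 5-byte NOP while tracing is off */

        /* A runtime patcher would walk this table and rewrite each NOP
         * with a call to the trace handler when tracing is enabled. */
        for (const void **p = __start___trace_markers;
             p < __stop___trace_markers; p++)
                printf("marker site at %p\n", (void *)*p);
        return 0;
}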

> BTW I don't mind KProbes as an option, but I have huge problem with making
> it the only option.

Indeed, KProbes, SystemTap and LTTng are complementary, and the people
involved in the three projects are cooperating.

> > But besides the usability problems, the most important problem is that
> > static tracepoints add a _constant maintainance overhead_ to the kernel.
> > I'm talking from first hand experience: i wrote 'iotrace' (a static
> > tracer) in 1996 and have maintained it for many years, and even today
> > i'm maintaining a handful of tracepoints in the -rt kernel. I _dont_
> > want static tracepoints in the mainline kernel.
>
> Even dynamic tracepoints have a maintainance overhead and I doubt there is
> much difference. The big problem is having to maintain them outside the
> mainline kernel, that's why it's so important to get them into the
> mainline kernel.

Indeed, dynamic tracepoints are like code patches: when the kernel
source changes, they may or may not apply to newer versions. Mainline kernel
"static" tracepoints are more like the existing 70,000+ printk
statements!

2006-09-14 17:22:18

by Ingo Molnar

Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Roman Zippel <[email protected]> wrote:

> Hi,
>
> On Thu, 14 Sep 2006, Ingo Molnar wrote:
>
> > > On Thu, 14 Sep 2006, Ingo Molnar wrote:
> > >
> > > > i have one very fundamental question: why should we do this
> > > > source-intrusive method of adding tracepoints instead of the dynamic,
> > > > unintrusive (and thus zero-overhead) KProbes+SystemTap method?
> > >
> > > Could you define "zero-overhead"?
> >
> > zero overhead when not used: not a single instruction added to the
> > kernel codepath that is to be traced, anywhere. (which will be the case
> > on 99% of the systems)
>
> Using alternatives this could be near zero as well and it will likely
> have less overhead when it's actually used.

if there are lots of tracepoints (and the union of _all_ useful
tracepoints that i ever encountered in my life goes into the thousands),
then the overhead is not zero at all.

also, the other disadvantages i listed very much count too. Static
tracepoints are fundamentally limited because:

- they can only be added at the source code level

- modifying them requires a reboot which is not practical in a
production environment

- there can only be a limited set of them, while many problems need
finegrained tracepoints tailored to the problem at hand

- conditional tracepoints are typically either nonexistent or very
limited.

for me these are all _independent_ grounds for rejection, as a generic
kernel infrastructure.

> > the key point is that we want _zero_ "static tracepoints". Firstly,
> > static tracepoints are fundamentally limited:
>
> BTW I don't mind KProbes as an option, but I have huge problem with
> making it the only option.

i'm not arguing for SystemTap to be the only option (KProbes is just the
infrastructure SystemTap is using - there are other uses for KProbes
too), but i'm arguing against the inclusion of static tracepoints as an
infrastructure, precisely because a much better option (SystemTap) is
already available and is usable on the stock kernel. You are of course
free to invent other, equally advantageous (or better) options.

> > But besides the usability problems, the most important problem is
> > that static tracepoints add a _constant maintainance overhead_ to
> > the kernel. I'm talking from first hand experience: i wrote
> > 'iotrace' (a static tracer) in 1996 and have maintained it for many
> > years, and even today i'm maintaining a handful of tracepoints in
> > the -rt kernel. I _dont_ want static tracepoints in the mainline
> > kernel.
>
> Even dynamic tracepoints have a maintainance overhead and I doubt
> there is much difference. The big problem is having to maintain them
> outside the mainline kernel, that's why it's so important to get them
> into the mainline kernel.

i dispute that: for example kernel/sched.c has zero maintenance
overhead under SystemTap, while it's nonzero with static tracepoints. Of
course SystemTap _itself_ has maintenance overhead, but it does not
slow down any other subsystem's speed of progress.

> You didn't address my main issue at all - kprobes is only available
> for a few archs...

the kprobes infrastructure, despite being fairly young, is widely
available: powerpc, i386, x86_64, ia64 and sparc64. The other
architectures are free to implement it too; there's nothing
hardware-specific about kprobes, and the "porting overhead" is in essence
a one-time cost - while for static tracepoints the maintenance overhead
goes on forever and scales linearly with the number of tracepoints
added.

Ingo

2006-09-14 17:41:14

by Karim Yaghmour

Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


Roman Zippel wrote:
> Even dynamic tracepoints have a maintainance overhead and I doubt there is
> much difference. The big problem is having to maintain them outside the
> mainline kernel, that's why it's so important to get them into the
> mainline kernel.

Thanks for pointing this out. This is indeed the nugget. We can try
slicing the pie in any direction we think is best, but the bottom
line is that there's somebody somewhere who is matching source code
to important events (regardless of whether the instrumentation is
static or dynamic). For a very long time the mantra on LKML was
"instrumentation is evil: it's a maintenance nightmare." Try as I
may, every argument I put forth was countered by this mantra.

Unfortunately for me, but fortunately for the current ltt maintainers,
time is a powerful argument. So, with that in mind, here are some
excerpts of a discussion I had with Andrew back in the summer of
2004:

Here's Andrew pulling the "instrumentation is evil" mantra:
http://marc.theaimsgroup.com/?l=linux-kernel&m=108873232414895&w=2

Here's me demonstrating that the mantra is wrong by comparing a
patch against 2.2.13 dated 1999/11/18 and a patch against 2.6.3
dated 2004/03/15:
http://marc.theaimsgroup.com/?l=linux-kernel&m=108874078111041&w=2

And here's Andrew, to his credit, saying "Fair enough."
http://marc.theaimsgroup.com/?l=linux-kernel&m=108874940728542&w=2

Now, this was two years ago and I haven't redone the analysis recently,
but I'd bet the comparison would yield very similar
results. The first LTT patch was made in July 1999 - that's more
than **7** years ago. How much longer can anybody continue saying
with a straight face that static instrumentation is a maintenance
problem? In my opinion the real problem is the impact that letting
this issue linger for so long has had on the willingness of people
and/or companies to invest any sort of effort in the kernel
development process. There's just no excuse for Linux not to have
something that is clearly as essential as this.

I think now is a good time to put this issue to rest and drop the
misleading mantra.

Cheers,

Karim

2006-09-14 17:51:48

by Ingo Molnar

Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Martin J. Bligh <[email protected]> wrote:

> >>Comments and reviews are very welcome.
> >
> > i have one very fundamental question: why should we do this
> > source-intrusive method of adding tracepoints instead of the
> > dynamic, unintrusive (and thus zero-overhead) KProbes+SystemTap
> > method?
>
> Because:
>
> 1. Kprobes are more overhead when they *are* being used.

minimally so - at least on i386 and x86_64. In that sense tracing is a
_slowpath_, and it _will_ slow things down if done excessively. I don't
care about the tracepoint being slower by a few instructions as long as
it has _zero effect_ on normal code, be that source code or binary code.

> 2. You can get zero overhead by CONFIG'ing things out.

but that's not how a fair chunk of people want to use tracing. People
(enterprise customers trying to figure out performance problems,
engineers trying to debug things on a live, production system) want to
be able to insert a tracepoint anywhere and anytime - and also they want
to have zero overhead from tracing if no tracepoints are used on a
system.

> 3. (most importantly) it's a bitch to maintain tracepoints out
> of-tree on a rapidly moving kernel

wrong: the original demo tracepoints that came with SystemTap still work
on the current kernel, because the 'coupling' is loose: based on
function names.

Static tracepoints on the other hand, if added via an external patch, do
depend on the target function not moving around and the context of the
tracepoint not being changed. (And static tracepoints, if in the source
all the time, are a constant hindrance to development and code
readability.)

and of course the big advantage of dynamic probing is its flexibility:
you can add ad-hoc tracepoints to thousands of functions, instead of
having to maintain hundreds (or thousands) of static tracepoints all the
time. (And if we won't end up with hundreds/thousands of static
tracepoints, then it won't be usable enough as a generic solution.)

> 4. I believe kprobes still doesn't have full access to local
> variables.

wrong: with SystemTap you can probe local variables too (via
jprobes/kretprobes, all in the upstream kernel already).

> Now (3) is possibly solvable by putting the points in as no-ops
> (either insert a few nops or just a marker entry in the symbol
> table?), but full dynamic just isn't sustainable. [...]

i'm not sure i follow. Could you explain where SystemTap has this
difficulty?

Ingo

2006-09-14 17:55:56

by Roman Zippel

Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Hi,

On Thu, 14 Sep 2006, Ingo Molnar wrote:

> also, the other disadvantages i listed very much count too. Static
> tracepoints are fundamentally limited because:
>
> - they can only be added at the source code level
>
> - modifying them requires a reboot which is not practical in a
> production environment
>
> - there can only be a limited set of them, while many problems need
> finegrained tracepoints tailored to the problem at hand
>
> - conditional tracepoints are typically either nonexistent or very
> limited.
>
> for me these are all _independent_ grounds for rejection, as a generic
> kernel infrastructure.

Tracepoints of course need to be managed, but that's true for both dynamic
and static tracepoints. Both have their advantages and disadvantages, and
just hammering on the possible problems of static ones (which are not much
of a problem for other people) is highly unfair and not a reason for
rejection. If you don't like them, don't use them; nobody forces you, it's
that simple...

> > You didn't address my main issue at all - kprobes is only available
> > for a few archs...
>
> the kprobes infrastructure, despite being fairly young, is widely
> available: powerpc, i386, x86_64, ia64 and sparc64. The other
> architectures are free to implement them too, there's nothing
> hardware-specific about kprobes and the "porting overhead" is in essence
> a one-time cost - while for static tracepoints the maintainance overhead
> goes on forever and scales linearly with the number of tracepoints
> added.

kprobes are not trivial to implement (especially to reach the level of
performance and flexibility of static tracepoints), and until then you deny
their users/developers a useful tool?
I also think you highly exaggerate the maintenance overhead of static
tracepoints; once added they hardly need any maintenance, and most of the
time you can just ignore them. Only if the code drastically changes do they
need to be adjusted, but at that point this should be the smallest
problem. The kernel is full of debug prints; do you seriously suggest
throwing them out because of their "high maintenance"?

bye, Roman

2006-09-14 17:56:34

by Ingo Molnar

Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Michel Dagenais <[email protected]> wrote:

> This is the crucial point. Using an INT3 at each dynamic tracepoint is
> both costly and is a larger perturbation on the system under study.
> [...]

have you measured this?

Ingo

2006-09-14 18:02:09

by Karim Yaghmour

Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


Ingo Molnar wrote:
> also, the other disadvantages i listed very much count too. Static
> tracepoints are fundamentally limited because:
>
> - they can only be added at the source code level

Non-issue. See below. This is actually a feature, as can be seen
by browsing the source code of the various subsystems/filesystems/etc.
whose authors saw fit to include their own static tracepoints.
Darn, they must've all been misguided, as were those who
reviewed the code and let it in.

> - modifying them requires a reboot which is not practical in a
> production environment

Non-issue. See below.

> - there can only be a limited set of them, while many problems need
> finegrained tracepoints tailored to the problem at hand

Non-issue. See below.

> - conditional tracepoints are typically either nonexistent or very
> limited.

I don't get this one. What's a "conditional tracepoint" for you?

> for me these are all _independent_ grounds for rejection, as a generic
> kernel infrastructure.

I've addressed other issues in another posting, but I want to
reiterate something here that Roman said that keeps getting
forgotten:

There is no competition between static and dynamic trace points.
They are both useful and complementary. If some set of existing
static trace points are insufficient at runtime for you to
resolve an issue, nothing precludes you from using the dynamic
mechanisms for adding more localized instrumentation.

Side point: you may be a kernel god, but there are mere mortals
out there who use Linux. The point I've been making for years
now is that there are legitimate reasons why normal non-kernel-
developer users would benefit greatly from having
access to tools that generate digested information
regarding key kernel events. You can argue all you want about
maintainability, and I continue to think you're wrong, but
you should know that the development and usefulness of any such
tools is gated by the continued inability to have a standard
set of known-to-be-good sources of key kernel events. And I
repeat, the use of dynamic tracing does *not* solve this
issue.

At OLS 2005 I suggested developing a markers infrastructure
whose users could use it just to mark up their code, the decision
for tying such markers to a given type of instrumentation not
actually being tied to the markers themselves. At OLS this
year a very good talk was given on this topic by Frank from the
systemtap team and it was very well received by the jam-packed
audience. IOW, while there used to be a time when people pitted
static instrumentation against dynamic instrumentation, there has
been an ever-growing consensus that no such choice need be made.

Thanks,

Karim

2006-09-14 18:08:14

by Nick Piggin

Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Michel Dagenais wrote:
> On Thu, 2006-14-09 at 16:33 +0200, Roman Zippel wrote:

>>BTW I don't mind KProbes as an option, but I have huge problem with making
>>it the only option.
>
>
> Indeed, KProbes SystemTAP and LTTng are complementary and people
> involved in the three projects are cooperating.

That doesn't mean we want them all in the kernel.

The best aim would of course be to come up with a solution that has
the advantages of all and disadvantages of none. That may be
impossible, but if we can find one way to do things that is acceptable
to all...

What's the huge problem with making kprobes the only option (that can't
be fixed by doing a bit of coding)?

--
SUSE Labs, Novell Inc.

2006-09-14 18:15:18

by Karim Yaghmour

Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


Ingo Molnar wrote:
> but that's not how a fair chunk of people want to use tracing. People
> (enterprise customers trying to figure out performance problems,
> engineers trying to debug things on a live, production system) want to
> be able to insert a tracepoint anywhere and anytime - and also they want
> to have zero overhead from tracing if no tracepoints are used on a
> system.

This is an implementation issue. You can easily arrange it so that at
the site of a marker you generate the code which does the actual tracing
in a special "trace" section of the binary and insert noops at the
marker site. The only penalty until tracing is enabled is then the
execution of a few additional noops.

[ note: this comes from a suggestion made by Hiramatsu-san at
this year's OLS. ]

> wrong: the original demo tracepoints that came with SystemTap still work
> on the current kernel, because the 'coupling' is loose: based on
> function names.
>
> Static tracepoints on the other hand, if added via an external patch, do
> depend on the target function not moving around and the context of the
> tracepoint not being changed. (and static tracepoints if in the source
> all the time are a constant hindrance to development and code
> readability.)

Instrumentation of function boundaries is usually not much of an issue.
Instrumentation of key events, though, is different. Here's the classic:
@@ -1709,6 +1712,7 @@ switch_tasks:
++*switch_count;

prepare_arch_switch(rq, next);
+ TRACE_SCHEDCHANGE(prev, next);
prev = context_switch(rq, prev, next);
barrier();

This is the kind of thing for which the instrumentation, be it static
or dynamic, requires some kind of intelligent analysis of where to
get the info. Now, answer honestly, wouldn't it be simpler to have
such an event marker instead of having to figure out for every kernel
binary you get where the darned probe needs to be inserted?

Karim

2006-09-14 18:24:50

by Ingo Molnar

Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Roman Zippel <[email protected]> wrote:

> > for me these are all _independent_ grounds for rejection, as a generic
> > kernel infrastructure.
>
> Tracepoints of course need to be managed, but that's true for both
> dynamic and static tracepoints. [...]

that's not true, and this is the important thing that i believe you are
missing. A dynamic tracepoint is _detached_ from the normal source code
and thus has zero maintenance overhead. You don't have to maintain it
during normal development - only if you need it. You don't see the
dynamic tracepoints in the source code.

a static tracepoint, once it's in the mainline kernel, is a nonzero
maintenance overhead _until eternity_. It is a constant visual
hindrance and a constant build-correctness and boot-correctness problem
if you happen to change the code that is being traced by a static
tracepoint. Again, I am talking out of actual experience with static
tracepoints: i frequently break my kernel via static tracepoints and i
have a constant maintenance cost from them. So what i do is try to
minimize the number of static tracepoints to _zero_, i.e. i only add
them when i need them for a given bug.

static tracepoints are inferior to dynamic tracepoints in almost every
way.

> [...] Both have their advantages and disadvantages and just hammering
> on the possible problems of static ones [...]

how about giving a line-by-line rebuttal to the very real problems of
static tracepoints i listed (twice already), instead of calling them
"possible problems"?

i am giving a line-by-line rebuttal of all arguments that come up.
Please be fair and do the same. Here are the arguments again, for a
third time. Thanks!

> > also, the other disadvantages i listed very much count too. Static
> > tracepoints are fundamentally limited because:
> >
> > - they can only be added at the source code level
> >
> > - modifying them requires a reboot which is not practical in a
> > production environment
> >
> > - there can only be a limited set of them, while many problems need
> > finegrained tracepoints tailored to the problem at hand
> >
> > - conditional tracepoints are typically either nonexistent or very
> > limited.


> > the kprobes infrastructure, despite being fairly young, is widely
> > available: powerpc, i386, x86_64, ia64 and sparc64. The other
> > architectures are free to implement them too, there's nothing
> > hardware-specific about kprobes and the "porting overhead" is in
> > essence a one-time cost - while for static tracepoints the
> > maintainance overhead goes on forever and scales linearly with the
> > number of tracepoints added.
>
> kprobes are not trivial to implement [...]

nor are smp-alternatives, which were suggested as a solution to reduce
the overhead of static tracepoints. So what's the point? It's a one-off
development overhead that has already been done for all the major
arches. If another arch needs it, they can certainly implement it.

it's like arguing against ptrace on the grounds of: "application
developers can add printf if they want to debug their apps, or they can
add static tracepoints too, and besides, ptrace is hard to implement".

> I also think you highly exaggerate the maintaince overhead of static
> tracepoints, once added they hardly need any maintainance, most of the
> time you can just ignore them. [...]

hundreds (or possibly thousands) of tracepoints? Have you ever tried to
maintain that? I have and it's a nightmare.

Even assuming a rich set of hundreds of static tracepoints, it doesn't
even solve the problems at hand: people want to do much more when they
probe the kernel - and today, with DTrace under Solaris, people _know_
that much better tracing _can be done_, and they _demand_ that Linux
adopt an intelligent solution. The clock is ticking for dinosaurs like
static printks and static tracepoints to debug the kernel...

> [...] The kernel is full debug prints, do you seriously suggest to
> throw them out because of their "high maintainance"?

oh yes, these days i frequently throw them out when i find them in code
i modify (my most recent such zap was rwsemtrace()). Also, obviously,
when most of them were added we didn't have good kernel debugging
infrastructure (in fact we didn't have any kernel debugging
infrastructure besides printk), so _something_ had to be used back then.
But today there's little reason to keep them. Welcome to 2006 :-)

Ingo

2006-09-14 18:28:18

by Karim Yaghmour

Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


Nick Piggin wrote:
> What's the huge problem with making kprobes the only option (that can't
> be fixed by doing a bit of coding)?

No offense, but having been on the receiving end of this for a number
of years, one feels like he's watching a never-ending repeat of a
30-second commercial where the woman is holding up a magic scrub
and says something like "Just use Mr. Scrub", and the product then
twinkles with some light music and then cut, next commercial;
except in this case, it's "Just use Kprobes" and all your
problems will go away, wink-wink!

Sorry, it's just not that straightforward. There's a reason
why the systemtap folks got interested in the markers proposal:
they actually have to maintain a dynamic instrumentation set.
Mr. Scrub just doesn't scrub as clean as advertised; you
actually have to scrub to make the scum go away. Which goes
back to what I said elsewhere: no matter where you draw the
line, someone is doing the heavy lifting. Doing it outside the
kernel only means that there's yet another piece of software
that needs to be updated before you can actually start
profiting from your new and improved kernel ...

Karim

2006-09-14 18:35:27

by Mathieu Desnoyers

Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

* Ingo Molnar ([email protected]) wrote:
>
> that's not true, and this is the important thing that i believe you are
> missing. A dynamic tracepoint is _detached_ from the normal source code
> and thus is zero maintainance overhead. You dont have to maintain it
> during normal development - only if you need it. You dont see the
> dynamic tracepoints in the source code.
>

What happens if someone needs trace points in "normal kernel development"
(which appears to be the case - see blktrace and the latency tracer)?

> a static tracepoint, once it's in the mainline kernel, is a nonzero
> maintainance overhead _until eternity_. It is a constant visual
> hindrance and a constant build-correctness and boot-correctness problem
> if you happen to change the code that is being traced by a static
> tracepoint. Again, I am talking out of actual experience with static
> tracepoints: i frequently break my kernel via static tracepoints and i
> have constant maintainance cost from them. So what i do is that i try to
> minimize the number of static tracepoints to _zero_. I.e. i only add
> them when i need them for a given bug.
>

What kind of code are you calling from your instrumentation sites to break your
kernel so easily? Or are you perhaps instrumenting the page fault handler,
which, yes, can have side effects? My goal is exactly to provide the kind of
code that can be called from any kernel site without breaking it!


Mathieu


OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2006-09-14 18:43:56

by Karim Yaghmour

Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


Ingo Molnar wrote:
> that's not true, and this is the important thing that i believe you are
> missing. A dynamic tracepoint is _detached_ from the normal source code
> and thus is zero maintainance overhead. You dont have to maintain it
> during normal development - only if you need it. You dont see the
> dynamic tracepoints in the source code.

And that's actually a problem for those who maintain such dynamic
trace points.

> a static tracepoint, once it's in the mainline kernel, is a nonzero
> maintainance overhead _until eternity_. It is a constant visual
> hindrance and a constant build-correctness and boot-correctness problem
> if you happen to change the code that is being traced by a static
> tracepoint. Again, I am talking out of actual experience with static
> tracepoints: i frequently break my kernel via static tracepoints and i
> have constant maintainance cost from them. So what i do is that i try to
> minimize the number of static tracepoints to _zero_. I.e. i only add
> them when i need them for a given bug.

Bzzt, wrong. This is your own personal experience with tracing. Marked-up
code does not need to be active under all build conditions. In
fact trace points can be inactive by default at all times, except
when you choose to build them in.

And as I said elsewhere, given that your use of instrumentation is
solely for debugging ("i only add them when i need them for a given bug"),
I repeat that there are mortals out there who need this for their
applications.

> static tracepoints are inferior to dynamic tracepoints in almost every
> way.

Sorry, orthogonal is the word.

> hundreds (or possibly thousands) of tracepoints? Have you ever tried to
> maintain that? I have and it's a nightmare.

I have, and I've shown you that you're wrong. The only reason you can
make this argument is that you view these things from the point of view
of what use they are to you as a kernel developer, and I will repeat
what I've said for years now: static instrumentation of the kernel
isn't meant to be useful for kernel developers. While it may indeed
be in some cases, in most cases it's likely useless, as you've been
very successfully arguing in this thread. Nevertheless there are very
legitimate uses for standardized instrumentation points.

> Even assuming a rich set of hundreds of static tracepoints, it doesnt
> even solve the problems at hand: people want to do much more when they
> probe the kernel - and today, with DTrace under Solaris people _know_
> that much better tracing _can be done_, and they _demand_ that Linux
> adopts an intelligent solution. The clock is ticking for dinosaurs like
> static printks and static tracepoints to debug the kernel...

Thank you, I couldn't have put it better. This paragraph, more than
any other snippet I've seen to date, clearly demonstrates why
tracing is such a contentious issue. Kernel developers use tracing
during their normal development process, and of course their gut
reaction is: why the hell would anybody need this for mainline? But
of course this misses the entire point. Kernel tracing for developers
is but a corner case of kernel tracing in general. There are very valid
and legitimate reasons for userspace to be able to obtain important
events. And of course any infrastructure developed with that in
mind should also be usable by kernel developers.

Karim

2006-09-14 19:03:33

by grundig

Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

On Thu, 14 Sep 2006 08:14:19 -0700,
"Martin J. Bligh" <[email protected]> wrote:

> 2. You can get zero overhead by CONFIG'ing things out.

IOW, no distro will enable it by default, to avoid the overhead,
making it useless for lots of real-world working systems where
you need to guess what's happening to software running real
workloads that can't just be stopped.

I guess there's no problem in having both LTT and Kprobes merged in
the main tree at the same time. But Kprobes + systemtap will get
enabled and used massively by distros just because users can start
using it immediately, without recompiling or installing extra
kernels and rebooting. There are cases where distros may want to
enable automatic tracing on every boot, and only during boot, but
don't want to suffer an extra performance hit after booting...

I'm not saying that LTT sucks, has no advantages, or doesn't deserve to
be merged/used; it just looks like kprobes+systemtap will get way more
real-world users no matter how much you discuss it here.

2006-09-14 19:11:01

by Karim Yaghmour

Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


grundig wrote:
> IOW, no distro will enable it by default to avoid the overhead,

Please bear in mind that this is an implementation issue. As I've
explained elsewhere, there are ways to implement this where even
compiled-in static tracepoints have practically no cost at all
-- being noops until enabled. There is therefore no justification for
not shipping kernels built that way and, consequently,
no reason why tools such as ltt can't see real-world usage.

Karim

2006-09-14 19:38:39

by Tim Bird

Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Ingo Molnar wrote:
> * Roman Zippel <[email protected]> wrote:
>
>>> for me these are all _independent_ grounds for rejection, as a generic
>>> kernel infrastructure.
>> Tracepoints of course need to be managed, but that's true for both
>> dynamic and static tracepoints. [...]
>
> that's not true, and this is the important thing that i believe you are
> missing. A dynamic tracepoint is _detached_ from the normal source code
> and thus is zero maintainance overhead. You dont have to maintain it
> during normal development - only if you need it. You dont see the
> dynamic tracepoints in the source code.

It's only zero maintenance overhead for you. Someone has to
maintain it. The party line for years has been that in-tree
maintenance is easier than out-of-tree maintenance.

>
> a static tracepoint, once it's in the mainline kernel, is a nonzero
> maintainance overhead _until eternity_. It is a constant visual
> hindrance and a constant build-correctness and boot-correctness problem
> if you happen to change the code that is being traced by a static
> tracepoint. Again, I am talking out of actual experience with static
> tracepoints: i frequently break my kernel via static tracepoints and i
> have constant maintainance cost from them. So what i do is that i try to
> minimize the number of static tracepoints to _zero_. I.e. i only add
> them when i need them for a given bug.

Ingo - I'm sure you are doing things at a level where static tracepoints
impose a significant perturbation on the code. However, if you look
historically at the set of static tracepoints that people have used
with Linux (with LTT or LKST), they are really not too bad to maintain. I'm
repeating what others have said, but I've been working with LTT and
LTTng for several years, and the tracepoints haven't changed very much
in that time. Heck, I've even brought LTTng up on new kernel versions
and new architectures. How hard could it be if I can do it? ;-)
(Of course, who knows if I did it right? - since it's out-of-tree it
doesn't get as much testing.)

The set of static tracepoints (or markers) that is envisioned is in the
range of about 30 to 40 key kernel events. Dynamic tracepoints would
be used for other stuff.

I don't want to offend you, but I suspect your usage model for tracepoints
is different from what the expected (and historical) usage model
would be for LTTng-style static tracepoints.

>
> static tracepoints are inferior to dynamic tracepoints in almost every
> way.
>
>> [...] Both have their advantages and disadvantages and just hammering
>> on the possible problems of static ones [...]
>
> how about giving a line by line rebuttal to the very real problems of
> static tracepoints i listed (twice already), instead of calling them
> "possible problems"?

I respect your experience, but I think it would be more productive
to have this debate when a patch is submitted with a static tracepoint (or marker)
implementation. The patch in question, if I understand correctly, provides
infrastructure for tracing activities and should hopefully be useful for
either static or dynamic tracepoints. I'm hoping someone from the SystemTAP
camp can speak up and give their opinion on whether this is useful. If it is,
then the whole debate about static vs. dynamic tracepoints is less important.
If not, then that's a different debate.

I maintain Kernel Function Trace (KFT) out-of-tree. This is a system which
uses compiler flags to instrument every kernel function entry and exit. For obvious
reasons this type of instrumentation is used only during development, but it has
proven quite handy for certain development tasks (finding long-duration routines and
finding bloated call sequences). I can imagine KFT using the infrastructure
that is provided by the LTTng-core patch (and relinquishing my own infrastructure
for activation, trace control, event handling etc.)
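
For readers unfamiliar with that approach: compiler-flag-based
instrumentation in the KFT style essentially relies on GCC's
-finstrument-functions, which emits calls to two well-known hooks at every
function entry and exit. The small userspace sketch below is illustrative
only (it is not KFT's actual code).

/* Build: gcc -finstrument-functions sketch.c -o sketch */
#include <stdio.h>

/* The hooks themselves must not be instrumented, or they would recurse. */
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
        fprintf(stderr, "enter %p (called from %p)\n", this_fn, call_site);
}

__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
        fprintf(stderr, "exit  %p\n", this_fn);
}

static int work(int x)
{
        return x * 2;           /* entry and exit of work() get logged */
}

int main(void)
{
        printf("result: %d\n", work(21));
        return 0;
}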

Regards,
-- Tim

=============================
Tim Bird
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Electronics
=============================

2006-09-14 19:40:56

by Frank Ch. Eigler

Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Mathieu Desnoyers <[email protected]> writes:

> [...] However, I restate that my position is that both static and
> dynamic instrumentation of the kernel are a necessity and that a
> tracer core should be usable by both.

On a complementary note, it would be nice if whatever static
instrumentation hooks are deemed worthwhile were themselves generic, so
they could be coupled to either a fixed or dynamic "core" or back-end.

- FChE

2006-09-14 19:47:56

by Roman Zippel

Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Hi,

On Thu, 14 Sep 2006, Ingo Molnar wrote:

> > > for me these are all _independent_ grounds for rejection, as a generic
> > > kernel infrastructure.
> >
> > Tracepoints of course need to be managed, but that's true for both
> > dynamic and static tracepoints. [...]
>
> that's not true, and this is the important thing that i believe you are
> missing. A dynamic tracepoint is _detached_ from the normal source code
> and thus is zero maintainance overhead. You dont have to maintain it
> during normal development - only if you need it. You dont see the
> dynamic tracepoints in the source code.
>
> a static tracepoint, once it's in the mainline kernel, is a nonzero
> maintainance overhead _until eternity_.

I hope you do realize that this is a rather selfish point of view. The
zero maintenance overhead is a myth - it's only zero because _you_ don't
have to do it.
OTOH maintaining the trace points along with the corresponding source is
barely noticeable noise and is certainly less work than having to
maintain them separately.

> It is a constant visual
> hindrance and a constant build-correctness and boot-correctness problem
> if you happen to change the code that is being traced by a static
> tracepoint. Again, I am talking out of actual experience with static
> tracepoints: i frequently break my kernel via static tracepoints and i
> have constant maintainance cost from them.

Sorry, but you're not the only one with actual experience, and in my
experience the value far outweighs the occasional need for adjustments. If
you don't use them, they are of course a nuisance, but is your personal
dislike really reason enough to deny others a useful tool?

> i am giving a line by line rebuttal of all arguments that come up.
> Please be fair and do the same. Here are the arguments again, for a
> third time. Thanks!

Ingo, maybe you should try to understand the point I'm trying to make?
You mostly emphasize your personal dislike of static tracepoints.

> > > also, the other disadvantages i listed very much count too. Static
> > > tracepoints are fundamentally limited because:
> > >
> > > - they can only be added at the source code level
> > >
> > > - modifying them requires a reboot which is not practical in a
> > > production environment
> > >
> > > - there can only be a limited set of them, while many problems need
> > > finegrained tracepoints tailored to the problem at hand
> > >
> > > - conditional tracepoints are typically either nonexistent or very
> > > limited.

Sorry, but I fail to see the point you're trying to make (besides your
personal preferences); none of this is an unsolvable problem that would
prevent making good use of static tracepoints.

> > > the kprobes infrastructure, despite being fairly young, is widely
> > > available: powerpc, i386, x86_64, ia64 and sparc64. The other
> > > architectures are free to implement them too, there's nothing
> > > hardware-specific about kprobes and the "porting overhead" is in
> > > essence a one-time cost - while for static tracepoints the
> > > maintainance overhead goes on forever and scales linearly with the
> > > number of tracepoints added.
> >
> > kprobes are not trivial to implement [...]
>
> nor are smp-alternatives, which was suggested as a solution to reduce
> the overhead of static tracepoints. So what's the point? It's a one-off
> development overhead that has already been done for all the major
> arches. If another arch needs it they can certainly implement it.

Static tracepoints don't have to be implemented via alternatives, and
you continue to ignore that kprobes are nontrivial and that both can
coexist just fine. You just want to force your personal
preferences onto others. :-(

> it's like arguing against ptrace on the grounds of: "application
> developers can add printf if they want to debug their apps, or they can
> add static tracepoints too, and besides, ptrace is hard to implement".

Sorry, I don't understand this point. Ptrace support would match kernel
gdb support, which would be a completely different discussion...

> > I also think you highly exaggerate the maintaince overhead of static
> > tracepoints, once added they hardly need any maintainance, most of the
> > time you can just ignore them. [...]
>
> hundreds (or possibly thousands) of tracepoints? Have you ever tried to
> maintain that? I have and it's a nightmare.

_This_ discussion is about a core set of trace points! Yes, you can have
thousands of trace points in drivers, but they don't have to be enabled by
default and are no reason at all against a few core trace points, which
can be used by _all_ archs to trace core events as _cheaply_ as possible.

> Even assuming a rich set of hundreds of static tracepoints, it doesnt
> even solve the problems at hand: people want to do much more when they
> probe the kernel - and today, with DTrace under Solaris people _know_
> that much better tracing _can be done_, and they _demand_ that Linux
> adopts an intelligent solution. The clock is ticking for dinosaurs like
> static printks and static tracepoints to debug the kernel...

Huh? How exactly do static tracepoints prevent you from doing this?
Different problems require different solutions; nobody is taking Kprobes
away, but why should Kprobes be the only solution?

bye, Roman

2006-09-14 19:48:59

by Frank Ch. Eigler

Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

"Martin J. Bligh" <[email protected]> writes:

> [...] What would be really nice is one trace infrastructure, that
> allowed both static and dynamic tracepoints

We in systemtap land hope to encounter *some* static tracepoint
structure, perhaps like the one I presented at OLS, via which
systemtap could become your unified static+dynamic "infrastructure".
Even in that universe, using LTT-derived code for high-performance
tracing is within the realm of reason.

> without all the awk-style language crap that seems to come with
> systemtap.

I'm sorry to hear you dislike the scripting language. But that's
okay: you Real Men can embed literal C code inside systemtap scripts
to do the Real Work, and leave to systemtap only sundry duties such as
probe placement and removal.

- FChE

2006-09-14 20:03:53

by Martin Bligh

Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Ingo Molnar wrote:
> * Martin J. Bligh <[email protected]> wrote:
>
>
>>>>Comments and reviews are very welcome.
>>>
>>>i have one very fundamental question: why should we do this
>>>source-intrusive method of adding tracepoints instead of the
>>>dynamic, unintrusive (and thus zero-overhead) KProbes+SystemTap
>>>method?
>>
>>Because:
>>
>>1. Kprobes are more overhead when they *are* being used.
>
>
> minimally so - at least on i386 and x86_64. In that sense tracing is a
> _slowpath_, and it _will_ slow things down if done excessively. I dont
> care about the tracepoint being slower by a few instructions as long as
> it has _zero effect_ on normal code, be that source code or binary code.

It would be interesting to see some measurements. But jumping is slower
than a simple branch (or than noops to skip over, which can be overwritten).

>>2. You can get zero overhead by CONFIG'ing things out.
>
> but that's not how a fair chunk of people want to use tracing. People
> (enterprise customers trying to figure out performance problems,
> engineers trying to debug things on a live, production system) want to
> be able to insert a tracepoint anywhere and anytime - and also they want
> to have zero overhead from tracing if no tracepoints are used on a
> system.

I'm fine with that ... "a fair chunk of people" - but it's not everyone,
by any means. We need both static and dynamic tracepoints, in one
infrastructure.

>>3. (most importantly) it's a bitch to maintain tracepoints out
>> of-tree on a rapidly moving kernel
>
> wrong: the original demo tracepoints that came with SystemTap still work
> on the current kernel, because the 'coupling' is loose: based on
> function names.

And what do those trace? I bet not half the stuff we want to do.
I've been migrating Google's tracepoints around between different
kernel versions, and it's not a mechanical port. Just stupid things
like renaming of functions inside memory reclaim create pain, for
starters. (shrink_cache/shrink_list, refill_inactive_zone, etc).

> Static tracepoints on the other hand, if added via an external patch, do
> depend on the target function not moving around and the context of the
> tracepoint not being changed. (and static tracepoints if in the source
> all the time are a constant hindrance to development and code
> readability.)

an external patch is, indeed, pretty useless. Merging a few simple
tracepoints should not be a problem - see blktrace and schedstats,
for instance.

> and of course the big advantage of dynamic probing is its flexibility:
> you can add add-hoc tracepoints to thousands of functions, instead of
> having to maintain hundreds (or thousands) of static tracepoints all the
> time. (and if we wont end up with hundreds/thousands of static
> tracepoints then it wont be usable enough as a generic solution.)

I wasn't saying that dynamic tracepoints are useless - I agree it's
valuable to add stuff on the fly. But some things are better done
statically.

>>4. I believe kprobes still doesn't have full access to local
>>variables.
>
> wrong: with SystemTap you can probe local variables too (via
> jprobes/kretprobes, all in the upstream kernel already).

I'll look again, but last time I looked it didn't do this, and
when I spoke to the kprobes/systemtap people at OLS, IIRC they
said it still couldn't.

>>Now (3) is possibly solvable by putting the points in as no-ops
>>(either insert a few nops or just a marker entry in the symbol
>>table?), but full dynamic just isn't sustainable. [...]
>
> i'm not sure i follow. Could you explain where SystemTap has this
> difficulty?

If you have an extremely limited set of probes, on a static area
of the kernel, then yes, they may work for a long time. But try
tracing something like the scheduler, which people seem to delight
in rewriting every month or two ...

It amuses me that we're so opposed to external patches to the tree
(for perfectly understandable reasons), but we somehow think tracepoints
are magically different and should be maintained out of tree. You
yourself made the argument that it's a maintenance burden to keep the
trace points *in* the tree ... if that's true, how is it any easier to
keep them outside of the tree?

If we really want to, we can still keep the hooks inside the code,
and have them do absolutely nothing at all - putting markers into
the symbol table is pretty much free. It also reuses the well-structured
code-sharing mechanism we already have in place - the Linux kernel tree.

I really don't want to deal with all the systemtap crap - I just
want something that works, and I don't particularly care if I have
to recompile the kernel to get it. I know that doesn't suit everyone,
but there are requirements on both sides, and we should not dismiss
each other's requirements out of hand.

Having one consistent collection mechanism for all these
different types of tracing data seems both logical and very important
to me ...

M.

2006-09-14 20:09:37

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Tim Bird <[email protected]> wrote:

> > that's not true, and this is the important thing that i believe you
> > are missing. A dynamic tracepoint is _detached_ from the normal
> > source code and thus is zero maintainance overhead. You dont have to
> > maintain it during normal development - only if you need it. You
> > dont see the dynamic tracepoints in the source code.
>
> It's only zero maintenance overhead for you. Someone has to maintain
> it. The party line for years has been that in-tree maintenance is
> easier than out-of-tree maintenance.

There's a third option, and that's the one i'm advocating: adding the
tracepoint rules to the kernel, but in a _detached_ form from the actual
source code.

yes, someone has to maintain it, but that will be a detached effort, on
a low-frequency as-needed basis. It doesnt slow down or hinder
high-frequency fast prototyping work, it does not impact the source code
visually, and it does not make reading the code harder. Furthermore,
while a single broken LTT tracepoint prevents the kernel from building
at all, a single broken dynamic rule just wont be inserted into the
kernel. All the other rules are still very much intact.

Ingo

2006-09-14 20:23:31

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Martin Bligh <[email protected]> wrote:

> an external patch is, indeed, pretty useless. Merging a few simple
> tracepoints should not be a problem [...]

the problem is, LTT is not about a 'few' tracepoints: it adds a whopping
350 tracepoints, a fair portion of which are multi-line with tons of
arguments.

$ diffstat patch-2.6.17-lttng-0.5.108-instrumentation*
98 files changed, 1450 insertions(+), 64 deletions(-)

saying "it's just a few lightweight tracepoints" misses two points: it's
not just a few, and it's not lightweight.

and the set of tracepoints never gets smaller. People who start to rely
on a tracepoint will scream bloody murder if it goes away or breaks.
Static tracepoints are a maintenance PITA that will rarely get smaller,
and will easily grow ...

> [...] - see blktrace and schedstats, for instance.

yes, i do want to remove the 34 schedstats tracepoints too, once a
feasible alternative is present. I already have to do two compilations
when changing something substantial in the scheduler - once with and
once without schedstats.

same for blktrace: once SystemTap can provide a compatible replacement,
it should.

> It amuses me that we're so opposed to external patches to the tree
> (for perfectly understandable reasons), but we somehow think
> tracepoints are magically different and should be maintained out of
> tree somehow.

i think you misunderstood what i meant. SystemTap should very much be
integrated into the kernel proper, but i dont think the _rules_ (and
scripts) should become part of the _source code files themselves_. So
yes, there's advantage to kernel integration, but there's disadvantage
to littering the kernel source with countless static tracepoints, if
dynamic tracepoints can offer the same benefits (or more).

the question is: what is more maintenance, hundreds of static
tracepoints (with long parameter lists) all around the (core) kernel, or
hundreds of detached dynamic rules that need an update every now and
then? [but of which most would still be usable even if some of them
"broke"] To me the answer is clear: having hundreds of tracepoints
_within_ the source code is higher cost. But please prove me wrong :-)

Ingo

2006-09-14 20:25:59

by Martin Bligh

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

> if there are lots of tracepoints (and the union of _all_ useful
> tracepoints that i ever encountered in my life goes into the thousands)
> then the overhead is not zero at all.
>
> also, the other disadvantages i listed very much count too. Static
> tracepoints are fundamentally limited because:
>
> - they can only be added at the source code level
>
> - modifying them requires a reboot which is not practical in a
> production environment
>
> - there can only be a limited set of them, while many problems need
> finegrained tracepoints tailored to the problem at hand
>
> - conditional tracepoints are typically either nonexistent or very
> limited.
>
> for me these are all _independent_ grounds for rejection, as a generic
> kernel infrastructure.

I don't think anyone is saying that static tracepoints do not have their
limitations, or that dynamic tracepointing is useless. But that's not
the point ... why can't we have one infrastructure that supports both?
Preferably in a fairly simple, consistent way.

M.

2006-09-14 20:33:05

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Roman Zippel <[email protected]> wrote:

> > > > also, the other disadvantages i listed very much count too. Static
> > > > tracepoints are fundamentally limited because:
> > > >
> > > > - they can only be added at the source code level
> > > >
> > > > - modifying them requires a reboot which is not practical in a
> > > > production environment
> > > >
> > > > - there can only be a limited set of them, while many problems need
> > > > finegrained tracepoints tailored to the problem at hand
> > > >
> > > > - conditional tracepoints are typically either nonexistent or very
> > > > limited.
>
Sorry, but I fail to see the point you're trying to make (besides your
personal preferences); none of this is an unsolvable problem that would
prevent making good use of static tracepoints.

those are technical arguments - i'm not sure how you can understand them
to be "personal preferences". The only personal preference i have is
that in the end a technically most superior solution should be merged.
(be that one project or the other, or a hybrid of the two) The analysis
of which one is a better solution depends on pros and cons - exactly
like the ones listed above. If they are solvable problems then please
let me know how you would solve them and when you (or others) would
solve them, preferably before merging the code. Right now they are
pretty heavy cons as far as LTT goes, so obviously they have a primary
impact on the topic at hand (which is whether to merge LTT or not).

Ingo

2006-09-14 20:36:19

by Karim Yaghmour

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


Ingo Molnar wrote:
> There's a third option, and that's the one i'm advocating: adding the
> tracepoint rules to the kernel, but in a _detached_ form from the actual
> source code.
>
> yes, someone has to maintain it, but that will be a detached effort, on
> a low-frequency as-needed basis. It doesnt slow down or hinder
> high-frequency fast prototyping work, it does not impact the source code
> visually, and it does not make reading the code harder. Furthermore,
> while a single broken LTT tracepoint prevents the kernel from building
> at all, a single broken dynamic rule just wont be inserted into the
> kernel. All the other rules are still very much intact.

Actually the way ltt used to add its trace statements is again an
implementation issue. Broken tracepoints need not lead to kernel
build failure.

That's where the markers idea can be useful. What a marker should
do is simply provide a location. It doesn't need to specify the
variables being observed or anything local, though that doesn't mean
the infrastructure shouldn't allow for this if the maintainer of the
code wants to.

Ideally, though, markers should be self-contained. IOW, the person
implementing such a marker should not need to edit any other file
than the one being worked on to add an instrumentation point --
at least that's the way I think is easiest. What this means is that
you would be able to add an instrumentation point in the kernel,
build it, run the tracing and view the trace with your new event
without any further intervention on any tool, header, or anything
else.

The only way that I believe this can be done is with a flexible
marker infrastructure that has a few basic properties:
- Markers should be inlined (clearly this is the bone of contention
  at this point of the thread.)
- By default, markers should not generate a single instruction or
  modify any instruction path that would otherwise be generated if
  the instrumentation were not there.
- Allow the person instrumenting to specify which variables they
  are interested in without any possibility of build failure should
  the code change make the variable obsolete.
- Build options should be added allowing users to:
  - Keep instrumentation disabled.
  - Create inlined trace points.
  - Create dynamic instrumentation markers.
  - Automatically generate appropriate information required for
    tools to be able to deal with the new instrumentation and/or
    display new information properly -- possibly in a new section
    of the binary.
- etc.

Again, the goal is to have the loop from instrumentation to
visualization as simple as possible. Any instrumentation requiring
more than a single-file modification is bound to fall into bitrot,
and fast.
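
To make this concrete, here is a minimal sketch of what such a marker
could boil down to on x86-64 (MARK(), CONFIG_MARKERS and the "__markers"
section are names invented just for this sketch, not code from any
existing patch):

/*
 * Illustrative sketch only.  With CONFIG_MARKERS off, the marker
 * compiles to nothing at all; with it on, it emits a 5-byte NOP and
 * records the site's address and name in a separate section, so a
 * dynamic tracer can later patch a call instruction over the NOP.
 */
#ifdef CONFIG_MARKERS
#define MARK(name)							\
	do {								\
		asm volatile("1:\n\t"					\
			     ".byte 0x0f, 0x1f, 0x44, 0x00, 0x00\n\t"	\
			     ".pushsection __markers, \"a\"\n\t"	\
			     ".quad 1b\n\t"				\
			     ".asciz \"" #name "\"\n\t"			\
			     ".popsection");				\
	} while (0)
#else
#define MARK(name) do { } while (0)
#endif

A call site is then a single self-contained line, e.g.
MARK(process_wakeup);, and tools discover it through the __markers
section rather than through anything else in the C file.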

Hope this helps.

Thanks,

Karim

2006-09-14 20:40:44

by Martin Bligh

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Ingo Molnar wrote:
> * Martin Bligh <[email protected]> wrote:
>
>
>>an external patch is, indeed, pretty useless. Merging a few simple
>>tracepoints should not be a problem [...]
>
>
> the problem is, LTT is not about a 'few' tracepoints: it adds a whopping
> 350 tracepoints, a fair portion of it is multi-line with tons of
> arguments.

"static tracepoints" does not equate directly to "all of LTT". I'm not
saying we should accept LTT as-is. I'm saying we should not reject the
concept of static tracepoints.

> $ diffstat patch-2.6.17-lttng-0.5.108-instrumentation*
> 98 files changed, 1450 insertions(+), 64 deletions(-)
>
> saying "it's just a few lightweight tracepoints" misses two points: it's
> not just a few, and it's not lightweight.
>
> and the set of tracepoints never gets smaller. People who start to rely
> on a tracepoint will scream bloody murder if it goes away or breaks.
> Static tracepoints are a maintainance PITA that will rarely get smaller,
> and will easily grow ...

If people are *using* them, it's no easier to maintain them outside of
the tree than in-tree. It's significantly harder.

>>[...] - see blktrace and schedstats, for instance.
>
> yes, i do want to remove the 34 schedstats tracepoints too, once a
> feasible alternative is present. I already have to do two compilations
> when changing something substantial in the scheduler - once with and
> once without schedstats.
>
> same for blktrace: once SystemTap can provide a compatible replacement,
> it should.

Your argument about schedstats only seems to illustrate the flaws in the
arguments for dynamic tracepointing - you've put your finger on exactly
what the problem is: when the code changes, the tracing HAS to change
too. The best time to do this is when the code itself changes.

It's the same argument as for putting documentation in the C file,
right next to the source itself.

>>It amuses me that we're so opposed to external patches to the tree
>>(for perfectly understandable reasons), but we somehow think
>>tracepoints are magically different and should be maintained out of
>>tree somehow.
>
> i think you misunderstood what i meant. SystemTap should very much be
> integrated into the kernel proper, but i dont think the _rules_ (and
> scripts) should become part of the _source code files themselves_. So
> yes, there's advantage to kernel integration, but there's disadvantage
> to littering the kernel source with countless static tracepoints, if
> dynamic tracepoints can offer the same benefits (or more).

If you're talking about the scriptable awk-like "stuff" that comes with
Systemtap, yes, I agree it should not be in the C code; it's foul.
However, I don't think simple macro hooks are a burden.

> the question is: what is more maintainance, hundreds of static
> tracepoints (with long parameter lists) all around the (core) kernel, or
> hundreds of detached dynamic rules that need an update every now and
> then? [but of which most would still be usable even if some of them
> "broke"] To me the answer is clear: having hundreds of tracepoints
> _within_ the source code is higher cost. But please prove me wrong :-)

How can you possibly say that maintaining the same set of data in two
decoupled trees is easier than doing it in the same place? You don't
require any *less* information to do it with systemtap than you do with
some form of static tracing.

If you're talking about the effort of maintaining just what's in the
kernel tree, then of course it's a little easier, but that's only half
the equation. And I don't think it's much of a burden, frankly. Yes,
if we have 2 billion tracepoints, it'll be a pain in the arse, but the
taste of the subsystem maintainers is what would regulate this, along
with everything else that we do. They'll accept a few important ones,
and reject the rest. If it's not valuable in general, they won't take
it. I don't see what the big problem is.

What *is* a problem is having two separate mechanisms for doing
dynamic and static tracing. They should share the same logging
facilities and readback mechanisms so we can read both types
consistently from userspace, and the data is correctly interspersed.

M.

2006-09-14 20:42:37

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Martin Bligh <[email protected]> wrote:

> >if there are lots of tracepoints (and the union of _all_ useful
> >tracepoints that i ever encountered in my life goes into the thousands)
> >then the overhead is not zero at all.
> >
> >also, the other disadvantages i listed very much count too. Static
> >tracepoints are fundamentally limited because:
> >
> > - they can only be added at the source code level
> >
> > - modifying them requires a reboot which is not practical in a
> > production environment
> >
> > - there can only be a limited set of them, while many problems need
> > finegrained tracepoints tailored to the problem at hand
> >
> > - conditional tracepoints are typically either nonexistent or very
> > limited.
> >
> >for me these are all _independent_ grounds for rejection, as a generic
> >kernel infrastructure.
>
> I don't think anyone is saying that static tracepoints do not have
> their limitations, or that dynamic tracepointing is useless. But
> that's not the point ... why can't we have one infrastructure that
> supports both? Preferably in a fairly simple, consistent way.

primarily because i fail to see any property of static tracers that are
not met by dynamic tracers. So to me dynamic tracers like SystemTap are
a superset of static tracers.

So my position is that what we should concentrate on is to make the life
of dynamic tracers easier (be that a handful of generic, parametric
hooks that gather debuginfo information and add NOPs for easy patching),
while realizing that static tracers have no advantage over dynamic
tracers.

i.e. why add infrastructure for the sake of something that is clearly
inferior? I have no problem with adding infrastructure for SystemTap,
but i am asking the question: is it worth adding a static tracer?

I would of course accept static tracers too if someone proved that
they offer something that dynamic tracers cannot do.

(Just like i would accept the reintroduction of the Big Kernel Lock too,
if someone proved that it's the right thing to do.)

Ingo

2006-09-14 20:54:45

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Hi,

On Thu, 14 Sep 2006, Ingo Molnar wrote:

> those are technical arguments - i'm not sure how you can understand them
> to be "personal preferences". The only personal preference i have is
> that in the end a technically most superior solution should be merged.

Ingo, so far you have made not a single argument why they can't coexist
except for your personal dislike.

bye, Roman

2006-09-14 20:55:47

by Martin Bligh

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Ingo Molnar wrote:
> * Martin Bligh <[email protected]> wrote:
>
>
>>>if there are lots of tracepoints (and the union of _all_ useful
>>>tracepoints that i ever encountered in my life goes into the thousands)
>>>then the overhead is not zero at all.
>>>
>>>also, the other disadvantages i listed very much count too. Static
>>>tracepoints are fundamentally limited because:
>>>
>>> - they can only be added at the source code level
>>>
>>> - modifying them requires a reboot which is not practical in a
>>> production environment
>>>
>>> - there can only be a limited set of them, while many problems need
>>> finegrained tracepoints tailored to the problem at hand
>>>
>>> - conditional tracepoints are typically either nonexistent or very
>>> limited.
>>>
>>>for me these are all _independent_ grounds for rejection, as a generic
>>>kernel infrastructure.
>>
>>I don't think anyone is saying that static tracepoints do not have
>>their limitations, or that dynamic tracepointing is useless. But
>>that's not the point ... why can't we have one infrastructure that
>>supports both? Preferably in a fairly simple, consistent way.
>
>
> primarily because i fail to see any property of static tracers that are
> not met by dynamic tracers. So to me dynamic tracers like SystemTap are
> a superset of static tracers.

1. They're harder to maintain out of tree.
2. They're written in some gibberish awk crap.
3. They're slower. If you're doing thousands of tracepoints a second,
   into a circular 8GB log buffer, that *does* matter. You want
   to perturb what you're measuring as little as possible.

If you're running across thousands of systems, in live production, in
order to catch a rare race condition, the performance does matter.

> So my position is that what we should concentrate on is to make the life
> of dynamic tracers easier (be that a handful of generic, parametric
> hooks that gather debuginfo information and add NOPs for easy patching),
> while realizing that static tracers have no advantage over dynamic
> tracers.

I'm confused. You're saying that the dynamic tracers need help by
adding some static data to the kernel, and yet at the same time
rejecting static additions to the kernel on the grounds they have
no value???

Perhaps we're just meaning different things by static tracing. To me,
what is important is that there is a well-defined place in the source
code where the data that needs to be logged, and the exact place to log
it at, are defined. If all that macro does to the compilation is add
a couple of NOPs and make an entry in the symbol table, or other debug
data, for something to hook into later, that's *fine*. The point is
to maintain the location and the intelligence about *what* to trace.

Perhaps I'm calling that static, and you're calling it dynamic? Would
explain why we're disagreeing ;-) Seems to be exactly what you're
suggesting above?

If we want it to be superfast, we could compile with a different config
option to insert some tracing statically in there or something, but I
agree it should not be the default.

> i.e. why add infrastructure for the sake of something that is clearly
> inferior? I have no problem with adding infrastructure for SystemTap,
> but i am asking the question: is it worth adding a static tracer?

Yes ;-) Realise that your usage model is not exactly the same as
everyone else's, and I don't give a damn if I have to recompile. I
realise other people do, but ....

> I would of course accept static tracers too if someone proved it that
> they offer something that dynamic tracers cannot do.

Can you *really* trace *any* variable (stack variables, etc) at *any*
point within *any* function with kprobes? It didn't do that before,
and I find it hard to see how it could, given compiler optimizations,
etc.

> (Just like i would accept the reintroduction of the Big Kernel Lock too,
> if someone proved it that it's the right thing to do.)

Surely it's still there at the moment? ;-)

M.

2006-09-14 21:02:47

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Hi,

On Thu, 14 Sep 2006, Ingo Molnar wrote:

> > It's only zero maintenance overhead for you. Someone has to maintain
> > it. The party line for years has been that in-tree maintenance is
> > easier than out-of-tree maintenance.
>
> There's a third option, and that's the one i'm advocating: adding the
> tracepoint rules to the kernel, but in a _detached_ form from the actual
> source code.
>
> yes, someone has to maintain it, but that will be a detached effort, on
> a low-frequency as-needed basis. It doesnt slow down or hinder
> high-frequency fast prototyping work, it does not impact the source code
> visually, and it does not make reading the code harder. Furthermore,
> while a single broken LTT tracepoint prevents the kernel from building
> at all, a single broken dynamic rule just wont be inserted into the
> kernel. All the other rules are still very much intact.

This pretty much contradicts existing experience; most core events are
rather static - a schedule event is a schedule event no matter how the
actual scheduler is implemented.
Separate tracepoints are like separate documentation: they are forgotten
by the developers, who could easily keep them up to date if they were
close to the source.

bye, Roman

2006-09-14 21:05:51

by Michel Dagenais

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


> the question is: what is more maintainance, hundreds of static
> tracepoints (with long parameter lists) all around the (core) kernel, or
> hundreds of detached dynamic rules that need an update every now and
> then? [but of which most would still be usable even if some of them
> "broke"] To me the answer is clear: having hundreds of tracepoints
> _within_ the source code is higher cost. But please prove me wrong :-)

Actually I rarely find that any of the 70,000 printks are such a huge
nuisance to code readability. They may even help you understand what is
going on in a code area you are less familiar with.

2006-09-14 21:07:42

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Hi,

On Thu, 14 Sep 2006, Ingo Molnar wrote:

> primarily because i fail to see any property of static tracers that are
> not met by dynamic tracers. So to me dynamic tracers like SystemTap are
> a superset of static tracers.

You keep ignoring that a dynamic tracer is nontrivial... :-(
A static tracer is easy to implement and sufficient for many uses, and,
most importantly, it doesn't prevent anyone from using a dynamic tracer.
Having a choice is good!

bye, Roman

2006-09-14 21:08:39

by Daniel Walker

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

On Thu, 2006-09-14 at 22:54 +0200, Roman Zippel wrote:
> Hi,
>
> On Thu, 14 Sep 2006, Ingo Molnar wrote:
>
> > those are technical arguments - i'm not sure how you can understand them
> > to be "personal preferences". The only personal preference i have is
> > that in the end a technically most superior solution should be merged.
>
> Ingo, so far you have made not a single argument why they can't coexist
> except for your personal dislike.

Not to put too fine a point on it, but I think there's not a small number
of us that "prefer" the best solution.

Daniel

2006-09-14 21:30:45

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Hi,

On Thu, 14 Sep 2006, Daniel Walker wrote:

> > Ingo, so far you have made not a single argument why they can't coexist
> > except for your personal dislike.
>
> Not to put to fine a point on it, but I think there's not a small number
> of us that "prefer" the best solution.

You can have it.
OTOH I would also like to know what's going on in my m68k kernel without
having to implement some rather complex infrastructure, which I don't need
otherwise. There hasn't been a single argument so far why we can't have
both.

bye, Roman

2006-09-14 21:39:38

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Martin Bligh <[email protected]> wrote:

> > primarily because i fail to see any property of static tracers that
> > are not met by dynamic tracers. So to me dynamic tracers like
> > SystemTap are a superset of static tracers.
>
> 1. They're harder to maintain out of tree.

as i mentioned before, SystemTap should be in tree. Relayfs was added
for the sake of SystemTap, for example; i have no problem with moving
SystemTap into the tree either.

> 2. they're written in some jibberish awk crap

You can write embedded-C SystemTap scripts too. There's an "EMBEDDED C"
section in "man stap".

> 3. They're slower. If you're doing thousands of tracepoints a second,
> into a circular 8GB log buffer, that *does* matter. You want
> to peturb what you're measuring as little as possible.

i very much agree that they should become as fast as possible. So to
rephrase the question: can we make dynamic tracepoints as fast (or
nearly as fast) as static tracepoints? If yes, should we care about
static tracers at all?

> >So my position is that what we should concentrate on is to make the life
> >of dynamic tracers easier (be that a handful of generic, parametric
> >hooks that gather debuginfo information and add NOPs for easy patching),
> >while realizing that static tracers have no advantage over dynamic
> >tracers.
>
> I'm confused. You're saying that the dynamic tracers need help by
> adding some static data to the kernel, and yet at the same time
> rejecting static additions to the kernel on the grounds they have no
> value???

no. I'm saying that dynamic tracers are fundamentally more advanced, and
that _iff_ we are to add static info to the kernel we should add it _for
the sole sake of speeding up dynamic tracers_. If static tracers can
live off the same hooks then fine, but we should architect primarily for
the needs of the dynamic tracers.

> Perhaps we're just meaning different things by static tracing. To me,
> what is important is that there is a well-defined place in the source
> code where the data needed to be logged, and the exact place to log it
> at, is defined. If all that macro does to the compilation is add a
> couple of nops, and make an entry in a symbol data, or other debug
> data, for something to hook into later that's *fine*. The point is to
> maintain the location and intelligence about *what* to trace.

ok. For me 'static tracepoints' are like the sort of stuff that LTT
adds: funky function names littering the tree.

i see the point behind 'data extraction point' hooks mentioned by you as
a compromise, which incidentally will also speed up dynamic tracepoints
to the level of static tracepoints. But they should be very much
constructed as data extraction points for the purposes of dynamic
tracers. (which the LTT hooks currently are not)

> If we want it to be superfast, we could compile with a different
> config option to insert some tracing statically in there or something,
> but I agree it should not be the default.

for a dynamic tracer all that is needed is a 5-byte NOP (even on
64-bit), and the availability of all the data. Maybe even a function
call that can be patched out after bootup, with NOPs. But the current
LTT stuff has lots of inlined crap that just bloats the kernel.

> >i.e. why add infrastructure for the sake of something that is clearly
> >inferior? I have no problem with adding infrastructure for SystemTap,
> >but i am asking the question: is it worth adding a static tracer?
>
> Yes ;-) Realise that your usage model is not exactly the same as
> everyone else's, and I don't give a damn if I have to recompile. I
> realise other people do, but ....

So you dont care about recompiling: that's fine - but others care, so as
long as all your needs are met (which we are working on meeting :-) then
we'll go for the solution that is better - instead of having some dual
debugging infrastructure.

> > (Just like i would accept the reintroduction of the Big Kernel Lock
> > too, if someone proved it that it's the right thing to do.)
>
> Surely it's still there at the moment? ;-)

no - at least for me it's the Big Kernel Semaphore ;-)

Ingo

2006-09-14 22:24:01

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Roman Zippel <[email protected]> wrote:

> Hi,
>
> On Thu, 14 Sep 2006, Daniel Walker wrote:
>
> > > Ingo, so far you have made not a single argument why they can't coexist
> > > except for your personal dislike.
> >
> > Not to put to fine a point on it, but I think there's not a small number
> > of us that "prefer" the best solution.
>
> You can have it.
> OTOH I would also like to know what's going in my m68k kernel without
> having to implement some rather complex infrastructure, which I don't
> need otherwise. There hasn't been a single argument so far, why we
> can't have both.

the argument is very simple: LTT creates strong coupling; it is almost a
set of 350+ system calls moved into the heart of the kernel. Once moved
in, it's very hard to remove. "Why did you remove that trace
information, you broke my LTT script!"

While with SystemTap the coupling is a lot smaller. With dynamic tracing
there's no _fundamental requirement_ for _any_ tracepoint to be in the
source code, hence we have the present and future flexibility to
eliminate most of them. So my point is: shape all the static tracepoints
in a "provide data to dynamic tracers" way. If they are removed (which
we should have the freedom to do), the removal is not a showstopper.

Flexibility of future choices, especially for user/developer-visible
features, is one of the most important factors of kernel maintenance.

Ingo

2006-09-14 22:25:45

by Martin Bligh

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Ingo Molnar wrote:
> * Martin Bligh <[email protected]> wrote:
>
>
>>>primarily because i fail to see any property of static tracers that
>>>are not met by dynamic tracers. So to me dynamic tracers like
>>>SystemTap are a superset of static tracers.
>>
>>1. They're harder to maintain out of tree.
>
> as i mentioned before, SystemTap should be in tree. Relayfs was added
> for the sake of SystemTap for example, i have no problem with moving
> SystemTap into the tree either.

Right, but I'm not talking about the infrastructure, I'm talking about
the placement of the trace points, and the local variables they need
to access in order to get useful data.

>>2. they're written in some jibberish awk crap
>
> You can write embedded-C SystemTap scripts too. There's an "EMBEDDED C"
> section in "man stap".

OK, that helps - thanks. Will try to find some time to go back and look
again.

>
>>3. They're slower. If you're doing thousands of tracepoints a second,
>> into a circular 8GB log buffer, that *does* matter. You want
>> to peturb what you're measuring as little as possible.
>
> i very much agree that they should become as fast as possible. So to
> rephrase the question: can we make dynamic tracepoints as fast (or
> nearly as fast) as static tracepoints? If yes, should we care about
> static tracers at all?

Depends how many NOPs you're willing to add, I guess. Anything, even
the static tracepoints, really needs at least a branch to be useful,
IMHO. At least for what I've been doing with it, you need to stop
the data flow after a while (when the event you're interested in
happens); I'm using it like a flight data recorder, so we can go back
and do a postmortem on what went wrong. I should imagine branch
prediction makes it very cheap on most modern CPUs, but I don't have
hard data to hand.

OTOH, if you don't know in advance how big the tracing point is
(i.e. what it's having to do within there to log), you have a problem.
I believe the usual way kprobes/systemtap does this is to do a jump
out of line, which is significantly slower. If we could get a good
estimate of how large the trace point was *likely* to be, maybe we
could leave enough space in NOPs inline? OTOH, if we do that a lot,
we end up increasing code size ....

So I suspect the correct compromise is to have macros that normally
are extremely non-invasive, either just entries in a data table (no
code impact) or that plus enough NOPs to do a jump (as I understand
it, you sometimes need the NOPs because it's not always possible to
relocate certain bits of code ... perhaps we can detect when?). But
it *will* be slower at trace time, because we're still jumping.
OTOH, if you want it to be fast, you recompile with the "I actually
need tracing to be superfast" option, and it leaves more space.
Seems to give the best of both worlds, as needed.

>>>So my position is that what we should concentrate on is to make the life
>>>of dynamic tracers easier (be that a handful of generic, parametric
>>>hooks that gather debuginfo information and add NOPs for easy patching),
>>>while realizing that static tracers have no advantage over dynamic
>>>tracers.
>>
>>I'm confused. You're saying that the dynamic tracers need help by
>>adding some static data to the kernel, and yet at the same time
>>rejecting static additions to the kernel on the grounds they have no
>>value???
>
> no. I'm saying that dynamic tracers are fundamentally more advanced, and
> that _iff_ we are to add static info to the kernel we should add it _for
> the sole sake of speeding up dynamic tracers_. If static tracers can
> live off the same hooks then fine, but we should architect primarily for
> the needs of the dynamic tracers.

OK. Not too fussed about the exact details ... would it be fair to say
that you agree that we may need to add *some* instrumentation / hooks
into the codebase in order to locate where and what to trace? Beyond
that, it seems like little bits of implementation detail to me. What
we ended up with was basically:

ktrace(major_type, minor_type, data, ...)

The minor and major types were enums, but given descriptive names, they
actually seem to help, rather than hinder, code readability. I'd send
out the code, but it needs a major cleanup first ;-)

> ok. For me 'static tracepoints' are like the sort of stuff that LTT
> adds: funky function names littering the tree.

I think it can be done in different ways, some cleaner than others.
What's important, to me at least, is that the tags are in tree so that
they are maintained along with the code, and that we can easily get at
all the local variable data, etc. Obviously, beyond that, it should be
as clean and uninvasive as possible. Maybe others have different views,
not sure.

> i see the point behind 'data extraction point' hooks mentioned by you as
> a compromise, which incidentally will also speed up dynamic tracepoints
> to the level of static tracepoints. But they should be very much
> constructed as data extraction points for the purposes of dynamic
> tracers. (which the LTT hooks currently are not)

OK. Not sure I care too much what the purpose is, as long as they tag
where and what needs extracting, people can use them for whatever ...
as handbags to dance round, as far as I care ;-)

>>If we want it to be superfast, we could compile with a different
>>config option to insert some tracing statically in there or something,
>>but I agree it should not be the default.
>
> for a dynamic tracer all that is needed is a 5-byte NOP (even on
> 64-bit), and the availability of all the data. Maybe even a function
> call that can be patched out after bootup, with NOPs. But the current
> LTT stuff has lots of inlined crap that just bloats the kernel.

OK. But I don't think that's inherent to tracing hooks ... sounds like
more of an implementation detail? Worst case, it's a config option as
to whether to put a NOP or inlined stuff in there, if we decide that
the extra speed of not doing a jump may be important?

> So you dont care about recompiling: that's fine - but others care, so as
> long as all your needs are met (which we are working on meeting :-) then
> we'll go for the solution that is better - instead of having some dual
> debugging infrastructure.

Sounds absolutely correct to me. Even if we had some static points, I
think we'd still want the ability to mix both in *one* infrastructure.

>>>(Just like i would accept the reintroduction of the Big Kernel Lock
>>> too, if someone proved it that it's the right thing to do.)
>>
>>Surely it's still there at the moment? ;-)
>
> no - at least for me it's the Big Kernel Semaphore ;-)

Ah, semantics ;-) Fair enough. It still needs to die though ...

M.

2006-09-14 22:31:26

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Michel Dagenais <[email protected]> wrote:

> > the question is: what is more maintainance, hundreds of static
> > tracepoints (with long parameter lists) all around the (core) kernel, or
> > hundreds of detached dynamic rules that need an update every now and
> > then? [but of which most would still be usable even if some of them
> > "broke"] To me the answer is clear: having hundreds of tracepoints
> > _within_ the source code is higher cost. But please prove me wrong :-)
>
> Actually I rarely find that any of the 70 000 printk is such a huge
> nuisance to code readability. They may even help understand what is
> going on in a code area you are less familiar with.

i disagree. Consider the following example from LTT:

int sock_sendmsg(struct socket *sock, struct msghdr *msg, size_t size)
{
	struct kiocb iocb;
	struct sock_iocb siocb;
	int ret;

	trace_socket_sendmsg(sock, sock->sk->sk_family,
			     sock->sk->sk_type,
			     sock->sk->sk_protocol,
			     size);

	init_sync_kiocb(&iocb, NULL);
	iocb.private = &siocb;
	ret = __sock_sendmsg(&iocb, sock, msg, size);
	if (-EIOCBQUEUED == ret)
		ret = wait_on_sync_kiocb(&iocb);
	return ret;
}

what do the 5 extra lines introduced by trace_socket_sendmsg() tell us?
Nothing. They mostly just duplicate the information i already have from
the function declaration. They obscure the clear view of the function:

int sock_sendmsg(struct socket *sock, struct msghdr *msg, size_t size)
{
	struct kiocb iocb;
	struct sock_iocb siocb;
	int ret;

	init_sync_kiocb(&iocb, NULL);
	iocb.private = &siocb;
	ret = __sock_sendmsg(&iocb, sock, msg, size);
	if (-EIOCBQUEUED == ret)
		ret = wait_on_sync_kiocb(&iocb);
	return ret;
}

the resulting visual and structural redundancy hurts.

Ingo

2006-09-14 22:44:28

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Martin Bligh <[email protected]> wrote:

> > i very much agree that they should become as fast as possible. So to
> > rephrase the question: can we make dynamic tracepoints as fast (or
> > nearly as fast) as static tracepoints? If yes, should we care about
> > static tracers at all?
>
> Depends how many nops you're willing to add, I guess. Anything, even
> the static tracepoints really needs at least a branch to be useful,
> IMHO. At least for what I've been doing with it, you need to stop the
> data flow after a while (when the event you're interested in happens,
> I'm using it like a flight data recorder, so we can go back and do
> postmortem on what went wrong). I should imagine branch prediction
> makes it very cheap on most modern CPUs, but don't have hard data to
> hand.

only 5 bytes of NOP are needed by default, so that a kprobe can insert a
call/callq instruction. The easiest way in practice is to insert a
_single_, unconditional function call that is patched out to NOPs upon
its first occurrence (doing this is not a performance issue at all). That
way the only cost is the NOP and the function parameter preparation
side-effects. (which might or might not be significant - with register
calling conventions and most parameters being readily available it
should be small.)
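
a rough sketch of that "patch it out upon first occurrence" idea (all
names below - trace_stub(), tracing_enabled, patch_text(), do_trace() -
are made up, and a real implementation would also need safe cross-CPU
code patching):

#include <linux/types.h>

extern int tracing_enabled;				/* hypothetical */
extern void patch_text(void *addr, const void *opcode, size_t len); /* hypothetical */
extern void do_trace(void *call_site);			/* hypothetical */

static const unsigned char nop5[5] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };

/* Every instrumented site starts out as "call trace_stub".  If tracing
 * is off, the stub overwrites its own 5-byte call with a NOP, so the
 * site is never entered again. */
void trace_stub(void)
{
	/* The call that brought us here is 5 bytes long and ends at
	 * our return address. */
	void *call_site = (char *)__builtin_return_address(0) - 5;

	if (!tracing_enabled)
		patch_text(call_site, nop5, sizeof(nop5));
	else
		do_trace(call_site);
}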

note that such a limited, minimally invasive 'data extraction point'
infrastructure is not actually what the LTT patches are doing. It's not
even close, and i think you'll be surprised. Let me quote from the
latest LTT patch (patch-2.6.17-lttng-0.5.108, which is the same version
submitted to lkml - although no specific tracepoints were submitted):

+/* Event wakeup logging function */
+static inline void trace_process_wakeup(
+ unsigned int lttng_param_pid,
+ int lttng_param_state)
+#if (!defined(CONFIG_LTT) || !defined(CONFIG_LTT_FACILITY_PROCESS))
+{
+}
+#else
+{
+ unsigned int index;
+ struct ltt_channel_struct *channel;
+ struct ltt_trace_struct *trace;
+ void *transport_data;
+ char *buffer = NULL;
+ size_t real_to_base = 0; /* The buffer is allocated on arch_size alignment */
+ size_t *to_base = &real_to_base;
+ size_t real_to = 0;
+ size_t *to = &real_to;
+ size_t real_len = 0;
+ size_t *len = &real_len;
+ size_t reserve_size;
+ size_t slot_size;
+ size_t align;
+ const char *real_from;
+ const char **from = &real_from;
+ u64 tsc;
+ size_t before_hdr_pad, after_hdr_pad, header_size;
+
+ if(ltt_traces.num_active_traces == 0) return;
+
+ /* For each field, calculate the field size. */
+ /* size = *to_base + *to + *len */
+ /* Assume that the padding for alignment starts at a
+ * sizeof(void *) address. */
+
+ *from = (const char*)&lttng_param_pid;
+ align = sizeof(unsigned int);
+
+ if(*len == 0) {
+ *to += ltt_align(*to, align); /* align output */
+ } else {
+ *len += ltt_align(*to+*len, align); /* alignment, ok to do a memcpy of it */
+ }
+
+ *len += sizeof(unsigned int);
+
+ *from = (const char*)&lttng_param_state;
+ align = sizeof(int);
+
+ if(*len == 0) {
+ *to += ltt_align(*to, align); /* align output */
+ } else {
+ *len += ltt_align(*to+*len, align); /* alignment, ok to do a memcpy of it */
+ }
+
+ *len += sizeof(int);
+
+ reserve_size = *to_base + *to + *len;
+ preempt_disable();
+ ltt_nesting[smp_processor_id()]++;
+ index = ltt_get_index_from_facility(ltt_facility_process_2905B6EB,
+ event_process_wakeup);
+
+ list_for_each_entry_rcu(trace, &ltt_traces.head, list) {
+ if(!trace->active) continue;
+
+ channel = ltt_get_channel_from_index(trace, index);
+
+ slot_size = 0;
+ buffer = ltt_reserve_slot(trace, channel, &transport_data,
+ reserve_size, &slot_size, &tsc,
+ &before_hdr_pad, &after_hdr_pad, &header_size);
+ if(!buffer) continue; /* buffer full */
+
+ *to_base = *to = *len = 0;
+
+ ltt_write_event_header(trace, channel, buffer,
+ ltt_facility_process_2905B6EB, event_process_wakeup,
+ reserve_size, before_hdr_pad, tsc);
+ *to_base += before_hdr_pad + after_hdr_pad + header_size;
+
+ *from = (const char*)&lttng_param_pid;
+ align = sizeof(unsigned int);
+
+ if(*len == 0) {
+ *to += ltt_align(*to, align); /* align output */
+ } else {
+ *len += ltt_align(*to+*len, align); /* alignment, ok to do a memcpy of it */
+ }
+
+ *len += sizeof(unsigned int);
+
+ /* Flush pending memcpy */
+ if(*len != 0) {
+ memcpy(buffer+*to_base+*to, *from, *len);
+ *to += *len;
+ *len = 0;
+ }
+
+ *from = (const char*)&lttng_param_state;
+ align = sizeof(int);
+
+ if(*len == 0) {
+ *to += ltt_align(*to, align); /* align output */
+ } else {
+ *len += ltt_align(*to+*len, align); /* alignment, ok to do a memcpy of it */
+ }
+
+ *len += sizeof(int);
+
+ /* Flush pending memcpy */
+ if(*len != 0) {
+ memcpy(buffer+*to_base+*to, *from, *len);
+ *to += *len;
+ *len = 0;
+ }
+
+ ltt_commit_slot(channel, &transport_data, buffer, slot_size);
+
+ }
+
+ ltt_nesting[smp_processor_id()]--;
+ preempt_enable_no_resched();
+}
+#endif //(!defined(CONFIG_LTT) || !defined(CONFIG_LTT_FACILITY_PROCESS))
+

believe it or not, this is inlined into: kernel/sched.c ...

'enuff said. LTT is so far from being even considerable that it's not
even funny.

Ingo

2006-09-14 22:46:23

by Martin Bligh

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

> i disagree. Consider the following example from LTT:
...
> trace_socket_sendmsg(sock, sock->sk->sk_family,
> sock->sk->sk_type,
> sock->sk->sk_protocol,
> size);
...

> what do the 5 extra lines introduced by trace_socket_sendmsg() tell us?
> Nothing. They mostly just duplicate the information i already have from
> the function declaration. They obscure the clear view of the function:
...
> the resulting visual and structural redundancy hurts.

Couldn't that be easily fixed by just doing

	trace_socket_sendmsg(sock, size);

and having it work out which esoteric parts of the sock we want to trace,
and which we don't? That is much less visually invasive, and gives the
same effect.
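
e.g. something like this (just a sketch; __trace_socket_sendmsg() is a
made-up name for whatever the underlying tracer entry point would be):

#include <linux/net.h>
#include <net/sock.h>

/* Hypothetical underlying tracer call. */
extern void __trace_socket_sendmsg(struct socket *sock, int family,
				   int type, int protocol, size_t size);

/* The call site passes only the sock and the size; the wrapper digs
 * the "esoteric" fields out of the sock itself. */
static inline void trace_socket_sendmsg(struct socket *sock, size_t size)
{
	__trace_socket_sendmsg(sock,
			       sock->sk->sk_family,
			       sock->sk->sk_type,
			       sock->sk->sk_protocol,
			       size);
}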

M.


2006-09-14 22:59:22

by Martin Bligh

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Ingo Molnar wrote:
> * Martin Bligh <[email protected]> wrote:
>
>>>i very much agree that they should become as fast as possible. So to
>>>rephrase the question: can we make dynamic tracepoints as fast (or
>>>nearly as fast) as static tracepoints? If yes, should we care about
>>>static tracers at all?
>>
>>Depends how many nops you're willing to add, I guess. Anything, even
>>the static tracepoints really needs at least a branch to be useful,
>>IMHO. At least for what I've been doing with it, you need to stop the
>>data flow after a while (when the event you're interested in happens,
>>I'm using it like a flight data recorder, so we can go back and do
>>postmortem on what went wrong). I should imagine branch prediction
>>makes it very cheap on most modern CPUs, but don't have hard data to
>>hand.
>
> only 5 bytes of NOP are needed by default, so that a kprobe can insert a
> call/callq instruction. The easiest way in practice is to insert a
> _single_, unconditional function call that is patched out to NOPs upon
> its first occurance (doing this is not a performance issue at all). That
> way the only cost is the NOP and the function parameter preparation
> side-effects. (which might or might not be significant - with register
> calling conventions and most parameters being readily available it
> should be small.)
>
> note that such a limited, minimally invasive 'data extraction point'
> infrastructure is not actually what the LTT patches are doing. It's not
> even close, and i think you'll be surprised. Let me quote from the
> latest LTT patch (patch-2.6.17-lttng-0.5.108, which is the same version
> submitted to lkml - although no specific tracepoints were submitted):

OK, I grant you that's pretty scary ;-) However, it's not the only way
to do it. Most things we're using write a statically sized 64-bit event
into a relayfs buffer, with a timestamp, a minor and major event type,
and a byte of data payload.
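
Roughly along these lines, for illustration only (the packing and all
the names below are invented, not the actual format we use):

#include <linux/types.h>
#include <linux/relay.h>

/* Invented 64-bit packing: 40-bit timestamp, 8-bit major type,
 * 8-bit minor type, 8-bit payload. */
struct ktrace_event {
	u64 timestamp : 40;
	u64 major     : 8;
	u64 minor     : 8;
	u64 data      : 8;
};

extern struct rchan *ktrace_chan;	/* hypothetical relay channel */
extern u64 ktrace_clock(void);		/* hypothetical timestamp source */

static inline void ktrace(u8 major, u8 minor, u8 data)
{
	struct ktrace_event ev = {
		.timestamp = ktrace_clock(),
		.major	   = major,
		.minor	   = minor,
		.data	   = data,
	};

	/* Append the fixed-size record to the relay channel's buffer. */
	relay_write(ktrace_chan, &ev, sizeof(ev));
}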

> believe it or not, this is inlined into: kernel/sched.c ...
>
> 'enuff said. LTT is so far from being even considerable that it's not
> even funny.

Particularly if we're doing more complex things like that, I'd agree
that the overhead of doing the out-of-line jump is non-existent by
comparison. Even with the relayfs logging alone, perhaps the jump is
not that heavy ... hmmm.

If we put the NOPs in (at least as an option on some architectures)
from a macro, you don't really need the full kprobes implemented to
do tracing, even ... just overwrite the NOPs with a jump, so presumably
it would be easier to port. However, not sure how local variable data
is specified in that case ... perhaps the kprobes guys know better.
Most of the complexity seemed to be with relocating existing code
because you didn't have NOPs.

To me, the main thing is to have hooks for at least some of the
basic needs maintained in-kernel - from the dtrace paper Val pointed
me to, that seems to be exactly what they do too, and it integrates
with the newly added dynamic ones where necessary. Plus I hate the
whole awk thing, and the general complexity of systemtap, but we can
probably avoid that easily enough - either the embedded C option
you mentioned, or just a different definition for the same hook macros
under a config option.

So perhaps it'll all work. Still need a little bit of data maintained
in tree though.

M.

2006-09-14 23:04:53

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Martin Bligh <[email protected]> wrote:

> >i disagree. Consider the following example from LTT:
> ...
> > trace_socket_sendmsg(sock, sock->sk->sk_family,
> > sock->sk->sk_type,
> > sock->sk->sk_protocol,
> > size);
> ...
>
> >what do the 5 extra lines introduced by trace_socket_sendmsg() tell us?
> >Nothing. They mostly just duplicate the information i already have from
> >the function declaration. They obscure the clear view of the function:
> ...
> >the resulting visual and structural redundancy hurts.
>
> Couldn't that be easily fixed by just doing
>
> trace_socket_sendmsg(sock, size);
>
> and have it work out which esoteric parts of the sock we want to
> trace, and which we don't? Is much less visually invasive, and gives
> the same effect.

yeah, visual impact is everything. The best that Frank and i came up
with is:

	_(socket_sendmsg, sock, size);

we could quickly learn to visually skip over lines like that; they have
a pretty unique geometric form. While if it's called:

	trace_socket_sendmsg(sock, size);

it always looks like a function call in the corner of the eye and
attracts attention.

the '_()' macro is defined as:

#define _(x,y,z) STAP_MARK(x,y,z)

(STAP_MARK is an existing SystemTap helper to insert static tracepoints
into the kernel.)
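
purely to visualize the difference: with the _() macro above in scope,
the sock_sendmsg() example from before becomes:

int sock_sendmsg(struct socket *sock, struct msghdr *msg, size_t size)
{
	struct kiocb iocb;
	struct sock_iocb siocb;
	int ret;

	_(socket_sendmsg, sock, size);

	init_sync_kiocb(&iocb, NULL);
	iocb.private = &siocb;
	ret = __sock_sendmsg(&iocb, sock, msg, size);
	if (-EIOCBQUEUED == ret)
		ret = wait_on_sync_kiocb(&iocb);
	return ret;
}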

but the other property of dynamic tracing is still very important too:
we have the technological freedom to remove static tracepoints, if we
decide so. With static tracers, once they are in the tree, we are stuck
with these APIs.

Ingo

2006-09-14 23:28:08

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Martin Bligh <[email protected]> wrote:

> > note that such a limited, minimally invasive 'data extraction point'
> > infrastructure is not actually what the LTT patches are doing. It's
> > not even close, and i think you'll be surprised. Let me quote from
> > the latest LTT patch (patch-2.6.17-lttng-0.5.108, which is the same
> > version submitted to lkml - although no specific tracepoints were
> > submitted):
>
> OK, I grant you that's pretty scary ;-) However, it's not the only way
> to do it. Most things we're using write a statically sized 64-bit
> event into a relayfs buffer, with a timestamp, a minor and major event
> type, and a byte of data payload.

oh, no need to tell me. I wrote ktrace 10 years ago, iotrace 8 years ago
and latency-trace 2 years ago. (The latter even does extensive mcount-based
tracing, which is as demanding on the ringbuffer as it gets - on
my testbox i routinely get 10-20 million trace events per second, where
each trace entry includes: type, cpu, flags, preempt_count, pid,
timestamp and 4 words of arbitrary payload, all fitting into 32 bytes. It
has static tracepoints too, in addition to the 20,000-40,000 mcount
tracepoints a typical kernel has.)

So i think i know the advantages and disadvantages of static tracers,
their maintenance and performance impact.

but i think (and i think now you'll be surprised) the way to go is to do
all this in SystemTap ;-) If we add any static points to the kernel then
they should have a pure 'local data preparation for extraction' purpose -
nothing more. Static tracing can be built around that too, but at that
point it will be unnecessary because SystemTap will be able to do that
too, with the same (or better, considering the LTT mess) performance.

i.e. we should have macros to prepare local information, with macro
arities of 2, 3, 4 and 5:

_(name, data1);
__(name, data1, data2);
___(name, data1, data2, data3);
____(name, data1, data2, data3, data4);

that and nothing more. But no guarantees that these trace points will
always be there and usable for static tracers: for example about 50% of
all tracepoints can be eliminated via a function attribute (the
attribute would tell GCC to generate a 5-byte NOP as the first
instruction of the function prologue). That will be invariant to things
like function renames, etc.

> So perhaps it'll all work. Still need a little bit of data maintained
> in tree though.

ok. And i think SystemTap itself should be in tree too, with a couple of
examples and helper scripts all around tracing and probing - and of
course an LTT-compatible trace output so that all the nice LTT userspace
code and visualization can live on.

Ingo

2006-09-14 23:40:00

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Hi,

On Fri, 15 Sep 2006, Ingo Molnar wrote:

> > OTOH I would also like to know what's going in my m68k kernel without
> > having to implement some rather complex infrastructure, which I don't
> > need otherwise. There hasn't been a single argument so far, why we
> > can't have both.
>
> the argument is very simple: LTT creates strong coupling, it is almost a
> set of 350+ system-calls, moved into the heart of the kernel. Once moved
> in, it's very hard to remove it. "Why did you remove that trace
> information, you broke my LTT script!"

You are changing the topic. Nobody said the current LTT tracepoints have
to be merged as is. You generalize from a work in progress to static trace
points in general.

> While with SystemTap the coupling is alot smaller.

What guarantees that we won't have similar problems with dynamic tracepoints?
As soon as any tracing is merged, users will have some kind of expectation
and thus you can expect "Why did you change this source? It broke my
SystemTap script!" here as well.

bye, Roman

2006-09-14 23:52:48

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Roman Zippel <[email protected]> wrote:

> > While with SystemTap the coupling is alot smaller.
>
> What guarantees we don't have similiar problems with dynamic
> tracepoints? As soon as any tracing is merged, users will have some
> kind of expectation [...]

because users rely on the functionality, not on the implementation
details. As i outlined it before: with dynamic tracers, static
tracepoints _are not a necessity_. With static tracers, _static
tracepoints are the only game in town_.

i outlined one such specific "removal of static tracepoint" example
already: static trace points at the head/prologue of functions (half of
the existing tracepoints are such). The sock_sendmsg() example i quoted
before is such a case. Those trace points can be replaced with a simple
GCC function attribute, which would cause a 5-byte (or whatever
necessary) NOP to be inserted at the function prologue. The attribute
would be a lot less invasive than an explicit tracepoint (and thus easier
to maintain):

int __trace function(char arg1, char arg2)
{
}

where kprobes can be used to attach a lightweight tracepoint that does a
call, not a break (INT3) instruction. With static tracers we couldn't do
this, so we'd have to stick with the static tracepoints forever! It's
always hard to remove features, so we have to make sure we add the
feature that we know is the best long-term solution.
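
(A minimal sketch of how such a __trace marker could be spelled so it
degrades to nothing without compiler support; no NOP-emitting function
attribute existed in GCC at the time, so the attribute name below is
purely hypothetical:)

/* Hypothetical only: the attribute name is invented; real support would
 * need the compiler to emit a 5-byte NOP at the function prologue. */
#ifdef HAVE_PROLOGUE_NOP_ATTRIBUTE
#define __trace __attribute__((prologue_nop))	/* hypothetical attribute */
#else
#define __trace					/* expands to nothing */
#endif

int __trace my_function(char arg1, char arg2)
{
	return arg1 + arg2;
}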

Ingo

2006-09-15 00:20:15

by Nicholas Miell

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

On Fri, 2006-09-15 at 01:19 +0200, Ingo Molnar wrote:
> but i think (and i think now you'll be surprised) the way to go is to do
> all this in SystemTap ;-) If we add any static points to the kernel then
> it should have a pure 'local data preparation for extraction' purpose -
> nothing more. Static tracing can be built around that too, but at that
> point it will be unnecessary because SystemTap will be able to do that
> too, with the same (or better, considering the LTT mess) performance.
>
> i.e. we should have macros to prepare local information, with macro
> arities of 2, 3, 4 and 5:
>
> _(name, data1);
> __(name, data1, data2);
> ___(name, data1, data2, data3);
> ____(name, data1, data2, data3, data4);
>
> that and nothing more. But no guarantees that these trace points will
> always be there and usable for static tracers: for example about 50% of
> all tracepoints can be eliminated via a function attribute. (which
> function attribute tells GCC to generate a 5-byte NOP as the first
> instruction of the function prologue.) That will be invariant to things
> like function renames, etc.

Another interesting idea would be the addition to gcc of a:

__builtin_trace_point(char *name, ...)

It would output a function-call-sized NOP at its call site, and store
in another section the trace point name, location, and (this is the
important part) a series of DWARF expressions to reconstruct the trace
point's argument list from the stack frame and saved registers.

This would completely eliminate the argument passing overhead of a
patched-out function call in the cases where the trace point takes
arguments.

It'd also make your __trace function attribute unnecessary, because gcc
could presumably figure out that the trace point is at the beginning of
the function.

It "only" requires compiler support on every architecture that the
kernel cares about and compiler upgrades for everyone who wants to use
static trace points, which is no mean feat.




(Roman Zippel was trimmed from the CC list because his server is
rejecting mail from me and/or Comcast. If the first attempts actually
make it through and this is yet another duplicate, sorry.)

--
Nicholas Miell <[email protected]>

2006-09-15 00:27:34

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Hi,

On Fri, 15 Sep 2006, Ingo Molnar wrote:

> int __trace function(char arg1, char arg2)
> {
> }
>
> where kprobes can be used to attach a lightweight tracepoint that does a
> call, not a break (INT3) instruction. With static tracers we couldnt do
> this so we'd have to stick with the static tracepoints forever! It's
> always hard to remove features, so we have to make sure we add the
> feature that we know is the best long-term solution.

Where is the proof of that? Why can't the same rules apply to dynamic and
static trace points?
You're also mixing up function tracing with event tracing. Most of the LTT
trace points log rather high level events, which are rather unlikely to
disappear. It's more likely that the place where they are generated is
moved, and then it's only advantageous if the marker is moved along with
it. OTOH, if the actual event really isn't generated anymore, there is
also no need for the marker anymore.

bye, Roman

2006-09-15 01:05:17

by Martin Bligh

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

> i.e. we should have macros to prepare local information, with macro
> arities of 2, 3, 4 and 5:
>
> _(name, data1);
> __(name, data1, data2);
> ___(name, data1, data2, data3);
> ____(name, data1, data2, data3, data4);

Personally I think that's way more visually offensive than something
that looks like a function call, but still ;-) We do it as a caps macro

KTRACE(foo, bar)

internally, which I suppose makes it not look like a function call.
But at the end of the day, it's all just a matter of visual taste;
what's actually in there is way more important.

> that and nothing more. But no guarantees that these trace points will
> always be there and usable for static tracers: for example about 50% of
> all tracepoints can be eliminated via a function attribute. (which
> function attribute tells GCC to generate a 5-byte NOP as the first
> instruction of the function prologue.) That will be invariant to things
> like function renames, etc.

Yup, sometimes you just want to know when a function is called, and
there's no real need to add that. The hook for system calls should be
pretty generic too. But things like instrumenting the reclaim code need
more work - I ended up incrementing some counters for each type of page
recovery failure in shrink_list() and then just logging one compound
event on the stats structure at the end. That's pretty specific, but
does give you a lot of useful data when the box is dying from mem
pressure.

>> So perhaps it'll all work. Still need a little bit of data maintained
>> in tree though.
>
> ok. And i think SystemTap itself should be in tree too, with a couple of
> examples and helper scripts all around tracing and probing - and of
> course an LTT-compatible trace output so that all the nice LTT userspace
> code and visualization can live on.

I have to figure out how to graft the internal Google stuff onto the
same mechanism ... I definitely want to be able to combine the static
points with dynamic ones. And then add schedstats and blktrace into
the same thing so it interleaves properly ... seeing the blktrace type
data interact with memory reclaim debugging was very useful to me, for
instance. All these little fragmented tools are way more difficult to
deal with.

M.

2006-09-15 01:47:28

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

* Ingo Molnar ([email protected]) wrote:
>
> * Roman Zippel <[email protected]> wrote:
>
> > > > > also, the other disadvantages i listed very much count too. Static
> > > > > tracepoints are fundamentally limited because:
> > > > >
[...]
> Right now they are
> pretty heavy cons as far as LTT goes, so obviously they have a primary
> impact on the topic at hand (whic is whether to merge LTT or not).
>

Ingo, why are you arguing about static instrumentation when I am not submitting
any static instrumentation in my patch? You can argue about static vs. dynamic
instrumentation all you want, but please don't apply this debate to a decision
about whether or not to include a core tracing infrastructure that has nothing
to do with the way instrumentation or probes are inserted.

Mathieu


OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2006-09-15 03:10:45

by James Dickens

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Static probe points in the mainline kernel should not be there for
kernel programmers. Any kernel programmer who is interested in an
event that a static probe would trace could, with a little work, use
kprobes, SystemTap, printk statements or numerous other methods and
accomplish the same thing, most likely with less impact on the kernel.

If you allow static probe points, do them for the people who use your
code. If static probing is to work in the mainline kernel, it's
necessary for everyone to see the value of them.

I came up with some simple rules that may help the adoption of static
probe points in the kernel. They address a lot of the issues I read in
other replies.

Some simple rules for Static Probing:

- If the probe is not enabled, it turns into a NOP. No probes are
enabled by default
- Each programmer should provide this as a service to the user.
- There should be at most 1000 static probe points in the entire
kernel, including modules, drivers, etc.
- Probes should not pass out any more information than what a user
would need. If the user needs more he needs to find another way to get
it, perhaps dynamic probing.
- If any part of the kernel has more than a dozen probe points there
are too many.
- If a probe would be of little use to a user/sysadmin it should be
removed from the mainline kernel.
- Yes, if a probe point is in the code you are working on, the role of
maintaining it falls on you.
- If you notice your code is doing something that matches a statically
probed event (i.e. your network driver dropped a packet), it's your
responsibility to add the necessary probe in your code.
- If "you" need a probe that would not be needed except for debugging
your code, use one of the other methods mentioned above, or remove it
before your code is submitted to the mainline kernel.


Some example static probe points

Task being moved onto a CPU
Task moving off a CPU

Start of an IO
End of an IO

Network packet received
Packet dropped.

Various lock activities
Lock taken
Spin lock taken


James Dickens
uadmin.blogspot.com

2006-09-15 05:48:15

by Vara Prasad

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Mathieu Desnoyers wrote:

>* Ingo Molnar ([email protected]) wrote:
>
>
>>* Roman Zippel <[email protected]> wrote:
>>
>>
>>
>>>>>>also, the other disadvantages i listed very much count too. Static
>>>>>>tracepoints are fundamentally limited because:
>>>>>>
>>>>>>
>>>>>>
>[...]
>
>
>>Right now they are
>>pretty heavy cons as far as LTT goes, so obviously they have a primary
>>impact on the topic at hand (whic is whether to merge LTT or not).
>>
>>
>>
>
>Ingo, why are you arguing about static instrumentation when I don't submit any
>static instrumentation in my patch ? You can argue about static VS dynamic
>instrumentation all you want, but please don't apply this debate to a dicision
>about including or not a core tracing infrastructure that has nothing to do
>with the way instrumentation or probes are inserted.
>
>Mathieu
>
>
>
>
I think Ingo is right in saying that what we really need first is a generic
mechanism for specifying static markers in the kernel, which can be
used to put dynamic probes on demand or used as real static function
calls if one chooses. Once we agree on the marker mechanism, dynamic
tracing and static tracing can both co-exist happily.

Coming to the rest of your patches, I really don't think we need a whole
lot more than the facilities we already have in the kernel. Frank has
successfully demonstrated at OLS how one can use static markers by using
only existing facilities in the kernel.



2006-09-15 07:00:41

by Vara Prasad

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Martin Bligh wrote:

> Ingo Molnar wrote:
>
>> * Martin Bligh <[email protected]> wrote:
>>
>>>> i very much agree that they should become as fast as possible. So
>>>> to rephrase the question: can we make dynamic tracepoints as fast
>>>> (or nearly as fast) as static tracepoints? If yes, should we care
>>>> about static tracers at all?
>>>
>>>
>>> Depends how many nops you're willing to add, I guess. Anything, even
>>> the static tracepoints really needs at least a branch to be useful,
>>> IMHO. At least for what I've been doing with it, you need to stop
>>> the data flow after a while (when the event you're interested in
>>> happens, I'm using it like a flight data recorder, so we can go back
>>> and do postmortem on what went wrong). I should imagine branch
>>> prediction makes it very cheap on most modern CPUs, but don't have
>>> hard data to hand.
>>
>>
>> only 5 bytes of NOP are needed by default, so that a kprobe can
>> insert a call/callq instruction. The easiest way in practice is to
>> insert a _single_, unconditional function call that is patched out to
>> NOPs upon its first occurance (doing this is not a performance issue
>> at all). That way the only cost is the NOP and the function parameter
>> preparation side-effects. (which might or might not be significant -
>> with register calling conventions and most parameters being readily
>> available it should be small.)
>>
>> note that such a limited, minimally invasive 'data extraction point'
>> infrastructure is not actually what the LTT patches are doing. It's
>> not even close, and i think you'll be surprised. Let me quote from
>> the latest LTT patch (patch-2.6.17-lttng-0.5.108, which is the same
>> version submitted to lkml - although no specific tracepoints were
>> submitted):
>
>
> OK, I grant you that's pretty scary ;-) However, it's not the only way
> to do it. Most things we're using write a statically sized 64-bit event
> into a relayfs buffer, with a timestamp, a minor and major event type,
> and a byte of data payload.
>
>> believe it or not, this is inlined into: kernel/sched.c ...
>>
>> 'enuff said. LTT is so far from being even considerable that it's not
>> even funny.
>
>
> Particularly if we're doing more complex things like that, I'd agree
> that the overhead of doing the out of line jump is non-existant by
> comparison. Even with the relayfs logging alone, perhaps the jump is
> not that heavy ... hmmm.
>
> If we put the NOPs in (at least as an option on some architectures)
> from a macro, you don't really need the full kprobes implemented to
> to tracing, even ... just overwrite the nops with a jump, so presumably
> would be easier to port. However, not sure how local variable data
> is specified in that case ... perhaps the kprobes guys know better.
> Most of the complexity seemed to be with relocating existing code
> because you didn't have nops.


With kprobes you can place probes anywhere you want, but the ones placed
in the middle of a function are not maintainable because they are tied
to a location in the code. Having a NOP leaves a maintainable address
that we can hook into when needed.

AFAIK writing portable code for using local variables is not easy
without using DWARF information, hence we don't handle that in kprobes.
Jprobes is a special case where you can have access to function
arguments at the function entry point. SystemTap can be used to specify
probes anywhere in the function, and local variables can also be used in
the probe handlers. The problem is still maintainability, as probes are
specified using line numbers.
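
(To make the jprobes point concrete, here is a rough sketch along the
lines of the jprobe example in the kernel's kprobes documentation; the
do_fork() signature shown is the 2.6.17-era one, and the exact struct
field names vary across kernel versions, so treat this as illustrative
only:)

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/ptrace.h>
#include <linux/kprobes.h>

/* The handler mirrors the signature of the probed function, so its
 * arguments are directly visible at function entry. */
static long jtrace_do_fork(unsigned long clone_flags, unsigned long stack_start,
			   struct pt_regs *regs, unsigned long stack_size,
			   int __user *parent_tidptr, int __user *child_tidptr)
{
	printk(KERN_INFO "do_fork: clone_flags=0x%lx\n", clone_flags);
	jprobe_return();	/* a jprobe handler must always end here */
	return 0;		/* never reached */
}

static struct jprobe my_jprobe = {
	.entry = (kprobe_opcode_t *)jtrace_do_fork,
	.kp = { .symbol_name = "do_fork" },	/* field name varies by version */
};

static int __init jtrace_init(void)
{
	return register_jprobe(&my_jprobe);
}

static void __exit jtrace_exit(void)
{
	unregister_jprobe(&my_jprobe);
}

module_init(jtrace_init);
module_exit(jtrace_exit);
MODULE_LICENSE("GPL");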

>
> To me, the main thing is to have hooks for the at least some of the
> basic needs maintained in-kernel - from the dtrace paper Val pointed
> me to, that seems to be exactly what they do too, and it integrates
> with the newly added dynamic ones where necessary.


Once we have these static markers, one can use both dynamic probes and
static probes intermixed, getting the best of both worlds, as Frank
demonstrated at OLS.

Here are a couple of proposals that were discussed on the systemtap
mailing list on how to specify static markers; we could use these ideas
along with the rest in deciding on a marker proposal.
http://sources.redhat.com/ml/systemtap/2006-q3/msg00273.html
http://sourceware.org/ml/systemtap/2005-q4/msg00415.html

> Plus I hate the
> whole awk thing, and general complexity of systemtap, but we can
> probably avoid that easily enough - either the embedded C option
> you mentioned, or just a different definiton for the same hook macros
> under a config option.
>
> So perhaps it'll all work. Still need a little bit of data maintained
> in tree though.


For placing probes at the beginning and end of a function we don't really
need markers, as the function boundary works as a marker.
I think we only need markers in a few places where an important decision
is made in the middle of a function.

>
> M.


2006-09-15 09:17:33

by Richard J Moore

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


Ingo Molnar wrote:

> > I don't think anyone is saying that static tracepoints do not have
> > their limitations, or that dynamic tracepointing is useless. But
> > that's not the point ... why can't we have one infrastructure that
> > supports both? Preferably in a fairly simple, consistent way.
>
> primarily because i fail to see any property of static tracers that are
> not met by dynamic tracers. So to me dynamic tracers like SystemTap are
> a superset of static tracers.

There is one example where dynamic tracing is difficult or very messy to
implement, and that's tracepoints needed during system and device
initialization. In this sense dynamic is not a practical superset of
static. However, I believe the tooling for dynamic trace should work for
static as well.

- -
Richard J Moore
IBM Advanced Linux Response Team - Linux Technology Centre
MOBEX: 264807; Mobile (+44) (0)7739-875237
Office: (+44) (0)1962-817072

2006-09-15 09:21:01

by Jes Sorensen

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

>>>>> "Karim" == Karim Yaghmour <[email protected]> writes:

Karim> Ingo Molnar wrote:
>> that's not true, and this is the important thing that i believe you
>> are missing. A dynamic tracepoint is _detached_ from the normal
>> source code and thus is zero maintainance overhead. You dont have
>> to maintain it during normal development - only if you need it. You
>> dont see the dynamic tracepoints in the source code.

Karim> And that's actually a problem for those who maintain such
Karim> dynamic trace points.

And who should pay here? The people who want the tracepoints or the
people who are not interested in them?

>> a static tracepoint, once it's in the mainline kernel, is a nonzero
>> maintainance overhead _until eternity_. It is a constant visual
>> hindrance and a constant build-correctness and boot-correctness
>> problem if you happen to change the code that is being traced by a
>> static tracepoint. Again, I am talking out of actual experience
>> with static tracepoints: i frequently break my kernel via static
>> tracepoints and i have constant maintainance cost from them. So
>> what i do is that i try to minimize the number of static
>> tracepoints to _zero_. I.e. i only add them when i need them for a
>> given bug.

Karim> Bzzt, wrong. This is your own personal experience with
Karim> tracing. Marked up code does not need to be active under all
Karim> build conditions. In fact trace points can be inactive by
Karim> default at all times, except when you choose to build them in.

You have obviously never tried to maintain a codebase for a long
time. Even if the code is not activated, you make a change and
something breaks and people come running and screaming, or the thing
is in the way of the structural code change you want to make.

Not to mention that some of the classical places people wish to add
those static tracepoints are in performance-sensitive codepaths,
syscalls for example.

>> static tracepoints are inferior to dynamic tracepoints in almost
>> every way.

Karim> Sorry, orthogonal is the word.

You can do pretty much everything you want to do with dynamic
tracepoints; it's just a matter of whether you want to dump the burden
of maintenance on someone else. Been there, done that; I have had to
show people in the past how to do with dynamic points what they
insisted had to be done with static points.

>> hundreds (or possibly thousands) of tracepoints? Have you ever
>> tried to maintain that? I have and it's a nightmare.

Karim> I have, and I've showed you that you're wrong. The only reason
Karim> you can make this argument is that you view these things from
Karim> the point of view of what use they are for you as a kernel
Karim> developer and I will repeat what I've said for years now:
Karim> static instrumentation of the kernel isn't meant to be useful
Karim> for kernel developers.

So you maintain the tracepoints in the kernel and you are offering to
take over maintenance of all code that now contains these tracepoints?
You add your static tracepoints; next week someone else wants some
very similar but slightly different points; the following week it's
someone else. Thanks, but no thanks.

Karim> Nevertheless there are
Karim> very legitimate uses for standardized instrumentation points.

Some evidence would be useful here; so far you haven't provided any.

Jes

2006-09-15 09:29:26

by Jes Sorensen

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

>>>>> "Ingo" == Ingo Molnar <[email protected]> writes:

Ingo> * Martin Bligh <[email protected]> wrote:

>> I don't think anyone is saying that static tracepoints do not have
>> their limitations, or that dynamic tracepointing is useless. But
>> that's not the point ... why can't we have one infrastructure that
>> supports both? Preferably in a fairly simple, consistent way.

Ingo> primarily because i fail to see any property of static tracers
Ingo> that are not met by dynamic tracers. So to me dynamic tracers
Ingo> like SystemTap are a superset of static tracers.

Ingo> So my position is that what we should concentrate on is to make
Ingo> the life of dynamic tracers easier (be that a handful of
Ingo> generic, parametric hooks that gather debuginfo information and
Ingo> add NOPs for easy patching), while realizing that static tracers
Ingo> have no advantage over dynamic tracers.

The parallel that springs to mind here is C++ kernel components: 'I
promise to only use the good parts', then next week someone else adds
another pile in a worse place. Once the points are in we will never
get rid of them - look at how long it took to get rid of devfs :( In
addition, it is guaranteed that people will not be able to agree on
which points to put where, despite the claim that there will be only
30 points - sorry, I am not buying that; we have plenty of evidence to
show the opposite.

I looked at the old LTT code a while ago and it was pretty appalling;
maybe LTTng is better, but I can't say the old code gave me a warm
fuzzy feeling.

Jes

2006-09-15 11:19:00

by Alan

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Ar Iau, 2006-09-14 am 12:40 -0700, ysgrifennodd Tim Bird:
> It's only zero maintenance overhead for you. Someone has to
> maintain it. The party line for years has been that in-tree
> maintenance is easier than out-of-tree maintenance.

That misses the entire point. If you have dynamic tracepoints you don't
have any static tracepoints to maintain because you don't need them.
They may be a clock or three slower, but you are then going to branch
into the trace tool code paths, take TLB misses, take cache misses, and
eventually get back, so the cost of it being dynamic is so close to zero
in the bigger picture that it doesn't matter.

Alan

2006-09-15 11:47:13

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Hi,

On Fri, 15 Sep 2006, Alan Cox wrote:

> Ar Iau, 2006-09-14 am 12:40 -0700, ysgrifennodd Tim Bird:
> > It's only zero maintenance overhead for you. Someone has to
> > maintain it. The party line for years has been that in-tree
> > maintenance is easier than out-of-tree maintenance.
>
> That misses the entire point. If you have dynamic tracepoints you don't
> have any static tracepoints to maintain because you don't need them.

This assumes dynamic tracepoints are generally available, which is wrong.
This assumes that dynamic tracepoints can't benefit from static source
annotations, which is also wrong.
He doesn't miss the point at all: dynamic tracepoints don't imply zero
maintenance overhead.

bye, Roman

2006-09-15 12:16:23

by Alan

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Ar Gwe, 2006-09-15 am 13:46 +0200, ysgrifennodd Roman Zippel:
> > That misses the entire point. If you have dynamic tracepoints you don't
> > have any static tracepoints to maintain because you don't need them.
>
> This assumes dynamic tracepoints are generally available, which is wrong.

Wrong in what sense: you don't have them implemented, or your
architecture is so mindbogglingly braindead that you can't implement them?

> This assumes that dynamic tracepoints can't benefit from static source
> annotations, which is also wrong.

gcc -g produces extensive annotations which are then usable by many
tools other than gdb.

Alan

2006-09-15 12:17:54

by Karim Yaghmour

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


Jes Sorensen wrote:
> Karim> And that's actually a problem for those who maintain such
> Karim> dynamic trace points.
>
> And who should pay here? The people who want the tracepoints or the
> people who are not interested in them?

If you'd care to read through the thread you'd notice I've demonstrated
time and again that the static trace points we're mostly interested
in are never-changing. Unless something fundamentally changes with the
kernel, there will always be a scheduling change, etc. This
"instrumentation is evil" mantra is only substantiated if you view
it from the point of view of someone who's only used it to debug code.
Yet, and I repeat this again, instrumentation for in-source debugging
is but a corner case of instrumentation in general.

> You have obviously never tried to maintain a codebase for a long
> time.

Please, this is not constructive. I've never really grasped the need
for posturing on LKML. Jes, I'm not going to fight a war of resumes
with you. If you think I'm incompetent then there's very little I can
do to change your mind.

> Not to mention that some of the classical places people wish to add
> those static tracepoints are in performance sensitive codepaths,
> syscalls for example.

And this argument ignores everything I said about how the limitations
known from previous static tracing mechanisms need not apply here.

> You can do pretty much everything you want to do with dynamic
> tracepoints, it's just a matter of whether you want to dump the burden
> of maintenance on someone else. Been there done that, had to show
> people in the past how to do with dynamic points what they insisted
> had to be done with static points.

Yes, Mr. Scrub, I mean kprobes is your answer. The only reason you can
get away with this argument is if you view it exclusively from the
point of view of kernel development. And that's why you're wrong.

> So you maintain the tracepoints in the kernel and you are offering to
> take over maintenance of all code that now contain these tracepoints?

Please explain, honestly, why the following instrumentation point is
going to be a maintenance drag on the person modifying the scheduler:
@@ -1709,6 +1712,7 @@ switch_tasks:
 	++*switch_count;
 
 	prepare_arch_switch(rq, next);
+	TRACE_SCHEDCHANGE(prev, next);
 	prev = context_switch(rq, prev, next);
 	barrier();

And please, don't bother complaining about the semantics, they can
be changed. I'm just arguing about location/meaning/content.

> You add your static tracepoints, next week someone else wants some
> very similar but slightly different points, the following week it's
> someone else. Thanks, but no thanks.

Obviously there's no point in me spelling out a code of conduct to
anyone; Martin has already pointed out that it's up to the subsystem
maintainers to decide what's appropriate and what's not, as is
customary anyway. But the issue I'm putting forth here is that there
is value in allowing outsiders to understand the dynamic behavior of
your code, and the person who can do that best is the person
writing the code. It is then that person's responsibility to
distinguish between instrumentation they may find important to debug
their code and instrumentation that would be relevant to those using
their code. And if you've maintained code long enough, and I trust
you have, you would see that there is a clear difference between the two.

Thanks,

Karim
--
President / Opersys Inc.
Embedded Linux Training and Expertise
http://www.opersys.com / 1.866.677.4546

2006-09-15 12:34:25

by Jes Sorensen

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Karim Yaghmour wrote:
> Jes Sorensen wrote:
>>> And who should pay here? The people who want the tracepoints or the
>> people who are not interested in them?
>
> If you'd care to read through the thread you'd notice I've demonstrated
> time and again that those static trace points we're mostly interested
> in a never-changing. Lest something fundamentally changes with the
> kernel, there will always be a scheduling change; etc.

Except, as I pointed out, everyone wants their info slightly
differently, so even trace points in the scheduler will be contentious
and we will end up with a stack of them if we are to satisfy everyone.
So no, you didn't demonstrate anything.

> This
> "instrumentation is evil" mantra is only substantiated if you view
> it from the point of view of someone who's only used it to debug code.
> Yet, and I repeat this again, instrumentation for in-source debugging
> is but a corner case of instrumentation in general.

Given that I have used this stuff for more than just debugging code,
this obviously doesn't apply.

>> You have obviously never tried to maintain a codebase for a long
>> time.
>
> Please, this is not constructive. I've never really grasped the need
> for posturing on LKML. Jes, I'm not going to fight a war of resumes
> with you. If you think I'm incompetent then there's very little I can
> do to change your mind.

You refuse to take the big picture into account and then claim that
there is no cost to doing things your way. Point being that once you
start maintaining a large project such as the kernel, or just parts of
it, you realize how much those 'zero cost' additions really cost.

>> Not to mention that some of the classical places people wish to add
>> those static tracepoints are in performance sensitive codepaths,
>> syscalls for example.
>
> And this argument ignores everything I said on how there does not need
> be the limitation currently known to previous static tracing mechanisms.

And how does there not? If you want to add tracepoints to the syscall
path, then you will make an impact. It's non-trivial to validate (yes,
I have seen some scary attempts at adding LTT trace calls to the ia64
syscall path), and just because it might not be compiled in in most
cases doesn't mean it doesn't raise the complexity.

>> You can do pretty much everything you want to do with dynamic
>> tracepoints, it's just a matter of whether you want to dump the burden
>> of maintenance on someone else. Been there done that, had to show
>> people in the past how to do with dynamic points what they insisted
>> had to be done with static points.
>
> Yes, Mr. Scrub, I mean kprobes is your answer. The only reason you can
> get away with this argument is if you view it exclusively from the
> point of view of kernel development. And that's why you're wrong.

As I said, kprobes are much more than kernel development! But you
obviously haven't bothered looking at those properly! Been there done
that!

>> So you maintain the tracepoints in the kernel and you are offering to
>> take over maintenance of all code that now contain these tracepoints?
>
> Please explain, honestly, why the following instrumentation point is
> going to be a maintenance drag on the person modifying the scheduler:
> @@ -1709,6 +1712,7 @@ switch_tasks:
> ++*switch_count;
>
> prepare_arch_switch(rq, next);
> + TRACE_SCHEDCHANGE(prev, next);
> prev = context_switch(rq, prev, next);
> barrier();
>
> And please, don't bother complaining about the semantics, they can
> be changed. I'm just arguing about location/meaning/content.

It will be a drag because next week someone else wants a tracepoint
5 lines further down the code! Again, I have seen people try and do
that on top of the old LTT patchsets, so maybe *you* didn't want the
tracepoint somewhere else, but some people did! Next?

>> You add your static tracepoints, next week someone else wants some
>> very similar but slightly different points, the following week it's
>> someone else. Thanks, but no thanks.
>
> Obviously there's no point in me spelling any code of conduct to
> anyone, Martin has already pointed out that it's up to the subsystem
> maintainers to decide what's appropriate and what's not, as is
> customary anyway. But the issue I'm putting forth here is that there
> is value for allowing outsiders to understand the dynamic behavior of
> your code and the only person who can do that best is the person
> writing the code. It is then that person's responsibility to
> distinguish between instrumentation they may find important to debug
> their code and instrumentation that would be relevant to those using
> their code. And if you've maintained code long enough, and I trust
> you do, you would see that there is a clear difference between both.

You are once again ignoring the point that not everyone needs the exact
same view of things that you are looking for. Dynamic probes allow for
that; doing that with static probes is going to turn into maintenance
hell. Guess what: some of us still try to look after code 8-10 years
after we wrote it initially.

Jes

2006-09-15 12:40:13

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Hi,

On Fri, 15 Sep 2006, Alan Cox wrote:

> Ar Gwe, 2006-09-15 am 13:46 +0200, ysgrifennodd Roman Zippel:
> > > That misses the entire point. If you have dynamic tracepoints you don't
> > > have any static tracepoints to maintain because you don't need them.
> >
> > This assumes dynamic tracepoints are generally available, which is wrong.
>
> Wrong in what sense, you don't have them implemented or your
> architecture is mindbogglingly braindead you can't implement them ?
>
> > This assumes that dynamic tracepoints can't benefit from static source
> > annotations, which is also wrong.
>
> gcc -g produces extensive annotations which are then usably by many
> tools other than gdb.

Both points have very strong consequences regarding complexity. Why do you
want to deny me the choice to use something simple, especially since both
solutions are not mutually exclusive and can even complement each other?
What's the point in forcing everyone to use a single solution?

bye, Roman

2006-09-15 12:47:28

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Martin J. Bligh <[email protected]> wrote:

> >i.e. we should have macros to prepare local information, with macro
> >arities of 2, 3, 4 and 5:
> >
> > _(name, data1);
> > __(name, data1, data2);
> > ___(name, data1, data2, data3);
> > ____(name, data1, data2, data3, data4);
>
> Personally I think that's way more visually offensive that something
> that looks like a function call, but still ;-) We do it as a caps
> macro
>
> KTRACE(foo, bar)
>
> internally, which I suppose makes it not look like a function call.
> But at the end of the day, it's all just a matter of visual taste,
> what's actually in there is way more important.

i disagree with the naming, for the reasons stated before: if we add any
static info to the kernel, it's an "easier data extraction" thing (for
the purposes of speeding up dynamic tracing), not a tracepoint. That way
there's no dispute whether what i remove is a tracepoint (on which
static tracers might rely in a hard way), or just a speedup for
SystemTap. So a better name would be what SystemTap has implemented
today:

STAP_MARK_NN(kernel_context_switch, prev, next);

or something that makes this even more explicit:

DEBUG_DATA(kernel_context_switch, prev, next);

(but i'm flexible about the naming - as long as it doesn't say 'trace'
and as long as there are no guarantees at all that those points remain
once a better method of accessing the same data for dynamic tracers is
implemented.)

Ingo

2006-09-15 13:19:00

by Alan

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Ar Gwe, 2006-09-15 am 14:39 +0200, ysgrifennodd Roman Zippel:
> Both points have very strong consequences regarding complexity. Why do you
> want to deny me the choice to use something simple, especially since both
> solutions are not mutually exclusive and can even complement each other?

I don't want to deny you the choice; I just don't want to see
unnecessary garbage in the base kernel. What you put in your own toilet
is a private matter. What you leave out in a public place is different.

> What's the point in forcing everyone to use a single solution?

Maintainability? Common good over individual weirdnesses? Ability for
people to concentrate on getting one good set of interfaces, not twelve
bad ones? Consistency for user space?

Alan

2006-09-15 13:21:06

by Paul Mundt

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

On Fri, Sep 15, 2006 at 08:38:33AM -0400, Karim Yaghmour wrote:
> If you'd care to read through the thread you'd notice I've demonstrated
> time and again that those static trace points we're mostly interested
> in a never-changing. Lest something fundamentally changes with the
> kernel, there will always be a scheduling change; etc. This
> "instrumentation is evil" mantra is only substantiated if you view
> it from the point of view of someone who's only used it to debug code.
> Yet, and I repeat this again, instrumentation for in-source debugging
> is but a corner case of instrumentation in general.
>
I didn't get the "instrumentation is evil" mantra from this thread,
rather "static tracepoints are good, so long as someone else is
maintaining them". The issue comes down to who ends up maintaining the
trace points, and given how intrusive LTT was in the past, I can't
see anyone wanting to suddenly start littering them around the kernel
now (at least in the areas that they're responsible for, particularly if
it's not something that's going to be useful to most people). Admittedly,
LTTng is not as bad as LTT was in this regard, though.

If static tracepoints are something that's useful for you, then you
can continue maintaining them out of tree.

> Yes, Mr. Scrub, I mean kprobes is your answer. The only reason you can
> get away with this argument is if you view it exclusively from the
> point of view of kernel development. And that's why you're wrong.
>
kprobes may not be the answer to all of life's problems, but it is
non-intrusive once the initial implementation pains are out of the way..

> Please explain, honestly, why the following instrumentation point is
> going to be a maintenance drag on the person modifying the scheduler:
> @@ -1709,6 +1712,7 @@ switch_tasks:
> ++*switch_count;
>
> prepare_arch_switch(rq, next);
> + TRACE_SCHEDCHANGE(prev, next);
> prev = context_switch(rq, prev, next);
> barrier();
>
> And please, don't bother complaining about the semantics, they can
> be changed. I'm just arguing about location/meaning/content.
>
For someone complaining about meaningless posturing on the list, posting
this as representative of the isolated changes involved is rather
interesting. If it were down to a small handful of critical static
tracepoints in-tree and the rest left up to the people that really want
them in out-of-tree patches, I doubt LTT would have ever had half of the
resistance towards it.

It's the intrusiveness that becomes the maintenance burden, and if you
whittle it down to a point where the intrusiveness is not that big of a
deal, then I'm not sure I see what static points would buy you over
dynamic instrumentation.

It's easy to write off the maintenance overhead when you aren't the one
maintaining the code..

2006-09-15 13:35:29

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Hi,

On Fri, 15 Sep 2006, Alan Cox wrote:

> Ar Gwe, 2006-09-15 am 14:39 +0200, ysgrifennodd Roman Zippel:
> > Both points have very strong consequences regarding complexity. Why do you
> > want to deny me the choice to use something simple, especially since both
> > solutions are not mutually exclusive and can even complement each other?
>
> I don't want to deny you the choice, I just don't want to see
> unneccessary garbage in the base kernel. What you put in your own toilet
> is a private matter. What you leave out in a public place is different.

Now we've already sunk to the toilet level... :-(

> > What's the point in forcing everyone to use a single solution?
>
> Maintainability ? common good over individual weirdnesses ? Ability for
> people to concentrate on getting one good set of interfaces not twelve
> bad ones ? Consistency for user space ?

Alan, you're making things up without any proof.

Listening to this diatribe against static tracepoints, one could get the
idea they would be something alien which would pollute the source. Well,
everything can be abused, but good tracepoints are like good
documentation: nobody wants to write and maintain it, but in the end
others benefit from it if it exists.

bye, Roman

2006-09-15 13:42:24

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Hi,

On Fri, 15 Sep 2006, Paul Mundt wrote:

> On Fri, Sep 15, 2006 at 08:38:33AM -0400, Karim Yaghmour wrote:
> > If you'd care to read through the thread you'd notice I've demonstrated
> > time and again that those static trace points we're mostly interested
> > in a never-changing. Lest something fundamentally changes with the
> > kernel, there will always be a scheduling change; etc. This
> > "instrumentation is evil" mantra is only substantiated if you view
> > it from the point of view of someone who's only used it to debug code.
> > Yet, and I repeat this again, instrumentation for in-source debugging
> > is but a corner case of instrumentation in general.
> >
> I didn't get the "instrumentation is evil" mantra from this thread,
> rather "static tracepoints are good, so long as someone else is
> maintaining them". The issue comes down to who ends up maintaining the
> trace points,

The claim that these tracepoints would be a maintenance burden is pretty
much unproven so far. The static tracepoint haters just assume the kernel
will be littered with thousands of unrelated tracepoints, whereas a good
tracepoint would only document what already happens in that function, so
that the tracepoint would be far from something obscure that only a few
people could understand and maintain.

bye, Roman

2006-09-15 13:45:13

by Jes Sorensen

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Roman Zippel wrote:
> The claim that these tracepoints would be maintainance burden is pretty
> much unproven so far. The static tracepoint haters just assume the kernel
> will be littered with thousands of unrelated tracepoints, where a good
> tracepoint would only document what already happens in that function, so
> that the tracepoint would be far from something obscure, which only few
> people could understand and maintain.

How do you propose to handle the case where two tracepoint clients want
slightly different data from the same function? I saw this with LTT
users where someone wanted things in different places in schedule().

It *is* a nightmare to maintain.

You still haven't explained your argument about kprobes not being
generally available - where?

Cheers,
Jes



2006-09-15 13:57:23

by Paul Mundt

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

On Fri, Sep 15, 2006 at 03:41:03PM +0200, Roman Zippel wrote:
> > On Fri, Sep 15, 2006 at 08:38:33AM -0400, Karim Yaghmour wrote:
> > I didn't get the "instrumentation is evil" mantra from this thread,
> > rather "static tracepoints are good, so long as someone else is
> > maintaining them". The issue comes down to who ends up maintaining the
> > trace points,
>
> The claim that these tracepoints would be maintainance burden is pretty
> much unproven so far. The static tracepoint haters just assume the kernel
> will be littered with thousands of unrelated tracepoints, where a good
> tracepoint would only document what already happens in that function, so
> that the tracepoint would be far from something obscure, which only few
> people could understand and maintain.
>
Again, this works fine so long as the number of static tracepoints is
small and manageable, but it seems like there's a division between what
the subsystem developer deems meaningful and what someone doing the
tracing might want to look at. Static tracepoints are completely
subjective; LTT proved that this was a problem regarding general
code-level intrusiveness when the number of tracepoints in relatively
close locality started piling up based on what people considered
arbitrarily useful, and LTTng doesn't appear to do anything to address
this.

This doesn't really match my definition of a negligible maintenance
burden..

2006-09-15 13:59:12

by Karim Yaghmour

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


Jes Sorensen wrote:
> Except as I pointed out, that everyone wants their info slightly
> differently so even trace points in the scheduler will be contentious
> and we will end up with a stack of them if we are to satisfy everyone.
> So now, you didn't demonstrate anything.

There is in my view, and this is what this whole debate is really
about, a clear difference between the types of instrumentation
being added. Clearly in the view of others there just isn't. But
bear with me. I submit to you that there are 3 classes of trace
points:

- OS-class: These are trace points which will be found in a given
kernel regardless of how it is implemented if it belongs to a
certain family of OSes. Linux being made to mimic Unix, it will
always have key events. And if you look closely at the initial
set of points added by ltt, these would be found in any Unix.
It's not for nothing that my paper on ltt was accepted at Usenix
2000 - and in fact during the question period somebody asked how
easy it would be to port it to BSD, and the answer: trivial.

- Subsystem-class: These are trace points which are specific to
a given implementation. Say block tracing, scsi tracing, etc. as
they are implemented in Linux. The purpose of these is to allow
a user of these given subsystems to get more in-depth understanding
of what's happening inside the box.

- Debug-class: These are trace points required to find difficult
problems such as race-conditions/etc. which are needed to debug
the OS.

I'm not arguing for the inclusion of debug tracepoints. I can see
that within a given subsystem there can be disagreement over the
placement of specific tracepoints, and this is where I think your
argument lies and it is not without merit - IOW such tracepoints
should be more carefully scrutinized. However, there are OS-class
tracepoints for which I hardly see any possible debate either in
terms of usefulness or in terms of maintainability.

> Given that I have used this stuff to more than just debug code, then
> this obviously doesn't apply.
...
> You refuse to take the big picture into account and then claim that
> there is no cost of doing things your way. Point being that once you
> start maintaining a large project such as the kernel, or just parts of
> it, you realize how much those 'zero cost' additions really cost.

Someone else alluded to the parallel between in-code comments and
documentation maintained separately. There is a cost to in-code
instrumentation in the same way that there is to in-code documentation.
And they, in fact, are very much alike.

> And how does there not? If you want to add tracepoints to the syscall
> path, then you will make an impact. It's non trivial to validate, yes
> I have seen some scary attempts of adding LTT tracecalls to the ia64
> syscall path, and just because it might not be compiled in in most cases
> that doesn't mean it doesn't raise the complexity.

Again, this is an implementation issue. If we have a way to mark up
code, then we can at least "hide" much of the scary stuff.

> As I said, kprobes are much more than kernel development! But you
> obviously haven't bothered looking at those properly! Been there done
> that!

I have, and taking an int3 on every tracepoint wasn't to my liking, nor
was having to chase kernel versions for binary editing. If I was going
to do maintenance, I was much happier to work with source than binary.

> It will be a drag because next week someone else wants a tracepoint
> 5 lines further down the code! Again, I have seen people try and do
> that on top of the old LTT patchsets, so maybe *you* didn't want the
> tracepoint somewhere else, but some people did! Next?

Not if you understand the distinction I am making above.

Now, I can understand that you may think: Karim, nobody is going to
fsck'ing care about the distinction you're making once this is in
the kernel. But for me this is a separate, but yet entirely relevant,
part of the debate. The argument here has already been pointed out
elsewhere: There are already subsystem maintainers and they are more
than capable of taking the appropriate decisions. The distinction I
make above is not esoteric.

> You are once again ignoring the point that not everyone needs the exact
> same view of things that you are looking for. Dynamic probes allows for
> that, doing that with static probes is going to turn into maintenance
> hell. Guess what, some of us still try to look after code 8-10 years
> after we wrote it initially.

I'm not ignoring that people have different needs. I'm being depicted
as endorsing static traces all over the place, and I'm not advocating
such a course of action. The only reason any argument against static
instrumentation can be made is if you consider it from the debug
point of view and what drag such instrumentation would have. There is
a big difference of purpose and of persistent relevance between
debug instrumentation and OS-class instrumentation. It's entirely
disingenuous to suggest otherwise.

Karim

2006-09-15 14:03:47

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Hi,

On Fri, 15 Sep 2006, Jes Sorensen wrote:

> Roman Zippel wrote:
> > The claim that these tracepoints would be maintainance burden is pretty
> > much unproven so far. The static tracepoint haters just assume the kernel
> > will be littered with thousands of unrelated tracepoints, where a good
> > tracepoint would only document what already happens in that function, so
> > that the tracepoint would be far from something obscure, which only few
> > people could understand and maintain.
>
> How do you propose to handle the case where two tracepoint clients wants
> slightly different data from the same function? I saw this with LTT
> users where someone wanted things in different places in schedule().
>
> It *is* a nightmare to maintain.

That nightmare would not be with tracepoints itself, but with the users of
it, so you're missing the point.
Tracepoints can be abused of course, but it's quite a leap to conclude
from this that they are bad in general.

> You still haven't explained your argument about kprobes not being
> generally available - where?

Huh? What kind of explanation do you want?

$ grep KPROBES arch/*/Kconf*
arch/i386/Kconfig:config KPROBES
arch/ia64/Kconfig:config KPROBES
arch/powerpc/Kconfig:config KPROBES
arch/sparc64/Kconfig:config KPROBES
arch/x86_64/Kconfig:config KPROBES

bye, Roman

2006-09-15 14:07:32

by Karim Yaghmour

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


Paul Mundt wrote:
> subjective, LTT proved that this was a problem regarding general
> code-level intrusiveness when the number of tracepoints in relatively
> close locality started piling up based on what people considered
> arbitrarily useful, and LTTng doesn't appear to do anything to address
> this.

"LTT proved that ..." what are you talking about? Have you noticed
the posting earlier regarding the fact that the ltt tracepoints did
not change over a 5 year span? **five** years ... Where do you get
this claim that ltt trace points "started piling up"? Have a look
at figure 2 of this article and let me know exactly which of those
tracepoints are actually a problem to you:
http://www.usenix.org/events/usenix2000/general/full_papers/yaghmour/yaghmour_html/index.html

Karim

2006-09-15 14:14:14

by Jes Sorensen

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Karim Yaghmour wrote:
> Paul Mundt wrote:
>> subjective, LTT proved that this was a problem regarding general
>> code-level intrusiveness when the number of tracepoints in relatively
>> close locality started piling up based on what people considered
>> arbitrarily useful, and LTTng doesn't appear to do anything to address
>> this.
>
> "LTT proved that ..." what are you talking about? Have you noticed
> the posting earlier regarding the fact that the ltt tracepoints did
> not change over a 5 year span? **five** years ... Where do you get
> this claim that ltt trace points "started piling up"? Have a look
> at figure 2 of this article and let me know exactly which of those
> tracepoints are actually a problem to you:

Because other people have tried to use LTT for additional projects,
but said projects haven't been integrated into LTT. In other words,
just because *you* haven't added those, doesn't mean someone else
won't try and do it later, if LTT was integrated.

Nice try!

Jes

2006-09-15 14:15:10

by Alan

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Ar Gwe, 2006-09-15 am 16:03 +0200, ysgrifennodd Roman Zippel:
> Huh? What kind of explanation do you want?
>
> $ grep KPROBES arch/*/Kconf*
> arch/i386/Kconfig:config KPROBES
> arch/ia64/Kconfig:config KPROBES
> arch/powerpc/Kconfig:config KPROBES
> arch/sparc64/Kconfig:config KPROBES
> arch/x86_64/Kconfig:config KPROBES

Send patches. The fact nobody has them implemented on your platform
isn't a reason to implement something else, quite the reverse in fact.

Alan

2006-09-15 14:18:10

by Alan

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Ar Gwe, 2006-09-15 am 15:34 +0200, ysgrifennodd Roman Zippel:
> > Maintainability ? common good over individual weirdnesses ? Ability for
> > people to concentrate on getting one good set of interfaces not twelve
> > bad ones ? Consistency for user space ?
>
> Alan, you're making things up without any proof.

Welcome to my killfile. There isn't much point having a discussion with
anyone who considers any view or fact not in agreement as "no proof" and
any view or fact that favours them as "proven".

In the meantime perhaps the saner members of the static trace brigade
can explain why gcc debug data isn't good enough for them when it's good
enough for kgdb to do single stepping at source level and variable
printing?

Alan

2006-09-15 14:21:50

by Karim Yaghmour

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


Jes Sorensen wrote:
> Because other people have tried to use LTT for additional projects,
> but said projects haven't been integrated into LTT. In other words,
> just because *you* haven't added those, doesn't mean someone else
> won't try and do it later, if LTT was integrated.

Thank you. I will take it as a compliment and likely laminate this
email for your suggestion that I've acted responsibly in my
maintenance of ltt. Boy, can you imagine what this debate would
have looked like if I had included precisely those additional
projects ...

C'mon Jes, if I was able to responsibly maintain ltt over 5
years *out* of the tree and I'm being labeled as incompetent all
over this thread, then imagine what the very competent people
maintaining the kernel could actually do.

Karim

2006-09-15 14:25:48

by Karim Yaghmour

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


Alan Cox wrote:
> In the meantime perhaps the saner members of the static trace brigade
> can explain why gcc debug data isn't good enough for them when its good
> enough for kgdb to do single stepping at source level and variable
> printing ?

Care to explain how I can use that to implement the equivalent of this:

@@ -1709,6 +1712,7 @@ switch_tasks:
++*switch_count;

prepare_arch_switch(rq, next);
+ TRACE_SCHEDCHANGE(prev, next);
prev = context_switch(rq, prev, next);
barrier();

Also, care to explain how kprobes can be used to access the same data
without having to actually customize a probe point for every binary?

Thanks,

Karim

2006-09-15 14:28:43

by Paul Mundt

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

On Fri, Sep 15, 2006 at 10:31:51AM -0400, Karim Yaghmour wrote:
> Jes Sorensen wrote:
> > Because other people have tried to use LTT for additional projects,
> > but said projects haven't been integrated into LTT. In other words,
> > just because *you* haven't added those, doesn't mean someone else
> > won't try and do it later, if LTT was integrated.
>
> Thank you. I will take it as a complement and likely laminate this
> email for your suggestion that I've acted responsibly in my
> maintenance of ltt. Boy, can you imagine what this debate would
> have looked like if I had included precisely those additional
> projects ...
>
Which brings back the point of static tracepoints being entirely
subjective. By this line of reasoning, you define for other people what
the useful tracepoints are, and couldn't care less which points they're
actually interested in. How exactly is this serving the need of people
looking for instrumentation, rather than a pre-canned view of what they
can trace? If they already have to go with their own tracepoints for the
things they're interested in, then having a few static points
pre-existing doesn't really buy anyone much else either, especially if
by your own admission you're not integrating the points that people
_are_ interested in.

I'm not indicating that you didn't do exactly what you should have in
this situation, only that static tracepoints in general are only going
to be a small part of the picture, and not a complete solution to most
people on their own. Dynamic instrumentation fills the same sort of gap
without worrying about arbitrary maintenance, so what exactly does
shoving static instrumentation into the kernel buy us?

2006-09-15 14:31:31

by Jes Sorensen

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Karim Yaghmour wrote:
> Jes Sorensen wrote:
> There is in my view, and this is what this whole debate is really
> about, a clear difference in between the type of instrumentation
> being added. Clearly in the view of others there just isn't. But
> bear with me. I submit to you that there are 3 classes of trace
> points:
>
> - OS-class: These are trace points which will be found in a given
> kernel regardless of how it is implemented if it belongs to a
> certain family of OSes. Linux being made to mimic Unix, it will
> always have key events. And if you look closely at the initial
> set of points added by ltt, these would be found in any Unix.
> It's not for nothing that my paper on ltt was accepted at Usenix
> 2000 - and in fact during the question period somebody asked how
> easy it would be to port it to BSD, and the answer: trivial.

There are very few tracepoints in this category; the only things you can
claim are more or less generic are syscalls, and tracing syscall
handling is tricky.

> - Subsystem-class: These are trace points which are specific to
> a given implementation. Say block tracing, scsi tracing, etc. as
> they are implemented in Linux. The purpose of these is to allow
> a user of these given subsystems to get more in-depth understanding
> of what's happening inside the box.

This is grossly oversimplifying things, and why the whole thing doesn't
hold water. There is no such thing as 'the place' to put a specific
tracepoint.

Especially when we start talking about things like tracepoints in the
scheduler.

Note that I haven't been referring to debug tracepoints at any point in
this debate.

>> It will be a drag because next week someone else wants a tracepoint
>> 5 lines further down the code! Again, I have seen people try and do
>> that on top of the old LTT patchsets, so maybe *you* didn't want the
>> tracepoint somewhere else, but some people did! Next?
>
> Not if you understand the distinction I am making above.

Your distinction above doesn't hold water, but I did understand it
very well ....

You seem to think that it's fine to add instrumentation in the syscall
path as an example as long as it's compiled out. Well on some
architectures, the syscall path is very sensitive to alignment and there
may be restrictions on how large the stub of code is allowed to be, like
a few hundred bytes. Just because things work one way on x86, doesn't
mean they work like that everywhere.

Jes

2006-09-15 14:34:48

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Hi,

On Fri, 15 Sep 2006, Alan Cox wrote:

> Ar Gwe, 2006-09-15 am 16:03 +0200, ysgrifennodd Roman Zippel:
> > Huh? What kind of explanation do you want?
> >
> > $ grep KPROBES arch/*/Kconf*
> > arch/i386/Kconfig:config KPROBES
> > arch/ia64/Kconfig:config KPROBES
> > arch/powerpc/Kconfig:config KPROBES
> > arch/sparc64/Kconfig:config KPROBES
> > arch/x86_64/Kconfig:config KPROBES
>
> Send patches. The fact nobody has them implemented on your platform
> isn't a reason to implement something else, quite the reverse in fact.

Alan, you offer no facts at all, and anything I could say about this is rather
emotional and potentially offensive, so I'll refrain from further
comments. The anti-tracepoint league has made up its mind anyway, so
what's the point... :-(

bye, Roman

2006-09-15 14:35:53

by Alan

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Ar Gwe, 2006-09-15 am 10:35 -0400, ysgrifennodd Karim Yaghmour:
> Care to explain how I can use that to implement the equivalent of this:
>
> @@ -1709,6 +1712,7 @@ switch_tasks:
> ++*switch_count;
>
> prepare_arch_switch(rq, next);
> + TRACE_SCHEDCHANGE(prev, next);
> prev = context_switch(rq, prev, next);
> barrier();

The gdb debug data lets you find each line and also the variable
assignments (except when highly optimised in some cases). Try
breakpointing there with kgdb and using "where"... A kgdb script is the
wrong way to do instrumentation but it does demonstrate the information
is already out there, automatically generated and self maintaining.

You do need the gcc -g debug data, but equally if it was static you'd
need to recompile with the tracepoint because it would be off by
default, and there is a very small risk in both cases you'll disturb or
change the code behaviour/flow.

> Also, care to explain how kprobes can be used to access the same data
> without having to actually customize a probe point for every binary?

That's why we have things like systemtap.

All we appear to lack is the ability for systemtap to parse debug data so it
can be told "trace on line 9 of sched.c and record rq and next"

Alan

2006-09-15 14:40:38

by Jes Sorensen

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Karim Yaghmour wrote:
> Jes Sorensen wrote:
>> Because other people have tried to use LTT for additional projects,
>> but said projects haven't been integrated into LTT. In other words,
>> just because *you* haven't added those, doesn't mean someone else
>> won't try and do it later, if LTT was integrated.
>
> Thank you. I will take it as a complement and likely laminate this
> email for your suggestion that I've acted responsibly in my
> maintenance of ltt. Boy, can you imagine what this debate would
> have looked like if I had included precisely those additional
> projects ...

Karim,

Thank you for this, it just proves that taking this discussion any
further is a waste of everybody's time.

> C'mon Jes, if I was able to responsibly maintain ltt over 5
> years *out* of the tree and I'm being labeled as incompetent all
> over this thread, then imagine what the very competent people
> maintaining the kernel could actually do.

Nobody ever said you were irresponsible, but you are claiming that you
are able to define a finite set of static tracepoints that are relevant
to everybody. Or in other words, they are defined as being the ones
relevant to you.

Please read Paul Mundt's response to your email - it's bang on, couldn't
put it any better myself.

Jes

2006-09-15 14:43:10

by Karim Yaghmour

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


Paul Mundt wrote:
> Which brings back the point of static tracepoints being entirely
> subjective. By this line of reasoning, you define for other people what
> the useful tracepoints are, and couldn't care less which points they're
> actually interested in. How exactly is this serving the need of people
> looking for instrumentation, rather than a pre-canned view of what they
> can trace? If they already have to go with their own tracepoints for the
> things they're interested in, then having a few static points
> pre-existing doesn't really buy anyone much else either, especially if
> by your own admission you're not integrating the points that people
> _are_ interested in.
>
> I'm not indicating that you didn't do exactly what you should have in
> this situation, only that static tracepoints in general are only going
> to be a small part of the picture, and not a complete solution to most
> people on their own. Dynamic instrumentation fills the same sort of gap
> without worrying about arbitrary maintenance, so what exactly does
> shoving static instrumentation in to the kernel buy us?

And this flies in the face of all of those who, for years, have been
satisfied customers of ltt and who were more than looking forward
to not having to depend on me to get a working traceable kernel.

The static tracepoints we maintained were *the* solution for a great
many people. As a maintainer I had two choices with those who
were not content:
a- Maintain their tracepoints for them -- not happening.
b- Suggest they contribute to helping getting a generic tracing
infrastructure into the kernel and then make their case on the
lkml as to the pertinence of their instrumentation.

And what I did is "b". I wasn't going to defend anybody else's
choice of tracepoints. Those who were using ltt for its designated
purpose -- allowing normal users and developers to get an accurate
view of the behavior of their system -- were very happy with it.

You want to know who was unhappy with using it: kernel developers.
It just wasn't geared for them. Which goes back to my earlier
arguments ...

Karim

2006-09-15 14:47:12

by Martin Bligh

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

> Which brings back the point of static tracepoints being entirely
> subjective. By this line of reasoning, you define for other people what
> the useful tracepoints are, and couldn't care less which points they're
> actually interested in. How exactly is this serving the need of people
> looking for instrumentation, rather than a pre-canned view of what they
> can trace? If they already have to go with their own tracepoints for the
> things they're interested in, then having a few static points
> pre-existing doesn't really buy anyone much else either, especially if
> by your own admission you're not integrating the points that people
> _are_ interested in.

They're not *entirely* subjective, though I agree some are. I find the
fact that Andrew Morton, myself, and apparently several other people
have all instrumented the memory reclaim code to tell you *why* it's
failing to reclaim pages at various points in time slightly amusing,
but also rather depressing. It's all rather a waste of effort.

Moreover, subsystem experts know what needs to be traced in order to
give useful information, and the users may not. It's a damned sight
easier for them to say "oh, please turn on tracing for VM events
and send me the output" than custom-construct a set of probes for
that user, and send them off. There's a barrier to entry there that
means it just won't happen.

Hell, look at all the debug printks in the kernel for example, and
the various small ad-hoc tracing facilities. If all we do is unite
those, it'll still be a step forwards.

> I'm not indicating that you didn't do exactly what you should have in
> this situation, only that static tracepoints in general are only going
> to be a small part of the picture, and not a complete solution to most
> people on their own. Dynamic instrumentation fills the same sort of gap
> without worrying about arbitrary maintenance, so what exactly does
> shoving static instrumentation in to the kernel buy us?

Dynamic probes do NOT reduce maintenance, they increase it. They just
push it into somebody else's lap, where it's done more inefficiently.
That's not a solution. The question is what's ad-hoc debug for a
particular problem vs. what's generically useful. I refuse to believe
that the subsystem maintainers are too stupid to be able to make that
judgement call.

M.

2006-09-15 14:47:28

by Karim Yaghmour

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


Alan Cox wrote:
> The gdb debug data lets you find each line and also the variable
> assignments (except when highly optimised in some cases). Try
> breakpointing there with kgdb and using "where"... A kgdb script is the
> wrong way to do instrumentation but it does demonstrate the information
> is already out there, automatically generated and self maintaining.
>
> You do need the gcc -g debug data, but equally if it was static you'd
> need to recompile with the tracepoint because it would be off by
> default, and there is a very small risk in both cases you'll disturb or
> change the code behaviour/flow.
...
> Thats why we have things like systemtap.
>
> All we appear to lack is systemtap ability to parse debug data so it can
> be told "trace on line 9 of sched.c and record rq and next"

Thanks for the explanation. But I submit to you that both explanations
actually highlight the argument I was making earlier: dynamic tracing
(and gdb info in this case) actually requires a non-expert to chase
kernel versions and create appropriate scripts/config-info for the
post-insertion of instrumentation, with the risks to kernel developers
this may have (e.g. a bug report to lkml from a user claiming to have
discovered a problem in a subsystem when, in fact, a trace point by an
external maintainer was ill-chosen).

Cheers,

Karim

2006-09-15 14:54:16

by Karim Yaghmour

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


Jes Sorensen wrote:
> Thank you for this, it just proves that taking this discussion any
> further is a waste of everybody's time.

Sorry you feel this way.

> Nobody ever said you were irresponsible, but you are claiming that you
> are able to define a finite set of static tracepoints that are relevant
> to everybody. Or in other words, they are defined as being the ones
> relevant to you.

No, I'm precisely not claiming that the tracepoints I was looking for
were "relevant to everybody". They are, however, very relevant to any
standard sysadmin or developer who wants to get a better picture of
what his kernel is doing. Again, please refer to figure 2 of this
article and explain to me why it's not relevant for standard users
and developers to understand when these events happen inside the
kernel:
http://www.usenix.org/events/usenix2000/general/full_papers/yaghmour/yaghmour_html/index.html

Karim

2006-09-15 14:59:30

by Alan

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Ar Gwe, 2006-09-15 am 07:46 -0700, ysgrifennodd Martin J. Bligh:
> Moreover, subsystem experts know what needs to be traced in order to
> give useful information, and the users may not. It's a damned sight
> easier for them to say "oh, please turn on tracing for VM events
> and send me the output" than custom-construct a set of probes for
> that user, and send them off. There's a barrier to entry that just
> won't happen there.

That has nothing to do with the static or dynamic probe question.
Scriptable dynamic probes do everything your static probes do and more.

> Hell, look at all the debug printks in the kernel for example, and
> the various small add-hoc tracing facilities. If all we do is unite
> those, it'll still be a step forwards.

Look how many there are, look how they spread, tracepoints will do the
same.

> Dynamic probes do NOT reduce maintenance, they increase it.

That's a logical fallacy to begin with. A dynamic probe can probe
anything a static probe can. So a static probe can be implemented with a
dynamic probe.

In other words if you like static probe lists and your subsystem happens
to be one where it is useful then you can script it with the same effect
and send people the script.

With kprobes you've got a passably good chance (i.e. if distros can be
persuaded to package the debug data) that you can say "run this
systemtap script". With static tracepoints its "recompile your vendor
kernel in your vendor manner with your vendor initrd and add it to the
boot loader"

Alan

2006-09-15 15:00:05

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

On Fri, 2006-09-15 at 10:51 -0400, Karim Yaghmour wrote:
> And what I did is "b". I wasn't going to defend anybody else's
> choice of tracepoints. Those who were using ltt for its designated
> purpose -- allowing normal users and developers to get an accurate
> view of the behavior of their system -- were very happy with it.
>
> You want to know who was unhappy with using it: kernel developers.
> It just wasn't geared for them. Which goes back to my earlier
> arguments ...

What do you want to prove with this rant? Simply the fact that your
view of tracing does not match the view of others. Nothing else.

You just made it clear that your solution was and still is targeted at
one single user group.

Nobody is opposing instrumentation per se, we just need to figure out a
good solution suitable for endusers, kernel developers, debug
fetishists ... without splattering ten different tracers all across the
kernel source.

The way to a solid kernel instrumentation is definitely not by pushing a
single purpose solution in, which we have to _maintain_ for a long time
without being convinced that it is the _best_ technical solution we can
have right now.

tglx


2006-09-15 15:01:35

by Alan

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Ar Gwe, 2006-09-15 am 10:51 -0400, ysgrifennodd Karim Yaghmour:
> The static tracepoints we maintained were *the* solution for a great

I think you mean "a" solution. You've not proved there are no others.

> deal many people. As a maintainer I had two choices with those who
> were not content:
> a- Maintain their tracepoints for them -- not happening.
> b- Suggest they contribute to helping getting a generic tracing
> infrastructure into the kernel and then make their case on the
> lkml as to the pertinence of their instrumentation.

b has been done, it's called kprobes. We just need better tools for the
dynamic probes.

> choice of tracepoints. Those who were using ltt for its designated
> purpose -- allowing normal users and developers to get an accurate
> view of the behavior of their system -- were very happy with it.

and you can maintain "Karim's probe list" which is the dynamic probe set
which matches your old static probes, only of course its now much more
flexible.

2006-09-15 15:02:48

by Karim Yaghmour

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


Jes Sorensen wrote:
> There are very few tracepoints in this category,

Wow, that's progress.

> the only things you can
> claim are more or less generic are syscalls, and tracing syscall
> handling is tricky.

If there are implementation issues, I trust an adequate solution can be
found by using the tested-and-proven method of posting stuff on the
lkml for review.

> This is grossly over simplifying things and why the whole things doesn't
> hold water. There is no such thing as 'the place' to put a specific
> tracepoint.
>
> Especially when we start talking about things like tracepoints in the
> scheduler.

I do not underestimate the difficulty of selecting such tracepoints.
This is why I chose not to maintain other people's specific tracepoints.
I realize this is a tough problem, but I also trust subsystem maintainers
are smart enough to make the appropriate decision. Obviously for
things like the scheduler, any fine-grained instrumentation will draw
a barrage of criticism from anyone since a lot of stuff depends on it.
Either the lkml process works or it doesn't, but it isn't for me to
decide.

> Note that I haven't been referring to debug tracepoints at any point in
> this debate.

You're right, but others have happily intermingled the whole lot, and
I just wanted to document my personal categorization on lkml for all
to see.

> You seem to think that it's fine to add instrumentation in the syscall
> path as an example as long as it's compiled out. Well on some
> architectures, the syscall path is very sensitive to alignment and there
> may be restrictions on how large the stub of code is allowed to be, like
> a few hundred bytes. Just because things work one way on x86, doesn't
> mean they work like that everywhere.

If ltt failed to implement such things appropriately, then we apologize.
That fact doesn't preclude proper implementation in the future, however.

Karim

2006-09-15 15:04:49

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

* Ingo Molnar ([email protected]) wrote:
>
> * Michel Dagenais <[email protected]> wrote:
>
> > This is the crucial point. Using an INT3 at each dynamic tracepoint is
> > both costly and is a larger perturbation on the system under study.
> > [...]
>
> have you measured this?
>

Hi Ingo,

A very quick test (yes, done in user space, but should be accurate enough for
our needs) on a pentium 4 3 GHz shows that generating an int3 breakpoint in a
loop (connected to an empty handler) takes an average of 2.01µs per breakpoint.

LTT has an impact of about 0.220µs per probe (10 times smaller).
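
For reference, a minimal user-space sketch of that kind of measurement (purely
illustrative, not the actual test program: it assumes x86, an empty SIGTRAP
handler and TSC-based timing; absolute numbers will vary by CPU):

/* Hypothetical sketch of such a user-space test: time an int3 trap
 * dispatched to an empty SIGTRAP handler, in a loop. */
#include <signal.h>
#include <stdint.h>
#include <stdio.h>

#define ITERATIONS 100000

static void empty_handler(int sig) { (void)sig; /* empty probe handler */ }

static inline uint64_t rdtsc(void)
{
        uint32_t lo, hi;
        asm volatile("rdtsc" : "=a"(lo), "=d"(hi));
        return ((uint64_t)hi << 32) | lo;
}

int main(void)
{
        uint64_t start, end;
        int i;

        signal(SIGTRAP, empty_handler);

        start = rdtsc();
        for (i = 0; i < ITERATIONS; i++)
                asm volatile("int3");   /* trap, run handler, resume here */
        end = rdtsc();

        printf("int3 round trip: %llu cycles per iteration\n",
               (unsigned long long)((end - start) / ITERATIONS));
        return 0;
}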

Please refer to this kind of high event rate workload:
http://www.listserv.shafik.org/pipermail/ltt-dev/2005-December/001139.html

On the same pentium 4, 3 GHz (in the following results, I do not consider the
fact that the CPU had hyperthreading enabled):

Probe execution time at probe site: 220ns/event

220ns * 9588836 events = 2.11s

Event rate: 749994 events per second

LTT:
749994 events/s * 0.220µs/event = 16.5 % of cpu time

With a breakpoint:
749994 events/s * 2.01µs/event = 150 % of cpu time


Considering the limitations of these tests:
- int3 timings taken from user space, which implies calling an empty handler in
user space.
- The machine had hyperthreading enabled, but is considered UP here.

It shows that tracing the same workload with breakpoints would make the machine
more than twice as slow, whereas a direct memory write has a relatively small
impact (16.5% of cpu time spent in probes).

In high event rate/low perturbation scenarios where instrumentation is put at
arbitrary locations in the code, it proves necessary to use the static
instrumentation alternative, because the breakpoint approach is just too slow.

Mathieu


OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2006-09-15 15:13:48

by Karim Yaghmour

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


Alan Cox wrote:
> b has been done, its called kprobes. We just need better tools for the
> dynamic probes.

As long as some outside piece of something needs updating, then "b"
hasn't been done. Especially with regard to what this means for
figuring out which of the kernel or the instrumentation script is broken
when you get bug reports on lkml.

> and you can maintain "Karim's probe list" which is the dynamic probe set
> which matches your old static probes, only of course its now much more
> flexible.

Sorry, the issue isn't about my probe list. The issue is that there
needs to be a way of pointing out important events without having to
modify things at 3 or 4 different places. The only way this can be
done is if it's in the tree -- regardless of the mechanism. This
isn't about static tracers vs. dynamic tracers, it's about statically
marking code. What goes underneath is secondary. And if the static
markup -- which even the SystemTap people are interested in -- is
but a hook for further selecting the appropriate instrumentation
mechanism, then that's fine too.

Karim

2006-09-15 15:18:22

by Karim Yaghmour

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


Thomas Gleixner wrote:
> You just made it clear, that your solution was and still is targeted on
> one single user group.

And that was part of my point. Every time I got in a debate on lkml
regarding ltt, there were crowds screaming in horror at the possibility
of trace points everywhere.

> Nobody is opposing instrumentation per se, we just need to figure out a
> good solution suitable for endusers, kernel developers, debug
> fetishists ... without splattering ten different tracers all across the
> kernel source.

I agree entirely.

> The way to a solid kernel instrumentation is definitely not by pushing a
> single purpose solution in, which we have to _maintain_ for a long time
> without being convinced that it is the _best_ technical solution we can
> have right now.

I think we're in full agreement. A solid kernel instrumentation mechanism
is exactly what is needed. The whole point of posting the ltt stuff on
the lkml is exactly to get the best technical solution. The ltt developers
are more than happy to take suggestions as to how to achieve this.

Karim

2006-09-15 15:38:25

by Michel Dagenais

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


> only 5 bytes of NOP are needed by default, so that a kprobe can insert a
> call/callq instruction. The easiest way in practice is to insert a
> _single_, unconditional function call that is patched out to NOPs upon
> its first occurrence (doing this is not a performance issue at all). That
> way the only cost is the NOP and the function parameter preparation
> side-effects. (which might or might not be significant - with register
> calling conventions and most parameters being readily available it
> should be small.)

Interestingly, while this whole thread is full of diverging views, there
is nevertheless considerable common ground.

- Getting a trace output is very useful, whether it is generated from
dynamic or static tracepoints. You need some infrastructure (e.g.
relayfs + a few things) to get the data out efficiently.

- Some sort of static markers make sense in key locations. Whether they
are there "primarily" for dynamic or static tracepoints is mostly
irrelevant. Interesting suggestions were made for a syntax clearly
identifying their "probe point" status.

From there we can get onto a constructive debate about the technical
details of each of these components.

> note that such a limited, minimally invasive 'data extraction point'
> infrastructure is not actually what the LTT patches are doing. It's not
> even close, and i think you'll be surprised. Let me quote from the
> latest LTT patch (patch-2.6.17-lttng-0.5.108, which is the same version
> submitted to lkml - although no specific tracepoints were submitted):

This is a case where it started with inline code but as you take into
account SMP and eventually multiple traces (e.g. the sysadmin is tracing
the system and a user is generating a trace for his processes) it
becomes larger and inlining may not be such a good idea any more, to say
the least. However, this is relatively easy to change.

It is also worth mentioning that code patching NOPs to minimize the cost
of inactive tracepoints was envisioned quite some time ago. Again you
might call these "static low overhead placeholders for optimized dynamic
tracepoints" or "optimized low overhead static tracepoints"... You need
however to be careful when code patching instructions on SMP as it may
not be trivial to atomically replace 5 NOPs by a call.
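
(For illustration only -- not taken from any posted patch -- such a
placeholder could look roughly like the sketch below on 32-bit x86: a 5-byte
NOP in .text whose address and name are recorded in a dedicated section for a
later patching pass; parameter preparation and the patching code itself are
left out.)

/* Hypothetical marker sketch: emit a 5-byte NOP and record its address
 * so a runtime component could later patch it into a call instruction.
 * Section and macro names are invented; .long assumes a 32-bit kernel. */
#define TRACE_MARK(name)                                                \
        asm volatile("661:\n\t"                                         \
                     ".byte 0x0f, 0x1f, 0x44, 0x00, 0x00\n\t"           \
                     ".pushsection __trace_markers, \"a\"\n\t"          \
                     ".long 661b\n\t"          /* address of the NOP */ \
                     ".asciz \"" name "\"\n\t" /* marker name */        \
                     ".popsection")

/* usage: TRACE_MARK("sched_switch"); at the point to be instrumented */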

2006-09-15 15:48:00

by Martin Bligh

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Alan Cox wrote:
> Ar Gwe, 2006-09-15 am 07:46 -0700, ysgrifennodd Martin J. Bligh:
>> Moreover, subsystem experts know what needs to be traced in order to
>> give useful information, and the users may not. It's a damned sight
>> easier for them to say "oh, please turn on tracing for VM events
>> and send me the output" than custom-construct a set of probes for
>> that user, and send them off. There's a barrier to entry that just
>> won't happen there.
>
> That has nothing to do with the static or dynamic probe question.
> Scriptable dynamic probes do everything your static probes do and more.

No. The point is that they're not *there* and have to be modified
for every kernel version. And do you mean with or without the markers
in the code to tell the dynamic probes where to hook in, and what data
to fetch? That makes a huge difference.

Suppose, as a very real example, I want to instrument shrink_list.
There are 20 or so places where it can switch what we're doing with
a page for different reasons. Potentially we're scanning through many
thousands of pages. If I can keep counters as I go through the function,
and then do one trace entry at the end, that's fairly efficient. If I
have to create 20 separate hooks that all jump out of line, it's going
to be a lot slower. If I log a tracepoint at every damned page every
time it switches, it's going to be a nightmare.
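
(To illustrate the pattern -- with the struct, the counters and the
TRACE_VMSCAN_STATS() hook all invented for the example, not taken from any
actual patch -- something along these lines:)

/* Hypothetical sketch of "count as you go, emit one record at the end".
 * TRACE_VMSCAN_STATS() stands in for whatever single trace hook is used;
 * here it just prints, so the example is self-contained. */
#include <stdio.h>

#define TRACE_VMSCAN_STATS(c) \
        printf("reclaimed=%lu writeback=%lu activated=%lu\n", \
               (c)->nr_reclaimed, (c)->nr_writeback, (c)->nr_activated)

struct vmscan_counters {
        unsigned long nr_reclaimed;
        unsigned long nr_writeback;
        unsigned long nr_activated;
};

static void scan_pages_example(int nr_pages)
{
        struct vmscan_counters c = { 0, 0, 0 };
        int i;

        for (i = 0; i < nr_pages; i++) {
                /* per-page decision: bump a counter instead of logging
                 * a separate tracepoint for every page */
                if (i % 3 == 0)
                        c.nr_reclaimed++;
                else if (i % 3 == 1)
                        c.nr_writeback++;
                else
                        c.nr_activated++;
        }

        TRACE_VMSCAN_STATS(&c);     /* one trace record for the whole pass */
}

int main(void) { scan_pages_example(1000); return 0; }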

Most things can be done with dynamic probes. Some things will require
markers in the code to tell us sustainably over time where to attach
them. A few things (like the above) probably require some explicit
code.

>> Hell, look at all the debug printks in the kernel for example, and
>> the various small add-hoc tracing facilities. If all we do is unite
>> those, it'll still be a step forwards.
>
> Look how many there are, look how they spread, tracepoints will do the
> same.

As long as they all use the same infrastructure, that's an improvement.

>> Dynamic probes do NOT reduce maintenance, they increase it.
>
> Thats a logical fallacy to begin with. A dynamic probe can probe
> anything a static probe can. So a static probe can be implemented with a
> dynamic probe.

In the absence of the markers, I don't think that's true - there's the
maintenance of exactly where they go, plus access to local data. If you
mean with markers, then yes, that's fine. The markers + dynamic probes
seems to be a reasonable compromise between the two. Exactly what we
call that combo, static or dynamic, I don't really care ;-)

> In other words if you like static probe lists and your subsystem happens
> to be one where it is useful then you can script it with the same effect
> and send people the script.
>
> With kprobes you've got a passably good chance (ie if Distros can be
> persuaded to package the debug data) that you can say "run this
> systemtap script". With static tracepoints its "recompile your vendor
> kernel in your vendor manner with your vendor initrd and add it to the
> boot loader"

You're thinking of one situation where you can't recompile. I'm thinking
of a situation where it's trivial to recompile. Both exist, neither is
invalid. Of course, where possible, we'd like to be able to add stuff
on the fly, but it's not a panacea.

Without the markers, maintaining a usable set of dynamic probe points
that's always available for every kernel version seems infeasible.
With them, I think it'll cover 99% of the cases, and would be pretty
useful. If people agree on putting tags in there, perhaps we can
discuss things like the logging mechanism, format, and readout.
If not, I suppose we have to drag this debate out even longer.

M.

2006-09-15 16:33:10

by Jose R. Santos

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Frank Ch. Eigler wrote:
> "Martin J. Bligh" <[email protected]> writes:
>
> > without all the awk-style language crap that seems to come with
> > systemtap.
>
> I'm sorry to hear you dislike the scripting language. But that's
> okay, you Real Men can embed literal C code inside systemtap scripts
> to do the Real Work, and leave to systemtap only sundry duties such as
> probe placement and removal.
>

There are also a couple of projects within SystemTap that provide trace-
like functionality without the need to use the SystemTap language. In
the case of LKET, we've tried to make this as simple as possible by
predefining probe points using the SystemTap language and embedded C
code, but from a user's perspective all they really need to do is
invoke a simple script like:

#! stap
process_snapshot() {}
addevent.tskdispatch.cpuidle {}
addevent.process {}
addevent.syscall.entry { printf ("%4b", $flags) }
addevent.syscall.exit {}
addevent.tskdispatch.cpuidle {}

The data can later be analyzed in user-space with whatever method you
like. The developer instrumenting the probe point needs to know the
SystemTap language, but the user of the trace just needs to know which
events are available to him.

We also plan to do static tracing once SystemTap supports static
markers. This may not be the perfect solution, but I'm interested in
knowing how we can get there.

-JRS

2006-09-15 16:59:19

by Tim Bird

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Alan Cox wrote:
> Ar Gwe, 2006-09-15 am 10:35 -0400, ysgrifennodd Karim Yaghmour:
>> @@ -1709,6 +1712,7 @@ switch_tasks:
>> ++*switch_count;
>>
>> prepare_arch_switch(rq, next);
>> + TRACE_SCHEDCHANGE(prev, next);
>> prev = context_switch(rq, prev, next);
>> barrier();
>
> All we appear to lack is systemtap ability to parse debug data so it can
> be told "trace on line 9 of sched.c and record rq and next"

If the latter is a suggestion for how an out-of-tree rule for a
tracepoint definition should look, it's a terrible one.
Alan's example is much more fragile, from a maintenance perspective,
than Karim's. Plus, it's much more difficult to implement, whether
you plan to inject no-ops at compile time, just record locations and
stack offsets, or actually place some tracing code (heaven forbid)
that the compiler could optimize for that context.

I still think that this is off-topic for the patch posted. I think we
should debate the implementation of tracepoints/markers when someone posts a
patch for some. I think it's rather scurrilous to complain about
code NOT submitted. Ingo has even mis-characterized the not-submitted
instrumentation patch, by saying it has 350 tracepoints when it has no
such thing. I counted 58 for one architecture (with only 8 being
arch-specific).
-- Tim

=============================
Tim Bird
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Electronics
=============================

2006-09-15 17:09:20

by Frank Ch. Eigler

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Alan Cox <[email protected]> writes:

> [...]
> >
> > prepare_arch_switch(rq, next);
> > + TRACE_SCHEDCHANGE(prev, next);
> > prev = context_switch(rq, prev, next);
> > barrier();
>
> The gdb debug data lets you find each line and also the variable
> assignments (except when highly optimised in some cases). [...]

Unfortunately, variables and even control flow are quite regularly
made non-probe-capable by modern gcc. Statement boundaries and
variables are not preserved. There is an arms race within gcc to both
improve code optimization and its own "reverse-engineering" debugging
data generation, and the former is always ahead.

The end result is that there are many spots that we'd like to probe in
systemtap, but can't place exactly or extract all the data we'd like.
Really.

There are also spots that for other reasons cannot tolerate a fully
dynamic kprobes-style probe:

- where 1000-cycle int3-dispatching overheads too high
- in low-level code such as fault handling or locking, that, if probed
dynamically, could entail infinite regress
- debugging information may not be available

This is the reason why I'm in favour of some lightweight event-marking
facility: a way of catching those points where dynamic probing is not
sufficiently fast or dependable.

> [...]
> All we appear to lack is systemtap ability to parse debug data so it can
> be told "trace on line 9 of sched.c and record rq and next"

Actually:

#! stap
probe kernel.function("*@kernel/sched.c:9") { printf("%p %p", $rq, $next) }

- FChE

2006-09-15 17:14:14

by Jose R. Santos

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Mathieu Desnoyers wrote:
> * Ingo Molnar ([email protected]) wrote:
> >
> > * Roman Zippel <[email protected]> wrote:
> >
> > the key point is that we want _zero_ "static tracepoints". Firstly,
> > static tracepoints are fundamentally limited:
> >
> > - they can only be added at the source code level
> >
> > - modifying them requires a reboot which is not practical in a
> > production environment
>
> Not for kernel modules : unload/load is enough.
>

This assumes that the module can be unloaded in the first place.
Inserting a new probe on the disk controller for your boot drive or in
the filesystem module would still require a reboot.

> If the trace points are modified with the code by the ones who make the
> original code changes, it lessens the maintainance overhead. Furthermore, if
> there is a major change in a code path that requires rethinking the trace
> points, the person introducing the change has the best knowledge of what to do
> with the trace point. I think that trace point maintainance should be left to
> subsystem maintainers, not a centralised task done by distributions once in a
> while.
>

I agree with you here; I think it is silly to claim dynamic instrumentation
as a fix for the "constant maintenance overhead" of static trace points.
Working on LKET, one of the biggest burdens that we've had is maintaining
the probe points when something in the kernel changes enough to cause a
breakage of the dynamic instrumentation. The solution to this is having
the SystemTap tapsets maintained by the subsystem maintainers so that
changes in the code can be applied to the dynamic instrumentation as
well. This of course means that the subsystem maintainer would need to
maintain two pieces of code instead of one. There are a lot of
advantages to dynamic vs static instrumentation, but I don't think
maintenance overhead is one of them.

-JRS

2006-09-15 17:21:33

by Chuck Ebbert

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

In-Reply-To: <[email protected]>

On Fri, 15 Sep 2006 15:37:51 +0100, Alan Cox wrote:

> > $ grep KPROBES arch/*/Kconf*
> > arch/i386/Kconfig:config KPROBES
> > arch/ia64/Kconfig:config KPROBES
> > arch/powerpc/Kconfig:config KPROBES
> > arch/sparc64/Kconfig:config KPROBES
> > arch/x86_64/Kconfig:config KPROBES
>
> Send patches. The fact nobody has them implemented on your platform
> isn't a reason to implement something else, quite the reverse in fact.

Yes, but the point is: until that's done you can't claim kprobes is a
valid tracing tool for everyone.

And things like net/ipv4/tcp_probe.c shouldn't be generally implemented
until every arch is supported.

--
Chuck

2006-09-15 17:46:45

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

On Fri, 15 Sep 2006 13:38:58 +0100
Alan Cox <[email protected]> wrote:

> gcc -g produces extensive annotations which are then usable by many
> tools other than gdb.

This is something I'm curious about. AFAICT there are two(*) reasons for
wanting static tracepoints:

a) to be able to get at local variables and

b) as a "marker" somewhere within the body of a function - the
expectation here is that identifying that particular spot in the
function would be hard without some marker which moves around as the
function itself is modified over time.


If a) is true, then isn't this simply a feature request against the
systemtap infrastructure? There's no reason per-se why a kprobe point
cannot access locals, using the dwarf debug info. It'll be somewhat
unreliable, because stack slots and registers go out of scope and get
reused for other things. But as any gdb user will know, it's still
useful.



As for b), if it was _really_ an advantage to be able to identify
particular places within the body of a function then one could concoct a
macro which inserts some info into a separate elf section and which adds no
code at all to actual .text.
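
(Purely as an illustration of that idea -- the macro and section names are
made up -- such a marker could record an address plus a file/tag string in
its own ELF section while emitting nothing into .text:)

/* Hypothetical zero-.text marker: only a local label plus a record in a
 * separate ELF section; no instructions are added to the traced code.
 * .long assumes a 32-bit kernel. */
#define MARK_SPOT(tag)                                                  \
        asm volatile("772:\n\t"                                         \
                     ".pushsection __mark_info, \"a\"\n\t"              \
                     ".long 772b\n\t"                /* spot address */ \
                     ".asciz \"" __FILE__ ":" tag "\"\n\t"              \
                     ".popsection")

/* usage: MARK_SPOT("before_context_switch"); */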

Although IMO this is a bit lame - it is quite possible to go into
SexySystemTapGUI, click on a particular kernel file-n-line and have
systemtap userspace keep track of that place in the kernel source across
many kernel versions: all it needs to do is to remember the file+line and a
snippet of the surrounding text, for readjustment purposes.


(*) I don't buy the performance arguments: kprobes are quick, and I'd
expect that the CPU consumption of the destination of the probe is
comparable to or higher than the cost of taking the initial trap.

2006-09-15 17:51:26

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

On Fri, 15 Sep 2006 10:57:29 -0400
Karim Yaghmour <[email protected]> wrote:

> But I submit to you that both explanations
> actually highlight the argument I was making earlier with regards to
> dynamic tracing (and gdb info in this case) actually require a non-
> expert to chase kernel versions and create appropriate
> scripts/config-info for the post-insertion of instrumentation
> ...

Again, I don't see this as a huge problem. patch(1) is able to keep track
of specific places within source code even in the presence of quite violent
changes to that source code. There's no reason why systemtap support code
cannot do the same.

2006-09-15 17:59:20

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

On 15 Sep 2006 13:08:29 -0400
[email protected] (Frank Ch. Eigler) wrote:

> Alan Cox <[email protected]> writes:
>
> > [...]
> > >
> > > prepare_arch_switch(rq, next);
> > > + TRACE_SCHEDCHANGE(prev, next);
> > > prev = context_switch(rq, prev, next);
> > > barrier();
> >
> > The gdb debug data lets you find each line and also the variable
> > assignments (except when highly optimised in some cases). [...]
>
> Unfortunately, variables and even control flow are quite regularly
> made non-probe-capable by modern gcc. Statement boundaries and
> variables are not preserved. There is an arms race within gcc to both
> improve code optimization and its own "reverse-engineering" debugging
> data generation, and the former is always ahead.
>
> The end result is that there are many spots that we'd like to probe in
> systemtap, but can't place exactly or extract all the data we'd like.
> Really.

Useful info, thanks.

> There are also spots that for other reasons cannot tolerate a fully
> dynamic kprobes-style probe:
>
> - where 1000-cycle int3-dispatching overheads too high

Is that still true of the recent kprobes "boosting" changes?

> - in low-level code such as fault handling or locking, that, if probed
> dynamically, could entail infinite regress
> - debugging information may not be available
>
> This is the reason why I'm in favour of some lightweight event-marking
> facility: a way of catching those points where dynamic probing is not
> sufficiently fast or dependable.

OK.

> > [...]
> > All we appear to lack is systemtap ability to parse debug data so it can
> > be told "trace on line 9 of sched.c and record rq and next"
>
> Actually:
>
> #! stap
> probe kernel.function("*@kernel/sched.c:9") { printf("%p %p", $rq, $next) }
>

Really. That's impressive progress.

2006-09-15 18:06:23

by Karim Yaghmour

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


Andrew Morton wrote:
> This is something I'm curious about. AFAICT there are two(*) reasons for
> wanting static tracepoints:
>
> a) to be able to get at local variables and
>
> b) as a "marker" somewhere within the body of a function - the
> expectation here is that identifiying that particular spot in the
> function would be hard without some marker which moves around as the
> functions itself is modified over time.
>
>
> If a) is true, then isn't this simply a feature request against the
> systemtap infrastructure? There's no reason per-se why a kprobe point
> cannot access locals, using the dwarf debug info. It'll be somewhat
> unreliable, because stack slots and registers go out of scope and get
> reused for other things. But as any gdb user will know, it's still
> useful.

I believe this has been addressed by Frank in his other email, so I'll
skip.

> As for b), if it was _really_ an advantage to be able to identify
> particular places within the body of a function then one could concoct a
> macro which inserts some info into a separate elf section and which adds no
> code at all to actual .text.

Yes, and this specific suggestion has been made a number of times.
Though, then, this is an implementation debate and there are a number
of things which could be made available as build-time options. The
emerging consensus in this thread, however, is that there is a clear
need for a way for statically marking up important events, and this
point has been emphasized both by those who have maintained
infrastructure based on "static" tracepoints and those maintaining
such infrastructure based on "dynamic" tracepoints.

> Although IMO this is a bit lame - it is quite possible to go into
> SexySystemTapGUI, click on a particular kernel file-n-line and have
> systemtap userspace keep track of that place in the kernel source across
> many kernel versions: all it needs to do is to remember the file+line and a
> snippet of the surrounding text, for readjustment purposes.

Sure, if you're a kernel developer, but as I've explained numerous
times in this thread, there are far more users of tracing than
kernel developers.

> (*) I don't buy the performance arguments: kprobes are quick, and I'd
> expect that the CPU consumption of the destination of the probe is
> comparable to or higher than the cost of taking the initial trap.

Please see Mathieu's earlier posting of numbers comparing kprobes to
static points. Nevertheless, I do not believe that the use of kprobes
should be pitted against static instrumentation, the two are
orthogonal.

Karim

2006-09-15 18:08:33

by Alan

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Ar Gwe, 2006-09-15 am 13:14 -0400, ysgrifennodd Chuck Ebbert:
> In-Reply-To: <[email protected]>
> > > $ grep KPROBES arch/*/Kconf*
> > > arch/i386/Kconfig:config KPROBES
> > > arch/ia64/Kconfig:config KPROBES
> > > arch/powerpc/Kconfig:config KPROBES
> > > arch/sparc64/Kconfig:config KPROBES
> > > arch/x86_64/Kconfig:config KPROBES
> >
> > Send patches. The fact nobody has them implemented on your platform
> > isn't a reason to implement something else, quite the reverse in fact.
>
> Yes, but the point is: until that's done you can't claim kprobes is a
> valid tracing tool for everyone.

I can however claim that kprobes is what they should be implementing, not
adding new large patches for another infrastructure whose author has
already said for dynamic stuff it is based on the same things.


2006-09-15 18:09:28

by Alan

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Ar Gwe, 2006-09-15 am 13:08 -0400, ysgrifennodd Frank Ch. Eigler:
> Alan Cox <[email protected]> writes:
> - where 1000-cycle int3-dispatching overheads too high

Why are your despatching overheads 1000 cycles ? (and if its due to int3
why are you using int 3 8))

2006-09-15 18:10:58

by Jose R. Santos

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Alan Cox wrote:
> Consistency for user space ?
>

With several other trace tools being implemented for the kernel, there
is a great problem with consistency among these tools. It is my
opinion that traces are of very little use to _most_ people without the
availability of post-processing tools to analyze these traces. While I
won't say that we need one all-powerful solution, it would be good if all
solutions would at least be able to talk to the same post-processing
facilities in user-space. Before LTTng is even considered for the
kernel, there needs to be a discussion to determine if the trace mechanism
being proposed is suitable for all people interested in doing trace
analysis. The fact that there also exist tools like LKET and LKST seems to
suggest that there are other things to be considered when it comes to
implementing a trace mechanism that everyone would be happy with.

It would also be useful for all the trace tools to implement the same
probe points so that post-processing tools can be interchanged between
the various trace implementations.


-JRS

2006-09-15 18:10:23

by Karim Yaghmour

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108



Andrew Morton wrote:
> Again, I don't see this as a huge problem. patch(1) is able to keep track
> of specific places within source code even in the presence of quite violent
> changes to that source code. There's no reason why systemtap support code
> cannot do the same.

If you don't want to listen to my part of the argument then consider
the point of view of those who have maintained systems entirely based
on binary editing, namely systemtap and LKET. It's indicative that
all those who have been involved in tracing, be it by static
instrumentation of code or the use of binary editing, all favor some
form of static markup mechanism of the code.

Karim

2006-09-15 18:18:49

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

On Fri, 15 Sep 2006 17:00:47 +0200
Thomas Gleixner <[email protected]> wrote:

> On Fri, 2006-09-15 at 10:51 -0400, Karim Yaghmour wrote:
> > And what I did is "b". I wasn't going to defend anybody else's
> > choice of tracepoints. Those who were using ltt for its designated
> > purpose -- allowing normal users and developers to get an accurate
> > view of the behavior of their system -- were very happy with it.
> >
> > You want to know who was unhappy with using it: kernel developers.
> > It just wasn't geared for them. Which goes back to my earlier
> > arguments ...
>
> What do you want to prove with this rant ? Simply the fact that your
> view of tracing is not matching the view of others. Nothing else.

What Karim is sharing with us here (yet again) is the real in-field
experience of real users (ie: not kernel developers).

I mean, on one hand we have people explaining what they think a tracing
facility should and shouldn't do, and on the other hand we have a guy who
has been maintaining and shipping exactly that thing to (paying!) customers
for many years.

Me thinks our time would be best spent trying to benefit from his
experience..


Me, I'm not particularly averse to some 50-100 static tracepoints if
experience tells us that we need such things. And both Karim's and Frank's
experience does indicate that such things are needed, which carries weight.

What I _am_ concerned about with this patchset is all the infrastructural
goop which backs up those tracepoints. I'd have thought that a better
approach would be to make those explicit tracepoints be "helpers" for the
existing kprobe code.

Of course, if they are properly designed, the one set of tracepoints could
be used by different tracing backends - that allows us to separate the
concepts of "tracepoints" and "tracing backends".

2006-09-15 18:18:51

by Martin Bligh

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

>>Also, care to explain how kprobes can be used to access same data
>>without having to actually customize a probe point for every binary?
>
>
> Thats why we have things like systemtap.
>
> All we appear to lack is systemtap ability to parse debug data so it can
> be told "trace on line 9 of sched.c and record rq and next"

But that's the whole point - if it's not integrated into a marker as
source code, it requires manual intervention for every bloody release.
"line 9 of sched.c" is a farcically stupid way of doing tags
on a dynamically moving project like the linux kernel.

Yes, that may work OK for something that is very static, like a distro
snapshot, but as a general mechanism, it's unsustainable and broken.

M.

2006-09-15 18:20:25

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Alan Cox <[email protected]> wrote:

> Ar Gwe, 2006-09-15 am 13:08 -0400, ysgrifennodd Frank Ch. Eigler:
> > Alan Cox <[email protected]> writes:
> > - where 1000-cycle int3-dispatching overheads too high
>
> Why are your despatching overheads 1000 cycles ? (and if its due to
> int3 why are you using int 3 8))

this is being worked on actively: there's the "djprobes" patchset, which
includes a simplified disassembler to analyze common target code and can
thus insert much faster, call-a-trampoline-function based tracepoints
that are just as fast as (or faster than) compile-time, static
tracepoints.

there's no fundamental reason why INT3 should be the primary model of
inserting kprobes. Sometimes we are unlucky and the code which we target
is too complex - then we take a few hundred cycles of a penalty. If that
piece of code is a really common destination then we can add a static
marker in the source which both prepares parameters and inserts a
sufficiently sized NOP (or a function call) to prepare things for fast
dynamic tracing - but it should only be an optional performance helper
that we have the freedom to zap.

(kprobes can be thought of as a special "JIT", and there's no
fundamental reason why it couldn't do almost arbitrary transformations on
kernel code.)

and there's a lot more that kprobes/systemtap can do: it can be a method
of extending the kernel along a 'plugin' model - without having to
impact the kernel source! That way people can experiment with kernel
extensions on live kernels, without the barrier of recompile/reboot.

Ingo

2006-09-15 18:25:04

by Frank Ch. Eigler

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Hi -

On Fri, Sep 15, 2006 at 07:31:48PM +0100, Alan Cox wrote:

> Ar Gwe, 2006-09-15 am 13:08 -0400, ysgrifennodd Frank Ch. Eigler:
Yeah, or something. :-)

> > Alan Cox <[email protected]> writes:
> > - where 1000-cycle int3-dispatching overheads too high
>
> Why are your despatching overheads 1000 cycles ? (and if its due to int3
> why are you using int 3 8))

Smart teams from IBM and Hitachi have been hammering away at this code
for a year or two now, and yet (roughly) here we are. There have been
experiments involving plopping branches instead of int3's at probe
locations, but this is self-modifying code involving multiple
instructions, and appears to be tricky on SMP/preempt boxes.

- FChE



2006-09-15 18:27:49

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Andrew Morton <[email protected]> wrote:

> What Karim is sharing with us here (yet again) is the real in-field
> experience of real users (ie: not kernel developers).

well, Jes has that experience and Thomas too.

> I mean, on one hand we have people explaining what they think a
> tracing facility should and shouldn't do, and on the other hand we
> have a guy who has been maintaining and shipping exactly that thing to
> (paying!) customers for many years.

so does Thomas and Jes. So what's the point?

i judge LTT by its current code quality, not by its proponents shouting
volume - and that quality is still quite poor at the moment. (and then
there are the conceptual problems too, outlined numerous times) I have
quoted specific example(s) for that in this thread. Furthermore, LTT
does this:

246 files changed, 26207 insertions(+), 71 deletions(-)

and this gives me the shivers, for all the reasons i outlined.

Ingo

2006-09-15 18:31:42

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Frank Ch. Eigler <[email protected]> wrote:

> > Why are your despatching overheads 1000 cycles ? (and if its due to
> > int3 why are you using int 3 8))
>
> Smart teams from IBM and Hitachi have been hammering away at this code
> for a year or two now, and yet (roughly) here we are. There have been
> experiments involving plopping branches instead of int3's at probe
> locations, but this is self-modifying code involving multiple
> instructions, and appears to be tricky on SMP/preempt boxes.

i am talking to them about that, and i'm 100% sure the solution is much
easier than the many (much harder) problems that SystemTap has already
solved. I think you are way too modest to realize how powerful (and
important) SystemTap is :-)

Ingo

2006-09-15 19:11:27

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Hi,

On Fri, 15 Sep 2006, Ingo Molnar wrote:

> > Ar Gwe, 2006-09-15 am 13:08 -0400, ysgrifennodd Frank Ch. Eigler:
> > > Alan Cox <[email protected]> writes:
> > > - where 1000-cycle int3-dispatching overheads too high
> >
> > Why are your despatching overheads 1000 cycles ? (and if its due to
> > int3 why are you using int 3 8))
>
> this is being worked on actively: there's the "djprobes" patchset, which
> includes a simplified disassembler to analyze common target code and can
> thus insert much faster, call-a-trampoline-function based tracepoints
> that are just as fast as (or faster than) compile-time, static
> tracepoints.

Who is going to implement this for every arch?
Is this now the official party line that only archs, which implement all
of this, can make use of efficient tracing?

bye, Roman

2006-09-15 19:16:59

by Karim Yaghmour

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


Ingo Molnar wrote:
> well, Jes has that experience and Thomas too.
...
> so does Thomas and Jes. So what's the point?

Either I'm too stupid for you to bother replying to any of my emails
(which is very possible) or, shall we say politely, you're not
exactly humble. I've responded to half a dozen of your emails, yet
you have not deemed it worthwhile to talk to me directly.

First you came out screaming that static tracepoints are heresy, and
then when there was non-ltt-specific interest being voiced for code
markup, you viciously set out to fud ltt as best you can using your
experience at implementing kernel tracers as ammunition. So answer
this simple question, how many tracers did you actually write which
were geared for non-kernel-developer users? Based on your own
account from yesterday, the answer I conclude is: NONE. I'd say
you've got pretty strong opinions about something you've never
attempted to do. Of course you claim that all tracers are the same,
how could they be different? But that's where experience talks and
hubris walks.

> i judge LTT by its current code quality, not by its proponents shouting
> volume - and that quality is still quite poor at the moment.

You're either skillfully trying to steer arguments in your direction
or you're simply unaware of the basic rules of debating. You started
by saying that static instrumentation of any kind is evil, yet this
is demonstrably false, if nothing else by the outpour of experience
from those who have had to maintain non-inlined instrumentation. Then
you proceed to try to amalgamate this attack with a vicious attack on
ltt. I'll say it one more time: the ltt code gets posted to lkml
*for review*. If you're that concerned about the code, then go ahead
look at it and tell the maintainers what you'd like to see fixed.

Instead, you run out and come back and conclude "The best that Frank
and me came up ..." and then you present your own nomenclature for
static instrumentation. I mean, if nothing else, have a little
decency for those who have put effort into trying to make this stuff work.

I mean, at least explain to me why you insist on using such a tone
against a project that is now within its 7th year of existence (a
pretty long lifetime if you ask me for something that has been
labeled useless all over this thread.) Do you actually realize that the
lkml's past reluctance to admit a standard tracing mechanism
into the kernel has actually done great harm to those
who had put substantial personal and financial investment into getting
something to work? I'll spare you the political debates, but look at
past involvement of major corporate users in ltt and ask yourself why
they've decided to put their efforts elsewhere. We were basically told:
we cannot justify investing any further funds in a project which does
not seem to gain any sort of acceptance by the kernel developers.
I've never complained about this before because I don't like whining.
Do, however, realize that the fact that there are 4 separate teams
working on this in parallel (ltt, lkst, systemtap, lket, off the top
of my head) is directly due to the lack of success ltt has had in
being admitted into the kernel. Do, at least, realize that this is
a huge miscarriage of the lkml process.

And finally, do realize that in 2000 I personally contacted the head
of the DProbes project at IBM in order to foster common development,
following which ltt was effectively modified in order to allow
dynamic instrumentation of the kernel ...

cheesh ...

Karim

2006-09-15 19:19:26

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Roman Zippel <[email protected]> wrote:

> On Fri, 15 Sep 2006, Ingo Molnar wrote:
>
> > this is being worked on actively: there's the "djprobes" patchset,
> > which includes a simplified disassembler to analyze common target
> > code and can thus insert much faster, call-a-trampoline-function
> > based tracepoints that are just as fast as (or faster than)
> > compile-time, static tracepoints.
>
> Who is going to implement this for every arch?

someone who is interested enough in that arch growing that capability?

> Is this now the official party line that only archs, which implement
> all of this, can make use of efficient tracing?

that's certainly my preference - kprobes have lots of other advantages
besides tracing. Whether that becomes the "official party line" depends
on the technological analysis of the situation which will ultimately
shape the outcome of this discussion.

Ingo

2006-09-15 19:20:24

by Jose R. Santos

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Karim Yaghmour wrote:
> > Although IMO this is a bit lame - it is quite possible to go into
> > SexySystemTapGUI, click on a particular kernel file-n-line and have
> > systemtap userspace keep track of that place in the kernel source across
> > many kernel versions: all it needs to do is to remember the file+line and a
> > snippet of the surrounding text, for readjustment purposes.
>
> Sure, if you're a kernel developer, but as I've explained numerous
> times in this thread, there are far more users of tracing than
> kernel developers.
>

This is so true (and the main reason we implemented a trace utility in
SystemTap).

Several of the people that work in my team are _not_ kernel
developers. They do not necessarily know the Linux kernel code well enough
to insert their own instrumentation. On the other hand, they do possess
very good knowledge about things specific to a particular
software stack or a HW subsystem. Structured predefined probe points
(dynamic or static) allow people with limited kernel hacking skills to
feed useful information back to developers of the kernel.

I agree with Karim that a trace tool (while useful to developers) is
mostly targeted at a non-kernel-developer audience. They are mostly
meant to enhance the communication between developers and regular
users. Any solution that is intended to be a dynamic replacement for
LTTng needs to take these kinds of users into account.

-JRS

2006-09-15 19:35:09

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

On Fri, 2006-09-15 at 11:16 -0700, Andrew Morton wrote:
> Me thinks our time would be best spent trying to benefit from his
> experience..

I was involved in tracer development for quite a while and I have used
them in $paying customer projects too.

> Me, I'm not particularly averse to some 50-100 static tracepoints if
> experience tells us that we need such things. And both Karim's and Frank's
> experience does indicate that such things are needed, which carries weight.

From my experience the tracepoints usually are not at the place where
you need them to track down a particular problem or analyse a particular
usage scenario in detail. This has been true from a kernel and from an
application programmer POV. Also, many of the LTT customers I'm aware of
used their own homebrewed set of trace points.

What I always hated about static tracers is the requirement to recompile /
reboot the kernel in order to gather information. Kprobes / systemtap is
really a convenient way to avoid this.

I completely agree that the maintenance of the "out of code" trace
scripts is a task which needs a lot of effort, but it does not offload
the maintenance effort onto those modifying the code, and we do not get yet
another pseudo instruction/function set which interferes with the
goal of having clear and understandable code. Hell, the code in those code
paths which are of common interest for instrumentation is already
complex enough. We really can do without adding some more obfuscated
macro constructs.

Once we can maintain a basic set of trace scripts in the kernel tree and
once the necessary infrastructure is in place, I'm quite sure that quite
a lot of kernel developers would keep those fundamental trace scripts in
shape out of their own interest. It might take a while to get this going,
but once it is established, distros will ship the scripts along with
dynamic tracing enabled in their kernels.

I see a major advantage over static tracing in that:

Static tracing is usually not enabled in production kernels, but the
dynamic tracing infrastructure can be enabled without costs. So you
can actually request traces (at least for the standard set of
tracepoints) from Joe User to track down complex problems.

One thing which is much more important IMHO is the availability of
_USEFUL_ postprocessing tools to give users a real value of
instrumentation. This is a much more complex task than this whole kernel
instrumentation business. This also includes the ability to coordinate
user space _and_ kernel space instrumentation, which is necessary to
analyse complex kernel / application code interactions.

tglx


2006-09-15 19:43:57

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Hi,

On Fri, 15 Sep 2006, Ingo Molnar wrote:

> > What Karim is sharing with us here (yet again) is the real in-field
> > experience of real users (ie: not kernel developers).
>
> well, Jes has that experience and Thomas too.
>
> > I mean, on one hand we have people explaining what they think a
> > tracing facility should and shouldn't do, and on the other hand we
> > have a guy who has been maintaining and shipping exactly that thing to
> > (paying!) customers for many years.
>
> so does Thomas and Jes. So what's the point?

That only Karim's experience is in question here?

> i judge LTT by its current code quality, not by its proponents shouting
> volume - and that quality is still quite poor at the moment. (and then
> there are the conceptual problems too, outlined numerous times) I have
> quoted specific example(s) for that in this thread. Furthermore, LTT
> does this:
>
> 246 files changed, 26207 insertions(+), 71 deletions(-)
>
> and this gives me the shivers, for all the reasons i outlined.

Well, I'm first to admit that LTT needs improvement, but that has never
been the point.

We need to get to some kind of agreement what level of tracing Linux
should support in general, preferably something that is easy to
integrate and usable by everyone. Especially the latter means that there
is not one true solution, so we need to figure out what kind of common
infrastructure can be implemented, from which all of them can benefit.

At this point you've been rather uncompromising contrary to every single
argument from either side.

bye, Roman

2006-09-15 19:46:55

by Karim Yaghmour

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


Thomas Gleixner wrote:
> One thing which is much more important IMHO is the availability of
> _USEFUL_ postprocessing tools to give users a real value of
> instrumentation. This is a much more complex task than this whole kernel
> instrumentation business. This also includes the ability to coordinate
> user space _and_ kernel space instrumentation, which is necessary to
> analyse complex kernel / application code interactions.

And of course the usefulness of such postprocessing tools is gated
by the ability of users to use them on _any_ kernel they get their
hands on. Up to this point, this has not been the case for *any* of the
existing toolsets, simply because they require the user to either
recompile his kernel or modify his probe points to match his kernel.
Until users can actually do without either of these steps (which is
only possible with static markup) then the development teams of
the various projects will continue having to invest resources
chasing the kernel.

We don't need separate postprocessing tool teams. The only reasons
there are separate project teams is because managers in key
positions made the decision that they'd rather break from existing
projects which had had little success mainlining and instead use
their corporate bodyweight to pressure/seduce kernel developers
working for them into pushing their new great which-absolutely-
has-nothing-to-do-with-this-ltt-crap-(no,no, we actually agree
with you kernel developers that this is crap, this is why we're
developing this new amazing thing). That's the truth plain and
simple.

When I started involving myself in Linux development a decade ago,
I honestly did not think I'd ever see this kind of stuff happen,
but, hey, that's life.

Karim

2006-09-15 19:49:13

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Thomas Gleixner <[email protected]> wrote:

> I see a major advantage over static tracing in that:
>
> Static tracing is usually not enabled in production kernels, but the
> dynamic tracing infrastructure can be enabled without costs. So you
> can actually request traces (at least for the standard set of
> tracepoints) from Joe User to track down complex problems.

FYI, kprobes/SystemTap is already enabled in RHEL4.

Ingo

2006-09-15 19:54:51

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

* Jose R. Santos ([email protected]) wrote:
> Alan Cox wrote:
>
> With several other trace tools being implemented for the kernel, there
> is a great problem with consistency among these tools. It is my
> opinion that traces are of very little use to _most_ people without the
> availability of post-processing tools to analyse these traces. While I
> won't say that we need one all-powerful solution, it would be good if all
> solutions would at least be able to talk to the same post-processing
> facilities in user-space. Before LTTng is even considered for the
> kernel, there needs to be discussion to determine if the trace mechanism
> being proposed is suitable for all people interested in doing trace
> analysis. The fact that there also exist tools like LKET and LKST seems to
> suggest that there are other things to be considered when it comes to
> implementing a trace mechanism that everyone would be happy with.
>
> It would also be useful for all the trace tools to implement the same
> probe points so that post-processing tools can be interchanged between
> the various trace implementations.
>
>

Hi Jose,

I completely agree that there is a crying need for standardisation there. The
reason why I propose the LTTng infrastructure as a tracing core in the Linux
kernel is this: the fundamental problem I have found with kernel tracers so
far is that they perturb the system too much or do not offer enough
fine-grained protection against reentrancy. Ingo's post about tracing statements
breaking the kernel all the time seems to me like sufficient proof that this
is a real problem.

My goal with LTTng is to provide a reentrant data serialisation mechanism that
can be called from anywhere in the kernel (ok, the vmalloc path of the page
fault handler is _the_ exception) that does not use any lock and can therefore
trace code paths like NMI handlers.
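
For what it is worth, the usual lock-free reservation scheme behind such
a claim looks roughly like this -- a deliberately simplified sketch using
a GCC atomic builtin, with hypothetical names and with the commit step,
sub-buffer switching and overflow policy all elided (this is not LTTng's
actual code):

#include <string.h>

#define RING_SIZE 4096			/* power of two, one buffer per CPU */

struct ring {
	volatile unsigned long head;	/* next free offset in data[] */
	unsigned char data[RING_SIZE];
};

/*
 * Reserve len bytes with a compare-and-swap loop.  No lock is taken, so
 * an NMI that interrupts the loop simply wins the race and the
 * interrupted context retries; that is what makes the write path
 * callable from (almost) any context.
 */
static long rb_reserve(struct ring *rb, size_t len)
{
	unsigned long old, new;

	do {
		old = rb->head;
		if (old + len > RING_SIZE)
			return -1;	/* full: switching/overwrite elided */
		new = old + len;
	} while (__sync_val_compare_and_swap(&rb->head, old, new) != old);

	return (long)old;
}

static void rb_write_event(struct ring *rb, const void *event, size_t len)
{
	long off = rb_reserve(rb, len);

	if (off >= 0)
		memcpy(&rb->data[off], event, len);	/* commit step elided */
}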

I also implemented code that would serialize any type of data structure I could
think of. If it is too much, well, we can use part of it.

LTTng trace format is explained there. Your comments on it are very welcome.

http://ltt.polymtl.ca/ > LTTV and LTTng developer documentation > format.html
(http://ltt.polymtl.ca/svn/ltt/branches/poly/doc/developer/format.html)

Regards,

Mathieu


OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2006-09-15 20:00:13

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

* Andrew Morton ([email protected]) wrote:
> Of course, if they are properly designed, the one set of tracepoints could
> be used by different tracing backends - that allows us to separate the
> concepts of "tracepoints" and "tracing backends".

If I try to develop your idea a little further, we could think of dividing the
tracing problem into four layers:

- tracepoints (where the code is instrumented)
  - identifying code
  - accessing data surrounding the code
- tracing backend (how to add the tracepoints)
- tracing infrastructure (what code will serialize the information)
- data extraction (getting the data out to disk, network, ...)

I think that, if we agree on this segmentation of the problem, this thread is
generally debating the tracing backends and their respective limitations.
I just want to point out that the patch I have submitted addresses mainly the
"tracing infrastructure" and "data extraction" topics.

Regards,

Mathieu


OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2006-09-15 20:01:06

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

On Fri, 15 Sep 2006 14:16:18 -0400
Karim Yaghmour <[email protected]> wrote:

> > Although IMO this is a bit lame - it is quite possible to go into
> > SexySystemTapGUI, click on a particular kernel file-n-line and have
> > systemtap userspace keep track of that place in the kernel source across
> > many kernel versions: all it needs to do is to remember the file+line and a
> > snippet of the surrounding text, for readjustment purposes.
>
> Sure, if you're a kernel developer, but as I've explained numerous
> times in this thread, there are far more users of tracing than
> kernel developers.

Disagree. I was describing a means by which a set of systemtap trace
points could be described. A means which would allow those tracepoints to
be maintained without human intervention as the kernel source changes.
(ie: use a similar algorithm and representation as patch(1)).
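
To illustrate the readjustment step (purely a sketch with made-up types,
not SystemTap code): a probe is remembered as its old line index plus a
small snippet of the surrounding source lines, and relocation searches
outward from the old position for the nearest place where the snippet
still matches, much as patch(1) applies a hunk with increasing fuzz.

#include <string.h>

/* Return 1 if ctx[0..nctx) matches file[at..at+nctx). */
static int ctx_matches(char **file, int nlines, int at, char **ctx, int nctx)
{
	int i;

	if (at < 0 || at + nctx > nlines)
		return 0;
	for (i = 0; i < nctx; i++)
		if (strcmp(file[at + i], ctx[i]) != 0)
			return 0;
	return 1;
}

/* Return the readjusted line index, or -1 if the context is gone. */
static int relocate_probe(char **file, int nlines, int old_line,
			  char **ctx, int nctx)
{
	int d;

	for (d = 0; d < nlines; d++) {
		if (ctx_matches(file, nlines, old_line + d, ctx, nctx))
			return old_line + d;
		if (d && ctx_matches(file, nlines, old_line - d, ctx, nctx))
			return old_line - d;
	}
	return -1;
}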

Presumably those tracepoints would have been provided by a kernel developer
and delivered to non-developers, just like static tracepoints.

> > (*) I don't buy the performance arguments: kprobes are quick, and I'd
> > expect that the CPU consumption of the destination of the probe is
> > comparable to or higher than the cost of taking the initial trap.
>
> Please see Mathieu's earlier posting of numbers comparing kprobes to
> static points. Nevertheless, I do not believe that the use of kprobes
> should be pitted against static instrumentation, the two are
> orthogonal.

People have been speeding up kprobes in recent kernels, to avoid the int3
overhead. I don't recall seeing how effective that has been.

2006-09-15 20:04:29

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

On Fri, 2006-09-15 at 21:10 +0200, Roman Zippel wrote:
> >
> > this is being worked on actively: there's the "djprobes" patchset, which
> > includes a simplified disassembler to analyze common target code and can
> > thus insert much faster, call-a-trampoline-function based tracepoints
> > that are just as fast as (or faster than) compile-time, static
> > tracepoints.
>
> Who is going to implement this for every arch?
> Is this now the official party line that only archs, which implement all
> of this, can make use of efficient tracing?

In the reverse you are enforcing an ugly - but available for all archs -
solution due to the fact that there is nobody interested enough to
implement it ?

If there is no interest to do that, then this arch can probably live w/o
instrumentation for the next decade too.

tglx


2006-09-15 20:14:29

by Karim Yaghmour

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


Andrew Morton wrote:
> People have been speeding up kprobes in recent kernels, to avoid the int3
> overhead. I don't recall seeing how effective that has been.

I don't want to microdebate this one, but here's the quote from Frank
on the topic of djprobe:
> Smart teams from IBM and Hitachi have been hammering away at this code
> for a year or two now, and yet (roughly) here we are. There have been
> experiments involving plopping branches instead of int3's at probe
> locations, but this is self-modifying code involving multiple
> instructions, and appears to be tricky on SMP/preempt boxes.

The idea behind this mechanism is neat. But every step along the way
there seem to be ever more complex corner cases where it can't be
used.

Should this mechanism ever be made to work, the need for static
markup would still be felt however.

Karim

2006-09-15 20:14:27

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

On Fri, 15 Sep 2006 20:19:07 +0200
Ingo Molnar <[email protected]> wrote:

>
> * Andrew Morton <[email protected]> wrote:
>
> > What Karim is sharing with us here (yet again) is the real in-field
> > experience of real users (ie: not kernel developers).
>
> well, Jes has that experience and Thomas too.

systemtap and ltt are the only full-scale tracing tools which target
sysadmins and application developers of which I am aware.

> > I mean, on one hand we have people explaining what they think a
> > tracing facility should and shouldn't do, and on the other hand we
> > have a guy who has been maintaining and shipping exactly that thing to
> > (paying!) customers for many years.
>
> so does Thomas and Jes. So what's the point?

My point is that I respect Karim and Frank's experience. I in fact
disagree with them (or at least, I want to). But they've been there, and I
haven't. So I listen.

> i judge LTT by its current code quality, not by its proponents shouting
> volume - and that quality is still quite poor at the moment. (and then
> there are the conceptual problems too, outlined numerous times) I have
> quoted specific example(s) for that in this thread. Furthermore, LTT
> does this:
>
> 246 files changed, 26207 insertions(+), 71 deletions(-)
>
> and this gives me the shivers, for all the reasons i outlined.
>

In the bit of text which you snipped I was agreeing with this...

Look, if Karim and Frank (who I assume is a systemtap developer) think that
we need static tracepoints then I have no reason to disagree with them.
What I would propose is that:

a) Those tracepoints be integrated one at a time on well-understood
grounds of necessity. Tracepoints _should_ be added dynamically. But
if there are instances where that's not working and cannot be made to
work then OK, in we go.

b) Saying "we need the static tracepoints because the line numbers keep
on changing" is not, repeat not a justification for static tracepoints.
It's a SMOP to develop tracepoint-adding code which can handle line
numbers changing. lwall did it.

c) Any static tracepoints should be seen as corner-case augmentation of
existing dynamic tracing framework(s). IOW: I see no justification at
this time for adding complete new second set of backend
accumulation/reporting/management infrastructure (ie: LTT core).


Shorter version: I agree with Frank.

2006-09-15 20:14:12

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Roman Zippel <[email protected]> wrote:

> Hi,
>
> On Fri, 15 Sep 2006, Ingo Molnar wrote:
>
> > > What Karim is sharing with us here (yet again) is the real in-field
> > > experience of real users (ie: not kernel developers).
> >
> > well, Jes has that experience and Thomas too.
> >
> > > I mean, on one hand we have people explaining what they think a
> > > tracing facility should and shouldn't do, and on the other hand we
> > > have a guy who has been maintaining and shipping exactly that thing to
> > > (paying!) customers for many years.
> >
> > so does Thomas and Jes. So what's the point?
>
> That only Karim's experience is being in question here?

i think you misunderstood, please read the paragraphs above. They
suggest that there's "real in-field experience of real users" against
"people explaining what they think a tracing facility should and
shouldn't do". I only pointed out that those people (Thomas, Jes) dont
just randomly express their opinion but have actual in-field experience
too (of paying customers), about the very topic at hand.

> > i judge LTT by its current code quality, not by its proponents shouting
> > volume - and that quality is still quite poor at the moment. (and then
> > there are the conceptual problems too, outlined numerous times) I have
> > quoted specific example(s) for that in this thread. Furthermore, LTT
> > does this:
> >
> > 246 files changed, 26207 insertions(+), 71 deletions(-)
> >
> > and this gives me the shivers, for all the reasons i outlined.
>
> Well, I'm first to admit that LTT needs improvement, but that has
> never been the point.

that might not be your point, but that very much is my point. I do claim
that LTT's problems arise out of its fundamental mistake on the kernel
side: that it is a static tracer that tries to be too many things to too
many people. SystemTap is available here and today on an unmodified
upstream kernel. LTT has been in this shape for the past ~8 years. But
if you wish you can certainly prove me wrong via, for example, cleaning up
and shrinking LTT down to a size and impact that is not scary anymore,
with the same functionality, and the clear future path for the removal
of its dependencies. I tried to argue that in the abstract, but please
by all means feel free to prove me wrong. (or argue against my specific
points)

> We need to get to some kind of agreement what level of tracing Linux
> should support in general, preferably something that is easy to
> integrate and usable by everyone. Especially the latter means that
> there is not one true solution, [...]

sorry, but i disagree. There _is_ a solution that is superior in every
aspect: kprobes + SystemTap. (or any other equivalent dynamic tracer)

> At this point you've been rather uncompromising [...]

yes, i'm rather uncompromising when i sense attempts to push inferior
concepts into the core kernel _when_ a better concept exists here and
today. Especially if the concept being pushed adds more than 350
tracepoints that expose something to user-space that amounts to a
complex external API, which tracepoints we have little chance of ever
getting rid of under a static tracing concept.

i'm also looking at it this way too: you already seem to be quite
reluctant to add kprobes to your architecture today. How reluctant would
you be tomorrow if you had static tracepoints, which would remove a fair
chunk of incentive to implement kprobes?

Ingo

2006-09-15 20:16:05

by Alan

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Ar Gwe, 2006-09-15 am 11:16 -0700, ysgrifennodd Andrew Morton:
> What Karim is sharing with us here (yet again) is the real in-field
> experience of real users (ie: not kernel developers).

A lot of us have plenty of experience helping customers and end users
trace bugs. That's a good part of why we get paid in the first place.

> What I _am_ concerned about with this patchset is all the infrastructural
> goop which backs up those tracepoints. I'd have thought that a better
> approach would be to make those explicit tracepoints be "helpers" for the
> existing kprobe code.

If you put explicit tracepoints in they will be compiled out for end
users. If you have a script which hits the standard tracepoint set it'll
be usable by end users.

> Of course, if they are properly designed, the one set of tracepoints could
> be used by different tracing backends - that allows us to separate the
> concepts of "tracepoints" and "tracing backends".

There are more than two layers. The first question is "how do I trace
event XYZ" which seems to be the big debate. The second is "how do I
find XYZ" which seems to have some commonality. The third is "what do I
do when the event is hit", which kprobes provides to all the existing
consumers such as systemtap and can field into arrays for graph plotting
and the like.

Ignoring the question of static compiled-in trace points, kprobes appears
to have solved the problem space. Everyone else can use the kprobes
interfaces to do pretty much anything computationally viable.

I am sceptical about static tracepoints in critical spots because if
they make the variable easy to access they will reduce optimisations and
that will cost a lot more than 5 or 6 clocks.

In addition ideally we want a mechanism that is also sufficient that
printk can be mangled into so that you can pull all the printk text
strings _out_ of the kernel and into the debug traces for embedded work.

[ie you want printk("Oh dear %s exploded.\n", foo->bar); to end up with
"Oh dear %s exploded.\n" out of kernel and in kernel

tracepoint_printk(foo->bar);

maybe with minimal type info (although that can be pulled at debug time
from the string spat into the debug data).]
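
A rough sketch of how that could look (hypothetical names, GCC section
attributes assumed; a %s argument would additionally need its contents
copied, which is elided): the format string is emitted into a dedicated
section that the build can extract for the offline decoder, and the
kernel-side call records only the string's address plus the raw argument.

/* Backend that serialises (format id, argument) into the trace buffer. */
extern void __trace_emit(const void *fmt_id, unsigned long arg);

#define tracepoint_printk(fmt, arg)					  \
	do {								  \
		static const char __tp_fmt[]				  \
			__attribute__((section("__tp_strings"), used)) = \
			fmt;						  \
		__trace_emit(__tp_fmt, (unsigned long)(arg));		  \
	} while (0)

/* e.g.  tracepoint_printk("Oh dear %s exploded.\n", foo->bar); */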


Alan

2006-09-15 20:22:13

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

On Fri, 2006-09-15 at 15:56 -0400, Karim Yaghmour wrote:
> Thomas Gleixner wrote:
> > One thing which is much more important IMHO is the availability of
> > _USEFUL_ postprocessing tools to give users a real value of
> > instrumentation. This is a much more complex task than this whole kernel
> > instrumentation business. This also includes the ability to coordinate
> > user space _and_ kernel space instrumentation, which is necessary to
> > analyse complex kernel / application code interactions.
>
> And of course the usefulness of such postprocessing tools is gated
> by the ability of users to use them on _any_ kernel they get their
> hands on. Up to this point, this has not been the case for *any* of the
> existing toolsets, simply because they require the user to either
> recompile his kernel or modify his probe points to match his kernel.

So this has to be changed. And requiring to recompile the kernel is the
wrong answer. Having some nifty tool, which allows you to define the set
of dynamic trace points or use a predefined one is the way to go.

> Until users can actually do without either of these steps (which is
> only possible with static markup)

Generalizations like that are simply wrong. Static markup is not a
panacea. It might help for some things in the first place, but it is not
flexible enough in the long run. It is an engineering challenge to make
the "static" trace rules autogenerated by some means, as Andrew pointed
out several times in this thread (see patch(1)), so we can provide a
useful ad hoc set for the users.

> We don't need separate popstprocessing tool teams. The only reasons
> there are separate project teams is because managers in key
> positions made the decision that they'd rather break from existing
> projects which had had little success mainlining and instead use
> their corporate bodyweight to pressure/seduce kernel developers
> working for them into pushing their new great which-aboslutely-
> has-nothing-to-do-with-this-ltt-crap-(no,no, we actually agree
> with you kernel developers that this is crap, this is why we're
> developing this new amazing thing). That's the truth plain and
> simple.

Stop whining! LTT did not manage to solve the problem in a generic,
mainline acceptable way. If you really believe that Kprobes / Systemtap
is just a $corporate maliciousness to kick you out of business, then I
really start to doubt your sanity.

This has nothing to do with postprocessing and tracepoint creation
tools. The postprocessing stuff is not in the scope of mainlining. Once
a halfway future-proof interface is available, tools will come up
in no time. There are a lot of companies out there who have the
interest and the capabilities to do an integration into Eclipse, to name
one example. They will not start to spend a second of work time until
there is a consolidated instrumentation core in the kernel.

> When I started involving myself in Linux development a decade ago,
> I honestly did not think I'd ever see this kind of stuff happen,
> but, hey, that's life.

- ENOPARSE

tglx


2006-09-15 20:25:11

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

On Fri, 2006-09-15 at 16:24 -0400, Karim Yaghmour wrote:
> Should this mechanism ever be made to work, the need for static
> markup would still be felt however.

This might apply to some exotic points, but for 98% of the
instrumentation scenarios static markup is not necessary.

tglx


2006-09-15 20:26:43

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Hi,

* Alan Cox ([email protected]) wrote:
> In addition ideally we want a mechanism that is also sufficient that
> printk can be mangled into so that you can pull all the printk text
> strings _out_ of the kernel and into the debug traces for embedded work.
>
> [ie you want printk("Oh dear %s exploded.\n", foo->bar); to end up with
> "Oh dear %s exploded.\n" out of kernel and in kernel
>
> tracepoint_printk(foo->bar);
>

Good idea, trivial to implement on top of LTTng. Seeing printk's reentrancy
limitations, I have thought about doing it a couple of times.

Mathieu

OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2006-09-15 20:27:56

by Jose R. Santos

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Mathieu Desnoyers wrote:
> * Andrew Morton ([email protected]) wrote:
> > Of course, if they are properly designed, the one set of tracepoints could
> > be used by different tracing backends - that allows us to separate the
> > concepts of "tracepoints" and "tracing backends".
>
> If I try to develop your idea a little further, we could think of dividing the
> tracing problem into four layers:
>
> - tracepoints (where the code is instrumented)
>   - identifying code
>   - accessing data surrounding the code
> - tracing backend (how to add the tracepoints)
> - tracing infrastructure (what code will serialize the information)
> - data extraction (getting the data out to disk, network, ...)
>

I think you are missing user-space post-processing, which should also be
considered part of the problem, since the capabilities of post-processing
will be limited by the "tracepoints" available. Tracepoints and
post-processing are also the problems which need to be addressed first
among the other established tracing projects before going forward with
in-kernel solutions.

> I think that, if we agree on this segmentation of the problem, this thread is
> generally debating the tracing backends and their respective limitations.
> I just want to point out that the patch I have submitted addresses mainly the
> "tracing infrastructure" and "data extraction" topics.
>

This seems like a good way to dissect the problem, since it seems like
other important issues relevant to general tracing are being ignored
simply because of a dislike of the way LTTng has chosen to implement tracing.

-JRS

2006-09-15 20:28:08

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Please Ingo, stop repeating false arguments without taking people's
corrections into account:

* Ingo Molnar ([email protected]) wrote:
> sorry, but i disagree. There _is_ a solution that is superior in every
> aspect: kprobes + SystemTap. (or any other equivalent dynamic tracer)
>

I am sorry to have to repeat myself, but this is not true for heavy loads.

> > At this point you've been rather uncompromising [...]
>
> yes, i'm rather uncompromising when i sense attempts to push inferior
> concepts into the core kernel _when_ a better concept exists here and
> today. Especially if the concept being pushed adds more than 350
> tracepoints that expose something to user-space that amounts to a
> complex external API, which tracepoints we have little chance of ever
> getting rid of under a static tracing concept.
>
From an earlier email from Tim Bird:

"I still think that this is off-topic for the patch posted. I think we
should debate the implementation of tracepoints/markers when someone posts a
patch for some. I think it's rather scurrilous to complain about
code NOT submitted. Ingo has even mis-characterized the not-submitted
instrumentation patch, by saying it has 350 tracepoints when it has no
such thing. I counted 58 for one architecture (with only 8 being
arch-specific)."

Mathieu

OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2006-09-15 20:36:21

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Hi,

On Fri, 15 Sep 2006, Thomas Gleixner wrote:

> > Who is going to implement this for every arch?
> > Is this now the official party line that only archs, which implement all
> > of this, can make use of efficient tracing?
>
> In the reverse you are enforcing an ugly - but available for all archs -
> solution due to the fact that there is nobody interested enough to
> implement it ?

Where is the proof that such a solution is inherently ugly? (Note that
just picking some example from LTT doesn't make a general proof.)
I am also not the one who wants to enforce a single solution onto
everyone.

bye, Roman

2006-09-15 20:40:55

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Hi,

On Fri, 15 Sep 2006, Thomas Gleixner wrote:

> So this has to be changed. And requiring to recompile the kernel is the
> wrong answer. Having some nifty tool, which allows you to define the set
> of dynamic trace points or use a predefined one is the way to go.

Nobody is taking dynamic tracing away!
You make it sound that tracing is only possible via dynamic traces.
If I want to use static tracepoints, why shouldn't I?

> Stop whining!

So we're back to personal attacks now. :-(

bye, Roman

2006-09-15 20:41:50

by Karim Yaghmour

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


Alan Cox wrote:
> A lot of us have plenty of experience helping customers and end users
> trace bugs. That's a good part of why we get paid in the first place.

But of course, and I wouldn't dare compare my experience with yours.

FWIW, though, I submit to you that there is a difference in between
helping a customer trace something and actually attempting to create
a tool which standard users can use to trace their own stuff.

Then, again, my experience may just be lacking.

Here's an example just for the fun of it: I was giving a class at
a customer's site. It so happened they scheduled this class right
after product delivery (advice: this is a mistake.) And, predictably,
in came the technician asking for Joe, out went Joe, in came Joe,
repeat. They spent quite some time after hours trying to figure
this one out. Midweek, they asked if I could help: they were
having some odd behavior in user-space on a custom-developed board.
Try as I may, none of the standard user-space stuff was effective.
Ok, time to try ltt. Now this was a "vendor" kernel, with
preemption (ok, I'm not telling who, but this was definitely
before Ingo's work) -- the sort of which I hadn't dabbled in
before. I spent the evening trying to figure out how the heck the
thing worked to no avail -- the locking mechanisms were just
wrong for what ltt needed at the time. On the last day I asked him if
they could get a *normal* kernel on there, and someone somewhere
found an odd port stable enough to run. So I got an ltt patch,
customized it for said kernel (I would have had to do something
similar if it were probe points instead of static traces), got a
trace, and within 5 minutes we had found a bug in their custom
hardware (and no, their drivers were just fine). This customer
would not have even needed me or needed to waste their time if he
had been able to get a trace for his bastardized kernel. But
the way the anti-static-instrumentation creed goes this
customer would still have needed me ... or someone else ...
<conspiracy> wait a minute, maybe that's not a coincidence ...
</conspiracy> ;)

Karim

2006-09-15 20:54:45

by Jose R. Santos

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Mathieu Desnoyers wrote:
> * Jose R. Santos ([email protected]) wrote:
> > Alan Cox wrote:
> >
> > With several other trace tools being implemented for the kernel, there
> > is a great problem with consistency among these tools. It is my
> > opinion that traces are of very little use to _most_ people without the
> > availability of post-processing tools to analyse these traces. While I
> > won't say that we need one all-powerful solution, it would be good if all
> > solutions would at least be able to talk to the same post-processing
> > facilities in user-space. Before LTTng is even considered for the
> > kernel, there needs to be discussion to determine if the trace mechanism
> > being proposed is suitable for all people interested in doing trace
> > analysis. The fact that there also exist tools like LKET and LKST seems to
> > suggest that there are other things to be considered when it comes to
> > implementing a trace mechanism that everyone would be happy with.
> >
> > It would also be useful for all the trace tools to implement the same
> > probe points so that post-processing tools can be interchanged between
> > the various trace implementations.
> >
> >
>
> Hi Jose,
>
> I completely agree that there is a crying need for standardisation there. The
> reason why I propose the LTTng infrastructure as a tracing core in the Linux
> kernel is this: the fundamental problem I have found with kernel tracers so
> far is that they perturb the system too much or do not offer enough
> fine-grained protection against reentrancy. Ingo's post about tracing statements
> breaking the kernel all the time seems to me like sufficient proof that this
> is a real problem.
>
>
I agree with your goal for ltt.

> My goal with LTTng is to provide a reentrant data serialisation mechanism that
> can be called from anywhere in the kernel (ok, the vmalloc path of the page
> fault handler is _the_ exception) that does not use any lock and can therefore
> trace code paths like NMI handlers.
>

One of the things that I've noticed from this thread is that neither you nor
Karim seems to have answered why LTTng is needed if a suitable
replacement can be developed using SystemTap with static markers. I am
personally interested in this answer as well. If all the things that
LTT is proposing can be implemented in SystemTap, what then is the
advantage of accepting such an interface into the kernel?

I don't really care which method is used as long as it's the right tool
for the job. I see several ideas from LTT that could be integrated into
SystemTap in order to make it a one-stop solution for both dynamic and
static tracing. Would you care to elaborate on why you think having
separate projects is a better solution?
> I also implemented code that would serialize any type of data structure I could
> think of. If it is too much, well, we can use part of it.
>
> LTTng trace format is explained there. Your comments on it are very welcome.
>
> http://ltt.polymtl.ca/ > LTTV and LTTng developer documentation > format.html
> (http://ltt.polymtl.ca/svn/ltt/branches/poly/doc/developer/format.html)
>

Trace event headers are very similar between LTT and LKET, which is
good in order to get some synergy between our projects. One thing that
LKET has on each trace event that LTT doesn't is the tid and CPU id of
the event. We find this extremely useful for post-processing. Also,
why is the event_size taken on every event? Why not describe the
event in the trace header, remove this redundant information from
the event header, and save some trace file space?
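
For illustration only (field names and widths are made up, not LTT's or
LKET's actual layout): if each event type's payload size is declared once
in the trace metadata, the per-event header can drop the repeated size
field and spend those bytes on, say, the tid that LKET records, while the
CPU id stays implicit in per-CPU buffers.

#include <stdint.h>

/* Per-event header that repeats the payload size on every record. */
struct event_hdr_sized {
	uint32_t tsc_delta;	/* timestamp delta */
	uint16_t event_id;
	uint16_t event_size;	/* redundant if sizes live in the metadata */
};

/* Per-event header once fixed sizes are described in the trace header. */
struct event_hdr_compact {
	uint32_t tsc_delta;
	uint16_t event_id;
	uint16_t tid;		/* the per-event tid LKET records */
};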

-JRS

2006-09-15 20:55:52

by Karim Yaghmour

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


Thomas Gleixner wrote:
> Stop whining!

I resent that. If your efforts in working on popular kernel topics
met rapid reward then I'm happy for you. The fact that others tackle
unpopular topics and persist despite constant personal attacks should
nevertheless be recognized for what it is.

> LTT did not manage to solve the problem in a generic,

You're entirely correct. I never claimed it to be perfect, that's why I
had approached others early on to try to bridge things together and
that's why I used to post ltt patches to the lkml.

> mainline acceptable way. If you really believe that Kprobes / Systemtap
> is just a $corporate maliciousness to kick you out of business, then I
> really start to doubt your sanity.

If that's how it was read, then it wasn't written right. ltt was never
really a profit center for me, embedded Linux training was -- you
wouldn't believe how much more profitable training is than pure
consulting. But my own business is just beside the point. My point
was that the high barrier to entry for tracing fragmented efforts
around it. As for corporate decisions which culminated from such
resistance, they probably were the sanest decision to take at the
time. Heck if I was a manager at any of those companies I would have
likely taken the same decision. It was, and still is, though,
counterproductive. Fully justifiable, but counterproductive.

Karim

2006-09-15 20:56:29

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Roman Zippel <[email protected]> wrote:

> On Fri, 15 Sep 2006, Thomas Gleixner wrote:
>
> > So this has to be changed. And requiring to recompile the kernel is the
> > wrong answer. Having some nifty tool, which allows you to define the set
> > of dynamic trace points or use a predefined one is the way to go.
>
> Nobody is taking dynamic tracing away!
> You make it sound that tracing is only possible via dynamic traces.
> If I want to use static tracepoints, why shouldn't I?

because:

- static tracepoints, once added, are very hard to remove - up until
eternity. (On the other hand, markers for dynamic tracers are easily
removed, either via making the dynamic tracer smarter, or by
detaching the marker via the patch(1) method. In any case, if a
marker goes away then hell does not break loose in dynamic tracing
land - but it does in static tracing land.)

- the markers needed for dynamic tracing are different from the LTT
static tracepoints.

- a marker for dynamic tracing has lower performance impact than a
static tracepoint, on systems that are not being traced. (but which
have the tracing infrastructure enabled otherwise)

- having static tracepoints dilutes the incentive for architectures to
implement proper kprobes support.

> > > there are separate project teams is because managers in key
> > > positions made the decision that they'd rather break from existing
> > > projects which had had little success mainlining and instead use
> > > their corporate bodyweight to pressure/seduce kernel developers
> > > working for them into pushing their new great which-absolutely-
> > > has-nothing-to-do-with-this-ltt-crap-(no,no, we actually agree
> > > with you kernel developers that this is crap, this is why we're
> > > developing this new amazing thing). That's the truth plain and
> > > simple.
> >
> > Stop whining!
>
> So we're back to personal attacks now. :-(

hm, so you don't consider the above paragraph a whine. How would you
characterize it then? A measured, balanced, on-topic technical comment?
I'm truly curious.

Ingo

2006-09-15 21:08:35

by Jose R. Santos

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Mathieu Desnoyers wrote:
> Please Ingo, stop repeating false arguments without taking people's
> corrections into account:
>
> * Ingo Molnar ([email protected]) wrote:
> > sorry, but i disagree. There _is_ a solution that is superior in every
> > aspect: kprobes + SystemTap. (or any other equivalent dynamic tracer)
> >
>
> I am sorry to have to repeat myself, but this is not true for heavy loads.
>

This thread has already discussed the merits of static instrumentation
when it comes to performance impact. The key is now to find a
balance between static vs dynamic probes. While it is true that static
probes will provide less overhead compared to dynamic probes, some probe
points will see less of a measurable performance impact from
dynamic probes due to the nature of the probe. We need to find what
that balance is.

To some people performance is the #1 priority and to others it is
flexibility. I would like to come up with a list of those probe points
that absolutely need to be inserted into the code statically. Those
that are not absolutely critical to have statically should be
implemented dynamically.

-JRS

2006-09-15 21:07:22

by Karim Yaghmour

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


Ingo Molnar wrote:
> hm, so you dont consider the above paragraph a whine. How would you
> characterize it then? A measured, balanced, on-topic technical comment?
> I'm truly curious.

Take it for what you want. It's yours to disparage. Consider, though,
that I'm factually explaining the real-life result of resistance to
static instrumentation. It's not entirely detached, I'll admit, but
consider that it remained on-topic and entirely respectful of all parties
involved. I've enjoyed very positive relationships with all those
individuals and continue to hold them with high regard. They took the
decisions they thought were best at the time, and I can only respect
them for having acted as responsibly as they found relevant for their
respective organizations. I don't agree with it, but that's life. It
was just important to me to point out to the casual reader the source
of a lot of the fud that can be found on ltt -- i.e. lots of it is
marketing. For sure ltt initially got a lot of things wrong, but the
progress of kernel tracing overall would have been much better had
the naysayers actually chosen to understand the problem instead of
stonewalling the efforts being invested.

Karim

2006-09-15 21:12:40

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Hi,

On Fri, 15 Sep 2006, Ingo Molnar wrote:

> i'm also looking at it this way too: you already seem to be quite
> reluctant to add kprobes to your architecture today. How reluctant would
> you be tomorrow if you had static tracepoints, which would remove a fair
> chunk of incentive to implement kprobes?

If I see that whole teams spend years to implement efficient dynamic
tracing, do you really think that your "incentive" makes any difference?

bye, Roman

2006-09-15 21:17:04

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

On Fri, 2006-09-15 at 17:05 -0400, Karim Yaghmour wrote:
> Thomas Gleixner wrote:
> > Stop whining!
>
> I resent that.

See last sentence of this mail.

> If your efforts in working on popular kernel topics
> met rapid reward then I'm happy for you. The fact that others tackle
> unpopular topics and persist despite constant personal attacks should
> nevertheless be recognized for what it is.

Oh well. I'm working on unpopular and intrusive stuff as long as you do.
Just our ways to work and communicate differ slightly.

> > mainline acceptable way. If you really believe that Kprobes / Systemtap
> > is just a $corporate maliciousness to kick you out of business, then I
> > really start to doubt your sanity.
>
> If that's how it was read, then it wasn't written right

Ouch. Can you please tell me what's the technical merit of this
paragraph:

" ... The only reasons
there are separate project teams is because managers in key
positions made the decision that they'd rather break from existing
projects which had had little success mainlining and instead use
their corporate bodyweight to pressure/seduce kernel developers
working for them into pushing their new great which-absolutely-
has-nothing-to-do-with-this-ltt-crap-(no,no, we actually agree
with you kernel developers that this is crap, this is why we're
developing this new amazing thing). That's the truth plain and
simple."

Sorry, I have not found a way to interpret it usefully.

tglx


2006-09-15 21:17:31

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Roman Zippel <[email protected]> wrote:

> Hi,
>
> On Fri, 15 Sep 2006, Ingo Molnar wrote:
>
> > i'm also looking at it this way too: you already seem to be quite
> > reluctant to add kprobes to your architecture today. How reluctant
> > would you be tomorrow if you had static tracepoints, which would
> > remove a fair chunk of incentive to implement kprobes?
>
> If I see that whole teams spend years to implement efficient dynamic
> tracing, do you really think that your "incentive" makes any
> difference?

oh, being the first mover is the hardest part. Finding the right
solution is hard, it is blind Brownian motion in untested waters. Once
good solutions have been found and once they have been integrated
upstream, an architecture 'only' has to follow the
example. (which is _still_ far from trivial, but it certainly doesnt
take years.)

Ingo

2006-09-15 21:21:40

by Karim Yaghmour

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


Thomas Gleixner wrote:
> Oh well. I'm working on unpopular and intrusive stuff as long as you do.

Well, I won't debate that, shall I? :)

> Just our ways to work and communicate differ slightly.

Maybe so. Any wisdom would be greatly appreciated.

> Sorry, I have not found a way to interpret it usefully.

See my response to Ingo on this topic.

Karim

2006-09-15 21:24:31

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Karim Yaghmour <[email protected]> wrote:

> [...] Consider, though, that I'm factually explaining the real-life
> result of resistance to static instrumentation. [...]

with all due respect, do you realize the possibility that this
resistance might be a genuine technical opinion on my part that is
driven by the quality of the code being offered and by the conceptual
problems static tracing introduces in the future, as i see them? And
thus, maybe, what you wrote:

" and instead use their corporate bodyweight to pressure/seduce kernel
developers working for them into pushing their new great [...] "

could possibly be total, utter nonsense?

Ingo

2006-09-15 21:25:46

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

* Jose R. Santos ([email protected]) wrote:
> To some people performance is the #1 priority and to other it is
> flexibility. I would like to come up with a list of those probe point
> that absolutely need to be inserted into the code statically. Those
> that are not absolutely critical to have statically should be
> implemented dynamically.
>

I agree with you that only very specific parts of the kernel have this kind of
high throughput. Using kprobes for lower throughput tracepoints is perfectly
acceptable from my point of view, as it does not perturb the system too much.

I would suggest (as a beginning) these "standard" high event rate tracepoints:

(taken from the highest rates in
http://sourceware.org/ml/systemtap/2005-q4/msg00451.html)

- syscall entry/exit
- irq entry/exit
- softirq entry/exit
- tasklet entry/exit
- trap entry/exit
- scheduler change
- wakeup
- network traffic (packet in/out)
- "select" and "poll" system calls
- page_alloc/page_free

(be warned: this list is probably incomplete, too exhaustive, or can cause
dizziness under stress conditions) :)

However, a tracing infrastructure should still provide the ability for
developers to instrument their own high-traffic interrupt handlers with very
low overhead.
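
To make "very low overhead" concrete, here is a minimal sketch of the kind of
instrumentation site I have in mind when tracing is compiled in but disabled.
The names and the printf body are purely illustrative, this is not the actual
LTTng code:

#include <stdio.h>

/* Illustrative only: one per-tracepoint enable flag, tested with a
   single predictable branch when tracing is disabled. */
static int trace_irq_entry_enabled;     /* flipped at run time by the tracer */

static void trace_irq_entry_record(int irq, int kernel_mode)
{
        /* A real tracer would serialize the event into a per-CPU
           buffer here; printing stands in for that. */
        printf("irq_entry: irq=%d kernel_mode=%d\n", irq, kernel_mode);
}

/* The instrumentation site itself: when disabled, the cost is one
   test and a not-taken branch. */
static inline void trace_irq_entry(int irq, int kernel_mode)
{
        if (__builtin_expect(trace_irq_entry_enabled, 0))
                trace_irq_entry_record(irq, kernel_mode);
}

int main(void)
{
        trace_irq_entry(19, 1);          /* tracing off: nearly free */
        trace_irq_entry_enabled = 1;
        trace_irq_entry(19, 1);          /* tracing on: event recorded */
        return 0;
}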

Mathieu

OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2006-09-15 21:28:25

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Hi,

On Fri, 15 Sep 2006, Ingo Molnar wrote:

> > Nobody is taking dynamic tracing away!
> > You make it sound that tracing is only possible via dynamic traces.
> > If I want to use static tracepoints, why shouldn't I?
>
> because:
>
> - static tracepoints, once added, are very hard to remove - up until
> eternity. (On the other hand, markers for dynamic tracers are easily
> removed, either via making the dynamic tracer smarter, or by
> detaching the marker via the patch(1) method. In any case, if a
> marker goes away then hell does not break loose in dynamic tracing
> land - but it does in static tracing land.

This is simply not true, at the source level you can remove a static
tracepoint as easily as a dynamic tracepoint, the effect of the missing
trace information is the same either way.

> - the markers needed for dynamic tracing are different from the LTT
> static tracepoints.

What makes the requirements so different? I would actually think it
depends on the user, independent of how the tracing is done.

> - a marker for dynamic tracing has lower performance impact than a
> static tracepoint, on systems that are not being traced. (but which
> have the tracing infrastructure enabled otherwise)

Anyone using static tracing intends to use it, which makes this point moot.

> - having static tracepoints dillutes the incentive for architectures to
> implement proper kprobes support.

Considering the level of work needed to support efficient dynamic tracing,
it only withholds archs from tracing support for no good reason.

> > > > there are separate project teams is because managers in key
> > > > positions made the decision that they'd rather break from existing
> > > > projects which had had little success mainlining and instead use
> > > > their corporate bodyweight to pressure/seduce kernel developers
> > > > working for them into pushing their new great which-aboslutely-
> > > > has-nothing-to-do-with-this-ltt-crap-(no,no, we actually agree
> > > > with you kernel developers that this is crap, this is why we're
> > > > developing this new amazing thing). That's the truth plain and
> > > > simple.
> > >
> > > Stop whining!
> >
> > So we're back to personal attacks now. :-(
>
> hm, so you dont consider the above paragraph a whine. How would you
> characterize it then? A measured, balanced, on-topic technical comment?
> I'm truly curious.

It's sarcastic, but considering the disrespect towards Karim, I don't
blame him. At some point the "whining" argument was funny, but lately it's
only used to discredit people.

bye, Roman

2006-09-15 21:32:51

by Karim Yaghmour

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


Jose R. Santos wrote:
> I don't really care which method is used as long as its the right tool
> for the job. I see several idea from LTT that could be integrated into
> SystemTap in order to make it a one stop solution for both dynamic and
> static tracing. Would you care to elaborate why you think having
> separate projects is a better solution?

We don't -- at least *I* wouldn't care, but I'm not the current
maintainer. ltt's usefulness has always been in the digested information
it can present to the user. The kernel patching part was a necessary
evil. What I object to is the depiction of dynamic tracing as solving
the need for static markup. It doesn't, and, therefore, does not
currently constitute an adequate substitute for ltt's patches. If
someone else can actually provide ltt with the events and surrounding
detail (timestamping and all) it needs while still providing the same
performance we currently get out of the current ltt patches, then I'd
say more power to them -- the current developers may have more relevant
things to say.

Karim

2006-09-15 21:41:17

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Mathieu Desnoyers <[email protected]> wrote:

> * Ingo Molnar ([email protected]) wrote:
> > sorry, but i disagree. There _is_ a solution that is superior in every
> > aspect: kprobes + SystemTap. (or any other equivalent dynamic tracer)
> >
>
> I am sorry to have to repeat myself, but this is not true for heavy
> loads.

djprobes?

> > > At this point you've been rather uncompromising [...]
> >
> > yes, i'm rather uncompromising when i sense attempts to push inferior
> > concepts into the core kernel _when_ a better concept exists here and
> > today. Especially if the concept being pushed adds more than 350
> > tracepoints that expose something to user-space that amounts to a
> > complex external API, which tracepoints we have little chance of ever
> > getting rid of under a static tracing concept.
> >
> From an earlier email from Tim bird :
>
> "I still think that this is off-topic for the patch posted. I think
> we should debate the implementation of tracepoints/markers when
> someone posts a patch for some. I think it's rather scurrilous to
> complain about code NOT submitted. Ingo has even mis-characterized
> the not-submitted instrumentation patch, by saying it has 350
> tracepoints when it has no such thing. I counted 58 for one
> architecture (with only 8 being arch-specific)."

i missed that (way too many mails in this thread).

Here is how i counted them:

$ grep "\<trace_.*(" * | wc -l
359

some of those are not true tracepoints, but there's at least this many
of them:

$ grep "\<trace_.*(" *instrumentation* | wc -l
235

so the real number is somewhere in between.

patch-2.6.17-lttng-0.5.108-instrumentation-arm.diff
patch-2.6.17-lttng-0.5.108-instrumentation.diff
patch-2.6.17-lttng-0.5.108-instrumentation-i386.diff
patch-2.6.17-lttng-0.5.108-instrumentation-mips.diff
patch-2.6.17-lttng-0.5.108-instrumentation-powerpc.diff
patch-2.6.17-lttng-0.5.108-instrumentation-ppc.diff
patch-2.6.17-lttng-0.5.108-instrumentation-s390.diff
patch-2.6.17-lttng-0.5.108-instrumentation-sh.diff
patch-2.6.17-lttng-0.5.108-instrumentation-x86_64.diff

when judging kernel maintenance overhead, the sum of all patches
matters. And i considered all the other patches too (the ones that add
actual tracepoints) that will come after the currently offered ones, not
just the ones you submitted to lkml.

Ingo

2006-09-15 21:42:01

by Tim Bird

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Thomas Gleixner wrote:
> On Fri, 2006-09-15 at 21:10 +0200, Roman Zippel wrote:
>
>>>this is being worked on actively: there's the "djprobes" patchset, which
>>>includes a simplified disassembler to analyze common target code and can
>>>thus insert much faster, call-a-trampoline-function based tracepoints
>>>that are just as fast as (or faster than) compile-time, static
>>>tracepoints.
>>
>>Who is going to implement this for every arch?
>>Is this now the official party line that only archs, which implement all
>>of this, can make use of efficient tracing?
>
> In the reverse you are enforcing an ugly - but available for all archs -
> solution due to the fact that there is nobody interested enough to
> implement it ?

????

If there's a solution people are willing to implement, and one
they aren't - doesn't that say something? Static tracepoint
patches for numerous architectures have existed and been maintained
out-of-tree for years.

> If there is no interest to do that, then this arch can probably live w/o
> instrumentation for the next decade too.

The arches already have instrumentation - just not dynamic
instrumentation. The reason static tracepoints have been
implemented and kprobes haven't is that static tracepoints
are sufficient for what those people are doing, and dynamic
tracepoints are a pain to implement.

Let me repeat that, just in case people missed it:
"Static tracepoints work for what I need." If other people
want to implement something fancier that works for them,
then feel free.

=============================
Tim Bird
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Electronics
=============================

2006-09-15 21:45:55

by Karim Yaghmour

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


Ingo Molnar wrote:
> with all due respect, do you realize the possibility that this
> resistance might be a genuine technical opinion on my part that is
> driven by the quality of the code being offered and by the conceptual
> problems static tracing introduces in the future, as i see them?

Wait. What I said could not possibly apply to comments you, or anybody
else for that matter, made within this thread. What I said refers to
events and threads which have long since passed. The "resistance" I
allude to is that faced by ltt early on and for as long as several
parties were actively involved in trying to standardize on it. I'm
merely trying to explain the current status of this: several teams
in "apparent" competition with one another.

> " and instead use their corporate bodyweight to pressure/seduce kernel
> developers working for them into pushing their new great [...] "
>
> could possibly be total, utter nonsense?

Please read this in the above context -- past events. As far as my
understanding of the events I was part of goes, this was the best
sense I could make of the decision-making process at a managerial
level. And I do not wish to substantiate that, nor was this meant as
a personal attack against any person or organization. Everyone acted
to the best of their knowledge of the facts at the time and I cannot
fault them for that. I disagreed and was disappointed, obviously,
but that's mine to bear.

Put simply: all parties involved would actually wish things were
different.

Karim

2006-09-15 21:49:25

by Jose R. Santos

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Andrew Morton wrote:
> On Fri, 15 Sep 2006 20:19:07 +0200
> Ingo Molnar <[email protected]> wrote:
>
> >
> > * Andrew Morton <[email protected]> wrote:
> >
> > > What Karim is sharing with us here (yet again) is the real in-field
> > > experience of real users (ie: not kernel developers).
> >
> > well, Jes has that experience and Thomas too.
>
> systemtap and ltt are the only full-scale tracing tools which target
> sysadmins and applciation developers of which I am aware..
>

IMO, I think SystemTap is too generic a tool to be considered a
tracing tool. LKET and LKST are more comparable with the functionality
that LTT provides. LKET is implemented using SystemTap, while LKST has
both a SystemTap and a static kernel patch implementation.


> In the bit of text which you snipped I was agreeing with this...
>
> Look, if Karim and Frank (who I assume is a systemtap developer) think that
> we need static tracepoints then I have no reason to disagree with them.
> What I would propose is that:
>
> a) Those tracepoints be integrated one at a time on well-understood
> grounds of necessity. Tracepoints _should_ be added dynamically. But
> if there are instances where that's not working and cannot be made to
> work then OK, in we go.
>
Agree. What would be the criteria that justify having a static probe vs.
a dynamic one?

-JRS

2006-09-15 21:51:18

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Hi Jose,

* Jose R. Santos ([email protected]) wrote:
> >My goal with LTTng is to provide a reentrant data serialisation mechanism
> >that
> >can be called from anywhere in the kernel (ok, the vmalloc path of the page
> >fault handler is _the_ exception) that does not use any lock and can
> >therefore
> >trace code paths like NMI handlers.
> >
>
> One of the things that I've notice from this thread that neither you or
> Karim sees to have answer is why is LTTng needed if a suitable
> replacement can be developed using SystemTap with static markers. I am
> personally interested in this answer as well. If all the things that
> LTT is proposing can be implemented in SystemTap, what then is the
> advantage of accenting such an interface into the kernel.
>

Well, last time I checked, SystemTAP did not have a reentrant serialisation
mechanism to write the information to the buffers. Also, the goals of the
projects differ: SystemTAP finds it acceptable to suffer the kprobe
performance hit, while it is unacceptable for LTTng.
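
To illustrate what I mean by "reentrant serialisation" (this is only a sketch
of the idea, not the LTTng code): space in a per-CPU buffer is reserved with a
compare-and-swap loop instead of a lock, so the write path can be re-entered
by an interrupt or NMI that fires in the middle of a reservation without ever
deadlocking:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BUF_SIZE 4096

struct cpu_buffer {
        unsigned char data[BUF_SIZE];
        volatile size_t offset;          /* next free byte */
};

/* Reserve 'len' bytes; returns a pointer to the slot, or NULL if the
   buffer is full (the event would then be dropped or a new subbuffer
   started). */
static void *reserve_slot(struct cpu_buffer *buf, size_t len)
{
        size_t old, new;

        do {
                old = buf->offset;
                if (old + len > BUF_SIZE)
                        return NULL;
                new = old + len;
                /* Retry if an interrupt/NMI reserved space meanwhile. */
        } while (!__sync_bool_compare_and_swap(&buf->offset, old, new));

        return buf->data + old;
}

int main(void)
{
        static struct cpu_buffer buf;
        const char event[] = "example event";
        void *slot = reserve_slot(&buf, sizeof(event));

        if (slot)
                memcpy(slot, event, sizeof(event));
        printf("buffer offset is now %zu\n", (size_t)buf.offset);
        return 0;
}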

> I don't really care which method is used as long as its the right tool
> for the job. I see several idea from LTT that could be integrated into
> SystemTap in order to make it a one stop solution for both dynamic and
> static tracing. Would you care to elaborate why you think having
> separate projects is a better solution?

I think that each project focuses on its own different goals but that there is
much to gain in reusing the strengths of each.

SystemTAP is good at dynamic instrumentation.
LTTng is good at data serialisation under a fully reentrant kernel.
LTTng provides logging primitives for any data type, including SystemTAP text
output.

Is someone willing to try to create a small facility that will dump SystemTAP's
output in LTTng? It is nearly trivial: if I wasn't completing my debugfs port,
I would probably be doing it right now.

> Trace event headers are very similar between both LTT and LKET which is
> good in other to get some synergy between our projects. One thing that
> LKET has on each trace event that LTT doesn't is the tid and CPU id of
> each event. We find this extremely useful for post-processing. Also,
> why have the event_size on every event taken? Why not describe the
> event during the trace header and remove this redundant information from
> the event header and save some trace file space.
>

A standard event header has to have only crucial information, nothing more, or
it becomes bloated and quickly grows trace size. We decided not to put tid and
CPU id in the event header because tid is already available from the schedchange
events at post-processing time, and CPU id is already available too, as we have
per-CPU buffers.

The event size is strictly speaking redundant, but in reality very, very useful
to verify the correspondence between the size of the data recorded by the
kernel and the size of the data the viewer thinks it is reading. Think of it as
a consistency check between kernel and viewer algorithms.
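
For illustration (this is not the actual LTTng header layout, just the shape of
the trade-off), a compact event header along those lines could look like:

#include <stdint.h>

/* Illustrative only.  The CPU id is implicit because there is one
   buffer per CPU, and the tid is reconstructed at post-processing time
   from the schedchange events, so neither is stored per event.  The
   event_size field duplicates what the event description already
   implies; it is kept as a consistency check between what the kernel
   wrote and what the viewer believes it is reading. */
struct example_event_header {
        uint32_t timestamp;    /* e.g. cycle-counter delta within the subbuffer */
        uint16_t event_id;     /* index into the per-trace event description table */
        uint16_t event_size;   /* size of the payload that follows */
} __attribute__((packed));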


Mathieu


OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2006-09-15 21:59:51

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Roman Zippel <[email protected]> wrote:

> > because:
> >
> > - static tracepoints, once added, are very hard to remove - up until
> > eternity. (On the other hand, markers for dynamic tracers are easily
> > removed, either via making the dynamic tracer smarter, or by
> > detaching the marker via the patch(1) method. In any case, if a
> > marker goes away then hell does not break loose in dynamic tracing
> > land - but it does in static tracing land.
>
> This is simply not true, at the source level you can remove a static
> tracepoint as easily as a dynamic tracepoint, the effect of the
> missing trace information is the same either way.

this is not true. I gave you one example already a few mails ago (which
you did not reply to, neither did you reply the previous time when i
first mentioned this - perhaps you missed it in the high volume of
emails):

" i outlined one such specific "removal of static tracepoint" example
already: static trace points at the head/prologue of functions (half
of the existing tracepoints are such). The sock_sendmsg() example i
quoted before is such a case. Those trace points can be replaced with
a simple GCC function attribute, which would cause a 5-byte (or
whatever necessary) NOP to be inserted at the function prologue. The
attribute would be alot less invasive than an explicit tracepoint (and
thus easier to maintain) "

> > - the markers needed for dynamic tracing are different from the LTT
> > static tracepoints.
>
> What makes the requirements so different? I would actually think it
> depends on the user independent of the tracing is done.

yes, and i mentioned before that they can be merged (i even outlined a
few APIs for it), but still that is not being offered by LTT today.

> > - a marker for dynamic tracing has lower performance impact than a
> > static tracepoint, on systems that are not being traced. (but which
> > have the tracing infrastructure enabled otherwise)
>
> Anyone using static tracing intents to use, which makes this point
> moot.

that's not at all true, on multiple grounds:

Firstly, many people use distro kernels. A Linux distribution typically
wants to offer as few kernel rpms as possible (one per arch to be
precise), but it also wants to offer as many features as possible. So if
there was a static tracer in there, a distro would enable it - but 99.9%
of the users would never use it - still they would see the overhead.
Hence the user would have it enabled, but does not intend to use it -
which contradicts your statement.

Secondly, even people who intend to _eventually_ make use of tracing
dont use it most of the time. So why should they have more overhead when
they are not tracing? Again: the point is not moot, because even though
the user intends to use tracing, he does not always want to trace.

> > - having static tracepoints dillutes the incentive for architectures to
> > implement proper kprobes support.
>
> Considering the level of work needed to support efficient dynamic
> tracing it only withholds archs from tracing support for no good
> reason.

5 major architectures (both RISC and CISC) already support kprobes, so
fortunately this point is largely moot - but you are right to a certain
degree, it's not totally solved. But the examples are there. It's still
not trivial to implement a feature like this, but kernel programming
never is. I far prefer the harder but more intelligent solution
to the easier but less intelligent solution - even if that means a
temporary unavailability of a feature for some rarer arch.

> > > > > there are separate project teams is because managers in key
> > > > > positions made the decision that they'd rather break from existing
> > > > > projects which had had little success mainlining and instead use
> > > > > their corporate bodyweight to pressure/seduce kernel developers
> > > > > working for them into pushing their new great which-aboslutely-
> > > > > has-nothing-to-do-with-this-ltt-crap-(no,no, we actually agree
> > > > > with you kernel developers that this is crap, this is why we're
> > > > > developing this new amazing thing). That's the truth plain and
> > > > > simple.
> > > >
> > > > Stop whining!
> > >
> > > So we're back to personal attacks now. :-(
> >
> > hm, so you dont consider the above paragraph a whine. How would you
> > characterize it then? A measured, balanced, on-topic technical
> > comment? I'm truly curious.
>
> It's sarcastic, [...]

oh, really? Karim's characterization was:

" I'm factually explaining the real-life result of resistance to static
instrumentation. "

so whose interpretation of Karim's comments should i accept, yours or
Karim's? I'm really torn on that issue. (_that_ was sarcastic)

Ingo

2006-09-15 22:04:06

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

* Ingo Molnar ([email protected]) wrote:
>
> * Mathieu Desnoyers <[email protected]> wrote:
>
> > * Ingo Molnar ([email protected]) wrote:
> > > sorry, but i disagree. There _is_ a solution that is superior in every
> > > aspect: kprobes + SystemTap. (or any other equivalent dynamic tracer)
> > >
> >
> > I am sorry to have to repeat myself, but this is not true for heavy
> > loads.
>
> djprobes?
>

I am fully aware of djprobes' limitations with a fully preemptible kernel (and
around branch instructions? I don't remember if they solved that one). Oh,
yes, and if a trap happens to come at the wrong spot, then the thread gets
scheduled out... well, it cannot be applied everywhere, eh?

> > > > At this point you've been rather uncompromising [...]
> > >
> > > yes, i'm rather uncompromising when i sense attempts to push inferior
> > > concepts into the core kernel _when_ a better concept exists here and
> > > today. Especially if the concept being pushed adds more than 350
> > > tracepoints that expose something to user-space that amounts to a
> > > complex external API, which tracepoints we have little chance of ever
> > > getting rid of under a static tracing concept.
> > >
> > From an earlier email from Tim bird :
> >
> > "I still think that this is off-topic for the patch posted. I think
> > we should debate the implementation of tracepoints/markers when
> > someone posts a patch for some. I think it's rather scurrilous to
> > complain about code NOT submitted. Ingo has even mis-characterized
> > the not-submitted instrumentation patch, by saying it has 350
> > tracepoints when it has no such thing. I counted 58 for one
> > architecture (with only 8 being arch-specific)."
>
> i missed that (way too many mails in this thread).
>
> Here is how i counted them:
>
> $ grep "\<trace_.*(" * | wc -l
> 359
>

This count includes the inline trace function definitions.

> some of those are not true tracepoints, but there's at least this many
> of them:
>
> $ grep "\<trace_.*(" *instrumentation* | wc -l
> 235
>

1 - This counts per-architecture trace points. It quickly adds up considering
that we support ARM, MIPS, i386, powerpc, ppc and x86_64.
2 - It also counts some experimental trace points that I do not want to submit.
3 - Most of these are instrumentation of the trap handlers, which is
conceptually only one event.

> when judging kernel maintainance overhead, the sum of all patches
> matters. And i considered all the other patches too (the ones that add
> actual tracepoints) that will come after the currently offered ones, not
> just the ones you submitted to lkml.
>

I plan to rework the instrumentation patches before submitting them to LKML,
don't worry. It just hasn't been my focus until now. Too bad that you take those
as arguments.

Mathieu

OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2006-09-15 22:05:21

by Karim Yaghmour

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


Ingo Molnar wrote:
> oh, really? Karim's characterization was:
>
> " I'm factually explaining the real-life result of resistance to static
> instrumentation. "
>
> so whose interpretation of Karim's comments should i accept, yours or
> Karim's? I'm really torn on that issue. (_that_ was sarcastic)

Hmm ... this might explain why we're having a hard time here ... me
thinks: Ingo don't see that dynamic tracing is orthogonal to static
markup and Ingo don't see that my explanation is orthogonal to
Roman's (i.e. I did factually explain stuff and did resort to
sarcasm as part of said explanation) ... maybe Ingo does not like
orthogonal stuff ...

That _too_ was sarcastic.

Karim

2006-09-15 22:10:09

by Jose R. Santos

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Mathieu Desnoyers wrote:
> * Jose R. Santos ([email protected]) wrote:
> > To some people performance is the #1 priority and to other it is
> > flexibility. I would like to come up with a list of those probe point
> > that absolutely need to be inserted into the code statically. Those
> > that are not absolutely critical to have statically should be
> > implemented dynamically.
> >
>
> I agree with you that only very specific parts of the kernel have this kind of
> high throughput. Using kprobes for lower thoughput tracepoints if perfectly
> acceptable from my point of view, as it does not perturb the system too much.
>
> I would suggest (as a beginning) those "standard" hi event rate tracepoints :
>
> (taken from the highest rates in
> http://sourceware.org/ml/systemtap/2005-q4/msg00451.html)
>
> - syscall entry/exit
> - irq entry/exit
> - softirq entry/exit
> - tasklet entry/exit
> - trap entry/exit
> - scheduler change
> - wakeup
> - network traffic (packet in/out)
> - "select" and "poll" system calls
> - page_alloc/page_free
>
> (be warned : this list is probably incomplete, too exhaustive or can cause
> dizziness under stress condition) :)
>
> However, a tracing infrastructure should still provide the ability for
> developers to instrument their own high traffic interrupt handler with a very
> low overhead.
>
This is based on a single scenario, which is wrong. A criterion needs to
be established that describes the justification for a static trace hook.
Based on the previous comments in the thread, this list already seems
too big.

If a user of the trace tool absolutely needs to have the best
performance, then the proposed tool should be smart enough to use static
hooks if available but fall back to dynamic probes if there is no
available static counterpart. This performance static tracepoint patch
can be maintained outside of the kernel tree without bloating the
kernel. This way he can have mostly dynamic trace points but at least
some sort of mechanism for those that absolutely must have
static hooks in order to get useful data out of the trace tool.
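
As a rough sketch of what I mean (none of these function names come from an
existing API, they only illustrate the registration policy):

#include <stdio.h>
#include <string.h>

typedef void (*probe_handler_t)(void *data);

/* Stand-in: does this kernel expose a static hook for the event? */
static int static_hook_available(const char *event)
{
        /* pretend only syscall entry was built in statically */
        return strcmp(event, "syscall_entry") == 0;
}

static int register_static_hook(const char *event, probe_handler_t h)
{
        (void)h;
        printf("using static hook for %s\n", event);
        return 0;
}

static int register_dynamic_probe(const char *symbol, probe_handler_t h)
{
        (void)h;
        printf("falling back to a dynamic probe at %s\n", symbol);
        return 0;
}

/* Prefer the cheap static hook when it was compiled in, otherwise
   place a dynamic (kprobe-style) probe at the given symbol. */
static int attach_probe(const char *event, const char *symbol, probe_handler_t h)
{
        if (static_hook_available(event))
                return register_static_hook(event, h);
        return register_dynamic_probe(symbol, h);
}

static void handler(void *data) { (void)data; }

int main(void)
{
        attach_probe("syscall_entry", "sys_entry_point", handler);
        attach_probe("page_alloc", "__alloc_pages", handler);
        return 0;
}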

-JRS


2006-09-15 22:12:26

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Jose R. Santos <[email protected]> wrote:

> [...] While it is true that static probes will provide less overhead
> compared to dynamic probes, [...]

that is not true at all. Yes, an INT3 based kprobe might be expensive if
+0.5 usecs per tracepoint (on a 1GHz CPU) is an issue to you - but that
is "only" an implementation detail, not a conceptual property.
Especially considering that help (djprobes) is on the way. And in the
future, as more and more code gets generated (and regenerated) on the
fly, dynamic probes will be _faster_ than static probes - plainly
because they adapt better to the environment they plug into.

so there's basically nothing to balance. My point is that dynamic probes
have won or will win on every front, and we shouldnt tie ourselves down with
static tracers. 5 years ago with no kprobes, had someone submitted a
clean static tracer patchset, we could probably not have resisted it (though
i probably would have resisted it on the grounds of maintenance
overhead) and would have added it because tracing makes sense in
general. But today there's just no reason to add static tracers anymore.

NOTE: i still accept the temporary (or non-temporary) introduction of
static markers, to help dynamic tracing. But my expectation is that
these markers will be less intrusive than static tracepoints, and a lot
more flexible.

Ingo

2006-09-15 22:22:28

by Karim Yaghmour

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


Ingo Molnar wrote:
> that is not true at all. Yes, an INT3 based kprobe might be expensive if
> +0.5 usecs per tracepoint (on a 1GHz CPU) is an issue to you - but that
> is "only" an implementation detail, not a conceptual property.
> Especially considering that help (djprobes) is on the way. And in the

djprobes has been "on the way" for some time now. Why don't you at
least have the intellectual honesty to use the same rules you've
repeatedly used against ltt elsewhere in this thread -- i.e. what
it does today is what it is, and what it does today isn't worth
bragging about. But that would be too much to ask of you Ingo,
wouldn't it?

But, sarcasm aside, even if this mechanism existed it still wouldn't
resolve the need for static markup. It would just make djprobe a
likelier candidate for tools that cannot currently rely on kprobes.

> NOTE: i still accept the temporary (or non-temporary) introduction of
> static markers, to help dynamic tracing. But my expectation is that
> these markers will be less intrusive than static tracepoints, and a lot
> more flexible.

Chalk one up for nice endorsement and another for arbitrary distinction.

Karim

2006-09-15 22:27:36

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Mathieu Desnoyers <[email protected]> wrote:

> > > > sorry, but i disagree. There _is_ a solution that is superior in
> > > > every aspect: kprobes + SystemTap. (or any other equivalent
> > > > dynamic tracer)
> > > >
> > >
> > > I am sorry to have to repeat myself, but this is not true for
> > > heavy loads.
> >
> > djprobes?
> >
>
> I am fully aware of djprobes limitations towards fully preemptible
> kernel [...]

i dont see any fundamental limitation with a preemptible kernel.
(preemptability was never a showstopper for any kernel feature in the
past, and i dont expect it to be a showstopper for anything in the
future either.)

> [...] (and around branches instructions ? I don't remember if they
> solved this one). Oh, yes, and if a trap happen to come at the wrong
> spot, then the thread gets scheduled out... well, it cannot be applied
> everywhere, eh ?

i expect the number of places where dynamic tracers have problems to
gradually shrink. It has shrunk significantly already. Hence i'm
supportive of static markers (as i stated numerous times), as long as
they are there to ease dynamic probing - _and as long as these static
markers shrink in number as the capabilities of dynamic tracers
improve_. With static tracers i just dont see that possibility: a static
tracer needs all its static tracepoints forever or otherwise it just
wont work.

> > $ grep "\<trace_.*(" * | wc -l
> > 359
> >
>
> This count includes the inline trace functions definitions.

yes, as i stated:

> > some of those are not true tracepoints, but there's at least this many
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > of them:
> >
> > $ grep "\<trace_.*(" *instrumentation* | wc -l
> > 235
> >
>
> 1 - This counts per architecture trace points. It quickly adds up
> considering that we support ARM, MIPS, i386, powerpc, ppc and x86_64.

yes. That's my point: overhead of static tracepoints "quickly adds up".
The cost goes up linearly, as you grow into more subsystems and into
more architectures.

btw., an observation: that's 6 LTT architectures in 7 years, while
kprobes are now on 5 architectures in 2 years.

> 2 - It also counts some experimental trace points that I do not want
> to submit.
> 3 - Most of these are instrumentation of the traps handlers, which is
> conceptually only one event.

i counted the number of tracepoints, not the number of unique types of
events, because:

> > when judging kernel maintainance overhead, the sum of all patches
> > matters. And i considered all the other patches too (the ones that
> > add actual tracepoints) that will come after the currently offered
> > ones, not just the ones you submitted to lkml.
>
> I plan to rework the instrumentation patches before submitting them to
> LKML, don't worry. I just hasn't been my focus until now. Too bad that
> you take those as arguments.

the static tracer patches make little sense without instrumentation, so
sure i considered them. I also clearly declared that you didnt submit
them yet:

>>> Let me quote from the latest LTT patch (patch-2.6.17-lttng-0.5.108,
>>> which is the same version submitted to lkml - although no specific
^^^^^^^^^^^^^^^^^^^^
>>> tracepoints were submitted):
^^^^^^^^^^^^^^^^^^^^^^^^^^

Ingo

2006-09-15 22:35:23

by Karim Yaghmour

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


Ingo Molnar wrote:
> btw., an observation: that's 6 LTT architectures in 7 years, while
> kprobes are now on 5 architectures in 2 years.

Actually much of ltt underwent a complete rewrite since Mathieu took
over maintainership. Let's see, according to this email, Mathieu became
the maintainer in November 2005:
http://www.listserv.shafik.org/pipermail/ltt-dev/2005-November/001092.html

[ Karim takes out calculator and punches: 10/12 = 0.83 ]

So that's 7 architectures in 0.83 years, compared to 5 in 2 years.

Joke's on you, pal.

Karim

2006-09-15 22:52:32

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Karim Yaghmour <[email protected]> wrote:

> Ingo Molnar wrote:
> > that is not true at all. Yes, an INT3 based kprobe might be expensive if
> > +0.5 usecs per tracepoint (on a 1GHz CPU) is an issue to you - but that
> > is "only" an implementation detail, not a conceptual property.
> > Especially considering that help (djprobes) is on the way. And in the
>
> djprobes has been "on the way" for some time now. Why don't you at
> least have the intellectual honesty to use the same rules you've
> repeatedly used against ltt elsewhere in this thread -- i.e. what it
> does today is what it is, and what it does today isn't worth bragging
> about. [...]

i actually think djprobes are pretty darn inventive. I also think that
the tracebuffer management portion of LTT is better than the hacks in
SystemTap, and that LTT's visualization tools are better (for example
they do exist :-) - so clearly there's synergy possible. But i have no
faith at all, for the many reasons outlined before, in the concept of
static tracing, because i see no possible future path out of its many
limitations and because i see no possible future way to get rid of their
dependencies. So i'd rather wait some time for dynamic tracers to
outgrow static tracers in even the last final area, than let static
tracing into the kernel - which would add dependencies that we'd have to
live with almost until eternity.

> But, sarcasm aside, even if this mechanism existed it still wouldn't
> resolve the need for static markup. It would just make djprobe a
> likelier candidate for tools that cannot currently rely on kprobes.

it would clearly reduce the number of places where static markup would
still be necessary. With static tracers i see no such mechanism that
gradually moves the markups out of the kernel.

> > NOTE: i still accept the temporary (or non-temporary) introduction
> > of static markers, to help dynamic tracing. But my expectation is
> > that these markers will be less intrusive than static tracepoints,
> > and a lot more flexible.
>
> Chalk one up for nice endorsement and another for arbitrary
> distinction.

So you dispute that markups for dynamic tracing will be more flexible
and you dispute that they will be less intrusive than markups for static
tracing?

Ingo

2006-09-15 22:53:53

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Hi,

On Fri, 15 Sep 2006, Ingo Molnar wrote:

> > This is simply not true, at the source level you can remove a static
> > tracepoint as easily as a dynamic tracepoint, the effect of the
> > missing trace information is the same either way.
>
> this is not true. I gave you one example already a few mails ago (which
> you did not reply to, neither did you reply the previous time when i
> first mentioned this - perhaps you missed it in the high volume of
> emails):
>
> " i outlined one such specific "removal of static tracepoint" example
> already: static trace points at the head/prologue of functions (half
> of the existing tracepoints are such). The sock_sendmsg() example i
> quoted before is such a case. Those trace points can be replaced with
> a simple GCC function attribute, which would cause a 5-byte (or
> whatever necessary) NOP to be inserted at the function prologue. The
> attribute would be alot less invasive than an explicit tracepoint (and
> thus easier to maintain) "

As I said before, you're mixing up function tracing with event tracing: not
all events are tied to functions, functions can be moved and renamed, yet the
actual event more often stays the same.
Function attributes also don't provide information local to the
function.

> > > - the markers needed for dynamic tracing are different from the LTT
> > > static tracepoints.
> >
> > What makes the requirements so different? I would actually think it
> > depends on the user independent of the tracing is done.
>
> yes, and i mentioned before that they can be merged (i even outlined a
> few APIs for it), but still that is not being offered by LTT today.

It's possible I missed something, but pretty much anything you outlined
wouldn't make the life of static tracepoints any easier.

> > > - a marker for dynamic tracing has lower performance impact than a
> > > static tracepoint, on systems that are not being traced. (but which
> > > have the tracing infrastructure enabled otherwise)
> >
> > Anyone using static tracing intents to use, which makes this point
> > moot.
>
> that's not at all true, on multiple grounds:
>
> Firstly, many people use distro kernels. A Linux distribution typically
> wants to offer as few kernel rpms as possible (one per arch to be
> precise), but it also wants to offer as many features as possible. So if
> there was a static tracer in there, a distro would enable it - but 99.9%
> of the users would never use it - still they would see the overhead.
> Hence the user would have it enabled, but does not intend to use it -
> which contradicts your statement.

So if dynamic tracing is available, use it, as distributions already do.
OTOH the barrier to using static tracing is drastically different depending on
whether the user has to deal with external patches or whether it's a simple
kernel option.
Again, static tracing doesn't exclude the possibility of dynamic tracing;
that's something you constantly omit and thus make it sound like both
options were mutually exclusive.

> Secondly, even people who intend to _eventually_ make use of tracing,
> dont use it most of the time. So why should they have more overhead when
> they are not tracing? Again: the point is not moot because even though
> the user intends to use tracing, but does not always want to trace.

I've used kernels which included static tracing and the performance
overhead is negligible for occasional use.

> > > - having static tracepoints dillutes the incentive for architectures to
> > > implement proper kprobes support.
> >
> > Considering the level of work needed to support efficient dynamic
> > tracing it only withholds archs from tracing support for no good
> > reason.
>
> 5 major architectures (both RISC and CISC) already support kprobes, so
> fortunately this point is largely moot - but you are right to a certain
> degree, it's not totally solved. But the examples are there. It's still
> not trivial to implement a feature like this, but kernel programming
> never is. I far more prefer the harder but more intelligent solution
> than the easier but less intelligent solution - even if that means a
> temporary unavailability of a feature for some rarer arch.

Why don't you leave the choice to the users? Why do you constantly make it
an exclusive choice? There is a lot of common ground, but you seem to be
hellbent on making the life of static tracers and thus their users as hard as
possible. Only in pursuit of some perfect solution, while the more
practical solution is easily available without any ill effects?

bye, Roman

2006-09-15 23:00:26

by Frank Ch. Eigler

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


Ingo Molnar <[email protected]> writes:

> [...] NOTE: i still accept the temporary (or non-temporary)
> introduction of static markers, to help dynamic tracing. But my
> expectation is that these markers will be less intrusive than static
> tracepoints, and a lot more flexible.

It seems like an agreement on this is coming together. You and Karim
may be in violent agreement, even if others haven't quite come around:

Let us design a static marker mechanism that can be coupled at run
time either to a dynamic system such as systemtap, or to a specialized
tracing system such as lttng (!). Then "markers" === "static
instrumentation", for the purposes of the kernel developer. If the
markers are lightweight enough, then a distribution kernel can afford
keeping them compiled in.
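
To give a feel for how light such a marker could be (all names below are made
up for illustration, this is not a proposed API): each marker site is a named
slot with a format string and a hook pointer that stays NULL until some tracer
-- systemtap, ltt, or anything else -- attaches to it at run time:

#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

struct marker {
        const char *name;
        const char *format;
        void (*hook)(const struct marker *m, va_list args);
};

#define DEFINE_MARKER(id, fmt) \
        static struct marker __mark_##id = { #id, fmt, NULL }

/* When no tracer is attached, the cost is a test and a not-taken branch. */
#define MARK(id, ...)                                                   \
        do {                                                            \
                if (__builtin_expect(__mark_##id.hook != NULL, 0))      \
                        __mark_call(&__mark_##id, __VA_ARGS__);         \
        } while (0)

static void __mark_call(struct marker *m, ...)
{
        va_list args;

        va_start(args, m);
        m->hook(m, args);
        va_end(args);
}

/* A trivial hook a tracer could install at run time. */
static void print_hook(const struct marker *m, va_list args)
{
        vprintf(m->format, args);
}

DEFINE_MARKER(socket_sendmsg, "socket_sendmsg: family=%d size=%lu\n");

int main(void)
{
        MARK(socket_sendmsg, 2, 128UL);          /* detached: nothing happens */
        __mark_socket_sendmsg.hook = print_hook;
        MARK(socket_sendmsg, 2, 128UL);          /* attached: routed to the hook */
        return 0;
}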


- FChE

2006-09-15 23:17:31

by Jose R. Santos

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Ingo Molnar wrote:
> * Jose R. Santos <[email protected]> wrote:
>
> > [...] While it is true that static probes will provide less overhead
> > compared to dynamic probes, [...]
>
> that is not true at all. Yes, an INT3 based kprobe might be expensive if
> +0.5 usecs per tracepoint (on a 1GHz CPU) is an issue to you - but that
> is "only" an implementation detail, not a conceptual property.
> Especially considering that help (djprobes) is on the way. And in the
> future, as more and more code gets generated (and regenerated) on the
> fly, dynamic probes will be _faster_ than static probes - plainly
> because they adapt better to the environment they plug into.
>
Agree. And those are details that can be fixed.

One such detail we still see issues with is kretprobes though (which we
use in LKET for system call exit). These have problems scaling due to
spinlock issues even on small SMP systems. It's an implementation issue
that can be fixed, but I've been told that the fix is not trivial and we
should not expect it anytime soon.
> so there's basically nothing to balance. My point is that dynamic probes
> have won or will win on every front, and we shouldnt tie us down with
> static tracers. 5 years ago with no kprobes, had someone submitted a
> clean static tracer patchset, we could probably not have resisted it (i
> though probably would have resisted it on the grounds of maintainance
> overhead) and would have added it because tracing makes sense in
> general. But today there's just no reason to add static tracers anymore.
>
> NOTE: i still accept the temporary (or non-temporary) introduction of
> static markers, to help dynamic tracing. But my expectation is that
> these markers will be less intrusive than static tracepoints, and a lot
> more flexible.
>
Agree here as well. Sorry, I was also counting static markers as
static tracepoints. Even with static markers, there needs to be a
balance between what things need to be implemented with markers and those
that can just be done dynamically.

-JRS

2006-09-15 23:23:10

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Roman Zippel <[email protected]> wrote:

> > > This is simply not true, at the source level you can remove a static
> > > tracepoint as easily as a dynamic tracepoint, the effect of the
> > > missing trace information is the same either way.
> >
> > this is not true. I gave you one example already a few mails ago (which
> > you did not reply to, neither did you reply the previous time when i
> > first mentioned this - perhaps you missed it in the high volume of
> > emails):
> >
> > " i outlined one such specific "removal of static tracepoint" example
> > already: static trace points at the head/prologue of functions (half
> > of the existing tracepoints are such). The sock_sendmsg() example i
> > quoted before is such a case. Those trace points can be replaced with
> > a simple GCC function attribute, which would cause a 5-byte (or
> > whatever necessary) NOP to be inserted at the function prologue. The
> > attribute would be alot less invasive than an explicit tracepoint (and
> > thus easier to maintain) "
>
> As I said before you're mixing up function tracing with event tracing,
> not all events are tied to functions, functions can be moved and
> renamed, the actual event more often stays the same.

you are showing a clear misunderstanding of how tracing is typically
done. Both for LTT and for blktrace (and for the tracers i've done
myself), roughly half (50%) of the tracepoints are right at the top of
the function and trace the function arguments. Let me quote an example
straight from LTT:

int sock_sendmsg(struct socket *sock, struct msghdr *msg, size_t size)
{
        struct kiocb iocb;
        struct sock_iocb siocb;
        int ret;

        trace_socket_sendmsg(sock, sock->sk->sk_family,
                             sock->sk->sk_type,
                             sock->sk->sk_protocol,
                             size);

this tracepoint, under a dynamic tracing concept, can be replaced with:

int __trace sock_sendmsg(struct socket *sock, struct msghdr *msg, size_t size)
{
        struct kiocb iocb;
        struct sock_iocb siocb;
        int ret;

note the "__trace" attribute to the function. (see my previous mails
where i talked about __trace for more details) SystemTap can hook to
that point and can access the very same parameters that the markup does,
in a lot less invasive way.

So a 5-line markup can be replaced with a single function attribute.
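
to make the idea concrete (the spelling below is only an illustration - i'm
borrowing GCC's patchable_function_entry attribute here purely as one existing
way to get compiler-reserved NOPs at the function entry; the point is the
concept, not this particular attribute):

/* Illustration only: __trace asks the compiler to reserve a run of
   NOPs at the function entry that a dynamic tracer can later patch.
   The attribute used here (GCC's patchable_function_entry) is just one
   way to spell that; the marker itself carries no tracing code. */
#define __trace __attribute__((patchable_function_entry(5, 0)))

/* The instrumented function needs no explicit trace_*() call: the
   tracer attaches to the reserved NOPs and reads the arguments the
   same way it would for any probed function. */
int __trace example_sendmsg(void *sock, void *msg, unsigned long size)
{
        (void)sock; (void)msg; (void)size;
        /* ... normal function body ... */
        return 0;
}

int main(void)
{
        return example_sendmsg(0, 0, 128);
}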

roughly half of the existing tracepoints in blktrace/LTT can be replaced
that way. A 50% reduction in the number of markups is significant - but
such a reduction in markups is not possible under the static tracing
concept. And that method was just off the top of my head - Andrew
provided other ideas to reduce the number of markups.

> Function attributes also doesn't provide information local to the
> function.

of course, but where does the above tracepoint i quoted use information
local to the function? A fair number of markups use global functions
because, surprise, a lot of interesting activity happens along global
functions. So a healthy reduction in markups can be achieved.

> > > > - the markers needed for dynamic tracing are different from the
> > > > LTT static tracepoints.
> > >
> > > What makes the requirements so different? I would actually think
> > > it depends on the user independent of the tracing is done.
> >
> > yes, and i mentioned before that they can be merged (i even outlined
> > a few APIs for it), but still that is not being offered by LTT
> > today.
>
> It's possible I missed something, but pretty much anything you
> outlined wouldn't make the live of static tracepoints any easier.

sorry, but if you re-read the above line of argument, your sentence
appears to be a non sequitur. I said "the markers needed for dynamic tracing are
different from the LTT static tracepoints". You asked why they are so
different, and i replied that i already outlined what the right API
would be in my opinion to do markups, but that API is different from
what LTT is offering now. To which you are now replying: "pretty much
anything you outlined wouldn't make the life of static tracepoints any
easier." Huh?

> > > > - a marker for dynamic tracing has lower performance impact than a
> > > > static tracepoint, on systems that are not being traced. (but which
> > > > have the tracing infrastructure enabled otherwise)
> > >
> > > Anyone using static tracing intents to use, which makes this point
> > > moot.
> >
> > that's not at all true, on multiple grounds:
> >
> > Firstly, many people use distro kernels. A Linux distribution typically
> > wants to offer as few kernel rpms as possible (one per arch to be
> > precise), but it also wants to offer as many features as possible. So if
> > there was a static tracer in there, a distro would enable it - but 99.9%
> > of the users would never use it - still they would see the overhead.
> > Hence the user would have it enabled, but does not intend to use it -
> > which contradicts your statement.
>
> So if dynamic tracing is available use it, as distributions already
> do. OTOH the barrier to use static tracing is drastically different
> whether the user has to deal with external patches or whether it's a
> simple kernel option. Again, static tracing doesn't exclude the
> possibility of dynamic tracing, that's something you constantly omit
> and thus make it sound like both options were mutually exlusive.

how does this reply to my point that: "a marker for dynamic tracing has
lower performance impact than a static tracepoint, on systems that are
not being traced", which point you claimed moot?

> > Secondly, even people who intend to _eventually_ make use of
> > tracing, dont use it most of the time. So why should they have more
> > overhead when they are not tracing? Again: the point is not moot
> > because even though the user intends to use tracing, but does not
> > always want to trace.
>
> I've used kernels which included static tracing and the perfomance
> overhead is negligible for occasional use.

how does this suddenly make my point, that "a marker for dynamic tracing
has lower performance impact than a static tracepoint, on systems that
are not being traced", "moot"?

> > > > - having static tracepoints dillutes the incentive for
> > > > architectures to
> > > > implement proper kprobes support.
> > >
> > > Considering the level of work needed to support efficient dynamic
> > > tracing it only withholds archs from tracing support for no good
> > > reason.
> >
> > 5 major architectures (both RISC and CISC) already support kprobes,
> > so fortunately this point is largely moot - but you are right to a
> > certain degree, it's not totally solved. But the examples are there.
> > It's still not trivial to implement a feature like this, but kernel
> > programming never is. I far more prefer the harder but more
> > intelligent solution than the easier but less intelligent solution -
> > even if that means a temporary unavailability of a feature for some
> > rarer arch.
>
> Why don't you leave the choice to the users? Why do you constantly
> make it an exclusive choice? [...]

as i outlined tons of times before: once we add markups for static
tracers, we cannot remove them. That is a constant kernel maintenance
drag that i feel uncomfortable about. With dynamic tracers, on the other
hand, i see a clear path out of any such drag. We can, in a very finegrained
way, tune the overhead of markups vs. out-of-source scripts. Static tracers
dont give us this flexibility - and hence limit our future choices.

the user of course does not care about kernel internal design and
maintenance issues. Think about the many reasons why STREAMS was
rejected - users wanted that too. And note that users dont want "static
tracers" or any design detail of LTT in particular: what they want is
the _functionality_ of LTT.

nor do i reject all of LTT: as i said before i like the tools, and i
think its collection of trace events should be turned into systemtap
markups and scripts. Furthermore, its ringbuffer implementation looks
better. So as far as the user is concerned, LTT could (and should) live
on with full capabilities, but with this crucial difference in how it
interfaces to the kernel source code.

Ingo

2006-09-15 23:23:05

by Karim Yaghmour

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


Ingo Molnar wrote:
> i actually think djprobes are pretty darn inventive.

So do I. While there is a language barrier, the Hitachi folks,
especially Hiramatsu-san, are very talented.

> I also think that
> the tracebuffer management portion of LTT is better than the hacks in
> SystemTap, and that LTT's visualization tools are better (for example
> they do exist :-) - so clearly there's synergy possible.

Great, because I believe all those involved would like to see this
happen. I personally am convinced that none of those involved want
to continue wasting their time in parallel.

> But i have no
> faith at all, for the many reasons outlined before, in the concept of
> static tracing, because i see no possible future path out of its many
> limitations and because i see no possible future way to get rid of their
> dependencies.

Yes, I do so believe that this is what you most sincerely think. And
I'm ok with that. We don't have to approach the problem from the
same direction. In my view we should at least settle for working on
the most basic thing we *do* agree on: having a markup mechanism for
necessary instrumentation.

> So i'd rather wait some time for dynamic tracers to
> outgrow static tracers in even the last final area, than let static
> tracing into the kernel - which would add dependencies that we'd have to
> live with almost until eternity.

I genuinely understand your concern. And I repeat that ltt's initial
design cared little about the provenance of the events. It just needed
key events to present an intelligent picture to the user. The
patches have since grown to include stuff which was essential as
development went ahead. But there's no reason things cannot be
refactored into a format acceptable to all through review on the lkml.

> it would clearly reduce the number of places where static markup would
> still be necessary. With static tracers i see no such mechanism that
> gradually moves the markups out of the kernel.

Again, I strongly believe that this issue isn't about static vs.
dynamic. The goal, and that's what's important, is to allow users
to have access to a set of tools they can use on *any* kernel
they get their hands on, without having to edit anything anywhere
or fix any script. Having spent considerable effort on this,
I don't see any other way than using static markup. Here's a
simple case: someone has a bug report about a kernel
crashing because of his user-space realtime task, you ask him
to dump you a trace, and that trace actually ends up misleading
because his out-of-tree instrumentation was inserted in the wrong
location.

Again, the goal is to obtain tools that users can use on *any*
kernel they get their hands on.

> So you dispute that markups for dynamic tracing will be more flexible
> and you dispute that they will be less intrusive than markups for static
> tracing?

No, I'm saying that the flexibility of the markup is not tied to the
instrumentation "grab" mechanism (direct call or binary editing.)
That's the "arbitrary" I'm talking about.

Karim

2006-09-15 23:30:12

by Karim Yaghmour

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


Frank Ch. Eigler wrote:
> Let us design a static marker mechanism that can be coupled at run
> time either to a dynamic system such as systemtap, or by a specialized
> tracing system such as lttnng (!). Then "markers" === "static
> instrumentation", for purposes of the kernel developer. If the
> markers are lightweight enough, then a distribution kernel can afford
> keeping them compiled in.

I'm all for it.
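
A minimal sketch of what such a run-time-coupled marker could look like
(the names are hypothetical; this is not an interface either project has
committed to): the call site is compiled in statically, but whatever
consumes it -- SystemTap, ltt, or nothing at all -- attaches through a
settable handler at run time, so an unattached marker costs a single
predictable branch.

typedef void (*marker_probe_t)(const char *name, const char *fmt, ...);

struct marker {
    const char *name;
    marker_probe_t probe;   /* NULL while nothing is attached */
};

#define TRACE_MARK(m, fmt, args...)                 \
    do {                                            \
        if ((m).probe)                              \
            (m).probe((m).name, fmt, ## args);      \
    } while (0)

/* usage, assuming a marker object defined elsewhere:
 *    TRACE_MARK(socket_sendmsg_mark, "%p %zu", sock, size);
 */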

Karim

2006-09-15 23:49:37

by Nicholas Miell

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

On Sat, 2006-09-16 at 01:14 +0200, Ingo Molnar wrote:
> * Roman Zippel <[email protected]> wrote:
>
> > > > This is simply not true, at the source level you can remove a static
> > > > tracepoint as easily as a dynamic tracepoint, the effect of the
> > > > missing trace information is the same either way.
> > >
> > > this is not true. I gave you one example already a few mails ago (which
> > > you did not reply to, neither did you reply the previous time when i
> > > first mentioned this - perhaps you missed it in the high volume of
> > > emails):
> > >
> > > " i outlined one such specific "removal of static tracepoint" example
> > > already: static trace points at the head/prologue of functions (half
> > > of the existing tracepoints are such). The sock_sendmsg() example i
> > > quoted before is such a case. Those trace points can be replaced with
> > > a simple GCC function attribute, which would cause a 5-byte (or
> > > whatever necessary) NOP to be inserted at the function prologue. The
> > > attribute would be alot less invasive than an explicit tracepoint (and
> > > thus easier to maintain) "
> >
> > As I said before you're mixing up function tracing with event tracing,
> > not all events are tied to functions, functions can be moved and
> > renamed, the actual event more often stays the same.
>
> you are showing a clear misunderstanding of how tracing is typically
> done. Both for LTT and for blktrace (and for the tracers i've done
> myself), roughly half (50%) of the tracepoints are right at the top of
> the function and trace the function arguments. Let me quote an example
> straight from LTT:
>
> int sock_sendmsg(struct socket *sock, struct msghdr *msg, size_t size)
> {
> struct kiocb iocb;
> struct sock_iocb siocb;
> int ret;
>
> trace_socket_sendmsg(sock, sock->sk->sk_family,
> sock->sk->sk_type,
> sock->sk->sk_protocol,
> size);
>
> this tracepoint, under a dynamic tracing concept, can be replaced with:
>
> int __trace sock_sendmsg(struct socket *sock, struct msghdr *msg, size_t size)
> {
> struct kiocb iocb;
> struct sock_iocb siocb;
> int ret;
>
> note the "__trace" attribute to the function. (see my previous mails
> where i talked about __trace for more details) SystemTap can hook to
> that point and can access the very same parameters that the markup does,
> in a lot less invasive way.
>
> So a 5-line markup can be replaced with a single function attribute.
>
> roughly half of the existing tracepoints in blktrace/LTT can be replaced
> that way. A 50% reduction in the number of markups is significant - but
> such a reduction in markups not possible under the static tracing
> concept. And that method was just off the top of my head - Andrew
> provided other ideas to reduce the number of markups.
>

You're going to want to be able to trace every function in the kernel,
which means they'd all need a __trace -- and in that case, a
-fpad-functions-for-tracing gcc option would make more sense than
per-function attributes.

The option could also insert NOPs before RETs, not just before the
prologue so that function returns are equally easy to trace. (It might
also inhibit tail calls, assuming being able to trace all function
returns is more important than that optimization.)


And SystemTap can already hook into sock_sendmsg() (or any other
function) and examine its arguments -- all of this GCC extension talk
is just performance enhancement.

--
Nicholas Miell <[email protected]>

2006-09-16 00:00:58

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Karim Yaghmour <[email protected]> wrote:

> > So you dispute that markups for dynamic tracing will be more
> > flexible and you dispute that they will be less intrusive than
> > markups for static tracing?
>
> No, I'm saying that the flexibility of the markup is not tied to the
> instrumentation "grab" mechanism (direct call or binary editing.)
> That's the "arbitrary" I'm talking about.

ok, then i'd like to dispute your point. Contrary to your statement
there is a very fundamental difference between "static tracing" (static
call, which relies on compile-time insertion of trace points) and
"dynamic tracing" (which can insert trace points almost anywhere) -
_even if both use in-source markers_.

The fundamental difference is this: dynamic tracing has full access to
the full environment of the code that it taps into _at the time of
tracepoint activation_, while static tracing has to get all its context
during compilation.

To make my point easier to understand, consider the following example:
we want to tap into the middle of a global_function():

int global_function(int arg1, int arg2, int arg3)
{
... [lots of code] ...

x = func2();

... [lots of code] ...
}

We want to trace the function right after 'x' has been assigned, and we
want to trace an event_A, with parameters: arg1, arg2, arg3 and x. This
is a pretty common scenario. Ok so far?

here is what the markup looks like under static tracing:

int global_function(int arg1, int arg2, int arg3)
{
... [lots of code] ...

x = func2();
D(event_A, arg1, arg2, arg3, x);

... [lots of code] ...
}

that's what you'd expect, right? This is pretty common too, up to this
point.

now here is how the markup could look for a dynamic tracepoint:

int global_function(int arg1, int arg2, int arg3)
{
... [lots of code] ...

x = func2();
D(event_A, x);

... [lots of code] ...
}

Note: there's no (arg1, arg2, arg3) passed to the markup! Why? Because
SystemTap has full access to the function's arguments and in this
particular case it's simply not necessary to reference them explicitly.
So the markup has less of an overhead because it does not 'touch' arg1,
arg2, arg3 if the tracepoint is not active [which is the common case we
optimize for].

Furthermore, the markup is also visually less intrusive.

But better than that, the markup could look like this as well:

int global_function(int arg1, int arg2, int arg3)
{
... [lots of code] ...

x = func2();

... [lots of code] ...
}

right, no markup at all, but in a script somewhere we'd have:

insert.trace(global_function: "x = func2();", after);

or maybe even in a script, annotated in patch format, so that the
context of the tapped code is captured too.

so, as a result: the dynamic markup() does the same, but has less impact
on the compiled code (fewer parameters touched), and is more flexible in
terms of attachment to the source code.

Can we do any of this with the static tracepoint? We cannot,
fundamentally! So if we allowed static tracers to access that tracepoint
anytime, we could never make things more intelligent there in the
future!
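
To put the two variants side by side in code, here is a minimal sketch
of how such a D() markup could be spelled (the config symbol and the
trace_* helper name are hypothetical):

#ifdef CONFIG_STATIC_TRACER
/* static tracing: the markup is a direct call, so every argument it
 * names is evaluated at the call site even when nobody is tracing */
#define D(event, args...)   trace_##event(args)
#else
/* dynamic tracing: the markup evaluates none of the arguments -- shown
 * here as a pure no-op for simplicity; in practice it would leave some
 * locatable anchor (a nop, a label, debug info) so that the dynamic
 * tracer can recover arg1..arg3 from the frame, but only when a probe
 * is actually attached */
#define D(event, args...)   do { } while (0)
#endif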

Ingo

2006-09-16 00:01:51

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Karim Yaghmour <[email protected]> wrote:

> > the tracebuffer management portion of LTT is better than the hacks
> > in SystemTap, and that LTT's visualization tools are better (for
> > example they do exist :-) - so clearly there's synergy possible.
>
> Great, because I believe all those involved would like to see this
> happen. I personally am convinced that none of those involved want to
> continue wasting their time in parallel.

a reasonable compromise for me would be what i suggested a few mails
ago:

nor do i reject all of LTT: as i said before i like the tools, and i
think its collection of trace events should be turned into systemtap
markups and scripts. Furthermore, its ringbuffer implementation looks
better. So as far as the user is concerned, LTT could (and should) live
on with full capabilities, but with this crucial difference in how it
interfaces to the kernel source code.

i.e. could you try to just give SystemTap a chance and attempt to
integrate a portion of LTT with it ... that shares more of the
infrastructure and we'd obviously only need "one" markup variant, and
would have full markup (removal-) flexibility. I'll try to help djprobes
as much as possible. Hm?

Ingo

2006-09-16 00:05:15

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Nicholas Miell <[email protected]> wrote:

> You're going to want to be able to trace every function in the kernel,
> which means they'd all need a __trace -- and in that case, a
> -fpad-functions-for-tracing gcc option would make more sense than
> per-function attributes.

the __trace attribute would be a _specific_ replacement for a _specific_
static markup at the entry of a function. So no, we would not want to
add __trace to _every_ function in the kernel: only those which get
commonly traced. And note that SystemTap can trace the rest too, just
with slightly higher overhead.

In that sense __trace is not an enabling infrastructure, it's a
performance tuning infrastructure.

> The option could also insert NOPs before RETs, not just before the
> prologue so that function returns are equally easy to trace. (It might
> also inhibit tail calls, assuming being able to trace all function
> returns is more important than that optimization.)

yeah. __trace_entry and __trace_exit [or both] attributes. Makes sense.

> And SystemTap can already hook into sock_sendmsg() (or any other
> function) and examine its arguments -- all of this GCC extension talk
> is just performance enhancement.

yes, yes, yes, exactly!!! Finally someone reads my mails and understands
my points. There's hope! ;)

Ingo

2006-09-16 00:31:59

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Hi,

On Sat, 16 Sep 2006, Ingo Molnar wrote:

> > As I said before you're mixing up function tracing with event tracing,
> > not all events are tied to functions, functions can be moved and
> > renamed, the actual event more often stays the same.
>
> you are showing a clear misunderstanding of how tracing is typically
> done.

Not really; you're missing the point I'm trying to make: we want to trace
_events_, not functions. Function-specific tracing would still require a
kernel-specific mapping to map function names to events.

> Both for LTT and for blktrace (and for the tracers i've done
> myself), roughly half (50%) of the tracepoints are right at the top of
> the function and trace the function arguments. Let me quote an example
> straight from LTT:
>
> int sock_sendmsg(struct socket *sock, struct msghdr *msg, size_t size)
> {
> struct kiocb iocb;
> struct sock_iocb siocb;
> int ret;
>
> trace_socket_sendmsg(sock, sock->sk->sk_family,
> sock->sk->sk_type,
> sock->sk->sk_protocol,
> size);
>
> this tracepoint, under a dynamic tracing concept, can be replaced with:
>
> int __trace sock_sendmsg(struct socket *sock, struct msghdr *msg, size_t size)
> {
> struct kiocb iocb;
> struct sock_iocb siocb;
> int ret;
>
> note the "__trace" attribute to the function. (see my previous mails
> where i talked about __trace for more details) SystemTap can hook to
> that point and can access the very same parameters that the markup does,
> in a lot less invasive way.
>
> So a 5-line markup can be replaced with a single function attribute.

A nice example where you make life more difficult for static tracers for
no reason, whereas a "trace_socket_sendmsg(sock, size);" is just as
usable. It would also add virtually no maintenance overhead as you like
to claim - how often does this function change?

> > Function attributes also don't provide information local to the
> > function.
>
> of course, but where does the above tracepoint i quoted use information
> local to the function? A fair number of markups use global functions
> because, surprise, a lot of interesting activity happens along global
> functions. So a healthy reduction in markups can be achieved.

But not completely, which is the whole point.

> > It's possible I missed something, but pretty much anything you
> > outlined wouldn't make the life of static tracepoints any easier.
>
> sorry, but if you re-read the above line of argument, your sentence
> > appears to be a non sequitur. I said "the markers needed for dynamic tracing are
> different from the LTT static tracepoints". You asked why they are so
> different, and i replied that i already outlined what the right API
> would be in my opinion to do markups, but that API is different from
> what LTT is offering now. To which you are now replying: "pretty much
> anything you outlined wouldn't make the life of static tracepoints any
> easier." Huh?

Yeah, huh?
I have no idea what you're trying to tell me. As you demonstrated above,
your "right API" is barely usable for static tracers.

> > So if dynamic tracing is available use it, as distributions already
> > do. OTOH the barrier to use static tracing is drastically different
> > whether the user has to deal with external patches or whether it's a
> > simple kernel option. Again, static tracing doesn't exclude the
> > possibility of dynamic tracing, that's something you constantly omit
> > and thus make it sound like both options were mutually exclusive.
>
> how does this reply to my point that: "a marker for dynamic tracing has
> lower performance impact than a static tracepoint, on systems that are
> not being traced", which point you claimed moot?

Because it's pretty much an implementation issue. The point is about
adding markers at all; it's about the choice of being able to use static
tracers in the first place. Both undeniably have their advantages and
disadvantages, but you prefer to emphasize only the strong points of
dynamic tracing and constantly declare its problems to be non-issues.

> > > Secondly, even people who intend to _eventually_ make use of
> > > tracing, dont use it most of the time. So why should they have more
> > > overhead when they are not tracing? Again: the point is not moot
> > > because even though the user intends to use tracing, but does not
> > > always want to trace.
> >
> > I've used kernels which included static tracing and the performance
> > overhead is negligible for occasional use.
>
> how does this suddenly make my point, that "a marker for dynamic tracing
> has lower performance impact than a static tracepoint, on systems that
> are not being traced", "moot"?

Why exactly is the point relevant in the first place? How exactly is the added
(minor!) overhead such a fundamental problem?

> > Why don't you leave the choice to the users? Why do you constantly
> > make it an exclusive choice? [...]
>
> as i outlined it tons of times before: once we add markups for static
> tracers, we cannot remove them. That is a constant kernel maintenance
> drag that i feel uncomfortable about.

As many, many people have already said, any tracepoint has a
maintenance overhead, which is barely different between dynamic and
static tracing and only increases the further away the tracepoints are
from the source.

bye, Roman

2006-09-16 00:43:20

by Nicholas Miell

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

On Sat, 2006-09-16 at 01:57 +0200, Ingo Molnar wrote:
> * Nicholas Miell <[email protected]> wrote:
>
> > You're going to want to be able to trace every function in the kernel,
> > which means they'd all need a __trace -- and in that case, a
> > -fpad-functions-for-tracing gcc option would make more sense than
> > per-function attributes.
>
> the __trace attribute would be a _specific_ replacement for a _specific_
> static markup at the entry of a function. So no, we would not want to
> add __trace to _every_ function in the kernel: only those which get
> commonly traced. And note that SystemTap can trace the rest too, just
> with slightly higher overhead.
>
> In that sense __trace is not an enabling infrastructure, it's a
> performance tuning infrastructure.
>
> > The option could also insert NOPs before RETs, not just before the
> > prologue so that function returns are equally easy to trace. (It might
> > also inhibit tail calls, assuming being able to trace all function
> > returns is more important than that optimization.)
>
> yeah. __trace_entry and __trace_exit [or both] attributes. Makes sense.
>
> > And SystemTap can already hook into sock_sendmsg() (or any other
> > function) and examine its arguments -- all of this GCC extension talk
> > is just performance enhancement.
>
> yes, yes, yes, exactly!!! Finally someone reads my mails and understands
> my points. There's hope! ;)

I'm not sure that I do, actually.

You seem to be opposed to all static probe markers in general, but I
think that they'd be useful for big abstract things like "new thread
created" (which would encompass fork/vfork/clone and probably consist of
a single marker in do_fork) or for similar things that happen all over
the kernel (for example, I imagine that all filesystems would want to
use the same set of probe names just to make I/O tracing easier for
userspace).




--
Nicholas Miell <[email protected]>

2006-09-16 02:03:29

by Karim Yaghmour

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


Ingo Molnar wrote:
> ok, then i'd like to dispute your point. Contrary to your statement
> there is a very fundamental difference between "static tracing" (static
> call, which relies on compile-time insertion of trace points) and
> "dynamic tracing" (which can insert trace points almost anywhere) -
> _even if both use in-source markers_.

Good, a nice little down-to-earth debate for a change ;)

> The fundamental difference is this: dynamic tracing has full access to
> the full environment of the code that it taps into _at the time of
> tracepoint activation_, while static tracing has to get all its context
> during compilation.

I disagree.

> To make my point easier to understand, consider the following example:
> we want to tap into the middle of a global_function():
>
> int global_function(int arg1, int arg2, int arg3)
> {
> ... [lots of code] ...
>
> x = func2();
>
> ... [lots of code] ...
> }
>
> We want to trace the function right after 'x' has been assigned, and we
> want to trace an event_A, with parameters: arg1, arg2, arg3 and x. This
> is a pretty common scenario. Ok so far?

Ok so far.

> here is what the markup looks like under static tracing:
>
> int global_function(int arg1, int arg2, int arg3)
> {
> ... [lots of code] ...
>
> x = func2();
> D(event_A, arg1, arg2, arg3, x);
>
> ... [lots of code] ...
> }
>
> that's what you'd expect, right? This is pretty common too, up to this
> point.

No, that's not what I'd necessarily expect, though it could be and
definitely does match current standard practice. There's no reason,
though, why D(foo) couldn't be calling a statically-linked function which
has a pluggable interface (a module-overloadable symbol if you'd like)
and which can then do much more than initially fetching arg1-2-3 using,
as you alluded to earlier, built-in disassemblers and the like.

One nice thing about the above, though, is that you can easily have
type information at build time and can actually create customized
logging info right there. But this is just brain farting, more
substance below.

> now here is how the markup could look for a dynamic tracepoint:
>
> int global_function(int arg1, int arg2, int arg3)
> {
> ... [lots of code] ...
>
> x = func2();
> D(event_A, x);
>
> ... [lots of code] ...
> }
>
> Note: there's no (arg1, arg2, arg3) passed to the markup! Why? Because
> SystemTap has full access to the function's arguments and in this
> particular case it's simply not necessary to reference them explicitly.
> So the markup has less of an overhead because it does not 'touch' arg1,
> arg2, arg3 if the tracepoint is not active [which is the common case we
> optimize for].

Again, this does not have to be the case. D(arg1, ..., N) could actually
be defined to nothing in *ALL* cases in a header. Nothing precludes
having a special parser that only runs if tracing is enabled and then
generates a special header and corresponding C file which then have
what it takes to make these D() markups meaningful. So in this case,
the compiler never gives a damn about arg1-Z (i.e. no touch or
dependency or anything of the sort), yet a compile-time option allows
you to suddenly make D(foo) turn into a SystemTap-usable probe point
or a direct call to a statically-linked function (which is what I refer
to as "static tracing".)

> Furthermore, the markup is also visually less intrusive.

That's debatable. If you're going to mark something up, you might as
well state right away what's typically interesting about the event.
Sure, you could make a point that arg32 is something you may be
interested in for some cases, but if arg1-3 are the ones most relevant
99% of the time for this function, then you might as well say that
in your trace marker.

> But better than that, the markup could look like this as well:
>
> int global_function(int arg1, int arg2, int arg3)
> {
> ... [lots of code] ...
>
> x = func2();
>
> ... [lots of code] ...
> }
>
> right, no markup at all, but in a script somewhere we'd have:
>
> insert.trace(global_function: "x = func2();", after);

That's two files. If we're talking funky, and the following is
by no means an endorsement I'm making -- just showing you what
could be possible, then here's a better one:

Look ma, no hands:

int global_function(int arg1, int arg2, int arg3)
{
... [lots of code] ...

x = func2(); /*T* @here:arg1,arg2,arg3 */

... [lots of code] ...
}

Now you can't say that's visually wrong: we've already got tons
of outdated comments in the code. And you can't say there's
entirely no precedent: kerneldoc. Yet, this can be used by a
build-time tool which automagically generates either information
for later use by probe inserters or, alternatively, substitutes
the default built file (say foo.c) with an equivalent (foo-trace.c)
which has inlined static tracing.

Karim
--
President / Opersys Inc.
Embedded Linux Training and Expertise
http://www.opersys.com / 1.866.677.4546

2006-09-16 02:30:38

by Karim Yaghmour

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


Ingo Molnar wrote:
> nor do i reject all of LTT: as i said before i like the tools, and i
> think its collection of trace events should be turned into systemtap
> markups and scripts. Furthermore, its ringbuffer implementation looks
> better. So as far as the user is concerned, LTT could (and should) live
> on with full capabilities, but with this crucial difference in how it
> interfaces to the kernel source code.

The interface to the kernel source code can be worked on. I hope my
other email has demonstrated that.

> i.e. could you try to just give SystemTap a chance and attempt to
> integrate a portion of LTT with it ... that shares more of the
> infrastructure and we'd obviously only need "one" markup variant, and
> would have full markup (removal-) flexibility. I'll try to help djprobes
> as much as possible. Hm?

Preface: I have absolutely nothing against SystemTap. I did have a
bone to pick with the way it was developed (behind closed doors, practically),
but I told the SystemTap people about this and end of story, we
moved on and I've had many enjoyable discussions with the SystemTap
team since. I just have a feeling that part of the team is proceeding
as if ltt was dead and buried. They'd like to interface with us --
at least I think -- but nobody dares to touch ltt with a 10-foot
pole because it's a political hot potato, i.e. for all they care, ltt
could be a liability for SystemTap because of all the fuss about it
amongst kernel developers. But that's my take, I could be entirely
wrong.

Now, on a technical level, SystemTap cannot currently be a substitute
for what the ltt patch provides, especially in terms of performance.
Maybe one day it will be a substitute, with djprobe and other stuff,
but it isn't *now*. Nevertheless, I'm all for encouraging a movement
in a common direction. And in that regard I think that there is
consensus, both amongst the SystemTap team and within the ltt team
-- at least I think so -- for having a common markers interface. This is
something we can definitely build on. Hopefully dispelling some of
the ltt fud and gathering some positive mantra for the ltt effort
on lkml can help ease people's fears about the possibility of
rubbing the kernel developers the wrong way.

Karim
--
President / Opersys Inc.
Embedded Linux Training and Expertise
http://www.opersys.com / 1.866.677.4546

2006-09-16 08:29:57

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Roman Zippel <[email protected]> wrote:

> > > > > This is simply not true, at the source level you can remove a
> > > > > static tracepoint as easily as a dynamic tracepoint, the
> > > > > effect of the missing trace information is the same either way.
> > > >
> > > > this is not true. I gave you one example already a few mails ago
> > > > [...]
> > >
> > > Function attributes also don't provide information local to the
> > > function.
> >
> > of course, but where does the above tracepoint i quoted use
> > information local to the function? A fair number of markups use
> > global functions because, surprise, a lot of interesting activity
> > happens along global functions. So a healthy reduction in markups
> > can be achieved.
>
> But not completely, which is the whole point.

the point was what you said above, which i claimed and still claim to be
false: "at the source level you can remove a static tracepoint as easily
as a dynamic tracepoint, the effect of the missing trace information is
the same either way."

Your point is still incorrect. I gave you an example of how half of the
tracepoints could be removed under a dynamic scheme - while they couldn't
be removed under a static scheme. Hence that directly contradicts your
contention that "you can remove a static tracepoint as easily as a
dynamic tracepoint". Nothing more, nothing less. I just pointed out the
point in your thinking that i believe to be incorrect.

Reality is that you can remove a dynamic tracepoint much more easily, due to
the fundamental flexibility of dynamic tracers. While with static
tracers, every tracepoint has to be _somewhere_ in the source code,
otherwise people like you will complain just like you did in this mail:
"you make life more difficult for static tracers for no reason".

You can concede my point or you can dispute that argument - but what you
did above was neither: you snipped all the quotations and you claimed a
totally new point. (which new point i never argued with: _of course_ i
never claimed that __trace function attributes can remove _all_ markups.
They can "only" remove half of them.)

Ingo

2006-09-16 08:29:23

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Roman Zippel <[email protected]> wrote:

> > this tracepoint, under a dynamic tracing concept, can be replaced with:
> >
> > int __trace sock_sendmsg(struct socket *sock, struct msghdr *msg, size_t size)
>
> A nice example where you make life more difficult for static tracers
> for no reason, [...]

No, it's simply a clever feature: "halve the impact of static markups".

What you describe is _precisely_ the kind of situation that makes me
very wary of static tracers. Someone does something smart that enables
us to remove half of the tracepoints from the kernel source code, while
you will go on and complain: "why do you make the life harder for static
tracers". You, perhaps unwittingly, are giving the perfect demonstration
of why static tracepoints are a maintenance problem: once added _they
cannot be removed without breaking static tracers_.

And i see you didn't reply to (and you didn't even quote) the paragraph
that i believe answers your point:

> > the user of course does not care about kernel internal design and
> > maintenance issues. Think about the many reasons why STREAMS was
> > rejected - users wanted that too. And note that users don't want
> > "static tracers" or any design detail of LTT in particular: what
> > they want is the _functionality_ of LTT.

The kernel tree is not there to make it easier for inferior approaches.
How hard is it for the static tracer folks to take a look at dynamic
tracers and realize that it's the fundamentally better approach, for the
reasons above and for other reasons, and pick the concept up and
integrate it with their code? Just like the STREAMS folks had the chance
to look at the existing TCP/IP implementation in the Linux kernel and
to realize that it was the better approach. Yet they
insisted on just adding a few hooks here and there, to "make the life
easier for STREAMS".

Ingo

2006-09-16 08:29:24

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Roman Zippel <[email protected]> wrote:

> [...] It would also add virtually no maintenance overhead as you like
> to claim - how often does this function change?

as i said, roughly half of the tracepoints are like this - and some of
them in functions in frequented places. That's far from "virtually no
maintenance overhead". In the -rt tree i have never had more than a dozen
static tracepoints, yet even this small amount caused at least 5 extra
-rt tree iterations due to various breakages (build problems or even
crashes). Cruft comes in small steps, and my worry is that such
_unremovable_ markups will be cruft that never shrinks. With dynamic
tracers i see the _chance_ for cruft to shift to places where it does
not hurt, if that cruft turns out to become a hindrance.

Ingo

2006-09-16 08:30:49

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Roman Zippel <[email protected]> wrote:

> > > It's possible I missed something, but pretty much anything you
> > > outlined wouldn't make the life of static tracepoints any easier.
> >
> > sorry, but if you re-read the above line of argument, your sentence
> > appears to be a non sequitur. I said "the markers needed for dynamic tracing are
> > different from the LTT static tracepoints". You asked why they are so
> > different, and i replied that i already outlined what the right API
> > would be in my opinion to do markups, but that API is different from
> > what LTT is offering now. To which you are now replying: "pretty much
> > anything you outlined wouldn't make the life of static tracepoints any
> > easier." Huh?
>
> Yeah, huh?
>
> I have no idea what you're trying to tell me. As you demonstrated
> above, your "right API" is barely usable for static tracers.

you raise a new point again (without conceding or disputing the point we
were discussing, which point you snipped from your reply) but i'm happy
to reply to this new point too: my suggested API is not "barely usable"
for static tracers but "totally unusable". Did i tell you yet that i
disagree with the addition of markups for static tracers?

Ingo

2006-09-16 08:31:16

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Roman Zippel <[email protected]> wrote:

> > > > > > - a marker for dynamic tracing has lower performance impact
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > > > > than a static tracepoint, on systems that are not being
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > > > > traced. (but which have the tracing infrastructure enabled
^^^^^^
> > > > > > otherwise)
> > > > >
> > > > > Anyone using static tracing intends to use it, which makes this point
> > > > > moot.
> > > >
> > > > that's not at all true, on multiple grounds:
> > > >
> > > > Firstly, many people use distro kernels. A Linux distribution
> > > > typically wants to offer as few kernel rpms as possible (one per
> > > > arch to be precise), but it also wants to offer as many features
> > > > as possible. So if there was a static tracer in there, a distro
> > > > would enable it - but 99.9% of the users would never use it - still
> > > > they would see the overhead. Hence the user would have it enabled,
> > > > but does not intend to use it - which contradicts your statement.
> > >
> > > So if dynamic tracing is available use it, as distributions
> > > already do. OTOH the barrier to use static tracing is drastically
> > > different whether the user has to deal with external patches or
> > > whether it's a simple kernel option. Again, static tracing doesn't
> > > exclude the possibility of dynamic tracing, that's something you
> > > constantly omit and thus make it sound like both options were
> > mutually exclusive.
> >
> > how does this reply to my point that: "a marker for dynamic tracing has
> > lower performance impact than a static tracepoint, on systems that are
> > not being traced", which point you claimed moot?
>
> Because it's pretty much an implementation issue. [...]

No, that's my point: it's not an "implementational issue" of static
tracers - the overhead of markups for static tracers is:

_inherent to their concept of being compile-time and static_

ok?

> [...] The point is about adding markers at all; it's about the choice
> of being able to use static tracers in the first place. [...]

your characterization of "the point" is at odds with the specific point
that we are discussing - see the underlined sentence above, right at the
top of the quotes:

> > > > > > - a marker for dynamic tracing has lower performance impact
> > > > > > than a static tracepoint, on systems that are not being
> > > > > > traced. (but which have the tracing infrastructure enabled

Please either concede the point or dispute it, before shifting to new
grounds. Thanks,

Ingo

2006-09-16 08:32:26

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Roman Zippel <[email protected]> wrote:

> > > Why don't you leave the choice to the users? Why do you constantly
> > > make it an exclusive choice? [...]
> >
> > as i outlined it tons of times before: once we add markups for static
> > tracers, we cannot remove them. That is a constant kernel maintenance
> > drag that i feel uncomfortable about.
>
> As many, many people have already said, any tracepoint has a
> maintenance overhead, which is barely different between dynamic and
> static tracing and only increases the further away the tracepoints are
> from the source.

i have demonstrated that with dynamic tracers it's possible to have:
"half the number of tracepoints" or "no tracepoints at all", right in
the traced kernel source. That way we are able to shift the
maintenance overhead away from the subsystem which is being traced and
onto the person who _wants_ to do the tracing (instead of the person who
maintains the code that is being traced), in a fine-grained way.

But even the secondary metric, the "sum of all maintenance, including
the maintenance of tracepoints", can become lower with dynamic tracers:
if a subsystem changes with a much higher frequency than the tracing
scripts follow.

Let me try to explain it to you in other words: if all tracing is done
via scripts and no in-source tracepoints at all, then we could for
example update the tracing scripts only once per release. A subsystem
might undergo a heavy cycle of updates, changing functions that are
traced many times: i call this a "high frequency update to the source
code".

If tracing is done via tracepoints for static tracers, then such "high
frequency updates to the source code" have to "carry with them" all the
markups. It might be zero overhead if a subsystem has no tracepoints,
but it might be a lot more complex too.

For example, I can tell you that the -rt tree has a number of very
useful scheduling tracepoints, which are however also a constant maintenance
hindrance. For example i even have a separate _function_ that is a
helper to one of the tracepoints. And this was the _bare minimum_ of
static tracepoints i needed for the purposes of visualizing and
analyzing scheduling patterns in the -rt tree (either on my boxes or on
users' boxes). Occasionally users needed a lot more tracepoints. So i am
talking from first-hand experience. This maintenance overhead occurred
(and still occurs) to /me/, so please don't try to tell me that the
maintenance overhead is minimal. Even "half the tracepoints" would be
great. And i only have a dozen tracepoints, not hundreds!

Ingo

2006-09-16 08:32:05

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Roman Zippel <[email protected]> wrote:

> > > > Secondly, even people who intend to _eventually_ make use of
> > > > tracing, dont use it most of the time. So why should they have
> > > > more overhead when they are not tracing? Again: the point is not
> > > > moot because even though the user intends to use tracing, but
> > > > does not always want to trace.
> > >
> > > I've used kernels which included static tracing and the performance
> > > overhead is negligible for occasional use.
> >
> > how does this suddenly make my point, that "a marker for dynamic
> > tracing has lower performance impact than a static tracepoint, on
> > systems that are not being traced", "moot"?
>
> Why exactly is the point relevant in the first place? How exactly is the
> added (minor!) overhead such a fundamental problem?

how could a fundamental performance difference between two markup
schemes not be relevant to kernel design decisions? That performance
difference, i claim, derives straight from the conceptual difference
between the two approaches and is thus "unfixable" (and not an
"implementational issue").

Ingo

2006-09-16 09:58:58

by Jes Sorensen

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Mathieu Desnoyers wrote:
> Please Ingo, stop repeating false arguments without taking into account
> people's corrections:
>
> * Ingo Molnar ([email protected]) wrote:
>> sorry, but i disagree. There _is_ a solution that is superior in every
>> aspect: kprobes + SystemTap. (or any other equivalent dynamic tracer)
>>
> I am sorry to have to repeat myself, but this is not true for heavy loads.

Alan pointed out earlier in the thread that the actual kprobe is noise
in this context, and I have seen similar issues on real workloads. Yes
kprobes are probably a little higher overhead in real life, but you have
to weigh that up against the rest of the system load.

If you want to prove people wrong, I suggest you do some real life
implementation and measure some real workloads with a predefined set of
tracepoints implemented using kprobes and LTT and show us that the
benchmark of the user application suffers in a way that can actually be
measured. Arguing that a syscall takes an extra 50 instructions
because it's traced using kprobes rather than LTT doesn't mean it
actually has any real impact.

"The 'kprobes' are too high overhead that makes them unusable" is one of
these classic myths that the static tracepoint advocates so far have
only been backing up with rhetoric. Give us some hard evidence or stop
repeating this argument please. Just because something is repeated
constantly doesn't transform it into truth.

Jes

2006-09-16 10:18:45

by Jes Sorensen

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Andrew Morton wrote:
> On Fri, 15 Sep 2006 20:19:07 +0200
> Ingo Molnar <[email protected]> wrote:
>
>> * Andrew Morton <[email protected]> wrote:
>>
>>> What Karim is sharing with us here (yet again) is the real in-field
>>> experience of real users (ie: not kernel developers).
>> well, Jes has that experience and Thomas too.
>
> systemtap and ltt are the only full-scale tracing tools which target
> sysadmins and application developers of which I am aware.

Just to clarify, the stuff I have looked at in the field was based on
LTT, but not part of the official LTT. It simply goes to show that end
users cannot agree on a small set of fixed tracepoints because someone
always wants a slightly different view of things, like in the cases I
looked at. Not to mention that the changes LTT users make, at times, to
shoehorn their stuff in, especially in sensitive codepaths such as the
syscall path, have side effects which clearly weren't considered.

In one case I ended up doing an alternative implementation using kprobes
to prove that similar results could be achieved in that manner.
Strangely enough I was right :)

I don't have any objections to markers as Ingo suggested. I just don't
buy the repeated argument that LTT has been around for years and barely
changed. It's simply a case of the LTT team not being aware (or deciding
to ignore, I cannot say which) of what users have actually done with the
LTT codebase, but it seems obvious they are not aware of what everyone is
doing with it. But we have seen before that if an infrastructure like LTT
goes into the kernel, many more users will pop up and want to have their
stuff added.

The other part is the constantly repeated performance claim, which to
this point hasn't been backed up by any hard evidence. If we are to take
that argument serious, then I strongly encourage the LTT community to
present some real numbers, but until then it can be classified as
nothing but FUD.

I shall be the first to point out that kprobes are less than ideal,
especially since the current ia64 implementation suffers from some tricky
limitations, but that's an implementation issue.

Cheers,
Jes

2006-09-16 10:40:42

by Jes Sorensen

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Karim Yaghmour wrote:
> Jes Sorensen wrote:
>> There very few tracepoints in this category,
>
> Wow, that's progress.

Karim,

A personal question: do you feel that being patronising and insulting
is in any way going to put your LTT project in a better light? It
certainly makes it a lot harder for many of us to take your arguments
seriously.

>> the only things you can claim are more or less generic are syscalls,
>> and tracing syscall handling is tricky.
>
> If there are implementation issues, I trust an adequate solution can be
> found by using the tested-and-proven method of posting stuff on the
> lkml for review.

And how is this going to solve the case where trace code in the syscall
path has a negative impact on cacheline utilization and alignment, even
when the trace data is not being used?

>> This is grossly oversimplifying things and why the whole thing doesn't
>> hold water. There is no such thing as 'the place' to put a specific
>> tracepoint.
[snip]
> I do not underestimate the difficulty of selecting such tracepoints.
> This is why I chose not to maintain other people's specific tracepoints.
> I realize this is a tough problem, but I also trust subsystem maintainers
> are smart enough to make the appropriate decision.

So you are back to saying that trace data other people wish to collect
is uninteresting and therefore should just be ignored? If not, what you
are saying there otherwise just backs up the argument that if LTT or
something similar goes into mainline, we will see the number of
tracepoints grow significantly.

>> You seem to think that it's fine to add instrumentation in the syscall
>> path as an example as long as it's compiled out. Well on some
>> architectures, the syscall path is very sensitive to alignment and there
>> may be restrictions on how large the stub of code is allowed to be, like
>> a few hundred bytes. Just because things work one way on x86, doesn't
>> mean they work like that everywhere.
>
> If ltt failed to implement such things appropriately, then we apologize.
> That fact doesn't preclude proper implementation in the future, however.

Please read what I wrote above! Touching the syscall path with static
tracepoints is costly and has side effects! The argument that things can
be compiled out is just pointless: end users do not recompile kernels at
random, and many of the 'end user' cases where people wish to visualize
trace data are running on precompiled vendor kernels. Recompiling the
kernel and rebooting is not an option here!

In fact, the users who wish to trace data in self-compiled kernels are a
tiny subset of the potential userbase for this stuff which is primarily
useful to developers .... which in turn makes your argument about debug
tracepoints irrelevant since you are turning all the tracepoints into
debug tracepoints :)

Jes

2006-09-16 10:45:45

by Jes Sorensen

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Chuck Ebbert wrote:
> In-Reply-To: <[email protected]>
>
> On Fri, 15 Sep 2006 15:37:51 +0100, Alan Cox wrote:
>
>>> $ grep KPROBES arch/*/Kconf*
>>> arch/i386/Kconfig:config KPROBES
>>> arch/ia64/Kconfig:config KPROBES
>>> arch/powerpc/Kconfig:config KPROBES
>>> arch/sparc64/Kconfig:config KPROBES
>>> arch/x86_64/Kconfig:config KPROBES
>> Send patches. The fact nobody has them implemented on your platform
>> isn't a reason to implement something else, quite the reverse in fact.
>
> Yes, but the point is: until that's done you can't claim kprobes is a
> valid tracing tool for everyone.

The fact that the remaining architectures haven't bothered implementing
kprobes support should not be used as an argument for pushing something
inferior out of laziness.

It's the same with syscalls: the kernel infrastructure is there, but if
you don't bother updating the syscall tables and wiring it up in glibc,
then the call isn't available on your architecture.

The core kprobe infrastructure is available to all architectures, it's
up to the developers of the remaining architectures to implement the
remaining bits.

Jes

2006-09-16 15:07:55

by Karim Yaghmour

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


Jes Sorensen wrote:
> A personal question: do you feel that being patronising and insulting
> is in any way going to put your LTT project in a better light? It
> certainly makes it a lot harder for many of us to take your arguments
> seriously.

ltt isn't *mine* anymore, somebody else is maintaining it at this point,
and it remains to be seen whether any of my input in this thread is:
a) appreciated by them, b) agreed by them.

With regard to the tone of the thread, please at least look at other
people's approach to me, including your own. I think the casual observer
will see that there was a great deal of animosity aimed at me personally.
I'll admit to being sarcastic and biting back. But that's hardly alien
to lkml.

> And how is this going to solve the case where trace code in the syscall
> path has a negative impact on cacheline utilization and alignment, even
> when the trace data is not being used?

Hmm... and then compare that to the negative impact of kprobes at runtime.
Of course if we could override the syscall table your point disappears.
That's not how ltt does it now, but it could easily be done otherwise.
All syscall implementations I've looked at so far in Linux involve
a table. If the base of this table were a dynamically modifiable entry,
then the problem would be solved. Wouldn't it?

> So you are back to saying that trace data other people wish to collect
> is uninteresting and therefore should just be ignored? If not, what you
> are saying there otherwise just backs up the argument that if LTT or
> something similar goes into mainline, we will see the number of
> tracepoints grow significantly.

I've explained earlier the difference in between these things.

> Please read what I wrote above! Touching the syscall path with static
> tracepoints is costly and has side effects! The argument that things can
> be compiled out is just pointless, end users do not recompile kernels at
> random and many of the 'end user' cases where people wish to visualize
> trace data, are running on precompiled vendor kernels. Recompiling the
> kernel and rebooting is not an option here!

It is for some. And please stop repeating the syscall path stuff. It can
be solved elegantly. The fact that it hasn't been up to this point is only an
excuse to keep working harder on it. There is, in fact, no reason that
the solution may not just be a combination of static markup and dynamic
modification.

> In fact, the users who wish to trace data in self-compiled kernels are a
> tiny subset of the potential userbase for this stuff which is primarily
> useful to developers .... which in turn makes your argument about debug
> tracepoints irrelevant since you are turning all the tracepoints into
> debug tracepoints :)

How many embedded Linux projects did you personally work on?

Karim
--
President / Opersys Inc.
Embedded Linux Training and Expertise
http://www.opersys.com / 1.866.677.4546

2006-09-16 15:44:56

by Karim Yaghmour

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


Jes Sorensen wrote:
> Just to clarify, the stuff I have looked at in the field was based on
> LTT, but not part of the official LTT. It simply goes to show that end
> users cannot agree on a small set of fixed tracepoints because someone
> always wants a slightly different view of things, like in the cases I
> looked at. Not to mention that the changes LTT users make, at times, to
> shoehorn their stuff in, especially in sensitive codepaths such as the
> syscall path, have side effects which clearly weren't considered.

Good. So give me concrete examples of those cases that you saw and tell
me exactly what those people you were working with were attempting to
achieve.

> I don't have any objections to markers as Ingo suggested. I just don't
> buy the repeated argument that LTT has been around for years and barely
> changed. It's simply a case of the LTT team not being aware (or deciding
> to ignore, I cannot say which) what users have actually done with the
> LTT codebase, but it seems obvious they are not aware what everyone is
> doing with it. But we have seen before how if an infrastructure like LTT
> goes into the kernel, many more users will pop up and want to have their
> stuff added.

Either ltt had a userbase or it didn't. To say that all its users went
out and added their own tracepoints is to not know enough about the project
and so too is it to say that none of its users could actually just use
it out of the box without modifying it. Now, as an outsider, trying to
measure how many users were using it without modifying it is like
trying to figure out how many Linux users there are out there. There's
a silent majority and there are those who need customization. Guess
who you've been talking to?

Strange, come to think of it I don't remember *ever* getting an
email from you while I was the maintainer, or seeing *any* emails by you
on the ltt lists -- that's indicative of a mindset, namely that you
personally assumed you knew all about tracing and didn't need us to make
suggestions to help you AND that you personally never found it relevant
to contribute back. That's like me going off forking the kernel, adding
features to it and then calling the kernel developers incompetent when
they come around saying that what I'm doing is wrong. Who's patronizing
who here?

And I submit to you an idea which I submitted to Ingo yesterday and have
not yet received feedback on. Here's static markup as it could be
implemented:

The plain function:
int global_function(int arg1, int arg2, int arg3)
{
... [lots of code] ...

x = func2();

... [lots of code] ...
}

The function with static markup:
int global_function(int arg1, int arg2, int arg3)
{
... [lots of code] ...

x = func2(); /*T* @here:arg1,arg2,arg3 */

... [lots of code] ...
}

The semantics are primitive at this stage, and they could definitely
benefit from lkml input, but essentially we have a build-time parser
that goes around the code and automagically does one of two things:
a) create information for binary editors to use
b) generate an alternative C file (foo-trace.c) with inlined static
function calls.

And there might be other possibilities I haven't thought of.
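
To make option (b) concrete, a minimal sketch of what the generated
foo-trace.c counterpart of the annotated line could look like (the
helper name is hypothetical -- no such generator exists yet):

    x = func2();
    /* emitted by the build-time parser from the @here annotation */
    trace_global_function_here(arg1, arg2, arg3);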

This beats every argument I've seen to date on static instrumentation.
Namely:
- It isn't visually offensive: it's a comment.
- It's not a maintenance drag: outdated comments are not alien.
- It doesn't use weird function names or caps: it's a comment.
- There is precedent: kerneldoc.
And it does preserve most of the key things those who've asked for
static markup are looking for. Namely:
- Static instrumentation
- Mainline maintainability
- Contextualized variables

When I was still part of the ltt development process we had accumulated
a huge number of ideas for how we could optimize and fix stuff here and
there. We were never actually able to reduce these to practice
because folks like you never bothered interfacing with us and the
attitude on the lkml was exactly as I described. We spent our time
chasing kernels.

> The other part is the constantly repeated performance claim, which to
> this point hasn't been backed up by any hard evidence. If we are to take
> that argument seriously, then I strongly encourage the LTT community to
> present some real numbers, but until then it can be classified as
> nothing but FUD.

Hmm... beats me why even the systemtap folks would themselves admit
to performance limitations.

> I shall be the first to point out that kprobes are less than ideal,
> especially since the current ia64 implementation suffers from some tricky
> limitations, but that's an implementation issue.

Ah, so it's ok for kprobes to have implementation issues, but not ltt.
Somehow there's this magic thought recurring throughout this thread
that the limitations of dynamic instrumentation are trivial to fix,
but those of static instrumentation are unrecoverable. *That* is a
fallacy if I ever saw one. I'm willing to admit that a combination of
dynamic editing and static instrumentation is a good balance, but Jes
please drop this discourse, it's not constructive.

Karim
--
President / Opersys Inc.
Embedded Linux Training and Expertise
http://www.opersys.com / 1.866.677.4546

2006-09-16 17:24:23

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

* Jes Sorensen ([email protected]) wrote:
> Mathieu Desnoyers wrote:
> > Please Ingo, stop repeating false arguments without taking into account
> > people's corrections:
> >
> >* Ingo Molnar ([email protected]) wrote:
> >>sorry, but i disagree. There _is_ a solution that is superior in every
> >>aspect: kprobes + SystemTap. (or any other equivalent dynamic tracer)
> >>
> >I am sorry to have to repeat myself, but this is not true for heavy loads.
>
> Alan pointed out earlier in the thread that the actual kprobe is noise
> in this context, and I have seen similar issues on real workloads. Yes
> kprobes are probably a little higher overhead in real life, but you have
> to weigh that up against the rest of the system load.
>
> If you want to prove people wrong, I suggest you do some real life
> implementation and measure some real workloads with a predefined set of
> tracepoints implemented using kprobes and LTT and show us that the
> benchmark of the user application suffers in a way that can actually be
> measured. Arguing that a syscall takes an extra 50 instructions
> because it's traced using kprobes rather than LTT doesn't mean it
> actually has any real impact.
>
> "The 'kprobes' are too high overhead that makes them unusable" is one of
> these classic myths that the static tracepoint advocates so far have
> only been backing up with rhetoric. Give us some hard evidence or stop
> repeating this argument please. Just because something is repeated
> constantly doesn't transform it into truth.
>

Hi,

Here we go. I made a test that we can consider a lower bound for the kprobes
impact. Two measurements per run.

Simulation of high speed network traffic :

time ping -f localhost

First run : without any tracing activated, LTTng probes compiled in :

39457 packets received in 2.021 seconds : 19523.50 packets/s
142672 packets received in 7.237 seconds : 19714.24 packets/s

Second run : LTTng tracing activated (traces system calls, interrupts and
packet in/out...) :

93051 packets received in 7.395 seconds : 12582.96 packets/s
121585 packets received in 9.703 seconds : 12530.66 packets/s


Third run : same LTTng instrumentation, with a kprobe handler triggered by each
event traced.

56643 packets received in 11.152 seconds : 5079.17 packets/s
50150 packets received in 9.593 seconds : 5227.77 packets/s


The bottom line is :

LTTng impact on the studied phenomenon : 35% slower

LTTng+kprobes impact on the studied phenomenon : 73% slower

Therefore, I conclude that on this type of high event rate workload, kprobes
doubles the tracer impact on the system.
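
(For reference, averaging the two measurements of each run gives roughly
19619 packets/s without tracing, 12557 packets/s with LTTng tracing, and
5153 packets/s with LTTng plus the kprobe handler, i.e. slowdowns of about
1 - 12557/19619 ~ 36% and 1 - 5153/19619 ~ 74%, in line with the 35% and
73% figures above.)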

Mathieu


OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2006-09-16 17:35:01

by Karim Yaghmour

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


Mathieu Desnoyers wrote:
> The bottom line is :
>
> LTTng impact on the studied phenomenon : 35% slower
>
> LTTng+kprobes impact on the studied phenomenon : 73% slower
>
> Therefore, I conclude that on this type of high event rate workload, kprobes
> doubles the tracer impact on the system.

Amen to that. Hopefully this puts to rest the myth of Mr. Scrub.

Karim
--
President / Opersys Inc.
Embedded Linux Training and Expertise
http://www.opersys.com / 1.866.677.4546

2006-09-16 17:35:51

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

* Jes Sorensen ([email protected]) wrote:

> If you want to prove people wrong, I suggest you do some real life
> implementation and measure some real workloads with a predefined set of
> tracepoints implemented using kprobes and LTT and show us that the
> benchmark of the user application suffers in a way that can actually be
> measured. Argueing that a syscall takes an extra 50 instructions
> because it's traced using kprobes rather than LTT doesn't mean it
> actually has any real impact.
>

And about those extra cycles... according to Documentation/kprobes.txt:
"6. Probe Overhead

On a typical CPU in use in 2005, a kprobe hit takes 0.5 to 1.0
microseconds to process. Specifically, a benchmark that hits the same
probepoint repeatedly, firing a simple handler each time, reports 1-2
million hits per second, depending on the architecture. A jprobe or
return-probe hit typically takes 50-75% longer than a kprobe hit.
When you have a return probe set on a function, adding a kprobe at
the entry to that function adds essentially no overhead.

i386: Intel Pentium M, 1495 MHz, 2957.31 bogomips
k = 0.57 usec; j = 1.00; r = 0.92; kr = 0.99; jr = 1.40

x86_64: AMD Opteron 246, 1994 MHz, 3971.48 bogomips
k = 0.49 usec; j = 0.76; r = 0.80; kr = 0.82; jr = 1.07

ppc64: POWER5 (gr), 1656 MHz (SMT disabled, 1 virtual CPU per physical CPU)
k = 0.77 usec; j = 1.31; r = 1.26; kr = 1.45; jr = 1.99


So, at the 1.5-2 GHz clock rates quoted above, 1 microsecond looks more like
1500-2000 cycles to me, not 50.

Mathieu




OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2006-09-16 17:44:47

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Mathieu Desnoyers <[email protected]> wrote:

> Third run : same LTTng instrumentation, with a kprobe handler
> triggered by each event traced.

where exactly did you put the kprobe handler?

Ingo

2006-09-16 17:50:27

by Karim Yaghmour

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


Ingo Molnar wrote:
> where exactly did you put the kprobe handler?

So location matters, huh? If you're keen to ask this question,
then it might be worth asking why non-experts should be
trusted with keeping instrumentation pertinent out of tree.

[ I know you've said that you acknowledge the need for static
markup. I'm just highlighting a fact substantiating the
position I stated to you in my response late last evening. ]

Thanks,

Karim
--
President / Opersys Inc.
Embedded Linux Training and Expertise
http://www.opersys.com / 1.866.677.4546

2006-09-16 17:52:32

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Karim Yaghmour <[email protected]> wrote:

> Ingo Molnar wrote:
> > where exactly did you put the kprobe handler?
>
> So location matters, huh? [...]

yes, location very much matters if someone wants to reproduce the
numbers.

Ingo

2006-09-16 17:54:47

by Karim Yaghmour

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


Ingo Molnar wrote:
> yes, location very much matters if someone wants to reproduce the
> numbers.

Was that really the angle? I'll give you the benefit of the doubt.
But I'm sure you understand the importance of probe placement
with regard to performance impact ...

Karim
--
President / Opersys Inc.
Embedded Linux Training and Expertise
http://www.opersys.com / 1.866.677.4546

2006-09-16 17:56:11

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

* Ingo Molnar ([email protected]) wrote:
>
> * Mathieu Desnoyers <[email protected]> wrote:
>
> > Third run : same LTTng instrumentation, with a kprobe handler
> > triggered by each event traced.
>
> where exactly did you put the kprobe handler?

ltt_relay_reserve_slot.

See http://ltt.polymtl.ca/svn/tests/kernel/test-kprobes.c to insert the kprobe.
Tests done on LTTng 0.5.111, on a x86 3GHz with hyperthreading.

Mathieu

OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2006-09-16 19:19:39

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Mathieu Desnoyers <[email protected]> wrote:

> See http://ltt.polymtl.ca/svn/tests/kernel/test-kprobes.c to insert
> the kprobe. Tests done on LTTng 0.5.111, on a x86 3GHz with
> hyperthreading.

i have done a bit of kprobes and djprobes testing on a 2160 MHz Athlon64
CPU, UP. I have tested 2 types of almost-NOP tracepoints (on 2.6.17),
where the probe function only increases a counter:

static int counter;

static void probe_func(struct djprobe *djp, struct pt_regs *regs)
{
	counter++;
}

and have measured the overhead of an unmodified, kprobes-probed and
djprobes-probed sys_getpid() system-call:

sys_getpid() unmodified latency: 317 cycles [ 0.146 usecs ]
sys_getpid() kprobes latency: 815 cycles [ 0.377 usecs ]
sys_getpid() djprobes latency: 380 cycles [ 0.176 usecs ]

i.e. the kprobes overhead is +498 cycles (+0.231 usecs), the djprobes
overhead is +63 cycles (+0.029 usecs).

what do these numbers tell us? Firstly, on this CPU the kprobes overhead
is not 1000-2000 cycles but 500 cycles. Secondly, if that's not fast
enough, the "next-gen kprobes" code, djprobes have a really small
overhead of 63 cycles.
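
For completeness, the plain-kprobes counterpart of such a counting probe
is just ordinary module boilerplate around an almost-empty pre_handler.
The sketch below is illustrative only: the probed symbol and the
kallsyms-based address lookup are assumptions, not the exact setup used
for the numbers above.

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/kprobes.h>
#include <linux/kallsyms.h>

static int counter;

/* almost-NOP probe body: just count the hits */
static int count_pre_handler(struct kprobe *p, struct pt_regs *regs)
{
	counter++;
	return 0;	/* continue with the probed instruction */
}

static struct kprobe kp = {
	.pre_handler = count_pre_handler,
};

static int __init count_probe_init(void)
{
	/* assumption: resolve the probed function's address by name */
	kp.addr = (kprobe_opcode_t *)kallsyms_lookup_name("sys_getpid");
	if (!kp.addr)
		return -EINVAL;
	return register_kprobe(&kp);
}

static void __exit count_probe_exit(void)
{
	unregister_kprobe(&kp);
	printk(KERN_INFO "probe hits: %d\n", counter);
}

module_init(count_probe_init);
module_exit(count_probe_exit);
MODULE_LICENSE("GPL");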

Ingo

2006-09-16 19:31:25

by Karim Yaghmour

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


Ingo Molnar wrote:
> i have done a bit of kprobes and djprobes testing on a 2160 MHz Athlon64
> CPU, UP. I have tested 2 types of almost-NOP tracepoints (on 2.6.17),
> where the probe function only increases a counter:
>
> static int counter;
>
> static void probe_func(struct djprobe *djp, struct pt_regs *regs)
> {
> counter++;
> }
>
> and have measured the overhead of an unmodified, kprobes-probed and
> djprobes-probed sys_getpid() system-call:
>
> sys_getpid() unmodified latency: 317 cycles [ 0.146 usecs ]
> sys_getpid() kprobes latency: 815 cycles [ 0.377 usecs ]
> sys_getpid() djprobes latency: 380 cycles [ 0.176 usecs ]
>
> i.e. the kprobes overhead is +498 cycles (+0.231 usecs), the djprobes
> overhead is +63 cycles (+0.029 usecs).

But that's an entirely hypothetical benchmark. Mathieu was asked for
real-workload benchmarks and he gave you those. In turn, you set up
a simplistic test and then go on to conclude that the measurements
are far less than advertised. You ask that ltt replace its static
instrumentation with what kprobes provides, and Mathieu demonstrated
that that's not realistic. If you want to change his mind, at least
reproduce the exact information ltt can provide and then we'll
talk.

> what do these numbers tell us? Firstly, on this CPU the kprobes overhead
> is not 1000-2000 cycles but 500 cycles. Secondly, if that's not fast
> enough, the "next-gen kprobes" code, djprobes have a really small
> overhead of 63 cycles.

But djprobe isn't even here yet. If you insist on keeping ltt's
_current_ limitations as your single most powerful justification to
reject it, how can you hold kprobes to a different standard with a
straight face? You're only perpetuating the fallacy found
throughout this thread that somehow the shortcomings of dynamic
editing are "easy" to fix while those of static instrumentation are
inherently unrecoverable. That's just plain not true, as I've
demonstrated now countless times in this thread.

And please Ingo, I'm still waiting for your feedback on the static
markup mechanism I proposed earlier. I believe it avoids every
single problem you alluded to with regards to the problems generated
by inline markup.

Thanks,

Karim
--
President / Opersys Inc.
Embedded Linux Training and Expertise
http://www.opersys.com / 1.866.677.4546

2006-09-16 19:45:52

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Ingo Molnar <[email protected]> wrote:

> and have measured the overhead of an unmodified, kprobes-probed and
> djprobes-probed sys_getpid() system-call:
>
> sys_getpid() unmodified latency: 317 cycles [ 0.146 usecs ]
> sys_getpid() kprobes latency: 815 cycles [ 0.377 usecs ]
> sys_getpid() djprobes latency: 380 cycles [ 0.176 usecs ]

i have taken a look at the kprobes fastpath, and there are a few things
we can do to speed it up. The patch below shaves off 75 cycles from the
kprobes overhead:

sys_getpid() kprobes-speedup: 740 cycles [ 0.342 usecs ]

that reduces the kprobes overhead to 423 cycles.

Ingo

--------------->
Subject: [patch] kprobes: speed INT3 trap handling up on i386
From: Ingo Molnar <[email protected]>

speed up kprobes trap handling by special-casing kernel-space
INT3 traps (which do not occur otherwise) and doing a kprobes
handler check - instead of redirecting over the i386-die-notifier
chain.

Signed-off-by: Ingo Molnar <[email protected]>
---
arch/i386/kernel/kprobes.c | 2 +-
arch/i386/kernel/traps.c | 19 ++++++++++++-------
include/asm-i386/kprobes.h | 2 ++
3 files changed, 15 insertions(+), 8 deletions(-)

Index: linux/arch/i386/kernel/kprobes.c
===================================================================
--- linux.orig/arch/i386/kernel/kprobes.c
+++ linux/arch/i386/kernel/kprobes.c
@@ -200,7 +200,7 @@ void __kprobes arch_prepare_kretprobe(st
* Interrupts are disabled on entry as trap3 is an interrupt gate and they
* remain disabled thorough out this function.
*/
-static int __kprobes kprobe_handler(struct pt_regs *regs)
+int __kprobes kprobe_handler(struct pt_regs *regs)
{
struct kprobe *p;
int ret = 0;
Index: linux/arch/i386/kernel/traps.c
===================================================================
--- linux.orig/arch/i386/kernel/traps.c
+++ linux/arch/i386/kernel/traps.c
@@ -802,13 +802,18 @@ EXPORT_SYMBOL_GPL(unset_nmi_callback);
#ifdef CONFIG_KPROBES
fastcall void __kprobes do_int3(struct pt_regs *regs, long error_code)
{
- if (notify_die(DIE_INT3, "int3", regs, error_code, 3, SIGTRAP)
- == NOTIFY_STOP)
- return;
- /* This is an interrupt gate, because kprobes wants interrupts
- disabled. Normal trap handlers don't. */
- restore_interrupts(regs);
- do_trap(3, SIGTRAP, "int3", 1, regs, error_code, NULL);
+ /*
+ * kernel-mode INT3s are likely kprobes:
+ */
+ if (!user_mode(regs)) {
+ if (kprobe_handler(regs))
+ return;
+ /* This is an interrupt gate, because kprobes wants interrupts
+ disabled. Normal trap handlers don't. */
+ restore_interrupts(regs);
+ do_trap(3, SIGTRAP, "int3", 1, regs, error_code, NULL);
+ }
+ notify_die(DIE_INT3, "int3", regs, error_code, 3, SIGTRAP);
}
#endif

Index: linux/include/asm-i386/kprobes.h
===================================================================
--- linux.orig/include/asm-i386/kprobes.h
+++ linux/include/asm-i386/kprobes.h
@@ -88,4 +88,6 @@ static inline void restore_interrupts(st

extern int kprobe_exceptions_notify(struct notifier_block *self,
unsigned long val, void *data);
+extern int kprobe_handler(struct pt_regs *regs);
+
#endif /* _ASM_KPROBES_H */

2006-09-16 20:38:10

by Ingo Molnar

[permalink] [raw]
Subject: [patch] kprobes: optimize branch placement


* Ingo Molnar <[email protected]> wrote:

> * Ingo Molnar <[email protected]> wrote:
>
> > and have measured the overhead of an unmodified, kprobes-probed and
> > djprobes-probed sys_getpid() system-call:
> >
> > sys_getpid() unmodified latency: 317 cycles [ 0.146 usecs ]
> > sys_getpid() kprobes latency: 815 cycles [ 0.377 usecs ]
> > sys_getpid() djprobes latency: 380 cycles [ 0.176 usecs ]
>
> i have taken a look at the kprobes fastpath, and there are a few things
> we can do to speed it up. The patch below shaves off 75 cycles from the
> kprobes overhead:
>
> sys_getpid() kprobes-speedup: 740 cycles [ 0.342 usecs ]
>
> that reduces the kprobes overhead to 423 cycles.

the patch below brings the overhead down to 420 cycles:

sys_getpid() kprobes-speedup: 737 cycles [ 0.341 usecs ]

Ingo

---------->
Subject: [patch] kprobes: optimize branch placement
From: Ingo Molnar <[email protected]>

optimize gcc's code generation by hinting branch probabilities.

Signed-off-by: Ingo Molnar <[email protected]>
---
arch/i386/kernel/kprobes.c | 2 +-
arch/i386/kernel/traps.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

Index: linux/arch/i386/kernel/kprobes.c
===================================================================
--- linux.orig/arch/i386/kernel/kprobes.c
+++ linux/arch/i386/kernel/kprobes.c
@@ -220,7 +220,7 @@ int __kprobes kprobe_handler(struct pt_r
kcb = get_kprobe_ctlblk();

/* Check we're not actually recursing */
- if (kprobe_running()) {
+ if (unlikely(kprobe_running())) {
p = get_kprobe(addr);
if (p) {
if (kcb->kprobe_status == KPROBE_HIT_SS &&
Index: linux/arch/i386/kernel/traps.c
===================================================================
--- linux.orig/arch/i386/kernel/traps.c
+++ linux/arch/i386/kernel/traps.c
@@ -806,7 +806,7 @@ fastcall void __kprobes do_int3(struct p
* kernel-mode INT3s are likely kprobes:
*/
if (!user_mode(regs)) {
- if (kprobe_handler(regs))
+ if (likely(kprobe_handler(regs)))
return;
/* This is an interrupt gate, because kprobes wants interrupts
disabled. Normal trap handlers don't. */

2006-09-16 20:52:00

by Ingo Molnar

[permalink] [raw]
Subject: Re: [patch] kprobes: optimize branch placement


* Ingo Molnar <[email protected]> wrote:

> the patch below brings the overhead down to 420 cycles:
>
> sys_getpid() kprobes-speedup: 737 cycles [ 0.341 usecs ]

the patch below reduces the kprobes overhead to 305 cycles. The current
performance table is:

sys_getpid() unmodified latency: 317 cycles [ 0.146 usecs ] 0
sys_getpid() djprobes latency: 380 cycles [ 0.176 usecs ] +63
sys_getpid() kprobes latency: 815 cycles [ 0.377 usecs ] +498
sys_getpid() kprobes-speedup: 740 cycles [ 0.342 usecs ] +423
sys_getpid() kprobes-speedup2: 737 cycles [ 0.341 usecs ] +420
sys_getpid() kprobes-speedup3: 622 cycles [ 0.287 usecs ] +305

the 3 speedups i did today eliminated 63% of the kprobes overhead in
this test.

this too shows that there's lots of performance potential even in
INT3-based kprobes.

Ingo

--------------->
Subject: [patch] kprobes: move from struct_hlist to struct_list
From: Ingo Molnar <[email protected]>

kprobes is using hlists for no good reason: the hash-table is 64 entries
so there's no significant RAM footprint difference. hlists are more
complicated to handle though and cause runtime overhead and cacheline
inefficiencies.

Signed-off-by: Ingo Molnar <[email protected]>

---
arch/i386/kernel/kprobes.c | 7 +--
include/linux/djprobe.h | 3 -
include/linux/kprobes.h | 15 +++----
init/main.c | 3 +
kernel/djprobe.c | 23 ++++++----
kernel/kprobes.c | 96 +++++++++++++++++++++------------------------
6 files changed, 75 insertions(+), 72 deletions(-)

Index: linux/arch/i386/kernel/kprobes.c
===================================================================
--- linux.orig/arch/i386/kernel/kprobes.c
+++ linux/arch/i386/kernel/kprobes.c
@@ -354,9 +354,8 @@ no_kprobe:
*/
fastcall void *__kprobes trampoline_handler(struct pt_regs *regs)
{
- struct kretprobe_instance *ri = NULL;
- struct hlist_head *head;
- struct hlist_node *node, *tmp;
+ struct kretprobe_instance *ri = NULL, *tmp;
+ struct list_head *head;
unsigned long flags, orig_ret_address = 0;
unsigned long trampoline_address =(unsigned long)&kretprobe_trampoline;

@@ -376,7 +375,7 @@ fastcall void *__kprobes trampoline_hand
* real return address, and all the rest will point to
* kretprobe_trampoline
*/
- hlist_for_each_entry_safe(ri, node, tmp, head, hlist) {
+ list_for_each_entry_safe(ri, tmp, head, hlist) {
if (ri->task != current)
/* another task is sharing our hash bucket */
continue;
Index: linux/include/linux/djprobe.h
===================================================================
--- linux.orig/include/linux/djprobe.h
+++ linux/include/linux/djprobe.h
@@ -36,7 +36,7 @@ struct djprobe_instance {
struct list_head plist; /* list of djprobes for multiprobe support */
struct arch_djprobe_stub stub;
struct kprobe kp;
- struct hlist_node hlist; /* list of djprobe_instances */
+ struct list_head hlist; /* list of djprobe_instances */
};
#define DJPI_EMPTY(djpi) (list_empty(&djpi->plist))

@@ -65,6 +65,7 @@ extern int djprobe_pre_handler(struct kp
extern void arch_install_djprobe_instance(struct djprobe_instance *djpi);
extern void arch_uninstall_djprobe_instance(struct djprobe_instance *djpi);
struct djprobe_instance * __get_djprobe_instance(void *addr, int size);
+extern int init_djprobe(void);

int register_djprobe(struct djprobe *p, void *addr, int size);
void unregister_djprobe(struct djprobe *p);
Index: linux/include/linux/kprobes.h
===================================================================
--- linux.orig/include/linux/kprobes.h
+++ linux/include/linux/kprobes.h
@@ -39,7 +39,7 @@
#include <linux/mutex.h>

struct kprobe_insn_page_list {
- struct hlist_head list;
+ struct list_head list;
int insn_size; /* size of an instruction slot */
};

@@ -69,7 +69,7 @@ typedef int (*kretprobe_handler_t) (stru
struct pt_regs *);

struct kprobe {
- struct hlist_node hlist;
+ struct list_head hlist;

/* list of kprobes for multi-handler support */
struct list_head list;
@@ -145,13 +145,13 @@ struct kretprobe {
kretprobe_handler_t handler;
int maxactive;
int nmissed;
- struct hlist_head free_instances;
- struct hlist_head used_instances;
+ struct list_head free_instances;
+ struct list_head used_instances;
};

struct kretprobe_instance {
- struct hlist_node uflist; /* either on free list or used list */
- struct hlist_node hlist;
+ struct list_head uflist; /* either on free list or used list */
+ struct list_head hlist;
struct kretprobe *rp;
kprobe_opcode_t *ret_addr;
struct task_struct *task;
@@ -163,6 +163,7 @@ extern int arch_prepare_kprobe(struct kp
extern void arch_arm_kprobe(struct kprobe *p);
extern void arch_disarm_kprobe(struct kprobe *p);
extern int arch_init_kprobes(void);
+extern int init_kprobes(void);
extern void show_registers(struct pt_regs *regs);
extern kprobe_opcode_t *get_insn_slot(void);
extern void free_insn_slot(kprobe_opcode_t *slot);
@@ -175,7 +176,7 @@ extern int in_kprobes_functions(unsigned

/* Get the kprobe at this addr (if any) - called with preemption disabled */
struct kprobe *get_kprobe(void *addr);
-struct hlist_head * kretprobe_inst_table_head(struct task_struct *tsk);
+struct list_head * kretprobe_inst_table_head(struct task_struct *tsk);

/* kprobe_running() will just return the current_kprobe on this CPU */
static inline struct kprobe *kprobe_running(void)
Index: linux/init/main.c
===================================================================
--- linux.orig/init/main.c
+++ linux/init/main.c
@@ -530,6 +530,9 @@ asmlinkage void __init start_kernel(void
if (efi_enabled)
efi_enter_virtual_mode();
#endif
+#ifdef CONFIG_KPROBES
+ init_kprobes();
+#endif
fork_init(num_physpages);
proc_caches_init();
buffer_init();
Index: linux/kernel/djprobe.c
===================================================================
--- linux.orig/kernel/djprobe.c
+++ linux/kernel/djprobe.c
@@ -47,7 +47,7 @@
#define DJPROBE_TABLE_MASK (DJPROBE_TABLE_SIZE - 1)

/* djprobe instance hash table */
-static struct hlist_head djprobe_inst_table[DJPROBE_TABLE_SIZE];
+static struct list_head djprobe_inst_table[DJPROBE_TABLE_SIZE];

#define hash_djprobe(key) \
(((unsigned long)(key) >> DJPROBE_BLOCK_BITS) & DJPROBE_TABLE_MASK)
@@ -59,12 +59,12 @@ static atomic_t djprobe_count = ATOMIC_I

/* Instruction pages for djprobe's stub code */
static struct kprobe_insn_page_list djprobe_insn_pages = {
- HLIST_HEAD_INIT, 0
+ LIST_HEAD_INIT(djprobe_insn_pages.list), 0
};

static inline void __free_djprobe_instance(struct djprobe_instance *djpi)
{
- hlist_del(&djpi->hlist);
+ list_del(&djpi->hlist);
if (djpi->kp.addr) {
unregister_kprobe(&(djpi->kp));
}
@@ -100,8 +100,8 @@ static inline
djpi->kp.pre_handler = djprobe_pre_handler;
arch_prepare_djprobe_instance(djpi, size);

- INIT_HLIST_NODE(&djpi->hlist);
- hlist_add_head(&djpi->hlist, &djprobe_inst_table[hash_djprobe(addr)]);
+ INIT_LIST_HEAD(&djpi->hlist);
+ list_add(&djpi->hlist, &djprobe_inst_table[hash_djprobe(addr)]);
out:
return djpi;
}
@@ -110,13 +110,12 @@ struct djprobe_instance *__kprobes __get
int size)
{
struct djprobe_instance *djpi;
- struct hlist_node *node;
unsigned long idx, eidx;

idx = hash_djprobe(addr - ARCH_STUB_INSN_MAX);
eidx = ((hash_djprobe(addr + size) + 1) & DJPROBE_TABLE_MASK);
do {
- hlist_for_each_entry(djpi, node, &djprobe_inst_table[idx],
+ list_for_each_entry(djpi, &djprobe_inst_table[idx],
hlist) {
if (((long)addr <
(long)djpi->kp.addr + DJPI_ARCH_SIZE(djpi))
@@ -234,13 +233,17 @@ void __kprobes unregister_djprobe(struct
up(&djprobe_mutex);
}

-static int __init init_djprobe(void)
+int __init init_djprobe(void)
{
+ int i;
+
+ for (i = 0; i < DJPROBE_TABLE_SIZE; i++)
+ INIT_LIST_HEAD(&djprobe_inst_table[i]);
+
djprobe_insn_pages.insn_size = ARCH_STUB_SIZE;
+
return 0;
}

-__initcall(init_djprobe);
-
EXPORT_SYMBOL_GPL(register_djprobe);
EXPORT_SYMBOL_GPL(unregister_djprobe);
Index: linux/kernel/kprobes.c
===================================================================
--- linux.orig/kernel/kprobes.c
+++ linux/kernel/kprobes.c
@@ -46,8 +46,8 @@
#define KPROBE_HASH_BITS 6
#define KPROBE_TABLE_SIZE (1 << KPROBE_HASH_BITS)

-static struct hlist_head kprobe_table[KPROBE_TABLE_SIZE];
-static struct hlist_head kretprobe_inst_table[KPROBE_TABLE_SIZE];
+static struct list_head kprobe_table[KPROBE_TABLE_SIZE];
+static struct list_head kretprobe_inst_table[KPROBE_TABLE_SIZE];

DEFINE_MUTEX(kprobe_mutex); /* Protects kprobe_table */
DEFINE_SPINLOCK(kretprobe_lock); /* Protects kretprobe_inst_table */
@@ -63,14 +63,14 @@ static DEFINE_PER_CPU(struct kprobe *, k
#define INSNS_PER_PAGE(size) (PAGE_SIZE/(size * sizeof(kprobe_opcode_t)))

struct kprobe_insn_page {
- struct hlist_node hlist;
+ struct list_head hlist;
kprobe_opcode_t *insns; /* Page of instruction slots */
int nused;
char slot_used[1];
};

static struct kprobe_insn_page_list kprobe_insn_pages = {
- HLIST_HEAD_INIT, MAX_INSN_SIZE
+ LIST_HEAD_INIT(kprobe_insn_pages.list), MAX_INSN_SIZE
};

/**
@@ -81,10 +81,10 @@ kprobe_opcode_t
__kprobes * __get_insn_slot(struct kprobe_insn_page_list *pages)
{
struct kprobe_insn_page *kip;
- struct hlist_node *pos;
+ struct list_head *pos;
int ninsns = INSNS_PER_PAGE(pages->insn_size);

- hlist_for_each(pos, &pages->list) {
+ list_for_each(pos, &pages->list) {
kip = hlist_entry(pos, struct kprobe_insn_page, hlist);
if (kip->nused < ninsns) {
int i;
@@ -118,8 +118,8 @@ kprobe_opcode_t
kfree(kip);
return NULL;
}
- INIT_HLIST_NODE(&kip->hlist);
- hlist_add_head(&kip->hlist, &pages->list);
+ INIT_LIST_HEAD(&kip->hlist);
+ list_add(&kip->hlist, &pages->list);
memset(kip->slot_used, 0, ninsns);
kip->slot_used[0] = 1;
kip->nused = 1;
@@ -130,10 +130,10 @@ void __kprobes __free_insn_slot(struct k
kprobe_opcode_t * slot)
{
struct kprobe_insn_page *kip;
- struct hlist_node *pos;
+ struct list_head *pos;
int ninsns = INSNS_PER_PAGE(pages->insn_size);

- hlist_for_each(pos, &pages->list) {
+ list_for_each(pos, &pages->list) {
kip = hlist_entry(pos, struct kprobe_insn_page, hlist);
if (kip->insns <= slot &&
slot < kip->insns + (ninsns * pages->insn_size)) {
@@ -147,10 +147,10 @@ void __kprobes __free_insn_slot(struct k
* so as not to have to set it up again the
* next time somebody inserts a probe.
*/
- hlist_del(&kip->hlist);
- if (hlist_empty(&pages->list)) {
- INIT_HLIST_NODE(&kip->hlist);
- hlist_add_head(&kip->hlist,
+ list_del(&kip->hlist);
+ if (list_empty(&pages->list)) {
+ INIT_LIST_HEAD(&kip->hlist);
+ list_add(&kip->hlist,
&pages->list);
} else {
module_free(NULL, kip->insns);
@@ -192,12 +192,11 @@ static inline void reset_kprobe_instance
*/
struct kprobe __kprobes *get_kprobe(void *addr)
{
- struct hlist_head *head;
- struct hlist_node *node;
+ struct list_head *head;
struct kprobe *p;

head = &kprobe_table[hash_ptr(addr, KPROBE_HASH_BITS)];
- hlist_for_each_entry_rcu(p, node, head, hlist) {
+ list_for_each_entry_rcu(p, head, hlist) {
if (p->addr == addr)
return p;
}
@@ -283,9 +282,9 @@ void __kprobes kprobes_inc_nmissed_count
/* Called with kretprobe_lock held */
struct kretprobe_instance __kprobes *get_free_rp_inst(struct kretprobe *rp)
{
- struct hlist_node *node;
struct kretprobe_instance *ri;
- hlist_for_each_entry(ri, node, &rp->free_instances, uflist)
+
+ list_for_each_entry(ri, &rp->free_instances, uflist)
return ri;
return NULL;
}
@@ -294,9 +293,9 @@ struct kretprobe_instance __kprobes *get
static struct kretprobe_instance __kprobes *get_used_rp_inst(struct kretprobe
*rp)
{
- struct hlist_node *node;
struct kretprobe_instance *ri;
- hlist_for_each_entry(ri, node, &rp->used_instances, uflist)
+
+ list_for_each_entry(ri, &rp->used_instances, uflist)
return ri;
return NULL;
}
@@ -308,35 +307,35 @@ void __kprobes add_rp_inst(struct kretpr
* Remove rp inst off the free list -
* Add it back when probed function returns
*/
- hlist_del(&ri->uflist);
+ list_del(&ri->uflist);

/* Add rp inst onto table */
- INIT_HLIST_NODE(&ri->hlist);
- hlist_add_head(&ri->hlist,
+ INIT_LIST_HEAD(&ri->hlist);
+ list_add(&ri->hlist,
&kretprobe_inst_table[hash_ptr(ri->task, KPROBE_HASH_BITS)]);

/* Also add this rp inst to the used list. */
- INIT_HLIST_NODE(&ri->uflist);
- hlist_add_head(&ri->uflist, &ri->rp->used_instances);
+ INIT_LIST_HEAD(&ri->uflist);
+ list_add(&ri->uflist, &ri->rp->used_instances);
}

/* Called with kretprobe_lock held */
void __kprobes recycle_rp_inst(struct kretprobe_instance *ri)
{
/* remove rp inst off the rprobe_inst_table */
- hlist_del(&ri->hlist);
+ list_del(&ri->hlist);
if (ri->rp) {
/* remove rp inst off the used list */
- hlist_del(&ri->uflist);
+ list_del(&ri->uflist);
/* put rp inst back onto the free list */
- INIT_HLIST_NODE(&ri->uflist);
- hlist_add_head(&ri->uflist, &ri->rp->free_instances);
+ INIT_LIST_HEAD(&ri->uflist);
+ list_add(&ri->uflist, &ri->rp->free_instances);
} else
/* Unregistering */
kfree(ri);
}

-struct hlist_head __kprobes *kretprobe_inst_table_head(struct task_struct *tsk)
+struct list_head __kprobes *kretprobe_inst_table_head(struct task_struct *tsk)
{
return &kretprobe_inst_table[hash_ptr(tsk, KPROBE_HASH_BITS)];
}
@@ -349,14 +348,13 @@ struct hlist_head __kprobes *kretprobe_i
*/
void __kprobes kprobe_flush_task(struct task_struct *tk)
{
- struct kretprobe_instance *ri;
- struct hlist_head *head;
- struct hlist_node *node, *tmp;
+ struct kretprobe_instance *ri, *tmp;
+ struct list_head *head;
unsigned long flags = 0;

spin_lock_irqsave(&kretprobe_lock, flags);
head = kretprobe_inst_table_head(tk);
- hlist_for_each_entry_safe(ri, node, tmp, head, hlist) {
+ list_for_each_entry_safe(ri, tmp, head, hlist) {
if (ri->task == tk)
recycle_rp_inst(ri);
}
@@ -367,7 +365,7 @@ static inline void free_rp_inst(struct k
{
struct kretprobe_instance *ri;
while ((ri = get_free_rp_inst(rp)) != NULL) {
- hlist_del(&ri->uflist);
+ list_del(&ri->uflist);
kfree(ri);
}
}
@@ -416,7 +414,7 @@ static inline void add_aggr_kprobe(struc
INIT_LIST_HEAD(&ap->list);
list_add_rcu(&p->list, &ap->list);

- hlist_replace_rcu(&p->hlist, &ap->hlist);
+ list_replace_rcu(&p->hlist, &ap->hlist);
}

/*
@@ -499,8 +497,8 @@ static int __kprobes __register_kprobe(s
if ((ret = arch_prepare_kprobe(p)) != 0)
goto out;

- INIT_HLIST_NODE(&p->hlist);
- hlist_add_head_rcu(&p->hlist,
+ INIT_LIST_HEAD(&p->hlist);
+ list_add_rcu(&p->hlist,
&kprobe_table[hash_ptr(p->addr, KPROBE_HASH_BITS)]);

arch_arm_kprobe(p);
@@ -551,7 +549,7 @@ valid_p:
(p->list.prev == &old_p->list))) {
/* Only probe on the hash list */
arch_disarm_kprobe(p);
- hlist_del_rcu(&old_p->hlist);
+ list_del_rcu(&old_p->hlist);
cleanup_p = 1;
} else {
list_del_rcu(&p->list);
@@ -632,16 +630,16 @@ int __kprobes register_kretprobe(struct
rp->maxactive = NR_CPUS;
#endif
}
- INIT_HLIST_HEAD(&rp->used_instances);
- INIT_HLIST_HEAD(&rp->free_instances);
+ INIT_LIST_HEAD(&rp->used_instances);
+ INIT_LIST_HEAD(&rp->free_instances);
for (i = 0; i < rp->maxactive; i++) {
inst = kmalloc(sizeof(struct kretprobe_instance), GFP_KERNEL);
if (inst == NULL) {
free_rp_inst(rp);
return -ENOMEM;
}
- INIT_HLIST_NODE(&inst->uflist);
- hlist_add_head(&inst->uflist, &rp->free_instances);
+ INIT_LIST_HEAD(&inst->uflist);
+ list_add(&inst->uflist, &rp->free_instances);
}

rp->nmissed = 0;
@@ -671,21 +669,21 @@ void __kprobes unregister_kretprobe(stru
spin_lock_irqsave(&kretprobe_lock, flags);
while ((ri = get_used_rp_inst(rp)) != NULL) {
ri->rp = NULL;
- hlist_del(&ri->uflist);
+ list_del(&ri->uflist);
}
spin_unlock_irqrestore(&kretprobe_lock, flags);
free_rp_inst(rp);
}

-static int __init init_kprobes(void)
+int __init init_kprobes(void)
{
int i, err = 0;

/* FIXME allocate the probe table, currently defined statically */
/* initialize all list heads */
for (i = 0; i < KPROBE_TABLE_SIZE; i++) {
- INIT_HLIST_HEAD(&kprobe_table[i]);
- INIT_HLIST_HEAD(&kretprobe_inst_table[i]);
+ INIT_LIST_HEAD(&kprobe_table[i]);
+ INIT_LIST_HEAD(&kretprobe_inst_table[i]);
}

err = arch_init_kprobes();
@@ -695,8 +693,6 @@ static int __init init_kprobes(void)
return err;
}

-__initcall(init_kprobes);
-
EXPORT_SYMBOL_GPL(register_kprobe);
EXPORT_SYMBOL_GPL(unregister_kprobe);
EXPORT_SYMBOL_GPL(register_jprobe);

2006-09-16 20:00:26

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Hi,

I don't know why you split this into multiple subthreads. Instead of
delving further into secondary issues, please let me get back to the
primary issues, to put everything a little into perspective.

The foremost issue is still that there is only limited kprobes support.
The way you ignore this and try to make it a non-issue strikes me as
rather arrogant. I appreciate that you want to push technology forward,
but it's rather ignorant to leave the people who can't keep up behind in
the dust, by making it very hard for them to get easy access to tracing
in the kernel.
Since I have a quite good idea of the amount of work needed to implement
a second-rate kprobes hack, first-rate kprobes support and first-rate
ltt(ng) support, it's a quite simple decision what I'm going to do. Since
your "incentive" to add kprobes support is not very high, it's more likely
to backfire by making you the jerk denying me easy access to tracing
technologies.

Since my options right now are limited to a static tracer in the first
place, most of the issues you mentioned over the various mails become
moot, e.g. why should I care about the overhead of inactive traces? We can
happily discuss the merits of dynamic tracers forever, but it does _not_
change my current situation: I have no access to one on some machines
I care about.

The main issue in supporting static tracers is the tracepoints, and so far
I haven't seen any convincing proof that the maintenance overhead of
dynamic and static tracepoints has to be significantly different. What you
did was construct a worst-case scenario, which only proves that such a
scenario is possible; what it doesn't prove is that there are no measures
to prevent it from happening. In other words, nobody has proved so far
that it's not possible to create and enforce a set of rules to keep the
number and effect of tracepoints under control.
Let's take your example of a tracepoint in an area of high development
activity: since such development should happen in -mm, it would be no
problem to drop the trace and add it back once development calmed down,
exactly like you would do for a dynamic trace. OTOH it's very well
possible some people might find the trace useful during development.
So the problem here is that you simply work from the unproven premise
that static tracepoints automatically lead to uncontrolled chaos. This
makes a reasonable discussion about managing tracepoints impossible, since
you don't want to support static tracepoints at all.

Ingo, as long as you don't give up this zero-tolerance strategy, it
doesn't make much sense to discuss details, and I can only hope there are
other people who are more reasonable...

bye, Roman

2006-09-16 21:23:52

by Karim Yaghmour

[permalink] [raw]
Subject: Re: [patch] kprobes: optimize branch placement


Ingo Molnar wrote:
> the 3 speedups i did today eliminated 63% of the kprobes overhead in
> this test.
>
> this too shows that there's lots of performance potential even in
> INT3-based kprobes.

So now you're resorting to your uber-talents as a kernel guru to bury
the other side? This attitude, if you permit, I find cowardly and
hypocritical. It does a huge disservice to kernel developers at
large by making it clear to outside observers that if they do not
curry favor with key kernel developers, or submit material that
conforms to a given line of thought, then they are not welcome.

Keep on at it. The writing is on the wall for those kernel developers
who genuinely wish that outside contributors make an effort to
push stuff back into mainline. Keep on at it, Ingo. Hack this one
to death. I know of very few people who have the clout or
understanding of the kernel's intricacies to match you in such an
arms race.

Go Ingo, Go.

Karim
--
President / Opersys Inc.
Embedded Linux Training and Expertise
http://www.opersys.com / 1.866.677.4546

2006-09-16 22:56:10

by Andrew Morton

[permalink] [raw]
Subject: Re: [patch] kprobes: optimize branch placement

On Sat, 16 Sep 2006 22:29:39 +0200
Ingo Molnar <[email protected]> wrote:

> --- linux.orig/arch/i386/kernel/kprobes.c
> +++ linux/arch/i386/kernel/kprobes.c
> @@ -220,7 +220,7 @@ int __kprobes kprobe_handler(struct pt_r
> kcb = get_kprobe_ctlblk();
>
> /* Check we're not actually recursing */
> - if (kprobe_running()) {
> + if (unlikely(kprobe_running())) {
> p = get_kprobe(addr);

This function does two calls to get_kprobe() (in the recurring-trap case)
where only one is needed.

2006-09-16 22:57:50

by Andrew Morton

[permalink] [raw]
Subject: Re: [patch] kprobes: optimize branch placement

On Sat, 16 Sep 2006 17:44:25 -0400
Karim Yaghmour <[email protected]> wrote:

> So now you're resorting to your uber-talents as a kernel guru to bury
> the other side?

It's hardly rocket science - it appears that nobody has ever bothered.



2006-09-16 22:57:46

by Andrew Morton

[permalink] [raw]
Subject: Re: [patch] kprobes: optimize branch placement

On Sat, 16 Sep 2006 22:43:42 +0200
Ingo Molnar <[email protected]> wrote:

> --- linux.orig/arch/i386/kernel/kprobes.c
> +++ linux/arch/i386/kernel/kprobes.c
> @@ -354,9 +354,8 @@ no_kprobe:
> */
> fastcall void *__kprobes trampoline_handler(struct pt_regs *regs)
> {
> - struct kretprobe_instance *ri = NULL;
> - struct hlist_head *head;
> - struct hlist_node *node, *tmp;
> + struct kretprobe_instance *ri = NULL, *tmp;
> + struct list_head *head;
> unsigned long flags, orig_ret_address = 0;
> unsigned long trampoline_address =(unsigned long)&kretprobe_trampoline;

Wanna fix the whitespace wreckage while you're there??

i386's kprobe_handler() appears to forget to reenable preemption in the
if (p->pre_handler && p->pre_handler(p, regs)) case?

2006-09-16 22:59:04

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Roman Zippel <[email protected]> wrote:

> I don't know why you split this into multiple subthreads [...]

huh? Maybe because the mail got ... too big?

Ingo

2006-09-16 23:08:36

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Roman Zippel <[email protected]> wrote:

> Since my options are right now limited to a static tracer in first
> place, [...]

Lets see the equation of the current situation. On one side you want
static tracing but you dont want to implement kprobes on m68k - although
you probably could. On the other side there is the main kernel, which,
if it ever accepted static tracepoints, could probably never get rid of
them.

so, you request the main kernel to accept hundreds of static tracepoints
that would probably never go away, just because you are reluctant at the
moment to implement kprobes? And that only to bridge a temporary period
of time when m68k has no kprobes support yet? Combined with the fact
that m68k was just fine without tracing for 13 years? Did i get that
right?

Ingo

2006-09-16 23:22:44

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Roman Zippel <[email protected]> wrote:

> [...] instead of delving further into secondary issues, please let me
> get back to the primary issues [...]

here's a list of some of those "secondary issues" that we were
discussing, and which you opted not to "further delve into":

firstly, a factually wrong statement of yours:

> [...] any tracepoints have an maintainance overhead, which is barely
> different between dynamic and static tracing [...]

secondly, a factually wrong statement of yours:

> [...] at the source level you can remove a static tracepoint as easily
> as a dynamic tracepoint, [...]

thirdly, a factually wrong statement of yours:

> [...] It would also add virtually no maintainance overhead [...]

[see the previous mails for the full context on these items.]

Ingo

2006-09-16 23:32:51

by Ingo Molnar

[permalink] [raw]
Subject: Re: [patch] kprobes: optimize branch placement


* Andrew Morton <[email protected]> wrote:

> On Sat, 16 Sep 2006 17:44:25 -0400
> Karim Yaghmour <[email protected]> wrote:
>
> > So now you're resorting to your uber-talents as a kernel guru to
> > bury the other side?
>
> It's hardly rocket science - it appears that nobody has ever bothered.

yeah. Performance of kprobes was never really a big issue, kprobes were
always more than fast enough in my opinion. Would be nice if Mathieu
could try to re-run his kprobes test with these patches applied. I still
havent given up on the hope of convincing the LTT folks that they
shouldnt let their sizable codebase drop on the floor but should attempt
to integrate it with kprobes/systemtap. There's nothing wrong with what
LTT gives to users, it's just the tracing engine itself (the static hook
based component) that i have a conceptual problem with - not with the
rest. Most of the know-how of tracers is in the identification of the
information that should be extracted, its linkup and delivery to
user-space tools.

Ingo

2006-09-16 23:38:42

by Ingo Molnar

[permalink] [raw]
Subject: Re: [patch] kprobes: optimize branch placement


* Andrew Morton <[email protected]> wrote:

> On Sat, 16 Sep 2006 22:43:42 +0200
> Ingo Molnar <[email protected]> wrote:
>
> > --- linux.orig/arch/i386/kernel/kprobes.c
> > +++ linux/arch/i386/kernel/kprobes.c
> > @@ -354,9 +354,8 @@ no_kprobe:
> > */
> > fastcall void *__kprobes trampoline_handler(struct pt_regs *regs)
> > {
> > - struct kretprobe_instance *ri = NULL;
> > - struct hlist_head *head;
> > - struct hlist_node *node, *tmp;
> > + struct kretprobe_instance *ri = NULL, *tmp;
> > + struct list_head *head;
> > unsigned long flags, orig_ret_address = 0;
> > unsigned long trampoline_address =(unsigned long)&kretprobe_trampoline;
>
> Wanna fix the whitespace wreckage while you're there??

will do. If you consider this for -mm then there's some djprobes noise
in the patch [djprobes isnt upstream yet] - it's not completely
sanitized yet. (but it should actually work if applied to upstream -
kprobes and djprobes are disjunct.) Also, i havent tested with
CONFIG_KPROBES turned off, etc. I'll do a clean queue.

> i386's kprobe_handler() appears to forget to reenable preemption in
> the if (p->pre_handler && p->pre_handler(p, regs)) case?

that portion seems a bit tricky - i think what happens is that the
pre_handler() sets stuff up for single-stepping, and then we do this
recursive single-stepping (during which preemption remains disabled),
and _then_ do we re-enable preemption.

Ingo

2006-09-16 23:49:25

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Mathieu Desnoyers <[email protected]> wrote:

> > > Third run : same LTTng instrumentation, with a kprobe handler
> > > triggered by each event traced.
> >
> > where exactly did you put the kprobe handler?
>
> ltt_relay_reserve_slot.
>
> See http://ltt.polymtl.ca/svn/tests/kernel/test-kprobes.c to insert
> the kprobe. Tests done on LTTng 0.5.111, on a x86 3GHz with
> hyperthreading.

ok. In what way did you enable LTTng instrumentation? I have 0.5.108
installed, and i'd like to make sure i do everything as you did, to make
the tests comparable. Which kernel config options (default ones?), and
what precise lttctl commands did you use, were they the usual:

lttctl -n trace -d -l /mnt/debugfs/ltt -t /tmp/trace

? What filesystem does /tmp/trace reside on?

Ingo

2006-09-17 01:15:56

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Hi,

On Sun, 17 Sep 2006, Ingo Molnar wrote:

> Lets see the equation of the current situation. On one side you want
> static tracing but you dont want to implement kprobes on m68k - although
> you probably could.

You would have a point if it were just about m68k.

> On the other side there is the main kernel, which,
> if it ever accepted static tracepoints, could probably never get rid of
> them.

If they are useful and not hurting anyone, why should we?

bye, Roman

2006-09-17 01:38:37

by Karim Yaghmour

[permalink] [raw]
Subject: Re: [patch] kprobes: optimize branch placement


Andrew Morton wrote:
> It's hardly rocket science - it appears that nobody has ever bothered.

Hmm, that's one explanation. The other explanation, which in my view is
the likelier one -- but I've been shown wrong before -- is that most of
those who went through that code before just didn't have Ingo's insight
and abilities. Which goes to show what can be achieved when "interesting"
ideas are given a hand by those with the appropriate insight and
abilities -- and, of course, what fate awaits those other ideas that
are less fortunate in the eyes of the talented. Praise the lord for
the chosen ones.

Karim
--
President / Opersys Inc.
Embedded Linux Training and Expertise
http://www.opersys.com / 1.866.677.4546

2006-09-17 02:23:31

by Karim Yaghmour

[permalink] [raw]
Subject: Re: [patch] kprobes: optimize branch placement


Ingo Molnar wrote:
> yeah. Performance of kprobes was never really a big issue, kprobes were
> always more than fast enough in my opinion. Would be nice if Mathieu
> could try to re-run his kprobes test with these patches applied. I still
> havent given up on the hope of convincing the LTT folks that they
> shouldnt let their sizable codebase drop on the floor but should attempt
> to integrate it with kprobes/systemtap. There's nothing wrong with what
> LTT gives to users, it's just the tracing engine itself (the static hook
> based component) that i have a conceptual problem with - not with the
> rest. Most of the know-how of tracers is in the identification of the
> information that should be extracted, its linkup and delivery to
> user-space tools.

You keep claiming that the use of kprobes/stap would be an equivalent
substitute for what ltt needs, and as a past maintainer of ltt I'm
telling you it isn't, *even* if the speed of such things were *better*
than that allowed by direct inlined static calls. The speed of the path
from event to tracer is but part of the issue, and I've never personally
considered it even the most difficult one to solve.

The most difficult issue, and this one is *not* technical, is the
requirement for outside maintenance of event location delimiters --
whether it be by patch or by script. *THAT* is an intractable
problem so long as there is no mechanism for static markup of the
code. If *THAT* is resolved, then what hides behind the marked up
code -- to which you've devoted considerable bandwidth in this
thread -- becomes utterly *irrelevant*. Roman could then have his
direct inline calls, you could have your uber-optimized steroid-
probes, and neither would conflict.

Your contention up to this point has been that direct inline calls
inherently require more markup than dynamically-insertable ones.
And, to me, this is a fundamental flaw in your argument. My
argument, and you have yet to respond to my earlier email where
I assert this, is that what is sufficient for tracing a given set
of events by means of binary editing *that-does-not-require-out-
of-tree-maintenance* can be made to be sufficient for the tracing
of events using direct inlined static calls. The *only* difference
being that binary editing allows further extension of the pool of
events of interest by means of outside specification of additional
interest points. But given that both those groups working on
dynamic-inserters and those directly patching the kernel are
interested very much in the same events, and both claim that they
need some sort of inlined instrumentation, then there is no point
in pitting the hooking mechanisms against one another.

Clearly those working on dynamic inserters would gladly immediately
use any static instrumentation allowing fastest event-to-tracer
time, and clearly those basing their work on a patch for events
would benefit from the ability to dynamically extend their event
pool. Again: the word is *orthogonal*.

You, and many others, claim that out-of-tree maintenance of
dynamically-insertable probe points is much simpler than in-tree
maintenance and you mainly base this on *your* experience of
having to port *your* trace points around. Now, there are two
parts to this:

First, the alleged simplicity of out-of-tree probe points is a
fallacy. Those working on such things have come out publicly stating
the opposite -- and a historical note here: many of those who
participated in the inception of some of these projects were
themselves convinced at the outset that once they got their thing
figured out they'd never need any sort of mainline markup; clearly
experience has shown otherwise. Both Frank, who is one of the
major contributors to stap, and Jose, who maintains LKET -- an
ltt-equivalent that uses stap to get its events -- have actually
said the opposite. Here's from an earlier email by Jose:
> I agree with you here, I think is silly to claim dynamic instrumentation
> as a fix for the "constant maintainace overhead" of static trace point.
> Working on LKET, one of the biggest burdens that we've had is mantainig
> the probe points when something in the kernel changes enough to cause a
> breakage of the dynamic instrumentation. The solution to this is having
> the SystemTap tapsets maintained by the subsystems maintainers so that
> changes in the code can be applied to the dynamic instrumentation as
> well. This of course means that the subsystem maintainer would need to
> maintain two pieces of code instead of one. There are a lot of
> advantages to dynamic vs static instrumentation, but I don't think
> maintainace overhead is one of them.
What more do I need to say?

Second, the fact that *your* experience points to the low
maintainability of static instrumentation does not mean that
this actually readily applies to other possible markup
methods. You've named very specific arguments why *your*
experience leads you to believe in the problematic nature of
static markup, and I've provided you with a proposal that
addresses every single one of the issues you mentioned. Yet,
again, you don't bother giving me feedback on it. So here it
is one more time:

> Here's static markup as it could be implemented:
>
> The plain function:
> int global_function(int arg1, int arg2, int arg3)
> {
> ... [lots of code] ...
>
> x = func2();
>
> ... [lots of code] ...
> }
>
> The function with static markup:
> int global_function(int arg1, int arg2, int arg3)
> {
> ... [lots of code] ...
>
> x = func2(); /*T* @here:arg1,arg2,arg3 */
>
> ... [lots of code] ...
> }
>
> The semantics are primitive at this stage, and they could definitely
> benefit from lkml input, but essentially we have a build-time parser
> that goes around the code and automagically does one of two things:
> a) create information for binary editors to use
> b) generate an alternative C file (foo-trace.c) with inlined static
> function calls.
>
> And there might be other possibilities I haven't thought of.
>
> This beats every argument I've seen to date on static instrumentation.
> Namely:
> - It isn't visually offensive: it's a comment.
> - It's not a maintenance drag: outdated comments are not alien.
> - It doesn't use weird function names or caps: it's a comment.
> - There is precedent: kerneldoc.
> And it does preserve most of the key things those who've asked for
> static markup are looking for. Namely:
> - Static instrumentation
> - Mainline maintainability
> - Contextualized variables

If you would care to give your approval to the above, then I
think this thread is over.

Thanks,

Karim
--
President / Opersys Inc.
Embedded Linux Training and Expertise
http://www.opersys.com / 1.866.677.4546

2006-09-17 05:01:26

by Ganesan Rajagopal

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

>>>>> Karim Yaghmour <[email protected]> writes:

> And I submit to you an idea which I submitted to Ingo yesterday and have
> not yet received feedback on. Here's static markup as it could be
> implemented:
>
> The plain function:
> int global_function(int arg1, int arg2, int arg3)
> {
> ... [lots of code] ...
>
> x = func2();
>
> ... [lots of code] ...
> }
>
> The function with static markup:
> int global_function(int arg1, int arg2, int arg3)
> {
> ... [lots of code] ...
>
> x = func2(); /*T* @here:arg1,arg2,arg3 */
>
> ... [lots of code] ...
> }
>
> The semantics are primitive at this stage, and they could definitely
> benefit from lkml input, but essentially we have a build-time parser
> that goes around the code and automagically does one of two things:
> a) create information for binary editors to use
> b) generate an alternative C file (foo-trace.c) with inlined static
> function calls.

This makes sense to me when combined with kprobes. I refer to the DTrace
Usenix paper, http://www.sun.com/bigadmin/content/dtrace/dtrace_usenix.pdf.
They argue (Section 4.2, Statically-defined Tracing):

"While FBT (Function Boundary Tracing) allows for comprehensive probe
coverage, one must be familiar with the kernel implementation to use it
effectively. To have probes with semantic meaning, one must allow probes to
be statically declared in the implementation. The mechanism for implementing
this is typically a macro that expands to a conditional call into a tracing
framework if tracing is enabled. While the probe effect of this mechanism is
small, it is observable: even when disabled, the expanded macro introduces a
load, a compare and a taken branch.

In keeping with our philosophy of zero probe effect when disabled, we have
implemented a statically defined tracing (SDT) provider by defining a C macro
that expands to a call to a non-existent function with a well-defined prefix
("__dtrace_probe_"). When the kernel linker sees a relocation against a
function with this prefix, it replaces the call instruction with a
no-operation and records the full name of the bogus function along with the
location of the call site. When the SDT provider loads, it queries the
auxiliary structure and creates a probe with a name specified by the
function name. When an SDT probe is enabled, the no-operation at the call
site is patched to be a call into an SDT-controlled trampoline that
transfers control into DTrace."
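
As a rough illustration of the mechanism described in that passage (all
names below are invented, this is not the actual DTrace code, and it only
works with the special linker handling the paper describes):

/* sdt-sketch.h -- illustrative only */

/*
 * Deliberately declared but never defined: a linker implementing the
 * scheme above recognizes the __sdt_probe_ prefix in the relocation,
 * replaces the call with a no-op and records the call-site address.
 */
extern void __sdt_probe_io_start(unsigned long arg0, unsigned long arg1);

#define SDT_PROBE2(name, a0, a1) \
	__sdt_probe_##name((unsigned long)(a0), (unsigned long)(a1))

/* An instrumentation site then looks like an ordinary function call: */
static inline void submit_io(unsigned long dev, unsigned long sector)
{
	SDT_PROBE2(io_start, dev, sector);
	/* ... real work: the probe costs nothing until it is enabled ... */
}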

--
Ganesan Rajagopal

2006-09-17 05:38:56

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

* Ingo Molnar ([email protected]) wrote:
>
> * Mathieu Desnoyers <[email protected]> wrote:
>
> > > > Third run : same LTTng instrumentation, with a kprobe handler
> > > > triggered by each event traced.
> > >
> > > where exactly did you put the kprobe handler?
> >
> > ltt_relay_reserve_slot.
> >
> > See http://ltt.polymtl.ca/svn/tests/kernel/test-kprobes.c to insert
> > the kprobe. Tests done on LTTng 0.5.111, on a x86 3GHz with
> > hyperthreading.
>
> ok. In what way did you enable LTTng instrumentation? I have 0.5.108
> installed, and i'd like to make sure i do everything as you did, to make
> the tests comparable. Which kernel config options (default ones?), and
> what precise lttcl commands did you use, were they the usual:
>
> lttctl -n trace -d -l /mnt/debugfs/ltt -t /tmp/trace
>
> ? What filesystem does /tmp/trace reside on?
>

I used LTTng 0.5.111 (yes, now with debugfs!) ;).

I ran the tests on a Pentium 4 3 GHz, with hyperthreading enabled. The system
has 1GB of ram. Hard disk : WDC WD1600JD-00H. File system : ext3.

The kernel (2.6.17) is configured with SMP enabled.

Relevant kernel config :

CONFIG_LTT=y
CONFIG_LTT_TRACER=m
CONFIG_LTT_RELAY=m
CONFIG_LTT_ALIGNMENT=y
CONFIG_LTT_HEARTBEAT=y
CONFIG_LTT_HEARTBEAT_EVENT=y
# CONFIG_LTT_SYNTHETIC_TSC is not set
CONFIG_LTT_USERSPACE_GENERIC=y
CONFIG_LTT_NETLINK_CONTROL=m
CONFIG_LTT_STATEDUMP=m
CONFIG_LTT_FACILITY_CORE=y
CONFIG_LTT_FACILITY_FS=y
CONFIG_LTT_FACILITY_FS_DATA=y
CONFIG_LTT_FACILITY_IPC=y
CONFIG_LTT_FACILITY_KERNEL=y
CONFIG_LTT_FACILITY_KERNEL_ARCH=y
# CONFIG_LTT_FACILITY_LOCKING is not set
CONFIG_LTT_FACILITY_MEMORY=y
CONFIG_LTT_FACILITY_NETWORK=y
CONFIG_LTT_FACILITY_NETWORK_IP_INTERFACE=y
CONFIG_LTT_FACILITY_PROCESS=y
CONFIG_LTT_FACILITY_SOCKET=y
CONFIG_LTT_FACILITY_STATEDUMP=y
CONFIG_LTT_FACILITY_TIMER=y
CONFIG_LTT_FACILITY_STACK=y
CONFIG_LTT_PROCESS_STACK=y
CONFIG_LTT_PROCESS_MAX_FUNCTION_STACK=100
CONFIG_LTT_PROCESS_MAX_STACK_LEN=250
CONFIG_LTT_KERNEL_STACK=y
CONFIG_LTT_STACK_SYSCALL=y
CONFIG_LTT_STACK_INTERRUPT=y
CONFIG_LTT_STACK_NMI=y

Huge note : I left CONFIG_LTT_FACILITY_STACK enabled, but THIS IS EXPERIMENTAL.

lttctl commands :

Start tracing :
lttctl -n trace -d -l /mnt/debugfs/ltt -t /tmp/trace1
(note : 0.5.111 uses debugfs, 0.5.108 uses relayfs)

Stop tracing :
lttctl -n trace -R

See http://ltt.polymtl.ca > QUICKSTART for other details (modules to load...)


Mathieu


OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2006-09-17 08:15:00

by Frederik Deweerdt

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

On Sat, Sep 16, 2006 at 09:37:45PM +0200, Ingo Molnar wrote:
> --------------->
> Subject: [patch] kprobes: speed INT3 trap handling up on i386
> From: Ingo Molnar <[email protected]>
>
> speed up kprobes trap handling by special-casing kernel-space
> INT3 traps (which do not occur otherwise) and doing a kprobes
> handler check - instead of redirecting over the i386-die-notifier
> chain.
>
Hi Ingo,

Not that it would make any difference to the actual kprobe performance,
but I think that not using the die-notifier chain makes the DIE_INT3
handling in kprobe_exceptions_notify() useless.

Regards,
Frederik


Signed-off-by: Frederik Deweerdt <[email protected]>

diff --git a/arch/i386/kernel/kprobes.c b/arch/i386/kernel/kprobes.c
index afe6505..90787ff 100644
--- a/arch/i386/kernel/kprobes.c
+++ b/arch/i386/kernel/kprobes.c
@@ -652,10 +652,6 @@ int __kprobes kprobe_exceptions_notify(s
return ret;

switch (val) {
- case DIE_INT3:
- if (kprobe_handler(args->regs))
- ret = NOTIFY_STOP;
- break;
case DIE_DEBUG:
if (post_kprobe_handler(args->regs))
ret = NOTIFY_STOP;
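
[ For context, a rough C sketch of the fast path being discussed above.
  This is not Ingo's actual patch; the exact trap-handler plumbing and
  the visibility of kprobe_handler() outside kprobes.c are assumptions
  made for illustration. ]

#include <linux/kprobes.h>
#include <asm/kdebug.h>

fastcall void do_int3(struct pt_regs *regs, long error_code)
{
#ifdef CONFIG_KPROBES
        /*
         * Kernel-mode INT3 traps only come from kprobes breakpoints,
         * so check for a kprobe directly instead of round-tripping
         * through the i386 die-notifier chain.
         */
        if (!user_mode(regs) && kprobe_handler(regs))
                return;
#endif
        /*
         * Everything else still goes through the notifier chain, which
         * is why the DIE_INT3 case removed in the patch above becomes
         * dead code.
         */
        if (notify_die(DIE_INT3, "int3", regs, error_code, 3, SIGTRAP)
                        == NOTIFY_STOP)
                return;
        /* ... normal INT3 handling (SIGTRAP delivery) continues here ... */
}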

2006-09-17 08:50:42

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Roman Zippel <[email protected]> wrote:

> > On the other side there is the main kernel, which, if it ever
> > accepted static tracepoints, could probably never get rid of them.
>
> If they are useful and not hurting anyone, why should we?

FYI, whether it is true that "they not hurting anyone" is one of those
"secondary issues" that I analyzed in great detail in the emails
yesterday, and which you opted not to "further delve into":

Message-ID: <[email protected]>:

' That is a constant kernel maintainance drag that i feel
uncomfortable about. '

Message-ID: <[email protected]>:

'That's far from "virtually no maintainance overhead".'

Message-ID: <[email protected]>:

'static tracepoints are a maintainance problem: once added _they can
not be removed without breaking static tracers_.'

I still very much opine that your claim that static tracepoints are not
hurting anyone is false: they can cause significant maintainance
overhead in the long run that we cannot remove, and these costs
integrate over a long period of time.

We have statements from two people who have /used and hacked/ LTT in
products and have seen LTT's use, indicating that the maintainance
overhead is nonzero and that the combined number of tracepoints in use
by actual customers is much larger than posited in this thread. We also
have LTT proponents disputing that and suggesting that the long-term
maintainance overhead is very low. So even taking my opinion out of the
picture, the picture is far from clear. If we put my opinion back into
the picture: i base it on my first-hand experience with tracers. [**]

so at least to me the rule in such a situation is clear: if we have the
choice between two approaches that are useful in similar ways [*] but
one has a larger flexibility to decrease the total maintainance cost,
then we _must_ pick that one.

This really isnt rocket science, we do such decisions every day. We did
that decision for STREAMS too. (which STREAMS argument you ignored for a
number of times.) STREAMS was a similar situation: people wanted "just a
few unintrusive hooks which you could compile out" for external STREAMS
functionality to hook into.

and unlike STREAMS, in the LTT case it's not the totality of the project
that is being disputed: i only dispute the static tracing aspect of it,
which is a comparatively small (but intrusive) portion of a project that
consists of a 26,000 lines kernel patchset and a large body of userspace
tools.

Ingo

[*] furthermore, dynamic tracing is not only "similarly useful", it is
_more useful_ because it allows alot more than static tracing does.
That's why i analyzed the "secondary issue" of the usefulness of
dynamic tracers: the decision gets easier if one of the technologies
is fundamentally more capable.

[**] Also, just yesterday i tried to merge the 2.6.17 version of the LTT
patchset to 2.6.18, and it created non-trivial rejects left and
right. That is a further objective indicator to me - if something
has low maintainance cost, how come its patchset is so intrusive
that it cannot survive 3 months of kernel development flux? If it's
intrusive, shouldnt we have the fundamental option to shift that
maintainance overhead out of the core kernel, back to the people
that want those features?

2006-09-17 10:04:08

by Mike Galbraith

[permalink] [raw]
Subject: Re: [patch] kprobes: optimize branch placement

On Sat, 2006-09-16 at 21:59 -0400, Karim Yaghmour wrote:
> Andrew Morton wrote:
> > It's hardly rocket science - it appears that nobody has ever bothered.
>
> Hmm, that's one explanation. The other explanation, which in my view is
> the likelier -- but I've been shown wrong before, is that most of those
> who went through that code before just didn't have Ingo's insight and
> abilities. Which goes to show what can be achieved when "interesting"
> ideas are given a hand by those having appropriate insight and
> abilities -- and, of course, what fate awaits those other ideas which
> are less so fortunate in the eyes of the talented. Praise the lord for
> the chosen ones.

(that bit after the last -- is a steaming pile of nastiness)

I don't understand your reaction. If roles were reversed, would you not
examine the implementation?

-Mike

2006-09-17 14:08:33

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Frederik Deweerdt <[email protected]> wrote:

> On Sat, Sep 16, 2006 at 09:37:45PM +0200, Ingo Molnar wrote:
> > --------------->
> > Subject: [patch] kprobes: speed INT3 trap handling up on i386
> > From: Ingo Molnar <[email protected]>
> >
> > speed up kprobes trap handling by special-casing kernel-space
> > INT3 traps (which do not occur otherwise) and doing a kprobes
> > handler check - instead of redirecting over the i386-die-notifier
> > chain.
> >
> Hi Ingo,
>
> Not that it would make any difference to the actual kprobe
> performance, but I think that not using the die-notifier chain makes
> the DIE_INT3 handling in kprobe_exceptions_notify() useless.

yeah, indeed - i'll add your patch to the kprobes patchset.

Ingo

2006-09-17 14:20:18

by Frank Ch. Eigler

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Ingo Molnar <[email protected]> writes:

> [...]
> firstly, a factually wrong statement of yours:
>
> > [...] any tracepoints have an maintainance overhead, which is barely
> > different between dynamic and static tracing [...]

If one totals the fixup effort required across the programmers who
need to do the work, I would concur with the OP; or if there is a
difference, it is in favour of the static markers. It is unfortunate
that all the talk about maintenance has been almost entirely aloof and
disconnected from empirical examples. It would be much better if we
were able to sketch out plausible designs for static instrumentation
and similar dynamic probes, and carry out gedanken experiments about
how they would need to adapt to realistic examples of code drift. It
is not the case that all "maintenance" is alike.

> secondly, a factually wrong statement of yours:
>
> > [...] at the source level you can remove a static tracepoint as easily
> > as a dynamic tracepoint, [...]

It is not hard to imagine commenting out a single line; nor inserting
the equivalent of "#define NDEBUG" at the head of the .c file to
disable them all for the whole compilation unit. The retort that
"this would break the entire tracing system" does not hold water
without far more argument. Missing events do not necessarily a
totally broken system make. (Renamed or changed events may even be
mapped back via a translation layer.) Tracing events need not become
as firmly fixed (unremovable or unchangeable) a user interface as the
syscalls.

> thirdly, a factually wrong statement of yours:
>
> > [...] It would also add virtually no maintainance overhead [...]

Yes, the knife cuts both ways: both cost ongoing effort. The question
is how much; who would do the work; who is better able to do the work;
who (users/developers) receives value from the work. The overall
cost/benefit calculation is far more complicated than pithy lines
about "no maintenance" or its opposite.


As for the possibilities of kprobes performance improvements: bring
them on, they're great. It is however almost certain that, because of
things like imperfect or absent debugging information, compiler
optimizations, and differing deployment scenarios, some un-probable blind
spots would remain in a kprobes-only probing system.


As for Karim's proposed comment-based markers, I don't have a strong
opinion, not being one whose kernel-side code would be marked up one
way or the other. My intuition suggests that, if the runtime costs of
a dormant static marker are low enough, they should be just compiled
in by default. And if they are compiled in, then by golly, compile
them in honestly and don't hide them. Something like build-time
multilibbing seems like too much effort to trade one eyesore for a
different eyesore. But that's just my opinion, I could be wrong.


- FChE

2006-09-17 15:09:12

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Frank Ch. Eigler <[email protected]> wrote:

> [...] It would be much better if we were able to sketch out plausible
> designs for static instrumentation and similar dynamic probes, and
> carry out gedanken experiments about how they would need to adapt to
> realistic examples of code drift. It is not the case that all
> "maintenance" is alike.

see my previous mail - hopefully that explains my position even clearer.

A number of people have expressed doubts about the all-static model (i'm
amongst them) - and that's all based on actual experience. So there's no
need for Gedanken-experiments, because we've got real-life experiments
:-) A number of people also have expressed that they think an all-static
markup model is the right one - and that's based on experience as well.

Just looking at the opinions objectively and excluding my opinion i'd
say that the most likely model will thus be a _hybrid_ one: some
markups will be static, some will be dynamic.

Whether a tracepoint will be static or dynamic will depend on the 'flux
of changes' in the tracing code and of the code they trace. If tracing
code has a high flux, or the traced code has a high flux, then the
lowest maintainance overhead is to have a dynamic tracepoint. If _both_
the tracing code and the traced code has low flux of changes, then the
lowest maintainance overhead will be a static markup.

Put differently: dynamic markups will turn into static markups if the
code that they handle "cools down". Static markups will turn into
dynamic markups if the code where they reside in gets "too hot" or if
the markups themselves are "too hot".

But one thing is sure: with a static tracer model accepted into the
kernel we are forced to have a comprehensive, always-maintained, full
set of static markups in the tree, for a long time. Dynamic tracers will
still be around, but we wont be able to fully benefit from the more
flexible tracepoint maintainance models they allow, because we'll always
have to carry around the static markups, for the sake of static tracers.
There will likely be periodic friction about how many static markups
there should be in the source: subsystem maintainers will want them out,
static-trace-users will want them in. If a crucial static markup is
removed or damaged then the kernel will regress materially.

Ingo

2006-09-17 15:17:04

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Hi,

On Sun, 17 Sep 2006, Ingo Molnar wrote:

> > If they are useful and not hurting anyone, why should we?
>
> FYI, whether it is true that "they not hurting anyone" is one of those
> "secondary issues" that I analyzed in great detail in the emails
> yesterday, and which you opted not to "further delve into":

Ingo, you happily still ignore my primary issues, how serious do you
expect me to take this?

> so at least to me the rule in such a situation is clear: if we have the
> choice between two approaches that are useful in similar ways [*] but
> one has a larger flexibility to decrease the total maintainance cost,
> then we _must_ pick that one.

That would assume the choices are mutually exclusive, which you haven't
proven at all.

To put everything in yet another perspective: We have the kernel full of
security hooks, which are likely more invasive than any trace marker ever
will be. These security hooks are well hated by a few developers, but we
merged them anyway, because they are useful.
So the big question is now, why should it be impossible to create and
merge a well defined set of markers, which can be used by any tracer?

bye, Roman

2006-09-17 15:34:05

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Roman Zippel <[email protected]> wrote:

> Ingo, you happily still ignore my primary issues, how serious do you
> expect me to take this?

I did not ignore your new "primary issues", to the contrary. Please read
my replies. To recap, your "primary issues" are:

> The foremost issue is still that there is only limited kprobes
> support.

> The main issue in supporting static tracers are the tracepoints and so
> far I haven't seen any convincing proof that the maintainance overhead
> of dynamic and static tracepoints has to be significantly different.

to both points i (and others) already replied in great detail - please
follow up on them. (I can quote message-IDs if you cannot find them.)

[ Or if it's not these two then let me know if i missed some important
point - it's easy to miss a valid point in a sea of replies.
For example yesterday i have replied to 7 different issues _you_
raised, partly issues where you have questioned my credibility and
competence, so i felt compelled to reply - but still you replied to
none of those mails, only declaring them "secondary" in a passing
reference. If they were secondary then why did you raise them in the
first place? Or do you summarily concede all those points by not
replying to them? And is there any guarantee that you will reply to
any mails i write to you now? Will you declare them "secondary" too
once the argument appears to turn unfavorable to your position? ]

Ingo

2006-09-17 15:40:01

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Frank Ch. Eigler <[email protected]> wrote:

> As for Karim's proposed comment-based markers, I don't have a strong
> opinion, not being one whose kernel-side code would be marked up one
> way or the other. [...]

What makes the difference isnt just the format of markup (although i
fully agree that the least visually intrusive markup format should be
used for static markers, and the range of possibilities includes
comment-based markers too), but what makes the difference is:

the /guarantee/ of a full (comprehensive) set to /static tracers/

The moment we allow a static tracer into the upstream kernel, we make
that guarantee, implicitly and explicitly. (I've expanded on this line
of argument in the previous few mails, extensively.)

Ingo

2006-09-17 16:03:15

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Hi,

On Sun, 17 Sep 2006, Ingo Molnar wrote:

> > The foremost issue is still that there is only limited kprobes
> > support.
>
> > The main issue in supporting static tracers are the tracepoints and so
> > far I haven't seen any convincing proof that the maintainance overhead
> > of dynamic and static tracepoints has to be significantly different.
>
> to both points i (and others) already replied in great detail - please
> follow up on them. (I can quote message-IDs if you cannot find them.)

What you basically tell me is (rephrased to make it more clear): Implement
kprobes support or fuck off! You make it very clear, that you're unwilling
to support static tracers even to point to make _any_ static trace support
impossible. It's impossible to discuss this with you, because you're
absolutely unwilling to make any concessions. What am I supposed to do
than it's very clear to me, that you don't want to make any compromise
anyway? You leave me _nothing_ to work with, that's the main reason I
leave such things unanswered. AFAICT there is nothing I can do about that
than just repeating what I told you already anyway and you'll continue to
ignore it and I'm sick and tired of it.

bye, Roman

2006-09-17 16:57:13

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Roman Zippel <[email protected]> wrote:

> > to both points i (and others) already replied in great detail -
> > please follow up on them. (I can quote message-IDs if you cannot
> > find them.)
>
> What you basically tell me is (rephrased to make it more clear):
> Implement kprobes support or fuck off! [...]

What i am saying (again and again) is: "the other option you suggest is
not acceptable to me because a better solution exists" [for the many
reasons outlined before]. Think about the STREAMS example: there too
_that_ particular approach was rejected, because a better solution
existed. (although it was a _much_ larger body of code that was
rejected)

I'm not "forcing" kprobes on you: you can invent whatever other approach
that solves the problems i and others raised, or you can have your own
separate patchset - this is standard kernel acceptance procedure.
Granted, kprobes is an existing solution with extensive existing
infrastructure, so it's IMO the easiest solution technically, but you
are certainly not 'forced' to do it. You want the feature on your
architecture _without_ kprobes, solve the problems.

> [...] You make it very clear, that you're unwilling to support static
> tracers even to point to make _any_ static trace support impossible.
> It's impossible to discuss this with you, because you're absolutely
> unwilling to make any concessions. [...]

Because we either accept the concept of static tracing or not -
unfortunately there's no meaningful middle ground. I'd love it if there
was some meaningful middle-ground, because then we'd not have this
lengthy discussion at all. But sometimes such situations do happen. Same
was true for STREAMS: the only choice was that either it was accepted or
it was rejected. One cannot get a "little bit pregnant".

The "add some static markups" suggestion is IMO just tactical pretense:
static tracing will only be fully functional once it grows a
comprehensive set of static tracepoints, so once we accept a "little
bit" of static tracing where all the tools are built around a full set
of tracepoints, we've created an expectation to have all of it.

Hence my suggestion: forget static tracing for the LTT engine and
concentrate on dynamic tracepoints with _static markups_. Do you realize
that dynamic tracers can insert _function calls_ into static markups,
today? [and i'm not talking about djprobes here but current existing
SystemTap behavior.]

Ingo
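
[ A purely illustrative sketch of what such a "static markup for dynamic
  tracers" could look like; this is not Ingo's proposed API. The idea is
  an empty, never-inlined hook function per markup that a kprobe or
  SystemTap probe can attach a handler to by symbol name, and that can
  be renamed or dropped without breaking any static tracer. ]

#include <linux/compiler.h>

/*
 * Deliberately empty, never-inlined hook.  A dynamic tracer can place a
 * probe (and thus a full function call) on this symbol; when nothing is
 * attached it costs one call to an empty function.  It defines no event
 * format of its own, so it creates no ABI for static tracers.
 */
void noinline trace_mark_sched_switch(void *prev, void *next)
{
        /* empty on purpose: the symbol itself is the hook */
}

/* hypothetical call site in the scheduler */
static inline void example_switch_hook(void *prev, void *next)
{
        trace_mark_sched_switch(prev, next);
}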

2006-09-17 16:59:39

by Nick Piggin

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Hi,

Roman Zippel wrote:
> Hi,
>
> On Sun, 17 Sep 2006, Ingo Molnar wrote:
>
>
>>>The foremost issue is still that there is only limited kprobes
>>>support.
>>
>>>The main issue in supporting static tracers are the tracepoints and so
>>>far I haven't seen any convincing proof that the maintainance overhead
>>>of dynamic and static tracepoints has to be significantly different.

Above, weren't you asking about static vs dynamic trace-*points*, rather
than the implementation of the tracer itself. I think Ingo said that
some "static tracepoints" (eg. annotation) could be acceptable.

>>to both points i (and others) already replied in great detail - please
>>follow up on them. (I can quote message-IDs if you cannot find them.)
>
>
> What you basically tell me is (rephrased to make it more clear): Implement
> kprobes support or fuck off! You make it very clear, that you're unwilling
> to support static tracers even to point to make _any_ static trace support

Now it seems you are talking about compiled vs runtime inserted traces,
which is different. And so far I have to agree with Ingo: dynamic seems
to be better in almost every way. Implementation may be more complex,
but that's never stood in the way of a better solution before, and I
don't think anybody has shown it to be prohibitive ("I won't implement
it" notwithstanding)

--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com

2006-09-17 17:15:15

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Hi,

* Ingo Molnar ([email protected]) wrote:
>
> * Frank Ch. Eigler <[email protected]> wrote:
>
> > As for Karim's proposed comment-based markers, I don't have a strong
> > opinion, not being one whose kernel-side code would be marked up one
> > way or the other. [...]
>
> What makes the difference isnt just the format of markup (although i
> fully agree that the least visually intrusive markup format should be
> used for static markers, and the range of possibilities includes
> comment-based markers too), but what makes the differen is:
>
> the /guarantee/ of a full (comprehensive) set to /static tracers/
>
> The moment we allow a static tracer into the upstream kernel, we make
> that guarantee, implicitly and explicitly. (I've expanded on this line
> of argument in the previous few mails, extensively.)
>

Ingo, your definition of a static tracer seems to be slightly off from LTTng's
reality in two ways :

First, the kernel tracer supports dynamically loadable "event types", which
makes it quite more flexible than a static tracer that would have to guarantee
a full set of trace points. There is a clear difference between statically
adding instrumentation and statically adding new event types in that forcing a
static set of events would indeed break the user space tools when an event is
added or removed.

Second, the user space analysis tools are built so that they can handle missing
information. So, if they lack things like scheduler change or irq entry/exit
events, they will still show the available information. No "breakage" would
result from a missing probe. Moreover, since the LTTV trace analysis tool is
modular and plugin-based, developers can choose whether or not to load analyses
of the data based on the instrumentation present in the traced kernel.

So there is no guarantee of any full instrumentation set : both instrumentation
and analysis tools are extensible by the users when needed.

Mathieu


OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2006-09-17 17:27:21

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Hi,

On Mon, 18 Sep 2006, Nick Piggin wrote:

> > > > The foremost issue is still that there is only limited kprobes support.
> > >
> > > > The main issue in supporting static tracers are the tracepoints and so
> > > > far I haven't seen any convincing proof that the maintainance overhead
> > > > of dynamic and static tracepoints has to be significantly different.
>
> Above, weren't you asking about static vs dynamic trace-*points*, rather
> than the implementation of the tracer itself. I think Ingo said that
> some "static tracepoints" (eg. annotation) could be acceptable.

No, he made it rather clear, that as far as possible he only wants dynamic
annotations (e.g. via function attributes).

> > What you basically tell me is (rephrased to make it more clear): Implement
> > kprobes support or fuck off! You make it very clear, that you're unwilling
> > to support static tracers even to point to make _any_ static trace support
>
> Now it seems you are talking about compiled vs runtime inserted traces,
> which is different. And so far I have to agree with Ingo: dynamic seems
> to be better in almost every way. Implementation may be more complex,
> but that's never stood in the way of a better solution before, and I
> don't think anybody has shown it to be prohibitive ("I won't implement
> it" notwithstanding)

I don't deny that dynamic tracers are more flexible, but I simply don't
have the resources to implement one. If those who demand I use a dynamic
tracer, would also provide the appropriate funding, it would change the
situation completely, but without that I have to live with the tools
available to me.

bye, Roman

2006-09-17 17:56:52

by Nick Piggin

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Roman Zippel wrote:
> Hi,
>
> On Mon, 18 Sep 2006, Nick Piggin wrote:
>
>
>>Above, weren't you asking about static vs dynamic trace-*points*, rather
>>than the implementation of the tracer itself. I think Ingo said that
>>some "static tracepoints" (eg. annotation) could be acceptable.
>
>
> No, he made it rather clear, that as far as possible he only wants dynamic
> annotations (e.g. via function attributes).

OK we must have him interpreted differently. I won't speak for Ingo,
but he can respond if he likes.

>>Now it seems you are talking about compiled vs runtime inserted traces,
>>which is different. And so far I have to agree with Ingo: dynamic seems
>>to be better in almost every way. Implementation may be more complex,
>>but that's never stood in the way of a better solution before, and I
>>don't think anybody has shown it to be prohibitive ("I won't implement
>>it" notwithstanding)
>
>
> I don't deny that dynamic tracers are more flexible, but I simply don't
> have the resources to implement one. If those who demand I use a dynamic
> tracer, would also provide the appropriate funding, it would change the
> situation completely, but without that I have to live with the tools
> available to me.

You definitely don't have to use a dynamic tracer, nor even implement
one on m68k (that will presumably happen if/when somebody does want a
dynamic tracer enough).

But equally nobody can demand that a feature go into the upstream
kernel. Especially not if there is a more flexible alternative
already available that just requires implementing for their arch.

This shouldn't be surprising, the kernel doesn't have a doctrine of
unlimited choice or merge features because they exist. For example
people wanted pluggable (runtime and/or compile time CPU scheduler
in the kernel. This was rejected (IIRC by Linus, Andrew, Ingo, and
myself). No doubt it would have been useful for a small number of
people but it was decided that it would split testing and development
resources. The STREAMS example is another one.

As an aside, there are quite a number of different types of tracing
things (mostly static, compile out) in the kernel. Everything from
blktrace to various userspace notifiers to lots of /proc/stuff could
be considered a type of static event tracing. I don't know what my
point is other than all these big, disjoint frameworks trying to be
pushed into the kernel. Are there any plans for working some things
together, or is that somebody else's problem?

Nick

--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com

2006-09-17 17:58:43

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

* Alan Cox ([email protected]) wrote:
> In addition ideally we want a mechanism that is also sufficient that
> printk can be mangled into so that you can pull all the printk text
> strings _out_ of the kernel and into the debug traces for embedded work.

Hi,

I just implemented a printk instrumentation that logs the printks into LTTng
traces ASAP in order to keep the time causality correct. It can be found in
LTTng 0.5.112.

Regards,

Mathieu


OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2006-09-17 19:05:52

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Hi,

On Mon, 18 Sep 2006, Nick Piggin wrote:

> But equally nobody can demand that a feature go into the upstream
> kernel. Especially not if there is a more flexible alternative
> already available that just requires implementing for their arch.

I completely agree with you under the condition that these alternatives
were mutually exclusive or conflicting with each other.

> This shouldn't be surprising, the kernel doesn't have a doctrine of
> unlimited choice or merge features because they exist.

Do we have a doctrine which forces us to design a feature in such a way
that it is as difficult as possible to make it available to our users?
In this case it would be very easy to provide some basic functionality via
static tracing and the full functionality via dynamic tracing. Where is
the law that forbids this?

> For example
> people wanted pluggable (runtime and/or compile-time) CPU schedulers
> in the kernel. This was rejected (IIRC by Linus, Andrew, Ingo, and
> myself). No doubt it would have been useful for a small number of
> people but it was decided that it would split testing and development
> resources. The STREAMS example is another one.

Comparing it to STREAMS is an insult and Ingo should be aware of this. :-(

> As an aside, there are quite a number of different types of tracing
> things (mostly static, compile out) in the kernel. Everything from
> blktrace to various userspace notifiers to lots of /proc/stuff could
> be considered a type of static event tracing. I don't know what my
> point is other than all these big, disjoint frameworks trying to be
> pushed into the kernel. Are there any plans for working some things
> together, or is that somebody else's problem?

All the controversy around static tracing in general and LTT in specific
has prevented this so far...

bye, Roman

2006-09-17 19:32:21

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Roman Zippel <[email protected]> wrote:

> > [...] I think Ingo said that some "static tracepoints" (eg.
> > annotation) could be acceptable.
>
> No, he made it rather clear, that as far as possible he only wants
> dynamic annotations (e.g. via function attributes).

what you say is totally and utterly nonsensical misrepresentation of
what i have said. I always said: i support in-source annotations too (I
even suggested APIs how to do them), as long as they are not a total
_guaranteed_ set destined for static tracers, i.e. as long as they are
there for the purpose of dynamic tracers. I dont _care_ about static
annotations as long as they are there for dynamic tracers, because they
can be moved into scripts if they cause problems. But static annotations
for static tracers are much, much harder to remove. Please go on and
read my "tracepoint maintainance models" email:

Message-ID: <[email protected]>

Ingo

2006-09-17 19:40:18

by Mathieu Desnoyers

[permalink] [raw]
Subject: printk instrumentation with LTTng

* Alan Cox ([email protected]) wrote:
> In addition ideally we want a mechanism that is also sufficient that
> printk can be mangled into so that you can pull all the printk text
> strings _out_ of the kernel and into the debug traces for embedded work.
>

Further on, in LTTng 0.5.113, I added the possibility to trace the location
where the printk happened. Within a huge amount of information, this kind of
data identification can be very useful.

Example of a printk as shown from the text dump of a trace :

kernel.printk_locate: 181.713815470 (/tmp/trace2/cpu_0),
4357, 0, insmod, UNBRANDED, 4234, 0x0, SYSCALL,
{ file = "/home/compudj/repository/tests/kernel/test-printk.c",
function = "init_module", line = 14, address = 0xf88eb000 }

kernel.printk: 181.713817590 (/tmp/trace2/cpu_0),
4357, 0, insmod, UNBRANDED, 4234, 0x0, SYSCALL,
{ loglevel = 0, text = { printk message } }

Regards,


Mathieu



OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
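
[ A minimal sketch, for illustration only and not the actual LTTng
  0.5.113 code, of how a printk wrapper can capture the location fields
  shown in the trace above with the standard predefined macros; the
  trace_printk_locate() entry point is hypothetical, and the address
  field of the real event is omitted here. ]

#include <linux/kernel.h>

/* hypothetical tracer entry point that emits the "locate" event */
extern void trace_printk_locate(const char *file, const char *func,
                                unsigned int line);

#define traced_printk(fmt, args...)                                     \
        do {                                                            \
                /* log the call-site location just before the text */   \
                trace_printk_locate(__FILE__, __func__, __LINE__);      \
                printk(fmt, ##args);                                    \
        } while (0)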

2006-09-17 19:57:05

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Hi,

On Sun, 17 Sep 2006, Ingo Molnar wrote:

> > > [...] I think Ingo said that some "static tracepoints" (eg.
> > > annotation) could be acceptable.
> >
> > No, he made it rather clear, that as far as possible he only wants
> > dynamic annotations (e.g. via function attributes).
>
> what you say is totally and utterly nonsensical misrepresentation of
> what i have said. I always said: i support in-source annotations too (I
> even suggested APIs how to do them),

Some consistency would certainly help:
'my suggested API is not "barely usable" for static tracers but "totally
unusable".'

<[email protected]>

2006-09-17 21:04:58

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Roman Zippel <[email protected]> wrote:

> Hi,
>
> On Sun, 17 Sep 2006, Ingo Molnar wrote:
>
> > > > [...] I think Ingo said that some "static tracepoints" (eg.
> > > > annotation) could be acceptable.
> > >
> > > No, he made it rather clear, that as far as possible he only wants
> > > dynamic annotations (e.g. via function attributes).
> >
> > what you say is totally and utterly nonsensical misrepresentation of
> > what i have said. I always said: i support in-source annotations too (I
> > even suggested APIs how to do them),
>
> Some consistency would certainly help: 'my suggested API is not
> "barely usable" for static tracers but "totally unusable".'

I am really sorry that you were able to misunderstand and misrepresent
such a simple sentence. Let me quote the full paragraph of what i said:

> you raise a new point again (without conceding or disputing the point
> we were discussing, which point you snipped from your reply) but i'm
> happy to reply to this new point too: my suggested API is not "barely
> usable" for static tracers but "totally unusable". Did i tell you yet
> that i disagree with the addition of markups for static tracers?

this makes it clear that i disagree with adding static markups for
static tracers - but i of course still agree with static markups for
_dynamic tracers_. The markups would be totally unusable for static
tracers because there is no guarantee for the existence of static
markups _everywhere_: the static markups would come and go, as per the
"tracepoint maintainance model". Do you understand that or should i
explain it in more detail?

Ingo

2006-09-17 21:32:34

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Roman Zippel <[email protected]> wrote:

> > For example people wanted pluggable (runtime and/or compile-time) CPU
> > schedulers in the kernel. This was rejected (IIRC by Linus, Andrew,
> > Ingo, and myself). No doubt it would have been useful for a small
> > number of people but it was decided that it would split testing and
> > development resources. The STREAMS example is another one.
>
> Comparing it to STREAMS is an insult and Ingo should be aware of this.
> :-(

so in your opinion Nick's mentioning of STREAMS is an insult too? I
certainly do not understand Nick's example as an insult. Is STREAMS now
a dirty word to you that no-one is allowed to use as an example in
kernel maintanance discussions?

Let me recap how I mentioned STREAMS for the first time: it was simply
the best example i could think of when you asked the following question:

> > Why don't you leave the choice to the users? Why do you constantly
> > make it an exclusive choice? [...]
>
> [...]
>
> the user of course does not care about kernel internal design and
> maintainance issues. Think about the many reasons why STREAMS was
> rejected - users wanted that too. And note that users dont want
> "static tracers" or any design detail of LTT in particular: what they
> want is the _functionality_ of LTT.

(see <[email protected]> for the full context. Tellingly,
that point of mine you have left unreplied too.)

btw., you still have not retracted or corrected your false suggestion
that "concessions" or a "compromise" were possible and you did not
retract or correct your false accusation that i "dont want to make
them":

> It's impossible to discuss this with you, because you're absolutely
> unwilling to make any concessions. What am I supposed to do than it's
> very clear to me, that you don't want to make any compromise anyway?

while, as i explained it before, such a concession simply does not exist
- so i am not in the position to "make such a concession". There are
only two choices in essence: either we accept a generic static tracer,
or we reject it.

(see <[email protected]>)

Ingo

2006-09-17 21:37:18

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Hi,

On Sun, 17 Sep 2006, Ingo Molnar wrote:

> > Some consistency would certainly help: 'my suggested API is not
> > "barely usable" for static tracers but "totally unusable".'
>
> I am really sorry that you were able to misunderstand and misrepresent
> such a simple sentence.

Considering the context, which is not exactly full of support for static
tracer, I think my understanding was and still is quite correct.
Let's take <[email protected]>, where you suggest converting
as many tracepoints as possible to this API, thus excluding a lot of
information from static tracers.

> this makes it clear that i disagree with adding static markups for
> static tracers - but i of course still agree with static markups for
> _dynamic tracers_. The markups would be totally unusable for static
> tracers because there is no guarantee for the existence of static
> markups _everywhere_: the static markups would come and go, as per the
> "tracepoint maintainance model". Do you understand that or should i
> explain it in more detail?

Well, I rather just wait for the real patch, where you can show your
support for all possible users.

bye, Roman

2006-09-17 21:40:52

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Nick Piggin <[email protected]> wrote:

> As an aside, there are quite a number of different types of tracing
> things (mostly static, compile out) in the kernel. Everything from
> blktrace to various userspace notifiers to lots of /proc/stuff could
> be considered a type of static event tracing. I don't know what my
> point is other than all these big, disjoint frameworks trying to be
> pushed into the kernel. Are there any plans for working some things
> together, or is that somebody else's problem?

AFAIK Jens has indicated interest in seeing experiments that would try
to replace blktrace with dynamic tracepoints, so it's being worked on.

but yes, that would be the general idea: to turn all existing ad-hoc
tracing/debugging points in the kernel into static SystemTap markers or
SystemTap scripts.

Ingo

2006-09-17 21:48:08

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Roman Zippel <[email protected]> wrote:

> > As an aside, there are quite a number of different types of tracing
> > things (mostly static, compile out) in the kernel. Everything from
> > blktrace to various userspace notifiers to lots of /proc/stuff could
> > be considered a type of static event tracing. I don't know what my
> > point is other than all these big, disjoint frameworks trying to be
> > pushed into the kernel. Are there any plans for working some things
> > together, or is that somebody else's problem?
>
> All the controversy around static tracing in general and LTT in
> specific has prevented this so far...

BLKTRACE is a special-purpose tracing facility limited to one subsystem
and written and maintained by the /same/ person (Jens) who maintains
that subsystem. He maintains the subsystem, the tracer and the userspace
tool that extracts the tracer data.

LTT on the other hand is a static tracer that affects _all_ subsystems.
That is a very different situation from a maintainance overhead POV, and
i believe you must know that.

your suggestion that this controversy has prevented consolidation in
this area is baseless and misleading, please correct or retract it.

Ingo

2006-09-17 21:53:36

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Hi,

On Sun, 17 Sep 2006, Ingo Molnar wrote:

> btw., you still have not retracted or corrected your false suggestion
> that "concessions" or a "compromise" were possible and you did not
> retract or correct your false accusation that i "dont want to make
> them":

Sorry, I have nothing to retract and I'm not interested in playing your
word games. :-(

> > It's impossible to discuss this with you, because you're absolutely
> > unwilling to make any concessions. What am I supposed to do than it's
> > very clear to me, that you don't want to make any compromise anyway?
>
> while, as i explained it before, such a concession simply does not exist
> - so i am not in the position to "make such a concession". There are
> only two choices in essence: either we accept a generic static tracer,
> or we reject it.

Wrong, this is about the minimum support, which can be used by both static
and dynamic tracers.

bye, Roman

2006-09-17 22:21:52

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Roman Zippel <[email protected]> wrote:

> > I am really sorry that you were able to misunderstand and
> > misrepresent such a simple sentence.
>
> Considering the context, which is not exactly full of support for
> static tracer, I think my understanding was and still is quite
> correct.

this thought of yours is still false. Nick said:

' I think Ingo said that some "static tracepoints" (eg. annotation)
could be acceptable. '

to which you replied:

' No, he made it rather clear, that as far as possible he only wants
dynamic annotations (e.g. via function attributes). '

That "No" word at the beginning of your sentence, by its plain meaning,
falsely questions Nick's correct interpretation of what I said. I ask
you to retract or correct this false statement.

Nick is of course correct: i said before that some static markups could
be acceptable. In fact, i even outlined a possible API for such static
markups in [email protected]. Would I want to reduce the
number of such static markups: of course, not wanting to reduce the
number of subsystem-functionality unrelated source code lines would be
foolish.

> > this makes it clear that i disagree with adding static markups for
> > static tracers - but i of course still agree with static markups for
> > _dynamic tracers_. The markups would be totally unusable for static
> > tracers because there is no guarantee for the existence of static
> > markups _everywhere_: the static markups would come and go, as per
> > the "tracepoint maintainance model". Do you understand that or
> > should i explain it in more detail?
>
> Well, I rather just wait for the real patch, where you can show your
> support for all possible users.

this answer of yours does not rectify the false statement you made.

Your sentence also introduces a new misrepresentation of my intentions:
my intention with partial static markups (which intention i've written
to you about before, so it was known to you when you wrote this
sentence) is not to support "all possible users", but to support
dynamic tracers. Static tracers cannot use static markups that go away
into dynamic tracing scripts.

Ingo

2006-09-17 22:35:19

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Roman Zippel <[email protected]> wrote:

> On Sun, 17 Sep 2006, Ingo Molnar wrote:
>
> > btw., you still have not retracted or corrected your false suggestion
> > that "concessions" or a "compromise" were possible and you did not
> > retract or correct your false accusation that i "dont want to make
> > them":
>
> Sorry, I have nothing to retract and I'm not interesting in playing
> your word games. :-(

you are wrong if you call my asking you to retract your false suggestion
and false accusation a "word game". It is my basic right to point out
misrepresentations, false statements, false accusations and
misinterpretations when i see them. The sentences i pointed out were not
just opinions, they were materially false statements of yours. But you
are of course free to not retract or correct them (or to not dispute my
characterization of them as such).

Ingo

2006-09-18 08:14:49

by Jes Sorensen

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Karim Yaghmour wrote:
> Jes Sorensen wrote:
> Good. So give me concrete examples of those cases that you saw and tell
> me exactly what those people you were working with were attempting to
> achieve.

I don't have all the details at hand, but it included syscalls and
scheduler points amongst others.

> Either ltt had a userbase or it didn't. To say that all its users went
> out and added their own tracepoints is to not know enough about the project
> and so too is it to say that none of its users could actually just use
> it out of the box without modifying it. Now, as an outsider, trying to
> measure how many users were using it without modifying it is like
> trying to figure out how many Linux users there are out there. There's
> a silent majority and there's those that need customization. Guess
> who you've been talking to?

Or maybe people start looking at it not knowing whether they want to
pursue it to the end for their product.

> Strange, come to think of it I don't remember *ever* getting an
> email from you while being the maintainer or seeing *any* emails by you
> on the ltt lists -- that's indicative of mindset, namely that you
> personally assumed you knew all about tracing and didn't need us to make
> suggestions to help you AND that you personally never found it relevant
> to contribute back.

There's a word for that: *plonk*

Maybe the code was used to evaluate it as an option, maybe they realized
it wasn't worth using in the end, maybe they decided they could make it
work. Maybe the LTT mailing list had been *dead* for 18 months by the
time? You know, reading C code isn't that hard, and it didn't state
anywhere in the LTT license that one is required to take out a paying
contract with a certain Mr. Yaghmour just to be allowed to compile the
code.

> The semantics are primitive at this stage, and they could definitely
> benefit from lkml input, but essentially we have a build-time parser
> that goes around the code and automagically does one of two things:
> a) create information for binary editors to use
> b) generate an alternative C file (foo-trace.c) with inlined static
> function calls.

You intend to handle inline assembly how? You plan to handle the issue
of debugging the code when the markup is present how?

> And there might be other possibilities I haven't thought of.
>
> This beats every argument I've seen to date on static instrumentation.
> Namely:
> - It isn't visually offensive: it's a comment.
> - It's not a maintenance drag: outdated comments are not alien.
> - It doesn't use weird function names or caps: it's a comment.
> - There is precedent: kerneldoc.
> And it does preserve most of the key things those who've asked for
> static markup are looking for. Namely:
> - Static instrumentation
> - Mainline maintainability
> - Contextualized variables

And it doesn't address the following issues:

a) The static community providing actual evidence that dynamic tracing
is noticeably slower.
b) It will not be enabled per default in vendor kernels so in practice
the information will not be available anywhere, only in debug
kernels.
c) The point that we will end up with markups all over the place to
satisfy everybody's needs.

>> The other part is the constantly repeated performance claim, which to
>> this point hasn't been backed up by any hard evidence. If we are to take
>> that argument serious, then I strongly encourage the LTT community to
>> present some real numbers, but until then it can be classified as
>> nothing but FUD.
>
> Hmm... beats me why even the systemtap folks would themselves admit
> to performance limitations.

Everything has performance limitations, you keep running around touting
that static is the only thing thats not a problem. Now show us the
numbers!

>> I shall be the first to point out that kprobes are less than ideal,
>> especially the current ia64 implementation suffers from some tricky
>> limitations, but thats an implementation issue.
>
> Ah, so it's ok for kprobes to have implementation issues, but not ltt.
> Somehow there's this magic thought recurring throughout this thread
> that the limitations of dynamic instrumentation are trivial to fix,
> but those of static instrumentation are unrecoverable. *That* is a
> fallacy if I ever saw one. I'm willing to admit that a combination of
> dynamic editing and static instrumentation is a good balance, but Jes
> please drop this discourse, it's not constructive.

Oh so bringing fact into a discussion is not allowed. Karim, maybe you
should try using some real arguments. What I am saying about the ia64
implementation is that there are limitations but I am also saying they
can be fixed, it's an implementation issue, not a problem with the
concept.

The problems pointed out with LTT are *conceptual*, but of course you
keep ignoring the facts and refusing to provide real numbers.

Says it all really ....

Jes

2006-09-18 08:16:35

by Jes Sorensen

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Mathieu Desnoyers wrote:
> And about those extra cycles.. according to :
> Documentation/kprobes.txt
> "6. Probe Overhead
>
> On a typical CPU in use in 2005, a kprobe hit takes 0.5 to 1.0
> microseconds to process. Specifically, a benchmark that hits the same
> probepoint repeatedly, firing a simple handler each time, reports 1-2
> million hits per second, depending on the architecture. A jprobe or
> return-probe hit typically takes 50-75% longer than a kprobe hit.
> When you have a return probe set on a function, adding a kprobe at
> the entry to that function adds essentially no overhead.
[snip]
> So, 1 microsecond seems more like 1500-2000 cycles to me, not 50.

So call it 2000 cycles, now go measure it in *real* life benchmarks
and not some artificial "I call this one syscall that hits the probe
every time in a tight loop" kinda thing.

Show us some *real* numbers please.

Jes

2006-09-18 08:18:45

by Jes Sorensen

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Karim Yaghmour wrote:
> Ingo Molnar wrote:
>> yes, location very much matters if someone wants to reproduce the
>> numbers.
>
> Was that really the angle? I'll give you the benefit of the doubt.
> But I'm sure you understand the importance of probe placement
> with regards to impact of performance ...

So now you produce a benchmark, then won't allow someone to reproduce
it ..... do we see a pattern here?

Jes

2006-09-18 08:21:36

by Jes Sorensen

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Karim Yaghmour wrote:
> Mathieu Desnoyers wrote:
>> The bottom line is :
>>
>> LTTng impact on the studied phenomenon : 35% slower
>>
>> LTTng+kprobes impact on the studied phenomenon : 73% slower
>>
>> Therefore, I conclude that on this type of high event rate workload, kprobes
>> doubles the tracer impact on the system.
>
> Amen to that. Hopefully this puts to rest the myth of Mr. Scrub.

If it wasn't because it's so sad, this would be hysterically funny.

Jes


2006-09-18 08:34:10

by Jes Sorensen

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Mathieu Desnoyers wrote:
> The bottom line is :
>
> LTTng impact on the studied phenomenon : 35% slower
>
> LTTng+kprobes impact on the studied phenomenon : 73% slower
>
> Therefore, I conclude that on this type of high event rate workload, kprobes
> doubles the tracer impact on the system.

For this specific benchmark, for which we have not seen the code, nor
do we know what system configuration it was run on. Sorry, but even M$'s
sham benchmarks generally tell you which system they used for their
tests.

In addition, some profiling would be interesting so we can see exactly
where things go wrong and fix it. Ingo seems to be doing a good job at
that even without you providing this basic info....

Anyway, despite what Karim likes to claim, this *is* the Linux way!
Things don't get fixed if they are not reported broken and when they
are, whoever is interested in the item will try and fix it. We are not
going to cease Linux kernel development just to please Karim.

The point of this discussion is that the concept of dynamic tracing is
the way to go. If the code isn't 100% there today, then it should be
fixed, thats *not* an excuse to add a lot of cruft based on the wrong
design when we know which path to take. I know it's hard for someone
to accept when he's thrown so much personal time into a project, but as
Ingo keeps saying, there is a lot of value in LTT, the actual markup
isn't the big issue.

Jes

2006-09-18 08:44:10

by Jes Sorensen

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Roman Zippel wrote:
> Hi,
>
> On Mon, 18 Sep 2006, Nick Piggin wrote:
>
>> But equally nobody can demand that a feature go into the upstream
>> kernel. Especially not if there is a more flexible alternative
>> already available that just requires implementing for their arch.
>
> I completely agree with you under the condition that these alternatives
> were mutually exclusive or conflicting with each other.

Roman,

I don't get this, you are arguing that we should put it in because it
doesn't do any damage. First of all it does, by adding a lot of clutter
all over the place. Second, if we take that argument, then we should
allow anybody to put in anything they want, are you also suggesting we
put devfs back in?

Point is that the Linux kernel gets so many proposals, some are good
some are bad and some while maybe looking like a good idea at the
beginning, show out later to be a bad idea - LTT falls into this
category. *However*, it doesn't mean the knowledge and tools that were
developed with LTT are bad or useless.

To take another related project, look at relayfs. There was so much
noise about it when it was initially pushed, yuck I even remember how it
was suggested that printk should be implemented via relayfs. But look at
it now, there is no fs/relayfs/* these days. The kernel moved on, used
the knowledge obtained and provided the feature in a better way -
exactly like it is being proposed to do for trace points, by using
dynamic probes.

Jes

2006-09-18 08:58:12

by Jes Sorensen

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Karim Yaghmour wrote:
> It is for some. And please stop repeating the syscall path stuff. It can
> be solved elegantly. The fact that it hasn't up to this point is only an
> excuse to keep working harder on it. There is, in fact, no reason that
> the solution may not just be a combination of static markup and dynamic
> modification.

You just don't want to listen, this is *not* a question of a modifiable
table or not. It's a question of *how* code needs to be added to the
syscall path, we both know why a modifiable table is not going to
happen. How do you plan to handle vdso based syscalls with LTT?

>> In fact, the users who wish to trace data in self-compiled kernels are a
>> tiny subset of the potential userbase for this stuff which is primarily
>> useful to developers .... which in terms makes your argument about debug
>> tracepoints irrelevant since you are turning all the tracepoints into
>> debug tracepoints :)
>
> How many embedded Linux projects did you personally work on?

You know what, I give up. Your primary interest seems to be in attacking
people personally because they didn't start out jumping up and down
clapping their hands in support of your pet project. Even if I wanted to
I couldn't tell you about the number of different projects I have
worked on, partly because I can't remember half of them, partly because of
contract limitation, and most importantly because I do not need to
justify my experience to you.

Jes

Subject: Re: [patch] kprobes: optimize branch placement

On Sun, Sep 17, 2006 at 01:30:38AM +0200, Ingo Molnar wrote:
>
> * Andrew Morton <[email protected]> wrote:
>
> > On Sat, 16 Sep 2006 22:43:42 +0200
> > Ingo Molnar <[email protected]> wrote:
> >
> > > --- linux.orig/arch/i386/kernel/kprobes.c
> > > +++ linux/arch/i386/kernel/kprobes.c
> > > @@ -354,9 +354,8 @@ no_kprobe:
> > > */
> > > fastcall void *__kprobes trampoline_handler(struct pt_regs *regs)
> > > {
> > > - struct kretprobe_instance *ri = NULL;
> > > - struct hlist_head *head;
> > > - struct hlist_node *node, *tmp;
> > > + struct kretprobe_instance *ri = NULL, *tmp;
> > > + struct list_head *head;
> > > unsigned long flags, orig_ret_address = 0;
> > > unsigned long trampoline_address =(unsigned long)&kretprobe_trampoline;
> >
> > Wanna fix the whitespace wreckage while you're there??
>
> will do. If you consider this for -mm then there's some djprobes noise
> in the patch [djprobes isnt upstream yet] - it's not completely
> sanitized yet. (but it should actually work if applied to upstream -
> kprobes and djprobes are disjunct.) Also, i havent tested with
> CONFIG_KPROBES turned off, etc. I'll do a clean queue.

Also, the hlist->list changes need to be taken care of for the other
archs too.

> > i386's kprobe_handler() appears to forget to reenable preemption in
> > the if (p->pre_handler && p->pre_handler(p, regs)) case?
>
> that portion seems a bit tricky - i think what happens is that the
> pre_handler() sets stuff up for single-stepping, and then we do this
> recursive single-stepping (during which preemption remains disabled),
> and _then_ do we re-enable preemption.

Well, that is the jprobes and return probes case. In the case of normal
kprobes, p->pre_handler() should always return 0.

In the case of a jprobe, the setjmp_pre_handler() resets the instruction
pointer to the instrumented routine (same signature as the routine being
jprobed), which later does a jprobe_return(), a placeholder for the
arch-specific trap instruction. We re-enter the kprobe_handler here and
then re-enable preemption via the longjmp_break_handler. As for the
return probe case, since the underlying instruction originally was a nop
(kretprobe_trampoline), we don't need to single-step.

Yes, it's a bit convoluted, but we are currently covered for all cases.

Ananth
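
To make the flow above concrete, a jprobe of that kind is set up roughly as
follows; this is only an illustrative sketch modelled on the
Documentation/kprobes.txt example of the time (the do_fork signature, the
kallsyms lookup and the module boilerplate are assumptions, not code from
this thread):

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/kprobes.h>
#include <linux/kallsyms.h>

/* Same signature as the probed routine (do_fork is just an example).
 * The handler runs with the probed function's arguments; jprobe_return()
 * raises the trap that hands control back to the kprobes core via the
 * longjmp_break_handler path described above. */
static long my_do_fork(unsigned long clone_flags, unsigned long stack_start,
                       struct pt_regs *regs, unsigned long stack_size,
                       int __user *parent_tidptr, int __user *child_tidptr)
{
        printk(KERN_DEBUG "fork: clone_flags=0x%lx\n", clone_flags);
        jprobe_return();        /* never returns normally */
        return 0;               /* not reached */
}

static struct jprobe my_jprobe = {
        .entry = JPROBE_ENTRY(my_do_fork),
};

static int __init jprobe_example_init(void)
{
        my_jprobe.kp.addr =
                (kprobe_opcode_t *)kallsyms_lookup_name("do_fork");
        if (!my_jprobe.kp.addr)
                return -EINVAL;
        return register_jprobe(&my_jprobe);
}

static void __exit jprobe_example_exit(void)
{
        unregister_jprobe(&my_jprobe);
}

module_init(jprobe_example_init);
module_exit(jprobe_example_exit);
MODULE_LICENSE("GPL");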

2006-09-18 14:47:10

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Hi Jes,

* Jes Sorensen ([email protected]) wrote:
> Everything has performance limitations, you keep running around touting
> that static is the only thing thats not a problem. Now show us the
> numbers!
>

If I may : I showed in a previous thread that kprobes impact doubled LTTng's
impact on the system. If you are interested in numbers about LTTng, here they
are :

"The LTTng tracer : A Low Impact Performance and Behavior Monitor for GNU/Linux"
(OLS2006)
http://ltt.polymtl.ca/papers/desnoyers-ols2006.pdf

(and for Ingo : I haven't rerun the tests on your modified kprobes, it will
come in time. But I do not really expect that 30-50 cycles compared to 1500
will make a very big difference.)

Mathieu

OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2006-09-18 14:57:42

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Jes Sorensen <[email protected]> wrote:

> >> tiny subset of the potential userbase for this stuff which is primarily
> >> useful to developers .... which in turn makes your argument about debug
> >> tracepoints irrelevant since you are turning all the tracepoints into
> >> debug tracepoints :)
> >
> > How many embedded Linux projects did you personally work on?
>
> You know what, I give up. Your primary interest seems to be in
> attacking people personally because they didn't start out jumping up
> and down clapping their hands in support of your pet project. [...]

i'm giving up on Karim too. I did apologize to Karim for the mistake i
did in this thread-of-200-mails, but it's revolting to see that Karim
still goes on and attacks top Linux contributors like you, without
looking back, without apologizing for anything and without feeling any
remorse. Karim patronized, attacked and insulted various people dozens
of times in this thread alone. I just dont see any value in trying to
"work with" Karim anymore, because it's apparently not something he is
interested in doing. I feel a bit sorry for him too, because at heart he
must be a deeply lonely person.

( I do see value in working with Mathieu, who has shown lot of insight,
patience, ability in cleaning up the LTT codebase and producing LTTng.
I dont envy him for having to work with Karim though. LTTng still
needs alot of work to be upstream-acceptable but my current impression
is that Mathieu's fundamentally professional approach will be
successful. )

> > How many embedded Linux projects did you personally work on?
> >
> [...] Even if I wanted to I couldn't tell you about the number of
> different projects I have worked on, partly because I can't remember half
> of them, partly because of contract limitation, and most importantly
> because I do not need to justify my experience to you.

you dont need to justify your experience to Karim. Your countless
contributions to the Linux kernel speak for themselves. Most tellingly,
his boasting aside, the only embedded-related Linux kernel contribution
i have ever seen from Karim was the 1000-lines relayfs code - and even
that code took years for Tom Zanussi to clean up and to get upstream.
Besides that i have not seen a single line of code from Karim - not a
single patch, not a oneliner fix, nothing. So if someone needs to prove
his experience in embedded Linux matters on this forum then it's Karim.

Ingo

2006-09-18 14:58:51

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

* Jes Sorensen ([email protected]) wrote:
> Mathieu Desnoyers wrote:
> > And about those extra cycles.. according to :
> > Documentation/kprobes.txt
> > "6. Probe Overhead
> >
> > On a typical CPU in use in 2005, a kprobe hit takes 0.5 to 1.0
> > microseconds to process. Specifically, a benchmark that hits the same
> > probepoint repeatedly, firing a simple handler each time, reports 1-2
> > million hits per second, depending on the architecture. A jprobe or
> > return-probe hit typically takes 50-75% longer than a kprobe hit.
> > When you have a return probe set on a function, adding a kprobe at
> > the entry to that function adds essentially no overhead.
> [snip]
> > So, 1 microsecond seems more like 1500-2000 cycles to me, not 50.
>
> So call it 2000 cycles, now go measure it in *real* life benchmarks
> and not some artificial I call this one syscall that hits the probe
> every time in a tight loop, kinda thing.
>
> Show us some *real* numbers please.
>

You are late (I don't blame you about it, considering the size of this thread).
It has been posted in the following email :

http://linux.derkeiler.com/Mailing-Lists/Kernel/2006-09/msg04492.html

Mathieu


OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2006-09-18 15:06:25

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

* Jes Sorensen ([email protected]) wrote:
> Mathieu Desnoyers wrote:
> > The bottom line is :
> >
> > LTTng impact on the studied phenomenon : 35% slower
> >
> > LTTng+kprobes impact on the studied phenomenon : 73% slower
> >
> > Therefore, I conclude that on this type of high event rate workload, kprobes
> > doubles the tracer impact on the system.
>
> For this specific benchmark, for which we have not seen the code, nor
> do we know what system configuration it was run on. Sorry, but even M$'s
> sham benchmarks generally tell you which system they used for their
> tests.
>
> In addition, some profiling would be interesting so we can see exactly
> where things go wrong and fix it. Ingo seems to be doing a good job at
> that even without you providing this basic info....
>

Hi Jes,

I did not repeat my system configuration from the previous email in the thread
as it seemed redundant. Ingo asked me politely to tell more about my config
and tests, which I have done. Please read on further down this thread to get
that information.

Mathieu

OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2006-09-18 15:16:35

by Karim Yaghmour

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


Trust me, I don't intend to drag this any longer. I just want to make
sure this issue of "respect" is cleared up.

Ingo Molnar wrote:
> i'm giving up on Karim too. I did apologize to Karim for the mistake i
> did in this thread-of-200-mails, but it's revolting to see that Karim
> still goes on and attacks top Linux contributors like you, without
> looking back, without apologizing for anything and without feeling any
> remorse.

If there exists a cult where top contributors are to be venerated, then
I'm not part of it. If my calling individuals to account on their supposed
expertise on tracing, which they use as justification for continued
marginalization of such related projects, has generated so much backlash,
then it is for me but a sign of how entrenched arrogance can be in some
quarters.

Don't get me wrong, I have immense respect for the collective talent of
kernel developers. But no matter how broad collective talent can be, it
cannot be omniscient.

> Karim patronized, attacked and insulted various people dozens
> of times in this thread alone. I just dont see any value in trying to
> "work with" Karim anymore, because it's apparently not something he is
> interested in doing. I feel a bit sorry for him too, because at heart he
> must be a deeply lonely person.

Ditto.

> single patch, not a oneliner fix, nothing. So if someone needs to prove
> his experience in embedded Linux matters on this forum then it's Karim.

http://www.oreilly.com/catalog/belinuxsys/

Karim
--
President / Opersys Inc.
Embedded Linux Training and Expertise
http://www.opersys.com / 1.866.677.4546

2006-09-18 15:25:18

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


* Mathieu Desnoyers <[email protected]> wrote:

> You are late (I don't blame you about it, considering the size of this
> thread). It has been posted in the following email :
>
> http://linux.derkeiler.com/Mailing-Lists/Kernel/2006-09/msg04492.html

yeah - and i dont think the kprobes overhead is a fundamental thing - i
posted a few kprobes-speedup patches as a reply to your measurements.

Ingo

2006-09-18 16:54:50

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

* Ingo Molnar ([email protected]) wrote:
>
> * Mathieu Desnoyers <[email protected]> wrote:
>
> > You are late (I don't blame you about it, considering the size of this
> > thread). It has been posted in the following email :
> >
> > http://linux.derkeiler.com/Mailing-Lists/Kernel/2006-09/msg04492.html
>
> yeah - and i dont think the kprobes overhead is a fundamental thing - i
> posted a few kprobes-speedup patches as a reply to your measurements.
>

Hi Ingo,

Yes, and I replied that I really don't think that a few cycles saved here and
there by a predicted branch will change anything significant compared to the
int3 cost. As my test bench is really not that hard to deploy (I have given the
precise instructions to do so), I assume that the burden of proof is on your
side there.

Anyhow, I prefer to move to a more constructive matter than testing kprobes
branch optimisations.

Mathieu

OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2006-09-18 17:06:30

by Martin Bligh

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

> And it doesn't address the following issues:
>
> a) The static community providing actual evidence that dynamic tracing
> is noticeably slower.

...

> Everything has performance limitations, you keep running around touting
> that static is the only thing thats not a problem. Now show us the
> numbers!

When comparing two different approaches to a problem, it is unreasonable
and disingenuous to try to force the onus on the proponents of one
particular approach to do all the benchmarking for both sides. Everybody
has to help try to find the correct solution.

Furthermore, Mathieu already did provide numbers, if you go back and
look.

> The problems pointed out with LTT are *conceptual*, but of course you
> keep ignoring the facts and refusing to provide real numbers.

This is getting very silly, and unnecessarily abusive. Real problems
exist on both sides of the fence, which have been discussed ad nauseam.
If you don't recall them, then go back and read the thread again. The
question is how to strike a compromise between two different sets of
problems, which Ingo and Karim actually seemed to be making progress
on towards the end of the thread.

M.

2006-09-19 12:00:21

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

On Thu, Sep 14, 2006 at 01:27:18PM +0200, Ingo Molnar wrote:
>
> * Mathieu Desnoyers <[email protected]> wrote:
>
> > Following an advice Christoph gave me this summer, submitting a
> > smaller, easier to review patch should make everybody happier. Here is
> > a stripped down version of LTTng : I removed everything that would
> > make the code review reluctant (especially kernel instrumentation and
> > kernel state dump module). I plan to release this "core" version every
> > few LTTng releases and post it to LKML.
> >
> > Comments and reviews are very welcome.
>
> i have one very fundamental question: why should we do this
> source-intrusive method of adding tracepoints instead of the dynamic,
> unintrusive (and thus zero-overhead) KProbes+SystemTap method?

Coming a little late to this thread because I've been travelling the last
three weeks, I'll answer here before wading through hundreds of mails.

I'll categorize tracing methods into a few categories:

a) static and in-line

These are tracepoints directly in the kernel source, always compiled
in (or under a CONFIG option). We have various ad-hoc tracers of
this type already in the kernel, e.g. blktrace or xfs's ktrace

b) dynamic and in-line (markers)

These are in-line but normally don't do anything in the code except
maybe adding a nop. We currently don't support this at all.

c) dynamic and out-of-line

These are maintained as external modules or things that need to be
translated to modules. We have various low-level mechanisms to
implement the hooking up of those currently (*probes) but no other
infrastructure in the kernel to help with those. There's an external
project, systemtap, which supports probes like those but has a bunch
of problems:

- it doesn't allow writing scripts in C but only in some odd scripting
language
- it doesn't actually put support code into the kernel tree but keeps
it separate, not allowing probes to be kept with the kernel either.
In addition it also needs quite frequent updates because it has to
poke deep into kernel internals by its nature.

So what's the right way of tracing for us? I'd say pretty clearly all three,
and most importantly we need a common infrastructure for all of those.

The most important bit we need right now is a reliable framework to transfer
trace data to userspace - once we have that we support a) and a subset of
b) above. LTT might be that missing bit, but I'd need to look at the actual
patches to see if it's suitable. b) is something people have talked about
a lot and we've seen lots of prototypes, in my eyes it's the second priority.

But even after that the way we support c) is very rudimentary - we need
helpers to look at data, put probes at points outside of function entry/
return, we need things like a DWARF parser, and so on.

I think the systemtap approach of the external package is the very last
thing we need. Unlike what you said elsewhere, having the tracepoints externally
does not eliminate maintenance overhead - it shifts it to someone else.
Shifting maintenance overhead to someone else is a valid concept in the
linux kernel development, we do this all the time for things we don't care
about. I think it's fundamentally wrong for traces, though. Traces are
very important for debugging complex problems, and I've grown very tired
of maintaining all my ad-hoc scripts. Having them in the kernel tree,
or having traces that are static in nature inline, would allow and force
kernel developers to always keep them up to date with the kernel's changes.
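
To make a) concrete: an always-compiled ad-hoc tracer of the blktrace/ktrace
kind usually amounts to little more than a ring buffer plus a logging call
sprinkled over the code of interest. A minimal sketch, with made-up names
rather than anything taken from those tracers:

#include <linux/kernel.h>
#include <linux/spinlock.h>
#include <linux/types.h>
#include <asm/timex.h>

#define MYTRACE_ENTRIES 1024

struct mytrace_entry {
        cycles_t      ts;       /* timestamp, in cycles */
        unsigned int  event;    /* event identifier */
        unsigned long data;     /* one word of payload */
};

static struct mytrace_entry mytrace_buf[MYTRACE_ENTRIES];
static unsigned int mytrace_idx;
static DEFINE_SPINLOCK(mytrace_lock);

/* Always compiled in; callers just drop mytrace(...) at interesting
 * points in the code they want to trace. */
static void mytrace(unsigned int event, unsigned long data)
{
        unsigned long flags;

        spin_lock_irqsave(&mytrace_lock, flags);
        mytrace_buf[mytrace_idx].ts    = get_cycles();
        mytrace_buf[mytrace_idx].event = event;
        mytrace_buf[mytrace_idx].data  = data;
        mytrace_idx = (mytrace_idx + 1) % MYTRACE_ENTRIES;
        spin_unlock_irqrestore(&mytrace_lock, flags);
}

Dumping the buffer to userspace (via proc, debugfs or relay) is left aside
here; that is exactly the common infrastructure piece discussed above.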

2006-09-19 12:05:49

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

On Thu, Sep 14, 2006 at 04:46:21PM -0400, Karim Yaghmour wrote:
> Ideally, though, markers should be self-contained. IOW, the person
> implementing such a marker should not need to edit any other file
> that the one being worked on to add an instrumentation point --
> at least that's the way I think is easiest. What this means is that
> you would be able to add an instrumentation point in the kernel,
> build it, run the tracing and view the trace with your new event
> without any further intervention on any tool, header, or anything
> else.

Just in case my first mail on this subject wasn't clear enough I
completely agree with that statement. Complex traces detached from
the actual source code are an utter maintenance nightmare and should
be avoided for anything but spontaneous debugging. For that case they
are of course immensely useful. Thus we need two forms to specify
probes, and to not make the tracing an utter mess they need to share
as much infrastructure as possible.

2006-09-19 12:08:55

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

On Thu, Sep 14, 2006 at 01:55:44PM -0700, Martin Bligh wrote:
> 1. They're harder to maintain out of tree.
> 2. they're written in some gibberish awk crap
> 3. They're slower. If you're doing thousands of tracepoints a second,
> into a circular 8GB log buffer, that *does* matter. You want
> to perturb what you're measuring as little as possible.

agreed to all these and I'd like to add:

4. If you merge proper dynamic tracing infrastructure you get static
traces for free. It's just a bunch of macros directly calling
the trace function also used by the dynamic tracing code, maybe
keyed off an enable variable.
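
As a rough sketch of what that could look like, assuming a hypothetical
trace_event() entry point shared with the dynamic tracing code (none of
these names come from an existing implementation):

/* Hypothetical shared entry point, also called by the dynamic probes. */
extern void trace_event(int event_id, const void *data, size_t len);

/* Flipped at runtime, e.g. through a sysfs or proc knob. */
extern int trace_enabled;

/* The static tracepoint reduces to a test of the enable variable plus
 * a call into the same trace function the dynamic code uses. */
#define TRACE_STATIC(id, data, len)                             \
        do {                                                    \
                if (unlikely(trace_enabled))                    \
                        trace_event((id), (data), (len));       \
        } while (0)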

2006-09-19 12:29:56

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

On Fri, Sep 15, 2006 at 09:10:44PM +0200, Roman Zippel wrote:
> Hi,
>
> On Fri, 15 Sep 2006, Ingo Molnar wrote:
>
> > > On Fri, 2006-09-15 at 13:08 -0400, Frank Ch. Eigler wrote:
> > > > Alan Cox <[email protected]> writes:
> > > > - where 1000-cycle int3-dispatching overheads too high
> > >
> > > Why are your despatching overheads 1000 cycles ? (and if its due to
> > > int3 why are you using int 3 8))
> >
> > this is being worked on actively: there's the "djprobes" patchset, which
> > includes a simplified disassembler to analyze common target code and can
> > thus insert much faster, call-a-trampoline-function based tracepoints
> > that are just as fast as (or faster than) compile-time, static
> > tracepoints.
>
> Who is going to implement this for every arch?
> Is this now the official party line that only archs, which implement all
> of this, can make use of efficient tracing?

Come on, stop trying to be an asshole. It's always been the case that to
use new functionality you have to add arch code where necessary.

2006-09-19 13:18:56

by Roman Zippel

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Hi,

On Tue, 19 Sep 2006, Christoph Hellwig wrote:

> > Who is going to implement this for every arch?
> > Is this now the official party line that only archs, which implement all
> > of this, can make use of efficient tracing?
>
> Come on, stop trying to be an asshole. It's always been the case that to
> use new functionality you have to add arch code where necessary.

On the contrary I'm really trying my best to be reasonable.
If there were no way around implementing kprobes, I would completely agree
with you.

Let's take an item from my todo list: TLS support for m68k. This is a language
feature that is becoming more and more important and increasingly difficult to
work around. Considering the complexities of this feature it will take
quite a bit of the time available to me and somehow I doubt someone will
beat me to it. I'm not complaining about it, I even enjoy hacking on it,
but I also don't have to take any shit about how I have to spend my time.

Considering this I hope you understand how important kprobes are to me; I
admit it's a nice feature, but it's far from being essential.

bye, Roman

2006-09-19 15:05:58

by Jose R. Santos

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Mathieu Desnoyers wrote:
> > Trace event headers are very similar between both LTT and LKET which is
> > good in order to get some synergy between our projects. One thing that
> > LKET has on each trace event that LTT doesn't is the tid and CPU id of
> > each event. We find this extremely useful for post-processing. Also,
> > why is the event_size taken on every event? Why not describe the
> > event during the trace header and remove this redundant information from
> > the event header and save some trace file space.
> >
>
> A standard event header has to have only crucial information, nothing more, or
> it becomes bloated and quickly grow trace size. We decided not to put tid and
> CPU id in the event header because tid is already available with the schedchange
> events at post-processing time and CPU id is already available too, as we have
> per CPU buffers.
>

We still keep the CPU id because LKET still supports ASCII tracing, which
mixes the output of all the CPUs together. It is still debatable
whether this is a useful feature or not though. If we remove ASCII
event tracing from LKET, we could remove CPU id from the event header as
well.

The tid we still include because LKET supports turning on individual
tracepoints, unlike LTT, which if I remember correctly turns on all the
tracepoints that are compiled into the running kernel. Since the user is
free to choose which tracepoints he wants to use for his workload, we
cannot guarantee that scheduler tracepoints are going to be available. We
consider the tid one of those absolute minimum pieces of data
required to do meaningful analysis.

We chose to control performance and trace output size by letting users
control the number of tracepoints they can activate at any given time.
This is important to us since we plan to add many dynamic tracepoints to
different sub-systems (filesystem, device drivers, core kernel
facilities, etc...). Turning on all of these tracepoints at the same
time would slow down the system too much and change the performance
characteristics of the environment being studied.
> The event size is completely unnecessary, but in reality very, very useful to
> authenticate the correspondence between the size of the data recorded by the
> kernel and the size of data the viewer thinks it is reading. Think of it as a
> consistency check between kernel and viewer algorithms.
>

I understand. But if the size of each event is fixed, why would you
expect the data sizes that the tool reports in the trace header for each
event to change over the course of a trace? If the data on the per-CPU
buffers is serialized, a similar authentication could be done using the
timestamp by checking the timestamps of the events before and after the
current event, thus validating the current timestamp as well as the size
offset of the previous event. Just a thought.

-JRS

2006-09-19 15:30:09

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

* Jose R. Santos ([email protected]) wrote:
> Mathieu Desnoyers wrote:
> >A standard event header has to have only crucial information, nothing
> >more, or
> >it becomes bloated and quickly grow trace size. We decided not to put tid
> >and
> >CPU id in the event header because tid is already available with the
> >schedchange
> >events at post-processing time and CPU id is already available too, as we
> >have
> >per CPU buffers.
> >
>
> We still keep the CPU id because LKET still support ASCII tracing which
> mixes the output of all the CPUs together. It is still debatable
> whether this is a useful feature or not though. If we remove ASCII
> event tracing from LKET, we could remove CPU id from the event header as
> well.
>

How hard would it be to make LKET send its ASCII output to multiple "channels"
(buffers) and then fetch and combine them in user space ? Have a look at lttd
and lttv in the ltt-control package from the LTTng project : it would be
trivial to adapt. In fact, there is already a text dump module available.

> The tid we still include because LKET supports turning on individual
> tracepoints unlike LTT, which if I remember correctly turns on all the
> tracepoint that are compiled into the running kernel. Since the user is
> free to chose which tracepoints he wants to use for his workload, we can
> not guarantee that scheduler tracepoints are going to be available. We
> consider taking the tid as one of those absolute minimum pieces of data
> required to do meaningful analysis.
>

I understand, but it does not have to be included in the bare-bones event
header. We could think of an optional "event context" header that would have its
individual parts enabled or not depending on the events recorded in the trace.
For instance :

With scheduler instrumentation activated :

Event Header | Variable data

Without scheduler instrumentation activated :

Event Header | PID | Variable data

The information about whether or not the optional event context is present
could be saved in the trace header.

This way, we avoid adding unnecessary data when it is not needed. And
furthermore, this is extensible to other event context information.
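
A possible encoding of such an optional context, sketched here with guessed
field names and sizes rather than the actual LTTng or LKET layouts:

#include <linux/types.h>

/* Recorded once in the trace header: which optional context fields
 * every event record in this trace carries. */
#define TRACE_CTX_TID   (1 << 0)
#define TRACE_CTX_CPU   (1 << 1)

struct event_header {           /* always present */
        u32 timestamp;
        u16 event_id;
        u16 size;               /* total size of the record */
};

struct event_context {          /* present only if flagged in the trace header */
        u32 tid;                /* if TRACE_CTX_TID is set */
        u32 cpu;                /* if TRACE_CTX_CPU is set */
};

/* On-disk layout of one record:
 *      event_header [event_context] variable data
 */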

> We chose to control performance and trace output size by letting users
> have control of number of tracepoint he can activate at any given time.
> This is important to us since we plan to add many dynamic tracepoints to
> different sub-systems (filesystem, device drivers, core kernel
> facilities, etc...). Turning on all of these tracepoint at the same
> time would slow down the system to much and change the performance
> characteristics of the environment being studied.

Yes, I know that overhead is a big problem with dynamic instrumentation ;) I
think we can find a way to have an optimal trace format while giving
a dynamic-probe-based tracer enough context when needed.


> >The event size is completely unnecessary, but in reality very, very useful
> >to
> >authenticate the correspondence between the size of the data recorded by
> >the
> >kernel and the size of data the viewer thinks it is reading. Think of it
> >as a
> >consistency check between kernel and viewer algorithms.
> >
>
> I understand. But if the size of each event is fixed, why would you
> expect the data sizes that the tool reports in the trace header for each
> event to change over the course of a trace. If the data on the per-CPU
> buffers is serialized, a similar authentication could be done using the
> timestamp by checking the timestamps of the events before and after the
> current event, thus validating the current timestamp as well as the size
> offset of the previous event. Just a thought.
>

Yes, but if there is a bug with the timestamp (time going backward because of
problematic event record serialization), it becomes harder to pinpoint the
source of the problem (if it is due to a bug in the variable data serialization
mechanism, a bug in the user space "unserialization" mechanism or a bug in event
serialization within the kernel). LTTng hasn't suffered from this kind of issue
for quite some time, but when under heavy development, those indicators of data
consistency have all proven their usefulness.
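
The kind of kernel/viewer cross-check this enables would look roughly like
the following on the viewer side (a hypothetical sketch; parse_payload() and
the header layout are stand-ins, not lttv code):

#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

struct event_header {
        uint32_t timestamp;
        uint16_t event_id;
        uint16_t size;          /* record size as written by the kernel */
};

/* Hypothetical: how many payload bytes the viewer consumed according
 * to its own description of this event type. */
extern size_t parse_payload(uint16_t event_id, const void *payload);

static int check_event(const struct event_header *hdr, const void *payload)
{
        size_t parsed = sizeof(*hdr) + parse_payload(hdr->event_id, payload);

        if (parsed != hdr->size) {
                fprintf(stderr, "event %u: kernel wrote %u bytes, viewer parsed %zu\n",
                        (unsigned)hdr->event_id, (unsigned)hdr->size, parsed);
                return -1;      /* kernel and viewer disagree */
        }
        return 0;
}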

Mathieu


OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2006-09-19 16:39:47

by Jose R. Santos

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Mathieu Desnoyers wrote:
> > We still keep the CPU id because LKET still support ASCII tracing which
> > mixes the output of all the CPUs together. It is still debatable
> > whether this is a useful feature or not though. If we remove ASCII
> > event tracing from LKET, we could remove CPU id from the event header as
> > well.
> >
>
> How hard would it be to make LKET send its ASCII output to multiple "channels"
> (buffers) and then fetch and combine them in user space ? Have a look at lttd
> and lttv in the ltt-control package from the LTTng project : it would be
> trivial to adapt. In fact, there is already a text dump module available.
>

Actually, ASCII trace should output to multiple channels if we use bulk
mode. The original idea for keeping ASCII trace (this was the original
output mechanism) was that a user may have wanted to look at trace
output information in real-time as it was being printed onto the screen
(which requires merging all the output channels). Again, I question the
usability of this feature and if a user really wanted to look at ASCII
trace data in real time, a better solution would be for the lket-b2a
conversion tool to have a mode were it could print the output of
constantly changing trace buffers to the screen. The ASCII output mode
in LKET is cryptic and having lket-b2a do this would perform better and
produce prettier output while also reducing the trace file output size a
bit.
> > The tid we still include because LKET supports turning on individual
> > tracepoints unlike LTT, which if I remember correctly turns on all the
> > tracepoint that are compiled into the running kernel. Since the user is
> > free to chose which tracepoints he wants to use for his workload, we can
> > not guarantee that scheduler tracepoints are going to be available. We
> > consider taking the tid as one of those absolute minimum pieces of data
> > required to do meaningful analysis.
> >
>
> I understand, but it does not have to be included in the bare-boned event
> header. We could think of an optional "event context" header that would have its
> individual parts enabled or not depending on the events recorded in the trace.
> For instance :
>
> With scheduler instrumentation activated :
>
> Event Header | Variable data
>
> Without scheduler instrumentation activated :
>
> Event Header | PID | Variable data
>
> The information about whether or not the optional event context is present in
> the trace or not could be saved in the trace header.
>
> This way, we could not add unnecessary data when it is not needed. And
> furthermore, this is extensible for other event context information.
>
That's also possible and it should not be difficult to implement.
> > We chose to control performance and trace output size by letting users
> > have control of number of tracepoint he can activate at any given time.
> > This is important to us since we plan to add many dynamic tracepoints to
> > different sub-systems (filesystem, device drivers, core kernel
> > facilities, etc...). Turning on all of these tracepoint at the same
> > time would slow down the system to much and change the performance
> > characteristics of the environment being studied.
>
> Yes, I know that overhead is a big problem with dynamic instrumentation ;) I
> think we can find a way to have an optimal trace format while giving
> a dynamic-probe-based tracer enough context when needed.
>

Actually, we started doing this six years ago on our internal *static*
trace tool before we started implementing event tracing using
SystemTap. Regardless of whether the tool uses static or dynamic
probes, if the problem only requires 3 tracepoints to figure out, why
would you want to activate 50+ hooks?
>
> > I understand. But if the size of each event is fixed, why would you
> > expect the data sizes that the tool reports in the trace header for each
> > event to change over the course of a trace. If the data on the per-CPU
> > buffers is serialized, a similar authentication could be done using the
> > timestamp by checking the timestamps of the events before and after the
> > current event, thus validating the current timestamp as well as the size
> > offset of the previous event. Just a thought.
> >
>
> Yes, but if there is a bug with the timestamp (time going backward because of
> problematic event record serialization), it becomes harder to pinpoint the
> source of the problem (if it is due to a bug in the variable data serialization
> mechanism, a bug in the user space "unserialization" mechanism or a bug in event
> serialization within the kernel). LTTng hasn't suffered of this kind of issue
> for quite some time, but when under heavy development, those indicators of data
> consistency have all proven their usefulness.
>
>
Looks like the example you propose above could apply to this as
well. You could implement some sort of debug mode for the trace data
that provides extra information useful for debugging the tool. If the
information is really only useful when debugging the trace tool during
development, wouldn't it make sense to have a way to disable debugging
junk as needed?

-JRS

2006-09-19 18:08:09

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

* Jose R. Santos ([email protected]) wrote:
> Look like the example you propose above could also apply to this as
> well. You could implement some sort of debug mode to the trace data
> that provides extra information useful for debugging the tool. If the
> information is really only useful when debugging the trace tool during
> development, wouldn't it make sense to have a way to disable debugging
> junk as needed?
>

You are absolutely right.

Mathieu

OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2006-09-20 14:18:13

by Jes Sorensen

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

Martin Bligh wrote:
>> Everything has performance limitations, you keep running around touting
>> that static is the only thing thats not a problem. Now show us the
>> numbers!
>
> When comparing two different approaches to a problem, it is unreasonable
> and disingenuous to try to force the onus on the proponents of one
> particular approach to do all the benchmarking for both sides. Everybody
> has to help try to find the correct solution.

Martin,

If you have one side of a discussion stating that the other side's
suggestion is useless for performance reasons, then it is IMHO totally
fair for the second side to ask the first side to back up their
statement with facts. If you want to get a patch into the kernel,
you also get asked for justification, and if you want to get it into
a vendor kernel, a benchmark proving your patch is not causing any
damage is pretty much standard. Fortunately Mathieu also showed that he
was willing to try and do that.

> This is getting very silly, and unnecessarily abusive. Real problems
> exist on both sides of the fence, which have been discussed ad nauseam.
> If you don't recall them, then go back and read the thread again. The
> question is how to strike a compromise between two different sets of
> problems, which Ingo and Karim actually seemed to be making progress
> on towards the end of the thread.

This got very silly and abusive pretty much from the beginning, at the
very point anyone tried to challenge the justification that was
initially presented with the LTT patches. This isn't how Linux works,
if you want to post a patch, you should be ready to accept public
scrutiny of your design and your actual code. Just because something is
your personal pet project doesn't mean nobody has the right to
challenge it.

Even after Christoph tried to be the neutral middle-man, we had to see
another three follow-ups of 'I must have the last word' postings :(

As I said in my last posting related to this thread, I had had enough,
I haven't even read all the responses to my posting and I doubt I will.
Instead I went back and started writing code (unrelated and really
evil code, but in a very different way, and trust me it's making me
very grumpy :)

Fortunately, we at least now have a situation where Mathieu has shown he
is interested in being constructive on the issue and is able to work
with Ingo on the static markers, which I'd like to applaud.

I am optimistic a useful solution will come out of it finally, but I
would rather stay out of it at this point.

Jes

2006-09-25 15:24:40

by Chuck Ebbert

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108

In-Reply-To: <[email protected]>

On Mon, 18 Sep 2006 17:17:13 +0200, Ingo Molnar wrote:

> yeah - and i dont think the kprobes overhead is a fundamental thing - i
> posted a few kprobes-speedup patches as a reply to your measurements.

Where is the source code for the kprobes benchmarks you used?

--
Chuck

2006-09-25 15:47:25

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/11] LTTng-core (basic tracing infrastructure) 0.5.108


Chuck,

i cannot email you because the mail always bounces ...

the kprobes benchmark is a simple "NOP" function:

static int counter = 0;

static int probe_pre_handler(struct kprobe *kp,
                             struct pt_regs *regs)
{
        counter++;
        return 0;
}

i've attached it.

Ingo

* Chuck Ebbert <[email protected]> wrote:

> In-Reply-To: <[email protected]>
>
> On Mon, 18 Sep 2006 17:17:13 +0200, Ingo Molnar wrote:
>
> > yeah - and i dont think the kprobes overhead is a fundamental thing - i
> > posted a few kprobes-speedup patches as a reply to your measurements.
>
> Where is the source code for the kprobes benchmarks you used?
>
> --
> Chuck


Attachments:
(No filename) (716.00 B)
noop_kprobe.c (0.99 kB)
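
For reference, a handler of that kind is hooked up by filling in a struct
kprobe and registering it on the address to be probed; a minimal sketch
(the probed symbol, the kallsyms lookup and the module boilerplate are
assumptions here, not necessarily what the attached noop_kprobe.c does):

#include <linux/module.h>
#include <linux/kprobes.h>
#include <linux/kallsyms.h>

static int counter = 0;

static int probe_pre_handler(struct kprobe *kp, struct pt_regs *regs)
{
        counter++;              /* the whole "NOP" benchmark payload */
        return 0;
}

static struct kprobe kp = {
        .pre_handler = probe_pre_handler,
};

static int __init noop_kprobe_init(void)
{
        /* Probe some frequently hit function; sys_getpid is only an example. */
        kp.addr = (kprobe_opcode_t *)kallsyms_lookup_name("sys_getpid");
        if (!kp.addr)
                return -EINVAL;
        return register_kprobe(&kp);
}

static void __exit noop_kprobe_exit(void)
{
        unregister_kprobe(&kp);
        printk(KERN_INFO "noop_kprobe: %d hits\n", counter);
}

module_init(noop_kprobe_init);
module_exit(noop_kprobe_exit);
MODULE_LICENSE("GPL");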