This adds support for Intel's last branch recording to ptrace. It gives
debuggers access to this hardware feature and allows them to show an
execution trace of the debugged application.
Last branch recording (see section 18.5 in the Intel 64 and IA-32
Architectures Software Developer's Manual) allows taking an execution
trace of the running application without instrumentation. When a branch
is executed, the hardware logs the source and destination addresses in a
cyclic buffer given to it by the OS.
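To make the "cyclic buffer" concrete, here is a minimal user-space model of it. It is not code from the patch: `struct branch_record`, `struct branch_trace`, `trace_branch()` and `trace_nth_newest()` are hypothetical names, and the record is simplified to a source/destination pair (the real BTS record also carries a flags word, and its size differs between processors). It illustrates that once the buffer wraps, the newest branches overwrite the oldest ones, so a debugger reads the trace backwards from the most recent entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified record: one taken branch. Real BTS records also carry a
 * flags word, and their size differs between 32-bit and 64-bit parts. */
struct branch_record {
    uint64_t from;  /* address of the branch instruction */
    uint64_t to;    /* address of the branch target */
};

/* Hypothetical model of the buffer the OS hands to the hardware. */
struct branch_trace {
    struct branch_record *records;
    size_t capacity;  /* number of record slots */
    size_t next;      /* slot the next branch will be written to */
    size_t count;     /* valid records; saturates at capacity */
};

/* What the hardware does per branch: fill the next slot and wrap
 * around when full, overwriting the oldest record. */
static void trace_branch(struct branch_trace *t, uint64_t from, uint64_t to)
{
    assert(t->capacity > 0);
    t->records[t->next].from = from;
    t->records[t->next].to = to;
    t->next = (t->next + 1) % t->capacity;
    if (t->count < t->capacity)
        t->count++;
}

/* A debugger's view: the i-th most recent branch (0 = newest), or NULL. */
static const struct branch_record *
trace_nth_newest(const struct branch_trace *t, size_t i)
{
    if (i >= t->count)
        return NULL;
    return &t->records[(t->next + t->capacity - 1 - i) % t->capacity];
}
```

With three slots and five branches, only the last three survive, and index 0 always yields the branch that was taken most recently.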
This can be a great debugging aid. It shows you exactly how you got to
where you currently are, without requiring lots of single-stepping and
rerunning.
This patch manages the various buffers, configures the trace
hardware, disentangles the trace, and provides a user interface via
ptrace. The high-level design is as follows:
- there is one optional trace buffer per thread_struct
- upon a context switch, the trace hardware is reconfigured to either
disable tracing or to use the appropriate buffer for the new task.
- tracing induces ~20% overhead as branch records are sent out on
the bus.
- the hardware collects trace per processor. To disentangle the
traces for different tasks, we use separate buffers and reconfigure
the trace hardware.
- the low-level data layout is configured at cpu initialization time
- different processors use different branch record formats
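The context-switch rule in the list above can be sketched as follows, assuming simulated MSR values rather than real MSR writes. `struct cpu_state`, `struct task` and `switch_trace()` are hypothetical; the TR (bit 6) and BTS (bit 7) positions are those the Intel SDM gives for IA32_DEBUGCTL on Core2-class processors, while the Pentium 4 places these controls at different bit positions, which is one reason the layout has to be configured at CPU initialization time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* IA32_DEBUGCTL bit positions as given for Core2-class processors;
 * the Pentium 4 uses different positions for these controls. */
#define DEBUGCTL_TR  (1UL << 6)  /* send branch records out on the bus */
#define DEBUGCTL_BTS (1UL << 7)  /* store those records in the DS buffer */

/* Hypothetical stand-in for the per-CPU MSR state. */
struct cpu_state {
    unsigned long debugctl;      /* models DEBUGCTL */
    uintptr_t     ds_save_area;  /* models DS_SAVE_AREA */
};

/* Hypothetical per-thread state: NULL means the task is not traced. */
struct task {
    void *ds_area;
};

/* Called on every context switch to 'next'. */
static void switch_trace(struct cpu_state *cpu, const struct task *next)
{
    assert(cpu != NULL && next != NULL);
    /* Stop branch stores before the buffer pointer changes. */
    cpu->debugctl &= ~(DEBUGCTL_TR | DEBUGCTL_BTS);
    cpu->ds_save_area = (uintptr_t)next->ds_area;
    /* Resume only if the incoming task has its own buffer. */
    if (next->ds_area != NULL)
        cpu->debugctl |= DEBUGCTL_TR | DEBUGCTL_BTS;
}
```

The sketch disables branch stores before it changes the DS area pointer and re-enables them only when the incoming task has a buffer; that ordering is what keeps one task's branches out of another task's buffer. Other DEBUGCTL bits are left untouched.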
patch 1/2 contains the kernel changes
patch 2/2 contains changes to the ptrace man pages
So far, we have incorporated feedback mostly from Andi Kleen. Is there any
more feedback that needs to be addressed?
regards,
markus.
---------------------------------------------------------------------
Intel GmbH
Dornacher Strasse 1
85622 Feldkirchen/Muenchen Germany
Sitz der Gesellschaft: Feldkirchen bei Muenchen
Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer
Registergericht: Muenchen HRB 47456 Ust.-IdNr.
VAT Registration No.: DE129385895
Citibank Frankfurt (BLZ 502 109 00) 600119052
This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.
On Thu, 29 Nov 2007 08:14:10 -0000
"Metzger, Markus T" <[email protected]> wrote:
> [...]
Is there any userspace code available which people can use to play with
this?
How do you envisage it being used in the long term? Do you expect any of
the standard performance tuning tools will be tweaked to understand this
feature and if so which ones?
I'm generally wondering "how will developers be using this in a year or
two's time?"
Please cc Michael Kerrisk <[email protected]> on future versions of
these patches.
The patches were horridly wordwrapped.
Is there any likelihood that any other CPUs do now or will in the future
support any similar feature to this? If so, is an implementation which is
100% contained to arch/x86 appropriate?
>Is there any userspace code available which people can use to play with
>this?
Not yet. We are talking to internal teams regarding gdb support.
>How do you envisage it being used in the long term? Do you expect any of
>the standard performance tuning tools will be tweaked to understand this
>feature and if so which ones?
I would expect debuggers to use it to show an execution trace of the
debuggee. The ptrace interface targets application debuggers; kernel
debuggers would need a slightly different interface.
This saves you a lot of single-stepping through your code to answer the
question "how exactly did I get here". I used a similar feature on bare
metal XScale to hunt data aborts, which made the task pretty easy.
Performance tools, which would be interested in the PEBS part of it, would
need to share DS with this debugging feature. When I grep'ed the kernel to
see who else used DS, I did not find anybody accessing the DS_SAVE_AREA MSR.
When there are multiple users of DS, we would need to introduce some means
of managing that resource.
We may extend this patch to add support for reading PEBS using ptrace.
>I'm generally wondering "how will developers be using this in a year or
>two's time?"
Application developers will use it via application debuggers.
Kernel developers will use it via kgdb; for that, a kernel interface still
needs to be added. This patch provides the hardware access and an
application debugger interface.
>The patches were horridly wordwrapped.
My apologies. Andi already complained, and I had hoped I had fixed it.
I'm working on using a different email client and maybe a different email
account.
>Is there any likelihood that any other CPUs do now or will in the future
>support any similar feature to this? If so, is an implementation which is
>100% contained to arch/x86 appropriate?
I am aware of a trace feature in XScale processors. I think there is also
something available for ARM, but I don't know the details.
If the feature turns out to be really useful, I would, of course, expect
(or at least hope) that other CPUs would provide a similar feature.
Most of the code is arch-specific. If other CPUs share the general BTS
layout, some of the ptrace_bts.c code could be shared.
Since the implementation only supports x86, I think the code should go into
arch/x86, at least until other CPUs are supported.
I would hope that the ptrace interface will be shared across
architectures.
regards,
markus.
On Friday 30 November 2007 10:57:22 Metzger, Markus T wrote:
>
> >Is there any userspace code available which people can use to play with
> >this?
>
> Not yet. We are talking to internal teams regarding gdb support.
But you already have reasonably realistic test code right?
We were burned a few times recently with new interfaces that turned
out to be not really usable from the user perspective.
Ideal situation to avoid that would be that at least one person other than
the patch submitter has successfully written a program using it first.
e.g. I'm still feeling a bit uneasy about that global sysctl
embedded in the interface.
>
> >How do you envisage it being used in the long term? Do you expect any of
> >the standard performance tuning tools will be tweaked to understand this
> >feature and if so which ones?
>
> I would expect debuggers to use it to show an execution trace of the
> debuggee. The ptrace interface targets application debuggers;
application debuggers and possible performance tools. There are certainly
a lot of possibilities from this.
> If the feature turns out to be really useful, I would, of course, expect
> (or at least hope) that other CPUs would provide a similar feature.
I think some others do. But the details are always CPU-specific. I doubt
much low-level code will be possible to share. But it would be good
if the ptrace interface were generic enough for everybody (I think it is,
though).
> Most of the code is arch-specific. If other CPUs share the general BTS
> layout, some of the ptrace_bts.c code could be shared.
> Since the implementation only supports x86, I think the code should go into
> arch/x86, at least until other CPUs are supported.
Agreed.
-Andi
* Andrew Morton <[email protected]> wrote:
> The patches were horridly wordwrapped.
yep, i already tried to check how well it integrates to x86.git:
http://lkml.org/lkml/2007/11/29/93
the code does not seem to be layered correctly: i'd suggest reading the
discussion between Roland McGrath and Alan Stern on lkml, about kwatch
-> hw_breakpoint, to see how a more general debugging framework
can/should be built. These things shouldn't be limited to user-space
alone; the kernel could probably use hardware tracing even more than
user-space could (because it's a much harder environment to debug).
Ingo
>yep, i already tried to check how well it integrates to x86.git:
I ported it to the mm branch of scm/linux/kernel/git/x86/linux-2.6-x86.git.
I will send out the patch and then look at the discussion below.
>the code does not seem to be layered correctly: i'd suggest reading the
>discussion between Roland McGrath and Alan Stern on lkml, about kwatch
>-> hw_breakpoint, to see how a more general debugging framework
>can/should be built. These things shouldn't be limited to user-space
>alone; the kernel could probably use hardware tracing even more than
>user-space could (because it's a much harder environment to debug).
regards,
markus.
>> Not yet. We are talking to internal teams regarding gdb support.
>
>But you already have reasonably realistic test code right?
I wrote a small program to talk to ptrace and look at the trace of small
sample programs to test the patch. I do this on P4 32bit and Core2
64bit.
Our debugger team has a prototype implementation for their debugger. But
that will not be available for some time.
I hope that we get gdb support soon, but it would take a while
if I had to do it myself.
>e.g. I'm still feeling a bit uneasy about that global sysctl
>embedded in the interface.
I added a ptrace command to query for the maximum buffer size and moved
the macro to ptrace_bts.h.
I am not sure how much flexibility is actually needed here. I would expect
the trace to be very useful near its tail, but unmanageable when it gets
too big.
I would rather add flexibility when and where it is needed.
regards,
markus.
[...]
> Please cc Michael Kerrisk <[email protected]> on future versions of
> these patches.
Yes, please. But note that my official address nowadays is
[email protected]
Cheers,
Michael
--
Michael Kerrisk
Maintainer of the Linux man-pages project
http://www.kernel.org/doc/man-pages/
On Nov 30, 2007 5:04 PM, Michael Kerrisk <[email protected]> wrote:
> [...]
>
> > Please cc Michael Kerrisk <[email protected]> on future versions of
> > these patches.
>
> Yes, please. But note that my official address nowadays is
>
> [email protected]
Ooops! I meant [email protected]!
--
Michael Kerrisk
Maintainer of the Linux man-pages project
http://www.kernel.org/doc/man-pages/
* Metzger, Markus T <[email protected]> wrote:
> >> Not yet. We are talking to internal teams regarding gdb support.
> >
> >But you already have reasonably realistic test code right?
>
> I wrote a small program to talk to ptrace and look at the trace of
> small sample programs to test the patch. I do this on P4 32bit and
> Core2 64bit.
>
> Our debugger team has a prototype implementation for their debugger.
> But that will not be available for some time.
>
> I hope that we get gdb support, soon, but that would take a while if I
> had to do it.
i'm wondering what the main use-case would be then, and what the gdb
folks think about the current API. (Roland?)
Ingo
* Ingo Molnar <[email protected]> wrote:
> > Our debugger team has a prototype implementation for their debugger.
> > But that will not be available for some time.
> >
> > I hope that we get gdb support, soon, but that would take a while if
> > I had to do it.
>
> i'm wondering what the main use-case would be then, and what the gdb
> folks think about the current API. (Roland?)
here's a forwarded mail from Roland about the patch and APIs. (and i
hope that now i can stop playing the middleman, with everyone Cc:-ed :)
------------>
From: Roland McGrath <[email protected]>
Subject: Re: [patch][v2] x86, ptrace: support for branch trace store(BTS)]
Cool. It's been on my list to look into exposing those features
somehow. I hadn't planned on doing it until after the utrace stuff
settles and there is a more coherent interface context in which to do
it.
If they are tackling the MSR hacking and context switch and so forth,
I'd like to see them start out by just adding block-step
(debugctlmsr.btf) with the PTRACE_SINGLEBLOCK interface as ia64 has.
That should lay some of the same groundwork needed here, but is much
simpler.
I am not really in favor of this new ptrace interface. I think they
should look around across arch's and think about sane general-purpose
interfaces for features of this kind that might be built with some
commonality across machines. Also do it in a layered way from
low-level, with something usable for kernel-mode too. The discussion
Alan Stern and I had on LKML that started as kwatch and became
hw_breakpoint is an example of how I would go at this set of features
too.
> Cool. It's been on my list to look into exposing those features
> somehow. I hadn't planned on doing it until after the utrace stuff
> settles and there is a more coherent interface context in which to do
> it.
I'm looking very much forward to utrace. From what I read so far, this is
a much nicer interface.
I would expect that this feature, together with all other ptrace extensions,
would need to be adapted to utrace, once that is in.
> If they are tackling the MSR hacking and context switch and so forth,
> I'd like to see them start out by just adding block-step
> (debugctlmsr.btf) with the PTRACE_SINGLEBLOCK interface as ia64 has.
> That should lay some of the same groundwork needed here, but is much
> simpler.
There seems to be support for block stepping in arch/x86/kernel/step.c,
which is used by kernel/ptrace.c.
This is now another user for the DEBUGCTL MSR; the access needs to be
synchronized. I'll look into it.
> I am not really in favor of this new ptrace interface. I think they
> should look around across arch's and think about sane general-purpose
> interfaces for features of this kind that might be built with some
> commonality across machines.
I looked at the include/asm-*/ptrace.h files and some arch/*/kernel/ptrace.c
files. Most arch's support a few variants of GET<whatever>REGS.
Most implementations simply copy_to_user the kernel structures for the
requested registers.
Sparc64 needs to convert pointer sizes and defines the returned struct
directly in the implementation.
Xtensa provides access to an array of FP regs of varying size. They provide
a ptrace command to query for the size, but otherwise also copy_to_user
the entire array.
I have not found any arch that does anything more fancy than return a single
integer value or an array of registers.
In all cases, the command carries enough information to interpret the result.
In our case, the array we're querying for can be rather big, and typically
only some of the information is interesting. The data we return is also
inhomogeneous.
The former may be true for register arrays as well, but they are typically
small enough. The latter would compare to a general GETREGS command that
returns all registers in a self-describing format (that might be an
interesting extension, if one got tired of yet another GET<new-type-of>REGS
command).
Instead of providing the entire array in one command, we introduced commands
to handle that array.
Instead of carrying the information on how to interpret the result in the
command itself, we provide that information directly in the result.
I would argue that this interface may be directly (re)used and
extended by other arch's.
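The difference from a GETREGS-style request can be sketched like this: one command returns the number of entries, another copies out a single entry, and every entry states what kind of data it holds. All names here (`trace_entry`, `trace_get_size()`, `trace_get_entry()`, the `TRACE_*` kinds) are hypothetical and model the kernel side in plain user-space C; they are not the ptrace requests defined by the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry kinds: the result says what it is, rather than the
 * request implying it (as GETREGS-style commands do). */
enum trace_kind {
    TRACE_INVALID = 0,
    TRACE_BRANCH,     /* a from/to pair */
    TRACE_TIMESTAMP,  /* e.g. when the task was scheduled in or out */
};

struct trace_entry {
    enum trace_kind kind;
    union {
        struct { uint64_t from, to; } branch;
        uint64_t timestamp;
    } u;
};

/* Hypothetical per-tracee store, standing in for the kernel side. */
struct trace_store {
    const struct trace_entry *entries;
    size_t size;
};

/* "Command" 1: how many entries are there? */
static size_t trace_get_size(const struct trace_store *s)
{
    return s->size;
}

/* "Command" 2: copy out one entry; 0 on success, -1 if out of range. */
static int trace_get_entry(const struct trace_store *s, size_t index,
                           struct trace_entry *out)
{
    assert(out != NULL);
    if (index >= s->size)
        return -1;
    *out = s->entries[index];
    return 0;
}

/* A debugger that only cares about the tail: branches among the last n. */
static size_t count_recent_branches(const struct trace_store *s, size_t n)
{
    size_t size = trace_get_size(s), branches = 0;
    size_t start = n < size ? size - n : 0;
    struct trace_entry e;

    for (size_t i = start; i < size; i++)
        if (trace_get_entry(s, i, &e) == 0 && e.kind == TRACE_BRANCH)
            branches++;
    return branches;
}
```

A debugger interested only in the tail of a large trace, as `count_recent_branches()` shows, touches just the last few entries instead of copying the whole array, and it can skip entry kinds it does not understand.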
Do you have specific concerns regarding the interface?
> Also do it in a layered way from
> low-level, with something usable for kernel-mode too.
Disabling cpl0 filtering should be fairly easy; we would simply clear the
cpl bit in the debugctl_mask. This way, you can trace the kernel part of
the application, but you would still be debugging the application.
You could call the ptrace_bts_ functions directly or we might add a new set of
interface functions that simply forward the request (or the other way round).
Providing a per-cpu trace instead of a per-thread trace would be a
completely new feature that only has the configuration part in common with
our patch.
What did you have in mind when you asked for kernel-mode support?
thanks and regards,
markus.
>There seems to be support for block stepping in arch/x86/kernel/step.c,
>which is used by kernel/ptrace.c.
>
>This is now another user for the DEBUGCTL MSR; the access needs to be
>synchronized. I'll look into it.
I looked into the new block/single stepping support in
arch/x86/kernel/step.c.
It uses a TIF_DEBUGCTLMSR flag and an unsigned long debugctlmsr field in
struct thread_struct.
When stepping is done, it clears the TIF flag and the MSR.
Our patch uses a ds_area field in struct thread_struct and two TIF flags to
mark the functionality, rather than the resource. We need to access the
DEBUGCTL and DS_SAVE_AREA MSRs.
I would rewrite our patch to rename the TIF flags after the resource they
use and move the code setting the DS_SAVE_AREA MSR to __switch_to_xtra;
that leaves only the code to take the timestamp in ptrace_bts.c.
The two MSR accesses (or rather, the two users of the TIF flag) still
conflict: when either is done, it clears the TIF bit and thus disables the
other user.
I would introduce a convention:
- each user clears only the debugctl bits he used
- the TIF bit is cleared if thread_struct.debugctlmsr == 0
- before setting any bit, each user first checks all bits he intends to
  set; if any is set already, he bails out with an error
Is there some better way to do this?
thanks and regards,
markus.
On Monday 03 December 2007 14:53:27 Markus Metzger wrote:
> > Cool. It's been on my list to look into exposing those features
> > somehow. I hadn't planned on doing it until after the utrace stuff
> > settles and there is a more coherent interface context in which to do
> > it.
>
> I'm looking very much forward to utrace. From what I read so far, this is
> a much nicer interface.
Just don't wait for that. utrace doesn't seem to have any concrete
plans to merge any time soon AFAIK[1] and it would be a shame to delay a useful
feature forever.
[1] At least the patches have not reached any mailing lists
> What did you have in mind when you asked for kernel-mode support?
I asked about that earlier too and I would like to see per CPU traces
for ring 0 with some way to dump that on crashes or on user trigger.
-Andi
* Andi Kleen <[email protected]> wrote:
> Just don't wait for that. utrace doesn't seem to have any concrete
> plans to merge any time soon AFAIK[1] and it would be a shame to delay
> a useful feature forever.
>
> [1] At least the patches have not reached any mailing lists
FYI, as far as arch/x86 goes, the merging of Roland's utrace preparatory
patches is well underway in x86.git, and the merge went pretty well so
far, with robust results. It's 49 patches so far:
54 files changed, 2440 insertions(+), 2587 deletions(-)
Ingo
Ingo Molnar <[email protected]> writes:
> * Andi Kleen <[email protected]> wrote:
>
>> Just don't wait for that. utrace doesn't seem to have any concrete
>> plans to merge any time soon AFAIK[1] and it would be a shame to delay
>> a useful feature forever.
>>
>> [1] At least the patches have not reached any mailing lists
>
> FYI, as far as arch/x86 goes, the merging of Roland's utrace preparatory
> patches is well underway in x86.git, and the merge went pretty well so
> far, with robust results. It's 49 patches so far:
I see. They are planning to just skip the public review stage? Clever.
-Andi
On Mon, 3 Dec 2007, Andi Kleen wrote:
> Ingo Molnar <[email protected]> writes:
>
> > * Andi Kleen <[email protected]> wrote:
> >
> >> Just don't wait for that. utrace doesn't seem to have any concrete
> >> plans to merge any time soon AFAIK[1] and it would be a shame to delay
> >> a useful feature forever.
> >>
> >> [1] At least the patches have not reached any mailing lists
> >
> > FYI, as far as arch/x86 goes, the merging of Roland's utrace preparatory
> > patches is well underway in x86.git, and the merge went pretty well so
> > far, with robust results. It's 49 patches so far:
>
> I see. They are planning to just skip the public review stage? Clever.
Andi,
stop that bullshit! Care to read what others write and check out your
LKML folder before barking?
We got Roland's _PREPARATORY_ patches from LKML.
tglx
* Andi Kleen <[email protected]> wrote:
> Ingo Molnar <[email protected]> writes:
>
> > * Andi Kleen <[email protected]> wrote:
> >
> >> Just don't wait for that. utrace doesn't seem to have any concrete
> >> plans to merge any time soon AFAIK[1] and it would be a shame to delay
> >> a useful feature forever.
> >>
> >> [1] At least the patches have not reached any mailing lists
> >
> > FYI, as far as arch/x86 goes, the merging of Roland's utrace
> > preparatory patches is well underway in x86.git, and the merge went
> > pretty well so far, with robust results. It's 49 patches so far:
>
> I see. They are planning to just skip the public review stage?
> Clever.
Andi, is this some kind of "destroy your years of kernel hacking
credibility within a few days" contest that you are participating in??
What you are doing is getting really silly.
All ptrace cleanup patches from Roland were posted by him to lkml, and
we picked them up from there. Review is ongoing, Roland replied to all
feedback with more patches, and those were integrated as well. Final
upstream merging of this depends on more review and test results (as
usual), but it's looking good so far.
Ingo
> All ptrace cleanup patches from Roland were posted by him to lkml, and
> we picked them up from there. Review is ongoing, Roland replied to all
> feedback with more patches, and those were integrated as well. Final
> upstream merging of this depends on more review and test results (as
> usual), but it's looking good so far.
Yes, clearly I overreacted. I actually saw the ptrace patches, but
somehow didn't connect them with utrace which as I remember
was a much larger patch kit. Sorry for that, Roland.
Anyway, I think my original point about not delaying ptrace for utrace
still holds, though. E.g., Markus' patches are clear .25 candidates, but
for utrace that is probably far too late by now.
-Andi
>-----Original Message-----
>From: Andi Kleen [mailto:[email protected]]
>Sent: Monday, 3 December 2007 17:22
>> What did you have in mind when you asked for kernel-mode support?
>
>I asked about that earlier too and I would like to see per CPU traces
>for ring 0 with some way to dump that on crashes or on user trigger.
I will split it into two layers:
- a lower layer that does the DS/BTS configuration
- a higher layer that provides the ptrace interface
The lower layer operates on a DS pointer passed as a parameter.
Unfortunately, this will have to be an unsigned long, since the DS/BTS
layout differs between architectures.
The higher layer uses the lower layer to do the real work. It stores
a per-thread DS pointer in the thread_struct.
Kernel tracing would allocate a per_cpu array of DS pointers
and use the lower layer for all the real work. I would leave it
to someone who is more familiar with the kgdb patch, if that is OK.
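The split into a lower and a higher layer might look like this. The lower layer only ever sees an opaque `unsigned long` DS pointer plus a layout descriptor chosen at CPU initialization; the ptrace layer would keep one such pointer per thread, and kernel tracing one per CPU. `struct ds_config`, `ds_get()`, `ds_set()` and `ds_bts_count()` are hypothetical names; the order of the BTS fields (buffer base, index, absolute maximum, interrupt threshold) and the 4-byte versus 8-byte field width for 32-bit and 64-bit processors are taken from the DS save area description in the Intel SDM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Field order of the BTS part of the DS save area. */
enum ds_field {
    DS_BTS_BUFFER_BASE = 0,
    DS_BTS_INDEX,
    DS_BTS_ABSOLUTE_MAXIMUM,
    DS_BTS_INTERRUPT_THRESHOLD,
};

/* Hypothetical layout descriptor, filled in once at CPU init:
 * 4-byte fields on 32-bit parts, 8-byte fields on 64-bit parts. */
struct ds_config {
    size_t field_size;
};

/* Lower layer: read one field of any DS area, given as an unsigned long. */
static uint64_t ds_get(const struct ds_config *cfg, unsigned long ds,
                       enum ds_field f)
{
    const unsigned char *p = (const unsigned char *)ds + f * cfg->field_size;
    assert(cfg->field_size == 4 || cfg->field_size == 8);
    if (cfg->field_size == 4) {
        uint32_t v;
        memcpy(&v, p, sizeof(v));
        return v;
    }
    uint64_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}

/* Lower layer: write one field, truncating to 32 bits where required. */
static void ds_set(const struct ds_config *cfg, unsigned long ds,
                   enum ds_field f, uint64_t value)
{
    unsigned char *p = (unsigned char *)ds + f * cfg->field_size;
    assert(cfg->field_size == 4 || cfg->field_size == 8);
    if (cfg->field_size == 4) {
        uint32_t v = (uint32_t)value;
        memcpy(p, &v, sizeof(v));
    } else {
        memcpy(p, &value, sizeof(value));
    }
}

/* Number of records stored so far (before any wrap-around). */
static size_t ds_bts_count(const struct ds_config *cfg, unsigned long ds,
                           size_t record_size)
{
    return (size_t)(ds_get(cfg, ds, DS_BTS_INDEX) -
                    ds_get(cfg, ds, DS_BTS_BUFFER_BASE)) / record_size;
}
```

For example, with 64-bit records of 24 bytes, an index 72 bytes past the buffer base means three branches have been stored. Neither higher layer needs to know which layout is in use.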
What's left is proper resource management. The kernel use of
DS_SAVE_AREA and DEBUGCTL conflicts with the ptrace use of those MSRs.
As far as I can tell, such management is missing for all MSRs. The
single/block stepping support certainly uses an optimistic approach.
I guess this is a certain amount of work. It needs the idea of an 'owner'
of such an MSR, some means to acquire and release it, a global allocation
order, and some protection from others who simply try to use it. It needs
to work at register granularity for some MSRs and at bit granularity for
others. It needs to be global for some and per-thread for others. And it
probably needs a lot of error handling code in a lot of different areas.
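For the register-granular, global case, one common technique (swapped in here for illustration; it is not what the patch or utrace does) is an owner token updated with compare-and-swap. `struct msr_resource`, `msr_acquire()` and `msr_release()` are hypothetical and use C11 atomics in user space; kernel code would use its own primitives, and the bit-granular, per-thread case needs something more like the convention proposed earlier in the thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical owner token for one resource such as DS_SAVE_AREA;
 * NULL means the resource is free. */
struct msr_resource {
    _Atomic(const void *) owner;
};

/* Try to become the owner: 0 on success, -1 if someone else owns it.
 * Re-acquiring by the current owner succeeds without change. */
static int msr_acquire(struct msr_resource *r, const void *who)
{
    const void *expected = NULL;
    assert(who != NULL);
    if (atomic_compare_exchange_strong(&r->owner, &expected, who))
        return 0;
    return expected == who ? 0 : -1;
}

/* Give the resource back; only the current owner may do so. */
static int msr_release(struct msr_resource *r, const void *who)
{
    const void *expected = who;
    return atomic_compare_exchange_strong(&r->owner, &expected, NULL) ? 0 : -1;
}
```

A failed acquire is the point where the error handling mentioned above would come in, for example by refusing the ptrace request instead of silently taking over the MSR.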
I would hope that utrace is tackling some or all of these problems.
I would therefore stay optimistic in this patch and rework it once
utrace is in and provides a proper framework.
This seems to be the approach that the new block-stepping support takes.
thanks and regards,
markus.