LinuxLists.cc - Re: [Patch 0/1] HW-BKPT: Allow per-cpu kernel-space Hardware Breakpoint requests

2009-09-01 06:39:14

Subject: Re: [Patch 0/1] HW-BKPT: Allow per-cpu kernel-space Hardware Breakpoint requests

On Sat, Aug 29, 2009 at 03:41:07PM +0200, Ingo Molnar wrote:
>
> * K.Prasad <[email protected]> wrote:
>
> > I am not sure if pmus can handle, (or want to handle) all the
> > intricacies involved with the hw-breakpoint layer [...]
>
> Which are those intricacies? It's all rather straightforward
> register scheduling and reservation stuff - which perfcounters
> already solves in a very rich way.
>
> Ingo

While it is quite true that debug register scheduling and reservation
(using exclusive/pinned properties) are possible through perf's
implementation, breakpoint exception handling and a provision to invoke
a user-defined callback require an extension to the existing perf
implementation (which allows only counting and sampling upon an event,
as I presently understand).

Breakpoint exception handling, which involves tasks such as filtering
stray exceptions (arising out of breakpoint length limitations),
user-defined callback invocation and signal generation, has, as I see
it, little in common with perf-counter's functionality. And on
architectures like PPC64, whose 'trigger-before-execute' exception
behaviour makes it difficult to provide 'continuous-trigger' behaviour,
sufficient interlocking is necessary with the single-step exception
(required for a
bkpt_exception-->disable_bp-->single_step-->enable_bp-->invoke_callback+signal
process).
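
To make the sequence above concrete, here is a minimal sketch of it as a
two-state machine in plain C. It is a user-space model only, not kernel
code: struct model_bp, its fields and both handler names are hypothetical,
and signal generation is reduced to a callback.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one breakpoint; none of these names exist in the kernel. */
enum model_bp_state { MODEL_BP_ARMED, MODEL_BP_STEPPING };

struct model_bp {
	enum model_bp_state state;
	bool hw_enabled;	/* models the enable bit of the debug register */
	bool single_step;	/* models the single-step trap being armed */
	unsigned int hits;	/* callbacks delivered so far */
	void (*callback)(struct model_bp *bp);
};

/* bkpt_exception: raised *before* the triggering instruction executes. */
static void model_bkpt_exception(struct model_bp *bp)
{
	assert(bp->state == MODEL_BP_ARMED && bp->hw_enabled);
	/* disable_bp: otherwise the same instruction would trigger again. */
	bp->hw_enabled = false;
	/* single_step: let exactly one instruction run. */
	bp->single_step = true;
	bp->state = MODEL_BP_STEPPING;
}

/* Single-step exception: the triggering instruction has now executed. */
static void model_single_step_exception(struct model_bp *bp)
{
	assert(bp->state == MODEL_BP_STEPPING);
	bp->single_step = false;
	/* enable_bp: re-arm for the next access. */
	bp->hw_enabled = true;
	bp->state = MODEL_BP_ARMED;
	/* invoke_callback (the signal would be generated here as well). */
	if (bp->callback)
		bp->callback(bp);
}

/* Example callback that only counts deliveries. */
static void model_count_hit(struct model_bp *bp)
{
	bp->hits++;
}
```

The point of the model is the ordering: the callback runs only after the
single-step exception, and the breakpoint stays disabled for the whole
window in between, which is where the interlocking is needed.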

And post integration, in-kernel users like ptrace, kgdb* and xmon*,
which hitherto have interacted directly with the debug registers
(through set_debugreg()/set_dabr()), would have to route their requests
through the perf-layer. It is difficult to imagine ptrace's idempotent
requests (through ptrace_<get><set>_debugreg()) having to pass through
the perf-layer (and becoming dependent on CONFIG_PERF_COUNTERS), not to
mention the tricks required to synchronise signal generation timing with
exception behaviour (especially on PPC64).
* - Not converted to use hw-breakpoint layer yet

With debugging and performance monitoring being the two primary uses of
hw-breakpoints (apart from the many niche uses that one can think of),
it would be prudent to retain the breakpoints as a separate layer,
allowing exploitation by applications with either need, rather than to
integrate them tightly with perf-counters.

With plenty of users exploiting the breakpoint layer's debugging
capabilities - like SystemTap (http://lwn.net/Articles/343581/,
extensible for user-space), ftrace, ptrace and potentially gdbstub
(http://tinyurl.com/gdbstub-prototype) - it would be a sad state of
affairs to keep the hw-breakpoint layer waiting in the queue for want
of performance monitoring (through perf-counter
exploitation/integration).

Thanks,
K.Prasad

2009-09-01 23:51:39

by Frederic Weisbecker

Subject: Re: [Patch 0/1] HW-BKPT: Allow per-cpu kernel-space Hardware Breakpoint requests

On Tue, Sep 01, 2009 at 12:08:45PM +0530, K.Prasad wrote:
> On Sat, Aug 29, 2009 at 03:41:07PM +0200, Ingo Molnar wrote:
> >
> > * K.Prasad <[email protected]> wrote:
> >
> > > I am not sure if pmus can handle, (or want to handle) all the
> > > intricacies involved with the hw-breakpoint layer [...]
> >
> > Which are those intricacies? It's all rather straightforward
> > register scheduling and reservation stuff - which perfcounters
> > already solves in a very rich way.
> >
> > Ingo
>
> While it is quite true that debug register scheduling and reservation
> (using exclusive/pinned properties) are possible through the perf's
> implementation, breakpoint exception handling and a provision to invoke
> user-defined callback require an extension to the existing perf
> implementation (which allows only counting and sampling upon an event,
> as I presently understand).

Well, not that much actually. The upper (core) layer of the hw bp API
should remain and still handle breakpoint-specific problems.

Also, that doesn't imply completely zapping the low level; we
indeed still need to handle things like exception callbacks.

Actually, the only part that may roughly shrink is the register
scheduling. We just won't need to handle tricky things like
per-thread virtual debug registers anymore.

> Breakpoint exception handling involving tasks such as filtering stray
> exceptions (arising out of breakpoint length limitations), user-defined
> callback invocation and signal generation are, as I see not in common
> with perf-counter's functionality. And on architectures like PPC64 whose
> exception behaviour is 'trigger-before-execute' making it difficult to
> bring a 'continuous-trigger' behaviour, sufficient interlocking is necessary
> with single-step exception (required for a
> bkpt_exception-->disable_bp-->single_step-->enable_bp-->invoke_callback+signal
> process).

No, really, it's not up to perf to handle such peculiar things. It's still
the role of the bp API (low and high level).
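
As an illustration of one such peculiarity that would stay in the bp layer,
here is a rough sketch of stray-exception filtering. It assumes the
hardware matches on a range wider than the one requested, so the handler
has to compare the faulting access with the original request. The names
struct watch_req and access_hits_request() are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical request: watch the byte range [addr, addr + len). */
struct watch_req {
	uintptr_t addr;
	unsigned int len;
};

/*
 * Returns true when the access [acc, acc + acc_len) overlaps the
 * requested range. An exception whose access does not overlap is
 * "stray" and should be dropped before any callback or signal.
 * Assumes neither range wraps around the address space.
 */
static bool access_hits_request(const struct watch_req *req,
				uintptr_t acc, unsigned int acc_len)
{
	assert(req->len > 0 && acc_len > 0);
	return acc < req->addr + req->len && req->addr < acc + acc_len;
}
```

For example, with a 2-byte request at 0x1002 and hardware that watches the
whole aligned 8-byte block, a 1-byte access at 0x1006 raises an exception
but is filtered out, while a 4-byte access at 0x1000 is kept.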

> And post integration, in-kernel users like ptrace, kgdb* and xmon*
> which hitherto have interacted directly with the debug registers
> (through set_debugreg()/set_dabr()) should route their requests through the
> perf-layer. It is difficult to imagine ptrace's idempotent requests
> (through ptrace_<get><set>_debugreg()) having to pass through perf-layer
> (and becoming dependant on CONFIG_PERF_COUNTERS), not to mention the
> tricks required to synchronise signal generation timing with exception
> behaviour (especially on PPC64).
> * - Not converted to use hw-breakpoint layer yet

Actually, I see the perf layer here as a middle man between

- the very hardware stuff (dr[0-467]) handling, reading, writing, updating
- the core API (register_kernel_breakpoint(), register_user_breakpoint() etc..)

And this middle man can handle so many things on its own that the two above
shrink drastically.
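
A toy model of that layering, in user-space C, might look like the
following. Everything here is hypothetical (the model_ prefix marks
invented names); the only hardware fact assumed is that x86 offers four
address registers, dr0-dr3. The middle layer owns slot reservation, so the
top and bottom layers stay tiny.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_SLOTS 4	/* x86 address registers dr0-dr3 */

struct model_slot {
	bool used;
	unsigned long addr;
};

static struct model_slot model_slots[MODEL_NR_SLOTS];

/* Bottom layer: stands in for writing one debug register. */
static void model_arch_install(int slot, unsigned long addr)
{
	assert(slot >= 0 && slot < MODEL_NR_SLOTS);
	model_slots[slot].used = true;
	model_slots[slot].addr = addr;
}

/* Middle layer: reservation; returns a free slot or -1 if all are taken. */
static int model_sched_reserve(void)
{
	for (int i = 0; i < MODEL_NR_SLOTS; i++)
		if (!model_slots[i].used)
			return i;
	return -1;
}

/* Top layer: the core API only asks the middle layer, then installs. */
static int model_register_breakpoint(unsigned long addr)
{
	int slot = model_sched_reserve();

	if (slot < 0)
		return -1;
	model_arch_install(slot, addr);
	return slot;
}

static void model_unregister_breakpoint(int slot)
{
	assert(slot >= 0 && slot < MODEL_NR_SLOTS);
	model_slots[slot].used = false;
}
```

In the real design the middle layer would also deal with per-task and
per-cpu targets, which this sketch leaves out entirely.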

Also, the ptrace thing is tricky in itself, and that can't be helped easily.
Because of the direct writes to debug registers done by POKE_USR,
whatever the breakpoint API, with or without perf integration, we still
need subterfuges to support it.

> With debugging and performance monitoring being two primary uses of
> hw-breakpoints (apart from the many niche uses that one can think of),
> it would be prudent to retain the breakpoints as a separate layer
> allowing exploitation by applications with either needs than to tightly
> integrate with perf-counters.

A lonesome counter would be very limited in itself; we would only have
the perf support for breakpoints. Again, the API is still required.

The goal is to have:

1) A factorization of the register scheduling and of breakpoint
target allocation (task/cpu, etc.; it's all handled by perf)
2) Optimization of register scheduling
3) New features (period to trigger events, target inheritance, context exclusion,
etc.)
4) A shrink of the code

> With plenty of users exploiting the breakpoint layer's debugging
> capabilities - like SystemTap http://lwn.net/Articles/343581/
> (extensible for user-space), ftrace, ptrace and potentially gdbstub
> (http://tinyurl.com/gdbstub-prototype), it is but a sad state to keep
> the hw-breakpoint layer waiting in-queue for want of performance
> monitoring (through perf-counter exploitation/integration).

At first I found the idea of a perf-based design suspicious, because it
appeared to be real overkill.
But after more thought, it could really simplify, factorize,
and enhance this API.

I'm currently trying to do something: a quick draft, just to see where
we can go with it and what such a beast could look like...

2009-09-03 18:28:16

by K.Prasad

Subject: Re: [Patch 0/1] HW-BKPT: Allow per-cpu kernel-space Hardware Breakpoint requests

On Wed, Sep 02, 2009 at 01:51:33AM +0200, Frederic Weisbecker wrote:
> On Tue, Sep 01, 2009 at 12:08:45PM +0530, K.Prasad wrote:
> > On Sat, Aug 29, 2009 at 03:41:07PM +0200, Ingo Molnar wrote:
> > >
> > > * K.Prasad <[email protected]> wrote:
> > >
> > > > I am not sure if pmus can handle, (or want to handle) all the
> > > > intricacies involved with the hw-breakpoint layer [...]
> > >
> > > Which are those intricacies? It's all rather straightforward
> > > register scheduling and reservation stuff - which perfcounters
> > > already solves in a very rich way.
> > >
> > > Ingo
> >
[edited]
> > And post integration, in-kernel users like ptrace, kgdb* and xmon*
> > which hitherto have interacted directly with the debug registers
> > (through set_debugreg()/set_dabr()) should route their requests through the
> > perf-layer. It is difficult to imagine ptrace's idempotent requests
> > (through ptrace_<get><set>_debugreg()) having to pass through perf-layer
> > (and becoming dependant on CONFIG_PERF_COUNTERS), not to mention the
> > tricks required to synchronise signal generation timing with exception
> > behaviour (especially on PPC64).
> > * - Not converted to use hw-breakpoint layer yet
>
>
> Actually, I see the perf layer here as a middle man between
>
> - the very hardware stuff (dr[0-467]) handling, reading, writing, updating
> - the core API (register_kernel_breakpoint(), register_user_breakpoint() etc..)
>
> And this middle man can handle so much things on its own that the two above
> gets utterly shrinked.
>
> Also the ptrace thing is tricky in itself, and that can't be helped easily.
> Because of the direct writing to debug registers done by POKE_USR,
> whatever the current breakpoint API with or without perf integration, we still
> need subterfuges to carry it.
>

The reverse dependency this would create on perf (CONFIG_PERF) for the
hw-breakpoint layer is an undesirable side-effect, and gives rise to
at least two immediate questions:

- Handling of requests for hw-breakpoint from users like ptrace when
CONFIG_PERF is not turned on
- Managing 'register scheduling and reservation' on architectures where
perf layer isn't ported. An inefficient way of handling this would be
to retain the existing register allocation code of hw-breakpoint for
such architectures - thereby artificially imposing arch-specific code
into generic stuff.
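
The first question can be illustrated with the usual stub idiom, sketched
here in user-space C. MODEL_CONFIG_PERF is a hypothetical stand-in for the
config option and both function names are invented; the sketch only shows
how a ptrace-style request would fail outright once the reservation code
lives behind that option.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical switch standing in for the perf config option. It is
 * left undefined here to model a kernel built without perf.
 */
/* #define MODEL_CONFIG_PERF 1 */

#ifdef MODEL_CONFIG_PERF
/* Would be provided by the perf-based register scheduler (not shown). */
int model_reserve_debug_slot(unsigned long addr);
#else
/* Stub: without perf there is nobody to reserve a register. */
static inline int model_reserve_debug_slot(unsigned long addr)
{
	(void)addr;
	return -ENOSYS;
}
#endif

/* Invented ptrace-style entry point that depends on the reservation. */
static int model_ptrace_set_debugreg(int regno, unsigned long addr)
{
	assert(regno >= 0 && regno < 4);
	return model_reserve_debug_slot(addr);
}
```

With the option off, every such request returns -ENOSYS, which is the
behaviour change for ptrace users that the first question is about.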

A solution here would be to detach parts of perf layer's code that
handle register scheduling and reservation (which I learn are in
kernel/perf_counter.c) into a separate entity (outside the ambit of
CONFIG_PERF) that can serve the needs of both hw-breakpoint and perf
thereby eliminating the two issues enumerated above.

The tight coupling between the functions that perform register
scheduling (in kernel/perf_counter.c) and perf's data structures is quite
apparent and does suggest a non-trivial amount of effort to detach them
into a layer of its own.

However, this might be quite necessary in order to balance the desire to
re-use the 'register scheduling and reservation' code of the perf-layer
against the need to avoid the issues above.

This, along with the framework (described in the previous mail) to retain
the hw-breakpoint's APIs + code interacting with debug registers
(including exception handling) would be a good compromise.

Thanks,
K.Prasad

2009-09-03 19:23:19

by Ingo Molnar

Subject: Re: [Patch 0/1] HW-BKPT: Allow per-cpu kernel-space Hardware Breakpoint requests

* K.Prasad <[email protected]> wrote:

> On Wed, Sep 02, 2009 at 01:51:33AM +0200, Frederic Weisbecker wrote:
> > On Tue, Sep 01, 2009 at 12:08:45PM +0530, K.Prasad wrote:
> > > On Sat, Aug 29, 2009 at 03:41:07PM +0200, Ingo Molnar wrote:
> > > >
> > > > * K.Prasad <[email protected]> wrote:
> > > >
> > > > > I am not sure if pmus can handle, (or want to handle) all the
> > > > > intricacies involved with the hw-breakpoint layer [...]
> > > >
> > > > Which are those intricacies? It's all rather straightforward
> > > > register scheduling and reservation stuff - which perfcounters
> > > > already solves in a very rich way.
> > > >
> > > > Ingo
> > >
> [edited]
> > > And post integration, in-kernel users like ptrace, kgdb* and xmon*
> > > which hitherto have interacted directly with the debug registers
> > > (through set_debugreg()/set_dabr()) should route their requests through the
> > > perf-layer. It is difficult to imagine ptrace's idempotent requests
> > > (through ptrace_<get><set>_debugreg()) having to pass through perf-layer
> > > (and becoming dependant on CONFIG_PERF_COUNTERS), not to mention the
> > > tricks required to synchronise signal generation timing with exception
> > > behaviour (especially on PPC64).
> > > * - Not converted to use hw-breakpoint layer yet
> >
> >
> > Actually, I see the perf layer here as a middle man between
> >
> > - the very hardware stuff (dr[0-467]) handling, reading, writing, updating
> > - the core API (register_kernel_breakpoint(), register_user_breakpoint() etc..)
> >
> > And this middle man can handle so much things on its own that the two above
> > gets utterly shrinked.
> >
> > Also the ptrace thing is tricky in itself, and that can't be helped easily.
> > Because of the direct writing to debug registers done by POKE_USR,
> > whatever the current breakpoint API with or without perf integration, we still
> > need subterfuges to carry it.
> >
>
> The reverse-dependancy this would create over perf (CONFIG_PERF) for the
> hw-breakpoint layer is an undesirable side-effect, and gives rise to
> atleast two immediate questions:
>
> - Handling of requests for hw-breakpoint from users like ptrace when
> CONFIG_PERF is not turned on

This is basically just a build/layering logistics question and it is
solved easily - we could have a library mode for it.

> - Managing 'register scheduling and reservation' on architectures where
> perf layer isn't ported. An inefficient way of handling this would be
> to retain the existing register allocation code of hw-breakpoint for
> such architectures - thereby artificially imposing arch-specific code
> into generic stuff.

Minimally porting perf to enable a hw-breakpoints PMU extension is
very easy in practice. For example on s390 it took just 15 lines of
code:

12310e9: [S390] Enable tick based perf_counter on s390.

arch/s390/Kconfig | 1 +
arch/s390/include/asm/perf_counter.h | 8 ++++++++
tools/perf/perf.h | 6 ++++++
3 files changed, 15 insertions(+), 0 deletions(-)

On FRV it took 38 lines (60% of which are boilerplate copyright
notices), on PARISC 15 lines.

By far the most complexity is in factoring out the hw-breakpoint
code itself - and that has to be done regardless of the register
scheduling model.

> A solution here would be to detach parts of perf layer's code that
> handle register scheduling and reservation (which I learn are in
> kernel/perf_counter.c) into a separate entity (outside the ambit
> of CONFIG_PERF) that can serve the needs of both hw-breakpoint and
> perf thereby eliminating the two issues enumerated above.
>
> The tight coupling between the functions that perform register
> scheduling (in kernel/perf_counter.c) and perf's data structures
> is quite apparent and does suggest non-trivial amount of effort to
> detach them into a layer of its own.
>
> However this might be quite necessary in order to balance between
> a desire to re-use the 'register scheduling and reservation' code
> of perf-layer while not running into issues as above.
>
> This, along with the framework (described in the previous mail) to
> retain the hw-breakpoint's APIs + code interacting with debug
> registers (including exception handling) would be a good
> compromise.

I don't think the librarization is all that complex. It's very much
desired, as we'd reuse an existing piece of infrastructure to
implement another one - this is always good.

Ingo