LinuxLists.cc - [GIT PULL v6] hw-breakpoints: Rewrite on top of perf events v6

2009-11-11 13:02:22

by K.Prasad

[permalink] [raw]

Subject: Re: [PATCH 5/7 v6] hw-breakpoints: Rewrite the hw-breakpoints layer on top of perf events

On Sun, Nov 08, 2009 at 04:28:59PM +0100, Frederic Weisbecker wrote:

There were a few comments that I posted against version 6 of your
patchset (which happened to cross your version 7 posting...) regarding
the breakpoint interfaces, reservation of register for unpinned events
and such...

By the way, I'm looking at refs/heads/perfevents/hw-breakpoint branch in
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing.git
and hope that's correct/latest?

Some more comments about the ptrace implementation here...

static int ptrace_set_breakpoint_addr(struct task_struct *tsk, int nr,
unsigned long addr)
{
struct perf_event *bp;
struct thread_struct *t = &tsk->thread;

if (!t->ptrace_bps[nr]) {
/*
* Put stub len and type to register (reserve) an inactive but
* correct bp
*/
bp = register_user_hw_breakpoint(addr, HW_BREAKPOINT_LEN_1,
HW_BREAKPOINT_W,
ptrace_triggered, tsk,
false);
..
...
}

Given that a register_user_hw_breakpoint() is done at the time of a
write to DR0-DR3, it would needlessly hold onto the debug register until
the corresponding DR7 bit is allocated while using up one 'pinned' debug
slot. It would be prudent to postpone the breakpoint registration till
DR7 is changed to activate it.

static int ptrace_write_dr7(struct task_struct *tsk, unsigned long data)
{
..
...
/*
* We shoud have at least an inactive breakpoint at this
* slot. It means the user is writing dr7 without having
* written the address register first
*/
if (!bp) {
rc = -EINVAL;
break;
}

I was just about confused...thinking that the above condition would
become true during second_pass, but alas it turns out that you restore
"thread->ptrace_bps[i] = bp" again later.

rc = arch_bp_generic_fields(len, type, &gen_len, &gen_type);
if (rc)
break;

/*
* This is a temporary thing as bp is unregistered/registered
* to simulate modification
*/
bp = modify_user_hw_breakpoint(bp, bp->attr.bp_addr, gen_len,
gen_type, bp->callback,
tsk, true);

modify_user_hw_breakpoint() is called twice (once per pass) and in its
current implementation, it would leave open a window for register
grabbing on two occasions. Another reason to change its implementation
soon...

thread->ptrace_bps[i] = NULL;

Why not remove this line from here...

if (!bp) { /* incorrect bp, or we have a bug in bp API */
rc = -EINVAL;
break;
}
if (IS_ERR(bp)) {
rc = PTR_ERR(bp);
bp = NULL;
break;
}
thread->ptrace_bps[i] = bp;

...and put it here inside a condition "if (second_pass)"?

}
/*
* Make a second pass to free the remaining unused breakpoints
* or to restore the original breakpoints if an error occurred.
*/
if (!second_pass) {
second_pass = 1;
if (rc < 0) {
orig_ret = rc;
data = old_dr7;
}
goto restore;
}
return ((orig_ret < 0) ? orig_ret : rc);
}

Thanks,
K.Prasad

2009-11-12 04:25:14

by K.Prasad

[permalink] [raw]

On Sat, Nov 28, 2009 at 12:37:05AM +0530, K.Prasad wrote:
> I think the register_<> interfaces can become wrappers around functions
> that do the following:
>
> - arch_validate(): Validate request by invoking an arch-dependant
> routine. Proceed if returned valid.
> - arch-specific debugreg availability: Do something like
> if (arch_hw_breakpoint_availabile())
> bp = perf_event_create_kernel_counter();

The current state is settled for Bp Api clients
(perf_event_create_kernel_counter()) and perf clients (perf syscall)
to have the same endpoint, which is the arch_validate() + reg reservation.
This is already what is done. It's just done at the pmu level.

I don't understand your point.

> perf_event_create_kernel_counter()--->arch_install_hw_breakpoint();

But this is what is done, when perf_event_alloc() get the breakpoint
pmu.

> This way, all book-keeping related work (no. of pinned/flexible/per-cpu)
> will be moved to arch-specific files (will be helpful for PPC Book-E
> implementation having two types of debug registers). Every new
> architecture that intends to port to the new hw-breakpoint
> implementation must define their arch_validate(),
> arch_hw_breakpoint_available() and an arch_install_hw_breakpoint(),
> while the hw-breakpoint code will be flexible enough to extend itself to
> each of these archs.

We just need to move reserve_bp_slot() and release_bp_slot() in arch code
then.

But I would prefer to move more efforts in generalizing what can
generalized in the register reservation topic, and have the smallest
possible part in arch code.

> This implementation would be even superior (in terms of extensibility)
> to even the older hw-breakpoint layer implementation (despite it providing
> a working layer for x86 and PPC64).

The older breakpoint API was very tight to x86. It had a single linear
breakpoint refcounting that wasn't handling different natures of breakpoint
registers (different registers between instruction and data).
Neither was this linear refcounting handling the fact a single cpu could
share a single breakpoint register between hardware threads.

So, no I don't think it was providing a working layer for PPC64.

That said the current state of the constraints sucks :) as it is too much
tight to x86 too.

But I want to avoid the trap of moving all the constraint checks
to the arch. If possible, I would like to extract the arch specificity
only.
It's possible that in the case of PPC64, we want a totally different
constraint check, but what about other archs? Do they have very close
needs than x86 or another bunck of tricky constraints? In the former
case, I would prefer to have the current constraints defined as
__weak and let the tricky archs implement their own constraints.
That will only work if the _tricky_ arch are a minority of course.

> > > As pointed out in [email protected] and
> > > [email protected], ptrace requests can a) lose register
> > > slots when modifying the breakpoint addresses and b) new implementation
> > > assumes that every DR7 write to be preceded by a write on DR0-DR3 which
> > > need not be true.
> >
> > The a) case is going to be fixed.
> > But the b) situation must be reported as a user mistake (which is what is
> > done currently): -EINVAL, -EIO or whatever. Enabling a breakpoint without
> > having given an address is a userland bug.
> >
>
> b) need not be a user mistake always (except perhaps the first time). As I
> mentioned here [email protected], DR7 enable/disable
> without a DR0-DR3 write can be done by the user through ptrace for
> optimising the number of write operations (and hence ptrace syscalls).

I really think this is a wrong workflow as the address register
is undefined.

I think this is a user bug. In the current upstream state, I guess
the addr debugreg are initialized to 0. So is this going to
set a breakpoint to 0?

I doubt there are much user app that rely on such buggy behaviour,
but I can probably support that by creating a temporary disabled
breakpoint in this case.

> Consider the following steps which is entirely valid (in mainline ptrace)
> but which would fail if assumed that a DR0-DR3 write precedes a DR7 write:
> i) Set address on DR0
> ii) Enable bits corresponding to DR0 in DR7
> iii) Disable DR0 bits in DR7
> iv) Re-enable DR0 bits in DR7

Agreed. I need to fix this. I thought you were talking about enabling
dr0 in dr7 without having ever set dr0.

Ok I'll fix this case.

Thanks.