2007-05-11 15:25:25

by Alan Stern

Subject: Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)

On Wed, 28 Mar 2007, Roland McGrath wrote:

> Sorry I've been slow in responding to your most recent version.
> I fell into a large hole and couldn't get out until I fixed some bugs.

Has the same thing happened again? There hasn't been any feedback on the
most recent version of hw_breakpoint emailed on April 13:

http://marc.info/?l=linux-kernel&m=117661223820357&w=2

I think there are probably still a few small things wrong with it. For
instance, the RF setting isn't right; I misunderstood the Intel manual.
It should get set only when the latest debug interrupt was for an
instruction breakpoint.

Alan Stern


2007-05-13 10:40:07

by Roland McGrath

Subject: Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)

Sorry again about the delay.

> I trust we are moving closer to a final, usable form.

Indeed, I think it is getting there.

> I think there are probably still a few small things wrong with it. For
> instance, the RF setting isn't right; I misunderstood the Intel manual.
> It should get set only when the latest debug interrupt was for an
> instruction breakpoint.

This makes me think about RF a little more. If you ever set it, there are
some places we need to clear it too. That is, when the PC is being changed
before returning to user mode, which is in signals and in ptrace. If the
PC is changing to other than the breakpoint location hit by the handler
that set RF, we need to clear RF so that the first instruction at the changed
PC can be a breakpoint hit of its own and not get masked. In fact, it may
also be necessary to clear RF when freshly setting a new instruction
breakpoint (when RF is set because the stop was not a debug exception at
all), so that it isn't skipped if the PC happens to be right there already.
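
A minimal sketch of the two RF-clearing cases described above, assuming
a simplified register frame. The struct and function names are
hypothetical and are not taken from the patch; only the RF bit position
(bit 16 of EFLAGS) is architectural.

```c
#include <stdint.h>

#define X86_EFLAGS_RF 0x00010000u   /* Resume Flag, bit 16 of EFLAGS */

/* Hypothetical stand-in for the saved user-mode register frame. */
struct saved_regs {
    uintptr_t pc;       /* saved instruction pointer */
    uint32_t  eflags;   /* saved EFLAGS image */
};

/*
 * Redirect execution, as signal delivery or a ptrace register write
 * would.  If the PC really moves, drop RF so that a breakpoint on the
 * first instruction at the new PC is not masked.
 */
static void warp_pc(struct saved_regs *regs, uintptr_t new_pc)
{
    if (regs->pc != new_pc)
        regs->eflags &= ~X86_EFLAGS_RF;
    regs->pc = new_pc;
}

/*
 * A new instruction breakpoint was just set at bp_addr.  If the thread
 * is already stopped there, a stale RF would hide the first hit.
 */
static void new_insn_breakpoint(struct saved_regs *regs, uintptr_t bp_addr)
{
    if (regs->pc == bp_addr)
        regs->eflags &= ~X86_EFLAGS_RF;
}
```

This ignores the case where RF was set by a hit at that same address;
it only illustrates where the clearing would have to happen.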

> Come to think of it, we don't really need modify_user_hw_breakpoint at
> all. It could be replaced by an {unregister(old); register(new);}
> sequence. Unless you think there's some pressing reason to keep it, my
> inclination is to do away with it.

I sort of wondered from the beginning why it was there. The rationale I
can see is to avoid flutter. That is, when unregistering frees up a slot
for a lower-priority allocation waiting in the wings, and then the new
registration will just displace it again. The priority list diddling is
wasted work to get back to just how it was before, but more importantly you
don't want to have those callbacks for a momentarily-available slot coming
and going. I don't know if this can really come up with the current code.

> Hmm... Maybe I could store a pointer to the DR6 value in args.err instead
> of the value itself...

Ugh.

> As I understand it, setting one of those bits is necessary on the 386 but
> not necessary for later processors. Should this be controlled by a
> runtime (or compile time) check? For that matter, do those bits have any
> effect at all on a Pentium?

I've never heard of anyone using them, but I don't know the full story.

> My Intel manual says that the CPU automatically sets the RF bit in the
> EFLAGS image stored on the stack by the debug exception. Hence the
> handler doesn't have to worry about it. That's why I removed it from the
> existing code.

The documentation I have says that RF is set in the trap frame on the stack
(i.e. pt_regs.eflags) by every other kind of exception. However, for a
debug exception that is due to an instruction breakpoint, RF=0 in the trap
frame and the manual explicitly says that the handler must set the bit so
that iret will resume and execute it rather than hit the breakpoint again.

[later:]
> It also turns out that some CPUs don't automatically set the RF bit in
> the EFLAGS image on the stack. Intel recommends that the OS always set
> that bit whenever a debug exception occurs, so that's what I did.

Is this really "some CPUs"? Or is it actually always as I described above
(i.e. RF set usually but cleared for an instruction breakpoint hit)?

> If callers want to give up when a kernel breakpoint isn't installed
> immediately, all they have to do is check the return value from
> register_kernel_hw_breakpoint and call unregister_kernel_hw_breakpoint.
> If you really want it, I could add an extra "fail if not installed"
> argument flag.

The important thing is that there aren't any difficult races (i.e. what you
get with callbacks). If register with no callback followed by unregister
on seeing "registered but not installed" return value is simple and cheap,
that is fine.

> For user breakpoints, the whole notion is almost meaningless. Even if the
> breakpoint was allocated a debug register initially, it could get
> displaced by the time the debuggee task next runs.

It's no less meaningful than for a kernel allocation. In neither case is
there a guarantee you'll keep it forever. What the callers I had in mind
want is a quick answer when the answer is negative at the time of the call,
so they can just punt on the complexity of dealing with a positive answer.

> Again, this was referring to existing code which I basically copied
> without fully understanding. Does the new code in do_debug do the right
> thing with regard to TF?

It looks right to me. That is, it preserves the existing behavior for
kernel-mode traps, and does not touch TF at all for user-mode traps.

> > > + /* Block kernel breakpoint updates from other CPUs */
> > > + local_irq_save(flags);
> >
> > I have a feeling this is more costly than we want, though I don't really
> > know. It seems to me that things in struct cpu_hw_breakpoint are not
> > really per-CPU, except for bp_task. They are "current global state",
> > right?
>
> Not really, since changes to the debug registers on multiple CPUs cannot
> be made simultaneously. There will be short periods when different CPUs
> have different debug register values. What if a debug exception occurs
> during one of those periods?

I think it's fine if a CPU getting an exception before it's processed the
IPI looks at changed global state and says "oh, mine was stale", and punts
the hit. (Or perhaps it transmogrifies its apparent DR# based on the new
global state, if the CPU's old setting corresponds to one of the new
settings. Probably the changing of settings can just preserve the old DR#
selection in such cases and simplify the situation for the handler doing
the catch-up to just if (old->dr[n] != new->dr[n]) ignore;.)
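
The catch-up check could be sketched roughly as follows, assuming the
update path keeps an unchanged breakpoint in the same DR# slot. The
struct and function names are hypothetical; HB_NUM is taken to be 4 as
on x86.

```c
#include <stdbool.h>
#include <stdint.h>

#define HB_NUM 4    /* number of debug address registers on x86 */

/* Hypothetical snapshot of the kernel breakpoint addresses. */
struct kernel_bp_state {
    uintptr_t dr[HB_NUM];
};

/*
 * A CPU that has not yet processed the IPI took a hit on slot n.
 * If the slot holds the same address in the old and new state, the
 * hit is still valid; otherwise the CPU's setting was stale and the
 * hit is ignored.
 */
static bool hit_is_stale(const struct kernel_bp_state *old_state,
                         const struct kernel_bp_state *new_state, int n)
{
    return old_state->dr[n] != new_state->dr[n];
}
```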

> Or what if a task switch occurs?

You mean a context switch before the IPI gets in?
switch_to_thread_hw_breakpoint can just install the latest global state.

> Here's the latest take on the hw_breakpoint patch. I adopted most of your
> suggestions. There still isn't a .bits member, but or'ing the .len and
> .type members together will give you essentially the same thing; both of
> those values are now completely encoded.

I'd still prefer to have a single machine-dependent field and not have .len.

> The hot path in switch_to_thread_hw_breakpoint() should now be very fast.
> There's a minimal amount of additional activity needed to deal with kernel
> breakpoint updates that might arrive in the middle of a context switch.

It looks promising.

I'm not entirely sanguine about an 8-bit gennum. For the kernel
settings, it's going to be fine--there won't be 256 updates before all
the CPUs process their IPIs. But for the thbi->gennum comparison, a
thread might very well not have run for days, while there have been
many more updates than that, and its gennum%256 matching the current
one or not is just luck.

You may need some memory barriers around the switching/restart stuff.
In fact, I think it would be better not to delve into reinventing the
low-level bits there at all. Instead use read_seqcount_retry there
(linux/seqlock.h). Using the value returned by read_seqcount_begin as the
number to compare in thbi would also give a 32-bit sequence number.

I don't see why notify_all_threads ever needs to be used. The sequence
number changed, so the next switch in will always update. I guess
that's how you were avoiding the untrustworthy 8-bit sequence number
issue. But I think it's better to do the whole thing with seqcount and
rely on 32-bit sequence numbers being good enough to let thread updates
be entirely lazy.
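
To illustrate the pattern, here is a userspace sketch of a sequence
counter built on C11 atomics. It is not the kernel's seqcount_t API;
all names are made up for the example, and writers are assumed to be
serialized by some outer lock.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for a sequence counter. */
struct seq_sketch {
    atomic_uint sequence;   /* odd while an update is in progress */
};

/* Writer side: mark the update as started (sequence becomes odd). */
static void write_begin(struct seq_sketch *s)
{
    atomic_fetch_add_explicit(&s->sequence, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
}

/* Writer side: mark the update as finished (sequence becomes even). */
static void write_end(struct seq_sketch *s)
{
    atomic_fetch_add_explicit(&s->sequence, 1, memory_order_release);
}

/* Reader side: wait out any in-progress update, return the even value. */
static unsigned read_begin(struct seq_sketch *s)
{
    unsigned v;

    do {
        v = atomic_load_explicit(&s->sequence, memory_order_acquire);
    } while (v & 1);
    return v;
}

/* True if an update started since read_begin() returned 'start'. */
static bool read_retry(struct seq_sketch *s, unsigned start)
{
    atomic_thread_fence(memory_order_acquire);
    return atomic_load_explicit(&s->sequence, memory_order_relaxed) != start;
}

/* Lazy per-thread check at switch-in: compare the remembered value. */
static bool thread_needs_update(struct seq_sketch *s, unsigned seen)
{
    return read_begin(s) != seen;
}
```

With a full-width counter, a thread that has been asleep for a long
time still compares unequal after any update, so nothing has to walk
the thread list.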

> I'll go through the file and see which parts really can be shared. It
> might end up being less than you think.
>
> Note that doing this would necessarily create a bunch of new public
> symbols. Routines that I now have declared static wouldn't be able to
> remain that way.
[later:]
> I didn't try to split hw_breakpoint.c apart into sharable and non-sharable
> pieces. At this stage it's not entirely clear which routines would have
> to go on each side. For example, processors with separate sets of debug
> registers for execute and data breakpoints would require a substantial
> change to the existing code. Probably all the lists and arrays would have
> to be duplicated, with one copy for execute breakpoints and one for data
> breakpoints.
>
> If you eliminate all routines that refer to HB_NUM or dr7, that really
> doesn't leave much sharable code. The routines which qualify tend to be
> relatively short; I think the largest one is flush_thread_hw_breakpoint().

It looks to me like there is quite a lot to be shared. Of course the
code can refer to constants like HB_NUM, they just have to be defined
per machine. The dr7 stuff can all be a couple of simple arch_foo
hooks, which will be empty on other machines. All of the list-managing
logic, the prio stuff, etc., would be bad to copy.

The two flavors could probably be accommodated cleanly with an
HB_TYPES_NUM macro that's 1 on x86 and 2 on ia64, and is used in loops
around some of the calls. I'm not suggesting you try to figure out
that code structure ahead of time. But I don't think it will be a big
barrier to code sharing.

> It turns out that on some processors the CPU does reset DR6 sometimes.
> Intel's documentation is wonderfully vague: "Certain debug exceptions may
> clear bits 0-3." And it appears that gdb relies on this behavior; it
> distinguishes correctly among multiple breakpoints on a vanilla kernel but
> not under the previous version of hw_breakpoint.

So it sounds like maybe the real behavior is that any dr[0-3]-induced
exception resets the DR_TRAP[0-3] bits to just the new hit, but not the
other bits (i.e. just DR_STEP in practice). Is that part true on all CPUs?

> I decided the safest course was to have do_debug() clear tsk->thread.vdr6
> whenever any of the four breakpoint bits is set in the real DR6. More
> sophisticated behavior would be possible at the cost of adding an extra
> flag to tsk->thread.

I'm not sure what you have in mind using a new thread flag. To be
consistent with existing (and machine) behavior, shouldn't that clear
only the low (DR_TRAP[0-3]) bits when one of those bits is set?
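
Roughly, the update being asked about would look like the sketch
below. The bit values match the usual x86 DR6 layout; the function
name, and the choice to OR the hardware bits into the virtual value
afterwards, are assumptions for illustration.

```c
#define DR_TRAP0     0x1UL      /* breakpoint 0 hit */
#define DR_TRAP1     0x2UL      /* breakpoint 1 hit */
#define DR_TRAP2     0x4UL      /* breakpoint 2 hit */
#define DR_TRAP3     0x8UL      /* breakpoint 3 hit */
#define DR_TRAP_BITS (DR_TRAP0 | DR_TRAP1 | DR_TRAP2 | DR_TRAP3)
#define DR_STEP      0x4000UL   /* single-step trap */

/*
 * Fold the hardware DR6 value into the thread's virtual DR6.  When the
 * new exception reports a breakpoint hit, only the old DR_TRAP[0-3]
 * bits are dropped; other bits such as DR_STEP are left alone.
 */
static unsigned long merge_vdr6(unsigned long vdr6, unsigned long hw_dr6)
{
    if (hw_dr6 & DR_TRAP_BITS)
        vdr6 &= ~DR_TRAP_BITS;
    return vdr6 | hw_dr6;
}
```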

> Finally, I put in a couple of #ifdef's to make the same source work under
> both i386 and x86_64, although I haven't tried building it. You might
> want to check and make sure that part of validate_settings() is correct.

That looks fine.

I'd like to see this concretely working on x86_64 as well as i386.
That should be a simple matter of the new header file and the makefile
patches to share the code. I can test on x86_64 if you can't.

Do you have some simple test cases prepared? That is, some simple
modules using the generic kernel hw_breakpoint support to readily
report working or not working on basic functionality. I'd like to have
something we can agree on as the baseline smoke test for trying the
patches, and for new machine ports.

I also want to get this machine-independent code sharing going for
real. I'd like to have powerpc working as the non-x86 demonstration
before we declare things in good shape. I don't expect you to write
any powerpc support, but I hope I can get you to do the arch code
separation to make the way for it. If you'll take a crack at it, I'll
fill in and test the powerpc bits and I think we'll get something very
satisfactory ironed out pretty fast.

So consider the powerpc64 situation and imagine how you would do the
implementation for it, and I think you'll find a lot of the code you've
written is naturally shared for it. It's a bit of a degenerate case,
because HB_NUM is 1, but that needn't really matter. There are only
data address breakpoints of length 8 with an aligned address, so the
only control info aside from the address is r/w bits. There is no
separate control register. The control bits are stored in the low bits
of the register whose high bits are the high bits of the aligned
address. (I think other machines store their control bits the same
way.) So in fact, not only is there no need for .len, but .type is
actually just bits that could be stored directly in address.va (if
no one expected to look at that for the address, or they used an
accessor that masks off the low bits). But there are bits to spare
there next to .priority, so keeping them separate doesn't hurt. What's
important is that the chbi->dabr and thbi->dabr fields are stored in
fully-encoded form for quick switching.
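
A sketch of that encoding, assuming the read and write control bits
live in the low bits of the 8-byte-aligned address. The macro and
function names are illustrative, the exact bit assignments should be
checked against the processor manual, and any further control bits in
the low three bits are left out.

```c
#include <stdint.h>

#define DABR_DATA_READ  0x1ULL  /* assumed: trap on loads */
#define DABR_DATA_WRITE 0x2ULL  /* assumed: trap on stores */
#define DABR_CTRL_MASK  0x7ULL  /* low bits freed by 8-byte alignment */

/* Combine an address and r/w bits into one ready-to-load value. */
static uint64_t encode_dabr(uint64_t va, uint64_t rw)
{
    return (va & ~DABR_CTRL_MASK) |
           (rw & (DABR_DATA_READ | DABR_DATA_WRITE));
}

/* Accessor that masks off the control bits to recover the address. */
static uint64_t dabr_address(uint64_t dabr)
{
    return dabr & ~DABR_CTRL_MASK;
}
```

Storing the encoded value means the context-switch path has nothing to
compute; it just writes the register.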


Thanks,
Roland

2007-05-14 15:42:31

by Alan Stern

Subject: Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)

On Sun, 13 May 2007, Roland McGrath wrote:

> This makes me think about RF a little more. If you ever set it, there are
> some places we need to clear it too. That is, when the PC is being changed
> before returning to user mode, which is in signals and in ptrace. If the
> PC is changing to other than the breakpoint location hit by the handler
> that set RF, we need to clear RF so that the first instruction at the changed
> PC can be a breakpoint hit of its own and not get masked. In fact, it may
> also be necessary to clear RF when freshly setting a new instruction
> breakpoint (when RF is set because the stop was not a debug exception at
> all), so that it isn't skipped if the PC happens to be right there already.

It seems to me that signal handlers must run with a copy of the original
EFLAGS stored on the stack. Otherwise, when the handler returned, the
former context wouldn't be fully restored. But I don't know enough about
the signal handling code to see how to turn off RF in the stored EFLAGS
image.

Also, what if the signal handler was entered as a result of encountering
an instruction breakpoint? In that case you would want to keep RF on to
prevent an infinite loop.

You're right about wanting to clear RF when changing the PC via ptrace or
when setting a new execution breakpoint (provided the new breakpoint's
address is equal to the current PC value).

Do you know how gdb handles instruction breakpoints, and in particular,
how it resumes execution after a breakpoint?


> > Come to think of it, we don't really need modify_user_hw_breakpoint at
> > all. It could be replaced by an {unregister(old); register(new);}
> > sequence. Unless you think there's some pressing reason to keep it, my
> > inclination is to do away with it.
>
> I sort of wondered from the beginning why it was there. The rationale I
> can see is to avoid flutter. That is, when unregistering frees up a slot
> for a lower-priority allocation waiting in the wings, and then the new
> registration will just displace it again. The priority list diddling is
> wasted work to get back to just how it was before, but more importantly you
> don't want to have those callbacks for a momentarily-available slot coming
> and going. I don't know if this can really come up with the current code.

That may be what I originally had in mind; I no longer remember.

But it doesn't matter. We're up against an API incompatibility here.
gdb doesn't allow you to modify breakpoints; it forces you to delete the
old one and add a new one. It's only an artifact of the x86 architecture
that gdb implements this by reusing debug registers. So even if the
modify_user_hw_breakpoint() routine were kept, gdb wouldn't really want to
make use of it.

Under the circumstances I think we should just leave it out.


> > As I understand it, setting one of those bits is necessary on the 386 but
> > not necessary for later processors. Should this be controlled by a
> > runtime (or compile time) check? For that matter, do those bits have any
> > effect at all on a Pentium?
>
> I've never heard of anyone using them, but I don't know the full story.

On the 386, either GE or LE had to be set for DR breakpoints to work
properly. Later on (I don't remember if it was in the 486 or the Pentium)
this restriction was removed. I don't know whether those bits do anything
at all on modern CPUs.


> The documentation I have says that RF is set in the trap frame on the stack
> (i.e. pt_regs.eflags) by every other kind of exception. However, for a
> debug exception that is due to an instruction breakpoint, RF=0 in the trap
> frame and the manual explicitly says that the handler must set the bit so
> that iret will resume and execute it rather than hit the breakpoint again.
>
> [later:]
> > It also turns out that some CPUs don't automatically set the RF bit in
> > the EFLAGS image on the stack. Intel recommends that the OS always set
> > that bit whenever a debug exception occurs, so that's what I did.
>
> Is this really "some CPUs"? Or is it actually always as I described above
> (i.e. RF set usually but cleared for an instruction breakpoint hit)?

My 80386 Programmer's Reference Manual says:

... an instruction-address breakpoint exception is a fault.

And:

When it detects a fault, the processor automatically sets
RF in the flags image that it pushes onto the stack.

And:

The processor automatically sets RF in the EFLAGS image
on the stack before entry into any fault handler. Upon
entry into the fault handler for instruction address
breakpoints, for example, RF is set in the EFLAGS image
on the stack...

That seems to be pretty clear. So the behavior can vary according to the
processor type.


> > If callers want to give up when a kernel breakpoint isn't installed
> > immediately, all they have to do is check the return value from
> > register_kernel_hw_breakpoint and call unregister_kernel_hw_breakpoint.
> > If you really want it, I could add an extra "fail if not installed"
> > argument flag.
>
> The important thing is that there aren't any difficult races (i.e. what you
> get with callbacks). If register with no callback followed by unregister
> on seeing "registered but not installed" return value is simple and cheap,
> that is fine.

I suppose you might register a breakpoint and find that it isn't installed
immediately, but then it could get installed and actually trigger before
you managed to unregister it. Does that count as a "difficult race"?
Presumably the work done by the trigger callback would get ignored.


> > > > + /* Block kernel breakpoint updates from other CPUs */
> > > > + local_irq_save(flags);
> > >
> > > I have a feeling this is more costly than we want, though I don't really
> > > know. It seems to me that things in struct cpu_hw_breakpoint are not
> > > really per-CPU, except for bp_task. They are "current global state",
> > > right?
> >
> > Not really, since changes to the debug registers on multiple CPUs cannot
> > be made simultaneously. There will be short periods when different CPUs
> > have different debug register values. What if a debug exception occurs
> > during one of those periods?
>
> I think it's fine if a CPU getting an exception before it's processed the
> IPI looks at changed global state and says "oh, mine was stale", and punts
> the hit. (Or perhaps it transmogrifies its apparent DR# based on the new
> global state, if the CPU's old setting corresponds to one of the new
> settings. Probably the changing of settings can just preserve the old DR#
> selection in such cases and simplify the situation for the handler doing
> the catch-up to just if (old->dr[n] != new->dr[n]) ignore;.)

Punting isn't acceptable, not if the bp in question was present both
before and after the IPI. I'd rather transmogrify it as you described,
awkward though that may be.

Maybe it doesn't have to be so bad. If there were _two_ global copies of
the kernel bp settings, one for the old pre-IPI state and one for the new,
then the handler could simply look up the DR# in the appropriate copy.
This would remove the need to store the settings in the per-CPU area.
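
One way the two-copy idea might be laid out, as a sketch only: the
copies are indexed by the parity of a generation counter, which relies
on a CPU never lagging by more than one update. All names here are
hypothetical, and the locking and IPI handshake are omitted.

```c
#include <stdint.h>

#define HB_NUM 4    /* number of debug address registers on x86 */

/* Hypothetical global copy of the kernel breakpoint addresses. */
struct kernel_bp_copy {
    uintptr_t addr[HB_NUM];
};

static struct kernel_bp_copy copies[2]; /* indexed by generation parity */
static unsigned cur_gen;                /* generation of the newest copy */

/* Install new settings in the slot not used by the current generation. */
static void publish(const struct kernel_bp_copy *next)
{
    copies[(cur_gen + 1) & 1] = *next;
    cur_gen++;
}

/*
 * Debug handler side: a CPU still on the previous generation looks up
 * its DR# in the old copy, an up-to-date CPU in the new one.
 */
static const struct kernel_bp_copy *copy_for_cpu(unsigned cpu_gen)
{
    return &copies[cpu_gen & 1];
}
```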


> > Here's the latest take on the hw_breakpoint patch. I adopted most of your
> > suggestions. There still isn't a .bits member, but or'ing the .len and
> > .type members together will give you essentially the same thing; both of
> > those values are now completely encoded.
>
> I'd still prefer to have a single machine-dependent field and not have .len.

It's a relatively minor issue. On machines with fixed-length breakpoints,
the .len field can be ignored. Conversely, leaving it out would require
using bitmasks to extract the type and length values from a combined .bits
field. I don't see any advantage.


> I'm not entirely sanguine about an 8-bit gennum. For the kernel
> settings, it's going to be fine--there won't be 256 updates before all
> the CPUs process their IPIs. But for the thbi->gennum comparison, a
> thread might very well not have run for days, while there have been
> many more updates than that, and its gennum%256 matching the current
> one or not is just luck.

Ah, you haven't understood the purpose of the gennum. In fact 8 bits
isn't too small -- far from it! It's too _large_; a single bit would
suffice. I made it an 8-bit value just because that was easier.

Here's the idea. thbi->gennum is at all times either equal to the current
gennum value or is set to -1. That's what notify_all_threads() does; it
sets thbi->gennum to -1 in all tasks currently being debugged whenever a
change to the kernel breakpoints occurs. My assumption is that almost all
of the time there will be very few debuggees.

The main use of gennum is with chbi->gennum, which is at all times equal
to the current gennum value or the previous one (if the CPU hasn't yet
received the update IPI). Hence chbi->gennum needs to distinguish between
only two values: current or previous.

Note that CPUs can never lag behind by more than one update. The
hw_breakpoint_mutex doesn't get released until every CPU has acknowledged
receipt of the IPI.
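
In outline, the thread-side scheme works like the sketch below. The
names are simplified stand-ins for the ones in the patch, and the
sketch uses a plain int with -1 as the invalid marker instead of the
8-bit field.

```c
#include <stddef.h>

/* Simplified stand-in for the per-thread breakpoint info. */
struct thbi_sketch {
    int gennum;     /* current generation, or -1 if invalidated */
};

static int cur_gennum;  /* generation of the kernel breakpoint settings */

/* Invalidate every thread currently being debugged. */
static void notify_all_threads_sketch(struct thbi_sketch *debugged, size_t n)
{
    for (size_t i = 0; i < n; i++)
        debugged[i].gennum = -1;
}

/* A kernel breakpoint update: bump the generation, then invalidate. */
static void kernel_bp_update(struct thbi_sketch *debugged, size_t n)
{
    cur_gennum = (cur_gennum + 1) & 0xff;
    notify_all_threads_sketch(debugged, n);
}

/*
 * Switch-in path: returns 1 if the thread's cached settings had to be
 * recomputed, 0 if the fast path was taken.
 */
static int switch_in(struct thbi_sketch *t)
{
    if (t->gennum == cur_gennum)
        return 0;
    /* ...recompute the thread's debug register values here... */
    t->gennum = cur_gennum;
    return 1;
}
```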


> You may need some memory barriers around the switching/restart stuff.
> In fact, I think it would be better not to delve into reinventing the
> low-level bits there at all. Instead use read_seqcount_retry there
> (linux/seqlock.h). Using the value returned by read_seqcount_begin as the
> number to compare in thbi would also give a 32-bit sequence number.
>
> I don't see why notify_all_threads ever needs to be used. The sequence
> number changed, so the next switch in will always update. I guess
> that's how you were avoiding the untrustworthy 8-bit sequence number
> issue. But I think it's better to do the whole thing with seqcount and
> rely on 32-bit sequence numbers being good enough to let thread updates
> be entirely lazy.

Yes, that was the idea. However seqcounts may work better in conjunction
with this idea of keeping a global copy of both the old and the new kernel
breakpoints. I'll look into it.


> It looks to me like there is quite a lot to be shared. Of course the
> code can refer to constants like HB_NUM, they just have to be defined
> per machine. The dr7 stuff can all be a couple of simple arch_foo
> hooks, which will be empty on other machines. All of the list-managing
> logic, the prio stuff, etc., would be bad to copy.
>
> The two flavors could probably be accommodated cleanly with an
> HB_TYPES_NUM macro that's 1 on x86 and 2 on ia64, and is used in loops
> around some of the calls. I'm not suggesting you try to figure out
> that code structure ahead of time. But I don't think it will be a big
> barrier to code sharing.

Hmmm, maybe. Those loops would end up looking messy.


> > It turns out that on some processors the CPU does reset DR6 sometimes.
> > Intel's documentation is wonderfully vague: "Certain debug exceptions may
> > clear bits 0-3." And it appears that gdb relies on this behavior; it
> > distinguishes correctly among multiple breakpoints on a vanilla kernel but
> > not under the previous version of hw_breakpoint.
>
> So it sounds like maybe the real behavior is that any dr[0-3]-induced
> exception resets the DR_TRAP[0-3] bits to just the new hit, but not the
> other bits (i.e. just DR_STEP in practice). Is that part true on all CPUs?

No. The 80386 manual says:

Note that the bits of DR6 are never cleared by the processor.

It's important to bear in mind that not all x86 CPUs are made by Intel,
and of those that are, not all are Pentium 4's. This appears to be an
area of high variability so we should be as conservative as possible.

> > I decided the safest course was to have do_debug() clear tsk->thread.vdr6
> > whenever any of the four breakpoint bits is set in the real DR6. More
> > sophisticated behavior would be possible at the cost of adding an extra
> > flag to tsk->thread.
>
> I'm not sure what you have in mind using a new thread flag. To be
> consistent with existing (and machine) behavior, shouldn't that clear
> only the low (DR_TRAP[0-3]) bits when one of those bits is set?

I could do that. I don't know what happens to DR_STEP; a quick test might
be worthwhile.


> I'd like to see this concretely working on x86_64 as well as i386.
> That should be a simple matter of the new header file and the makefile
> patches to share the code. I can test on x86_64 if you can't.
>
> Do you have some simple test cases prepared? That is, some simple
> modules using the generic kernel hw_breakpoint support to readily
> report working or not working on basic functionality. I'd like to have
> something we can agree on as the baseline smoke test for trying the
> patches, and for new machine ports.

I'll put together a simple test module for kernel breakpoints. It's
already possible to test user breakpoints just by running gdb.

> I also want to get this machine-independent code sharing going for
> real. I'd like to have powerpc working as the non-x86 demonstration
> before we declare things in good shape. I don't expect you to write
> any powerpc support, but I hope I can get you to do the arch code
> separation to make the way for it. If you'll take a crack at it, I'll
> fill in and test the powerpc bits and I think we'll get something very
> satisfactory ironed out pretty fast.
>
> So consider the powerpc64 situation and imagine how you would do the
> implementation for it, and I think you'll find a lot of the code you've
> written is naturally shared for it. It's a bit of a degenerate case,
> because HB_NUM is 1, but that needn't really matter. There are only
> data address breakpoints of length 8 with an aligned address, so the
> only control info aside from the address is r/w bits. There is no
> separate control register. The control bits are stored in the low bits
> of the register whose high bits are the high bits of the aligned
> address. (I think other machines store their control bits the same
> way.) So in fact, not only is there no need for .len, but .type is
> actually just bits that could be stored directly in address.va (if
> no one expected to look at that for the address, or they used an
> accessor that masks off the low bits). But there are bits to spare
> there next to .priority, so keeping them separate doesn't hurt. What's
> important is that the chbi->dabr and thbi->dabr fields are stored in
> fully-encoded form for quick switching.

I'll see what I can do.

In this situation you don't need to worry about how .type and .len are
stored. On powerpc64 we can have a special thbi->dabr field analogous to
the thbi->tdr7 field on x86. All precomputed and ready for quick
switching.

Even if HB_NUM were larger than 1, we could still store two copies of the
address value (the second copy with the low-order type bits set).

Alan Stern

2007-05-14 21:25:27

by Roland McGrath

Subject: Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)

> It seems to me that signal handlers must run with a copy of the original
> EFLAGS stored on the stack.

Of course. I'm talking about how the registers get changed to set up the
signal handler to start running, not how the interrupted registers are
saved on the user stack. There is no issue with the stored eflags image;
the "privileged" flags like RF are ignored by sigreturn anyway.

> Also, what if the signal handler was entered as a result of encountering
> an instruction breakpoint?

This does not happen in reality. Breakpoints can only be set by the
debugger, not by the program itself. The debugger should always eat the trap.

> You're right about wanting to clear RF when changing the PC via ptrace or
> when setting a new execution breakpoint (provided the new breakpoint's
> address is equal to the current PC value).

Starting a signal handler is "warping the PC" equivalent to changing it via
ptrace for purposes of this discussion. In case the new PC is the site of
another breakpoint, RF must be clear.

> Do you know how gdb handles instruction breakpoints, and in particular,
> how it resumes execution after a breakpoint?

AFAICT it never actually uses hardware instruction breakpoints, only data
watchpoints. I wouldn't be surprised if no one has ever really used
instruction breakpoint settings in x86 hardware debug registers on Linux.
(Frankly, I don't much expect them to start either. This level of detail
about instruction breakpoints is largely academic. I am a stickler for
getting the details right if we're going to allow using them at all.
But I think really everyone only cares about data watchpoints.)

> But it doesn't matter. We're up against an API incompatibility here.

That's a red herring. gdb is the compatibility case, not the real API user.

> Under the circumstances I think we should just leave it out.

That is fine. If the flutter issue comes up, we can address it later.

> On the 386, either GE or LE had to be set for DR breakpoints to work
> properly. Later on (I don't remember if it was in the 486 or the Pentium)
> this restriction was removed. I don't know whether those bits do anything
> at all on modern CPUs.

I'm moderately sure they do nothing on modern CPUs. Intel says they're
ignored as of Pentium, but recommends setting both bits if you care at all.
In practice, I don't think we'll ever hear about the inexactness on a
pre-Pentium processor from not setting the bits. But I'd follow the Intel
manual and set both.
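
As a sketch, setting both bits amounts to the following. LE is bit 8
and GE is bit 9 of DR7; the helper name is made up for the example.

```c
#define DR_LOCAL_SLOWDOWN  0x100UL  /* LE: local exact breakpoint enable */
#define DR_GLOBAL_SLOWDOWN 0x200UL  /* GE: global exact breakpoint enable */

/* Return the DR7 value with both exact-match bits set. */
static unsigned long dr7_with_exact_bits(unsigned long dr7)
{
    return dr7 | DR_LOCAL_SLOWDOWN | DR_GLOBAL_SLOWDOWN;
}
```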

> My 80386 Programmer's Reference Manual says:

The earlier quote I gave was from an AMD64 manual. A 1995 Intel manual I
have says, "All Intel Architecture processors manage the RF flag as follows,"
and proceeds to give the "all faults except instruction breakpoint" behavior
I quoted from the AMD manual earlier. Hence I sincerely doubt that this
varies among Intel and AMD processors. Someone else will have to help us
know about other makers' processors. So far I have no reason to suspect that
any processor behaves differently (aside from generic cynicism ;-).

> I suppose you might register a breakpoint and find that it isn't installed
> immediately, but then it could get installed and actually trigger before
> you managed to unregister it. Does that count as a "difficult race"?

Yes, that is really the kind of thing I had in mind. For user breakpoints it
shouldn't be an issue, since the thread shouldn't have been let run in between.

> Presumably the work done by the trigger callback would get ignored.

That is in the "difficult race" category to ensure. I would not presume.

> Maybe it doesn't have to be so bad. If there were _two_ global copies of
> the kernel bp settings, one for the old pre-IPI state and one for the new,
> then the handler could simply look up the DR# in the appropriate copy.
> This would remove the need to store the settings in the per-CPU area.

I think that is what I suggested an iteration or two ago. Installing new
state means making a fresh data structure and installing a pointer to it,
leaving the old (immutable) one to be freed by RCU.
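
A rough user-space sketch of that copy-and-publish pattern, using C11 atomics as a stand-in for rcu_assign_pointer/rcu_dereference. The struct layout and all names here are assumptions for illustration; writers are assumed to be serialized by a mutex, and the caller is left to free the old copy only once no reader can still hold it (in the kernel, after an RCU grace period).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

#define HB_NUM 4

/* Immutable once published: readers never see a half-updated table. */
struct kbp_state {
    unsigned long addr[HB_NUM];   /* address loaded into each debug register */
    unsigned long dr7;            /* matching control-register value */
};

static _Atomic(struct kbp_state *) cur_state;

/*
 * Build a fresh copy with slot n changed and publish it.  On success
 * the previous table (possibly NULL) is handed back through old_out;
 * it must not be freed until every reader is done with it.
 */
static int kbp_publish(unsigned n, unsigned long addr, unsigned long dr7,
                       struct kbp_state **old_out)
{
    struct kbp_state *old =
        atomic_load_explicit(&cur_state, memory_order_relaxed);
    struct kbp_state *fresh = calloc(1, sizeof(*fresh));

    assert(n < HB_NUM);
    if (!fresh)
        return -1;
    if (old)
        *fresh = *old;            /* start from the current settings */
    fresh->addr[n] = addr;
    fresh->dr7 = dr7;
    /* Release ordering: the filled-in table is visible before the pointer. */
    atomic_store_explicit(&cur_state, fresh, memory_order_release);
    *old_out = old;
    return 0;
}

/* Reader side: one pointer load yields a consistent snapshot. */
static const struct kbp_state *kbp_read(void)
{
    return atomic_load_explicit(&cur_state, memory_order_acquire);
}
```

The point of the design is that a handler running on another CPU sees either the whole old table or the whole new one, never a mixture.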

> It's a relatively minor issue. On machines with fixed-length breakpoints,
> the .len field can be ignored. Conversely, leaving it out would require
> using bitmasks to extract the type and length values from a combined .bits
> field. I don't see any advantage.

I guess my main objection to having .type and .len is the false implied
documentation of their presence and names, leading to people thinking they
can look at those values. In fact, they are machine-specific and
implementation-specific bits of no intrinsic use to anyone else.

> Ah, you haven't understood the purpose of the gennum. In fact 8 bits
> isn't too small -- far from it! It's too _large_; a single bit would
> suffice. I made it an 8-bit value just because that was easier.

If it's actually a flag, then treating it any other way is just confusing.
I can't see how it's easier for anyone.

> Note that CPUs can never lag behind by more than one update. The
> hw_breakpoint_mutex doesn't get released until every CPU has acknowledged
> receipt of the IPI.

Then it really is just a flag for all uses, and there's no reason at all to
call it a number.

> Yes, that was the idea. However seqcounts may work better in conjunction
> with this idea of keeping a global copy of both the old and the new kernel
> breakpoints. I'll look into it.

I think that is going to be the clean and sane approach.
Hand-rolling your low-level synchronization code is always questionable.

> > So it sounds like maybe the real behavior is that any dr[0-3]-induced
> > exception resets the DR_TRAP[0-3] bits to just the new hit, but not the
> > other bits (i.e. just DR_STEP in practice). Is that part true on all CPUs?
>
> No. The 80386 manual says:
>
> Note that the bits of DR6 are never cleared by the processor.
>
> It's important to bear in mind that not all x86 CPUs are made by Intel,
> and of those that are, not all are Pentium 4's. This appears to be an
> area of high variability so we should be as conservative as possible.

That line from the manual is what we were both going on originally, and
then you described the conflicting behavior. I was trying to ascertain
whether chips really do vary, or if the manual was just inaccurate about
the single common way it actually behaves. I take it you have in fact
observed different behaviors on different chips?

There are two possible kinds of "conservative" here. To be conservative
with respect to the existing behavior on a given chip, whatever that may
be, we should never clear %dr6 completely, and instead should always
mirror its bits to vdr6, only mapping the low four bits around to present
the virtualized order. The only bits we'd ever clear in hardware are
those DR_TRAPn bits corresponding to the registers allocated to non-ptrace
uses, and kprobes should clear DR_STEP. And note that when vdr6 is
changed by ptrace, we should reset the hardware %dr6 accordingly, to match
existing kernel behavior should users change debugreg[6] via ptrace.

To be conservative in the sense of reliable user-level behavior despite
chip oddities would be a little different. Firstly, I think we should
mirror all the "extra" bits from hardware to vdr6 blindly, i.e. everything
but DR_STEP and DR_TRAPn. That way if any chip comes along that sets new
bits for new features or whatnot, users can at least see the new hardware
bits via ptrace before hw_breakpoint gets updated to support them more
directly. For the low four bits, I think what users expect is that no
bits are ever implicitly cleared, so they accumulate to say which drN has
hit since the last time ptrace was used to clear vdr6.

> Even if HB_NUM were larger than 1, we could still store two copies of the
> address value (the second copy with the low-order type bits set).

There's no reason to waste another word when you only need two bits and
already have spare space for a machine implementation field (i.e. where .type
is now).


Thanks,
Roland

2007-05-16 19:03:31

by Alan Stern

Subject: Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)

On Mon, 14 May 2007, Roland McGrath wrote:

> > It seems to me that signal handlers must run with a copy of the original
> > EFLAGS stored on the stack.
>
> Of course. I'm talking about how the registers get changed to set up the
> signal handler to start running, not how the interrupted registers are
> saved on the user stack. There is no issue with the stored eflags image;
> the "privileged" flags like RF are ignored by sigreturn anyway.

Ah, okay. Yes, clearly the new EFLAGS for the signal handler should
have RF turned off. This should always be true, regardless of
debugging.
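
A small sketch of the two RF rules discussed so far: the handler always starts with RF clear, and RF is dropped whenever the PC is moved away from the breakpoint that set it. The helper names are hypothetical; only the RF bit position (bit 16 of EFLAGS) is taken from the architecture.

```c
#include <assert.h>
#include <stdint.h>

#define X86_EFLAGS_RF 0x00010000u   /* resume flag, bit 16 of EFLAGS */

/* Flags image a signal handler should start with: RF is always off. */
static uint32_t handler_entry_eflags(uint32_t interrupted)
{
    return interrupted & ~X86_EFLAGS_RF;
}

/*
 * When ptrace or signal delivery moves the PC, keep RF only if the new
 * PC is still the breakpoint location whose hit caused RF to be set.
 */
static uint32_t eflags_after_pc_change(uint32_t eflags,
                                       unsigned long bp_pc,
                                       unsigned long new_pc)
{
    if (new_pc != bp_pc)
        eflags &= ~X86_EFLAGS_RF;
    return eflags;
}
```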

> > Also, what if the signal handler was entered as a result of encountering
> > an instruction breakpoint?
>
> This does not happen in reality. Breakpoints can only be set by the
> debugger, not by the program itself. The debugger should always eat the trap.

Hmmm. I put in a little extra code to account for the possibility that
a program might want to set hardware breakpoints in itself. Should
this be removed?

> The earlier quote I gave was from an AMD64 manual. A 1995 Intel manual I
> have says, "All Intel Architecture processors manage the RF flag as follows,"
> and proceeds to give the "all faults except instruction breakpoint" behavior
> I quoted from the AMD manual earlier. Hence I sincerely doubt that this
> varies among Intel and AMD processors. Someone else will have to help us
> know about other makers' processors. So far I have no reason to suspect that
> any processor behaves differently (aside from generic cynicism ;-).

And I no longer have any 386 CPUs to test...

> > It's a relatively minor issue. On machines with fixed-length breakpoints,
> > the .len field can be ignored. Conversely, leaving it out would require
> > using bitmasks to extract the type and length values from a combined .bits
> > field. I don't see any advantage.
>
> I guess my main objection to having .type and .len is the false implied
> documentation of their presence and names, leading to people thinking they
> can look at those values. In fact, they are machine-specific and
> implementation-specific bits of no intrinsic use to anyone else.

The fact that they are machine-specific and implementation-specific
doesn't necessarily make them of no use. See the driver below.

> That line from the manual is what we were both going on originally, and
> then you described the conflicting behavior. I was trying to ascertain
> whether chips really do vary, or if the manual was just inaccurate about
> the single common way it actually behaves. I take it you have in fact
> observed different behaviors on different chips?

No; I have tested only a couple of systems and I don't have a wide
variety of machines available.

> There are two possible kinds of "conservative" here. To be conservative
> with respect to the existing behavior on a given chip, whatever that may
> be, we should never clear %dr6 completely, and instead should always
> mirror its bits to vdr6, only mapping the low four bits around to present
> the virtualized order. The only bits we'd ever clear in hardware are
> those DR_TRAPn bits corresponding to the registers allocated to non-ptrace
> uses, and kprobes should clear DR_STEP. And note that when vdr6 is
> changed by ptrace, we should reset the hardware %dr6 accordingly, to match
> existing kernel behavior should users change debugreg[6] via ptrace.
>
> To be conservative in the sense of reliable user-level behavior despite
> chip oddities would be a little different. Firstly, I think we should
> mirror all the "extra" bits from hardware to vdr6 blindly, i.e. everything
> but DR_STEP and DR_TRAPn. That way if any chip comes along that sets new
> bits for new features or whatnot, users can at least see the new hardware
> bits via ptrace before hw_breakpoint gets updated to support them more
> directly. For the low four bits, I think what users expect is that no
> bits are ever implicitly cleared, so they accumulate to say which drN has
> hit since the last time ptrace was used to clear vdr6.

Allow me to rephrase: When a debug exception occurs, the real DR6 value
should be copied to vdr6, except that kprobes should adjust DR_STEP and
hw_breakpoint should adjust the DR_TRAPn bits appropriately. There's
some question about what value the debug exception handler should write
back to DR6, if anything. When switching to a new task, the DR_TRAPn
bits in vdr6 could be de-virtualized somehow and the result loaded into
DR6, but again, it might be safest to leave DR6 alone.
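
To make the rephrasing concrete, here is a sketch of how the value reported to the debugger might be derived from the hardware DR6. The mapping table and the step_is_kernel flag are assumptions for illustration, not part of the posted patch; the DR_STEP value and the position of the four trap bits are the architectural ones.

```c
#include <assert.h>
#include <stdint.h>

#define HB_NUM        4
#define DR_TRAP_BITS  0xfu          /* B0..B3 */
#define DR_STEP       0x4000u       /* BS */

/*
 * phys_to_virt[i] is the ptrace-visible slot that physical debug
 * register i currently backs, or -1 if register i belongs to a kernel
 * breakpoint.  step_is_kernel is nonzero when the single-step was
 * requested by kprobes rather than by the task's debugger.
 */
static uint32_t make_vdr6(uint32_t hw_dr6, const int phys_to_virt[HB_NUM],
                          int step_is_kernel)
{
    uint32_t vdr6 = hw_dr6 & ~DR_TRAP_BITS;   /* pass other bits through */
    int i;

    for (i = 0; i < HB_NUM; i++) {
        int v = phys_to_virt[i];

        assert(v < HB_NUM);
        if ((hw_dr6 & (1u << i)) && v >= 0)
            vdr6 |= 1u << v;        /* report the hit in virtual order */
    }
    if (step_is_kernel)
        vdr6 &= ~DR_STEP;           /* kprobes keeps its own single-steps */
    return vdr6;
}
```

Hits on kernel-owned registers simply disappear from the user-visible value, and nothing is written back to the real DR6 here.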

As for what users expect of the low four bits, you are definitely
wrong. My tests with gdb show that it relies on the CPU to clear those
bits whenever a data breakpoint is hit; it doesn't clear them itself
and it doesn't work properly if the kernel keeps virtualized versions
of them set. That's on a Pentium 4 and on an AMD Duron.

I did some testing to see how the CPU behaves when the debug handler
writes different values back to DR6. The results were:

Values written back to DR6 were retained in the register until
the next debug exception occurred.

When the exception handler read DR6, the 0xffff0ff0 bits were
set every time. The 0x00001000 bit was never set, even if it
had been turned on before the exception occurred.

No matter what values were stored in the low four bits
beforehand, when the exception occurred DR6 had only the
bit for the debug register which was triggered.

If the handler wrote back any of BS, BT, or BD to DR6, then
the system misbehaved. I don't know exactly what happened,
but my shell process ended and the debug handler got called
over and over again (as if stuck in a loop) for several
seconds.

In light of these results, the best approach appears to be either to
leave DR6 alone or to set it to 0.


Below is a patch containing a driver meant for testing kernel hardware
breakpoints. Instructions are in the comments at the top. You can
build the driver by typing "make M=bptest" at the top level.

The patch also adjusts the Alt-SysRq-P handler to print out the debug
register values along with all the other stuff.

Alan Stern



Index: usb-2.6/bptest/Makefile
===================================================================
--- /dev/null
+++ usb-2.6/bptest/Makefile
@@ -0,0 +1 @@
+obj-m += bptest.o
Index: usb-2.6/bptest/bptest.c
===================================================================
--- /dev/null
+++ usb-2.6/bptest/bptest.c
@@ -0,0 +1,459 @@
+/*
+ * Test driver for hardware breakpoints.
+ *
+ * Copyright (C) 2007 Alan Stern <[email protected]>
+ */
+
+/*
+ * When this driver is loaded, it will create several attribute files
+ * under /sys/bus/platform/drivers/bptest:
+ *
+ * call, read, write, and bp0,..., bp3.
+ *
+ * It also allocates a 32-byte array (called "bytes") for testing data
+ * breakpoints, and it contains four do-nothing routines, r0(),..., r3(),
+ * for testing execution breakpoints.
+ *
+ * Writing to the "call" attribute causes the rN routines to be called;
+ * "echo >call N" will call rN(), where N is 0, 1, 2, or 3. Similarly,
+ * "echo >call" will call all four routines.
+ *
+ * The byte array can be accessed through the "read" and "write"
+ * attributes. "echo >read N" will read bytes[N], and "echo >write N V"
+ * will store V in bytes[N], where N is between 0 and 31. There is
+ * no provision for multi-byte accesses; they shouldn't be needed for
+ * simple testing.
+ *
+ * The driver contains four hw_breakpoint structures, which can be
+ * accessed through the "bpN" attributes. Reading the attribute file
+ * will yield the hw_breakpoint's current settings. The settings can be
+ * altered by writing the attribute. The format to use is:
+ *
+ * echo >bpN priority type address [len]
+ *
+ * priority must be a number between 0 and 255. type must be one of 'e'
+ * (execution), 'r' (read), 'w' (write), or 'b' (both read/write).
+ * address must be a number between 0 and 31; if type is 'e' then address
+ * must be between 0 and 3. len must be 1, 2, 4, or 8, but if type is 'e'
+ * then len is optional and ignored.
+ *
+ * Execution breakpoints are set on the rN routine and data breakpoints
+ * are set on bytes[N], where N is the address value. You can unregister
+ * a breakpoint by doing "echo >bpN u", where 'u' is any non-digit.
+ *
+ * (Note: On i386 certain values are not implemented. len cannot be set
+ * to 8 and type cannot be set to 'r'.)
+ *
+ * The driver prints lots of information to the system log as it runs.
+ * To best see things as they happen, use a VT console and set the
+ * logging level high (I use Alt-SysRq-9).
+ */
+
+
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <asm/hw_breakpoint.h>
+
+MODULE_AUTHOR("Alan Stern <[email protected]>");
+MODULE_DESCRIPTION("Hardware Breakpoint test driver");
+MODULE_LICENSE("GPL");
+
+
+static struct hw_breakpoint bps[4];
+
+
+#define NUM_BYTES 32
+static unsigned char bytes[NUM_BYTES] __attribute__((aligned(8)));
+
+/* Write n to read bytes[n] */
+static ssize_t read_store(struct device_driver *d, const char *buf,
+ size_t count)
+{
+ int n = -1;
+
+ if (sscanf(buf, "%d", &n) < 1 || n < 0 || n >= NUM_BYTES) {
+ printk(KERN_WARNING "bptest: read: invalid index %d\n", n);
+ return -EINVAL;
+ }
+ printk(KERN_INFO "bptest: read: bytes[%d] = %d\n", n, bytes[n]);
+ return count;
+}
+static DRIVER_ATTR(read, 0200, NULL, read_store);
+
+/* Write n v to set bytes[n] = v */
+static ssize_t write_store(struct device_driver *d, const char *buf,
+ size_t count)
+{
+ int n = -1;
+ int v;
+
+ if (sscanf(buf, "%d %d", &n, &v) < 2 || n < 0 || n >= NUM_BYTES) {
+ printk(KERN_WARNING "bptest: write: invalid index %d\n", n);
+ return -EINVAL;
+ }
+ bytes[n] = v;
+ printk(KERN_INFO "bptest: write: bytes[%d] <- %d\n", n, v);
+ return count;
+}
+static DRIVER_ATTR(write, 0200, NULL, write_store);
+
+
+/* Dummy routines for testing instruction breakpoints */
+static void r0(void)
+{
+ printk(KERN_INFO "This is r%d\n", 0);
+}
+static void r1(void)
+{
+ printk(KERN_INFO "This is r%d\n", 1);
+}
+static void r2(void)
+{
+ printk(KERN_INFO "This is r%d\n", 2);
+}
+static void r3(void)
+{
+ printk(KERN_INFO "This is r%d\n", 3);
+}
+
+static void (*rtns[])(void) = {
+ r0, r1, r2, r3
+};
+
+
+/* Write n to call routine r##n, or a blank line to call them all */
+static ssize_t call_store(struct device_driver *d, const char *buf,
+ size_t count)
+{
+ int n;
+
+ if (sscanf(buf, "%d", &n) == 0) {
+ printk(KERN_INFO "bptest: call all routines\n");
+ r0();
+ r1();
+ r2();
+ r3();
+ } else if (n >= 0 && n < 4) {
+ printk(KERN_INFO "bptest: call r%d\n", n);
+ rtns[n]();
+ } else {
+ printk(KERN_WARNING "bptest: call: invalid index: %d\n", n);
+ count = -EINVAL;
+ }
+ return count;
+}
+static DRIVER_ATTR(call, 0200, NULL, call_store);
+
+
+/* Breakpoint callbacks */
+static void bptest_triggered(struct hw_breakpoint *bp, struct pt_regs *regs)
+{
+ printk(KERN_INFO "Breakpoint %d triggered\n", (int) (bp - bps));
+}
+
+static void bptest_installed(struct hw_breakpoint *bp)
+{
+ printk(KERN_INFO "Breakpoint %d installed\n", (int) (bp - bps));
+}
+
+static void bptest_uninstalled(struct hw_breakpoint *bp)
+{
+ printk(KERN_INFO "Breakpoint %d uninstalled\n", (int) (bp - bps));
+}
+
+
+/* Breakpoint attribute files for testing */
+static ssize_t bp_show(int n, char *buf)
+{
+ struct hw_breakpoint *bp = &bps[n];
+ int a, len, type;
+
+ if (!bp->status)
+ return sprintf(buf, "bp%d: unregistered\n", n);
+
+ len = -1;
+ switch (bp->len) {
+#ifdef HW_BREAKPOINT_LEN_1
+ case HW_BREAKPOINT_LEN_1: len = 1; break;
+#endif
+#ifdef HW_BREAKPOINT_LEN_2
+ case HW_BREAKPOINT_LEN_2: len = 2; break;
+#endif
+#ifdef HW_BREAKPOINT_LEN_4
+ case HW_BREAKPOINT_LEN_4: len = 4; break;
+#endif
+#ifdef HW_BREAKPOINT_LEN_8
+ case HW_BREAKPOINT_LEN_8: len = 8; break;
+#endif
+ }
+
+ type = '?';
+ switch (bp->type) {
+#ifdef HW_BREAKPOINT_READ
+ case HW_BREAKPOINT_READ: type = 'r'; break;
+#endif
+#ifdef HW_BREAKPOINT_WRITE
+ case HW_BREAKPOINT_WRITE: type = 'w'; break;
+#endif
+#ifdef HW_BREAKPOINT_RW
+ case HW_BREAKPOINT_RW: type = 'b'; break;
+#endif
+#ifdef HW_BREAKPOINT_EXECUTE
+ case HW_BREAKPOINT_EXECUTE: type = 'e'; break;
+#endif
+ }
+
+ a = -1;
+ if (type == 'e') {
+ if (bp->address.kernel == r0)
+ a = 0;
+ else if (bp->address.kernel == r1)
+ a = 1;
+ else if (bp->address.kernel == r2)
+ a = 2;
+ else if (bp->address.kernel == r3)
+ a = 3;
+ } else {
+ const unsigned char *p = bp->address.kernel;
+
+ if (p >= bytes && p < bytes + NUM_BYTES)
+ a = p - bytes;
+ }
+
+ return sprintf(buf, "bp%d: %d %c %d %d [%sinstalled]\n",
+ n, bp->priority, type, a, len,
+ (bp->status < HW_BREAKPOINT_INSTALLED ? "not " : ""));
+}
+
+static ssize_t bp_store(int n, const char *buf, size_t count)
+{
+ struct hw_breakpoint *bp = &bps[n];
+ int prio, a, len;
+ char type;
+ int i;
+
+ if (count <= 1) {
+ printk(KERN_INFO "bptest: bp%d: format: priority type "
+ "address len\n", n);
+ printk(KERN_INFO " type = r, w, b, or e; address = 0 - 31; "
+ "len = 1, 2, 4, or 8\n");
+ printk(KERN_INFO " Write any non-digit to unregister\n");
+ return count;
+ }
+
+ unregister_kernel_hw_breakpoint(bp);
+ printk(KERN_INFO "bptest: bp%d unregistered\n", n);
+
+ len = -1;
+ i = sscanf(buf, "%d %c %d %d", &prio, &type, &a, &len);
+ if (i == 0)
+ return count;
+ if (i < 3) {
+ printk(KERN_WARNING "bptest: bp%d: too few fields\n", n);
+ return -EINVAL;
+ }
+
+ bp->priority = prio;
+ switch (type) {
+#ifdef HW_BREAKPOINT_EXECUTE
+ case 'e':
+ bp->type = HW_BREAKPOINT_EXECUTE;
+ bp->len = HW_BREAKPOINT_LEN_EXECUTE;
+ break;
+#endif
+#ifdef HW_BREAKPOINT_READ
+ case 'r':
+ bp->type = HW_BREAKPOINT_READ;
+ break;
+#endif
+#ifdef HW_BREAKPOINT_WRITE
+ case 'w':
+ bp->type = HW_BREAKPOINT_WRITE;
+ break;
+#endif
+#ifdef HW_BREAKPOINT_RW
+ case 'b':
+ bp->type = HW_BREAKPOINT_RW;
+ break;
+#endif
+ default:
+ printk(KERN_WARNING "bptest: bp%d: invalid type %c\n",
+ n, type);
+ return -EINVAL;
+ }
+
+ if (a < 0 || a >= NUM_BYTES || (a >= 4 && type == 'e')) {
+ printk(KERN_WARNING "bptest: bp%d: invalid address %d\n",
+ n, a);
+ return -EINVAL;
+ }
+ if (type == 'e')
+ bp->address.kernel = rtns[a];
+ else {
+ bp->address.kernel = &bytes[a];
+
+ switch (len) {
+#ifdef HW_BREAKPOINT_LEN_1
+ case 1: bp->len = HW_BREAKPOINT_LEN_1; break;
+#endif
+#ifdef HW_BREAKPOINT_LEN_2
+ case 2: bp->len = HW_BREAKPOINT_LEN_2; break;
+#endif
+#ifdef HW_BREAKPOINT_LEN_4
+ case 4: bp->len = HW_BREAKPOINT_LEN_4; break;
+#endif
+#ifdef HW_BREAKPOINT_LEN_8
+ case 8: bp->len = HW_BREAKPOINT_LEN_8; break;
+#endif
+ default:
+ printk(KERN_WARNING "bptest: bp%d: invalid len %d\n",
+ n, len);
+ return -EINVAL;
+ break;
+ }
+ }
+
+ bp->triggered = bptest_triggered;
+ bp->installed = bptest_installed;
+ bp->uninstalled = bptest_uninstalled;
+
+ i = register_kernel_hw_breakpoint(bp);
+ if (i < 0) {
+ printk(KERN_WARNING "bptest: bp%d: failed to register %d\n",
+ n, i);
+ count = i;
+ } else
+ printk(KERN_INFO "bptest: bp%d registered: %d\n", n, i);
+ return count;
+}
+
+
+static ssize_t bp0_show(struct device_driver *d, char *buf)
+{
+ return bp_show(0, buf);
+}
+static ssize_t bp0_store(struct device_driver *d, const char *buf,
+ size_t count)
+{
+ return bp_store(0, buf, count);
+}
+static DRIVER_ATTR(bp0, 0600, bp0_show, bp0_store);
+
+static ssize_t bp1_show(struct device_driver *d, char *buf)
+{
+ return bp_show(1, buf);
+}
+static ssize_t bp1_store(struct device_driver *d, const char *buf,
+ size_t count)
+{
+ return bp_store(1, buf, count);
+}
+static DRIVER_ATTR(bp1, 0600, bp1_show, bp1_store);
+
+static ssize_t bp2_show(struct device_driver *d, char *buf)
+{
+ return bp_show(2, buf);
+}
+static ssize_t bp2_store(struct device_driver *d, const char *buf,
+ size_t count)
+{
+ return bp_store(2, buf, count);
+}
+static DRIVER_ATTR(bp2, 0600, bp2_show, bp2_store);
+
+static ssize_t bp3_show(struct device_driver *d, char *buf)
+{
+ return bp_show(3, buf);
+}
+static ssize_t bp3_store(struct device_driver *d, const char *buf,
+ size_t count)
+{
+ return bp_store(3, buf, count);
+}
+static DRIVER_ATTR(bp3, 0600, bp3_show, bp3_store);
+
+
+static int bptest_probe(struct platform_device *pdev)
+{
+ return -ENODEV;
+}
+
+static int bptest_remove(struct platform_device *pdev)
+{
+ return 0;
+}
+
+static struct platform_driver bptest_driver = {
+ .probe = bptest_probe,
+ .remove = bptest_remove,
+ .driver = {
+ .name = "bptest",
+ .owner = THIS_MODULE,
+ }
+};
+
+
+static struct driver_attribute *(bptest_group[]) = {
+ &driver_attr_bp0,
+ &driver_attr_bp1,
+ &driver_attr_bp2,
+ &driver_attr_bp3,
+ &driver_attr_call,
+ &driver_attr_read,
+ &driver_attr_write,
+ NULL
+};
+
+static int add_files(void)
+{
+ int rc = 0;
+ struct driver_attribute **g;
+
+ for (g = bptest_group; *g; ++g) {
+ rc = driver_create_file(&bptest_driver.driver, *g);
+ if (rc)
+ break;
+ }
+ return rc;
+}
+
+static void remove_files(void)
+{
+ struct driver_attribute **g;
+
+ for (g = bptest_group; *g; ++g)
+ driver_remove_file(&bptest_driver.driver, *g);
+}
+
+static int __init bptest_init(void)
+{
+ int rc;
+
+ rc = platform_driver_register(&bptest_driver);
+ if (rc) {
+ printk(KERN_ERR "Failed to register bptest driver: %d\n", rc);
+ return rc;
+ }
+ rc = add_files();
+ if (rc) {
+ remove_files();
+ platform_driver_unregister(&bptest_driver);
+ return rc;
+ }
+ printk(KERN_INFO "bptest loaded\n");
+ return 0;
+}
+
+static void __exit bptest_exit(void)
+{
+ int n;
+
+ remove_files();
+ for (n = 0; n < 4; ++n)
+ unregister_kernel_hw_breakpoint(&bps[n]);
+ platform_driver_unregister(&bptest_driver);
+ printk(KERN_INFO "bptest unloaded\n");
+}
+
+module_init(bptest_init);
+module_exit(bptest_exit);
Index: usb-2.6/arch/i386/kernel/process.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/process.c
+++ usb-2.6/arch/i386/kernel/process.c
@@ -296,6 +296,7 @@ __setup("idle=", idle_setup);
void show_regs(struct pt_regs * regs)
{
unsigned long cr0 = 0L, cr2 = 0L, cr3 = 0L, cr4 = 0L;
+ unsigned long d0, d1, d2, d3;

printk("\n");
printk("Pid: %d, comm: %20s\n", current->pid, current->comm);
@@ -320,6 +321,17 @@ void show_regs(struct pt_regs * regs)
cr3 = read_cr3();
cr4 = read_cr4_safe();
printk("CR0: %08lx CR2: %08lx CR3: %08lx CR4: %08lx\n", cr0, cr2, cr3, cr4);
+
+ get_debugreg(d0, 0);
+ get_debugreg(d1, 1);
+ get_debugreg(d2, 2);
+ get_debugreg(d3, 3);
+ printk("DR0: %08lx DR1: %08lx DR2: %08lx DR3: %08lx\n",
+ d0, d1, d2, d3);
+ get_debugreg(d2, 6);
+ get_debugreg(d3, 7);
+ printk(" DR6: %08lx DR7: %08lx\n", d2, d3);
+
show_trace(NULL, regs, &regs->esp);
}


2007-05-17 20:39:17

by Alan Stern

Subject: Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)

On Sun, 13 May 2007, Roland McGrath wrote:

> You may need some memory barriers around the switching/restart stuff.
> In fact, I think it would be better not to delve into reinventing the
> low-level bits there at all. Instead use read_seqcount_retry there
> (linux/seqlock.h). Using that read_seqcount_begin's value as the
> number to compare in thbi would also give a 32-bit sequence number.

I took a look at seqlock.h. It turns out not to be a good match for my
requirements; the header file specifically says that it won't work with
data that contains pointers. But changing over to regular 32-bit
sequence numbers was straightforward.

The "switching"/"restart" stuff doesn't need memory barriers because
all the communication is between two routines on the same CPU. Nor are
memory barriers needed in the rest of the code for the kernel
breakpoint updates; the IPI mechanism already provides its own.

However there is one oddball case which does seem to require a memory
barrier: when a new CPU comes online (either for the first time or
during return from hibernation). There's a hook to load the initial
debug register values, and it runs in an atomic context so I can't
grab the mutex. The hook is called in two places:

arch/i386/power/cpu.c: fix_processor_context(), and
arch/i386/kernel/smpboot.c: start_secondary().

A memory barrier is necessary to avoid chaos if another CPU should
happen to update the kernel breakpoint settings at the same time. If
you can suggest a way around it, please do.

> It looks to me like there is quite a lot to be shared. Of course the
> code can refer to constants like HB_NUM, they just have to be defined
> per machine. The dr7 stuff can all be a couple of simple arch_foo
> hooks, which will be empty on other machines. All of the list-managing
> logic, the prio stuff, etc., would be bad to copy.
>
> The two flavors could probably be accommodated cleanly with an
> HB_TYPES_NUM macro that's 1 on x86 and 2 on ia64, and is used in loops
> around some of the calls. I'm not suggesting you try to figure out
> that code structure ahead of time. But I don't think it will be a big
> barrier to code sharing.

Okay, if I don't worry about machines with two sets of code & data
debug registers (HB_TYPES_NUM = 2) then yes, quite a lot of the code is
sharable. There will be a few arch-specific hooks to:

Store the values into the debug registers;

Take care of the DR7 calculations;

Do address limit verification (see whether a pointer
lies in user space or kernel space).

Nothing more seems to be needed. Then there will be unsharable code,
including:

Dumping the debug registers while creating an aout-type
core image;

All the legacy ptrace stuff;

The notify-handler itself.

Does all that sound about right?
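
For concreteness, a user-space sketch of how the three hooks might sit underneath shared code. Everything here is a stand-in: the fake_* variables replace real debug registers, FAKE_TASK_SIZE assumes a 3G/1G user/kernel split, and the hook names are invented for this example.

```c
#include <assert.h>

#define HB_NUM 4

/* Stand-ins for hardware and arch constants so the sketch runs in user space. */
static unsigned long fake_dr[HB_NUM];       /* pretend DR0..DR3 */
static unsigned long fake_ctl;              /* pretend DR7 */
#define FAKE_TASK_SIZE 0xc0000000UL         /* assumed 3G/1G user/kernel split */

/* Arch hook: store an address into debug register n. */
static void arch_store_dr(unsigned n, unsigned long addr)
{
    assert(n < HB_NUM);
    fake_dr[n] = addr;
}

/* Arch hook: DR7 calculations on x86; may be empty on other machines. */
static void arch_commit_ctl(unsigned long ctl)
{
    fake_ctl = ctl;
}

/* Arch hook: does [addr, addr + len) lie entirely in user space? */
static int arch_user_range(unsigned long addr, unsigned long len)
{
    return addr < FAKE_TASK_SIZE && len <= FAKE_TASK_SIZE - addr;
}

/* Shared, machine-independent caller of the three hooks. */
static int install_bp(unsigned n, unsigned long addr, unsigned long len,
                      int is_user, unsigned long ctl)
{
    if (n >= HB_NUM || len == 0)
        return -1;
    if (is_user ? !arch_user_range(addr, len) : addr < FAKE_TASK_SIZE)
        return -1;          /* pointer is on the wrong side of the limit */
    arch_store_dr(n, addr);
    arch_commit_ctl(ctl);
    return 0;
}
```

The list management and priority logic would live next to install_bp() in the shared file; only the three small hooks change per architecture.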

> I also want to get this machine-independent code sharing going for
> real. I'd like to have powerpc working as the non-x86 demonstration
> before we declare things in good shape. I don't expect you to write
> any powerpc support, but I hope I can get you to do the arch code
> separation to make the way for it. If you'll take a crack at it, I'll
> fill in and test the powerpc bits and I think we'll get something very
> satisfactory ironed out pretty fast.

How should this be arranged so that it can build okay on all platforms,
even ones where the low-level support code hasn't been written? Maybe
an arch-dependent CONFIG_HW_BREAKPOINT option?

Alan Stern

2007-05-23 08:47:55

by Roland McGrath

Subject: Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)

> > This does not happen in reality. Breakpoints can only be set by the
> > debugger, not by the program itself. The debugger should always eat the trap.
>
> Hmmm. I put in a little extra code to account for the possibility that
> a program might want to set hardware breakpoints in itself. Should
> this be removed?

Do you just mean a register_hw_breakpoint call made on current? That
certainly ought to work. That's still "the debugger", i.e. in utracespeak
the tracing engine. My point was that there will never be a facility
intended for a program to use hw_breakpoint to generate a signal that gets
delivered to a handler in the vanilla way. There's always some "outside"
agent who asked for the breakpoint and who is responsible for responding to
the traps it causes. It is never the program itself, so it never makes sense
for the program to actually see the signal in the end.

> > I guess my main objection to having .type and .len is the false implied
> > documentation of their presence and names, leading to people thinking they
> > can look at those values. In fact, they are machine-specific and
> > implementation-specific bits of no intrinsic use to anyone else.
>
> The fact that they are machine-specific and implementation-specific
> doesn't necessarily make them of no use. See the driver below.

The code in bp_show is exactly the kind of wrong I want to prevent. When I
say they are machine-specific and implementation-specific, I mean there is
no specified part of the interface to which you can presume they correspond
directly. The powerpc implementation will not have any field that is set
to HW_BREAKPOINT_LEN_8 and may well have none set to the type macros
either. If you want to have some machine-specific macros or inlines to
yield the HW_BREAKPOINT_* values for a struct hw_breakpoint, then fine.
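
A sketch of what such accessors could look like, assuming the struct carries one opaque machine field. The struct, enum, and function names are hypothetical; the x86 encoding shown simply reuses the DR7 R/W and LEN bit patterns, and another architecture would supply different bodies behind the same function signatures.

```c
#include <assert.h>

/* Generic, machine-independent answers a caller may ask for. */
enum hwbp_kind { HWBP_EXECUTE, HWBP_WRITE, HWBP_RW };

struct hw_breakpoint_sketch {
    unsigned long address;
    unsigned char arch_bits;   /* opaque: meaning is private to arch code */
};

/* x86 flavor: DR7-style R/W pattern in bits 0-1, LEN pattern in bits 2-3. */
static int hwbp_arch_encode(struct hw_breakpoint_sketch *bp,
                            enum hwbp_kind kind, unsigned len)
{
    unsigned rw, ln;

    switch (kind) {
    case HWBP_EXECUTE: rw = 0; break;
    case HWBP_WRITE:   rw = 1; break;
    case HWBP_RW:      rw = 3; break;
    default:           return -1;
    }
    if (kind == HWBP_EXECUTE)
        len = 1;               /* x86 wants LEN = 00 for execute breakpoints */
    switch (len) {
    case 1:  ln = 0; break;
    case 2:  ln = 1; break;
    case 4:  ln = 3; break;
    case 8:  ln = 2; break;    /* only meaningful on 64-bit capable CPUs */
    default: return -1;
    }
    bp->arch_bits = (unsigned char)(rw | (ln << 2));
    return 0;
}

/* Accessors: the only supported way for other code to look at the settings. */
static unsigned hwbp_get_len(const struct hw_breakpoint_sketch *bp)
{
    static const unsigned char len_of[4] = { 1, 2, 8, 4 };

    return len_of[(bp->arch_bits >> 2) & 3];
}

static enum hwbp_kind hwbp_get_kind(const struct hw_breakpoint_sketch *bp)
{
    switch (bp->arch_bits & 3) {
    case 0:  return HWBP_EXECUTE;
    case 1:  return HWBP_WRITE;
    default: return HWBP_RW;
    }
}
```

With accessors like these, a test driver could print the length and type without ever comparing a struct field against an HW_BREAKPOINT_* constant directly.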

> Allow me to rephrase: When a debug exception occurs, the real DR6 value
> should be copied to vdr6, except that kprobes should adjust DR_STEP and
> hw_breakpoint should adjust the DR_TRAPn bits appropriately. There's
> some question about what value the debug exception handler should write
> back to DR6, if anything.

Agreed.

> As for what users expect of the low four bits, you are definitely
> wrong. My tests with gdb show that it relies on the CPU to clear those
> bits whenever a data breakpoint is hit; it doesn't clear them itself
> and it doesn't work properly if the kernel keeps virtualized versions
> of them set. That's on a Pentium 4 and on an AMD Duron.

Ok. We were both going on what the manual said and I was assuming that
some chip had actually behaved that way and thus that's what users expect.

> Values written back to DR6 were retained in the register until
> the next debug exception occurred.

Ok. This behavior is invisible anyway.

> When the exception handler read DR6, the 0xffff0ff0 bits were
> set every time. The 0x00001000 bit was never set, even if it
> had been turned on before the exception occurred.

Ok. That is not really surprising.

> No matter what values were stored in the low four bits
> beforehand, when the exception occurred DR6 had only the
> bit for the debug register which was triggered.

Ok. This makes the users' expectations make sense. Maybe we can get the
Intel and AMD people to change the manual not to be misleading about this
(it says something terse about "never clears" and without more details I
read it as "never clears any bit, ever").

What about DR_STEP? i.e., if DR_STEP was set from a single-step and then
there was a DR_TRAPn debug exception, is DR_STEP still set? If DR_TRAPn
was set and then you single-step, is DR_TRAPn cleared?

> If the handler wrote back any of BS, BT, or BD to DR6, then
> the system misbehaved. I don't know exactly what happened,
> but my shell process ended and the debug handler got called
> over and over again (as if stuck in a loop) for several
> seconds.

Yowza. That is really surprising.

> In light of these results, the best approach appears to be either to
> leave DR6 alone or to set it to 0.

Agreed. I suspect clearing it to zero is the right thing (given what the
hardware manuals say), even if it appears that DR_STEP and DR_TRAPn do
reset each other on the chips we have on hand.

> Below is a patch containing a driver meant for testing kernel hardware
> breakpoints. Instructions are in the comments at the top. You can
> build the driver by typing "make M=bptest" at the top level.

Thanks.

> The patch also adjusts the Alt-SysRq-P handler to print out the debug
> register values along with all the other stuff.

I think you should post that little patch (and equivalent for x86_64) by
itself. There's no reason that shouldn't go right in.

> I took a look at seqlock.h. It turns out not to be a good match for my
> requirements; the header file specifically says that it won't work with
> data that contains pointers.

There is no black magic about that, it's just saying that seqlock/seqcount
does not do any implicit synchronization with your data structure
management. If the pointers in question are protected by RCU, there is no
problem (if your read_seqcount_retry loop is inside rcu_read_lock). Since
the caller supplies the pointers, not requiring them to be freed by RCU
would be simplest for callers. So what seems natural to me is to have a
simple unsigned long kdr[4] array that's updated by register/unregister
calls (while they hold the mutex to exclude each other).
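
Roughly what I have in mind, written as a user-space sketch with C11 atomics standing in for the kernel's seqcount primitives (this is not the linux/seqlock.h API itself, and the function names are made up). The writer is assumed to hold the mutex, so only one update is ever in flight.

```c
#include <assert.h>
#include <stdatomic.h>

#define HB_NUM 4

static atomic_uint kdr_seq;                 /* even: stable, odd: update in progress */
static _Atomic unsigned long kdr[HB_NUM];   /* global kernel breakpoint addresses */

/* Writer side; callers are assumed to hold a mutex excluding other writers. */
static void kdr_update(unsigned n, unsigned long addr)
{
    assert(n < HB_NUM);
    atomic_fetch_add_explicit(&kdr_seq, 1, memory_order_relaxed);   /* now odd */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&kdr[n], addr, memory_order_relaxed);
    atomic_fetch_add_explicit(&kdr_seq, 1, memory_order_release);   /* even again */
}

/* Reader side: retry until a snapshot is taken with no concurrent update. */
static void kdr_snapshot(unsigned long out[HB_NUM])
{
    unsigned start, i;

    do {
        do {
            start = atomic_load_explicit(&kdr_seq, memory_order_acquire);
        } while (start & 1);                /* wait out an update in progress */
        for (i = 0; i < HB_NUM; i++)
            out[i] = atomic_load_explicit(&kdr[i], memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
    } while (atomic_load_explicit(&kdr_seq, memory_order_relaxed) != start);
}
```

The reader never takes a lock, so it is usable from atomic context; the array holds plain addresses, so no pointer is ever dereferenced during the retry loop.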

> The "switching"/"restart" stuff doesn't need memory barriers because
> all the communication is between two routines on the same CPU. Nor are
> memory barriers needed in the rest of the code for the kernel
> breakpoint updates; the IPI mechanism already provides its own.

Ok. I thought we were talking about using seqlock to safely read from a
single global data set that's updated in place. I can't really see why
anything but bp_task actually needs to be per-cpu.

> A memory barrier is necessary to avoid chaos if another CPU should
> happen to update the kernel breakpoint settings at the same time. If
> you can suggest a way around it, please do.

The natural thing to me would be to just use the same seqcount-based update
style from a global kdr[4] here.

> Take care of the DR7 calculations;

Call it a generic "make it go after setting each debug register".
For most other machines this will be a no-op.

> Do address limit verification (see whether a pointer
> lies in user space or kernel space).

This is probably always a comparison against TASK_SIZE_OF (or against
TASK_SIZE where TASK_SIZE_OF is not defined), but it is probably right
to make it an arch macro.

> Dumping the debug registers while creating an aout-type
> core image;

Ha. Probably no one else has that arcane bit of compatibility to do, in fact.

> All the legacy ptrace stuff;

Right. There is nothing in common about this except for needing something
(so maybe an arch-defined struct inside the struct thread_hw_breakpoint).

> Does all that sound about right?

It does.

> How should this be arranged so that it can build okay on all platforms,
> even ones where the low-level support code hasn't been written? Maybe
> an arch-dependent CONFIG_HW_BREAKPOINT option?

I am no authority on kconfig, so seek other advice.

What kprobes does is a separate "config KPROBES" in each arch/foo/Kconfig.
This means that the details and help text must be given separately for each
one. This is a bug or a feature, depending on whether you dislike
repeating the same help text in several places and having it drift and not
stay uniformly maintained, or you want to include different arch-specific
details in the text.

The other option I see is a single central definition:

config HW_BREAKPOINT
	depends on X86 || X86_64 || ...
	...

AIUI, with this, arch/foo/Kconfig can still have just the lines:

config HW_BREAKPOINT
	depends on !FOOBAR

when the FOOBAR submodel of the foo arch does not have the hardware
support, or for whatever reason an arch adds more constraints to the
generically-defined config option.

The latter one is what I would do, but I might get corrected if I did.


Thanks,
Roland


2007-06-01 19:39:19

by Alan Stern

Subject: Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)

On Wed, 23 May 2007, Roland McGrath wrote:

> > > I guess my main objection to having .type and .len is the false implied
> > > documentation of their presence and names, leading to people thinking they
> > > can look at those values. In fact, they are machine-specific and
> > > implementation-specific bits of no intrinsic use to anyone else.
> >
> > The fact that they are machine-specific and implementation-specific
> > doesn't necessarily make them of no use. See the driver below.
>
> The code in bp_show is exactly the kind of wrong I want to prevent. When I
> say they are machine-specific and implementation-specific, I mean there is
> no specified part of the interface to which you can presume they correspond
> directly. The powerpc implementation will not have any field that is set
> to HW_BREAKPOINT_LEN_8 and may well have none set to the type macros
> either. If you want to have some machine-specific macros or inlines to
> yield the HW_BREAKPOINT_* values for a struct hw_breakpoint, then fine.

I really don't understand your point here. What's wrong with bp_show?
Is it all the preprocessor conditionals? I thought that was how we had
agreed portable code should determine which types and lengths were
supported on a particular architecture.

Consider that the definition of struct hw_breakpoint is in
include/asm-generic/. Hence .type and .len are guaranteed to be
present on all architectures; we can't just leave them out on some
while including them on others. In particular, .len _will_ always be
equal to HW_BREAKPOINT_LEN_8 on PPC. (Of course, you're always
free to define HW_BREAKPOINT_LEN_8 as 0 in the arch-specific header
file if you want, so this doesn't mean as much as it might seem.)

Consider also that .type and .len impose no overhead on architectures
that don't care about them. The space they use up would be wasted
otherwise. It seems that what you want would complicate the x86
implementations significantly without offering any real benefit to
others.

The one thing which makes sense to me is that some architectures might
want to store type and/or length bits together with the address in the
.address field.
So I added documentation explaining that there may be arch-specific
changes to .address while a breakpoint is registered, and I added
arch-specific accessors to fetch the true address value. There are
also arch-specific hooks where those bits can be set and removed.
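
As a rough illustration of what such an accessor could look like on
some other architecture, here is a sketch for a purely hypothetical
layout in which breakpoint addresses are 8-byte aligned and the low
three bits hold the type flags. The names and the layout (bp_pack,
bp_true_address, BP_FLAG_MASK) are assumptions made up for this
example; they are not taken from the patch or from any real port:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: low 3 bits of an 8-byte-aligned address hold flags. */
#define BP_FLAG_MASK	((uintptr_t) 0x7)

/* Hook run at registration time: fold the flag bits into the address. */
static inline uintptr_t bp_pack(uintptr_t addr, unsigned flags)
{
	assert((addr & BP_FLAG_MASK) == 0);	/* address must be aligned */
	return addr | ((uintptr_t) flags & BP_FLAG_MASK);
}

/* Accessor: recover the true address no matter which flag bits are set. */
static inline uintptr_t bp_true_address(uintptr_t packed)
{
	return packed & ~BP_FLAG_MASK;
}

/* Accessor: recover the flag bits stored alongside the address. */
static inline unsigned bp_flags(uintptr_t packed)
{
	return (unsigned) (packed & BP_FLAG_MASK);
}
```

Portable callers would then go through the accessor and never look at
the raw field while the breakpoint is registered.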


> What about DR_STEP? i.e., if DR_STEP was set from a single-step and then
> there was a DR_TRAPn debug exception, is DR_STEP still set? If DR_TRAPn
> was set and then you single-step, is DR_TRAPn cleared?

I didn't experiment with using DR_STEP. There wasn't any simple way to
cause a single-step exception. Perhaps if I were more familiar with
kprobes...

> > If the handler wrote back any of BS, BT, or BD to DR6, then
> > the system misbehaved. I don't know exactly what happened,
> > but my shell process ended and the debug handler got called
> > over and over again (as if stuck in a loop) for several
> > seconds.
>
> Yowza. That is really surprising.

Even more surprising was that it stopped and settled back down to
normal after a little while! I'm not accustomed to seeing infinite
loops come to an end. :-)

> > In light of these results, the best approach appears to be either to
> > leave DR6 alone or to set it to 0.
>
> Agreed. I suspect clearing it to zero is the right thing (given what the
> hardware manuals say), even if it appears that DR_STEP and DR_TRAPn do
> reset each other on the chips we have on hand.

Yes. The new version sets it to 0.
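
Roughly, the decision logic that runs once DR6 has been read and
cleared can be sketched as below. This is a userspace approximation
only: the struct and function names are invented for the example, the
notifier chain and the vm86 case are left out, and the bit values are
the ones from include/asm-i386/debugreg.h:

```c
#include <assert.h>
#include <stdbool.h>

/* DR6 status bits, same values as in include/asm-i386/debugreg.h */
#define DR_TRAP0	0x1UL
#define DR_TRAP1	0x2UL
#define DR_TRAP2	0x4UL
#define DR_TRAP3	0x8UL
#define DR_STEP		0x4000UL
#define DR_TRAP_BITS	(DR_TRAP0 | DR_TRAP1 | DR_TRAP2 | DR_TRAP3)

/* Hypothetical result type, for illustration only */
struct debug_outcome {
	unsigned long vdr6;		/* value stored as the virtual DR6 */
	bool resume_step_later;		/* TF cleared now, re-enabled later */
	bool send_sigtrap;		/* deliver SIGTRAP to the task */
};

/* dr6 is the value read from the hardware before it was reset to 0 */
static struct debug_outcome classify_debug_exception(unsigned long dr6,
		bool from_user_mode)
{
	struct debug_outcome out = { 0, false, false };

	/* A single-step trap taken in kernel mode is ignored for now */
	if ((dr6 & DR_STEP) && !from_user_mode) {
		dr6 &= ~DR_STEP;
		out.resume_step_later = true;
	}
	out.vdr6 = dr6;
	out.send_sigtrap = (dr6 & (DR_STEP | DR_TRAP_BITS)) != 0;
	return out;
}
```

Since the hardware value is always reset, stale DR_STEP or DR_TRAPn
bits from an earlier exception can't leak into this decision.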


> > I took a look at seqlock.h. It turns out not to be a good match for my
> > requirements; the header file specifically says that it won't work with
> > data that contains pointers.
>
> There is no black magic about that, it's just saying that seqlock/seqcount
> does not do any implicit synchronization with your data structure
> management. If the pointers in question are protected by RCU, there is no
> problem (if your read_seqcount_retry loop is inside rcu_read_lock). Since
> the caller supplies the pointers, not requiring them to be freed by RCU
> would be simplest for callers. So what seems natural to me is to have a
> simple unsigned long kdr[4] array that's updated by register/unregister
> calls (while they hold the mutex to exclude each other).

In fact, I don't need the seqcount stuff at all. Just about everything
it provides is already covered by RCU. One of the secrets is to move
the counter (gennum) into the RCU-protected structure.

> Ok. I thought we were talking about using seqlock to safely read from a
> single global data set that's updated in place. I can't really see why
> anything but bp_task actually needs to be per-cpu.

The other secret is to have shared access only to the global data in
the RCU-protected structure, which means storing an array of pointers
to the highest-priority kernel breakpoints there, as you suggest. The
data which gets updated in place then doesn't need to be shared, so it
doesn't need seqlock.

And you're basically right about the per-cpu data. Now it contains
only two values: bp_task and cur_kbpdata (a pointer to the most
recently used version of the RCU-protected data).
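
To make the idea concrete, here is a simplified userspace sketch of
the two-slot arrangement. It uses C11 release/acquire atomics as a
stand-in for rcu_assign_pointer() and rcu_dereference(), and it does
not model waiting for a grace period before a slot is reused. All of
the names below are invented for the example and differ from the ones
in the patch:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define HB_NUM 4

/* Hypothetical snapshot of the kernel breakpoint settings */
struct kbp_snapshot {
	unsigned gennum;		/* generation number */
	int num_kbps;			/* number of kernel breakpoints */
	unsigned long addr[HB_NUM];	/* their addresses */
};

static struct kbp_snapshot snapshots[2];	/* old and new settings */
static int cur_index;				/* alternates 0, 1, ... */
static struct kbp_snapshot *_Atomic cur_snapshot = &snapshots[0];

/* Updater: the caller is assumed to hold the registration mutex */
static void publish_kernel_bps(int num, const unsigned long *addrs)
{
	struct kbp_snapshot *old =
		atomic_load_explicit(&cur_snapshot, memory_order_relaxed);
	struct kbp_snapshot *next;
	int i;

	assert(num >= 0 && num <= HB_NUM);
	cur_index ^= 1;
	next = &snapshots[cur_index];
	next->gennum = old->gennum + 1;
	next->num_kbps = num;
	for (i = 0; i < num; ++i)
		next->addr[i] = addrs[i];

	/* Publish only after the slot is completely filled in */
	atomic_store_explicit(&cur_snapshot, next, memory_order_release);
}

/* Per-CPU cache of the most recently used snapshot */
struct cpu_cache {
	const struct kbp_snapshot *snap;
	unsigned seen_gennum;
};

/* Reader: returns true if the cached settings were out of date */
static bool refresh_if_stale(struct cpu_cache *c)
{
	const struct kbp_snapshot *s =
		atomic_load_explicit(&cur_snapshot, memory_order_acquire);

	if (c->snap == s && c->seen_gennum == s->gennum)
		return false;
	c->snap = s;
	c->seen_gennum = s->gennum;
	return true;
}
```

The generation number lives inside the published structure, so a
reader that sees the new pointer is guaranteed to see the matching
counter as well; no separate sequence counter is needed.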


> > How should this be arranged so that it can build okay on all platforms,
> > even ones where the low-level support code hasn't been written? Maybe
> > an arch-dependent CONFIG_HW_BREAKPOINT option?
>
> I am no authority on kconfig, so seek other advice.

I decided on something simpler than messing around with Kconfig. I put
all the generic code in kernel/hw_breakpoint.c, together with an
explanation that the file isn't meant to be compiled standalone but
instead should be #include'd by the arch-specific file. So things are
nice and separate, and the new routines don't get built into the kernel
unless the arch can use them.

It wasn't so easy to separate out the generic portions of the data
structure definitions, so I didn't bother to try. There are comments
indicating the boundaries between the generic and arch-specific parts.

This is getting pretty close to a final form. The patch below is for
2.6.22-rc3. See what you think...

Alan Stern



Index: usb-2.6/include/asm-i386/hw_breakpoint.h
===================================================================
--- /dev/null
+++ usb-2.6/include/asm-i386/hw_breakpoint.h
@@ -0,0 +1,30 @@
+#ifndef _I386_HW_BREAKPOINT_H
+#define _I386_HW_BREAKPOINT_H
+
+#ifdef __KERNEL__
+#include <asm-generic/hw_breakpoint.h>
+
+/* HW breakpoint address accessors */
+static inline const void *hw_breakpoint_get_kaddr(struct hw_breakpoint *bp)
+{
+ return bp->address.kernel;
+}
+
+static inline const void __user *hw_breakpoint_get_uaddr(struct hw_breakpoint *bp)
+{
+ return bp->address.user;
+}
+
+/* Available HW breakpoint length encodings */
+#define HW_BREAKPOINT_LEN_1 0x40
+#define HW_BREAKPOINT_LEN_2 0x44
+#define HW_BREAKPOINT_LEN_4 0x4c
+#define HW_BREAKPOINT_LEN_EXECUTE 0x40
+
+/* Available HW breakpoint type encodings */
+#define HW_BREAKPOINT_EXECUTE 0x80 /* trigger on instruction execute */
+#define HW_BREAKPOINT_WRITE 0x81 /* trigger on memory write */
+#define HW_BREAKPOINT_RW 0x83 /* trigger on memory read or write */
+
+#endif /* __KERNEL__ */
+#endif /* _I386_HW_BREAKPOINT_H */
Index: usb-2.6/arch/i386/kernel/process.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/process.c
+++ usb-2.6/arch/i386/kernel/process.c
@@ -57,6 +57,7 @@

#include <asm/tlbflush.h>
#include <asm/cpu.h>
+#include <asm/debugreg.h>

asmlinkage void ret_from_fork(void) __asm__("ret_from_fork");

@@ -376,9 +377,10 @@ EXPORT_SYMBOL(kernel_thread);
*/
void exit_thread(void)
{
+ struct task_struct *tsk = current;
+
/* The process may have allocated an io port bitmap... nuke it. */
if (unlikely(test_thread_flag(TIF_IO_BITMAP))) {
- struct task_struct *tsk = current;
struct thread_struct *t = &tsk->thread;
int cpu = get_cpu();
struct tss_struct *tss = &per_cpu(init_tss, cpu);
@@ -396,15 +398,17 @@ void exit_thread(void)
tss->x86_tss.io_bitmap_base = INVALID_IO_BITMAP_OFFSET;
put_cpu();
}
+ if (unlikely(tsk->thread.hw_breakpoint_info))
+ flush_thread_hw_breakpoint(tsk);
}

void flush_thread(void)
{
struct task_struct *tsk = current;

- memset(tsk->thread.debugreg, 0, sizeof(unsigned long)*8);
- memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));
- clear_tsk_thread_flag(tsk, TIF_DEBUG);
+ memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));
+ if (unlikely(tsk->thread.hw_breakpoint_info))
+ flush_thread_hw_breakpoint(tsk);
/*
* Forget coprocessor state..
*/
@@ -447,14 +451,21 @@ int copy_thread(int nr, unsigned long cl

savesegment(gs,p->thread.gs);

+ p->thread.hw_breakpoint_info = NULL;
+ p->thread.io_bitmap_ptr = NULL;
+
tsk = current;
+ err = -ENOMEM;
+ if (unlikely(tsk->thread.hw_breakpoint_info)) {
+ if (copy_thread_hw_breakpoint(tsk, p, clone_flags))
+ goto out;
+ }
+
if (unlikely(test_tsk_thread_flag(tsk, TIF_IO_BITMAP))) {
p->thread.io_bitmap_ptr = kmemdup(tsk->thread.io_bitmap_ptr,
IO_BITMAP_BYTES, GFP_KERNEL);
- if (!p->thread.io_bitmap_ptr) {
- p->thread.io_bitmap_max = 0;
- return -ENOMEM;
- }
+ if (!p->thread.io_bitmap_ptr)
+ goto out;
set_tsk_thread_flag(p, TIF_IO_BITMAP);
}

@@ -484,7 +495,8 @@ int copy_thread(int nr, unsigned long cl

err = 0;
out:
- if (err && p->thread.io_bitmap_ptr) {
+ if (err) {
+ flush_thread_hw_breakpoint(p);
kfree(p->thread.io_bitmap_ptr);
p->thread.io_bitmap_max = 0;
}
@@ -496,18 +508,18 @@ int copy_thread(int nr, unsigned long cl
*/
void dump_thread(struct pt_regs * regs, struct user * dump)
{
- int i;
+ struct task_struct *tsk = current;

/* changed the size calculations - should hopefully work better. lbt */
dump->magic = CMAGIC;
dump->start_code = 0;
dump->start_stack = regs->esp & ~(PAGE_SIZE - 1);
- dump->u_tsize = ((unsigned long) current->mm->end_code) >> PAGE_SHIFT;
- dump->u_dsize = ((unsigned long) (current->mm->brk + (PAGE_SIZE-1))) >> PAGE_SHIFT;
+ dump->u_tsize = ((unsigned long) tsk->mm->end_code) >> PAGE_SHIFT;
+ dump->u_dsize = ((unsigned long) (tsk->mm->brk + (PAGE_SIZE-1))) >> PAGE_SHIFT;
dump->u_dsize -= dump->u_tsize;
dump->u_ssize = 0;
- for (i = 0; i < 8; i++)
- dump->u_debugreg[i] = current->thread.debugreg[i];
+
+ dump_thread_hw_breakpoint(tsk, dump->u_debugreg);

if (dump->start_stack < TASK_SIZE)
dump->u_ssize = ((unsigned long) (TASK_SIZE - dump->start_stack)) >> PAGE_SHIFT;
@@ -557,16 +569,6 @@ static noinline void __switch_to_xtra(st

next = &next_p->thread;

- if (test_tsk_thread_flag(next_p, TIF_DEBUG)) {
- set_debugreg(next->debugreg[0], 0);
- set_debugreg(next->debugreg[1], 1);
- set_debugreg(next->debugreg[2], 2);
- set_debugreg(next->debugreg[3], 3);
- /* no 4 and 5 */
- set_debugreg(next->debugreg[6], 6);
- set_debugreg(next->debugreg[7], 7);
- }
-
if (!test_tsk_thread_flag(next_p, TIF_IO_BITMAP)) {
/*
* Disable the bitmap via an invalid offset. We still cache
@@ -699,7 +701,7 @@ struct task_struct fastcall * __switch_t
set_iopl_mask(next->iopl);

/*
- * Now maybe handle debug registers and/or IO bitmaps
+ * Now maybe handle IO bitmaps
*/
if (unlikely((task_thread_info(next_p)->flags & _TIF_WORK_CTXSW)
|| test_tsk_thread_flag(prev_p, TIF_IO_BITMAP)))
@@ -731,6 +733,13 @@ struct task_struct fastcall * __switch_t

x86_write_percpu(current_task, next_p);

+ /*
+ * Handle debug registers. This must be done _after_ current
+ * is updated.
+ */
+ if (unlikely(test_tsk_thread_flag(next_p, TIF_DEBUG)))
+ switch_to_thread_hw_breakpoint(next_p);
+
return prev_p;
}

Index: usb-2.6/arch/i386/kernel/signal.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/signal.c
+++ usb-2.6/arch/i386/kernel/signal.c
@@ -591,13 +591,6 @@ static void fastcall do_signal(struct pt

signr = get_signal_to_deliver(&info, &ka, regs, NULL);
if (signr > 0) {
- /* Reenable any watchpoints before delivering the
- * signal to user space. The processor register will
- * have been cleared if the watchpoint triggered
- * inside the kernel.
- */
- if (unlikely(current->thread.debugreg[7]))
- set_debugreg(current->thread.debugreg[7], 7);

/* Whee! Actually deliver the signal. */
if (handle_signal(signr, &info, &ka, oldset, regs) == 0) {
Index: usb-2.6/arch/i386/kernel/traps.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/traps.c
+++ usb-2.6/arch/i386/kernel/traps.c
@@ -804,62 +804,46 @@ fastcall void __kprobes do_int3(struct p
*/
fastcall void __kprobes do_debug(struct pt_regs * regs, long error_code)
{
- unsigned int condition;
struct task_struct *tsk = current;
+ unsigned long dr6;

- get_debugreg(condition, 6);
+ get_debugreg(dr6, 6);
+ set_debugreg(0, 6); /* DR6 may or may not be cleared by the CPU */
+ if (dr6 & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3))
+ tsk->thread.vdr6 = 0;

- if (notify_die(DIE_DEBUG, "debug", regs, condition, error_code,
- SIGTRAP) == NOTIFY_STOP)
+ if (notify_die(DIE_DEBUG, "debug", regs, (long) &dr6, error_code,
+ SIGTRAP) == NOTIFY_STOP)
return;
+
/* It's safe to allow irq's after DR6 has been saved */
if (regs->eflags & X86_EFLAGS_IF)
local_irq_enable();

- /* Mask out spurious debug traps due to lazy DR7 setting */
- if (condition & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)) {
- if (!tsk->thread.debugreg[7])
- goto clear_dr7;
+ if (regs->eflags & VM_MASK) {
+ handle_vm86_trap((struct kernel_vm86_regs *) regs,
+ error_code, 1);
+ return;
}

- if (regs->eflags & VM_MASK)
- goto debug_vm86;
-
- /* Save debug status register where ptrace can see it */
- tsk->thread.debugreg[6] = condition;
-
/*
- * Single-stepping through TF: make sure we ignore any events in
- * kernel space (but re-enable TF when returning to user mode).
+ * Single-stepping through system calls: ignore any exceptions in
+ * kernel space, but re-enable TF when returning to user mode.
+ *
+ * We already checked v86 mode above, so we can check for kernel mode
+ * by just checking the CPL of CS.
*/
- if (condition & DR_STEP) {
- /*
- * We already checked v86 mode above, so we can
- * check for kernel mode by just checking the CPL
- * of CS.
- */
- if (!user_mode(regs))
- goto clear_TF_reenable;
+ if ((dr6 & DR_STEP) && !user_mode(regs)) {
+ dr6 &= ~DR_STEP;
+ set_tsk_thread_flag(tsk, TIF_SINGLESTEP);
+ regs->eflags &= ~X86_EFLAGS_TF;
}

- /* Ok, finally something we can handle */
- send_sigtrap(tsk, regs, error_code);
+ /* Store the virtualized DR6 value */
+ tsk->thread.vdr6 = dr6;

- /* Disable additional traps. They'll be re-enabled when
- * the signal is delivered.
- */
-clear_dr7:
- set_debugreg(0, 7);
- return;
-
-debug_vm86:
- handle_vm86_trap((struct kernel_vm86_regs *) regs, error_code, 1);
- return;
-
-clear_TF_reenable:
- set_tsk_thread_flag(tsk, TIF_SINGLESTEP);
- regs->eflags &= ~TF_MASK;
- return;
+ if (dr6 & (DR_STEP|DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3))
+ send_sigtrap(tsk, regs, error_code);
}

/*
Index: usb-2.6/include/asm-i386/debugreg.h
===================================================================
--- usb-2.6.orig/include/asm-i386/debugreg.h
+++ usb-2.6/include/asm-i386/debugreg.h
@@ -48,6 +48,8 @@

#define DR_LOCAL_ENABLE_SHIFT 0 /* Extra shift to the local enable bit */
#define DR_GLOBAL_ENABLE_SHIFT 1 /* Extra shift to the global enable bit */
+#define DR_LOCAL_ENABLE (0x1) /* Local enable for reg 0 */
+#define DR_GLOBAL_ENABLE (0x2) /* Global enable for reg 0 */
#define DR_ENABLE_SIZE 2 /* 2 enable bits per register */

#define DR_LOCAL_ENABLE_MASK (0x55) /* Set local bits for all 4 regs */
@@ -61,4 +63,29 @@
#define DR_LOCAL_SLOWDOWN (0x100) /* Local slow the pipeline */
#define DR_GLOBAL_SLOWDOWN (0x200) /* Global slow the pipeline */

+
+/*
+ * HW breakpoint additions
+ */
+
+#define HB_NUM 4 /* Number of hardware breakpoints */
+
+/* For process management */
+void flush_thread_hw_breakpoint(struct task_struct *tsk);
+int copy_thread_hw_breakpoint(struct task_struct *tsk,
+ struct task_struct *child, unsigned long clone_flags);
+void dump_thread_hw_breakpoint(struct task_struct *tsk, int u_debugreg[8]);
+void switch_to_thread_hw_breakpoint(struct task_struct *tsk);
+
+/* For CPU management */
+void load_debug_registers(void);
+static inline void disable_debug_registers(void)
+{
+ set_debugreg(0, 7);
+}
+
+/* For use by ptrace */
+unsigned long thread_get_debugreg(struct task_struct *tsk, int n);
+int thread_set_debugreg(struct task_struct *tsk, int n, unsigned long val);
+
#endif
Index: usb-2.6/include/asm-i386/processor.h
===================================================================
--- usb-2.6.orig/include/asm-i386/processor.h
+++ usb-2.6/include/asm-i386/processor.h
@@ -354,8 +354,9 @@ struct thread_struct {
unsigned long esp;
unsigned long fs;
unsigned long gs;
-/* Hardware debugging registers */
- unsigned long debugreg[8]; /* %%db0-7 debug registers */
+/* Hardware breakpoint info */
+ unsigned long vdr6;
+ struct thread_hw_breakpoint *hw_breakpoint_info;
/* fault info */
unsigned long cr2, trap_no, error_code;
/* floating point info */
Index: usb-2.6/arch/i386/kernel/hw_breakpoint.c
===================================================================
--- /dev/null
+++ usb-2.6/arch/i386/kernel/hw_breakpoint.c
@@ -0,0 +1,631 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) 2007 Alan Stern
+ */
+
+/*
+ * HW_breakpoint: a unified kernel/user-space hardware breakpoint facility,
+ * using the CPU's debug registers.
+ */
+
+/* QUESTIONS
+
+ How to know whether RF should be cleared when setting a user
+ execution breakpoint?
+
+*/
+
+#include <linux/init.h>
+#include <linux/irqflags.h>
+#include <linux/kdebug.h>
+#include <linux/kernel.h>
+#include <linux/kprobes.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/notifier.h>
+#include <linux/rcupdate.h>
+#include <linux/sched.h>
+#include <linux/smp.h>
+
+#include <asm/debugreg.h>
+#include <asm/hw_breakpoint.h>
+#include <asm/percpu.h>
+#include <asm/processor.h>
+
+
+/* Per-thread HW breakpoint and debug register info */
+struct thread_hw_breakpoint {
+
+ /* utrace support */
+ struct list_head node; /* Entry in thread list */
+ struct list_head thread_bps; /* Thread's breakpoints */
+ struct hw_breakpoint *bps[HB_NUM]; /* Highest-priority bps */
+ int num_installed; /* Number of installed bps */
+ unsigned gennum; /* update-generation number */
+
+ /* Only the portions below are arch-specific */
+
+ /* ptrace support -- Note that vdr6 is stored directly in the
+ * thread_struct so that it is always available.
+ */
+ unsigned long vdr7; /* Virtualized DR7 */
+ struct hw_breakpoint vdr_bps[HB_NUM]; /* Breakpoints
+ representing virtualized debug registers 0 - 3 */
+ unsigned long tdr[HB_NUM]; /* and their addresses */
+ unsigned long tdr7; /* Thread's DR7 value */
+ unsigned long tkdr7; /* Thread + kernel DR7 value */
+};
+
+/* Kernel-space breakpoint data */
+struct kernel_bp_data {
+ unsigned gennum; /* Generation number */
+ int num_kbps; /* Number of kernel bps */
+ struct hw_breakpoint *bps[HB_NUM]; /* Loaded breakpoints */
+
+ /* Only the portions below are arch-specific */
+ unsigned long mkdr7; /* Masked kernel DR7 value */
+};
+
+/* Per-CPU debug register info */
+struct cpu_hw_breakpoint {
+ struct kernel_bp_data *cur_kbpdata; /* Current kbpdata[] entry */
+ struct task_struct *bp_task; /* The thread whose bps
+ are currently loaded in the debug registers */
+};
+
+static DEFINE_PER_CPU(struct cpu_hw_breakpoint, cpu_info);
+
+/* Global info */
+static struct kernel_bp_data kbpdata[2]; /* Old and new settings */
+static int cur_kbpindex; /* Alternates 0, 1, ... */
+static struct kernel_bp_data *cur_kbpdata = &kbpdata[0];
+ /* Always equal to &kbpdata[cur_kbpindex] */
+
+static u8 tprio[HB_NUM]; /* Thread bp max priorities */
+static LIST_HEAD(kernel_bps); /* Kernel breakpoint list */
+static LIST_HEAD(thread_list); /* thread_hw_breakpoint list */
+static DEFINE_MUTEX(hw_breakpoint_mutex); /* Protects everything */
+
+/* Only the portions below are arch-specific */
+
+static unsigned long kdr7; /* Unmasked kernel DR7 value */
+
+/* Masks for the bits in DR7 related to kernel breakpoints, for various
+ * values of num_kbps. Entry n is the mask for when there are n kernel
+ * breakpoints, in debug registers 0 - (n-1). The DR_GLOBAL_SLOWDOWN bit
+ * (GE) is handled specially.
+ */
+static const unsigned long kdr7_masks[HB_NUM + 1] = {
+ 0x00000000,
+ 0x000f0003, /* LEN0, R/W0, G0, L0 */
+ 0x00ff000f, /* Same for 0,1 */
+ 0x0fff003f, /* Same for 0,1,2 */
+ 0xffff00ff /* Same for 0,1,2,3 */
+};
+
+
+/* Arch-specific hook routines */
+
+
+/*
+ * Install the kernel breakpoints in their debug registers.
+ */
+static void arch_install_chbi(struct cpu_hw_breakpoint *chbi)
+{
+ struct hw_breakpoint **bps;
+
+ /* Don't allow debug exceptions while we update the registers */
+ set_debugreg(0, 7);
+ chbi->cur_kbpdata = rcu_dereference(cur_kbpdata);
+
+ /* Kernel breakpoints are stored starting in DR0 and going up */
+ bps = chbi->cur_kbpdata->bps;
+ switch (chbi->cur_kbpdata->num_kbps) {
+ case 4:
+ set_debugreg(bps[3]->address.va, 3);
+ case 3:
+ set_debugreg(bps[2]->address.va, 2);
+ case 2:
+ set_debugreg(bps[1]->address.va, 1);
+ case 1:
+ set_debugreg(bps[0]->address.va, 0);
+ }
+ /* No need to set DR6 */
+ set_debugreg(chbi->cur_kbpdata->mkdr7, 7);
+}
+
+/*
+ * Update an out-of-date thread hw_breakpoint info structure.
+ */
+static inline void arch_update_thbi(struct thread_hw_breakpoint *thbi,
+ struct kernel_bp_data *thr_kbpdata)
+{
+ int num = thr_kbpdata->num_kbps;
+
+ thbi->tkdr7 = thr_kbpdata->mkdr7 | (thbi->tdr7 & ~kdr7_masks[num]);
+}
+
+/*
+ * Install the thread breakpoints in their debug registers.
+ */
+static inline void arch_install_thbi(struct thread_hw_breakpoint *thbi)
+{
+ /* Install the user breakpoints. Kernel breakpoints are stored
+ * starting in DR0 and going up; there are num_kbps of them.
+ * User breakpoints are stored starting in DR3 and going down,
+ * as many as we have room for.
+ */
+ switch (thbi->num_installed) {
+ case 4:
+ set_debugreg(thbi->tdr[0], 0);
+ case 3:
+ set_debugreg(thbi->tdr[1], 1);
+ case 2:
+ set_debugreg(thbi->tdr[2], 2);
+ case 1:
+ set_debugreg(thbi->tdr[3], 3);
+ }
+ /* No need to set DR6 */
+ set_debugreg(thbi->tkdr7, 7);
+}
+
+/*
+ * Install the debug register values for just the kernel, no thread.
+ */
+static inline void arch_install_none(struct cpu_hw_breakpoint *chbi)
+{
+ set_debugreg(chbi->cur_kbpdata->mkdr7, 7);
+}
+
+/*
+ * Create a new kbpdata entry.
+ */
+static inline void arch_new_kbpdata(struct kernel_bp_data *new_kbpdata)
+{
+ int num = new_kbpdata->num_kbps;
+
+ new_kbpdata->mkdr7 = kdr7 & (kdr7_masks[num] | DR_GLOBAL_SLOWDOWN);
+}
+
+/*
+ * Check for virtual address in user space.
+ */
+static inline int arch_check_va_in_userspace(unsigned long va,
+ struct task_struct *tsk)
+{
+#ifndef CONFIG_X86_64
+#define TASK_SIZE_OF(t) TASK_SIZE
+#endif
+ return (va < TASK_SIZE_OF(tsk));
+}
+
+/*
+ * Check for virtual address in kernel space.
+ */
+static inline int arch_check_va_in_kernelspace(unsigned long va)
+{
+#ifndef CONFIG_X86_64
+#define TASK_SIZE64 TASK_SIZE
+#endif
+ return (va >= TASK_SIZE64);
+}
+
+/*
+ * Encode the length, type, Exact, and Enable bits for a particular breakpoint
+ * as stored in debug register 7.
+ */
+static inline unsigned long encode_dr7(int drnum, u8 len, u8 type)
+{
+ unsigned long temp;
+
+ temp = (len | type) & 0xf;
+ temp <<= (DR_CONTROL_SHIFT + drnum * DR_CONTROL_SIZE);
+ temp |= (DR_GLOBAL_ENABLE << (drnum * DR_ENABLE_SIZE)) |
+ DR_GLOBAL_SLOWDOWN;
+ return temp;
+}
+
+/*
+ * Calculate the DR7 value for a list of kernel or user breakpoints.
+ */
+static unsigned long calculate_dr7(struct thread_hw_breakpoint *thbi)
+{
+ int is_user;
+ struct list_head *bp_list;
+ struct hw_breakpoint *bp;
+ int i;
+ int drnum;
+ unsigned long dr7;
+
+ if (thbi) {
+ is_user = 1;
+ bp_list = &thbi->thread_bps;
+ drnum = HB_NUM - 1;
+ } else {
+ is_user = 0;
+ bp_list = &kernel_bps;
+ drnum = 0;
+ }
+
+ /* Kernel bps are assigned from DR0 on up, and user bps are assigned
+ * from DR3 on down. Accumulate all 4 bps; the kernel DR7 mask will
+ * select the appropriate bits later.
+ */
+ dr7 = 0;
+ i = 0;
+ list_for_each_entry(bp, bp_list, node) {
+
+ /* Get the debug register number and accumulate the bits */
+ dr7 |= encode_dr7(drnum, bp->len, bp->type);
+ if (++i >= HB_NUM)
+ break;
+ if (is_user)
+ --drnum;
+ else
+ ++drnum;
+ }
+ return dr7;
+}
+
+/*
+ * Register a new user breakpoint structure.
+ */
+static inline void arch_register_user_hw_breakpoint(struct hw_breakpoint *bp,
+ struct thread_hw_breakpoint *thbi)
+{
+ thbi->tdr7 = calculate_dr7(thbi);
+
+ /* If this is an execution breakpoint for the current PC address,
+ * we should clear the task's RF so that the bp will be certain
+ * to trigger.
+ *
+ * FIXME: It's not so easy to get hold of the task's PC as a linear
+ * address! ptrace.c does this already...
+ */
+}
+
+/*
+ * Unregister a user breakpoint structure.
+ */
+static inline void arch_unregister_user_hw_breakpoint(struct hw_breakpoint *bp,
+ struct thread_hw_breakpoint *thbi)
+{
+ thbi->tdr7 = calculate_dr7(thbi);
+}
+
+/*
+ * Register a kernel breakpoint structure.
+ */
+static inline void arch_register_kernel_hw_breakpoint(
+ struct hw_breakpoint *bp)
+{
+ kdr7 = calculate_dr7(NULL);
+}
+
+/*
+ * Unregister a kernel breakpoint structure.
+ */
+static inline void arch_unregister_kernel_hw_breakpoint(
+ struct hw_breakpoint *bp)
+{
+ kdr7 = calculate_dr7(NULL);
+}
+
+
+/* End of arch-specific hook routines */
+
+
+/*
+ * Copy out the debug register information for a core dump.
+ *
+ * tsk must be equal to current.
+ */
+void dump_thread_hw_breakpoint(struct task_struct *tsk, int u_debugreg[8])
+{
+ struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+ int i;
+
+ memset(u_debugreg, 0, 8 * sizeof(u_debugreg[0]));
+ if (thbi) {
+ for (i = 0; i < HB_NUM; ++i)
+ u_debugreg[i] = thbi->vdr_bps[i].address.va;
+ u_debugreg[7] = thbi->vdr7;
+ }
+ u_debugreg[6] = tsk->thread.vdr6;
+}
+
+/*
+ * Ptrace support: breakpoint trigger routine.
+ */
+
+static struct thread_hw_breakpoint *alloc_thread_hw_breakpoint(
+ struct task_struct *tsk);
+static int __register_user_hw_breakpoint(struct task_struct *tsk,
+ struct hw_breakpoint *bp);
+static void __unregister_user_hw_breakpoint(struct task_struct *tsk,
+ struct hw_breakpoint *bp);
+
+static void ptrace_triggered(struct hw_breakpoint *bp, struct pt_regs *regs)
+{
+ struct task_struct *tsk = current;
+ struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+ int i;
+
+ /* Store in the virtual DR6 register the fact that the breakpoint
+ * was hit so the thread's debugger will see it, and send the
+ * debugging signal.
+ */
+ if (thbi) {
+ i = bp - thbi->vdr_bps;
+ tsk->thread.vdr6 |= (DR_TRAP0 << i);
+ send_sigtrap(tsk, regs, 0);
+ }
+}
+
+/*
+ * Handle PTRACE_PEEKUSR calls for the debug register area.
+ */
+unsigned long thread_get_debugreg(struct task_struct *tsk, int n)
+{
+ struct thread_hw_breakpoint *thbi;
+ unsigned long val = 0;
+
+ mutex_lock(&hw_breakpoint_mutex);
+ thbi = tsk->thread.hw_breakpoint_info;
+ if (n < HB_NUM) {
+ if (thbi)
+ val = (unsigned long) thbi->vdr_bps[n].address.va;
+ } else if (n == 6) {
+ val = tsk->thread.vdr6;
+ } else if (n == 7) {
+ if (thbi)
+ val = thbi->vdr7;
+ }
+ mutex_unlock(&hw_breakpoint_mutex);
+ return val;
+}
+
+/*
+ * Decode the length and type bits for a particular breakpoint as
+ * stored in debug register 7. Return the "enabled" status.
+ */
+static inline int decode_dr7(unsigned long dr7, int bpnum, u8 *len, u8 *type)
+{
+ int temp = dr7 >> (DR_CONTROL_SHIFT + bpnum * DR_CONTROL_SIZE);
+
+ *len = (temp & 0xc) | 0x40;
+ *type = (temp & 0x3) | 0x80;
+ return (dr7 >> (bpnum * DR_ENABLE_SIZE)) & 0x3;
+}
+
+/*
+ * Handle ptrace writes to debug register 7.
+ */
+static int ptrace_write_dr7(struct task_struct *tsk,
+ struct thread_hw_breakpoint *thbi, unsigned long data)
+{
+ struct hw_breakpoint *bp;
+ int i;
+ int rc = 0;
+ unsigned long old_dr7 = thbi->vdr7;
+
+ data &= ~DR_CONTROL_RESERVED;
+
+ /* Loop through all the hardware breakpoints, making the
+ * appropriate changes to each.
+ */
+ restore_settings:
+ thbi->vdr7 = data;
+ bp = &thbi->vdr_bps[0];
+ for (i = 0; i < HB_NUM; (++i, ++bp)) {
+ int enabled;
+ u8 len, type;
+
+ enabled = decode_dr7(data, i, &len, &type);
+
+ /* Unregister the breakpoint before trying to change it */
+ if (bp->status)
+ __unregister_user_hw_breakpoint(tsk, bp);
+
+ /* Insert the breakpoint's new settings */
+ bp->len = len;
+ bp->type = type;
+
+ /* Now register the breakpoint if it should be enabled.
+ * New invalid entries will raise an error here.
+ */
+ if (enabled) {
+ bp->triggered = ptrace_triggered;
+ bp->priority = HW_BREAKPOINT_PRIO_PTRACE;
+ if (__register_user_hw_breakpoint(tsk, bp) < 0 &&
+ rc == 0)
+ break;
+ }
+ }
+
+ /* If anything above failed, restore the original settings */
+ if (i < HB_NUM) {
+ rc = -EIO;
+ data = old_dr7;
+ goto restore_settings;
+ }
+ return rc;
+}
+
+/*
+ * Handle PTRACE_POKEUSR calls for the debug register area.
+ */
+int thread_set_debugreg(struct task_struct *tsk, int n, unsigned long val)
+{
+ struct thread_hw_breakpoint *thbi;
+ int rc = -EIO;
+
+ /* We have to hold this lock the entire time, to prevent thbi
+ * from being deallocated out from under us.
+ */
+ mutex_lock(&hw_breakpoint_mutex);
+
+ /* There are no DR4 or DR5 registers */
+ if (n == 4 || n == 5)
+ ;
+
+ /* Writes to DR6 modify the virtualized value */
+ else if (n == 6) {
+ tsk->thread.vdr6 = val;
+ rc = 0;
+ }
+
+ else if (!tsk->thread.hw_breakpoint_info && val == 0)
+ rc = 0; /* Minor optimization */
+
+ else if ((thbi = alloc_thread_hw_breakpoint(tsk)) == NULL)
+ rc = -ENOMEM;
+
+ /* Writes to DR0 - DR3 change a breakpoint address */
+ else if (n < HB_NUM) {
+ struct hw_breakpoint *bp = &thbi->vdr_bps[n];
+
+ /* If the breakpoint is registered then unregister it,
+ * change it, and re-register it. Revert to the original
+ * address if an error occurs.
+ */
+ if (bp->status) {
+ unsigned long old_addr = bp->address.va;
+
+ __unregister_user_hw_breakpoint(tsk, bp);
+ bp->address.va = val;
+ rc = __register_user_hw_breakpoint(tsk, bp);
+ if (rc < 0) {
+ bp->address.va = old_addr;
+ __register_user_hw_breakpoint(tsk, bp);
+ }
+ } else {
+ bp->address.va = val;
+ rc = 0;
+ }
+ }
+
+ /* All that's left is DR7 */
+ else
+ rc = ptrace_write_dr7(tsk, thbi, val);
+
+ mutex_unlock(&hw_breakpoint_mutex);
+ return rc;
+}
+
+
+/*
+ * Handle debug exception notifications.
+ */
+
+static void switch_to_none_hw_breakpoint(void);
+
+static int __kprobes hw_breakpoint_handler(struct die_args *args)
+{
+ struct cpu_hw_breakpoint *chbi;
+ int i;
+ struct hw_breakpoint *bp;
+ struct thread_hw_breakpoint *thbi = NULL;
+
+ /* A pointer to the DR6 value is stored in args->err */
+#define DR6 (* (unsigned long *) (args->err))
+
+ if (!(DR6 & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)))
+ return NOTIFY_DONE;
+
+ /* Assert that local interrupts are disabled */
+
+ /* Are we a victim of lazy debug-register switching? */
+ chbi = &per_cpu(cpu_info, get_cpu());
+ if (!chbi->bp_task)
+ ;
+ else if (chbi->bp_task != current) {
+
+ /* No user breakpoints are valid. Perform the belated
+ * debug-register switch.
+ */
+ switch_to_none_hw_breakpoint();
+ } else
+ thbi = chbi->bp_task->thread.hw_breakpoint_info;
+
+ /* Disable all breakpoints so that the callbacks can run without
+ * triggering recursive debug exceptions.
+ */
+ set_debugreg(0, 7);
+
+ /* Handle all the breakpoints that were triggered */
+ for (i = 0; i < HB_NUM; ++i) {
+ if (likely(!(DR6 & (DR_TRAP0 << i))))
+ continue;
+
+ /* Find the corresponding hw_breakpoint structure and
+ * invoke its triggered callback.
+ */
+ if (i < chbi->cur_kbpdata->num_kbps)
+ bp = chbi->cur_kbpdata->bps[i];
+ else if (thbi)
+ bp = thbi->bps[i];
+ else /* False alarm due to lazy DR switching */
+ continue;
+ if (bp) { /* Should always be non-NULL */
+
+ /* Set RF at execution breakpoints */
+ if (bp->type == HW_BREAKPOINT_EXECUTE)
+ args->regs->eflags |= X86_EFLAGS_RF;
+ (bp->triggered)(bp, args->regs);
+ }
+ }
+
+ /* Re-enable the breakpoints */
+ set_debugreg(thbi ? thbi->tkdr7 : chbi->cur_kbpdata->mkdr7, 7);
+ put_cpu_no_resched();
+
+ /* Mask away the bits we have handled */
+ DR6 &= ~(DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3);
+
+ /* Early exit from the notifier chain if everything has been handled */
+ if (DR6 == 0)
+ return NOTIFY_STOP;
+ return NOTIFY_DONE;
+#undef DR6
+}
+
+/*
+ * Handle debug exception notifications.
+ */
+static int __kprobes hw_breakpoint_exceptions_notify(
+ struct notifier_block *unused, unsigned long val, void *data)
+{
+ if (val != DIE_DEBUG)
+ return NOTIFY_DONE;
+ return hw_breakpoint_handler(data);
+}
+
+static struct notifier_block hw_breakpoint_exceptions_nb = {
+ .notifier_call = hw_breakpoint_exceptions_notify
+};
+
+static int __init init_hw_breakpoint(void)
+{
+ return register_die_notifier(&hw_breakpoint_exceptions_nb);
+}
+
+core_initcall(init_hw_breakpoint);
+
+
+/* Grab the arch-independent code */
+
+#include "../../../kernel/hw_breakpoint.c"
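
As a reading aid for the DR7 handling above, here is a minimal user-space sketch of the decoding step performed by decode_dr7(). It is not part of the patch. The name decode_dr7_sketch is hypothetical, and the three DR_* constants are redefined locally with the values I believe asm/debugreg.h uses on i386; the 0x40 and 0x80 tag bits are copied from decode_dr7() as posted.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match asm/debugreg.h on i386; redefined here for illustration */
#define DR_CONTROL_SHIFT 16	/* control nibbles start at bit 16 */
#define DR_CONTROL_SIZE   4	/* 4 control bits per breakpoint */
#define DR_ENABLE_SIZE    2	/* 2 enable bits (Li, Gi) per breakpoint */

/*
 * Hypothetical user-space mirror of decode_dr7(): extract the encoded
 * length and type for breakpoint bpnum and return its two enable bits.
 */
static int decode_dr7_sketch(unsigned long dr7, int bpnum,
		uint8_t *len, uint8_t *type)
{
	unsigned long temp;

	assert(bpnum >= 0 && bpnum < 4);
	temp = dr7 >> (DR_CONTROL_SHIFT + bpnum * DR_CONTROL_SIZE);

	/* LENi sits in bits 2-3 of the nibble, R/Wi in bits 0-1 */
	*len = (uint8_t) ((temp & 0xc) | 0x40);
	*type = (uint8_t) ((temp & 0x3) | 0x80);
	return (int) ((dr7 >> (bpnum * DR_ENABLE_SIZE)) & 0x3);
}
```

For example, a DR7 value of 0x000D0001 has L0 set and a control nibble of 0xD for breakpoint 0, so the sketch reports that breakpoint as enabled with LEN bits 11 and R/W bits 01, which is a 4-byte write watchpoint.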
Index: usb-2.6/arch/i386/kernel/ptrace.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/ptrace.c
+++ usb-2.6/arch/i386/kernel/ptrace.c
@@ -382,11 +382,11 @@ long arch_ptrace(struct task_struct *chi
tmp = 0; /* Default return condition */
if(addr < FRAME_SIZE*sizeof(long))
tmp = getreg(child, addr);
- if(addr >= (long) &dummy->u_debugreg[0] &&
- addr <= (long) &dummy->u_debugreg[7]){
+ else if (addr >= (long) &dummy->u_debugreg[0] &&
+ addr <= (long) &dummy->u_debugreg[7]) {
addr -= (long) &dummy->u_debugreg[0];
addr = addr >> 2;
- tmp = child->thread.debugreg[addr];
+ tmp = thread_get_debugreg(child, addr);
}
ret = put_user(tmp, datap);
break;
@@ -416,59 +416,11 @@ long arch_ptrace(struct task_struct *chi
have to be selective about what portions we allow someone
to modify. */

- ret = -EIO;
- if(addr >= (long) &dummy->u_debugreg[0] &&
- addr <= (long) &dummy->u_debugreg[7]){
-
- if(addr == (long) &dummy->u_debugreg[4]) break;
- if(addr == (long) &dummy->u_debugreg[5]) break;
- if(addr < (long) &dummy->u_debugreg[4] &&
- ((unsigned long) data) >= TASK_SIZE-3) break;
-
- /* Sanity-check data. Take one half-byte at once with
- * check = (val >> (16 + 4*i)) & 0xf. It contains the
- * R/Wi and LENi bits; bits 0 and 1 are R/Wi, and bits
- * 2 and 3 are LENi. Given a list of invalid values,
- * we do mask |= 1 << invalid_value, so that
- * (mask >> check) & 1 is a correct test for invalid
- * values.
- *
- * R/Wi contains the type of the breakpoint /
- * watchpoint, LENi contains the length of the watched
- * data in the watchpoint case.
- *
- * The invalid values are:
- * - LENi == 0x10 (undefined), so mask |= 0x0f00.
- * - R/Wi == 0x10 (break on I/O reads or writes), so
- * mask |= 0x4444.
- * - R/Wi == 0x00 && LENi != 0x00, so we have mask |=
- * 0x1110.
- *
- * Finally, mask = 0x0f00 | 0x4444 | 0x1110 == 0x5f54.
- *
- * See the Intel Manual "System Programming Guide",
- * 15.2.4
- *
- * Note that LENi == 0x10 is defined on x86_64 in long
- * mode (i.e. even for 32-bit userspace software, but
- * 64-bit kernel), so the x86_64 mask value is 0x5454.
- * See the AMD manual no. 24593 (AMD64 System
- * Programming)*/
-
- if(addr == (long) &dummy->u_debugreg[7]) {
- data &= ~DR_CONTROL_RESERVED;
- for(i=0; i<4; i++)
- if ((0x5f54 >> ((data >> (16 + 4*i)) & 0xf)) & 1)
- goto out_tsk;
- if (data)
- set_tsk_thread_flag(child, TIF_DEBUG);
- else
- clear_tsk_thread_flag(child, TIF_DEBUG);
- }
- addr -= (long) &dummy->u_debugreg;
- addr = addr >> 2;
- child->thread.debugreg[addr] = data;
- ret = 0;
+ if (addr >= (long) &dummy->u_debugreg[0] &&
+ addr <= (long) &dummy->u_debugreg[7]) {
+ addr -= (long) &dummy->u_debugreg;
+ addr = addr >> 2;
+ ret = thread_set_debugreg(child, addr, data);
}
break;

@@ -624,7 +576,6 @@ long arch_ptrace(struct task_struct *chi
ret = ptrace_request(child, request, addr, data);
break;
}
- out_tsk:
return ret;
}

Index: usb-2.6/arch/i386/kernel/Makefile
===================================================================
--- usb-2.6.orig/arch/i386/kernel/Makefile
+++ usb-2.6/arch/i386/kernel/Makefile
@@ -7,7 +7,8 @@ extra-y := head.o init_task.o vmlinux.ld
obj-y := process.o signal.o entry.o traps.o irq.o \
ptrace.o time.o ioport.o ldt.o setup.o i8259.o sys_i386.o \
pci-dma.o i386_ksyms.o i387.o bootflag.o e820.o\
- quirks.o i8237.o topology.o alternative.o i8253.o tsc.o
+ quirks.o i8237.o topology.o alternative.o i8253.o tsc.o \
+ hw_breakpoint.o

obj-$(CONFIG_STACKTRACE) += stacktrace.o
obj-y += cpu/
Index: usb-2.6/arch/i386/power/cpu.c
===================================================================
--- usb-2.6.orig/arch/i386/power/cpu.c
+++ usb-2.6/arch/i386/power/cpu.c
@@ -11,6 +11,7 @@
#include <linux/suspend.h>
#include <asm/mtrr.h>
#include <asm/mce.h>
+#include <asm/debugreg.h>

static struct saved_context saved_context;

@@ -46,6 +47,8 @@ void __save_processor_state(struct saved
ctxt->cr2 = read_cr2();
ctxt->cr3 = read_cr3();
ctxt->cr4 = read_cr4();
+
+ disable_debug_registers();
}

void save_processor_state(void)
@@ -70,20 +73,7 @@ static void fix_processor_context(void)

load_TR_desc(); /* This does ltr */
load_LDT(&current->active_mm->context); /* This does lldt */
-
- /*
- * Now maybe reload the debug registers
- */
- if (current->thread.debugreg[7]){
- set_debugreg(current->thread.debugreg[0], 0);
- set_debugreg(current->thread.debugreg[1], 1);
- set_debugreg(current->thread.debugreg[2], 2);
- set_debugreg(current->thread.debugreg[3], 3);
- /* no 4 and 5 */
- set_debugreg(current->thread.debugreg[6], 6);
- set_debugreg(current->thread.debugreg[7], 7);
- }
-
+ load_debug_registers();
}

void __restore_processor_state(struct saved_context *ctxt)
Index: usb-2.6/arch/i386/kernel/kprobes.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/kprobes.c
+++ usb-2.6/arch/i386/kernel/kprobes.c
@@ -660,9 +660,18 @@ int __kprobes kprobe_exceptions_notify(s
ret = NOTIFY_STOP;
break;
case DIE_DEBUG:
- if (post_kprobe_handler(args->regs))
- ret = NOTIFY_STOP;
+
+ /* A pointer to the DR6 value is stored in args->err */
+#define DR6 (* (unsigned long *) (args->err))
+
+ if ((DR6 & DR_STEP) && post_kprobe_handler(args->regs)) {
+ DR6 &= ~DR_STEP;
+ if (DR6 == 0)
+ ret = NOTIFY_STOP;
+ }
break;
+#undef DR6
+
case DIE_GPF:
case DIE_PAGE_FAULT:
/* kprobe_running() needs smp_processor_id() */
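
The DR6 hand-off used by the two DIE_DEBUG notifiers above (hw_breakpoint_handler() and the kprobes case) can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the SKETCH_* names and the *_stage functions are hypothetical, the bit values are the ones I believe DR_TRAP0..DR_TRAP3 and DR_STEP have on i386, and the single-step stage assumes post_kprobe_handler() always claims the event, which the real code does not guarantee.

```c
#include <assert.h>

#define SKETCH_DR_TRAP_BITS 0xFUL	/* DR_TRAP0 | ... | DR_TRAP3 (assumed) */
#define SKETCH_DR_STEP      0x4000UL	/* single-step bit (assumed) */

enum sketch_notify { SKETCH_NOTIFY_DONE, SKETCH_NOTIFY_STOP };

/* Models hw_breakpoint_handler(): consume the breakpoint trap bits */
static enum sketch_notify breakpoint_stage(unsigned long *dr6)
{
	if (!(*dr6 & SKETCH_DR_TRAP_BITS))
		return SKETCH_NOTIFY_DONE;
	*dr6 &= ~SKETCH_DR_TRAP_BITS;
	return *dr6 == 0 ? SKETCH_NOTIFY_STOP : SKETCH_NOTIFY_DONE;
}

/* Models the kprobes DIE_DEBUG case: consume the single-step bit */
static enum sketch_notify single_step_stage(unsigned long *dr6)
{
	if (!(*dr6 & SKETCH_DR_STEP))
		return SKETCH_NOTIFY_DONE;
	*dr6 &= ~SKETCH_DR_STEP;
	return *dr6 == 0 ? SKETCH_NOTIFY_STOP : SKETCH_NOTIFY_DONE;
}

/* Walk the two stages in order, stopping early once DR6 is empty */
static enum sketch_notify run_chain(unsigned long *dr6)
{
	assert(dr6 != 0);
	if (breakpoint_stage(dr6) == SKETCH_NOTIFY_STOP)
		return SKETCH_NOTIFY_STOP;
	return single_step_stage(dr6);
}
```

The point of the design is that each stage clears only the bits it handled, so any bits left over still reach later notifiers and, eventually, the default debug-trap code.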
Index: usb-2.6/include/asm-generic/hw_breakpoint.h
===================================================================
--- /dev/null
+++ usb-2.6/include/asm-generic/hw_breakpoint.h
@@ -0,0 +1,224 @@
+#ifndef _ASM_GENERIC_HW_BREAKPOINT_H
+#define _ASM_GENERIC_HW_BREAKPOINT_H
+
+#ifdef __KERNEL__
+#include <linux/list.h>
+#include <linux/types.h>
+
+/**
+ * struct hw_breakpoint - unified kernel/user-space hardware breakpoint
+ * @node: internal linked-list management
+ * @triggered: callback invoked when the breakpoint is hit
+ * @installed: callback invoked when the breakpoint is installed
+ * @uninstalled: callback invoked when the breakpoint is uninstalled
+ * @address: location (virtual address) of the breakpoint
+ * @len: encoded extent of the breakpoint address (1, 2, 4, or 8 bytes)
+ * @type: breakpoint type (read-only, write-only, read/write, or execute)
+ * @priority: requested priority level
+ * @status: current registration/installation status
+ *
+ * %hw_breakpoint structures are the kernel's way of representing
+ * hardware breakpoints. These can be either execute breakpoints
+ * (triggered on instruction execution) or data breakpoints (also known
+ * as "watchpoints", triggered on data access), and the breakpoint's
+ * target address can be located in either kernel space or user space.
+ *
+ * The @address field contains the breakpoint's address, as either a
+ * regular kernel pointer or an %__user pointer. While a breakpoint
+ * is registered @address may be modified in an arch-specific manner;
+ * to retrieve its value during this period use the accessor routines
+ * hw_breakpoint_get_kaddr() or hw_breakpoint_get_uaddr().
+ *
+ * @len encodes the breakpoint's extent in bytes, which is subject to
+ * certain limitations. include/asm/hw_breakpoint.h contains macros
+ * defining the available lengths for a specific architecture. Note that
+ * @address must have the alignment specified by @len. The breakpoint
+ * will catch accesses to any byte in the range from @address to @address
+ * + (N - 1), where N is the value encoded by @len.
+ *
+ * @type indicates the type of access that will trigger the breakpoint.
+ * Possible values may include:
+ *
+ * %HW_BREAKPOINT_EXECUTE (triggered on instruction execution),
+ * %HW_BREAKPOINT_RW (triggered on read or write access),
+ * %HW_BREAKPOINT_WRITE (triggered on write access), and
+ * %HW_BREAKPOINT_READ (triggered on read access).
+ *
+ * Appropriate macros are defined in include/asm/hw_breakpoint.h; not all
+ * possibilities are available on all architectures. Execute breakpoints
+ * must have @len equal to the special value %HW_BREAKPOINT_LEN_EXECUTE.
+ *
+ * In register_user_hw_breakpoint(), @address must refer to a location in
+ * user space (set @address.user). The breakpoint will be active only
+ * while the requested task is running. Conversely in
+ * register_kernel_hw_breakpoint(), @address must refer to a location in
+ * kernel space (set @address.kernel), and the breakpoint will be active
+ * on all CPUs regardless of the current task.
+ *
+ * When a breakpoint gets hit, the @triggered callback is invoked
+ * in_interrupt with a pointer to the %hw_breakpoint structure and the
+ * processor registers. Execute-breakpoint traps occur before the
+ * breakpointed instruction runs; when the callback returns the
+ * instruction is restarted (this time without a debug exception). All
+ * other types of trap occur after the memory access has taken place.
+ * Breakpoints are disabled while @triggered runs, to avoid recursive
+ * traps and allow unhindered access to breakpointed memory.
+ *
+ * Hardware breakpoints are implemented using the CPU's debug registers,
+ * which are a limited hardware resource. Requests to register a
+ * breakpoint will always succeed provided the parameters are valid,
+ * but the breakpoint may not be installed in a debug register right
+ * away. Physical debug registers are allocated based on the priority
+ * level stored in @priority (higher values indicate higher priority).
+ * User-space breakpoints within a single thread compete with one
+ * another, and all user-space breakpoints compete with all kernel-space
+ * breakpoints; however user-space breakpoints in different threads do
+ * not compete. %HW_BREAKPOINT_PRIO_PTRACE is the level used for ptrace
+ * requests; an unobtrusive kernel-space breakpoint will use
+ * %HW_BREAKPOINT_PRIO_NORMAL to avoid disturbing user programs. A
+ * kernel-space breakpoint that always wants to be installed and doesn't
+ * care about disrupting user debugging sessions can specify
+ * %HW_BREAKPOINT_PRIO_HIGH.
+ *
+ * A particular breakpoint may be allocated (installed in) a debug
+ * register or deallocated (uninstalled) from its debug register at any
+ * time, as other breakpoints are registered and unregistered. The
+ * @installed and @uninstalled callbacks are invoked in_atomic when these
+ * events occur. It is legal for @installed or @uninstalled to be %NULL,
+ * however @triggered must not be. Note that it is not possible to
+ * register or unregister a breakpoint from within a callback routine,
+ * since doing so requires a process context. Note also that for user
+ * breakpoints, @installed and @uninstalled may be called during the
+ * middle of a context switch, at a time when it is not safe to call
+ * printk().
+ *
+ * For kernel-space breakpoints, @installed is invoked after the
+ * breakpoint is actually installed and @uninstalled is invoked before
+ * the breakpoint is actually uninstalled. As a result @triggered can
+ * be called when you may not expect it, but this way you will know that
+ * during the time interval from @installed to @uninstalled, all events
+ * are faithfully reported. (It is not possible to do any better than
+ * this in general, because on SMP systems there is no way to set a debug
+ * register simultaneously on all CPUs.) The same isn't always true with
+ * user-space breakpoints, but the differences should not be visible to a
+ * user process.
+ *
+ * If you need to know whether your kernel-space breakpoint was installed
+ * immediately upon registration, you can check the return value from
+ * register_kernel_hw_breakpoint(). If the value is not > 0, you can
+ * give up and unregister the breakpoint right away.
+ *
+ * @node and @status are intended for internal use. However @status
+ * may be read to determine whether or not the breakpoint is currently
+ * installed. (The value is not reliable unless local interrupts are
+ * disabled.)
+ *
+ * This sample code sets a breakpoint on pid_max and registers a callback
+ * function for writes to that variable. Note that it is not portable
+ * as written, because not all architectures support HW_BREAKPOINT_LEN_4.
+ *
+ * ----------------------------------------------------------------------
+ *
+ * #include <asm/hw_breakpoint.h>
+ *
+ * static void triggered(struct hw_breakpoint *bp, struct pt_regs *regs)
+ * {
+ * printk(KERN_DEBUG "Breakpoint triggered\n");
+ * dump_stack();
+ * .......<more debugging output>........
+ * }
+ *
+ * static struct hw_breakpoint my_bp;
+ *
+ * static int init_module(void)
+ * {
+ * ..........<do anything>............
+ * my_bp.address.kernel = &pid_max;
+ * my_bp.type = HW_BREAKPOINT_WRITE;
+ * my_bp.len = HW_BREAKPOINT_LEN_4;
+ * my_bp.triggered = triggered;
+ * my_bp.priority = HW_BREAKPOINT_PRIO_NORMAL;
+ * rc = register_kernel_hw_breakpoint(&my_bp);
+ * ..........<do anything>............
+ * }
+ *
+ * static void cleanup_module(void)
+ * {
+ * ..........<do anything>............
+ * unregister_kernel_hw_breakpoint(&my_bp);
+ * ..........<do anything>............
+ * }
+ *
+ * ----------------------------------------------------------------------
+ *
+ */
+struct hw_breakpoint {
+ struct list_head node;
+ void (*triggered)(struct hw_breakpoint *, struct pt_regs *);
+ void (*installed)(struct hw_breakpoint *);
+ void (*uninstalled)(struct hw_breakpoint *);
+ union {
+ const void *kernel;
+ const void __user *user;
+ unsigned long va;
+ } address;
+ u8 len;
+ u8 type;
+ u8 priority;
+ u8 status;
+};
+
+/*
+ * Accessor routines to retrieve a breakpoint's address:
+ */
+extern const void *hw_breakpoint_get_kaddr(struct hw_breakpoint *);
+extern const void __user *hw_breakpoint_get_uaddr(struct hw_breakpoint *);
+
+/*
+ * len and type values are defined in include/asm/hw_breakpoint.h.
+ * Available values vary according to the architecture. On i386 the
+ * possibilities are:
+ *
+ * HW_BREAKPOINT_LEN_1
+ * HW_BREAKPOINT_LEN_2
+ * HW_BREAKPOINT_LEN_4
+ * HW_BREAKPOINT_LEN_EXECUTE
+ * HW_BREAKPOINT_RW
+ * HW_BREAKPOINT_WRITE
+ * HW_BREAKPOINT_EXECUTE
+ *
+ * On other architectures HW_BREAKPOINT_LEN_8 may be available, and the
+ * 1-, 2-, and 4-byte lengths may be unavailable. You can use #ifdef
+ * to check at compile time.
+ */
+
+/* Standard HW breakpoint priority levels (higher value = higher priority) */
+#define HW_BREAKPOINT_PRIO_NORMAL 25
+#define HW_BREAKPOINT_PRIO_PTRACE 50
+#define HW_BREAKPOINT_PRIO_HIGH 75
+
+/* HW breakpoint status values (0 = not registered) */
+#define HW_BREAKPOINT_REGISTERED 1
+#define HW_BREAKPOINT_INSTALLED 2
+
+/*
+ * The following two routines are meant to be called only from within
+ * the ptrace or utrace subsystems. The tsk argument will usually be a
+ * process being debugged by the current task, although it is also legal
+ * for tsk to be the current task. In any case it must be guaranteed
+ * that tsk will not start running in user mode while its breakpoints are
+ * being modified.
+ */
+int register_user_hw_breakpoint(struct task_struct *tsk,
+ struct hw_breakpoint *bp);
+void unregister_user_hw_breakpoint(struct task_struct *tsk,
+ struct hw_breakpoint *bp);
+
+/*
+ * Kernel breakpoints are not associated with any particular thread.
+ */
+int register_kernel_hw_breakpoint(struct hw_breakpoint *bp);
+void unregister_kernel_hw_breakpoint(struct hw_breakpoint *bp);
+
+#endif /* __KERNEL__ */
+#endif /* _ASM_GENERIC_HW_BREAKPOINT_H */
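
To make the priority rules in the header comment above more concrete, here is a stand-alone sketch of how the debug registers might be split between kernel and user breakpoints. It is modelled on the loop in balance_kernel_vs_user() later in this patch, but it is a simplified illustration and not the kernel code: count_kernel_slots and kprio are hypothetical names, the kernel list is represented as a plain array sorted by decreasing priority, and tprio[HB_NUM - 1] is assumed to hold the highest user priority.

```c
#include <assert.h>

#define HB_NUM 4	/* number of debug registers on i386 */

/*
 * Return how many of the HB_NUM registers go to kernel breakpoints.
 * tprio[]: per-slot maximum user priorities, highest in tprio[HB_NUM - 1].
 * kprio[]: kernel breakpoint priorities, sorted in decreasing order.
 */
static int count_kernel_slots(const unsigned char tprio[HB_NUM],
		const unsigned char *kprio, int nk)
{
	int k = 0;		/* slots won by kernel bps so far */
	int u = HB_NUM - 1;	/* next user slot to contest */
	int next = 0;		/* next kernel bp to consider */

	assert(nk >= 0);
	while (k <= u) {
		/* Ties are resolved in favor of user breakpoints */
		if (next >= nk || tprio[u] >= kprio[next])
			--u;		/* user bps win a slot */
		else {
			++k;		/* kernel bp wins a slot */
			++next;
		}
	}
	return k;
}
```

With one ptrace breakpoint (priority 50) and kernel breakpoints at priorities 75 and 25, both kernel breakpoints get a register, because three user slots are unclaimed. With four ptrace breakpoints and a single priority-25 kernel breakpoint, the kernel breakpoint gets none.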
Index: usb-2.6/arch/i386/kernel/machine_kexec.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/machine_kexec.c
+++ usb-2.6/arch/i386/kernel/machine_kexec.c
@@ -19,6 +19,7 @@
#include <asm/cpufeature.h>
#include <asm/desc.h>
#include <asm/system.h>
+#include <asm/debugreg.h>

#define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE)))
static u32 kexec_pgd[1024] PAGE_ALIGNED;
@@ -108,6 +109,7 @@ NORET_TYPE void machine_kexec(struct kim

/* Interrupts aren't acceptable while we reboot */
local_irq_disable();
+ disable_debug_registers();

control_page = page_address(image->control_code_page);
memcpy(control_page, relocate_kernel, PAGE_SIZE);
Index: usb-2.6/arch/i386/kernel/smpboot.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/smpboot.c
+++ usb-2.6/arch/i386/kernel/smpboot.c
@@ -58,6 +58,7 @@
#include <smpboot_hooks.h>
#include <asm/vmi.h>
#include <asm/mtrr.h>
+#include <asm/debugreg.h>

/* Set if we find a B stepping CPU */
static int __devinitdata smp_b_stepping;
@@ -427,6 +428,7 @@ static void __cpuinit start_secondary(vo
local_irq_enable();

wmb();
+ load_debug_registers();
cpu_idle();
}

@@ -1210,6 +1212,7 @@ int __cpu_disable(void)
fixup_irqs(map);
/* It's now safe to remove this processor from the online map */
cpu_clear(cpu, cpu_online_map);
+ disable_debug_registers();
return 0;
}

Index: usb-2.6/kernel/hw_breakpoint.c
===================================================================
--- /dev/null
+++ usb-2.6/kernel/hw_breakpoint.c
@@ -0,0 +1,759 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) 2007 Alan Stern
+ */
+
+/*
+ * HW_breakpoint: a unified kernel/user-space hardware breakpoint facility,
+ * using the CPU's debug registers.
+ *
+ * This file contains the arch-independent routines. It is not meant
+ * to be compiled as a standalone source file; rather it should be
+ * #include'd by the arch-specific implementation.
+ */
+
+
+/*
+ * Install the debug register values for a new thread.
+ */
+void switch_to_thread_hw_breakpoint(struct task_struct *tsk)
+{
+ struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+ struct cpu_hw_breakpoint *chbi;
+ struct kernel_bp_data *thr_kbpdata;
+
+ /* This routine is on the hot path; it gets called for every
+ * context switch into a task with active breakpoints. We
+ * must make sure that the common case executes as quickly as
+ * possible.
+ */
+ chbi = &per_cpu(cpu_info, get_cpu());
+ chbi->bp_task = tsk;
+
+ /* Use RCU to synchronize with external updates */
+ rcu_read_lock();
+
+ /* Other CPUs might be making updates to the list of kernel
+ * breakpoints at this time. If they are, they will modify
+ * the other entry in kbpdata[] -- the one not pointed to
+ * by chbi->cur_kbpdata. So the update itself won't affect
+ * us directly.
+ *
+ * However when the update is finished, an IPI will arrive
+ * telling this CPU to change chbi->cur_kbpdata. We need
+ * to use a single consistent kbpdata[] entry, the present one.
+ * So we'll copy the pointer to a local variable, thr_kbpdata,
+ * and we must prevent the compiler from aliasing the two
+ * pointers. Only a compiler barrier is required, not a full
+ * memory barrier, because everything takes place on a single CPU.
+ */
+ restart:
+ thr_kbpdata = chbi->cur_kbpdata;
+ barrier();
+
+ /* Normally we can keep the same debug register settings as the
+ * last time this task ran. But if the kernel breakpoints have
+ * changed or any user breakpoints have been registered or
+ * unregistered, we need to handle the updates and possibly
+ * send out some notifications.
+ */
+ if (unlikely(thbi->gennum != thr_kbpdata->gennum)) {
+ struct hw_breakpoint *bp;
+ int i;
+ int num;
+
+ thbi->gennum = thr_kbpdata->gennum;
+ arch_update_thbi(thbi, thr_kbpdata);
+ num = thr_kbpdata->num_kbps;
+
+ /* This code can be invoked while a debugger is actively
+ * updating the thread's breakpoint list (for example, if
+ * someone sends SIGKILL to the task). We use RCU to
+ * protect our access to the list pointers. */
+ thbi->num_installed = 0;
+ i = HB_NUM;
+ list_for_each_entry_rcu(bp, &thbi->thread_bps, node) {
+
+ /* If this register is allocated for kernel bps,
+ * don't install. Otherwise do. */
+ if (--i < num) {
+ if (bp->status == HW_BREAKPOINT_INSTALLED) {
+ if (bp->uninstalled)
+ (bp->uninstalled)(bp);
+ bp->status = HW_BREAKPOINT_REGISTERED;
+ }
+ } else {
+ ++thbi->num_installed;
+ if (bp->status != HW_BREAKPOINT_INSTALLED) {
+ bp->status = HW_BREAKPOINT_INSTALLED;
+ if (bp->installed)
+ (bp->installed)(bp);
+ }
+ }
+ }
+ }
+
+ /* Set the debug registers */
+ arch_install_thbi(thbi);
+
+ /* Were there any kernel breakpoint changes while we were running? */
+ if (unlikely(chbi->cur_kbpdata != thr_kbpdata)) {
+
+ /* DR0-3 might now be assigned to kernel bps and we might
+ * have messed them up. Reload all the kernel bps and
+ * then reload the thread bps.
+ */
+ arch_install_chbi(chbi);
+ goto restart;
+ }
+
+ rcu_read_unlock();
+ put_cpu_no_resched();
+}
+
+/*
+ * Install the debug register values for just the kernel, no thread.
+ */
+static void switch_to_none_hw_breakpoint(void)
+{
+ struct cpu_hw_breakpoint *chbi;
+
+ chbi = &per_cpu(cpu_info, get_cpu());
+ chbi->bp_task = NULL;
+
+ /* This routine gets called from only two places. In one
+ * the caller holds the hw_breakpoint_mutex; in the other
+ * interrupts are disabled. In either case, no kernel
+ * breakpoint updates can arrive while the routine runs.
+ * So we don't need to use RCU.
+ */
+ arch_install_none(chbi);
+ put_cpu_no_resched();
+}
+
+/*
+ * Update the debug registers on this CPU.
+ */
+static void update_this_cpu(void *unused)
+{
+ struct cpu_hw_breakpoint *chbi;
+ struct task_struct *tsk = current;
+
+ chbi = &per_cpu(cpu_info, get_cpu());
+
+ /* Install both the kernel and the user breakpoints */
+ arch_install_chbi(chbi);
+ if (test_tsk_thread_flag(tsk, TIF_DEBUG))
+ switch_to_thread_hw_breakpoint(tsk);
+
+ put_cpu_no_resched();
+}
+
+/*
+ * Tell all CPUs to update their debug registers.
+ *
+ * The caller must hold hw_breakpoint_mutex.
+ */
+static void update_all_cpus(void)
+{
+ /* We don't need to use any sort of memory barrier. The IPI
+ * carried out by on_each_cpu() includes its own barriers.
+ */
+ on_each_cpu(update_this_cpu, NULL, 0, 0);
+ synchronize_rcu();
+}
+
+/*
+ * Load the debug registers during startup of a CPU.
+ */
+void load_debug_registers(void)
+{
+ unsigned long flags;
+
+ /* Prevent IPIs for new kernel breakpoint updates */
+ local_irq_save(flags);
+
+ rcu_read_lock();
+ update_this_cpu(NULL);
+ rcu_read_unlock();
+
+ local_irq_restore(flags);
+}
+
+/*
+ * Take the 4 highest-priority breakpoints in a thread and accumulate
+ * their priorities in tprio. Highest-priority entry is in tprio[3].
+ */
+static void accum_thread_tprio(struct thread_hw_breakpoint *thbi)
+{
+ int i;
+
+ for (i = HB_NUM - 1; i >= 0 && thbi->bps[i]; --i)
+ tprio[i] = max(tprio[i], thbi->bps[i]->priority);
+}
+
+/*
+ * Recalculate the value of the tprio array, the maximum priority levels
+ * requested by user breakpoints in all threads.
+ *
+ * Each thread has a list of registered breakpoints, kept in order of
+ * decreasing priority. We'll set tprio[HB_NUM - 1] to the maximum
+ * priority of the first entries in all the lists, tprio[HB_NUM - 2] to
+ * the maximum priority of the second entries, etc. In the end, we'll
+ * know that no thread requires breakpoints with priorities higher than
+ * the values in tprio.
+ *
+ * The caller must hold hw_breakpoint_mutex.
+ */
+static void recalc_tprio(void)
+{
+ struct thread_hw_breakpoint *thbi;
+
+ memset(tprio, 0, sizeof tprio);
+
+ /* Loop through all threads having registered breakpoints
+ * and accumulate the maximum priority levels in tprio.
+ */
+ list_for_each_entry(thbi, &thread_list, node)
+ accum_thread_tprio(thbi);
+}
+
+/*
+ * Decide how many debug registers will be allocated to kernel breakpoints
+ * and consequently, how many remain available for user breakpoints.
+ *
+ * The priorities of the entries in the list of registered kernel bps
+ * are compared against the priorities stored in tprio[]. The 4 highest
+ * winners overall get to be installed in a debug register; num_kbps
+ * keeps track of how many of those winners come from the kernel list.
+ *
+ * If num_kbps changes, or if a kernel bp changes its installation status,
+ * then call update_all_cpus() so that the debug registers will be set
+ * correctly on every CPU. If neither condition holds then the set of
+ * kernel bps hasn't changed, and nothing more needs to be done.
+ *
+ * The caller must hold hw_breakpoint_mutex.
+ */
+static void balance_kernel_vs_user(void)
+{
+ int k, u;
+ int changed = 0;
+ struct hw_breakpoint *bp;
+ struct kernel_bp_data *new_kbpdata;
+
+ /* Determine how many debug registers are available for kernel
+ * breakpoints as opposed to user breakpoints, based on the
+ * priorities. Ties are resolved in favor of user bps.
+ */
+ k = 0; /* Next kernel bp to allocate */
+ u = HB_NUM - 1; /* Next user bp to allocate */
+ bp = list_entry(kernel_bps.next, struct hw_breakpoint, node);
+ while (k <= u) {
+ if (&bp->node == &kernel_bps || tprio[u] >= bp->priority)
+ --u; /* User bps win a slot */
+ else {
+ ++k; /* Kernel bp wins a slot */
+ if (bp->status != HW_BREAKPOINT_INSTALLED)
+ changed = 1;
+ bp = list_entry(bp->node.next, struct hw_breakpoint,
+ node);
+ }
+ }
+ if (k != cur_kbpdata->num_kbps)
+ changed = 1;
+
+ /* Notify the remaining kernel breakpoints that they are about
+ * to be uninstalled.
+ */
+ list_for_each_entry_from(bp, &kernel_bps, node) {
+ if (bp->status == HW_BREAKPOINT_INSTALLED) {
+ if (bp->uninstalled)
+ (bp->uninstalled)(bp);
+ bp->status = HW_BREAKPOINT_REGISTERED;
+ changed = 1;
+ }
+ }
+
+ if (changed) {
+ cur_kbpindex ^= 1;
+ new_kbpdata = &kbpdata[cur_kbpindex];
+ new_kbpdata->gennum = cur_kbpdata->gennum + 1;
+ new_kbpdata->num_kbps = k;
+ arch_new_kbpdata(new_kbpdata);
+ u = 0;
+ list_for_each_entry(bp, &kernel_bps, node) {
+ if (u >= k)
+ break;
+ new_kbpdata->bps[u] = bp;
+ ++u;
+ }
+ rcu_assign_pointer(cur_kbpdata, new_kbpdata);
+
+ /* Tell all the CPUs to update their debug registers */
+ update_all_cpus();
+
+ /* Notify the breakpoints that just got installed */
+ for (u = 0; u < k; ++u) {
+ bp = new_kbpdata->bps[u];
+ if (bp->status != HW_BREAKPOINT_INSTALLED) {
+ bp->status = HW_BREAKPOINT_INSTALLED;
+ if (bp->installed)
+ (bp->installed)(bp);
+ }
+ }
+ }
+}
+
+/*
+ * Return the pointer to a thread's hw_breakpoint info area,
+ * and try to allocate one if it doesn't exist.
+ *
+ * The caller must hold hw_breakpoint_mutex.
+ */
+static struct thread_hw_breakpoint *alloc_thread_hw_breakpoint(
+ struct task_struct *tsk)
+{
+ if (!tsk->thread.hw_breakpoint_info && !(tsk->flags & PF_EXITING)) {
+ struct thread_hw_breakpoint *thbi;
+
+ thbi = kzalloc(sizeof(struct thread_hw_breakpoint),
+ GFP_KERNEL);
+ if (thbi) {
+ INIT_LIST_HEAD(&thbi->node);
+ INIT_LIST_HEAD(&thbi->thread_bps);
+
+ /* Force an update the next time tsk runs */
+ thbi->gennum = cur_kbpdata->gennum - 2;
+ tsk->thread.hw_breakpoint_info = thbi;
+ }
+ }
+ return tsk->thread.hw_breakpoint_info;
+}
+
+/*
+ * Erase all the hardware breakpoint info associated with a thread.
+ *
+ * If tsk != current then tsk must not be usable (for example, a
+ * child being cleaned up from a failed fork).
+ */
+void flush_thread_hw_breakpoint(struct task_struct *tsk)
+{
+ struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+ struct hw_breakpoint *bp;
+
+ if (!thbi)
+ return;
+ mutex_lock(&hw_breakpoint_mutex);
+
+ /* Let the breakpoints know they are being uninstalled */
+ list_for_each_entry(bp, &thbi->thread_bps, node) {
+ if (bp->status == HW_BREAKPOINT_INSTALLED && bp->uninstalled)
+ (bp->uninstalled)(bp);
+ bp->status = 0;
+ }
+
+ /* Remove tsk from the list of all threads with registered bps */
+ list_del(&thbi->node);
+
+ /* The thread no longer has any breakpoints associated with it */
+ clear_tsk_thread_flag(tsk, TIF_DEBUG);
+ tsk->thread.hw_breakpoint_info = NULL;
+ kfree(thbi);
+
+ /* Recalculate and rebalance the kernel-vs-user priorities */
+ recalc_tprio();
+ balance_kernel_vs_user();
+
+ /* Actually uninstall the breakpoints if necessary */
+ if (tsk == current)
+ switch_to_none_hw_breakpoint();
+ mutex_unlock(&hw_breakpoint_mutex);
+}
+
+/*
+ * Copy the hardware breakpoint info from a thread to its cloned child.
+ */
+int copy_thread_hw_breakpoint(struct task_struct *tsk,
+ struct task_struct *child, unsigned long clone_flags)
+{
+ /* We will assume that breakpoint settings are not inherited
+ * and the child starts out with no debug registers set.
+ * But what about CLONE_PTRACE?
+ */
+ clear_tsk_thread_flag(child, TIF_DEBUG);
+ return 0;
+}
+
+/*
+ * Store the highest-priority thread breakpoint entries in an array.
+ */
+static void store_thread_bp_array(struct thread_hw_breakpoint *thbi)
+{
+ struct hw_breakpoint *bp;
+ int i;
+
+ i = HB_NUM - 1;
+ list_for_each_entry(bp, &thbi->thread_bps, node) {
+ thbi->bps[i] = bp;
+ thbi->tdr[i] = bp->address.va;
+ if (--i < 0)
+ break;
+ }
+ while (i >= 0)
+ thbi->bps[i--] = NULL;
+}
+
+/*
+ * Insert a new breakpoint in a priority-sorted list.
+ * Return the bp's index in the list.
+ *
+ * Thread invariants:
+ * tsk_thread_flag(tsk, TIF_DEBUG) set implies
+ * tsk->thread.hw_breakpoint_info is not NULL.
+ * tsk_thread_flag(tsk, TIF_DEBUG) set iff thbi->thread_bps is non-empty
+ * iff thbi->node is on thread_list.
+ */
+static int insert_bp_in_list(struct hw_breakpoint *bp,
+ struct thread_hw_breakpoint *thbi, struct task_struct *tsk)
+{
+ struct list_head *head;
+ int pos;
+ struct hw_breakpoint *temp_bp;
+
+ /* tsk and thbi are NULL for kernel bps, non-NULL for user bps */
+ if (tsk)
+ head = &thbi->thread_bps;
+ else
+ head = &kernel_bps;
+
+ /* Equal-priority breakpoints get listed first-come-first-served */
+ pos = 0;
+ list_for_each_entry(temp_bp, head, node) {
+ if (bp->priority > temp_bp->priority)
+ break;
+ ++pos;
+ }
+ bp->status = HW_BREAKPOINT_REGISTERED;
+ list_add_tail(&bp->node, &temp_bp->node);
+
+ if (tsk) {
+ store_thread_bp_array(thbi);
+
+ /* Is this the thread's first registered breakpoint? */
+ if (list_empty(&thbi->node)) {
+ set_tsk_thread_flag(tsk, TIF_DEBUG);
+ list_add(&thbi->node, &thread_list);
+ }
+ }
+ return pos;
+}
+
+/*
+ * Remove a breakpoint from its priority-sorted list.
+ *
+ * See the invariants mentioned above.
+ */
+static void remove_bp_from_list(struct hw_breakpoint *bp,
+ struct thread_hw_breakpoint *thbi, struct task_struct *tsk)
+{
+ /* Remove bp from the thread's/kernel's list. If the list is now
+ * empty we must clear the TIF_DEBUG flag. But keep the
+ * thread_hw_breakpoint structure, so that the virtualized debug
+ * register values will remain valid.
+ */
+ list_del(&bp->node);
+ if (tsk) {
+ store_thread_bp_array(thbi);
+
+ if (list_empty(&thbi->thread_bps)) {
+ list_del_init(&thbi->node);
+ clear_tsk_thread_flag(tsk, TIF_DEBUG);
+ }
+ }
+
+ /* Tell the breakpoint it is being uninstalled */
+ if (bp->status == HW_BREAKPOINT_INSTALLED && bp->uninstalled)
+ (bp->uninstalled)(bp);
+ bp->status = 0;
+}
+
+/*
+ * Validate the settings in a hw_breakpoint structure.
+ */
+static int validate_settings(struct hw_breakpoint *bp, struct task_struct *tsk)
+{
+ int rc = -EINVAL;
+ unsigned long len;
+
+ switch (bp->type) {
+#ifdef HW_BREAKPOINT_EXECUTE
+ case HW_BREAKPOINT_EXECUTE:
+ if (bp->len != HW_BREAKPOINT_LEN_EXECUTE)
+ return rc;
+ break;
+#endif
+#ifdef HW_BREAKPOINT_READ
+ case HW_BREAKPOINT_READ: break;
+#endif
+#ifdef HW_BREAKPOINT_WRITE
+ case HW_BREAKPOINT_WRITE: break;
+#endif
+#ifdef HW_BREAKPOINT_RW
+ case HW_BREAKPOINT_RW: break;
+#endif
+ default:
+ return rc;
+ }
+
+ switch (bp->len) {
+#ifdef HW_BREAKPOINT_LEN_1
+ case HW_BREAKPOINT_LEN_1:
+ len = 1;
+ break;
+#endif
+#ifdef HW_BREAKPOINT_LEN_2
+ case HW_BREAKPOINT_LEN_2:
+ len = 2;
+ break;
+#endif
+#ifdef HW_BREAKPOINT_LEN_4
+ case HW_BREAKPOINT_LEN_4:
+ len = 4;
+ break;
+#endif
+#ifdef HW_BREAKPOINT_LEN_8
+ case HW_BREAKPOINT_LEN_8:
+ len = 8;
+ break;
+#endif
+ default:
+ return rc;
+ }
+
+ /* Check that the low-order bits of the address are appropriate
+ * for the alignment implied by len.
+ */
+ if (bp->address.va & (len - 1))
+ return rc;
+
+ /* Check that the virtual address is in the proper range */
+ if (tsk) {
+ if (!arch_check_va_in_userspace(bp->address.va, tsk))
+ return rc;
+ } else {
+ if (!arch_check_va_in_kernelspace(bp->address.va))
+ return rc;
+ }
+
+ if (bp->triggered)
+ rc = 0;
+ return rc;
+}
+
+/*
+ * Actual implementation of register_user_hw_breakpoint.
+ */
+static int __register_user_hw_breakpoint(struct task_struct *tsk,
+ struct hw_breakpoint *bp)
+{
+ int rc;
+ struct thread_hw_breakpoint *thbi;
+ int pos;
+
+ bp->status = 0;
+ rc = validate_settings(bp, tsk);
+ if (rc)
+ return rc;
+
+ thbi = alloc_thread_hw_breakpoint(tsk);
+ if (!thbi)
+ return -ENOMEM;
+
+ /* Insert bp in the thread's list and update the DR7 value */
+ pos = insert_bp_in_list(bp, thbi, tsk);
+ arch_register_user_hw_breakpoint(bp, thbi);
+
+ /* Update and rebalance the priorities. We don't need to go through
+ * the list of all threads; adding a breakpoint can only cause the
+ * priorities for this thread to increase.
+ */
+ accum_thread_tprio(thbi);
+ balance_kernel_vs_user();
+
+ /* Did bp get allocated to a debug register? We can tell from its
+ * position in the list. The number of registers allocated to
+ * kernel breakpoints is num_kbps; all the others are available for
+ * user breakpoints. If bp's position in the priority-ordered list
+ * is low enough, it will get a register.
+ */
+ if (pos < HB_NUM - cur_kbpdata->num_kbps) {
+ rc = 1;
+
+ /* Does it need to be installed right now? */
+ if (tsk == current)
+ switch_to_thread_hw_breakpoint(tsk);
+ /* Otherwise it will get installed the next time tsk runs */
+ }
+
+ return rc;
+}
+
+/**
+ * register_user_hw_breakpoint - register a hardware breakpoint for user space
+ * @tsk: the task in whose memory space the breakpoint will be set
+ * @bp: the breakpoint structure to register
+ *
+ * This routine registers a breakpoint to be associated with @tsk's
+ * memory space and active only while @tsk is running. It does not
+ * guarantee that the breakpoint will be allocated to a debug register
+ * immediately; there may be other higher-priority breakpoints registered
+ * which require the use of all the debug registers.
+ *
+ * @tsk will normally be a process being debugged by the current process,
+ * but it may also be the current process.
+ *
+ * The fields in @bp are checked for validity. @bp->len, @bp->type,
+ * @bp->address, @bp->triggered, and @bp->priority must be set properly.
+ *
+ * Returns 1 if @bp is allocated to a debug register, 0 if @bp is
+ * registered but not allowed to be installed, otherwise a negative error
+ * code.
+ */
+int register_user_hw_breakpoint(struct task_struct *tsk,
+ struct hw_breakpoint *bp)
+{
+ int rc;
+
+ mutex_lock(&hw_breakpoint_mutex);
+ rc = __register_user_hw_breakpoint(tsk, bp);
+ mutex_unlock(&hw_breakpoint_mutex);
+ return rc;
+}
+
+/*
+ * Actual implementation of unregister_user_hw_breakpoint.
+ */
+static void __unregister_user_hw_breakpoint(struct task_struct *tsk,
+ struct hw_breakpoint *bp)
+{
+ struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+
+ if (!bp->status)
+ return; /* Not registered */
+
+ /* Remove bp from the thread's list and update the DR7 value */
+ remove_bp_from_list(bp, thbi, tsk);
+ arch_unregister_user_hw_breakpoint(bp, thbi);
+
+ /* Recalculate and rebalance the kernel-vs-user priorities,
+ * and actually uninstall bp if necessary.
+ */
+ recalc_tprio();
+ balance_kernel_vs_user();
+ if (tsk == current)
+ switch_to_thread_hw_breakpoint(tsk);
+}
+
+/**
+ * unregister_user_hw_breakpoint - unregister a hardware breakpoint for user space
+ * @tsk: the task in whose memory space the breakpoint is registered
+ * @bp: the breakpoint structure to unregister
+ *
+ * Uninstalls and unregisters @bp.
+ */
+void unregister_user_hw_breakpoint(struct task_struct *tsk,
+ struct hw_breakpoint *bp)
+{
+ mutex_lock(&hw_breakpoint_mutex);
+ __unregister_user_hw_breakpoint(tsk, bp);
+ mutex_unlock(&hw_breakpoint_mutex);
+}
+
+/**
+ * register_kernel_hw_breakpoint - register a hardware breakpoint for kernel space
+ * @bp: the breakpoint structure to register
+ *
+ * This routine registers a breakpoint to be active at all times. It
+ * does not guarantee that the breakpoint will be allocated to a debug
+ * register immediately; there may be other higher-priority breakpoints
+ * registered which require the use of all the debug registers.
+ *
+ * The fields in @bp are checked for validity. @bp->len, @bp->type,
+ * @bp->address, @bp->triggered, and @bp->priority must be set properly.
+ *
+ * Returns 1 if @bp is allocated to a debug register, 0 if @bp is
+ * registered but not allowed to be installed, otherwise a negative error
+ * code.
+ */
+int register_kernel_hw_breakpoint(struct hw_breakpoint *bp)
+{
+ int rc;
+ int pos;
+
+ bp->status = 0;
+ rc = validate_settings(bp, NULL);
+ if (rc)
+ return rc;
+
+ mutex_lock(&hw_breakpoint_mutex);
+
+ /* Insert bp in the kernel's list and update the DR7 value */
+ pos = insert_bp_in_list(bp, NULL, NULL);
+ arch_register_kernel_hw_breakpoint(bp);
+
+ /* Rebalance the priorities. This will install bp if it
+ * was allocated a debug register.
+ */
+ balance_kernel_vs_user();
+
+ /* Did bp get allocated to a debug register? We can tell from its
+ * position in the list. The number of registers allocated to
+ * kernel breakpoints is num_kbps; all the others are available for
+ * user breakpoints. If bp's position in the priority-ordered list
+ * is low enough, it will get a register.
+ */
+ if (pos < cur_kbpdata->num_kbps)
+ rc = 1;
+
+ mutex_unlock(&hw_breakpoint_mutex);
+ return rc;
+}
+EXPORT_SYMBOL_GPL(register_kernel_hw_breakpoint);
+
+/**
+ * unregister_kernel_hw_breakpoint - unregister a hardware breakpoint for kernel space
+ * @bp: the breakpoint structure to unregister
+ *
+ * Uninstalls and unregisters @bp.
+ */
+void unregister_kernel_hw_breakpoint(struct hw_breakpoint *bp)
+{
+ if (!bp->status)
+ return; /* Not registered */
+ mutex_lock(&hw_breakpoint_mutex);
+
+ /* Remove bp from the kernel's list and update the DR7 value */
+ remove_bp_from_list(bp, NULL, NULL);
+ arch_unregister_kernel_hw_breakpoint(bp);
+
+ /* Rebalance the priorities. This will uninstall bp if it
+ * was allocated a debug register.
+ */
+ balance_kernel_vs_user();
+
+ mutex_unlock(&hw_breakpoint_mutex);
+}
+EXPORT_SYMBOL_GPL(unregister_kernel_hw_breakpoint);

2007-06-14 06:49:19

by Roland McGrath

Subject: Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)

> I really don't understand your point here. What's wrong with bp_show?
> Is it all the preprocessor conditionals? I thought that was how we had
> agreed portable code should determine which types and lengths were
> supported on a particular architecture.

That part is fine. The problem is fetching the hw_breakpoint.len field
directly and expecting it to contain the API values. In an implementation
done as I've been referring to, there is no need for any field to contain
the HW_BREAKPOINT_LEN_8 value, and it's a waste to store one. If it were
hw_breakpoint_get_len(bp), that would be fine.

> Consider that the definition of struct hw_breakpoint is in
> include/asm-generic/. [...]
> The one thing which makes sense to me is that some architectures might
> want to store type and/or length bits in along with the address field.

Indeed, that is the natural thing (and all the bits needed) on several.
I hadn't raised this before since I was having so much trouble already
convincing you about storing things in machine-dependent fashion so that
users cannot just use the struct fields directly.

I really think it would be cleanest all around to use just:

struct arch_hw_breakpoint info;

in place of address union, len, type in struct hw_breakpoint. Then each
arch provides hw_breakpoint_get_{kaddr,uaddr,len,type} inlines. For
storing, each arch can define hw_breakpoint_init(addr, len, type) (or
maybe k/u variants). This can be used by callers directly if you want to
keep register_hw_breakpoint to one argument, or could just be internal if
register_hw_breakpoint takes the three more args. If callers use it
directly, there can also be an INIT_ARCH_HW_BREAKPOINT(addr, len, type)
for use in struct hw_breakpoint_init initializers.

On x86 use:

struct arch_hw_breakpoint_info {
union {
const void *kernel;
const void __user *user;
unsigned long va;
} address;
u8 len;
u8 type;
} __attribute__((packed));

and the size of struct hw_breakpoint won't increase.
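
A minimal user-space sketch of this layout, for illustration only. It
assumes a GCC-compatible compiler, replaces u8 with uint8_t, and drops
the kernel-only __user annotation; the info_* helper names are
hypothetical stand-ins for the hw_breakpoint_get_*/hw_breakpoint_init
routines described above, not part of any posted patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for the proposed x86 layout. */
struct arch_hw_breakpoint_info {
	union {
		const void *kernel;
		const void *user;	/* would be "const void __user *" in the kernel */
		unsigned long va;
	} address;
	uint8_t len;
	uint8_t type;
} __attribute__((packed));

/* Setter in the style of hw_breakpoint_init(addr, len, type). */
static inline void info_kinit(struct arch_hw_breakpoint_info *info,
		const void *addr, unsigned len, unsigned type)
{
	/* Both values must fit in the one-byte fields. */
	assert(len <= UINT8_MAX && type <= UINT8_MAX);
	info->address.kernel = addr;
	info->len = (uint8_t) len;
	info->type = (uint8_t) type;
}

/* Accessors in the style of hw_breakpoint_get_{kaddr,len,type}. */
static inline const void *info_get_kaddr(
		const struct arch_hw_breakpoint_info *info)
{
	return info->address.kernel;
}

static inline unsigned info_get_len(const struct arch_hw_breakpoint_info *info)
{
	return info->len;
}

static inline unsigned info_get_type(const struct arch_hw_breakpoint_info *info)
{
	return info->type;
}
```

Because the struct is packed, the two one-byte fields follow the address
with no padding, so the whole thing occupies one pointer plus two bytes.
Callers only ever go through the helpers, which is what leaves each
architecture free to choose a different internal encoding.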

> > What about DR_STEP? i.e., if DR_STEP was set from a single-step and then
> > there was a DR_TRAPn debug exception, is DR_STEP still set? If DR_TRAPn
> > was set and then you single-step, is DR_TRAPn cleared?
>
> I didn't experiment with using DR_STEP. There wasn't any simple way to
> cause a single-step exception. Perhaps if I were more familiar with
> kprobes...

It's easy for user mode with gdb. kprobes is simple to use, and it
always does a single-step to execute (a copy of) the instruction that
was overwritten with the breakpoint. So, write a module that does:

int testvar = 0;
asm(".globl testme; testme: movl $17,testvar; ret");
void testme(void);
static int testinit(void)
{
	/* ... register kprobe at &testme ... */
	/* ... register hw_breakpoint at &testvar ... */
	testme();
	return 0;
}

Your kprobe handlers don't have to actually do anything at all, if you
are just hacking the low-level code to see what %dr6 values you get at
each trap.

> I decided on something simpler than messing around with Kconfig.

I still think it's the proper thing to make it conditional, not always
built in. But it's a pedantic point.

> This is getting pretty close to a final form. The patch below is for
> 2.6.22-rc3. See what you think...

Indeed I think we have come nearly as far as we will before we have a few
arch ports get done and some heavy use to find the rough edges. Thanks
very much for being so accommodating to all my criticism, which I hope has
been constructive.

> +inline const void *hw_breakpoint_get_kaddr(struct hw_breakpoint *bp)

These need to be static inline. Here you're defining a global function
in every .o file that uses the header.
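
For illustration, a header-style accessor written the recommended way.
This is a sketch only: struct bp_sketch and bp_sketch_get_kaddr are
hypothetical, cut-down stand-ins for struct hw_breakpoint and its
accessor, and the comment assumes GNU89 inline semantics (GCC's default
at the time).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for struct hw_breakpoint. */
struct bp_sketch {
	const void *kaddr;
};

/*
 * Header-style accessor.  With "static inline", every .c file that
 * includes the header gets its own private copy, so no global symbol
 * is emitted.  With plain "inline" under GNU89 semantics, each such
 * file would instead define a global function of the same name.
 */
static inline const void *bp_sketch_get_kaddr(const struct bp_sketch *bp)
{
	assert(bp != NULL);
	return bp->kaddr;
}
```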

> + get_debugreg(dr6, 6);
> + set_debugreg(0, 6); /* DR6 may or may not be cleared by the CPU */
> + if (dr6 & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3))
> + tsk->thread.vdr6 = 0;

Some comment here about this conditional clearing, please.

> +
> +/*
> + * HW breakpoint additions
> + */
> +
> +#define HB_NUM 4 /* Number of hardware breakpoints */

Need #ifdef __KERNEL__ around all these additions to debugreg.h.

> +static inline void arch_update_thbi(struct thread_hw_breakpoint *thbi,

For local functions in a source file (not a header), it's standard form
now just to define them static, not static inline. For these trivial
ones, the compiler will always inline them.


Thanks,
Roland

2007-06-19 20:35:48

by Alan Stern

Subject: Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)

On Wed, 13 Jun 2007, Roland McGrath wrote:

> > I really don't understand your point here. What's wrong with bp_show?
> > Is it all the preprocessor conditionals? I thought that was how we had
> > agreed portable code should determine which types and lengths were
> > supported on a particular architecture.
>
> That part is fine. The problem is fetching the hw_breakpoint.len field
> directly and expecting it to contain the API values. In an implementation
> done as I've been referring to, there is no need for any field to contain
> the HW_BREAKPOINT_LEN_8 value, and it's a waste to store one. If it were
> hw_breakpoint_get_len(bp), that would be fine.

"A waste to store one"? Waste of what? It isn't a waste of space; the
space would otherwise be unused. Waste of an instruction, perhaps.

> Indeed, that is the natural thing (and all the bits needed) on several.
> I hadn't raised this before since I was having so much trouble already
> convincing you about storing things in machine-dependent fashion so that
> users cannot just use the struct fields directly.

It is now possible for an implementation to store things in a
machine-dependent fashion; I have added accessor routines as you
suggested. But I also left the fields as they were; the documentation
mentions that they won't necessarily contain any particular values.

You might want to examine the check in validate_settings() for address
alignment; it might not be valid if other values get stored in the
low-order bits of the address. This is a tricky point; it's not safe
to mix bits around unless you know that the data values are correct,
but in validate_settings() you don't yet know that.
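
To make the concern concrete, here is a small user-space sketch. It
assumes a hypothetical architecture that keeps flag bits in the low
three bits of the stored address; SKETCH_FLAG_MASK, pack_va and
unpack_va are invented for illustration and do not appear in the patch.
Only the "va & (len - 1)" test mirrors validate_settings().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: the low 3 bits of the stored address carry flags. */
#define SKETCH_FLAG_MASK	0x7UL

/*
 * Same test as validate_settings(): the low-order bits of va must be
 * clear for the alignment implied by len.  The mask trick only works
 * when len is a power of two, so that is checked first.
 */
static bool va_is_aligned(unsigned long va, unsigned long len)
{
	if (len == 0 || (len & (len - 1)) != 0)
		return false;
	return (va & (len - 1)) == 0;
}

/* Combine an already-validated address with its flag bits. */
static unsigned long pack_va(unsigned long va, unsigned long flags)
{
	assert((va & SKETCH_FLAG_MASK) == 0);
	assert((flags & ~SKETCH_FLAG_MASK) == 0);
	return va | flags;
}

/* Recover the plain address from the packed form. */
static unsigned long unpack_va(unsigned long packed)
{
	return packed & ~SKETCH_FLAG_MASK;
}
```

Running the alignment test on a packed value reports a misaligned
address even though the underlying address is fine, so on such an
architecture the check has to see the unpacked address, and the packing
itself has to wait until the address is known to be good.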

> On x86 use:
>
> struct arch_hw_breakpoint_info {
> union {
> const void *kernel;
> const void __user *user;
> unsigned long va;
> } address;
> u8 len;
> u8 type;
> } __attribute__((packed));
>
> and the size of struct hw_breakpoint won't increase.

Maybe. I don't see any reason for the unnecessary encapsulation,
though.

> > > What about DR_STEP? i.e., if DR_STEP was set from a single-step and then
> > > there was a DR_TRAPn debug exception, is DR_STEP still set? If DR_TRAPn
> > > was set and then you single-step, is DR_TRAPn cleared?
> >
> > I didn't experiment with using DR_STEP. There wasn't any simple way to
> > cause a single-step exception. Perhaps if I were more familiar with
> > kprobes...
>
> It's easy for user mode with gdb.

Yes, of course. I feel foolish for having forgotten.

Tests show that my CPU does not clear DR_STEP when a data breakpoint is
hit. Conversely, the DR_TRAPn bits are cleared even when a single-step
exception occurs.

The bizarre behavior from before is still present; the system gets in a
long loop when the exception handler leaves any of the 0xe000 bits set
in DR6. And it kills my shell process, probably by sending it a
SIGTRAP. Oddly enough, this only happens when there's a kernel-space
debug exception -- faults in user-space continue to work normally.
It's not clear what this means; the behavior indicates a software
problem but the dependency on the DR6 value indicates a hardware
contribution as well...

If you're interested, I can send you the code I used to do this testing
so you can try it on your machine.


> > I decided on something simpler than messing around with Kconfig.
>
> I still think it's the proper thing to make it conditional, not always
> built in. But it's a pedantic point.

We have three things to consider: ptrace, utrace, and hw-breakpoint.
Ultimately hw-breakpoint should become part of utrace; we might not
want to bother with a standalone version.

Furthermore, hw-breakpoint takes over ptrace's mechanism for
breakpoint handling. If we want to allow a configuration where ptrace
is present and hw-breakpoint isn't, then I would have to add an
alternate implementation containing only support for the legacy
interface.

It doesn't have to be done now, but it is something to bear in mind
while trying to decide what things should be conditional on which
options.


> Indeed I think we have come nearly as far as we will before we have a few
> arch ports get done and some heavy use to find the rough edges. Thanks
> very much for being so accommodating to all my criticism, which I hope has
> been constructive.

There's no question that the code is much improved as a result of our
interaction.


> > +inline const void *hw_breakpoint_get_kaddr(struct hw_breakpoint *bp)
>
> These need to be static inline. Here you're defining a global function
> in every .o file that uses the header.

Whoops. It's fixed now.

> > + get_debugreg(dr6, 6);
> > + set_debugreg(0, 6); /* DR6 may or may not be cleared by the CPU */
> > + if (dr6 & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3))
> > + tsk->thread.vdr6 = 0;
>
> Some comment here about this conditional clearing, please.

In fact I decided to change that whole thing around. Now dr6 gets
stored in vdr6 immediately, with no conditional. This is the right
thing to do when hw-breakpoint support is missing (aside from false
triggers caused by lazy debug register switching). Then the
hw-breakpoint notifier routine clears the DR_TRAPn bits from vdr6, and
the ptrace "triggered" callback sets the appropriate virtualized bits
in vdr6. Overall it's a lot simpler and easier to analyze.
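
The three steps can be modelled in user space roughly as follows. This
is a sketch, not the patch code: the function names and the vnum
parameter are hypothetical, and only the DR_* bit values are taken from
include/asm-i386/debugreg.h.

```c
#include <assert.h>

/* Bit values as defined in include/asm-i386/debugreg.h. */
#define DR_TRAP0	0x1UL
#define DR_TRAP1	0x2UL
#define DR_TRAP2	0x4UL
#define DR_TRAP3	0x8UL
#define DR_STEP		0x4000UL
#define DR_TRAP_BITS	(DR_TRAP0 | DR_TRAP1 | DR_TRAP2 | DR_TRAP3)

/* Step 1: do_debug stores the raw DR6 value, with no conditional. */
static unsigned long store_vdr6(unsigned long dr6)
{
	return dr6;
}

/* Step 2: the hw-breakpoint notifier clears the DR_TRAPn bits. */
static unsigned long notifier_clear_traps(unsigned long vdr6)
{
	return vdr6 & ~DR_TRAP_BITS;
}

/*
 * Step 3: the ptrace "triggered" callback sets the virtualized bit.
 * vnum (hypothetical) is the debug register number as ptrace sees it,
 * which need not equal the physical register that fired.
 */
static unsigned long ptrace_triggered(unsigned long vdr6, unsigned vnum)
{
	assert(vnum < 4);
	return vdr6 | (DR_TRAP0 << vnum);
}
```

For example, if physical register 2 fires during a single-step and the
breakpoint is the tracer's virtual register 0, the raw value
DR_STEP | DR_TRAP2 ends up as DR_STEP | DR_TRAP0 in vdr6.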

I made a few other changes to do_debug. For instance, it no longer
checks whether notify_die() returns NOTIFY_STOP. That check was a
mistake to begin with; NOTIFY_STOP merely means to cut the notifier
chain short -- it doesn't mean that the debug exception can be ignored.
Also it sends the SIGTRAP when any of the DR_STEP or DR_TRAPn bits are
set in vdr6; this is now the appropriate condition.
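
The resulting decision at the end of do_debug can be sketched as a pure
function. Here struct debug_outcome and decide() are hypothetical names
used only for illustration; dr6 is the raw register value and vdr6 is
the virtualized value after the notifier chain has run.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit values as defined in include/asm-i386/debugreg.h. */
#define DR_TRAP0	0x1UL
#define DR_TRAP1	0x2UL
#define DR_TRAP2	0x4UL
#define DR_TRAP3	0x8UL
#define DR_STEP		0x4000UL
#define DR_TRAP_BITS	(DR_TRAP0 | DR_TRAP1 | DR_TRAP2 | DR_TRAP3)

/* Hypothetical summary of what the handler decided. */
struct debug_outcome {
	unsigned long vdr6;
	bool send_sigtrap;
	bool defer_single_step;	/* set TIF_SINGLESTEP and clear TF */
};

static struct debug_outcome decide(unsigned long dr6, unsigned long vdr6,
		bool user_mode)
{
	struct debug_outcome out = { vdr6, false, false };

	/*
	 * A single-step exception taken in kernel space is ignored for
	 * now; stepping is re-armed on the return to user mode.
	 */
	if ((dr6 & DR_STEP) && !user_mode) {
		out.vdr6 &= ~DR_STEP;
		out.defer_single_step = true;
	}

	/* SIGTRAP is sent only if something is still set in vdr6. */
	out.send_sigtrap = (out.vdr6 & (DR_STEP | DR_TRAP_BITS)) != 0;
	return out;
}
```

Note that the notifier return value plays no part here: whether a
signal is sent depends only on which bits remain in vdr6.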

> > +
> > +/*
> > + * HW breakpoint additions
> > + */
> > +
> > +#define HB_NUM 4 /* Number of hardware breakpoints */
>
> Need #ifdef __KERNEL__ around all these additions to debugreg.h.

Done.

> > +static inline void arch_update_thbi(struct thread_hw_breakpoint *thbi,
>
> For local functions in a source file (not a header), it's standard form
> now just to define them static, not static inline. For these trivial
> ones, the compiler will always inline them.

Okay. Here's the latest form of the code, with the updated bptest
patch as an attachment.

Alan Stern



Index: usb-2.6/include/asm-i386/hw_breakpoint.h
===================================================================
--- /dev/null
+++ usb-2.6/include/asm-i386/hw_breakpoint.h
@@ -0,0 +1,70 @@
+#ifndef _I386_HW_BREAKPOINT_H
+#define _I386_HW_BREAKPOINT_H
+#define __ARCH_HW_BREAKPOINT_H
+
+#ifdef __KERNEL__
+#include <asm-generic/hw_breakpoint.h>
+
+/* HW breakpoint static initializers */
+#define HW_BREAKPOINT_KINIT(addr, _len, _type) \
+ .address = {.kernel = addr,}, \
+ .len = _len, \
+ .type = _type
+
+#define HW_BREAKPOINT_UINIT(addr, _len, _type) \
+ .address = {.user = addr,}, \
+ .len = _len, \
+ .type = _type
+
+/* HW breakpoint setter routines */
+static inline void hw_breakpoint_kinit(struct hw_breakpoint *bp,
+ const void *addr, unsigned len, unsigned type)
+{
+ bp->address.kernel = addr;
+ bp->len = len;
+ bp->type = type;
+}
+
+static inline void hw_breakpoint_uinit(struct hw_breakpoint *bp,
+ const void __user *addr, unsigned len, unsigned type)
+{
+ bp->address.user = addr;
+ bp->len = len;
+ bp->type = type;
+}
+
+/* HW breakpoint accessor routines */
+static inline const void *hw_breakpoint_get_kaddr(struct hw_breakpoint *bp)
+{
+ return bp->address.kernel;
+}
+
+static inline const void __user *hw_breakpoint_get_uaddr(
+ struct hw_breakpoint *bp)
+{
+ return bp->address.user;
+}
+
+static inline unsigned hw_breakpoint_get_len(struct hw_breakpoint *bp)
+{
+ return bp->len;
+}
+
+static inline unsigned hw_breakpoint_get_type(struct hw_breakpoint *bp)
+{
+ return bp->type;
+}
+
+/* Available HW breakpoint length encodings */
+#define HW_BREAKPOINT_LEN_1 0x40
+#define HW_BREAKPOINT_LEN_2 0x44
+#define HW_BREAKPOINT_LEN_4 0x4c
+#define HW_BREAKPOINT_LEN_EXECUTE 0x40
+
+/* Available HW breakpoint type encodings */
+#define HW_BREAKPOINT_EXECUTE 0x80 /* trigger on instruction execute */
+#define HW_BREAKPOINT_WRITE 0x81 /* trigger on memory write */
+#define HW_BREAKPOINT_RW 0x83 /* trigger on memory read or write */
+
+#endif /* __KERNEL__ */
+#endif /* _I386_HW_BREAKPOINT_H */
Index: usb-2.6/arch/i386/kernel/process.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/process.c
+++ usb-2.6/arch/i386/kernel/process.c
@@ -57,6 +57,7 @@

#include <asm/tlbflush.h>
#include <asm/cpu.h>
+#include <asm/debugreg.h>

asmlinkage void ret_from_fork(void) __asm__("ret_from_fork");

@@ -376,9 +377,10 @@ EXPORT_SYMBOL(kernel_thread);
*/
void exit_thread(void)
{
+ struct task_struct *tsk = current;
+
/* The process may have allocated an io port bitmap... nuke it. */
if (unlikely(test_thread_flag(TIF_IO_BITMAP))) {
- struct task_struct *tsk = current;
struct thread_struct *t = &tsk->thread;
int cpu = get_cpu();
struct tss_struct *tss = &per_cpu(init_tss, cpu);
@@ -396,15 +398,17 @@ void exit_thread(void)
tss->x86_tss.io_bitmap_base = INVALID_IO_BITMAP_OFFSET;
put_cpu();
}
+ if (unlikely(tsk->thread.hw_breakpoint_info))
+ flush_thread_hw_breakpoint(tsk);
}

void flush_thread(void)
{
struct task_struct *tsk = current;

- memset(tsk->thread.debugreg, 0, sizeof(unsigned long)*8);
- memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));
- clear_tsk_thread_flag(tsk, TIF_DEBUG);
+ memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));
+ if (unlikely(tsk->thread.hw_breakpoint_info))
+ flush_thread_hw_breakpoint(tsk);
/*
* Forget coprocessor state..
*/
@@ -447,14 +451,21 @@ int copy_thread(int nr, unsigned long cl

savesegment(gs,p->thread.gs);

+ p->thread.hw_breakpoint_info = NULL;
+ p->thread.io_bitmap_ptr = NULL;
+
tsk = current;
+ err = -ENOMEM;
+ if (unlikely(tsk->thread.hw_breakpoint_info)) {
+ if (copy_thread_hw_breakpoint(tsk, p, clone_flags))
+ goto out;
+ }
+
if (unlikely(test_tsk_thread_flag(tsk, TIF_IO_BITMAP))) {
p->thread.io_bitmap_ptr = kmemdup(tsk->thread.io_bitmap_ptr,
IO_BITMAP_BYTES, GFP_KERNEL);
- if (!p->thread.io_bitmap_ptr) {
- p->thread.io_bitmap_max = 0;
- return -ENOMEM;
- }
+ if (!p->thread.io_bitmap_ptr)
+ goto out;
set_tsk_thread_flag(p, TIF_IO_BITMAP);
}

@@ -484,7 +495,8 @@ int copy_thread(int nr, unsigned long cl

err = 0;
out:
- if (err && p->thread.io_bitmap_ptr) {
+ if (err) {
+ flush_thread_hw_breakpoint(p);
kfree(p->thread.io_bitmap_ptr);
p->thread.io_bitmap_max = 0;
}
@@ -496,18 +508,18 @@ int copy_thread(int nr, unsigned long cl
*/
void dump_thread(struct pt_regs * regs, struct user * dump)
{
- int i;
+ struct task_struct *tsk = current;

/* changed the size calculations - should hopefully work better. lbt */
dump->magic = CMAGIC;
dump->start_code = 0;
dump->start_stack = regs->esp & ~(PAGE_SIZE - 1);
- dump->u_tsize = ((unsigned long) current->mm->end_code) >> PAGE_SHIFT;
- dump->u_dsize = ((unsigned long) (current->mm->brk + (PAGE_SIZE-1))) >> PAGE_SHIFT;
+ dump->u_tsize = ((unsigned long) tsk->mm->end_code) >> PAGE_SHIFT;
+ dump->u_dsize = ((unsigned long) (tsk->mm->brk + (PAGE_SIZE-1))) >> PAGE_SHIFT;
dump->u_dsize -= dump->u_tsize;
dump->u_ssize = 0;
- for (i = 0; i < 8; i++)
- dump->u_debugreg[i] = current->thread.debugreg[i];
+
+ dump_thread_hw_breakpoint(tsk, dump->u_debugreg);

if (dump->start_stack < TASK_SIZE)
dump->u_ssize = ((unsigned long) (TASK_SIZE - dump->start_stack)) >> PAGE_SHIFT;
@@ -557,16 +569,6 @@ static noinline void __switch_to_xtra(st

next = &next_p->thread;

- if (test_tsk_thread_flag(next_p, TIF_DEBUG)) {
- set_debugreg(next->debugreg[0], 0);
- set_debugreg(next->debugreg[1], 1);
- set_debugreg(next->debugreg[2], 2);
- set_debugreg(next->debugreg[3], 3);
- /* no 4 and 5 */
- set_debugreg(next->debugreg[6], 6);
- set_debugreg(next->debugreg[7], 7);
- }
-
if (!test_tsk_thread_flag(next_p, TIF_IO_BITMAP)) {
/*
* Disable the bitmap via an invalid offset. We still cache
@@ -699,7 +701,7 @@ struct task_struct fastcall * __switch_t
set_iopl_mask(next->iopl);

/*
- * Now maybe handle debug registers and/or IO bitmaps
+ * Now maybe handle IO bitmaps
*/
if (unlikely((task_thread_info(next_p)->flags & _TIF_WORK_CTXSW)
|| test_tsk_thread_flag(prev_p, TIF_IO_BITMAP)))
@@ -731,6 +733,13 @@ struct task_struct fastcall * __switch_t

x86_write_percpu(current_task, next_p);

+ /*
+ * Handle debug registers. This must be done _after_ current
+ * is updated.
+ */
+ if (unlikely(test_tsk_thread_flag(next_p, TIF_DEBUG)))
+ switch_to_thread_hw_breakpoint(next_p);
+
return prev_p;
}

Index: usb-2.6/arch/i386/kernel/signal.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/signal.c
+++ usb-2.6/arch/i386/kernel/signal.c
@@ -591,13 +591,6 @@ static void fastcall do_signal(struct pt

signr = get_signal_to_deliver(&info, &ka, regs, NULL);
if (signr > 0) {
- /* Reenable any watchpoints before delivering the
- * signal to user space. The processor register will
- * have been cleared if the watchpoint triggered
- * inside the kernel.
- */
- if (unlikely(current->thread.debugreg[7]))
- set_debugreg(current->thread.debugreg[7], 7);

/* Whee! Actually deliver the signal. */
if (handle_signal(signr, &info, &ka, oldset, regs) == 0) {
Index: usb-2.6/arch/i386/kernel/traps.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/traps.c
+++ usb-2.6/arch/i386/kernel/traps.c
@@ -804,62 +804,42 @@ fastcall void __kprobes do_int3(struct p
*/
fastcall void __kprobes do_debug(struct pt_regs * regs, long error_code)
{
- unsigned int condition;
struct task_struct *tsk = current;
+ unsigned long dr6;

- get_debugreg(condition, 6);
+ get_debugreg(dr6, 6);
+ set_debugreg(0, 6); /* DR6 may or may not be cleared by the CPU */
+
+ /* Store the virtualized DR6 value */
+ tsk->thread.vdr6 = dr6;
+
+ notify_die(DIE_DEBUG, "debug", regs, dr6, error_code, SIGTRAP);

- if (notify_die(DIE_DEBUG, "debug", regs, condition, error_code,
- SIGTRAP) == NOTIFY_STOP)
- return;
/* It's safe to allow irq's after DR6 has been saved */
if (regs->eflags & X86_EFLAGS_IF)
local_irq_enable();

- /* Mask out spurious debug traps due to lazy DR7 setting */
- if (condition & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)) {
- if (!tsk->thread.debugreg[7])
- goto clear_dr7;
+ if (regs->eflags & VM_MASK) {
+ handle_vm86_trap((struct kernel_vm86_regs *) regs,
+ error_code, 1);
+ return;
}

- if (regs->eflags & VM_MASK)
- goto debug_vm86;
-
- /* Save debug status register where ptrace can see it */
- tsk->thread.debugreg[6] = condition;
-
/*
- * Single-stepping through TF: make sure we ignore any events in
- * kernel space (but re-enable TF when returning to user mode).
+ * Single-stepping through system calls: ignore any exceptions in
+ * kernel space, but re-enable TF when returning to user mode.
+ *
+ * We already checked v86 mode above, so we can check for kernel mode
+ * by just checking the CPL of CS.
*/
- if (condition & DR_STEP) {
- /*
- * We already checked v86 mode above, so we can
- * check for kernel mode by just checking the CPL
- * of CS.
- */
- if (!user_mode(regs))
- goto clear_TF_reenable;
+ if ((dr6 & DR_STEP) && !user_mode(regs)) {
+ tsk->thread.vdr6 &= ~DR_STEP;
+ set_tsk_thread_flag(tsk, TIF_SINGLESTEP);
+ regs->eflags &= ~X86_EFLAGS_TF;
}

- /* Ok, finally something we can handle */
- send_sigtrap(tsk, regs, error_code);
-
- /* Disable additional traps. They'll be re-enabled when
- * the signal is delivered.
- */
-clear_dr7:
- set_debugreg(0, 7);
- return;
-
-debug_vm86:
- handle_vm86_trap((struct kernel_vm86_regs *) regs, error_code, 1);
- return;
-
-clear_TF_reenable:
- set_tsk_thread_flag(tsk, TIF_SINGLESTEP);
- regs->eflags &= ~TF_MASK;
- return;
+ if (tsk->thread.vdr6 & (DR_STEP|DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3))
+ send_sigtrap(tsk, regs, error_code);
}

/*
Index: usb-2.6/include/asm-i386/debugreg.h
===================================================================
--- usb-2.6.orig/include/asm-i386/debugreg.h
+++ usb-2.6/include/asm-i386/debugreg.h
@@ -48,6 +48,8 @@

#define DR_LOCAL_ENABLE_SHIFT 0 /* Extra shift to the local enable bit */
#define DR_GLOBAL_ENABLE_SHIFT 1 /* Extra shift to the global enable bit */
+#define DR_LOCAL_ENABLE (0x1) /* Local enable for reg 0 */
+#define DR_GLOBAL_ENABLE (0x2) /* Global enable for reg 0 */
#define DR_ENABLE_SIZE 2 /* 2 enable bits per register */

#define DR_LOCAL_ENABLE_MASK (0x55) /* Set local bits for all 4 regs */
@@ -61,4 +63,32 @@
#define DR_LOCAL_SLOWDOWN (0x100) /* Local slow the pipeline */
#define DR_GLOBAL_SLOWDOWN (0x200) /* Global slow the pipeline */

+
+/*
+ * HW breakpoint additions
+ */
+#ifdef __KERNEL__
+
+#define HB_NUM 4 /* Number of hardware breakpoints */
+
+/* For process management */
+void flush_thread_hw_breakpoint(struct task_struct *tsk);
+int copy_thread_hw_breakpoint(struct task_struct *tsk,
+ struct task_struct *child, unsigned long clone_flags);
+void dump_thread_hw_breakpoint(struct task_struct *tsk, int u_debugreg[8]);
+void switch_to_thread_hw_breakpoint(struct task_struct *tsk);
+
+/* For CPU management */
+void load_debug_registers(void);
+static inline void disable_debug_registers(void)
+{
+ set_debugreg(0, 7);
+}
+
+/* For use by ptrace */
+unsigned long thread_get_debugreg(struct task_struct *tsk, int n);
+int thread_set_debugreg(struct task_struct *tsk, int n, unsigned long val);
+
+#endif /* __KERNEL__ */
+
#endif
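
For orientation: the two new constants above describe the enable bits for debug register 0 only; the bits for register n are obtained by shifting left by n * DR_ENABLE_SIZE. The following is a small user-space sketch of that layout, not part of the patch. The helper names dr7_local_enable() and dr7_global_enable() are hypothetical and exist only for illustration.

```c
/* Values copied from include/asm-i386/debugreg.h as modified above */
#define DR_LOCAL_ENABLE   0x1	/* Local enable for reg 0 */
#define DR_GLOBAL_ENABLE  0x2	/* Global enable for reg 0 */
#define DR_ENABLE_SIZE    2	/* 2 enable bits per register */

/* Hypothetical helper: local-enable bit in DR7 for debug register n (0..3) */
unsigned long dr7_local_enable(int n)
{
	return (unsigned long) DR_LOCAL_ENABLE << (n * DR_ENABLE_SIZE);
}

/* Hypothetical helper: global-enable bit in DR7 for debug register n (0..3) */
unsigned long dr7_global_enable(int n)
{
	return (unsigned long) DR_GLOBAL_ENABLE << (n * DR_ENABLE_SIZE);
}
```

OR-ing the four local bits together should reproduce DR_LOCAL_ENABLE_MASK (0x55), and the four global bits should give 0xAA.
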
Index: usb-2.6/include/asm-i386/processor.h
===================================================================
--- usb-2.6.orig/include/asm-i386/processor.h
+++ usb-2.6/include/asm-i386/processor.h
@@ -354,8 +354,9 @@ struct thread_struct {
unsigned long esp;
unsigned long fs;
unsigned long gs;
-/* Hardware debugging registers */
- unsigned long debugreg[8]; /* %%db0-7 debug registers */
+/* Hardware breakpoint info */
+ unsigned long vdr6;
+ struct thread_hw_breakpoint *hw_breakpoint_info;
/* fault info */
unsigned long cr2, trap_no, error_code;
/* floating point info */
Index: usb-2.6/arch/i386/kernel/hw_breakpoint.c
===================================================================
--- /dev/null
+++ usb-2.6/arch/i386/kernel/hw_breakpoint.c
@@ -0,0 +1,633 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) 2007 Alan Stern
+ */
+
+/*
+ * HW_breakpoint: a unified kernel/user-space hardware breakpoint facility,
+ * using the CPU's debug registers.
+ */
+
+/* QUESTIONS
+
+ How to know whether RF should be cleared when setting a user
+ execution breakpoint?
+
+*/
+
+#include <linux/init.h>
+#include <linux/irqflags.h>
+#include <linux/kdebug.h>
+#include <linux/kernel.h>
+#include <linux/kprobes.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/notifier.h>
+#include <linux/rcupdate.h>
+#include <linux/sched.h>
+#include <linux/smp.h>
+
+#include <asm/debugreg.h>
+#include <asm/hw_breakpoint.h>
+#include <asm/percpu.h>
+#include <asm/processor.h>
+
+
+/* Per-thread HW breakpoint and debug register info */
+struct thread_hw_breakpoint {
+
+ /* utrace support */
+ struct list_head node; /* Entry in thread list */
+ struct list_head thread_bps; /* Thread's breakpoints */
+ struct hw_breakpoint *bps[HB_NUM]; /* Highest-priority bps */
+ int num_installed; /* Number of installed bps */
+ unsigned gennum; /* update-generation number */
+
+ /* Only the portions below are arch-specific */
+
+ /* ptrace support -- Note that vdr6 is stored directly in the
+ * thread_struct so that it is always available.
+ */
+ unsigned long vdr7; /* Virtualized DR7 */
+ struct hw_breakpoint vdr_bps[HB_NUM]; /* Breakpoints
+ representing virtualized debug registers 0 - 3 */
+ unsigned long tdr[HB_NUM]; /* and their addresses */
+ unsigned long tdr7; /* Thread's DR7 value */
+ unsigned long tkdr7; /* Thread + kernel DR7 value */
+};
+
+/* Kernel-space breakpoint data */
+struct kernel_bp_data {
+ unsigned gennum; /* Generation number */
+ int num_kbps; /* Number of kernel bps */
+ struct hw_breakpoint *bps[HB_NUM]; /* Loaded breakpoints */
+
+ /* Only the portions below are arch-specific */
+ unsigned long mkdr7; /* Masked kernel DR7 value */
+};
+
+/* Per-CPU debug register info */
+struct cpu_hw_breakpoint {
+ struct kernel_bp_data *cur_kbpdata; /* Current kbpdata[] entry */
+ struct task_struct *bp_task; /* The thread whose bps
+ are currently loaded in the debug registers */
+};
+
+static DEFINE_PER_CPU(struct cpu_hw_breakpoint, cpu_info);
+
+/* Global info */
+static struct kernel_bp_data kbpdata[2]; /* Old and new settings */
+static int cur_kbpindex; /* Alternates 0, 1, ... */
+static struct kernel_bp_data *cur_kbpdata = &kbpdata[0];
+ /* Always equal to &kbpdata[cur_kbpindex] */
+
+static u8 tprio[HB_NUM]; /* Thread bp max priorities */
+static LIST_HEAD(kernel_bps); /* Kernel breakpoint list */
+static LIST_HEAD(thread_list); /* thread_hw_breakpoint list */
+static DEFINE_MUTEX(hw_breakpoint_mutex); /* Protects everything */
+
+/* Only the portions below are arch-specific */
+
+static unsigned long kdr7; /* Unmasked kernel DR7 value */
+
+/* Masks for the bits in DR7 related to kernel breakpoints, for various
+ * values of num_kbps. Entry n is the mask for when there are n kernel
+ * breakpoints, in debug registers 0 - (n-1). The DR_GLOBAL_SLOWDOWN bit
+ * (GE) is handled specially.
+ */
+static const unsigned long kdr7_masks[HB_NUM + 1] = {
+ 0x00000000,
+ 0x000f0003, /* LEN0, R/W0, G0, L0 */
+ 0x00ff000f, /* Same for 0,1 */
+ 0x0fff003f, /* Same for 0,1,2 */
+ 0xffff00ff /* Same for 0,1,2,3 */
+};
+
+
+/* Arch-specific hook routines */
+
+
+/*
+ * Install the kernel breakpoints in their debug registers.
+ */
+static void arch_install_chbi(struct cpu_hw_breakpoint *chbi)
+{
+ struct hw_breakpoint **bps;
+
+ /* Don't allow debug exceptions while we update the registers */
+ set_debugreg(0, 7);
+ chbi->cur_kbpdata = rcu_dereference(cur_kbpdata);
+
+ /* Kernel breakpoints are stored starting in DR0 and going up */
+ bps = chbi->cur_kbpdata->bps;
+ switch (chbi->cur_kbpdata->num_kbps) {
+ case 4:
+ set_debugreg(bps[3]->address.va, 3);
+ case 3:
+ set_debugreg(bps[2]->address.va, 2);
+ case 2:
+ set_debugreg(bps[1]->address.va, 1);
+ case 1:
+ set_debugreg(bps[0]->address.va, 0);
+ }
+ /* No need to set DR6 */
+ set_debugreg(chbi->cur_kbpdata->mkdr7, 7);
+}
+
+/*
+ * Update an out-of-date thread hw_breakpoint info structure.
+ */
+static void arch_update_thbi(struct thread_hw_breakpoint *thbi,
+ struct kernel_bp_data *thr_kbpdata)
+{
+ int num = thr_kbpdata->num_kbps;
+
+ thbi->tkdr7 = thr_kbpdata->mkdr7 | (thbi->tdr7 & ~kdr7_masks[num]);
+}
+
+/*
+ * Install the thread breakpoints in their debug registers.
+ */
+static void arch_install_thbi(struct thread_hw_breakpoint *thbi)
+{
+ /* Install the user breakpoints. Kernel breakpoints are stored
+ * starting in DR0 and going up; there are num_kbps of them.
+ * User breakpoints are stored starting in DR3 and going down,
+ * as many as we have room for.
+ */
+ switch (thbi->num_installed) {
+ case 4:
+ set_debugreg(thbi->tdr[0], 0);
+ case 3:
+ set_debugreg(thbi->tdr[1], 1);
+ case 2:
+ set_debugreg(thbi->tdr[2], 2);
+ case 1:
+ set_debugreg(thbi->tdr[3], 3);
+ }
+ /* No need to set DR6 */
+ set_debugreg(thbi->tkdr7, 7);
+}
+
+/*
+ * Install the debug register values for just the kernel, no thread.
+ */
+static void arch_install_none(struct cpu_hw_breakpoint *chbi)
+{
+ set_debugreg(chbi->cur_kbpdata->mkdr7, 7);
+}
+
+/*
+ * Create a new kbpdata entry.
+ */
+static void arch_new_kbpdata(struct kernel_bp_data *new_kbpdata)
+{
+ int num = new_kbpdata->num_kbps;
+
+ new_kbpdata->mkdr7 = kdr7 & (kdr7_masks[num] | DR_GLOBAL_SLOWDOWN);
+}
+
+/*
+ * Check for virtual address in user space.
+ */
+static int arch_check_va_in_userspace(unsigned long va,
+ struct task_struct *tsk)
+{
+#ifndef CONFIG_X86_64
+#define TASK_SIZE_OF(t) TASK_SIZE
+#endif
+ return (va < TASK_SIZE_OF(tsk));
+}
+
+/*
+ * Check for virtual address in kernel space.
+ */
+static int arch_check_va_in_kernelspace(unsigned long va)
+{
+#ifndef CONFIG_X86_64
+#define TASK_SIZE64 TASK_SIZE
+#endif
+ return (va >= TASK_SIZE64);
+}
+
+/*
+ * Encode the length, type, Exact, and Enable bits for a particular breakpoint
+ * as stored in debug register 7.
+ */
+static unsigned long encode_dr7(int drnum, u8 len, u8 type)
+{
+ unsigned long temp;
+
+ temp = (len | type) & 0xf;
+ temp <<= (DR_CONTROL_SHIFT + drnum * DR_CONTROL_SIZE);
+ temp |= (DR_GLOBAL_ENABLE << (drnum * DR_ENABLE_SIZE)) |
+ DR_GLOBAL_SLOWDOWN;
+ return temp;
+}
+
+/*
+ * Calculate the DR7 value for a list of kernel or user breakpoints.
+ */
+static unsigned long calculate_dr7(struct thread_hw_breakpoint *thbi)
+{
+ int is_user;
+ struct list_head *bp_list;
+ struct hw_breakpoint *bp;
+ int i;
+ int drnum;
+ unsigned long dr7;
+
+ if (thbi) {
+ is_user = 1;
+ bp_list = &thbi->thread_bps;
+ drnum = HB_NUM - 1;
+ } else {
+ is_user = 0;
+ bp_list = &kernel_bps;
+ drnum = 0;
+ }
+
+ /* Kernel bps are assigned from DR0 on up, and user bps are assigned
+ * from DR3 on down. Accumulate all 4 bps; the kernel DR7 mask will
+ * select the appropriate bits later.
+ */
+ dr7 = 0;
+ i = 0;
+ list_for_each_entry(bp, bp_list, node) {
+
+ /* Get the debug register number and accumulate the bits */
+ dr7 |= encode_dr7(drnum, bp->len, bp->type);
+ if (++i >= HB_NUM)
+ break;
+ if (is_user)
+ --drnum;
+ else
+ ++drnum;
+ }
+ return dr7;
+}
+
+/*
+ * Register a new user breakpoint structure.
+ */
+static void arch_register_user_hw_breakpoint(struct hw_breakpoint *bp,
+ struct thread_hw_breakpoint *thbi)
+{
+ thbi->tdr7 = calculate_dr7(thbi);
+
+ /* If this is an execution breakpoint for the current PC address,
+ * we should clear the task's RF so that the bp will be certain
+ * to trigger.
+ *
+ * FIXME: It's not so easy to get hold of the task's PC as a linear
+ * address! ptrace.c does this already...
+ */
+}
+
+/*
+ * Unregister a user breakpoint structure.
+ */
+static void arch_unregister_user_hw_breakpoint(struct hw_breakpoint *bp,
+ struct thread_hw_breakpoint *thbi)
+{
+ thbi->tdr7 = calculate_dr7(thbi);
+}
+
+/*
+ * Register a kernel breakpoint structure.
+ */
+static void arch_register_kernel_hw_breakpoint(
+ struct hw_breakpoint *bp)
+{
+ kdr7 = calculate_dr7(NULL);
+}
+
+/*
+ * Unregister a kernel breakpoint structure.
+ */
+static void arch_unregister_kernel_hw_breakpoint(
+ struct hw_breakpoint *bp)
+{
+ kdr7 = calculate_dr7(NULL);
+}
+
+
+/* End of arch-specific hook routines */
+
+
+/*
+ * Copy out the debug register information for a core dump.
+ *
+ * tsk must be equal to current.
+ */
+void dump_thread_hw_breakpoint(struct task_struct *tsk, int u_debugreg[8])
+{
+ struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+ int i;
+
+ memset(u_debugreg, 0, 8 * sizeof(u_debugreg[0]));
+ if (thbi) {
+ for (i = 0; i < HB_NUM; ++i)
+ u_debugreg[i] = thbi->vdr_bps[i].address.va;
+ u_debugreg[7] = thbi->vdr7;
+ }
+ u_debugreg[6] = tsk->thread.vdr6;
+}
+
+/*
+ * Ptrace support: breakpoint trigger routine.
+ */
+
+static struct thread_hw_breakpoint *alloc_thread_hw_breakpoint(
+ struct task_struct *tsk);
+static int __register_user_hw_breakpoint(struct task_struct *tsk,
+ struct hw_breakpoint *bp);
+static void __unregister_user_hw_breakpoint(struct task_struct *tsk,
+ struct hw_breakpoint *bp);
+
+static void ptrace_triggered(struct hw_breakpoint *bp, struct pt_regs *regs)
+{
+ struct task_struct *tsk = current;
+ struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+ int i;
+
+ /* Store in the virtual DR6 register the fact that the breakpoint
+ * was hit so the thread's debugger will see it.
+ */
+ if (thbi) {
+ i = bp - thbi->vdr_bps;
+ tsk->thread.vdr6 |= (DR_TRAP0 << i);
+ }
+}
+
+/*
+ * Handle PTRACE_PEEKUSR calls for the debug register area.
+ */
+unsigned long thread_get_debugreg(struct task_struct *tsk, int n)
+{
+ struct thread_hw_breakpoint *thbi;
+ unsigned long val = 0;
+
+ mutex_lock(&hw_breakpoint_mutex);
+ thbi = tsk->thread.hw_breakpoint_info;
+ if (n < HB_NUM) {
+ if (thbi)
+ val = (unsigned long) thbi->vdr_bps[n].address.va;
+ } else if (n == 6) {
+ val = tsk->thread.vdr6;
+ } else if (n == 7) {
+ if (thbi)
+ val = thbi->vdr7;
+ }
+ mutex_unlock(&hw_breakpoint_mutex);
+ return val;
+}
+
+/*
+ * Decode the length and type bits for a particular breakpoint as
+ * stored in debug register 7. Return the "enabled" status.
+ */
+static int decode_dr7(unsigned long dr7, int bpnum, u8 *len, u8 *type)
+{
+ int temp = dr7 >> (DR_CONTROL_SHIFT + bpnum * DR_CONTROL_SIZE);
+
+ *len = (temp & 0xc) | 0x40;
+ *type = (temp & 0x3) | 0x80;
+ return (dr7 >> (bpnum * DR_ENABLE_SIZE)) & 0x3;
+}
+
+/*
+ * Handle ptrace writes to debug register 7.
+ */
+static int ptrace_write_dr7(struct task_struct *tsk,
+ struct thread_hw_breakpoint *thbi, unsigned long data)
+{
+ struct hw_breakpoint *bp;
+ int i;
+ int rc = 0;
+ unsigned long old_dr7 = thbi->vdr7;
+
+ data &= ~DR_CONTROL_RESERVED;
+
+ /* Loop through all the hardware breakpoints, making the
+ * appropriate changes to each.
+ */
+ restore_settings:
+ thbi->vdr7 = data;
+ bp = &thbi->vdr_bps[0];
+ for (i = 0; i < HB_NUM; (++i, ++bp)) {
+ int enabled;
+ u8 len, type;
+
+ enabled = decode_dr7(data, i, &len, &type);
+
+ /* Unregister the breakpoint before trying to change it */
+ if (bp->status)
+ __unregister_user_hw_breakpoint(tsk, bp);
+
+ /* Insert the breakpoint's new settings */
+ bp->len = len;
+ bp->type = type;
+
+ /* Now register the breakpoint if it should be enabled.
+ * New invalid entries will raise an error here.
+ */
+ if (enabled) {
+ bp->triggered = ptrace_triggered;
+ bp->priority = HW_BREAKPOINT_PRIO_PTRACE;
+ if (__register_user_hw_breakpoint(tsk, bp) < 0 &&
+ rc == 0)
+ break;
+ }
+ }
+
+ /* If anything above failed, restore the original settings */
+ if (i < HB_NUM) {
+ rc = -EIO;
+ data = old_dr7;
+ goto restore_settings;
+ }
+ return rc;
+}
+
+/*
+ * Handle PTRACE_POKEUSR calls for the debug register area.
+ */
+int thread_set_debugreg(struct task_struct *tsk, int n, unsigned long val)
+{
+ struct thread_hw_breakpoint *thbi;
+ int rc = -EIO;
+
+ /* We have to hold this lock the entire time, to prevent thbi
+ * from being deallocated out from under us.
+ */
+ mutex_lock(&hw_breakpoint_mutex);
+
+ /* There are no DR4 or DR5 registers */
+ if (n == 4 || n == 5)
+ ;
+
+ /* Writes to DR6 modify the virtualized value */
+ else if (n == 6) {
+ tsk->thread.vdr6 = val;
+ rc = 0;
+ }
+
+ else if (!tsk->thread.hw_breakpoint_info && val == 0)
+ rc = 0; /* Minor optimization */
+
+ else if ((thbi = alloc_thread_hw_breakpoint(tsk)) == NULL)
+ rc = -ENOMEM;
+
+ /* Writes to DR0 - DR3 change a breakpoint address */
+ else if (n < HB_NUM) {
+ struct hw_breakpoint *bp = &thbi->vdr_bps[n];
+
+ /* If the breakpoint is registered then unregister it,
+ * change it, and re-register it. Revert to the original
+ * address if an error occurs.
+ */
+ if (bp->status) {
+ unsigned long old_addr = bp->address.va;
+
+ __unregister_user_hw_breakpoint(tsk, bp);
+ bp->address.va = val;
+ rc = __register_user_hw_breakpoint(tsk, bp);
+ if (rc < 0) {
+ bp->address.va = old_addr;
+ __register_user_hw_breakpoint(tsk, bp);
+ }
+ } else {
+ bp->address.va = val;
+ rc = 0;
+ }
+ }
+
+ /* All that's left is DR7 */
+ else
+ rc = ptrace_write_dr7(tsk, thbi, val);
+
+ mutex_unlock(&hw_breakpoint_mutex);
+ return rc;
+}
+
+
+/*
+ * Handle debug exception notifications.
+ */
+
+static void switch_to_none_hw_breakpoint(void);
+
+static int __kprobes hw_breakpoint_handler(struct die_args *args)
+{
+ struct cpu_hw_breakpoint *chbi;
+ int i;
+ struct hw_breakpoint *bp;
+ struct thread_hw_breakpoint *thbi = NULL;
+
+ /* The DR6 value is stored in args->err */
+#define DR6 (args->err)
+
+ if (!(DR6 & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)))
+ return NOTIFY_DONE;
+
+ /* Assert that local interrupts are disabled */
+
+ /* Reset the DRn bits in the virtualized register value.
+ * The ptrace trigger routine will add in whatever is needed.
+ */
+ current->thread.vdr6 &= ~(DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3);
+
+ /* Are we a victim of lazy debug-register switching? */
+ chbi = &per_cpu(cpu_info, get_cpu());
+ if (!chbi->bp_task)
+ ;
+ else if (chbi->bp_task != current) {
+
+ /* No user breakpoints are valid. Perform the belated
+ * debug-register switch.
+ */
+ switch_to_none_hw_breakpoint();
+ } else {
+ thbi = chbi->bp_task->thread.hw_breakpoint_info;
+ }
+
+ /* Disable all breakpoints so that the callbacks can run without
+ * triggering recursive debug exceptions.
+ */
+ set_debugreg(0, 7);
+
+ /* Handle all the breakpoints that were triggered */
+ for (i = 0; i < HB_NUM; ++i) {
+ if (likely(!(DR6 & (DR_TRAP0 << i))))
+ continue;
+
+ /* Find the corresponding hw_breakpoint structure and
+ * invoke its triggered callback.
+ */
+ if (i < chbi->cur_kbpdata->num_kbps)
+ bp = chbi->cur_kbpdata->bps[i];
+ else if (thbi)
+ bp = thbi->bps[i];
+ else /* False alarm due to lazy DR switching */
+ continue;
+ if (bp) { /* Should always be non-NULL */
+
+ /* Set RF at execution breakpoints */
+ if (bp->type == HW_BREAKPOINT_EXECUTE)
+ args->regs->eflags |= X86_EFLAGS_RF;
+ (bp->triggered)(bp, args->regs);
+ }
+ }
+
+ /* Re-enable the breakpoints */
+ set_debugreg(thbi ? thbi->tkdr7 : chbi->cur_kbpdata->mkdr7, 7);
+ put_cpu_no_resched();
+
+ if (!(DR6 & ~(DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)))
+ return NOTIFY_STOP;
+
+ return NOTIFY_DONE;
+#undef DR6
+}
+
+/*
+ * Handle debug exception notifications.
+ */
+static int __kprobes hw_breakpoint_exceptions_notify(
+ struct notifier_block *unused, unsigned long val, void *data)
+{
+ if (val != DIE_DEBUG)
+ return NOTIFY_DONE;
+ return hw_breakpoint_handler(data);
+}
+
+static struct notifier_block hw_breakpoint_exceptions_nb = {
+ .notifier_call = hw_breakpoint_exceptions_notify
+};
+
+static int __init init_hw_breakpoint(void)
+{
+ load_debug_registers();
+ return register_die_notifier(&hw_breakpoint_exceptions_nb);
+}
+
+core_initcall(init_hw_breakpoint);
+
+
+/* Grab the arch-independent code */
+
+#include "../../../kernel/hw_breakpoint.c"
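
To make the DR7 bookkeeping in the file above easier to follow, here is a user-space sketch that mirrors encode_dr7(), decode_dr7(), and the masking done by arch_new_kbpdata() plus arch_update_thbi(). It is an illustration under stated assumptions, not the patch's code: merge_dr7() is a hypothetical name, and the sample encodings 0x4c (a 4-byte length) and 0x81 (a write breakpoint) are assumed values chosen to be consistent with the 0x40/0x80 marker bits that decode_dr7() adds.

```c
/* Values copied from include/asm-i386/debugreg.h */
#define DR_CONTROL_SHIFT   16		/* Control bits start at bit 16 */
#define DR_CONTROL_SIZE    4		/* 4 control bits per register */
#define DR_ENABLE_SIZE     2		/* 2 enable bits per register */
#define DR_GLOBAL_ENABLE   0x2		/* Global enable for reg 0 */
#define DR_GLOBAL_SLOWDOWN 0x200	/* GE bit */

/* Same table as kdr7_masks[] in the patch */
static const unsigned long kdr7_masks[5] = {
	0x00000000, 0x000f0003, 0x00ff000f, 0x0fff003f, 0xffff00ff
};

/* Mirrors encode_dr7(): the 0xf mask strips the marker bits from
 * len and type, leaving LENn in bits 3-2 and R/Wn in bits 1-0. */
unsigned long encode_dr7(int drnum, unsigned char len, unsigned char type)
{
	unsigned long temp;

	temp = (len | type) & 0xf;
	temp <<= (DR_CONTROL_SHIFT + drnum * DR_CONTROL_SIZE);
	temp |= ((unsigned long) DR_GLOBAL_ENABLE << (drnum * DR_ENABLE_SIZE)) |
			DR_GLOBAL_SLOWDOWN;
	return temp;
}

/* Mirrors decode_dr7(): returns the two enable bits for bpnum */
int decode_dr7(unsigned long dr7, int bpnum,
		unsigned char *len, unsigned char *type)
{
	unsigned long temp = dr7 >> (DR_CONTROL_SHIFT + bpnum * DR_CONTROL_SIZE);

	*len = (temp & 0xc) | 0x40;
	*type = (temp & 0x3) | 0x80;
	return (dr7 >> (bpnum * DR_ENABLE_SIZE)) & 0x3;
}

/* Hypothetical helper combining arch_new_kbpdata() and arch_update_thbi():
 * the kernel owns DR0..DR(num_kbps-1), the thread gets the rest. */
unsigned long merge_dr7(unsigned long kdr7, unsigned long tdr7, int num_kbps)
{
	unsigned long mkdr7 = kdr7 & (kdr7_masks[num_kbps] | DR_GLOBAL_SLOWDOWN);

	return mkdr7 | (tdr7 & ~kdr7_masks[num_kbps]);
}
```

With one kernel breakpoint in DR0 and one user breakpoint in DR3, the merged value keeps the kernel's bits for register 0 and the thread's bits for register 3; any thread bits that collide with a kernel-owned register are masked out.
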
Index: usb-2.6/arch/i386/kernel/ptrace.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/ptrace.c
+++ usb-2.6/arch/i386/kernel/ptrace.c
@@ -382,11 +382,11 @@ long arch_ptrace(struct task_struct *chi
tmp = 0; /* Default return condition */
if(addr < FRAME_SIZE*sizeof(long))
tmp = getreg(child, addr);
- if(addr >= (long) &dummy->u_debugreg[0] &&
- addr <= (long) &dummy->u_debugreg[7]){
+ else if (addr >= (long) &dummy->u_debugreg[0] &&
+ addr <= (long) &dummy->u_debugreg[7]) {
addr -= (long) &dummy->u_debugreg[0];
addr = addr >> 2;
- tmp = child->thread.debugreg[addr];
+ tmp = thread_get_debugreg(child, addr);
}
ret = put_user(tmp, datap);
break;
@@ -416,59 +416,11 @@ long arch_ptrace(struct task_struct *chi
have to be selective about what portions we allow someone
to modify. */

- ret = -EIO;
- if(addr >= (long) &dummy->u_debugreg[0] &&
- addr <= (long) &dummy->u_debugreg[7]){
-
- if(addr == (long) &dummy->u_debugreg[4]) break;
- if(addr == (long) &dummy->u_debugreg[5]) break;
- if(addr < (long) &dummy->u_debugreg[4] &&
- ((unsigned long) data) >= TASK_SIZE-3) break;
-
- /* Sanity-check data. Take one half-byte at once with
- * check = (val >> (16 + 4*i)) & 0xf. It contains the
- * R/Wi and LENi bits; bits 0 and 1 are R/Wi, and bits
- * 2 and 3 are LENi. Given a list of invalid values,
- * we do mask |= 1 << invalid_value, so that
- * (mask >> check) & 1 is a correct test for invalid
- * values.
- *
- * R/Wi contains the type of the breakpoint /
- * watchpoint, LENi contains the length of the watched
- * data in the watchpoint case.
- *
- * The invalid values are:
- * - LENi == 0x10 (undefined), so mask |= 0x0f00.
- * - R/Wi == 0x10 (break on I/O reads or writes), so
- * mask |= 0x4444.
- * - R/Wi == 0x00 && LENi != 0x00, so we have mask |=
- * 0x1110.
- *
- * Finally, mask = 0x0f00 | 0x4444 | 0x1110 == 0x5f54.
- *
- * See the Intel Manual "System Programming Guide",
- * 15.2.4
- *
- * Note that LENi == 0x10 is defined on x86_64 in long
- * mode (i.e. even for 32-bit userspace software, but
- * 64-bit kernel), so the x86_64 mask value is 0x5454.
- * See the AMD manual no. 24593 (AMD64 System
- * Programming)*/
-
- if(addr == (long) &dummy->u_debugreg[7]) {
- data &= ~DR_CONTROL_RESERVED;
- for(i=0; i<4; i++)
- if ((0x5f54 >> ((data >> (16 + 4*i)) & 0xf)) & 1)
- goto out_tsk;
- if (data)
- set_tsk_thread_flag(child, TIF_DEBUG);
- else
- clear_tsk_thread_flag(child, TIF_DEBUG);
- }
- addr -= (long) &dummy->u_debugreg;
- addr = addr >> 2;
- child->thread.debugreg[addr] = data;
- ret = 0;
+ if (addr >= (long) &dummy->u_debugreg[0] &&
+ addr <= (long) &dummy->u_debugreg[7]) {
+ addr -= (long) &dummy->u_debugreg;
+ addr = addr >> 2;
+ ret = thread_set_debugreg(child, addr, data);
}
break;

@@ -624,7 +576,6 @@ long arch_ptrace(struct task_struct *chi
ret = ptrace_request(child, request, addr, data);
break;
}
- out_tsk:
return ret;
}

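
As a cross-check on the comment removed from arch_ptrace() above, the 0x5f54 constant can be re-derived mechanically from the three rules it lists. Note that the "0x10" values in that comment are the binary pattern 10, i.e. the value 2. The sketch below is a user-space illustration only; invalid_nibble_mask() and dr7_is_invalid() are hypothetical names, and the new code in this patch performs its validation elsewhere.

```c
/* Build the mask of invalid (LENi << 2 | R/Wi) nibbles: bit v of the
 * result is set when the 4-bit control value v must be rejected. */
unsigned int invalid_nibble_mask(void)
{
	unsigned int mask = 0;
	unsigned int v;

	for (v = 0; v < 16; ++v) {
		unsigned int rw = v & 0x3;		/* bits 0-1: R/Wi */
		unsigned int len = (v >> 2) & 0x3;	/* bits 2-3: LENi */

		if (len == 0x2 ||			/* undefined length */
		    rw == 0x2 ||			/* break on I/O */
		    (rw == 0x0 && len != 0x0))		/* execute with a length */
			mask |= 1u << v;
	}
	return mask;
}

/* The test the old ptrace code applied to a DR7 value: nonzero if any
 * of the four control nibbles is invalid. */
int dr7_is_invalid(unsigned long data)
{
	unsigned int mask = invalid_nibble_mask();
	int i;

	for (i = 0; i < 4; i++)
		if ((mask >> ((data >> (16 + 4 * i)) & 0xf)) & 1)
			return 1;
	return 0;
}
```

The three rules contribute 0x0f00, 0x4444, and 0x1110 respectively, matching the arithmetic in the removed comment.
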
Index: usb-2.6/arch/i386/kernel/Makefile
===================================================================
--- usb-2.6.orig/arch/i386/kernel/Makefile
+++ usb-2.6/arch/i386/kernel/Makefile
@@ -7,7 +7,8 @@ extra-y := head.o init_task.o vmlinux.ld
obj-y := process.o signal.o entry.o traps.o irq.o \
ptrace.o time.o ioport.o ldt.o setup.o i8259.o sys_i386.o \
pci-dma.o i386_ksyms.o i387.o bootflag.o e820.o\
- quirks.o i8237.o topology.o alternative.o i8253.o tsc.o
+ quirks.o i8237.o topology.o alternative.o i8253.o tsc.o \
+ hw_breakpoint.o

obj-$(CONFIG_STACKTRACE) += stacktrace.o
obj-y += cpu/
Index: usb-2.6/arch/i386/power/cpu.c
===================================================================
--- usb-2.6.orig/arch/i386/power/cpu.c
+++ usb-2.6/arch/i386/power/cpu.c
@@ -11,6 +11,7 @@
#include <linux/suspend.h>
#include <asm/mtrr.h>
#include <asm/mce.h>
+#include <asm/debugreg.h>

static struct saved_context saved_context;

@@ -46,6 +47,8 @@ void __save_processor_state(struct saved
ctxt->cr2 = read_cr2();
ctxt->cr3 = read_cr3();
ctxt->cr4 = read_cr4();
+
+ disable_debug_registers();
}

void save_processor_state(void)
@@ -70,20 +73,7 @@ static void fix_processor_context(void)

load_TR_desc(); /* This does ltr */
load_LDT(&current->active_mm->context); /* This does lldt */
-
- /*
- * Now maybe reload the debug registers
- */
- if (current->thread.debugreg[7]){
- set_debugreg(current->thread.debugreg[0], 0);
- set_debugreg(current->thread.debugreg[1], 1);
- set_debugreg(current->thread.debugreg[2], 2);
- set_debugreg(current->thread.debugreg[3], 3);
- /* no 4 and 5 */
- set_debugreg(current->thread.debugreg[6], 6);
- set_debugreg(current->thread.debugreg[7], 7);
- }
-
+ load_debug_registers();
}

void __restore_processor_state(struct saved_context *ctxt)
Index: usb-2.6/arch/i386/kernel/kprobes.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/kprobes.c
+++ usb-2.6/arch/i386/kernel/kprobes.c
@@ -660,9 +660,17 @@ int __kprobes kprobe_exceptions_notify(s
ret = NOTIFY_STOP;
break;
case DIE_DEBUG:
- if (post_kprobe_handler(args->regs))
- ret = NOTIFY_STOP;
+
+ /* The DR6 value is stored in args->err */
+#define DR6 (args->err)
+
+ if ((DR6 & DR_STEP) && post_kprobe_handler(args->regs)) {
+ if ((DR6 & ~DR_STEP) == 0)
+ ret = NOTIFY_STOP;
+ }
break;
+#undef DR6
+
case DIE_GPF:
case DIE_PAGE_FAULT:
/* kprobe_running() needs smp_processor_id() */
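
The kprobes change above and hw_breakpoint_handler() follow the same rule: each notifier consumes only the DR6 bits it owns and returns NOTIFY_STOP only when no other bits remain for later handlers. A simplified user-space model of that decision follows. It is a sketch, not kernel code: the function names are hypothetical, and NOTIFY_DONE/NOTIFY_STOP are stand-in values rather than the kernel's definitions.

```c
#define DR_TRAP_BITS	0xF	/* DR_TRAP0 | DR_TRAP1 | DR_TRAP2 | DR_TRAP3 */
#define DR_STEP		0x4000	/* single-step */

/* Simplified stand-ins for the kernel's notifier return codes */
enum { NOTIFY_DONE = 0, NOTIFY_STOP = 1 };

/* Models the DIE_DEBUG case in kprobe_exceptions_notify(); kprobe_handled
 * stands for the return value of post_kprobe_handler(). */
int kprobes_verdict(unsigned long dr6, int kprobe_handled)
{
	if ((dr6 & DR_STEP) && kprobe_handled) {
		if ((dr6 & ~(unsigned long) DR_STEP) == 0)
			return NOTIFY_STOP;
	}
	return NOTIFY_DONE;
}

/* Models the return path of hw_breakpoint_handler() */
int hwbkpt_verdict(unsigned long dr6)
{
	if (!(dr6 & DR_TRAP_BITS))
		return NOTIFY_DONE;	/* not a breakpoint hit */
	if (!(dr6 & ~(unsigned long) DR_TRAP_BITS))
		return NOTIFY_STOP;	/* nothing left for anyone else */
	return NOTIFY_DONE;		/* e.g. DR_STEP is also pending */
}
```

For example, a DR6 value with both DR_TRAP0 and DR_STEP set yields NOTIFY_DONE from both handlers, so do_debug() still gets to process the single-step event.
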
Index: usb-2.6/include/asm-generic/hw_breakpoint.h
===================================================================
--- /dev/null
+++ usb-2.6/include/asm-generic/hw_breakpoint.h
@@ -0,0 +1,264 @@
+#ifndef _ASM_GENERIC_HW_BREAKPOINT_H
+#define _ASM_GENERIC_HW_BREAKPOINT_H
+
+#ifndef __ARCH_HW_BREAKPOINT_H
+#error "Please don't include this file directly"
+#endif
+
+#ifdef __KERNEL__
+#include <linux/list.h>
+#include <linux/types.h>
+
+/**
+ * struct hw_breakpoint - unified kernel/user-space hardware breakpoint
+ * @node: internal linked-list management
+ * @triggered: callback invoked when the breakpoint is hit
+ * @installed: callback invoked when the breakpoint is installed
+ * @uninstalled: callback invoked when the breakpoint is uninstalled
+ * @address: location (virtual address) of the breakpoint
+ * @len: encoded extent of the breakpoint address (1, 2, 4, or 8 bytes)
+ * @type: breakpoint type (read-only, write-only, read/write, or execute)
+ * @priority: requested priority level
+ * @status: current registration/installation status
+ *
+ * %hw_breakpoint structures are the kernel's way of representing
+ * hardware breakpoints. These can be either execute breakpoints
+ * (triggered on instruction execution) or data breakpoints (also known
+ * as "watchpoints", triggered on data access), and the breakpoint's
+ * target address can be located in either kernel space or user space.
+ *
+ * The @address, @len, and @type fields are highly architecture-specific.
+ * Portable drivers should not use them directly but should employ the
+ * following accessor inlines and macros instead:
+ *
+ * To set @address, @len, and @type before registering a
+ * breakpoint, use hw_breakpoint_kinit() or hw_breakpoint_uinit()
+ * for kernel- and user-space breakpoints respectively.
+ *
+ * To retrieve the values use
+ * hw_breakpoint_get_{kaddr,uaddr,len,type}().
+ *
+ * To initialize these fields in a static breakpoint structure,
+ * use HW_BREAKPOINT_KINIT() or HW_BREAKPOINT_UINIT() as part
+ * of the initializer.
+ *
+ * The general descriptions below are accurate for x86; on other
+ * architectures some of the fields might be unused or might have bits
+ * altered while a breakpoint is registered.
+ *
+ * The @address field contains the breakpoint's address, as either a
+ * regular kernel pointer or an %__user pointer.
+ *
+ * @len encodes the breakpoint's extent in bytes, which is subject to
+ * certain limitations. include/asm/hw_breakpoint.h contains macros
+ * defining the available lengths for a specific architecture. Note that
+ * @address must have the alignment specified by @len. The breakpoint
+ * will catch accesses to any byte in the range from @address to @address
+ * + (N-1), where N is the value encoded by @len.
+ *
+ * @type indicates the type of access that will trigger the breakpoint.
+ * Possible values may include:
+ *
+ * %HW_BREAKPOINT_EXECUTE (triggered on instruction execution),
+ * %HW_BREAKPOINT_RW (triggered on read or write access),
+ * %HW_BREAKPOINT_WRITE (triggered on write access), and
+ * %HW_BREAKPOINT_READ (triggered on read access).
+ *
+ * Appropriate macros are defined in include/asm/hw_breakpoint.h; not all
+ * possibilities are available on all architectures. Execute breakpoints
+ * must have @len equal to the special value %HW_BREAKPOINT_LEN_EXECUTE.
+ *
+ * With register_user_hw_breakpoint(), @address must refer to a location
+ * in user space (@address.user). The breakpoint will be active only
+ * while the requested task is running. Conversely with
+ * register_kernel_hw_breakpoint(), @address must refer to a location in
+ * kernel space (@address.kernel), and the breakpoint will be active on
+ * all CPUs regardless of the current task.
+ *
+ * When a breakpoint gets hit, the @triggered callback is invoked
+ * in_interrupt with a pointer to the %hw_breakpoint structure and the
+ * processor registers. Execute-breakpoint traps occur before the
+ * breakpointed instruction runs; when the callback returns the
+ * instruction is restarted (this time without a debug exception). All
+ * other types of trap occur after the memory access has taken place.
+ * Breakpoints are disabled while @triggered runs, to avoid recursive
+ * traps and allow unhindered access to breakpointed memory.
+ *
+ * Hardware breakpoints are implemented using the CPU's debug registers,
+ * which are a limited hardware resource. Requests to register a
+ * breakpoint will always succeed provided the parameters are valid,
+ * but the breakpoint may not be installed in a debug register right
+ * away. Physical debug registers are allocated based on the priority
+ * level stored in @priority (higher values indicate higher priority).
+ * User-space breakpoints within a single thread compete with one
+ * another, and all user-space breakpoints compete with all kernel-space
+ * breakpoints; however user-space breakpoints in different threads do
+ * not compete. %HW_BREAKPOINT_PRIO_PTRACE is the level used for ptrace
+ * requests; an unobtrusive kernel-space breakpoint will use
+ * %HW_BREAKPOINT_PRIO_NORMAL to avoid disturbing user programs. A
+ * kernel-space breakpoint that always wants to be installed and doesn't
+ * care about disrupting user debugging sessions can specify
+ * %HW_BREAKPOINT_PRIO_HIGH.
+ *
+ * A particular breakpoint may be allocated (installed in) a debug
+ * register or deallocated (uninstalled) from its debug register at any
+ * time, as other breakpoints are registered and unregistered. The
+ * @installed and @uninstalled callbacks are invoked in_atomic when these
+ * events occur. It is legal for @installed or @uninstalled to be %NULL,
+ * however @triggered must not be. Note that it is not possible to
+ * register or unregister a breakpoint from within a callback routine,
+ * since doing so requires a process context. Note also that for user
+ * breakpoints, @installed and @uninstalled may be called during the
+ * middle of a context switch, at a time when it is not safe to call
+ * printk().
+ *
+ * For kernel-space breakpoints, @installed is invoked after the
+ * breakpoint is actually installed and @uninstalled is invoked before
+ * the breakpoint is actually uninstalled. As a result @triggered can
+ * be called when you may not expect it, but this way you will know that
+ * during the time interval from @installed to @uninstalled, all events
+ * are faithfully reported. (It is not possible to do any better than
+ * this in general, because on SMP systems there is no way to set a debug
+ * register simultaneously on all CPUs.) The same isn't always true with
+ * user-space breakpoints, but the differences should not be visible to a
+ * user process.
+ *
+ * If you need to know whether your kernel-space breakpoint was installed
+ * immediately upon registration, you can check the return value from
+ * register_kernel_hw_breakpoint(). If the value is not > 0, you can
+ * give up and unregister the breakpoint right away.
+ *
+ * @node and @status are intended for internal use. However @status
+ * may be read to determine whether or not the breakpoint is currently
+ * installed. (The value is not reliable unless local interrupts are
+ * disabled.)
+ *
+ * This sample code sets a breakpoint on pid_max and registers a callback
+ * function for writes to that variable. Note that it is not portable
+ * as written, because not all architectures support HW_BREAKPOINT_LEN_4.
+ *
+ * ----------------------------------------------------------------------
+ *
+ * #include <asm/hw_breakpoint.h>
+ *
+ * static void triggered(struct hw_breakpoint *bp, struct pt_regs *regs)
+ * {
+ * printk(KERN_DEBUG "Breakpoint triggered\n");
+ * dump_stack();
+ * .......<more debugging output>........
+ * }
+ *
+ * static struct hw_breakpoint my_bp;
+ *
+ * static int init_module(void)
+ * {
+ * ..........<do anything>............
+ * hw_breakpoint_kinit(&my_bp, &pid_max, HW_BREAKPOINT_LEN_4,
+ * HW_BREAKPOINT_WRITE);
+ * my_bp.triggered = triggered;
+ * my_bp.priority = HW_BREAKPOINT_PRIO_NORMAL;
+ * rc = register_kernel_hw_breakpoint(&my_bp);
+ * ..........<do anything>............
+ * }
+ *
+ * static void cleanup_module(void)
+ * {
+ * ..........<do anything>............
+ * unregister_kernel_hw_breakpoint(&my_bp);
+ * ..........<do anything>............
+ * }
+ *
+ * ----------------------------------------------------------------------
+ *
+ */
+struct hw_breakpoint {
+ struct list_head node;
+ void (*triggered)(struct hw_breakpoint *, struct pt_regs *);
+ void (*installed)(struct hw_breakpoint *);
+ void (*uninstalled)(struct hw_breakpoint *);
+ union {
+ const void *kernel;
+ const void __user *user;
+ unsigned long va;
+ } address;
+ u8 len;
+ u8 type;
+ u8 priority;
+ u8 status;
+};
+
+/*
+ * Macros to initialize the arch-specific parts of a static breakpoint
+ * structure (mnemonic: the address, len, and type arguments occur in
+ * alphabetical order):
+ *
+ * HW_BREAKPOINT_KINIT(addr, len, type)
+ * HW_BREAKPOINT_UINIT(addr, len, type)
+ */
+
+/*
+ * Inline setter routines to initialize the arch-specific parts of
+ * a breakpoint structure:
+ */
+static void hw_breakpoint_kinit(struct hw_breakpoint *bp,
+ const void *addr, unsigned len, unsigned type);
+static void hw_breakpoint_uinit(struct hw_breakpoint *bp,
+ const void __user *addr, unsigned len, unsigned type);
+
+/*
+ * Inline accessor routines to retrieve the arch-specific parts of
+ * a breakpoint structure:
+ */
+static const void *hw_breakpoint_get_kaddr(struct hw_breakpoint *bp);
+static const void __user *hw_breakpoint_get_uaddr(struct hw_breakpoint *bp);
+static unsigned hw_breakpoint_get_len(struct hw_breakpoint *bp);
+static unsigned hw_breakpoint_get_type(struct hw_breakpoint *bp);
+
+/*
+ * len and type values are defined in include/asm/hw_breakpoint.h.
+ * Available values vary according to the architecture. On i386 the
+ * possibilities are:
+ *
+ * HW_BREAKPOINT_LEN_1
+ * HW_BREAKPOINT_LEN_2
+ * HW_BREAKPOINT_LEN_4
+ * HW_BREAKPOINT_LEN_EXECUTE
+ * HW_BREAKPOINT_RW
+ * HW_BREAKPOINT_WRITE
+ * HW_BREAKPOINT_EXECUTE
+ *
+ * On other architectures HW_BREAKPOINT_LEN_8 may be available, and the
+ * 1-, 2-, and 4-byte lengths may be unavailable. There also may be
+ * HW_BREAKPOINT_READ. You can use #ifdef to check at compile time.
+ */
+
+/* Standard HW breakpoint priority levels (higher value = higher priority) */
+#define HW_BREAKPOINT_PRIO_NORMAL 25
+#define HW_BREAKPOINT_PRIO_PTRACE 50
+#define HW_BREAKPOINT_PRIO_HIGH 75
+
+/* HW breakpoint status values (0 = not registered) */
+#define HW_BREAKPOINT_REGISTERED 1
+#define HW_BREAKPOINT_INSTALLED 2
+
+/*
+ * The following two routines are meant to be called only from within
+ * the ptrace or utrace subsystems. The tsk argument will usually be a
+ * process being debugged by the current task, although it is also legal
+ * for tsk to be the current task. In any case it must be guaranteed
+ * that tsk will not start running in user mode while its breakpoints are
+ * being modified.
+ */
+int register_user_hw_breakpoint(struct task_struct *tsk,
+ struct hw_breakpoint *bp);
+void unregister_user_hw_breakpoint(struct task_struct *tsk,
+ struct hw_breakpoint *bp);
+
+/*
+ * Kernel breakpoints are not associated with any particular thread.
+ */
+int register_kernel_hw_breakpoint(struct hw_breakpoint *bp);
+void unregister_kernel_hw_breakpoint(struct hw_breakpoint *bp);
+
+#endif /* __KERNEL__ */
+#endif /* _ASM_GENERIC_HW_BREAKPOINT_H */
Index: usb-2.6/arch/i386/kernel/machine_kexec.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/machine_kexec.c
+++ usb-2.6/arch/i386/kernel/machine_kexec.c
@@ -19,6 +19,7 @@
#include <asm/cpufeature.h>
#include <asm/desc.h>
#include <asm/system.h>
+#include <asm/debugreg.h>

#define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE)))
static u32 kexec_pgd[1024] PAGE_ALIGNED;
@@ -108,6 +109,7 @@ NORET_TYPE void machine_kexec(struct kim

/* Interrupts aren't acceptable while we reboot */
local_irq_disable();
+ disable_debug_registers();

control_page = page_address(image->control_code_page);
memcpy(control_page, relocate_kernel, PAGE_SIZE);
Index: usb-2.6/arch/i386/kernel/smpboot.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/smpboot.c
+++ usb-2.6/arch/i386/kernel/smpboot.c
@@ -58,6 +58,7 @@
#include <smpboot_hooks.h>
#include <asm/vmi.h>
#include <asm/mtrr.h>
+#include <asm/debugreg.h>

/* Set if we find a B stepping CPU */
static int __devinitdata smp_b_stepping;
@@ -427,6 +428,7 @@ static void __cpuinit start_secondary(vo
local_irq_enable();

wmb();
+ load_debug_registers();
cpu_idle();
}

@@ -1209,6 +1211,7 @@ int __cpu_disable(void)
fixup_irqs(map);
/* It's now safe to remove this processor from the online map */
cpu_clear(cpu, cpu_online_map);
+ disable_debug_registers();
return 0;
}

Index: usb-2.6/kernel/hw_breakpoint.c
===================================================================
--- /dev/null
+++ usb-2.6/kernel/hw_breakpoint.c
@@ -0,0 +1,762 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) 2007 Alan Stern
+ */
+
+/*
+ * HW_breakpoint: a unified kernel/user-space hardware breakpoint facility,
+ * using the CPU's debug registers.
+ *
+ * This file contains the arch-independent routines. It is not meant
+ * to be compiled as a standalone source file; rather it should be
+ * #include'd by the arch-specific implementation.
+ */
+
+
+/*
+ * Install the debug register values for a new thread.
+ */
+void switch_to_thread_hw_breakpoint(struct task_struct *tsk)
+{
+ struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+ struct cpu_hw_breakpoint *chbi;
+ struct kernel_bp_data *thr_kbpdata;
+
+ /* This routine is on the hot path; it gets called for every
+ * context switch into a task with active breakpoints. We
+ * must make sure that the common case executes as quickly as
+ * possible.
+ */
+ chbi = &per_cpu(cpu_info, get_cpu());
+ chbi->bp_task = tsk;
+
+ /* Use RCU to synchronize with external updates */
+ rcu_read_lock();
+
+ /* Other CPUs might be making updates to the list of kernel
+ * breakpoints at this time. If they are, they will modify
+ * the other entry in kbpdata[] -- the one not pointed to
+ * by chbi->cur_kbpdata. So the update itself won't affect
+ * us directly.
+ *
+ * However when the update is finished, an IPI will arrive
+ * telling this CPU to change chbi->cur_kbpdata. We need
+ * to use a single consistent kbpdata[] entry, the present one.
+ * So we'll copy the pointer to a local variable, thr_kbpdata,
+ * and we must prevent the compiler from aliasing the two
+ * pointers. Only a compiler barrier is required, not a full
+ * memory barrier, because everything takes place on a single CPU.
+ */
+ restart:
+ thr_kbpdata = chbi->cur_kbpdata;
+ barrier();
+
+ /* Normally we can keep the same debug register settings as the
+ * last time this task ran. But if the kernel breakpoints have
+ * changed or any user breakpoints have been registered or
+ * unregistered, we need to handle the updates and possibly
+ * send out some notifications.
+ */
+ if (unlikely(thbi->gennum != thr_kbpdata->gennum)) {
+ struct hw_breakpoint *bp;
+ int i;
+ int num;
+
+ thbi->gennum = thr_kbpdata->gennum;
+ arch_update_thbi(thbi, thr_kbpdata);
+ num = thr_kbpdata->num_kbps;
+
+ /* This code can be invoked while a debugger is actively
+ * updating the thread's breakpoint list (for example, if
+ * someone sends SIGKILL to the task). We use RCU to
+ * protect our access to the list pointers. */
+ thbi->num_installed = 0;
+ i = HB_NUM;
+ list_for_each_entry_rcu(bp, &thbi->thread_bps, node) {
+
+ /* If this register is allocated for kernel bps,
+ * don't install. Otherwise do. */
+ if (--i < num) {
+ if (bp->status == HW_BREAKPOINT_INSTALLED) {
+ if (bp->uninstalled)
+ (bp->uninstalled)(bp);
+ bp->status = HW_BREAKPOINT_REGISTERED;
+ }
+ } else {
+ ++thbi->num_installed;
+ if (bp->status != HW_BREAKPOINT_INSTALLED) {
+ bp->status = HW_BREAKPOINT_INSTALLED;
+ if (bp->installed)
+ (bp->installed)(bp);
+ }
+ }
+ }
+ }
+
+ /* Set the debug register */
+ arch_install_thbi(thbi);
+
+ /* Were there any kernel breakpoint changes while we were running? */
+ if (unlikely(chbi->cur_kbpdata != thr_kbpdata)) {
+
+ /* DR0-3 might now be assigned to kernel bps and we might
+ * have messed them up. Reload all the kernel bps and
+ * then reload the thread bps.
+ */
+ arch_install_chbi(chbi);
+ goto restart;
+ }
+
+ rcu_read_unlock();
+ put_cpu_no_resched();
+}
+
+/*
+ * Install the debug register values for just the kernel, no thread.
+ */
+static void switch_to_none_hw_breakpoint(void)
+{
+ struct cpu_hw_breakpoint *chbi;
+
+ chbi = &per_cpu(cpu_info, get_cpu());
+ chbi->bp_task = NULL;
+
+ /* This routine gets called from only two places. In one
+ * the caller holds the hw_breakpoint_mutex; in the other
+ * interrupts are disabled. In either case, no kernel
+ * breakpoint updates can arrive while the routine runs.
+ * So we don't need to use RCU.
+ */
+ arch_install_none(chbi);
+ put_cpu_no_resched();
+}
+
+/*
+ * Update the debug registers on this CPU.
+ */
+static void update_this_cpu(void *unused)
+{
+ struct cpu_hw_breakpoint *chbi;
+ struct task_struct *tsk = current;
+
+ chbi = &per_cpu(cpu_info, get_cpu());
+
+ /* Install both the kernel and the user breakpoints */
+ arch_install_chbi(chbi);
+ if (test_tsk_thread_flag(tsk, TIF_DEBUG))
+ switch_to_thread_hw_breakpoint(tsk);
+
+ put_cpu_no_resched();
+}
+
+/*
+ * Tell all CPUs to update their debug registers.
+ *
+ * The caller must hold hw_breakpoint_mutex.
+ */
+static void update_all_cpus(void)
+{
+ /* We don't need to use any sort of memory barrier. The IPI
+ * carried out by on_each_cpu() includes its own barriers.
+ */
+ on_each_cpu(update_this_cpu, NULL, 0, 0);
+ synchronize_rcu();
+}
+
+/*
+ * Load the debug registers during startup of a CPU.
+ */
+void load_debug_registers(void)
+{
+ unsigned long flags;
+
+ /* Prevent IPIs for new kernel breakpoint updates */
+ local_irq_save(flags);
+
+ rcu_read_lock();
+ update_this_cpu(NULL);
+ rcu_read_unlock();
+
+ local_irq_restore(flags);
+}
+
+/*
+ * Take the 4 highest-priority breakpoints in a thread and accumulate
+ * their priorities in tprio. Highest-priority entry is in tprio[3].
+ */
+static void accum_thread_tprio(struct thread_hw_breakpoint *thbi)
+{
+ int i;
+
+ for (i = HB_NUM - 1; i >= 0 && thbi->bps[i]; --i)
+ tprio[i] = max(tprio[i], thbi->bps[i]->priority);
+}
+
+/*
+ * Recalculate the value of the tprio array, the maximum priority levels
+ * requested by user breakpoints in all threads.
+ *
+ * Each thread has a list of registered breakpoints, kept in order of
+ * decreasing priority. We'll set tprio[HB_NUM - 1] to the maximum
+ * priority of the first entries in all the lists, tprio[HB_NUM - 2] to
+ * the maximum priority of the second entries, etc. In the end, we'll
+ * know that no thread requires breakpoints with priorities higher than
+ * the values in tprio.
+ *
+ * The caller must hold hw_breakpoint_mutex.
+ */
+static void recalc_tprio(void)
+{
+ struct thread_hw_breakpoint *thbi;
+
+ memset(tprio, 0, sizeof tprio);
+
+ /* Loop through all threads having registered breakpoints
+ * and accumulate the maximum priority levels in tprio.
+ */
+ list_for_each_entry(thbi, &thread_list, node)
+ accum_thread_tprio(thbi);
+}
+
+/*
+ * Decide how many debug registers will be allocated to kernel breakpoints
+ * and consequently, how many remain available for user breakpoints.
+ *
+ * The priorities of the entries in the list of registered kernel bps
+ * are compared against the priorities stored in tprio[]. The 4 highest
+ * winners overall get to be installed in a debug register; num_kbps
+ * keeps track of how many of those winners come from the kernel list.
+ *
+ * If num_kbps changes, or if a kernel bp changes its installation status,
+ * then call update_all_cpus() so that the debug registers will be set
+ * correctly on every CPU. If neither condition holds then the set of
+ * kernel bps hasn't changed, and nothing more needs to be done.
+ *
+ * The caller must hold hw_breakpoint_mutex.
+ */
+static void balance_kernel_vs_user(void)
+{
+ int k, u;
+ int changed = 0;
+ struct hw_breakpoint *bp;
+ struct kernel_bp_data *new_kbpdata;
+
+ /* Determine how many debug registers are available for kernel
+ * breakpoints as opposed to user breakpoints, based on the
+ * priorities. Ties are resolved in favor of user bps.
+ */
+ k = 0; /* Next kernel bp to allocate */
+ u = HB_NUM - 1; /* Next user bp to allocate */
+ bp = list_entry(kernel_bps.next, struct hw_breakpoint, node);
+ while (k <= u) {
+ if (&bp->node == &kernel_bps || tprio[u] >= bp->priority)
+ --u; /* User bps win a slot */
+ else {
+ ++k; /* Kernel bp wins a slot */
+ if (bp->status != HW_BREAKPOINT_INSTALLED)
+ changed = 1;
+ bp = list_entry(bp->node.next, struct hw_breakpoint,
+ node);
+ }
+ }
+ if (k != cur_kbpdata->num_kbps)
+ changed = 1;
+
+ /* Notify the remaining kernel breakpoints that they are about
+ * to be uninstalled.
+ */
+ list_for_each_entry_from(bp, &kernel_bps, node) {
+ if (bp->status == HW_BREAKPOINT_INSTALLED) {
+ if (bp->uninstalled)
+ (bp->uninstalled)(bp);
+ bp->status = HW_BREAKPOINT_REGISTERED;
+ changed = 1;
+ }
+ }
+
+ if (changed) {
+ cur_kbpindex ^= 1;
+ new_kbpdata = &kbpdata[cur_kbpindex];
+ new_kbpdata->gennum = cur_kbpdata->gennum + 1;
+ new_kbpdata->num_kbps = k;
+ arch_new_kbpdata(new_kbpdata);
+ u = 0;
+ list_for_each_entry(bp, &kernel_bps, node) {
+ if (u >= k)
+ break;
+ new_kbpdata->bps[u] = bp;
+ ++u;
+ }
+ rcu_assign_pointer(cur_kbpdata, new_kbpdata);
+
+ /* Tell all the CPUs to update their debug registers */
+ update_all_cpus();
+
+ /* Notify the breakpoints that just got installed */
+ for (u = 0; u < k; ++u) {
+ bp = new_kbpdata->bps[u];
+ if (bp->status != HW_BREAKPOINT_INSTALLED) {
+ bp->status = HW_BREAKPOINT_INSTALLED;
+ if (bp->installed)
+ (bp->installed)(bp);
+ }
+ }
+ }
+}
+
+/*
+ * Return the pointer to a thread's hw_breakpoint info area,
+ * and try to allocate one if it doesn't exist.
+ *
+ * The caller must hold hw_breakpoint_mutex.
+ */
+static struct thread_hw_breakpoint *alloc_thread_hw_breakpoint(
+ struct task_struct *tsk)
+{
+ if (!tsk->thread.hw_breakpoint_info && !(tsk->flags & PF_EXITING)) {
+ struct thread_hw_breakpoint *thbi;
+
+ thbi = kzalloc(sizeof(struct thread_hw_breakpoint),
+ GFP_KERNEL);
+ if (thbi) {
+ INIT_LIST_HEAD(&thbi->node);
+ INIT_LIST_HEAD(&thbi->thread_bps);
+
+ /* Force an update the next time tsk runs */
+ thbi->gennum = cur_kbpdata->gennum - 2;
+ tsk->thread.hw_breakpoint_info = thbi;
+ }
+ }
+ return tsk->thread.hw_breakpoint_info;
+}
+
+/*
+ * Erase all the hardware breakpoint info associated with a thread.
+ *
+ * If tsk != current then tsk must not be usable (for example, a
+ * child being cleaned up from a failed fork).
+ */
+void flush_thread_hw_breakpoint(struct task_struct *tsk)
+{
+ struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+ struct hw_breakpoint *bp;
+
+ if (!thbi)
+ return;
+ mutex_lock(&hw_breakpoint_mutex);
+
+ /* Let the breakpoints know they are being uninstalled */
+ list_for_each_entry(bp, &thbi->thread_bps, node) {
+ if (bp->status == HW_BREAKPOINT_INSTALLED && bp->uninstalled)
+ (bp->uninstalled)(bp);
+ bp->status = 0;
+ }
+
+ /* Remove tsk from the list of all threads with registered bps */
+ list_del(&thbi->node);
+
+ /* The thread no longer has any breakpoints associated with it */
+ clear_tsk_thread_flag(tsk, TIF_DEBUG);
+ tsk->thread.hw_breakpoint_info = NULL;
+ kfree(thbi);
+
+ /* Recalculate and rebalance the kernel-vs-user priorities */
+ recalc_tprio();
+ balance_kernel_vs_user();
+
+ /* Actually uninstall the breakpoints if necessary */
+ if (tsk == current)
+ switch_to_none_hw_breakpoint();
+ mutex_unlock(&hw_breakpoint_mutex);
+}
+
+/*
+ * Copy the hardware breakpoint info from a thread to its cloned child.
+ */
+int copy_thread_hw_breakpoint(struct task_struct *tsk,
+ struct task_struct *child, unsigned long clone_flags)
+{
+ /* We will assume that breakpoint settings are not inherited
+ * and the child starts out with no debug registers set.
+ * But what about CLONE_PTRACE?
+ */
+ clear_tsk_thread_flag(child, TIF_DEBUG);
+ return 0;
+}
+
+/*
+ * Store the highest-priority thread breakpoint entries in an array.
+ */
+static void store_thread_bp_array(struct thread_hw_breakpoint *thbi)
+{
+ struct hw_breakpoint *bp;
+ int i;
+
+ i = HB_NUM - 1;
+ list_for_each_entry(bp, &thbi->thread_bps, node) {
+ thbi->bps[i] = bp;
+ thbi->tdr[i] = bp->address.va;
+ if (--i < 0)
+ break;
+ }
+ while (i >= 0)
+ thbi->bps[i--] = NULL;
+
+ /* Force an update the next time this task runs */
+ thbi->gennum = cur_kbpdata->gennum - 2;
+}
+
+/*
+ * Insert a new breakpoint in a priority-sorted list.
+ * Return the bp's index in the list.
+ *
+ * Thread invariants:
+ * tsk_thread_flag(tsk, TIF_DEBUG) set implies
+ * tsk->thread.hw_breakpoint_info is not NULL.
+ * tsk_thread_flag(tsk, TIF_DEBUG) set iff thbi->thread_bps is non-empty
+ * iff thbi->node is on thread_list.
+ */
+static int insert_bp_in_list(struct hw_breakpoint *bp,
+ struct thread_hw_breakpoint *thbi, struct task_struct *tsk)
+{
+ struct list_head *head;
+ int pos;
+ struct hw_breakpoint *temp_bp;
+
+ /* tsk and thbi are NULL for kernel bps, non-NULL for user bps */
+ if (tsk)
+ head = &thbi->thread_bps;
+ else
+ head = &kernel_bps;
+
+ /* Equal-priority breakpoints get listed first-come-first-served */
+ pos = 0;
+ list_for_each_entry(temp_bp, head, node) {
+ if (bp->priority > temp_bp->priority)
+ break;
+ ++pos;
+ }
+ bp->status = HW_BREAKPOINT_REGISTERED;
+ list_add_tail(&bp->node, &temp_bp->node);
+
+ if (tsk) {
+ store_thread_bp_array(thbi);
+
+ /* Is this the thread's first registered breakpoint? */
+ if (list_empty(&thbi->node)) {
+ set_tsk_thread_flag(tsk, TIF_DEBUG);
+ list_add(&thbi->node, &thread_list);
+ }
+ }
+ return pos;
+}
+
+/*
+ * Remove a breakpoint from its priority-sorted list.
+ *
+ * See the invariants mentioned above.
+ */
+static void remove_bp_from_list(struct hw_breakpoint *bp,
+ struct thread_hw_breakpoint *thbi, struct task_struct *tsk)
+{
+ /* Remove bp from the thread's/kernel's list. If the list is now
+ * empty we must clear the TIF_DEBUG flag. But keep the
+ * thread_hw_breakpoint structure, so that the virtualized debug
+ * register values will remain valid.
+ */
+ list_del(&bp->node);
+ if (tsk) {
+ store_thread_bp_array(thbi);
+
+ if (list_empty(&thbi->thread_bps)) {
+ list_del_init(&thbi->node);
+ clear_tsk_thread_flag(tsk, TIF_DEBUG);
+ }
+ }
+
+ /* Tell the breakpoint it is being uninstalled */
+ if (bp->status == HW_BREAKPOINT_INSTALLED && bp->uninstalled)
+ (bp->uninstalled)(bp);
+ bp->status = 0;
+}
+
+/*
+ * Validate the settings in a hw_breakpoint structure.
+ */
+static int validate_settings(struct hw_breakpoint *bp, struct task_struct *tsk)
+{
+ int rc = -EINVAL;
+ unsigned long len;
+
+ switch (hw_breakpoint_get_type(bp)) {
+#ifdef HW_BREAKPOINT_EXECUTE
+ case HW_BREAKPOINT_EXECUTE:
+ if (hw_breakpoint_get_len(bp) != HW_BREAKPOINT_LEN_EXECUTE)
+ return rc;
+ break;
+#endif
+#ifdef HW_BREAKPOINT_READ
+ case HW_BREAKPOINT_READ: break;
+#endif
+#ifdef HW_BREAKPOINT_WRITE
+ case HW_BREAKPOINT_WRITE: break;
+#endif
+#ifdef HW_BREAKPOINT_RW
+ case HW_BREAKPOINT_RW: break;
+#endif
+ default:
+ return rc;
+ }
+
+ switch (hw_breakpoint_get_len(bp)) {
+#ifdef HW_BREAKPOINT_LEN_1
+ case HW_BREAKPOINT_LEN_1:
+ len = 1;
+ break;
+#endif
+#ifdef HW_BREAKPOINT_LEN_2
+ case HW_BREAKPOINT_LEN_2:
+ len = 2;
+ break;
+#endif
+#ifdef HW_BREAKPOINT_LEN_4
+ case HW_BREAKPOINT_LEN_4:
+ len = 4;
+ break;
+#endif
+#ifdef HW_BREAKPOINT_LEN_8
+ case HW_BREAKPOINT_LEN_8:
+ len = 8;
+ break;
+#endif
+ default:
+ return rc;
+ }
+
+ /* Check that the low-order bits of the address are appropriate
+ * for the alignment implied by len.
+ */
+ if (bp->address.va & (len - 1))
+ return rc;
+
+ /* Check that the virtual address is in the proper range */
+ if (tsk) {
+ if (!arch_check_va_in_userspace(bp->address.va, tsk))
+ return rc;
+ } else {
+ if (!arch_check_va_in_kernelspace(bp->address.va))
+ return rc;
+ }
+
+ if (bp->triggered)
+ rc = 0;
+ return rc;
+}
+
+/*
+ * Actual implementation of register_user_hw_breakpoint.
+ */
+static int __register_user_hw_breakpoint(struct task_struct *tsk,
+ struct hw_breakpoint *bp)
+{
+ int rc;
+ struct thread_hw_breakpoint *thbi;
+ int pos;
+
+ bp->status = 0;
+ rc = validate_settings(bp, tsk);
+ if (rc)
+ return rc;
+
+ thbi = alloc_thread_hw_breakpoint(tsk);
+ if (!thbi)
+ return -ENOMEM;
+
+ /* Insert bp in the thread's list and update the DR7 value */
+ pos = insert_bp_in_list(bp, thbi, tsk);
+ arch_register_user_hw_breakpoint(bp, thbi);
+
+ /* Update and rebalance the priorities. We don't need to go through
+ * the list of all threads; adding a breakpoint can only cause the
+ * priorities for this thread to increase.
+ */
+ accum_thread_tprio(thbi);
+ balance_kernel_vs_user();
+
+ /* Did bp get allocated to a debug register? We can tell from its
+ * position in the list. The number of registers allocated to
+ * kernel breakpoints is num_kbps; all the others are available for
+ * user breakpoints. If bp's position in the priority-ordered list
+ * is low enough, it will get a register.
+ */
+ if (pos < HB_NUM - cur_kbpdata->num_kbps) {
+ rc = 1;
+
+ /* Does it need to be installed right now? */
+ if (tsk == current)
+ switch_to_thread_hw_breakpoint(tsk);
+ /* Otherwise it will get installed the next time tsk runs */
+ }
+
+ return rc;
+}
+
+/**
+ * register_user_hw_breakpoint - register a hardware breakpoint for user space
+ * @tsk: the task in whose memory space the breakpoint will be set
+ * @bp: the breakpoint structure to register
+ *
+ * This routine registers a breakpoint to be associated with @tsk's
+ * memory space and active only while @tsk is running. It does not
+ * guarantee that the breakpoint will be allocated to a debug register
+ * immediately; there may be other higher-priority breakpoints registered
+ * which require the use of all the debug registers.
+ *
+ * @tsk will normally be a process being debugged by the current process,
+ * but it may also be the current process.
+ *
+ * The fields in @bp are checked for validity. @bp->len, @bp->type,
+ * @bp->address, @bp->triggered, and @bp->priority must be set properly.
+ *
+ * Returns 1 if @bp is allocated to a debug register, 0 if @bp is
+ * registered but not allowed to be installed, otherwise a negative error
+ * code.
+ */
+int register_user_hw_breakpoint(struct task_struct *tsk,
+ struct hw_breakpoint *bp)
+{
+ int rc;
+
+ mutex_lock(&hw_breakpoint_mutex);
+ rc = __register_user_hw_breakpoint(tsk, bp);
+ mutex_unlock(&hw_breakpoint_mutex);
+ return rc;
+}
+
+/*
+ * Actual implementation of unregister_user_hw_breakpoint.
+ */
+static void __unregister_user_hw_breakpoint(struct task_struct *tsk,
+ struct hw_breakpoint *bp)
+{
+ struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+
+ if (!bp->status)
+ return; /* Not registered */
+
+ /* Remove bp from the thread's list and update the DR7 value */
+ remove_bp_from_list(bp, thbi, tsk);
+ arch_unregister_user_hw_breakpoint(bp, thbi);
+
+ /* Recalculate and rebalance the kernel-vs-user priorities,
+ * and actually uninstall bp if necessary.
+ */
+ recalc_tprio();
+ balance_kernel_vs_user();
+ if (tsk == current)
+ switch_to_thread_hw_breakpoint(tsk);
+}
+
+/**
+ * unregister_user_hw_breakpoint - unregister a hardware breakpoint for user space
+ * @tsk: the task in whose memory space the breakpoint is registered
+ * @bp: the breakpoint structure to unregister
+ *
+ * Uninstalls and unregisters @bp.
+ */
+void unregister_user_hw_breakpoint(struct task_struct *tsk,
+ struct hw_breakpoint *bp)
+{
+ mutex_lock(&hw_breakpoint_mutex);
+ __unregister_user_hw_breakpoint(tsk, bp);
+ mutex_unlock(&hw_breakpoint_mutex);
+}
+
+/**
+ * register_kernel_hw_breakpoint - register a hardware breakpoint for kernel space
+ * @bp: the breakpoint structure to register
+ *
+ * This routine registers a breakpoint to be active at all times. It
+ * does not guarantee that the breakpoint will be allocated to a debug
+ * register immediately; there may be other higher-priority breakpoints
+ * registered which require the use of all the debug registers.
+ *
+ * The fields in @bp are checked for validity. @bp->len, @bp->type,
+ * @bp->address, @bp->triggered, and @bp->priority must be set properly.
+ *
+ * Returns 1 if @bp is allocated to a debug register, 0 if @bp is
+ * registered but not allowed to be installed, otherwise a negative error
+ * code.
+ */
+int register_kernel_hw_breakpoint(struct hw_breakpoint *bp)
+{
+ int rc;
+ int pos;
+
+ bp->status = 0;
+ rc = validate_settings(bp, NULL);
+ if (rc)
+ return rc;
+
+ mutex_lock(&hw_breakpoint_mutex);
+
+ /* Insert bp in the kernel's list and update the DR7 value */
+ pos = insert_bp_in_list(bp, NULL, NULL);
+ arch_register_kernel_hw_breakpoint(bp);
+
+ /* Rebalance the priorities. This will install bp if it
+ * was allocated a debug register.
+ */
+ balance_kernel_vs_user();
+
+ /* Did bp get allocated to a debug register? We can tell from its
+ * position in the list. The number of registers allocated to
+ * kernel breakpoints is num_kbps; all the others are available for
+ * user breakpoints. If bp's position in the priority-ordered list
+ * is low enough, it will get a register.
+ */
+ if (pos < cur_kbpdata->num_kbps)
+ rc = 1;
+
+ mutex_unlock(&hw_breakpoint_mutex);
+ return rc;
+}
+EXPORT_SYMBOL_GPL(register_kernel_hw_breakpoint);
+
+/**
+ * unregister_kernel_hw_breakpoint - unregister a hardware breakpoint for kernel space
+ * @bp: the breakpoint structure to unregister
+ *
+ * Uninstalls and unregisters @bp.
+ */
+void unregister_kernel_hw_breakpoint(struct hw_breakpoint *bp)
+{
+ if (!bp->status)
+ return; /* Not registered */
+ mutex_lock(&hw_breakpoint_mutex);
+
+ /* Remove bp from the kernel's list and update the DR7 value */
+ remove_bp_from_list(bp, NULL, NULL);
+ arch_unregister_kernel_hw_breakpoint(bp);
+
+ /* Rebalance the priorities. This will uninstall bp if it
+ * was allocated a debug register.
+ */
+ balance_kernel_vs_user();
+
+ mutex_unlock(&hw_breakpoint_mutex);
+}
+EXPORT_SYMBOL_GPL(unregister_kernel_hw_breakpoint);


Attachments:
bptest (11.87 kB)

2007-06-25 10:53:31

by Roland McGrath

Subject: Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)

> "A waste to store one"? Waste of what? It isn't a waste of space; the
> space would otherwise be unused. Waste of an instruction, perhaps.

Yes.

> It is now possible for an implementation to store things in a
> machine-dependent fashion; I have added accessor routines as you
> suggested. But I also left the fields as they were; the documentation
> mentions that they won't necessarily contain any particular values.

People usually read the documentation only after fields whose names suggest
what they contain turn out to hold values that confuse them, not before.

> You might want to examine the check in validate_settings() for address
> alignment; it might not be valid if other values get stored in the
> low-order bits of the address. This is a tricky point; it's not safe
> to mix bits around unless you know that the data values are correct,
> but in validate_settings() you don't yet know that.

This is why I didn't bring up encoded addresses earlier on. :-)

These kinds of issues are why I prefer unambiguously opaque arch-specific
encodings. validate_settings is indeed wrong for the natural ppc encoding.

The values must be set by a call that can return an error. That means you
can't really have a static initializer macro, unless it's intended to mean
"unspecified garbage if not used exactly right". I favor just going back
to passing three more args to register_kernel_hw_breakpoint.
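
As a rough illustration of "set by a call that can return an error", here
is a minimal sketch. Everything in it is hypothetical: the sk_ names, the
type constants, and the encoding (access type packed into the low 3 bits
of an 8-byte-aligned address) are made up for the example and are not
taken from the patch or from any architecture header. The point is only
that validation happens before the bits are mixed together.

```c
#include <assert.h>
#include <errno.h>

/* Made-up encoding for illustration: the low 3 bits of an 8-byte-aligned
 * address hold the access type. Not taken from any real header. */
#define SK_TYPE_READ	0x1UL
#define SK_TYPE_WRITE	0x2UL
#define SK_TYPE_MASK	0x7UL

struct sk_bp {
	unsigned long encoded;	/* opaque to callers; use the accessors */
};

/* Validate first, then pack. Returns 0 on success or -EINVAL. */
int sk_bp_set(struct sk_bp *bp, unsigned long addr, unsigned long type)
{
	if (type == 0 || (type & ~(SK_TYPE_READ | SK_TYPE_WRITE)))
		return -EINVAL;		/* unknown access type */
	if (addr & SK_TYPE_MASK)
		return -EINVAL;		/* low bits are needed for the type */
	bp->encoded = addr | type;
	return 0;
}

/* Accessors undo the packing, so callers never see the raw value. */
unsigned long sk_bp_addr(const struct sk_bp *bp)
{
	return bp->encoded & ~SK_TYPE_MASK;
}

unsigned long sk_bp_type(const struct sk_bp *bp)
{
	return bp->encoded & SK_TYPE_MASK;
}

/* Usage: a good request round-trips, a misaligned one is refused. */
void sk_bp_example(void)
{
	struct sk_bp bp;

	assert(sk_bp_set(&bp, 0x1000UL, SK_TYPE_WRITE) == 0);
	assert(sk_bp_addr(&bp) == 0x1000UL);
	assert(sk_bp_type(&bp) == SK_TYPE_WRITE);
	assert(sk_bp_set(&bp, 0x1004UL, SK_TYPE_WRITE) == -EINVAL);
}
```

A static initializer macro cannot refuse the misaligned request in the
last line, which is the difficulty described above.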

> Tests show that my CPU does not clear DR_STEP when a data breakpoint is
> hit. Conversely, the DR_TRAPn bits are cleared even when a single-step
> exception occurs.

Ok, this is pretty consistent with what the newest Intel manuals say.

> If you're interested, I can send you the code I used to do this testing
> so you can try it on your machine.

Ok.

> > I still think it's the proper thing to make it conditional, not always
> > built in. But it's a pedantic point.
>
> We have three things to consider: ptrace, utrace, and hw-breakpoint.
> Ultimately hw-breakpoint should become part of utrace; we might not
> want to bother with a standalone version.

It is not hard to make it a separate option, so there is no reason not to.

> Furthermore, hw-breakpoint takes over the ptrace's mechanism for
> breakpoint handling. If we want to allow a configuration where ptrace
> is present and hw-breakpoint isn't, then I would have to add an
> alternate implementation containing only support for the legacy
> interface.

I was not suggesting that. CONFIG_PTRACE would require HW_BREAKPOINT on
machines where arch ptrace code uses it.

> I made a few other changes to do_debug. For instance, it no longer
> checks whether notify_die() returns NOTIFY_STOP. That check was a
> mistake to begin with; NOTIFY_STOP merely means to cut the notifier
> chain short -- it doesn't mean that the debug exception can be ignored.

This is incorrect. The usage of notify_die in all other cases, at least of
machine exceptions on x86, is to test for == NOTIFY_STOP and when true
short-circuit the normal effect of the exception (signal, oops). The
notifiers should return NOTIFY_STOP if they consumed the exception wholly.
If none uses NOTIFY_STOP, then the normal user signal should happen.
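
The convention described here can be modelled in a few lines of plain C.
This is a sketch under assumptions, not kernel code: the sk_ names and the
enum values are stand-ins invented for the example, and the chain is a
simple array rather than the kernel's notifier list.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the notifier return codes (values are illustrative). */
enum sk_notify { SK_NOTIFY_DONE = 0, SK_NOTIFY_STOP = 1 };

typedef enum sk_notify (*sk_notifier_fn)(unsigned long dr6);

/* Walk the chain in order; the first notifier returning STOP ends it. */
enum sk_notify sk_notify_die(const sk_notifier_fn *chain, size_t n,
			     unsigned long dr6)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (chain[i](dr6) == SK_NOTIFY_STOP)
			return SK_NOTIFY_STOP;
	return SK_NOTIFY_DONE;
}

/* Returns 1 if the normal effect (the user signal) should still happen,
 * 0 if some notifier consumed the exception wholly. */
int sk_do_debug(const sk_notifier_fn *chain, size_t n, unsigned long dr6)
{
	if (sk_notify_die(chain, n, dr6) == SK_NOTIFY_STOP)
		return 0;
	return 1;
}

/* Two trivial notifiers for demonstration. */
enum sk_notify sk_ignores(unsigned long dr6)
{
	(void)dr6;
	return SK_NOTIFY_DONE;
}

enum sk_notify sk_consumes(unsigned long dr6)
{
	(void)dr6;
	return SK_NOTIFY_STOP;
}

void sk_notify_example(void)
{
	const sk_notifier_fn nobody[] = { sk_ignores, sk_ignores };
	const sk_notifier_fn someone[] = { sk_ignores, sk_consumes };

	assert(sk_do_debug(nobody, 2, 0UL) == 1);	/* signal happens */
	assert(sk_do_debug(someone, 2, 0UL) == 0);	/* exception eaten */
}
```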

> Also it sends the SIGTRAP when any of the DR_STEP or DR_TRAPn bits are
> set in vdr6; this is now the appropriate condition.

From what you've said, DR_STEP will remain set on a later debug exception.
So if a non-ptrace hw breakpoint consumed the exception and left no
DR_TRAPn bits set, the thread would generate a second SIGTRAP from the
prior single-step. Currently userland expects to have to clear DR_STEP in
dr6 via ptrace itself, but does not expect it can get a duplicate SIGTRAP
if it doesn't.
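
The duplicate-SIGTRAP scenario can be modelled as follows. This is a
sketch, not the patch's do_debug(): the sk_ names are hypothetical, and
the bit values are meant to mirror x86's DR_STEP and DR_TRAPn. Case 3
assumes the stale bit was cleared somewhere between the two exceptions;
who should do that clearing is exactly the open question, and the sketch
does not answer it.

```c
#include <assert.h>

#define SK_DR_TRAP_BITS	0xfUL		/* DR_TRAP0..DR_TRAP3 on x86 */
#define SK_DR_STEP	0x4000UL	/* single-step bit on x86 */

/* Bits still pending after non-ptrace handlers consumed theirs. */
unsigned long sk_pending(unsigned long hw_dr6, unsigned long consumed)
{
	return hw_dr6 & (SK_DR_STEP | SK_DR_TRAP_BITS) & ~consumed;
}

/* The condition under discussion: SIGTRAP if any pending bit remains. */
int sk_sends_sigtrap(unsigned long hw_dr6, unsigned long consumed)
{
	return sk_pending(hw_dr6, consumed) != 0;
}

void sk_dr6_example(void)
{
	/* 1. A genuine single-step: SIGTRAP is expected. */
	assert(sk_sends_sigtrap(SK_DR_STEP, 0UL));

	/* 2. Later, breakpoint 0 fires and a kernel handler consumes it,
	 *    but DR_STEP is still set from step 1: a second SIGTRAP. */
	assert(sk_sends_sigtrap(SK_DR_STEP | 0x1UL, 0x1UL));

	/* 3. Same event with the stale bit already cleared: no SIGTRAP. */
	assert(!sk_sends_sigtrap(0x1UL, 0x1UL));
}
```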


Thanks,
Roland

2007-06-25 11:32:50

by Roland McGrath

Subject: Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)

I added this on top of your patch to make it compile (and look a little nicer).
With that, bptest worked nicely.

---
arch/i386/kernel/kprobes.c | 19 ++++++++++---------
1 file changed, 10 insertions(+), 9 deletions(-)

Index: b/arch/i386/kernel/kprobes.c
===================================================================
--- a/arch/i386/kernel/kprobes.c
+++ b/arch/i386/kernel/kprobes.c
@@ -35,6 +35,7 @@
#include <asm/cacheflush.h>
#include <asm/desc.h>
#include <asm/uaccess.h>
+#include <asm/debugreg.h>

void jprobe_return_end(void);

@@ -660,16 +661,16 @@ int __kprobes kprobe_exceptions_notify(s
ret = NOTIFY_STOP;
break;
case DIE_DEBUG:
-
- /* The DR6 value is stored in args->err */
-#define DR6 (args->err)
-
- if ((DR6 & DR_STEP) && post_kprobe_handler(args->regs)) {
- if ((DR6 & ~DR_STEP) == 0)
- ret = NOTIFY_STOP;
- }
+ /*
+ * The %db6 value is stored in args->err.
+ * If DR_STEP is the only bit set and it's ours,
+ * we should eat this exception.
+ */
+ if ((args->err & DR_STEP) &&
+ post_kprobe_handler(args->regs) &&
+ (args->err & ~DR_STEP) == 0)
+ ret = NOTIFY_STOP;
break;
-#undef DR6

case DIE_GPF:
case DIE_PAGE_FAULT:
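
A small standalone check that the nested form removed above and the &&
chain added above behave the same way, including how often the post
handler runs. This is only a sketch: sk_post_handler() is a stand-in
that always claims the step, and SK_DR_STEP is assumed to match the x86
DR_STEP value.

```c
#include <assert.h>

#define SK_DR_STEP 0x4000UL	/* single-step bit in DR6, as on x86 */

static int sk_post_calls;	/* how many times the stand-in handler ran */

/* Stand-in for post_kprobe_handler(): pretend the step was ours. */
static int sk_post_handler(void)
{
	++sk_post_calls;
	return 1;
}

/* Shape of the code before the cleanup: nested ifs. */
int sk_eat_nested(unsigned long dr6)
{
	int stop = 0;

	if ((dr6 & SK_DR_STEP) && sk_post_handler()) {
		if ((dr6 & ~SK_DR_STEP) == 0)
			stop = 1;
	}
	return stop;
}

/* Shape of the code after the cleanup: one && chain. */
int sk_eat_chained(unsigned long dr6)
{
	return (dr6 & SK_DR_STEP) &&
	       sk_post_handler() &&
	       (dr6 & ~SK_DR_STEP) == 0;
}

void sk_kprobe_example(void)
{
	const unsigned long dr6[] = {
		0UL, SK_DR_STEP, SK_DR_STEP | 0x1UL, 0x1UL
	};
	int i;

	for (i = 0; i < 4; i++) {
		int a, b, calls_a, calls_b;

		sk_post_calls = 0;
		a = sk_eat_nested(dr6[i]);
		calls_a = sk_post_calls;

		sk_post_calls = 0;
		b = sk_eat_chained(dr6[i]);
		calls_b = sk_post_calls;

		assert(a == b);			/* same verdict */
		assert(calls_a == calls_b);	/* same side effects */
	}
}
```

In both forms the handler is called whenever DR_STEP is set, even if
other DR6 bits are set too, because the "only bit set" test comes last.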

2007-06-25 15:36:58

by Alan Stern

Subject: Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)

On Mon, 25 Jun 2007, Roland McGrath wrote:

> > "A waste to store one"? Waste of what? It isn't a waste of space; the
> > space would otherwise be unused. Waste of an instruction, perhaps.
>
> Yes.

Of course, calling register_kernel_hw_breakpoint() with three extra
arguments is a waste of an instruction also, if one of those arguments
isn't used.

And yet it's not clear that either of these really is a waste. Suppose
somebody ports code from x86 to PPC64 and leaves a breakpoint length
set to HW_BREAKPOINT_LEN_4. Clearly we would want to return an error.
This means that the length value _has_ to be tested, even if it won't
be used for anything. And this means the length _has_ to be passed
along somehow, either as an argument or as a field value.
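
As a rough illustration of that point (a sketch with made-up names, not code from the patch), imagine an architecture whose watchpoint hardware always covers one aligned 8-byte unit. The length never reaches a register, but the validation routine still has to look at it:

```c
#include <errno.h>

/* Hypothetical length encodings; only the x86 names appear in the patch. */
enum sketch_len {
	SKETCH_LEN_1 = 1,
	SKETCH_LEN_2 = 2,
	SKETCH_LEN_4 = 4,
	SKETCH_LEN_8 = 8,
};

/*
 * Validation for an imaginary architecture whose watchpoint hardware
 * always covers one aligned 8-byte unit.  The length is never
 * programmed into a register, yet it still has to be checked so that
 * a value carried over from another architecture is rejected.
 */
int sketch_validate(unsigned long address, unsigned len)
{
	if (len != SKETCH_LEN_8)
		return -EINVAL;		/* e.g. a stale SKETCH_LEN_4 */
	if (address & (SKETCH_LEN_8 - 1))
		return -EINVAL;		/* must be 8-byte aligned */
	return 0;
}
```

Whether the length arrives as an argument or as a structure field makes no difference to this check; it only needs to be available when the breakpoint is registered.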

> > You might want to examine the check in validate_settings() for address
> > alignment; it might not be valid if other values get stored in the
> > low-order bits of the address. This is a tricky point; it's not safe
> > to mix bits around unless you know that the data values are correct,
> > but in validate_settings() you don't yet know that.
>
> This is why I didn't bring up encoded addresses earlier on. :-)
>
> These kinds of issues are why I prefer unambiguously opaque arch-specific
> encodings. validate_settings is indeed wrong for the natural ppc encoding.
>
> The values must be set by a call that can return an error. That means you
> can't really have a static initializer macro, unless it's intended to mean
> "unspecified garbage if not used exactly right". I favor just going back
> to passing three more args to register_kernel_hw_breakpoint.

All right, I'll change it. And I'll encapsulate those fields. I still
think it will accomplish nothing more than hiding some implementation
details which don't really need to be hidden.


> > Tests show that my CPU does not clear DR_STEP when a data breakpoint is
> > hit. Conversely, the DR_TRAPn bits are cleared even when a single-step
> > exception occurs.
>
> Ok, this is pretty consistent with what the newest Intel manuals say.
>
> > If you're interested, I can send you the code I used to do this testing
> > so you can try it on your machine.
>
> Ok.

It's below. The patch logs the value of DR6 when each debug interrupt
occurs, and it adds another sysfs attribute to the bptest driver. The
attribute is named "test", and it contains the value that the IRQ
handler will write back to DR6. Combine this with the Alt-SysRq-P
change already submitted, and you can get a clear view of what's going
on.


> > We have three things to consider: ptrace, utrace, and hw-breakpoint.
> > Ultimately hw-breakpoint should become part of utrace; we might not
> > want to bother with a standalone version.
>
> It is not hard to make it a separate option, so there is no reason not to.
>
> > Furthermore, hw-breakpoint takes over the ptrace's mechanism for
> > breakpoint handling. If we want to allow a configuration where ptrace
> > is present and hw-breakpoint isn't, then I would have to add an
> > alternate implementation containing only support for the legacy
> > interface.
>
> I was not suggesting that. CONFIG_PTRACE would require HW_BREAKPOINT on
> machines where arch ptrace code uses it.

I see. So I could add a CONFIG_HW_BREAKPOINT option and make
CONFIG_PTRACE depend on it. That will be simple enough.

Do you think it would make sense to allow utrace without hw-breakpoint?


> > I made a few other changes to do_debug. For instance, it no longer
> > checks whether notify_die() returns NOTIFY_STOP. That check was a
> > mistake to begin with; NOTIFY_STOP merely means to cut the notifier
> > chain short -- it doesn't mean that the debug exception can be ignored.
>
> This is incorrect. The usage of notify_die in all other cases, at least of
> machine exceptions on x86, is to test for == NOTIFY_STOP and when true
> short-circuit the normal effect of the exception (signal, oops). The
> notifiers should return NOTIFY_STOP if they consumed the exception wholly.
> If none uses NOTIFY_STOP, then the normal user signal should happen.

All right, I'll fix that back up.
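
For reference, the convention described above can be modelled in user space roughly as follows. This is only a sketch: the SIM_ and sim_ names are hypothetical stand-ins, although the numeric values match the kernel's notifier return codes.

```c
#include <stddef.h>

/* Stand-ins for the kernel's notifier return codes (same values). */
#define SIM_NOTIFY_DONE		0x0000
#define SIM_NOTIFY_OK		0x0001
#define SIM_NOTIFY_STOP_MASK	0x8000
#define SIM_NOTIFY_STOP		(SIM_NOTIFY_OK | SIM_NOTIFY_STOP_MASK)

typedef int (*sim_notifier_fn)(unsigned long dr6);

/* Walk the chain; a handler that sets the STOP bit ends the walk. */
int sim_notify_die(const sim_notifier_fn *chain, size_t n, unsigned long dr6)
{
	int ret = SIM_NOTIFY_DONE;
	size_t i;

	for (i = 0; i < n; i++) {
		ret = chain[i](dr6);
		if (ret & SIM_NOTIFY_STOP_MASK)
			break;
	}
	return ret;
}

/* Returns 1 if the default action (the user signal) should be taken. */
int sim_do_debug(const sim_notifier_fn *chain, size_t n, unsigned long dr6)
{
	if (sim_notify_die(chain, n, dr6) == SIM_NOTIFY_STOP)
		return 0;	/* exception wholly consumed: no signal */
	return 1;		/* nobody claimed it: send the signal */
}

/* Example handler that never claims the exception. */
int sim_ignore(unsigned long dr6)
{
	(void)dr6;
	return SIM_NOTIFY_DONE;
}

/* Example handler that consumes a pure single-step (only bit 14 set). */
int sim_consume_step(unsigned long dr6)
{
	return (dr6 == 0x4000) ? SIM_NOTIFY_STOP : SIM_NOTIFY_DONE;
}
```

With a chain of { sim_ignore, sim_consume_step }, a DR6 value of 0x4000 is swallowed, while 0x4001 falls through to the signal.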

> > Also it sends the SIGTRAP when any of the DR_STEP or DR_TRAPn bits are
> > set in vdr6; this is now the appropriate condition.
>
> From what you've said, DR_STEP will remain set on a later debug exception.
> So if a non-ptrace hw breakpoint consumed the exception and left no
> DR_TRAPn bits set, the thread would generate a second SIGTRAP from the
> prior single-step. Currently userland expects to have to clear DR_STEP in
> dr6 via ptrace itself, but does not expect it can get a duplicate SIGTRAP
> if it doesn't.

No, because do_debug always writes a 0 to DR6 after reading it;
consequently DR_STEP does not remain set on later exceptions. Unless
we do something like this we would never know whether we entered the
handler because of a single-step exception or not.
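
A toy user-space model shows why the write matters. The sim_ names are hypothetical, and the model assumes the CPU behaves as in the tests described above: the DR_TRAPn bits are replaced on each exception and DR_STEP is never cleared by the hardware.

```c
/* Bit values as in asm-i386/debugreg.h */
#define DR_TRAP0	0x1UL
#define DR_STEP		0x4000UL

/* Hypothetical mask covering DR_TRAP0 through DR_TRAP3 */
#define SIM_TRAP_MASK	0xfUL

/* Toy CPU: holds only the DR6 status register. */
struct sim_cpu {
	unsigned long dr6;
};

/* Debug exception: trap bits are replaced, DR_STEP is only ever set. */
void sim_raise(struct sim_cpu *cpu, unsigned long cause)
{
	cpu->dr6 = (cpu->dr6 & ~SIM_TRAP_MASK) | cause;
}

/* Handler in the style of do_debug: read DR6, optionally clear it. */
unsigned long sim_handler(struct sim_cpu *cpu, int clear)
{
	unsigned long dr6 = cpu->dr6;

	if (clear)
		cpu->dr6 = 0;
	return dr6;
}

/* Single-step, then a data breakpoint; return DR6 as seen the second time. */
unsigned long sim_second_dr6(int clear)
{
	struct sim_cpu cpu = { 0 };

	sim_raise(&cpu, DR_STEP);
	sim_handler(&cpu, clear);
	sim_raise(&cpu, DR_TRAP0);
	return sim_handler(&cpu, clear);
}
```

In this model sim_second_dr6(1) reports only DR_TRAP0, whereas sim_second_dr6(0) still carries the stale DR_STEP bit from the earlier single-step.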

But the same effect could occur because of a bogus debug exception
caused by lazy DR7 switching. I'll have to add back in code to detect
that case.

Alan Stern



Index: usb-2.6/arch/i386/kernel/traps.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/traps.c
+++ usb-2.6/arch/i386/kernel/traps.c
@@ -802,13 +802,17 @@ fastcall void __kprobes do_int3(struct p
* find every occurrence of the TF bit that could be saved away even
* by user code)
*/
+unsigned long dr6test;
+EXPORT_SYMBOL(dr6test);
+
fastcall void __kprobes do_debug(struct pt_regs * regs, long error_code)
{
struct task_struct *tsk = current;
unsigned long dr6;

get_debugreg(dr6, 6);
- set_debugreg(0, 6); /* DR6 may or may not be cleared by the CPU */
+ printk(KERN_INFO "dr6 = %08lx\n", dr6);
+ set_debugreg(dr6test, 6); /* DR6 may or may not be cleared by the CPU */

/* Store the virtualized DR6 value */
tsk->thread.vdr6 = dr6;
Index: usb-2.6/bptest/bptest.c
===================================================================
--- usb-2.6.orig/bptest/bptest.c
+++ usb-2.6/bptest/bptest.c
@@ -58,6 +58,22 @@ MODULE_AUTHOR("Alan Stern <stern@rowland
MODULE_DESCRIPTION("Hardware Breakpoint test driver");
MODULE_LICENSE("GPL");

+extern unsigned long dr6test;
+
+static ssize_t test_store(struct device_driver *d, const char *buf,
+ size_t count)
+{
+ if (sscanf(buf, "%lx", &dr6test) <= 0)
+ return -EIO;
+ return count;
+}
+
+static ssize_t test_show(struct device_driver *d, char *buf)
+{
+ return sprintf(buf, "dr6test: %08lx\n", dr6test);
+}
+static DRIVER_ATTR(test, 0600, test_show, test_store);
+

static struct hw_breakpoint bps[4];

@@ -402,6 +418,7 @@ static struct driver_attribute *(bptest_
&driver_attr_call,
&driver_attr_read,
&driver_attr_write,
+ &driver_attr_test,
NULL
};


2007-06-25 15:38:00

by Alan Stern


On Mon, 25 Jun 2007, Roland McGrath wrote:

> I added this on top of your patch to make it compile (and look a little nicer).
> With that, bptest worked nicely.

I'll merge this with the rest of the patch.

Alan Stern

2007-06-25 20:51:56

by Alan Stern


Roland:

Here's the next iteration. The arch-specific parts are now completely
encapsulated. validate_settings is in a form which should be workable
on all architectures. And the address, length, and type are passed as
arguments to register_{kernel,user}_hw_breakpoint().
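
For anyone who wants to check the encodings by hand, the following stand-alone sketch mirrors encode_dr7() and decode_dr7() from the patch below. The sketch_ prefixes are illustrative, and DR_CONTROL_SHIFT and DR_CONTROL_SIZE are assumed to have their usual values from asm-i386/debugreg.h.

```c
/* Field layout constants, assumed to match asm-i386/debugreg.h */
#define DR_CONTROL_SHIFT	16
#define DR_CONTROL_SIZE		4
#define DR_ENABLE_SIZE		2
#define DR_GLOBAL_ENABLE	0x2UL
#define DR_GLOBAL_SLOWDOWN	0x200UL

/* Encodings from the new asm-i386/hw_breakpoint.h */
#define HW_BREAKPOINT_LEN_1	0x40
#define HW_BREAKPOINT_LEN_2	0x44
#define HW_BREAKPOINT_LEN_4	0x4c
#define HW_BREAKPOINT_EXECUTE	0x80
#define HW_BREAKPOINT_WRITE	0x81
#define HW_BREAKPOINT_RW	0x83

/* Build the DR7 bits (LENn, R/Wn, Gn, GE) for debug register drnum. */
unsigned long sketch_encode_dr7(int drnum, unsigned len, unsigned type)
{
	unsigned long temp;

	/* The low nibble of (len | type) is the LEN:R/W field */
	temp = (len | type) & 0xf;
	temp <<= (DR_CONTROL_SHIFT + drnum * DR_CONTROL_SIZE);
	temp |= (DR_GLOBAL_ENABLE << (drnum * DR_ENABLE_SIZE)) |
			DR_GLOBAL_SLOWDOWN;
	return temp;
}

/* Recover len and type for bpnum; return the two enable bits (L, G). */
int sketch_decode_dr7(unsigned long dr7, int bpnum, unsigned *len,
		unsigned *type)
{
	unsigned long temp;

	temp = (dr7 >> (DR_CONTROL_SHIFT + bpnum * DR_CONTROL_SIZE)) & 0xf;
	*len = (temp & 0xc) | 0x40;	/* restore the 0x40 length tag */
	*type = (temp & 0x3) | 0x80;	/* restore the 0x80 type tag */
	return (dr7 >> (bpnum * DR_ENABLE_SIZE)) & 0x3;
}
```

A 4-byte write breakpoint in DR0 encodes to 0x000d0202, and decoding that value gives back HW_BREAKPOINT_LEN_4 and HW_BREAKPOINT_WRITE with only the global enable bit set.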

I changed the Kprobes single-step routine along the lines you
suggested, but added a little extra. See what you think.

I haven't tried to modify Kconfig at all. To do it properly would
require making ptrace configurable, which is not something I want to
tackle at the moment.

The test for early termination of the exception handler is now back the
way it was. However I didn't change the test for deciding whether to
send a SIGTRAP. Under the current circumstances I don't see how it
could ever be wrong. (On the other hand, the code will end up calling
send_sigtrap() twice when a ptrace exception occurs: once in the ptrace
trigger routine and once in do_debug. That won't matter, will it? I
would expect send_sigtrap() to be idempotent.)
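
As a user-space analogy only (it does not exercise send_sigtrap() itself, and it assumes Linux semantics for standard signals), the following sketch raises SIGTRAP twice while the signal is blocked and counts the deliveries once it is unblocked. All sketch_ names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

static volatile sig_atomic_t sketch_deliveries;

static void sketch_on_sigtrap(int sig)
{
	(void)sig;
	sketch_deliveries++;
}

/*
 * Raise SIGTRAP twice while it is blocked, then restore the old mask
 * and report how many times the handler actually ran.  Assumes that
 * SIGTRAP was not already blocked by the caller.
 */
int sketch_double_sigtrap(void)
{
	struct sigaction sa, old_sa;
	sigset_t block, old_mask;

	sketch_deliveries = 0;
	sa.sa_handler = sketch_on_sigtrap;
	sa.sa_flags = 0;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGTRAP, &sa, &old_sa);

	sigemptyset(&block);
	sigaddset(&block, SIGTRAP);
	sigprocmask(SIG_BLOCK, &block, &old_mask);

	raise(SIGTRAP);		/* becomes pending */
	raise(SIGTRAP);		/* already pending: not queued again */

	/* Restoring the mask lets the pending signal be delivered */
	sigprocmask(SIG_SETMASK, &old_mask, NULL);
	sigaction(SIGTRAP, &old_sa, NULL);
	return sketch_deliveries;
}
```

On Linux, standard (non-realtime) signals are not queued, so the handler would be expected to run once. Whether the in-kernel path behaves the same way, including which siginfo the debugger ends up seeing, is a separate question.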

Are you going to the Ottawa Linux Symposium?

Alan Stern



Index: usb-2.6/include/asm-i386/hw_breakpoint.h
===================================================================
--- /dev/null
+++ usb-2.6/include/asm-i386/hw_breakpoint.h
@@ -0,0 +1,49 @@
+#ifndef _I386_HW_BREAKPOINT_H
+#define _I386_HW_BREAKPOINT_H
+
+#ifdef __KERNEL__
+#define __ARCH_HW_BREAKPOINT_H
+
+struct arch_hw_breakpoint {
+ unsigned long address;
+ u8 len;
+ u8 type;
+} __attribute__((packed));
+
+#include <asm-generic/hw_breakpoint.h>
+
+/* HW breakpoint accessor routines */
+static inline const void *hw_breakpoint_get_kaddress(struct hw_breakpoint *bp)
+{
+ return (const void *) bp->info.address;
+}
+
+static inline const void __user *hw_breakpoint_get_uaddress(
+ struct hw_breakpoint *bp)
+{
+ return (const void __user *) bp->info.address;
+}
+
+static inline unsigned hw_breakpoint_get_len(struct hw_breakpoint *bp)
+{
+ return bp->info.len;
+}
+
+static inline unsigned hw_breakpoint_get_type(struct hw_breakpoint *bp)
+{
+ return bp->info.type;
+}
+
+/* Available HW breakpoint length encodings */
+#define HW_BREAKPOINT_LEN_1 0x40
+#define HW_BREAKPOINT_LEN_2 0x44
+#define HW_BREAKPOINT_LEN_4 0x4c
+#define HW_BREAKPOINT_LEN_EXECUTE 0x40
+
+/* Available HW breakpoint type encodings */
+#define HW_BREAKPOINT_EXECUTE 0x80 /* trigger on instruction execute */
+#define HW_BREAKPOINT_WRITE 0x81 /* trigger on memory write */
+#define HW_BREAKPOINT_RW 0x83 /* trigger on memory read or write */
+
+#endif /* __KERNEL__ */
+#endif /* _I386_HW_BREAKPOINT_H */
Index: usb-2.6/arch/i386/kernel/process.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/process.c
+++ usb-2.6/arch/i386/kernel/process.c
@@ -57,6 +57,7 @@

#include <asm/tlbflush.h>
#include <asm/cpu.h>
+#include <asm/debugreg.h>

asmlinkage void ret_from_fork(void) __asm__("ret_from_fork");

@@ -376,9 +377,10 @@ EXPORT_SYMBOL(kernel_thread);
*/
void exit_thread(void)
{
+ struct task_struct *tsk = current;
+
/* The process may have allocated an io port bitmap... nuke it. */
if (unlikely(test_thread_flag(TIF_IO_BITMAP))) {
- struct task_struct *tsk = current;
struct thread_struct *t = &tsk->thread;
int cpu = get_cpu();
struct tss_struct *tss = &per_cpu(init_tss, cpu);
@@ -396,15 +398,17 @@ void exit_thread(void)
tss->x86_tss.io_bitmap_base = INVALID_IO_BITMAP_OFFSET;
put_cpu();
}
+ if (unlikely(tsk->thread.hw_breakpoint_info))
+ flush_thread_hw_breakpoint(tsk);
}

void flush_thread(void)
{
struct task_struct *tsk = current;

- memset(tsk->thread.debugreg, 0, sizeof(unsigned long)*8);
- memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));
- clear_tsk_thread_flag(tsk, TIF_DEBUG);
+ memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));
+ if (unlikely(tsk->thread.hw_breakpoint_info))
+ flush_thread_hw_breakpoint(tsk);
/*
* Forget coprocessor state..
*/
@@ -447,14 +451,21 @@ int copy_thread(int nr, unsigned long cl

savesegment(gs,p->thread.gs);

+ p->thread.hw_breakpoint_info = NULL;
+ p->thread.io_bitmap_ptr = NULL;
+
tsk = current;
+ err = -ENOMEM;
+ if (unlikely(tsk->thread.hw_breakpoint_info)) {
+ if (copy_thread_hw_breakpoint(tsk, p, clone_flags))
+ goto out;
+ }
+
if (unlikely(test_tsk_thread_flag(tsk, TIF_IO_BITMAP))) {
p->thread.io_bitmap_ptr = kmemdup(tsk->thread.io_bitmap_ptr,
IO_BITMAP_BYTES, GFP_KERNEL);
- if (!p->thread.io_bitmap_ptr) {
- p->thread.io_bitmap_max = 0;
- return -ENOMEM;
- }
+ if (!p->thread.io_bitmap_ptr)
+ goto out;
set_tsk_thread_flag(p, TIF_IO_BITMAP);
}

@@ -484,7 +495,8 @@ int copy_thread(int nr, unsigned long cl

err = 0;
out:
- if (err && p->thread.io_bitmap_ptr) {
+ if (err) {
+ flush_thread_hw_breakpoint(p);
kfree(p->thread.io_bitmap_ptr);
p->thread.io_bitmap_max = 0;
}
@@ -496,18 +508,18 @@ int copy_thread(int nr, unsigned long cl
*/
void dump_thread(struct pt_regs * regs, struct user * dump)
{
- int i;
+ struct task_struct *tsk = current;

/* changed the size calculations - should hopefully work better. lbt */
dump->magic = CMAGIC;
dump->start_code = 0;
dump->start_stack = regs->esp & ~(PAGE_SIZE - 1);
- dump->u_tsize = ((unsigned long) current->mm->end_code) >> PAGE_SHIFT;
- dump->u_dsize = ((unsigned long) (current->mm->brk + (PAGE_SIZE-1))) >> PAGE_SHIFT;
+ dump->u_tsize = ((unsigned long) tsk->mm->end_code) >> PAGE_SHIFT;
+ dump->u_dsize = ((unsigned long) (tsk->mm->brk + (PAGE_SIZE-1))) >> PAGE_SHIFT;
dump->u_dsize -= dump->u_tsize;
dump->u_ssize = 0;
- for (i = 0; i < 8; i++)
- dump->u_debugreg[i] = current->thread.debugreg[i];
+
+ dump_thread_hw_breakpoint(tsk, dump->u_debugreg);

if (dump->start_stack < TASK_SIZE)
dump->u_ssize = ((unsigned long) (TASK_SIZE - dump->start_stack)) >> PAGE_SHIFT;
@@ -557,16 +569,6 @@ static noinline void __switch_to_xtra(st

next = &next_p->thread;

- if (test_tsk_thread_flag(next_p, TIF_DEBUG)) {
- set_debugreg(next->debugreg[0], 0);
- set_debugreg(next->debugreg[1], 1);
- set_debugreg(next->debugreg[2], 2);
- set_debugreg(next->debugreg[3], 3);
- /* no 4 and 5 */
- set_debugreg(next->debugreg[6], 6);
- set_debugreg(next->debugreg[7], 7);
- }
-
if (!test_tsk_thread_flag(next_p, TIF_IO_BITMAP)) {
/*
* Disable the bitmap via an invalid offset. We still cache
@@ -699,7 +701,7 @@ struct task_struct fastcall * __switch_t
set_iopl_mask(next->iopl);

/*
- * Now maybe handle debug registers and/or IO bitmaps
+ * Now maybe handle IO bitmaps
*/
if (unlikely((task_thread_info(next_p)->flags & _TIF_WORK_CTXSW)
|| test_tsk_thread_flag(prev_p, TIF_IO_BITMAP)))
@@ -731,6 +733,13 @@ struct task_struct fastcall * __switch_t

x86_write_percpu(current_task, next_p);

+ /*
+ * Handle debug registers. This must be done _after_ current
+ * is updated.
+ */
+ if (unlikely(test_tsk_thread_flag(next_p, TIF_DEBUG)))
+ switch_to_thread_hw_breakpoint(next_p);
+
return prev_p;
}

Index: usb-2.6/arch/i386/kernel/signal.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/signal.c
+++ usb-2.6/arch/i386/kernel/signal.c
@@ -591,13 +591,6 @@ static void fastcall do_signal(struct pt

signr = get_signal_to_deliver(&info, &ka, regs, NULL);
if (signr > 0) {
- /* Reenable any watchpoints before delivering the
- * signal to user space. The processor register will
- * have been cleared if the watchpoint triggered
- * inside the kernel.
- */
- if (unlikely(current->thread.debugreg[7]))
- set_debugreg(current->thread.debugreg[7], 7);

/* Whee! Actually deliver the signal. */
if (handle_signal(signr, &info, &ka, oldset, regs) == 0) {
Index: usb-2.6/arch/i386/kernel/traps.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/traps.c
+++ usb-2.6/arch/i386/kernel/traps.c
@@ -804,62 +804,44 @@ fastcall void __kprobes do_int3(struct p
*/
fastcall void __kprobes do_debug(struct pt_regs * regs, long error_code)
{
- unsigned int condition;
struct task_struct *tsk = current;
+ unsigned long dr6;

- get_debugreg(condition, 6);
+ get_debugreg(dr6, 6);
+ set_debugreg(0, 6); /* DR6 may or may not be cleared by the CPU */

- if (notify_die(DIE_DEBUG, "debug", regs, condition, error_code,
- SIGTRAP) == NOTIFY_STOP)
+ /* Store the virtualized DR6 value */
+ tsk->thread.vdr6 = dr6;
+
+ if (notify_die(DIE_DEBUG, "debug", regs, dr6, error_code,
+ SIGTRAP) == NOTIFY_STOP)
return;
+
/* It's safe to allow irq's after DR6 has been saved */
if (regs->eflags & X86_EFLAGS_IF)
local_irq_enable();

- /* Mask out spurious debug traps due to lazy DR7 setting */
- if (condition & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)) {
- if (!tsk->thread.debugreg[7])
- goto clear_dr7;
+ if (regs->eflags & VM_MASK) {
+ handle_vm86_trap((struct kernel_vm86_regs *) regs,
+ error_code, 1);
+ return;
}

- if (regs->eflags & VM_MASK)
- goto debug_vm86;
-
- /* Save debug status register where ptrace can see it */
- tsk->thread.debugreg[6] = condition;
-
/*
- * Single-stepping through TF: make sure we ignore any events in
- * kernel space (but re-enable TF when returning to user mode).
+ * Single-stepping through system calls: ignore any exceptions in
+ * kernel space, but re-enable TF when returning to user mode.
+ *
+ * We already checked v86 mode above, so we can check for kernel mode
+ * by just checking the CPL of CS.
*/
- if (condition & DR_STEP) {
- /*
- * We already checked v86 mode above, so we can
- * check for kernel mode by just checking the CPL
- * of CS.
- */
- if (!user_mode(regs))
- goto clear_TF_reenable;
+ if ((dr6 & DR_STEP) && !user_mode(regs)) {
+ tsk->thread.vdr6 &= ~DR_STEP;
+ set_tsk_thread_flag(tsk, TIF_SINGLESTEP);
+ regs->eflags &= ~X86_EFLAGS_TF;
}

- /* Ok, finally something we can handle */
- send_sigtrap(tsk, regs, error_code);
-
- /* Disable additional traps. They'll be re-enabled when
- * the signal is delivered.
- */
-clear_dr7:
- set_debugreg(0, 7);
- return;
-
-debug_vm86:
- handle_vm86_trap((struct kernel_vm86_regs *) regs, error_code, 1);
- return;
-
-clear_TF_reenable:
- set_tsk_thread_flag(tsk, TIF_SINGLESTEP);
- regs->eflags &= ~TF_MASK;
- return;
+ if (tsk->thread.vdr6 & (DR_STEP|DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3))
+ send_sigtrap(tsk, regs, error_code);
}

/*
Index: usb-2.6/include/asm-i386/debugreg.h
===================================================================
--- usb-2.6.orig/include/asm-i386/debugreg.h
+++ usb-2.6/include/asm-i386/debugreg.h
@@ -48,6 +48,8 @@

#define DR_LOCAL_ENABLE_SHIFT 0 /* Extra shift to the local enable bit */
#define DR_GLOBAL_ENABLE_SHIFT 1 /* Extra shift to the global enable bit */
+#define DR_LOCAL_ENABLE (0x1) /* Local enable for reg 0 */
+#define DR_GLOBAL_ENABLE (0x2) /* Global enable for reg 0 */
#define DR_ENABLE_SIZE 2 /* 2 enable bits per register */

#define DR_LOCAL_ENABLE_MASK (0x55) /* Set local bits for all 4 regs */
@@ -61,4 +63,32 @@
#define DR_LOCAL_SLOWDOWN (0x100) /* Local slow the pipeline */
#define DR_GLOBAL_SLOWDOWN (0x200) /* Global slow the pipeline */

+
+/*
+ * HW breakpoint additions
+ */
+#ifdef __KERNEL__
+
+#define HB_NUM 4 /* Number of hardware breakpoints */
+
+/* For process management */
+void flush_thread_hw_breakpoint(struct task_struct *tsk);
+int copy_thread_hw_breakpoint(struct task_struct *tsk,
+ struct task_struct *child, unsigned long clone_flags);
+void dump_thread_hw_breakpoint(struct task_struct *tsk, int u_debugreg[8]);
+void switch_to_thread_hw_breakpoint(struct task_struct *tsk);
+
+/* For CPU management */
+void load_debug_registers(void);
+static inline void disable_debug_registers(void)
+{
+ set_debugreg(0, 7);
+}
+
+/* For use by ptrace */
+unsigned long thread_get_debugreg(struct task_struct *tsk, int n);
+int thread_set_debugreg(struct task_struct *tsk, int n, unsigned long val);
+
+#endif /* __KERNEL__ */
+
#endif
Index: usb-2.6/include/asm-i386/processor.h
===================================================================
--- usb-2.6.orig/include/asm-i386/processor.h
+++ usb-2.6/include/asm-i386/processor.h
@@ -354,8 +354,9 @@ struct thread_struct {
unsigned long esp;
unsigned long fs;
unsigned long gs;
-/* Hardware debugging registers */
- unsigned long debugreg[8]; /* %%db0-7 debug registers */
+/* Hardware breakpoint info */
+ unsigned long vdr6;
+ struct thread_hw_breakpoint *hw_breakpoint_info;
/* fault info */
unsigned long cr2, trap_no, error_code;
/* floating point info */
Index: usb-2.6/arch/i386/kernel/hw_breakpoint.c
===================================================================
--- /dev/null
+++ usb-2.6/arch/i386/kernel/hw_breakpoint.c
@@ -0,0 +1,653 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) 2007 Alan Stern
+ */
+
+/*
+ * HW_breakpoint: a unified kernel/user-space hardware breakpoint facility,
+ * using the CPU's debug registers.
+ */
+
+/* QUESTIONS
+
+ How to know whether RF should be cleared when setting a user
+ execution breakpoint?
+
+*/
+
+#include <linux/init.h>
+#include <linux/irqflags.h>
+#include <linux/kdebug.h>
+#include <linux/kernel.h>
+#include <linux/kprobes.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/notifier.h>
+#include <linux/rcupdate.h>
+#include <linux/sched.h>
+#include <linux/smp.h>
+
+#include <asm/debugreg.h>
+#include <asm/hw_breakpoint.h>
+#include <asm/percpu.h>
+#include <asm/processor.h>
+
+
+/* Per-thread HW breakpoint and debug register info */
+struct thread_hw_breakpoint {
+
+ /* utrace support */
+ struct list_head node; /* Entry in thread list */
+ struct list_head thread_bps; /* Thread's breakpoints */
+ struct hw_breakpoint *bps[HB_NUM]; /* Highest-priority bps */
+ unsigned long tdr[HB_NUM]; /* and their addresses */
+ int num_installed; /* Number of installed bps */
+ unsigned gennum; /* update-generation number */
+
+ /* Only the portions below are arch-specific */
+
+ /* ptrace support -- Note that vdr6 is stored directly in the
+ * thread_struct so that it is always available.
+ */
+ unsigned long vdr7; /* Virtualized DR7 */
+ struct hw_breakpoint vdr_bps[HB_NUM]; /* Breakpoints
+ representing virtualized debug registers 0 - 3 */
+ unsigned long tdr7; /* Thread's DR7 value */
+ unsigned long tkdr7; /* Thread + kernel DR7 value */
+};
+
+/* Kernel-space breakpoint data */
+struct kernel_bp_data {
+ unsigned gennum; /* Generation number */
+ int num_kbps; /* Number of kernel bps */
+ struct hw_breakpoint *bps[HB_NUM]; /* Loaded breakpoints */
+
+ /* Only the portions below are arch-specific */
+ unsigned long mkdr7; /* Masked kernel DR7 value */
+};
+
+/* Per-CPU debug register info */
+struct cpu_hw_breakpoint {
+ struct kernel_bp_data *cur_kbpdata; /* Current kbpdata[] entry */
+ struct task_struct *bp_task; /* The thread whose bps
+ are currently loaded in the debug registers */
+};
+
+static DEFINE_PER_CPU(struct cpu_hw_breakpoint, cpu_info);
+
+/* Global info */
+static struct kernel_bp_data kbpdata[2]; /* Old and new settings */
+static int cur_kbpindex; /* Alternates 0, 1, ... */
+static struct kernel_bp_data *cur_kbpdata = &kbpdata[0];
+ /* Always equal to &kbpdata[cur_kbpindex] */
+
+static u8 tprio[HB_NUM]; /* Thread bp max priorities */
+static LIST_HEAD(kernel_bps); /* Kernel breakpoint list */
+static LIST_HEAD(thread_list); /* thread_hw_breakpoint list */
+static DEFINE_MUTEX(hw_breakpoint_mutex); /* Protects everything */
+
+/* Only the portions below are arch-specific */
+
+static unsigned long kdr7; /* Unmasked kernel DR7 value */
+
+/* Masks for the bits in DR7 related to kernel breakpoints, for various
+ * values of num_kbps. Entry n is the mask for when there are n kernel
+ * breakpoints, in debug registers 0 - (n-1). The DR_GLOBAL_SLOWDOWN bit
+ * (GE) is handled specially.
+ */
+static const unsigned long kdr7_masks[HB_NUM + 1] = {
+ 0x00000000,
+ 0x000f0003, /* LEN0, R/W0, G0, L0 */
+ 0x00ff000f, /* Same for 0,1 */
+ 0x0fff003f, /* Same for 0,1,2 */
+ 0xffff00ff /* Same for 0,1,2,3 */
+};
+
+
+/* Arch-specific hook routines */
+
+
+/*
+ * Install the kernel breakpoints in their debug registers.
+ */
+static void arch_install_chbi(struct cpu_hw_breakpoint *chbi)
+{
+ struct hw_breakpoint **bps;
+
+ /* Don't allow debug exceptions while we update the registers */
+ set_debugreg(0, 7);
+ chbi->cur_kbpdata = rcu_dereference(cur_kbpdata);
+
+ /* Kernel breakpoints are stored starting in DR0 and going up */
+ bps = chbi->cur_kbpdata->bps;
+ switch (chbi->cur_kbpdata->num_kbps) {
+ case 4:
+ set_debugreg(bps[3]->info.address, 3);
+ case 3:
+ set_debugreg(bps[2]->info.address, 2);
+ case 2:
+ set_debugreg(bps[1]->info.address, 1);
+ case 1:
+ set_debugreg(bps[0]->info.address, 0);
+ }
+ /* No need to set DR6 */
+ set_debugreg(chbi->cur_kbpdata->mkdr7, 7);
+}
+
+/*
+ * Update an out-of-date thread hw_breakpoint info structure.
+ */
+static void arch_update_thbi(struct thread_hw_breakpoint *thbi,
+ struct kernel_bp_data *thr_kbpdata)
+{
+ int num = thr_kbpdata->num_kbps;
+
+ thbi->tkdr7 = thr_kbpdata->mkdr7 | (thbi->tdr7 & ~kdr7_masks[num]);
+}
+
+/*
+ * Install the thread breakpoints in their debug registers.
+ */
+static void arch_install_thbi(struct thread_hw_breakpoint *thbi)
+{
+ /* Install the user breakpoints. Kernel breakpoints are stored
+ * starting in DR0 and going up; there are num_kbps of them.
+ * User breakpoints are stored starting in DR3 and going down,
+ * as many as we have room for.
+ */
+ switch (thbi->num_installed) {
+ case 4:
+ set_debugreg(thbi->tdr[0], 0);
+ case 3:
+ set_debugreg(thbi->tdr[1], 1);
+ case 2:
+ set_debugreg(thbi->tdr[2], 2);
+ case 1:
+ set_debugreg(thbi->tdr[3], 3);
+ }
+ /* No need to set DR6 */
+ set_debugreg(thbi->tkdr7, 7);
+}
+
+/*
+ * Install the debug register values for just the kernel, no thread.
+ */
+static void arch_install_none(struct cpu_hw_breakpoint *chbi)
+{
+ set_debugreg(chbi->cur_kbpdata->mkdr7, 7);
+}
+
+/*
+ * Create a new kbpdata entry.
+ */
+static void arch_new_kbpdata(struct kernel_bp_data *new_kbpdata)
+{
+ int num = new_kbpdata->num_kbps;
+
+ new_kbpdata->mkdr7 = kdr7 & (kdr7_masks[num] | DR_GLOBAL_SLOWDOWN);
+}
+
+/*
+ * Store a thread breakpoint array entry's address
+ */
+static void arch_store_thread_bp_array(struct thread_hw_breakpoint *thbi,
+ struct hw_breakpoint *bp, int i)
+{
+ thbi->tdr[i] = bp->info.address;
+}
+
+/*
+ * Check for virtual address in user space.
+ */
+static int arch_check_va_in_userspace(unsigned long va,
+ struct task_struct *tsk)
+{
+#ifndef CONFIG_X86_64
+#define TASK_SIZE_OF(t) TASK_SIZE
+#endif
+ return (va < TASK_SIZE_OF(tsk));
+}
+
+/*
+ * Check for virtual address in kernel space.
+ */
+static int arch_check_va_in_kernelspace(unsigned long va)
+{
+#ifndef CONFIG_X86_64
+#define TASK_SIZE64 TASK_SIZE
+#endif
+ return (va >= TASK_SIZE64);
+}
+
+/*
+ * Store a breakpoint's encoded address, length, and type.
+ */
+static void arch_store_info(struct hw_breakpoint *bp,
+ unsigned long address, unsigned len, unsigned type)
+{
+ bp->info.address = address;
+ bp->info.len = len;
+ bp->info.type = type;
+}
+
+/*
+ * Encode the length, type, Exact, and Enable bits for a particular breakpoint
+ * as stored in debug register 7.
+ */
+static unsigned long encode_dr7(int drnum, unsigned len, unsigned type)
+{
+ unsigned long temp;
+
+ temp = (len | type) & 0xf;
+ temp <<= (DR_CONTROL_SHIFT + drnum * DR_CONTROL_SIZE);
+ temp |= (DR_GLOBAL_ENABLE << (drnum * DR_ENABLE_SIZE)) |
+ DR_GLOBAL_SLOWDOWN;
+ return temp;
+}
+
+/*
+ * Calculate the DR7 value for a list of kernel or user breakpoints.
+ */
+static unsigned long calculate_dr7(struct thread_hw_breakpoint *thbi)
+{
+ int is_user;
+ struct list_head *bp_list;
+ struct hw_breakpoint *bp;
+ int i;
+ int drnum;
+ unsigned long dr7;
+
+ if (thbi) {
+ is_user = 1;
+ bp_list = &thbi->thread_bps;
+ drnum = HB_NUM - 1;
+ } else {
+ is_user = 0;
+ bp_list = &kernel_bps;
+ drnum = 0;
+ }
+
+ /* Kernel bps are assigned from DR0 on up, and user bps are assigned
+ * from DR3 on down. Accumulate all 4 bps; the kernel DR7 mask will
+ * select the appropriate bits later.
+ */
+ dr7 = 0;
+ i = 0;
+ list_for_each_entry(bp, bp_list, node) {
+
+ /* Get the debug register number and accumulate the bits */
+ dr7 |= encode_dr7(drnum, bp->info.len, bp->info.type);
+ if (++i >= HB_NUM)
+ break;
+ if (is_user)
+ --drnum;
+ else
+ ++drnum;
+ }
+ return dr7;
+}
+
+/*
+ * Register a new user breakpoint structure.
+ */
+static void arch_register_user_hw_breakpoint(struct hw_breakpoint *bp,
+ struct thread_hw_breakpoint *thbi)
+{
+ thbi->tdr7 = calculate_dr7(thbi);
+
+ /* If this is an execution breakpoint for the current PC address,
+ * we should clear the task's RF so that the bp will be certain
+ * to trigger.
+ *
+ * FIXME: It's not so easy to get hold of the task's PC as a linear
+ * address! ptrace.c does this already...
+ */
+}
+
+/*
+ * Unregister a user breakpoint structure.
+ */
+static void arch_unregister_user_hw_breakpoint(struct hw_breakpoint *bp,
+ struct thread_hw_breakpoint *thbi)
+{
+ thbi->tdr7 = calculate_dr7(thbi);
+}
+
+/*
+ * Register a kernel breakpoint structure.
+ */
+static void arch_register_kernel_hw_breakpoint(
+ struct hw_breakpoint *bp)
+{
+ kdr7 = calculate_dr7(NULL);
+}
+
+/*
+ * Unregister a kernel breakpoint structure.
+ */
+static void arch_unregister_kernel_hw_breakpoint(
+ struct hw_breakpoint *bp)
+{
+ kdr7 = calculate_dr7(NULL);
+}
+
+
+/* End of arch-specific hook routines */
+
+
+/*
+ * Copy out the debug register information for a core dump.
+ *
+ * tsk must be equal to current.
+ */
+void dump_thread_hw_breakpoint(struct task_struct *tsk, int u_debugreg[8])
+{
+ struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+ int i;
+
+ memset(u_debugreg, 0, 8 * sizeof(u_debugreg[0]));
+ if (thbi) {
+ for (i = 0; i < HB_NUM; ++i)
+ u_debugreg[i] = thbi->vdr_bps[i].info.address;
+ u_debugreg[7] = thbi->vdr7;
+ }
+ u_debugreg[6] = tsk->thread.vdr6;
+}
+
+/*
+ * Ptrace support: breakpoint trigger routine.
+ */
+
+static struct thread_hw_breakpoint *alloc_thread_hw_breakpoint(
+ struct task_struct *tsk);
+static int __register_user_hw_breakpoint(struct task_struct *tsk,
+ struct hw_breakpoint *bp,
+ unsigned long address, unsigned len, unsigned type);
+static void __unregister_user_hw_breakpoint(struct task_struct *tsk,
+ struct hw_breakpoint *bp);
+
+static void ptrace_triggered(struct hw_breakpoint *bp, struct pt_regs *regs)
+{
+ struct task_struct *tsk = current;
+ struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+ int i;
+
+ /* Store in the virtual DR6 register the fact that the breakpoint
+ * was hit so the thread's debugger will see it.
+ */
+ if (thbi) {
+ i = bp - thbi->vdr_bps;
+ tsk->thread.vdr6 |= (DR_TRAP0 << i);
+ send_sigtrap(tsk, regs, 0);
+ }
+}
+
+/*
+ * Handle PTRACE_PEEKUSR calls for the debug register area.
+ */
+unsigned long thread_get_debugreg(struct task_struct *tsk, int n)
+{
+ struct thread_hw_breakpoint *thbi;
+ unsigned long val = 0;
+
+ mutex_lock(&hw_breakpoint_mutex);
+ thbi = tsk->thread.hw_breakpoint_info;
+ if (n < HB_NUM) {
+ if (thbi)
+ val = thbi->vdr_bps[n].info.address;
+ } else if (n == 6) {
+ val = tsk->thread.vdr6;
+ } else if (n == 7) {
+ if (thbi)
+ val = thbi->vdr7;
+ }
+ mutex_unlock(&hw_breakpoint_mutex);
+ return val;
+}
+
+/*
+ * Decode the length and type bits for a particular breakpoint as
+ * stored in debug register 7. Return the "enabled" status.
+ */
+static int decode_dr7(unsigned long dr7, int bpnum, unsigned *len,
+ unsigned *type)
+{
+ int temp = dr7 >> (DR_CONTROL_SHIFT + bpnum * DR_CONTROL_SIZE);
+
+ *len = (temp & 0xc) | 0x40;
+ *type = (temp & 0x3) | 0x80;
+ return (dr7 >> (bpnum * DR_ENABLE_SIZE)) & 0x3;
+}
+
+/*
+ * Handle ptrace writes to debug register 7.
+ */
+static int ptrace_write_dr7(struct task_struct *tsk,
+ struct thread_hw_breakpoint *thbi, unsigned long data)
+{
+ struct hw_breakpoint *bp;
+ int i;
+ int rc = 0;
+ unsigned long old_dr7 = thbi->vdr7;
+
+ data &= ~DR_CONTROL_RESERVED;
+
+ /* Loop through all the hardware breakpoints, making the
+ * appropriate changes to each.
+ */
+ restore_settings:
+ thbi->vdr7 = data;
+ bp = &thbi->vdr_bps[0];
+ for (i = 0; i < HB_NUM; (++i, ++bp)) {
+ int enabled;
+ unsigned len, type;
+
+ enabled = decode_dr7(data, i, &len, &type);
+
+ /* Unregister the breakpoint before trying to change it */
+ if (bp->status)
+ __unregister_user_hw_breakpoint(tsk, bp);
+
+ /* Now register the breakpoint if it should be enabled.
+ * New invalid entries will raise an error here.
+ */
+ if (enabled) {
+ bp->triggered = ptrace_triggered;
+ bp->priority = HW_BREAKPOINT_PRIO_PTRACE;
+ if (__register_user_hw_breakpoint(tsk, bp,
+ bp->info.address, len, type) < 0 && rc == 0)
+ break;
+ }
+ }
+
+ /* If anything above failed, restore the original settings */
+ if (i < HB_NUM) {
+ rc = -EIO;
+ data = old_dr7;
+ goto restore_settings;
+ }
+ return rc;
+}
+
+/*
+ * Handle PTRACE_POKEUSR calls for the debug register area.
+ */
+int thread_set_debugreg(struct task_struct *tsk, int n, unsigned long val)
+{
+ struct thread_hw_breakpoint *thbi;
+ int rc = -EIO;
+
+ /* We have to hold this lock the entire time, to prevent thbi
+ * from being deallocated out from under us.
+ */
+ mutex_lock(&hw_breakpoint_mutex);
+
+ /* There are no DR4 or DR5 registers */
+ if (n == 4 || n == 5)
+ ;
+
+ /* Writes to DR6 modify the virtualized value */
+ else if (n == 6) {
+ tsk->thread.vdr6 = val;
+ rc = 0;
+ }
+
+ else if (!tsk->thread.hw_breakpoint_info && val == 0)
+ rc = 0; /* Minor optimization */
+
+ else if ((thbi = alloc_thread_hw_breakpoint(tsk)) == NULL)
+ rc = -ENOMEM;
+
+ /* Writes to DR0 - DR3 change a breakpoint address */
+ else if (n < HB_NUM) {
+ struct hw_breakpoint *bp = &thbi->vdr_bps[n];
+
+ /* If the breakpoint is registered then unregister it,
+ * change it, and re-register it. Revert to the original
+ * address if an error occurs.
+ */
+ if (bp->status) {
+ unsigned long old_addr = bp->info.address;
+
+ __unregister_user_hw_breakpoint(tsk, bp);
+ rc = __register_user_hw_breakpoint(tsk, bp,
+ val, bp->info.len, bp->info.type);
+ if (rc < 0) {
+ __register_user_hw_breakpoint(tsk, bp,
+ old_addr,
+ bp->info.len, bp->info.type);
+ }
+ } else {
+ bp->info.address = val;
+ rc = 0;
+ }
+ }
+
+ /* All that's left is DR7 */
+ else
+ rc = ptrace_write_dr7(tsk, thbi, val);
+
+ mutex_unlock(&hw_breakpoint_mutex);
+ return rc;
+}
+
+
+/*
+ * Handle debug exception notifications.
+ */
+
+static void switch_to_none_hw_breakpoint(void);
+
+static int __kprobes hw_breakpoint_handler(struct die_args *args)
+{
+ struct cpu_hw_breakpoint *chbi;
+ int i;
+ struct hw_breakpoint *bp;
+ struct thread_hw_breakpoint *thbi = NULL;
+
+ /* The DR6 value is stored in args->err */
+#define DR6 (args->err)
+
+ if (!(DR6 & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)))
+ return NOTIFY_DONE;
+
+ /* Assert that local interrupts are disabled */
+
+ /* Reset the DRn bits in the virtualized register value.
+ * The ptrace trigger routine will add in whatever is needed.
+ */
+ current->thread.vdr6 &= ~(DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3);
+
+ /* Are we a victim of lazy debug-register switching? */
+ chbi = &per_cpu(cpu_info, get_cpu());
+ if (!chbi->bp_task)
+ ;
+ else if (chbi->bp_task != current) {
+
+ /* No user breakpoints are valid. Perform the belated
+ * debug-register switch.
+ */
+ switch_to_none_hw_breakpoint();
+ } else {
+ thbi = chbi->bp_task->thread.hw_breakpoint_info;
+ }
+
+ /* Disable all breakpoints so that the callbacks can run without
+ * triggering recursive debug exceptions.
+ */
+ set_debugreg(0, 7);
+
+ /* Handle all the breakpoints that were triggered */
+ for (i = 0; i < HB_NUM; ++i) {
+ if (likely(!(DR6 & (DR_TRAP0 << i))))
+ continue;
+
+ /* Find the corresponding hw_breakpoint structure and
+ * invoke its triggered callback.
+ */
+ if (i < chbi->cur_kbpdata->num_kbps)
+ bp = chbi->cur_kbpdata->bps[i];
+ else if (thbi)
+ bp = thbi->bps[i];
+ else /* False alarm due to lazy DR switching */
+ continue;
+ if (bp) { /* Should always be non-NULL */
+
+ /* Set RF at execution breakpoints */
+ if (bp->info.type == HW_BREAKPOINT_EXECUTE)
+ args->regs->eflags |= X86_EFLAGS_RF;
+ (bp->triggered)(bp, args->regs);
+ }
+ }
+
+ /* Re-enable the breakpoints */
+ set_debugreg(thbi ? thbi->tkdr7 : chbi->cur_kbpdata->mkdr7, 7);
+ put_cpu_no_resched();
+
+ if (!(DR6 & ~(DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)))
+ return NOTIFY_STOP;
+
+ return NOTIFY_DONE;
+#undef DR6
+}
+
+/*
+ * Die-notifier callback: pass debug exceptions on to the handler above.
+ */
+static int __kprobes hw_breakpoint_exceptions_notify(
+ struct notifier_block *unused, unsigned long val, void *data)
+{
+ if (val != DIE_DEBUG)
+ return NOTIFY_DONE;
+ return hw_breakpoint_handler(data);
+}
+
+static struct notifier_block hw_breakpoint_exceptions_nb = {
+ .notifier_call = hw_breakpoint_exceptions_notify
+};
+
+static int __init init_hw_breakpoint(void)
+{
+ load_debug_registers();
+ return register_die_notifier(&hw_breakpoint_exceptions_nb);
+}
+
+core_initcall(init_hw_breakpoint);
+
+
+/* Grab the arch-independent code */
+
+#include "../../../kernel/hw_breakpoint.c"
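
For anyone following the DR7 handling in decode_dr7() and ptrace_write_dr7() above, here is a small user-space sketch of how the per-breakpoint fields sit in the register. It is only an illustration and not part of the patch: struct dr7_fields and sketch_decode_dr7() are made-up names, the three constants are restated from asm-i386/debugreg.h so the sketch stands alone, and unlike decode_dr7() it returns the raw R/W value and a byte count rather than the patch's tagged len/type values (the 0x40 and 0x80 bits).

```c
#include <assert.h>

/* Restated from asm-i386/debugreg.h so the sketch compiles on its own. */
#define DR_CONTROL_SHIFT 16	/* first R/W+LEN nibble starts at bit 16 */
#define DR_CONTROL_SIZE  4	/* R/W+LEN bits per breakpoint */
#define DR_ENABLE_SIZE   2	/* local+global enable bits per breakpoint */

/* Hypothetical holder for the decoded fields; not defined by the patch. */
struct dr7_fields {
	unsigned rw;		/* 0 = execute, 1 = write, 3 = read/write */
	unsigned len_bytes;	/* 1, 2 or 4 on i386 */
	unsigned enabled;	/* nonzero if the L or G bit is set */
};

static struct dr7_fields sketch_decode_dr7(unsigned long dr7, int bpnum)
{
	struct dr7_fields f;
	unsigned long nibble;

	assert(bpnum >= 0 && bpnum < 4);

	/* Each breakpoint owns one nibble: bits 0-1 are R/W, bits 2-3 LEN. */
	nibble = dr7 >> (DR_CONTROL_SHIFT + bpnum * DR_CONTROL_SIZE);
	f.rw = nibble & 0x3;

	/* LEN 0, 1, 3 mean 1, 2, 4 bytes; LEN 2 is undefined on i386. */
	f.len_bytes = ((nibble >> 2) & 0x3) + 1;

	/* The enable bits live in the low byte, two per breakpoint. */
	f.enabled = (dr7 >> (bpnum * DR_ENABLE_SIZE)) & 0x3;
	return f;
}
```

With dr7 = 0x000D0001, for instance, breakpoint 0 comes out as a locally enabled 4-byte write watchpoint while breakpoint 1 is disabled.
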
Index: usb-2.6/arch/i386/kernel/ptrace.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/ptrace.c
+++ usb-2.6/arch/i386/kernel/ptrace.c
@@ -382,11 +382,11 @@ long arch_ptrace(struct task_struct *chi
tmp = 0; /* Default return condition */
if(addr < FRAME_SIZE*sizeof(long))
tmp = getreg(child, addr);
- if(addr >= (long) &dummy->u_debugreg[0] &&
- addr <= (long) &dummy->u_debugreg[7]){
+ else if (addr >= (long) &dummy->u_debugreg[0] &&
+ addr <= (long) &dummy->u_debugreg[7]) {
addr -= (long) &dummy->u_debugreg[0];
addr = addr >> 2;
- tmp = child->thread.debugreg[addr];
+ tmp = thread_get_debugreg(child, addr);
}
ret = put_user(tmp, datap);
break;
@@ -416,59 +416,11 @@ long arch_ptrace(struct task_struct *chi
have to be selective about what portions we allow someone
to modify. */

- ret = -EIO;
- if(addr >= (long) &dummy->u_debugreg[0] &&
- addr <= (long) &dummy->u_debugreg[7]){
-
- if(addr == (long) &dummy->u_debugreg[4]) break;
- if(addr == (long) &dummy->u_debugreg[5]) break;
- if(addr < (long) &dummy->u_debugreg[4] &&
- ((unsigned long) data) >= TASK_SIZE-3) break;
-
- /* Sanity-check data. Take one half-byte at once with
- * check = (val >> (16 + 4*i)) & 0xf. It contains the
- * R/Wi and LENi bits; bits 0 and 1 are R/Wi, and bits
- * 2 and 3 are LENi. Given a list of invalid values,
- * we do mask |= 1 << invalid_value, so that
- * (mask >> check) & 1 is a correct test for invalid
- * values.
- *
- * R/Wi contains the type of the breakpoint /
- * watchpoint, LENi contains the length of the watched
- * data in the watchpoint case.
- *
- * The invalid values are:
- * - LENi == 0x10 (undefined), so mask |= 0x0f00.
- * - R/Wi == 0x10 (break on I/O reads or writes), so
- * mask |= 0x4444.
- * - R/Wi == 0x00 && LENi != 0x00, so we have mask |=
- * 0x1110.
- *
- * Finally, mask = 0x0f00 | 0x4444 | 0x1110 == 0x5f54.
- *
- * See the Intel Manual "System Programming Guide",
- * 15.2.4
- *
- * Note that LENi == 0x10 is defined on x86_64 in long
- * mode (i.e. even for 32-bit userspace software, but
- * 64-bit kernel), so the x86_64 mask value is 0x5454.
- * See the AMD manual no. 24593 (AMD64 System
- * Programming)*/
-
- if(addr == (long) &dummy->u_debugreg[7]) {
- data &= ~DR_CONTROL_RESERVED;
- for(i=0; i<4; i++)
- if ((0x5f54 >> ((data >> (16 + 4*i)) & 0xf)) & 1)
- goto out_tsk;
- if (data)
- set_tsk_thread_flag(child, TIF_DEBUG);
- else
- clear_tsk_thread_flag(child, TIF_DEBUG);
- }
- addr -= (long) &dummy->u_debugreg;
- addr = addr >> 2;
- child->thread.debugreg[addr] = data;
- ret = 0;
+ if (addr >= (long) &dummy->u_debugreg[0] &&
+ addr <= (long) &dummy->u_debugreg[7]) {
+ addr -= (long) &dummy->u_debugreg;
+ addr = addr >> 2;
+ ret = thread_set_debugreg(child, addr, data);
}
break;

@@ -624,7 +576,6 @@ long arch_ptrace(struct task_struct *chi
ret = ptrace_request(child, request, addr, data);
break;
}
- out_tsk:
return ret;
}

Index: usb-2.6/arch/i386/kernel/Makefile
===================================================================
--- usb-2.6.orig/arch/i386/kernel/Makefile
+++ usb-2.6/arch/i386/kernel/Makefile
@@ -7,7 +7,8 @@ extra-y := head.o init_task.o vmlinux.ld
obj-y := process.o signal.o entry.o traps.o irq.o \
ptrace.o time.o ioport.o ldt.o setup.o i8259.o sys_i386.o \
pci-dma.o i386_ksyms.o i387.o bootflag.o e820.o\
- quirks.o i8237.o topology.o alternative.o i8253.o tsc.o
+ quirks.o i8237.o topology.o alternative.o i8253.o tsc.o \
+ hw_breakpoint.o

obj-$(CONFIG_STACKTRACE) += stacktrace.o
obj-y += cpu/
Index: usb-2.6/arch/i386/power/cpu.c
===================================================================
--- usb-2.6.orig/arch/i386/power/cpu.c
+++ usb-2.6/arch/i386/power/cpu.c
@@ -11,6 +11,7 @@
#include <linux/suspend.h>
#include <asm/mtrr.h>
#include <asm/mce.h>
+#include <asm/debugreg.h>

static struct saved_context saved_context;

@@ -46,6 +47,8 @@ void __save_processor_state(struct saved
ctxt->cr2 = read_cr2();
ctxt->cr3 = read_cr3();
ctxt->cr4 = read_cr4();
+
+ disable_debug_registers();
}

void save_processor_state(void)
@@ -70,20 +73,7 @@ static void fix_processor_context(void)

load_TR_desc(); /* This does ltr */
load_LDT(&current->active_mm->context); /* This does lldt */
-
- /*
- * Now maybe reload the debug registers
- */
- if (current->thread.debugreg[7]){
- set_debugreg(current->thread.debugreg[0], 0);
- set_debugreg(current->thread.debugreg[1], 1);
- set_debugreg(current->thread.debugreg[2], 2);
- set_debugreg(current->thread.debugreg[3], 3);
- /* no 4 and 5 */
- set_debugreg(current->thread.debugreg[6], 6);
- set_debugreg(current->thread.debugreg[7], 7);
- }
-
+ load_debug_registers();
}

void __restore_processor_state(struct saved_context *ctxt)
Index: usb-2.6/arch/i386/kernel/kprobes.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/kprobes.c
+++ usb-2.6/arch/i386/kernel/kprobes.c
@@ -35,6 +35,7 @@
#include <asm/cacheflush.h>
#include <asm/desc.h>
#include <asm/uaccess.h>
+#include <asm/debugreg.h>

void jprobe_return_end(void);

@@ -660,9 +661,19 @@ int __kprobes kprobe_exceptions_notify(s
ret = NOTIFY_STOP;
break;
case DIE_DEBUG:
- if (post_kprobe_handler(args->regs))
- ret = NOTIFY_STOP;
+ /*
+ * The DR6 value is stored in args->err.
+ * If DR_STEP is set and it's ours, we should clear DR_STEP
+ * from the user's virtualized DR6 register.
+ * Then if no more bits are set we should eat this exception.
+ */
+ if ((args->err & DR_STEP) && post_kprobe_handler(args->regs)) {
+ current->thread.vdr6 &= ~DR_STEP;
+ if ((args->err & ~DR_STEP) == 0)
+ ret = NOTIFY_STOP;
+ }
break;
+
case DIE_GPF:
case DIE_PAGE_FAULT:
/* kprobe_running() needs smp_processor_id() */
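
The DIE_DEBUG change above boils down to a small decision: eat the exception only if the single-step was ours and nothing else is pending in DR6. A stand-alone sketch of that decision, assuming the usual DR_STEP value from asm-i386/debugreg.h; the enum and sketch_kprobe_debug_verdict() are hypothetical names used only here, and step_was_ours stands in for the return value of post_kprobe_handler().

```c
#include <assert.h>
#include <stddef.h>

#define DR_STEP 0x4000	/* single-step bit in DR6, as in asm-i386/debugreg.h */

/* Hypothetical stand-ins for NOTIFY_DONE / NOTIFY_STOP. */
enum sketch_verdict { SKETCH_NOTIFY_DONE, SKETCH_NOTIFY_STOP };

/*
 * dr6:           hardware DR6 value delivered with the exception
 * step_was_ours: nonzero if the kprobes post-handler claimed the step
 * vdr6:          the task's virtualized DR6, updated in place
 */
static enum sketch_verdict sketch_kprobe_debug_verdict(unsigned long dr6,
		int step_was_ours, unsigned long *vdr6)
{
	assert(vdr6 != NULL);

	if ((dr6 & DR_STEP) && step_was_ours) {
		/* Hide our single-step from the task's debugger. */
		*vdr6 &= ~(unsigned long) DR_STEP;

		/* Nothing else pending: the exception is fully consumed. */
		if ((dr6 & ~(unsigned long) DR_STEP) == 0)
			return SKETCH_NOTIFY_STOP;
	}
	/* Let the other notifiers (hw_breakpoint, ptrace) have a look. */
	return SKETCH_NOTIFY_DONE;
}
```
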
Index: usb-2.6/include/asm-generic/hw_breakpoint.h
===================================================================
--- /dev/null
+++ usb-2.6/include/asm-generic/hw_breakpoint.h
@@ -0,0 +1,225 @@
+#ifndef _ASM_GENERIC_HW_BREAKPOINT_H
+#define _ASM_GENERIC_HW_BREAKPOINT_H
+
+#ifndef __ARCH_HW_BREAKPOINT_H
+#error "Please don't include this file directly"
+#endif
+
+#ifdef __KERNEL__
+#include <linux/list.h>
+#include <linux/types.h>
+
+/**
+ * struct hw_breakpoint - unified kernel/user-space hardware breakpoint
+ * @node: internal linked-list management
+ * @triggered: callback invoked when the breakpoint is hit
+ * @installed: callback invoked when the breakpoint is installed
+ * @uninstalled: callback invoked when the breakpoint is uninstalled
+ * @info: arch-specific breakpoint info (address, length, and type)
+ * @priority: requested priority level
+ * @status: current registration/installation status
+ *
+ * %hw_breakpoint structures are the kernel's way of representing
+ * hardware breakpoints. These can be either execute breakpoints
+ * (triggered on instruction execution) or data breakpoints (also known
+ * as "watchpoints", triggered on data access), and the breakpoint's
+ * target address can be located in either kernel space or user space.
+ *
+ * The breakpoint's address, length, and type are highly
+ * architecture-specific. The values are encoded in the @info field; you
+ * specify them when registering the breakpoint. To examine the encoded
+ * values use hw_breakpoint_get_{kaddress,uaddress,len,type}(), declared
+ * below.
+ *
+ * The address is specified as a regular kernel pointer (for kernel-space
+ * breakpoints) or as an %__user pointer (for user-space breakpoints).
+ * With register_user_hw_breakpoint(), the address must refer to a
+ * location in user space. The breakpoint will be active only while the
+ * requested task is running. Conversely with
+ * register_kernel_hw_breakpoint(), the address must refer to a location
+ * in kernel space, and the breakpoint will be active on all CPUs
+ * regardless of the current task.
+ *
+ * The length is the breakpoint's extent in bytes, which is subject to
+ * certain limitations. include/asm/hw_breakpoint.h contains macros
+ * defining the available lengths for a specific architecture. Note that
+ * the address's alignment must match the length. The breakpoint will
+ * catch accesses to any byte in the range from address to address +
+ * (length - 1).
+ *
+ * The breakpoint's type indicates the sort of access that will cause it
+ * to trigger. Possible values may include:
+ *
+ * %HW_BREAKPOINT_EXECUTE (triggered on instruction execution),
+ * %HW_BREAKPOINT_RW (triggered on read or write access),
+ * %HW_BREAKPOINT_WRITE (triggered on write access), and
+ * %HW_BREAKPOINT_READ (triggered on read access).
+ *
+ * Appropriate macros are defined in include/asm/hw_breakpoint.h; not all
+ * possibilities are available on all architectures. Execute breakpoints
+ * must have length equal to the special value %HW_BREAKPOINT_LEN_EXECUTE.
+ *
+ * When a breakpoint gets hit, the @triggered callback is invoked
+ * in_interrupt with a pointer to the %hw_breakpoint structure and the
+ * processor registers. Execute-breakpoint traps occur before the
+ * breakpointed instruction runs; when the callback returns the
+ * instruction is restarted (this time without a debug exception). All
+ * other types of trap occur after the memory access has taken place.
+ * Breakpoints are disabled while @triggered runs, to avoid recursive
+ * traps and allow unhindered access to breakpointed memory.
+ *
+ * Hardware breakpoints are implemented using the CPU's debug registers,
+ * which are a limited hardware resource. Requests to register a
+ * breakpoint will always succeed provided the parameters are valid,
+ * but the breakpoint may not be installed in a debug register right
+ * away. Physical debug registers are allocated based on the priority
+ * level stored in @priority (higher values indicate higher priority).
+ * User-space breakpoints within a single thread compete with one
+ * another, and all user-space breakpoints compete with all kernel-space
+ * breakpoints; however user-space breakpoints in different threads do
+ * not compete. %HW_BREAKPOINT_PRIO_PTRACE is the level used for ptrace
+ * requests; an unobtrusive kernel-space breakpoint will use
+ * %HW_BREAKPOINT_PRIO_NORMAL to avoid disturbing user programs. A
+ * kernel-space breakpoint that always wants to be installed and doesn't
+ * care about disrupting user debugging sessions can specify
+ * %HW_BREAKPOINT_PRIO_HIGH.
+ *
+ * A particular breakpoint may be allocated (installed in) a debug
+ * register or deallocated (uninstalled) from its debug register at any
+ * time, as other breakpoints are registered and unregistered. The
+ * @installed and @uninstalled callbacks are invoked in_atomic when these
+ * events occur. It is legal for @installed or @uninstalled to be %NULL,
+ * however @triggered must not be. Note that it is not possible to
+ * register or unregister a breakpoint from within a callback routine,
+ * since doing so requires a process context. Note also that for user
+ * breakpoints, @installed and @uninstalled may be called during the
+ * middle of a context switch, at a time when it is not safe to call
+ * printk().
+ *
+ * For kernel-space breakpoints, @installed is invoked after the
+ * breakpoint is actually installed and @uninstalled is invoked before
+ * the breakpoint is actually uninstalled. As a result @triggered can
+ * be called when you may not expect it, but this way you will know that
+ * during the time interval from @installed to @uninstalled, all events
+ * are faithfully reported. (It is not possible to do any better than
+ * this in general, because on SMP systems there is no way to set a debug
+ * register simultaneously on all CPUs.) The same isn't always true with
+ * user-space breakpoints, but the differences should not be visible to a
+ * user process.
+ *
+ * If you need to know whether your kernel-space breakpoint was installed
+ * immediately upon registration, you can check the return value from
+ * register_kernel_hw_breakpoint(). If the value is not > 0, you can
+ * give up and unregister the breakpoint right away.
+ *
+ * @node and @status are intended for internal use. However @status
+ * may be read to determine whether or not the breakpoint is currently
+ * installed. (The value is not reliable unless local interrupts are
+ * disabled.)
+ *
+ * This sample code sets a breakpoint on pid_max and registers a callback
+ * function for writes to that variable. Note that it is not portable
+ * as written, because not all architectures support HW_BREAKPOINT_LEN_4.
+ *
+ * ----------------------------------------------------------------------
+ *
+ * #include <asm/hw_breakpoint.h>
+ *
+ * static void triggered(struct hw_breakpoint *bp, struct pt_regs *regs)
+ * {
+ * printk(KERN_DEBUG "Breakpoint triggered\n");
+ * dump_stack();
+ * .......<more debugging output>........
+ * }
+ *
+ * static struct hw_breakpoint my_bp;
+ *
+ * static int init_module(void)
+ * {
+ * ..........<do anything>............
+ * my_bp.triggered = triggered;
+ * my_bp.priority = HW_BREAKPOINT_PRIO_NORMAL;
+ * rc = register_kernel_hw_breakpoint(&my_bp, &pid_max,
+ * HW_BREAKPOINT_LEN_4, HW_BREAKPOINT_WRITE);
+ * ..........<do anything>............
+ * }
+ *
+ * static void cleanup_module(void)
+ * {
+ * ..........<do anything>............
+ * unregister_kernel_hw_breakpoint(&my_bp);
+ * ..........<do anything>............
+ * }
+ *
+ * ----------------------------------------------------------------------
+ *
+ */
+struct hw_breakpoint {
+ struct list_head node;
+ void (*triggered)(struct hw_breakpoint *, struct pt_regs *);
+ void (*installed)(struct hw_breakpoint *);
+ void (*uninstalled)(struct hw_breakpoint *);
+ struct arch_hw_breakpoint info;
+ u8 priority;
+ u8 status;
+};
+
+/*
+ * Inline accessor routines to retrieve the arch-specific parts of
+ * a breakpoint structure:
+ */
+static const void *hw_breakpoint_get_kaddress(struct hw_breakpoint *bp);
+static const void __user *hw_breakpoint_get_uaddress(struct hw_breakpoint *bp);
+static unsigned hw_breakpoint_get_len(struct hw_breakpoint *bp);
+static unsigned hw_breakpoint_get_type(struct hw_breakpoint *bp);
+
+/*
+ * len and type values are defined in include/asm/hw_breakpoint.h.
+ * Available values vary according to the architecture. On i386 the
+ * possibilities are:
+ *
+ * HW_BREAKPOINT_LEN_1
+ * HW_BREAKPOINT_LEN_2
+ * HW_BREAKPOINT_LEN_4
+ * HW_BREAKPOINT_LEN_EXECUTE
+ * HW_BREAKPOINT_RW
+ * HW_BREAKPOINT_WRITE
+ * HW_BREAKPOINT_EXECUTE
+ *
+ * On other architectures HW_BREAKPOINT_LEN_8 may be available, and the
+ * 1-, 2-, and 4-byte lengths may be unavailable. There also may be
+ * HW_BREAKPOINT_READ. You can use #ifdef to check at compile time.
+ */
+
+/* Standard HW breakpoint priority levels (higher value = higher priority) */
+#define HW_BREAKPOINT_PRIO_NORMAL 25
+#define HW_BREAKPOINT_PRIO_PTRACE 50
+#define HW_BREAKPOINT_PRIO_HIGH 75
+
+/* HW breakpoint status values (0 = not registered) */
+#define HW_BREAKPOINT_REGISTERED 1
+#define HW_BREAKPOINT_INSTALLED 2
+
+/*
+ * The following two routines are meant to be called only from within
+ * the ptrace or utrace subsystems. The tsk argument will usually be a
+ * process being debugged by the current task, although it is also legal
+ * for tsk to be the current task. In any case it must be guaranteed
+ * that tsk will not start running in user mode while its breakpoints are
+ * being modified.
+ */
+int register_user_hw_breakpoint(struct task_struct *tsk,
+ struct hw_breakpoint *bp,
+ const void __user *address, unsigned len, unsigned type);
+void unregister_user_hw_breakpoint(struct task_struct *tsk,
+ struct hw_breakpoint *bp);
+
+/*
+ * Kernel breakpoints are not associated with any particular thread.
+ */
+int register_kernel_hw_breakpoint(struct hw_breakpoint *bp,
+ const void *address, unsigned len, unsigned type);
+void unregister_kernel_hw_breakpoint(struct hw_breakpoint *bp);
+
+#endif /* __KERNEL__ */
+#endif /* _ASM_GENERIC_HW_BREAKPOINT_H */
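
The priority competition described in the struct hw_breakpoint comment can be modelled in a few lines of user-space C. This is a rough sketch only, loosely following the loop in balance_kernel_vs_user() from kernel/hw_breakpoint.c in this patch; sketch_count_kernel_slots() and its array arguments are hypothetical, and the callbacks and status bookkeeping are left out.

```c
#include <assert.h>
#include <stddef.h>

#define HB_NUM 4	/* debug registers on i386 */

/* Priority levels as defined in the header above. */
#define HW_BREAKPOINT_PRIO_NORMAL 25
#define HW_BREAKPOINT_PRIO_PTRACE 50
#define HW_BREAKPOINT_PRIO_HIGH   75

/*
 * Hypothetical stand-alone model of the kernel-vs-user arbitration.
 * kprio[]: kernel breakpoint priorities, sorted in decreasing order.
 * tprio[]: per-slot maximum user priorities, highest in tprio[HB_NUM - 1],
 *          0 where no thread wants the slot.
 * Returns how many debug registers go to kernel breakpoints.
 */
static int sketch_count_kernel_slots(const unsigned char *kprio, size_t nk,
		const unsigned char tprio[HB_NUM])
{
	int k = 0;		/* registers given to kernel bps so far */
	int u = HB_NUM - 1;	/* next user slot to consider */
	size_t next = 0;	/* next kernel bp in the sorted list */

	assert(nk == 0 || kprio != NULL);

	while (k <= u) {
		if (next == nk || tprio[u] >= kprio[next]) {
			--u;	/* user side keeps the slot (ties included) */
		} else {
			++k;	/* kernel bp wins a register */
			++next;
		}
	}
	return k;
}
```

For example, with kernel breakpoints at PRIO_HIGH and PRIO_NORMAL and one thread holding two ptrace breakpoints, the kernel ends up with two registers: the HIGH one outranks ptrace, and the NORMAL one takes a slot no thread asked for.
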
Index: usb-2.6/arch/i386/kernel/machine_kexec.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/machine_kexec.c
+++ usb-2.6/arch/i386/kernel/machine_kexec.c
@@ -19,6 +19,7 @@
#include <asm/cpufeature.h>
#include <asm/desc.h>
#include <asm/system.h>
+#include <asm/debugreg.h>

#define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE)))
static u32 kexec_pgd[1024] PAGE_ALIGNED;
@@ -108,6 +109,7 @@ NORET_TYPE void machine_kexec(struct kim

/* Interrupts aren't acceptable while we reboot */
local_irq_disable();
+ disable_debug_registers();

control_page = page_address(image->control_code_page);
memcpy(control_page, relocate_kernel, PAGE_SIZE);
Index: usb-2.6/arch/i386/kernel/smpboot.c
===================================================================
--- usb-2.6.orig/arch/i386/kernel/smpboot.c
+++ usb-2.6/arch/i386/kernel/smpboot.c
@@ -58,6 +58,7 @@
#include <smpboot_hooks.h>
#include <asm/vmi.h>
#include <asm/mtrr.h>
+#include <asm/debugreg.h>

/* Set if we find a B stepping CPU */
static int __devinitdata smp_b_stepping;
@@ -427,6 +428,7 @@ static void __cpuinit start_secondary(vo
local_irq_enable();

wmb();
+ load_debug_registers();
cpu_idle();
}

@@ -1209,6 +1211,7 @@ int __cpu_disable(void)
fixup_irqs(map);
/* It's now safe to remove this processor from the online map */
cpu_clear(cpu, cpu_online_map);
+ disable_debug_registers();
return 0;
}

Index: usb-2.6/kernel/hw_breakpoint.c
===================================================================
--- /dev/null
+++ usb-2.6/kernel/hw_breakpoint.c
@@ -0,0 +1,777 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) 2007 Alan Stern
+ */
+
+/*
+ * HW_breakpoint: a unified kernel/user-space hardware breakpoint facility,
+ * using the CPU's debug registers.
+ *
+ * This file contains the arch-independent routines. It is not meant
+ * to be compiled as a standalone source file; rather it should be
+ * #include'd by the arch-specific implementation.
+ */
+
+
+/*
+ * Install the debug register values for a new thread.
+ */
+void switch_to_thread_hw_breakpoint(struct task_struct *tsk)
+{
+ struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+ struct cpu_hw_breakpoint *chbi;
+ struct kernel_bp_data *thr_kbpdata;
+
+ /* This routine is on the hot path; it gets called for every
+ * context switch into a task with active breakpoints. We
+ * must make sure that the common case executes as quickly as
+ * possible.
+ */
+ chbi = &per_cpu(cpu_info, get_cpu());
+ chbi->bp_task = tsk;
+
+ /* Use RCU to synchronize with external updates */
+ rcu_read_lock();
+
+ /* Other CPUs might be making updates to the list of kernel
+ * breakpoints at this time. If they are, they will modify
+ * the other entry in kbpdata[] -- the one not pointed to
+ * by chbi->cur_kbpdata. So the update itself won't affect
+ * us directly.
+ *
+ * However when the update is finished, an IPI will arrive
+ * telling this CPU to change chbi->cur_kbpdata. We need
+ * to use a single consistent kbpdata[] entry, the present one.
+ * So we'll copy the pointer to a local variable, thr_kbpdata,
+ * and we must prevent the compiler from aliasing the two
+ * pointers. Only a compiler barrier is required, not a full
+ * memory barrier, because everything takes place on a single CPU.
+ */
+ restart:
+ thr_kbpdata = chbi->cur_kbpdata;
+ barrier();
+
+ /* Normally we can keep the same debug register settings as the
+ * last time this task ran. But if the kernel breakpoints have
+ * changed or any user breakpoints have been registered or
+ * unregistered, we need to handle the updates and possibly
+ * send out some notifications.
+ */
+ if (unlikely(thbi->gennum != thr_kbpdata->gennum)) {
+ struct hw_breakpoint *bp;
+ int i;
+ int num;
+
+ thbi->gennum = thr_kbpdata->gennum;
+ arch_update_thbi(thbi, thr_kbpdata);
+ num = thr_kbpdata->num_kbps;
+
+ /* This code can be invoked while a debugger is actively
+ * updating the thread's breakpoint list (for example, if
+ * someone sends SIGKILL to the task). We use RCU to
+ * protect our access to the list pointers. */
+ thbi->num_installed = 0;
+ i = HB_NUM;
+ list_for_each_entry_rcu(bp, &thbi->thread_bps, node) {
+
+ /* If this register is allocated for kernel bps,
+ * don't install. Otherwise do. */
+ if (--i < num) {
+ if (bp->status == HW_BREAKPOINT_INSTALLED) {
+ if (bp->uninstalled)
+ (bp->uninstalled)(bp);
+ bp->status = HW_BREAKPOINT_REGISTERED;
+ }
+ } else {
+ ++thbi->num_installed;
+ if (bp->status != HW_BREAKPOINT_INSTALLED) {
+ bp->status = HW_BREAKPOINT_INSTALLED;
+ if (bp->installed)
+ (bp->installed)(bp);
+ }
+ }
+ }
+ }
+
+ /* Set the debug register */
+ arch_install_thbi(thbi);
+
+ /* Were there any kernel breakpoint changes while we were running? */
+ if (unlikely(chbi->cur_kbpdata != thr_kbpdata)) {
+
+ /* Some debug registers may now be assigned to kernel bps and
+ * we might have messed them up. Reload all the kernel bps
+ * and then reload the thread bps.
+ */
+ arch_install_chbi(chbi);
+ goto restart;
+ }
+
+ rcu_read_unlock();
+ put_cpu_no_resched();
+}
+
+/*
+ * Install the debug register values for just the kernel, no thread.
+ */
+static void switch_to_none_hw_breakpoint(void)
+{
+ struct cpu_hw_breakpoint *chbi;
+
+ chbi = &per_cpu(cpu_info, get_cpu());
+ chbi->bp_task = NULL;
+
+ /* This routine gets called from only two places. In one
+ * the caller holds the hw_breakpoint_mutex; in the other
+ * interrupts are disabled. In either case, no kernel
+ * breakpoint updates can arrive while the routine runs.
+ * So we don't need to use RCU.
+ */
+ arch_install_none(chbi);
+ put_cpu_no_resched();
+}
+
+/*
+ * Update the debug registers on this CPU.
+ */
+static void update_this_cpu(void *unused)
+{
+ struct cpu_hw_breakpoint *chbi;
+ struct task_struct *tsk = current;
+
+ chbi = &per_cpu(cpu_info, get_cpu());
+
+ /* Install both the kernel and the user breakpoints */
+ arch_install_chbi(chbi);
+ if (test_tsk_thread_flag(tsk, TIF_DEBUG))
+ switch_to_thread_hw_breakpoint(tsk);
+
+ put_cpu_no_resched();
+}
+
+/*
+ * Tell all CPUs to update their debug registers.
+ *
+ * The caller must hold hw_breakpoint_mutex.
+ */
+static void update_all_cpus(void)
+{
+ /* We don't need to use any sort of memory barrier. The IPI
+ * carried out by on_each_cpu() includes its own barriers.
+ */
+ on_each_cpu(update_this_cpu, NULL, 0, 0);
+ synchronize_rcu();
+}
+
+/*
+ * Load the debug registers during startup of a CPU.
+ */
+void load_debug_registers(void)
+{
+ unsigned long flags;
+
+ /* Prevent IPIs for new kernel breakpoint updates */
+ local_irq_save(flags);
+
+ rcu_read_lock();
+ update_this_cpu(NULL);
+ rcu_read_unlock();
+
+ local_irq_restore(flags);
+}
+
+/*
+ * Take the 4 highest-priority breakpoints in a thread and accumulate
+ * their priorities in tprio. Highest-priority entry is in tprio[3].
+ */
+static void accum_thread_tprio(struct thread_hw_breakpoint *thbi)
+{
+ int i;
+
+ for (i = HB_NUM - 1; i >= 0 && thbi->bps[i]; --i)
+ tprio[i] = max(tprio[i], thbi->bps[i]->priority);
+}
+
+/*
+ * Recalculate the value of the tprio array, the maximum priority levels
+ * requested by user breakpoints in all threads.
+ *
+ * Each thread has a list of registered breakpoints, kept in order of
+ * decreasing priority. We'll set tprio[3] to the maximum priority of
+ * the first entries in all the lists, tprio[2] to the maximum priority
+ * of the second entries in all the lists, etc. In the end, we'll know
+ * that no thread requires breakpoints with priorities higher than the
+ * values in tprio.
+ *
+ * The caller must hold hw_breakpoint_mutex.
+ */
+static void recalc_tprio(void)
+{
+ struct thread_hw_breakpoint *thbi;
+
+ memset(tprio, 0, sizeof tprio);
+
+ /* Loop through all threads having registered breakpoints
+ * and accumulate the maximum priority levels in tprio.
+ */
+ list_for_each_entry(thbi, &thread_list, node)
+ accum_thread_tprio(thbi);
+}
+
+/*
+ * Decide how many debug registers will be allocated to kernel breakpoints
+ * and consequently, how many remain available for user breakpoints.
+ *
+ * The priorities of the entries in the list of registered kernel bps
+ * are compared against the priorities stored in tprio[]. The 4 highest
+ * winners overall get to be installed in a debug register; num_kbps
+ * keeps track of how many of those winners come from the kernel list.
+ *
+ * If num_kbps changes, or if a kernel bp changes its installation status,
+ * then call update_all_cpus() so that the debug registers will be set
+ * correctly on every CPU. If neither condition holds then the set of
+ * kernel bps hasn't changed, and nothing more needs to be done.
+ *
+ * The caller must hold hw_breakpoint_mutex.
+ */
+static void balance_kernel_vs_user(void)
+{
+ int k, u;
+ int changed = 0;
+ struct hw_breakpoint *bp;
+ struct kernel_bp_data *new_kbpdata;
+
+ /* Determine how many debug registers are available for kernel
+ * breakpoints as opposed to user breakpoints, based on the
+ * priorities. Ties are resolved in favor of user bps.
+ */
+ k = 0; /* Next kernel bp to allocate */
+ u = HB_NUM - 1; /* Next user bp to allocate */
+ bp = list_entry(kernel_bps.next, struct hw_breakpoint, node);
+ while (k <= u) {
+ if (&bp->node == &kernel_bps || tprio[u] >= bp->priority)
+ --u; /* User bps win a slot */
+ else {
+ ++k; /* Kernel bp wins a slot */
+ if (bp->status != HW_BREAKPOINT_INSTALLED)
+ changed = 1;
+ bp = list_entry(bp->node.next, struct hw_breakpoint,
+ node);
+ }
+ }
+ if (k != cur_kbpdata->num_kbps)
+ changed = 1;
+
+ /* Notify the remaining kernel breakpoints that they are about
+ * to be uninstalled.
+ */
+ list_for_each_entry_from(bp, &kernel_bps, node) {
+ if (bp->status == HW_BREAKPOINT_INSTALLED) {
+ if (bp->uninstalled)
+ (bp->uninstalled)(bp);
+ bp->status = HW_BREAKPOINT_REGISTERED;
+ changed = 1;
+ }
+ }
+
+ if (changed) {
+ cur_kbpindex ^= 1;
+ new_kbpdata = &kbpdata[cur_kbpindex];
+ new_kbpdata->gennum = cur_kbpdata->gennum + 1;
+ new_kbpdata->num_kbps = k;
+ arch_new_kbpdata(new_kbpdata);
+ u = 0;
+ list_for_each_entry(bp, &kernel_bps, node) {
+ if (u >= k)
+ break;
+ new_kbpdata->bps[u] = bp;
+ ++u;
+ }
+ rcu_assign_pointer(cur_kbpdata, new_kbpdata);
+
+ /* Tell all the CPUs to update their debug registers */
+ update_all_cpus();
+
+ /* Notify the breakpoints that just got installed */
+ for (u = 0; u < k; ++u) {
+ bp = new_kbpdata->bps[u];
+ if (bp->status != HW_BREAKPOINT_INSTALLED) {
+ bp->status = HW_BREAKPOINT_INSTALLED;
+ if (bp->installed)
+ (bp->installed)(bp);
+ }
+ }
+ }
+}
+
+/*
+ * Return the pointer to a thread's hw_breakpoint info area,
+ * and try to allocate one if it doesn't exist.
+ *
+ * The caller must hold hw_breakpoint_mutex.
+ */
+static struct thread_hw_breakpoint *alloc_thread_hw_breakpoint(
+ struct task_struct *tsk)
+{
+ if (!tsk->thread.hw_breakpoint_info && !(tsk->flags & PF_EXITING)) {
+ struct thread_hw_breakpoint *thbi;
+
+ thbi = kzalloc(sizeof(struct thread_hw_breakpoint),
+ GFP_KERNEL);
+ if (thbi) {
+ INIT_LIST_HEAD(&thbi->node);
+ INIT_LIST_HEAD(&thbi->thread_bps);
+
+ /* Force an update the next time tsk runs */
+ thbi->gennum = cur_kbpdata->gennum - 2;
+ tsk->thread.hw_breakpoint_info = thbi;
+ }
+ }
+ return tsk->thread.hw_breakpoint_info;
+}
+
+/*
+ * Erase all the hardware breakpoint info associated with a thread.
+ *
+ * If tsk != current then tsk must not be usable (for example, a
+ * child being cleaned up from a failed fork).
+ */
+void flush_thread_hw_breakpoint(struct task_struct *tsk)
+{
+ struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+ struct hw_breakpoint *bp;
+
+ if (!thbi)
+ return;
+ mutex_lock(&hw_breakpoint_mutex);
+
+ /* Let the breakpoints know they are being uninstalled */
+ list_for_each_entry(bp, &thbi->thread_bps, node) {
+ if (bp->status == HW_BREAKPOINT_INSTALLED && bp->uninstalled)
+ (bp->uninstalled)(bp);
+ bp->status = 0;
+ }
+
+ /* Remove tsk from the list of all threads with registered bps */
+ list_del(&thbi->node);
+
+ /* The thread no longer has any breakpoints associated with it */
+ clear_tsk_thread_flag(tsk, TIF_DEBUG);
+ tsk->thread.hw_breakpoint_info = NULL;
+ kfree(thbi);
+
+ /* Recalculate and rebalance the kernel-vs-user priorities */
+ recalc_tprio();
+ balance_kernel_vs_user();
+
+ /* Actually uninstall the breakpoints if necessary */
+ if (tsk == current)
+ switch_to_none_hw_breakpoint();
+ mutex_unlock(&hw_breakpoint_mutex);
+}
+
+/*
+ * Copy the hardware breakpoint info from a thread to its cloned child.
+ */
+int copy_thread_hw_breakpoint(struct task_struct *tsk,
+ struct task_struct *child, unsigned long clone_flags)
+{
+ /* We will assume that breakpoint settings are not inherited
+ * and the child starts out with no debug registers set.
+ * But what about CLONE_PTRACE?
+ */
+ clear_tsk_thread_flag(child, TIF_DEBUG);
+ return 0;
+}
+
+/*
+ * Store the highest-priority thread breakpoint entries in an array.
+ */
+static void store_thread_bp_array(struct thread_hw_breakpoint *thbi)
+{
+ struct hw_breakpoint *bp;
+ int i;
+
+ i = HB_NUM - 1;
+ list_for_each_entry(bp, &thbi->thread_bps, node) {
+ thbi->bps[i] = bp;
+ arch_store_thread_bp_array(thbi, bp, i);
+ if (--i < 0)
+ break;
+ }
+ while (i >= 0)
+ thbi->bps[i--] = NULL;
+
+ /* Force an update the next time this task runs */
+ thbi->gennum = cur_kbpdata->gennum - 2;
+}
+
+/*
+ * Insert a new breakpoint in a priority-sorted list.
+ * Return the bp's index in the list.
+ *
+ * Thread invariants:
+ * tsk_thread_flag(tsk, TIF_DEBUG) set implies
+ * tsk->thread.hw_breakpoint_info is not NULL.
+ * tsk_thread_flag(tsk, TIF_DEBUG) set iff thbi->thread_bps is non-empty
+ * iff thbi->node is on thread_list.
+ */
+static int insert_bp_in_list(struct hw_breakpoint *bp,
+ struct thread_hw_breakpoint *thbi, struct task_struct *tsk)
+{
+ struct list_head *head;
+ int pos;
+ struct hw_breakpoint *temp_bp;
+
+ /* tsk and thbi are NULL for kernel bps, non-NULL for user bps */
+ if (tsk)
+ head = &thbi->thread_bps;
+ else
+ head = &kernel_bps;
+
+ /* Equal-priority breakpoints get listed first-come-first-served */
+ pos = 0;
+ list_for_each_entry(temp_bp, head, node) {
+ if (bp->priority > temp_bp->priority)
+ break;
+ ++pos;
+ }
+ bp->status = HW_BREAKPOINT_REGISTERED;
+ list_add_tail(&bp->node, &temp_bp->node);
+
+ if (tsk) {
+ store_thread_bp_array(thbi);
+
+ /* Is this the thread's first registered breakpoint? */
+ if (list_empty(&thbi->node)) {
+ set_tsk_thread_flag(tsk, TIF_DEBUG);
+ list_add(&thbi->node, &thread_list);
+ }
+ }
+ return pos;
+}
+
+/*
+ * Remove a breakpoint from its priority-sorted list.
+ *
+ * See the invariants mentioned above.
+ */
+static void remove_bp_from_list(struct hw_breakpoint *bp,
+ struct thread_hw_breakpoint *thbi, struct task_struct *tsk)
+{
+ /* Remove bp from the thread's/kernel's list. If the list is now
+ * empty we must clear the TIF_DEBUG flag. But keep the
+ * thread_hw_breakpoint structure, so that the virtualized debug
+ * register values will remain valid.
+ */
+ list_del(&bp->node);
+ if (tsk) {
+ store_thread_bp_array(thbi);
+
+ if (list_empty(&thbi->thread_bps)) {
+ list_del_init(&thbi->node);
+ clear_tsk_thread_flag(tsk, TIF_DEBUG);
+ }
+ }
+
+ /* Tell the breakpoint it is being uninstalled */
+ if (bp->status == HW_BREAKPOINT_INSTALLED && bp->uninstalled)
+ (bp->uninstalled)(bp);
+ bp->status = 0;
+}
+
+/*
+ * Validate the settings in a hw_breakpoint structure.
+ */
+static int validate_settings(struct hw_breakpoint *bp, struct task_struct *tsk,
+ unsigned long address, unsigned len, unsigned type)
+{
+ int rc = -EINVAL;
+ unsigned long align;
+
+ switch (type) {
+#ifdef HW_BREAKPOINT_EXECUTE
+ case HW_BREAKPOINT_EXECUTE:
+ if (len != HW_BREAKPOINT_LEN_EXECUTE)
+ return rc;
+ break;
+#endif
+#ifdef HW_BREAKPOINT_READ
+ case HW_BREAKPOINT_READ: break;
+#endif
+#ifdef HW_BREAKPOINT_WRITE
+ case HW_BREAKPOINT_WRITE: break;
+#endif
+#ifdef HW_BREAKPOINT_RW
+ case HW_BREAKPOINT_RW: break;
+#endif
+ default:
+ return rc;
+ }
+
+ switch (len) {
+#ifdef HW_BREAKPOINT_LEN_1
+ case HW_BREAKPOINT_LEN_1:
+ align = 0;
+ break;
+#endif
+#ifdef HW_BREAKPOINT_LEN_2
+ case HW_BREAKPOINT_LEN_2:
+ align = 1;
+ break;
+#endif
+#ifdef HW_BREAKPOINT_LEN_4
+ case HW_BREAKPOINT_LEN_4:
+ align = 3;
+ break;
+#endif
+#ifdef HW_BREAKPOINT_LEN_8
+ case HW_BREAKPOINT_LEN_8:
+ align = 7;
+ break;
+#endif
+ default:
+ return rc;
+ }
+
+ /* Check that the low-order bits of the address are appropriate
+ * for the alignment implied by len.
+ */
+ if (address & align)
+ return rc;
+
+ /* Check that the virtual address is in the proper range */
+ if (tsk) {
+ if (!arch_check_va_in_userspace(address, tsk))
+ return rc;
+ } else {
+ if (!arch_check_va_in_kernelspace(address))
+ return rc;
+ }
+
+ if (bp->triggered) {
+ rc = 0;
+ arch_store_info(bp, address, len, type);
+ }
+ return rc;
+}
+
+/*
+ * Actual implementation of register_user_hw_breakpoint.
+ */
+static int __register_user_hw_breakpoint(struct task_struct *tsk,
+ struct hw_breakpoint *bp,
+ unsigned long address, unsigned len, unsigned type)
+{
+ int rc;
+ struct thread_hw_breakpoint *thbi;
+ int pos;
+
+ bp->status = 0;
+ rc = validate_settings(bp, tsk, address, len, type);
+ if (rc)
+ return rc;
+
+ thbi = alloc_thread_hw_breakpoint(tsk);
+ if (!thbi)
+ return -ENOMEM;
+
+ /* Insert bp in the thread's list */
+ pos = insert_bp_in_list(bp, thbi, tsk);
+ arch_register_user_hw_breakpoint(bp, thbi);
+
+ /* Update and rebalance the priorities. We don't need to go through
+ * the list of all threads; adding a breakpoint can only cause the
+ * priorities for this thread to increase.
+ */
+ accum_thread_tprio(thbi);
+ balance_kernel_vs_user();
+
+ /* Did bp get allocated to a debug register? We can tell from its
+ * position in the list. The number of registers allocated to
+ * kernel breakpoints is num_kbps; all the others are available for
+ * user breakpoints. If bp's position in the priority-ordered list
+ * is low enough, it will get a register.
+ */
+ if (pos < HB_NUM - cur_kbpdata->num_kbps) {
+ rc = 1;
+
+ /* Does it need to be installed right now? */
+ if (tsk == current)
+ switch_to_thread_hw_breakpoint(tsk);
+ /* Otherwise it will get installed the next time tsk runs */
+ }
+
+ return rc;
+}
+
+/**
+ * register_user_hw_breakpoint - register a hardware breakpoint for user space
+ * @tsk: the task in whose memory space the breakpoint will be set
+ * @bp: the breakpoint structure to register
+ * @address: location (virtual address) of the breakpoint
+ * @len: encoded extent of the breakpoint address (1, 2, 4, or 8 bytes)
+ * @type: breakpoint type (read-only, write-only, read-write, or execute)
+ *
+ * This routine registers a breakpoint to be associated with @tsk's
+ * memory space and active only while @tsk is running. It does not
+ * guarantee that the breakpoint will be allocated to a debug register
+ * immediately; there may be other higher-priority breakpoints registered
+ * which require the use of all the debug registers.
+ *
+ * @tsk will normally be a process being debugged by the current process,
+ * but it may also be the current process.
+ *
+ * @address, @len, and @type are checked for validity and stored in
+ * encoded form in @bp. @bp->triggered and @bp->priority must be set
+ * properly.
+ *
+ * Returns 1 if @bp is allocated to a debug register, 0 if @bp is
+ * registered but not allowed to be installed, otherwise a negative error
+ * code.
+ */
+int register_user_hw_breakpoint(struct task_struct *tsk,
+ struct hw_breakpoint *bp,
+ const void __user *address, unsigned len, unsigned type)
+{
+ int rc;
+
+ mutex_lock(&hw_breakpoint_mutex);
+ rc = __register_user_hw_breakpoint(tsk, bp,
+ (unsigned long) address, len, type);
+ mutex_unlock(&hw_breakpoint_mutex);
+ return rc;
+}
+
+/*
+ * Actual implementation of unregister_user_hw_breakpoint.
+ */
+static void __unregister_user_hw_breakpoint(struct task_struct *tsk,
+ struct hw_breakpoint *bp)
+{
+ struct thread_hw_breakpoint *thbi = tsk->thread.hw_breakpoint_info;
+
+ if (!bp->status)
+ return; /* Not registered */
+
+ /* Remove bp from the thread's list */
+ remove_bp_from_list(bp, thbi, tsk);
+ arch_unregister_user_hw_breakpoint(bp, thbi);
+
+ /* Recalculate and rebalance the kernel-vs-user priorities,
+ * and actually uninstall bp if necessary.
+ */
+ recalc_tprio();
+ balance_kernel_vs_user();
+ if (tsk == current)
+ switch_to_thread_hw_breakpoint(tsk);
+}
+
+/**
+ * unregister_user_hw_breakpoint - unregister a hardware breakpoint for user space
+ * @tsk: the task in whose memory space the breakpoint is registered
+ * @bp: the breakpoint structure to unregister
+ *
+ * Uninstalls and unregisters @bp.
+ */
+void unregister_user_hw_breakpoint(struct task_struct *tsk,
+ struct hw_breakpoint *bp)
+{
+ mutex_lock(&hw_breakpoint_mutex);
+ __unregister_user_hw_breakpoint(tsk, bp);
+ mutex_unlock(&hw_breakpoint_mutex);
+}
+
+/**
+ * register_kernel_hw_breakpoint - register a hardware breakpoint for kernel space
+ * @bp: the breakpoint structure to register
+ * @address: location (virtual address) of the breakpoint
+ * @len: encoded extent of the breakpoint address (1, 2, 4, or 8 bytes)
+ * @type: breakpoint type (read-only, write-only, read-write, or execute)
+ *
+ * This routine registers a breakpoint to be active at all times. It
+ * does not guarantee that the breakpoint will be allocated to a debug
+ * register immediately; there may be other higher-priority breakpoints
+ * registered which require the use of all the debug registers.
+ *
+ * @address, @len, and @type are checked for validity and stored in
+ * encoded form in @bp. @bp->triggered and @bp->priority must be set
+ * properly.
+ *
+ * Returns 1 if @bp is allocated to a debug register, 0 if @bp is
+ * registered but not allowed to be installed, otherwise a negative error
+ * code.
+ */
+int register_kernel_hw_breakpoint(struct hw_breakpoint *bp,
+ const void *address, unsigned len, unsigned type)
+{
+ int rc;
+ int pos;
+
+ bp->status = 0;
+ rc = validate_settings(bp, NULL, (unsigned long) address, len, type);
+ if (rc)
+ return rc;
+
+ mutex_lock(&hw_breakpoint_mutex);
+
+ /* Insert bp in the kernel's list */
+ pos = insert_bp_in_list(bp, NULL, NULL);
+ arch_register_kernel_hw_breakpoint(bp);
+
+ /* Rebalance the priorities. This will install bp if it
+ * was allocated a debug register.
+ */
+ balance_kernel_vs_user();
+
+ /* Did bp get allocated to a debug register? We can tell from its
+ * position in the list. The number of registers allocated to
+ * kernel breakpoints is num_kbps; all the others are available for
+ * user breakpoints. If bp's position in the priority-ordered list
+ * is low enough, it will get a register.
+ */
+ if (pos < cur_kbpdata->num_kbps)
+ rc = 1;
+
+ mutex_unlock(&hw_breakpoint_mutex);
+ return rc;
+}
+EXPORT_SYMBOL_GPL(register_kernel_hw_breakpoint);
+
+/**
+ * unregister_kernel_hw_breakpoint - unregister a hardware breakpoint for kernel space
+ * @bp: the breakpoint structure to unregister
+ *
+ * Uninstalls and unregisters @bp.
+ */
+void unregister_kernel_hw_breakpoint(struct hw_breakpoint *bp)
+{
+ if (!bp->status)
+ return; /* Not registered */
+ mutex_lock(&hw_breakpoint_mutex);
+
+ /* Remove bp from the kernel's list */
+ remove_bp_from_list(bp, NULL, NULL);
+ arch_unregister_kernel_hw_breakpoint(bp);
+
+ /* Rebalance the priorities. This will uninstall bp if it
+ * was allocated a debug register.
+ */
+ balance_kernel_vs_user();
+
+ mutex_unlock(&hw_breakpoint_mutex);
+}
+EXPORT_SYMBOL_GPL(unregister_kernel_hw_breakpoint);
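
The slot-counting loop at the top of balance_kernel_vs_user() can be tried out in isolation. The following is only a sketch under stated assumptions, not code from the patch: count_kernel_slots() is a hypothetical helper, the kernel list is flattened into an array sorted highest priority first, tprio[] is assumed to be filled from the top index downward (as thbi->bps[] is in store_thread_bp_array()), and -1 is an assumed marker for a user slot that no thread wants.

```c
#include <stddef.h>

#define HB_NUM 4	/* number of debug registers, as in the patch */

/*
 * Hypothetical stand-alone version of the counting loop.
 *
 * kprio[0..nk-1]: priorities of the registered kernel breakpoints,
 *                 highest first (stands in for the kernel_bps list).
 * tprio[]:        assumed layout: tprio[HB_NUM - 1] is the highest
 *                 priority any thread wants for a user breakpoint,
 *                 tprio[HB_NUM - 2] the next one, and so on; -1 marks
 *                 a slot nobody asked for (an assumption of this sketch).
 *
 * Returns the number of debug registers given to kernel breakpoints.
 */
static int count_kernel_slots(const int *kprio, size_t nk,
		const int tprio[HB_NUM])
{
	int k = 0;		/* Next kernel bp to allocate */
	int u = HB_NUM - 1;	/* Next user bp to allocate */

	while (k <= u) {
		if ((size_t) k >= nk || tprio[u] >= kprio[k])
			--u;	/* User bps win a slot (ties included) */
		else
			++k;	/* Kernel bp wins a slot */
	}
	return k;
}
```

With kernel priorities {5, 1} and user demands of 7 and 3, the four registers are handed out in the order user(7), kernel(5), user(3), kernel(1), so the function returns 2. When every priority is equal, all slots go to user breakpoints and it returns 0.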

2007-06-26 18:17:32

by Roland McGrath

Subject: Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)

---
arch/i386/kernel/hw_breakpoint.c | 5 --
arch/x86_64/ia32/ia32_aout.c | 10 ----
arch/x86_64/ia32/ptrace32.c | 65 ++++----------------------------
arch/x86_64/kernel/Makefile | 3 +
arch/x86_64/kernel/kprobes.c | 14 +++++-
arch/x86_64/kernel/machine_kexec.c | 2
arch/x86_64/kernel/process.c | 46 +++++++++++-----------
arch/x86_64/kernel/ptrace.c | 72 +++++------------------------------
arch/x86_64/kernel/signal.c | 8 ---
arch/x86_64/kernel/smpboot.c | 4 +
arch/x86_64/kernel/suspend.c | 17 +-------
arch/x86_64/kernel/traps.c | 75 +++++++++++++------------------------
include/asm-x86_64/debugreg.h | 30 ++++++++++++++
include/asm-x86_64/hw_breakpoint.h | 50 ++++++++++++++++++++++++
include/asm-x86_64/processor.h | 10 +---
include/asm-x86_64/suspend.h | 3 -
16 files changed, 184 insertions(+), 230 deletions(-)

Index: b/arch/x86_64/kernel/kprobes.c
===================================================================
--- a/arch/x86_64/kernel/kprobes.c
+++ b/arch/x86_64/kernel/kprobes.c
@@ -42,6 +42,7 @@
#include <asm/cacheflush.h>
#include <asm/pgtable.h>
#include <asm/uaccess.h>
+#include <asm/debugreg.h>

void jprobe_return_end(void);
static void __kprobes arch_copy_kprobe(struct kprobe *p);
@@ -652,8 +653,17 @@ int __kprobes kprobe_exceptions_notify(s
ret = NOTIFY_STOP;
break;
case DIE_DEBUG:
- if (post_kprobe_handler(args->regs))
- ret = NOTIFY_STOP;
+ /*
+ * The DR6 value is stored in args->err.
+ * If DR_STEP is set and it's ours, we should clear DR_STEP
+ * from the user's virtualized DR6 register.
+ * Then if no more bits are set we should eat this exception.
+ */
+ if ((args->err & DR_STEP) && post_kprobe_handler(args->regs)) {
+ current->thread.vdr6 &= ~DR_STEP;
+ if ((args->err & ~DR_STEP) == 0)
+ ret = NOTIFY_STOP;
+ }
break;
case DIE_GPF:
case DIE_PAGE_FAULT:
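
The DIE_DEBUG decision in the hunk above can be modelled outside the kernel. This is a sketch only: kprobe_debug_should_stop() is a hypothetical name, the step_was_ours argument stands in for the return value of post_kprobe_handler(), and vdr6 stands in for current->thread.vdr6. The DR_STEP value (bit 14 of DR6) is the architectural one.

```c
#define DR_STEP		0x4000UL	/* DR6 single-step bit (BS, bit 14) */
#define DR_TRAP0	0x1UL		/* DR6 bit for breakpoint register 0 */

/*
 * Hypothetical model of the DIE_DEBUG case.
 *
 * dr6:           the raw DR6 value passed in args->err
 * step_was_ours: nonzero if the single-step belonged to kprobes
 * vdr6:          the thread's virtualized DR6, updated in place
 *
 * Returns 1 if the exception should be eaten (NOTIFY_STOP),
 * 0 if it should be passed on to the other handlers.
 */
static int kprobe_debug_should_stop(unsigned long dr6, int step_was_ours,
		unsigned long *vdr6)
{
	if ((dr6 & DR_STEP) && step_was_ours) {
		/* Hide our single-step from the user's view of DR6 */
		*vdr6 &= ~DR_STEP;

		/* Nothing else pending: swallow the exception */
		if ((dr6 & ~DR_STEP) == 0)
			return 1;
	}
	return 0;
}
```

The point of the change is visible in the second case below: when a kprobes single-step coincides with a data breakpoint hit, DR_STEP is stripped from the virtualized DR6 but the exception still propagates so that the breakpoint bit gets handled.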
Index: b/include/asm-x86_64/hw_breakpoint.h
===================================================================
--- /dev/null
+++ b/include/asm-x86_64/hw_breakpoint.h
@@ -0,0 +1,50 @@
+#ifndef _X86_64_HW_BREAKPOINT_H
+#define _X86_64_HW_BREAKPOINT_H
+
+#ifdef __KERNEL__
+#define __ARCH_HW_BREAKPOINT_H
+
+struct arch_hw_breakpoint {
+ unsigned long address;
+ u8 len;
+ u8 type;
+} __attribute__((packed));
+
+#include <asm-generic/hw_breakpoint.h>
+
+/* HW breakpoint accessor routines */
+static inline const void *hw_breakpoint_get_kaddress(struct hw_breakpoint *bp)
+{
+ return (const void *) bp->info.address;
+}
+
+static inline const void __user *hw_breakpoint_get_uaddress(
+ struct hw_breakpoint *bp)
+{
+ return (const void __user *) bp->info.address;
+}
+
+static inline unsigned hw_breakpoint_get_len(struct hw_breakpoint *bp)
+{
+ return bp->info.len;
+}
+
+static inline unsigned hw_breakpoint_get_type(struct hw_breakpoint *bp)
+{
+ return bp->info.type;
+}
+
+/* Available HW breakpoint length encodings */
+#define HW_BREAKPOINT_LEN_1 0x40
+#define HW_BREAKPOINT_LEN_2 0x44
+#define HW_BREAKPOINT_LEN_4 0x4c
+#define HW_BREAKPOINT_LEN_8 0x48
+#define HW_BREAKPOINT_LEN_EXECUTE 0x40
+
+/* Available HW breakpoint type encodings */
+#define HW_BREAKPOINT_EXECUTE 0x80 /* trigger on instruction execute */
+#define HW_BREAKPOINT_WRITE 0x81 /* trigger on memory write */
+#define HW_BREAKPOINT_RW 0x83 /* trigger on memory read or write */
+
+#endif /* __KERNEL__ */
+#endif /* _X86_64_HW_BREAKPOINT_H */
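
A note on the encodings in this header: the low four bits of each LEN and type value appear to line up with the per-register LEN and R/W fields of DR7 (R/W in bits 16+4n and 17+4n, LEN in bits 18+4n and 19+4n). The sketch below illustrates that reading. It is an inference from the constants and the architectural DR7 layout, not a copy of what the shared i386 code does, which is not shown in this message; dr7_field() is a hypothetical helper, and the two DR_CONTROL_* values are repeated here only to keep the example self-contained.

```c
/* Encodings copied from the header above */
#define HW_BREAKPOINT_LEN_1		0x40
#define HW_BREAKPOINT_LEN_2		0x44
#define HW_BREAKPOINT_LEN_4		0x4c
#define HW_BREAKPOINT_LEN_8		0x48
#define HW_BREAKPOINT_LEN_EXECUTE	0x40

#define HW_BREAKPOINT_EXECUTE		0x80
#define HW_BREAKPOINT_WRITE		0x81
#define HW_BREAKPOINT_RW		0x83

#define DR_CONTROL_SHIFT	16	/* first control field starts at bit 16 */
#define DR_CONTROL_SIZE		4	/* 4 control bits per register */

/*
 * Hypothetical helper: build the DR7 control bits for debug register
 * regnum (0..3) from an encoded length and type.  The LEN encodings
 * carry the two LEN bits in bits 2-3, the type encodings carry the two
 * R/W bits in bits 0-1; the upper bits (0x40, 0x80) are dropped.
 */
static unsigned long dr7_field(unsigned len, unsigned type, int regnum)
{
	unsigned long bits = (len | type) & 0xf;

	return bits << (DR_CONTROL_SHIFT + DR_CONTROL_SIZE * regnum);
}
```

For example, a 4-byte write watchpoint in register 0 gives LEN = 11 and R/W = 01, i.e. 0xd in bits 16-19. An execute breakpoint gives an all-zero field, which matches the hardware requirement that instruction breakpoints use LEN = 00.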
Index: b/include/asm-x86_64/debugreg.h
===================================================================
--- a/include/asm-x86_64/debugreg.h
+++ b/include/asm-x86_64/debugreg.h
@@ -49,6 +49,8 @@

#define DR_LOCAL_ENABLE_SHIFT 0 /* Extra shift to the local enable bit */
#define DR_GLOBAL_ENABLE_SHIFT 1 /* Extra shift to the global enable bit */
+#define DR_LOCAL_ENABLE (0x1) /* Local enable for reg 0 */
+#define DR_GLOBAL_ENABLE (0x2) /* Global enable for reg 0 */
#define DR_ENABLE_SIZE 2 /* 2 enable bits per register */

#define DR_LOCAL_ENABLE_MASK (0x55) /* Set local bits for all 4 regs */
@@ -62,4 +64,32 @@
#define DR_LOCAL_SLOWDOWN (0x100) /* Local slow the pipeline */
#define DR_GLOBAL_SLOWDOWN (0x200) /* Global slow the pipeline */

+
+/*
+ * HW breakpoint additions
+ */
+#ifdef __KERNEL__
+
+#define HB_NUM 4 /* Number of hardware breakpoints */
+
+/* For process management */
+void flush_thread_hw_breakpoint(struct task_struct *tsk);
+int copy_thread_hw_breakpoint(struct task_struct *tsk,
+ struct task_struct *child, unsigned long clone_flags);
+void dump_thread_hw_breakpoint(struct task_struct *tsk, int u_debugreg[8]);
+void switch_to_thread_hw_breakpoint(struct task_struct *tsk);
+
+/* For CPU management */
+void load_debug_registers(void);
+static inline void disable_debug_registers(void)
+{
+ set_debugreg(0UL, 7);
+}
+
+/* For use by ptrace */
+unsigned long thread_get_debugreg(struct task_struct *tsk, int n);
+int thread_set_debugreg(struct task_struct *tsk, int n, unsigned long val);
+
+#endif /* __KERNEL__ */
+
#endif
Index: b/arch/x86_64/ia32/ia32_aout.c
===================================================================
--- a/arch/x86_64/ia32/ia32_aout.c
+++ b/arch/x86_64/ia32/ia32_aout.c
@@ -32,6 +32,7 @@
#include <asm/cacheflush.h>
#include <asm/user32.h>
#include <asm/ia32.h>
+#include <asm/debugreg.h>

#undef WARN_OLD
#undef CORE_DUMP /* probably broken */
@@ -57,14 +58,7 @@ static void dump_thread32(struct pt_regs
dump->u_dsize = ((unsigned long) (current->mm->brk + (PAGE_SIZE-1))) >> PAGE_SHIFT;
dump->u_dsize -= dump->u_tsize;
dump->u_ssize = 0;
- dump->u_debugreg[0] = current->thread.debugreg0;
- dump->u_debugreg[1] = current->thread.debugreg1;
- dump->u_debugreg[2] = current->thread.debugreg2;
- dump->u_debugreg[3] = current->thread.debugreg3;
- dump->u_debugreg[4] = 0;
- dump->u_debugreg[5] = 0;
- dump->u_debugreg[6] = current->thread.debugreg6;
- dump->u_debugreg[7] = current->thread.debugreg7;
+ dump_thread_hw_breakpoint(current, dump->u_debugreg);

if (dump->start_stack < 0xc0000000)
dump->u_ssize = ((unsigned long) (0xc0000000 - dump->start_stack)) >> PAGE_SHIFT;
Index: b/arch/x86_64/ia32/ptrace32.c
===================================================================
--- a/arch/x86_64/ia32/ptrace32.c
+++ b/arch/x86_64/ia32/ptrace32.c
@@ -39,7 +39,6 @@

static int putreg32(struct task_struct *child, unsigned regno, u32 val)
{
- int i;
__u64 *stack = (__u64 *)task_pt_regs(child);

switch (regno) {
@@ -85,43 +84,11 @@ static int putreg32(struct task_struct *
break;
}

- case offsetof(struct user32, u_debugreg[4]):
- case offsetof(struct user32, u_debugreg[5]):
- return -EIO;
-
- case offsetof(struct user32, u_debugreg[0]):
- child->thread.debugreg0 = val;
- break;
-
- case offsetof(struct user32, u_debugreg[1]):
- child->thread.debugreg1 = val;
- break;
-
- case offsetof(struct user32, u_debugreg[2]):
- child->thread.debugreg2 = val;
- break;
-
- case offsetof(struct user32, u_debugreg[3]):
- child->thread.debugreg3 = val;
- break;
-
- case offsetof(struct user32, u_debugreg[6]):
- child->thread.debugreg6 = val;
- break;
-
- case offsetof(struct user32, u_debugreg[7]):
- val &= ~DR_CONTROL_RESERVED;
- /* See arch/i386/kernel/ptrace.c for an explanation of
- * this awkward check.*/
- for(i=0; i<4; i++)
- if ((0x5454 >> ((val >> (16 + 4*i)) & 0xf)) & 1)
- return -EIO;
- child->thread.debugreg7 = val;
- if (val)
- set_tsk_thread_flag(child, TIF_DEBUG);
- else
- clear_tsk_thread_flag(child, TIF_DEBUG);
- break;
+ case offsetof(struct user32, u_debugreg[0])
+ ... offsetof(struct user32, u_debugreg[7]):
+ regno -= offsetof(struct user32, u_debugreg[0]);
+ regno >>= 2;
+ return thread_set_debugreg(child, regno, val);

default:
if (regno > sizeof(struct user32) || (regno & 3))
@@ -170,23 +137,11 @@ static int getreg32(struct task_struct *
R32(eflags, eflags);
R32(esp, rsp);

- case offsetof(struct user32, u_debugreg[0]):
- *val = child->thread.debugreg0;
- break;
- case offsetof(struct user32, u_debugreg[1]):
- *val = child->thread.debugreg1;
- break;
- case offsetof(struct user32, u_debugreg[2]):
- *val = child->thread.debugreg2;
- break;
- case offsetof(struct user32, u_debugreg[3]):
- *val = child->thread.debugreg3;
- break;
- case offsetof(struct user32, u_debugreg[6]):
- *val = child->thread.debugreg6;
- break;
- case offsetof(struct user32, u_debugreg[7]):
- *val = child->thread.debugreg7;
+ case offsetof(struct user32, u_debugreg[0])
+ ... offsetof(struct user32, u_debugreg[7]):
+ regno -= offsetof(struct user32, u_debugreg[0]);
+ regno >>= 2;
+ *val = thread_get_debugreg(child, regno);
break;

default:
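
The case ranges in putreg32() and getreg32() turn a byte offset into the user area into a debug register number by subtracting the offset of u_debugreg[0] and shifting right by 2 (the 64-bit arch_ptrace() path shifts by 3 because its slots are 8 bytes wide). A minimal user-space sketch of that arithmetic follows. struct user32_sketch is a made-up layout, not the real struct user32, and debugreg_index32() is a hypothetical helper; the alignment check is an extra guard added for the sketch rather than something taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Made-up stand-in for struct user32: some registers, then 8 debug slots */
struct user32_sketch {
	uint32_t regs[17];
	uint32_t u_debugreg[8];
};

/*
 * Hypothetical helper: map a byte offset into the user area to a debug
 * register number 0..7, or -1 if the offset is outside u_debugreg[] or
 * not aligned to a 32-bit slot.
 */
static int debugreg_index32(size_t regno)
{
	size_t first = offsetof(struct user32_sketch, u_debugreg);
	size_t last = first + 7 * sizeof(uint32_t);

	if (regno < first || regno > last || (regno & 3))
		return -1;

	/* Same arithmetic as the patch: subtract the base, divide by 4 */
	return (int) ((regno - first) >> 2);
}
```

The resulting index is what gets passed as n to thread_get_debugreg() and thread_set_debugreg(), so the policy for registers 4 and 5 and the DR7 validity checks can live in one place instead of being repeated in each ptrace path.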
Index: b/arch/x86_64/kernel/Makefile
===================================================================
--- a/arch/x86_64/kernel/Makefile
+++ b/arch/x86_64/kernel/Makefile
@@ -61,3 +61,6 @@ msr-$(subst m,y,$(CONFIG_X86_MSR)) += .
alternative-y += ../../i386/kernel/alternative.o
pcspeaker-y += ../../i386/kernel/pcspeaker.o
perfctr-watchdog-y += ../../i386/kernel/cpu/perfctr-watchdog.o
+
+obj-y += hw_breakpoint.o
+hw_breakpoint-y += ../../i386/kernel/hw_breakpoint.o
Index: b/arch/x86_64/kernel/machine_kexec.c
===================================================================
--- a/arch/x86_64/kernel/machine_kexec.c
+++ b/arch/x86_64/kernel/machine_kexec.c
@@ -14,6 +14,7 @@
#include <asm/tlbflush.h>
#include <asm/mmu_context.h>
#include <asm/io.h>
+#include <asm/debugreg.h>

#define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE)))
static u64 kexec_pgd[512] PAGE_ALIGNED;
@@ -185,6 +186,7 @@ NORET_TYPE void machine_kexec(struct kim

/* Interrupts aren't acceptable while we reboot */
local_irq_disable();
+ disable_debug_registers();

control_page = page_address(image->control_code_page) + PAGE_SIZE;
memcpy(control_page, relocate_kernel, PAGE_SIZE);
Index: b/arch/x86_64/kernel/process.c
===================================================================
--- a/arch/x86_64/kernel/process.c
+++ b/arch/x86_64/kernel/process.c
@@ -51,6 +51,7 @@
#include <asm/proto.h>
#include <asm/ia32.h>
#include <asm/idle.h>
+#include <asm/debugreg.h>

asmlinkage extern void ret_from_fork(void);

@@ -379,6 +380,9 @@ void exit_thread(void)
t->io_bitmap_max = 0;
put_cpu();
}
+
+ if (unlikely(me->thread.hw_breakpoint_info))
+ flush_thread_hw_breakpoint(me);
}

void flush_thread(void)
@@ -394,14 +398,10 @@ void flush_thread(void)
current_thread_info()->status |= TS_COMPAT;
}
}
- clear_tsk_thread_flag(tsk, TIF_DEBUG);

- tsk->thread.debugreg0 = 0;
- tsk->thread.debugreg1 = 0;
- tsk->thread.debugreg2 = 0;
- tsk->thread.debugreg3 = 0;
- tsk->thread.debugreg6 = 0;
- tsk->thread.debugreg7 = 0;
+ if (unlikely(tsk->thread.hw_breakpoint_info))
+ flush_thread_hw_breakpoint(tsk);
+
memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));
/*
* Forget coprocessor state..
@@ -487,6 +487,14 @@ int copy_thread(int nr, unsigned long cl
asm("mov %%es,%0" : "=m" (p->thread.es));
asm("mov %%ds,%0" : "=m" (p->thread.ds));

+ p->thread.hw_breakpoint_info = NULL;
+ p->thread.io_bitmap_ptr = NULL;
+
+ err = -ENOMEM;
+ if (unlikely(me->thread.hw_breakpoint_info) &&
+ copy_thread_hw_breakpoint(me, p, clone_flags))
+ goto out;
+
if (unlikely(test_tsk_thread_flag(me, TIF_IO_BITMAP))) {
p->thread.io_bitmap_ptr = kmalloc(IO_BITMAP_BYTES, GFP_KERNEL);
if (!p->thread.io_bitmap_ptr) {
@@ -513,6 +521,8 @@ int copy_thread(int nr, unsigned long cl
}
err = 0;
out:
+ if (err)
+ flush_thread_hw_breakpoint(p);
if (err && p->thread.io_bitmap_ptr) {
kfree(p->thread.io_bitmap_ptr);
p->thread.io_bitmap_max = 0;
@@ -520,11 +530,6 @@ out:
return err;
}

-/*
- * This special macro can be used to load a debugging register
- */
-#define loaddebug(thread,r) set_debugreg(thread->debugreg ## r, r)
-
static inline void __switch_to_xtra(struct task_struct *prev_p,
struct task_struct *next_p,
struct tss_struct *tss)
@@ -534,16 +539,6 @@ static inline void __switch_to_xtra(stru
prev = &prev_p->thread,
next = &next_p->thread;

- if (test_tsk_thread_flag(next_p, TIF_DEBUG)) {
- loaddebug(next, 0);
- loaddebug(next, 1);
- loaddebug(next, 2);
- loaddebug(next, 3);
- /* no 4 and 5 */
- loaddebug(next, 6);
- loaddebug(next, 7);
- }
-
if (test_tsk_thread_flag(next_p, TIF_IO_BITMAP)) {
/*
* Copy the relevant range of the IO bitmap.
@@ -557,6 +552,13 @@ static inline void __switch_to_xtra(stru
*/
memset(tss->io_bitmap, 0xff, prev->io_bitmap_max);
}
+
+ /*
+ * Handle debug registers. This must be done _after_ current
+ * is updated.
+ */
+ if (unlikely(test_tsk_thread_flag(next_p, TIF_DEBUG)))
+ switch_to_thread_hw_breakpoint(next_p);
}

/*
Index: b/arch/x86_64/kernel/ptrace.c
===================================================================
--- a/arch/x86_64/kernel/ptrace.c
+++ b/arch/x86_64/kernel/ptrace.c
@@ -307,7 +307,7 @@ static unsigned long getreg(struct task_

long arch_ptrace(struct task_struct *child, long request, long addr, long data)
{
- long i, ret;
+ long ret;
unsigned ui;

switch (request) {
@@ -338,23 +338,11 @@ long arch_ptrace(struct task_struct *chi
case 0 ... sizeof(struct user_regs_struct) - sizeof(long):
tmp = getreg(child, addr);
break;
- case offsetof(struct user, u_debugreg[0]):
- tmp = child->thread.debugreg0;
- break;
- case offsetof(struct user, u_debugreg[1]):
- tmp = child->thread.debugreg1;
- break;
- case offsetof(struct user, u_debugreg[2]):
- tmp = child->thread.debugreg2;
- break;
- case offsetof(struct user, u_debugreg[3]):
- tmp = child->thread.debugreg3;
- break;
- case offsetof(struct user, u_debugreg[6]):
- tmp = child->thread.debugreg6;
- break;
- case offsetof(struct user, u_debugreg[7]):
- tmp = child->thread.debugreg7;
+ case offsetof(struct user, u_debugreg[0])
+ ... offsetof(struct user, u_debugreg[7]):
+ addr -= offsetof(struct user, u_debugreg[0]);
+ addr >>= 3;
+ tmp = thread_get_debugreg(child, addr);
break;
default:
tmp = 0;
@@ -375,7 +363,6 @@ long arch_ptrace(struct task_struct *chi

case PTRACE_POKEUSR: /* write the word at location addr in the USER area */
{
- int dsize = test_tsk_thread_flag(child, TIF_IA32) ? 3 : 7;
ret = -EIO;
if ((addr & 7) ||
addr > sizeof(struct user) - 7)
@@ -385,49 +372,12 @@ long arch_ptrace(struct task_struct *chi
case 0 ... sizeof(struct user_regs_struct) - sizeof(long):
ret = putreg(child, addr, data);
break;
- /* Disallows to set a breakpoint into the vsyscall */
- case offsetof(struct user, u_debugreg[0]):
- if (data >= TASK_SIZE_OF(child) - dsize) break;
- child->thread.debugreg0 = data;
- ret = 0;
- break;
- case offsetof(struct user, u_debugreg[1]):
- if (data >= TASK_SIZE_OF(child) - dsize) break;
- child->thread.debugreg1 = data;
- ret = 0;
- break;
- case offsetof(struct user, u_debugreg[2]):
- if (data >= TASK_SIZE_OF(child) - dsize) break;
- child->thread.debugreg2 = data;
- ret = 0;
- break;
- case offsetof(struct user, u_debugreg[3]):
- if (data >= TASK_SIZE_OF(child) - dsize) break;
- child->thread.debugreg3 = data;
- ret = 0;
- break;
- case offsetof(struct user, u_debugreg[6]):
- if (data >> 32)
- break;
- child->thread.debugreg6 = data;
- ret = 0;
+ case offsetof(struct user, u_debugreg[0])
+ ... offsetof(struct user, u_debugreg[7]):
+ addr -= offsetof(struct user, u_debugreg[0]);
+ addr >>= 3;
+ ret = thread_set_debugreg(child, addr, data);
break;
- case offsetof(struct user, u_debugreg[7]):
- /* See arch/i386/kernel/ptrace.c for an explanation of
- * this awkward check.*/
- data &= ~DR_CONTROL_RESERVED;
- for(i=0; i<4; i++)
- if ((0x5554 >> ((data >> (16 + 4*i)) & 0xf)) & 1)
- break;
- if (i == 4) {
- child->thread.debugreg7 = data;
- if (data)
- set_tsk_thread_flag(child, TIF_DEBUG);
- else
- clear_tsk_thread_flag(child, TIF_DEBUG);
- ret = 0;
- }
- break;
}
break;
}
Index: b/arch/x86_64/kernel/signal.c
===================================================================
--- a/arch/x86_64/kernel/signal.c
+++ b/arch/x86_64/kernel/signal.c
@@ -411,14 +411,6 @@ static void do_signal(struct pt_regs *re

signr = get_signal_to_deliver(&info, &ka, regs, NULL);
if (signr > 0) {
- /* Reenable any watchpoints before delivering the
- * signal to user space. The processor register will
- * have been cleared if the watchpoint triggered
- * inside the kernel.
- */
- if (current->thread.debugreg7)
- set_debugreg(current->thread.debugreg7, 7);
-
/* Whee! Actually deliver the signal. */
if (handle_signal(signr, &info, &ka, oldset, regs) == 0) {
/* a signal was successfully delivered; the saved
Index: b/arch/x86_64/kernel/smpboot.c
===================================================================
--- a/arch/x86_64/kernel/smpboot.c
+++ b/arch/x86_64/kernel/smpboot.c
@@ -59,6 +59,7 @@
#include <asm/irq.h>
#include <asm/hw_irq.h>
#include <asm/numa.h>
+#include <asm/debugreg.h>

/* Number of siblings per CPU package */
int smp_num_siblings = 1;
@@ -378,6 +379,8 @@ void __cpuinit start_secondary(void)

unlock_ipi_call_lock();

+ load_debug_registers();
+
cpu_idle();
}

@@ -1043,6 +1046,7 @@ int __cpu_disable(void)
spin_unlock(&vector_lock);
remove_cpu_from_maps();
fixup_irqs(cpu_online_map);
+ disable_debug_registers();
return 0;
}

Index: b/arch/x86_64/kernel/suspend.c
===================================================================
--- a/arch/x86_64/kernel/suspend.c
+++ b/arch/x86_64/kernel/suspend.c
@@ -13,6 +13,7 @@
#include <asm/page.h>
#include <asm/pgtable.h>
#include <asm/mtrr.h>
+#include <asm/debugreg.h>

/* References to section boundaries */
extern const void __nosave_begin, __nosave_end;
@@ -60,6 +61,8 @@ void __save_processor_state(struct saved
asm volatile ("movq %%cr3, %0" : "=r" (ctxt->cr3));
asm volatile ("movq %%cr4, %0" : "=r" (ctxt->cr4));
asm volatile ("movq %%cr8, %0" : "=r" (ctxt->cr8));
+
+ disable_debug_registers();
}

void save_processor_state(void)
@@ -131,19 +134,7 @@ void fix_processor_context(void)
load_TR_desc(); /* This does ltr */
load_LDT(&current->active_mm->context); /* This does lldt */

- /*
- * Now maybe reload the debug registers
- */
- if (current->thread.debugreg7){
- loaddebug(&current->thread, 0);
- loaddebug(&current->thread, 1);
- loaddebug(&current->thread, 2);
- loaddebug(&current->thread, 3);
- /* no 4 and 5 */
- loaddebug(&current->thread, 6);
- loaddebug(&current->thread, 7);
- }
-
+ load_debug_registers();
}

#ifdef CONFIG_SOFTWARE_SUSPEND
Index: b/arch/x86_64/kernel/traps.c
===================================================================
--- a/arch/x86_64/kernel/traps.c
+++ b/arch/x86_64/kernel/traps.c
@@ -829,67 +829,46 @@ asmlinkage __kprobes struct pt_regs *syn
asmlinkage void __kprobes do_debug(struct pt_regs * regs,
unsigned long error_code)
{
- unsigned long condition;
+ unsigned long dr6;
struct task_struct *tsk = current;
siginfo_t info;

- get_debugreg(condition, 6);
+ get_debugreg(dr6, 6);
+ set_debugreg(0UL, 6); /* DR6 may or may not be cleared by the CPU */

- if (notify_die(DIE_DEBUG, "debug", regs, condition, error_code,
+ /* Store the virtualized DR6 value */
+ tsk->thread.vdr6 = dr6;
+
+ if (notify_die(DIE_DEBUG, "debug", regs, dr6, error_code,
SIGTRAP) == NOTIFY_STOP)
return;

preempt_conditional_sti(regs);

- /* Mask out spurious debug traps due to lazy DR7 setting */
- if (condition & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)) {
- if (!tsk->thread.debugreg7) {
- goto clear_dr7;
- }
+ /*
+ * Single-stepping through system calls: ignore any exceptions in
+ * kernel space, but re-enable TF when returning to user mode.
+ *
+ * We already checked v86 mode above, so we can check for kernel mode
+ * by just checking the CPL of CS.
+ */
+ if ((dr6 & DR_STEP) && !user_mode(regs)) {
+ tsk->thread.vdr6 &= ~DR_STEP;
+ set_tsk_thread_flag(tsk, TIF_SINGLESTEP);
+ regs->eflags &= ~X86_EFLAGS_TF;
}

- tsk->thread.debugreg6 = condition;
-
- /* Mask out spurious TF errors due to lazy TF clearing */
- if (condition & DR_STEP) {
- /*
- * The TF error should be masked out only if the current
- * process is not traced and if the TRAP flag has been set
- * previously by a tracing process (condition detected by
- * the PT_DTRACE flag); remember that the i386 TRAP flag
- * can be modified by the process itself in user mode,
- * allowing programs to debug themselves without the ptrace()
- * interface.
- */
- if (!user_mode(regs))
- goto clear_TF_reenable;
- /*
- * Was the TF flag set by a debugger? If so, clear it now,
- * so that register information is correct.
- */
- if (tsk->ptrace & PT_DTRACE) {
- regs->eflags &= ~TF_MASK;
- tsk->ptrace &= ~PT_DTRACE;
- }
+ if (tsk->thread.vdr6 & (DR_STEP|DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)) {
+ /* Ok, finally something we can handle */
+ tsk->thread.trap_no = 1;
+ tsk->thread.error_code = error_code;
+ info.si_signo = SIGTRAP;
+ info.si_errno = 0;
+ info.si_code = TRAP_BRKPT;
+ info.si_addr = user_mode(regs) ? (void __user *)regs->rip : NULL;
+ force_sig_info(SIGTRAP, &info, tsk);
}

- /* Ok, finally something we can handle */
- tsk->thread.trap_no = 1;
- tsk->thread.error_code = error_code;
- info.si_signo = SIGTRAP;
- info.si_errno = 0;
- info.si_code = TRAP_BRKPT;
- info.si_addr = user_mode(regs) ? (void __user *)regs->rip : NULL;
- force_sig_info(SIGTRAP, &info, tsk);
-
-clear_dr7:
- set_debugreg(0UL, 7);
- preempt_conditional_cli(regs);
- return;
-
-clear_TF_reenable:
- set_tsk_thread_flag(tsk, TIF_SINGLESTEP);
- regs->eflags &= ~TF_MASK;
preempt_conditional_cli(regs);
}

Index: b/include/asm-x86_64/processor.h
===================================================================
--- a/include/asm-x86_64/processor.h
+++ b/include/asm-x86_64/processor.h
@@ -221,13 +221,9 @@ struct thread_struct {
unsigned long fs;
unsigned long gs;
unsigned short es, ds, fsindex, gsindex;
-/* Hardware debugging registers */
- unsigned long debugreg0;
- unsigned long debugreg1;
- unsigned long debugreg2;
- unsigned long debugreg3;
- unsigned long debugreg6;
- unsigned long debugreg7;
+/* Hardware breakpoint info */
+ unsigned long vdr6;
+ struct thread_hw_breakpoint *hw_breakpoint_info;
/* fault info */
unsigned long cr2, trap_no, error_code;
/* floating point info */
Index: b/include/asm-x86_64/suspend.h
===================================================================
--- a/include/asm-x86_64/suspend.h
+++ b/include/asm-x86_64/suspend.h
@@ -39,9 +39,6 @@ extern unsigned long saved_context_r08,
extern unsigned long saved_context_r12, saved_context_r13, saved_context_r14, saved_context_r15;
extern unsigned long saved_context_eflags;

-#define loaddebug(thread,register) \
- set_debugreg((thread)->debugreg##register, register)
-
extern void fix_processor_context(void);

#ifdef CONFIG_ACPI_SLEEP
Index: b/arch/i386/kernel/hw_breakpoint.c
===================================================================
--- a/arch/i386/kernel/hw_breakpoint.c
+++ b/arch/i386/kernel/hw_breakpoint.c
@@ -128,7 +128,7 @@ static void arch_install_chbi(struct cpu
struct hw_breakpoint **bps;

/* Don't allow debug exceptions while we update the registers */
- set_debugreg(0, 7);
+ set_debugreg(0UL, 7);
chbi->cur_kbpdata = rcu_dereference(cur_kbpdata);

/* Kernel breakpoints are stored starting in DR0 and going up */
@@ -391,7 +391,6 @@ static void ptrace_triggered(struct hw_b
if (thbi) {
i = bp - thbi->vdr_bps;
tsk->thread.vdr6 |= (DR_TRAP0 << i);
- send_sigtrap(tsk, regs, 0);
}
}

@@ -588,7 +587,7 @@ static int __kprobes hw_breakpoint_handl
/* Disable all breakpoints so that the callbacks can run without
* triggering recursive debug exceptions.
*/
- set_debugreg(0, 7);
+ set_debugreg(0UL, 7);

/* Handle all the breakpoints that were triggered */
for (i = 0; i < HB_NUM; ++i) {


Attachments:
bptest-fixup.patch (2.18 kB)
bptest patch
hw-breakpoint-x86_64.patch (23.84 kB)
hw-breakpoint port to x86-64

2007-06-26 21:02:16

by Roland McGrath

Subject: Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)

> All right, I'll change it. And I'll encapsulate those fields. I still
> think it will accomplish nothing more than hiding some implementation
> details which don't really need to be hidden.

It makes me a little happier, and I at least consider that a substantial
accomplishment. ;-)

> It's below. The patch logs the value of DR6 when each debug interrupt
> occurs, and it adds another sysfs attribute to the bptest driver. The
> attribute is named "test", and it contains the value that the IRQ
> handler will write back to DR6. Combine this with the Alt-SysRq-P
> change already submitted, and you can get a clear view of what's going
> on.

Thanks. I haven't played with this.

> I see. So I could add a CONFIG_HW_BREAKPOINT option and make
> CONFIG_PTRACE depend on it. That will be simple enough.

Right.

> Do you think it would make sense to allow utrace without hw-breakpoint?

Sure. There's no special reason to want to turn hw-breakpoint off, but
it is a naturally separable option.

> Here's the next iteration. The arch-specific parts are now completely
> encapsulated. validate_settings is in a form which should be workable
> on all architectures. And the address, length, and type are passed as
> arguments to register_{kernel,user}_hw_breakpoint().

I like it!

> I haven't tried to modify Kconfig at all. To do it properly would
> require making ptrace configurable, which is not something I want to
> tackle at the moment.

You don't need to worry about that. Under utrace, CONFIG_PTRACE is
already separate and can be turned off. I don't think we really need to
finish the Kconfig stuff at all before I merge it into the utrace code.

> I changed the Kprobes single-step routine along the lines you
> suggested, but added a little extra. See what you think.
[...]
> The test for early termination of the exception handler is now back the
> way it was. However I didn't change the test for deciding whether to
> send a SIGTRAP. Under the current circumstances I don't see how it
> could ever be wrong. (On the other hand, the code will end up calling
> send_sigtrap() twice when a ptrace exception occurs: once in the ptrace
> trigger routine and once in do_debug. That won't matter will it? I
> would expect send_sigtrap() to be idempotent.)

Calling send_sigtrap twice during the same exception does happen to be
harmless, but I don't think it should be presumed to be. It is just not
the right way to go about things that you send a signal twice when there
is one signal you want to generate.

Also, send_sigtrap is an i386-only function (not even x86_64 has the
same). Only x86_64 will share this actual code, but all others will be
modelled on it. I think it makes things simplest across the board if
the standard form is that when there is a ptrace exception, the notifier
does not return NOTIFY_STOP, so it falls through to the existing SIGTRAP
arch code.

So, hmm. In the old do_debug code, if a notifier returns NOTIFY_STOP,
it bails immediately, before the db6 value is saved in current->thread.
This is the normal theory of notify_die use, where NOTIFY_STOP means to
completely swallow the event as if it never happened. In the event
there were some third party notifier involved, it ought to be able to
swallow its magic exceptions as before and have no user-visible db6
change happen at the time of that exception. So how about this:

get_debugreg(condition, 6);
set_debugreg(0UL, 6); /* The CPU does not clear it. */

if (notify_die(DIE_DEBUG, "debug", regs, condition, error_code,
SIGTRAP) == NOTIFY_STOP)
return;

The kprobes notifier uses max priority, so it will run first. Its
notifier code uses my version. For a single-step that belongs to it,
it will return NOTIFY_STOP and nothing else happens (no one touches
vdr6). (I think I'm dredging up old territory by asking what happens
when kprobes steps over an insn that hits a data breakpoint, but I
don't recall atm.)

vdr6 belongs wholly to hw_breakpoint, no other code refers to it
directly. hw_breakpoint's notifier sets vdr6 with non-DR_TRAPn bits,
if it's a user-mode exception. If it's a ptrace exception it also
sets the mapped DR_TRAPn bits. If it's not a ptrace exception and
only DR_TRAPn bits were newly set, then it returns NOTIFY_STOP. If
it's a spurious exception from lazy db7 setting, hw_breakpoint just
returns NOTIFY_STOP early.

The rest of the old do_debug code stays as it is, only clear_dr7 goes.

> Are you going to the Ottawa Linux Symposium?

I am not.

> @@ -484,7 +495,8 @@ int copy_thread(int nr, unsigned long cl
>
> err = 0;
> out:
> - if (err && p->thread.io_bitmap_ptr) {
> + if (err) {
> + flush_thread_hw_breakpoint(p);
> kfree(p->thread.io_bitmap_ptr);
> p->thread.io_bitmap_max = 0;
> }

This can call kfree(NULL). I would leave the original code alone, i.e.:

if (err)
flush_thread_hw_breakpoint(p);
if (err && p->thread.io_bitmap_ptr) {
kfree(p->thread.io_bitmap_ptr);
p->thread.io_bitmap_max = 0;
}

> + set_debugreg(0, 7);

You'll note in my x86-64 patch changing these to 0UL. It matters for the
asm in the set_debugreg macro that the argument have type long, not int
(which plain 0 has).
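
A minimal user-space sketch of the type difference, assuming only
standard C. The operand_width() macro is hypothetical: it stands in for
the way an inline-asm operand takes its width from the type of the
argument expression, and it is not the kernel's set_debugreg.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a register-writing macro: it reports the
 * width of the operand it would hand to inline asm, which follows the
 * type of the argument expression.
 */
#define operand_width(value) (sizeof(value))

/* Plain 0 is an int constant. */
static size_t width_of_plain_zero(void)
{
	return operand_width(0);
}

/* 0UL is an unsigned long constant, never narrower than int. */
static size_t width_of_long_zero(void)
{
	assert(operand_width(0UL) >= operand_width(0));
	return operand_width(0UL);
}
```

On an LP64 target such as x86-64 Linux the two widths are typically 4
and 8 bytes, which is why the suffix matters for a 64-bit register move.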


Thanks,
Roland

2007-06-27 02:44:17

by Alan Stern

Subject: Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)

On Tue, 26 Jun 2007, Roland McGrath wrote:

> I needed the attached patch on top of the bptest patch for the current
> code. Btw, that is a very nice little tester!

I had already made some of those changes (the ones needed to make
bptest build with the new hw_breakpoint code). I'll add in the others.

> Below that is a patch to go on top of your current patch, with x86-64
> support. I've only tried a few trivial tests with bptest (including an
> 8-byte bp), which worked great. It is a pretty faithful copy of your i386
> changes. I'm still not sure we have all that right, but you might as well
> incorporate this into your patch. You should change the x86_64 code in
> parallel with any i386 changes we decide on later, and I can test it and
> send you any typo fixups or whatnot.

Right. I may update a few comments...

Alan Stern

2007-06-27 03:31:41

by Alan Stern

Subject: Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)

On Tue, 26 Jun 2007, Roland McGrath wrote:

> > Here's the next iteration. The arch-specific parts are now completely
> > encapsulated. validate_settings is in a form which should be workable
> > on all architectures. And the address, length, and type are passed as
> > arguments to register_{kernel,user}_hw_breakpoint().
>
> I like it!

Good. My earlier stubbornness was caused by a desire to allow static
initializers, but now I see that specifying the values in the
registration call really isn't all that bad.

> > I haven't tried to modify Kconfig at all. To do it properly would
> > require making ptrace configurable, which is not something I want to
> > tackle at the moment.
>
> You don't need to worry about that. Under utrace, CONFIG_PTRACE is
> already separate and can be turned off. I don't think we really need to
> finish the Kconfig stuff at all before I merge it into the utrace code.

So far this work has all been based on the vanilla kernel. Should I
switch over to basing it on -mm?


> Calling send_sigtrap twice during the same exception does happen to be
> harmless, but I don't think it should be presumed to be. It is just not
> the right way to go about things that you send a signal twice when there
> is one signal you want to generate.

What happens when there are two ptrace exceptions at different points
during the same system call? Won't we end up sending the signal twice
no matter what?

> Also, send_sigtrap is an i386-only function (not even x86_64 has the
> same). Only x86_64 will share this actual code, but all others will be
> modelled on it. I think it makes things simplest across the board if
> the standard form is that when there is a ptrace exception, the notifier
> does not return NOTIFY_STOP, so it falls through to the existing SIGTRAP
> arch code.
>
> So, hmm. In the old do_debug code, if a notifier returns NOTIFY_STOP,
> it bails immediately, before the db6 value is saved in current->thread.
> This is the normal theory of notify_die use, where NOTIFY_STOP means to
> completely swallow the event as if it never happened. In the event
> there were some third party notifier involved, it ought to be able to
> swallow its magic exceptions as before and have no user-visible db6
> change happen at the time of that exception. So how about this:
>
> get_debugreg(condition, 6);
> set_debugreg(0UL, 6); /* The CPU does not clear it. */
>
> if (notify_die(DIE_DEBUG, "debug", regs, condition, error_code,
> SIGTRAP) == NOTIFY_STOP)
> return;
>
> The kprobes notifier uses max priority, so it will run first. Its
> notifier code uses my version. For a single-step that belongs to it,
> it will return NOTIFY_STOP and nothing else happens (no one touches
> vdr6). (I think I'm dredging up old territory by asking what happens
> when kprobes steps over an insn that hits a data breakpoint, but I
> don't recall atm.)

In theory we should get an exception with both DR_STEP and DR_TRAPn
set, meaning that neither notifier will return NOTIFY_STOP. But if the
kprobes handler clears DR_STEP in the DR6 image passed to the
hw_breakpoint handler, it should work out better.

> vdr6 belongs wholly to hw_breakpoint, no other code refers to it
> directly. hw_breakpoint's notifier sets vdr6 with non-DR_TRAPn bits,
> if it's a user-mode exception. If it's a ptrace exception it also
> sets the mapped DR_TRAPn bits. If it's not a ptrace exception and
> only DR_TRAPn bits were newly set, then it returns NOTIFY_STOP. If
> it's a spurious exception from lazy db7 setting, hw_breakpoint just
> returns NOTIFY_STOP early.

That sounds not quite right. To a user-space debugger, a system call
should appear as an atomic operation. If multiple ptrace exceptions
occur during a system call, all the relevant DR_TRAPn bits should be
set in vdr6 together and all the other ones reset. How can we arrange
that?

There's also the question of whether to send the SIGTRAP. If
extraneous bits are set in DR6 (e.g., because the CPU always sets some
extra bits) then we will never get NOTIFY_STOP. Nevertheless, the
signal should not always be sent.

> > @@ -484,7 +495,8 @@ int copy_thread(int nr, unsigned long cl
> >
> > err = 0;
> > out:
> > - if (err && p->thread.io_bitmap_ptr) {
> > + if (err) {
> > + flush_thread_hw_breakpoint(p);
> > kfree(p->thread.io_bitmap_ptr);
> > p->thread.io_bitmap_max = 0;
> > }
>
> This can call kfree(NULL). I would leave the original code alone, i.e.:
>
> if (err)
> flush_thread_hw_breakpoint(p);
> if (err && p->thread.io_bitmap_ptr) {
> kfree(p->thread.io_bitmap_ptr);
> p->thread.io_bitmap_max = 0;
> }

I disagree. kfree() is documented to return harmlessly when passed a
NULL pointer, and lots of places in the kernel have been changed to
remove useless tests for NULL before calls to kfree(). This is just
another example.
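
A user-space analogue of that cleanup, as a sketch only: free() stands
in for kfree(), and struct thread_model is a hypothetical reduction of
the two thread fields involved. The C standard specifies that
free(NULL) performs no action, which mirrors the kfree() behavior
described above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space model of the two thread fields involved. */
struct thread_model {
	unsigned char *io_bitmap_ptr;
	unsigned int io_bitmap_max;
};

/* Error-path cleanup without a NULL test: free(NULL) does nothing. */
static void cleanup_on_error(struct thread_model *t)
{
	assert(t != NULL);
	free(t->io_bitmap_ptr);
	t->io_bitmap_ptr = NULL;
	t->io_bitmap_max = 0;
}
```

The function behaves the same whether or not a bitmap was ever
allocated, so the caller needs no separate pointer check.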

> > + set_debugreg(0, 7);
>
> You'll note in my x86-64 patch changing these to 0UL. It matters for the
> asm in the set_debugreg macro that the argument have type long, not int
> (which plain 0 has).

I figured there was some reason like that.

Alan Stern

2007-06-27 21:04:31

by Roland McGrath

Subject: Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)

> So far this work has all been based on the vanilla kernel. Should I
> switch over to basing it on -mm?

It doesn't much matter at the moment. Sticking with vanilla is the easiest
for you and me testing it right now.

> > Calling send_sigtrap twice during the same exception does happen to be
> > harmless, but I don't think it should be presumed to be. It is just not
> > the right way to go about things that you send a signal twice when there
> > is one signal you want to generate.
>
> What happens when there are two ptrace exceptions at different points
> during the same system call? Won't we end up sending the signal twice
> no matter what?

Well then that is two signals for good reason, so that is a different
story. It winds up indistinguishable from only sending the second, but as
far as the organization of the code and thinking about the semantics, twice
is right in this case and once is right in the simpler case.

> In theory we should get an exception with both DR_STEP and DR_TRAPn
> set, meaning that neither notifier will return NOTIFY_STOP. But if the
> kprobes handler clears DR_STEP in the DR6 image passed to the
> hw_breakpoint handler, it should work out better.

It's since occurred to me that kprobes can and should do:

args->err &= ~(unsigned long) DR_STEP;
if (args->err == 0)
return NOTIFY_STOP;

This doesn't affect do_debug directly, but it will change the value seen by
the next notifier. So if hw_breakpoint_handler is responsible for setting
vdr6 based on its args->err value, we should win.
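
The hand-off between the two notifiers can be modelled in user space
roughly as follows. This is a sketch under stated assumptions, not
kernel code: struct die_args_model, the *_model functions, and the
NOTIFY_* values are simplified placeholders, and the chain is reduced
to a fixed call order with kprobes first.

```c
#include <assert.h>
#include <stddef.h>

#define DR_TRAP0 0x1UL
#define DR_STEP  0x4000UL

/* Placeholder return codes; not the kernel's numeric values. */
#define NOTIFY_DONE 0
#define NOTIFY_STOP 1

/* Simplified stand-in for the kernel's struct die_args. */
struct die_args_model {
	unsigned long err;	/* DR6 image passed along the chain */
};

/* Models the kprobes notifier handling a single-step it owns. */
static int kprobes_model(struct die_args_model *args)
{
	args->err &= ~DR_STEP;
	if (args->err == 0)
		return NOTIFY_STOP;
	return NOTIFY_DONE;
}

/* Models hw_breakpoint's notifier: vdr6 is set from what is left. */
static int hw_breakpoint_model(struct die_args_model *args,
			       unsigned long *vdr6)
{
	*vdr6 = args->err;
	return NOTIFY_DONE;
}

/* Runs both notifiers in priority order for one debug exception. */
static int run_chain(unsigned long dr6, unsigned long *vdr6)
{
	struct die_args_model args = { .err = dr6 };

	assert(vdr6 != NULL);
	if (kprobes_model(&args) == NOTIFY_STOP)
		return NOTIFY_STOP;
	return hw_breakpoint_model(&args, vdr6);
}
```

With a pure single-step the chain stops before vdr6 is touched; with
DR_STEP and DR_TRAP0 both set, only DR_TRAP0 reaches vdr6.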

> > vdr6 belongs wholly to hw_breakpoint, no other code refers to it
> > directly. hw_breakpoint's notifier sets vdr6 with non-DR_TRAPn bits,
> > if it's a user-mode exception. If it's a ptrace exception it also
> > sets the mapped DR_TRAPn bits. If it's not a ptrace exception and
> > only DR_TRAPn bits were newly set, then it returns NOTIFY_STOP. If
> > it's a spurious exception from lazy db7 setting, hw_breakpoint just
> > returns NOTIFY_STOP early.
>
> That sounds not quite right. To a user-space debugger, a system call
> should appear as an atomic operation. If multiple ptrace exceptions
> occur during a system call, all the relevant DR_TRAPn bits should be
> set in vdr6 together and all the other ones reset. How can we arrange
> that?

That would be nice. But it's more than the old code did. I don't feel any
strong need to improve the situation when using ptrace. The old code
disabled breakpoints after the first hit, so userland would only see the
first DR_TRAPn bit. (Even if it didn't, with the blind copying of the
hardware %db6 value, we now know it would only see one DR_TRAPn bit still
set after a second exception.) With my suggestion above, userland would
only see the last DR_TRAPn bit. So it's not worse.

In the ptrace case, we know it's always going to wind up with a signal
before it finishes and returns to user mode. So one approach would be in
e.g. do_notify_resume, do:

if (thread_info_flags & _TIF_DEBUG)
current->thread.hw_breakpoint_info->someflag = 0;

Then ptrace_triggered could set someflag, and know from it still being set
on entry that it's a second trigger without getting back to user mode yet
(and so accumulate bits instead of resetting old ones).
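
A rough user-space model of that flag, assuming up to four breakpoint
slots as on x86. The struct and function names are hypothetical, and
"someflag" is kept as the placeholder name used above.

```c
#include <assert.h>

#define DR_TRAP0     0x1UL
#define DR_TRAP_BITS 0xfUL	/* DR_TRAP0 through DR_TRAP3 */

/* Hypothetical model of the per-thread state discussed above. */
struct thbi_model {
	unsigned long vdr6;	/* virtualized DR6 seen through ptrace */
	int someflag;		/* set by a trigger, cleared on resume */
};

/* Models ptrace_triggered for breakpoint slot i (0..3). */
static void ptrace_triggered_model(struct thbi_model *thbi, int i)
{
	assert(i >= 0 && i < 4);
	if (!thbi->someflag)
		thbi->vdr6 &= ~DR_TRAP_BITS;	/* first trigger: drop stale bits */
	thbi->vdr6 |= DR_TRAP0 << i;		/* later triggers accumulate */
	thbi->someflag = 1;
}

/* Models the do_notify_resume hook on the way back to user mode. */
static void notify_resume_model(struct thbi_model *thbi)
{
	thbi->someflag = 0;
}
```

Two triggers before a resume leave both DR_TRAPn bits set; a trigger
after a resume starts again from a clean set of trap bits.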

But I just would not bother improving ptrace beyond the status quo for a
corner case no one has cared about in practice so far. In sensible
mechanisms of the future, nothing will examine db6 values directly.

> There's also the question of whether to send the SIGTRAP. If
> extraneous bits are set in DR6 (e.g., because the CPU always sets some
> extra bits) then we will never get NOTIFY_STOP. Nevertheless, the
> signal should not always be sent.

Yeah. The current Intel manual describes all the unspecified DR6 bits as
explicitly reserved and set to 1 (except 0x1000 reserved and 0). If new
meanings are assigned in future chips, presumably those will only be
enabled by some new explicit cr/msr setting. Those might be enabled by
some extra module or something, but there is only so much we can do to
accommodate. I think the best plan is that notifiers should do:

args->err &= ~bits_i_recognize_as_mine;
if (!(args->err & known_bits))
return NOTIFY_STOP;

known_bits are the ones we use, plus 0x8000 (DR_SWITCH/BT) and 0x2000 (BD).
(Those two should be impossible without some strange new kernel bug.)
Probably should write it as ~DR_STATUS_RESERVED, to parallel existing macros.
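
One way the mask could be spelled out, as a sketch. DR_STEP and
DR_SWITCH use the values from the existing debugreg macros; DR_BD is a
hypothetical name for 0x2000, and this definition of DR_STATUS_RESERVED
is an assumption about what the macro would contain. The helper
should_stop() is likewise illustrative.

```c
#include <assert.h>
#include <stddef.h>

#define DR_TRAP_BITS 0xfUL	/* DR_TRAP0 through DR_TRAP3 */
#define DR_BD        0x2000UL	/* hypothetical name for the BD bit */
#define DR_STEP      0x4000UL
#define DR_SWITCH    0x8000UL

/* Assumed definition: every DR6 bit without a known meaning. */
#define DR_STATUS_RESERVED \
	(~(DR_TRAP_BITS | DR_BD | DR_STEP | DR_SWITCH))

/*
 * Clears the caller's own bits from the DR6 image and reports whether
 * the notifier should return NOTIFY_STOP, i.e. whether no known bit
 * is left for anyone else.
 */
static int should_stop(unsigned long *err, unsigned long mine)
{
	assert(err != NULL);
	*err &= ~mine;
	return !(*err & ~DR_STATUS_RESERVED);
}
```

Reserved bits that the CPU reports as 1 no longer prevent a stop, while
a leftover DR_STEP still lets the exception fall through to do_debug.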

Then we only possibly interfere with a newfangled debug exception flavor
that occurs in the same one debug exception for an instruction also
triggering for hw_breakpoint or step. In the more likely cases of a new
flavor of exception happening by itself, or the aforementioned strange new
kernel bugs, we will get to the bottom of do_debug and do the SIGTRAP.

For this plan, hw_breakpoint_handler also needs not to return NOTIFY_STOP
as a special case for a ptrace trigger.

> I disagree. kfree() is documented to return harmlessly when passed a
> NULL pointer, and lots of places in the kernel have been changed to
> remove useless tests for NULL before calls to kfree(). This is just
> another example.

Ok. I have no special opinions about that. I just tend to avoid folding
miscellaneous changes into a patch adding new code. It would be better
form to send first the trivial cleanup patch removing that second condition.


Thanks,
Roland

2007-06-28 03:03:09

by Roland McGrath

Subject: Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)

I did the first crack at a powerpc port. I'd appreciate your comments on
this patch. It should not be incorporated, isn't finished, probably breaks
ptrace, etc. I'm posting it now just to get any thoughts you have raised
by seeing the second machine share the intended arch-independent code.

I just translated your implementation to powerpc terms without thinking
about it much. If you see anything that you aren't sure is right, please
tell me and don't presume there is some powerpc-specific reason it's
different. More likely I just didn't think it through.

In the first battle just to make it compile, the only issue was that you
assume every machine has TIF_DEBUG, which is in fact an implementation
detail chosen lately by i386 and x86_64. AFAIK the only reason for it
there is just to make a cheap test of multiple bits in the hot path
deciding to call __switch_to_xtra. Do you rely on it meaning something
more precise than just being a shorthand for hw_breakpoint_info!=NULL?

Incidentally, I think it would be nice if kernel/hw_breakpoint.c itself had
all the #include's for everything it uses directly. arch hw_breakpoint.c
files probably only need <asm/hw_breakpoint.h> and one or two others to
define what they need before #include "../../../kernel/hw_breakpoint.c".

The num_installed/num_kbps stuff feels a little hokey when it's really a
flag because the maximum number is one. It seems like I could make it
tighter with some more finesse in the arch-specific hook options, so that
chbi and thbi each just store dabr, dabr!=0 means "mine gets installed",
and the switch in is just chbi->dabr?:thbi->dabr or something like that.
As we get more machines, more cleanups along these lines will probably make
sense. (Also, before the next person other than you or me tries a port, the
generic hw_breakpoint.c could use some comments at the top making
explicit what the arch file is expected to define in its types, etc.)
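
The "chbi->dabr ?: thbi->dabr" idea relies on the GNU conditional with
an omitted middle operand. A sketch in standard C, assuming that a
nonzero kernel value takes precedence and that zero means "nothing to
install"; select_dabr() is a hypothetical helper name:

```c
#include <assert.h>

/*
 * Picks the DABR value to switch in: the kernel's breakpoint if one
 * is set, otherwise the thread's, otherwise 0 (no breakpoint).
 */
static unsigned long select_dabr(unsigned long kernel_dabr,
				 unsigned long thread_dabr)
{
	return kernel_dabr ? kernel_dabr : thread_dabr;
}
```

With this form no separate num_installed or num_kbps counter is needed
to decide what gets loaded.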

With just the included change to the generic code for the TIF_DEBUG, this
kind of works. That is, it doesn't break everything else and I can use
bptest, sort of. I didn't even try ptrace, I probably broke that.

It works enough to make clear the main new wrinkle. On powerpc, the data
breakpoint exception is a fault before the instruction executes, not a trap
after it. The load/store will not complete until the breakpoint is cleared.
With this patch, you can use bptest to generate a tight loop of bp0 triggers.

For ptrace compatibility, userland already expects to deal with this. gdb
has it as per-machine implementation options how ptrace watchpoints behave,
and for powerpc it knows to remove the watchpoint, step, and reinsert it.

One approach for hw_breakpoint is just to expose in asm/hw_breakpoint.h
some standard macros saying how things behave, and caveat emptor. But I
don't like that much. I think things will just wind up being confused and
inadvertently unportable if the important semantics vary too much between
machines. The point of the whole facility is to make watchpoints easy to
use, after all.

Some uses might be happy with trigger-before, but I don't see much benefit.
For writing, the trigger function can look at the memory before it's
changed. But you could just as well have recorded the old value before
setting the breakpoint, as you have to for trigger-after--and to see both
old and new values you then need to single-step to get the new value, which
trigger-after handles with a single exception. For reading, the trigger
function can change the memory before it's read. But likewise, you could
just as well have changed it before setting the breakpoint--you know no one
will have read the new value until your trigger anyway. (I have never used
a read-triggered breakpoint, so I'm rather vague on those use scenarios.)

The third machine whose manual I have handy is ia64. It has instruction
and data breakpoints that are both trigger-before. It has processor flags
similar to x86's RF for both, to ignore one or both breakpoint flavors for
one instruction. That makes it cheap to continue past the breakpoint since
you don't have to clear and reset it. But for getting new values from
data-write breakpoints, it still requires a single-step and second stop,
like powerpc. (Incidentally ia64 has another interesting feature, which I
think the generic code accommodates nicely as an upward-compatible addition
just by changing the len arg in the register and arch_* calls to unsigned long,
and adding an arch_validate_len that can short-circuit the generic length
and alignment check.)

So, I'd like your thoughts on the whole situation. The starting point we
can do without anything else is:

int hw_breakpoint_triggers_before(struct hw_breakpoint *);
int hw_breakpoint_can_resume(struct hw_breakpoint *);

or perhaps taking (unsigned int type) instead, in <asm-cpu/hw_breakpoint.h>.
i.e. for x86:

#define hw_breakpoint_triggers_before(type) ((type) == HW_BREAKPOINT_EXECUTE)
#define hw_breakpoint_can_resume(type) 1

and powerpc:

#define hw_breakpoint_triggers_before(any) 1
#define hw_breakpoint_can_resume(any) 0


For powerpc at least (and I figure for ia64 too) it seems easy enough to
implement disable-step-enable to turn it into trigger-after. But it is
costly and hairy if one doesn't care. So now I'm thinking to somewhat
follow the kprobes model, and have pre and post trigger handler options.
i.e.

int hw_breakpoint_pre_handle_type(unsigned type);
int hw_breakpoint_post_handle_type(unsigned type);

and in struct hw_breakpoint (replacing trigger):

int (*pre_handler)(struct hw_breakpoint *, struct pt_regs *);
void (*post_handler)(struct hw_breakpoint *, struct pt_regs *);

The pre_handler returns zero if it wants the post_handler to run. On x86,
register would return -EINVAL if pre_handler is not NULL and type is not
EXECUTE (i.e. pre_handle_type returns false). It also fails if
post_handler is not NULL and post_handle_type returns false, meaning the
arch code doesn't want to deal with step-over-and-trigger.

We'd still want hw_breakpoint_can_resume to tell whether you can return
from a pre_handler and continue with no post_handler, without needing to
unregister the breakpoint. That's true on ia64, while on powerpc you
either have to clear the breakpoint or request the post_handler stepping logic.
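
The registration check described above might look roughly like this for
an x86-like machine. It is a user-space sketch: enum bp_type, struct
bp_model, and validate_handlers() are hypothetical, and the policy in
post_handle_type() (post handlers for data breakpoints only) is an
assumption, since the proposal leaves that choice to the arch code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical type codes; the real encodings are arch-specific. */
enum bp_type { BP_EXECUTE, BP_WRITE, BP_RW };

struct pt_regs_model;	/* opaque stand-in for struct pt_regs */

struct bp_model {
	enum bp_type type;
	int (*pre_handler)(struct bp_model *, struct pt_regs_model *);
	void (*post_handler)(struct bp_model *, struct pt_regs_model *);
};

/* x86-like: only execute breakpoints trigger before the instruction. */
static int pre_handle_type(enum bp_type type)
{
	return type == BP_EXECUTE;
}

/* Assumed policy: post handlers only for data breakpoints. */
static int post_handle_type(enum bp_type type)
{
	return type != BP_EXECUTE;
}

/* Rejects handler combinations the arch code cannot honor. */
static int validate_handlers(const struct bp_model *bp)
{
	assert(bp != NULL);
	if (bp->pre_handler && !pre_handle_type(bp->type))
		return -EINVAL;
	if (bp->post_handler && !post_handle_type(bp->type))
		return -EINVAL;
	return 0;
}

/* Trivial handlers for exercising the check. */
static int sample_pre(struct bp_model *bp, struct pt_regs_model *regs)
{
	(void)bp;
	(void)regs;
	return 0;	/* zero asks for the post_handler to run */
}

static void sample_post(struct bp_model *bp, struct pt_regs_model *regs)
{
	(void)bp;
	(void)regs;
}
```

A powerpc-like machine would swap in predicates that accept a
pre_handler for every type and accept a post_handler only if the arch
code implements the disable-step-enable sequence.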


Thanks,
Roland


---
arch/powerpc/kernel/Makefile | 2
arch/powerpc/kernel/hw_breakpoint.c | 348 ++++++++++++++++++++++++++++++++++++
arch/powerpc/kernel/process.c | 14 -
arch/powerpc/kernel/ptrace-common.h | 16 -
arch/powerpc/kernel/ptrace.c | 2
arch/powerpc/kernel/ptrace32.c | 2
arch/powerpc/kernel/signal_32.c | 10 -
arch/powerpc/kernel/signal_64.c | 8
arch/powerpc/mm/fault.c | 19 -
include/asm-powerpc/hw_breakpoint.h | 49 +++++
include/asm-powerpc/processor.h | 2
kernel/hw_breakpoint.c | 22 +-
12 files changed, 438 insertions(+), 56 deletions(-)

Index: b/include/asm-powerpc/hw_breakpoint.h
===================================================================
--- /dev/null
+++ b/include/asm-powerpc/hw_breakpoint.h
@@ -0,0 +1,49 @@
+#ifndef _ASM_POWERPC_HW_BREAKPOINT_H
+#define _ASM_POWERPC_HW_BREAKPOINT_H
+
+/*
+ * The only available size of data breakpoint is 8.
+ */
+#define HW_BREAKPOINT_LEN_8 0x0d00dbe8
+
+/*
+ * Available HW breakpoint type encodings.
+ */
+#define HW_BREAKPOINT_READ 0x0dab0005 /* trigger on memory read */
+#define HW_BREAKPOINT_WRITE 0x0dab0006 /* trigger on memory write */
+#define HW_BREAKPOINT_RW 0x0dab0007 /* ... on read or write */
+
+
+struct arch_hw_breakpoint {
+ /*
+ * High bits are aligned address, low 3 bits are flags.
+ */
+ unsigned long dabr;
+};
+
+#define __ARCH_HW_BREAKPOINT_H
+#include <asm-generic/hw_breakpoint.h>
+
+/* HW breakpoint accessor routines */
+static inline const void *hw_breakpoint_get_kaddress(struct hw_breakpoint *bp)
+{
+ return (const void *) (bp->info.dabr &~ 7UL);
+}
+
+static inline const void __user *hw_breakpoint_get_uaddress(
+ struct hw_breakpoint *bp)
+{
+ return (const void __user *) (bp->info.dabr &~ 7UL);
+}
+
+static inline unsigned hw_breakpoint_get_len(struct hw_breakpoint *bp)
+{
+ return HW_BREAKPOINT_LEN_8;
+}
+
+static inline unsigned hw_breakpoint_get_type(struct hw_breakpoint *bp)
+{
+ return (bp->info.dabr & 7UL) | 0x0dab0000;
+}
+
+#endif /* _ASM_POWERPC_HW_BREAKPOINT_H */
Index: b/arch/powerpc/kernel/Makefile
===================================================================
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -75,6 +75,8 @@ obj-$(CONFIG_KEXEC) += machine_kexec.o
obj-$(CONFIG_AUDIT) += audit.o
obj64-$(CONFIG_AUDIT) += compat_audit.o

+obj64-y += hw_breakpoint.o
+
ifneq ($(CONFIG_PPC_INDIRECT_IO),y)
obj-y += iomap.o
endif
Index: b/arch/powerpc/kernel/hw_breakpoint.c
===================================================================
--- /dev/null
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -0,0 +1,348 @@
+/*
+ * HW_breakpoint: a unified kernel/user-space hardware breakpoint facility,
+ * using the CPU's debug registers.
+ */
+
+#include <linux/init.h>
+#include <linux/irqflags.h>
+#include <linux/kdebug.h>
+#include <linux/kernel.h>
+#include <linux/kprobes.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/notifier.h>
+#include <linux/rcupdate.h>
+#include <linux/sched.h>
+#include <linux/smp.h>
+
+#include <asm/hw_breakpoint.h>
+
+#define HB_NUM 1
+
+/* Per-thread HW breakpoint and debug register info */
+struct thread_hw_breakpoint {
+
+ struct list_head node; /* Entry in thread list */
+ struct list_head thread_bps; /* Thread's breakpoints */
+ struct hw_breakpoint *bps[HB_NUM]; /* Highest-priority bps */
+ int num_installed; /* Number of installed bps */
+ unsigned gennum; /* update-generation number */
+
+ struct hw_breakpoint ptrace_bp;
+
+ unsigned long dabr; /* Value switched in */
+};
+
+/* Kernel-space breakpoint data */
+struct kernel_bp_data {
+ unsigned gennum; /* Generation number */
+ struct hw_breakpoint *bps[HB_NUM]; /* Loaded breakpoint */
+ int num_kbps;
+};
+
+/* Per-CPU debug register info */
+struct cpu_hw_breakpoint {
+ struct kernel_bp_data *cur_kbpdata; /* Current kbpdata[] entry */
+ struct task_struct *bp_task; /* The thread whose bps
+ are currently loaded in the debug registers */
+};
+
+static DEFINE_PER_CPU(struct cpu_hw_breakpoint, cpu_info);
+
+/* Global info */
+static struct kernel_bp_data kbpdata[2]; /* Old and new settings */
+static int cur_kbpindex; /* Alternates 0, 1, ... */
+static struct kernel_bp_data *cur_kbpdata = &kbpdata[0];
+ /* Always equal to &kbpdata[cur_kbpindex] */
+
+static u8 tprio[HB_NUM]; /* Thread bp max priorities */
+static LIST_HEAD(kernel_bps); /* Kernel breakpoint list */
+static LIST_HEAD(thread_list); /* thread_hw_breakpoint list */
+static DEFINE_MUTEX(hw_breakpoint_mutex); /* Protects everything */
+
+/* Arch-specific hook routines */
+
+
+/*
+ * Install the kernel breakpoints in their debug registers.
+ */
+static void arch_install_chbi(struct cpu_hw_breakpoint *chbi)
+{
+ /*
+ * Don't allow debug exceptions while we update the DABR.
+ */
+ set_dabr(0);
+
+ chbi->cur_kbpdata = rcu_dereference(cur_kbpdata);
+
+ if (chbi->cur_kbpdata->num_kbps)
+ set_dabr(chbi->cur_kbpdata->bps[0]->info.dabr);
+}
+
+/*
+ * Update an out-of-date thread hw_breakpoint info structure.
+ */
+static void arch_update_thbi(struct thread_hw_breakpoint *thbi,
+ struct kernel_bp_data *thr_kbpdata)
+{
+}
+
+/*
+ * Install the thread breakpoints in their debug registers.
+ */
+static void arch_install_thbi(struct thread_hw_breakpoint *thbi)
+{
+ if (thbi->dabr)
+ set_dabr(thbi->dabr);
+}
+
+/*
+ * Install the debug register values for just the kernel, no thread.
+ */
+static void arch_install_none(struct cpu_hw_breakpoint *chbi)
+{
+}
+
+/*
+ * Create a new kbpdata entry.
+ */
+static void arch_new_kbpdata(struct kernel_bp_data *new_kbpdata)
+{
+}
+
+/*
+ * Store a thread breakpoint array entry's address
+ */
+static void arch_store_thread_bp_array(struct thread_hw_breakpoint *thbi,
+ struct hw_breakpoint *bp, int i)
+{
+ thbi->dabr = bp->info.dabr;
+}
+
+#define TASK_SIZE_OF(tsk) \
+ (test_tsk_thread_flag(tsk, TIF_32BIT) \
+ ? TASK_SIZE_USER32 : TASK_SIZE_USER64)
+
+/*
+ * Check for virtual address in user space.
+ */
+static int arch_check_va_in_userspace(unsigned long va, struct task_struct *tsk)
+{
+ return va < TASK_SIZE_OF(tsk);
+}
+
+/*
+ * Check for virtual address in kernel space.
+ */
+static int arch_check_va_in_kernelspace(unsigned long va)
+{
+ return va >= TASK_SIZE_USER64;
+}
+
+/*
+ * Store a breakpoint's encoded address, length, and type.
+ */
+static void arch_store_info(struct hw_breakpoint *bp,
+ unsigned long address, unsigned len, unsigned type)
+{
+ BUG_ON(address & 7UL);
+ BUG_ON(!(type & DABR_TRANSLATION));
+ bp->info.dabr = address | (type & 7UL);
+}
+
+
+/*
+ * Register a new user breakpoint structure.
+ */
+static void arch_register_user_hw_breakpoint(struct hw_breakpoint *bp,
+ struct thread_hw_breakpoint *thbi)
+{
+}
+
+/*
+ * Unregister a user breakpoint structure.
+ */
+static void arch_unregister_user_hw_breakpoint(struct hw_breakpoint *bp,
+ struct thread_hw_breakpoint *thbi)
+{
+}
+
+/*
+ * Register a kernel breakpoint structure.
+ */
+static void arch_register_kernel_hw_breakpoint(struct hw_breakpoint *bp)
+{
+}
+
+/*
+ * Unregister a kernel breakpoint structure.
+ */
+static void arch_unregister_kernel_hw_breakpoint(struct hw_breakpoint *bp)
+{
+}
+
+
+/* End of arch-specific hook routines */
+
+
+/*
+ * Ptrace support: breakpoint trigger routine.
+ */
+
+static struct thread_hw_breakpoint *alloc_thread_hw_breakpoint(
+ struct task_struct *tsk);
+static int __register_user_hw_breakpoint(struct task_struct *tsk,
+ struct hw_breakpoint *bp,
+ unsigned long address,
+ unsigned len, unsigned type);
+static void __unregister_user_hw_breakpoint(struct task_struct *tsk,
+ struct hw_breakpoint *bp);
+
+/*
+ * This is a placeholder that never gets called.
+ */
+static void ptrace_triggered(struct hw_breakpoint *bp, struct pt_regs *regs)
+{
+ BUG();
+}
+
+unsigned long thread_get_dabr(struct task_struct *tsk)
+{
+ if (tsk->thread.hw_breakpoint_info)
+ return tsk->thread.hw_breakpoint_info->dabr;
+ return 0;
+}
+
+int thread_set_dabr(struct task_struct *tsk, unsigned long val)
+{
+ unsigned long addr = val &~ 7UL;
+ unsigned int type = 0x0dab0000 | (val & 7UL);
+
+ struct thread_hw_breakpoint *thbi;
+ int rc = -EIO;
+
+ /* We have to hold this lock the entire time, to prevent thbi
+ * from being deallocated out from under us.
+ */
+ mutex_lock(&hw_breakpoint_mutex);
+
+ if (!tsk->thread.hw_breakpoint_info && val == 0)
+ rc = 0; /* Minor optimization */
+ else if ((thbi = alloc_thread_hw_breakpoint(tsk)) == NULL)
+ rc = -ENOMEM;
+ else {
+ struct hw_breakpoint *bp = &thbi->ptrace_bp;
+
+ /*
+ * If the breakpoint is registered then unregister it,
+ * change it, and re-register it. Revert to the original
+ * address if an error occurs.
+ */
+ if (bp->status) {
+ unsigned long old_dabr = bp->info.dabr;
+
+ __unregister_user_hw_breakpoint(tsk, bp);
+ if (val != 0) {
+ rc = __register_user_hw_breakpoint(
+ tsk, bp, addr,
+ HW_BREAKPOINT_LEN_8, type);
+ if (rc < 0)
+ __register_user_hw_breakpoint(
+ tsk, bp,
+ old_dabr &~ 7UL,
+ HW_BREAKPOINT_LEN_8,
+ 0x0dab0000 | (old_dabr & 7UL));
+ }
+ } else if (val != 0) {
+ bp->triggered = ptrace_triggered;
+ bp->priority = HW_BREAKPOINT_PRIO_PTRACE;
+ rc = __register_user_hw_breakpoint(
+ tsk, bp, addr,
+ HW_BREAKPOINT_LEN_8, type);
+ }
+ }
+
+ mutex_unlock(&hw_breakpoint_mutex);
+ return rc;
+}
+
+
+/*
+ * Handle debug exception notifications.
+ */
+
+static void switch_to_none_hw_breakpoint(void);
+
+static int hw_breakpoint_handler(struct die_args *args)
+{
+ struct cpu_hw_breakpoint *chbi;
+ struct hw_breakpoint *bp;
+ struct thread_hw_breakpoint *thbi = NULL;
+ int ret;
+
+ /* Assert that local interrupts are disabled */
+
+ /* Are we a victim of lazy debug-register switching? */
+ chbi = &per_cpu(cpu_info, get_cpu());
+ if (!chbi->bp_task)
+ ;
+ else if (chbi->bp_task != current) {
+
+ /* No user breakpoints are valid. Perform the belated
+ * debug-register switch.
+ */
+ switch_to_none_hw_breakpoint();
+ } else {
+ thbi = chbi->bp_task->thread.hw_breakpoint_info;
+ }
+
+ /*
+ * Disable all breakpoints so that the callbacks can run without
+ * triggering recursive debug exceptions.
+ */
+ set_dabr(0);
+
+ bp = chbi->cur_kbpdata->bps[0] ?: thbi->bps[0];
+ ret = NOTIFY_STOP;
+ if (bp == &thbi->ptrace_bp)
+ ret = NOTIFY_DONE;
+ else
+ (*bp->triggered)(bp, args->regs);
+
+ /* Re-enable the breakpoints */
+ set_dabr(thbi ? thbi->dabr : chbi->cur_kbpdata->bps[0]->info.dabr);
+ put_cpu_no_resched();
+
+ return ret;
+}
+
+/*
+ * Handle debug exception notifications.
+ */
+static int hw_breakpoint_exceptions_notify(
+ struct notifier_block *unused, unsigned long val, void *data)
+{
+ if (val != DIE_DABR_MATCH)
+ return NOTIFY_DONE;
+ return hw_breakpoint_handler(data);
+}
+
+static struct notifier_block hw_breakpoint_exceptions_nb = {
+ .notifier_call = hw_breakpoint_exceptions_notify
+};
+
+void load_debug_registers(void);
+
+static int __init init_hw_breakpoint(void)
+{
+ printk(KERN_EMERG "hw_breakpoint initializing\n");
+ load_debug_registers();
+ return register_die_notifier(&hw_breakpoint_exceptions_nb);
+}
+
+core_initcall(init_hw_breakpoint);
+
+
+/* Grab the arch-independent code */
+
+#include "../../../kernel/hw_breakpoint.c"
Index: b/arch/powerpc/kernel/ptrace-common.h
===================================================================
--- a/arch/powerpc/kernel/ptrace-common.h
+++ b/arch/powerpc/kernel/ptrace-common.h
@@ -139,6 +139,10 @@ static inline int set_vrregs(struct task
}
#endif

+#ifdef CONFIG_PPC64
+unsigned long thread_get_dabr(struct task_struct *tsk);
+int thread_set_dabr(struct task_struct *tsk, unsigned long val);
+
static inline int ptrace_set_debugreg(struct task_struct *task,
unsigned long addr, unsigned long data)
{
@@ -146,16 +150,8 @@ static inline int ptrace_set_debugreg(st
if (addr > 0)
return -EINVAL;

- /* The bottom 3 bits are flags */
- if ((data & ~0x7UL) >= TASK_SIZE)
- return -EIO;
-
- /* Ensure translation is on */
- if (data && !(data & DABR_TRANSLATION))
- return -EIO;
-
- task->thread.dabr = data;
- return 0;
+ return thread_set_dabr(task, data);
}
+#endif

#endif /* _PPC64_PTRACE_COMMON_H */
Index: b/arch/powerpc/kernel/ptrace.c
===================================================================
--- a/arch/powerpc/kernel/ptrace.c
+++ b/arch/powerpc/kernel/ptrace.c
@@ -390,7 +390,7 @@ long arch_ptrace(struct task_struct *chi
/* We only support one DABR and no IABRS at the moment */
if (addr > 0)
break;
- ret = put_user(child->thread.dabr,
+ ret = put_user(thread_get_dabr(child),
(unsigned long __user *)data);
break;
}
Index: b/arch/powerpc/kernel/ptrace32.c
===================================================================
--- a/arch/powerpc/kernel/ptrace32.c
+++ b/arch/powerpc/kernel/ptrace32.c
@@ -330,7 +330,7 @@ long compat_sys_ptrace(int request, int
/* We only support one DABR and no IABRS at the moment */
if (addr > 0)
break;
- ret = put_user(child->thread.dabr, (u32 __user *)data);
+ ret = put_user((u32)thread_get_dabr(child), (u32 __user *)data);
break;
}

Index: b/arch/powerpc/kernel/signal_32.c
===================================================================
--- a/arch/powerpc/kernel/signal_32.c
+++ b/arch/powerpc/kernel/signal_32.c
@@ -1197,16 +1197,6 @@ no_signal:
newsp = regs->gpr[1];
newsp &= ~0xfUL;

-#ifdef CONFIG_PPC64
- /*
- * Reenable the DABR before delivering the signal to
- * user space. The DABR will have been cleared if it
- * triggered inside the kernel.
- */
- if (current->thread.dabr)
- set_dabr(current->thread.dabr);
-#endif
-
/* Whee! Actually deliver the signal. */
if (ka.sa.sa_flags & SA_SIGINFO)
ret = handle_rt_signal(signr, &ka, &info, oldset, regs, newsp);
Index: b/arch/powerpc/kernel/signal_64.c
===================================================================
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -529,14 +529,6 @@ int do_signal(sigset_t *oldset, struct p
if (TRAP(regs) == 0x0C00)
syscall_restart(regs, &ka);

- /*
- * Reenable the DABR before delivering the signal to
- * user space. The DABR will have been cleared if it
- * triggered inside the kernel.
- */
- if (current->thread.dabr)
- set_dabr(current->thread.dabr);
-
ret = handle_signal(signr, &ka, &info, oldset, regs);

/* If a signal was successfully delivered, the saved sigmask is in
Index: b/arch/powerpc/mm/fault.c
===================================================================
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -113,9 +113,6 @@ static void do_dabr(struct pt_regs *regs
if (debugger_dabr_match(regs))
return;

- /* Clear the DABR */
- set_dabr(0);
-
/* Deliver the signal to userspace */
info.si_signo = SIGTRAP;
info.si_errno = 0;
@@ -164,6 +161,14 @@ int __kprobes do_page_fault(struct pt_re
is_write = error_code & ESR_DST;
#endif /* CONFIG_4xx || CONFIG_BOOKE */

+#if !(defined(CONFIG_4xx) || defined(CONFIG_BOOKE))
+ if (error_code & DSISR_DABRMATCH) {
+ /* DABR match */
+ do_dabr(regs, address, error_code);
+ return 0;
+ }
+#endif /* !(CONFIG_4xx || CONFIG_BOOKE)*/
+
if (notify_page_fault(regs))
return 0;

@@ -176,14 +181,6 @@ int __kprobes do_page_fault(struct pt_re
if (!user_mode(regs) && (address >= TASK_SIZE))
return SIGSEGV;

-#if !(defined(CONFIG_4xx) || defined(CONFIG_BOOKE))
- if (error_code & DSISR_DABRMATCH) {
- /* DABR match */
- do_dabr(regs, address, error_code);
- return 0;
- }
-#endif /* !(CONFIG_4xx || CONFIG_BOOKE)*/
-
if (in_atomic() || mm == NULL) {
if (!user_mode(regs))
return SIGSEGV;
Index: b/include/asm-powerpc/processor.h
===================================================================
--- a/include/asm-powerpc/processor.h
+++ b/include/asm-powerpc/processor.h
@@ -149,8 +149,8 @@ struct thread_struct {
#ifdef CONFIG_PPC64
unsigned long start_tb; /* Start purr when proc switched in */
unsigned long accum_tb; /* Total accumilated purr for process */
+ struct thread_hw_breakpoint *hw_breakpoint_info;
#endif
- unsigned long dabr; /* Data address breakpoint register */
#ifdef CONFIG_ALTIVEC
/* Complete AltiVec register set */
vector128 vr[32] __attribute((aligned(16)));
Index: b/arch/powerpc/kernel/process.c
===================================================================
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -46,6 +46,7 @@
#include <asm/syscalls.h>
#ifdef CONFIG_PPC64
#include <asm/firmware.h>
+#include <asm/hw_breakpoint.h>
#endif

extern unsigned long _get_SP(void);
@@ -232,7 +233,6 @@ int set_dabr(unsigned long dabr)

#ifdef CONFIG_PPC64
DEFINE_PER_CPU(struct cpu_usage, cpu_usage_array);
-static DEFINE_PER_CPU(unsigned long, current_dabr);
#endif

struct task_struct *__switch_to(struct task_struct *prev,
@@ -300,10 +300,8 @@ struct task_struct *__switch_to(struct t
#endif /* CONFIG_SMP */

#ifdef CONFIG_PPC64 /* for now */
- if (unlikely(__get_cpu_var(current_dabr) != new->thread.dabr)) {
- set_dabr(new->thread.dabr);
- __get_cpu_var(current_dabr) = new->thread.dabr;
- }
+ if (unlikely(new->thread.hw_breakpoint_info != NULL))
+ switch_to_thread_hw_breakpoint(new);
#endif /* CONFIG_PPC64 */

new_thread = &new->thread;
@@ -474,10 +472,8 @@ void flush_thread(void)
discard_lazy_cpu_state();

#ifdef CONFIG_PPC64 /* for now */
- if (current->thread.dabr) {
- current->thread.dabr = 0;
- set_dabr(0);
- }
+ if (unlikely(current->thread.hw_breakpoint_info))
+ flush_thread_hw_breakpoint(current);
#endif
}

Index: b/kernel/hw_breakpoint.c
===================================================================
--- a/kernel/hw_breakpoint.c
+++ b/kernel/hw_breakpoint.c
@@ -25,6 +25,18 @@
* #include'd by the arch-specific implementation.
*/

+#include <asm/thread_info.h>
+
+#ifdef TIF_DEBUG
+#define clear_tsk_debug_flag(tsk) clear_tsk_thread_flag(tsk, TIF_DEBUG)
+#define set_tsk_debug_flag(tsk) set_tsk_thread_flag(tsk, TIF_DEBUG)
+#define test_tsk_debug_flag(tsk) test_tsk_thread_flag(tsk, TIF_DEBUG)
+#else
+#define clear_tsk_debug_flag(tsk) do { } while (0)
+#define set_tsk_debug_flag(tsk) do { } while (0)
+#define test_tsk_debug_flag(tsk) \
+ ((tsk)->thread.hw_breakpoint_info != NULL)
+#endif

/*
* Install the debug register values for a new thread.
@@ -156,7 +168,7 @@ static void update_this_cpu(void *unused

/* Install both the kernel and the user breakpoints */
arch_install_chbi(chbi);
- if (test_tsk_thread_flag(tsk, TIF_DEBUG))
+ if (test_tsk_debug_flag(tsk))
switch_to_thread_hw_breakpoint(tsk);

put_cpu_no_resched();
@@ -369,7 +381,7 @@ void flush_thread_hw_breakpoint(struct t
list_del(&thbi->node);

/* The thread no longer has any breakpoints associated with it */
- clear_tsk_thread_flag(tsk, TIF_DEBUG);
+ clear_tsk_debug_flag(tsk);
tsk->thread.hw_breakpoint_info = NULL;
kfree(thbi);

@@ -393,7 +405,7 @@ int copy_thread_hw_breakpoint(struct tas
* and the child starts out with no debug registers set.
* But what about CLONE_PTRACE?
*/
- clear_tsk_thread_flag(child, TIF_DEBUG);
+ clear_tsk_debug_flag(child);
return 0;
}

@@ -457,7 +469,7 @@ static int insert_bp_in_list(struct hw_b

/* Is this the thread's first registered breakpoint? */
if (list_empty(&thbi->node)) {
- set_tsk_thread_flag(tsk, TIF_DEBUG);
+ set_tsk_debug_flag(tsk);
list_add(&thbi->node, &thread_list);
}
}
@@ -483,7 +495,7 @@ static void remove_bp_from_list(struct h

if (list_empty(&thbi->thread_bps)) {
list_del_init(&thbi->node);
- clear_tsk_thread_flag(tsk, TIF_DEBUG);
+ clear_tsk_debug_flag(tsk);
}
}

2007-06-29 03:00:32

by Alan Stern

Subject: Re: [RFC] hwbkpt: Hardware breakpoints (was Kwatch)

On Wed, 27 Jun 2007, Roland McGrath wrote:

> > In theory we should get an exception with both DR_STEP and DR_TRAPn
> > set, meaning that neither notifier will return NOTIFY_STOP. But if the
> > kprobes handler clears DR_STEP in the DR6 image passed to the
> > hw_breakpoint handler, it should work out better.
>
> It's since occurred to me that kprobes can and should do:
>
> args->err &= ~(unsigned long) DR_STEP;
> if (args->err == 0)
> return NOTIFY_STOP;
>
> This doesn't affect do_debug directly, but it will change the value seen by
> the next notifier. So if hw_breakpoint_handler is responsible for setting
> vdr6 based on its args->err value, we should win.

Exactly what I had in mind.
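
The hand-off between the two notifiers can be sketched in plain userspace C.
This is only an illustration: every sim_ name below is hypothetical, the
two-entry chain stands in for the real die-notifier chain, and the constants
merely mirror the x86 DR6 layout.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins; the values mirror the x86 DR6 layout. */
#define SIM_DR_TRAP0 0x0001UL   /* breakpoint 0 hit */
#define SIM_DR_STEP  0x4000UL   /* single-step */

enum sim_notify { SIM_NOTIFY_DONE, SIM_NOTIFY_STOP };

struct sim_die_args {
	unsigned long err;      /* DR6 image passed along the chain */
};

typedef enum sim_notify (*sim_handler)(struct sim_die_args *);

static unsigned long sim_vdr6;  /* what the later handler ends up seeing */

/* kprobes-like handler: consume the single-step bit it owns. */
static enum sim_notify sim_kprobes(struct sim_die_args *args)
{
	assert(args != NULL);
	args->err &= ~SIM_DR_STEP;
	return args->err == 0 ? SIM_NOTIFY_STOP : SIM_NOTIFY_DONE;
}

/* hw_breakpoint-like handler: record whatever bits are left. */
static enum sim_notify sim_hw_breakpoint(struct sim_die_args *args)
{
	assert(args != NULL);
	sim_vdr6 = args->err;
	return SIM_NOTIFY_DONE;
}

/* Walk the chain in order, stopping at the first SIM_NOTIFY_STOP. */
static enum sim_notify sim_run_chain(unsigned long dr6)
{
	static const sim_handler chain[] = { sim_kprobes, sim_hw_breakpoint };
	struct sim_die_args args = { .err = dr6 };
	size_t i;

	for (i = 0; i < sizeof(chain) / sizeof(chain[0]); i++)
		if (chain[i](&args) == SIM_NOTIFY_STOP)
			return SIM_NOTIFY_STOP;
	return SIM_NOTIFY_DONE;
}
```

With a pure single-step event the chain stops at the first handler; with
DR_STEP and DR_TRAP0 together, the second handler sees only DR_TRAP0.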

> > > vdr6 belongs wholly to hw_breakpoint, no other code refers to it
> > > directly. hw_breakpoint's notifier sets vdr6 with non-DR_TRAPn bits,
> > > if it's a user-mode exception. If it's a ptrace exception it also
> > > sets the mapped DR_TRAPn bits. If it's not a ptrace exception and
> > > only DR_TRAPn bits were newly set, then it returns NOTIFY_STOP. If
> > > it's a spurious exception from lazy db7 setting, hw_breakpoint just
> > > returns NOTIFY_STOP early.
> >
> > That sounds not quite right. To a user-space debugger, a system call
> > should appear as an atomic operation. If multiple ptrace exceptions
> > occur during a system call, all the relevant DR_TRAPn bits should be
> > set in vdr6 together and all the other ones reset. How can we arrange
> > that?
>
> That would be nice. But it's more than the old code did. I don't feel any
> strong need to improve the situation when using ptrace. The old code
> disabled breakpoints after the first hit, so userland would only see the
> first DR_TRAPn bit. (Even if it didn't, with the blind copying of the
> hardware %db6 value, we now know it would only see one DR_TRAPn bit still
> set after a second exception.) With my suggestion above, userland would
> only see the last DR_TRAPn bit. So it's not worse.
>
> In the ptrace case, we know it's always going to wind up with a signal
> before it finishes and returns to user mode. So one approach would be in
> e.g. do_notify_resume, do:
>
> if (thread_info_flags & _TIF_DEBUG)
> current->thread.hw_breakpoint_info->someflag = 0;
>
> Then ptrace_triggered could set someflag, and know from it still being set
> on entry that it's a second trigger without getting back to user mode yet
> (and so accumulate bits instead of resetting old ones).
>
> But I just would not bother improving ptrace beyond the status quo for a
> corner case no one has cared about in practice so far. In sensible
> mechanisms of the future, nothing will examine db6 values directly.

Come to think of it, I believe that gdb doesn't check beyond the first
DR_TRAPn bit it finds set. I can live with reporting only the last
hit.
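
For the record, the "someflag" idea quoted above might look roughly like
this. It is a userspace sketch under assumptions: the struct, the field
name hit_pending, and both functions are hypothetical stand-ins, not code
from the patch.

```c
#include <assert.h>

#define SIM_DR_TRAP_BITS 0xfUL  /* DR_TRAP0..DR_TRAP3 */

struct sim_thread_dbg {
	unsigned long vdr6;     /* virtualized DR6 seen by the debugger */
	int hit_pending;        /* plays the role of "someflag" */
};

/* Called on the way back to user mode (do_notify_resume in the text). */
static void sim_return_to_user(struct sim_thread_dbg *t)
{
	t->hit_pending = 0;
}

/* Called for each ptrace breakpoint hit; n is the debug register number. */
static void sim_ptrace_triggered(struct sim_thread_dbg *t, unsigned int n)
{
	assert(n < 4);
	if (!t->hit_pending)
		t->vdr6 &= ~SIM_DR_TRAP_BITS;  /* first hit: drop stale bits */
	t->vdr6 |= 1UL << n;                   /* later hits accumulate */
	t->hit_pending = 1;
}
```

Two hits within one system call leave both bits set; a hit after the next
return to user mode starts from a clean set of DR_TRAPn bits.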

> > There's also the question of whether to send the SIGTRAP. If
> > extraneous bits are set in DR6 (e.g., because the CPU always sets some
> > extra bits) then we will never get NOTIFY_STOP. Nevertheless, the
> > signal should not always be sent.
>
> Yeah. The current Intel manual describes all the unspecified DR6 bits as
> explicitly reserved and set to 1 (except 0x1000 reserved and 0). If new
> meanings are assigned in future chips, presumably those will only be
> enabled by some new explicit cr/msr setting. Those might be enabled by
> some extra module or something, but there is only so much we can do to
> accommodate. I think the best plan is that notifiers should do:
>
> args->err &= ~bits_i_recognize_as_mine;
> if (!(args->err & known_bits))
> return NOTIFY_STOP;
>
> known_bits are the ones we use, plus 0x8000 (DR_SWITCH/BT) and 0x2000 (BD).
> (Those two should be impossible without some strange new kernel bug.)
> Probably should write it as ~DR_STATUS_RESERVED, to parallel existing macros.
>
> Then we only possibly interfere with a newfangled debug exception flavor
> that occurs in the same one debug exception for an instruction also
> triggering for hw_breakpoint or step. In the more likely cases of a new
> flavor of exception happening by itself, or the aforementioned strange new
> kernel bugs, we will get to the bottom of do_debug and do the SIGTRAP.
>
> For this plan, hw_breakpoint_handler also needs a special case: it must
> not return NOTIFY_STOP for a ptrace trigger.
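
The masking rule quoted above can be tried out in isolation. The sketch
below is hypothetical userspace C: the sim_ names are made up, and the
known-bits mask is assumed to be the four trap bits plus BD, the
single-step bit, and DR_SWITCH.

```c
#include <assert.h>

#define SIM_DR_TRAP_BITS 0x000fUL   /* DR_TRAP0..DR_TRAP3 */
#define SIM_DR_BD        0x2000UL
#define SIM_DR_STEP      0x4000UL
#define SIM_DR_SWITCH    0x8000UL
#define SIM_KNOWN_BITS \
	(SIM_DR_TRAP_BITS | SIM_DR_BD | SIM_DR_STEP | SIM_DR_SWITCH)

enum sim_notify { SIM_NOTIFY_DONE, SIM_NOTIFY_STOP };

/*
 * Clear the bits this notifier owns from the shared DR6 image, then stop
 * the chain only if no other known bit is still pending.  Reserved bits
 * that the CPU always sets are ignored by the final test.
 */
static enum sim_notify sim_filter(unsigned long *err, unsigned long mine)
{
	assert((mine & ~SIM_KNOWN_BITS) == 0);
	*err &= ~mine;
	if (!(*err & SIM_KNOWN_BITS))
		return SIM_NOTIFY_STOP;
	return SIM_NOTIFY_DONE;
}
```

An image of 0xffff0ff1 (reserved bits set, plus DR_TRAP0) stops the chain
once the trap bits are claimed; adding the single-step bit does not.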

That should work well. But how does the handler know whether a ptrace
trigger occurred? I can think of several possible ways, none of them
very attractive. Simply checking the vdr6 value might not work. The
simplest approach would be to see if the trigger callback address is
equal to ptrace_triggered -- it's a hack but it is reliable.
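
To make the callback-address test concrete, here is a small userspace
sketch. The struct and function names are hypothetical; only the idea of
comparing the stored callback pointer against the ptrace handler comes
from the paragraph above.

```c
#include <assert.h>
#include <stddef.h>

struct sim_bp;
typedef void (*sim_trigger_fn)(struct sim_bp *bp);

struct sim_bp {
	sim_trigger_fn triggered;   /* callback installed at registration */
	int hits;
};

enum sim_notify { SIM_NOTIFY_DONE, SIM_NOTIFY_STOP };

/* Stand-in for the ptrace callback; identified purely by its address. */
static void sim_ptrace_triggered(struct sim_bp *bp)
{
	(void)bp;
}

/* Stand-in for any other registered callback. */
static void sim_other_triggered(struct sim_bp *bp)
{
	bp->hits++;
}

static int sim_is_ptrace_bp(const struct sim_bp *bp)
{
	assert(bp != NULL);
	return bp->triggered == sim_ptrace_triggered;
}

/* ptrace hits fall through so the signal is sent; others stop the chain. */
static enum sim_notify sim_handle_hit(struct sim_bp *bp)
{
	bp->triggered(bp);
	return sim_is_ptrace_bp(bp) ? SIM_NOTIFY_DONE : SIM_NOTIFY_STOP;
}
```

The comparison needs no extra state in the breakpoint structure, which is
what makes it reliable despite being a hack.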

For that matter, knowing when to set vdr6 is a little tricky. I guess
it should be set whenever a debug exception occurs in user mode (which
includes both breakpoints and single-step events). But what about
ptrace triggers while the CPU is in kernel mode? Should they set the
four DR_TRAPn bits in vdr6 and leave the rest alone?

> > I disagree. kfree() is documented to return harmlessly when passed a
> > NULL pointer, and lots of places in the kernel have been changed to
> > remove useless tests for NULL before calls to kfree(). This is just
> > another example.
>
> Ok. I have no special opinions about that. I just tend to avoid folding
> miscellaneous changes into a patch adding new code. It would be better
> form to send first the trivial cleanup patch removing that second condition.

Sounds reasonable. I'll split it out.
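
The same convention holds for free() in standard C, so the point can be
shown with a userspace analogue. The names below are hypothetical and
free() stands in for kfree(); this is a sketch, not the kernel code.

```c
#include <assert.h>
#include <stdlib.h>

struct sim_thread {
	int *bp_info;   /* NULL when the thread has no breakpoint data */
};

/* Release the per-thread data without testing for NULL first. */
static void sim_flush(struct sim_thread *t)
{
	assert(t != NULL);
	free(t->bp_info);   /* free(NULL) is defined to do nothing */
	t->bp_info = NULL;
}
```

Calling sim_flush() twice in a row, or on a thread that never allocated
anything, is harmless for the same reason.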

Alan Stern