Hi,
First of all, many thanks to the developers of Kprobes! I use both
Kprobes and parts of their code a lot in my projects these days.
As far as I can see, the pre-handlers of Kprobes run with interrupts and
preemption disabled on the given CPU, at least on x86 without Kprobe
optimization.
Is it possible, however, to use Kprobes to somehow execute my code
before a given instruction but with the same restrictions as the
original instruction, at least, w.r.t. the interrupts?
I mean, if the instruction is executed with interrupts enabled, my code
would also execute with interrupts enabled, etc.
If it is possible, how would you recommend doing that? Without patching
the implementation of Kprobes, I mean.
Same for preemption, but, it seems, Kprobes really need it disabled, at
least to be able to use kprobe_running() and other per-cpu data.
In the RaceHound project I am now working on
(https://github.com/winnukem/racehound/tree/rh_rework), breakpoints
are used to detect data races in kernel code at runtime: software
breakpoints for the code, hardware breakpoints for the data that is
about to be accessed.
However, to make it all work, the detector introduces delays before the
instructions of interest. I could do this in Kprobes' pre-handlers but
the interrupts would always be disabled on the current CPU during the
delays, which is no good.
So far, I have implemented it using software breakpoints directly, without
Kprobes. The pre-handlers are then executed in the same context as the
original instructions.
Still, the implementation becomes more and more like Kprobes in some
places over time. If there is a way to avoid reinventing the wheel and
just use Kprobes, I would do that.
So, any ideas?
Regards,
Eugene
--
Eugene Shatokhin, ROSA
http://www.rosalab.com
Hello,
(2015/02/24 0:04), Eugene Shatokhin wrote:
> Hi,
>
>
> First of all, many thanks to the developers of Kprobes! I use both
> Kprobes and parts of their code a lot in my projects these days.
>
> As far as I can see, the pre-handlers of Kprobes run with interrupts and
> preemption disabled on the given CPU, at least on x86 without Kprobe
> optimization.
Even with kprobe optimization, I disable both as well, since the
optimization must be transparent (that is, optimized and non-optimized
kprobes must behave the same way).
Note that the x86 int3 trap handler automatically disables local interrupts.
> Is it possible, however, to use Kprobes to somehow execute my code
> before a given instruction but with the same restrictions as the
> original instruction, at least, w.r.t. the interrupts?
No, that is not allowed. I mean, you can do anything you want
in your handler (enabling preemption/IRQs etc.), but the result may
not be safe (it can crash your kernel, but that is not a Kprobes bug).
Actually, enabling interrupts in kprobe handlers can cause kprobes to be
re-entered (via kprobes on interrupt handlers), and currently kprobes skips
all such re-entered probes.
Is it acceptable that some of your kprobe handlers are not fired when
they are hit?
> I mean, if the instruction is executed with interrupts enabled, my code
> would also execute with interrupts enabled, etc.
>
> If it is possible, how would you recommend doing that? Without patching
> the implementation of Kprobes, I mean.
>
> Same for preemption, but, it seems, Kprobes really need it disabled, at
> least to be able to use kprobe_running() and other per-cpu data.
>
> In the RaceHound project I am now working on
> (https://github.com/winnukem/racehound/tree/rh_rework), breakpoints
> are used to detect data races in kernel code at runtime: software
> breakpoints for the code, hardware breakpoints for the data that is
> about to be accessed.
>
> However, to make it all work, the detector introduces delays before the
> instructions of interest. I could do this in Kprobes' pre-handlers but
> the interrupts would always be disabled on the current CPU during the
> delays, which is no good.
Do you mean sleeping in your handler? No, that is NOT possible. We are
in an exception context, which must neither be preempted nor sleep.
How long a delay do you need? Can you use a cpu_relax() busy loop for it?
> So far, I have implemented it using software breakpoints directly, without
> Kprobes. The pre-handlers are then executed in the same context as the
> original instructions.
>
> Still, the implementation becomes more and more like Kprobes in some
> places over time. If there is a way to avoid reinventing the wheel and
> just use Kprobes, I would do that.
>
> So, any ideas?
As I said, I recommend using some kind of busy-wait loop to implement the
delays. Please don't try to enable IRQs.
Thank you,
>
> Regards,
> Eugene
>
--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]
24.02.2015 06:47, Masami Hiramatsu wrote:
> No, that is not allowed. I mean, you can do anything you want
> in your handler (enabling preemption/IRQs etc.), but the result may
> not be safe (it can crash your kernel, but that is not a Kprobes bug).
Yes, that is why I am asking.
> Actually, enabling interrupts in kprobe handlers can cause kprobes to be
> re-entered (via kprobes on interrupt handlers), and currently kprobes skips
> all such re-entered probes.
> Is it acceptable that some of your kprobe handlers are not fired when
> they are hit?
I think so, yes. When a software breakpoint hits, my system decodes the
instruction, finds the address that is about to be accessed and tries to
place a hardware breakpoint on that memory area.
There are only 4 hardware breakpoints a CPU can use on x86, so if the
software breakpoint hits too often, the system will not be able to
process all hits anyway because all HW breakpoints may already be in use.
> Do you mean sleeping in your handler?
No, I use mdelay(). It is, in essence, a busy-wait loop as far as I
know. The delay intervals may vary; the default is 5 jiffies.
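For a sense of scale: mdelay() takes its argument in milliseconds, while the delay here is quoted in jiffies, whose length depends on the kernel's HZ setting. The following is a minimal user-space sketch of that conversion, not kernel code. The helper name jiffies_to_ms_sketch is hypothetical (the kernel provides its own jiffies_to_msecs()), and the HZ values mentioned afterwards are just common configuration choices.

```c
/*
 * User-space illustration only: convert a tick count ("jiffies") to
 * milliseconds for a given tick rate (HZ). Integer division truncates,
 * so the result is rounded down when 1000 is not a multiple of hz.
 * The function name is hypothetical, chosen for this sketch.
 */
static unsigned int jiffies_to_ms_sketch(unsigned long ticks, unsigned int hz)
{
    return (unsigned int)((ticks * 1000UL) / hz);
}
```

Under these assumptions, 5 jiffies correspond to 50 ms at HZ=100, 20 ms at HZ=250 and 5 ms at HZ=1000, so the same default busy-waits for quite different amounts of wall-clock time depending on the kernel configuration.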
Regards,
Eugene
--
Eugene Shatokhin, ROSA
http://www.rosalab.com
(2015/02/24 15:04), Eugene Shatokhin wrote:
> 24.02.2015 06:47, Masami Hiramatsu wrote:
>> Do you mean sleeping in your handler?
>
> No, I use mdelay(). It is, in essence, a busy-wait loop as far as I
> know. The delay intervals may vary; the default is 5 jiffies.
Hmm, this is what I don't understand. If mdelay() is a busy-wait loop, why
would you want to enable IRQs?
No other code runs on that core while it is waiting.
Thank you,
--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]
> (2015/02/24 15:04), Eugene Shatokhin wrote:
>> 24.02.2015 06:47, Masami Hiramatsu wrote:
>>> Do you mean sleeping in your handler?
>>
>> No, I use mdelay(). It is, in essence, a busy-wait loop as far as I
>> know. The delay intervals may vary; the default is 5 jiffies.
>
> Hmm, this is what I don't understand. If mdelay() is a busy-wait loop, why
> would you want to enable IRQs?
> No other code runs on that core while it is waiting.
I'd like not to enable IRQ but rather to execute my handler with the
same (or similar) restrictions as the original instruction would. If the
insn executed with IRQ enabled, so would the handler, etc. So I am
looking for a way to avoid *additionally* disabling IRQ (and, perhaps,
preemption, although this might be harder).
The breakpoints and delays already incur a penalty on the system's
responsiveness.
However, if, say, I probe an insn executing in process context with
IRQs enabled, interrupts could still be serviced on this CPU during the
delay. If, in addition, preemption is not disabled and the kernel is built
with CONFIG_PREEMPT=y, then, I guess, mdelay() can be preempted, allowing
some other task to run, which is good for overall responsiveness.
Usually, the longer the delays I make, the more likely races are to be
detected, but the performance overhead increases too. I do not have
exact numbers yet, but still.
So, while 5-10 jiffies are often enough, sometimes it could be
beneficial to wait longer. For example, when I used the system to
confirm a race between the .probe() and .ndo_open() callbacks in the e1000
driver a year ago, I used a delay of about one second or more (for
NetworkManager to start working with the device), which would be too much
with IRQs disabled, I think. Both .probe() and .ndo_open() executed
in process context, by the way.
Well, I was actually thinking about something like the following (for
x86, at least).
If a Kprobe's pre_handler returns non-zero, single-step will not be
performed, right? As far as I can see in the code, Jprobes rely on that.
Preemption will still be disabled and Jprobe's handler enables it when
ready.
What if I place a Kprobe on an insn of interest and the pre_handler
changes regs->ip to the address of my function, say, "my_thunk_pre" (see
below), and then returns non-zero? Handling of int3 then completes, the
context is restored, the interrupts are re-enabled (if they were enabled
before int3). Preemption remains off because the Kprobe's implementation
disabled it. Execution resumes in "my_thunk_pre" that is written in
assembly and may look like this on x86_64 (x86_32 is similar):
----------------------
my_thunk_pre:
push %rax
<push scratch registers except rax on stack>
call my_handler
// my_handler() is a C function, with the default
// calling convention/linkage.
// Returns the address of the copied insn in the
// Kprobe's insn slot in %rax.
<pop scratch registers except rax from stack>
// restore the orig value of %rax and push the address
// to jump to on the stack
xchg %rax, (%rsp)
// Jump to the copied insn (and fix %rsp at the same time):
ret
----------------------
In this case, my_handler() seems to execute in the same context as the
original insn, except for disabled preemption.
my_handler() may use kprobe_running() to get the Kprobe and, perhaps, some
structure of mine that contains that Kprobe. Then, I guess, it might call
preempt_enable_no_resched() like Jprobe's handler does (maybe some
other actions are needed?). After that, my_handler can do the rest of
its job: arm the HW breakpoints, call mdelay(), etc.
my_handler will return the address of the copied insn in the Kprobe's
insn slot. The control will be passed there by my_thunk_pre().
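The redirection step of this scheme (the pre_handler rewriting regs->ip and returning non-zero so that single-stepping is skipped) can be modelled in plain user-space C. This is only a sketch of the control flow under the assumptions stated in this message, not kernel code: struct regs_model, thunk_model, pre_handler_model and dispatch_model are hypothetical stand-ins for struct pt_regs, my_thunk_pre, the Kprobe pre_handler and the int3 handling path.

```c
#include <stdint.h>

/* Stand-in for the single field of struct pt_regs that matters here. */
struct regs_model {
    uintptr_t ip;               /* address at which execution resumes */
};

/* Stand-in for the assembly thunk (my_thunk_pre in the text). */
static void thunk_model(void)
{
}

/*
 * Model of the proposed pre-handler: point the saved instruction pointer
 * at the thunk and return non-zero to ask the caller not to single-step.
 */
static int pre_handler_model(struct regs_model *regs)
{
    regs->ip = (uintptr_t)&thunk_model;
    return 1;
}

enum resume_kind {
    RESUME_SINGLE_STEP,         /* run the copied insn via single-step */
    RESUME_AT_REGS_IP           /* just return to whatever regs->ip says */
};

/*
 * Model of the breakpoint path: single-step only when there is no
 * pre-handler or the pre-handler returned zero.
 */
static enum resume_kind dispatch_model(struct regs_model *regs,
                                       int (*pre)(struct regs_model *))
{
    if (pre && pre(regs))
        return RESUME_AT_REGS_IP;
    return RESUME_SINGLE_STEP;
}
```

In this model, calling dispatch_model() with pre_handler_model leaves regs->ip pointing at the thunk and reports RESUME_AT_REGS_IP, which corresponds to "handling of int3 completes and execution resumes in my_thunk_pre". Whether the real Kprobes code tolerates this without further cleanup is exactly the open question of this message.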
For this to work, the copied insn stored in the Kprobe's insn slot must
be followed by a jump back to the original code, that is, to the next
insn. Of course, this is not necessary for some
control-transfer insns. But my system mostly works with insns that
access data rather than with these.
Looks like Kprobes already do something similar and place such jumps in
the insn slots (Kprobes with ainsn.boostable == 1) if there is enough
space there. That is, if the size of the copied insn + 5 (size of jmp
near relative) < 16 (MAX_INSN_SIZE). However, this seems to be done
after single-step, which will not happen in my case.
Still, I could place the jumps after the insns in the slots earlier,
e.g., before I arm the Kprobes. Perhaps, it will not interfere with
other functions of Kprobes.
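The size check and the jump itself can be sketched in user-space C as follows. This is an illustration under the numbers quoted above (a 16-byte slot and a 5-byte "jmp near relative"), not the Kprobes implementation; the names SLOT_SIZE, jump_fits and encode_jmp_rel32 are hypothetical. The encoding used is the standard x86 one: opcode 0xE9 followed by a little-endian 32-bit displacement measured from the end of the jump instruction.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SLOT_SIZE      16   /* MAX_INSN_SIZE, as quoted in the text */
#define JMP_REL32_SIZE 5    /* opcode 0xE9 plus a 32-bit displacement */

/* Is there room for a jmp rel32 after an insn of insn_len bytes? */
static bool jump_fits(size_t insn_len)
{
    return insn_len + JMP_REL32_SIZE < SLOT_SIZE;
}

/*
 * Encode "jmp rel32" into buf as if the jump were located at jmp_addr
 * and had to land on target. Returns false if the displacement does not
 * fit into a signed 32-bit value, in which case buf is left untouched.
 */
static bool encode_jmp_rel32(uint8_t buf[JMP_REL32_SIZE],
                             uint64_t jmp_addr, uint64_t target)
{
    /* The displacement is relative to the address right after the jmp. */
    int64_t disp = (int64_t)(target - (jmp_addr + JMP_REL32_SIZE));
    uint32_t bits;
    int i;

    if (disp < INT32_MIN || disp > INT32_MAX)
        return false;

    bits = (uint32_t)(int32_t)disp;
    buf[0] = 0xE9;
    for (i = 0; i < 4; i++)             /* little-endian byte order */
        buf[1 + i] = (uint8_t)(bits >> (8 * i));
    return true;
}
```

In the scheme discussed here, jmp_addr would be the address in the insn slot right after the copied insn, and target would be the address of the insn following the probed one in the original code.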
So, if all this worked, I suppose, my system would get everything it
needs: my_handler() will do the delays in the same context and with the
same restrictions as the original insn executes.
Or perhaps, I am missing something critical here? Could this scheme
break Kprobes somehow, what do you think?
If there are no obvious pitfalls, I think, I will give it a try.
So, what is your opinion?
By the way, thanks for your time; this letter of mine has become unusually long.
Regards,
Eugene
--
Eugene Shatokhin, ROSA
http://www.rosalab.com