Hi Masami,
While testing some changes in -rt against kprobes, I hit a crash that
exists in mainline. If we stick a probe in a location that reads
preempt_count, we corrupt the kernel itself.
The reason is that the kprobe breakpoint handler disables preemption, sets
up the single step, returns to the code to execute that single step,
takes the debug trap, enables preemption, and continues.
The issue is that, because we disabled preemption in one trap, returned, and
then enabled it again in the other trap, we changed the value seen by the
code that does that single step.
If we add a kprobe on an inc_preempt_count() call:
[ preempt_count = 0 ]

ld preempt_count, %eax <<--- trap

<trap>
preempt_disable();
[ preempt_count = 1 ]
setup_singlestep();
<trap return>

[ preempt_count = 1 ]

ld preempt_count, %eax

[ %eax = 1 ]

<trap>
post_kprobe_handler()
preempt_enable_no_resched();
[ preempt_count = 0 ]
<trap return>

[ %eax = 1 ]

add %eax,1

[ %eax = 2 ]

st %eax, preempt_count

[ preempt_count = 2 ]
We just caused preempt count to increment twice when it should have only
incremented once, and this screws everything else up.
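To make the interleaving concrete, here is a small userspace C simulation of
the same sequence (the names are made up for illustration; the two kprobe
traps are folded into the probed function at the points where they fire in
the trace above):

/* Userspace simulation of the trace above -- illustrative names only. */
#include <stdio.h>

static int fake_preempt_count;		/* starts at 0 */

/* Stands in for inc_preempt_count() with a kprobe on its load insn. */
static void probed_inc_preempt_count(void)
{
	fake_preempt_count++;		/* int3 trap: preempt_disable()           */
	int eax = fake_preempt_count;	/* single-stepped: ld preempt_count, %eax */
	fake_preempt_count--;		/* debug trap: preempt_enable_no_resched() */
	eax += 1;			/* add %eax,1                              */
	fake_preempt_count = eax;	/* st %eax, preempt_count                  */
}

int main(void)
{
	probed_inc_preempt_count();
	/* Prints 2 -- the count was incremented twice instead of once. */
	printf("preempt_count = %d (expected 1)\n", fake_preempt_count);
	return 0;
}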
Do we really need to have preemption disabled throughout this? Is it
because we don't want to migrate or call schedule? Not sure what the
best way to fix this is. Perhaps we add a kprobe_preempt_disable() that
is checked as well?
-- Steve
Kprobes requires preemption to be disabled while it single steps the code
it replaced with a breakpoint. But because the code being single stepped
could itself be reading the preempt count, the kprobe code's raising and
lowering of the preempt count can cause the wrong value to end up in the
result. Here's an example:
If we add a kprobe on an inc_preempt_count() call:
[ preempt_count = 0 ]

ld preempt_count, %eax <<--- trap

<trap>
preempt_disable();
[ preempt_count = 1 ]
setup_singlestep();
<trap return>

[ preempt_count = 1 ]

ld preempt_count, %eax

[ %eax = 1 ]

<trap>
post_kprobe_handler()
preempt_enable_no_resched();
[ preempt_count = 0 ]
<trap return>

[ %eax = 1 ]

add %eax,1

[ %eax = 2 ]

st %eax, preempt_count

[ preempt_count = 2 ]
We just caused preempt count to increment twice when it should have only
incremented once, and this screws everything else up.
To solve this, I've added a per-cpu variable called
kprobe_preempt_disabled that is set by the kprobe code. If it is set,
preempt_schedule() will not preempt the code.
I did not add a separate variable to task_struct for kprobes to test,
because that would just bloat the structure with one more field. Adding a
flag per CPU is much better than adding one per task. Kprobes are slow
anyway, so causing them to do a little bit more work is not a problem.
I tested this code against the probes that caused my previous crashes,
and it works fine.
Signed-off-by: Steven Rostedt <[email protected]>
diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c
index f1a6244..2ed67e6 100644
--- a/arch/x86/kernel/kprobes.c
+++ b/arch/x86/kernel/kprobes.c
@@ -475,7 +475,7 @@ static void __kprobes setup_singlestep(struct kprobe *p, struct pt_regs *regs,
* stepping.
*/
regs->ip = (unsigned long)p->ainsn.insn;
- preempt_enable_no_resched();
+ kprobe_preempt_enable();
return;
}
#endif
@@ -547,7 +547,7 @@ static int __kprobes kprobe_handler(struct pt_regs *regs)
* re-enable preemption at the end of this function,
* and also in reenter_kprobe() and setup_singlestep().
*/
- preempt_disable();
+ kprobe_preempt_disable();
kcb = get_kprobe_ctlblk();
p = get_kprobe(addr);
@@ -583,7 +583,7 @@ static int __kprobes kprobe_handler(struct pt_regs *regs)
* the original instruction.
*/
regs->ip = (unsigned long)addr;
- preempt_enable_no_resched();
+ kprobe_preempt_enable();
return 1;
} else if (kprobe_running()) {
p = __this_cpu_read(current_kprobe);
@@ -593,7 +593,7 @@ static int __kprobes kprobe_handler(struct pt_regs *regs)
}
} /* else: not a kprobe fault; let the kernel handle it */
- preempt_enable_no_resched();
+ kprobe_preempt_enable();
return 0;
}
@@ -917,7 +917,7 @@ static int __kprobes post_kprobe_handler(struct pt_regs *regs)
}
reset_current_kprobe();
out:
- preempt_enable_no_resched();
+ kprobe_preempt_enable();
/*
* if somebody else is singlestepping across a probe point, flags
@@ -951,7 +951,7 @@ int __kprobes kprobe_fault_handler(struct pt_regs *regs, int trapnr)
restore_previous_kprobe(kcb);
else
reset_current_kprobe();
- preempt_enable_no_resched();
+ kprobe_preempt_enable();
break;
case KPROBE_HIT_ACTIVE:
case KPROBE_HIT_SSDONE:
@@ -1098,7 +1098,7 @@ int __kprobes longjmp_break_handler(struct kprobe *p, struct pt_regs *regs)
memcpy((kprobe_opcode_t *)(kcb->jprobe_saved_sp),
kcb->jprobes_stack,
MIN_STACK_SIZE(kcb->jprobe_saved_sp));
- preempt_enable_no_resched();
+ kprobe_preempt_enable();
return 1;
}
return 0;
@@ -1530,7 +1530,7 @@ static int __kprobes setup_detour_execution(struct kprobe *p,
regs->ip = (unsigned long)op->optinsn.insn + TMPL_END_IDX;
if (!reenter)
reset_current_kprobe();
- preempt_enable_no_resched();
+ kprobe_preempt_enable();
return 1;
}
return 0;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index a837b20..c7c423e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -258,6 +258,24 @@ extern spinlock_t mmlist_lock;
struct task_struct;
+#ifdef CONFIG_KPROBES
+DECLARE_PER_CPU(int, kprobe_preempt_disabled);
+
+static inline void kprobe_preempt_disable(void)
+{
+ preempt_disable();
+ __get_cpu_var(kprobe_preempt_disabled) = 1;
+ preempt_enable_no_resched();
+}
+
+static inline void kprobe_preempt_enable(void)
+{
+ preempt_disable();
+ __get_cpu_var(kprobe_preempt_disabled) = 0;
+ preempt_enable_no_resched();
+}
+#endif
+
#ifdef CONFIG_PROVE_RCU
extern int lockdep_tasklist_lock_is_held(void);
#endif /* #ifdef CONFIG_PROVE_RCU */
diff --git a/kernel/sched.c b/kernel/sched.c
index 3f2e502..990d53d 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -443,6 +443,30 @@ static struct root_domain def_root_domain;
#endif /* CONFIG_SMP */
+#ifdef CONFIG_KPROBES
+/*
+ * Kprobes can not disable preempting with the preempt_count.
+ * If it does, and it single steps kernel code that modifies
+ * the preempt_count, it will corrupt the result.
+ *
+ * Since kprobes is slow anyways, we do not need to bloat the
+ * task_struct or thread_struct with another variable to test.
+ * Simply use a per_cpu variable and require the kprobe code
+ * to disable preemption normally to set it.
+ */
+DEFINE_PER_CPU(int, kprobe_preempt_disabled);
+
+static inline int preempt_disabled_kprobe(void)
+{
+ return __get_cpu_var(kprobe_preempt_disabled);
+}
+#else
+static inline int preempt_disabled_kprobe(void)
+{
+ return 0;
+}
+#endif
+
/*
* This is the main, per-CPU runqueue data structure.
*
@@ -4106,7 +4130,7 @@ void __kprobes add_preempt_count(int val)
DEBUG_LOCKS_WARN_ON((preempt_count() & PREEMPT_MASK) >=
PREEMPT_MASK - 10);
#endif
- if (preempt_count() == val)
+ if (preempt_count() == val && !preempt_disabled_kprobe())
trace_preempt_off(CALLER_ADDR0, get_parent_ip(CALLER_ADDR1));
}
EXPORT_SYMBOL(add_preempt_count);
@@ -4127,7 +4151,7 @@ void __kprobes sub_preempt_count(int val)
return;
#endif
- if (preempt_count() == val)
+ if (preempt_count() == val && !preempt_disabled_kprobe())
trace_preempt_on(CALLER_ADDR0, get_parent_ip(CALLER_ADDR1));
preempt_count() -= val;
}
@@ -4373,6 +4397,10 @@ asmlinkage void __sched notrace preempt_schedule(void)
do {
add_preempt_count_notrace(PREEMPT_ACTIVE);
+ if (unlikely(preempt_disabled_kprobe())) {
+ sub_preempt_count_notrace(PREEMPT_ACTIVE);
+ break;
+ }
schedule();
sub_preempt_count_notrace(PREEMPT_ACTIVE);
On Thu, Jun 30, 2011 at 11:51:57AM -0400, Steven Rostedt wrote:
> Kprobes requires preemption to be disabled as it single steps the code
> it replaced with a breakpoint. But because the code that is single
> stepped could be reading the preempt count, the kprobe disabling of the
> preempt count can cause the wrong value to end up as a result. Here's an
> example:
>
> If we add a kprobe on a inc_preempt_count() call:
>
> [ preempt_count = 0 ]
>
> ld preempt_count, %eax <<--- trap
>
> <trap>
> preempt_disable();
> [ preempt_count = 1]
> setup_singlestep();
> <trap return>
>
> [ preempt_count = 1 ]
>
> ld preempt_count, %eax
>
> [ %eax = 1 ]
>
> <trap>
> post_kprobe_handler()
> preempt_enable_no_resched();
> [ preempt_count = 0 ]
> <trap return>
>
> [ %eax = 1 ]
>
> add %eax,1
>
> [ %eax = 2 ]
>
> st %eax, preempt_count
>
> [ preempt_count = 2 ]
>
>
> We just caused preempt count to increment twice when it should have only
> incremented once, and this screws everything else up.
>
> To solve this, I've added a per_cpu variable called
> kprobe_preempt_disabled, that is set by the kprobe code. If it is set,
> the preempt_schedule() will not preempt the code.
>
> I did not update task_struct for a separate variable for kprobes to test
> because it would just bloat that structure with one more test. Adding a
> flag per cpu is much better than adding one per task. The kprobes are
> slow anyway, so causing it to do a little bit more work is not a
> problem.
>
> I tested this code against the probes that caused my previous crashes,
> and it works fine.
It's a bit sad that we need to bloat preempt_schedule() with a new test,
especially as kprobes should not add overhead in the off case. And I guess
many distros enable kprobes by default, so it's probably not just enabled on
some debug kernels.
Another idea could be to turn current_thread_info()->preempt_count into a
local_t variable, which would ensure the increment is performed in a single
instruction. Well, in the hope that every arch can do something like:
inc (mem_addr)
IIRC, I believe ARM can't... And the default local_inc() is probably not
helpful in our case...
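For illustration only, a rough (untested) sketch of the local_t idea; this is
not how preempt_count is actually implemented, and the names here are
invented:

/*
 * Sketch of the local_t idea -- NOT how preempt_count works today,
 * purely illustrative.  On x86, local_inc() compiles to a single
 * "incl" on memory, so a trap can never land between the load and
 * the store of the count.  The generic fallback for local_t is
 * built on atomic_long_t operations, which on some architectures
 * costs more than a plain increment.
 */
#include <asm/local.h>
#include <linux/compiler.h>

static local_t example_preempt_count = LOCAL_INIT(0);

static inline void example_preempt_disable(void)
{
	local_inc(&example_preempt_count);	/* single insn on x86 */
	barrier();
}

static inline void example_preempt_enable_no_resched(void)
{
	barrier();
	local_dec(&example_preempt_count);
}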
On Thu, 2011-06-30 at 18:14 +0200, Frederic Weisbecker wrote:
> That's a bit sad we need to bloat preempt_schedule() with a new test, especially
> as kprobes should not add overhead in the off case and then I guess many distros
> enable kprobes by default, so it's probably not just enabled on some debug kernel.
A simple per-cpu var test is not that bad, and that's also why I put it
where I did: it only gets checked after all the other tests fail. I doubt
this really adds any measurable overhead. Note, most distros don't even
enable CONFIG_PREEMPT, so this isn't even touched by them.
>
> Another idea could be to turn current_thread_info()->preempt_count into a local_t
> var which ensures a single incrementation is performed in a single instruction.
Not all archs support such a thing.
>
> Well, in the hope that every arch can do something like:
>
> inc (mem_addr)
>
> IIRC, I believe arm can't... And the default local_inc is probably not helpful
> in our case...
Some archs (I believe) perform local_inc() more slowly than i++, in which
case this solution would likely slow things down even more than what I
proposed, since the impact would be on every preempt_disable() and
preempt_enable().
And this would not even help the place where I actually hit the crash,
which was in the scheduler. It wasn't the addition to the preempt_count
that caused my crash, it was just reading it:
if (unlikely(in_atomic_preempt_off() && !prev->exit_state))
__schedule_bug(prev);
The in_atomic_preempt_off() is what blew up on me.
#define in_atomic_preempt_off() \
((preempt_count() & ~PREEMPT_ACTIVE) != PREEMPT_CHECK_OFFSET)
What happened was that the read of preempt_count was single stepped, and at
that point preempt_count was one higher due to the preempt_disable() in
kprobes. Even if preempt_count were a local_t, this bug would still have
triggered, and my system would have crashed anyway (every schedule caused a
printk to happen).
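As a userspace illustration of that read-side failure (simplified: the
PREEMPT_ACTIVE masking is dropped and all names are made up), even a
perfectly atomic counter update would not help, because the single-stepped
read simply sees the kprobe-inflated value:

/* Simplified model of the scheduler check above -- illustrative only. */
#include <stdio.h>

#define FAKE_PREEMPT_CHECK_OFFSET 1	/* schedule() runs with count == 1 */

static int fake_preempt_count = FAKE_PREEMPT_CHECK_OFFSET;

static int fake_in_atomic_preempt_off(void)
{
	return fake_preempt_count != FAKE_PREEMPT_CHECK_OFFSET;
}

int main(void)
{
	fake_preempt_count++;			/* kprobe_handler(): preempt_disable() */
	int bug = fake_in_atomic_preempt_off();	/* single-stepped read sees 2          */
	fake_preempt_count--;			/* post_kprobe_handler(): enable again */

	/* Prints 1: the check fires even though the update itself was fine. */
	printf("in_atomic_preempt_off() = %d\n", bug);
	return 0;
}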
-- Steve
On Thu, Jun 30, 2011 at 11:51:57AM -0400, Steven Rostedt wrote:
> Kprobes requires preemption to be disabled as it single steps the code
> it replaced with a breakpoint. But because the code that is single
> stepped could be reading the preempt count, the kprobe disabling of the
> preempt count can cause the wrong value to end up as a result. Here's an
> example:
>
> If we add a kprobe on a inc_preempt_count() call:
>
> [ preempt_count = 0 ]
>
> ld preempt_count, %eax <<--- trap
>
> <trap>
> preempt_disable();
> [ preempt_count = 1]
> setup_singlestep();
> <trap return>
>
> [ preempt_count = 1 ]
>
> ld preempt_count, %eax
>
> [ %eax = 1 ]
>
> <trap>
> post_kprobe_handler()
> preempt_enable_no_resched();
> [ preempt_count = 0 ]
> <trap return>
>
> [ %eax = 1 ]
>
> add %eax,1
>
> [ %eax = 2 ]
>
> st %eax, preempt_count
>
> [ preempt_count = 2 ]
>
>
> We just caused preempt count to increment twice when it should have only
> incremented once, and this screws everything else up.
>
> To solve this, I've added a per_cpu variable called
> kprobe_preempt_disabled, that is set by the kprobe code. If it is set,
> the preempt_schedule() will not preempt the code.
>
> I did not update task_struct for a separate variable for kprobes to test
> because it would just bloat that structure with one more test. Adding a
> flag per cpu is much better than adding one per task. The kprobes are
> slow anyway, so causing it to do a little bit more work is not a
> problem.
>
> I tested this code against the probes that caused my previous crashes,
> and it works fine.
>
> Signed-off-by: Steven Rostedt <[email protected]>
>
> diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c
> index f1a6244..2ed67e6 100644
> --- a/arch/x86/kernel/kprobes.c
> +++ b/arch/x86/kernel/kprobes.c
Since this is an issue for all arches, shouldn't this include the
transform:
preempt_enable_no_resched() -> kprobe_preempt_enable()
and
preempt_disable() -> kprobe_preempt_disable()
for all arches?
Or, is that a follow-up patch?
Thanks,
-Jason
On Thu, 2011-06-30 at 15:40 -0400, Jason Baron wrote:
> Since this is an issue for all arches, shouldn't this include the
> transform:
>
> preempt_enable_no_resched() -> kprobe_preempt_enable()
>
> and
>
> preempt_disable() -> kprobe_preempt_disable()
>
> for all arches?
>
> Or, is that a follow-up patch?
>
I'm cross compiling all my changes as I type this ;)
-- Steve
On Thu, 2011-06-30 at 11:51 -0400, Steven Rostedt wrote:
>
> To solve this, I've added a per_cpu variable called
> kprobe_preempt_disabled, that is set by the kprobe code. If it is set,
> the preempt_schedule() will not preempt the code.
Damn this is ugly. Can we step back and see if we can make the
requirement for kprobe to disable preemption go away?
Why does it have to do that anyway? Isn't it keeping enough per-task
state to allow preemption over the single step?
(2011/06/30 22:23), Steven Rostedt wrote:
> Hi Masami,
>
> While testing some changes in -rt against kprobes, I hit a crash that
> exists in mainline. If we stick a probe in a location that reads
> preempt_count, we corrupt the kernel itself.
>
> Reason is that the kprobe single step handler disables preemption, sets
> up the single step, returns to the code to execute that single step,
> takes the trap, enables preemption, and continues.
>
> The issue, is because we disabled preemption in the trap, returned, then
> enabled it again in another trap, we just changed what the code sees
> that does that single step.
>
> If we add a kprobe on a inc_preempt_count() call:
>
> [ preempt_count = 0 ]
>
> ld preempt_count, %eax <<--- trap
>
> <trap>
> preempt_disable();
> [ preempt_count = 1]
> setup_singlestep();
> <trap return>
>
> [ preempt_count = 1 ]
>
> ld preempt_count, %eax
>
> [ %eax = 1 ]
>
> <trap>
> post_kprobe_handler()
> preempt_enable_no_resched();
> [ preempt_count = 0 ]
> <trap return>
>
> [ %eax = 1 ]
>
> add %eax,1
>
> [ %eax = 2 ]
>
> st %eax, preempt_count
>
> [ preempt_count = 2 ]
>
>
> We just caused preempt count to increment twice when it should have only
> incremented once, and this screws everything else up.
Ah! right!
> Do we really need to have preemption disabled throughout this? Is it
> because we don't want to migrate or call schedule? Not sure what the
> best way to fix this is. Perhaps we add a kprobe_preempt_disable() that
> is checked as well?
I think the best way to fix that is to just remove the preemption disabling
code, because:
- the breakpoint exception itself disables interrupts (at least on x86)
- while single stepping, interrupts are also disabled.
(BTW, theoretically, boosted and optimized kprobes shouldn't have
this problem, because those don't execute single-stepping.)
So, I think there is no reason to disable preemption.
Thank you,
--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]
(2011/07/01 6:56), Peter Zijlstra wrote:
> On Thu, 2011-06-30 at 11:51 -0400, Steven Rostedt wrote:
>>
>> To solve this, I've added a per_cpu variable called
>> kprobe_preempt_disabled, that is set by the kprobe code. If it is set,
>> the preempt_schedule() will not preempt the code.
Sorry for replying so late :(
> Damn this is ugly. Can we step back and see if we can make the
> requirement for kprobe to disable preemption go away?
As I just replied, I think we can simply eliminate that
preemption-disabling code. At least we'd better try it.
I agree with you; introducing this kind of complexity
just for kprobes is not what I want. :(
> Why does it have to do that anyway? Isn't it keeping enough per-task
> state to allow preemption over the single step?
Preemption itself must not happen during single stepping, but it seems
impossible to do a heavy context switch with the TF bit set...
Thank you,
--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]
On Fri, 2011-07-01 at 10:12 +0900, Masami Hiramatsu wrote:
> > Do we really need to have preemption disabled throughout this? Is it
> > because we don't want to migrate or call schedule? Not sure what the
> > best way to fix this is. Perhaps we add a kprobe_preempt_disable() that
> > is checked as well?
>
> I think the best way to do that is just removing preemption disabling
> code, because
> - breakpoint exception itself disables interrupt (at least on x86)
> - While single stepping, interrupts also be disabled.
I guess the above point is critical. If interrupts are disabled throughout
the entire walk-through, then we are fine, as that alone guarantees
preemption is disabled anyway. But! If they do get enabled anywhere,
then we will have issues, as the two traps require using the same state
data that is stored per CPU.
> (BTW, theoretically, boosted and optimized kprobes shouldn't have
> this problem, because those doesn't execute single-stepping)
Does the optimized kprobes even disable preemption?
>
> So, I think there is no reason of disabling preemption.
That would be the best solution.
-- Steve
[ Added some of the affected maintainers, left off David Howells and
David Miller due to LKML Cc limit ]
On Fri, 2011-07-01 at 10:22 +0900, Masami Hiramatsu wrote:
> (2011/07/01 6:56), Peter Zijlstra wrote:
> > On Thu, 2011-06-30 at 11:51 -0400, Steven Rostedt wrote:
> >>
> >> To solve this, I've added a per_cpu variable called
> >> kprobe_preempt_disabled, that is set by the kprobe code. If it is set,
> >> the preempt_schedule() will not preempt the code.
>
> Sorry for replying so late :(
Heh, who can blame you? Timezones make open source development a
wait-and-see affair.
>
> > Damn this is ugly. Can we step back and see if we can make the
> > requirement for kprobe to disable preemption go away?
>
> As I replied right now, I think we can just eliminate that
> disabling preemption code. At least we'd better try it.
> I agree with you, introducing this kind of complexity
> just for kprobes is not what I want. :(
Note, I did clean up this patch, so it is not as fugly.
>
> > Why does it have to do that anyway? Isn't it keeping enough per-task
> > state to allow preemption over the single step?
>
> preemption itself must not happen on single stepping, but it seems
> impossible to do heavy context switching with setting TF bit...
Yeah, if all archs single step with interrupts disabled, then we should
be fine with removing preemption.
-- Steve
(2011/07/01 10:38), Steven Rostedt wrote:
> [ Added some of the affected maintainers, left off David Howells and
> David Miller due to LKML Cc limit ]
>
> On Fri, 2011-07-01 at 10:22 +0900, Masami Hiramatsu wrote:
>> (2011/07/01 6:56), Peter Zijlstra wrote:
>>> On Thu, 2011-06-30 at 11:51 -0400, Steven Rostedt wrote:
>>>>
>>>> To solve this, I've added a per_cpu variable called
>>>> kprobe_preempt_disabled, that is set by the kprobe code. If it is set,
>>>> the preempt_schedule() will not preempt the code.
>>
>> Sorry for replying so late :(
>
> Heh, who can blame you? Timezones make open source development a
> wait-and-see affair.
Yeah, we can't go over light speed.
>>> Damn this is ugly. Can we step back and see if we can make the
>>> requirement for kprobe to disable preemption go away?
>>
>> As I replied right now, I think we can just eliminate that
>> disabling preemption code. At least we'd better try it.
>> I agree with you, introducing this kind of complexity
>> just for kprobes is not what I want. :(
>
> Note, I did clean up this patch, so it is not as fugly.
Hm, I think you don't need to introduce a new flag for that
purpose; there are already current_kprobe and the kprobe status flag:
if (kprobe_running() &&
get_kprobe_ctlblk()->kprobe_status == KPROBE_HIT_SS)
/*Running under kprobe's single stepping*/
But I'm not sure whether there is any code which can run
under TF=1 (maybe NMI code? but that would not cause preemption).
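For illustration, a rough sketch of what such a check could look like
(untested, and it assumes the kprobe_status values such as KPROBE_HIT_SS
were made visible outside the arch kprobes code, which is not the case
today):

#include <linux/kprobes.h>

/*
 * Hypothetical replacement for the per-cpu kprobe_preempt_disabled
 * flag: ask kprobes directly whether we are inside the single-step
 * window.  Assumes KPROBE_HIT_SS is exported from the arch code.
 */
static inline int preempt_disabled_kprobe(void)
{
#ifdef CONFIG_KPROBES
	return kprobe_running() &&
	       get_kprobe_ctlblk()->kprobe_status == KPROBE_HIT_SS;
#else
	return 0;
#endif
}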
>>> Why does it have to do that anyway? Isn't it keeping enough per-task
>>> state to allow preemption over the single step?
>>
>> preemption itself must not happen on single stepping, but it seems
>> impossible to do heavy context switching with setting TF bit...
>
> Yeah, if all archs single step with interrupts disabled, then we should
> be fine with removing preemption.
OK, I'll check that.
Thank you ;)
--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]
(2011/07/01 10:33), Steven Rostedt wrote:
> On Fri, 2011-07-01 at 10:12 +0900, Masami Hiramatsu wrote:
>
>>> Do we really need to have preemption disabled throughout this? Is it
>>> because we don't want to migrate or call schedule? Not sure what the
>>> best way to fix this is. Perhaps we add a kprobe_preempt_disable() that
>>> is checked as well?
>>
>> I think the best way to do that is just removing preemption disabling
>> code, because
>> - breakpoint exception itself disables interrupt (at least on x86)
>> - While single stepping, interrupts also be disabled.
>
> I guess the above point is critical. If interrupts are disabled through
> out the entire walk through, then we are fine, as that just guarantees
> preemption is disabled anyway. But! if it does get enabled anywhere,
> then we will have issues as the two traps require using the same state
> data that is stored per cpu.
That would be a bug, or kprobes' assumption was too fragile (and must
be rewritten).
Anyway, kprobe_handler() in arch/x86/kernel/kprobes.c expects that
it is executed in a critical section, and it ensures that as long as no
other kprobes are running on that processor. (However, as you can
see in reenter_kprobe(), if a breakpoint hits while single stepping,
it calls BUG(), because kprobes assumes that someone put another kprobe
inside kprobe's critical section.)
>> (BTW, theoretically, boosted and optimized kprobes shouldn't have
>> this problem, because those doesn't execute single-stepping)
>
> Does the optimized kprobes even disable preemption?
Yeah, just while calling its handler, since someone will
call may_sleep() in it... Anyway, nowadays it disables
interrupts to emulate breakpoint behavior.
>>
>> So, I think there is no reason of disabling preemption.
>
> That would be the best solution.
Thank you,
--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]