2020-04-08 19:38:45

by Jiri Olsa

Subject: [RFC] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task

hi,
Ziqian reported a lockup when adding a retprobe on _raw_spin_lock_irqsave.
My test was also able to trigger lockdep output:

============================================
WARNING: possible recursive locking detected
5.6.0-rc6+ #6 Not tainted
--------------------------------------------
sched-messaging/2767 is trying to acquire lock:
ffffffff9a492798 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_hash_lock+0x52/0xa0

but task is already holding lock:
ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(&(kretprobe_table_locks[i].lock));
lock(&(kretprobe_table_locks[i].lock));

*** DEADLOCK ***

May be due to missing lock nesting notation

1 lock held by sched-messaging/2767:
#0: ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50

stack backtrace:
CPU: 3 PID: 2767 Comm: sched-messaging Not tainted 5.6.0-rc6+ #6
Call Trace:
dump_stack+0x96/0xe0
__lock_acquire.cold.57+0x173/0x2b7
? native_queued_spin_lock_slowpath+0x42b/0x9e0
? lockdep_hardirqs_on+0x590/0x590
? __lock_acquire+0xf63/0x4030
lock_acquire+0x15a/0x3d0
? kretprobe_hash_lock+0x52/0xa0
_raw_spin_lock_irqsave+0x36/0x70
? kretprobe_hash_lock+0x52/0xa0
kretprobe_hash_lock+0x52/0xa0
trampoline_handler+0xf8/0x940
? kprobe_fault_handler+0x380/0x380
? find_held_lock+0x3a/0x1c0
kretprobe_trampoline+0x25/0x50
? lock_acquired+0x392/0xbc0
? _raw_spin_lock_irqsave+0x50/0x70
? __get_valid_kprobe+0x1f0/0x1f0
? _raw_spin_unlock_irqrestore+0x3b/0x40
? finish_task_switch+0x4b9/0x6d0
? __switch_to_asm+0x34/0x70
? __switch_to_asm+0x40/0x70

The code within the kretprobe handler checks for probe reentrancy,
so we won't trigger any _raw_spin_lock_irqsave probe in there.
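
For reference, a simplified sketch of that reentrancy check, condensed
from the x86 kprobe entry path (not verbatim): kprobe_running() just
reads the per-cpu current_kprobe variable, so while it is set, a newly
hit probe is treated as reentrant and pre_handler_kretprobe is never
invoked:

	/* condensed from kprobe_int3_handler() */
	if (kprobe_running()) {
		/* another probe is active on this cpu: handle the hit
		 * as reentrant instead of running its handlers */
		if (reenter_kprobe(p, regs, kcb))
			return 1;
	}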

The problem is outside, in kprobe_flush_task, where we call:

kprobe_flush_task
kretprobe_table_lock
raw_spin_lock_irqsave
_raw_spin_lock_irqsave

where _raw_spin_lock_irqsave triggers the kretprobe and installs the
kretprobe_trampoline handler on return from _raw_spin_lock_irqsave.

The kretprobe_trampoline handler is then executed with
kretprobe_table_locks already locked, and the first thing it does is
lock kretprobe_table_locks again ;-) The whole lockup path looks like:

kprobe_flush_task
kretprobe_table_lock
raw_spin_lock_irqsave
_raw_spin_lock_irqsave ---> probe triggered, kretprobe_trampoline installed

---> kretprobe_table_locks locked

kretprobe_trampoline
trampoline_handler
kretprobe_hash_lock(current, &head, &flags); <--- deadlock

The change below sets current_kprobe in kprobe_flush_task, so the probe
recursion protection check is hit and the kretprobe is never installed.
It seems to fix the deadlock.

I'm not sure this is the best fix, any ideas are welcome ;-)

thanks,
jirka


---
kernel/kprobes.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 2625c241ac00..b13247cae752 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1236,6 +1236,10 @@ __releases(hlist_lock)
}
NOKPROBE_SYMBOL(kretprobe_table_unlock);

+static struct kprobe kretprobe_dummy = {
+ .addr = (void *)kretprobe_trampoline,
+};
+
/*
* This function is called from finish_task_switch when task tk becomes dead,
* so that we can recycle any function-return probe instances associated
@@ -1256,12 +1260,14 @@ void kprobe_flush_task(struct task_struct *tk)
INIT_HLIST_HEAD(&empty_rp);
hash = hash_ptr(tk, KPROBE_HASH_BITS);
head = &kretprobe_inst_table[hash];
+ __this_cpu_write(current_kprobe, &kretprobe_dummy);
kretprobe_table_lock(hash, &flags);
hlist_for_each_entry_safe(ri, tmp, head, hlist) {
if (ri->task == tk)
recycle_rp_inst(ri, &empty_rp);
}
kretprobe_table_unlock(hash, &flags);
+ __this_cpu_write(current_kprobe, NULL);
hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
hlist_del(&ri->hlist);
kfree(ri);
--
2.25.2


2020-04-09 09:04:27

by Naveen N. Rao

Subject: Re: [RFC] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task

Hi Jiri,

Jiri Olsa wrote:
> hi,
> Ziqian reported lockup when adding retprobe on _raw_spin_lock_irqsave.
> My test was also able to trigger lockdep output:
>
> ============================================
> WARNING: possible recursive locking detected
> 5.6.0-rc6+ #6 Not tainted
> --------------------------------------------
> sched-messaging/2767 is trying to acquire lock:
> ffffffff9a492798 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_hash_lock+0x52/0xa0
>
> but task is already holding lock:
> ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
>
> CPU0
> ----
> lock(&(kretprobe_table_locks[i].lock));
> lock(&(kretprobe_table_locks[i].lock));
>
> *** DEADLOCK ***
>
> May be due to missing lock nesting notation
>
> 1 lock held by sched-messaging/2767:
> #0: ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50
>
> stack backtrace:
> CPU: 3 PID: 2767 Comm: sched-messaging Not tainted 5.6.0-rc6+ #6
> Call Trace:
> dump_stack+0x96/0xe0
> __lock_acquire.cold.57+0x173/0x2b7
> ? native_queued_spin_lock_slowpath+0x42b/0x9e0
> ? lockdep_hardirqs_on+0x590/0x590
> ? __lock_acquire+0xf63/0x4030
> lock_acquire+0x15a/0x3d0
> ? kretprobe_hash_lock+0x52/0xa0
> _raw_spin_lock_irqsave+0x36/0x70
> ? kretprobe_hash_lock+0x52/0xa0
> kretprobe_hash_lock+0x52/0xa0
> trampoline_handler+0xf8/0x940
> ? kprobe_fault_handler+0x380/0x380
> ? find_held_lock+0x3a/0x1c0
> kretprobe_trampoline+0x25/0x50
> ? lock_acquired+0x392/0xbc0
> ? _raw_spin_lock_irqsave+0x50/0x70
> ? __get_valid_kprobe+0x1f0/0x1f0
> ? _raw_spin_unlock_irqrestore+0x3b/0x40
> ? finish_task_switch+0x4b9/0x6d0
> ? __switch_to_asm+0x34/0x70
> ? __switch_to_asm+0x40/0x70
>
> The code within the kretprobe handler checks for probe reentrancy,
> so we won't trigger any _raw_spin_lock_irqsave probe in there.
>
> The problem is in outside kprobe_flush_task, where we call:
>
> kprobe_flush_task
> kretprobe_table_lock
> raw_spin_lock_irqsave
> _raw_spin_lock_irqsave
>
> where _raw_spin_lock_irqsave triggers the kretprobe and installs
> kretprobe_trampoline handler on _raw_spin_lock_irqsave return.
>
> The kretprobe_trampoline handler is then executed with already
> locked kretprobe_table_locks, and first thing it does is to
> lock kretprobe_table_locks ;-) the whole lockup path like:
>
> kprobe_flush_task
> kretprobe_table_lock
> raw_spin_lock_irqsave
> _raw_spin_lock_irqsave ---> probe triggered, kretprobe_trampoline installed
>
> ---> kretprobe_table_locks locked
>
> kretprobe_trampoline
> trampoline_handler
> kretprobe_hash_lock(current, &head, &flags); <--- deadlock
>
> The change below sets current_kprobe in kprobe_flush_task, so the probe
> recursion protection check is hit and the probe is never set. It seems
> to fix the deadlock.

Good analysis!

>
> I'm not sure this is the best fix, any ideas are welcome ;-)

I think this is a good way to address this issue.

>
> thanks,
> jirka
>
>
> ---
> kernel/kprobes.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 2625c241ac00..b13247cae752 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1236,6 +1236,10 @@ __releases(hlist_lock)
> }
> NOKPROBE_SYMBOL(kretprobe_table_unlock);
>
> +static struct kprobe kretprobe_dummy = {
> + .addr = (void *)kretprobe_trampoline,
> +};
> +

Perhaps a more meaningful name, say, kprobe_flush_task_protect ?

> /*
> * This function is called from finish_task_switch when task tk becomes dead,
> * so that we can recycle any function-return probe instances associated
> @@ -1256,12 +1260,14 @@ void kprobe_flush_task(struct task_struct *tk)
> INIT_HLIST_HEAD(&empty_rp);
> hash = hash_ptr(tk, KPROBE_HASH_BITS);
> head = &kretprobe_inst_table[hash];
> + __this_cpu_write(current_kprobe, &kretprobe_dummy);
> kretprobe_table_lock(hash, &flags);
> hlist_for_each_entry_safe(ri, tmp, head, hlist) {
> if (ri->task == tk)
> recycle_rp_inst(ri, &empty_rp);
> }
> kretprobe_table_unlock(hash, &flags);
> + __this_cpu_write(current_kprobe, NULL);

I would move this to the end of the function to also cover the code
below. kprobe_flush_task() is marked NOKPROBE, so it is good to prevent
probe handling in that code too.

> hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
> hlist_del(&ri->hlist);
> kfree(ri);


- Naveen

2020-04-09 12:39:58

by Masami Hiramatsu

Subject: Re: [RFC] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task

Hi Jiri,

On Wed, 8 Apr 2020 18:46:41 +0200
Jiri Olsa <[email protected]> wrote:

> hi,
> Ziqian reported lockup when adding retprobe on _raw_spin_lock_irqsave.

Hmm, kprobe is lockless, but kretprobe involves a spinlock.
Thus, eventually, I will blacklist _raw_spin_lock_irqsave()
for kretprobe.
If you need to trace a spinlock return, please consider putting a
kprobe at the "ret" instruction.
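
For illustration, a minimal sketch of such a probe; the .offset value
below is hypothetical and would have to be taken from the disassembly
of _raw_spin_lock_irqsave():

	#include <linux/kprobes.h>

	static struct kprobe kp = {
		.symbol_name	= "_raw_spin_lock_irqsave",
		.offset		= 0x3f,	/* hypothetical: offset of the "ret" insn */
	};

	/* e.g. from a module's init function: */
	if (register_kprobe(&kp) < 0)
		pr_err("register_kprobe failed\n");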

> My test was also able to trigger lockdep output:
>
> ============================================
> WARNING: possible recursive locking detected
> 5.6.0-rc6+ #6 Not tainted
> --------------------------------------------
> sched-messaging/2767 is trying to acquire lock:
> ffffffff9a492798 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_hash_lock+0x52/0xa0
>
> but task is already holding lock:
> ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
>
> CPU0
> ----
> lock(&(kretprobe_table_locks[i].lock));
> lock(&(kretprobe_table_locks[i].lock));
>
> *** DEADLOCK ***
>
> May be due to missing lock nesting notation
>
> 1 lock held by sched-messaging/2767:
> #0: ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50
>
> stack backtrace:
> CPU: 3 PID: 2767 Comm: sched-messaging Not tainted 5.6.0-rc6+ #6
> Call Trace:
> dump_stack+0x96/0xe0
> __lock_acquire.cold.57+0x173/0x2b7
> ? native_queued_spin_lock_slowpath+0x42b/0x9e0
> ? lockdep_hardirqs_on+0x590/0x590
> ? __lock_acquire+0xf63/0x4030
> lock_acquire+0x15a/0x3d0
> ? kretprobe_hash_lock+0x52/0xa0
> _raw_spin_lock_irqsave+0x36/0x70
> ? kretprobe_hash_lock+0x52/0xa0
> kretprobe_hash_lock+0x52/0xa0
> trampoline_handler+0xf8/0x940
> ? kprobe_fault_handler+0x380/0x380
> ? find_held_lock+0x3a/0x1c0
> kretprobe_trampoline+0x25/0x50
> ? lock_acquired+0x392/0xbc0
> ? _raw_spin_lock_irqsave+0x50/0x70
> ? __get_valid_kprobe+0x1f0/0x1f0
> ? _raw_spin_unlock_irqrestore+0x3b/0x40
> ? finish_task_switch+0x4b9/0x6d0
> ? __switch_to_asm+0x34/0x70
> ? __switch_to_asm+0x40/0x70
>
> The code within the kretprobe handler checks for probe reentrancy,
> so we won't trigger any _raw_spin_lock_irqsave probe in there.
>
> The problem is in outside kprobe_flush_task, where we call:
>
> kprobe_flush_task
> kretprobe_table_lock
> raw_spin_lock_irqsave
> _raw_spin_lock_irqsave
>
> where _raw_spin_lock_irqsave triggers the kretprobe and installs
> kretprobe_trampoline handler on _raw_spin_lock_irqsave return.

Hmm, OK. In this case, I think we should mark that this process is
going to die and never try to kretprobe on it.

>
> The kretprobe_trampoline handler is then executed with already
> locked kretprobe_table_locks, and first thing it does is to
> lock kretprobe_table_locks ;-) the whole lockup path like:
>
> kprobe_flush_task
> kretprobe_table_lock
> raw_spin_lock_irqsave
> _raw_spin_lock_irqsave ---> probe triggered, kretprobe_trampoline installed
>
> ---> kretprobe_table_locks locked
>
> kretprobe_trampoline
> trampoline_handler
> kretprobe_hash_lock(current, &head, &flags); <--- deadlock
>
> The change below sets current_kprobe in kprobe_flush_task, so the probe
> recursion protection check is hit and the probe is never set. It seems
> to fix the deadlock.
>
> I'm not sure this is the best fix, any ideas are welcome ;-)

Hmm, this issue is a bit tricky to fix. Of course, temporarily disabling
kprobes (and kretprobes) in an area by filling current_kprobe might
be a good idea, but it also affects other kprobes.

How about letting kretprobe skip a task whose state == TASK_DEAD?

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 627fc1b7011a..3f207d2e0afb 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1874,9 +1874,12 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
* To avoid deadlocks, prohibit return probing in NMI contexts,
* just skip the probe and increase the (inexact) 'nmissed'
* statistical counter, so that the user is informed that
- * something happened:
+ * something happened.
+ * Also, if the current task is dead, we are already in the process
+ * of reclaiming kretprobe instances from the hash list. To avoid a
+ * memory leak, skip running the kretprobe on such a task.
*/
- if (unlikely(in_nmi())) {
+ if (unlikely(in_nmi()) || current->state == TASK_DEAD) {
rp->nmissed++;
return 0;
}

--
Masami Hiramatsu <[email protected]>

2020-04-09 13:19:11

by Jiri Olsa

Subject: Re: [RFC] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task

On Thu, Apr 09, 2020 at 09:38:06PM +0900, Masami Hiramatsu wrote:
> Hi Jiri,
>
> On Wed, 8 Apr 2020 18:46:41 +0200
> Jiri Olsa <[email protected]> wrote:
>
> > hi,
> > Ziqian reported lockup when adding retprobe on _raw_spin_lock_irqsave.
>
> Hmm, kprobe is lockless, but kretprobe involves spinlock.
> Thus, eventually, I will blacklist the _raw_spin_lock_irqsave()
> for kretprobe.

I thought of blacklisting, but we were using that kretprobe
in a bcc script.. it has since been superseded by bpf trampolines,
but it still might be useful

SNIP

> > The code within the kretprobe handler checks for probe reentrancy,
> > so we won't trigger any _raw_spin_lock_irqsave probe in there.
> >
> > The problem is in outside kprobe_flush_task, where we call:
> >
> > kprobe_flush_task
> > kretprobe_table_lock
> > raw_spin_lock_irqsave
> > _raw_spin_lock_irqsave
> >
> > where _raw_spin_lock_irqsave triggers the kretprobe and installs
> > kretprobe_trampoline handler on _raw_spin_lock_irqsave return.
>
> Hmm, OK. In this case, I think we should mark this process is
> going to die and never try to kretprobe on it.
>
> >
> > The kretprobe_trampoline handler is then executed with already
> > locked kretprobe_table_locks, and first thing it does is to
> > lock kretprobe_table_locks ;-) the whole lockup path like:
> >
> > kprobe_flush_task
> > kretprobe_table_lock
> > raw_spin_lock_irqsave
> > _raw_spin_lock_irqsave ---> probe triggered, kretprobe_trampoline installed
> >
> > ---> kretprobe_table_locks locked
> >
> > kretprobe_trampoline
> > trampoline_handler
> > kretprobe_hash_lock(current, &head, &flags); <--- deadlock
> >
> > The change below sets current_kprobe in kprobe_flush_task, so the probe
> > recursion protection check is hit and the probe is never set. It seems
> > to fix the deadlock.
> >
> > I'm not sure this is the best fix, any ideas are welcome ;-)
>
> Hmm, this is a bit tricky to fix this issue. Of course, temporary disable
> kprobes (and kretprobe) on an area by filling current_kprobe might
> be a good idea, but it also involves other kprobes.
>
> How about let kretprobe skip the task which state == TASK_DEAD ?

hum, isn't that a considerable number of paths (with state == TASK_DEAD)
that we would kill kretprobes for? someone might want to trace them

thanks,
jirka

2020-04-09 14:28:10

by Masami Hiramatsu

Subject: Re: [RFC] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task

Hi,

On Thu, 09 Apr 2020 18:46:47 +0530
"Naveen N. Rao" <[email protected]> wrote:

> Hi Masami,
>
> Masami Hiramatsu wrote:
> > Hi Jiri,
> >
> > On Wed, 8 Apr 2020 18:46:41 +0200
> > Jiri Olsa <[email protected]> wrote:
> >
> >> hi,
> >> Ziqian reported lockup when adding retprobe on _raw_spin_lock_irqsave.
> >
> > Hmm, kprobe is lockless, but kretprobe involves spinlock.
> > Thus, eventually, I will blacklist the _raw_spin_lock_irqsave()
> > for kretprobe.
>
> As far as I can see, this is the only place where probing
> _raw_spin_lock_irqsave() is an issue. Should we blacklist all users for
> this case alone?

Hrm, right. The kretprobe case is different from the kprobe case.

>
> > If you need to trace spinlock return, please consider to putting
> > kprobe at "ret" instruction.
> >
> >> My test was also able to trigger lockdep output:
> >>
> >> ============================================
> >> WARNING: possible recursive locking detected
> >> 5.6.0-rc6+ #6 Not tainted
> >> --------------------------------------------
> >> sched-messaging/2767 is trying to acquire lock:
> >> ffffffff9a492798 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_hash_lock+0x52/0xa0
> >>
> >> but task is already holding lock:
> >> ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50
> >>
> >> other info that might help us debug this:
> >> Possible unsafe locking scenario:
> >>
> >> CPU0
> >> ----
> >> lock(&(kretprobe_table_locks[i].lock));
> >> lock(&(kretprobe_table_locks[i].lock));
> >>
> >> *** DEADLOCK ***
> >>
> >> May be due to missing lock nesting notation
> >>
> >> 1 lock held by sched-messaging/2767:
> >> #0: ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50
> >>
> >> stack backtrace:
> >> CPU: 3 PID: 2767 Comm: sched-messaging Not tainted 5.6.0-rc6+ #6
> >> Call Trace:
> >> dump_stack+0x96/0xe0
> >> __lock_acquire.cold.57+0x173/0x2b7
> >> ? native_queued_spin_lock_slowpath+0x42b/0x9e0
> >> ? lockdep_hardirqs_on+0x590/0x590
> >> ? __lock_acquire+0xf63/0x4030
> >> lock_acquire+0x15a/0x3d0
> >> ? kretprobe_hash_lock+0x52/0xa0
> >> _raw_spin_lock_irqsave+0x36/0x70
> >> ? kretprobe_hash_lock+0x52/0xa0
> >> kretprobe_hash_lock+0x52/0xa0
> >> trampoline_handler+0xf8/0x940
> >> ? kprobe_fault_handler+0x380/0x380
> >> ? find_held_lock+0x3a/0x1c0
> >> kretprobe_trampoline+0x25/0x50
> >> ? lock_acquired+0x392/0xbc0
> >> ? _raw_spin_lock_irqsave+0x50/0x70
> >> ? __get_valid_kprobe+0x1f0/0x1f0
> >> ? _raw_spin_unlock_irqrestore+0x3b/0x40
> >> ? finish_task_switch+0x4b9/0x6d0
> >> ? __switch_to_asm+0x34/0x70
> >> ? __switch_to_asm+0x40/0x70
> >>
> >> The code within the kretprobe handler checks for probe reentrancy,
> >> so we won't trigger any _raw_spin_lock_irqsave probe in there.
> >>
> >> The problem is in outside kprobe_flush_task, where we call:
> >>
> >> kprobe_flush_task
> >> kretprobe_table_lock
> >> raw_spin_lock_irqsave
> >> _raw_spin_lock_irqsave
> >>
> >> where _raw_spin_lock_irqsave triggers the kretprobe and installs
> >> kretprobe_trampoline handler on _raw_spin_lock_irqsave return.
> >
> > Hmm, OK. In this case, I think we should mark this process is
> > going to die and never try to kretprobe on it.
> >
> >>
> >> The kretprobe_trampoline handler is then executed with already
> >> locked kretprobe_table_locks, and first thing it does is to
> >> lock kretprobe_table_locks ;-) the whole lockup path like:
> >>
> >> kprobe_flush_task
> >> kretprobe_table_lock
> >> raw_spin_lock_irqsave
> >> _raw_spin_lock_irqsave ---> probe triggered, kretprobe_trampoline installed
> >>
> >> ---> kretprobe_table_locks locked
> >>
> >> kretprobe_trampoline
> >> trampoline_handler
> >> kretprobe_hash_lock(current, &head, &flags); <--- deadlock
> >>
> >> The change below sets current_kprobe in kprobe_flush_task, so the probe
> >> recursion protection check is hit and the probe is never set. It seems
> >> to fix the deadlock.
> >>
> >> I'm not sure this is the best fix, any ideas are welcome ;-)
> >
> > Hmm, this is a bit tricky to fix this issue. Of course, temporary disable
> > kprobes (and kretprobe) on an area by filling current_kprobe might
> > be a good idea, but it also involves other kprobes.
>
> Not sure how you mean that. Jiri's RFC patch would be disabling
> k[ret]probes within kprobe_flush_task(), which is only ever invoked from
> finish_task_switch(). I only see calls to spin locks and kfree() from
> here. Besides, kprobe_flush_task() itself is NOKPROBE, so we would
> ideally want to not trace/probe other functions it calls.

Ah, good point. I forgot that it has been blacklisted. OK then, I accept
Jiri's RFC.

Thank you,


>
> >
> > How about let kretprobe skip the task which state == TASK_DEAD ?
> >
> > diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> > index 627fc1b7011a..3f207d2e0afb 100644
> > --- a/kernel/kprobes.c
> > +++ b/kernel/kprobes.c
> > @@ -1874,9 +1874,12 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
> > * To avoid deadlocks, prohibit return probing in NMI contexts,
> > * just skip the probe and increase the (inexact) 'nmissed'
> > * statistical counter, so that the user is informed that
> > - * something happened:
> > + * something happened.
> > + * Also, if the current task is dead, we will already in the process
> > + * to reclaim kretprobe instances from hash list. To avoid memory
> > + * leak, skip to run the kretprobe on such task.
> > */
> > - if (unlikely(in_nmi())) {
> > + if (unlikely(in_nmi()) || current->state == TASK_DEAD) {
>
> I'm wondering if this actually works. kprobe_flush_task() seems to be
> called from finish_task_switch(), after the task switch is complete. So,
> current task won't actually be dead here.



>
>
> - Naveen
>


--
Masami Hiramatsu <[email protected]>

2020-04-09 16:17:35

by Naveen N. Rao

Subject: Re: [RFC] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task

Hi Masami,

Masami Hiramatsu wrote:
> Hi Jiri,
>
> On Wed, 8 Apr 2020 18:46:41 +0200
> Jiri Olsa <[email protected]> wrote:
>
>> hi,
>> Ziqian reported lockup when adding retprobe on _raw_spin_lock_irqsave.
>
> Hmm, kprobe is lockless, but kretprobe involves spinlock.
> Thus, eventually, I will blacklist the _raw_spin_lock_irqsave()
> for kretprobe.

As far as I can see, this is the only place where probing
_raw_spin_lock_irqsave() is an issue. Should we blacklist all users for
this case alone?

> If you need to trace spinlock return, please consider to putting
> kprobe at "ret" instruction.
>
>> My test was also able to trigger lockdep output:
>>
>> ============================================
>> WARNING: possible recursive locking detected
>> 5.6.0-rc6+ #6 Not tainted
>> --------------------------------------------
>> sched-messaging/2767 is trying to acquire lock:
>> ffffffff9a492798 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_hash_lock+0x52/0xa0
>>
>> but task is already holding lock:
>> ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50
>>
>> other info that might help us debug this:
>> Possible unsafe locking scenario:
>>
>> CPU0
>> ----
>> lock(&(kretprobe_table_locks[i].lock));
>> lock(&(kretprobe_table_locks[i].lock));
>>
>> *** DEADLOCK ***
>>
>> May be due to missing lock nesting notation
>>
>> 1 lock held by sched-messaging/2767:
>> #0: ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50
>>
>> stack backtrace:
>> CPU: 3 PID: 2767 Comm: sched-messaging Not tainted 5.6.0-rc6+ #6
>> Call Trace:
>> dump_stack+0x96/0xe0
>> __lock_acquire.cold.57+0x173/0x2b7
>> ? native_queued_spin_lock_slowpath+0x42b/0x9e0
>> ? lockdep_hardirqs_on+0x590/0x590
>> ? __lock_acquire+0xf63/0x4030
>> lock_acquire+0x15a/0x3d0
>> ? kretprobe_hash_lock+0x52/0xa0
>> _raw_spin_lock_irqsave+0x36/0x70
>> ? kretprobe_hash_lock+0x52/0xa0
>> kretprobe_hash_lock+0x52/0xa0
>> trampoline_handler+0xf8/0x940
>> ? kprobe_fault_handler+0x380/0x380
>> ? find_held_lock+0x3a/0x1c0
>> kretprobe_trampoline+0x25/0x50
>> ? lock_acquired+0x392/0xbc0
>> ? _raw_spin_lock_irqsave+0x50/0x70
>> ? __get_valid_kprobe+0x1f0/0x1f0
>> ? _raw_spin_unlock_irqrestore+0x3b/0x40
>> ? finish_task_switch+0x4b9/0x6d0
>> ? __switch_to_asm+0x34/0x70
>> ? __switch_to_asm+0x40/0x70
>>
>> The code within the kretprobe handler checks for probe reentrancy,
>> so we won't trigger any _raw_spin_lock_irqsave probe in there.
>>
>> The problem is in outside kprobe_flush_task, where we call:
>>
>> kprobe_flush_task
>> kretprobe_table_lock
>> raw_spin_lock_irqsave
>> _raw_spin_lock_irqsave
>>
>> where _raw_spin_lock_irqsave triggers the kretprobe and installs
>> kretprobe_trampoline handler on _raw_spin_lock_irqsave return.
>
> Hmm, OK. In this case, I think we should mark this process is
> going to die and never try to kretprobe on it.
>
>>
>> The kretprobe_trampoline handler is then executed with already
>> locked kretprobe_table_locks, and first thing it does is to
>> lock kretprobe_table_locks ;-) the whole lockup path like:
>>
>> kprobe_flush_task
>> kretprobe_table_lock
>> raw_spin_lock_irqsave
>> _raw_spin_lock_irqsave ---> probe triggered, kretprobe_trampoline installed
>>
>> ---> kretprobe_table_locks locked
>>
>> kretprobe_trampoline
>> trampoline_handler
>> kretprobe_hash_lock(current, &head, &flags); <--- deadlock
>>
>> The change below sets current_kprobe in kprobe_flush_task, so the probe
>> recursion protection check is hit and the probe is never set. It seems
>> to fix the deadlock.
>>
>> I'm not sure this is the best fix, any ideas are welcome ;-)
>
> Hmm, this is a bit tricky to fix this issue. Of course, temporary disable
> kprobes (and kretprobe) on an area by filling current_kprobe might
> be a good idea, but it also involves other kprobes.

Not sure what you mean by that. Jiri's RFC patch would disable
k[ret]probes within kprobe_flush_task(), which is only ever invoked from
finish_task_switch(). I only see calls to spinlocks and kfree() from
there. Besides, kprobe_flush_task() itself is NOKPROBE, so we would
ideally want not to trace/probe other functions it calls.

>
> How about let kretprobe skip the task which state == TASK_DEAD ?
>
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 627fc1b7011a..3f207d2e0afb 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1874,9 +1874,12 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
> * To avoid deadlocks, prohibit return probing in NMI contexts,
> * just skip the probe and increase the (inexact) 'nmissed'
> * statistical counter, so that the user is informed that
> - * something happened:
> + * something happened.
> + * Also, if the current task is dead, we will already in the process
> + * to reclaim kretprobe instances from hash list. To avoid memory
> + * leak, skip to run the kretprobe on such task.
> */
> - if (unlikely(in_nmi())) {
> + if (unlikely(in_nmi()) || current->state == TASK_DEAD) {

I'm wondering if this actually works. kprobe_flush_task() seems to be
called from finish_task_switch(), after the task switch is complete. So,
current task won't actually be dead here.
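
For context, the call site, condensed from finish_task_switch() in
kernel/sched/core.c (not verbatim): kprobe_flush_task() runs after the
switch is complete, so 'current' is already the next task and only
'prev' is the dead one:

	if (unlikely(prev_state == TASK_DEAD)) {
		/* 'current' is the next task here, so a
		 * current->state check would not see TASK_DEAD */
		kprobe_flush_task(prev);
		put_task_struct_rcu_user(prev);
	}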


- Naveen

2020-04-09 16:46:25

by Masami Hiramatsu

Subject: Re: [RFC] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task

On Thu, 9 Apr 2020 14:52:13 +0200
Jiri Olsa <[email protected]> wrote:

> On Thu, Apr 09, 2020 at 09:38:06PM +0900, Masami Hiramatsu wrote:
> > Hi Jiri,
> >
> > On Wed, 8 Apr 2020 18:46:41 +0200
> > Jiri Olsa <[email protected]> wrote:
> >
> > > hi,
> > > Ziqian reported lockup when adding retprobe on _raw_spin_lock_irqsave.
> >
> > Hmm, kprobe is lockless, but kretprobe involves spinlock.
> > Thus, eventually, I will blacklist the _raw_spin_lock_irqsave()
> > for kretprobe.
>
> I thought of blacklisting, but we were using that kretprobe
> in a bcc script.. it's not overloaded by using bpf trampolines,
> but still might be useful
>
> SNIP
>
> > > The code within the kretprobe handler checks for probe reentrancy,
> > > so we won't trigger any _raw_spin_lock_irqsave probe in there.
> > >
> > > The problem is in outside kprobe_flush_task, where we call:
> > >
> > > kprobe_flush_task
> > > kretprobe_table_lock
> > > raw_spin_lock_irqsave
> > > _raw_spin_lock_irqsave
> > >
> > > where _raw_spin_lock_irqsave triggers the kretprobe and installs
> > > kretprobe_trampoline handler on _raw_spin_lock_irqsave return.
> >
> > Hmm, OK. In this case, I think we should mark this process is
> > going to die and never try to kretprobe on it.
> >
> > >
> > > The kretprobe_trampoline handler is then executed with already
> > > locked kretprobe_table_locks, and first thing it does is to
> > > lock kretprobe_table_locks ;-) the whole lockup path like:
> > >
> > > kprobe_flush_task
> > > kretprobe_table_lock
> > > raw_spin_lock_irqsave
> > > _raw_spin_lock_irqsave ---> probe triggered, kretprobe_trampoline installed
> > >
> > > ---> kretprobe_table_locks locked
> > >
> > > kretprobe_trampoline
> > > trampoline_handler
> > > kretprobe_hash_lock(current, &head, &flags); <--- deadlock
> > >
> > > The change below sets current_kprobe in kprobe_flush_task, so the probe
> > > recursion protection check is hit and the probe is never set. It seems
> > > to fix the deadlock.
> > >
> > > I'm not sure this is the best fix, any ideas are welcome ;-)
> >
> > Hmm, this is a bit tricky to fix this issue. Of course, temporary disable
> > kprobes (and kretprobe) on an area by filling current_kprobe might
> > be a good idea, but it also involves other kprobes.
> >
> > How about let kretprobe skip the task which state == TASK_DEAD ?
>
> hum, isn't that considerable amount of paths (with state == TASK_DEAD)
> that we would kill kprobes for? someone might want to trace it

OK, and I found that even after calling kprobe_flush_task(), it may
still work because the task will not be switched; the kretprobe
instance will be reclaimed.

Thank you,


--
Masami Hiramatsu <[email protected]>

2020-04-09 16:51:35

by Masami Hiramatsu

Subject: Re: [RFC] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task

On Wed, 8 Apr 2020 18:46:41 +0200
Jiri Olsa <[email protected]> wrote:

> hi,
> Ziqian reported lockup when adding retprobe on _raw_spin_lock_irqsave.
> My test was also able to trigger lockdep output:
>
> ============================================
> WARNING: possible recursive locking detected
> 5.6.0-rc6+ #6 Not tainted
> --------------------------------------------
> sched-messaging/2767 is trying to acquire lock:
> ffffffff9a492798 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_hash_lock+0x52/0xa0
>
> but task is already holding lock:
> ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
>
> CPU0
> ----
> lock(&(kretprobe_table_locks[i].lock));
> lock(&(kretprobe_table_locks[i].lock));
>
> *** DEADLOCK ***
>
> May be due to missing lock nesting notation
>
> 1 lock held by sched-messaging/2767:
> #0: ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50
>
> stack backtrace:
> CPU: 3 PID: 2767 Comm: sched-messaging Not tainted 5.6.0-rc6+ #6
> Call Trace:
> dump_stack+0x96/0xe0
> __lock_acquire.cold.57+0x173/0x2b7
> ? native_queued_spin_lock_slowpath+0x42b/0x9e0
> ? lockdep_hardirqs_on+0x590/0x590
> ? __lock_acquire+0xf63/0x4030
> lock_acquire+0x15a/0x3d0
> ? kretprobe_hash_lock+0x52/0xa0
> _raw_spin_lock_irqsave+0x36/0x70
> ? kretprobe_hash_lock+0x52/0xa0
> kretprobe_hash_lock+0x52/0xa0
> trampoline_handler+0xf8/0x940
> ? kprobe_fault_handler+0x380/0x380
> ? find_held_lock+0x3a/0x1c0
> kretprobe_trampoline+0x25/0x50
> ? lock_acquired+0x392/0xbc0
> ? _raw_spin_lock_irqsave+0x50/0x70
> ? __get_valid_kprobe+0x1f0/0x1f0
> ? _raw_spin_unlock_irqrestore+0x3b/0x40
> ? finish_task_switch+0x4b9/0x6d0
> ? __switch_to_asm+0x34/0x70
> ? __switch_to_asm+0x40/0x70
>
> The code within the kretprobe handler checks for probe reentrancy,
> so we won't trigger any _raw_spin_lock_irqsave probe in there.
>
> The problem is in outside kprobe_flush_task, where we call:
>
> kprobe_flush_task
> kretprobe_table_lock
> raw_spin_lock_irqsave
> _raw_spin_lock_irqsave
>
> where _raw_spin_lock_irqsave triggers the kretprobe and installs
> kretprobe_trampoline handler on _raw_spin_lock_irqsave return.
>
> The kretprobe_trampoline handler is then executed with already
> locked kretprobe_table_locks, and first thing it does is to
> lock kretprobe_table_locks ;-) the whole lockup path like:
>
> kprobe_flush_task
> kretprobe_table_lock
> raw_spin_lock_irqsave
> _raw_spin_lock_irqsave ---> probe triggered, kretprobe_trampoline installed
>
> ---> kretprobe_table_locks locked
>
> kretprobe_trampoline
> trampoline_handler
> kretprobe_hash_lock(current, &head, &flags); <--- deadlock
>
> The change below sets current_kprobe in kprobe_flush_task, so the probe
> recursion protection check is hit and the probe is never set. It seems
> to fix the deadlock.
>
> I'm not sure this is the best fix, any ideas are welcome ;-)

OK, I just have 1 comment. :)

>
> thanks,
> jirka
>
>
> ---
> kernel/kprobes.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 2625c241ac00..b13247cae752 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1236,6 +1236,10 @@ __releases(hlist_lock)
> }
> NOKPROBE_SYMBOL(kretprobe_table_unlock);
>
> +static struct kprobe kretprobe_dummy = {
> + .addr = (void *)kretprobe_trampoline,
> +};
> +
> /*
> * This function is called from finish_task_switch when task tk becomes dead,
> * so that we can recycle any function-return probe instances associated
> @@ -1256,12 +1260,14 @@ void kprobe_flush_task(struct task_struct *tk)
> INIT_HLIST_HEAD(&empty_rp);
> hash = hash_ptr(tk, KPROBE_HASH_BITS);
> head = &kretprobe_inst_table[hash];
> + __this_cpu_write(current_kprobe, &kretprobe_dummy);

Can you also set kcb->kprobe_status = KPROBE_HIT_ACTIVE?

BTW, it may be better to introduce common kprobe_reject_section_start()
and kprobe_reject_section_end() helpers so that users don't need to
prepare dummy kprobes.

Thank you,

> kretprobe_table_lock(hash, &flags);
> hlist_for_each_entry_safe(ri, tmp, head, hlist) {
> if (ri->task == tk)
> recycle_rp_inst(ri, &empty_rp);
> }
> kretprobe_table_unlock(hash, &flags);
> + __this_cpu_write(current_kprobe, NULL);
> hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
> hlist_del(&ri->hlist);
> kfree(ri);
> --
> 2.25.2
>


--
Masami Hiramatsu <[email protected]>

2020-04-09 18:46:27

by Jiri Olsa

Subject: Re: [RFC] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task

On Thu, Apr 09, 2020 at 02:32:21PM +0530, Naveen N. Rao wrote:

SNIP

> > ---
> > kernel/kprobes.c | 6 ++++++
> > 1 file changed, 6 insertions(+)
> >
> > diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> > index 2625c241ac00..b13247cae752 100644
> > --- a/kernel/kprobes.c
> > +++ b/kernel/kprobes.c
> > @@ -1236,6 +1236,10 @@ __releases(hlist_lock)
> > }
> > NOKPROBE_SYMBOL(kretprobe_table_unlock);
> > +static struct kprobe kretprobe_dummy = {
> > + .addr = (void *)kretprobe_trampoline,
> > +};
> > +
>
> Perhaps a more meaningful name, say, kprobe_flush_task_protect ?

ok, will rename together with Masami's changes

>
> > /*
> > * This function is called from finish_task_switch when task tk becomes dead,
> > * so that we can recycle any function-return probe instances associated
> > @@ -1256,12 +1260,14 @@ void kprobe_flush_task(struct task_struct *tk)
> > INIT_HLIST_HEAD(&empty_rp);
> > hash = hash_ptr(tk, KPROBE_HASH_BITS);
> > head = &kretprobe_inst_table[hash];
> > + __this_cpu_write(current_kprobe, &kretprobe_dummy);
> > kretprobe_table_lock(hash, &flags);
> > hlist_for_each_entry_safe(ri, tmp, head, hlist) {
> > if (ri->task == tk)
> > recycle_rp_inst(ri, &empty_rp);
> > }
> > kretprobe_table_unlock(hash, &flags);
> > + __this_cpu_write(current_kprobe, NULL);
>
> I would move this to the end of the function to also cover the below code.
> kprobe_flush_task() is marked NOKPROBE, so it is good to prevent probe
> handling in the below code too.

ok, will do

thanks,
jirka

>
> > hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
> > hlist_del(&ri->hlist);
> > kfree(ri);
>
>
> - Naveen
>

2020-04-09 19:28:09

by Jiri Olsa

Subject: Re: [RFC] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task

On Thu, Apr 09, 2020 at 11:41:01PM +0900, Masami Hiramatsu wrote:

SNIP

> > ---
> > kernel/kprobes.c | 6 ++++++
> > 1 file changed, 6 insertions(+)
> >
> > diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> > index 2625c241ac00..b13247cae752 100644
> > --- a/kernel/kprobes.c
> > +++ b/kernel/kprobes.c
> > @@ -1236,6 +1236,10 @@ __releases(hlist_lock)
> > }
> > NOKPROBE_SYMBOL(kretprobe_table_unlock);
> >
> > +static struct kprobe kretprobe_dummy = {
> > + .addr = (void *)kretprobe_trampoline,
> > +};
> > +
> > /*
> > * This function is called from finish_task_switch when task tk becomes dead,
> > * so that we can recycle any function-return probe instances associated
> > @@ -1256,12 +1260,14 @@ void kprobe_flush_task(struct task_struct *tk)
> > INIT_HLIST_HEAD(&empty_rp);
> > hash = hash_ptr(tk, KPROBE_HASH_BITS);
> > head = &kretprobe_inst_table[hash];
> > + __this_cpu_write(current_kprobe, &kretprobe_dummy);
>
> Can you also set the kcb->kprobe_state = KPROBE_HIT_ACTIVE?
>
> BTW, we may be better to introduce a common kprobe_reject_section_start()
> and kprobe_reject_section_end() so that the user don't need to prepare
> dummy kprobes.

sure, will do

thank you both for review
jirka

>
> Thank you,
>
> > kretprobe_table_lock(hash, &flags);
> > hlist_for_each_entry_safe(ri, tmp, head, hlist) {
> > if (ri->task == tk)
> > recycle_rp_inst(ri, &empty_rp);
> > }
> > kretprobe_table_unlock(hash, &flags);
> > + __this_cpu_write(current_kprobe, NULL);
> > hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
> > hlist_del(&ri->hlist);
> > kfree(ri);
> > --
> > 2.25.2
> >
>
>
> --
> Masami Hiramatsu <[email protected]>
>

2020-04-09 20:15:05

by Jiri Olsa

Subject: Re: [RFC] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task

On Thu, Apr 09, 2020 at 08:45:01PM +0200, Jiri Olsa wrote:
> On Thu, Apr 09, 2020 at 11:41:01PM +0900, Masami Hiramatsu wrote:
>
> SNIP
>
> > > ---
> > > kernel/kprobes.c | 6 ++++++
> > > 1 file changed, 6 insertions(+)
> > >
> > > diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> > > index 2625c241ac00..b13247cae752 100644
> > > --- a/kernel/kprobes.c
> > > +++ b/kernel/kprobes.c
> > > @@ -1236,6 +1236,10 @@ __releases(hlist_lock)
> > > }
> > > NOKPROBE_SYMBOL(kretprobe_table_unlock);
> > >
> > > +static struct kprobe kretprobe_dummy = {
> > > + .addr = (void *)kretprobe_trampoline,
> > > +};
> > > +
> > > /*
> > > * This function is called from finish_task_switch when task tk becomes dead,
> > > * so that we can recycle any function-return probe instances associated
> > > @@ -1256,12 +1260,14 @@ void kprobe_flush_task(struct task_struct *tk)
> > > INIT_HLIST_HEAD(&empty_rp);
> > > hash = hash_ptr(tk, KPROBE_HASH_BITS);
> > > head = &kretprobe_inst_table[hash];
> > > + __this_cpu_write(current_kprobe, &kretprobe_dummy);
> >
> > Can you also set the kcb->kprobe_state = KPROBE_HIT_ACTIVE?
> >
> > BTW, we may be better to introduce a common kprobe_reject_section_start()
> > and kprobe_reject_section_end() so that the user don't need to prepare
> > dummy kprobes.
>
> sure, will do
>
> thank you both for review

ok, found out it's actually arch code.. would you guys be ok with something like below?

jirka


---
diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 4d7022a740ab..081d0f366c99 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -757,12 +757,33 @@ static struct kprobe kretprobe_kprobe = {
.addr = (void *)kretprobe_trampoline,
};

+void arch_kprobe_reject_section_start(void)
+{
+ struct kprobe_ctlblk *kcb;
+
+ preempt_disable();
+
+ /*
+ * Set a dummy kprobe for avoiding kretprobe recursion.
+ * Since kretprobe never runs in a kprobe handler, a kprobe must not
+ * be running at this point.
+ */
+ __this_cpu_write(current_kprobe, &kretprobe_kprobe);
+ kcb = get_kprobe_ctlblk();
+ kcb->kprobe_status = KPROBE_HIT_ACTIVE;
+}
+
+void arch_kprobe_reject_section_end(void)
+{
+ __this_cpu_write(current_kprobe, NULL);
+ preempt_enable();
+}
+
/*
* Called from kretprobe_trampoline
*/
__used __visible void *trampoline_handler(struct pt_regs *regs)
{
- struct kprobe_ctlblk *kcb;
struct kretprobe_instance *ri = NULL;
struct hlist_head *head, empty_rp;
struct hlist_node *tmp;
@@ -772,16 +793,7 @@ __used __visible void *trampoline_handler(struct pt_regs *regs)
void *frame_pointer;
bool skipped = false;

- preempt_disable();
-
- /*
- * Set a dummy kprobe for avoiding kretprobe recursion.
- * Since kretprobe never run in kprobe handler, kprobe must not
- * be running at this point.
- */
- kcb = get_kprobe_ctlblk();
- __this_cpu_write(current_kprobe, &kretprobe_kprobe);
- kcb->kprobe_status = KPROBE_HIT_ACTIVE;
+ arch_kprobe_reject_section_start();

INIT_HLIST_HEAD(&empty_rp);
kretprobe_hash_lock(current, &head, &flags);
@@ -873,8 +885,7 @@ __used __visible void *trampoline_handler(struct pt_regs *regs)

kretprobe_hash_unlock(current, &flags);

- __this_cpu_write(current_kprobe, NULL);
- preempt_enable();
+ arch_kprobe_reject_section_end();

hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
hlist_del(&ri->hlist);
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 2625c241ac00..935dd8c87705 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1236,6 +1236,14 @@ __releases(hlist_lock)
}
NOKPROBE_SYMBOL(kretprobe_table_unlock);

+void __weak arch_kprobe_reject_section_start(void)
+{
+}
+
+void __weak arch_kprobe_reject_section_end(void)
+{
+}
+
/*
* This function is called from finish_task_switch when task tk becomes dead,
* so that we can recycle any function-return probe instances associated
@@ -1253,6 +1261,8 @@ void kprobe_flush_task(struct task_struct *tk)
/* Early boot. kretprobe_table_locks not yet initialized. */
return;

+ arch_kprobe_reject_section_start();
+
INIT_HLIST_HEAD(&empty_rp);
hash = hash_ptr(tk, KPROBE_HASH_BITS);
head = &kretprobe_inst_table[hash];
@@ -1266,6 +1276,8 @@ void kprobe_flush_task(struct task_struct *tk)
hlist_del(&ri->hlist);
kfree(ri);
}
+
+ arch_kprobe_reject_section_end();
}
NOKPROBE_SYMBOL(kprobe_flush_task);


2020-04-10 00:56:17

by Masami Hiramatsu

Subject: Re: [RFC] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task

Hi Jiri,

On Thu, 9 Apr 2020 22:13:36 +0200
Jiri Olsa <[email protected]> wrote:

> On Thu, Apr 09, 2020 at 08:45:01PM +0200, Jiri Olsa wrote:
> > On Thu, Apr 09, 2020 at 11:41:01PM +0900, Masami Hiramatsu wrote:
> >
> > SNIP
> >
> > > > ---
> > > > kernel/kprobes.c | 6 ++++++
> > > > 1 file changed, 6 insertions(+)
> > > >
> > > > diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> > > > index 2625c241ac00..b13247cae752 100644
> > > > --- a/kernel/kprobes.c
> > > > +++ b/kernel/kprobes.c
> > > > @@ -1236,6 +1236,10 @@ __releases(hlist_lock)
> > > > }
> > > > NOKPROBE_SYMBOL(kretprobe_table_unlock);
> > > >
> > > > +static struct kprobe kretprobe_dummy = {
> > > > + .addr = (void *)kretprobe_trampoline,
> > > > +};
> > > > +
> > > > /*
> > > > * This function is called from finish_task_switch when task tk becomes dead,
> > > > * so that we can recycle any function-return probe instances associated
> > > > @@ -1256,12 +1260,14 @@ void kprobe_flush_task(struct task_struct *tk)
> > > > INIT_HLIST_HEAD(&empty_rp);
> > > > hash = hash_ptr(tk, KPROBE_HASH_BITS);
> > > > head = &kretprobe_inst_table[hash];
> > > > + __this_cpu_write(current_kprobe, &kretprobe_dummy);
> > >
> > > Can you also set the kcb->kprobe_state = KPROBE_HIT_ACTIVE?
> > >
> > > BTW, we may be better to introduce a common kprobe_reject_section_start()
> > > and kprobe_reject_section_end() so that the user don't need to prepare
> > > dummy kprobes.
> >
> > sure, will do
> >
> > thank you both for review
>
> ok, found out it's actually arch code.. would you guys be ok with something like below?

Thanks for update!

>
> jirka
>
>
> ---
> diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
> index 4d7022a740ab..081d0f366c99 100644
> --- a/arch/x86/kernel/kprobes/core.c
> +++ b/arch/x86/kernel/kprobes/core.c
> @@ -757,12 +757,33 @@ static struct kprobe kretprobe_kprobe = {
> .addr = (void *)kretprobe_trampoline,
> };
>
> +void arch_kprobe_reject_section_start(void)
> +{
> + struct kprobe_ctlblk *kcb;
> +
> + preempt_disable();
> +
> + /*
> + * Set a dummy kprobe for avoiding kretprobe recursion.
> + * Since kretprobe never run in kprobe handler, kprobe must not
> + * be running behind this point.
> + */
> + __this_cpu_write(current_kprobe, &kretprobe_kprobe);
> + kcb = get_kprobe_ctlblk();
> + kcb->kprobe_status = KPROBE_HIT_ACTIVE;
> +}

Yeah, the code seems good to me.

BTW, I would rather make it an arch-independent function so that other
archs can use it. In this case, the dummy kprobe's addr should be
somewhere obviously blacklisted (but it must be accessible).
I think get_kprobe() would be a candidate.

And (sorry about changing my mind) on the naming: I think kprobe_busy_begin()
and kprobe_busy_end() would be better, because this doesn't reject
registering kprobes; instead, it just marks them busy :)
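
Putting the two suggestions together, a sketch of what such common
helpers in kernel/kprobes.c might look like (an illustration of the
idea, not a final patch; the kprobe_busy name and the get_kprobe()
dummy address follow the suggestions above):

	struct kprobe kprobe_busy = {
		.addr = (void *) get_kprobe,
	};

	void kprobe_busy_begin(void)
	{
		struct kprobe_ctlblk *kcb;

		preempt_disable();
		__this_cpu_write(current_kprobe, &kprobe_busy);
		kcb = get_kprobe_ctlblk();
		kcb->kprobe_status = KPROBE_HIT_ACTIVE;
	}

	void kprobe_busy_end(void)
	{
		__this_cpu_write(current_kprobe, NULL);
		preempt_enable();
	}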

Thank you,

> +
> +void arch_kprobe_reject_section_end(void)
> +{
> + __this_cpu_write(current_kprobe, NULL);
> + preempt_enable();
> +}
> +
> /*
> * Called from kretprobe_trampoline
> */
> __used __visible void *trampoline_handler(struct pt_regs *regs)
> {
> - struct kprobe_ctlblk *kcb;
> struct kretprobe_instance *ri = NULL;
> struct hlist_head *head, empty_rp;
> struct hlist_node *tmp;
> @@ -772,16 +793,7 @@ __used __visible void *trampoline_handler(struct pt_regs *regs)
> void *frame_pointer;
> bool skipped = false;
>
> - preempt_disable();
> -
> - /*
> - * Set a dummy kprobe for avoiding kretprobe recursion.
> - * Since kretprobe never run in kprobe handler, kprobe must not
> - * be running at this point.
> - */
> - kcb = get_kprobe_ctlblk();
> - __this_cpu_write(current_kprobe, &kretprobe_kprobe);
> - kcb->kprobe_status = KPROBE_HIT_ACTIVE;
> + arch_kprobe_reject_section_start();
>
> INIT_HLIST_HEAD(&empty_rp);
> kretprobe_hash_lock(current, &head, &flags);
> @@ -873,8 +885,7 @@ __used __visible void *trampoline_handler(struct pt_regs *regs)
>
> kretprobe_hash_unlock(current, &flags);
>
> - __this_cpu_write(current_kprobe, NULL);
> - preempt_enable();
> + arch_kprobe_reject_section_end();
>
> hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
> hlist_del(&ri->hlist);
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 2625c241ac00..935dd8c87705 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1236,6 +1236,14 @@ __releases(hlist_lock)
> }
> NOKPROBE_SYMBOL(kretprobe_table_unlock);
>
> +void __weak arch_kprobe_reject_section_start(void)
> +{
> +}
> +
> +void __weak arch_kprobe_reject_section_end(void)
> +{
> +}
> +
> /*
> * This function is called from finish_task_switch when task tk becomes dead,
> * so that we can recycle any function-return probe instances associated
> @@ -1253,6 +1261,8 @@ void kprobe_flush_task(struct task_struct *tk)
> /* Early boot. kretprobe_table_locks not yet initialized. */
> return;
>
> + arch_kprobe_reject_section_start();
> +
> INIT_HLIST_HEAD(&empty_rp);
> hash = hash_ptr(tk, KPROBE_HASH_BITS);
> head = &kretprobe_inst_table[hash];
> @@ -1266,6 +1276,8 @@ void kprobe_flush_task(struct task_struct *tk)
> hlist_del(&ri->hlist);
> kfree(ri);
> }
> +
> + arch_kprobe_reject_section_end();
> }
> NOKPROBE_SYMBOL(kprobe_flush_task);
>
>


--
Masami Hiramatsu <[email protected]>

2020-04-10 01:32:37

by Ziqian SUN (Zamir)

Subject: Re: [RFC] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task



On 4/10/20 4:13 AM, Jiri Olsa wrote:
> On Thu, Apr 09, 2020 at 08:45:01PM +0200, Jiri Olsa wrote:
>> On Thu, Apr 09, 2020 at 11:41:01PM +0900, Masami Hiramatsu wrote:
>>
>> SNIP
>>
>>>> ---
>>>> kernel/kprobes.c | 6 ++++++
>>>> 1 file changed, 6 insertions(+)
>>>>
>>>> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
>>>> index 2625c241ac00..b13247cae752 100644
>>>> --- a/kernel/kprobes.c
>>>> +++ b/kernel/kprobes.c
>>>> @@ -1236,6 +1236,10 @@ __releases(hlist_lock)
>>>> }
>>>> NOKPROBE_SYMBOL(kretprobe_table_unlock);
>>>>
>>>> +static struct kprobe kretprobe_dummy = {
>>>> + .addr = (void *)kretprobe_trampoline,
>>>> +};
>>>> +
>>>> /*
>>>> * This function is called from finish_task_switch when task tk becomes dead,
>>>> * so that we can recycle any function-return probe instances associated
>>>> @@ -1256,12 +1260,14 @@ void kprobe_flush_task(struct task_struct *tk)
>>>> INIT_HLIST_HEAD(&empty_rp);
>>>> hash = hash_ptr(tk, KPROBE_HASH_BITS);
>>>> head = &kretprobe_inst_table[hash];
>>>> + __this_cpu_write(current_kprobe, &kretprobe_dummy);
>>>
>>> Can you also set the kcb->kprobe_state = KPROBE_HIT_ACTIVE?
>>>
>>> BTW, we may be better to introduce a common kprobe_reject_section_start()
>>> and kprobe_reject_section_end() so that the user don't need to prepare
>>> dummy kprobes.
>>
>> sure, will do
>>
>> thank you both for review
>
> ok, found out it's actually arch code.. would you guys be ok with something like below?
>
> jirka
>

Hi Jiri,

In my original test the lockup happens on both x86_64 and ppc64le. So I
would appreciate it if you could come up with a solution that covers
both architectures.

>
> ---
> diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
> index 4d7022a740ab..081d0f366c99 100644
> --- a/arch/x86/kernel/kprobes/core.c
> +++ b/arch/x86/kernel/kprobes/core.c
> @@ -757,12 +757,33 @@ static struct kprobe kretprobe_kprobe = {
> .addr = (void *)kretprobe_trampoline,
> };
>
> +void arch_kprobe_reject_section_start(void)
> +{
> + struct kprobe_ctlblk *kcb;
> +
> + preempt_disable();
> +
> + /*
> + * Set a dummy kprobe for avoiding kretprobe recursion.
> + * Since kretprobe never run in kprobe handler, kprobe must not
> + * be running behind this point.
> + */
> + __this_cpu_write(current_kprobe, &kretprobe_kprobe);
> + kcb = get_kprobe_ctlblk();
> + kcb->kprobe_status = KPROBE_HIT_ACTIVE;
> +}
> +
> +void arch_kprobe_reject_section_end(void)
> +{
> + __this_cpu_write(current_kprobe, NULL);
> + preempt_enable();
> +}
> +
> /*
> * Called from kretprobe_trampoline
> */
> __used __visible void *trampoline_handler(struct pt_regs *regs)
> {
> - struct kprobe_ctlblk *kcb;
> struct kretprobe_instance *ri = NULL;
> struct hlist_head *head, empty_rp;
> struct hlist_node *tmp;
> @@ -772,16 +793,7 @@ __used __visible void *trampoline_handler(struct pt_regs *regs)
> void *frame_pointer;
> bool skipped = false;
>
> - preempt_disable();
> -
> - /*
> - * Set a dummy kprobe for avoiding kretprobe recursion.
> - * Since kretprobe never run in kprobe handler, kprobe must not
> - * be running at this point.
> - */
> - kcb = get_kprobe_ctlblk();
> - __this_cpu_write(current_kprobe, &kretprobe_kprobe);
> - kcb->kprobe_status = KPROBE_HIT_ACTIVE;
> + arch_kprobe_reject_section_start();
>
> INIT_HLIST_HEAD(&empty_rp);
> kretprobe_hash_lock(current, &head, &flags);
> @@ -873,8 +885,7 @@ __used __visible void *trampoline_handler(struct pt_regs *regs)
>
> kretprobe_hash_unlock(current, &flags);
>
> - __this_cpu_write(current_kprobe, NULL);
> - preempt_enable();
> + arch_kprobe_reject_section_end();
>
> hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
> hlist_del(&ri->hlist);
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 2625c241ac00..935dd8c87705 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1236,6 +1236,14 @@ __releases(hlist_lock)
> }
> NOKPROBE_SYMBOL(kretprobe_table_unlock);
>
> +void __weak arch_kprobe_reject_section_start(void)
> +{
> +}
> +
> +void __weak arch_kprobe_reject_section_end(void)
> +{
> +}
> +
> /*
> * This function is called from finish_task_switch when task tk becomes dead,
> * so that we can recycle any function-return probe instances associated
> @@ -1253,6 +1261,8 @@ void kprobe_flush_task(struct task_struct *tk)
> /* Early boot. kretprobe_table_locks not yet initialized. */
> return;
>
> + arch_kprobe_reject_section_start();
> +
> INIT_HLIST_HEAD(&empty_rp);
> hash = hash_ptr(tk, KPROBE_HASH_BITS);
> head = &kretprobe_inst_table[hash];
> @@ -1266,6 +1276,8 @@ void kprobe_flush_task(struct task_struct *tk)
> hlist_del(&ri->hlist);
> kfree(ri);
> }
> +
> + arch_kprobe_reject_section_end();
> }
> NOKPROBE_SYMBOL(kprobe_flush_task);
>
>

--
Ziqian SUN (Zamir)
9F Raycom office (NAY)
Red Hat Software (Beijing) Co.,Ltd
IRC: zsun (internal and freenode)
Tel: +86 10 65627458
GPG : 1D86 6D4A 49CE 4BBD 72CF FCF5 D856 6E11 F2A0 525E

2020-04-14 16:53:16

by Jiri Olsa

Subject: Re: [RFC] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task

On Fri, Apr 10, 2020 at 09:31:07AM +0800, Ziqian SUN (Zamir) wrote:
>
>
> On 4/10/20 4:13 AM, Jiri Olsa wrote:
> > On Thu, Apr 09, 2020 at 08:45:01PM +0200, Jiri Olsa wrote:
> > > On Thu, Apr 09, 2020 at 11:41:01PM +0900, Masami Hiramatsu wrote:
> > >
> > > SNIP
> > >
> > > > > ---
> > > > > kernel/kprobes.c | 6 ++++++
> > > > > 1 file changed, 6 insertions(+)
> > > > >
> > > > > diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> > > > > index 2625c241ac00..b13247cae752 100644
> > > > > --- a/kernel/kprobes.c
> > > > > +++ b/kernel/kprobes.c
> > > > > @@ -1236,6 +1236,10 @@ __releases(hlist_lock)
> > > > > }
> > > > > NOKPROBE_SYMBOL(kretprobe_table_unlock);
> > > > > +static struct kprobe kretprobe_dummy = {
> > > > > + .addr = (void *)kretprobe_trampoline,
> > > > > +};
> > > > > +
> > > > > /*
> > > > > * This function is called from finish_task_switch when task tk becomes dead,
> > > > > * so that we can recycle any function-return probe instances associated
> > > > > @@ -1256,12 +1260,14 @@ void kprobe_flush_task(struct task_struct *tk)
> > > > > INIT_HLIST_HEAD(&empty_rp);
> > > > > hash = hash_ptr(tk, KPROBE_HASH_BITS);
> > > > > head = &kretprobe_inst_table[hash];
> > > > > + __this_cpu_write(current_kprobe, &kretprobe_dummy);
> > > >
> > > > Can you also set the kcb->kprobe_state = KPROBE_HIT_ACTIVE?
> > > >
> > > > BTW, we may be better to introduce a common kprobe_reject_section_start()
> > > > and kprobe_reject_section_end() so that the user don't need to prepare
> > > > dummy kprobes.
> > >
> > > sure, will do
> > >
> > > thank you both for review
> >
> > ok, found out it's actually arch code.. would you guys be ok with something like below?
> >
> > jirka
> >
>
> Hi Jiri,
>
> In my origin test lockup happens on both x86_64 and ppc64le. So I would
> appreciate if you can also come up with a solution for both of the
> architectures.

aaaah right.. will update the fix

thanks,
jirka

2020-04-14 16:53:38

by Jiri Olsa

Subject: Re: [RFC] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task

On Fri, Apr 10, 2020 at 09:31:59AM +0900, Masami Hiramatsu wrote:

SNIP

> > diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
> > index 4d7022a740ab..081d0f366c99 100644
> > --- a/arch/x86/kernel/kprobes/core.c
> > +++ b/arch/x86/kernel/kprobes/core.c
> > @@ -757,12 +757,33 @@ static struct kprobe kretprobe_kprobe = {
> > .addr = (void *)kretprobe_trampoline,
> > };
> >
> > +void arch_kprobe_reject_section_start(void)
> > +{
> > + struct kprobe_ctlblk *kcb;
> > +
> > + preempt_disable();
> > +
> > + /*
> > + * Set a dummy kprobe for avoiding kretprobe recursion.
> > + * Since kretprobe never run in kprobe handler, kprobe must not
> > + * be running behind this point.
> > + */
> > + __this_cpu_write(current_kprobe, &kretprobe_kprobe);
> > + kcb = get_kprobe_ctlblk();
> > + kcb->kprobe_status = KPROBE_HIT_ACTIVE;
> > +}
>
> Yeah, the code seems good to me.
>
> BTW, I rather like make it arch independent function so that other
> arch can use it. In this case, the dummy kprobe's addr should be
> somewhere obviously blacklisted (but it must be accessible.)
> I think get_kprobe() will be a candidate.

right.. as Ziqian noted, we see this on other archs (ppc) as well

>
> And (sorry about changing my mind), the naming, I think kprobe_busy_begin()
> and kprobe_busy_end() will be better because it doesn't reject registering
> kprobes, instead, just make it busy :)

ok, will change

thanks,
jirka

2020-04-15 22:43:57

by Jiri Olsa

Subject: [PATCH] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task

Ziqian reported a lockup when adding a retprobe on _raw_spin_lock_irqsave.
My test was also able to trigger lockdep output:

============================================
WARNING: possible recursive locking detected
5.6.0-rc6+ #6 Not tainted
--------------------------------------------
sched-messaging/2767 is trying to acquire lock:
ffffffff9a492798 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_hash_lock+0x52/0xa0

but task is already holding lock:
ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(&(kretprobe_table_locks[i].lock));
lock(&(kretprobe_table_locks[i].lock));

*** DEADLOCK ***

May be due to missing lock nesting notation

1 lock held by sched-messaging/2767:
#0: ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50

stack backtrace:
CPU: 3 PID: 2767 Comm: sched-messaging Not tainted 5.6.0-rc6+ #6
Call Trace:
dump_stack+0x96/0xe0
__lock_acquire.cold.57+0x173/0x2b7
? native_queued_spin_lock_slowpath+0x42b/0x9e0
? lockdep_hardirqs_on+0x590/0x590
? __lock_acquire+0xf63/0x4030
lock_acquire+0x15a/0x3d0
? kretprobe_hash_lock+0x52/0xa0
_raw_spin_lock_irqsave+0x36/0x70
? kretprobe_hash_lock+0x52/0xa0
kretprobe_hash_lock+0x52/0xa0
trampoline_handler+0xf8/0x940
? kprobe_fault_handler+0x380/0x380
? find_held_lock+0x3a/0x1c0
kretprobe_trampoline+0x25/0x50
? lock_acquired+0x392/0xbc0
? _raw_spin_lock_irqsave+0x50/0x70
? __get_valid_kprobe+0x1f0/0x1f0
? _raw_spin_unlock_irqrestore+0x3b/0x40
? finish_task_switch+0x4b9/0x6d0
? __switch_to_asm+0x34/0x70
? __switch_to_asm+0x40/0x70

The code within the kretprobe handler checks for probe reentrancy,
so we won't trigger any _raw_spin_lock_irqsave probe in there.

The problem is outside of that, in kprobe_flush_task, where we call:

kprobe_flush_task
kretprobe_table_lock
raw_spin_lock_irqsave
_raw_spin_lock_irqsave

where _raw_spin_lock_irqsave triggers the kretprobe and installs
the kretprobe_trampoline handler on _raw_spin_lock_irqsave's return.

The kretprobe_trampoline handler is then executed with
kretprobe_table_locks already locked, and the first thing it does
is lock kretprobe_table_locks ;-) The whole lockup path looks like:

kprobe_flush_task
kretprobe_table_lock
raw_spin_lock_irqsave
_raw_spin_lock_irqsave ---> probe triggered, kretprobe_trampoline installed

---> kretprobe_table_locks locked

kretprobe_trampoline
trampoline_handler
kretprobe_hash_lock(current, &head, &flags); <--- deadlock

Add kprobe_busy_begin/end helpers that mark a section of code as
having a fake probe installed, preventing any other kprobe from
triggering within that section.

Use these helpers in kprobe_flush_task, so the probe recursion
protection check is hit and the probe is never set, preventing the
above lockup.

Reported-by: "Ziqian SUN (Zamir)" <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
---
arch/x86/kernel/kprobes/core.c | 16 +++-------------
include/linux/kprobes.h | 4 ++++
kernel/kprobes.c | 24 ++++++++++++++++++++++++
3 files changed, 31 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 4d7022a740ab..a12adbe1559d 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -753,16 +753,11 @@ asm(
NOKPROBE_SYMBOL(kretprobe_trampoline);
STACK_FRAME_NON_STANDARD(kretprobe_trampoline);

-static struct kprobe kretprobe_kprobe = {
- .addr = (void *)kretprobe_trampoline,
-};
-
/*
* Called from kretprobe_trampoline
*/
__used __visible void *trampoline_handler(struct pt_regs *regs)
{
- struct kprobe_ctlblk *kcb;
struct kretprobe_instance *ri = NULL;
struct hlist_head *head, empty_rp;
struct hlist_node *tmp;
@@ -772,16 +767,12 @@ __used __visible void *trampoline_handler(struct pt_regs *regs)
void *frame_pointer;
bool skipped = false;

- preempt_disable();
-
/*
* Set a dummy kprobe for avoiding kretprobe recursion.
* Since kretprobe never run in kprobe handler, kprobe must not
* be running at this point.
*/
- kcb = get_kprobe_ctlblk();
- __this_cpu_write(current_kprobe, &kretprobe_kprobe);
- kcb->kprobe_status = KPROBE_HIT_ACTIVE;
+ kprobe_busy_begin();

INIT_HLIST_HEAD(&empty_rp);
kretprobe_hash_lock(current, &head, &flags);
@@ -857,7 +848,7 @@ __used __visible void *trampoline_handler(struct pt_regs *regs)
__this_cpu_write(current_kprobe, &ri->rp->kp);
ri->ret_addr = correct_ret_addr;
ri->rp->handler(ri, regs);
- __this_cpu_write(current_kprobe, &kretprobe_kprobe);
+ __this_cpu_write(current_kprobe, &kprobe_busy);
}

recycle_rp_inst(ri, &empty_rp);
@@ -873,8 +864,7 @@ __used __visible void *trampoline_handler(struct pt_regs *regs)

kretprobe_hash_unlock(current, &flags);

- __this_cpu_write(current_kprobe, NULL);
- preempt_enable();
+ kprobe_busy_end();

hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
hlist_del(&ri->hlist);
diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 04bdaf01112c..645fd401c856 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -350,6 +350,10 @@ static inline struct kprobe_ctlblk *get_kprobe_ctlblk(void)
return this_cpu_ptr(&kprobe_ctlblk);
}

+extern struct kprobe kprobe_busy;
+void kprobe_busy_begin(void);
+void kprobe_busy_end(void);
+
kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset);
int register_kprobe(struct kprobe *p);
void unregister_kprobe(struct kprobe *p);
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 2625c241ac00..75bb4a8458e7 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1236,6 +1236,26 @@ __releases(hlist_lock)
}
NOKPROBE_SYMBOL(kretprobe_table_unlock);

+struct kprobe kprobe_busy = {
+ .addr = (void *) get_kprobe,
+};
+
+void kprobe_busy_begin(void)
+{
+ struct kprobe_ctlblk *kcb;
+
+ preempt_disable();
+ __this_cpu_write(current_kprobe, &kprobe_busy);
+ kcb = get_kprobe_ctlblk();
+ kcb->kprobe_status = KPROBE_HIT_ACTIVE;
+}
+
+void kprobe_busy_end(void)
+{
+ __this_cpu_write(current_kprobe, NULL);
+ preempt_enable();
+}
+
/*
* This function is called from finish_task_switch when task tk becomes dead,
* so that we can recycle any function-return probe instances associated
@@ -1253,6 +1273,8 @@ void kprobe_flush_task(struct task_struct *tk)
/* Early boot. kretprobe_table_locks not yet initialized. */
return;

+ kprobe_busy_begin();
+
INIT_HLIST_HEAD(&empty_rp);
hash = hash_ptr(tk, KPROBE_HASH_BITS);
head = &kretprobe_inst_table[hash];
@@ -1266,6 +1288,8 @@ void kprobe_flush_task(struct task_struct *tk)
hlist_del(&ri->hlist);
kfree(ri);
}
+
+ kprobe_busy_end();
}
NOKPROBE_SYMBOL(kprobe_flush_task);

--
2.18.2

2020-04-16 02:00:11

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task

On Wed, 15 Apr 2020 11:05:07 +0200
Jiri Olsa <[email protected]> wrote:

SNIP

> Add kprobe_busy_begin/end helpers that mark a section of code as
> having a fake probe installed, preventing any other kprobe from
> triggering within that section.
>
> Use these helpers in kprobe_flush_task, so the probe recursion
> protection check is hit and the probe is never set, preventing the
> above lockup.
>
> Reported-by: "Ziqian SUN (Zamir)" <[email protected]>
> Signed-off-by: Jiri Olsa <[email protected]>

Thanks Jiri and Ziqian!

Looks good to me.

Acked-by: Masami Hiramatsu <[email protected]>

BTW, this is a kind of bugfix, so should we add a Fixes tag?

Fixes: ef53d9c5e4da ('kprobes: improve kretprobe scalability with hashed locking')
Cc: [email protected]

Thank you,

> ---
> arch/x86/kernel/kprobes/core.c | 16 +++-------------
> include/linux/kprobes.h | 4 ++++
> kernel/kprobes.c | 24 ++++++++++++++++++++++++
> 3 files changed, 31 insertions(+), 13 deletions(-)
>
> diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
> index 4d7022a740ab..a12adbe1559d 100644
> --- a/arch/x86/kernel/kprobes/core.c
> +++ b/arch/x86/kernel/kprobes/core.c
> @@ -753,16 +753,11 @@ asm(
> NOKPROBE_SYMBOL(kretprobe_trampoline);
> STACK_FRAME_NON_STANDARD(kretprobe_trampoline);
>
> -static struct kprobe kretprobe_kprobe = {
> - .addr = (void *)kretprobe_trampoline,
> -};
> -
> /*
> * Called from kretprobe_trampoline
> */
> __used __visible void *trampoline_handler(struct pt_regs *regs)
> {
> - struct kprobe_ctlblk *kcb;
> struct kretprobe_instance *ri = NULL;
> struct hlist_head *head, empty_rp;
> struct hlist_node *tmp;
> @@ -772,16 +767,12 @@ __used __visible void *trampoline_handler(struct pt_regs *regs)
> void *frame_pointer;
> bool skipped = false;
>
> - preempt_disable();
> -
> /*
> * Set a dummy kprobe for avoiding kretprobe recursion.
> * Since kretprobe never run in kprobe handler, kprobe must not
> * be running at this point.
> */
> - kcb = get_kprobe_ctlblk();
> - __this_cpu_write(current_kprobe, &kretprobe_kprobe);
> - kcb->kprobe_status = KPROBE_HIT_ACTIVE;
> + kprobe_busy_begin();
>
> INIT_HLIST_HEAD(&empty_rp);
> kretprobe_hash_lock(current, &head, &flags);
> @@ -857,7 +848,7 @@ __used __visible void *trampoline_handler(struct pt_regs *regs)
> __this_cpu_write(current_kprobe, &ri->rp->kp);
> ri->ret_addr = correct_ret_addr;
> ri->rp->handler(ri, regs);
> - __this_cpu_write(current_kprobe, &kretprobe_kprobe);
> + __this_cpu_write(current_kprobe, &kprobe_busy);
> }
>
> recycle_rp_inst(ri, &empty_rp);
> @@ -873,8 +864,7 @@ __used __visible void *trampoline_handler(struct pt_regs *regs)
>
> kretprobe_hash_unlock(current, &flags);
>
> - __this_cpu_write(current_kprobe, NULL);
> - preempt_enable();
> + kprobe_busy_end();
>
> hlist_for_each_entry_safe(ri, tmp, &empty_rp, hlist) {
> hlist_del(&ri->hlist);
> diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
> index 04bdaf01112c..645fd401c856 100644
> --- a/include/linux/kprobes.h
> +++ b/include/linux/kprobes.h
> @@ -350,6 +350,10 @@ static inline struct kprobe_ctlblk *get_kprobe_ctlblk(void)
> return this_cpu_ptr(&kprobe_ctlblk);
> }
>
> +extern struct kprobe kprobe_busy;
> +void kprobe_busy_begin(void);
> +void kprobe_busy_end(void);
> +
> kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset);
> int register_kprobe(struct kprobe *p);
> void unregister_kprobe(struct kprobe *p);
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 2625c241ac00..75bb4a8458e7 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1236,6 +1236,26 @@ __releases(hlist_lock)
> }
> NOKPROBE_SYMBOL(kretprobe_table_unlock);
>
> +struct kprobe kprobe_busy = {
> + .addr = (void *) get_kprobe,
> +};
> +
> +void kprobe_busy_begin(void)
> +{
> + struct kprobe_ctlblk *kcb;
> +
> + preempt_disable();
> + __this_cpu_write(current_kprobe, &kprobe_busy);
> + kcb = get_kprobe_ctlblk();
> + kcb->kprobe_status = KPROBE_HIT_ACTIVE;
> +}
> +
> +void kprobe_busy_end(void)
> +{
> + __this_cpu_write(current_kprobe, NULL);
> + preempt_enable();
> +}
> +
> /*
> * This function is called from finish_task_switch when task tk becomes dead,
> * so that we can recycle any function-return probe instances associated
> @@ -1253,6 +1273,8 @@ void kprobe_flush_task(struct task_struct *tk)
> /* Early boot. kretprobe_table_locks not yet initialized. */
> return;
>
> + kprobe_busy_begin();
> +
> INIT_HLIST_HEAD(&empty_rp);
> hash = hash_ptr(tk, KPROBE_HASH_BITS);
> head = &kretprobe_inst_table[hash];
> @@ -1266,6 +1288,8 @@ void kprobe_flush_task(struct task_struct *tk)
> hlist_del(&ri->hlist);
> kfree(ri);
> }
> +
> + kprobe_busy_end();
> }
> NOKPROBE_SYMBOL(kprobe_flush_task);
>
> --
> 2.18.2
>


--
Masami Hiramatsu <[email protected]>

2020-04-16 09:17:16

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task

On Thu, Apr 16, 2020 at 10:55:06AM +0900, Masami Hiramatsu wrote:

SNIP

> > trampoline_handler
> > kretprobe_hash_lock(current, &head, &flags); <--- deadlock
> >
> > Add kprobe_busy_begin/end helpers that mark a section of code as
> > having a fake probe installed, preventing any other kprobe from
> > triggering within that section.
> >
> > Use these helpers in kprobe_flush_task, so the probe recursion
> > protection check is hit and the probe is never set, preventing the
> > above lockup.
> >
> > Reported-by: "Ziqian SUN (Zamir)" <[email protected]>
> > Signed-off-by: Jiri Olsa <[email protected]>
>
> Thanks Jiri and Ziqian!
>
> Looks good to me.
>
> Acked-by: Masami Hiramatsu <[email protected]>
>
> BTW, this is a kind of bugfix, so should we add a Fixes tag?
>
> Fixes: ef53d9c5e4da ('kprobes: improve kretprobe scalability with hashed locking')
> Cc: [email protected]

ah right, do you want me to repost with those?

thanks,
jirka

2020-04-16 20:35:26

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task

Hi Jiri,

On Thu, 16 Apr 2020 11:13:20 +0200
Jiri Olsa <[email protected]> wrote:

> On Thu, Apr 16, 2020 at 10:55:06AM +0900, Masami Hiramatsu wrote:
>
> SNIP
>
> > > trampoline_handler
> > > kretprobe_hash_lock(current, &head, &flags); <--- deadlock
> > >
> > > Add kprobe_busy_begin/end helpers that mark a section of code as
> > > having a fake probe installed, preventing any other kprobe from
> > > triggering within that section.
> > >
> > > Use these helpers in kprobe_flush_task, so the probe recursion
> > > protection check is hit and the probe is never set, preventing the
> > > above lockup.
> > >
> > > Reported-by: "Ziqian SUN (Zamir)" <[email protected]>
> > > Signed-off-by: Jiri Olsa <[email protected]>
> >
> > Thanks Jiri and Ziqian!
> >
> > Looks good to me.
> >
> > Acked-by: Masami Hiramatsu <[email protected]>
> >
> > BTW, this is a kind of bugfix, so should we add a Fixes tag?
> >
> > Fixes: ef53d9c5e4da ('kprobes: improve kretprobe scalability with hashed locking')
> > Cc: [email protected]
>
> ah right, do you want me to repost with those?

Yeah, if you don't mind.

Thank you,



--
Masami Hiramatsu <[email protected]>