2022-04-27 09:59:16

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH 6/9] signal: Always call do_notify_parent_cldstop with siglock held

Now that siglock keeps tsk->parent and tsk->real_parent constant
require that do_notify_parent_cldstop is called with tsk->siglock held
instead of the tasklist_lock.

As all of the callers of do_notify_parent_cldstop had to drop the
siglock and take tasklist_lock this simplifies all of it's callers.

Signed-off-by: "Eric W. Biederman" <[email protected]>
---
kernel/signal.c | 156 +++++++++++++++++-------------------------------
1 file changed, 55 insertions(+), 101 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 72d96614effc..584d67deb3cb 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2121,11 +2121,13 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
bool for_ptracer, int why)
{
struct kernel_siginfo info;
- unsigned long flags;
struct task_struct *parent;
struct sighand_struct *sighand;
+ bool lock;
u64 utime, stime;

+ assert_spin_locked(&tsk->sighand->siglock);
+
if (for_ptracer) {
parent = tsk->parent;
} else {
@@ -2164,7 +2166,9 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
}

sighand = parent->sighand;
- spin_lock_irqsave(&sighand->siglock, flags);
+ lock = tsk->sighand != sighand;
+ if (lock)
+ spin_lock_nested(&sighand->siglock, SINGLE_DEPTH_NESTING);
if (sighand->action[SIGCHLD-1].sa.sa_handler != SIG_IGN &&
!(sighand->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDSTOP))
send_signal_locked(SIGCHLD, &info, parent, PIDTYPE_TGID);
@@ -2172,7 +2176,8 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
* Even if SIGCHLD is not generated, we must wake up wait4 calls.
*/
__wake_up_parent(tsk, parent);
- spin_unlock_irqrestore(&sighand->siglock, flags);
+ if (lock)
+ spin_unlock(&sighand->siglock);
}

/*
@@ -2193,7 +2198,6 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
__acquires(&current->sighand->siglock)
{
bool gstop_done = false;
- bool read_code = true;

if (arch_ptrace_stop_needed()) {
/*
@@ -2209,6 +2213,34 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
spin_lock_irq(&current->sighand->siglock);
}

+ /* Don't stop if current is not ptraced */
+ if (unlikely(!current->ptrace))
+ return (clear_code) ? 0 : exit_code;
+
+ /*
+ * If @why is CLD_STOPPED, we're trapping to participate in a group
+ * stop. Do the bookkeeping. Note that if SIGCONT was delievered
+ * across siglock relocks since INTERRUPT was scheduled, PENDING
+ * could be clear now. We act as if SIGCONT is received after
+ * TASK_TRACED is entered - ignore it.
+ */
+ if (why == CLD_STOPPED && (current->jobctl & JOBCTL_STOP_PENDING))
+ gstop_done = task_participate_group_stop(current);
+
+ /*
+ * Notify parents of the stop.
+ *
+ * While ptraced, there are two parents - the ptracer and
+ * the real_parent of the group_leader. The ptracer should
+ * know about every stop while the real parent is only
+ * interested in the completion of group stop. The states
+ * for the two don't interact with each other. Notify
+ * separately unless they're gonna be duplicates.
+ */
+ do_notify_parent_cldstop(current, true, why);
+ if (gstop_done && ptrace_reparented(current))
+ do_notify_parent_cldstop(current, false, why);
+
/*
* schedule() will not sleep if there is a pending signal that
* can awaken the task.
@@ -2239,15 +2271,6 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
current->last_siginfo = info;
current->exit_code = exit_code;

- /*
- * If @why is CLD_STOPPED, we're trapping to participate in a group
- * stop. Do the bookkeeping. Note that if SIGCONT was delievered
- * across siglock relocks since INTERRUPT was scheduled, PENDING
- * could be clear now. We act as if SIGCONT is received after
- * TASK_TRACED is entered - ignore it.
- */
- if (why == CLD_STOPPED && (current->jobctl & JOBCTL_STOP_PENDING))
- gstop_done = task_participate_group_stop(current);

/* any trap clears pending STOP trap, STOP trap clears NOTIFY */
task_clear_jobctl_pending(current, JOBCTL_TRAP_STOP);
@@ -2257,56 +2280,19 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
/* entering a trap, clear TRAPPING */
task_clear_jobctl_trapping(current);

+ /*
+ * Don't want to allow preemption here, because
+ * sys_ptrace() needs this task to be inactive.
+ *
+ * XXX: implement spin_unlock_no_resched().
+ */
+ preempt_disable();
spin_unlock_irq(&current->sighand->siglock);
- read_lock(&tasklist_lock);
- if (likely(current->ptrace)) {
- /*
- * Notify parents of the stop.
- *
- * While ptraced, there are two parents - the ptracer and
- * the real_parent of the group_leader. The ptracer should
- * know about every stop while the real parent is only
- * interested in the completion of group stop. The states
- * for the two don't interact with each other. Notify
- * separately unless they're gonna be duplicates.
- */
- do_notify_parent_cldstop(current, true, why);
- if (gstop_done && ptrace_reparented(current))
- do_notify_parent_cldstop(current, false, why);

- /*
- * Don't want to allow preemption here, because
- * sys_ptrace() needs this task to be inactive.
- *
- * XXX: implement read_unlock_no_resched().
- */
- preempt_disable();
- read_unlock(&tasklist_lock);
- cgroup_enter_frozen();
- preempt_enable_no_resched();
- freezable_schedule();
- cgroup_leave_frozen(true);
- } else {
- /*
- * By the time we got the lock, our tracer went away.
- * Don't drop the lock yet, another tracer may come.
- *
- * If @gstop_done, the ptracer went away between group stop
- * completion and here. During detach, it would have set
- * JOBCTL_STOP_PENDING on us and we'll re-enter
- * TASK_STOPPED in do_signal_stop() on return, so notifying
- * the real parent of the group stop completion is enough.
- */
- if (gstop_done)
- do_notify_parent_cldstop(current, false, why);
-
- /* tasklist protects us from ptrace_freeze_traced() */
- __set_current_state(TASK_RUNNING);
- read_code = false;
- if (clear_code)
- exit_code = 0;
- read_unlock(&tasklist_lock);
- }
+ cgroup_enter_frozen();
+ preempt_enable_no_resched();
+ freezable_schedule();
+ cgroup_leave_frozen(true);

/*
* We are back. Now reacquire the siglock before touching
@@ -2314,8 +2300,7 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
* any signal-sending on another CPU that wants to examine it.
*/
spin_lock_irq(&current->sighand->siglock);
- if (read_code)
- exit_code = current->exit_code;
+ exit_code = current->exit_code;
current->last_siginfo = NULL;
current->ptrace_message = 0;
current->exit_code = 0;
@@ -2444,34 +2429,17 @@ static bool do_signal_stop(int signr)
}

if (likely(!current->ptrace)) {
- int notify = 0;
-
/*
* If there are no other threads in the group, or if there
* is a group stop in progress and we are the last to stop,
- * report to the parent.
+ * report to the real_parent.
*/
if (task_participate_group_stop(current))
- notify = CLD_STOPPED;
+ do_notify_parent_cldstop(current, false, CLD_STOPPED);

set_special_state(TASK_STOPPED);
spin_unlock_irq(&current->sighand->siglock);

- /*
- * Notify the parent of the group stop completion. Because
- * we're not holding either the siglock or tasklist_lock
- * here, ptracer may attach inbetween; however, this is for
- * group stop and should always be delivered to the real
- * parent of the group leader. The new ptracer will get
- * its notification when this task transitions into
- * TASK_TRACED.
- */
- if (notify) {
- read_lock(&tasklist_lock);
- do_notify_parent_cldstop(current, false, notify);
- read_unlock(&tasklist_lock);
- }
-
/* Now we don't run again until woken by SIGCONT or SIGKILL */
cgroup_enter_frozen();
freezable_schedule();
@@ -2665,8 +2633,6 @@ bool get_signal(struct ksignal *ksig)

signal->flags &= ~SIGNAL_CLD_MASK;

- spin_unlock_irq(&sighand->siglock);
-
/*
* Notify the parent that we're continuing. This event is
* always per-process and doesn't make whole lot of sense
@@ -2675,15 +2641,10 @@ bool get_signal(struct ksignal *ksig)
* the ptracer of the group leader too unless it's gonna be
* a duplicate.
*/
- read_lock(&tasklist_lock);
do_notify_parent_cldstop(current, false, why);
-
if (ptrace_reparented(current->group_leader))
do_notify_parent_cldstop(current->group_leader,
true, why);
- read_unlock(&tasklist_lock);
-
- goto relock;
}

for (;;) {
@@ -2940,7 +2901,6 @@ static void retarget_shared_pending(struct task_struct *tsk, sigset_t *which)

void exit_signals(struct task_struct *tsk)
{
- int group_stop = 0;
sigset_t unblocked;

/*
@@ -2971,21 +2931,15 @@ void exit_signals(struct task_struct *tsk)
signotset(&unblocked);
retarget_shared_pending(tsk, &unblocked);

- if (unlikely(tsk->jobctl & JOBCTL_STOP_PENDING) &&
- task_participate_group_stop(tsk))
- group_stop = CLD_STOPPED;
-out:
- spin_unlock_irq(&tsk->sighand->siglock);
-
/*
* If group stop has completed, deliver the notification. This
* should always go to the real parent of the group leader.
*/
- if (unlikely(group_stop)) {
- read_lock(&tasklist_lock);
- do_notify_parent_cldstop(tsk, false, group_stop);
- read_unlock(&tasklist_lock);
- }
+ if (unlikely(tsk->jobctl & JOBCTL_STOP_PENDING) &&
+ task_participate_group_stop(tsk))
+ do_notify_parent_cldstop(tsk, false, CLD_STOPPED);
+out:
+ spin_unlock_irq(&tsk->sighand->siglock);
}

/*
--
2.35.3


2022-04-27 14:39:14

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 6/9] signal: Always call do_notify_parent_cldstop with siglock held

On 04/26, Eric W. Biederman wrote:
>
> @@ -2164,7 +2166,9 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
> }
>
> sighand = parent->sighand;
> - spin_lock_irqsave(&sighand->siglock, flags);
> + lock = tsk->sighand != sighand;
> + if (lock)
> + spin_lock_nested(&sighand->siglock, SINGLE_DEPTH_NESTING);

But why is it safe?

Suppose we have two tasks, they both trace each other, both call
ptrace_stop() at the same time. Of course this is ugly, they both
will block.

But with this patch in this case we have the trivial ABBA deadlock,
no?

Oleg.

2022-04-27 14:49:03

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 6/9] signal: Always call do_notify_parent_cldstop with siglock held

Oleg Nesterov <[email protected]> writes:

> On 04/26, Eric W. Biederman wrote:
>>
>> @@ -2164,7 +2166,9 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
>> }
>>
>> sighand = parent->sighand;
>> - spin_lock_irqsave(&sighand->siglock, flags);
>> + lock = tsk->sighand != sighand;
>> + if (lock)
>> + spin_lock_nested(&sighand->siglock, SINGLE_DEPTH_NESTING);
>
> But why is it safe?
>
> Suppose we have two tasks, they both trace each other, both call
> ptrace_stop() at the same time. Of course this is ugly, they both
> will block.
>
> But with this patch in this case we have the trivial ABBA deadlock,
> no?

I was thinking in terms of the process tree (which is fine).

The ptrace parental relationship definitely has the potential to be a
graph with cycles. Which as you point out is not fine.


The result is very nice and I don't want to give it up. I suspect
something ptrace cycles are always a problem and can simply be
forbidden. That is going to take some analsysis and some additional
code in ptrace_attach.

I will go look at that.

Eric

2022-04-27 15:11:55

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 6/9] signal: Always call do_notify_parent_cldstop with siglock held

On 04/27, Eric W. Biederman wrote:
>
> The ptrace parental relationship definitely has the potential to be a
> graph with cycles. Which as you point out is not fine.
>
> The result is very nice and I don't want to give it up. I suspect
> something ptrace cycles are always a problem and can simply be
> forbidden.

OK, please consider another case.

We have a parent P and its child C. C traces P.

This is not that unusual, I don't think we can forbid this case.

P reports an event and calls do_notify_parent_cldstop().

C receives SIGSTOP and calls do_notify_parent_cldstop() too.

Oleg.

2022-04-27 15:17:48

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 6/9] signal: Always call do_notify_parent_cldstop with siglock held

"Eric W. Biederman" <[email protected]> writes:

> Oleg Nesterov <[email protected]> writes:
>
>> On 04/26, Eric W. Biederman wrote:
>>>
>>> @@ -2164,7 +2166,9 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
>>> }
>>>
>>> sighand = parent->sighand;
>>> - spin_lock_irqsave(&sighand->siglock, flags);
>>> + lock = tsk->sighand != sighand;
>>> + if (lock)
>>> + spin_lock_nested(&sighand->siglock, SINGLE_DEPTH_NESTING);
>>
>> But why is it safe?
>>
>> Suppose we have two tasks, they both trace each other, both call
>> ptrace_stop() at the same time. Of course this is ugly, they both
>> will block.
>>
>> But with this patch in this case we have the trivial ABBA deadlock,
>> no?
>
> I was thinking in terms of the process tree (which is fine).
>
> The ptrace parental relationship definitely has the potential to be a
> graph with cycles. Which as you point out is not fine.
>
>
> The result is very nice and I don't want to give it up. I suspect
> something ptrace cycles are always a problem and can simply be
> forbidden. That is going to take some analsysis and some additional
> code in ptrace_attach.
>
> I will go look at that.


Hmm. If we have the following process tree.

A
\
B
\
C

Process A, B, and C are all in the same process group.
Process A and B are setup to receive SIGCHILD when
their process stops.

Process C traces process A.

When a sigstop is delivered to the group we can have:

Process B takes siglock(B) siglock(A) to notify the real_parent
Process C takes siglock(C) siglock(B) to notify the real_parent
Process A takes siglock(A) siglock(C) to notify the tracer

If they all take their local lock at the same time there is
a deadlock.

I don't think the restriction that you can never ptrace anyone
up the process tree is going to fly. So it looks like I am back to the
drawing board for this one.

Eric







2022-04-27 15:25:18

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 6/9] signal: Always call do_notify_parent_cldstop with siglock held

On 04/26, Eric W. Biederman wrote:
>
> @@ -2209,6 +2213,34 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
> spin_lock_irq(&current->sighand->siglock);
> }
>
> + /* Don't stop if current is not ptraced */
> + if (unlikely(!current->ptrace))
> + return (clear_code) ? 0 : exit_code;
> +
> + /*
> + * If @why is CLD_STOPPED, we're trapping to participate in a group
> + * stop. Do the bookkeeping. Note that if SIGCONT was delievered
> + * across siglock relocks since INTERRUPT was scheduled, PENDING
> + * could be clear now. We act as if SIGCONT is received after
> + * TASK_TRACED is entered - ignore it.
> + */
> + if (why == CLD_STOPPED && (current->jobctl & JOBCTL_STOP_PENDING))
> + gstop_done = task_participate_group_stop(current);
> +
> + /*
> + * Notify parents of the stop.
> + *
> + * While ptraced, there are two parents - the ptracer and
> + * the real_parent of the group_leader. The ptracer should
> + * know about every stop while the real parent is only
> + * interested in the completion of group stop. The states
> + * for the two don't interact with each other. Notify
> + * separately unless they're gonna be duplicates.
> + */
> + do_notify_parent_cldstop(current, true, why);
> + if (gstop_done && ptrace_reparented(current))
> + do_notify_parent_cldstop(current, false, why);

This doesn't look right too. The parent should be notified only after
we set __state = TASK_TRACED and ->exit code.

Suppose that debugger sleeps in do_wait(). do_notify_parent_cldstop()
wakes it up, debugger calls wait_task_stopped() and then it will sleep
again, task_stopped_code() returns 0.

This can be probably fixed if you remove the lockless (fast path)
task_stopped_code() check in wait_task_stopped(), but this is not
nice performance-wise...

Oleg.

2022-04-27 15:29:01

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 6/9] signal: Always call do_notify_parent_cldstop with siglock held

On 04/27, Oleg Nesterov wrote:
>
> On 04/26, Eric W. Biederman wrote:
> >
> > @@ -2209,6 +2213,34 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
> > spin_lock_irq(&current->sighand->siglock);
> > }
> >
> > + /* Don't stop if current is not ptraced */
> > + if (unlikely(!current->ptrace))
> > + return (clear_code) ? 0 : exit_code;
> > +
> > + /*
> > + * If @why is CLD_STOPPED, we're trapping to participate in a group
> > + * stop. Do the bookkeeping. Note that if SIGCONT was delievered
> > + * across siglock relocks since INTERRUPT was scheduled, PENDING
> > + * could be clear now. We act as if SIGCONT is received after
> > + * TASK_TRACED is entered - ignore it.
> > + */
> > + if (why == CLD_STOPPED && (current->jobctl & JOBCTL_STOP_PENDING))
> > + gstop_done = task_participate_group_stop(current);
> > +
> > + /*
> > + * Notify parents of the stop.
> > + *
> > + * While ptraced, there are two parents - the ptracer and
> > + * the real_parent of the group_leader. The ptracer should
> > + * know about every stop while the real parent is only
> > + * interested in the completion of group stop. The states
> > + * for the two don't interact with each other. Notify
> > + * separately unless they're gonna be duplicates.
> > + */
> > + do_notify_parent_cldstop(current, true, why);
> > + if (gstop_done && ptrace_reparented(current))
> > + do_notify_parent_cldstop(current, false, why);
>
> This doesn't look right too. The parent should be notified only after
> we set __state = TASK_TRACED and ->exit code.
>
> Suppose that debugger sleeps in do_wait(). do_notify_parent_cldstop()
> wakes it up, debugger calls wait_task_stopped() and then it will sleep
> again, task_stopped_code() returns 0.
>
> This can be probably fixed if you remove the lockless (fast path)
> task_stopped_code() check in wait_task_stopped(), but this is not
> nice performance-wise...

On the other hand, I don't understand why did you move the callsite
of do_notify_parent_cldstop() up... just don't do this?

Oleg.

2022-04-27 23:00:23

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 6/9] signal: Always call do_notify_parent_cldstop with siglock held

Oleg Nesterov <[email protected]> writes:

> On 04/27, Oleg Nesterov wrote:
>>
>> On 04/26, Eric W. Biederman wrote:
>> >
>> > @@ -2209,6 +2213,34 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
>> > spin_lock_irq(&current->sighand->siglock);
>> > }
>> >
>> > + /* Don't stop if current is not ptraced */
>> > + if (unlikely(!current->ptrace))
>> > + return (clear_code) ? 0 : exit_code;
>> > +
>> > + /*
>> > + * If @why is CLD_STOPPED, we're trapping to participate in a group
>> > + * stop. Do the bookkeeping. Note that if SIGCONT was delievered
>> > + * across siglock relocks since INTERRUPT was scheduled, PENDING
>> > + * could be clear now. We act as if SIGCONT is received after
>> > + * TASK_TRACED is entered - ignore it.
>> > + */
>> > + if (why == CLD_STOPPED && (current->jobctl & JOBCTL_STOP_PENDING))
>> > + gstop_done = task_participate_group_stop(current);
>> > +
>> > + /*
>> > + * Notify parents of the stop.
>> > + *
>> > + * While ptraced, there are two parents - the ptracer and
>> > + * the real_parent of the group_leader. The ptracer should
>> > + * know about every stop while the real parent is only
>> > + * interested in the completion of group stop. The states
>> > + * for the two don't interact with each other. Notify
>> > + * separately unless they're gonna be duplicates.
>> > + */
>> > + do_notify_parent_cldstop(current, true, why);
>> > + if (gstop_done && ptrace_reparented(current))
>> > + do_notify_parent_cldstop(current, false, why);
>>
>> This doesn't look right too. The parent should be notified only after
>> we set __state = TASK_TRACED and ->exit code.
>>
>> Suppose that debugger sleeps in do_wait(). do_notify_parent_cldstop()
>> wakes it up, debugger calls wait_task_stopped() and then it will sleep
>> again, task_stopped_code() returns 0.
>>
>> This can be probably fixed if you remove the lockless (fast path)
>> task_stopped_code() check in wait_task_stopped(), but this is not
>> nice performance-wise...

Another detail I have overlooked. Thank you.

Or we can change task_stopped_code look something like:

static int *task_stopped_code(struct task_struct *p, bool ptrace)
{
if (ptrace) {
- if (task_is_traced(p) && !(p->jobctl & JOBCTL_LISTENING))
+ if (p->ptrace && !(p->jobctl & JOBCTL_LISTENING))
return &p->exit_code;
} else {
if (p->signal->flags & SIGNAL_STOP_STOPPED)
return &p->signal->group_exit_code;
}
return NULL;
}

I probably need to do a little bit more to ensure that it isn't an
actual process exit_code in p->exit_code. But the we don't have to
limit ourselves to being precisely in the task_is_traced stopped place
for the fast path.


> On the other hand, I don't understand why did you move the callsite
> of do_notify_parent_cldstop() up... just don't do this?

My goal and I still think it makes sense (if not my implementation)
is to move set_special_state as close as possible to schedule().

That way we can avoid sleeping spin_locks clobbering it and making
our life difficult.

My hope is we can just clean up ptrace_stop instead of making it more
complicated and harder to follow. Not that I am fundamentally opposed
to the quiesce bit but the code is already very hard to follow because
of all it's nuance and complexity, and I would really like to reduce
that complexity if we can possibly figure out how.

Eric


2022-04-29 06:32:21

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 6/9] signal: Always call do_notify_parent_cldstop with siglock held

"Eric W. Biederman" <[email protected]> writes:

> Peter Zijlstra <[email protected]> writes:
>
>> On Wed, Apr 27, 2022 at 09:47:10AM -0500, Eric W. Biederman wrote:
>>
>>> Hmm. If we have the following process tree.
>>>
>>> A
>>> \
>>> B
>>> \
>>> C
>>>
>>> Process A, B, and C are all in the same process group.
>>> Process A and B are setup to receive SIGCHILD when
>>> their process stops.
>>>
>>> Process C traces process A.
>>>
>>> When a sigstop is delivered to the group we can have:
>>>
>>> Process B takes siglock(B) siglock(A) to notify the real_parent
>>> Process C takes siglock(C) siglock(B) to notify the real_parent
>>> Process A takes siglock(A) siglock(C) to notify the tracer
>>>
>>> If they all take their local lock at the same time there is
>>> a deadlock.
>>>
>>> I don't think the restriction that you can never ptrace anyone
>>> up the process tree is going to fly. So it looks like I am back to the
>>> drawing board for this one.
>>
>> I've not had time to fully appreciate the nested locking here, but if it
>> is possible to rework things to always take both locks at the same time,
>> then it would be possible to impose an arbitrary lock order on things
>> and break the cycle that way.
>>
>> That is, simply order the locks by their heap address or something:
>>
>> static void double_siglock_irq(struct sighand *sh1, struct sighand2 *sh2)
>> {
>> if (sh1 > sh2)
>> swap(sh1, sh2)
>>
>> spin_lock_irq(&sh1->siglock);
>> spin_lock_nested(&sh2->siglock, SINGLE_DEPTH_NESTING);
>> }
>
> You know it might be. Especially given that the existing code is
> already dropping siglock and grabbing tasklist_lock.
>
> It would take a potentially triple lock function to lock
> the task it's real_parent and it's tracer (aka parent).
>
> That makes this possible to consider is that notifying the ``parents''
> is a fundamental part of the operation so we know we are going to
> need the lock so we can move it up.
>
> Throw in a pinch of lock_task_sighand and the triple lock function
> gets quite interesting.
>
> It is certainly worth trying, and I will.

To my surprise it doesn't look too bad. The locking simplifications and
not using a lock as big as tasklist_lock probably make it even worth
doing.

I need to sleep on it and look at everything again. In the
meantime here is my function that comes in with siglock held,
possibly drops it, and grabs the other two locks all in
order.

static void lock_parents_siglocks(bool lock_tracer)
__releases(&current->sighand->siglock)
__acquires(&current->sighand->siglock)
__acquires(&current->real_parent->sighand->siglock)
__acquires(&current->parent->sighand->siglock)
{
struct task_struct *me = current;
struct sighand_struct *m_sighand = me->sighand;

lockdep_assert_held(&m_sighand->siglock);

rcu_read_lock();
for (;;) {
struct task_struct *parent, *tracer;
struct sighand_struct *p_sighand, *t_sighand, *s1, *s2, *s3;

parent = me->real_parent;
tracer = lock_tracer? me->parent : parent;

p_sighand = rcu_dereference(parent->sighand);
t_sighand = rcu_dereference(tracer->sighand);

/* Sort the sighands so that s1 >= s2 >= s3 */
s1 = m_sighand;
s2 = p_sighand;
s3 = t_sighand;
if (s1 > s2)
swap(s1, s2);
if (s1 > s3)
swap(s1, s3);
if (s2 > s3)
swap(s2, s3);

if (s1 != m_sighand) {
spin_unlock(&m_sighand->siglock);
spin_lock(&s1->siglock);
}

if (s1 != s2)
spin_lock_nested(&s2->siglock, SIGLOCK_LOCK_SECOND);
if (s2 != s3)
spin_lock_nested(&s3->siglock, SIGLOCK_LOCK_THIRD);

if (likely((me->real_parent == parent) &&
(me->parent == tracer) &&
(parent->sighand == p_sighand) &&
(tracer->sighand == t_sighand))) {
break;
}
spin_unlock(&p_sighand->siglock);
if (t_sighand != p_sighand)
spin_unlock(&t_sighand->siglock);
continue;
}
rcu_read_unlock();
}

Eric

2022-04-29 13:01:48

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 6/9] signal: Always call do_notify_parent_cldstop with siglock held

On Tue, Apr 26, 2022 at 05:52:08PM -0500, Eric W. Biederman wrote:
> Now that siglock keeps tsk->parent and tsk->real_parent constant
> require that do_notify_parent_cldstop is called with tsk->siglock held
> instead of the tasklist_lock.
>
> As all of the callers of do_notify_parent_cldstop had to drop the
> siglock and take tasklist_lock this simplifies all of it's callers.
>
> Signed-off-by: "Eric W. Biederman" <[email protected]>
> ---
> kernel/signal.c | 156 +++++++++++++++++-------------------------------
> 1 file changed, 55 insertions(+), 101 deletions(-)
>
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 72d96614effc..584d67deb3cb 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -2121,11 +2121,13 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
> bool for_ptracer, int why)
> {
> struct kernel_siginfo info;
> - unsigned long flags;
> struct task_struct *parent;
> struct sighand_struct *sighand;
> + bool lock;
> u64 utime, stime;
>
> + assert_spin_locked(&tsk->sighand->siglock);

lockdep_assert_held() please...

2022-04-29 13:43:08

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 6/9] signal: Always call do_notify_parent_cldstop with siglock held

On Wed, Apr 27, 2022 at 09:47:10AM -0500, Eric W. Biederman wrote:

> Hmm. If we have the following process tree.
>
> A
> \
> B
> \
> C
>
> Process A, B, and C are all in the same process group.
> Process A and B are setup to receive SIGCHILD when
> their process stops.
>
> Process C traces process A.
>
> When a sigstop is delivered to the group we can have:
>
> Process B takes siglock(B) siglock(A) to notify the real_parent
> Process C takes siglock(C) siglock(B) to notify the real_parent
> Process A takes siglock(A) siglock(C) to notify the tracer
>
> If they all take their local lock at the same time there is
> a deadlock.
>
> I don't think the restriction that you can never ptrace anyone
> up the process tree is going to fly. So it looks like I am back to the
> drawing board for this one.

I've not had time to fully appreciate the nested locking here, but if it
is possible to rework things to always take both locks at the same time,
then it would be possible to impose an arbitrary lock order on things
and break the cycle that way.

That is, simply order the locks by their heap address or something:

static void double_siglock_irq(struct sighand *sh1, struct sighand2 *sh2)
{
if (sh1 > sh2)
swap(sh1, sh2)

spin_lock_irq(&sh1->siglock);
spin_lock_nested(&sh2->siglock, SINGLE_DEPTH_NESTING);
}

2022-04-29 18:07:09

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 6/9] signal: Always call do_notify_parent_cldstop with siglock held

On 04/28, Peter Zijlstra wrote:
>
> I've not had time to fully appreciate the nested locking here, but if it
> is possible to rework things to always take both locks at the same time,
> then it would be possible to impose an arbitrary lock order on things
> and break the cycle that way.

This is clear, but this is not that simple.

For example (with this series at least), ptrace_stop() already holds
current->sighand->siglock which (in particular) we need to protect
current->parent, but then we need current->parent->sighand->siglock
in do_notify_parent_cldstop().

Oleg.

2022-04-30 15:31:37

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 6/9] signal: Always call do_notify_parent_cldstop with siglock held

On Thu, Apr 28, 2022 at 03:49:11PM -0500, Eric W. Biederman wrote:

> static void lock_parents_siglocks(bool lock_tracer)
> __releases(&current->sighand->siglock)
> __acquires(&current->sighand->siglock)
> __acquires(&current->real_parent->sighand->siglock)
> __acquires(&current->parent->sighand->siglock)
> {
> struct task_struct *me = current;
> struct sighand_struct *m_sighand = me->sighand;
>
> lockdep_assert_held(&m_sighand->siglock);
>
> rcu_read_lock();
> for (;;) {
> struct task_struct *parent, *tracer;
> struct sighand_struct *p_sighand, *t_sighand, *s1, *s2, *s3;
>
> parent = me->real_parent;
> tracer = lock_tracer? me->parent : parent;
>
> p_sighand = rcu_dereference(parent->sighand);
> t_sighand = rcu_dereference(tracer->sighand);
>
> /* Sort the sighands so that s1 >= s2 >= s3 */
> s1 = m_sighand;
> s2 = p_sighand;
> s3 = t_sighand;
> if (s1 > s2)
> swap(s1, s2);
> if (s1 > s3)
> swap(s1, s3);
> if (s2 > s3)
> swap(s2, s3);
>
> if (s1 != m_sighand) {
> spin_unlock(&m_sighand->siglock);
> spin_lock(&s1->siglock);
> }
>
> if (s1 != s2)
> spin_lock_nested(&s2->siglock, SIGLOCK_LOCK_SECOND);
> if (s2 != s3)
> spin_lock_nested(&s3->siglock, SIGLOCK_LOCK_THIRD);
>

Might as well just use 1 and 2 for subclass at this point, or use
SIGLOCK_LOCK_FIRST below.

> if (likely((me->real_parent == parent) &&
> (me->parent == tracer) &&
> (parent->sighand == p_sighand) &&
> (tracer->sighand == t_sighand))) {
> break;
> }
> spin_unlock(&p_sighand->siglock);
> if (t_sighand != p_sighand)
> spin_unlock(&t_sighand->siglock);

Indent fail above ^, also you likey need this:

/*
* Since [pt]_sighand will likely change if we go
* around, and m_sighand is the only one held, make sure
* it is subclass-0, since the above 's1 != m_sighand'
* clause very much relies on that.
*/
lock_set_subclass(&m_sighand->siglock, 0, _RET_IP_);

> continue;
> }
> rcu_read_unlock();
> }
>
> Eric

2022-05-03 00:38:00

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 6/9] signal: Always call do_notify_parent_cldstop with siglock held

Peter Zijlstra <[email protected]> writes:

> On Wed, Apr 27, 2022 at 09:47:10AM -0500, Eric W. Biederman wrote:
>
>> Hmm. If we have the following process tree.
>>
>> A
>> \
>> B
>> \
>> C
>>
>> Process A, B, and C are all in the same process group.
>> Process A and B are setup to receive SIGCHILD when
>> their process stops.
>>
>> Process C traces process A.
>>
>> When a sigstop is delivered to the group we can have:
>>
>> Process B takes siglock(B) siglock(A) to notify the real_parent
>> Process C takes siglock(C) siglock(B) to notify the real_parent
>> Process A takes siglock(A) siglock(C) to notify the tracer
>>
>> If they all take their local lock at the same time there is
>> a deadlock.
>>
>> I don't think the restriction that you can never ptrace anyone
>> up the process tree is going to fly. So it looks like I am back to the
>> drawing board for this one.
>
> I've not had time to fully appreciate the nested locking here, but if it
> is possible to rework things to always take both locks at the same time,
> then it would be possible to impose an arbitrary lock order on things
> and break the cycle that way.
>
> That is, simply order the locks by their heap address or something:
>
> static void double_siglock_irq(struct sighand *sh1, struct sighand2 *sh2)
> {
> if (sh1 > sh2)
> swap(sh1, sh2)
>
> spin_lock_irq(&sh1->siglock);
> spin_lock_nested(&sh2->siglock, SINGLE_DEPTH_NESTING);
> }

You know it might be. Especially given that the existing code is
already dropping siglock and grabbing tasklist_lock.

It would take a potentially triple lock function to lock
the task it's real_parent and it's tracer (aka parent).

That makes this possible to consider is that notifying the ``parents''
is a fundamental part of the operation so we know we are going to
need the lock so we can move it up.

Throw in a pinch of lock_task_sighand and the triple lock function
gets quite interesting.

It is certainly worth trying, and I will.

Eric