2022-04-21 21:45:34

by Peter Zijlstra

Subject: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT

Rework ptrace_check_attach() / ptrace_unfreeze_traced() to not rely on
task->__state as much.

Due to how PREEMPT_RT is changing the rules vs task->__state with the
introduction of task->saved_state while TASK_RTLOCK_WAIT (the whole
blocking spinlock thing), the way ptrace freeze tries to do things no
longer works.

Specifically there are two problems:

- due to ->saved_state, the ->__state modification removing
TASK_WAKEKILL no longer works reliably.

- due to ->saved_state, wait_task_inactive() also no longer works
reliably.

The first problem is solved by a suggestion from Eric that instead
of changing __state, TASK_WAKEKILL be delayed.

The second problem is solved by a suggestion from Oleg; add
JOBCTL_TRACED_QUIESCE to cover the chunk of code between
set_current_state(TASK_TRACED) and schedule(), such that
ptrace_check_attach() can first wait for JOBCTL_TRACED_QUIESCE to get
cleared, and then use wait_task_inactive().

Suggested-by: Oleg Nesterov <[email protected]>
Suggested-by: "Eric W. Biederman" <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
---
include/linux/sched/jobctl.h | 8 ++-
kernel/ptrace.c | 90 ++++++++++++++++++++++---------------------
kernel/sched/core.c | 5 --
kernel/signal.c | 36 ++++++++++++++---
4 files changed, 86 insertions(+), 53 deletions(-)

--- a/include/linux/sched/jobctl.h
+++ b/include/linux/sched/jobctl.h
@@ -19,9 +19,11 @@ struct task_struct;
#define JOBCTL_TRAPPING_BIT 21 /* switching to TRACED */
#define JOBCTL_LISTENING_BIT 22 /* ptracer is listening for events */
#define JOBCTL_TRAP_FREEZE_BIT 23 /* trap for cgroup freezer */
+#define JOBCTL_DELAY_WAKEKILL_BIT 24 /* delay killable wakeups */

-#define JOBCTL_STOPPED_BIT 24 /* do_signal_stop() */
-#define JOBCTL_TRACED_BIT 25 /* ptrace_stop() */
+#define JOBCTL_STOPPED_BIT 25 /* do_signal_stop() */
+#define JOBCTL_TRACED_BIT 26 /* ptrace_stop() */
+#define JOBCTL_TRACED_QUIESCE_BIT 27

#define JOBCTL_STOP_DEQUEUED (1UL << JOBCTL_STOP_DEQUEUED_BIT)
#define JOBCTL_STOP_PENDING (1UL << JOBCTL_STOP_PENDING_BIT)
@@ -31,9 +33,11 @@ struct task_struct;
#define JOBCTL_TRAPPING (1UL << JOBCTL_TRAPPING_BIT)
#define JOBCTL_LISTENING (1UL << JOBCTL_LISTENING_BIT)
#define JOBCTL_TRAP_FREEZE (1UL << JOBCTL_TRAP_FREEZE_BIT)
+#define JOBCTL_DELAY_WAKEKILL (1UL << JOBCTL_DELAY_WAKEKILL_BIT)

#define JOBCTL_STOPPED (1UL << JOBCTL_STOPPED_BIT)
#define JOBCTL_TRACED (1UL << JOBCTL_TRACED_BIT)
+#define JOBCTL_TRACED_QUIESCE (1UL << JOBCTL_TRACED_QUIESCE_BIT)

#define JOBCTL_TRAP_MASK (JOBCTL_TRAP_STOP | JOBCTL_TRAP_NOTIFY)
#define JOBCTL_PENDING_MASK (JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK)
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -193,41 +193,44 @@ static bool looks_like_a_spurious_pid(st
*/
static bool ptrace_freeze_traced(struct task_struct *task)
{
+ unsigned long flags;
bool ret = false;

/* Lockless, nobody but us can set this flag */
if (task->jobctl & JOBCTL_LISTENING)
return ret;

- spin_lock_irq(&task->sighand->siglock);
- if (task_is_traced(task) && !looks_like_a_spurious_pid(task) &&
+ if (!lock_task_sighand(task, &flags))
+ return ret;
+
+ if (task_is_traced(task) &&
+ !looks_like_a_spurious_pid(task) &&
!__fatal_signal_pending(task)) {
- WRITE_ONCE(task->__state, __TASK_TRACED);
+ WARN_ON_ONCE(READ_ONCE(task->__state) != TASK_TRACED);
+ WARN_ON_ONCE(task->jobctl & JOBCTL_DELAY_WAKEKILL);
+ task->jobctl |= JOBCTL_DELAY_WAKEKILL;
ret = true;
}
- spin_unlock_irq(&task->sighand->siglock);
+ unlock_task_sighand(task, &flags);

return ret;
}

static void ptrace_unfreeze_traced(struct task_struct *task)
{
- if (READ_ONCE(task->__state) != __TASK_TRACED)
+ if (!task_is_traced(task))
return;

WARN_ON(!task->ptrace || task->parent != current);

- /*
- * PTRACE_LISTEN can allow ptrace_trap_notify to wake us up remotely.
- * Recheck state under the lock to close this race.
- */
spin_lock_irq(&task->sighand->siglock);
- if (READ_ONCE(task->__state) == __TASK_TRACED) {
+ if (task_is_traced(task)) {
+// WARN_ON_ONCE(!(task->jobctl & JOBCTL_DELAY_WAKEKILL));
+ task->jobctl &= ~JOBCTL_DELAY_WAKEKILL;
if (__fatal_signal_pending(task)) {
task->jobctl &= ~JOBCTL_TRACED;
- wake_up_state(task, __TASK_TRACED);
- } else
- WRITE_ONCE(task->__state, TASK_TRACED);
+ wake_up_state(task, TASK_WAKEKILL);
+ }
}
spin_unlock_irq(&task->sighand->siglock);
}
@@ -251,40 +254,45 @@ static void ptrace_unfreeze_traced(struc
*/
static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
{
- int ret = -ESRCH;
+ int traced;

/*
* We take the read lock around doing both checks to close a
- * possible race where someone else was tracing our child and
- * detached between these two checks. After this locked check,
- * we are sure that this is our traced child and that can only
- * be changed by us so it's not changing right after this.
+ * possible race where someone else attaches or detaches our
+ * natural child.
*/
read_lock(&tasklist_lock);
- if (child->ptrace && child->parent == current) {
- WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
- /*
- * child->sighand can't be NULL, release_task()
- * does ptrace_unlink() before __exit_signal().
- */
- if (ignore_state || ptrace_freeze_traced(child))
- ret = 0;
- }
+ traced = child->ptrace && child->parent == current;
read_unlock(&tasklist_lock);
+ if (!traced)
+ return -ESRCH;

- if (!ret && !ignore_state) {
- if (!wait_task_inactive(child, __TASK_TRACED)) {
- /*
- * This can only happen if may_ptrace_stop() fails and
- * ptrace_stop() changes ->state back to TASK_RUNNING,
- * so we should not worry about leaking __TASK_TRACED.
- */
- WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
- ret = -ESRCH;
- }
+ if (ignore_state)
+ return 0;
+
+ if (!task_is_traced(child))
+ return -ESRCH;
+
+ WARN_ON_ONCE(READ_ONCE(child->jobctl) & JOBCTL_DELAY_WAKEKILL);
+
+ /* Wait for JOBCTL_TRACED_QUIESCE to go away, see ptrace_stop(). */
+ for (;;) {
+ if (fatal_signal_pending(current))
+ return -EINTR;
+
+ set_current_state(TASK_KILLABLE);
+ if (!(READ_ONCE(child->jobctl) & JOBCTL_TRACED_QUIESCE))
+ break;
+
+ schedule();
}
+ __set_current_state(TASK_RUNNING);

- return ret;
+ if (!wait_task_inactive(child, TASK_TRACED) ||
+ !ptrace_freeze_traced(child))
+ return -ESRCH;
+
+ return 0;
}

static bool ptrace_has_cap(struct user_namespace *ns, unsigned int mode)
@@ -1329,8 +1337,7 @@ SYSCALL_DEFINE4(ptrace, long, request, l
goto out_put_task_struct;

ret = arch_ptrace(child, request, addr, data);
- if (ret || request != PTRACE_DETACH)
- ptrace_unfreeze_traced(child);
+ ptrace_unfreeze_traced(child);

out_put_task_struct:
put_task_struct(child);
@@ -1472,8 +1479,7 @@ COMPAT_SYSCALL_DEFINE4(ptrace, compat_lo
request == PTRACE_INTERRUPT);
if (!ret) {
ret = compat_arch_ptrace(child, request, addr, data);
- if (ret || request != PTRACE_DETACH)
- ptrace_unfreeze_traced(child);
+ ptrace_unfreeze_traced(child);
}

out_put_task_struct:
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6310,10 +6310,7 @@ static void __sched notrace __schedule(u

/*
* We must load prev->state once (task_struct::state is volatile), such
- * that:
- *
- * - we form a control dependency vs deactivate_task() below.
- * - ptrace_{,un}freeze_traced() can change ->state underneath us.
+ * that we form a control dependency vs deactivate_task() below.
*/
prev_state = READ_ONCE(prev->__state);
if (!(sched_mode & SM_MASK_PREEMPT) && prev_state) {
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -764,6 +764,10 @@ void signal_wake_up_state(struct task_st
{
lockdep_assert_held(&t->sighand->siglock);

+ /* Suppress wakekill? */
+ if (t->jobctl & JOBCTL_DELAY_WAKEKILL)
+ state &= ~TASK_WAKEKILL;
+
set_tsk_thread_flag(t, TIF_SIGPENDING);

/*
@@ -774,7 +778,7 @@ void signal_wake_up_state(struct task_st
* handle its death signal.
*/
if (wake_up_state(t, state | TASK_INTERRUPTIBLE))
- t->jobctl &= ~(JOBCTL_STOPPED | JOBCTL_TRACED);
+ t->jobctl &= ~(JOBCTL_STOPPED | JOBCTL_TRACED | JOBCTL_TRACED_QUIESCE);
else
kick_process(t);
}
@@ -2187,6 +2191,15 @@ static void do_notify_parent_cldstop(str
spin_unlock_irqrestore(&sighand->siglock, flags);
}

+static void clear_traced_quiesce(void)
+{
+ spin_lock_irq(&current->sighand->siglock);
+ WARN_ON_ONCE(!(current->jobctl & JOBCTL_TRACED_QUIESCE));
+ current->jobctl &= ~JOBCTL_TRACED_QUIESCE;
+ wake_up_state(current->parent, TASK_KILLABLE);
+ spin_unlock_irq(&current->sighand->siglock);
+}
+
/*
* This must be called with current->sighand->siglock held.
*
@@ -2225,7 +2238,7 @@ static int ptrace_stop(int exit_code, in
* schedule() will not sleep if there is a pending signal that
* can awaken the task.
*/
- current->jobctl |= JOBCTL_TRACED;
+ current->jobctl |= JOBCTL_TRACED | JOBCTL_TRACED_QUIESCE;
set_special_state(TASK_TRACED);

/*
@@ -2290,14 +2303,26 @@ static int ptrace_stop(int exit_code, in
/*
* Don't want to allow preemption here, because
* sys_ptrace() needs this task to be inactive.
- *
- * XXX: implement read_unlock_no_resched().
*/
preempt_disable();
read_unlock(&tasklist_lock);
- cgroup_enter_frozen();
+ cgroup_enter_frozen(); // XXX broken on PREEMPT_RT !!!
+
+ /*
+ * JOBCTL_TRACED_QUIESCE bridges the gap between
+ * set_current_state(TASK_TRACED) above and schedule() below.
+ * There must not be any blocking (specifically anything that
+ * touched ->saved_state on PREEMPT_RT) between here and
+ * schedule().
+ *
+ * ptrace_check_attach() relies on this with its
+ * wait_task_inactive() usage.
+ */
+ clear_traced_quiesce();
+
preempt_enable_no_resched();
freezable_schedule();
+
cgroup_leave_frozen(true);
} else {
/*
@@ -2335,6 +2360,7 @@ static int ptrace_stop(int exit_code, in

/* LISTENING can be set only during STOP traps, clear it */
current->jobctl &= ~JOBCTL_LISTENING;
+ current->jobctl &= ~JOBCTL_DELAY_WAKEKILL;

/*
* Queued signals ignored us while we were stopped for tracing.
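
To illustrate the first fix described in the commit message above (delaying TASK_WAKEKILL instead of rewriting __state), here is a minimal user-space sketch of the masking added to signal_wake_up_state(). It is not kernel code: the `M_`-prefixed constants, `struct wake_model_task` and `model_wake_mask()` are hypothetical names made up for this sketch, and the numeric values are only meant to be distinct bits.

```c
#include <assert.h>

/* Illustrative bit values; the real definitions live in the kernel headers. */
#define M_TASK_INTERRUPTIBLE     0x0001u
#define M_TASK_WAKEKILL          0x0100u
#define M_JOBCTL_DELAY_WAKEKILL  (1ul << 24)

/* Hypothetical stand-in for the one task_struct field this sketch needs. */
struct wake_model_task {
	unsigned long jobctl;
};

/*
 * Compute the mask that would be passed on to wake_up_state():
 * while the tracer holds the tracee frozen (DELAY_WAKEKILL set),
 * the WAKEKILL bit is dropped, so a TASK_TRACED sleeper is not woken
 * by a fatal signal. TASK_INTERRUPTIBLE is always added, as in the patch.
 */
unsigned int model_wake_mask(const struct wake_model_task *t, unsigned int state)
{
	assert(t);

	if (t->jobctl & M_JOBCTL_DELAY_WAKEKILL)
		state &= ~M_TASK_WAKEKILL;

	return state | M_TASK_INTERRUPTIBLE;
}
```

In this model a frozen tracee only ever sees an interruptible wakeup, which is the same effect the old code got by switching __state from TASK_TRACED to __TASK_TRACED.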



2022-04-22 02:36:35

by Oleg Nesterov

Subject: Re: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT

On 04/21, Peter Zijlstra wrote:
>
> Rework ptrace_check_attach() / ptrace_unfreeze_traced() to not rely on
> task->__state as much.

Looks good after a quick glance... but to be honest I got lost and
I need to apply these patches and read the code carefully.

However, I am not able to do this until Monday, sorry.

Just one nit for now,

> static void ptrace_unfreeze_traced(struct task_struct *task)
> {
> - if (READ_ONCE(task->__state) != __TASK_TRACED)
> + if (!task_is_traced(task))
> return;
>
> WARN_ON(!task->ptrace || task->parent != current);
>
> - /*
> - * PTRACE_LISTEN can allow ptrace_trap_notify to wake us up remotely.
> - * Recheck state under the lock to close this race.
> - */
> spin_lock_irq(&task->sighand->siglock);
> - if (READ_ONCE(task->__state) == __TASK_TRACED) {
> + if (task_is_traced(task)) {

I think ptrace_unfreeze_traced() should not use task_is_traced() at all.
I think a single lockless

if (task->jobctl & JOBCTL_DELAY_WAKEKILL)
return;

at the start should be enough?

Nobody else can set this flag. It can be cleared by the tracee if it was
woken up, so perhaps we can check it again but afaics this is not strictly
needed.

> +// WARN_ON_ONCE(!(task->jobctl & JOBCTL_DELAY_WAKEKILL));

Did you really want to add the commented WARN_ON_ONCE?

Oleg.

2022-04-22 19:39:10

by Peter Zijlstra

Subject: Re: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT

On Thu, Apr 21, 2022 at 08:23:26PM +0200, Oleg Nesterov wrote:
> On 04/21, Peter Zijlstra wrote:
> >
> > Rework ptrace_check_attach() / ptrace_unfreeze_traced() to not rely on
> > task->__state as much.
>
> Looks good after a quick glance... but to be honest I got lost and
> I need to apply these patches and read the code carefully.
>
> However, I am not able to do this until Monday, sorry.

Sure, no worries. Take your time.

> Just one nit for now,
>
> > static void ptrace_unfreeze_traced(struct task_struct *task)
> > {
> > - if (READ_ONCE(task->__state) != __TASK_TRACED)
> > + if (!task_is_traced(task))
> > return;
> >
> > WARN_ON(!task->ptrace || task->parent != current);
> >
> > - /*
> > - * PTRACE_LISTEN can allow ptrace_trap_notify to wake us up remotely.
> > - * Recheck state under the lock to close this race.
> > - */
> > spin_lock_irq(&task->sighand->siglock);
> > - if (READ_ONCE(task->__state) == __TASK_TRACED) {
> > + if (task_is_traced(task)) {
>
> I think ptrace_unfreeze_traced() should not use task_is_traced() at all.
> I think a single lockless
>
> if (task->jobctl & JOBCTL_DELAY_WAKEKILL)
> return;
>
> at the start should be enough?

I think so. That is indeed cleaner. I'll make the change if I don't see
anything wrong with it in the morning when the brain has woken up again
;-)

>
> Nobody else can set this flag. It can be cleared by the tracee if it was
> woken up, so perhaps we can check it again but afaics this is not strictly
> needed.
>
> > +// WARN_ON_ONCE(!(task->jobctl & JOBCTL_DELAY_WAKEKILL));
>
> Did you really want to add the commented WARN_ON_ONCE?

I did that because:

@@ -1472,8 +1479,7 @@ COMPAT_SYSCALL_DEFINE4(ptrace, compat_lo
request == PTRACE_INTERRUPT);
if (!ret) {
ret = compat_arch_ptrace(child, request, addr, data);
- if (ret || request != PTRACE_DETACH)
- ptrace_unfreeze_traced(child);
+ ptrace_unfreeze_traced(child);
}

Can now call unfreeze too often. I left the comment in because I need to
think more about why Eric did that and see if it really is needed.

2022-04-22 21:03:35

by Eric W. Biederman

Subject: Re: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT

Peter Zijlstra <[email protected]> writes:

> Rework ptrace_check_attach() / ptrace_unfreeze_traced() to not rely on
> task->__state as much.
>
> Due to how PREEMPT_RT is changing the rules vs task->__state with the
> introduction of task->saved_state while TASK_RTLOCK_WAIT (the whole
> blocking spinlock thing), the way ptrace freeze tries to do things no
> longer works.


The problem with ptrace_stop and do_signal_stop that requires dropping
siglock and grabbing tasklist_lock is that do_notify_parent_cldstop
needs tasklist_lock to keep parent and real_parent stable.

With just some very modest code changes it looks like we can use
a process's own siglock to keep parent and real_parent stable. The
siglock is already acquired in all of those places; it is just not held
over the changing of parent and real_parent.

Then make a rule that a child's siglock must be grabbed before a parent's
siglock, and do_notify_parent_cldstop can always be called under the
child's siglock.

This means ptrace_stop can be significantly simplified, and the
notifications can be moved far enough up that set_special_state
can be called after do_notify_parent_cldstop. With the result
that there is simply no PREEMPT_RT issue to worry about and
wait_task_inactive can be used as is.

I remember Oleg suggesting a change something like this a long
time ago.


I need to handle the case where the parent and the child share
the same sighand but that is just remembering to handle it in
do_notify_parent_cldstop, as the handling is simply not taking
the lock twice.

I am going to play with that and see if there are any gotchas
I missed when looking through the code.

Eric
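
As an illustration of the locking rule proposed above (child's siglock first, then the parent's, and only one acquisition when both tasks share a sighand), here is a minimal user-space sketch using pthread mutexes. It is not the kernel implementation; `struct model_sighand`, `struct lock_model_task` and the `model_*` functions are hypothetical names invented for this sketch, and `lock_count` exists only so the behaviour can be observed.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for sighand_struct: one lock plus a counter. */
struct model_sighand {
	pthread_mutex_t siglock;
	int lock_count;		/* number of times siglock was taken */
};

/* Hypothetical stand-in for task_struct. */
struct lock_model_task {
	struct model_sighand *sighand;
};

void model_sighand_init(struct model_sighand *sh)
{
	pthread_mutex_init(&sh->siglock, NULL);
	sh->lock_count = 0;
}

/* Lock order: child first, then parent; skip the parent if it is the same lock. */
void model_lock_child_then_parent(struct lock_model_task *child,
				  struct lock_model_task *parent)
{
	assert(child && parent);

	pthread_mutex_lock(&child->sighand->siglock);
	child->sighand->lock_count++;

	if (parent->sighand != child->sighand) {
		pthread_mutex_lock(&parent->sighand->siglock);
		parent->sighand->lock_count++;
	}
}

/* Release in the reverse order of acquisition. */
void model_unlock_child_and_parent(struct lock_model_task *child,
				   struct lock_model_task *parent)
{
	if (parent->sighand != child->sighand)
		pthread_mutex_unlock(&parent->sighand->siglock);
	pthread_mutex_unlock(&child->sighand->siglock);
}
```

The shared-sighand check matters because a default pthread mutex, like a kernel spinlock, must not be taken twice by the same thread.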

2022-04-25 22:57:01

by Oleg Nesterov

Subject: Re: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT

On 04/21, Peter Zijlstra wrote:
>
> +static void clear_traced_quiesce(void)
> +{
> + spin_lock_irq(&current->sighand->siglock);
> + WARN_ON_ONCE(!(current->jobctl & JOBCTL_TRACED_QUIESCE));

This WARN_ON_ONCE() doesn't look right, the task can be killed right
after ptrace_stop() sets JOBCTL_TRACED | JOBCTL_TRACED_QUIESCE and
drops siglock.

> @@ -2290,14 +2303,26 @@ static int ptrace_stop(int exit_code, in
> /*
> * Don't want to allow preemption here, because
> * sys_ptrace() needs this task to be inactive.
> - *
> - * XXX: implement read_unlock_no_resched().
> */
> preempt_disable();
> read_unlock(&tasklist_lock);
> - cgroup_enter_frozen();
> + cgroup_enter_frozen(); // XXX broken on PREEMPT_RT !!!
> +
> + /*
> + * JOBCTL_TRACED_QUIESCE bridges the gap between
> + * set_current_state(TASK_TRACED) above and schedule() below.
> + * There must not be any blocking (specifically anything that
> + * touched ->saved_state on PREEMPT_RT) between here and
> + * schedule().
> + *
> + * ptrace_check_attach() relies on this with its
> + * wait_task_inactive() usage.
> + */
> + clear_traced_quiesce();

Well, I think it should be called earlier under tasklist_lock,
before preempt_disable() above.

We need tasklist_lock to protect ->parent, debugger can be killed
and go away right after read_unlock(&tasklist_lock).

Still trying to convince myself everything is right with
JOBCTL_STOPPED/TRACED ...

Oleg.

2022-04-26 00:49:03

by Oleg Nesterov

Subject: Re: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT

On 04/21, Peter Zijlstra wrote:
>
> @@ -2225,7 +2238,7 @@ static int ptrace_stop(int exit_code, in
> * schedule() will not sleep if there is a pending signal that
> * can awaken the task.
> */
> - current->jobctl |= JOBCTL_TRACED;
> + current->jobctl |= JOBCTL_TRACED | JOBCTL_TRACED_QUIESCE;
> set_special_state(TASK_TRACED);

OK, this looks wrong. I actually mean the previous patch which sets
JOBCTL_TRACED.

The problem is that the tracee can be already killed, so that
fatal_signal_pending(current) is true. In this case we can't rely on
signal_wake_up_state() which should clear JOBCTL_TRACED, or the
callers of ptrace_signal_wake_up/etc which clear this flag by hand.

In this case schedule() won't block and ptrace_stop() will leak
JOBCTL_TRACED. Unless I missed something.

We could check fatal_signal_pending() and damn! this is what I think
ptrace_stop() should have done from the very beginning. But for now
I'd suggest to simply clear this flag before return, along with
DELAY_WAKEKILL and LISTENING.

> current->jobctl &= ~JOBCTL_LISTENING;
> + current->jobctl &= ~JOBCTL_DELAY_WAKEKILL;

current->jobctl &=
~(~JOBCTL_TRACED | JOBCTL_DELAY_WAKEKILL | JOBCTL_LISTENING);

Oleg.
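
To make the leak scenario above concrete, here is a minimal user-space sketch modelled on the scheduler's signal_pending_state() check, which decides whether a task that set a sleeping state is actually allowed to block. It is a model, not the kernel function: the `M_`-prefixed constants, `struct pending_model_task` and `model_signal_pending_state()` are hypothetical names, and the two booleans stand in for TIF_SIGPENDING and __fatal_signal_pending().

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values; only their distinctness matters here. */
#define M_TASK_INTERRUPTIBLE  0x0001u
#define M_TASK_TRACED_BIT     0x0008u	/* models __TASK_TRACED */
#define M_TASK_WAKEKILL       0x0100u
#define M_TASK_TRACED         (M_TASK_WAKEKILL | M_TASK_TRACED_BIT)

/* Hypothetical stand-in for the signal state of a task. */
struct pending_model_task {
	bool sigpending;	/* models TIF_SIGPENDING */
	bool fatal_pending;	/* models __fatal_signal_pending() */
};

/*
 * Returns true when a pending signal prevents the task from sleeping
 * in the given state, i.e. schedule() would return without blocking.
 */
bool model_signal_pending_state(unsigned int state,
				const struct pending_model_task *p)
{
	assert(p);

	if (!(state & (M_TASK_INTERRUPTIBLE | M_TASK_WAKEKILL)))
		return false;
	if (!p->sigpending)
		return false;

	return (state & M_TASK_INTERRUPTIBLE) || p->fatal_pending;
}
```

With a fatal signal already pending, the model reports that a TASK_TRACED sleep does not block, so no waker ever runs to clear JOBCTL_TRACED on the tracee's behalf.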

2022-04-26 03:38:53

by Eric W. Biederman

Subject: Re: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT

Peter Zijlstra <[email protected]> writes:

> On Mon, Apr 25, 2022 at 04:35:37PM +0200, Oleg Nesterov wrote:
>> On 04/21, Peter Zijlstra wrote:
>> >
>> > +static void clear_traced_quiesce(void)
>> > +{
>> > + spin_lock_irq(&current->sighand->siglock);
>> > + WARN_ON_ONCE(!(current->jobctl & JOBCTL_TRACED_QUIESCE));
>>
>> This WARN_ON_ONCE() doesn't look right, the task can be killed right
>> after ptrace_stop() sets JOBCTL_TRACED | JOBCTL_TRACED_QUIESCE and
>> drops siglock.
>
> OK, will look at that.
>
>> > @@ -2290,14 +2303,26 @@ static int ptrace_stop(int exit_code, in
>> > /*
>> > * Don't want to allow preemption here, because
>> > * sys_ptrace() needs this task to be inactive.
>> > - *
>> > - * XXX: implement read_unlock_no_resched().
>> > */
>> > preempt_disable();
>> > read_unlock(&tasklist_lock);
>> > - cgroup_enter_frozen();
>> > + cgroup_enter_frozen(); // XXX broken on PREEMPT_RT !!!
>> > +
>> > + /*
>> > + * JOBCTL_TRACED_QUIESCE bridges the gap between
>> > + * set_current_state(TASK_TRACED) above and schedule() below.
>> > + * There must not be any blocking (specifically anything that
>> > + * touched ->saved_state on PREEMPT_RT) between here and
>> > + * schedule().
>> > + *
>> > + * ptrace_check_attach() relies on this with its
>> > + * wait_task_inactive() usage.
>> > + */
>> > + clear_traced_quiesce();
>>
>> Well, I think it should be called earlier under tasklist_lock,
>> before preempt_disable() above.
>>
>> We need tasklist_lock to protect ->parent, debugger can be killed
>> and go away right after read_unlock(&tasklist_lock).
>>
>> Still trying to convince myself everything is right with
>> JOBCTL_STOPPED/TRACED ...
>
> Can't do it earlier, since cgroup_enter_frozen() can do spinlock (eg.
> use ->saved_state).

There are some other issues in this part of ptrace_stop().


I don't see JOBCTL_TRACED_QUIESCE being cleared "if (!current->ptrace)".


Currently in ptrace_check_attach a parameter of __TASK_TRACED is passed
so that wait_task_inactive can fail if the "!current->ptrace" branch
of ptrace_stop is taken and ptrace_stop does not stop. With the
TASK_FROZEN state it appears that the "!current->ptrace" branch can continue
and freeze somewhere else, and wait_task_inactive could decide it was
fine.


I have to run, but hopefully tomorrow I will post the patches that
remove the "!current->ptrace" case altogether and basically
remove the need for quiesce and wait_task_inactive detecting
which branch is taken.

The spinlock in cgroup_enter_frozen remains an issue for PREEMPT_RT.
But the rest of the issues are cleared up by using siglock instead
of tasklist_lock. Plus the code is just easier to read and understand.

Eric

2022-04-26 06:35:02

by Oleg Nesterov

Subject: Re: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT

On 04/25, Eric W. Biederman wrote:
>
> I don't see JOBCTL_TRACED_QUIESCE being cleared "if (!current->ptrace)".

As Peter explained, in this case we can rely on __ptrace_unlink() which
should clear this flag.

Oleg.

2022-04-26 10:44:17

by Peter Zijlstra

Subject: Re: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT

On Mon, Apr 25, 2022 at 04:35:37PM +0200, Oleg Nesterov wrote:
> On 04/21, Peter Zijlstra wrote:
> >
> > +static void clear_traced_quiesce(void)
> > +{
> > + spin_lock_irq(&current->sighand->siglock);
> > + WARN_ON_ONCE(!(current->jobctl & JOBCTL_TRACED_QUIESCE));
>
> This WARN_ON_ONCE() doesn't look right, the task can be killed right
> after ptrace_stop() sets JOBCTL_TRACED | JOBCTL_TRACED_QUIESCE and
> drops siglock.

OK, will look at that.

> > @@ -2290,14 +2303,26 @@ static int ptrace_stop(int exit_code, in
> > /*
> > * Don't want to allow preemption here, because
> > * sys_ptrace() needs this task to be inactive.
> > - *
> > - * XXX: implement read_unlock_no_resched().
> > */
> > preempt_disable();
> > read_unlock(&tasklist_lock);
> > - cgroup_enter_frozen();
> > + cgroup_enter_frozen(); // XXX broken on PREEMPT_RT !!!
> > +
> > + /*
> > + * JOBCTL_TRACED_QUIESCE bridges the gap between
> > + * set_current_state(TASK_TRACED) above and schedule() below.
> > + * There must not be any blocking (specifically anything that
> > + * touched ->saved_state on PREEMPT_RT) between here and
> > + * schedule().
> > + *
> > + * ptrace_check_attach() relies on this with its
> > + * wait_task_inactive() usage.
> > + */
> > + clear_traced_quiesce();
>
> Well, I think it should be called earlier under tasklist_lock,
> before preempt_disable() above.
>
> We need tasklist_lock to protect ->parent, debugger can be killed
> and go away right after read_unlock(&tasklist_lock).
>
> Still trying to convince myself everything is right with
> JOBCTL_STOPPED/TRACED ...

Can't do it earlier, since cgroup_enter_frozen() can do spinlock (eg.
use ->saved_state).

2022-04-27 08:55:41

by Oleg Nesterov

Subject: Re: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT

On 04/26, Eric W. Biederman wrote:
>
> Relying on __ptrace_unlink assumes the __ptrace_unlink happens after
> siglock is taken before calling ptrace_stop. Especially with the
> ptrace_notify in signal_delivered that does not look guaranteed.
>
> The __ptrace_unlink could also happen during arch_ptrace_stop.
>
> Relying on siglock is sufficient because __ptrace_unlink holds siglock
> over clearing task->ptrace. Which means that the simple fix for this is
> to just test task->ptrace before we set JOBCTL_TRACED_QUIESCE.

Or simply clear _QUIESCE along with _TRACED/DELAY_WAKEKILL before return?

Oleg.

2022-04-27 09:23:16

by Eric W. Biederman

Subject: Re: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT

Oleg Nesterov <[email protected]> writes:

> On 04/25, Eric W. Biederman wrote:
>>
>> I don't see JOBCTL_TRACED_QUIESCE being cleared "if (!current->ptrace)".
>
> As Peter explained, in this case we can rely on __ptrace_unlink() which
> should clear this flag.

I had missed that signal_wake_up_state was clearing
JOBCTL_TRACED_QUIESCE.

Relying on __ptrace_unlink assumes the __ptrace_unlink happens after
siglock is taken before calling ptrace_stop. Especially with the
ptrace_notify in signal_delivered that does not look guaranteed.

The __ptrace_unlink could also happen during arch_ptrace_stop.

Relying on siglock is sufficient because __ptrace_unlink holds siglock
over clearing task->ptrace. Which means that the simple fix for this is
to just test task->ptrace before we set JOBCTL_TRACED_QUIESCE.

Eric

2022-04-27 10:39:17

by Eric W. Biederman

Subject: Re: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT

Oleg Nesterov <[email protected]> writes:

> On 04/21, Peter Zijlstra wrote:
>>
>> @@ -2225,7 +2238,7 @@ static int ptrace_stop(int exit_code, in
>> * schedule() will not sleep if there is a pending signal that
>> * can awaken the task.
>> */
>> - current->jobctl |= JOBCTL_TRACED;
>> + current->jobctl |= JOBCTL_TRACED | JOBCTL_TRACED_QUIESCE;
>> set_special_state(TASK_TRACED);
>
> OK, this looks wrong. I actually mean the previous patch which sets
> JOBCTL_TRACED.
>
> The problem is that the tracee can be already killed, so that
> fatal_signal_pending(current) is true. In this case we can't rely on
> signal_wake_up_state() which should clear JOBCTL_TRACED, or the
> callers of ptrace_signal_wake_up/etc which clear this flag by hand.
>
> In this case schedule() won't block and ptrace_stop() will leak
> JOBCTL_TRACED. Unless I missed something.
>
> We could check fatal_signal_pending() and damn! this is what I think
> ptrace_stop() should have done from the very beginning. But for now
> I'd suggest to simply clear this flag before return, along with
> DELAY_WAKEKILL and LISTENING.

Oh. That is an interesting case for JOBCTL_TRACED. The
scheduler refuses to stop if signal_pending_state(TASK_TRACED, p)
returns true.

The ptrace_stop code used to handle this explicitly and in commit
7d613f9f72ec ("signal: Remove the bogus sigkill_pending in ptrace_stop")
I actually removed the test, as it was somewhat wrong and
redundant, and in slightly the wrong location.

But doing:

/* Don't stop if the task is dying */
if (unlikely(__fatal_signal_pending(current)))
return exit_code;

Should work.

>
>> current->jobctl &= ~JOBCTL_LISTENING;
>> + current->jobctl &= ~JOBCTL_DELAY_WAKEKILL;
>
> current->jobctl &=
> ~(~JOBCTL_TRACED | JOBCTL_DELAY_WAKEKILL | JOBCTL_LISTENING);


I presume you meant:

current->jobctl &=
~(JOBCTL_TRACED | JOBCTL_DELAY_WAKEKILL | JOBCTL_LISTENING);

I don't think we want to do that. For the case you are worried about it
is a valid fix.

In general this is the wrong approach as we want the waker to clear
JOBCTL_TRACED. If the waker does not it is possible that
ptrace_freeze_traced might attempt to freeze a process whose state
is not appropriate for attach, because the code is past the call
to schedule().

In fact I think clearing JOBCTL_TRACED at the end of ptrace_stop
will allow ptrace_freeze_traced to come in while siglock is dropped,
expect the process to stop, and have the process not stop. Of
course, with wait_task_inactive coming first, that might not be a problem.



This is a minor problem with the patchset I just posted. I thought the
only reason wait_task_inactive could fail was if ptrace_stop() hit the
!current->ptrace case. Thinking about it, any SIGKILL coming in
before the tracee stops in schedule will trigger this, so it is not as
safe as I thought to not pass a state into wait_task_inactive.

It is time for me to shut down today. I will sleep on that and
see what I can see tomorrow.

Eric
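
The difference between the two masks discussed above can be checked in isolation. The following is a minimal user-space sketch, assuming only that the three flags are distinct bits; the bit positions are taken from the jobctl.h hunk in Peter's patch for illustration, and the `M_` constants and `model_*` function names are hypothetical.

```c
#include <assert.h>

/* Illustrative bit positions; any three distinct bits behave the same. */
#define M_JOBCTL_LISTENING       (1ul << 22)
#define M_JOBCTL_DELAY_WAKEKILL  (1ul << 24)
#define M_JOBCTL_TRACED          (1ul << 26)

/* The expression as first written, with the stray inner '~'. */
unsigned long model_clear_as_written(unsigned long jobctl)
{
	return jobctl & ~(~M_JOBCTL_TRACED | M_JOBCTL_DELAY_WAKEKILL |
			  M_JOBCTL_LISTENING);
}

/* The presumably intended expression: clear exactly the three flags. */
unsigned long model_clear_intended(unsigned long jobctl)
{
	assert(jobctl == jobctl);	/* no precondition; keeps assert.h in use */
	return jobctl & ~(M_JOBCTL_TRACED | M_JOBCTL_DELAY_WAKEKILL |
			  M_JOBCTL_LISTENING);
}
```

By De Morgan, the expression as written reduces to `jobctl & M_JOBCTL_TRACED`: it keeps only the TRACED bit and wipes every other jobctl flag, which is the opposite of what was wanted.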

2022-04-27 11:12:37

by Eric W. Biederman

Subject: [PATCH 0/9] ptrace: cleaning up ptrace_stop


While looking at how ptrace is broken on PREEMPT_RT I realized
that ptrace_stop would be much simpler and more maintainable
if tsk->ptrace, tsk->parent, and tsk->real_parent were protected
by siglock. Most of the changes are general cleanups in support
of this locking change.

While making the necessary changes to protect tsk->ptrace with
siglock I discovered we have two architectures, xtensa and um,
that were using tsk->ptrace for what most other architectures
use TIF_SIGPENDING for, and not protecting tsk->ptrace with any lock.

By the end of this series ptrace should work on PREEMPT_RT with
CONFIG_FREEZER and CONFIG_CGROUPS disabled, by the simple fact that the
ptrace_stop code becomes less special. The function cgroup_enter_frozen
definitely remains a problem, because with preemption disabled it takes
a lock which is a sleeping lock on PREEMPT_RT. Peter Zijlstra has
been rewriting the classic freezer in earlier parts of this
discussion, so I presume it is also a problem for PREEMPT_RT.

Peter's series rewriting the freezer[1] should work on top of this
series with minimal changes and patch 2/5 removed.

Eric W. Biederman (9):
signal: Rename send_signal send_signal_locked
signal: Replace __group_send_sig_info with send_signal_locked
ptrace/um: Replace PT_DTRACE with TIF_SINGLESTEP
ptrace/xtensa: Replace PT_SINGLESTEP with TIF_SINGLESTEP
signal: Protect parent child relationships by childs siglock
signal: Always call do_notify_parent_cldstop with siglock held
ptrace: Simplify the wait_task_inactive call in ptrace_check_attach
ptrace: Use siglock instead of tasklist_lock in ptrace_check_attach
ptrace: Don't change __state

arch/um/include/asm/thread_info.h | 2 +
arch/um/kernel/exec.c | 2 +-
arch/um/kernel/process.c | 2 +-
arch/um/kernel/ptrace.c | 8 +-
arch/um/kernel/signal.c | 4 +-
arch/xtensa/kernel/ptrace.c | 4 +-
arch/xtensa/kernel/signal.c | 4 +-
drivers/tty/tty_jobctrl.c | 4 +-
include/linux/ptrace.h | 7 --
include/linux/sched/jobctl.h | 2 +
include/linux/sched/signal.h | 3 +-
include/linux/signal.h | 3 +-
kernel/exit.c | 4 +
kernel/fork.c | 12 +--
kernel/ptrace.c | 61 ++++++-------
kernel/signal.c | 187 ++++++++++++++------------------------
kernel/time/posix-cpu-timers.c | 6 +-
17 files changed, 131 insertions(+), 184 deletions(-)

[1] https://lkml.kernel.org/r/[email protected]

Eric

2022-04-27 11:24:56

by Eric W. Biederman

Subject: [PATCH 9/9] ptrace: Don't change __state

Stop playing with tsk->__state to remove TASK_WAKEKILL while a ptrace
command is executing.

Instead implement a new jobctl flag JOBCTL_DELAY_WAKEKILL. This new
flag is set in ptrace_freeze_traced and cleared when ptrace_stop is
awoken or in ptrace_unfreeze_traced (when ptrace_stop remains asleep).

In signal_wake_up_state drop TASK_WAKEKILL from state if TASK_WAKEKILL
is used while JOBCTL_DELAY_WAKEKILL is set. This has the same effect
as changing TASK_TRACED to __TASK_TRACED as all of the wake_ups that
use TASK_KILLABLE go through signal_wake_up except the wake_up in
ptrace_unfreeze_traced.

Previously the __state value of __TASK_TRACED was changed to
TASK_RUNNING when woken up or back to TASK_TRACED when the code was
left in ptrace_stop. Now, when woken up, ptrace_stop clears
JOBCTL_DELAY_WAKEKILL, and when left sleeping, ptrace_unfreeze_traced
clears JOBCTL_DELAY_WAKEKILL.

Signed-off-by: "Eric W. Biederman" <[email protected]>
---
include/linux/sched/jobctl.h | 2 ++
include/linux/sched/signal.h | 3 ++-
kernel/ptrace.c | 11 +++++------
kernel/signal.c | 1 +
4 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/include/linux/sched/jobctl.h b/include/linux/sched/jobctl.h
index fa067de9f1a9..4e154ad8205f 100644
--- a/include/linux/sched/jobctl.h
+++ b/include/linux/sched/jobctl.h
@@ -19,6 +19,7 @@ struct task_struct;
#define JOBCTL_TRAPPING_BIT 21 /* switching to TRACED */
#define JOBCTL_LISTENING_BIT 22 /* ptracer is listening for events */
#define JOBCTL_TRAP_FREEZE_BIT 23 /* trap for cgroup freezer */
+#define JOBCTL_DELAY_WAKEKILL_BIT 24 /* delay killable wakeups */

#define JOBCTL_STOP_DEQUEUED (1UL << JOBCTL_STOP_DEQUEUED_BIT)
#define JOBCTL_STOP_PENDING (1UL << JOBCTL_STOP_PENDING_BIT)
@@ -28,6 +29,7 @@ struct task_struct;
#define JOBCTL_TRAPPING (1UL << JOBCTL_TRAPPING_BIT)
#define JOBCTL_LISTENING (1UL << JOBCTL_LISTENING_BIT)
#define JOBCTL_TRAP_FREEZE (1UL << JOBCTL_TRAP_FREEZE_BIT)
+#define JOBCTL_DELAY_WAKEKILL (1UL << JOBCTL_DELAY_WAKEKILL_BIT)

#define JOBCTL_TRAP_MASK (JOBCTL_TRAP_STOP | JOBCTL_TRAP_NOTIFY)
#define JOBCTL_PENDING_MASK (JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK)
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 3c8b34876744..1947c85aa9d9 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -437,7 +437,8 @@ extern void signal_wake_up_state(struct task_struct *t, unsigned int state);

static inline void signal_wake_up(struct task_struct *t, bool resume)
{
- signal_wake_up_state(t, resume ? TASK_WAKEKILL : 0);
+ bool wakekill = resume && !(t->jobctl & JOBCTL_DELAY_WAKEKILL);
+ signal_wake_up_state(t, wakekill ? TASK_WAKEKILL : 0);
}
static inline void ptrace_signal_wake_up(struct task_struct *t, bool resume)
{
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 842511ee9a9f..0bea74539320 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -194,7 +194,7 @@ static bool ptrace_freeze_traced(struct task_struct *task)

if (task_is_traced(task) && !looks_like_a_spurious_pid(task) &&
!__fatal_signal_pending(task)) {
- WRITE_ONCE(task->__state, __TASK_TRACED);
+ task->jobctl |= JOBCTL_DELAY_WAKEKILL;
ret = true;
}

@@ -203,7 +203,7 @@ static bool ptrace_freeze_traced(struct task_struct *task)

static void ptrace_unfreeze_traced(struct task_struct *task)
{
- if (READ_ONCE(task->__state) != __TASK_TRACED)
+ if (!(READ_ONCE(task->jobctl) & JOBCTL_DELAY_WAKEKILL))
return;

WARN_ON(!task->ptrace || task->parent != current);
@@ -213,11 +213,10 @@ static void ptrace_unfreeze_traced(struct task_struct *task)
* Recheck state under the lock to close this race.
*/
spin_lock_irq(&task->sighand->siglock);
- if (READ_ONCE(task->__state) == __TASK_TRACED) {
+ if (task->jobctl & JOBCTL_DELAY_WAKEKILL) {
+ task->jobctl &= ~JOBCTL_DELAY_WAKEKILL;
if (__fatal_signal_pending(task))
wake_up_state(task, __TASK_TRACED);
- else
- WRITE_ONCE(task->__state, TASK_TRACED);
}
spin_unlock_irq(&task->sighand->siglock);
}
@@ -253,7 +252,7 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
*/
if (lock_task_sighand(child, &flags)) {
if (child->ptrace && child->parent == current) {
- WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
+ WARN_ON(child->jobctl & JOBCTL_DELAY_WAKEKILL);
/*
* child->sighand can't be NULL, release_task()
* does ptrace_unlink() before __exit_signal().
diff --git a/kernel/signal.c b/kernel/signal.c
index 584d67deb3cb..2b332f89cbad 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2307,6 +2307,7 @@ static int ptrace_stop(int exit_code, int why, int clear_code,

/* LISTENING can be set only during STOP traps, clear it */
current->jobctl &= ~JOBCTL_LISTENING;
+ current->jobctl &= ~JOBCTL_DELAY_WAKEKILL;

/*
* Queued signals ignored us while we were stopped for tracing.
--
2.35.3
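
The masking done in signal_wake_up() can be tried out in userspace. The
following is only a rough model, not kernel code: struct model_task and
the model_* helpers are hypothetical stand-ins, the task state values are
assumed to match include/linux/sched.h of that era, and the real
signal_wake_up_state() also sets TIF_SIGPENDING and kicks the CPU, which
is left out here.

```c
#include <stdbool.h>

/* Assumed to mirror include/linux/sched.h; not taken from the patch. */
#define TASK_RUNNING		0x0000
#define TASK_INTERRUPTIBLE	0x0001
#define __TASK_TRACED		0x0008
#define TASK_WAKEKILL		0x0100
#define TASK_TRACED		(TASK_WAKEKILL | __TASK_TRACED)

/* Bit number taken from the jobctl.h hunk of the patch. */
#define JOBCTL_DELAY_WAKEKILL	(1UL << 24)

/* Hypothetical stand-in for the two task_struct fields that matter here. */
struct model_task {
	unsigned int state;
	unsigned long jobctl;
};

/* Wake the task only if its sleeping state intersects the wake mask. */
static bool model_wake_up_state(struct model_task *t, unsigned int mask)
{
	if (!(t->state & mask))
		return false;
	t->state = TASK_RUNNING;
	return true;
}

/* Same masking as the patched signal_wake_up(): drop TASK_WAKEKILL while frozen. */
static bool model_signal_wake_up(struct model_task *t, bool resume)
{
	bool wakekill = resume && !(t->jobctl & JOBCTL_DELAY_WAKEKILL);

	return model_wake_up_state(t, (wakekill ? TASK_WAKEKILL : 0) |
				      TASK_INTERRUPTIBLE);
}
```

In this model a task sleeping in TASK_TRACED is woken by
model_signal_wake_up(t, true) only while JOBCTL_DELAY_WAKEKILL is clear,
which is the behaviour that writing __TASK_TRACED into __state used to
provide.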

2022-04-27 16:12:25

by Oleg Nesterov

Subject: Re: [PATCH 9/9] ptrace: Don't change __state

On 04/26, Eric W. Biederman wrote:
>
> static void ptrace_unfreeze_traced(struct task_struct *task)
> {
> - if (READ_ONCE(task->__state) != __TASK_TRACED)
> + if (!(READ_ONCE(task->jobctl) & JOBCTL_DELAY_WAKEKILL))
> return;
>
> WARN_ON(!task->ptrace || task->parent != current);
> @@ -213,11 +213,10 @@ static void ptrace_unfreeze_traced(struct task_struct *task)
> * Recheck state under the lock to close this race.
> */
> spin_lock_irq(&task->sighand->siglock);

Now that we do not check __state == __TASK_TRACED, we need lock_task_sighand().
The tracee can be already woken up by ptrace_resume(), but it is possible that
it didn't clear DELAY_WAKEKILL yet.

Now, before we take ->siglock, the tracee can exit and another thread can do
wait() and reap this task.

Also, I think the comment above should be updated. I agree, it makes sense to
re-check JOBCTL_DELAY_WAKEKILL under siglock just for clarity, but we no longer
need to do this to close the race; jobctl &= ~JOBCTL_DELAY_WAKEKILL and
wake_up_state() are safe even if JOBCTL_DELAY_WAKEKILL was already cleared.

> @@ -2307,6 +2307,7 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
>
> /* LISTENING can be set only during STOP traps, clear it */
> current->jobctl &= ~JOBCTL_LISTENING;
> + current->jobctl &= ~JOBCTL_DELAY_WAKEKILL;

minor, but

current->jobctl &= ~(JOBCTL_LISTENING | JOBCTL_DELAY_WAKEKILL);

looks better.

Oleg.

2022-04-27 16:25:12

by Oleg Nesterov

Subject: Re: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT

On 04/21, Peter Zijlstra wrote:
>
> @@ -1329,8 +1337,7 @@ SYSCALL_DEFINE4(ptrace, long, request, l
> goto out_put_task_struct;
>
> ret = arch_ptrace(child, request, addr, data);
> - if (ret || request != PTRACE_DETACH)
> - ptrace_unfreeze_traced(child);
> + ptrace_unfreeze_traced(child);

Forgot to mention... whatever we do this doesn't look right.

ptrace_unfreeze_traced() must not be called if the tracee was untraced,
another debugger can come after that. I agree, the current code looks
a bit confusing, perhaps it makes sense to re-write it:

if (request == PTRACE_DETACH && ret == 0)
; /* nothing to do, no longer traced by us */
else
ptrace_unfreeze_traced(child);

Oleg.
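
As a sanity check, the rewritten condition is the De Morgan form of the
existing one, so it should select exactly the same cases. A small
userspace sketch, with hypothetical helper names and a locally defined
MODEL_PTRACE_DETACH (17, the PTRACE_DETACH value in the Linux UAPI
headers), returns true where ptrace_unfreeze_traced() would be called:

```c
#include <stdbool.h>

#define MODEL_PTRACE_DETACH 17	/* PTRACE_DETACH in the Linux UAPI headers */

/* Existing form: if (ret || request != PTRACE_DETACH) unfreeze. */
static bool unfreeze_old(long request, int ret)
{
	return ret || request != MODEL_PTRACE_DETACH;
}

/* Suggested form: skip only a successful PTRACE_DETACH. */
static bool unfreeze_rewritten(long request, int ret)
{
	if (request == MODEL_PTRACE_DETACH && ret == 0)
		return false;	/* nothing to do, no longer traced by us */
	return true;
}
```

Both helpers agree for every (request, ret) pair. The only case that
skips the unfreeze is a PTRACE_DETACH that returned 0.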

2022-04-27 16:45:20

by Oleg Nesterov

Subject: Re: [PATCH 9/9] ptrace: Don't change __state

On 04/26, Eric W. Biederman wrote:
>
> @@ -253,7 +252,7 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
> */
> if (lock_task_sighand(child, &flags)) {
> if (child->ptrace && child->parent == current) {
> - WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
> + WARN_ON(child->jobctl & JOBCTL_DELAY_WAKEKILL);

This WARN_ON() doesn't look right.

It is possible that this child was traced by another task and PTRACE_DETACH'ed,
but it didn't clear DELAY_WAKEKILL.

If the new debugger attaches and calls ptrace() before the child takes siglock
ptrace_freeze_traced() will fail, but we can hit this WARN_ON().

Oleg.

2022-04-27 17:03:04

by Eric W. Biederman

Subject: Re: [PATCH 9/9] ptrace: Don't change __state

Oleg Nesterov <[email protected]> writes:

> On 04/26, Eric W. Biederman wrote:
>>
>> @@ -253,7 +252,7 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
>> */
>> if (lock_task_sighand(child, &flags)) {
>> if (child->ptrace && child->parent == current) {
>> - WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
>> + WARN_ON(child->jobctl & JOBCTL_DELAY_WAKEKILL);
>
> This WARN_ON() doesn't look right.
>
> It is possible that this child was traced by another task and PTRACE_DETACH'ed,
> but it didn't clear DELAY_WAKEKILL.

That would be a bug. That would mean that a PTRACE_DETACH'ed process
cannot be SIGKILL'd.

> If the new debugger attaches and calls ptrace() before the child takes siglock
> ptrace_freeze_traced() will fail, but we can hit this WARN_ON().

Eric

2022-04-27 17:45:24

by Oleg Nesterov

Subject: Re: [PATCH 9/9] ptrace: Don't change __state

On 04/27, Eric W. Biederman wrote:
>
> Oleg Nesterov <[email protected]> writes:
>
> > On 04/26, Eric W. Biederman wrote:
> >>
> >> @@ -253,7 +252,7 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
> >> */
> >> if (lock_task_sighand(child, &flags)) {
> >> if (child->ptrace && child->parent == current) {
> >> - WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
> >> + WARN_ON(child->jobctl & JOBCTL_DELAY_WAKEKILL);
> >
> > This WARN_ON() doesn't look right.
> >
> > It is possible that this child was traced by another task and PTRACE_DETACH'ed,
> > but it didn't clear DELAY_WAKEKILL.
>
> That would be a bug. That would mean that PTRACE_DETACHED process can
> not be SIGKILL'd.

Why? The tracee will take siglock, clear JOBCTL_DELAY_WAKEKILL and notice
SIGKILL after that.

Oleg.

> > If the new debugger attaches and calls ptrace() before the child takes siglock
> > ptrace_freeze_traced() will fail, but we can hit this WARN_ON().
>
> Eric
>

2022-04-27 17:48:22

by Oleg Nesterov

Subject: Re: [PATCH 9/9] ptrace: Don't change __state

On 04/27, Oleg Nesterov wrote:
>
> On 04/27, Eric W. Biederman wrote:
> >
> > Oleg Nesterov <[email protected]> writes:
> >
> > > On 04/26, Eric W. Biederman wrote:
> > >>
> > >> @@ -253,7 +252,7 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
> > >> */
> > >> if (lock_task_sighand(child, &flags)) {
> > >> if (child->ptrace && child->parent == current) {
> > >> - WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
> > >> + WARN_ON(child->jobctl & JOBCTL_DELAY_WAKEKILL);
> > >
> > > This WARN_ON() doesn't look right.
> > >
> > > It is possible that this child was traced by another task and PTRACE_DETACH'ed,
> > > but it didn't clear DELAY_WAKEKILL.
> >
> > That would be a bug. That would mean that PTRACE_DETACHED process can
> > not be SIGKILL'd.
>
> Why? The tracee will take siglock, clear JOBCTL_DELAY_WAKEKILL and notice
> SIGKILL after that.

Not to mention that the tracee is TASK_RUNNING after PTRACE_DETACH wakes it
up, so the pending JOBCTL_DELAY_WAKEKILL simply has no effect.

Oleg.

> > > If the new debugger attaches and calls ptrace() before the child takes siglock
> > > ptrace_freeze_traced() will fail, but we can hit this WARN_ON().
> >
> > Eric
> >

2022-04-27 17:57:54

by Eric W. Biederman

Subject: Re: [PATCH 9/9] ptrace: Don't change __state

Oleg Nesterov <[email protected]> writes:

> On 04/27, Oleg Nesterov wrote:
>>
>> On 04/27, Eric W. Biederman wrote:
>> >
>> > Oleg Nesterov <[email protected]> writes:
>> >
>> > > On 04/26, Eric W. Biederman wrote:
>> > >>
>> > >> @@ -253,7 +252,7 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
>> > >> */
>> > >> if (lock_task_sighand(child, &flags)) {
>> > >> if (child->ptrace && child->parent == current) {
>> > >> - WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
>> > >> + WARN_ON(child->jobctl & JOBCTL_DELAY_WAKEKILL);
>> > >
>> > > This WARN_ON() doesn't look right.
>> > >
>> > > It is possible that this child was traced by another task and PTRACE_DETACH'ed,
>> > > but it didn't clear DELAY_WAKEKILL.
>> >
>> > That would be a bug. That would mean that PTRACE_DETACHED process can
>> > not be SIGKILL'd.
>>
>> Why? The tracee will take siglock, clear JOBCTL_DELAY_WAKEKILL and notice
>> SIGKILL after that.
>
> Not to mention that the tracee is TASK_RUNNING after PTRACE_DETACH wakes it
> up, so the pending JOBCTL_DELAY_WAKEKILL simply has no effect.

Oh. You are talking about the window between clearing the traced
state and the point where the tracee resumes executing and clears
JOBCTL_DELAY_WAKEKILL.

I thought you were thinking about JOBCTL_DELAY_WAKEKILL being leaked.

That requires both ptrace_attach and ptrace_check_attach for the new
tracer to happen before the tracee is scheduled to run.

I agree. I think the WARN_ON could reasonably be moved a bit later,
but I don't know that the WARN_ON is important. I simply kept it because
it seemed to make sense.

Eric

2022-04-27 23:00:37

by Eric W. Biederman

Subject: Re: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT

Oleg Nesterov <[email protected]> writes:

> On 04/21, Peter Zijlstra wrote:
>>
>> @@ -1329,8 +1337,7 @@ SYSCALL_DEFINE4(ptrace, long, request, l
>> goto out_put_task_struct;
>>
>> ret = arch_ptrace(child, request, addr, data);
>> - if (ret || request != PTRACE_DETACH)
>> - ptrace_unfreeze_traced(child);
>> + ptrace_unfreeze_traced(child);
>
> Forgot to mention... whatever we do this doesn't look right.
>
> ptrace_unfreeze_traced() must not be called if the tracee was untraced,
> another debugger can come after that. I agree, the current code looks
> a bit confusing, perhaps it makes sense to re-write it:
>
> if (request == PTRACE_DETACH && ret == 0)
> ; /* nothing to do, no longer traced by us */
> else
> ptrace_unfreeze_traced(child);

This was a bug in my original JOBCTL_DELAY_WAKEKILL patch and it was
just cut and pasted here. I thought it made sense when I was throwing
things together but when I looked more closely I realized that it is
not safe.

Eric

2022-04-27 23:08:07

by Eric W. Biederman

Subject: Re: [PATCH 9/9] ptrace: Don't change __state

Oleg Nesterov <[email protected]> writes:

> On 04/26, Eric W. Biederman wrote:
>>
>> static void ptrace_unfreeze_traced(struct task_struct *task)
>> {
>> - if (READ_ONCE(task->__state) != __TASK_TRACED)
>> + if (!(READ_ONCE(task->jobctl) & JOBCTL_DELAY_WAKEKILL))
>> return;
>>
>> WARN_ON(!task->ptrace || task->parent != current);
>> @@ -213,11 +213,10 @@ static void ptrace_unfreeze_traced(struct task_struct *task)
>> * Recheck state under the lock to close this race.
>> */
>> spin_lock_irq(&task->sighand->siglock);
>
> Now that we do not check __state == __TASK_TRACED, we need lock_task_sighand().
> The tracee can be already woken up by ptrace_resume(), but it is possible that
> it didn't clear DELAY_WAKEKILL yet.

Yes. The subtle differences in when __TASK_TRACED and
JOBCTL_DELAY_WAKEKILL are cleared are causing me some minor issues.

This "WARN_ON(!task->ptrace || task->parent != current);" also now
needs to be inside siglock, because the __TASK_TRACED is insufficient.


> Now, before we take ->siglock, the tracee can exit and another thread can do
> wait() and reap this task.
>
> Also, I think the comment above should be updated. I agree, it makes sense to
> re-check JOBCTL_DELAY_WAKEKILL under siglock just for clarity, but we no longer
> need to do this to close the race; jobctl &= ~JOBCTL_DELAY_WAKEKILL and
> wake_up_state() are safe even if JOBCTL_DELAY_WAKEKILL was already
> cleared.

I think you are right about it being safe, but I am having a hard time
convincing myself that is true. I want to be very careful sending
__TASK_TRACED wake_ups as ptrace_stop fundamentally can't handle
spurious wake_ups.

So I think adding task_is_traced to the test, to verify the task
is still frozen, is the way to go:

static void ptrace_unfreeze_traced(struct task_struct *task)
{
	unsigned long flags;

	/*
	 * Verify the task is still frozen before unfreezing it,
	 * ptrace_resume could have unfrozen us.
	 */
	if (lock_task_sighand(task, &flags)) {
		if ((task->jobctl & JOBCTL_DELAY_WAKEKILL) &&
		    task_is_traced(task)) {
			task->jobctl &= ~JOBCTL_DELAY_WAKEKILL;
			if (__fatal_signal_pending(task))
				wake_up_state(task, __TASK_TRACED);
		}
		unlock_task_sighand(task, &flags);
	}
}

>> @@ -2307,6 +2307,7 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
>>
>> /* LISTENING can be set only during STOP traps, clear it */
>> current->jobctl &= ~JOBCTL_LISTENING;
>> + current->jobctl &= ~JOBCTL_DELAY_WAKEKILL;
>
> minor, but
>
> current->jobctl &= ~(JOBCTL_LISTENING | JOBCTL_DELAY_WAKEKILL);
>
> looks better.

Yes.


Eric

2022-04-28 04:39:49

by Eric W. Biederman

Subject: Re: [PATCH 9/9] ptrace: Don't change __state

"Eric W. Biederman" <[email protected]> writes:

> diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
> index 3c8b34876744..1947c85aa9d9 100644
> --- a/include/linux/sched/signal.h
> +++ b/include/linux/sched/signal.h
> @@ -437,7 +437,8 @@ extern void signal_wake_up_state(struct task_struct *t, unsigned int state);
>
> static inline void signal_wake_up(struct task_struct *t, bool resume)
> {
> - signal_wake_up_state(t, resume ? TASK_WAKEKILL : 0);
> + bool wakekill = resume && !(t->jobctl & JOBCTL_DELAY_WAKEKILL);
> + signal_wake_up_state(t, wakekill ? TASK_WAKEKILL : 0);
> }
> static inline void ptrace_signal_wake_up(struct task_struct *t, bool resume)
> {

Grrr. While looking through everything today I have realized that there
is a bug.

Suppose we have 3 processes: TRACER, TRACEE, KILLER.

Meanwhile TRACEE is in the middle of ptrace_stop, just after siglock has
been dropped.

The TRACER process has performed ptrace_attach on TRACEE and is in the
middle of a ptrace operation and has just set JOBCTL_DELAY_WAKEKILL.

Then comes in the KILLER process and sends the TRACEE a SIGKILL.
The TRACEE __state remains TASK_TRACED, as designed.

The bug appears when the TRACEE makes it to schedule(). Inside
schedule there is a call to signal_pending_state() which notices
a SIGKILL is pending and refuses to sleep.

I could avoid setting TIF_SIGPENDING in signal_wake_up but that
is insufficient as another signal may be pending.

I could avoid marking the task as __fatal_signal_pending but then
where would the information that the task needs to become
__fatal_signal_pending go?

Hmm.

This looks like I need my other pending cleanup which introduces a
helper to get this idea to work.

Eric
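
The check that makes schedule() refuse to sleep can be modelled in a few
lines. This is a userspace sketch of the logic in signal_pending_state(),
not the kernel function itself: struct sp_task and its two booleans are
hypothetical stand-ins for TIF_SIGPENDING and __fatal_signal_pending(),
and the state values are assumed to match include/linux/sched.h.

```c
#include <stdbool.h>

/* Assumed to mirror include/linux/sched.h. */
#define TASK_INTERRUPTIBLE	0x0001
#define __TASK_TRACED		0x0008
#define TASK_WAKEKILL		0x0100
#define TASK_TRACED		(TASK_WAKEKILL | __TASK_TRACED)

/* Hypothetical stand-in for the signal bits of a task. */
struct sp_task {
	bool sigpending;	/* models TIF_SIGPENDING */
	bool fatal_pending;	/* models __fatal_signal_pending() */
};

/* Returns true when schedule() would refuse to block in @state. */
static bool model_signal_pending_state(unsigned int state,
				       const struct sp_task *p)
{
	if (!(state & (TASK_INTERRUPTIBLE | TASK_WAKEKILL)))
		return false;
	if (!p->sigpending)
		return false;
	return (state & TASK_INTERRUPTIBLE) || p->fatal_pending;
}
```

With a SIGKILL pending the model returns true for TASK_TRACED and false
for __TASK_TRACED. That is the difference described above: once __state
keeps TASK_WAKEKILL, a tracee that has not yet reached schedule() will
not block there.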

2022-04-28 14:16:04

by Peter Zijlstra

Subject: Re: [PATCH 0/9] ptrace: cleaning up ptrace_stop

On Tue, Apr 26, 2022 at 05:50:21PM -0500, Eric W. Biederman wrote:
> .... Peter Zijlstra has
> been rewriting the classic freezer and in earlier parts of this
> discussion so I presume it is also a problem for PREEMPT_RT.

Ah, the freezer thing is in fact a sched/arm64 issue, the common issue
between these two issues is ptrace though.

Specifically, on recent arm64 chips only a subset of CPUs can execute
arm32 code and 32bit processes are restricted to that subset. If by some
mishap you try and execute a 32bit task on a non-capable CPU it gets
terminated without prejudice.

Now, the current freezer has this problem that tasks can spuriously thaw
too soon (where too soon is before SMP is restored) which leads to these
32bit tasks being killed dead.

That, and it was a good excuse to fix up the current freezer :-)

2022-04-28 22:48:31

by Eric W. Biederman

Subject: Re: [PATCH 9/9] ptrace: Don't change __state

Oleg Nesterov <[email protected]> writes:

> On 04/27, Eric W. Biederman wrote:
>>
>> "Eric W. Biederman" <[email protected]> writes:
>>
>> > diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
>> > index 3c8b34876744..1947c85aa9d9 100644
>> > --- a/include/linux/sched/signal.h
>> > +++ b/include/linux/sched/signal.h
>> > @@ -437,7 +437,8 @@ extern void signal_wake_up_state(struct task_struct *t, unsigned int state);
>> >
>> > static inline void signal_wake_up(struct task_struct *t, bool resume)
>> > {
>> > - signal_wake_up_state(t, resume ? TASK_WAKEKILL : 0);
>> > + bool wakekill = resume && !(t->jobctl & JOBCTL_DELAY_WAKEKILL);
>> > + signal_wake_up_state(t, wakekill ? TASK_WAKEKILL : 0);
>> > }
>> > static inline void ptrace_signal_wake_up(struct task_struct *t, bool resume)
>> > {
>>
>> Grrr. While looking through everything today I have realized that there
>> is a bug.
>>
>> Suppose we have 3 processes: TRACER, TRACEE, KILLER.
>>
>> Meanwhile TRACEE is in the middle of ptrace_stop, just after siglock has
>> been dropped.
>>
>> The TRACER process has performed ptrace_attach on TRACEE and is in the
>> middle of a ptrace operation and has just set JOBCTL_DELAY_WAKEKILL.
>>
>> Then comes in the KILLER process and sends the TRACEE a SIGKILL.
>> The TRACEE __state remains TASK_TRACED, as designed.
>>
>> The bug appears when the TRACEE makes it to schedule(). Inside
>> schedule there is a call to signal_pending_state() which notices
>> a SIGKILL is pending and refuses to sleep.
>
> And I think this is fine. This doesn't really differ from the case
> when the tracee was killed before it takes siglock.

Hmm. Maybe.

> The only problem (afaics) is that, once we introduce JOBCTL_TRACED,
> ptrace_stop() can leak this flag. That is why I suggested to clear
> it along with LISTENING/DELAY_WAKEKILL before return, exactly because
> schedule() won't block if fatal_signal_pending() is true.
>
> But maybe I misunderstood your concern?

Prior to JOBCTL_DELAY_WAKEKILL once __state was set to __TASK_TRACED
we were guaranteed that schedule() would stop if a SIGKILL was
received after that point. As well as being immune from wake-ups
from SIGKILL.

I guess we are immune from wake-ups with JOBCTL_DELAY_WAKEKILL as I have
implemented it.

The practical concern then seems to be that we are not guaranteed
wait_task_inactive will succeed. Which means that it must continue
to include the TASK_TRACED bit.

Previously we were actually guaranteed in ptrace_check_attach that
wait_task_inactive would succeed after ptrace_freeze_traced succeeded,
as any pending fatal signal would cause ptrace_freeze_traced to fail
and any incoming fatal signal would not stop schedule from sleeping. The ptraced task would continue to be
ptraced, as all other ptrace operations are blocked by virtue of ptrace
being single threaded.

I think in my tired mind yesterday I thought it would be messing things
up after schedule decided to sleep. Still I would like to be able to
let wait_task_inactive not care about the state of the process it is
going to sleep for.

Eric

2022-04-29 08:29:35

by Oleg Nesterov

Subject: Re: [PATCH 9/9] ptrace: Don't change __state

On 04/28, Eric W. Biederman wrote:
>
> Oleg Nesterov <[email protected]> writes:
>
> >> The bug appears when the TRACEE makes it to schedule(). Inside
> >> schedule there is a call to signal_pending_state() which notices
> >> a SIGKILL is pending and refuses to sleep.
> >
> > And I think this is fine. This doesn't really differ from the case
> > when the tracee was killed before it takes siglock.
>
> Hmm. Maybe.

I hope ;)

> Previously we were actually guaranteed in ptrace_check_attach that after
> ptrace_freeze_traced would succeed as any pending fatal signal would
> cause ptrace_freeze_traced to fail. Any incoming fatal signal would not
> stop schedule from sleeping.

Yes.

So let me repeat, 7/9 "ptrace: Simplify the wait_task_inactive call in
ptrace_check_attach" looks good to me (except it should use
wait_task_inactive(__TASK_TRACED)), but it should come before other
meaningful changes and the changelog should be updated.

And then we will probably need to reconsider this wait_task_inactive()
and WARN_ON() around it, but that depends on what we finally do.

> I think in my tired mind yesterday

I got lost too ;)

> Still I would like to be able to
> let wait_task_inactive not care about the state of the process it is
> going to sleep for.

Not sure... but to be honest I didn't really pay attention to the
wait_task_inactive(match_state => 0) part...

Oleg.

2022-04-29 09:48:47

by Peter Zijlstra

Subject: Re: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT

On Thu, Apr 28, 2022 at 10:59:57PM +0200, Oleg Nesterov wrote:
> On 04/28, Peter Zijlstra wrote:
> >
> > Oleg pointed out that the tracee can already be killed such that
> > fatal_signal_pending() is true. In that case signal_wake_up_state()
> > cannot be relied upon to be responsible for the wakeup -- something
> > we're going to want to rely on.
>
> Peter, I am all confused...
>
> If this patch is against the current tree, we don't need it.
>
> If it is on top of JOBCTL_TRACED/DELAY_WAKEKILL changes (yours or Eric's),
> then it can't help - SIGKILL can come right after the tracee drops siglock
> and calls schedule().

But by that time it will already have set TRACED and signal_wake_up()
will clear it, no?

> Perhaps I missed something, but let me repeat the 3rd time: I'd suggest
> to simply clear JOBCTL_TRACED along with LISTENING/DELAY_WAKEKILL before
> return to close this race.

I think Eric convinced me there was a problem with that, but I'll go
over it all again in the morning, perhaps I'll reach a different
conclusion :-)

2022-04-29 10:42:46

by Oleg Nesterov

Subject: Re: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT

Peter, you know, it is very difficult for me to discuss the changes
in the 2 unfinished series and not lose the context ;) Plus I am
already sleeping. But I'll try to reply anyway.

On 04/29, Peter Zijlstra wrote:
>
> On Thu, Apr 28, 2022 at 10:59:57PM +0200, Oleg Nesterov wrote:
> > If it is on top of JOBCTL_TRACED/DELAY_WAKEKILL changes (yours or Eric's),
> > then it can't help - SIGKILL can come right after the tracee drops siglock
> > and calls schedule().
>
> But by that time it will already have set TRACED and signal_wake_up()
> will clear it, no?

No. JOBCTL_DELAY_WAKEKILL is already set, this means that signal_wake_up()
will remove TASK_WAKEKILL from the "state" passed to signal_wake_up_state()
and this is fine and correct, this means that ttwu() won't change ->__state.

But this also means that wake_up_state() will return false, and in this case

signal_wake_up_state:

if (wake_up_state(t, state | TASK_INTERRUPTIBLE))
t->jobctl &= ~(JOBCTL_STOPPED | JOBCTL_TRACED | JOBCTL_TRACED_QUIESCE);

won't clear these flags. And this is nice too.

But. fatal_signal_pending() is true! And once we change freeze_traced()
to not abuse p->__state, schedule() won't block because it will check
signal_pending_state(TASK_TRACED == TASK_WAKEKILL | __TASK_TRACED) and
__fatal_signal_pending() == T.

In this case ptrace_stop() will leak JOBCTL_TRACED, so we simply need
to clear it before return along with LISTENING | DELAY_WAKEKILL.

> I'll go
> over it all again in the morning, perhaps I'll reach a different
> conclusion :-)

Same here ;)

Oleg.
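
The leak described above can be replayed with a small userspace model.
Everything here is an assumption made for illustration: struct leak_task
and the model_* helpers are hypothetical, the JOBCTL_TRACED bit number is
the one from Peter's v2 patch and may differ in Eric's series, and
model_kill() assumes the tracee sits in TASK_TRACED so that a wake mask
without TASK_WAKEKILL cannot match it.

```c
#include <stdbool.h>

/* Bit numbers from the jobctl.h hunks in this thread (assumed for JOBCTL_TRACED). */
#define JOBCTL_LISTENING	(1UL << 22)
#define JOBCTL_DELAY_WAKEKILL	(1UL << 24)
#define JOBCTL_TRACED		(1UL << 26)

/* Hypothetical stand-in for the tracee. */
struct leak_task {
	unsigned long jobctl;
	bool fatal_pending;
};

/*
 * Models SIGKILL delivery: signal_wake_up_state() clears JOBCTL_TRACED
 * only if wake_up_state() succeeded, and it cannot succeed while
 * JOBCTL_DELAY_WAKEKILL strips TASK_WAKEKILL from the wake mask.
 */
static void model_kill(struct leak_task *t)
{
	bool woke = !(t->jobctl & JOBCTL_DELAY_WAKEKILL);

	t->fatal_pending = true;
	if (woke)
		t->jobctl &= ~JOBCTL_TRACED;
}

/* Tail of ptrace_stop(): schedule() has returned, whether or not it blocked. */
static void model_ptrace_stop_return(struct leak_task *t, bool clear_traced)
{
	unsigned long mask = JOBCTL_LISTENING | JOBCTL_DELAY_WAKEKILL;

	if (clear_traced)
		mask |= JOBCTL_TRACED;
	t->jobctl &= ~mask;
}
```

Running model_kill() on a frozen tracee leaves JOBCTL_TRACED set, and
model_ptrace_stop_return(t, false) then returns with the flag still set.
Passing clear_traced = true corresponds to the suggestion of clearing it
together with LISTENING | DELAY_WAKEKILL.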

2022-04-29 10:46:24

by Peter Zijlstra

Subject: Re: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT

On Tue, Apr 26, 2022 at 07:24:03PM -0500, Eric W. Biederman wrote:
> But doing:
>
> /* Don't stop if the task is dying */
> if (unlikely(__fatal_signal_pending(current)))
> return exit_code;
>
> Should work.

Something like so then...

---
Subject: signal,ptrace: Don't stop dying tasks
From: Peter Zijlstra <[email protected]>
Date: Thu Apr 28 22:17:56 CEST 2022

Oleg pointed out that the tracee can already be killed such that
fatal_signal_pending() is true. In that case signal_wake_up_state()
cannot be relied upon to be responsible for the wakeup -- something
we're going to want to rely on.

As such, explicitly handle this case.

Suggested-by: "Eric W. Biederman" <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
---
kernel/signal.c | 4 ++++
1 file changed, 4 insertions(+)

--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2226,6 +2226,10 @@ static int ptrace_stop(int exit_code, in
spin_lock_irq(&current->sighand->siglock);
}

+ /* Don't stop if the task is dying. */
+ if (unlikely(__fatal_signal_pending(current)))
+ return exit_code;
+
/*
* schedule() will not sleep if there is a pending signal that
* can awaken the task.

2022-04-29 12:52:05

by Oleg Nesterov

Subject: Re: [PATCH 9/9] ptrace: Don't change __state

On 04/27, Eric W. Biederman wrote:
>
> "Eric W. Biederman" <[email protected]> writes:
>
> > diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
> > index 3c8b34876744..1947c85aa9d9 100644
> > --- a/include/linux/sched/signal.h
> > +++ b/include/linux/sched/signal.h
> > @@ -437,7 +437,8 @@ extern void signal_wake_up_state(struct task_struct *t, unsigned int state);
> >
> > static inline void signal_wake_up(struct task_struct *t, bool resume)
> > {
> > - signal_wake_up_state(t, resume ? TASK_WAKEKILL : 0);
> > + bool wakekill = resume && !(t->jobctl & JOBCTL_DELAY_WAKEKILL);
> > + signal_wake_up_state(t, wakekill ? TASK_WAKEKILL : 0);
> > }
> > static inline void ptrace_signal_wake_up(struct task_struct *t, bool resume)
> > {
>
> Grrr. While looking through everything today I have realized that there
> is a bug.
>
> Suppose we have 3 processes: TRACER, TRACEE, KILLER.
>
> Meanwhile TRACEE is in the middle of ptrace_stop, just after siglock has
> been dropped.
>
> The TRACER process has performed ptrace_attach on TRACEE and is in the
> middle of a ptrace operation and has just set JOBCTL_DELAY_WAKEKILL.
>
> Then comes in the KILLER process and sends the TRACEE a SIGKILL.
> The TRACEE __state remains TASK_TRACED, as designed.
>
> The bug appears when the TRACEE makes it to schedule(). Inside
> schedule there is a call to signal_pending_state() which notices
> a SIGKILL is pending and refuses to sleep.

And I think this is fine. This doesn't really differ from the case
when the tracee was killed before it takes siglock.

The only problem (afaics) is that, once we introduce JOBCTL_TRACED,
ptrace_stop() can leak this flag. That is why I suggested to clear
it along with LISTENING/DELAY_WAKEKILL before return, exactly because
schedule() won't block if fatal_signal_pending() is true.

But maybe I misunderstood your concern?

Oleg.

2022-05-01 05:13:30

by Eric W. Biederman

Subject: [PATCH 0/12] ptrace: cleaning up ptrace_stop


The states TASK_STOPPED and TASK_TRACED are special in that they can not
handle spurious wake-ups. This plus actively depending upon and
changing the value of tsk->__state causes problems for PREEMPT_RT and
Peter's freezer rewrite.

There are a lot of details we have to get right to sort out the
technical challenges and this is my pared back version of the changes
that contains just those problems I see good solutions to that I believe
are ready.

In particular I don't have a solution that is ready for the challenges
presented by wait_task_inactive.

I hope we can review these changes and then have a firm foundation
for the rest of the challenges.

There are cleanups to the ptrace support for xtensa, um, and
ia64.

I have sucked in the first patch of Peter's freezer change as
with minor modifications I believe it is ready to go.

Eric W. Biederman (12):
signal: Rename send_signal send_signal_locked
signal: Replace __group_send_sig_info with send_signal_locked
ptrace/um: Replace PT_DTRACE with TIF_SINGLESTEP
ptrace/xtensa: Replace PT_SINGLESTEP with TIF_SINGLESTEP
signal: Use lockdep_assert_held instead of assert_spin_locked
ptrace: Reimplement PTRACE_KILL by always sending SIGKILL
ptrace: Don't change __state
ptrace: Remove arch_ptrace_attach
ptrace: Always take siglock in ptrace_resume
ptrace: Only return signr from ptrace_stop if it was provided
ptrace: Always call schedule in ptrace_stop
sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state

arch/ia64/include/asm/ptrace.h | 4 --
arch/ia64/kernel/ptrace.c | 57 ----------------
arch/um/include/asm/thread_info.h | 2 +
arch/um/kernel/exec.c | 2 +-
arch/um/kernel/process.c | 2 +-
arch/um/kernel/ptrace.c | 8 +--
arch/um/kernel/signal.c | 4 +-
arch/xtensa/kernel/ptrace.c | 4 +-
arch/xtensa/kernel/signal.c | 4 +-
drivers/tty/tty_jobctrl.c | 4 +-
include/linux/ptrace.h | 7 --
include/linux/sched.h | 10 ++-
include/linux/sched/jobctl.h | 10 +++
include/linux/sched/signal.h | 23 ++++++-
include/linux/signal.h | 3 +-
kernel/ptrace.c | 88 +++++++++----------------
kernel/signal.c | 135 +++++++++++++++++---------------------
kernel/time/posix-cpu-timers.c | 6 +-
18 files changed, 145 insertions(+), 228 deletions(-)

Eric

2022-05-01 09:42:11

by Eric W. Biederman

Subject: [PATCH v2 11/12] ptrace: Always call schedule in ptrace_stop

Stop testing for !current->ptrace and setting __state to TASK_RUNNING.
The code in __ptrace_unlink wakes up the child with
ptrace_signal_wake_up which will set __state to TASK_RUNNING. This
leaves sending the signals as the only thing ptrace_stop needs to do.

Make the signals sending conditional upon current->ptrace so that
the correct signals are sent to the parent.

After that call schedule and let the fact that __state == TASK_RUNNING
keep the code from sleeping in schedule.

Now that it is easy to see that ptrace_stop always sleeps in
ptrace_stop after ptrace_freeze_traced succeeds, modify
ptrace_check_attach to warn if wait_task_inactive fails.

Signed-off-by: "Eric W. Biederman" <[email protected]>
---
kernel/ptrace.c | 14 +++-------
kernel/signal.c | 68 ++++++++++++++++++-------------------------------
2 files changed, 28 insertions(+), 54 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index d80222251f60..c1afebd2e8f3 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -261,17 +261,9 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
}
read_unlock(&tasklist_lock);

- if (!ret && !ignore_state) {
- if (!wait_task_inactive(child, __TASK_TRACED)) {
- /*
- * This can only happen if may_ptrace_stop() fails and
- * ptrace_stop() changes ->state back to TASK_RUNNING,
- * so we should not worry about leaking __TASK_TRACED.
- */
- WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
- ret = -ESRCH;
- }
- }
+ if (!ret && !ignore_state &&
+ WARN_ON_ONCE(!wait_task_inactive(child, __TASK_TRACED)))
+ ret = -ESRCH;

return ret;
}
diff --git a/kernel/signal.c b/kernel/signal.c
index 7cb27a27290a..4cae3f47f664 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2255,51 +2255,33 @@ static int ptrace_stop(int exit_code, int why, int clear_code,

spin_unlock_irq(&current->sighand->siglock);
read_lock(&tasklist_lock);
- if (likely(current->ptrace)) {
- /*
- * Notify parents of the stop.
- *
- * While ptraced, there are two parents - the ptracer and
- * the real_parent of the group_leader. The ptracer should
- * know about every stop while the real parent is only
- * interested in the completion of group stop. The states
- * for the two don't interact with each other. Notify
- * separately unless they're gonna be duplicates.
- */
+ /*
+ * Notify parents of the stop.
+ *
+ * While ptraced, there are two parents - the ptracer and
+ * the real_parent of the group_leader. The ptracer should
+ * know about every stop while the real parent is only
+ * interested in the completion of group stop. The states
+ * for the two don't interact with each other. Notify
+ * separately unless they're gonna be duplicates.
+ */
+ if (current->ptrace)
do_notify_parent_cldstop(current, true, why);
- if (gstop_done && ptrace_reparented(current))
- do_notify_parent_cldstop(current, false, why);
-
- /*
- * Don't want to allow preemption here, because
- * sys_ptrace() needs this task to be inactive.
- *
- * XXX: implement read_unlock_no_resched().
- */
- preempt_disable();
- read_unlock(&tasklist_lock);
- cgroup_enter_frozen();
- preempt_enable_no_resched();
- freezable_schedule();
- cgroup_leave_frozen(true);
- } else {
- /*
- * By the time we got the lock, our tracer went away.
- * Don't drop the lock yet, another tracer may come.
- *
- * If @gstop_done, the ptracer went away between group stop
- * completion and here. During detach, it would have set
- * JOBCTL_STOP_PENDING on us and we'll re-enter
- * TASK_STOPPED in do_signal_stop() on return, so notifying
- * the real parent of the group stop completion is enough.
- */
- if (gstop_done)
- do_notify_parent_cldstop(current, false, why);
+ if (gstop_done && (!current->ptrace || ptrace_reparented(current)))
+ do_notify_parent_cldstop(current, false, why);

- /* tasklist protects us from ptrace_freeze_traced() */
- __set_current_state(TASK_RUNNING);
- read_unlock(&tasklist_lock);
- }
+ /*
+ * Don't want to allow preemption here, because
+ * sys_ptrace() needs this task to be inactive.
+ *
+ * XXX: implement read_unlock_no_resched().
+ */
+ preempt_disable();
+ read_unlock(&tasklist_lock);
+ cgroup_enter_frozen();
+ preempt_enable_no_resched();
+ freezable_schedule();
+ cgroup_leave_frozen(true);

/*
* We are back. Now reacquire the siglock before touching
--
2.35.3

2022-05-01 19:32:20

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH v2 04/12] ptrace/xtensa: Replace PT_SINGLESTEP with TIF_SINGLESTEP

xtensa is the last user of the PT_SINGLESTEP flag. Changing tsk->ptrace in
user_enable_single_step and user_disable_single_step without locking could
potentially cause problems.

So use a thread info flag instead of a flag in tsk->ptrace. Use TIF_SINGLESTEP
that xtensa already had defined but unused.

Remove the definitions of PT_SINGLESTEP and PT_BLOCKSTEP as they have no more users.

Cc: [email protected]
Acked-by: Max Filippov <[email protected]>
Signed-off-by: "Eric W. Biederman" <[email protected]>
---
arch/xtensa/kernel/ptrace.c | 4 ++--
arch/xtensa/kernel/signal.c | 4 ++--
include/linux/ptrace.h | 6 ------
3 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/arch/xtensa/kernel/ptrace.c b/arch/xtensa/kernel/ptrace.c
index 323c678a691f..b952e67cc0cc 100644
--- a/arch/xtensa/kernel/ptrace.c
+++ b/arch/xtensa/kernel/ptrace.c
@@ -225,12 +225,12 @@ const struct user_regset_view *task_user_regset_view(struct task_struct *task)

void user_enable_single_step(struct task_struct *child)
{
- child->ptrace |= PT_SINGLESTEP;
+ set_tsk_thread_flag(child, TIF_SINGLESTEP);
}

void user_disable_single_step(struct task_struct *child)
{
- child->ptrace &= ~PT_SINGLESTEP;
+ clear_tsk_thread_flag(child, TIF_SINGLESTEP);
}

/*
diff --git a/arch/xtensa/kernel/signal.c b/arch/xtensa/kernel/signal.c
index 6f68649e86ba..ac50ec46c8f1 100644
--- a/arch/xtensa/kernel/signal.c
+++ b/arch/xtensa/kernel/signal.c
@@ -473,7 +473,7 @@ static void do_signal(struct pt_regs *regs)
/* Set up the stack frame */
ret = setup_frame(&ksig, sigmask_to_save(), regs);
signal_setup_done(ret, &ksig, 0);
- if (current->ptrace & PT_SINGLESTEP)
+ if (test_thread_flag(TIF_SINGLESTEP))
task_pt_regs(current)->icountlevel = 1;

return;
@@ -499,7 +499,7 @@ static void do_signal(struct pt_regs *regs)
/* If there's no signal to deliver, we just restore the saved mask. */
restore_saved_sigmask();

- if (current->ptrace & PT_SINGLESTEP)
+ if (test_thread_flag(TIF_SINGLESTEP))
task_pt_regs(current)->icountlevel = 1;
return;
}
diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 4c06f9f8ef3f..c952c5ba8fab 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -46,12 +46,6 @@ extern int ptrace_access_vm(struct task_struct *tsk, unsigned long addr,
#define PT_EXITKILL (PTRACE_O_EXITKILL << PT_OPT_FLAG_SHIFT)
#define PT_SUSPEND_SECCOMP (PTRACE_O_SUSPEND_SECCOMP << PT_OPT_FLAG_SHIFT)

-/* single stepping state bits (used on ARM and PA-RISC) */
-#define PT_SINGLESTEP_BIT 31
-#define PT_SINGLESTEP (1<<PT_SINGLESTEP_BIT)
-#define PT_BLOCKSTEP_BIT 30
-#define PT_BLOCKSTEP (1<<PT_BLOCKSTEP_BIT)
-
extern long arch_ptrace(struct task_struct *child, long request,
unsigned long addr, unsigned long data);
extern int ptrace_readdata(struct task_struct *tsk, unsigned long src, char __user *dst, int len);
--
2.35.3

2022-05-02 13:32:51

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH v2 2/5] sched,ptrace: Fix ptrace_check_attach() vs PREEMPT_RT

On 04/28, Peter Zijlstra wrote:
>
> Oleg pointed out that the tracee can already be killed such that
> fatal_signal_pending() is true. In that case signal_wake_up_state()
> cannot be relied upon to be responsible for the wakeup -- something
> we're going to want to rely on.

Peter, I am all confused...

If this patch is against the current tree, we don't need it.

If it is on top of JOBCTL_TRACED/DELAY_WAKEKILL changes (yours or Eric's),
then it can't help - SIGKILL can come right after the tracee drops siglock
and calls schedule().

Perhaps I missed something, but let me repeat the 3rd time: I'd suggest
to simply clear JOBCTL_TRACED along with LISTENING/DELAY_WAKEKILL before
return to close this race.

Oleg.

> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -2226,6 +2226,10 @@ static int ptrace_stop(int exit_code, in
> spin_lock_irq(&current->sighand->siglock);
> }
>
> + /* Don't stop if the task is dying. */
> + if (unlikely(__fatal_signal_pending(current)))
> + return exit_code;
> +
> /*
> * schedule() will not sleep if there is a pending signal that
> * can awaken the task.
>

2022-05-02 20:11:25

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH v2 12/12] sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state

Currently ptrace_stop() / do_signal_stop() rely on the special states
TASK_TRACED and TASK_STOPPED resp. to keep unique state. That is, this
state exists only in task->__state and nowhere else.

There's two spots of bother with this:

- PREEMPT_RT has task->saved_state which complicates matters,
meaning task_is_{traced,stopped}() needs to check an additional
variable.

- An alternative freezer implementation that itself relies on a
special TASK state would lose TASK_TRACED/TASK_STOPPED and will
result in misbehaviour.

As such, add additional state to task->jobctl to track this state
outside of task->__state.
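
The idea of tracking the stop in task->jobctl can be sketched in plain userspace C. This is only a model, not kernel code: struct model_task, the MODEL_* names and the helper functions are made up for illustration, the __TASK_TRACED value is assumed, and 0x1000u is an arbitrary placeholder for "some other state".

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_TASK_TRACED   0x0008u      /* stands in for __TASK_TRACED (assumed value) */
#define MODEL_JOBCTL_TRACED (1UL << 27)  /* bit number taken from this patch */

/* Hypothetical stand-in for the two task_struct fields involved. */
struct model_task {
	unsigned int  state;   /* models task->__state */
	unsigned long jobctl;  /* models task->jobctl */
};

/* Old-style check: only looks at the state word. */
static inline bool model_is_traced_old(const struct model_task *t)
{
	return (t->state & MODEL_TASK_TRACED) != 0;
}

/* New-style check: looks at the jobctl bit added by this patch. */
static inline bool model_is_traced_new(const struct model_task *t)
{
	return (t->jobctl & MODEL_JOBCTL_TRACED) != 0;
}

/* Self-check: the jobctl bit survives the state word being reused. */
static inline void model_traced_demo(void)
{
	struct model_task t = {
		.state  = MODEL_TASK_TRACED,
		.jobctl = MODEL_JOBCTL_TRACED,
	};

	assert(model_is_traced_old(&t) && model_is_traced_new(&t));

	/* Something else temporarily overwrites the state word. */
	t.state = 0x1000u;
	assert(!model_is_traced_old(&t));
	assert(model_is_traced_new(&t));
}
```

Calling model_traced_demo() from a small main() exercises both checks; the point is only that the second check no longer depends on what the state word currently holds.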

NOTE: this doesn't actually fix anything yet, just adds extra state.

--EWB
* didn't add an unnecessary newline in signal.h
* Update t->jobctl in signal_wake_up and ptrace_signal_wake_up
instead of in signal_wake_up_state. This prevents the clearing
of TASK_STOPPED and TASK_TRACED from getting lost.
* Added warnings if JOBCTL_STOPPED or JOBCTL_TRACED are not cleared

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Eric W. Biederman <[email protected]>
---
include/linux/sched.h | 8 +++-----
include/linux/sched/jobctl.h | 6 ++++++
include/linux/sched/signal.h | 17 ++++++++++++++---
kernel/ptrace.c | 17 +++++++++++++----
kernel/signal.c | 16 +++++++++++++---
5 files changed, 49 insertions(+), 15 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 610f2fdb1e2c..cbe5c899599c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -118,11 +118,9 @@ struct task_group;

#define task_is_running(task) (READ_ONCE((task)->__state) == TASK_RUNNING)

-#define task_is_traced(task) ((READ_ONCE(task->__state) & __TASK_TRACED) != 0)
-
-#define task_is_stopped(task) ((READ_ONCE(task->__state) & __TASK_STOPPED) != 0)
-
-#define task_is_stopped_or_traced(task) ((READ_ONCE(task->__state) & (__TASK_STOPPED | __TASK_TRACED)) != 0)
+#define task_is_traced(task) ((READ_ONCE(task->jobctl) & JOBCTL_TRACED) != 0)
+#define task_is_stopped(task) ((READ_ONCE(task->jobctl) & JOBCTL_STOPPED) != 0)
+#define task_is_stopped_or_traced(task) ((READ_ONCE(task->jobctl) & (JOBCTL_STOPPED | JOBCTL_TRACED)) != 0)

/*
* Special states are those that do not use the normal wait-loop pattern. See
diff --git a/include/linux/sched/jobctl.h b/include/linux/sched/jobctl.h
index 2ff1bcd63cf4..9c0b917de2f9 100644
--- a/include/linux/sched/jobctl.h
+++ b/include/linux/sched/jobctl.h
@@ -22,6 +22,9 @@ struct task_struct;
#define JOBCTL_PTRACE_FROZEN_BIT 24 /* frozen for ptrace */
#define JOBCTL_PTRACE_SIGNR_BIT 25 /* ptrace signal number */

+#define JOBCTL_STOPPED_BIT 26 /* do_signal_stop() */
+#define JOBCTL_TRACED_BIT 27 /* ptrace_stop() */
+
#define JOBCTL_STOP_DEQUEUED (1UL << JOBCTL_STOP_DEQUEUED_BIT)
#define JOBCTL_STOP_PENDING (1UL << JOBCTL_STOP_PENDING_BIT)
#define JOBCTL_STOP_CONSUME (1UL << JOBCTL_STOP_CONSUME_BIT)
@@ -33,6 +36,9 @@ struct task_struct;
#define JOBCTL_PTRACE_FROZEN (1UL << JOBCTL_PTRACE_FROZEN_BIT)
#define JOBCTL_PTRACE_SIGNR (1UL << JOBCTL_PTRACE_SIGNR_BIT)

+#define JOBCTL_STOPPED (1UL << JOBCTL_STOPPED_BIT)
+#define JOBCTL_TRACED (1UL << JOBCTL_TRACED_BIT)
+
#define JOBCTL_TRAP_MASK (JOBCTL_TRAP_STOP | JOBCTL_TRAP_NOTIFY)
#define JOBCTL_PENDING_MASK (JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK)

diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 35af34eeee9e..4dcce2bbf1fb 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -294,8 +294,10 @@ static inline int kernel_dequeue_signal(void)
static inline void kernel_signal_stop(void)
{
spin_lock_irq(&current->sighand->siglock);
- if (current->jobctl & JOBCTL_STOP_DEQUEUED)
+ if (current->jobctl & JOBCTL_STOP_DEQUEUED) {
+ current->jobctl |= JOBCTL_STOPPED;
set_special_state(TASK_STOPPED);
+ }
spin_unlock_irq(&current->sighand->siglock);

schedule();
@@ -439,15 +441,24 @@ static inline void signal_wake_up(struct task_struct *t, bool resume)
{
unsigned int state = 0;
if (resume) {
+ unsigned long jmask = JOBCTL_STOPPED;
state = TASK_WAKEKILL;
- if (!(t->jobctl & JOBCTL_PTRACE_FROZEN))
+ if (!(t->jobctl & JOBCTL_PTRACE_FROZEN)) {
+ jmask |= JOBCTL_TRACED;
state |= __TASK_TRACED;
+ }
+ t->jobctl &= ~jmask;
}
signal_wake_up_state(t, state);
}
static inline void ptrace_signal_wake_up(struct task_struct *t, bool resume)
{
- signal_wake_up_state(t, resume ? __TASK_TRACED : 0);
+ unsigned int state = 0;
+ if (resume) {
+ t->jobctl &= ~JOBCTL_TRACED;
+ state = __TASK_TRACED;
+ }
+ signal_wake_up_state(t, state);
}

void task_join_group_stop(struct task_struct *task);
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index c1afebd2e8f3..38913801717f 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -185,7 +185,12 @@ static bool looks_like_a_spurious_pid(struct task_struct *task)
return true;
}

-/* Ensure that nothing can wake it up, even SIGKILL */
+/*
+ * Ensure that nothing can wake it up, even SIGKILL
+ *
+ * A task is switched to this state while a ptrace operation is in progress;
+ * such that the ptrace operation is uninterruptible.
+ */
static bool ptrace_freeze_traced(struct task_struct *task)
{
bool ret = false;
@@ -216,8 +221,10 @@ static void ptrace_unfreeze_traced(struct task_struct *task)
*/
if (lock_task_sighand(task, &flags)) {
task->jobctl &= ~JOBCTL_PTRACE_FROZEN;
- if (__fatal_signal_pending(task))
+ if (__fatal_signal_pending(task)) {
+ task->jobctl &= ~JOBCTL_TRACED;
wake_up_state(task, __TASK_TRACED);
+ }
unlock_task_sighand(task, &flags);
}
}
@@ -462,8 +469,10 @@ static int ptrace_attach(struct task_struct *task, long request,
* in and out of STOPPED are protected by siglock.
*/
if (task_is_stopped(task) &&
- task_set_jobctl_pending(task, JOBCTL_TRAP_STOP | JOBCTL_TRAPPING))
+ task_set_jobctl_pending(task, JOBCTL_TRAP_STOP | JOBCTL_TRAPPING)) {
+ task->jobctl &= ~JOBCTL_STOPPED;
signal_wake_up_state(task, __TASK_STOPPED);
+ }

spin_unlock(&task->sighand->siglock);

@@ -879,7 +888,7 @@ static int ptrace_resume(struct task_struct *child, long request,
*/
spin_lock_irq(&child->sighand->siglock);
child->exit_code = data;
- child->jobctl |= JOBCTL_PTRACE_SIGNR;
+ child->jobctl = (child->jobctl | JOBCTL_PTRACE_SIGNR) & ~JOBCTL_TRACED;
wake_up_state(child, __TASK_TRACED);
spin_unlock_irq(&child->sighand->siglock);

diff --git a/kernel/signal.c b/kernel/signal.c
index 4cae3f47f664..d6573abbc169 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -762,7 +762,10 @@ static int dequeue_synchronous_signal(kernel_siginfo_t *info)
*/
void signal_wake_up_state(struct task_struct *t, unsigned int state)
{
+ lockdep_assert_held(&t->sighand->siglock);
+
set_tsk_thread_flag(t, TIF_SIGPENDING);
+
/*
* TASK_WAKEKILL also means wake it up in the stopped/traced/killable
* case. We don't check t->state here because there is a race with it
@@ -930,9 +933,10 @@ static bool prepare_signal(int sig, struct task_struct *p, bool force)
for_each_thread(p, t) {
flush_sigqueue_mask(&flush, &t->pending);
task_clear_jobctl_pending(t, JOBCTL_STOP_PENDING);
- if (likely(!(t->ptrace & PT_SEIZED)))
+ if (likely(!(t->ptrace & PT_SEIZED))) {
+ t->jobctl &= ~JOBCTL_STOPPED;
wake_up_state(t, __TASK_STOPPED);
- else
+ } else
ptrace_trap_notify(t);
}

@@ -2208,8 +2212,10 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
spin_lock_irq(&current->sighand->siglock);
}

- if (!__fatal_signal_pending(current))
+ if (!__fatal_signal_pending(current)) {
set_special_state(TASK_TRACED);
+ current->jobctl |= JOBCTL_TRACED;
+ }

/*
* We're committing to trapping. TRACED should be visible before
@@ -2301,6 +2307,7 @@ static int ptrace_stop(int exit_code, int why, int clear_code,

/* LISTENING can be set only during STOP traps, clear it */
current->jobctl &= ~(JOBCTL_LISTENING | JOBCTL_PTRACE_FROZEN | JOBCTL_PTRACE_SIGNR);
+ WARN_ON_ONCE(current->jobctl & JOBCTL_TRACED);

/*
* Queued signals ignored us while we were stopped for tracing.
@@ -2433,6 +2440,7 @@ static bool do_signal_stop(int signr)
if (task_participate_group_stop(current))
notify = CLD_STOPPED;

+ current->jobctl |= JOBCTL_STOPPED;
set_special_state(TASK_STOPPED);
spin_unlock_irq(&current->sighand->siglock);

@@ -2454,6 +2462,8 @@ static bool do_signal_stop(int signr)
/* Now we don't run again until woken by SIGCONT or SIGKILL */
cgroup_enter_frozen();
freezable_schedule();
+
+ WARN_ON_ONCE(READ_ONCE(current->jobctl) & JOBCTL_STOPPED);
return true;
} else {
/*
--
2.35.3

2022-05-02 23:10:52

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH v2 01/12] signal: Rename send_signal send_signal_locked

Rename send_signal to send_signal_locked and make
it usable outside of signal.c.

Signed-off-by: "Eric W. Biederman" <[email protected]>
---
include/linux/signal.h | 2 ++
kernel/signal.c | 24 ++++++++++++------------
2 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/include/linux/signal.h b/include/linux/signal.h
index a6db6f2ae113..55605bdf5ce9 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -283,6 +283,8 @@ extern int do_send_sig_info(int sig, struct kernel_siginfo *info,
extern int group_send_sig_info(int sig, struct kernel_siginfo *info,
struct task_struct *p, enum pid_type type);
extern int __group_send_sig_info(int, struct kernel_siginfo *, struct task_struct *);
+extern int send_signal_locked(int sig, struct kernel_siginfo *info,
+ struct task_struct *p, enum pid_type type);
extern int sigprocmask(int, sigset_t *, sigset_t *);
extern void set_current_blocked(sigset_t *);
extern void __set_current_blocked(const sigset_t *);
diff --git a/kernel/signal.c b/kernel/signal.c
index 30cd1ca43bcd..b0403197b0ad 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1071,8 +1071,8 @@ static inline bool legacy_queue(struct sigpending *signals, int sig)
return (sig < SIGRTMIN) && sigismember(&signals->signal, sig);
}

-static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
- enum pid_type type, bool force)
+static int __send_signal_locked(int sig, struct kernel_siginfo *info,
+ struct task_struct *t, enum pid_type type, bool force)
{
struct sigpending *pending;
struct sigqueue *q;
@@ -1212,8 +1212,8 @@ static inline bool has_si_pid_and_uid(struct kernel_siginfo *info)
return ret;
}

-static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
- enum pid_type type)
+int send_signal_locked(int sig, struct kernel_siginfo *info,
+ struct task_struct *t, enum pid_type type)
{
/* Should SIGKILL or SIGSTOP be received by a pid namespace init? */
bool force = false;
@@ -1245,7 +1245,7 @@ static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct
force = true;
}
}
- return __send_signal(sig, info, t, type, force);
+ return __send_signal_locked(sig, info, t, type, force);
}

static void print_fatal_signal(int signr)
@@ -1284,7 +1284,7 @@ __setup("print-fatal-signals=", setup_print_fatal_signals);
int
__group_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p)
{
- return send_signal(sig, info, p, PIDTYPE_TGID);
+ return send_signal_locked(sig, info, p, PIDTYPE_TGID);
}

int do_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p,
@@ -1294,7 +1294,7 @@ int do_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p
int ret = -ESRCH;

if (lock_task_sighand(p, &flags)) {
- ret = send_signal(sig, info, p, type);
+ ret = send_signal_locked(sig, info, p, type);
unlock_task_sighand(p, &flags);
}

@@ -1347,7 +1347,7 @@ force_sig_info_to_task(struct kernel_siginfo *info, struct task_struct *t,
if (action->sa.sa_handler == SIG_DFL &&
(!t->ptrace || (handler == HANDLER_EXIT)))
t->signal->flags &= ~SIGNAL_UNKILLABLE;
- ret = send_signal(sig, info, t, PIDTYPE_PID);
+ ret = send_signal_locked(sig, info, t, PIDTYPE_PID);
spin_unlock_irqrestore(&t->sighand->siglock, flags);

return ret;
@@ -1567,7 +1567,7 @@ int kill_pid_usb_asyncio(int sig, int errno, sigval_t addr,

if (sig) {
if (lock_task_sighand(p, &flags)) {
- ret = __send_signal(sig, &info, p, PIDTYPE_TGID, false);
+ ret = __send_signal_locked(sig, &info, p, PIDTYPE_TGID, false);
unlock_task_sighand(p, &flags);
} else
ret = -ESRCH;
@@ -2103,7 +2103,7 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
* parent's namespaces.
*/
if (valid_signal(sig) && sig)
- __send_signal(sig, &info, tsk->parent, PIDTYPE_TGID, false);
+ __send_signal_locked(sig, &info, tsk->parent, PIDTYPE_TGID, false);
__wake_up_parent(tsk, tsk->parent);
spin_unlock_irqrestore(&psig->siglock, flags);

@@ -2601,7 +2601,7 @@ static int ptrace_signal(int signr, kernel_siginfo_t *info, enum pid_type type)
/* If the (new) signal is now blocked, requeue it. */
if (sigismember(&current->blocked, signr) ||
fatal_signal_pending(current)) {
- send_signal(signr, info, current, type);
+ send_signal_locked(signr, info, current, type);
signr = 0;
}

@@ -4793,7 +4793,7 @@ void kdb_send_sig(struct task_struct *t, int sig)
"the deadlock.\n");
return;
}
- ret = send_signal(sig, SEND_SIG_PRIV, t, PIDTYPE_PID);
+ ret = send_signal_locked(sig, SEND_SIG_PRIV, t, PIDTYPE_PID);
spin_unlock(&t->sighand->siglock);
if (ret)
kdb_printf("Fail to deliver Signal %d to process %d.\n",
--
2.35.3

2022-05-02 23:10:57

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH v2 07/12] ptrace: Don't change __state

Stop playing with tsk->__state to remove TASK_WAKEKILL while a ptrace
command is executing.

Instead remove TASK_WAKEKILL from the definition of TASK_TRACED, and
implement a new jobctl flag JOBCTL_PTRACE_FROZEN. This new
flag is set in ptrace_freeze_traced and cleared when ptrace_stop is
awoken or in ptrace_unfreeze_traced (when ptrace_stop remains asleep).

In signal_wake_up add __TASK_TRACED to state along with TASK_WAKEKILL
when the wakeup is for a fatal signal. Skip adding
__TASK_TRACED when JOBCTL_PTRACE_FROZEN is set. This has the same
effect as changing TASK_TRACED to __TASK_TRACED as all of the wake_ups
that use TASK_KILLABLE go through signal_wake_up.
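
The wake-mask rule can be sketched in plain userspace C, following the signal_wake_up hunk in this patch. This is only a model, not kernel code: the MODEL_* names and the helper functions are made up for illustration, and the state bit values are assumed to mirror the kernel headers of that era.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values, assumed to match the kernel headers of that era. */
#define MODEL_TASK_TRACED          0x0008u      /* stands in for __TASK_TRACED */
#define MODEL_TASK_WAKEKILL        0x0100u      /* stands in for TASK_WAKEKILL */
#define MODEL_JOBCTL_PTRACE_FROZEN (1UL << 24)  /* bit number taken from this patch */

/* Hypothetical helper: the wake mask signal_wake_up() would compute. */
static inline unsigned int model_wake_mask(unsigned long jobctl, bool resume)
{
	unsigned int state = 0;

	if (resume) {
		state = MODEL_TASK_WAKEKILL;
		/* A frozen tracee must not be woken, not even by SIGKILL. */
		if (!(jobctl & MODEL_JOBCTL_PTRACE_FROZEN))
			state |= MODEL_TASK_TRACED;
	}
	return state;
}

/* Hypothetical helper: a wakeup only takes effect if the mask overlaps the sleeper's state. */
static inline bool model_would_wake(unsigned int task_state, unsigned int mask)
{
	return (task_state & mask) != 0;
}

/* Self-check covering the two interesting cases. */
static inline void model_wake_mask_demo(void)
{
	/* Not frozen: a fatal signal wakes a task sleeping in TASK_TRACED. */
	assert(model_would_wake(MODEL_TASK_TRACED, model_wake_mask(0, true)));

	/* Frozen: the same wakeup misses, as TASK_TRACED no longer carries WAKEKILL. */
	assert(!model_would_wake(MODEL_TASK_TRACED,
				 model_wake_mask(MODEL_JOBCTL_PTRACE_FROZEN, true)));
}
```

Calling model_wake_mask_demo() from a small main() runs both cases. The design point is that the frozen/unfrozen distinction moves from the sleeper's state word into the waker's mask.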

Don't set TASK_TRACED if fatal_signal_pending so that the code
continues not to sleep if there was a pending fatal signal before
ptrace_stop is called. With TASK_WAKEKILL no longer present in
TASK_TRACED signal_pending_state will no longer prevent ptrace_stop
from sleeping if there is a pending fatal signal.

Previously the __state value of __TASK_TRACED was changed to
TASK_RUNNING when woken up or back to TASK_TRACED when the code was
left in ptrace_stop. Now when woken up ptrace_stop clears
JOBCTL_PTRACE_FROZEN and when left sleeping ptrace_unfreeze_traced
clears JOBCTL_PTRACE_FROZEN.

Signed-off-by: "Eric W. Biederman" <[email protected]>
---
include/linux/sched.h | 2 +-
include/linux/sched/jobctl.h | 2 ++
include/linux/sched/signal.h | 8 +++++++-
kernel/ptrace.c | 21 ++++++++-------------
kernel/signal.c | 9 +++------
5 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index d5e3c00b74e1..610f2fdb1e2c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -103,7 +103,7 @@ struct task_group;
/* Convenience macros for the sake of set_current_state: */
#define TASK_KILLABLE (TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
#define TASK_STOPPED (TASK_WAKEKILL | __TASK_STOPPED)
-#define TASK_TRACED (TASK_WAKEKILL | __TASK_TRACED)
+#define TASK_TRACED __TASK_TRACED

#define TASK_IDLE (TASK_UNINTERRUPTIBLE | TASK_NOLOAD)

diff --git a/include/linux/sched/jobctl.h b/include/linux/sched/jobctl.h
index fa067de9f1a9..d556c3425963 100644
--- a/include/linux/sched/jobctl.h
+++ b/include/linux/sched/jobctl.h
@@ -19,6 +19,7 @@ struct task_struct;
#define JOBCTL_TRAPPING_BIT 21 /* switching to TRACED */
#define JOBCTL_LISTENING_BIT 22 /* ptracer is listening for events */
#define JOBCTL_TRAP_FREEZE_BIT 23 /* trap for cgroup freezer */
+#define JOBCTL_PTRACE_FROZEN_BIT 24 /* frozen for ptrace */

#define JOBCTL_STOP_DEQUEUED (1UL << JOBCTL_STOP_DEQUEUED_BIT)
#define JOBCTL_STOP_PENDING (1UL << JOBCTL_STOP_PENDING_BIT)
@@ -28,6 +29,7 @@ struct task_struct;
#define JOBCTL_TRAPPING (1UL << JOBCTL_TRAPPING_BIT)
#define JOBCTL_LISTENING (1UL << JOBCTL_LISTENING_BIT)
#define JOBCTL_TRAP_FREEZE (1UL << JOBCTL_TRAP_FREEZE_BIT)
+#define JOBCTL_PTRACE_FROZEN (1UL << JOBCTL_PTRACE_FROZEN_BIT)

#define JOBCTL_TRAP_MASK (JOBCTL_TRAP_STOP | JOBCTL_TRAP_NOTIFY)
#define JOBCTL_PENDING_MASK (JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK)
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 3c8b34876744..35af34eeee9e 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -437,7 +437,13 @@ extern void signal_wake_up_state(struct task_struct *t, unsigned int state);

static inline void signal_wake_up(struct task_struct *t, bool resume)
{
- signal_wake_up_state(t, resume ? TASK_WAKEKILL : 0);
+ unsigned int state = 0;
+ if (resume) {
+ state = TASK_WAKEKILL;
+ if (!(t->jobctl & JOBCTL_PTRACE_FROZEN))
+ state |= __TASK_TRACED;
+ }
+ signal_wake_up_state(t, state);
}
static inline void ptrace_signal_wake_up(struct task_struct *t, bool resume)
{
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 43da5764b6f3..644eb7439d01 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -197,7 +197,7 @@ static bool ptrace_freeze_traced(struct task_struct *task)
spin_lock_irq(&task->sighand->siglock);
if (task_is_traced(task) && !looks_like_a_spurious_pid(task) &&
!__fatal_signal_pending(task)) {
- WRITE_ONCE(task->__state, __TASK_TRACED);
+ task->jobctl |= JOBCTL_PTRACE_FROZEN;
ret = true;
}
spin_unlock_irq(&task->sighand->siglock);
@@ -207,23 +207,19 @@ static bool ptrace_freeze_traced(struct task_struct *task)

static void ptrace_unfreeze_traced(struct task_struct *task)
{
- if (READ_ONCE(task->__state) != __TASK_TRACED)
- return;
-
- WARN_ON(!task->ptrace || task->parent != current);
+ unsigned long flags;

/*
- * PTRACE_LISTEN can allow ptrace_trap_notify to wake us up remotely.
- * Recheck state under the lock to close this race.
+ * The child may be awake and may have cleared
+ * JOBCTL_PTRACE_FROZEN (see ptrace_resume). The child will
+ * not set JOBCTL_PTRACE_FROZEN or enter __TASK_TRACED anew.
*/
- spin_lock_irq(&task->sighand->siglock);
- if (READ_ONCE(task->__state) == __TASK_TRACED) {
+ if (lock_task_sighand(task, &flags)) {
+ task->jobctl &= ~JOBCTL_PTRACE_FROZEN;
if (__fatal_signal_pending(task))
wake_up_state(task, __TASK_TRACED);
- else
- WRITE_ONCE(task->__state, TASK_TRACED);
+ unlock_task_sighand(task, &flags);
}
- spin_unlock_irq(&task->sighand->siglock);
}

/**
@@ -256,7 +252,6 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
*/
read_lock(&tasklist_lock);
if (child->ptrace && child->parent == current) {
- WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
/*
* child->sighand can't be NULL, release_task()
* does ptrace_unlink() before __exit_signal().
diff --git a/kernel/signal.c b/kernel/signal.c
index 3fd2ce133387..5cf268982a7e 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2209,11 +2209,8 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
spin_lock_irq(&current->sighand->siglock);
}

- /*
- * schedule() will not sleep if there is a pending signal that
- * can awaken the task.
- */
- set_special_state(TASK_TRACED);
+ if (!__fatal_signal_pending(current))
+ set_special_state(TASK_TRACED);

/*
* We're committing to trapping. TRACED should be visible before
@@ -2321,7 +2318,7 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
current->exit_code = 0;

/* LISTENING can be set only during STOP traps, clear it */
- current->jobctl &= ~JOBCTL_LISTENING;
+ current->jobctl &= ~(JOBCTL_LISTENING | JOBCTL_PTRACE_FROZEN);

/*
* Queued signals ignored us while we were stopped for tracing.
--
2.35.3

2022-05-02 23:21:24

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH v2 10/12] ptrace: Only return signr from ptrace_stop if it was provided

In ptrace_stop a ptrace_unlink or SIGKILL can occur either after
siglock is dropped or after tasklist_lock is dropped. At either point
the result can be that ptrace will continue and not stop at schedule.

This means that there are cases where the current logic fails to handle
the fact that ptrace_stop did not actually stop, and can potentially
cause ptrace_report_syscall to attempt to deliver a signal.

Instead of attempting to detect in ptrace_stop when it fails to
stop, update ptrace_resume and ptrace_detach to set a flag to indicate
that the signal to continue with has been set. Use that
new flag to decide how to set the return signal.
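
The resulting decision at the end of ptrace_stop can be sketched in plain userspace C. This is only a model, not kernel code: model_ptrace_stop_result, its parameters and the MODEL_* name are made up for illustration, and the numeric codes in the self-check are arbitrary.

```c
#include <assert.h>

#define MODEL_JOBCTL_PTRACE_SIGNR (1UL << 25)  /* bit number taken from this patch */

/*
 * Hypothetical helper modelling the tail of ptrace_stop():
 *   exit_code   - the value ptrace_stop() was called with
 *   stored_code - stands in for current->exit_code
 *   clear_code  - caller wants 0 when no signal was provided
 */
static inline int model_ptrace_stop_result(unsigned long jobctl, int exit_code,
					   int stored_code, int clear_code)
{
	/* Did the tracer provide a signal to resume with? */
	if (jobctl & MODEL_JOBCTL_PTRACE_SIGNR)
		return stored_code;
	if (clear_code)
		return 0;
	return exit_code;
}

/* Self-check covering the three outcomes. */
static inline void model_ptrace_stop_result_demo(void)
{
	/* Tracer set a signal: its value wins regardless of clear_code. */
	assert(model_ptrace_stop_result(MODEL_JOBCTL_PTRACE_SIGNR, 5, 10, 1) == 10);
	/* No signal provided and clear_code set: report 0. */
	assert(model_ptrace_stop_result(0, 5, 10, 1) == 0);
	/* No signal provided and clear_code unset: keep the incoming code. */
	assert(model_ptrace_stop_result(0, 5, 10, 0) == 5);
}
```

The flag makes the result depend on what the tracer actually did rather than on whether the tracee happened to sleep.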

Signed-off-by: "Eric W. Biederman" <[email protected]>
---
include/linux/sched/jobctl.h | 2 ++
kernel/ptrace.c | 5 +++++
kernel/signal.c | 12 ++++++------
3 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/include/linux/sched/jobctl.h b/include/linux/sched/jobctl.h
index d556c3425963..2ff1bcd63cf4 100644
--- a/include/linux/sched/jobctl.h
+++ b/include/linux/sched/jobctl.h
@@ -20,6 +20,7 @@ struct task_struct;
#define JOBCTL_LISTENING_BIT 22 /* ptracer is listening for events */
#define JOBCTL_TRAP_FREEZE_BIT 23 /* trap for cgroup freezer */
#define JOBCTL_PTRACE_FROZEN_BIT 24 /* frozen for ptrace */
+#define JOBCTL_PTRACE_SIGNR_BIT 25 /* ptrace signal number */

#define JOBCTL_STOP_DEQUEUED (1UL << JOBCTL_STOP_DEQUEUED_BIT)
#define JOBCTL_STOP_PENDING (1UL << JOBCTL_STOP_PENDING_BIT)
@@ -30,6 +31,7 @@ struct task_struct;
#define JOBCTL_LISTENING (1UL << JOBCTL_LISTENING_BIT)
#define JOBCTL_TRAP_FREEZE (1UL << JOBCTL_TRAP_FREEZE_BIT)
#define JOBCTL_PTRACE_FROZEN (1UL << JOBCTL_PTRACE_FROZEN_BIT)
+#define JOBCTL_PTRACE_SIGNR (1UL << JOBCTL_PTRACE_SIGNR_BIT)

#define JOBCTL_TRAP_MASK (JOBCTL_TRAP_STOP | JOBCTL_TRAP_NOTIFY)
#define JOBCTL_PENDING_MASK (JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK)
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index c1c99e8be147..d80222251f60 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -596,7 +596,11 @@ static int ptrace_detach(struct task_struct *child, unsigned int data)
* tasklist_lock avoids the race with wait_task_stopped(), see
* the comment in ptrace_resume().
*/
+ spin_lock(&child->sighand->siglock);
child->exit_code = data;
+ child->jobctl |= JOBCTL_PTRACE_SIGNR;
+ spin_unlock(&child->sighand->siglock);
+
__ptrace_detach(current, child);
write_unlock_irq(&tasklist_lock);

@@ -883,6 +887,7 @@ static int ptrace_resume(struct task_struct *child, long request,
*/
spin_lock_irq(&child->sighand->siglock);
child->exit_code = data;
+ child->jobctl |= JOBCTL_PTRACE_SIGNR;
wake_up_state(child, __TASK_TRACED);
spin_unlock_irq(&child->sighand->siglock);

diff --git a/kernel/signal.c b/kernel/signal.c
index 5cf268982a7e..7cb27a27290a 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2193,7 +2193,6 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
__acquires(&current->sighand->siglock)
{
bool gstop_done = false;
- bool read_code = true;

if (arch_ptrace_stop_needed()) {
/*
@@ -2299,9 +2298,6 @@ static int ptrace_stop(int exit_code, int why, int clear_code,

/* tasklist protects us from ptrace_freeze_traced() */
__set_current_state(TASK_RUNNING);
- read_code = false;
- if (clear_code)
- exit_code = 0;
read_unlock(&tasklist_lock);
}

@@ -2311,14 +2307,18 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
* any signal-sending on another CPU that wants to examine it.
*/
spin_lock_irq(&current->sighand->siglock);
- if (read_code)
+ /* Did userspace perhaps provide a signal to resume with? */
+ if (current->jobctl & JOBCTL_PTRACE_SIGNR)
exit_code = current->exit_code;
+ else if (clear_code)
+ exit_code = 0;
+
current->last_siginfo = NULL;
current->ptrace_message = 0;
current->exit_code = 0;

/* LISTENING can be set only during STOP traps, clear it */
- current->jobctl &= ~(JOBCTL_LISTENING | JOBCTL_PTRACE_FROZEN);
+ current->jobctl &= ~(JOBCTL_LISTENING | JOBCTL_PTRACE_FROZEN | JOBCTL_PTRACE_SIGNR);

/*
* Queued signals ignored us while we were stopped for tracing.
--
2.35.3

Subject: Re: [PATCH 0/12] ptrace: cleaning up ptrace_stop

On 2022-04-29 16:46:59 [-0500], Eric W. Biederman wrote:
>
> The states TASK_STOPPED and TASK_TRACED are special in that they cannot
> handle spurious wake-ups. This, plus actively depending upon and
> changing the value of tsk->__state, causes problems for PREEMPT_RT and
> Peter's freezer rewrite.

PREEMPT_RT wise, I had to duct tape wait_task_inactive() and remove the
preempt-disable section in ptrace_stop() (like previously). This reduces
the amount of __state + saved_state checks and looks otherwise stable in
light testing.

Sebastian

2022-05-03 00:00:00

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH v2 07/12] ptrace: Don't change __state

On 04/29, Eric W. Biederman wrote:
>
> Stop playing with tsk->__state to remove TASK_WAKEKILL while a ptrace
> command is executing.

Eric, I'll read this patch and the rest of this series tomorrow.
Somehow I failed to force myself to read yet another version after
weekend ;)

plus I don't really understand this one...

> #define TASK_KILLABLE (TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
> #define TASK_STOPPED (TASK_WAKEKILL | __TASK_STOPPED)
> -#define TASK_TRACED (TASK_WAKEKILL | __TASK_TRACED)
> +#define TASK_TRACED __TASK_TRACED
...
> static inline void signal_wake_up(struct task_struct *t, bool resume)
> {
> - signal_wake_up_state(t, resume ? TASK_WAKEKILL : 0);
> + unsigned int state = 0;
> + if (resume) {
> + state = TASK_WAKEKILL;
> + if (!(t->jobctl & JOBCTL_PTRACE_FROZEN))
> + state |= __TASK_TRACED;
> + }
> + signal_wake_up_state(t, state);

Can't understand why is this better than the previous version which removed
TASK_WAKEKILL if resume... Looks a bit strange to me. But again, I didn't
look at the next patches yet.

> @@ -2209,11 +2209,8 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
> spin_lock_irq(&current->sighand->siglock);
> }
>
> - /*
> - * schedule() will not sleep if there is a pending signal that
> - * can awaken the task.
> - */
> - set_special_state(TASK_TRACED);
> + if (!__fatal_signal_pending(current))
> + set_special_state(TASK_TRACED);

This is where I stuck. This probably makes sense, but what does it buy
for this particular patch?

And if we check __fatal_signal_pending(), why can't ptrace_stop() simply
return ?

Oleg.

2022-05-03 00:04:02

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH v2 07/12] ptrace: Don't change __state

Oleg Nesterov <[email protected]> writes:

> On 04/29, Eric W. Biederman wrote:
>>
>> Stop playing with tsk->__state to remove TASK_WAKEKILL while a ptrace
>> command is executing.
>
> Eric, I'll read this patch and the rest of this series tomorrow.
> Somehow I failed to force myself to read yet another version after
> weekend ;)

That is quite alright.

> plus I don't really understand this one...
>
>> #define TASK_KILLABLE (TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
>> #define TASK_STOPPED (TASK_WAKEKILL | __TASK_STOPPED)
>> -#define TASK_TRACED (TASK_WAKEKILL | __TASK_TRACED)
>> +#define TASK_TRACED __TASK_TRACED
> ...
>> static inline void signal_wake_up(struct task_struct *t, bool resume)
>> {
>> - signal_wake_up_state(t, resume ? TASK_WAKEKILL : 0);
>> + unsigned int state = 0;
>> + if (resume) {
>> + state = TASK_WAKEKILL;
>> + if (!(t->jobctl & JOBCTL_PTRACE_FROZEN))
>> + state |= __TASK_TRACED;
>> + }
>> + signal_wake_up_state(t, state);
>
> Can't understand why is this better than the previous version which removed
> TASK_WAKEKILL if resume... Looks a bit strange to me. But again, I didn't
> look at the next patches yet.

The goal is to replace the existing mechanism with an equivalent one,
so that we don't have to be clever and deal with it being slightly
different in one case.

The difference is in how signal_pending_state affects whether schedule
will sleep in ptrace_stop.

As the patch is currently constructed (and as the existing code works),
signal_pending_state will always let schedule sleep if
ptrace_freeze_traced completes successfully.

When TASK_WAKEKILL was included in TASK_TRACED, schedule might refuse
to sleep even though ptrace_freeze_traced completed successfully. As
you pointed out, wait_task_inactive would then fail, keeping
ptrace_check_attach from succeeding.

Other than complicating the analysis by adding extra states we need to
consider when reviewing the patch, the practical difference is that for
Peter's plans to fix PREEMPT_RT or the freezer, wait_task_inactive needs
to cope with the final state being changed by something else (TASK_FROZEN
in the freezer case). I can only see that happening by removing the
dependency on the final state in wait_task_inactive, which we can't do
if we depend on wait_task_inactive failing when the process is in the
wrong state.


>> @@ -2209,11 +2209,8 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
>> spin_lock_irq(&current->sighand->siglock);
>> }
>>
>> - /*
>> - * schedule() will not sleep if there is a pending signal that
>> - * can awaken the task.
>> - */
>> - set_special_state(TASK_TRACED);
>> + if (!__fatal_signal_pending(current))
>> + set_special_state(TASK_TRACED);
>
> This is where I stuck. This probably makes sense, but what does it buy
> for this particular patch?
>
> And if we check __fatal_signal_pending(), why can't ptrace_stop() simply
> return ?

Again this is about preserving existing behavior as much as possible to
simplify analysis of the patch.

The current code depends upon schedule not sleeping if there was a fatal
signal received before ptrace_stop is called. With TASK_WAKEKILL
removed from TASK_TRACED that no longer happens. Just not setting
TASK_TRACED when __fatal_signal_pending is true has the same effect.
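
The equivalence claimed above can be checked with a small user-space
model. This is only a sketch: struct model_task and the model_* helpers
are hypothetical stand-ins, the state bit values are assumed to match
include/linux/sched.h of that era, and a fatal signal is assumed to
always come with the pending-signal flag set.

```c
#include <stdbool.h>

/* State bits; the values are assumptions of this model. */
#define TASK_RUNNING        0x0000u
#define TASK_INTERRUPTIBLE  0x0001u
#define __TASK_TRACED       0x0008u
#define TASK_WAKEKILL       0x0100u

/* Hypothetical stand-in for the signal-related parts of a task. */
struct model_task {
	bool sigpending;     /* models TIF_SIGPENDING */
	bool fatal_pending;  /* models __fatal_signal_pending() */
};

/* Same shape as the kernel's signal_pending_state(). */
static bool model_signal_pending_state(unsigned int state,
				       const struct model_task *p)
{
	if (!(state & (TASK_INTERRUPTIBLE | TASK_WAKEKILL)))
		return false;
	if (!p->sigpending)
		return false;
	return (state & TASK_INTERRUPTIBLE) || p->fatal_pending;
}

/* schedule() blocks only a non-running task with no qualifying signal. */
static bool model_would_sleep(unsigned int state, const struct model_task *p)
{
	return state != TASK_RUNNING && !model_signal_pending_state(state, p);
}

/* Old scheme: TASK_TRACED contains TASK_WAKEKILL and is always set. */
static bool old_ptrace_stop_sleeps(const struct model_task *p)
{
	return model_would_sleep(TASK_WAKEKILL | __TASK_TRACED, p);
}

/* New scheme: plain __TASK_TRACED, set only when no fatal signal is pending. */
static bool new_ptrace_stop_sleeps(const struct model_task *p)
{
	unsigned int state = TASK_RUNNING;

	if (!p->fatal_pending)
		state = __TASK_TRACED;
	return model_would_sleep(state, p);
}
```

In this model both schemes sleep unless a fatal signal is already
pending, which is the behavior the patch tries to preserve.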


At a practical level I think it also has an impact on patch:
"10/12 ptrace: Only return signr from ptrace_stop if it was provided".

At a minimum the code would need to do something like:
if (__fatal_signal_pending(current)) {
return clear_code ? 0 : exit_code;
}

A little bit of care would then be needed to ensure that every time the
logic changes, that early return changes too. I think that just
complicates things unnecessarily.

Eric



2022-05-03 00:04:27

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH v2 03/12] ptrace/um: Replace PT_DTRACE with TIF_SINGLESTEP

User mode linux is the last user of the PT_DTRACE flag. Using the flag to indicate
single stepping is a little confusing and, worse, changing tsk->ptrace without locking
could potentially cause problems.

So use a thread info flag with a better name instead of a flag in tsk->ptrace.

Remove the definition of PT_DTRACE as uml is the last user.
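
As a rough user-space illustration of why a thread info flag is safer
than a plain read-modify-write on tsk->ptrace, the sketch below models
the flag word with C11 atomics. struct model_thread_info and the
model_* helpers are hypothetical; only the TIF_SINGLESTEP bit number
comes from the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define TIF_SINGLESTEP   10                       /* bit number from the patch */
#define _TIF_SINGLESTEP  (1UL << TIF_SINGLESTEP)

/* Hypothetical stand-in for the flags word of struct thread_info. */
struct model_thread_info {
	atomic_ulong flags;
};

/* Atomic OR: concurrent updates to other bits are not lost. */
static void model_set_flag(struct model_thread_info *ti, int bit)
{
	atomic_fetch_or(&ti->flags, 1UL << bit);
}

/* Atomic AND with the inverted mask clears just the one bit. */
static void model_clear_flag(struct model_thread_info *ti, int bit)
{
	atomic_fetch_and(&ti->flags, ~(1UL << bit));
}

static bool model_test_flag(struct model_thread_info *ti, int bit)
{
	return (atomic_load(&ti->flags) >> bit) & 1UL;
}
```

A plain "ptrace |= PT_DTRACE" is a non-atomic load, modify, store
sequence, so two unsynchronized writers can overwrite each other's
bits; the atomic helpers above do not have that problem.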

Cc: [email protected]
Acked-by: Johannes Berg <[email protected]>
Signed-off-by: "Eric W. Biederman" <[email protected]>
---
arch/um/include/asm/thread_info.h | 2 ++
arch/um/kernel/exec.c | 2 +-
arch/um/kernel/process.c | 2 +-
arch/um/kernel/ptrace.c | 8 ++++----
arch/um/kernel/signal.c | 4 ++--
include/linux/ptrace.h | 1 -
6 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/arch/um/include/asm/thread_info.h b/arch/um/include/asm/thread_info.h
index 1395cbd7e340..c7b4b49826a2 100644
--- a/arch/um/include/asm/thread_info.h
+++ b/arch/um/include/asm/thread_info.h
@@ -60,6 +60,7 @@ static inline struct thread_info *current_thread_info(void)
#define TIF_RESTORE_SIGMASK 7
#define TIF_NOTIFY_RESUME 8
#define TIF_SECCOMP 9 /* secure computing */
+#define TIF_SINGLESTEP 10 /* single stepping userspace */

#define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE)
#define _TIF_SIGPENDING (1 << TIF_SIGPENDING)
@@ -68,5 +69,6 @@ static inline struct thread_info *current_thread_info(void)
#define _TIF_MEMDIE (1 << TIF_MEMDIE)
#define _TIF_SYSCALL_AUDIT (1 << TIF_SYSCALL_AUDIT)
#define _TIF_SECCOMP (1 << TIF_SECCOMP)
+#define _TIF_SINGLESTEP (1 << TIF_SINGLESTEP)

#endif
diff --git a/arch/um/kernel/exec.c b/arch/um/kernel/exec.c
index c85e40c72779..58938d75871a 100644
--- a/arch/um/kernel/exec.c
+++ b/arch/um/kernel/exec.c
@@ -43,7 +43,7 @@ void start_thread(struct pt_regs *regs, unsigned long eip, unsigned long esp)
{
PT_REGS_IP(regs) = eip;
PT_REGS_SP(regs) = esp;
- current->ptrace &= ~PT_DTRACE;
+ clear_thread_flag(TIF_SINGLESTEP);
#ifdef SUBARCH_EXECVE1
SUBARCH_EXECVE1(regs->regs);
#endif
diff --git a/arch/um/kernel/process.c b/arch/um/kernel/process.c
index 80504680be08..88c5c7844281 100644
--- a/arch/um/kernel/process.c
+++ b/arch/um/kernel/process.c
@@ -335,7 +335,7 @@ int singlestepping(void * t)
{
struct task_struct *task = t ? t : current;

- if (!(task->ptrace & PT_DTRACE))
+ if (!test_thread_flag(TIF_SINGLESTEP))
return 0;

if (task->thread.singlestep_syscall)
diff --git a/arch/um/kernel/ptrace.c b/arch/um/kernel/ptrace.c
index bfaf6ab1ac03..5154b27de580 100644
--- a/arch/um/kernel/ptrace.c
+++ b/arch/um/kernel/ptrace.c
@@ -11,7 +11,7 @@

void user_enable_single_step(struct task_struct *child)
{
- child->ptrace |= PT_DTRACE;
+ set_tsk_thread_flag(child, TIF_SINGLESTEP);
child->thread.singlestep_syscall = 0;

#ifdef SUBARCH_SET_SINGLESTEPPING
@@ -21,7 +21,7 @@ void user_enable_single_step(struct task_struct *child)

void user_disable_single_step(struct task_struct *child)
{
- child->ptrace &= ~PT_DTRACE;
+ clear_tsk_thread_flag(child, TIF_SINGLESTEP);
child->thread.singlestep_syscall = 0;

#ifdef SUBARCH_SET_SINGLESTEPPING
@@ -120,7 +120,7 @@ static void send_sigtrap(struct uml_pt_regs *regs, int error_code)
}

/*
- * XXX Check PT_DTRACE vs TIF_SINGLESTEP for singlestepping check and
+ * XXX Check TIF_SINGLESTEP for singlestepping check and
* PT_PTRACED vs TIF_SYSCALL_TRACE for syscall tracing check
*/
int syscall_trace_enter(struct pt_regs *regs)
@@ -144,7 +144,7 @@ void syscall_trace_leave(struct pt_regs *regs)
audit_syscall_exit(regs);

/* Fake a debug trap */
- if (ptraced & PT_DTRACE)
+ if (test_thread_flag(TIF_SINGLESTEP))
send_sigtrap(&regs->regs, 0);

if (!test_thread_flag(TIF_SYSCALL_TRACE))
diff --git a/arch/um/kernel/signal.c b/arch/um/kernel/signal.c
index 88cd9b5c1b74..ae4658f576ab 100644
--- a/arch/um/kernel/signal.c
+++ b/arch/um/kernel/signal.c
@@ -53,7 +53,7 @@ static void handle_signal(struct ksignal *ksig, struct pt_regs *regs)
unsigned long sp;
int err;

- if ((current->ptrace & PT_DTRACE) && (current->ptrace & PT_PTRACED))
+ if (test_thread_flag(TIF_SINGLESTEP) && (current->ptrace & PT_PTRACED))
singlestep = 1;

/* Did we come from a system call? */
@@ -128,7 +128,7 @@ void do_signal(struct pt_regs *regs)
* on the host. The tracing thread will check this flag and
* PTRACE_SYSCALL if necessary.
*/
- if (current->ptrace & PT_DTRACE)
+ if (test_thread_flag(TIF_SINGLESTEP))
current->thread.singlestep_syscall =
is_syscall(PT_REGS_IP(&current->thread.regs));

diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 15b3d176b6b4..4c06f9f8ef3f 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -30,7 +30,6 @@ extern int ptrace_access_vm(struct task_struct *tsk, unsigned long addr,

#define PT_SEIZED 0x00010000 /* SEIZE used, enable new behavior */
#define PT_PTRACED 0x00000001
-#define PT_DTRACE 0x00000002 /* delayed trace (used on m68k, i386) */

#define PT_OPT_FLAG_SHIFT 3
/* PT_TRACE_* event enable flags */
--
2.35.3

2022-05-03 00:11:00

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH v2 07/12] ptrace: Don't change __state

On 04/29, Eric W. Biederman wrote:
>
> static void ptrace_unfreeze_traced(struct task_struct *task)
> {
> - if (READ_ONCE(task->__state) != __TASK_TRACED)
> - return;
> -
> - WARN_ON(!task->ptrace || task->parent != current);
> + unsigned long flags;
>
> /*
> - * PTRACE_LISTEN can allow ptrace_trap_notify to wake us up remotely.
> - * Recheck state under the lock to close this race.
> + * The child may be awake and may have cleared
> + * JOBCTL_PTRACE_FROZEN (see ptrace_resume). The child will
> + * not set JOBCTL_PTRACE_FROZEN or enter __TASK_TRACED anew.
> */
> - spin_lock_irq(&task->sighand->siglock);
> - if (READ_ONCE(task->__state) == __TASK_TRACED) {
> + if (lock_task_sighand(task, &flags)) {
> + task->jobctl &= ~JOBCTL_PTRACE_FROZEN;

Well, I think that the fast-path

if (!(task->jobctl & JOBCTL_PTRACE_FROZEN))
return;

at the start makes sense, we can avoid lock_task_sighand() if the tracee
was resumed.
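
A minimal sketch of that fast path, as a user-space model. struct
model_task, the lock counter, and the model_* helpers are hypothetical,
and the bit position of JOBCTL_PTRACE_FROZEN is assumed.

```c
#include <stdbool.h>

#define JOBCTL_PTRACE_FROZEN (1UL << 24)  /* bit position is an assumption */

/* Hypothetical stand-in for the parts of task_struct used here. */
struct model_task {
	unsigned long jobctl;
	unsigned int lock_acquisitions;  /* counts simulated siglock round trips */
};

/* Models lock_task_sighand(); always succeeds in this sketch. */
static bool model_lock_task_sighand(struct model_task *t)
{
	t->lock_acquisitions++;
	return true;
}

static void model_unlock_task_sighand(struct model_task *t)
{
	(void)t;
}

static void model_ptrace_unfreeze_traced(struct model_task *t)
{
	/* Fast path: the tracee was resumed and already cleared the flag. */
	if (!(t->jobctl & JOBCTL_PTRACE_FROZEN))
		return;

	if (model_lock_task_sighand(t)) {
		t->jobctl &= ~JOBCTL_PTRACE_FROZEN;
		model_unlock_task_sighand(t);
	}
}
```

With the early return, a tracee that was already resumed costs no lock
acquisition at all in this model.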

Oleg.

Subject: Re: [PATCH v2 01/12] signal: Rename send_signal send_signal_locked

On 2022-04-29 16:48:26 [-0500], Eric W. Biederman wrote:
> Rename send_signal send_signal_locked and make to make

s@to make@@

> it usable outside of signal.c.
>
> Signed-off-by: "Eric W. Biederman" <[email protected]>

Sebastian

2022-05-03 00:43:57

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH v2 06/12] ptrace: Reimplement PTRACE_KILL by always sending SIGKILL

Call send_sig_info in PTRACE_KILL instead of ptrace_resume.
ptrace_resume is not safe to call if the task has not been stopped
with ptrace_freeze_traced.

Cc: [email protected]
Reported-by: Al Viro <[email protected]>
Suggested-by: Al Viro <[email protected]>
Signed-off-by: "Eric W. Biederman" <[email protected]>
---
kernel/ptrace.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index ccc4b465775b..43da5764b6f3 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -1238,7 +1238,7 @@ int ptrace_request(struct task_struct *child, long request,
case PTRACE_KILL:
if (child->exit_state) /* already dead */
return 0;
- return ptrace_resume(child, request, SIGKILL);
+ return send_sig_info(SIGKILL, SEND_SIG_NOINFO, child);

#ifdef CONFIG_HAVE_ARCH_TRACEHOOK
case PTRACE_GETREGSET:
--
2.35.3

Subject: Re: [PATCH v2 12/12] sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state

On 2022-04-29 16:48:37 [-0500], Eric W. Biederman wrote:

Needs
From: Peter Zijlstra (Intel) <[email protected]>

at the top.

> Currently ptrace_stop() / do_signal_stop() rely on the special states
> TASK_TRACED and TASK_STOPPED resp. to keep unique state. That is, this
> state exists only in task->__state and nowhere else.
>
> There's two spots of bother with this:
>
> - PREEMPT_RT has task->saved_state which complicates matters,
> meaning task_is_{traced,stopped}() needs to check an additional
> variable.
>
> - An alternative freezer implementation that itself relies on a
> special TASK state would lose TASK_TRACED/TASK_STOPPED and will
> result in misbehaviour.
>
> As such, add additional state to task->jobctl to track this state
> outside of task->__state.
>
> NOTE: this doesn't actually fix anything yet, just adds extra state.
>
> --EWB
> * didn't add an unnecessary newline in signal.h
> * Update t->jobctl in signal_wake_up and ptrace_signal_wake_up
> instead of in signal_wake_up_state. This prevents the clearing
> of TASK_STOPPED and TASK_TRACED from getting lost.
> * Added warnings if JOBCTL_STOPPED or JOBCTL_TRACED are not cleared
>
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> Link: https://lkml.kernel.org/r/[email protected]
> Signed-off-by: Eric W. Biederman <[email protected]>

Sebastian

Subject: Re: [PATCH v2 10/12] ptrace: Only return signr from ptrace_stop if it was provided

On 2022-04-29 16:48:35 [-0500], Eric W. Biederman wrote:
> In ptrace_stop a ptrace_unlink or SIGKILL can occur either after
> siglock is dropped or after tasklist_lock is dropped. At either point
> the result can be that ptrace will continue and not stop at schedule.
>
> This means that there are cases where the current logic fails to handle
> the fact that ptrace_stop did not actually stop, and can potentially
> cause ptrace_report_syscall to attempt to deliver a signal.
>
> Instead of attempting to detect in ptrace_stop when it fails to
> stop update ptrace_resume and ptrace_detach to set a flag to indicate
,
> that the signal to continue with has be set. Use that
been
> new flag to decided how to set return signal.
>
> Signed-off-by: "Eric W. Biederman" <[email protected]>

Sebastian

Subject: Re: [PATCH v2 07/12] ptrace: Don't change __state

On 2022-04-29 16:48:32 [-0500], Eric W. Biederman wrote:
> Stop playing with tsk->__state to remove TASK_WAKEKILL while a ptrace
> command is executing.
>
> Instead TASK_WAKEKILL from the definition of TASK_TRACED, and
> implemention a new jobctl flag TASK_PTRACE_FROZEN. This new This new

Instead adding TASK_WAKEKILL to the definition of TASK_TRACED, implement
a new jobctl flag TASK_PTRACE_FROZEN for this. This new

> flag is set in jobctl_freeze_task and cleared when ptrace_stop is
> awoken or in jobctl_unfreeze_task (when ptrace_stop remains asleep).
>
> In singal_wake_up add __TASK_TRACED to state along with TASK_WAKEKILL
signal_wake_up

> when it is indicated a fatal signal is pending. Skip adding
+that ?

> __TASK_TRACED when TASK_PTRACE_FROZEN is set. This has the same
> effect as changing TASK_TRACED to __TASK_TRACED as all of the wake_ups
,
> that use TASK_KILLABLE go through signal_wake_up.
,

> Don't set TASK_TRACED if fatal_signal_pending so that the code
> continues not to sleep if there was a pending fatal signal before
> ptrace_stop is called. With TASK_WAKEKILL no longer present in
> TASK_TRACED signal_pending_state will no longer prevent ptrace_stop
> from sleeping if there is a pending fatal signal.
>
> Previously the __state value of __TASK_TRACED was changed to
> TASK_RUNNING when woken up or back to TASK_TRACED when the code was
> left in ptrace_stop. Now when woken up ptrace_stop clears
> JOBCTL_PTRACE_FROZEN and when left sleeping ptrace_unfreeze_traced
> clears JOBCTL_PTRACE_FROZEN.
>
> Signed-off-by: "Eric W. Biederman" <[email protected]>

Sebastian

2022-05-03 01:15:49

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH v2 06/12] ptrace: Reimplement PTRACE_KILL by always sending SIGKILL

On 04/29, Eric W. Biederman wrote:
>
> Call send_sig_info in PTRACE_KILL instead of ptrace_resume.
> ptrace_resume is not safe to call if the task has not been stopped
> with ptrace_freeze_traced.

Oh, I was never, never able to understand why do we have PTRACE_KILL
and what should it actually do.

I suggested many times to simply remove it but OK, we probably can't
do this.

> --- a/kernel/ptrace.c
> +++ b/kernel/ptrace.c
> @@ -1238,7 +1238,7 @@ int ptrace_request(struct task_struct *child, long request,
> case PTRACE_KILL:
> if (child->exit_state) /* already dead */
> return 0;
> - return ptrace_resume(child, request, SIGKILL);
> + return send_sig_info(SIGKILL, SEND_SIG_NOINFO, child);

Note that currently ptrace(PTRACE_KILL) can never fail (yes, yes, it
is unsafe), but send_sig_info() can. If we do not remove PTRACE_KILL,
then I'd suggest

case PTRACE_KILL:
if (!child->exit_state)
send_sig_info(SIGKILL);
return 0;

to make this change a bit more compatible.

Also, please remove the note about PTRACE_KILL in set_task_blockstep().

Oleg.

2022-05-03 01:16:30

by Eric W. Biederman

[permalink] [raw]
Subject: [PATCH v2 05/12] signal: Use lockdep_assert_held instead of assert_spin_locked

The distinction is that assert_spin_locked() checks if the lock is
held *by*anyone* whereas lockdep_assert_held() asserts the current
context holds the lock. Also, the check goes away if you build
without lockdep.
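
The distinction can be illustrated with a toy lock that records its
owner. This is only a sketch: struct model_lock and the helpers are
hypothetical, and a real spinlock does not track its owner without
debugging support, which is why only lockdep can answer the second
question.

```c
#include <stdbool.h>

#define NO_OWNER (-1)

/* Toy lock that remembers which context holds it. */
struct model_lock {
	bool locked;
	int owner;  /* id of the holding context, NO_OWNER when free */
};

static void model_lock_acquire(struct model_lock *l, int ctx)
{
	l->locked = true;
	l->owner = ctx;
}

static void model_lock_release(struct model_lock *l)
{
	l->locked = false;
	l->owner = NO_OWNER;
}

/* What assert_spin_locked() can answer: is the lock held at all? */
static bool held_by_anyone(const struct model_lock *l)
{
	return l->locked;
}

/* What lockdep_assert_held() answers: does this context hold it? */
static bool held_by_context(const struct model_lock *l, int ctx)
{
	return l->locked && l->owner == ctx;
}
```

If context 2 holds the lock and context 1 runs the check, the first
predicate passes while the second fails, which is exactly the kind of
bug the weaker assertion cannot catch.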

Suggested-by: Peter Zijlstra <[email protected]>
Link: https://lkml.kernel.org/r/Ympr/+PX4XgT/[email protected]
Signed-off-by: "Eric W. Biederman" <[email protected]>
---
kernel/signal.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 72d96614effc..3fd2ce133387 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -884,7 +884,7 @@ static int check_kill_permission(int sig, struct kernel_siginfo *info,
static void ptrace_trap_notify(struct task_struct *t)
{
WARN_ON_ONCE(!(t->ptrace & PT_SEIZED));
- assert_spin_locked(&t->sighand->siglock);
+ lockdep_assert_held(&t->sighand->siglock);

task_set_jobctl_pending(t, JOBCTL_TRAP_NOTIFY);
ptrace_signal_wake_up(t, t->jobctl & JOBCTL_LISTENING);
@@ -1079,7 +1079,7 @@ static int __send_signal_locked(int sig, struct kernel_siginfo *info,
int override_rlimit;
int ret = 0, result;

- assert_spin_locked(&t->sighand->siglock);
+ lockdep_assert_held(&t->sighand->siglock);

result = TRACE_SIGNAL_IGNORED;
if (!prepare_signal(sig, t, force))
--
2.35.3

2022-05-03 19:29:59

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH v2 07/12] ptrace: Don't change __state

On 05/02, Eric W. Biederman wrote:
>
> Oleg Nesterov <[email protected]> writes:
>
> >> #define TASK_KILLABLE (TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
> >> #define TASK_STOPPED (TASK_WAKEKILL | __TASK_STOPPED)
> >> -#define TASK_TRACED (TASK_WAKEKILL | __TASK_TRACED)
> >> +#define TASK_TRACED __TASK_TRACED
> > ...
> >> static inline void signal_wake_up(struct task_struct *t, bool resume)
> >> {
> >> - signal_wake_up_state(t, resume ? TASK_WAKEKILL : 0);
> >> + unsigned int state = 0;
> >> + if (resume) {
> >> + state = TASK_WAKEKILL;
> >> + if (!(t->jobctl & JOBCTL_PTRACE_FROZEN))
> >> + state |= __TASK_TRACED;
> >> + }
> >> + signal_wake_up_state(t, state);
> >
> > Can't understand why is this better than the previous version which removed
> > TASK_WAKEKILL if resume... Looks a bit strange to me. But again, I didn't
> > look at the next patches yet.
>
> The goal is to replace the existing mechanism with an equivalent one,
> so that we don't have to be clever and deal with it being slightly
> different in one case.
>
> The difference is how does signal_pending_state affect how schedule will
> sleep in ptrace_stop.

But why is it bad if the tracee doesn't sleep in schedule ? If it races
with SIGKILL. I still can't understand this.

Yes, wait_task_inactive() can fail, so you need to remove WARN_ON_ONCE()
in 11/12.

Why is removing TASK_WAKEKILL from TASK_TRACED and complicating
*signal_wake_up() better?

And even if we need to ensure the tracee will always block after
ptrace_freeze_traced(), we can change signal_pending_state() to
return false if JOBCTL_PTRACE_FROZEN. Much simpler, imo. But still
looks unnecessary to me.



> Peter's plans to fix PREEMPT_RT or the freezer wait_task_inactive needs
> to cope with the final state being changed by something else. (TASK_FROZEN in
> the freezer case). I can only see that happening by removing the
> dependency on the final state in wait_task_inactive. Which we can't do
> if we depend on wait_task_inactive failing if the process is in the
> wrong state.

OK, I guess this is what I do not understand. Could you spell please?

And speaking of RT, wait_task_inactive() still can fail because
cgroup_enter_frozen() takes css_set_lock? And it is called under
preempt_disable() ? I don't understand the plan :/

> At a practical level I think it also has an impact on patch:
> "10/12 ptrace: Only return signr from ptrace_stop if it was provided".

I didn't look at JOBCTL_PTRACE_SIGNR yet. But this looks minor to me,
I mean, I am not sure it worth the trouble.

Oleg.

2022-05-03 22:37:16

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH v2 07/12] ptrace: Don't change __state

Oleg Nesterov <[email protected]> writes:

> On 05/02, Eric W. Biederman wrote:
>>
>> Oleg Nesterov <[email protected]> writes:
>>
>> >> #define TASK_KILLABLE (TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
>> >> #define TASK_STOPPED (TASK_WAKEKILL | __TASK_STOPPED)
>> >> -#define TASK_TRACED (TASK_WAKEKILL | __TASK_TRACED)
>> >> +#define TASK_TRACED __TASK_TRACED
>> > ...
>> >> static inline void signal_wake_up(struct task_struct *t, bool resume)
>> >> {
>> >> - signal_wake_up_state(t, resume ? TASK_WAKEKILL : 0);
>> >> + unsigned int state = 0;
>> >> + if (resume) {
>> >> + state = TASK_WAKEKILL;
>> >> + if (!(t->jobctl & JOBCTL_PTRACE_FROZEN))
>> >> + state |= __TASK_TRACED;
>> >> + }
>> >> + signal_wake_up_state(t, state);
>> >
>> > Can't understand why is this better than the previous version which removed
>> > TASK_WAKEKILL if resume... Looks a bit strange to me. But again, I didn't
>> > look at the next patches yet.
>>
>> The goal is to replace the existing mechanism with an equivalent one,
>> so that we don't have to be clever and deal with it being slightly
>> different in one case.
>>
>> The difference is how does signal_pending_state affect how schedule will
>> sleep in ptrace_stop.
>
> But why is it bad if the tracee doesn't sleep in schedule ? If it races
> with SIGKILL. I still can't understand this.
>
> Yes, wait_task_inactive() can fail, so you need to remove WARN_ON_ONCE()
> in 11/12.


>
> Why is removing TASK_WAKEKILL from TASK_TRACED and complicating
> *signal_wake_up() better?

Not changing __state is better because it removes special cases
from the scheduler that only apply to ptrace.


> And even if we need to ensure the tracee will always block after
> ptrace_freeze_traced(), we can change signal_pending_state() to
> return false if JOBCTL_PTRACE_FROZEN. Much simpler, imo. But still
> looks unnecessary to me.

We still need to change signal_wake_up in that case. Possibly
signal_wake_up_state. The choice is whether, for fatal signals,
TASK_WAKEKILL is suppressed or TASK_TRACED is added.

With TASK_WAKEKILL removed, the resulting code behaves in a very obvious
way with minimal special casing. Yes, there is a special case in
signal_wake_up, but that is the entirety of the special case and it is
easy to read and see what it does.
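
For illustration, the special case in signal_wake_up can be modeled in
user space as below. The wake mask logic is copied from the quoted
patch; model_wakes() is a hypothetical simplification of the wakeup
check, and the bit values are assumptions of this sketch.

```c
#include <stdbool.h>

#define TASK_INTERRUPTIBLE   0x0001u
#define __TASK_TRACED        0x0008u
#define TASK_WAKEKILL        0x0100u
#define JOBCTL_PTRACE_FROZEN (1UL << 24)  /* bit position is an assumption */

/* Mirrors the wake mask computed in the patched signal_wake_up(). */
static unsigned int model_wake_mask(unsigned long jobctl, bool resume)
{
	unsigned int state = 0;

	if (resume) {
		state = TASK_WAKEKILL;
		if (!(jobctl & JOBCTL_PTRACE_FROZEN))
			state |= __TASK_TRACED;
	}
	return state;
}

/*
 * A wakeup takes effect only when the mask overlaps the sleeper's
 * state. TASK_INTERRUPTIBLE is ORed in, as signal_wake_up_state() does.
 */
static bool model_wakes(unsigned int task_state, unsigned int mask)
{
	return (task_state & (mask | TASK_INTERRUPTIBLE)) != 0;
}
```

In this model a fatal signal wakes a task sleeping in __TASK_TRACED
unless JOBCTL_PTRACE_FROZEN is set, and a non-fatal signal never does.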

>> Peter's plans to fix PREEMPT_RT or the freezer wait_task_inactive needs
>> to cope with the final state being changed by something else. (TASK_FROZEN in
>> the freezer case). I can only see that happening by removing the
>> dependency on the final state in wait_task_inactive. Which we can't do
>> if we depend on wait_task_inactive failing if the process is in the
>> wrong state.
>
> OK, I guess this is what I do not understand. Could you spell please?
>
> And speaking of RT, wait_task_inactive() still can fail because
> cgroup_enter_frozen() takes css_set_lock? And it is called under
> preempt_disable() ? I don't understand the plan :/

Let me describe his freezer change as that is much easier to get to the
final result. RT has more problems as it turns all spin locks into
sleeping locks. When a task is frozen it turns its sleeping state into
TASK_FROZEN. That is, TASK_STOPPED and TASK_TRACED become TASK_FROZEN.
If this races with ptrace_check_attach, wait_task_inactive fails as
the process state has changed. This makes the freezer userspace
visible.

For ordinary tasks the freezer thaws them just by giving them a spurious
wake-up. After which they check their conditions and go back to sleep
on their own. For TASK_STOPPED and TASK_TRACED (which can't handle
spurious wake-ups) the __state value is recovered from task->jobctl.

For RT cgroup_enter_frozen needs fixes that no one has proposed yet.
The problem is that the "preempt_disable()" before
"read_unlock(&tasklist_lock)" is not something that can reasonably be
removed. It would cause a performance regression.

So my plan is to get things as far as Peter's freezer change
working. That cleans up the code and makes it much closer to
ptrace working in PREEMPT_RT. That makes the problems left for
the PREEMPT_RT folks much smaller.


>> At a practical level I think it also has an impact on patch:
>> "10/12 ptrace: Only return signr from ptrace_stop if it was provided".
>
> I didn't look at JOBCTL_PTRACE_SIGNR yet. But this looks minor to me,
> I mean, I am not sure it worth the trouble.

The immediate problem the JOBCTL_PTRACE_SIGNR patch solves is:
- stopping in ptrace_report_syscall.
- Not having PT_TRACESYSGOOD set.
- The tracee being killed with a fatal signal
- The tracee sending SIGTRAP to itself.

The larger problem solved by the JOBCTL_PTRACE_SIGNR patch is that
it removes the need for current->ptrace test from ptrace_stop. Which
in turn is part of what is needed for wait_task_inactive to be
guaranteed a stop in ptrace_stop.
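
The return value rule that the JOBCTL_PTRACE_SIGNR patch introduces in
ptrace_stop can be sketched as a pure function. The function name and
the bit position of the flag are hypothetical; the branch order follows
the hunk in patch 10/12.

```c
#define JOBCTL_PTRACE_SIGNR (1UL << 25)  /* bit position is an assumption */

/*
 * jobctl           : the tracee's jobctl word when it resumes
 * stored_exit_code : value the tracer left in current->exit_code
 * exit_code        : value ptrace_stop was called with
 * clear_code       : caller asked for 0 if nothing was provided
 */
static int model_ptrace_stop_result(unsigned long jobctl, int stored_exit_code,
				    int exit_code, int clear_code)
{
	/* Did the tracer provide a signal to resume with? */
	if (jobctl & JOBCTL_PTRACE_SIGNR)
		return stored_exit_code;
	if (clear_code)
		return 0;
	return exit_code;
}
```

The stored exit_code is only trusted when ptrace_resume or
ptrace_detach set the flag, so a stop that never actually happened
cannot leak a stale value back to the caller.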


Thinking about it. I think a reasonable case can be made that it
is weird if not dangerous to play with the task fields (ptrace_message,
last_siginfo, and exit_code) without task_is_traced being true.
So I will adjust my patch to check that. The difference in behavior
is explicit enough we can think about it easily.

Eric

2022-05-04 02:00:22

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH v2 06/12] ptrace: Reimplement PTRACE_KILL by always sending SIGKILL

Oleg Nesterov <[email protected]> writes:

> On 04/29, Eric W. Biederman wrote:
>>
>> Call send_sig_info in PTRACE_KILL instead of ptrace_resume.
>> ptrace_resume is not safe to call if the task has not been stopped
>> with ptrace_freeze_traced.
>
> Oh, I was never, never able to understand why do we have PTRACE_KILL
> and what should it actually do.
>
> I suggested many times to simply remove it but OK, we probably can't
> do this.

I thought I remembered you suggesting fixing it in some other way.

I took a quick look in codesearch.debian.net and PTRACE_KILL is
definitely in use. I find uses in gcc-10, firefox-esr_91.8,
llvm_toolchain, qtwebengine. At which point I stopped looking.


>> --- a/kernel/ptrace.c
>> +++ b/kernel/ptrace.c
>> @@ -1238,7 +1238,7 @@ int ptrace_request(struct task_struct *child, long request,
>> case PTRACE_KILL:
>> if (child->exit_state) /* already dead */
>> return 0;
>> - return ptrace_resume(child, request, SIGKILL);
>> + return send_sig_info(SIGKILL, SEND_SIG_NOINFO, child);
>
> Note that currently ptrace(PTRACE_KILL) can never fail (yes, yes, it
> is unsafe), but send_sig_info() can. If we do not remove PTRACE_KILL,
> then I'd suggest
>
> case PTRACE_KILL:
> if (!child->exit_state)
> send_sig_info(SIGKILL);
> return 0;
>
> to make this change a bit more compatible.


Quite. The only failure I can find from send_sig_info is if
lock_task_sighand fails and PTRACE_KILL is deliberately ignoring errors
when the target task has exited.

case PTRACE_KILL:
send_sig_info(SIGKILL);
return 0;

I think that should suffice.
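
A user-space sketch of that behavior, under the assumption that the
only failure of the send is an already exited target. struct model_task
and both helpers are hypothetical stand-ins, not the kernel functions.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the target task. */
struct model_task {
	bool exited;
	bool sigkill_pending;
};

/* Models send_sig_info(SIGKILL, SEND_SIG_NOINFO, child). */
static int model_send_sigkill(struct model_task *child)
{
	if (child->exited)
		return -ESRCH;  /* models lock_task_sighand() failing */
	child->sigkill_pending = true;
	return 0;
}

/* Models the PTRACE_KILL case: send the signal, ignore the result. */
static int model_ptrace_kill(struct model_task *child)
{
	(void)model_send_sigkill(child);
	return 0;
}
```

The request reports success whether or not the target still exists,
which keeps the "can never fail" behavior user space may rely on.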


> Also, please remove the note about PTRACE_KILL in
> set_task_blockstep().

Good catch, thank you.

Eric

2022-05-04 17:13:59

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH v2 07/12] ptrace: Don't change __state

On 05/03, Eric W. Biederman wrote:
>
> Oleg Nesterov <[email protected]> writes:
>
> > But why is it bad if the tracee doesn't sleep in schedule ? If it races
> > with SIGKILL. I still can't understand this.
> >
> > Yes, wait_task_inactive() can fail, so you need to remove WARN_ON_ONCE()
> > in 11/12.
>
> >
> > Why is removing TASK_WAKEKILL from TASK_TRACED and complicating
> > *signal_wake_up() better?
>
> Not changing __state is better because it removes special cases
> from the scheduler that only apply to ptrace.

Hmm. But I didn't argue with that? I like the idea of JOBCTL_PTRACE_FROZEN.

I meant, I do not think that removing KILLABLE from TASK_TRACED (not
from __state) and complicating *signal_wake_up() (I mean, compared
to your previous version) is a good idea.

And. At least in context of this series it is fine if the JOBCTL_PTRACE_FROZEN
tracee does not block in schedule(), you just need to remove WARN_ON_ONCE()
around wait_task_inactive().

> > And even if we need to ensure the tracee will always block after
> > ptrace_freeze_traced(), we can change signal_pending_state() to
> > return false if JOBCTL_PTRACE_FROZEN. Much simpler, imo. But still
> > looks unnecessary to me.
>
> We still need to change signal_wake_up in that case. Possibly
> signal_wake_up_state.

Of course. See above.

> >> if we depend on wait_task_inactive failing if the process is in the
> >> wrong state.
> >
> > OK, I guess this is what I do not understand. Could you spell please?
> >
> > And speaking of RT, wait_task_inactive() still can fail because
> > cgroup_enter_frozen() takes css_set_lock? And it is called under
> > preempt_disable() ? I don't understand the plan :/
>
> Let me describe his freezer change as that is much easier to get to the
> final result. RT has more problems as it turns all spin locks into
> sleeping locks. When a task is frozen

[...snip...]

Oh, thanks Eric, but I understand this part. But I still can't understand
why is it that critical to block in schedule... OK, I need to think about
it. Lets assume this is really necessary.

Anyway. I'd suggest to not change TASK_TRACED in this series and not
complicate signal_wake_up() more than you did in your previous version:

static inline void signal_wake_up(struct task_struct *t, bool resume)
{
bool wakekill = resume && !(t->jobctl & JOBCTL_DELAY_WAKEKILL);
signal_wake_up_state(t, wakekill ? TASK_WAKEKILL : 0);
}

JOBCTL_PTRACE_FROZEN is fine.

ptrace_check_attach() can do

if (!ret && !ignore_state &&
/*
* This can only fail if the frozen tracee races with
* SIGKILL and enters schedule() with fatal_signal_pending
*/
!wait_task_inactive(child, __TASK_TRACED))
ret = -ESRCH;

return ret;


Now. If/when we really need to ensure that the frozen tracee always
blocks and wait_task_inactive() never fails, we can just do

- add the fatal_signal_pending() check into ptrace_stop()
(like this patch does)

- say, change signal_pending_state:

static inline int signal_pending_state(unsigned int state, struct task_struct *p)
{
if (!(state & (TASK_INTERRUPTIBLE | TASK_WAKEKILL)))
return 0;
if (!signal_pending(p))
return 0;
if (p->jobctl & JOBCTL_TASK_FROZEN)
return 0;
return (state & TASK_INTERRUPTIBLE) || __fatal_signal_pending(p);
}

in a separate patch which should carefully document the need for this
change.
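
To make the effect of the proposed check concrete, here is a minimal
userspace sketch of it. This is not kernel code: struct model_task, the
M_* constants and the function names are hypothetical stand-ins, and the
numeric values are chosen only for illustration. It shows that once the
frozen flag is set, a pending SIGKILL no longer prevents the tracee from
blocking.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the kernel's task state bits (assumed values). */
#define M_TASK_INTERRUPTIBLE 0x0001u
#define M_TASK_TRACED        0x0008u	/* models __TASK_TRACED */
#define M_TASK_WAKEKILL      0x0100u

/* Hypothetical flag modelling the "frozen for ptrace" jobctl bit. */
#define M_JOBCTL_FROZEN      (1UL << 24)

struct model_task {
	bool signal_pending;		/* some signal is pending */
	bool fatal_signal_pending;	/* SIGKILL is pending */
	unsigned long jobctl;
};

/* Returns nonzero when schedule() should refuse to sleep in "state". */
int model_signal_pending_state(unsigned int state, const struct model_task *p)
{
	if (!(state & (M_TASK_INTERRUPTIBLE | M_TASK_WAKEKILL)))
		return 0;
	if (!p->signal_pending)
		return 0;
	if (p->jobctl & M_JOBCTL_FROZEN)
		return 0;	/* frozen tracee: always block */
	return (state & M_TASK_INTERRUPTIBLE) || p->fatal_signal_pending;
}

/* Self-check: a frozen tracee blocks even with SIGKILL pending. */
void model_signal_pending_state_check(void)
{
	struct model_task t = { true, true, 0 };
	unsigned int traced = M_TASK_WAKEKILL | M_TASK_TRACED;

	assert(model_signal_pending_state(traced, &t) == 1);
	t.jobctl |= M_JOBCTL_FROZEN;
	assert(model_signal_pending_state(traced, &t) == 0);
}
```

The sketch keeps TASK_WAKEKILL as part of the traced state, as this
proposal does; only the jobctl check decides whether the tracee sleeps.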

> > I didn't look at JOBCTL_PTRACE_SIGNR yet. But this looks minor to me,
> > I mean, I am not sure it worth the trouble.
>
> The immediate problem the JOBCTL_PTRACE_SIGNR patch solves is:
> - stopping in ptrace_report_syscall.
> - Not having PT_TRACESYSGOOD set.
> - The tracee being killed with a fatal signal
^^^^^^
tracer ?
> - The tracee sending SIGTRAP to itself.

Oh, but this is clear. But do we really care? If the tracer exits
unexpectedly, the tracee can have a lot more problems, I don't think
that this particular one is that important.

Oleg.


2022-05-05 01:58:48

by Eric W. Biederman

Subject: Re: [PATCH v2 07/12] ptrace: Don't change __state

Oleg Nesterov <[email protected]> writes:

> On 05/03, Eric W. Biederman wrote:
>>
>> Oleg Nesterov <[email protected]> writes:
>>
>> > But why is it bad if the tracee doesn't sleep in schedule ? If it races
>> > with SIGKILL. I still can't understand this.
>> >
>> > Yes, wait_task_inactive() can fail, so you need to remove WARN_ON_ONCE()
>> > in 11/12.
>>
>> >
>> > Why is removing TASK_WAKEKILL from TASK_TRACED and complicating
>> > *signal_wake_up() better?
>>
>> Not changing __state is better because it removes special cases
>> from the scheduler that only apply to ptrace.
>
> Hmm. But I didn't argue with that? I like the idea of JOBCTL_TASK_FROZEN.
>
> I meant, I do not think that removing KILLABLE from TASK_TRACED (not
> from __state) and complicating *signal_wake_up() (I mean, compared
> to your previous version) is a good idea.
>
> And. At least in context of this series it is fine if the JOBCTL_TASK_FROZEN
> tracee do not block in schedule(), just you need to remove WARN_ON_ONCE()
> around wait_task_inactive().
>
>> > And even if we need to ensure the tracee will always block after
>> > ptrace_freeze_traced(), we can change signal_pending_state() to
>> > return false if JOBCTL_PTRACE_FROZEN. Much simpler, imo. But still
>> > looks unnecessary to me.
>>
>> We still need to change signal_wake_up in that case. Possibly
>> signal_wake_up_state.
>
> Of course. See above.
>
>> >> if we depend on wait_task_inactive failing if the process is in the
>> >> wrong state.
>> >
>> > OK, I guess this is what I do not understand. Could you spell please?
>> >
>> > And speaking of RT, wait_task_inactive() still can fail because
>> > cgroup_enter_frozen() takes css_set_lock? And it is called under
>> > preempt_disable() ? I don't understand the plan :/
>>
>> Let me describe his freezer change as that is much easier to get to the
>> final result. RT has more problems as it turns all spin locks into
>> sleeping locks. When a task is frozen
>
> [...snip...]
>
> Oh, thanks Eric, but I understand this part. But I still can't understand
> why is it that critical to block in schedule... OK, I need to think about
> it. Lets assume this is really necessary.
>
> Anyway. I'd suggest to not change TASK_TRACED in this series and not
> complicate signal_wake_up() more than you did in your previous version:
>
> static inline void signal_wake_up(struct task_struct *t, bool resume)
> {
> bool wakekill = resume && !(t->jobctl & JOBCTL_DELAY_WAKEKILL);
> signal_wake_up_state(t, wakekill ? TASK_WAKEKILL : 0);
> }

If your concern is signal_wake_up there is no reason it can't be:

static inline void signal_wake_up(struct task_struct *t, bool fatal)
{
fatal = fatal && !(t->jobctl & JOBCTL_PTRACE_FROZEN);
signal_wake_up_state(t, fatal ? TASK_WAKEKILL | TASK_TRACED : 0);
}

I guess I was more targeted in this version, which led to more if
statements, but as there is only one place in the code that can be
JOBCTL_PTRACE_FROZEN and TASK_TRACED, there is no point in setting
TASK_WAKEKILL without also setting TASK_TRACED in the wake-up.

So yes. I can make the code as simple as my earlier version of
signal_wake_up.
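
As a rough illustration of why the simple form is enough, the wake-up
mask can be modelled in userspace. This is only a sketch under stated
assumptions: the M_* names and values and both helper functions are
hypothetical, TASK_TRACED is assumed to no longer contain TASK_WAKEKILL,
and a wake-up is assumed to succeed only when the sleeper's state
intersects the mask.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the kernel's state bits (assumed values). */
#define M_TASK_INTERRUPTIBLE   0x0001u
#define M_TASK_UNINTERRUPTIBLE 0x0002u
#define M_TASK_TRACED          0x0008u	/* TASK_TRACED without WAKEKILL */
#define M_TASK_WAKEKILL        0x0100u
#define M_TASK_KILLABLE        (M_TASK_WAKEKILL | M_TASK_UNINTERRUPTIBLE)
#define M_JOBCTL_PTRACE_FROZEN (1UL << 24)

/* Models the state mask that signal_wake_up() ends up waking with. */
unsigned int model_signal_wake_mask(unsigned long jobctl, bool fatal)
{
	fatal = fatal && !(jobctl & M_JOBCTL_PTRACE_FROZEN);
	/* TASK_INTERRUPTIBLE sleepers are always eligible for a signal wake-up. */
	return (fatal ? (M_TASK_WAKEKILL | M_TASK_TRACED) : 0u) |
	       M_TASK_INTERRUPTIBLE;
}

/* Models the wake-up test: the sleeper's state must intersect the mask. */
bool model_would_wake(unsigned int task_state, unsigned int mask)
{
	return (task_state & mask) != 0;
}

/* Self-check: only a frozen tracee ignores the fatal wake-up. */
void model_signal_wake_check(void)
{
	unsigned int mask;

	mask = model_signal_wake_mask(0, true);
	assert(model_would_wake(M_TASK_TRACED, mask));
	assert(model_would_wake(M_TASK_KILLABLE, mask));

	mask = model_signal_wake_mask(M_JOBCTL_PTRACE_FROZEN, true);
	assert(!model_would_wake(M_TASK_TRACED, mask));
}
```

In this model a non-fatal signal never wakes a traced task, and a fatal
one wakes it unless the frozen flag is set, which matches the intent
described above.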

> JOBCTL_PTRACE_FROZEN is fine.
>
> ptrace_check_attach() can do
>
> if (!ret && !ignore_state &&
> /*
> * This can only fail if the frozen tracee races with
> * SIGKILL and enters schedule() with fatal_signal_pending
> */
> !wait_task_inactive(child, __TASK_TRACED))
> ret = -ESRCH;
>
> return ret;
>
>
> Now. If/when we really need to ensure that the frozen tracee always
> blocks and wait_task_inactive() never fails, we can just do
>
> - add the fatal_signal_pending() check into ptrace_stop()
> (like this patch does)
>
> - say, change signal_pending_state:
>
> static inline int signal_pending_state(unsigned int state, struct task_struct *p)
> {
> if (!(state & (TASK_INTERRUPTIBLE | TASK_WAKEKILL)))
> return 0;
> if (!signal_pending(p))
> return 0;
> if (p->jobctl & JOBCTL_TASK_FROZEN)
> return 0;
> return (state & TASK_INTERRUPTIBLE) || __fatal_signal_pending(p);
> }
>
> in a separate patch which should carefully document the need for this
> change.
>
>> > I didn't look at JOBCTL_PTRACE_SIGNR yet. But this looks minor to me,
>> > I mean, I am not sure it worth the trouble.
>>
>> The immediate problem the JOBCTL_PTRACE_SIGNR patch solves is:
>> - stopping in ptrace_report_syscall.
>> - Not having PT_TRACESYSGOOD set.
>> - The tracee being killed with a fatal signal
> ^^^^^^
> tracer ?

Both actually.

>> - The tracee sending SIGTRAP to itself.
>
> Oh, but this is clear. But do we really care? If the tracer exits
> unexpectedly, the tracee can have a lot more problems, I don't think
> that this particular one is that important.

I don't know of complaints, and if you haven't heard them either
then that is a good indication that in practice we don't care.

At a practical level I just don't want that silly case that sets
TASK_TRACED to TASK_RUNNING without stopping at all in ptrace_stop to
remain. It just seems to make everything more complicated for no real
reason anymore. The deadlocks may_ptrace_stop was guarding against are
gone.

Plus the test is so racy that the case can happen after we drop siglock
and before we schedule, or shortly after we have stopped, so we really
don't reliably catch the condition the code is trying to catch.

I think the case I care most about is ptrace_signal, which pretty much
requires the tracer to wait and clear exit_code before being terminated
to cause problems. We don't handle that at all today.

So yeah. I think the code handles so little at this point we can just
remove the code and simplify things, if we actually care we can come
back and implement JOBCTL_PTRACE_SIGNR or the like.

I will chew on that a bit and see if I can find any reasons for keeping
the code in ptrace_stop at all.



As an added data point we can probably remove handling of the signal
from ptrace_report_syscall entirely (not in this patchset!).

I took a quick skim, and sending a signal in ptrace_report_syscall
appears to be a feature introduced with ptrace support in Linux v1.0;
the comment in ptrace_report_syscall appears to document the fact that
the code has always been dead.


I made it through 13 of 133 pages of Debian code search results for
PTRACE_SYSCALL, and the only use I could find of setting the continue
signal was when the signal reported from wait was not SIGTRAP. Exactly
the same as in the comment in ptrace_report_syscall.

If that pattern holds for all of the uses of ptrace then the code
in ptrace_report_syscall is dead.



Eric


2022-05-05 07:17:15

by Eric W. Biederman

Subject: [PATCH v3 0/11] ptrace: cleaning up ptrace_stop


The states TASK_STOPPED and TASK_TRACED are special in that they cannot
handle spurious wake-ups. This, plus actively depending upon and
changing the value of tsk->__state, causes problems for PREEMPT_RT and
Peter's freezer rewrite.

There are a lot of details we have to get right to sort out the
technical challenges, and this is my pared-back version of the changes
that contains just those problems I see good solutions to that I believe
are ready.

A couple of issues have been pointed out, but I think this pared-back set
of changes is still on the right track. The biggest change in v3 is that
instead of trying to prevent sending a spurious SIGTRAP when the tracer
dies with the tracee in ptrace_report_syscall, I have modified the code
to just stop trying. While I still have taken TASK_WAKEKILL out of
TASK_TRACED I have implemented simpler logic in signal_wake_up. Further
I have followed Oleg's advice and exit early from ptrace_stop if a fatal
signal is pending.

This set of changes should support Peter's freezer rewrite, and with the
addition of changing wait_task_inactive(TASK_TRACED) to be
wait_task_inactive(0) in ptrace_check_attach I don't think there are any
races or issues to be concerned about from the ptrace side.

More work is needed to support PREEMPT_RT, but these changes get things
closer.

I believe this set of changes will provide a firm foundation for solving
the PREEMPT_RT and freezer challenges.

With fewer lines added and more lines removed this set of changes looks
like it is moving in a good direction.

Eric W. Biederman (10):
signal: Rename send_signal send_signal_locked
signal: Replace __group_send_sig_info with send_signal_locked
ptrace/um: Replace PT_DTRACE with TIF_SINGLESTEP
ptrace/xtensa: Replace PT_SINGLESTEP with TIF_SINGLESTEP
ptrace: Remove arch_ptrace_attach
signal: Use lockdep_assert_held instead of assert_spin_locked
ptrace: Reimplement PTRACE_KILL by always sending SIGKILL
ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs
ptrace: Don't change __state
ptrace: Always take siglock in ptrace_resume

Peter Zijlstra (1):
sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state

arch/ia64/include/asm/ptrace.h | 4 --
arch/ia64/kernel/ptrace.c | 57 ----------------
arch/um/include/asm/thread_info.h | 2 +
arch/um/kernel/exec.c | 2 +-
arch/um/kernel/process.c | 2 +-
arch/um/kernel/ptrace.c | 8 +--
arch/um/kernel/signal.c | 4 +-
arch/x86/kernel/step.c | 3 +-
arch/xtensa/kernel/ptrace.c | 4 +-
arch/xtensa/kernel/signal.c | 4 +-
drivers/tty/tty_jobctrl.c | 4 +-
include/linux/ptrace.h | 7 --
include/linux/sched.h | 10 ++-
include/linux/sched/jobctl.h | 8 +++
include/linux/sched/signal.h | 20 ++++--
include/linux/signal.h | 3 +-
kernel/ptrace.c | 87 ++++++++----------------
kernel/sched/core.c | 5 +-
kernel/signal.c | 135 +++++++++++++++++---------------------
kernel/time/posix-cpu-timers.c | 6 +-
20 files changed, 138 insertions(+), 237 deletions(-)

Eric

2022-05-05 13:32:18

by Eric W. Biederman

Subject: [PATCH v3 05/11] ptrace: Remove arch_ptrace_attach

The last remaining implementation of arch_ptrace_attach is ia64's
ptrace_attach_sync_user_rbs which was added at the end of 2007 in
commit aa91a2e90044 ("[IA64] Synchronize RBS on PTRACE_ATTACH").

Judging from the comments and the code, ptrace_attach_sync_user_rbs
has the sole purpose of saving registers to the stack when ptrace_attach
changes TASK_STOPPED to TASK_TRACED. In all other cases arch_ptrace_stop
takes care of the register saving.

Commit d79fdd6d96f4 ("ptrace: Clean transitions between TASK_STOPPED and TRACED")
modified ptrace_attach to wake up the thread and enter ptrace_stop normally even
when the thread starts out stopped.

This makes ptrace_attach_sync_user_rbs completely unnecessary. So just
remove it.

Cc: [email protected]
Signed-off-by: "Eric W. Biederman" <[email protected]>
---
arch/ia64/include/asm/ptrace.h | 4 ---
arch/ia64/kernel/ptrace.c | 57 ----------------------------------
kernel/ptrace.c | 18 -----------
3 files changed, 79 deletions(-)

diff --git a/arch/ia64/include/asm/ptrace.h b/arch/ia64/include/asm/ptrace.h
index a10a498eede1..402874489890 100644
--- a/arch/ia64/include/asm/ptrace.h
+++ b/arch/ia64/include/asm/ptrace.h
@@ -139,10 +139,6 @@ static inline long regs_return_value(struct pt_regs *regs)
#define arch_ptrace_stop_needed() \
(!test_thread_flag(TIF_RESTORE_RSE))

- extern void ptrace_attach_sync_user_rbs (struct task_struct *);
- #define arch_ptrace_attach(child) \
- ptrace_attach_sync_user_rbs(child)
-
#define arch_has_single_step() (1)
#define arch_has_block_step() (1)

diff --git a/arch/ia64/kernel/ptrace.c b/arch/ia64/kernel/ptrace.c
index a19acd9f5e1f..a45f529046c3 100644
--- a/arch/ia64/kernel/ptrace.c
+++ b/arch/ia64/kernel/ptrace.c
@@ -617,63 +617,6 @@ void ia64_sync_krbs(void)
unw_init_running(do_sync_rbs, ia64_sync_kernel_rbs);
}

-/*
- * After PTRACE_ATTACH, a thread's register backing store area in user
- * space is assumed to contain correct data whenever the thread is
- * stopped. arch_ptrace_stop takes care of this on tracing stops.
- * But if the child was already stopped for job control when we attach
- * to it, then it might not ever get into ptrace_stop by the time we
- * want to examine the user memory containing the RBS.
- */
-void
-ptrace_attach_sync_user_rbs (struct task_struct *child)
-{
- int stopped = 0;
- struct unw_frame_info info;
-
- /*
- * If the child is in TASK_STOPPED, we need to change that to
- * TASK_TRACED momentarily while we operate on it. This ensures
- * that the child won't be woken up and return to user mode while
- * we are doing the sync. (It can only be woken up for SIGKILL.)
- */
-
- read_lock(&tasklist_lock);
- if (child->sighand) {
- spin_lock_irq(&child->sighand->siglock);
- if (READ_ONCE(child->__state) == TASK_STOPPED &&
- !test_and_set_tsk_thread_flag(child, TIF_RESTORE_RSE)) {
- set_notify_resume(child);
-
- WRITE_ONCE(child->__state, TASK_TRACED);
- stopped = 1;
- }
- spin_unlock_irq(&child->sighand->siglock);
- }
- read_unlock(&tasklist_lock);
-
- if (!stopped)
- return;
-
- unw_init_from_blocked_task(&info, child);
- do_sync_rbs(&info, ia64_sync_user_rbs);
-
- /*
- * Now move the child back into TASK_STOPPED if it should be in a
- * job control stop, so that SIGCONT can be used to wake it up.
- */
- read_lock(&tasklist_lock);
- if (child->sighand) {
- spin_lock_irq(&child->sighand->siglock);
- if (READ_ONCE(child->__state) == TASK_TRACED &&
- (child->signal->flags & SIGNAL_STOP_STOPPED)) {
- WRITE_ONCE(child->__state, TASK_STOPPED);
- }
- spin_unlock_irq(&child->sighand->siglock);
- }
- read_unlock(&tasklist_lock);
-}
-
/*
* Write f32-f127 back to task->thread.fph if it has been modified.
*/
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index ccc4b465775b..da30dcd477a0 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -1285,10 +1285,6 @@ int ptrace_request(struct task_struct *child, long request,
return ret;
}

-#ifndef arch_ptrace_attach
-#define arch_ptrace_attach(child) do { } while (0)
-#endif
-
SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr,
unsigned long, data)
{
@@ -1297,8 +1293,6 @@ SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr,

if (request == PTRACE_TRACEME) {
ret = ptrace_traceme();
- if (!ret)
- arch_ptrace_attach(current);
goto out;
}

@@ -1310,12 +1304,6 @@ SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr,

if (request == PTRACE_ATTACH || request == PTRACE_SEIZE) {
ret = ptrace_attach(child, request, addr, data);
- /*
- * Some architectures need to do book-keeping after
- * a ptrace attach.
- */
- if (!ret)
- arch_ptrace_attach(child);
goto out_put_task_struct;
}

@@ -1455,12 +1443,6 @@ COMPAT_SYSCALL_DEFINE4(ptrace, compat_long_t, request, compat_long_t, pid,

if (request == PTRACE_ATTACH || request == PTRACE_SEIZE) {
ret = ptrace_attach(child, request, addr, data);
- /*
- * Some architectures need to do book-keeping after
- * a ptrace attach.
- */
- if (!ret)
- arch_ptrace_attach(child);
goto out_put_task_struct;
}

--
2.35.3


2022-05-05 15:24:50

by Eric W. Biederman

Subject: [PATCH v3 02/11] signal: Replace __group_send_sig_info with send_signal_locked

The function __group_send_sig_info is just a light wrapper around
send_signal_locked with one parameter fixed to a constant value. As
the wrapper adds no real value, update the code to directly call the
wrapped function.

Signed-off-by: "Eric W. Biederman" <[email protected]>
---
drivers/tty/tty_jobctrl.c | 4 ++--
include/linux/signal.h | 1 -
kernel/signal.c | 8 +-------
kernel/time/posix-cpu-timers.c | 6 +++---
4 files changed, 6 insertions(+), 13 deletions(-)

diff --git a/drivers/tty/tty_jobctrl.c b/drivers/tty/tty_jobctrl.c
index 80b86a7992b5..0d04287da098 100644
--- a/drivers/tty/tty_jobctrl.c
+++ b/drivers/tty/tty_jobctrl.c
@@ -215,8 +215,8 @@ int tty_signal_session_leader(struct tty_struct *tty, int exit_session)
spin_unlock_irq(&p->sighand->siglock);
continue;
}
- __group_send_sig_info(SIGHUP, SEND_SIG_PRIV, p);
- __group_send_sig_info(SIGCONT, SEND_SIG_PRIV, p);
+ send_signal_locked(SIGHUP, SEND_SIG_PRIV, p, PIDTYPE_TGID);
+ send_signal_locked(SIGCONT, SEND_SIG_PRIV, p, PIDTYPE_TGID);
put_pid(p->signal->tty_old_pgrp); /* A noop */
spin_lock(&tty->ctrl.lock);
tty_pgrp = get_pid(tty->ctrl.pgrp);
diff --git a/include/linux/signal.h b/include/linux/signal.h
index 55605bdf5ce9..3b98e7a28538 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -282,7 +282,6 @@ extern int do_send_sig_info(int sig, struct kernel_siginfo *info,
struct task_struct *p, enum pid_type type);
extern int group_send_sig_info(int sig, struct kernel_siginfo *info,
struct task_struct *p, enum pid_type type);
-extern int __group_send_sig_info(int, struct kernel_siginfo *, struct task_struct *);
extern int send_signal_locked(int sig, struct kernel_siginfo *info,
struct task_struct *p, enum pid_type type);
extern int sigprocmask(int, sigset_t *, sigset_t *);
diff --git a/kernel/signal.c b/kernel/signal.c
index b0403197b0ad..72d96614effc 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1281,12 +1281,6 @@ static int __init setup_print_fatal_signals(char *str)

__setup("print-fatal-signals=", setup_print_fatal_signals);

-int
-__group_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p)
-{
- return send_signal_locked(sig, info, p, PIDTYPE_TGID);
-}
-
int do_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p,
enum pid_type type)
{
@@ -2173,7 +2167,7 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
spin_lock_irqsave(&sighand->siglock, flags);
if (sighand->action[SIGCHLD-1].sa.sa_handler != SIG_IGN &&
!(sighand->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDSTOP))
- __group_send_sig_info(SIGCHLD, &info, parent);
+ send_signal_locked(SIGCHLD, &info, parent, PIDTYPE_TGID);
/*
* Even if SIGCHLD is not generated, we must wake up wait4 calls.
*/
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 0a97193984db..cb925e8ef9a8 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -870,7 +870,7 @@ static inline void check_dl_overrun(struct task_struct *tsk)
{
if (tsk->dl.dl_overrun) {
tsk->dl.dl_overrun = 0;
- __group_send_sig_info(SIGXCPU, SEND_SIG_PRIV, tsk);
+ send_signal_locked(SIGXCPU, SEND_SIG_PRIV, tsk, PIDTYPE_TGID);
}
}

@@ -884,7 +884,7 @@ static bool check_rlimit(u64 time, u64 limit, int signo, bool rt, bool hard)
rt ? "RT" : "CPU", hard ? "hard" : "soft",
current->comm, task_pid_nr(current));
}
- __group_send_sig_info(signo, SEND_SIG_PRIV, current);
+ send_signal_locked(signo, SEND_SIG_PRIV, current, PIDTYPE_TGID);
return true;
}

@@ -958,7 +958,7 @@ static void check_cpu_itimer(struct task_struct *tsk, struct cpu_itimer *it,
trace_itimer_expire(signo == SIGPROF ?
ITIMER_PROF : ITIMER_VIRTUAL,
task_tgid(tsk), cur_time);
- __group_send_sig_info(signo, SEND_SIG_PRIV, tsk);
+ send_signal_locked(signo, SEND_SIG_PRIV, tsk, PIDTYPE_TGID);
}

if (it->expires && it->expires < *expires)
--
2.35.3


2022-05-05 18:25:23

by Eric W. Biederman

Subject: [PATCH v3 09/11] ptrace: Don't change __state

Stop playing with tsk->__state to remove TASK_WAKEKILL while a ptrace
command is executing.

Instead remove TASK_WAKEKILL from the definition of TASK_TRACED, and
implement a new jobctl flag JOBCTL_PTRACE_FROZEN. This new flag is
set in ptrace_freeze_traced and cleared when ptrace_stop is awoken or in
ptrace_unfreeze_traced (when ptrace_stop remains asleep).

In signal_wake_up add __TASK_TRACED to state along with TASK_WAKEKILL
when the wake up is for a fatal signal. Skip adding them when
JOBCTL_PTRACE_FROZEN is set. This has the same effect as
changing TASK_TRACED to __TASK_TRACED as all of the wake_ups that use
TASK_KILLABLE go through signal_wake_up.

Handle a ptrace_stop being called with a pending fatal signal.
Previously it would have been handled by schedule simply failing to
sleep. As TASK_WAKEKILL is no longer part of TASK_TRACED, schedule
will sleep with a fatal_signal_pending. The code in signal_wake_up
guarantees that the task will be awakened by any fatal signal that
comes after TASK_TRACED is set.

Previously the __state value of __TASK_TRACED was changed to
TASK_RUNNING when woken up or back to TASK_TRACED when the code was
left in ptrace_stop. Now when woken up, ptrace_stop clears
JOBCTL_PTRACE_FROZEN, and when left sleeping, ptrace_unfreeze_traced
clears JOBCTL_PTRACE_FROZEN.
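
The life cycle described above can be sketched as a small userspace
model. It is only an illustration, not the kernel implementation:
struct model_tracee, the M_* constants and the model_* functions are
hypothetical names, locking is omitted, and a wake-up is reduced to
setting the state to running.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins (assumed values), not the kernel definitions. */
#define M_TASK_RUNNING         0x0000u
#define M_TASK_TRACED          0x0008u
#define M_JOBCTL_PTRACE_FROZEN (1UL << 24)

struct model_tracee {
	unsigned int state;
	unsigned long jobctl;
	bool fatal_pending;
};

/* Models ptrace_freeze_traced(): only a flag is set, state is untouched. */
bool model_freeze(struct model_tracee *t)
{
	if (t->state != M_TASK_TRACED || t->fatal_pending)
		return false;
	t->jobctl |= M_JOBCTL_PTRACE_FROZEN;
	return true;
}

/* Models a fatal signal arriving: wakes the tracee unless it is frozen. */
void model_send_fatal(struct model_tracee *t)
{
	t->fatal_pending = true;
	if (t->state == M_TASK_TRACED && !(t->jobctl & M_JOBCTL_PTRACE_FROZEN))
		t->state = M_TASK_RUNNING;
}

/* Models ptrace_unfreeze_traced(): clear the flag, do the delayed wake-up. */
void model_unfreeze(struct model_tracee *t)
{
	t->jobctl &= ~M_JOBCTL_PTRACE_FROZEN;
	if (t->fatal_pending && t->state == M_TASK_TRACED)
		t->state = M_TASK_RUNNING;
}

/* Self-check: SIGKILL during a ptrace command is delayed, not lost. */
void model_freeze_check(void)
{
	struct model_tracee t = { M_TASK_TRACED, 0, false };

	assert(model_freeze(&t));
	model_send_fatal(&t);
	assert(t.state == M_TASK_TRACED);	/* wake-up is delayed */
	model_unfreeze(&t);
	assert(t.state == M_TASK_RUNNING);	/* delivered on unfreeze */
	assert(!(t.jobctl & M_JOBCTL_PTRACE_FROZEN));
}
```

The point of the model is that the scheduler-visible state never changes
while the tracee is frozen; only the jobctl flag does.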

Signed-off-by: "Eric W. Biederman" <[email protected]>
---
include/linux/sched.h | 2 +-
include/linux/sched/jobctl.h | 2 ++
include/linux/sched/signal.h | 5 +++--
kernel/ptrace.c | 21 ++++++++-------------
kernel/sched/core.c | 5 +----
kernel/signal.c | 10 +++++++---
6 files changed, 22 insertions(+), 23 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index d5e3c00b74e1..610f2fdb1e2c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -103,7 +103,7 @@ struct task_group;
/* Convenience macros for the sake of set_current_state: */
#define TASK_KILLABLE (TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
#define TASK_STOPPED (TASK_WAKEKILL | __TASK_STOPPED)
-#define TASK_TRACED (TASK_WAKEKILL | __TASK_TRACED)
+#define TASK_TRACED __TASK_TRACED

#define TASK_IDLE (TASK_UNINTERRUPTIBLE | TASK_NOLOAD)

diff --git a/include/linux/sched/jobctl.h b/include/linux/sched/jobctl.h
index fa067de9f1a9..d556c3425963 100644
--- a/include/linux/sched/jobctl.h
+++ b/include/linux/sched/jobctl.h
@@ -19,6 +19,7 @@ struct task_struct;
#define JOBCTL_TRAPPING_BIT 21 /* switching to TRACED */
#define JOBCTL_LISTENING_BIT 22 /* ptracer is listening for events */
#define JOBCTL_TRAP_FREEZE_BIT 23 /* trap for cgroup freezer */
+#define JOBCTL_PTRACE_FROZEN_BIT 24 /* frozen for ptrace */

#define JOBCTL_STOP_DEQUEUED (1UL << JOBCTL_STOP_DEQUEUED_BIT)
#define JOBCTL_STOP_PENDING (1UL << JOBCTL_STOP_PENDING_BIT)
@@ -28,6 +29,7 @@ struct task_struct;
#define JOBCTL_TRAPPING (1UL << JOBCTL_TRAPPING_BIT)
#define JOBCTL_LISTENING (1UL << JOBCTL_LISTENING_BIT)
#define JOBCTL_TRAP_FREEZE (1UL << JOBCTL_TRAP_FREEZE_BIT)
+#define JOBCTL_PTRACE_FROZEN (1UL << JOBCTL_PTRACE_FROZEN_BIT)

#define JOBCTL_TRAP_MASK (JOBCTL_TRAP_STOP | JOBCTL_TRAP_NOTIFY)
#define JOBCTL_PENDING_MASK (JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK)
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 3c8b34876744..e66948abbee4 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -435,9 +435,10 @@ extern void calculate_sigpending(void);

extern void signal_wake_up_state(struct task_struct *t, unsigned int state);

-static inline void signal_wake_up(struct task_struct *t, bool resume)
+static inline void signal_wake_up(struct task_struct *t, bool fatal)
{
- signal_wake_up_state(t, resume ? TASK_WAKEKILL : 0);
+ fatal = fatal && !(t->jobctl & JOBCTL_PTRACE_FROZEN);
+ signal_wake_up_state(t, fatal ? TASK_WAKEKILL | __TASK_TRACED : 0);
}
static inline void ptrace_signal_wake_up(struct task_struct *t, bool resume)
{
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 05953ac9f7bd..83ed28262708 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -197,7 +197,7 @@ static bool ptrace_freeze_traced(struct task_struct *task)
spin_lock_irq(&task->sighand->siglock);
if (task_is_traced(task) && !looks_like_a_spurious_pid(task) &&
!__fatal_signal_pending(task)) {
- WRITE_ONCE(task->__state, __TASK_TRACED);
+ task->jobctl |= JOBCTL_PTRACE_FROZEN;
ret = true;
}
spin_unlock_irq(&task->sighand->siglock);
@@ -207,23 +207,19 @@ static bool ptrace_freeze_traced(struct task_struct *task)

static void ptrace_unfreeze_traced(struct task_struct *task)
{
- if (READ_ONCE(task->__state) != __TASK_TRACED)
- return;
-
- WARN_ON(!task->ptrace || task->parent != current);
+ unsigned long flags;

/*
- * PTRACE_LISTEN can allow ptrace_trap_notify to wake us up remotely.
- * Recheck state under the lock to close this race.
+ * The child may be awake and may have cleared
+ * JOBCTL_PTRACE_FROZEN (see ptrace_resume). The child will
+ * not set JOBCTL_PTRACE_FROZEN or enter __TASK_TRACED anew.
*/
- spin_lock_irq(&task->sighand->siglock);
- if (READ_ONCE(task->__state) == __TASK_TRACED) {
+ if (lock_task_sighand(task, &flags)) {
+ task->jobctl &= ~JOBCTL_PTRACE_FROZEN;
if (__fatal_signal_pending(task))
wake_up_state(task, __TASK_TRACED);
- else
- WRITE_ONCE(task->__state, TASK_TRACED);
+ unlock_task_sighand(task, &flags);
}
- spin_unlock_irq(&task->sighand->siglock);
}

/**
@@ -256,7 +252,6 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
*/
read_lock(&tasklist_lock);
if (child->ptrace && child->parent == current) {
- WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
/*
* child->sighand can't be NULL, release_task()
* does ptrace_unlink() before __exit_signal().
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d575b4914925..3c351707e830 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6304,10 +6304,7 @@ static void __sched notrace __schedule(unsigned int sched_mode)

/*
* We must load prev->state once (task_struct::state is volatile), such
- * that:
- *
- * - we form a control dependency vs deactivate_task() below.
- * - ptrace_{,un}freeze_traced() can change ->state underneath us.
+ * that we form a control dependency vs deactivate_task() below.
*/
prev_state = READ_ONCE(prev->__state);
if (!(sched_mode & SM_MASK_PREEMPT) && prev_state) {
diff --git a/kernel/signal.c b/kernel/signal.c
index 16828fde5424..e0b416b21ad3 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2210,9 +2210,13 @@ static int ptrace_stop(int exit_code, int why, unsigned long message,
}

/*
- * schedule() will not sleep if there is a pending signal that
- * can awaken the task.
+ * After this point signal_wake_up will clear TASK_TRACED
+ * if a fatal signal comes in. Handle previous fatal signals
+ * here to prevent ptrace_stop sleeping in schedule.
*/
+ if (__fatal_signal_pending(current))
+ return exit_code;
+
set_special_state(TASK_TRACED);

/*
@@ -2300,7 +2304,7 @@ static int ptrace_stop(int exit_code, int why, unsigned long message,
current->exit_code = 0;

/* LISTENING can be set only during STOP traps, clear it */
- current->jobctl &= ~JOBCTL_LISTENING;
+ current->jobctl &= ~(JOBCTL_LISTENING | JOBCTL_PTRACE_FROZEN);

/*
* Queued signals ignored us while we were stopped for tracing.
--
2.35.3


2022-05-05 18:56:19

by Eric W. Biederman

Subject: [PATCH v3 01/11] signal: Rename send_signal send_signal_locked

Rename send_signal and __send_signal to send_signal_locked and
__send_signal_locked to make send_signal usable outside of
signal.c.

Signed-off-by: "Eric W. Biederman" <[email protected]>
---
include/linux/signal.h | 2 ++
kernel/signal.c | 24 ++++++++++++------------
2 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/include/linux/signal.h b/include/linux/signal.h
index a6db6f2ae113..55605bdf5ce9 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -283,6 +283,8 @@ extern int do_send_sig_info(int sig, struct kernel_siginfo *info,
extern int group_send_sig_info(int sig, struct kernel_siginfo *info,
struct task_struct *p, enum pid_type type);
extern int __group_send_sig_info(int, struct kernel_siginfo *, struct task_struct *);
+extern int send_signal_locked(int sig, struct kernel_siginfo *info,
+ struct task_struct *p, enum pid_type type);
extern int sigprocmask(int, sigset_t *, sigset_t *);
extern void set_current_blocked(sigset_t *);
extern void __set_current_blocked(const sigset_t *);
diff --git a/kernel/signal.c b/kernel/signal.c
index 30cd1ca43bcd..b0403197b0ad 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1071,8 +1071,8 @@ static inline bool legacy_queue(struct sigpending *signals, int sig)
return (sig < SIGRTMIN) && sigismember(&signals->signal, sig);
}

-static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
- enum pid_type type, bool force)
+static int __send_signal_locked(int sig, struct kernel_siginfo *info,
+ struct task_struct *t, enum pid_type type, bool force)
{
struct sigpending *pending;
struct sigqueue *q;
@@ -1212,8 +1212,8 @@ static inline bool has_si_pid_and_uid(struct kernel_siginfo *info)
return ret;
}

-static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
- enum pid_type type)
+int send_signal_locked(int sig, struct kernel_siginfo *info,
+ struct task_struct *t, enum pid_type type)
{
/* Should SIGKILL or SIGSTOP be received by a pid namespace init? */
bool force = false;
@@ -1245,7 +1245,7 @@ static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct
force = true;
}
}
- return __send_signal(sig, info, t, type, force);
+ return __send_signal_locked(sig, info, t, type, force);
}

static void print_fatal_signal(int signr)
@@ -1284,7 +1284,7 @@ __setup("print-fatal-signals=", setup_print_fatal_signals);
int
__group_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p)
{
- return send_signal(sig, info, p, PIDTYPE_TGID);
+ return send_signal_locked(sig, info, p, PIDTYPE_TGID);
}

int do_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p,
@@ -1294,7 +1294,7 @@ int do_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p
int ret = -ESRCH;

if (lock_task_sighand(p, &flags)) {
- ret = send_signal(sig, info, p, type);
+ ret = send_signal_locked(sig, info, p, type);
unlock_task_sighand(p, &flags);
}

@@ -1347,7 +1347,7 @@ force_sig_info_to_task(struct kernel_siginfo *info, struct task_struct *t,
if (action->sa.sa_handler == SIG_DFL &&
(!t->ptrace || (handler == HANDLER_EXIT)))
t->signal->flags &= ~SIGNAL_UNKILLABLE;
- ret = send_signal(sig, info, t, PIDTYPE_PID);
+ ret = send_signal_locked(sig, info, t, PIDTYPE_PID);
spin_unlock_irqrestore(&t->sighand->siglock, flags);

return ret;
@@ -1567,7 +1567,7 @@ int kill_pid_usb_asyncio(int sig, int errno, sigval_t addr,

if (sig) {
if (lock_task_sighand(p, &flags)) {
- ret = __send_signal(sig, &info, p, PIDTYPE_TGID, false);
+ ret = __send_signal_locked(sig, &info, p, PIDTYPE_TGID, false);
unlock_task_sighand(p, &flags);
} else
ret = -ESRCH;
@@ -2103,7 +2103,7 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
* parent's namespaces.
*/
if (valid_signal(sig) && sig)
- __send_signal(sig, &info, tsk->parent, PIDTYPE_TGID, false);
+ __send_signal_locked(sig, &info, tsk->parent, PIDTYPE_TGID, false);
__wake_up_parent(tsk, tsk->parent);
spin_unlock_irqrestore(&psig->siglock, flags);

@@ -2601,7 +2601,7 @@ static int ptrace_signal(int signr, kernel_siginfo_t *info, enum pid_type type)
/* If the (new) signal is now blocked, requeue it. */
if (sigismember(&current->blocked, signr) ||
fatal_signal_pending(current)) {
- send_signal(signr, info, current, type);
+ send_signal_locked(signr, info, current, type);
signr = 0;
}

@@ -4793,7 +4793,7 @@ void kdb_send_sig(struct task_struct *t, int sig)
"the deadlock.\n");
return;
}
- ret = send_signal(sig, SEND_SIG_PRIV, t, PIDTYPE_PID);
+ ret = send_signal_locked(sig, SEND_SIG_PRIV, t, PIDTYPE_PID);
spin_unlock(&t->sighand->siglock);
if (ret)
kdb_printf("Fail to deliver Signal %d to process %d.\n",
--
2.35.3


2022-05-05 23:22:07

by Eric W. Biederman

Subject: [PATCH v3 08/11] ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs

Long ago and far away there was a BUG_ON at the start of ptrace_stop
that did "BUG_ON(!(current->ptrace & PT_PTRACED));" [1]. The BUG_ON
had never triggered but examination of the code showed that the BUG_ON
could actually trigger. To complement removing the BUG_ON an attempt
to better handle the race was added.

The code detected the tracer had gone away and did not call
do_notify_parent_cldstop. The code also attempted to prevent
ptrace_report_syscall from sending spurious SIGTRAPs when the tracer
went away.

The code to detect when the tracer had gone away before sending a
signal to the tracer was a legitimate fix and continues to work to this
date.

The code to prevent sending spurious SIGTRAPs is a failure. At the
time and until today the code only catches it when the tracer goes
away after siglock is dropped and before read_lock is acquired. If
the tracer goes away after read_lock is dropped a spurious SIGTRAP can
still be sent to the tracee. The tracer going away after read_lock
is dropped is the far likelier case as it is the bigger window.

Given that the attempt to prevent the generation of a SIGTRAP was a
failure and continues to be a failure remove the code that attempts to
do that. This simplifies the code in ptrace_stop and makes
ptrace_stop much easier to reason about.

To successfully deal with the tracer going away, all of the tracer's
instrumentation of the child would need to be removed, and reliably
detecting when the tracer has set a signal to continue with would need
to be implemented.

With the removal of the incomplete detection of the tracer going away
in ptrace_stop, ptrace_stop always sleeps in schedule after
ptrace_freeze_traced succeeds. Modify ptrace_check_attach to
warn if wait_task_inactive fails.
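
As a rough illustration only (not part of the patch), the new tail of
ptrace_check_attach() can be modelled in userspace. The names
model_warn_on_once() and model_check_attach_tail() are hypothetical, and
the boolean "inactive" stands in for the result of wait_task_inactive():

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical userspace stand-in for WARN_ON_ONCE(): report only the
 * first time cond is true, and always hand cond back to the caller.
 */
static bool model_warn_on_once(bool cond, bool *warned)
{
	assert(warned != NULL);
	if (cond && !*warned) {
		*warned = true;
		fprintf(stderr, "WARNING: traced task was not inactive\n");
	}
	return cond;
}

/*
 * Models only the tail of ptrace_check_attach() after this patch.
 * "inactive" is an assumption standing in for wait_task_inactive().
 */
static int model_check_attach_tail(int ret, bool ignore_state, bool inactive,
				   bool *warned)
{
	if (!ret && !ignore_state && model_warn_on_once(!inactive, warned))
		ret = -ESRCH;
	return ret;
}
```

In this sketch a failed wait still yields -ESRCH as before; the only
difference from the old code is that the failure is now reported once
instead of being silently tolerated.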

[1] 66519f549ae5 ("[PATCH] fix ptracer death race yielding bogus BUG_ON")
History-Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Signed-off-by: "Eric W. Biederman" <[email protected]>
---
kernel/ptrace.c | 14 ++-------
kernel/signal.c | 81 ++++++++++++++++++-------------------------------
2 files changed, 33 insertions(+), 62 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 7105821595bc..05953ac9f7bd 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -266,17 +266,9 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
}
read_unlock(&tasklist_lock);

- if (!ret && !ignore_state) {
- if (!wait_task_inactive(child, __TASK_TRACED)) {
- /*
- * This can only happen if may_ptrace_stop() fails and
- * ptrace_stop() changes ->state back to TASK_RUNNING,
- * so we should not worry about leaking __TASK_TRACED.
- */
- WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
- ret = -ESRCH;
- }
- }
+ if (!ret && !ignore_state &&
+ WARN_ON_ONCE(!wait_task_inactive(child, __TASK_TRACED)))
+ ret = -ESRCH;

return ret;
}
diff --git a/kernel/signal.c b/kernel/signal.c
index 3fd2ce133387..16828fde5424 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2187,8 +2187,8 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
* with. If the code did not stop because the tracer is gone,
* the stop signal remains unchanged unless clear_code.
*/
-static int ptrace_stop(int exit_code, int why, int clear_code,
- unsigned long message, kernel_siginfo_t *info)
+static int ptrace_stop(int exit_code, int why, unsigned long message,
+ kernel_siginfo_t *info)
__releases(&current->sighand->siglock)
__acquires(&current->sighand->siglock)
{
@@ -2259,54 +2259,33 @@ static int ptrace_stop(int exit_code, int why, int clear_code,

spin_unlock_irq(&current->sighand->siglock);
read_lock(&tasklist_lock);
- if (likely(current->ptrace)) {
- /*
- * Notify parents of the stop.
- *
- * While ptraced, there are two parents - the ptracer and
- * the real_parent of the group_leader. The ptracer should
- * know about every stop while the real parent is only
- * interested in the completion of group stop. The states
- * for the two don't interact with each other. Notify
- * separately unless they're gonna be duplicates.
- */
+ /*
+ * Notify parents of the stop.
+ *
+ * While ptraced, there are two parents - the ptracer and
+ * the real_parent of the group_leader. The ptracer should
+ * know about every stop while the real parent is only
+ * interested in the completion of group stop. The states
+ * for the two don't interact with each other. Notify
+ * separately unless they're gonna be duplicates.
+ */
+ if (current->ptrace)
do_notify_parent_cldstop(current, true, why);
- if (gstop_done && ptrace_reparented(current))
- do_notify_parent_cldstop(current, false, why);
+ if (gstop_done && (!current->ptrace || ptrace_reparented(current)))
+ do_notify_parent_cldstop(current, false, why);

- /*
- * Don't want to allow preemption here, because
- * sys_ptrace() needs this task to be inactive.
- *
- * XXX: implement read_unlock_no_resched().
- */
- preempt_disable();
- read_unlock(&tasklist_lock);
- cgroup_enter_frozen();
- preempt_enable_no_resched();
- freezable_schedule();
- cgroup_leave_frozen(true);
- } else {
- /*
- * By the time we got the lock, our tracer went away.
- * Don't drop the lock yet, another tracer may come.
- *
- * If @gstop_done, the ptracer went away between group stop
- * completion and here. During detach, it would have set
- * JOBCTL_STOP_PENDING on us and we'll re-enter
- * TASK_STOPPED in do_signal_stop() on return, so notifying
- * the real parent of the group stop completion is enough.
- */
- if (gstop_done)
- do_notify_parent_cldstop(current, false, why);
-
- /* tasklist protects us from ptrace_freeze_traced() */
- __set_current_state(TASK_RUNNING);
- read_code = false;
- if (clear_code)
- exit_code = 0;
- read_unlock(&tasklist_lock);
- }
+ /*
+ * Don't want to allow preemption here, because
+ * sys_ptrace() needs this task to be inactive.
+ *
+ * XXX: implement read_unlock_no_resched().
+ */
+ preempt_disable();
+ read_unlock(&tasklist_lock);
+ cgroup_enter_frozen();
+ preempt_enable_no_resched();
+ freezable_schedule();
+ cgroup_leave_frozen(true);

/*
* We are back. Now reacquire the siglock before touching
@@ -2343,7 +2322,7 @@ static int ptrace_do_notify(int signr, int exit_code, int why, unsigned long mes
info.si_uid = from_kuid_munged(current_user_ns(), current_uid());

/* Let the debugger run. */
- return ptrace_stop(exit_code, why, 1, message, &info);
+ return ptrace_stop(exit_code, why, message, &info);
}

int ptrace_notify(int exit_code, unsigned long message)
@@ -2515,7 +2494,7 @@ static void do_jobctl_trap(void)
CLD_STOPPED, 0);
} else {
WARN_ON_ONCE(!signr);
- ptrace_stop(signr, CLD_STOPPED, 0, 0, NULL);
+ ptrace_stop(signr, CLD_STOPPED, 0, NULL);
}
}

@@ -2568,7 +2547,7 @@ static int ptrace_signal(int signr, kernel_siginfo_t *info, enum pid_type type)
* comment in dequeue_signal().
*/
current->jobctl |= JOBCTL_STOP_DEQUEUED;
- signr = ptrace_stop(signr, CLD_TRAPPED, 0, 0, info);
+ signr = ptrace_stop(signr, CLD_TRAPPED, 0, info);

/* We're back. Did the debugger cancel the sig? */
if (signr == 0)
--
2.35.3


2022-05-06 01:50:11

by Eric W. Biederman

Subject: [PATCH v3 04/11] ptrace/xtensa: Replace PT_SINGLESTEP with TIF_SINGLESTEP

xtensa is the last user of the PT_SINGLESTEP flag. Changing tsk->ptrace in
user_enable_single_step and user_disable_single_step without locking could
potentially cause problems.

So use a thread info flag instead of a flag in tsk->ptrace. Use TIF_SINGLESTEP
that xtensa already had defined but unused.

Remove the definitions of PT_SINGLESTEP and PT_BLOCKSTEP as they have no more users.
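
For readers unfamiliar with the distinction, here is a minimal userspace
sketch of why a dedicated flag word with atomic bit operations is safer
than an unlocked "|=" on a shared word. It uses C11 atomics as a stand-in
for the kernel's set_tsk_thread_flag() family; struct model_thread_info,
the model_*() helpers and the bit number are assumptions made for this
example, not xtensa's real definitions:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a task's thread-info flags word. */
struct model_thread_info {
	atomic_ulong flags;
};

/* Bit number chosen only for this sketch. */
#define MODEL_TIF_SINGLESTEP 3

/* Atomic read-modify-write: concurrent updates to other bits are kept. */
static void model_set_flag(struct model_thread_info *ti, int bit)
{
	assert(bit >= 0 && bit < 32);
	atomic_fetch_or(&ti->flags, 1UL << bit);
}

static void model_clear_flag(struct model_thread_info *ti, int bit)
{
	assert(bit >= 0 && bit < 32);
	atomic_fetch_and(&ti->flags, ~(1UL << bit));
}

static bool model_test_flag(struct model_thread_info *ti, int bit)
{
	assert(bit >= 0 && bit < 32);
	return (atomic_load(&ti->flags) >> bit) & 1UL;
}
```

A plain "child->ptrace |= PT_SINGLESTEP" is a separate load, modify and
store, so an update made to another bit between the load and the store
can be lost; the atomic helpers above do not have that window.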

Cc: [email protected]
Acked-by: Max Filippov <[email protected]>
Signed-off-by: "Eric W. Biederman" <[email protected]>
---
arch/xtensa/kernel/ptrace.c | 4 ++--
arch/xtensa/kernel/signal.c | 4 ++--
include/linux/ptrace.h | 6 ------
3 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/arch/xtensa/kernel/ptrace.c b/arch/xtensa/kernel/ptrace.c
index 323c678a691f..b952e67cc0cc 100644
--- a/arch/xtensa/kernel/ptrace.c
+++ b/arch/xtensa/kernel/ptrace.c
@@ -225,12 +225,12 @@ const struct user_regset_view *task_user_regset_view(struct task_struct *task)

void user_enable_single_step(struct task_struct *child)
{
- child->ptrace |= PT_SINGLESTEP;
+ set_tsk_thread_flag(child, TIF_SINGLESTEP);
}

void user_disable_single_step(struct task_struct *child)
{
- child->ptrace &= ~PT_SINGLESTEP;
+ clear_tsk_thread_flag(child, TIF_SINGLESTEP);
}

/*
diff --git a/arch/xtensa/kernel/signal.c b/arch/xtensa/kernel/signal.c
index 6f68649e86ba..ac50ec46c8f1 100644
--- a/arch/xtensa/kernel/signal.c
+++ b/arch/xtensa/kernel/signal.c
@@ -473,7 +473,7 @@ static void do_signal(struct pt_regs *regs)
/* Set up the stack frame */
ret = setup_frame(&ksig, sigmask_to_save(), regs);
signal_setup_done(ret, &ksig, 0);
- if (current->ptrace & PT_SINGLESTEP)
+ if (test_thread_flag(TIF_SINGLESTEP))
task_pt_regs(current)->icountlevel = 1;

return;
@@ -499,7 +499,7 @@ static void do_signal(struct pt_regs *regs)
/* If there's no signal to deliver, we just restore the saved mask. */
restore_saved_sigmask();

- if (current->ptrace & PT_SINGLESTEP)
+ if (test_thread_flag(TIF_SINGLESTEP))
task_pt_regs(current)->icountlevel = 1;
return;
}
diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 4c06f9f8ef3f..c952c5ba8fab 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -46,12 +46,6 @@ extern int ptrace_access_vm(struct task_struct *tsk, unsigned long addr,
#define PT_EXITKILL (PTRACE_O_EXITKILL << PT_OPT_FLAG_SHIFT)
#define PT_SUSPEND_SECCOMP (PTRACE_O_SUSPEND_SECCOMP << PT_OPT_FLAG_SHIFT)

-/* single stepping state bits (used on ARM and PA-RISC) */
-#define PT_SINGLESTEP_BIT 31
-#define PT_SINGLESTEP (1<<PT_SINGLESTEP_BIT)
-#define PT_BLOCKSTEP_BIT 30
-#define PT_BLOCKSTEP (1<<PT_BLOCKSTEP_BIT)
-
extern long arch_ptrace(struct task_struct *child, long request,
unsigned long addr, unsigned long data);
extern int ptrace_readdata(struct task_struct *tsk, unsigned long src, char __user *dst, int len);
--
2.35.3


2022-05-06 02:51:44

by Eric W. Biederman

Subject: Re: [PATCH v3 09/11] ptrace: Don't change __state

Sebastian Andrzej Siewior <[email protected]> writes:

> On 2022-05-04 17:40:56 [-0500], Eric W. Biederman wrote:
>> Stop playing with tsk->__state to remove TASK_WAKEKILL while a ptrace
>> command is executing.
>>
>> Instead remove TASK_WAKEKILL from the definition of TASK_TRACED, and
>> implemention a new jobctl flag TASK_PTRACE_FROZEN. This new flag is
> implement ?

Yes. Thank you.

Eric

2022-05-06 16:04:49

by Oleg Nesterov

Subject: Re: [PATCH v3 08/11] ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs

On 05/04, Eric W. Biederman wrote:
>
> -static int ptrace_stop(int exit_code, int why, int clear_code,
> - unsigned long message, kernel_siginfo_t *info)
> +static int ptrace_stop(int exit_code, int why, unsigned long message,
> + kernel_siginfo_t *info)

Forgot to mention... but in general I like this change.

In particular, I like the fact it kills the ugly "int clear_code" arg
which looks as if it solves the problems with the exiting tracer, but
actually it doesn't. And we do not really care, imo.

Oleg.


Subject: Re: [PATCH v3 09/11] ptrace: Don't change __state

On 2022-05-04 17:40:56 [-0500], Eric W. Biederman wrote:
> Stop playing with tsk->__state to remove TASK_WAKEKILL while a ptrace
> command is executing.
>
> Instead remove TASK_WAKEKILL from the definition of TASK_TRACED, and
> implemention a new jobctl flag TASK_PTRACE_FROZEN. This new flag is
implement ?

> set in jobctl_freeze_task and cleared when ptrace_stop is awoken or in
> jobctl_unfreeze_task (when ptrace_stop remains asleep).

Sebastian

2022-05-06 23:06:20

by Eric W. Biederman

Subject: Re: [PATCH v3 08/11] ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs

Oleg Nesterov <[email protected]> writes:

> On 05/04, Eric W. Biederman wrote:
>>
>> -static int ptrace_stop(int exit_code, int why, int clear_code,
>> - unsigned long message, kernel_siginfo_t *info)
>> +static int ptrace_stop(int exit_code, int why, unsigned long message,
>> + kernel_siginfo_t *info)
>
> Forgot to mention... but in general I like this change.
>
> In particular, I like the fact it kills the ugly "int clear_code" arg
> which looks as if it solves the problems with the exiting tracer, but
> actually it doesn't. And we do not really care, imo.

Further either this change is necessary or we need to take siglock in
the !current->ptrace path in "ptrace: Don't change __state" so that
JOBCTL_TRACED can be cleared.

So I vote for deleting code, and making ptrace_stop easier to reason
about.

Eric

2022-05-07 08:05:40

by Eric W. Biederman

Subject: [PATCH v3 07/11] ptrace: Reimplement PTRACE_KILL by always sending SIGKILL

The current implementation of PTRACE_KILL is buggy and has been for
many years as it assumes its target has stopped in ptrace_stop. At a
quick skim it looks like this assumption has existed since ptrace
support was added in linux v1.0.

While PTRACE_KILL has been deprecated we can not remove it as
a quick search with google code search reveals many existing
programs calling it.

When the ptracee is not stopped at ptrace_stop some fields would be
set that are ignored except in ptrace_stop, making the userspace
visible behavior of PTRACE_KILL a noop in those cases.

As the usual rules are not obeyed it is not clear what the
consequences are of calling PTRACE_KILL on a running process.
Presumably userspace does not do this as it achieves nothing.

Replace the implementation of PTRACE_KILL with a simple
send_sig_info(SIGKILL) followed by a return 0. This changes the
observable user space behavior only in that PTRACE_KILL on a process
not stopped in ptrace_stop will also kill it. As that has always
been the intent of the code this seems like a reasonable change.
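
From userspace the closest equivalent of the new behavior is simply
sending SIGKILL with kill(2), which works whether or not the target is
stopped. The sketch below is an illustration under that assumption; it
does not use ptrace at all, and kill_sleeping_child() is a hypothetical
helper name:

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that only sleeps in pause(), kill it with SIGKILL and
 * return the signal that terminated it, or -1 on any failure.
 */
static int kill_sleeping_child(void)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* Child: never stops in a ptrace stop, just sleeps. */
		for (;;)
			pause();
	}

	/* Only the parent reaches this point. */
	assert(pid > 0);
	if (kill(pid, SIGKILL) != 0)
		return -1;
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

The child here is deliberately not in a ptrace stop, which is the case
the old PTRACE_KILL implementation silently ignored.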

Cc: [email protected]
Reported-by: Al Viro <[email protected]>
Suggested-by: Al Viro <[email protected]>
Signed-off-by: "Eric W. Biederman" <[email protected]>
---
arch/x86/kernel/step.c | 3 +--
kernel/ptrace.c | 5 ++---
2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/step.c b/arch/x86/kernel/step.c
index 0f3c307b37b3..8e2b2552b5ee 100644
--- a/arch/x86/kernel/step.c
+++ b/arch/x86/kernel/step.c
@@ -180,8 +180,7 @@ void set_task_blockstep(struct task_struct *task, bool on)
*
* NOTE: this means that set/clear TIF_BLOCKSTEP is only safe if
* task is current or it can't be running, otherwise we can race
- * with __switch_to_xtra(). We rely on ptrace_freeze_traced() but
- * PTRACE_KILL is not safe.
+ * with __switch_to_xtra(). We rely on ptrace_freeze_traced().
*/
local_irq_disable();
debugctl = get_debugctlmsr();
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index da30dcd477a0..7105821595bc 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -1236,9 +1236,8 @@ int ptrace_request(struct task_struct *child, long request,
return ptrace_resume(child, request, data);

case PTRACE_KILL:
- if (child->exit_state) /* already dead */
- return 0;
- return ptrace_resume(child, request, SIGKILL);
+ send_sig_info(SIGKILL, SEND_SIG_NOINFO, child);
+ return 0;

#ifdef CONFIG_HAVE_ARCH_TRACEHOOK
case PTRACE_GETREGSET:
--
2.35.3


2022-05-08 19:18:51

by Oleg Nesterov

Subject: Re: [PATCH v3 08/11] ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs

On 05/04, Eric W. Biederman wrote:
>
> With the removal of the incomplete detection of the tracer going away
> in ptrace_stop, ptrace_stop always sleeps in schedule after
> ptrace_freeze_traced succeeds. Modify ptrace_check_attach to
> warn if wait_task_inactive fails.

Oh. Again, I don't understand the changelog. If we forget about RT,
ptrace_stop() will always sleep if ptrace_freeze_traced() succeeds.
may_ptrace_stop() has gone.

IOW. Lets forget about RT

> --- a/kernel/ptrace.c
> +++ b/kernel/ptrace.c
> @@ -266,17 +266,9 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
> }
> read_unlock(&tasklist_lock);
>
> - if (!ret && !ignore_state) {
> - if (!wait_task_inactive(child, __TASK_TRACED)) {
> - /*
> - * This can only happen if may_ptrace_stop() fails and
> - * ptrace_stop() changes ->state back to TASK_RUNNING,
> - * so we should not worry about leaking __TASK_TRACED.
> - */
> - WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
> - ret = -ESRCH;
> - }
> - }
> + if (!ret && !ignore_state &&
> + WARN_ON_ONCE(!wait_task_inactive(child, __TASK_TRACED)))
> + ret = -ESRCH;
>
> return ret;
> }

Why do you think this change would be wrong without any other changes?

Oleg.


2022-05-09 01:30:38

by Eric W. Biederman

Subject: [PATCH v3 03/11] ptrace/um: Replace PT_DTRACE with TIF_SINGLESTEP

User mode linux is the last user of the PT_DTRACE flag. Using the flag to indicate
single stepping is a little confusing and worse changing tsk->ptrace without locking
could potentially cause problems.

So use a thread info flag with a better name instead of a flag in tsk->ptrace.

Remove the definition of PT_DTRACE as uml is the last user.

Cc: [email protected]
Acked-by: Johannes Berg <[email protected]>
Signed-off-by: "Eric W. Biederman" <[email protected]>
---
arch/um/include/asm/thread_info.h | 2 ++
arch/um/kernel/exec.c | 2 +-
arch/um/kernel/process.c | 2 +-
arch/um/kernel/ptrace.c | 8 ++++----
arch/um/kernel/signal.c | 4 ++--
include/linux/ptrace.h | 1 -
6 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/arch/um/include/asm/thread_info.h b/arch/um/include/asm/thread_info.h
index 1395cbd7e340..c7b4b49826a2 100644
--- a/arch/um/include/asm/thread_info.h
+++ b/arch/um/include/asm/thread_info.h
@@ -60,6 +60,7 @@ static inline struct thread_info *current_thread_info(void)
#define TIF_RESTORE_SIGMASK 7
#define TIF_NOTIFY_RESUME 8
#define TIF_SECCOMP 9 /* secure computing */
+#define TIF_SINGLESTEP 10 /* single stepping userspace */

#define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE)
#define _TIF_SIGPENDING (1 << TIF_SIGPENDING)
@@ -68,5 +69,6 @@ static inline struct thread_info *current_thread_info(void)
#define _TIF_MEMDIE (1 << TIF_MEMDIE)
#define _TIF_SYSCALL_AUDIT (1 << TIF_SYSCALL_AUDIT)
#define _TIF_SECCOMP (1 << TIF_SECCOMP)
+#define _TIF_SINGLESTEP (1 << TIF_SINGLESTEP)

#endif
diff --git a/arch/um/kernel/exec.c b/arch/um/kernel/exec.c
index c85e40c72779..58938d75871a 100644
--- a/arch/um/kernel/exec.c
+++ b/arch/um/kernel/exec.c
@@ -43,7 +43,7 @@ void start_thread(struct pt_regs *regs, unsigned long eip, unsigned long esp)
{
PT_REGS_IP(regs) = eip;
PT_REGS_SP(regs) = esp;
- current->ptrace &= ~PT_DTRACE;
+ clear_thread_flag(TIF_SINGLESTEP);
#ifdef SUBARCH_EXECVE1
SUBARCH_EXECVE1(regs->regs);
#endif
diff --git a/arch/um/kernel/process.c b/arch/um/kernel/process.c
index 80504680be08..88c5c7844281 100644
--- a/arch/um/kernel/process.c
+++ b/arch/um/kernel/process.c
@@ -335,7 +335,7 @@ int singlestepping(void * t)
{
struct task_struct *task = t ? t : current;

- if (!(task->ptrace & PT_DTRACE))
+ if (!test_thread_flag(TIF_SINGLESTEP))
return 0;

if (task->thread.singlestep_syscall)
diff --git a/arch/um/kernel/ptrace.c b/arch/um/kernel/ptrace.c
index bfaf6ab1ac03..5154b27de580 100644
--- a/arch/um/kernel/ptrace.c
+++ b/arch/um/kernel/ptrace.c
@@ -11,7 +11,7 @@

void user_enable_single_step(struct task_struct *child)
{
- child->ptrace |= PT_DTRACE;
+ set_tsk_thread_flag(child, TIF_SINGLESTEP);
child->thread.singlestep_syscall = 0;

#ifdef SUBARCH_SET_SINGLESTEPPING
@@ -21,7 +21,7 @@ void user_enable_single_step(struct task_struct *child)

void user_disable_single_step(struct task_struct *child)
{
- child->ptrace &= ~PT_DTRACE;
+ clear_tsk_thread_flag(child, TIF_SINGLESTEP);
child->thread.singlestep_syscall = 0;

#ifdef SUBARCH_SET_SINGLESTEPPING
@@ -120,7 +120,7 @@ static void send_sigtrap(struct uml_pt_regs *regs, int error_code)
}

/*
- * XXX Check PT_DTRACE vs TIF_SINGLESTEP for singlestepping check and
+ * XXX Check TIF_SINGLESTEP for singlestepping check and
* PT_PTRACED vs TIF_SYSCALL_TRACE for syscall tracing check
*/
int syscall_trace_enter(struct pt_regs *regs)
@@ -144,7 +144,7 @@ void syscall_trace_leave(struct pt_regs *regs)
audit_syscall_exit(regs);

/* Fake a debug trap */
- if (ptraced & PT_DTRACE)
+ if (test_thread_flag(TIF_SINGLESTEP))
send_sigtrap(&regs->regs, 0);

if (!test_thread_flag(TIF_SYSCALL_TRACE))
diff --git a/arch/um/kernel/signal.c b/arch/um/kernel/signal.c
index 88cd9b5c1b74..ae4658f576ab 100644
--- a/arch/um/kernel/signal.c
+++ b/arch/um/kernel/signal.c
@@ -53,7 +53,7 @@ static void handle_signal(struct ksignal *ksig, struct pt_regs *regs)
unsigned long sp;
int err;

- if ((current->ptrace & PT_DTRACE) && (current->ptrace & PT_PTRACED))
+ if (test_thread_flag(TIF_SINGLESTEP) && (current->ptrace & PT_PTRACED))
singlestep = 1;

/* Did we come from a system call? */
@@ -128,7 +128,7 @@ void do_signal(struct pt_regs *regs)
* on the host. The tracing thread will check this flag and
* PTRACE_SYSCALL if necessary.
*/
- if (current->ptrace & PT_DTRACE)
+ if (test_thread_flag(TIF_SINGLESTEP))
current->thread.singlestep_syscall =
is_syscall(PT_REGS_IP(&current->thread.regs));

diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 15b3d176b6b4..4c06f9f8ef3f 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -30,7 +30,6 @@ extern int ptrace_access_vm(struct task_struct *tsk, unsigned long addr,

#define PT_SEIZED 0x00010000 /* SEIZE used, enable new behavior */
#define PT_PTRACED 0x00000001
-#define PT_DTRACE 0x00000002 /* delayed trace (used on m68k, i386) */

#define PT_OPT_FLAG_SHIFT 3
/* PT_TRACE_* event enable flags */
--
2.35.3


2022-05-09 02:01:51

by Eric W. Biederman

Subject: [PATCH v3 11/11] sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state

From: Peter Zijlstra <[email protected]>

Currently ptrace_stop() / do_signal_stop() rely on the special states
TASK_TRACED and TASK_STOPPED resp. to keep unique state. That is, this
state exists only in task->__state and nowhere else.

There's two spots of bother with this:

- PREEMPT_RT has task->saved_state which complicates matters,
meaning task_is_{traced,stopped}() needs to check an additional
variable.

- An alternative freezer implementation that itself relies on a
special TASK state would lose TASK_TRACED/TASK_STOPPED and will
result in misbehaviour.

As such, add additional state to task->jobctl to track this state
outside of task->__state.

NOTE: this doesn't actually fix anything yet, just adds extra state.

--EWB
* didn't add an unnecessary newline in signal.h
* Update t->jobctl in signal_wake_up and ptrace_signal_wake_up
instead of in signal_wake_up_state. This prevents the clearing
of TASK_STOPPED and TASK_TRACED from getting lost.
* Added warnings if JOBCTL_STOPPED or JOBCTL_TRACED are not cleared
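
The wake-up side of the notes above can be sketched in userspace as
follows. The control flow mirrors the signal.h hunk of this patch; the
struct, the siglock_held field (a crude stand-in for
lockdep_assert_held()) and the MODEL_* state values are assumptions made
for the example:

```c
#include <assert.h>
#include <stdbool.h>

/* Bit numbers as in the jobctl.h hunk of this patch. */
#define JOBCTL_PTRACE_FROZEN_BIT 24
#define JOBCTL_STOPPED_BIT	26
#define JOBCTL_TRACED_BIT	27
#define JOBCTL_PTRACE_FROZEN	(1UL << JOBCTL_PTRACE_FROZEN_BIT)
#define JOBCTL_STOPPED		(1UL << JOBCTL_STOPPED_BIT)
#define JOBCTL_TRACED		(1UL << JOBCTL_TRACED_BIT)

/* Illustrative state values for this sketch only. */
#define MODEL_TASK_TRACED	0x0008u
#define MODEL_TASK_WAKEKILL	0x0100u

/* Hypothetical reduced task: jobctl plus a "siglock is held" marker. */
struct wake_model_task {
	unsigned long jobctl;
	bool siglock_held;
};

/* Returns the state mask that would go to signal_wake_up_state(). */
static unsigned int model_signal_wake_up(struct wake_model_task *t, bool fatal)
{
	unsigned int state = 0;

	assert(t->siglock_held);
	if (fatal && !(t->jobctl & JOBCTL_PTRACE_FROZEN)) {
		t->jobctl &= ~(JOBCTL_STOPPED | JOBCTL_TRACED);
		state = MODEL_TASK_WAKEKILL | MODEL_TASK_TRACED;
	}
	return state;
}

static unsigned int model_ptrace_signal_wake_up(struct wake_model_task *t,
						bool resume)
{
	unsigned int state = 0;

	assert(t->siglock_held);
	if (resume) {
		t->jobctl &= ~JOBCTL_TRACED;
		state = MODEL_TASK_TRACED;
	}
	return state;
}
```

Clearing the jobctl bits in the same place that decides the wake-up mask
means a task frozen for ptrace keeps JOBCTL_TRACED even when a fatal
signal arrives, which is the case the second note is concerned with.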

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Eric W. Biederman <[email protected]>
---
include/linux/sched.h | 8 +++-----
include/linux/sched/jobctl.h | 6 ++++++
include/linux/sched/signal.h | 19 +++++++++++++++----
kernel/ptrace.c | 16 +++++++++++++---
kernel/signal.c | 10 ++++++++--
5 files changed, 45 insertions(+), 14 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 610f2fdb1e2c..cbe5c899599c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -118,11 +118,9 @@ struct task_group;

#define task_is_running(task) (READ_ONCE((task)->__state) == TASK_RUNNING)

-#define task_is_traced(task) ((READ_ONCE(task->__state) & __TASK_TRACED) != 0)
-
-#define task_is_stopped(task) ((READ_ONCE(task->__state) & __TASK_STOPPED) != 0)
-
-#define task_is_stopped_or_traced(task) ((READ_ONCE(task->__state) & (__TASK_STOPPED | __TASK_TRACED)) != 0)
+#define task_is_traced(task) ((READ_ONCE(task->jobctl) & JOBCTL_TRACED) != 0)
+#define task_is_stopped(task) ((READ_ONCE(task->jobctl) & JOBCTL_STOPPED) != 0)
+#define task_is_stopped_or_traced(task) ((READ_ONCE(task->jobctl) & (JOBCTL_STOPPED | JOBCTL_TRACED)) != 0)

/*
* Special states are those that do not use the normal wait-loop pattern. See
diff --git a/include/linux/sched/jobctl.h b/include/linux/sched/jobctl.h
index d556c3425963..68876d0a7ef9 100644
--- a/include/linux/sched/jobctl.h
+++ b/include/linux/sched/jobctl.h
@@ -21,6 +21,9 @@ struct task_struct;
#define JOBCTL_TRAP_FREEZE_BIT 23 /* trap for cgroup freezer */
#define JOBCTL_PTRACE_FROZEN_BIT 24 /* frozen for ptrace */

+#define JOBCTL_STOPPED_BIT 26 /* do_signal_stop() */
+#define JOBCTL_TRACED_BIT 27 /* ptrace_stop() */
+
#define JOBCTL_STOP_DEQUEUED (1UL << JOBCTL_STOP_DEQUEUED_BIT)
#define JOBCTL_STOP_PENDING (1UL << JOBCTL_STOP_PENDING_BIT)
#define JOBCTL_STOP_CONSUME (1UL << JOBCTL_STOP_CONSUME_BIT)
@@ -31,6 +34,9 @@ struct task_struct;
#define JOBCTL_TRAP_FREEZE (1UL << JOBCTL_TRAP_FREEZE_BIT)
#define JOBCTL_PTRACE_FROZEN (1UL << JOBCTL_PTRACE_FROZEN_BIT)

+#define JOBCTL_STOPPED (1UL << JOBCTL_STOPPED_BIT)
+#define JOBCTL_TRACED (1UL << JOBCTL_TRACED_BIT)
+
#define JOBCTL_TRAP_MASK (JOBCTL_TRAP_STOP | JOBCTL_TRAP_NOTIFY)
#define JOBCTL_PENDING_MASK (JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK)

diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index e66948abbee4..07ba3404fcde 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -294,8 +294,10 @@ static inline int kernel_dequeue_signal(void)
static inline void kernel_signal_stop(void)
{
spin_lock_irq(&current->sighand->siglock);
- if (current->jobctl & JOBCTL_STOP_DEQUEUED)
+ if (current->jobctl & JOBCTL_STOP_DEQUEUED) {
+ current->jobctl |= JOBCTL_STOPPED;
set_special_state(TASK_STOPPED);
+ }
spin_unlock_irq(&current->sighand->siglock);

schedule();
@@ -437,12 +439,21 @@ extern void signal_wake_up_state(struct task_struct *t, unsigned int state);

static inline void signal_wake_up(struct task_struct *t, bool fatal)
{
- fatal = fatal && !(t->jobctl & JOBCTL_PTRACE_FROZEN);
- signal_wake_up_state(t, fatal ? TASK_WAKEKILL | __TASK_TRACED : 0);
+ unsigned int state = 0;
+ if (fatal && !(t->jobctl & JOBCTL_PTRACE_FROZEN)) {
+ t->jobctl &= ~(JOBCTL_STOPPED | JOBCTL_TRACED);
+ state = TASK_WAKEKILL | __TASK_TRACED;
+ }
+ signal_wake_up_state(t, state);
}
static inline void ptrace_signal_wake_up(struct task_struct *t, bool resume)
{
- signal_wake_up_state(t, resume ? __TASK_TRACED : 0);
+ unsigned int state = 0;
+ if (resume) {
+ t->jobctl &= ~JOBCTL_TRACED;
+ state = __TASK_TRACED;
+ }
+ signal_wake_up_state(t, state);
}

void task_join_group_stop(struct task_struct *task);
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 36a5b7a00d2f..328a34a99124 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -185,7 +185,12 @@ static bool looks_like_a_spurious_pid(struct task_struct *task)
return true;
}

-/* Ensure that nothing can wake it up, even SIGKILL */
+/*
+ * Ensure that nothing can wake it up, even SIGKILL
+ *
+ * A task is switched to this state while a ptrace operation is in progress;
+ * such that the ptrace operation is uninterruptible.
+ */
static bool ptrace_freeze_traced(struct task_struct *task)
{
bool ret = false;
@@ -216,8 +221,10 @@ static void ptrace_unfreeze_traced(struct task_struct *task)
*/
if (lock_task_sighand(task, &flags)) {
task->jobctl &= ~JOBCTL_PTRACE_FROZEN;
- if (__fatal_signal_pending(task))
+ if (__fatal_signal_pending(task)) {
+ task->jobctl &= ~JOBCTL_TRACED;
wake_up_state(task, __TASK_TRACED);
+ }
unlock_task_sighand(task, &flags);
}
}
@@ -462,8 +469,10 @@ static int ptrace_attach(struct task_struct *task, long request,
* in and out of STOPPED are protected by siglock.
*/
if (task_is_stopped(task) &&
- task_set_jobctl_pending(task, JOBCTL_TRAP_STOP | JOBCTL_TRAPPING))
+ task_set_jobctl_pending(task, JOBCTL_TRAP_STOP | JOBCTL_TRAPPING)) {
+ task->jobctl &= ~JOBCTL_STOPPED;
signal_wake_up_state(task, __TASK_STOPPED);
+ }

spin_unlock(&task->sighand->siglock);

@@ -875,6 +884,7 @@ static int ptrace_resume(struct task_struct *child, long request,
*/
spin_lock_irq(&child->sighand->siglock);
child->exit_code = data;
+ child->jobctl &= ~JOBCTL_TRACED;
wake_up_state(child, __TASK_TRACED);
spin_unlock_irq(&child->sighand->siglock);

diff --git a/kernel/signal.c b/kernel/signal.c
index e0b416b21ad3..80108017783d 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -762,7 +762,10 @@ static int dequeue_synchronous_signal(kernel_siginfo_t *info)
*/
void signal_wake_up_state(struct task_struct *t, unsigned int state)
{
+ lockdep_assert_held(&t->sighand->siglock);
+
set_tsk_thread_flag(t, TIF_SIGPENDING);
+
/*
* TASK_WAKEKILL also means wake it up in the stopped/traced/killable
* case. We don't check t->state here because there is a race with it
@@ -930,9 +933,10 @@ static bool prepare_signal(int sig, struct task_struct *p, bool force)
for_each_thread(p, t) {
flush_sigqueue_mask(&flush, &t->pending);
task_clear_jobctl_pending(t, JOBCTL_STOP_PENDING);
- if (likely(!(t->ptrace & PT_SEIZED)))
+ if (likely(!(t->ptrace & PT_SEIZED))) {
+ t->jobctl &= ~JOBCTL_STOPPED;
wake_up_state(t, __TASK_STOPPED);
- else
+ } else
ptrace_trap_notify(t);
}

@@ -2218,6 +2222,7 @@ static int ptrace_stop(int exit_code, int why, unsigned long message,
return exit_code;

set_special_state(TASK_TRACED);
+ current->jobctl |= JOBCTL_TRACED;

/*
* We're committing to trapping. TRACED should be visible before
@@ -2437,6 +2442,7 @@ static bool do_signal_stop(int signr)
if (task_participate_group_stop(current))
notify = CLD_STOPPED;

+ current->jobctl |= JOBCTL_STOPPED;
set_special_state(TASK_STOPPED);
spin_unlock_irq(&current->sighand->siglock);

--
2.35.3


2022-05-09 02:06:30

by Oleg Nesterov

Subject: Re: [PATCH v3 08/11] ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs

On 05/05, Eric W. Biederman wrote:
>
> And yes that WARN_ON_ONCE can trigger on PREEMPT_RT but that is just
> because PREMPT_RT is currently broken with respect to ptrace. Which
> makes a WARN_ON_ONCE appropriate.

Yes agreed. In this case WARN_ON_ONCE() can help a user to understand
that a failure was caused by the kernel problem which we need to fix
anyway.

Oleg.


2022-05-09 02:16:46

by Oleg Nesterov

Subject: Re: [PATCH v3 08/11] ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs

On 05/04, Eric W. Biederman wrote:
>
> -static int ptrace_stop(int exit_code, int why, int clear_code,
> - unsigned long message, kernel_siginfo_t *info)
> +static int ptrace_stop(int exit_code, int why, unsigned long message,
> + kernel_siginfo_t *info)
> __releases(&current->sighand->siglock)
> __acquires(&current->sighand->siglock)
> {
> @@ -2259,54 +2259,33 @@ static int ptrace_stop(int exit_code, int why, int clear_code,
>
> spin_unlock_irq(&current->sighand->siglock);
> read_lock(&tasklist_lock);
> - if (likely(current->ptrace)) {
> - /*
> - * Notify parents of the stop.
> - *
> - * While ptraced, there are two parents - the ptracer and
> - * the real_parent of the group_leader. The ptracer should
> - * know about every stop while the real parent is only
> - * interested in the completion of group stop. The states
> - * for the two don't interact with each other. Notify
> - * separately unless they're gonna be duplicates.
> - */
> + /*
> + * Notify parents of the stop.
> + *
> + * While ptraced, there are two parents - the ptracer and
> + * the real_parent of the group_leader. The ptracer should
> + * know about every stop while the real parent is only
> + * interested in the completion of group stop. The states
> + * for the two don't interact with each other. Notify
> + * separately unless they're gonna be duplicates.
> + */
> + if (current->ptrace)
> do_notify_parent_cldstop(current, true, why);
> - if (gstop_done && ptrace_reparented(current))
> - do_notify_parent_cldstop(current, false, why);
> + if (gstop_done && (!current->ptrace || ptrace_reparented(current)))
> + do_notify_parent_cldstop(current, false, why);
>
> - /*
> - * Don't want to allow preemption here, because
> - * sys_ptrace() needs this task to be inactive.
> - *
> - * XXX: implement read_unlock_no_resched().
> - */
> - preempt_disable();
> - read_unlock(&tasklist_lock);
> - cgroup_enter_frozen();
> - preempt_enable_no_resched();
> - freezable_schedule();
> - cgroup_leave_frozen(true);
> - } else {
> - /*
> - * By the time we got the lock, our tracer went away.
> - * Don't drop the lock yet, another tracer may come.
> - *
> - * If @gstop_done, the ptracer went away between group stop
> - * completion and here. During detach, it would have set
> - * JOBCTL_STOP_PENDING on us and we'll re-enter
> - * TASK_STOPPED in do_signal_stop() on return, so notifying
> - * the real parent of the group stop completion is enough.
> - */
> - if (gstop_done)
> - do_notify_parent_cldstop(current, false, why);
> -
> - /* tasklist protects us from ptrace_freeze_traced() */
> - __set_current_state(TASK_RUNNING);
> - read_code = false;
> - if (clear_code)
> - exit_code = 0;
> - read_unlock(&tasklist_lock);
> - }
> + /*
> + * Don't want to allow preemption here, because
> + * sys_ptrace() needs this task to be inactive.
> + *
> + * XXX: implement read_unlock_no_resched().
> + */
> + preempt_disable();
> + read_unlock(&tasklist_lock);
> + cgroup_enter_frozen();
> + preempt_enable_no_resched();
> + freezable_schedule();

I must have missed something.

So the tracee calls ptrace_notify(), but the debugger goes away before
ptrace_notify() takes siglock. After that, the no-longer-traced task
will sleep in TASK_TRACED?

Looks like ptrace_stop() needs to check current->ptrace before it does
set_special_state(TASK_TRACED) with siglock held? Then we can rely on
ptrace_unlink(), which will wake the tracee up even if the debugger exits.

No?

Oleg.


2022-05-09 03:58:40

by Eric W. Biederman

Subject: [PATCH v3 10/11] ptrace: Always take siglock in ptrace_resume

Make code analysis simpler and future changes easier by
always taking siglock in ptrace_resume.

Signed-off-by: "Eric W. Biederman" <[email protected]>
---
kernel/ptrace.c | 13 ++-----------
1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 83ed28262708..36a5b7a00d2f 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -837,8 +837,6 @@ static long ptrace_get_rseq_configuration(struct task_struct *task,
static int ptrace_resume(struct task_struct *child, long request,
unsigned long data)
{
- bool need_siglock;
-
if (!valid_signal(data))
return -EIO;

@@ -874,18 +872,11 @@ static int ptrace_resume(struct task_struct *child, long request,
* Note that we need siglock even if ->exit_code == data and/or this
* status was not reported yet, the new status must not be cleared by
* wait_task_stopped() after resume.
- *
- * If data == 0 we do not care if wait_task_stopped() reports the old
- * status and clears the code too; this can't race with the tracee, it
- * takes siglock after resume.
*/
- need_siglock = data && !thread_group_empty(current);
- if (need_siglock)
- spin_lock_irq(&child->sighand->siglock);
+ spin_lock_irq(&child->sighand->siglock);
child->exit_code = data;
wake_up_state(child, __TASK_TRACED);
- if (need_siglock)
- spin_unlock_irq(&child->sighand->siglock);
+ spin_unlock_irq(&child->sighand->siglock);

return 0;
}
--
2.35.3


2022-05-09 05:00:04

by Eric W. Biederman

Subject: Re: [PATCH v3 08/11] ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs

Oleg Nesterov <[email protected]> writes:

> On 05/04, Eric W. Biederman wrote:
>>
>> With the removal of the incomplete detection of the tracer going away
>> in ptrace_stop, ptrace_stop always sleeps in schedule after
>> ptrace_freeze_traced succeeds. Modify ptrace_check_attach to
>> warn if wait_task_inactive fails.
>
> Oh. Again, I don't understand the changelog. If we forget about RT,
> ptrace_stop() will always sleep if ptrace_freeze_traced() succeeds.
> may_ptrace_stop() has gone.
>
> IOW. Lets forget about RT
>
>> --- a/kernel/ptrace.c
>> +++ b/kernel/ptrace.c
>> @@ -266,17 +266,9 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
>> }
>> read_unlock(&tasklist_lock);
>>
>> - if (!ret && !ignore_state) {
>> - if (!wait_task_inactive(child, __TASK_TRACED)) {
>> - /*
>> - * This can only happen if may_ptrace_stop() fails and
>> - * ptrace_stop() changes ->state back to TASK_RUNNING,
>> - * so we should not worry about leaking __TASK_TRACED.
>> - */
>> - WARN_ON(READ_ONCE(child->__state) == __TASK_TRACED);
>> - ret = -ESRCH;
>> - }
>> - }
>> + if (!ret && !ignore_state &&
>> + WARN_ON_ONCE(!wait_task_inactive(child, __TASK_TRACED)))
>> + ret = -ESRCH;
>>
>> return ret;
>> }
>
> Why do you think this change would be wrong without any other changes?

For purposes of this analysis ptrace_detach and ptrace_exit (when the
tracer exits) can't happen. So the bug you spotted in ptrace_stop does
not apply.

I was thinking that the test against !current->ptrace that replaced
the old may_ptrace_stop could trigger a failure here. If
ptrace_freeze_traced happens before that test, that branch clearly
cannot be taken.

*Looks twice* ptrace_check_attach and ptrace_stop both taking a
read_lock on tasklist_lock does not protect them against each other,
but the write_lock on tasklist_lock in ptrace_attach does protect
against a ptrace_attach coming in after the test and before
__set_current_state(TASK_RUNNING).

So yes. I should really split that part out into its own patch.
And yes, that WARN_ON_ONCE can trigger on PREEMPT_RT, but that is just
because PREEMPT_RT is currently broken with respect to ptrace, which
makes a WARN_ON_ONCE appropriate.

I will see how much of this analysis I can put in the changelog.

Thank you,
Eric


2022-05-09 05:58:50

by Eric W. Biederman

Subject: [PATCH v4 0/12] ptrace: cleaning up ptrace_stop


The states TASK_STOPPED and TASK_TRACED are special in that they cannot
handle spurious wake-ups. This, plus actively depending upon and
changing the value of tsk->__state, causes problems for PREEMPT_RT and
Peter's freezer rewrite.

There are a lot of details we have to get right to sort out the
technical challenges, and this is my pared-back version of the changes
that contains just those problems I see good solutions to that I believe
are ready.

A couple of issues have been pointed out, but I think this pared-back set
of changes is still on the right track. The biggest change in v4 is the
split of "ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs" into
two patches, because the dependency I thought existed between two
different changes did not exist. The rest of the changes are minor
tweaks to "ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs":
removing an always true branch, and adding an early test to see if the
ptracer had gone, before TASK_TRAPPING was set.

This set of changes should support Peter's freezer rewrite, and with the
addition of changing wait_task_inactive(TASK_TRACED) to be
wait_task_inactive(0) in ptrace_check_attach I don't think there are any
races or issues to be concerned about from the ptrace side.

More work is needed to support PREEMPT_RT, but these changes get things
closer.

This set of changes continues to look like it will provide a firm
foundation for solving the PREEMPT_RT and freezer challenges.

Eric W. Biederman (11):
signal: Rename send_signal send_signal_locked
signal: Replace __group_send_sig_info with send_signal_locked
ptrace/um: Replace PT_DTRACE with TIF_SINGLESTEP
ptrace/xtensa: Replace PT_SINGLESTEP with TIF_SINGLESTEP
ptrace: Remove arch_ptrace_attach
signal: Use lockdep_assert_held instead of assert_spin_locked
ptrace: Reimplement PTRACE_KILL by always sending SIGKILL
ptrace: Document that wait_task_inactive can't fail
ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs
ptrace: Don't change __state
ptrace: Always take siglock in ptrace_resume

Peter Zijlstra (1):
sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state

arch/ia64/include/asm/ptrace.h | 4 --
arch/ia64/kernel/ptrace.c | 57 ----------------
arch/um/include/asm/thread_info.h | 2 +
arch/um/kernel/exec.c | 2 +-
arch/um/kernel/process.c | 2 +-
arch/um/kernel/ptrace.c | 8 +--
arch/um/kernel/signal.c | 4 +-
arch/x86/kernel/step.c | 3 +-
arch/xtensa/kernel/ptrace.c | 4 +-
arch/xtensa/kernel/signal.c | 4 +-
drivers/tty/tty_jobctrl.c | 4 +-
include/linux/ptrace.h | 7 --
include/linux/sched.h | 10 ++-
include/linux/sched/jobctl.h | 8 +++
include/linux/sched/signal.h | 20 ++++--
include/linux/signal.h | 3 +-
kernel/ptrace.c | 87 ++++++++---------------
kernel/sched/core.c | 5 +-
kernel/signal.c | 140 +++++++++++++++++---------------------
kernel/time/posix-cpu-timers.c | 6 +-
20 files changed, 140 insertions(+), 240 deletions(-)

Eric

2022-05-09 06:37:26

by Eric W. Biederman

Subject: Re: [PATCH v2 07/12] ptrace: Don't change __state

"Eric W. Biederman" <[email protected]> writes:

> Oleg Nesterov <[email protected]> writes:
>
>> On 05/03, Eric W. Biederman wrote:
>>>
>>> Oleg Nesterov <[email protected]> writes:
>>>
>>> > But why is it bad if the tracee doesn't sleep in schedule ? If it races
>>> > with SIGKILL. I still can't understand this.
>>> >
>>> > Yes, wait_task_inactive() can fail, so you need to remove WARN_ON_ONCE()
>>> > in 11/12.
>>>
>>> >
>>> > Why is removing TASK_WAKEKILL from TASK_TRACED and complicating
>>> > *signal_wake_up() better?
>>>
>>> Not changing __state is better because it removes special cases
>>> from the scheduler that only apply to ptrace.
>>
>> Hmm. But I didn't argue with that? I like the idea of JOBCTL_TASK_FROZEN.
>>
>> I meant, I do not think that removing KILLABLE from TASK_TRACED (not
>> from __state) and complicating *signal_wake_up() (I mean, compared
>> to your previous version) is a good idea.
>>
>> And. At least in context of this series it is fine if the JOBCTL_TASK_FROZEN
>> tracee do not block in schedule(), just you need to remove WARN_ON_ONCE()
>> around wait_task_inactive().
>>
>>> > And even if we need to ensure the tracee will always block after
>>> > ptrace_freeze_traced(), we can change signal_pending_state() to
>>> > return false if JOBCTL_PTRACE_FROZEN. Much simpler, imo. But still
>>> > looks unnecessary to me.
>>>
>>> We still need to change signal_wake_up in that case. Possibly
>>> signal_wake_up_state.
>>
>> Of course. See above.
>>
>>> >> if we depend on wait_task_inactive failing if the process is in the
>>> >> wrong state.
>>> >
>>> > OK, I guess this is what I do not understand. Could you spell please?
>>> >
>>> > And speaking of RT, wait_task_inactive() still can fail because
>>> > cgroup_enter_frozen() takes css_set_lock? And it is called under
>>> > preempt_disable() ? I don't understand the plan :/
>>>
>>> Let me describe his freezer change as that is much easier to get to the
>>> final result. RT has more problems as it turns all spin locks into
>>> sleeping locks. When a task is frozen
>>
>> [...snip...]
>>
>> Oh, thanks Eric, but I understand this part. But I still can't understand
>> why is it that critical to block in schedule... OK, I need to think about
>> it. Lets assume this is really necessary.
>>
>> Anyway. I'd suggest to not change TASK_TRACED in this series and not
>> complicate signal_wake_up() more than you did in your previous version:
>>
>> static inline void signal_wake_up(struct task_struct *t, bool resume)
>> {
>> bool wakekill = resume && !(t->jobctl & JOBCTL_DELAY_WAKEKILL);
>> signal_wake_up_state(t, wakekill ? TASK_WAKEKILL : 0);
>> }
>
> If your concern is signal_wake_up there is no reason it can't be:
>
> static inline void signal_wake_up(struct task_struct *t, bool fatal)
> {
> fatal = fatal && !(t->jobctl & JOBCTL_PTRACE_FROZEN);
> signal_wake_up_state(t, fatal ? TASK_WAKEKILL | TASK_TRACED : 0);
> }
>
> I guess I was more targeted in this version, which led to more if
> statements, but as there is only one place in the code that can be
> JOBCTL_PTRACE_FROZEN and TASK_TRACED there is no point in setting
> TASK_WAKEKILL without also setting TASK_TRACED in the wake-up.
>
> So yes. I can make the code as simple as my earlier version of
> signal_wake_up.
>
>> JOBCTL_PTRACE_FROZEN is fine.
>>
>> ptrace_check_attach() can do
>>
>> if (!ret && !ignore_state &&
>> /*
>> * This can only fail if the frozen tracee races with
>> * SIGKILL and enters schedule() with fatal_signal_pending
>> */
>> !wait_task_inactive(child, __TASK_TRACED))
>> ret = -ESRCH;
>>
>> return ret;
>>
>>
>> Now. If/when we really need to ensure that the frozen tracee always
>> blocks and wait_task_inactive() never fails, we can just do
>>
>> - add the fatal_signal_pending() check into ptrace_stop()
>> (like this patch does)
>>
>> - say, change signal_pending_state:
>>
>> static inline int signal_pending_state(unsigned int state, struct task_struct *p)
>> {
>> if (!(state & (TASK_INTERRUPTIBLE | TASK_WAKEKILL)))
>> return 0;
>> if (!signal_pending(p))
>> return 0;
>> if (p->jobctl & JOBCTL_TASK_FROZEN)
>> return 0;
>> return (state & TASK_INTERRUPTIBLE) || __fatal_signal_pending(p);
>> }
>>
>> in a separate patch which should carefully document the need for this
>> change.
>>
>>> > I didn't look at JOBCTL_PTRACE_SIGNR yet. But this looks minor to me,
>>> > I mean, I am not sure it worth the trouble.
>>>
>>> The immediate problem the JOBCTL_PTRACE_SIGNR patch solves is:
>>> - stopping in ptrace_report_syscall.
>>> - Not having PT_TRACESYSGOOD set.
>>> - The tracee being killed with a fatal signal
>> ^^^^^^
>> tracer ?
>
> Both actually.
>
>>> - The tracee sending SIGTRAP to itself.
>>
>> Oh, but this is clear. But do we really care? If the tracer exits
>> unexpectedly, the tracee can have a lot more problems, I don't think
>> that this particular one is that important.
>
> I don't know of complaints, and if you haven't heard them either
> then that is a good indication that in practice we don't care.
>
> At a practical level I just don't want that silly case that sets
> TASK_TRACED to TASK_RUNNING without stopping at all in ptrace_stop to
> remain. It just seems to make everything more complicated for no real
> reason anymore. The deadlocks may_ptrace_stop was guarding against are
> gone.
>
> Plus the test is so racy: the case can happen after we drop siglock,
> before we schedule, or shortly after we have stopped, so we really
> don't reliably catch the condition the code is trying to catch.
>
> I think the case I care most about is ptrace_signal, which pretty much
> requires the tracer to wait and clear exit_code before being terminated
> to cause problems. We don't handle that at all today.
>
> So yeah. I think the code handles so little at this point we can just
> remove the code and simplify things, if we actually care we can come
> back and implement JOBCTL_PTRACE_SIGNR or the like.

The original explanation for handling this is:

commit 66519f549ae516e7ff2f24a8a5134713411a4a58
Author: Roland McGrath <[email protected]>
Date: Tue Jan 4 05:38:15 2005 -0800

[PATCH] fix ptracer death race yielding bogus BUG_ON

There is a BUG_ON in ptrace_stop that hits if the thread is not ptraced.
However, there is no synchronization between a thread deciding to do a
ptrace stop and so going here, and its ptracer dying and so detaching from
it and clearing its ->ptrace field.

The RHEL3 2.4-based kernel has a backport of a slightly older version of
the 2.6 signals code, which has a different but equivalent BUG_ON. This
actually bit users in practice (when the debugger dies), but was
exceedingly difficult to reproduce in contrived circumstances. We moved
forward in RHEL3 just by removing the BUG_ON, and that fixed the real user
problems even though I was never able to reproduce the scenario myself.
So, to my knowledge this scenario has never actually been seen in practice
under 2.6. But it's plain to see from the code that it is indeed possible.

This patch removes that BUG_ON, but also goes further and tries to handle
this case more gracefully than simply avoiding the crash. By removing the
BUG_ON alone, it becomes possible for the real parent of a process to see
spurious SIGCHLD notifications intended for the debugger that has just
died, and have its child wind up stopped unexpectedly. This patch avoids
that possibility by detecting the case when we are about to do the ptrace
stop but our ptracer has gone away, and simply eliding that ptrace stop
altogether as if we hadn't been ptraced when we hit the interesting event
(signal or ptrace_notify call for syscall tracing or something like that).

Signed-off-by: Roland McGrath <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

And it was all about
BUG_ON(!(current->ptrace & PT_PTRACED));
at the beginning of ptrace_stop.

Which seems like a bit of buggy overkill.

>
> I will chew on that a bit and see if I can find any reasons for keeping
> the code in ptrace_stop at all.

Still chewing.

Eric
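
A userspace sketch of the signal_pending_state() change Oleg suggests in the quoted text above, for readers who want to see which sleeps the extra jobctl test affects. The TOY_* constants, struct toy_sig_task and its boolean fields are hypothetical stand-ins: the real function inspects thread flags and the pending signal sets, and the kernel's TASK_* and JOBCTL_* values differ.

```c
#include <stdbool.h>

/* Hypothetical values; the kernel's constants differ. */
#define TOY_TASK_INTERRUPTIBLE 0x1u
#define TOY_TASK_WAKEKILL      0x2u
#define TOY_JOBCTL_FROZEN      0x1u

/* Hypothetical, heavily reduced stand-in for struct task_struct. */
struct toy_sig_task {
	bool signal_pending;        /* models signal_pending(p) */
	bool fatal_signal_pending;  /* models __fatal_signal_pending(p) */
	unsigned int jobctl;
};

/* Returns true if a sleep in @state should be aborted because of a signal. */
bool toy_signal_pending_state(unsigned int state, const struct toy_sig_task *p)
{
	if (!(state & (TOY_TASK_INTERRUPTIBLE | TOY_TASK_WAKEKILL)))
		return false;
	if (!p->signal_pending)
		return false;
	/* The proposed addition: a frozen tracee always goes to sleep. */
	if (p->jobctl & TOY_JOBCTL_FROZEN)
		return false;
	return (state & TOY_TASK_INTERRUPTIBLE) || p->fatal_signal_pending;
}
```

With the frozen flag set, even a pending fatal signal does not abort a killable sleep in this model, which is the property that would let wait_task_inactive() be relied upon.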

2022-05-09 08:18:55

by Kees Cook

Subject: Re: [PATCH v4 0/12] ptrace: cleaning up ptrace_stop

On Thu, May 05, 2022 at 01:25:57PM -0500, Eric W. Biederman wrote:
> The states TASK_STOPPED and TASK_TRACED are special in that they cannot
> handle spurious wake-ups. This, plus actively depending upon and
> changing the value of tsk->__state, causes problems for PREEMPT_RT and
> Peter's freezer rewrite.
>
> There are a lot of details we have to get right to sort out the
> technical challenges, and this is my pared-back version of the changes
> that contains just those problems I see good solutions to that I believe
> are ready.
>
> A couple of issues have been pointed out, but I think this pared-back set
> of changes is still on the right track. The biggest change in v4 is the
> split of "ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs" into
> two patches, because the dependency I thought existed between two
> different changes did not exist. The rest of the changes are minor
> tweaks to "ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs":
> removing an always true branch, and adding an early test to see if the
> ptracer had gone, before TASK_TRAPPING was set.
>
> This set of changes should support Peter's freezer rewrite, and with the
> addition of changing wait_task_inactive(TASK_TRACED) to be
> wait_task_inactive(0) in ptrace_check_attach I don't think there are any
> races or issues to be concerned about from the ptrace side.
>
> More work is needed to support PREEMPT_RT, but these changes get things
> closer.
>
> This set of changes continues to look like it will provide a firm
> foundation for solving the PREEMPT_RT and freezer challenges.

One of the more sensitive projects to changes around ptrace is rr
(Robert and Kyle added to CC). I ran rr's selftests before/after this
series and saw no changes. My failures remained the same; I assume
they're due to missing CPU features (pkeys) or build configs (bpf), etc:

99% tests passed, 19 tests failed out of 2777

Total Test time (real) = 773.40 sec

The following tests FAILED:
42 - bpf_map (Failed)
43 - bpf_map-no-syscallbuf (Failed)
414 - netfilter (Failed)
415 - netfilter-no-syscallbuf (Failed)
454 - x86/pkeys (Failed)
455 - x86/pkeys-no-syscallbuf (Failed)
1152 - ttyname (Failed)
1153 - ttyname-no-syscallbuf (Failed)
1430 - bpf_map-32 (Failed)
1431 - bpf_map-32-no-syscallbuf (Failed)
1502 - detach_sigkill-32 (Failed)
1802 - netfilter-32 (Failed)
1803 - netfilter-32-no-syscallbuf (Failed)
1842 - x86/pkeys-32 (Failed)
1843 - x86/pkeys-32-no-syscallbuf (Failed)
2316 - crash_in_function-32 (Failed)
2317 - crash_in_function-32-no-syscallbuf (Failed)
2540 - ttyname-32 (Failed)
2541 - ttyname-32-no-syscallbuf (Failed)

So, I guess:

Tested-by: Kees Cook <[email protected]>

:)

--
Kees Cook

2022-05-09 08:38:14

by Eric W. Biederman

Subject: Re: [PATCH v4 0/12] ptrace: cleaning up ptrace_stop

Kees Cook <[email protected]> writes:

> On Thu, May 05, 2022 at 01:25:57PM -0500, Eric W. Biederman wrote:
>> The states TASK_STOPPED and TASK_TRACED are special in that they cannot
>> handle spurious wake-ups. This, plus actively depending upon and
>> changing the value of tsk->__state, causes problems for PREEMPT_RT and
>> Peter's freezer rewrite.
>>
>> There are a lot of details we have to get right to sort out the
>> technical challenges, and this is my pared-back version of the changes
>> that contains just those problems I see good solutions to that I believe
>> are ready.
>>
>> A couple of issues have been pointed out, but I think this pared-back set
>> of changes is still on the right track. The biggest change in v4 is the
>> split of "ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs" into
>> two patches, because the dependency I thought existed between two
>> different changes did not exist. The rest of the changes are minor
>> tweaks to "ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs":
>> removing an always true branch, and adding an early test to see if the
>> ptracer had gone, before TASK_TRAPPING was set.
>>
>> This set of changes should support Peter's freezer rewrite, and with the
>> addition of changing wait_task_inactive(TASK_TRACED) to be
>> wait_task_inactive(0) in ptrace_check_attach I don't think there are any
>> races or issues to be concerned about from the ptrace side.
>>
>> More work is needed to support PREEMPT_RT, but these changes get things
>> closer.
>>
>> This set of changes continues to look like it will provide a firm
>> foundation for solving the PREEMPT_RT and freezer challenges.
>
> One of the more sensitive projects to changes around ptrace is rr
> (Robert and Kyle added to CC). I ran rr's selftests before/after this
> series and saw no changes. My failures remained the same; I assume
> they're due to missing CPU features (pkeys) or build configs (bpf), etc:
>
> 99% tests passed, 19 tests failed out of 2777
>
> Total Test time (real) = 773.40 sec
>
> The following tests FAILED:
> 42 - bpf_map (Failed)
> 43 - bpf_map-no-syscallbuf (Failed)
> 414 - netfilter (Failed)
> 415 - netfilter-no-syscallbuf (Failed)
> 454 - x86/pkeys (Failed)
> 455 - x86/pkeys-no-syscallbuf (Failed)
> 1152 - ttyname (Failed)
> 1153 - ttyname-no-syscallbuf (Failed)
> 1430 - bpf_map-32 (Failed)
> 1431 - bpf_map-32-no-syscallbuf (Failed)
> 1502 - detach_sigkill-32 (Failed)
> 1802 - netfilter-32 (Failed)
> 1803 - netfilter-32-no-syscallbuf (Failed)
> 1842 - x86/pkeys-32 (Failed)
> 1843 - x86/pkeys-32-no-syscallbuf (Failed)
> 2316 - crash_in_function-32 (Failed)
> 2317 - crash_in_function-32-no-syscallbuf (Failed)
> 2540 - ttyname-32 (Failed)
> 2541 - ttyname-32-no-syscallbuf (Failed)
>
> So, I guess:
>
> Tested-by: Kees Cook <[email protected]>
>
> :)

Thank you. I was thinking it would be good to add the rr folks to the
discussion.

Eric


2022-05-09 09:10:30

by Eric W. Biederman

Subject: [PATCH v3 06/11] signal: Use lockdep_assert_held instead of assert_spin_locked

The distinction is that assert_spin_locked() checks if the lock is
held *by anyone*, whereas lockdep_assert_held() asserts that the current
context holds the lock. Also, the check goes away if you build
without lockdep.

Suggested-by: Peter Zijlstra <[email protected]>
Link: https://lkml.kernel.org/r/Ympr/+PX4XgT/[email protected]
Signed-off-by: "Eric W. Biederman" <[email protected]>
---
kernel/signal.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 72d96614effc..3fd2ce133387 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -884,7 +884,7 @@ static int check_kill_permission(int sig, struct kernel_siginfo *info,
static void ptrace_trap_notify(struct task_struct *t)
{
WARN_ON_ONCE(!(t->ptrace & PT_SEIZED));
- assert_spin_locked(&t->sighand->siglock);
+ lockdep_assert_held(&t->sighand->siglock);

task_set_jobctl_pending(t, JOBCTL_TRAP_NOTIFY);
ptrace_signal_wake_up(t, t->jobctl & JOBCTL_LISTENING);
@@ -1079,7 +1079,7 @@ static int __send_signal_locked(int sig, struct kernel_siginfo *info,
int override_rlimit;
int ret = 0, result;

- assert_spin_locked(&t->sighand->siglock);
+ lockdep_assert_held(&t->sighand->siglock);

result = TRACE_SIGNAL_IGNORED;
if (!prepare_signal(sig, t, force))
--
2.35.3
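
A toy userspace model of the distinction the changelog above draws, assuming a lock that records which context holds it. Everything here (struct toy_lock, the toy_* helpers, integer context ids) is hypothetical and only illustrates "held by anyone" versus "held by me"; it does not model the fact that the lockdep check compiles away without lockdep, nor any real concurrency.

```c
#include <stdbool.h>

/* Hypothetical toy lock that records its owner; not a kernel type. */
struct toy_lock {
	bool locked;
	int owner;	/* context id of the holder, valid only while locked */
};

void toy_lock_acquire(struct toy_lock *l, int ctx)
{
	l->locked = true;
	l->owner = ctx;
}

void toy_lock_release(struct toy_lock *l)
{
	l->locked = false;
}

/* Models assert_spin_locked(): true if anyone holds the lock. */
bool toy_held_by_anyone(const struct toy_lock *l)
{
	return l->locked;
}

/* Models lockdep_assert_held(): true only if @ctx itself holds the lock. */
bool toy_held_by(const struct toy_lock *l, int ctx)
{
	return l->locked && l->owner == ctx;
}
```

If context 1 takes the lock and context 2 then runs the check, the "anyone" test passes while the "held by me" test fails, which is the class of bug the stricter assertion catches.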


2022-05-10 16:45:02

by Eric W. Biederman

Subject: Re: [PATCH v4 0/12] ptrace: cleaning up ptrace_stop

Oleg Nesterov <[email protected]> writes:

> On 05/05, Eric W. Biederman wrote:
>>
>> Eric W. Biederman (11):
>> signal: Rename send_signal send_signal_locked
>> signal: Replace __group_send_sig_info with send_signal_locked
>> ptrace/um: Replace PT_DTRACE with TIF_SINGLESTEP
>> ptrace/xtensa: Replace PT_SINGLESTEP with TIF_SINGLESTEP
>> ptrace: Remove arch_ptrace_attach
>> signal: Use lockdep_assert_held instead of assert_spin_locked
>> ptrace: Reimplement PTRACE_KILL by always sending SIGKILL
>> ptrace: Document that wait_task_inactive can't fail
>> ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs
>> ptrace: Don't change __state
>> ptrace: Always take siglock in ptrace_resume
>>
>> Peter Zijlstra (1):
>> sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
>
> OK, lgtm.
>
> Reviewed-by: Oleg Nesterov <[email protected]>
>
>
> I still dislike you removed TASK_WAKEKILL from TASK_TRACED, but I can't
> find a good argument against it ;) and yes, this is subjective.

Does anyone else have any comments on this patchset?

If not I am going to apply this to a branch and get it into linux-next.

Eric


Subject: Re: [PATCH v4 0/12] ptrace: cleaning up ptrace_stop

On 2022-05-10 09:26:36 [-0500], Eric W. Biederman wrote:
> Does anyone else have any comments on this patchset?
>
> If not I am going to apply this to a branch and get it into linux-next.

Looks good, I guess.
Be aware that there will be a clash due to
https://lore.kernel.org/all/[email protected]/

which currently sits in -akpm.

> Eric

Sebastian

2022-05-10 21:31:42

by Oleg Nesterov

Subject: Re: [PATCH v4 0/12] ptrace: cleaning up ptrace_stop

On 05/05, Eric W. Biederman wrote:
>
> Eric W. Biederman (11):
> signal: Rename send_signal send_signal_locked
> signal: Replace __group_send_sig_info with send_signal_locked
> ptrace/um: Replace PT_DTRACE with TIF_SINGLESTEP
> ptrace/xtensa: Replace PT_SINGLESTEP with TIF_SINGLESTEP
> ptrace: Remove arch_ptrace_attach
> signal: Use lockdep_assert_held instead of assert_spin_locked
> ptrace: Reimplement PTRACE_KILL by always sending SIGKILL
> ptrace: Document that wait_task_inactive can't fail
> ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs
> ptrace: Don't change __state
> ptrace: Always take siglock in ptrace_resume
>
> Peter Zijlstra (1):
> sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state

OK, lgtm.

Reviewed-by: Oleg Nesterov <[email protected]>


I still dislike that you removed TASK_WAKEKILL from TASK_TRACED, but I can't
find a good argument against it ;) and yes, this is subjective.

Oleg.


2022-05-10 21:52:36

by Eric W. Biederman

Subject: Re: [PATCH v4 0/12] ptrace: cleaning up ptrace_stop

Sebastian Andrzej Siewior <[email protected]> writes:

> On 2022-05-10 09:26:36 [-0500], Eric W. Biederman wrote:
>> Does anyone else have any comments on this patchset?
>>
>> If not I am going to apply this to a branch and get it into linux-next.
>
> Looks good I guess.
> Be aware that there will be clash due to
> https://lore.kernel.org/all/[email protected]/
>
> which sits currently in -akpm.

Thanks for the heads up. That looks like the best kind of conflict.
One where code just disappears.

Eric

2022-05-12 08:07:14

by Eric W. Biederman

Subject: Re: [PATCH v4 0/12] ptrace: cleaning up ptrace_stop

"Eric W. Biederman" <[email protected]> writes:

> Oleg Nesterov <[email protected]> writes:
>
>> On 05/05, Eric W. Biederman wrote:
>>>
>>> Eric W. Biederman (11):
>>> signal: Rename send_signal send_signal_locked
>>> signal: Replace __group_send_sig_info with send_signal_locked
>>> ptrace/um: Replace PT_DTRACE with TIF_SINGLESTEP
>>> ptrace/xtensa: Replace PT_SINGLESTEP with TIF_SINGLESTEP
>>> ptrace: Remove arch_ptrace_attach
>>> signal: Use lockdep_assert_held instead of assert_spin_locked
>>> ptrace: Reimplement PTRACE_KILL by always sending SIGKILL
>>> ptrace: Document that wait_task_inactive can't fail
>>> ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs
>>> ptrace: Don't change __state
>>> ptrace: Always take siglock in ptrace_resume
>>>
>>> Peter Zijlstra (1):
>>> sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state
>>
>> OK, lgtm.
>>
>> Reviewed-by: Oleg Nesterov <[email protected]>
>>
>>
>> I still dislike you removed TASK_WAKEKILL from TASK_TRACED, but I can't
>> find a good argument against it ;) and yes, this is subjective.
>
> Does anyone else have any comments on this patchset?
>
> If not I am going to apply this to a branch and get it into linux-next.

Thank you all.

I have pushed this to my ptrace_stop-cleanup-for-v5.19 branch
and placed the branch in linux-next.

Eric

2022-05-18 22:56:28

by Eric W. Biederman

Subject: [PATCH 00/16] ptrace: cleanups and calling do_cldstop with only siglock


For ptrace_stop to work on PREEMPT_RT, no spinlocks can be taken once
ptrace_freeze_traced has completed successfully. This fundamentally
means the lock dance of dropping siglock and grabbing tasklist_lock does
not work on PREEMPT_RT. So I have worked through what is necessary so
that tasklist_lock does not need to be grabbed in ptrace_stop after
siglock is dropped.

I have explored several alternate ways of getting there, and along the
way I found a lot of small bug fixes/cleanups that don't necessarily
contribute to the final result but that are worthwhile on their own. So
I have included those changes in this set of changes just so they don't
get lost.

In addition I had a conversation with Thomas Gleixner recently that
emphasized for me the need to reduce the hold times of tasklist_lock,
and that made me realize that in principle it is possible.
https://lkml.kernel.org/r/[email protected]

Which is a long way of saying that not taking tasklist_lock in
ptrace_stop is good not just for PREEMPT_RT but also for improving the
scalability of the kernel in general.

After this set of changes only cgroup_enter_frozen should remain a
stumbling block for PREEMPT_RT in the ptrace_stop path.

Eric W. Biederman (16):
signal/alpha: Remove unused definition of TASK_REAL_PARENT
signal/ia64: Remove unused definition of IA64_TASK_REAL_PARENT_OFFSET
kdb: Use real_parent when displaying a list of processes
powerpc/xmon: Use real_parent when displaying a list of processes
ptrace: Remove dead code from __ptrace_detach
ptrace: Remove unnecessary locking in ptrace_(get|set)siginfo
signal: Wake up the designated parent
ptrace: Only populate last_siginfo from ptrace
ptrace: In ptrace_setsiginfo deal with invalid si_signo
ptrace: In ptrace_signal look at what the debugger did with siginfo
ptrace: Use si_sino as the signal number to resume with
ptrace: Stop protecting ptrace_set_signr with tasklist_lock
ptrace: Document why ptrace_setoptions does not need a lock
signal: Protect parent child relationships by childs siglock
ptrace: Use siglock instead of tasklist_lock in ptrace_check_attach
signal: Always call do_notify_parent_cldstop with siglock held

arch/alpha/kernel/asm-offsets.c | 1 -
arch/ia64/kernel/asm-offsets.c | 1 -
arch/powerpc/xmon/xmon.c | 2 +-
kernel/debug/kdb/kdb_main.c | 2 +-
kernel/exit.c | 23 +++-
kernel/fork.c | 12 +-
kernel/ptrace.c | 132 ++++++++----------
kernel/signal.c | 296 ++++++++++++++++++++++++++--------------
8 files changed, 279 insertions(+), 190 deletions(-)

Eric

2022-05-18 22:58:45

by Eric W. Biederman

Subject: [PATCH 04/16] powerpc/xmon: Use real_parent when displaying a list of processes

xmon has a bug (copied from kdb): when showing a list of processes,
the debugger is listed as the parent if a process is being debugged.

This is silly, and I expect it is rare enough that no one has noticed in
practice. Update the code to use real_parent so that it is clear xmon
does not want to display a debugger as the parent of a process.

Cc: Douglas Miller <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Fixes: 6dfb54049f9a ("powerpc/xmon: Add xmon command to dump process/task similar to ps(1)")
Signed-off-by: "Eric W. Biederman" <[email protected]>
---
arch/powerpc/xmon/xmon.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index fd72753e8ad5..b308ef9ce604 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -3282,7 +3282,7 @@ static void show_task(struct task_struct *volatile tsk)

printf("%16px %16lx %16px %6d %6d %c %2d %s\n", tsk,
tsk->thread.ksp, tsk->thread.regs,
- tsk->pid, rcu_dereference(tsk->parent)->pid,
+ tsk->pid, rcu_dereference(tsk->real_parent)->pid,
state, task_cpu(tsk),
tsk->comm);
}
--
2.35.3
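
The parent/real_parent distinction this patch relies on can be sketched in user space. This is only a minimal model, not kernel code: struct model_task and the model_* helpers are hypothetical names, and the only behaviour assumed is the one described above, namely that attaching a debugger redirects ->parent while ->real_parent stays put.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for the kernel's task_struct. */
struct model_task {
	int pid;
	struct model_task *real_parent;	/* the task that created us */
	struct model_task *parent;	/* the tracer while we are being debugged */
};

/* Model of attaching a debugger: only ->parent is redirected. */
static void model_ptrace_link(struct model_task *child,
			      struct model_task *tracer)
{
	assert(child != NULL && tracer != NULL);
	child->parent = tracer;
}

/* What a ps-style listing should print in the parent pid column. */
static int model_display_ppid(const struct model_task *t)
{
	assert(t != NULL);
	return t->real_parent ? t->real_parent->pid : 0;
}
```

With a child of pid 300 created by pid 100 and then traced by pid 200, model_display_ppid() keeps reporting 100, which is the behaviour the one-line change above is after.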


2022-05-18 22:58:52

by Eric W. Biederman

Subject: [PATCH 07/16] signal: Wake up the designated parent

Today, if a process is ptraced, only the ptracer will ever be woken up in
wait when the parent is waiting with __WNOTHREAD. Update the code
so that the real_parent can also be woken up with __WNOTHREAD even
when the process is ptraced.

Fixes: 75b95953a569 ("job control: Add @for_ptrace to do_notify_parent_cldstop()")
Signed-off-by: "Eric W. Biederman" <[email protected]>
---
kernel/exit.c | 19 ++++++++++++++-----
1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index f072959fcab7..0e26f73c49ac 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1421,26 +1421,35 @@ static int ptrace_do_wait(struct wait_opts *wo, struct task_struct *tsk)
return 0;
}

+struct child_wait_info {
+ struct task_struct *p;
+ struct task_struct *parent;
+};
+
static int child_wait_callback(wait_queue_entry_t *wait, unsigned mode,
int sync, void *key)
{
struct wait_opts *wo = container_of(wait, struct wait_opts,
child_wait);
- struct task_struct *p = key;
+ struct child_wait_info *info = key;

- if (!eligible_pid(wo, p))
+ if (!eligible_pid(wo, info->p))
return 0;

- if ((wo->wo_flags & __WNOTHREAD) && wait->private != p->parent)
- return 0;
+ if ((wo->wo_flags & __WNOTHREAD) && (wait->private != info->parent))
+ return 0;

return default_wake_function(wait, mode, sync, key);
}

void __wake_up_parent(struct task_struct *p, struct task_struct *parent)
{
+ struct child_wait_info info = {
+ .p = p,
+ .parent = parent,
+ };
__wake_up_sync_key(&parent->signal->wait_chldexit,
- TASK_INTERRUPTIBLE, p);
+ TASK_INTERRUPTIBLE, &info);
}

static bool is_effectively_child(struct wait_opts *wo, bool ptrace,
--
2.35.3
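
The effect of passing both the child and the designated parent as the wakeup key can be sketched outside the kernel. This is a minimal model under stated assumptions: struct model_task, struct model_wait_info and model_should_wake() are hypothetical names, and the eligible_pid() filter is left out so that only the __WNOTHREAD decision is shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_task {
	int pid;
};

/*
 * Plays the role of struct child_wait_info: the key names the child and
 * the parent that this particular wakeup is meant for.
 */
struct model_wait_info {
	const struct model_task *p;
	const struct model_task *parent;
};

/*
 * waiter:    the task that queued the wait entry (wait->private in the patch)
 * wnothread: true if that waiter asked for __WNOTHREAD
 */
static bool model_should_wake(const struct model_task *waiter, bool wnothread,
			      const struct model_wait_info *info)
{
	assert(waiter != NULL && info != NULL);
	/* With __WNOTHREAD only the designated parent itself is woken. */
	if (wnothread && waiter != info->parent)
		return false;
	return true;
}
```

Before the patch the comparison was against p->parent, which is the tracer while the child is ptraced, so a real_parent waiting with __WNOTHREAD could not match. With the key carrying the designated parent, a notification aimed at the real_parent wakes it.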


2022-05-18 22:59:41

by Eric W. Biederman

Subject: [PATCH 10/16] ptrace: In ptrace_signal look at what the debugger did with siginfo

Now that siginfo is only modified by the tracer and that siginfo is
cleared when the signal is canceled, have ptrace_signal directly examine
siginfo.

This makes the code a little simpler and handles the case when
the tracer exits without calling ptrace_detach.

Signed-off-by: "Eric W. Biederman" <[email protected]>
---
kernel/signal.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index ff4a52352390..3d955c23b13d 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2556,9 +2556,10 @@ static int ptrace_signal(int signr, kernel_siginfo_t *info, enum pid_type type)
* comment in dequeue_signal().
*/
current->jobctl |= JOBCTL_STOP_DEQUEUED;
- signr = ptrace_stop(signr, CLD_TRAPPED, 0, info);
+ ptrace_stop(signr, CLD_TRAPPED, 0, info);

/* We're back. Did the debugger cancel the sig? */
+ signr = info->si_signo;
if (signr == 0)
return signr;

--
2.35.3


2022-05-18 23:12:30

by Eric W. Biederman

Subject: [PATCH 16/16] signal: Always call do_notify_parent_cldstop with siglock held

Now that siglock keeps tsk->parent and tsk->real_parent constant,
require that do_notify_parent_cldstop is called with tsk->siglock held
instead of the tasklist_lock.

As all of the callers of do_notify_parent_cldstop had to drop the
siglock and take tasklist_lock, this simplifies all of its callers.

This removes one reason for taking tasklist_lock.

This should make ptrace_stop work correctly and reliably with
PREEMPT_RT enabled and CONFIG_CGROUPS disabled. The remaining
challenge is that cgroup_enter_frozen takes a spin_lock after
__state has been set to TASK_TRACED, which on PREEMPT_RT means the
code can sleep and change __state. Not only that, but it means that
wait_task_inactive could potentially detect the code scheduling away
at that point and fail, causing ptrace_check_attach to fail.

Signed-off-by: "Eric W. Biederman" <[email protected]>
---
kernel/signal.c | 262 ++++++++++++++++++++++++++++++++++--------------
1 file changed, 189 insertions(+), 73 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 2cc45e8448e2..d4956be51939 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1994,6 +1994,129 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type)
return ret;
}

+/**
+ * lock_parents_siglocks - Take current, real_parent, and parent's siglock
+ * @lock_tracer: The tracers siglock is needed.
+ *
+ * There is no natural ordering to these locks so they must be sorted
+ * before being taken.
+ *
+ * There are two complicating factors here:
+ * - The locks live in sighand and sighand can be arbitrarily shared
+ * - parent and real_parent can change when current's siglock is unlocked.
+ *
+ * To deal with this, first all of the sighand pointers are
+ * gathered under current's siglock, and the sighand pointers are
+ * sorted. As siglock lives inside of sighand this also sorts the
+ * siglocks by address.
+ *
+ * Then the siglocks are taken in order dropping current's siglock if
+ * necessary.
+ *
+ * Finally if parent and real_parent have not changed return.
+ * If either parent has changed, drop their locks and try again.
+ *
+ * Changing sighand is an infrequent and somewhat expensive operation
+ * (unshare or exec) and so even in the worst case this loop
+ * should not loop too many times before all of the proper locks are
+ * taken in order.
+ *
+ * CONTEXT:
+ * Must be called with @current->sighand->siglock held
+ *
+ * RETURNS:
+ * current's, real_parent's, and parent's siglock held.
+ */
+static void lock_parents_siglocks(bool lock_tracer)
+ __releases(&current->sighand->siglock)
+ __acquires(&current->sighand->siglock)
+ __acquires(&current->real_parent->sighand->siglock)
+ __acquires(&current->parent->sighand->siglock)
+{
+ struct task_struct *me = current;
+ struct sighand_struct *m_sighand = me->sighand;
+
+ lockdep_assert_held(&m_sighand->siglock);
+
+ rcu_read_lock();
+ for (;;) {
+ struct task_struct *parent, *tracer;
+ struct sighand_struct *p_sighand, *t_sighand, *s1, *s2, *s3;
+
+ parent = me->real_parent;
+ tracer = ptrace_parent(me);
+ if (!tracer || !lock_tracer)
+ tracer = parent;
+
+ p_sighand = rcu_dereference(parent->sighand);
+ t_sighand = rcu_dereference(tracer->sighand);
+
+ /* Sort the sighands so that s1 <= s2 <= s3 */
+ s1 = m_sighand;
+ s2 = p_sighand;
+ s3 = t_sighand;
+ if (s1 > s2)
+ swap(s1, s2);
+ if (s1 > s3)
+ swap(s1, s3);
+ if (s2 > s3)
+ swap(s2, s3);
+
+ /* Take the locks in order */
+ if (s1 != m_sighand) {
+ spin_unlock(&m_sighand->siglock);
+ spin_lock(&s1->siglock);
+ }
+ if (s1 != s2)
+ spin_lock_nested(&s2->siglock, 1);
+ if (s2 != s3)
+ spin_lock_nested(&s3->siglock, 2);
+
+ /* Verify the proper locks are held */
+ if (likely((s1 == m_sighand) ||
+ ((me->real_parent == parent) &&
+ (me->parent == tracer) &&
+ (parent->sighand == p_sighand) &&
+ (tracer->sighand == t_sighand)))) {
+ break;
+ }
+
+ /* Drop all but current's siglock */
+ if (p_sighand != m_sighand)
+ spin_unlock(&p_sighand->siglock);
+ if (t_sighand != p_sighand)
+ spin_unlock(&t_sighand->siglock);
+
+ /*
+ * Since [pt]_sighand will likely change if we go
+ * around, and m_sighand is the only one held, make sure
+ * it is subclass-0, since the above 's1 != m_sighand'
+ * clause very much relies on that.
+ */
+ lock_set_subclass(&m_sighand->siglock.dep_map, 0, _RET_IP_);
+ }
+ rcu_read_unlock();
+}
+
+static void unlock_parents_siglocks(bool unlock_tracer)
+ __releases(&current->real_parent->sighand->siglock)
+ __releases(&current->parent->sighand->siglock)
+{
+ struct task_struct *me = current;
+ struct task_struct *parent = me->real_parent;
+ struct task_struct *tracer = ptrace_parent(me);
+ struct sighand_struct *m_sighand = me->sighand;
+ struct sighand_struct *p_sighand = parent->sighand;
+
+ if (p_sighand != m_sighand)
+ spin_unlock(&p_sighand->siglock);
+ if (tracer && unlock_tracer) {
+ struct sighand_struct *t_sighand = tracer->sighand;
+ if (t_sighand != p_sighand)
+ spin_unlock(&t_sighand->siglock);
+ }
+}
+
static void do_notify_pidfd(struct task_struct *task)
{
struct pid *pid;
@@ -2125,11 +2248,12 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
bool for_ptracer, int why)
{
struct kernel_siginfo info;
- unsigned long flags;
struct task_struct *parent;
struct sighand_struct *sighand;
u64 utime, stime;

+ lockdep_assert_held(&tsk->sighand->siglock);
+
if (for_ptracer) {
parent = tsk->parent;
} else {
@@ -2137,6 +2261,8 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
parent = tsk->real_parent;
}

+ lockdep_assert_held(&parent->sighand->siglock);
+
clear_siginfo(&info);
info.si_signo = SIGCHLD;
info.si_errno = 0;
@@ -2168,7 +2294,6 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
}

sighand = parent->sighand;
- spin_lock_irqsave(&sighand->siglock, flags);
if (sighand->action[SIGCHLD-1].sa.sa_handler != SIG_IGN &&
!(sighand->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDSTOP))
send_signal_locked(SIGCHLD, &info, parent, PIDTYPE_TGID);
@@ -2176,7 +2301,6 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
* Even if SIGCHLD is not generated, we must wake up wait4 calls.
*/
__wake_up_parent(tsk, parent);
- spin_unlock_irqrestore(&sighand->siglock, flags);
}

/*
@@ -2208,14 +2332,18 @@ static void ptrace_stop(int exit_code, int why, unsigned long message,
spin_lock_irq(&current->sighand->siglock);
}

+ lock_parents_siglocks(true);
/*
* After this point ptrace_signal_wake_up or signal_wake_up
* will clear TASK_TRACED if ptrace_unlink happens or a fatal
* signal comes in. Handle previous ptrace_unlinks and fatal
* signals here to prevent ptrace_stop sleeping in schedule.
*/
- if (!current->ptrace || __fatal_signal_pending(current))
+
+ if (!current->ptrace || __fatal_signal_pending(current)) {
+ unlock_parents_siglocks(true);
return;
+ }

set_special_state(TASK_TRACED);
current->jobctl |= JOBCTL_TRACED;
@@ -2254,16 +2382,6 @@ static void ptrace_stop(int exit_code, int why, unsigned long message,
if (why == CLD_STOPPED && (current->jobctl & JOBCTL_STOP_PENDING))
gstop_done = task_participate_group_stop(current);

- /* any trap clears pending STOP trap, STOP trap clears NOTIFY */
- task_clear_jobctl_pending(current, JOBCTL_TRAP_STOP);
- if (info && info->si_code >> 8 == PTRACE_EVENT_STOP)
- task_clear_jobctl_pending(current, JOBCTL_TRAP_NOTIFY);
-
- /* entering a trap, clear TRAPPING */
- task_clear_jobctl_trapping(current);
-
- spin_unlock_irq(&current->sighand->siglock);
- read_lock(&tasklist_lock);
/*
* Notify parents of the stop.
*
@@ -2279,14 +2397,25 @@ static void ptrace_stop(int exit_code, int why, unsigned long message,
if (gstop_done && (!current->ptrace || ptrace_reparented(current)))
do_notify_parent_cldstop(current, false, why);

+ unlock_parents_siglocks(true);
+
+ /* any trap clears pending STOP trap, STOP trap clears NOTIFY */
+ task_clear_jobctl_pending(current, JOBCTL_TRAP_STOP);
+ if (info && info->si_code >> 8 == PTRACE_EVENT_STOP)
+ task_clear_jobctl_pending(current, JOBCTL_TRAP_NOTIFY);
+
+ /* entering a trap, clear TRAPPING */
+ task_clear_jobctl_trapping(current);
+
/*
* Don't want to allow preemption here, because
* sys_ptrace() needs this task to be inactive.
*
- * XXX: implement read_unlock_no_resched().
+ * XXX: implement spin_unlock_no_resched().
*/
preempt_disable();
- read_unlock(&tasklist_lock);
+ spin_unlock_irq(&current->sighand->siglock);
+
cgroup_enter_frozen();
preempt_enable_no_resched();
freezable_schedule();
@@ -2361,8 +2490,8 @@ int ptrace_notify(int exit_code, unsigned long message)
* on %true return.
*
* RETURNS:
- * %false if group stop is already cancelled or ptrace trap is scheduled.
- * %true if participated in group stop.
+ * %false if group stop is already cancelled.
+ * %true otherwise (as lock_parents_siglocks may have dropped siglock).
*/
static bool do_signal_stop(int signr)
__releases(&current->sighand->siglock)
@@ -2425,36 +2554,24 @@ static bool do_signal_stop(int signr)
}
}

+ lock_parents_siglocks(false);
+ /* Recheck JOBCTL_STOP_PENDING after unlock+lock of siglock */
+ if (unlikely(!(current->jobctl & JOBCTL_STOP_PENDING)))
+ goto out;
if (likely(!current->ptrace)) {
- int notify = 0;
-
/*
* If there are no other threads in the group, or if there
* is a group stop in progress and we are the last to stop,
- * report to the parent.
+ * report to the real_parent.
*/
if (task_participate_group_stop(current))
- notify = CLD_STOPPED;
+ do_notify_parent_cldstop(current, false, CLD_STOPPED);
+ unlock_parents_siglocks(false);

current->jobctl |= JOBCTL_STOPPED;
set_special_state(TASK_STOPPED);
spin_unlock_irq(&current->sighand->siglock);

- /*
- * Notify the parent of the group stop completion. Because
- * we're not holding either the siglock or tasklist_lock
- * here, ptracer may attach inbetween; however, this is for
- * group stop and should always be delivered to the real
- * parent of the group leader. The new ptracer will get
- * its notification when this task transitions into
- * TASK_TRACED.
- */
- if (notify) {
- read_lock(&tasklist_lock);
- do_notify_parent_cldstop(current, false, notify);
- read_unlock(&tasklist_lock);
- }
-
/* Now we don't run again until woken by SIGCONT or SIGKILL */
cgroup_enter_frozen();
freezable_schedule();
@@ -2465,8 +2582,11 @@ static bool do_signal_stop(int signr)
* Schedule it and let the caller deal with it.
*/
task_set_jobctl_pending(current, JOBCTL_TRAP_STOP);
- return false;
}
+out:
+ unlock_parents_siglocks(false);
+ spin_unlock_irq(&current->sighand->siglock);
+ return true;
}

/**
@@ -2624,32 +2744,30 @@ bool get_signal(struct ksignal *ksig)
if (unlikely(signal->flags & SIGNAL_CLD_MASK)) {
int why;

- if (signal->flags & SIGNAL_CLD_CONTINUED)
- why = CLD_CONTINUED;
- else
- why = CLD_STOPPED;
+ lock_parents_siglocks(true);
+ /* Recheck signal->flags after unlock+lock of siglock */
+ if (likely(signal->flags & SIGNAL_CLD_MASK)) {
+ if (signal->flags & SIGNAL_CLD_CONTINUED)
+ why = CLD_CONTINUED;
+ else
+ why = CLD_STOPPED;

- signal->flags &= ~SIGNAL_CLD_MASK;
+ signal->flags &= ~SIGNAL_CLD_MASK;

- spin_unlock_irq(&sighand->siglock);
-
- /*
- * Notify the parent that we're continuing. This event is
- * always per-process and doesn't make whole lot of sense
- * for ptracers, who shouldn't consume the state via
- * wait(2) either, but, for backward compatibility, notify
- * the ptracer of the group leader too unless it's gonna be
- * a duplicate.
- */
- read_lock(&tasklist_lock);
- do_notify_parent_cldstop(current, false, why);
-
- if (ptrace_reparented(current->group_leader))
- do_notify_parent_cldstop(current->group_leader,
- true, why);
- read_unlock(&tasklist_lock);
-
- goto relock;
+ /*
+ * Notify the parent that we're continuing. This event is
+ * always per-process and doesn't make whole lot of sense
+ * for ptracers, who shouldn't consume the state via
+ * wait(2) either, but, for backward compatibility, notify
+ * the ptracer of the group leader too unless it's gonna be
+ * a duplicate.
+ */
+ do_notify_parent_cldstop(current, false, why);
+ if (ptrace_reparented(current->group_leader))
+ do_notify_parent_cldstop(current->group_leader,
+ true, why);
+ }
+ unlock_parents_siglocks(true);
}

for (;;) {
@@ -2906,7 +3024,6 @@ static void retarget_shared_pending(struct task_struct *tsk, sigset_t *which)

void exit_signals(struct task_struct *tsk)
{
- int group_stop = 0;
sigset_t unblocked;

/*
@@ -2937,21 +3054,20 @@ void exit_signals(struct task_struct *tsk)
signotset(&unblocked);
retarget_shared_pending(tsk, &unblocked);

- if (unlikely(tsk->jobctl & JOBCTL_STOP_PENDING) &&
- task_participate_group_stop(tsk))
- group_stop = CLD_STOPPED;
-out:
- spin_unlock_irq(&tsk->sighand->siglock);
-
/*
* If group stop has completed, deliver the notification. This
* should always go to the real parent of the group leader.
*/
- if (unlikely(group_stop)) {
- read_lock(&tasklist_lock);
- do_notify_parent_cldstop(tsk, false, group_stop);
- read_unlock(&tasklist_lock);
+ if (unlikely(tsk->jobctl & JOBCTL_STOP_PENDING)) {
+ lock_parents_siglocks(false);
+ /* Recheck JOBCTL_STOP_PENDING after unlock+lock of siglock */
+ if ((tsk->jobctl & JOBCTL_STOP_PENDING) &&
+ task_participate_group_stop(tsk))
+ do_notify_parent_cldstop(tsk, false, CLD_STOPPED);
+ unlock_parents_siglocks(false);
}
+out:
+ spin_unlock_irq(&tsk->sighand->siglock);
}

/*
--
2.35.3
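
The sort-then-lock step in lock_parents_siglocks can be sketched in isolation. This is only an illustration under assumptions: struct model_sighand and the model_* helpers are hypothetical, a flag stands in for the spinlock, and the retry loop and lockdep annotations are left out. The three compare-and-swap steps are the same as in the patch and leave the pointers in ascending address order, so equal pointers end up adjacent and each distinct lock is taken once.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for sighand_struct; only its address matters here. */
struct model_sighand {
	int locked;
};

static void model_swap(struct model_sighand **a, struct model_sighand **b)
{
	struct model_sighand *tmp = *a;

	*a = *b;
	*b = tmp;
}

/* Sort three pointers so that *s1 <= *s2 <= *s3 by address. */
static void model_sort3(struct model_sighand **s1, struct model_sighand **s2,
			struct model_sighand **s3)
{
	if ((uintptr_t)*s1 > (uintptr_t)*s2)
		model_swap(s1, s2);
	if ((uintptr_t)*s1 > (uintptr_t)*s3)
		model_swap(s1, s3);
	if ((uintptr_t)*s2 > (uintptr_t)*s3)
		model_swap(s2, s3);
}

/*
 * "Lock" an already sorted triple, skipping duplicates.
 * Returns the number of distinct locks taken.
 */
static int model_lock_sorted(struct model_sighand *s1,
			     struct model_sighand *s2,
			     struct model_sighand *s3)
{
	int taken = 1;

	assert((uintptr_t)s1 <= (uintptr_t)s2 && (uintptr_t)s2 <= (uintptr_t)s3);
	s1->locked = 1;
	if (s1 != s2) {
		s2->locked = 1;
		taken++;
	}
	if (s2 != s3) {
		s3->locked = 1;
		taken++;
	}
	return taken;
}
```

Because every task that needs more than one siglock takes them in the same address order, two tasks cannot each hold one lock while waiting for the other, which is the usual reason for sorting locks that have no natural hierarchy.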


2022-05-18 23:34:50

by Eric W. Biederman

Subject: [PATCH 15/16] ptrace: Use siglock instead of tasklist_lock in ptrace_check_attach

Now that siglock protects tsk->parent and tsk->ptrace, there is no need
to grab tasklist_lock in ptrace_check_attach. The siglock can handle
all of the locking needs of ptrace_check_attach.

Signed-off-by: "Eric W. Biederman" <[email protected]>
---
kernel/ptrace.c | 23 +++++++++--------------
1 file changed, 9 insertions(+), 14 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 77dfdb3d1ced..fa65841bbdbe 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -194,17 +194,14 @@ static bool ptrace_freeze_traced(struct task_struct *task)
{
bool ret = false;

- /* Lockless, nobody but us can set this flag */
if (task->jobctl & JOBCTL_LISTENING)
return ret;

- spin_lock_irq(&task->sighand->siglock);
if (task_is_traced(task) && !looks_like_a_spurious_pid(task) &&
!__fatal_signal_pending(task)) {
task->jobctl |= JOBCTL_PTRACE_FROZEN;
ret = true;
}
- spin_unlock_irq(&task->sighand->siglock);

return ret;
}
@@ -240,32 +237,30 @@ static void ptrace_unfreeze_traced(struct task_struct *task)
* state.
*
* CONTEXT:
- * Grabs and releases tasklist_lock and @child->sighand->siglock.
+ * Grabs and releases @child->sighand->siglock.
*
* RETURNS:
* 0 on success, -ESRCH if %child is not ready.
*/
static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
{
+ unsigned long flags;
int ret = -ESRCH;

/*
- * We take the read lock around doing both checks to close a
+ * We take the siglock around doing both checks to close a
* possible race where someone else was tracing our child and
* detached between these two checks. After this locked check,
* we are sure that this is our traced child and that can only
* be changed by us so it's not changing right after this.
*/
- read_lock(&tasklist_lock);
- if (child->ptrace && child->parent == current) {
- /*
- * child->sighand can't be NULL, release_task()
- * does ptrace_unlink() before __exit_signal().
- */
- if (ignore_state || ptrace_freeze_traced(child))
- ret = 0;
+ if (lock_task_sighand(child, &flags)) {
+ if (child->ptrace && child->parent == current) {
+ if (ignore_state || ptrace_freeze_traced(child))
+ ret = 0;
+ }
+ unlock_task_sighand(child, &flags);
}
- read_unlock(&tasklist_lock);

if (!ret && !ignore_state &&
WARN_ON_ONCE(!wait_task_inactive(child, __TASK_TRACED)))
--
2.35.3
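
The decision that ptrace_check_attach makes under the siglock can be modelled in a few lines. This is a single-threaded sketch under assumptions, so no lock is shown: struct model_task and its fields are hypothetical simplifications, and traced_and_stopped collapses the conditions that ptrace_freeze_traced tests into one flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of the fields the check looks at. */
struct model_task {
	bool ptraced;				/* models child->ptrace != 0 */
	const struct model_task *parent;	/* models child->parent */
	bool traced_and_stopped;		/* models the freeze preconditions */
	bool frozen;				/* models JOBCTL_PTRACE_FROZEN */
};

static bool model_freeze_traced(struct model_task *child)
{
	if (!child->traced_and_stopped)
		return false;
	child->frozen = true;
	return true;
}

/*
 * In the patch both checks run inside one siglock critical section so
 * that the tracer cannot change between them.
 */
static int model_check_attach(struct model_task *child,
			      const struct model_task *caller,
			      bool ignore_state)
{
	int ret = -ESRCH;

	assert(child != NULL && caller != NULL);
	if (child->ptraced && child->parent == caller) {
		if (ignore_state || model_freeze_traced(child))
			ret = 0;
	}
	return ret;
}
```

A caller that is not the tracer always gets -ESRCH and never freezes the child, while the tracer succeeds either when the child is stopped or when it asks to ignore the state.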


2022-05-18 23:35:00

by Eric W. Biederman

Subject: [PATCH 06/16] ptrace: Remove unnecessary locking in ptrace_(get|set)siginfo

Since commit 9899d11f6544 ("ptrace: ensure arch_ptrace/ptrace_request
can never race with SIGKILL") it has been unnecessary for
ptrace_getsiginfo and ptrace_setsiginfo to use lock_task_sighand.

Having the code taking an unnecessary lock is confusing
as it suggests that other parts of the code need to take
the unnecessary lock as well.

So remove the unnecessary lock to make the code more
efficient, simpler, and less confusing.

Signed-off-by: "Eric W. Biederman" <[email protected]>
---
kernel/ptrace.c | 30 ++++++++----------------------
1 file changed, 8 insertions(+), 22 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index ca0e47691229..15e93eafa6f0 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -677,34 +677,20 @@ static int ptrace_setoptions(struct task_struct *child, unsigned long data)

static int ptrace_getsiginfo(struct task_struct *child, kernel_siginfo_t *info)
{
- unsigned long flags;
- int error = -ESRCH;
+ if (unlikely(!child->last_siginfo))
+ return -EINVAL;

- if (lock_task_sighand(child, &flags)) {
- error = -EINVAL;
- if (likely(child->last_siginfo != NULL)) {
- copy_siginfo(info, child->last_siginfo);
- error = 0;
- }
- unlock_task_sighand(child, &flags);
- }
- return error;
+ copy_siginfo(info, child->last_siginfo);
+ return 0;
}

static int ptrace_setsiginfo(struct task_struct *child, const kernel_siginfo_t *info)
{
- unsigned long flags;
- int error = -ESRCH;
+ if (unlikely(!child->last_siginfo))
+ return -EINVAL;

- if (lock_task_sighand(child, &flags)) {
- error = -EINVAL;
- if (likely(child->last_siginfo != NULL)) {
- copy_siginfo(child->last_siginfo, info);
- error = 0;
- }
- unlock_task_sighand(child, &flags);
- }
- return error;
+ copy_siginfo(child->last_siginfo, info);
+ return 0;
}

static int ptrace_peek_siginfo(struct task_struct *child,
--
2.35.3


2022-05-18 23:41:06

by Eric W. Biederman

Subject: [PATCH 03/16] kdb: Use real_parent when displaying a list of processes

kdb has a bug: when using the ps command to display a list of
processes, if a process is being debugged the debugger is shown as the
parent process.

This is silly, and I expect it never comes up in practice, as there
is very little point in using gdb and kdb simultaneously. Update the
code to use real_parent so that it is clear kdb does not want to
display a debugger as the parent of a process.

Cc: Jason Wessel <[email protected]>
Cc: Daniel Thompson <[email protected]>
Cc: Douglas Anderson <[email protected]>
Fixes: 5d5314d6795f ("kdb: core for kgdb back end (1 of 2)")
Signed-off-by: "Eric W. Biederman" <[email protected]>
---
kernel/debug/kdb/kdb_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/debug/kdb/kdb_main.c b/kernel/debug/kdb/kdb_main.c
index 0852a537dad4..db49f1026eaa 100644
--- a/kernel/debug/kdb/kdb_main.c
+++ b/kernel/debug/kdb/kdb_main.c
@@ -2306,7 +2306,7 @@ void kdb_ps1(const struct task_struct *p)

cpu = kdb_process_cpu(p);
kdb_printf("0x%px %8d %8d %d %4d %c 0x%px %c%s\n",
- (void *)p, p->pid, p->parent->pid,
+ (void *)p, p->pid, p->real_parent->pid,
kdb_task_has_cpu(p), kdb_process_cpu(p),
kdb_task_state_char(p),
(void *)(&p->thread),
--
2.35.3


2022-05-18 23:54:05

by Eric W. Biederman

Subject: [PATCH 05/16] ptrace: Remove dead code from __ptrace_detach

Ever since commit 28d838cc4dfe ("Fix ptrace self-attach rule") it has
been impossible to attach to another thread in the same thread group.

Remove the code from __ptrace_detach that was trying to support
detaching from a thread in the same thread group. The code is
dead and I can not make sense of what it is trying to do.

Signed-off-by: "Eric W. Biederman" <[email protected]>
---
kernel/ptrace.c | 24 +++---------------------
1 file changed, 3 insertions(+), 21 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 328a34a99124..ca0e47691229 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -526,19 +526,6 @@ static int ptrace_traceme(void)
return ret;
}

-/*
- * Called with irqs disabled, returns true if childs should reap themselves.
- */
-static int ignoring_children(struct sighand_struct *sigh)
-{
- int ret;
- spin_lock(&sigh->siglock);
- ret = (sigh->action[SIGCHLD-1].sa.sa_handler == SIG_IGN) ||
- (sigh->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDWAIT);
- spin_unlock(&sigh->siglock);
- return ret;
-}
-
/*
* Called with tasklist_lock held for writing.
* Unlink a traced task, and clean it up if it was a traced zombie.
@@ -565,14 +552,9 @@ static bool __ptrace_detach(struct task_struct *tracer, struct task_struct *p)

dead = !thread_group_leader(p);

- if (!dead && thread_group_empty(p)) {
- if (!same_thread_group(p->real_parent, tracer))
- dead = do_notify_parent(p, p->exit_signal);
- else if (ignoring_children(tracer->sighand)) {
- __wake_up_parent(p, tracer);
- dead = true;
- }
- }
+ if (!dead && thread_group_empty(p))
+ dead = do_notify_parent(p, p->exit_signal);
+
/* Mark it as in the process of being reaped. */
if (dead)
p->exit_state = EXIT_DEAD;
--
2.35.3


2022-05-18 23:57:36

by Eric W. Biederman

Subject: [PATCH 02/16] signal/ia64: Remove unused definition of IA64_TASK_REAL_PARENT_OFFSET

Rather than update the unused definition of IA64_TASK_REAL_PARENT_OFFSET
when I move tsk->real_parent into signal_struct, remove it now.

Cc: [email protected]
Signed-off-by: "Eric W. Biederman" <[email protected]>
---
arch/ia64/kernel/asm-offsets.c | 1 -
1 file changed, 1 deletion(-)

diff --git a/arch/ia64/kernel/asm-offsets.c b/arch/ia64/kernel/asm-offsets.c
index be3b90fef2e9..245c4333ea30 100644
--- a/arch/ia64/kernel/asm-offsets.c
+++ b/arch/ia64/kernel/asm-offsets.c
@@ -55,7 +55,6 @@ void foo(void)
DEFINE(IA64_PID_UPID_OFFSET, offsetof (struct pid, numbers[0]));
DEFINE(IA64_TASK_PENDING_OFFSET,offsetof (struct task_struct, pending));
DEFINE(IA64_TASK_PID_OFFSET, offsetof (struct task_struct, pid));
- DEFINE(IA64_TASK_REAL_PARENT_OFFSET, offsetof (struct task_struct, real_parent));
DEFINE(IA64_TASK_SIGNAL_OFFSET,offsetof (struct task_struct, signal));
DEFINE(IA64_TASK_TGID_OFFSET, offsetof (struct task_struct, tgid));
DEFINE(IA64_TASK_THREAD_KSP_OFFSET, offsetof (struct task_struct, thread.ksp));
--
2.35.3


2022-05-19 00:08:40

by Eric W. Biederman

Subject: [PATCH 09/16] ptrace: In ptrace_setsiginfo deal with invalid si_signo

If the tracer calls PTRACE_SETSIGINFO it only has an effect if the
tracee is stopped in ptrace_signal.

When one of PTRACE_DETACH, PTRACE_SINGLESTEP, PTRACE_SINGLEBLOCK,
PTRACE_SYSEMU, PTRACE_SYSEMU_SINGLESTEP, PTRACE_SYSCALL, or
PTRACE_CONT passes in a signal number to continue with, the kernel
validates that signal number and ptrace_signal verifies that the signal
number matches the si_signo before the siginfo is used.

As the signal number to continue with is verified to be a valid signal
number, the signal number in si_signo must be a valid signal number.

Make this obvious and avoid needing checks later by immediately
clearing siginfo if si_signo is not valid.

Signed-off-by: "Eric W. Biederman" <[email protected]>
---
kernel/ptrace.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index a24eed725cec..a0a07d140751 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -716,7 +716,9 @@ static int ptrace_setsiginfo(struct task_struct *child, const kernel_siginfo_t *
if (unlikely(!child->last_siginfo))
return -EINVAL;

- copy_siginfo(child->last_siginfo, info);
+ clear_siginfo(child->last_siginfo);
+ if (valid_signal(info->si_signo))
+ copy_siginfo(child->last_siginfo, info);
return 0;
}

--
2.35.3
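
The clear-then-conditionally-copy behaviour of this change can be sketched in user space. This is a minimal model, not the kernel code: struct model_siginfo is a hypothetical cut-down siginfo, MODEL_NSIG is an assumed stand-in for the kernel's _NSIG, and memset and struct assignment stand in for clear_siginfo() and copy_siginfo().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_NSIG 64	/* assumption: stands in for the kernel's _NSIG */

/* Hypothetical cut-down siginfo with just enough fields to show the idea. */
struct model_siginfo {
	int si_signo;
	int si_errno;
	int si_code;
};

/* Takes an unsigned argument so that negative numbers are rejected too. */
static int model_valid_signal(unsigned long sig)
{
	return sig <= MODEL_NSIG;
}

static void model_setsiginfo(struct model_siginfo *last,
			     const struct model_siginfo *info)
{
	assert(last != NULL && info != NULL);
	/* Always start from a cleared siginfo ... */
	memset(last, 0, sizeof(*last));
	/* ... and only take the tracer's data if si_signo is sane. */
	if (model_valid_signal(info->si_signo))
		*last = *info;
}
```

After a call with an out-of-range or negative si_signo the stored siginfo reads as si_signo == 0, so later code only ever sees either a valid signal number or a cleared structure.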


2022-05-19 00:08:40

by Eric W. Biederman

Subject: [PATCH 13/16] ptrace: Document why ptrace_setoptions does not need a lock

The functions that change ->ptrace are: ptrace_attach, ptrace_traceme,
ptrace_init_task, __ptrace_unlink, ptrace_setoptions.

Except for ptrace_setoptions, all of the places where ->ptrace is
modified hold tasklist_lock for write, and either the tracee or the
tracer modifies ->ptrace.

When ptrace_setoptions is called the tracee has been frozen with
ptrace_freeze_traced, and must be explicitly unfrozen by the tracer
before it can do anything. As ptrace_setoptions is run in the tracer
there can be no contention by the simple fact that the tracee can't
run.

Signed-off-by: "Eric W. Biederman" <[email protected]>
---
kernel/ptrace.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index d0527b6e2b29..fbadd2f21f09 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -689,7 +689,10 @@ static int ptrace_setoptions(struct task_struct *child, unsigned long data)
if (ret)
return ret;

- /* Avoid intermediate state when all opts are cleared */
+ /*
+ * With a frozen tracee, only the tracer modifies ->ptrace.
+ * Avoid intermediate state when all opts are cleared.
+ */
flags = child->ptrace;
flags &= ~(PTRACE_O_MASK << PT_OPT_FLAG_SHIFT);
flags |= (data << PT_OPT_FLAG_SHIFT);
--
2.35.3
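
The flag update that the new comment describes can be sketched as a pure function. This is only an illustration under assumptions: MODEL_OPT_SHIFT and MODEL_OPT_MASK are made-up values standing in for PT_OPT_FLAG_SHIFT and PTRACE_O_MASK, whose real definitions live in the kernel headers.

```c
#include <assert.h>

/* Assumed values for illustration only, not the kernel's definitions. */
#define MODEL_OPT_SHIFT	3
#define MODEL_OPT_MASK	0xffu

/*
 * Compute the new flags word in a local variable so the caller can
 * publish it with a single store. Clearing the option bits in place and
 * then setting them would briefly expose a word with no options set.
 */
static unsigned int model_apply_options(unsigned int ptrace_flags,
					unsigned int data)
{
	unsigned int flags = ptrace_flags;

	assert((data & ~MODEL_OPT_MASK) == 0);
	flags &= ~(MODEL_OPT_MASK << MODEL_OPT_SHIFT);
	flags |= data << MODEL_OPT_SHIFT;
	return flags;
}
```

Bits outside the option field are preserved and the option field is replaced wholesale. Since only the tracer writes the word while the tracee is frozen, no lock is needed around the read-modify-write.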


2022-05-19 00:15:13

by Eric W. Biederman

Subject: [PATCH 08/16] ptrace: Only populate last_siginfo from ptrace

The code in ptrace_signal to populate siginfo if the signal number
changed is buggy. If the tracer continued the tracee using
ptrace_detach it is guaranteed to use the real_parent (or possibly a
new tracer) but definitely not the original tracer to populate si_pid
and si_uid.

Fix this bug by only updating siginfo from the tracer so that the
tracer's pid and the tracer's uid are always used.

If it happens that ptrace_resume or ptrace_detach don't have
a signal to continue with, clear siginfo.

This is a very old bug that has been fixable since commit 1669ce53e2ff
("Add PTRACE_GETSIGINFO and PTRACE_SETSIGINFO") when last_siginfo was
introduced and the tracer could change siginfo.

Fixes: v2.1.68
History-Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Signed-off-by: "Eric W. Biederman" <[email protected]>
---
kernel/ptrace.c | 31 +++++++++++++++++++++++++++++--
kernel/signal.c | 18 ------------------
2 files changed, 29 insertions(+), 20 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 15e93eafa6f0..a24eed725cec 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -526,6 +526,33 @@ static int ptrace_traceme(void)
return ret;
}

+static void ptrace_set_signr(struct task_struct *child, unsigned int signr)
+{
+ struct kernel_siginfo *info = child->last_siginfo;
+
+ child->exit_code = signr;
+ /*
+ * Update the siginfo structure if the signal has
+ * changed. If the debugger wanted something
+ * specific in the siginfo structure then it should
+ * have updated *info via PTRACE_SETSIGINFO.
+ */
+ if (info && (info->si_signo != signr)) {
+ clear_siginfo(info);
+
+ if (signr != 0) {
+ info->si_signo = signr;
+ info->si_errno = 0;
+ info->si_code = SI_USER;
+ rcu_read_lock();
+ info->si_pid = task_pid_nr_ns(current, task_active_pid_ns(child));
+ info->si_uid = from_kuid_munged(task_cred_xxx(child, user_ns),
+ current_uid());
+ rcu_read_unlock();
+ }
+ }
+}
+
/*
* Called with tasklist_lock held for writing.
* Unlink a traced task, and clean it up if it was a traced zombie.
@@ -579,7 +606,7 @@ static int ptrace_detach(struct task_struct *child, unsigned int data)
* tasklist_lock avoids the race with wait_task_stopped(), see
* the comment in ptrace_resume().
*/
- child->exit_code = data;
+ ptrace_set_signr(child, data);
__ptrace_detach(current, child);
write_unlock_irq(&tasklist_lock);

@@ -851,7 +878,7 @@ static int ptrace_resume(struct task_struct *child, long request,
* wait_task_stopped() after resume.
*/
spin_lock_irq(&child->sighand->siglock);
- child->exit_code = data;
+ ptrace_set_signr(child, data);
child->jobctl &= ~JOBCTL_TRACED;
wake_up_state(child, __TASK_TRACED);
spin_unlock_irq(&child->sighand->siglock);
diff --git a/kernel/signal.c b/kernel/signal.c
index e782c2611b64..ff4a52352390 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2562,24 +2562,6 @@ static int ptrace_signal(int signr, kernel_siginfo_t *info, enum pid_type type)
if (signr == 0)
return signr;

- /*
- * Update the siginfo structure if the signal has
- * changed. If the debugger wanted something
- * specific in the siginfo structure then it should
- * have updated *info via PTRACE_SETSIGINFO.
- */
- if (signr != info->si_signo) {
- clear_siginfo(info);
- info->si_signo = signr;
- info->si_errno = 0;
- info->si_code = SI_USER;
- rcu_read_lock();
- info->si_pid = task_pid_vnr(current->parent);
- info->si_uid = from_kuid_munged(current_user_ns(),
- task_uid(current->parent));
- rcu_read_unlock();
- }
-
/* If the (new) signal is now blocked, requeue it. */
if (sigismember(&current->blocked, signr) ||
fatal_signal_pending(current)) {
--
2.35.3
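
The siginfo update that this patch moves to the tracer side can be modelled in user space. The following is only a minimal sketch under stated assumptions: struct model_siginfo, struct model_tracer and model_set_signr() are hypothetical stand-ins, not kernel APIs, and the pid/uid namespace translation done by the real code is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_SI_USER 0 /* same value as SI_USER on Linux; model-only name */

/* Hypothetical, simplified stand-in for struct kernel_siginfo. */
struct model_siginfo {
	int signo;
	int err;
	int code;
	int pid;
	unsigned int uid;
};

/* Hypothetical stand-in for the tracer's identity as the tracee sees it. */
struct model_tracer {
	int pid;
	unsigned int uid;
};

/*
 * Model of the rule in ptrace_set_signr(): leave siginfo alone when there
 * is no stop in progress or the signal number is unchanged; otherwise clear
 * it and, for a non-zero signal, fill it in from the tracer.
 */
void model_set_signr(struct model_siginfo *info,
		     const struct model_tracer *tracer, int signr)
{
	if (!info || info->signo == signr)
		return;

	memset(info, 0, sizeof(*info)); /* plays the role of clear_siginfo() */

	if (signr != 0) {
		assert(tracer != NULL);
		info->signo = signr;
		info->err = 0;
		info->code = MODEL_SI_USER;
		info->pid = tracer->pid; /* the tracer, never the parent */
		info->uid = tracer->uid;
	}
}
```

In this model, resuming with the same signal keeps whatever the tracer may have stored earlier, resuming with a different signal attributes it to the tracer, and resuming with 0 leaves a cleared structure.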


2022-05-19 00:43:01

by Eric W. Biederman

Subject: [PATCH 01/16] signal/alpha: Remove unused definition of TASK_REAL_PARENT

Rather than update this definition when I move tsk->real_parent into
signal_struct, remove it now.

Cc: Richard Henderson <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: [email protected]
Signed-off-by: "Eric W. Biederman" <[email protected]>
---
arch/alpha/kernel/asm-offsets.c | 1 -
1 file changed, 1 deletion(-)

diff --git a/arch/alpha/kernel/asm-offsets.c b/arch/alpha/kernel/asm-offsets.c
index 2e125e5c1508..0fca99dc5757 100644
--- a/arch/alpha/kernel/asm-offsets.c
+++ b/arch/alpha/kernel/asm-offsets.c
@@ -21,7 +21,6 @@ void foo(void)

DEFINE(TASK_BLOCKED, offsetof(struct task_struct, blocked));
DEFINE(TASK_CRED, offsetof(struct task_struct, cred));
- DEFINE(TASK_REAL_PARENT, offsetof(struct task_struct, real_parent));
DEFINE(TASK_GROUP_LEADER, offsetof(struct task_struct, group_leader));
DEFINE(TASK_TGID, offsetof(struct task_struct, tgid));
BLANK();
--
2.35.3


2022-05-19 01:50:33

by Eric W. Biederman

Subject: [PATCH 14/16] signal: Protect parent child relationships by child's siglock

The functions ptrace_stop and do_signal_stop have to drop siglock
and grab tasklist_lock because the parent/child relationship
is guarded by tasklist_lock and not siglock.

Simplify things by additionally guarding the parent/child relationship
with siglock. This just requires a little bit of code motion.

After this change tsk->parent, tsk->real_parent, tsk->ptracer_cred
are all protected by tsk->siglock.

The fields tsk->sibling and tsk->ptrace_entry are mostly protected by
tsk->siglock. The field tsk->ptrace_entry is not protected by siglock
when tsk->ptrace_entry is reused as the dead task list. The field
tsk->sibling is not protected by siglock when children are reparented
because their original parent dies.

The field tsk->ptrace is protected by siglock except for the options
which may change without siglock being held.

Signed-off-by: "Eric W. Biederman" <[email protected]>
---
kernel/exit.c | 4 ++++
kernel/fork.c | 12 ++++++------
kernel/ptrace.c | 9 +++++----
3 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index 0e26f73c49ac..bad434b23c48 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -643,11 +643,15 @@ static void forget_original_parent(struct task_struct *father,

reaper = find_new_reaper(father, reaper);
list_for_each_entry(p, &father->children, sibling) {
+ spin_lock(&p->sighand->siglock);
for_each_thread(p, t) {
RCU_INIT_POINTER(t->real_parent, reaper);
BUG_ON((!t->ptrace) != (rcu_access_pointer(t->parent) == father));
if (likely(!t->ptrace))
t->parent = t->real_parent;
+ }
+ spin_unlock(&p->sighand->siglock);
+ for_each_thread(p, t) {
if (t->pdeath_signal)
group_send_sig_info(t->pdeath_signal,
SEND_SIG_NOINFO, t,
diff --git a/kernel/fork.c b/kernel/fork.c
index 9796897560ab..841021da69f3 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2367,6 +2367,12 @@ static __latent_entropy struct task_struct *copy_process(
*/
write_lock_irq(&tasklist_lock);

+ klp_copy_process(p);
+
+ sched_core_fork(p);
+
+ spin_lock(&current->sighand->siglock);
+
/* CLONE_PARENT re-uses the old parent */
if (clone_flags & (CLONE_PARENT|CLONE_THREAD)) {
p->real_parent = current->real_parent;
@@ -2381,12 +2387,6 @@ static __latent_entropy struct task_struct *copy_process(
p->exit_signal = args->exit_signal;
}

- klp_copy_process(p);
-
- sched_core_fork(p);
-
- spin_lock(&current->sighand->siglock);
-
/*
* Copy seccomp details explicitly here, in case they were changed
* before holding sighand lock.
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index fbadd2f21f09..77dfdb3d1ced 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -123,13 +123,12 @@ void __ptrace_unlink(struct task_struct *child)
clear_task_syscall_work(child, SYSCALL_EMU);
#endif

+ spin_lock(&child->sighand->siglock);
child->parent = child->real_parent;
list_del_init(&child->ptrace_entry);
old_cred = child->ptracer_cred;
child->ptracer_cred = NULL;
put_cred(old_cred);
-
- spin_lock(&child->sighand->siglock);
child->ptrace = 0;
/*
* Clear all pending traps and TRAPPING. TRAPPING should be
@@ -441,15 +440,15 @@ static int ptrace_attach(struct task_struct *task, long request,
if (task->ptrace)
goto unlock_tasklist;

+ spin_lock(&task->sighand->siglock);
task->ptrace = flags;

ptrace_link(task, current);

/* SEIZE doesn't trap tracee on attach */
if (!seize)
- send_sig_info(SIGSTOP, SEND_SIG_PRIV, task);
+ send_signal_locked(SIGSTOP, SEND_SIG_PRIV, task, PIDTYPE_PID);

- spin_lock(&task->sighand->siglock);

/*
* If the task is already STOPPED, set JOBCTL_TRAP_STOP and
@@ -517,8 +516,10 @@ static int ptrace_traceme(void)
* pretend ->real_parent untraces us right after return.
*/
if (!ret && !(current->real_parent->flags & PF_EXITING)) {
+ spin_lock(&current->sighand->siglock);
current->ptrace = PT_PTRACED;
ptrace_link(current, current->real_parent);
+ spin_unlock(&current->sighand->siglock);
}
}
write_unlock_irq(&tasklist_lock);
--
2.35.3
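
The locking rule described in this patch (writers of the parent/child fields hold tasklist_lock and additionally the child's siglock, so a reader holding only siglock sees a stable ->parent) can be illustrated with a single-threaded model. This is a rough sketch, not kernel code: struct model_lock only tracks whether a lock is held, and every model_* name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock that only records ownership, in the spirit of lockdep. */
struct model_lock {
	bool held;
};

void model_lock_acquire(struct model_lock *l)
{
	assert(!l->held);
	l->held = true;
}

void model_lock_release(struct model_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Hypothetical, heavily reduced stand-in for struct task_struct. */
struct model_task {
	int pid;
	struct model_lock siglock;      /* stands in for tsk->sighand->siglock */
	struct model_task *real_parent;
	struct model_task *parent;      /* the tracer while traced */
	unsigned int ptrace;
};

struct model_lock model_tasklist_lock;

/* Writer: tasklist_lock outside, the child's siglock inside. */
void model_ptrace_link(struct model_task *child, struct model_task *tracer,
		       unsigned int flags)
{
	model_lock_acquire(&model_tasklist_lock);
	model_lock_acquire(&child->siglock);
	child->ptrace = flags;
	child->parent = tracer;
	model_lock_release(&child->siglock);
	model_lock_release(&model_tasklist_lock);
}

/* Writer: undo the link under the same pair of locks. */
void model_ptrace_unlink(struct model_task *child)
{
	model_lock_acquire(&model_tasklist_lock);
	model_lock_acquire(&child->siglock);
	child->parent = child->real_parent;
	child->ptrace = 0;
	model_lock_release(&child->siglock);
	model_lock_release(&model_tasklist_lock);
}

/* Reader: siglock alone is enough to dereference ->parent in this model. */
int model_parent_pid(struct model_task *child)
{
	int pid;

	model_lock_acquire(&child->siglock);
	pid = child->parent->pid;
	model_lock_release(&child->siglock);
	return pid;
}
```

The point of the model is the reader: it never touches model_tasklist_lock, which is what would let ptrace_stop and do_signal_stop avoid dropping siglock.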


2022-05-19 02:43:33

by Eric W. Biederman

Subject: [PATCH 11/16] ptrace: Use si_signo as the signal number to resume with

The signal number to resume with is already in si_signo. So instead
of placing an extra copy in tsk->exit_code and later reading the extra
copy from tsk->exit_code just read si_signo.

Read si_signo in ptrace_do_notify, where it is easy as the siginfo is a
local variable. Only ptrace_report_syscall cares about the signal to
resume with from ptrace_stop, and it calls ptrace_notify, which calls
ptrace_do_notify, so moving the actual work into ptrace_do_notify, where
it is easier, is not a problem.

With ptrace_stop no longer involved in returning the signal the tracer
asked the tracee to resume with, remove the comment and the return
code from ptrace_stop.

Signed-off-by: "Eric W. Biederman" <[email protected]>
---
kernel/ptrace.c | 1 -
kernel/signal.c | 13 ++++---------
2 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index a0a07d140751..e0ecb1536dfc 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -530,7 +530,6 @@ static void ptrace_set_signr(struct task_struct *child, unsigned int signr)
{
struct kernel_siginfo *info = child->last_siginfo;

- child->exit_code = signr;
/*
* Update the siginfo structure if the signal has
* changed. If the debugger wanted something
diff --git a/kernel/signal.c b/kernel/signal.c
index 3d955c23b13d..2cc45e8448e2 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2186,12 +2186,8 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
* We always set current->last_siginfo while stopped here.
* That makes it a way to test a stopped process for
* being ptrace-stopped vs being job-control-stopped.
- *
- * Returns the signal the ptracer requested the code resume
- * with. If the code did not stop because the tracer is gone,
- * the stop signal remains unchanged unless clear_code.
*/
-static int ptrace_stop(int exit_code, int why, unsigned long message,
+static void ptrace_stop(int exit_code, int why, unsigned long message,
kernel_siginfo_t *info)
__releases(&current->sighand->siglock)
__acquires(&current->sighand->siglock)
@@ -2219,7 +2215,7 @@ static int ptrace_stop(int exit_code, int why, unsigned long message,
* signals here to prevent ptrace_stop sleeping in schedule.
*/
if (!current->ptrace || __fatal_signal_pending(current))
- return exit_code;
+ return;

set_special_state(TASK_TRACED);
current->jobctl |= JOBCTL_TRACED;
@@ -2302,7 +2298,6 @@ static int ptrace_stop(int exit_code, int why, unsigned long message,
* any signal-sending on another CPU that wants to examine it.
*/
spin_lock_irq(&current->sighand->siglock);
- exit_code = current->exit_code;
current->last_siginfo = NULL;
current->ptrace_message = 0;
current->exit_code = 0;
@@ -2316,7 +2311,6 @@ static int ptrace_stop(int exit_code, int why, unsigned long message,
* This sets TIF_SIGPENDING, but never clears it.
*/
recalc_sigpending_tsk(current);
- return exit_code;
}

static int ptrace_do_notify(int signr, int exit_code, int why, unsigned long message)
@@ -2330,7 +2324,8 @@ static int ptrace_do_notify(int signr, int exit_code, int why, unsigned long mes
info.si_uid = from_kuid_munged(current_user_ns(), current_uid());

/* Let the debugger run. */
- return ptrace_stop(exit_code, why, message, &info);
+ ptrace_stop(exit_code, why, message, &info);
+ return info.si_signo;
}

int ptrace_notify(int exit_code, unsigned long message)
--
2.35.3
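
The control flow after this patch (the stop returns nothing, and the caller reads the resume signal back out of the siginfo it passed in) can be sketched in user space. This is an illustrative model only: the model_* types and functions are hypothetical, and the "tracer" is reduced to a callback that runs while the tracee is stopped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for kernel_siginfo_t. */
struct model_siginfo {
	int signo;
	int code;
};

/* Hypothetical tracee: last_siginfo is only non-NULL while stopped. */
struct model_tracee {
	struct model_siginfo *last_siginfo;
};

/* Hypothetical hook for whatever the tracer does during the stop. */
typedef void (*model_tracer_fn)(struct model_tracee *t);

/* Like ptrace_stop() after the patch: publishes *info, returns nothing. */
void model_ptrace_stop(struct model_tracee *t, struct model_siginfo *info,
		       model_tracer_fn tracer)
{
	assert(t != NULL && info != NULL);
	t->last_siginfo = info;
	if (tracer)
		tracer(t); /* may rewrite *info through t->last_siginfo */
	t->last_siginfo = NULL;
}

/* Like ptrace_do_notify(): the local siginfo carries the answer back. */
int model_ptrace_do_notify(struct model_tracee *t, int signr,
			   model_tracer_fn tracer)
{
	struct model_siginfo info = { .signo = signr, .code = 0 };

	model_ptrace_stop(t, &info, tracer);
	return info.signo;
}

/* Example tracer actions: resume with signal 10, or with no signal. */
void model_resume_with_signal_10(struct model_tracee *t)
{
	t->last_siginfo->signo = 10;
}

void model_suppress_signal(struct model_tracee *t)
{
	t->last_siginfo->signo = 0;
}
```

Because the siginfo is a local variable of the caller, no second copy of the signal number in a per-task field is needed in this model.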


2022-05-19 04:35:24

by Eric W. Biederman

Subject: [PATCH 12/16] ptrace: Stop protecting ptrace_set_signr with tasklist_lock

Now that ptrace_set_signr no longer sets task->exit_code the race
documented in commit b72c186999e6 ("ptrace: fix race between
ptrace_resume() and wait_task_stopped()") is no longer possible, as
task->exit_code is only updated by wait during a ptrace_stop.

As there is no possibility of a race and ptrace_freeze_traced is
all of the protection ptrace_set_signr needs to operate without
contention, move ptrace_set_signr outside of tasklist_lock
and remove the documentation about the race that is no more.

Signed-off-by: "Eric W. Biederman" <[email protected]>
---
kernel/ptrace.c | 19 ++++---------------
1 file changed, 4 insertions(+), 15 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index e0ecb1536dfc..d0527b6e2b29 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -595,17 +595,14 @@ static int ptrace_detach(struct task_struct *child, unsigned int data)
/* Architecture-specific hardware disable .. */
ptrace_disable(child);

+ ptrace_set_signr(child, data);
+
write_lock_irq(&tasklist_lock);
/*
* We rely on ptrace_freeze_traced(). It can't be killed and
* untraced by another thread, it can't be a zombie.
*/
WARN_ON(!child->ptrace || child->exit_state);
- /*
- * tasklist_lock avoids the race with wait_task_stopped(), see
- * the comment in ptrace_resume().
- */
- ptrace_set_signr(child, data);
__ptrace_detach(current, child);
write_unlock_irq(&tasklist_lock);

@@ -869,17 +866,9 @@ static int ptrace_resume(struct task_struct *child, long request,
user_disable_single_step(child);
}

- /*
- * Change ->exit_code and ->state under siglock to avoid the race
- * with wait_task_stopped() in between; a non-zero ->exit_code will
- * wrongly look like another report from tracee.
- *
- * Note that we need siglock even if ->exit_code == data and/or this
- * status was not reported yet, the new status must not be cleared by
- * wait_task_stopped() after resume.
- */
- spin_lock_irq(&child->sighand->siglock);
ptrace_set_signr(child, data);
+
+ spin_lock_irq(&child->sighand->siglock);
child->jobctl &= ~JOBCTL_TRACED;
wake_up_state(child, __TASK_TRACED);
spin_unlock_irq(&child->sighand->siglock);
--
2.35.3


Subject: Re: [PATCH 00/16] ptrace: cleanups and calling do_cldstop with only siglock

On 2022-05-18 20:26:05 [-0700], Kyle Huey wrote:
> Is there a git branch somewhere I can pull to test this? It doesn't apply
> cleanly to Linus's tip.

https://kernel.googlesource.com/pub/scm/linux/kernel/git/ebiederm/user-namespace.git ptrace_stop-cleanup-for-v5.19

> - Kyle

Sebastian

2022-05-19 12:58:02

by Peter Zijlstra

Subject: Re: [PATCH 03/16] kdb: Use real_parent when displaying a list of processes

On Wed, May 18, 2022 at 05:53:42PM -0500, Eric W. Biederman wrote:
> kdb has a bug that when using the ps command to display a list of
> processes, if a process is being debugged the debugger as the parent
> process.
>
> This is silly, and I expect it never comes up in ptractice. As there
                                                   ^^^^^^^^^

Lol, love the new word :-)

2022-05-20 03:50:42

by Eric W. Biederman

Subject: Re: [PATCH 00/16] ptrace: cleanups and calling do_cldstop with only siglock

Sebastian Andrzej Siewior <[email protected]> writes:

> On 2022-05-18 20:26:05 [-0700], Kyle Huey wrote:
>> Is there a git branch somewhere I can pull to test this? It doesn't apply
>> cleanly to Linus's tip.
>
> https://kernel.googlesource.com/pub/scm/linux/kernel/git/ebiederm/user-namespace.git ptrace_stop-cleanup-for-v5.19

Yes that is the branch this all applies to.

This is my second round of cleanups this cycle for this code.
I just keep finding little things that deserve to be changed,
when I am working on the more substantial issues.

Eric




Subject: Re: [PATCH 00/16] ptrace: cleanups and calling do_cldstop with only siglock

On 2022-05-18 17:49:50 [-0500], Eric W. Biederman wrote:
> After this set of changes only cgroup_enter_frozen should remain a
> stumbling block for PREEMPT_RT in the ptrace_stop path.

Yes, I can confirm that. I have no systemd-less system at hand which
means I can't boot a kernel without CGROUP support. But after removing
cgroup_{enter|leave}_frozen() in ptrace_stop() I don't see the problems
I saw earlier.

Sebastian

2022-05-20 16:44:07

by Eric W. Biederman

Subject: Re: [PATCH 03/16] kdb: Use real_parent when displaying a list of processes

Doug Anderson <[email protected]> writes:

> Hi,
>
> On Wed, May 18, 2022 at 3:54 PM Eric W. Biederman <[email protected]> wrote:
>>
>> kdb has a bug that when using the ps command to display a list of
>> processes, if a process is being debugged the debugger as the parent
>> process.
>>
>> This is silly, and I expect it never comes up in ptractice. As there
>> is very little point in using gdb and kdb simultaneously. Update the
>> code to use real_parent so that it is clear kdb does not want to
>> display a debugger as the parent of a process.
>
> So I would tend to defer to Daniel, but I'm not convinced that the
> behavior you describe for kdb today _is_ actually silly.
>
> If I was in kdb and I was listing processes, I might actually want to
> see that a process's parent was set to gdb. Presumably that would tell
> me extra information that might be relevant to my debug session.
>
> Personally, I'd rather add an extra piece of information into the list
> showing the real parent if it's not the same as the parent. Then
> you're not throwing away information.

The name of the field is confusing for anyone who isn't intimate with
the implementation details. The function getppid returns
tsk->real_parent->tgid.

If kdb wants to show what the tracer is, that is fine, but I
recommend putting that information in another field.

Given that the original description says to give the information that ps
gives, my sense is that kdb is currently wrong, especially as it does
not give you the actual parentage anywhere.

I can certainly be convinced, but I do want some clarity. It looks very
attractive to rename task->parent to task->ptracer and leave the field
NULL when there is no tracer.

Eric
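
The distinction drawn above between the parent that getppid reports and the tracer stored in ->parent can be made concrete with a small model. This is a sketch under stated assumptions: struct model_task and both helpers are hypothetical, and only the statement that getppid uses real_parent's tgid comes from the message itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for struct task_struct. */
struct model_task {
	int tgid;
	unsigned int ptrace;            /* non-zero while traced */
	struct model_task *real_parent; /* what getppid() reports */
	struct model_task *parent;      /* the tracer while traced */
};

/* Parent as ps would show it: always derived from real_parent. */
int model_getppid(const struct model_task *t)
{
	assert(t != NULL && t->real_parent != NULL);
	return t->real_parent->tgid;
}

/*
 * Separate "tracer" field for a process listing: the tracer's tgid, or -1
 * when the task is not traced. The ptrace flag is checked rather than
 * comparing pointers, because a task traced by its own real parent has
 * parent == real_parent.
 */
int model_tracer_tgid(const struct model_task *t)
{
	assert(t != NULL);
	if (!t->ptrace || t->parent == NULL)
		return -1;
	return t->parent->tgid;
}
```

A listing built on these two helpers would keep the real parentage visible while still showing the debugger, which is the extra field both sides of the discussion ask for.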

2022-05-20 20:53:20

by Doug Anderson

Subject: Re: [PATCH 03/16] kdb: Use real_parent when displaying a list of processes

Hi,

On Wed, May 18, 2022 at 3:54 PM Eric W. Biederman <[email protected]> wrote:
>
> kdb has a bug that when using the ps command to display a list of
> processes, if a process is being debugged the debugger as the parent
> process.
>
> This is silly, and I expect it never comes up in ptractice. As there
> is very little point in using gdb and kdb simultaneously. Update the
> code to use real_parent so that it is clear kdb does not want to
> display a debugger as the parent of a process.

So I would tend to defer to Daniel, but I'm not convinced that the
behavior you describe for kdb today _is_ actually silly.

If I was in kdb and I was listing processes, I might actually want to
see that a process's parent was set to gdb. Presumably that would tell
me extra information that might be relevant to my debug session.

Personally, I'd rather add an extra piece of information into the list
showing the real parent if it's not the same as the parent. Then
you're not throwing away information.

-Doug

2022-05-20 23:22:06

by Kyle Huey

Subject: Re: [PATCH 00/16] ptrace: cleanups and calling do_cldstop with only siglock

On Thu, May 19, 2022 at 11:05 AM Eric W. Biederman
<[email protected]> wrote:
>
> Sebastian Andrzej Siewior <[email protected]> writes:
>
> > On 2022-05-18 20:26:05 [-0700], Kyle Huey wrote:
> >> Is there a git branch somewhere I can pull to test this? It doesn't apply
> >> cleanly to Linus's tip.
> >
> > https://kernel.googlesource.com/pub/scm/linux/kernel/git/ebiederm/user-namespace.git ptrace_stop-cleanup-for-v5.19
>
> Yes that is the branch this all applies to.
>
> This is my second round of cleanups this cycle for this code.
> I just keep finding little things that deserve to be changed,
> when I am working on the more substantial issues.
>
> Eric

When running the rr test suite, I see hangs like this:

[ 812.151505] watchdog: BUG: soft lockup - CPU#3 stuck for 548s! [condvar_stress-:12152]
[ 812.151529] Modules linked in: snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio rfcomm cmac algif_hash algif_skcipher af_alg bnep dm_crypt intel_rapl_msr mei_hdcp snd_hda_codec_hdmi intel_rapl_common snd_hda_intel x86_pkg_temp_thermal snd_intel_dspcfg snd_intel_sdw_acpi nls_iso8859_1 intel_powerclamp snd_hda_codec coretemp snd_hda_core snd_hwdep snd_pcm rtl8723be btcoexist snd_seq_midi snd_seq_midi_event rtl8723_common kvm_intel rtl_pci snd_rawmidi rtlwifi btusb btrtl btbcm snd_seq kvm mac80211 btintel btmtk snd_seq_device rapl bluetooth snd_timer intel_cstate hp_wmi cfg80211 serio_raw snd platform_profile ecdh_generic mei_me sparse_keymap efi_pstore wmi_bmof ee1004 joydev input_leds ecc libarc4 soundcore mei acpi_pad mac_hid sch_fq_codel ipmi_devintf ipmi_msghandler msr vhost_vsock vmw_vsock_virtio_transport_common vsock vhost_net vhost vhost_iotlb tap vhci_hcd usbip_core parport_pc ppdev lp parport ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq zstd_compress
[ 812.151570] libcrc32c hid_generic usbhid hid i915 drm_buddy i2c_algo_bit ttm drm_dp_helper cec rc_core crct10dif_pclmul drm_kms_helper crc32_pclmul syscopyarea ghash_clmulni_intel sysfillrect sysimgblt fb_sys_fops aesni_intel crypto_simd cryptd r8169 psmouse drm i2c_i801 realtek ahci i2c_smbus xhci_pci libahci xhci_pci_renesas wmi video
[ 812.151584] CPU: 3 PID: 12152 Comm: condvar_stress- Tainted: G I L 5.18.0-rc1+ #2
[ 812.151586] Hardware name: HP 750-280st/2B4B, BIOS A0.11 02/24/2016
[ 812.151587] RIP: 0010:_raw_spin_unlock_irq+0x15/0x40
[ 812.151591] Code: df e8 3f 1f 4a ff 90 5b 5d c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 c6 07 00 0f 1f 00 fb 0f 1f 44 00 00 <bf> 01 00 00 00 e8 41 95 46 ff 65 8b 05 9a c1 9a 5f 85 c0 74 02 5d
[ 812.151593] RSP: 0018:ffffa863c246bd70 EFLAGS: 00000246
[ 812.151594] RAX: ffff8bc0913f6400 RBX: ffff8bc0913f6400 RCX: 0000000000000000
[ 812.151595] RDX: 0000000000000002 RSI: 00000000000a0013 RDI: ffff8bc089b63180
[ 812.151596] RBP: ffffa863c246bd70 R08: ffff8bc0811d6b40 R09: ffff8bc089b63180
[ 812.151597] R10: 0000000000000000 R11: 0000000000000004 R12: ffff8bc0913f6400
[ 812.151597] R13: ffff8bc089b63180 R14: ffff8bc0913f6400 R15: ffffa863c246be68
[ 812.151598] FS: 00007f612dda5700(0000) GS:ffff8bc7e24c0000(0000) knlGS:0000000000000000
[ 812.151599] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 812.151600] CR2: 000055e70715692e CR3: 000000010b4e8005 CR4: 00000000003706e4
[ 812.151601] Call Trace:
[ 812.151602] <TASK>
[ 812.151604] do_signal_stop+0x228/0x260
[ 812.151606] get_signal+0x43a/0x8e0
[ 812.151608] arch_do_signal_or_restart+0x37/0x7d0
[ 812.151610] ? __this_cpu_preempt_check+0x13/0x20
[ 812.151612] ? __perf_event_task_sched_in+0x81/0x230
[ 812.151616] ? __this_cpu_preempt_check+0x13/0x20
[ 812.151617] exit_to_user_mode_prepare+0x130/0x1a0
[ 812.151620] syscall_exit_to_user_mode+0x26/0x40
[ 812.151621] ret_from_fork+0x15/0x30
[ 812.151623] RIP: 0033:0x7f612dfcd125
[ 812.151625] Code: 48 85 ff 74 3d 48 85 f6 74 38 48 83 ee 10 48 89 4e 08 48 89 3e 48 89 d7 4c 89 c2 4d 89 c8 4c 8b 54 24 08 b8 38 00 00 00 0f 05 <48> 85 c0 7c 13 74 01 c3 31 ed 58 5f ff d0 48 89 c7 b8 3c 00 00 00
[ 812.151626] RSP: 002b:00007f612dda4fb0 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
[ 812.151628] RAX: 0000000000000000 RBX: 00007f612dda5700 RCX: ffffffffffffffff
[ 812.151628] RDX: 00007f612dda59d0 RSI: 00007f612dda4fb0 RDI: 00000000003d0f00
[ 812.151629] RBP: 00007ffd59ad20b0 R08: 00007f612dda5700 R09: 00007f612dda5700
[ 812.151630] R10: 00007f612dda59d0 R11: 0000000000000246 R12: 00007ffd59ad20ae
[ 812.151631] R13: 00007ffd59ad20af R14: 00007ffd59ad20b0 R15: 00007f612dda4fc0
[ 812.151632] </TASK>

- Kyle

2022-05-23 06:19:06

by kernel test robot

Subject: Re: [PATCH 16/16] signal: Always call do_notify_parent_cldstop with siglock held

Hi Eric,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on next-20220518]
[cannot apply to linux/master powerpc/next wireless-next/main wireless/main linus/master v5.18-rc7 v5.18-rc6 v5.18-rc5 v5.18-rc7]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/intel-lab-lkp/linux/commits/Eric-W-Biederman/signal-alpha-Remove-unused-definition-of-TASK_REAL_PARENT/20220519-065947
base: 736ee37e2e8eed7fe48d0a37ee5a709514d478b3
config: parisc-randconfig-s032-20220519 (https://download.01.org/0day-ci/archive/20220521/[email protected]/config)
compiler: hppa-linux-gcc (GCC) 11.3.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# apt-get install sparse
# sparse version: v0.6.4-dirty
# https://github.com/intel-lab-lkp/linux/commit/4b66a617bf6d095d33fe43e9dbcfdf2e0de9fb29
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Eric-W-Biederman/signal-alpha-Remove-unused-definition-of-TASK_REAL_PARENT/20220519-065947
git checkout 4b66a617bf6d095d33fe43e9dbcfdf2e0de9fb29
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=parisc SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <[email protected]>


sparse warnings: (new ones prefixed by >>)
kernel/signal.c: note: in included file (through arch/parisc/include/uapi/asm/signal.h, arch/parisc/include/asm/signal.h, include/uapi/linux/signal.h, ...):
include/uapi/asm-generic/signal-defs.h:83:29: sparse: sparse: multiple address spaces given
kernel/signal.c:195:31: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected struct spinlock [usertype] *lock @@ got struct spinlock [noderef] __rcu * @@
kernel/signal.c:195:31: sparse: expected struct spinlock [usertype] *lock
kernel/signal.c:195:31: sparse: got struct spinlock [noderef] __rcu *
kernel/signal.c:198:33: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected struct spinlock [usertype] *lock @@ got struct spinlock [noderef] __rcu * @@
kernel/signal.c:198:33: sparse: expected struct spinlock [usertype] *lock
kernel/signal.c:198:33: sparse: got struct spinlock [noderef] __rcu *
kernel/signal.c:480:9: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected struct spinlock [usertype] *lock @@ got struct spinlock [noderef] __rcu * @@
kernel/signal.c:480:9: sparse: expected struct spinlock [usertype] *lock
kernel/signal.c:480:9: sparse: got struct spinlock [noderef] __rcu *
kernel/signal.c:484:34: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected struct spinlock [usertype] *lock @@ got struct spinlock [noderef] __rcu * @@
kernel/signal.c:484:34: sparse: expected struct spinlock [usertype] *lock
kernel/signal.c:484:34: sparse: got struct spinlock [noderef] __rcu *
kernel/signal.c:542:53: sparse: sparse: incorrect type in initializer (different address spaces) @@ expected struct k_sigaction *ka @@ got struct k_sigaction [noderef] __rcu * @@
kernel/signal.c:542:53: sparse: expected struct k_sigaction *ka
kernel/signal.c:542:53: sparse: got struct k_sigaction [noderef] __rcu *
include/uapi/asm-generic/signal-defs.h:83:29: sparse: sparse: multiple address spaces given
kernel/signal.c:1261:9: sparse: sparse: no member 'ip' in struct pt_regs
kernel/signal.c:1267:29: sparse: sparse: no member 'ip' in struct pt_regs
kernel/signal.c:1267:29: sparse: sparse: cast from unknown type
kernel/signal.c:1267:29: sparse: sparse: cannot dereference this type
kernel/signal.c:1267:29: sparse: sparse: no member 'ip' in struct pt_regs
kernel/signal.c:1267:29: sparse: sparse: cast from unknown type
kernel/signal.c:1267:29: sparse: sparse: no member 'ip' in struct pt_regs
kernel/signal.c:1267:29: sparse: sparse: cast from unknown type
kernel/signal.c:1267:29: sparse: sparse: cannot dereference this type
kernel/signal.c:1267:29: sparse: sparse: no member 'ip' in struct pt_regs
kernel/signal.c:1267:29: sparse: sparse: cast from unknown type
kernel/signal.c:1267:29: sparse: sparse: no member 'ip' in struct pt_regs
kernel/signal.c:1267:29: sparse: sparse: cast from unknown type
kernel/signal.c:1267:29: sparse: sparse: cannot dereference this type
kernel/signal.c:1267:29: sparse: sparse: no member 'ip' in struct pt_regs
kernel/signal.c:1267:29: sparse: sparse: cast from unknown type
kernel/signal.c:1267:29: sparse: sparse: no member 'ip' in struct pt_regs
kernel/signal.c:1267:29: sparse: sparse: cast from unknown type
kernel/signal.c:1267:29: sparse: sparse: cannot dereference this type
kernel/signal.c:1267:29: sparse: sparse: no member 'ip' in struct pt_regs
kernel/signal.c:1267:29: sparse: sparse: cast from unknown type
kernel/signal.c:1267:29: sparse: sparse: cannot dereference this type
kernel/signal.c:1267:29: sparse: sparse: no member 'ip' in struct pt_regs
kernel/signal.c:1267:29: sparse: sparse: cast from unknown type
kernel/signal.c:1267:29: sparse: sparse: incompatible types for 'case' statement
kernel/signal.c:1267:29: sparse: sparse: incompatible types for 'case' statement
kernel/signal.c:1267:29: sparse: sparse: incompatible types for 'case' statement
kernel/signal.c:1267:29: sparse: sparse: incompatible types for 'case' statement
kernel/signal.c:1328:9: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected struct spinlock [usertype] *lock @@ got struct spinlock [noderef] __rcu * @@
kernel/signal.c:1328:9: sparse: expected struct spinlock [usertype] *lock
kernel/signal.c:1328:9: sparse: got struct spinlock [noderef] __rcu *
kernel/signal.c:1329:16: sparse: sparse: incorrect type in assignment (different address spaces) @@ expected struct k_sigaction *action @@ got struct k_sigaction [noderef] __rcu * @@
kernel/signal.c:1329:16: sparse: expected struct k_sigaction *action
kernel/signal.c:1329:16: sparse: got struct k_sigaction [noderef] __rcu *
kernel/signal.c:1349:34: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected struct spinlock [usertype] *lock @@ got struct spinlock [noderef] __rcu * @@
kernel/signal.c:1349:34: sparse: expected struct spinlock [usertype] *lock
kernel/signal.c:1349:34: sparse: got struct spinlock [noderef] __rcu *
kernel/signal.c:1938:36: sparse: sparse: incorrect type in initializer (different address spaces) @@ expected struct spinlock [usertype] *lock @@ got struct spinlock [noderef] __rcu * @@
kernel/signal.c:1938:36: sparse: expected struct spinlock [usertype] *lock
kernel/signal.c:1938:36: sparse: got struct spinlock [noderef] __rcu *
>> kernel/signal.c:2048:46: sparse: sparse: incorrect type in initializer (different address spaces) @@ expected struct sighand_struct *m_sighand @@ got struct sighand_struct [noderef] __rcu *sighand @@
kernel/signal.c:2048:46: sparse: expected struct sighand_struct *m_sighand
kernel/signal.c:2048:46: sparse: got struct sighand_struct [noderef] __rcu *sighand
kernel/signal.c:2057:24: sparse: sparse: incorrect type in assignment (different address spaces) @@ expected struct task_struct *parent @@ got struct task_struct [noderef] __rcu *real_parent @@
kernel/signal.c:2057:24: sparse: expected struct task_struct *parent
kernel/signal.c:2057:24: sparse: got struct task_struct [noderef] __rcu *real_parent
kernel/signal.c:2087:21: sparse: sparse: incompatible types in comparison expression (different address spaces):
>> kernel/signal.c:2087:21: sparse: struct task_struct [noderef] __rcu *
>> kernel/signal.c:2087:21: sparse: struct task_struct *
>> kernel/signal.c:2117:40: sparse: sparse: incorrect type in initializer (different address spaces) @@ expected struct task_struct *parent @@ got struct task_struct [noderef] __rcu *real_parent @@
kernel/signal.c:2117:40: sparse: expected struct task_struct *parent
kernel/signal.c:2117:40: sparse: got struct task_struct [noderef] __rcu *real_parent
kernel/signal.c:2119:46: sparse: sparse: incorrect type in initializer (different address spaces) @@ expected struct sighand_struct *m_sighand @@ got struct sighand_struct [noderef] __rcu *sighand @@
kernel/signal.c:2119:46: sparse: expected struct sighand_struct *m_sighand
kernel/signal.c:2119:46: sparse: got struct sighand_struct [noderef] __rcu *sighand
>> kernel/signal.c:2120:50: sparse: sparse: incorrect type in initializer (different address spaces) @@ expected struct sighand_struct *p_sighand @@ got struct sighand_struct [noderef] __rcu *sighand @@
kernel/signal.c:2120:50: sparse: expected struct sighand_struct *p_sighand
kernel/signal.c:2120:50: sparse: got struct sighand_struct [noderef] __rcu *sighand
>> kernel/signal.c:2125:58: sparse: sparse: incorrect type in initializer (different address spaces) @@ expected struct sighand_struct *t_sighand @@ got struct sighand_struct [noderef] __rcu *sighand @@
kernel/signal.c:2125:58: sparse: expected struct sighand_struct *t_sighand
kernel/signal.c:2125:58: sparse: got struct sighand_struct [noderef] __rcu *sighand
kernel/signal.c:2171:44: sparse: sparse: cast removes address space '__rcu' of expression
kernel/signal.c:2190:65: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected struct task_struct *tsk @@ got struct task_struct [noderef] __rcu *parent @@
kernel/signal.c:2190:65: sparse: expected struct task_struct *tsk
kernel/signal.c:2190:65: sparse: got struct task_struct [noderef] __rcu *parent
kernel/signal.c:2191:40: sparse: sparse: cast removes address space '__rcu' of expression
kernel/signal.c:2209:14: sparse: sparse: incorrect type in assignment (different address spaces) @@ expected struct sighand_struct *psig @@ got struct sighand_struct [noderef] __rcu *[noderef] __rcu sighand @@
kernel/signal.c:2209:14: sparse: expected struct sighand_struct *psig
kernel/signal.c:2209:14: sparse: got struct sighand_struct [noderef] __rcu *[noderef] __rcu sighand
kernel/signal.c:2238:53: sparse: sparse: incorrect type in argument 3 (different address spaces) @@ expected struct task_struct *t @@ got struct task_struct [noderef] __rcu *parent @@
kernel/signal.c:2238:53: sparse: expected struct task_struct *t
kernel/signal.c:2238:53: sparse: got struct task_struct [noderef] __rcu *parent
kernel/signal.c:2239:34: sparse: sparse: incorrect type in argument 2 (different address spaces) @@ expected struct task_struct *parent @@ got struct task_struct [noderef] __rcu *parent @@
kernel/signal.c:2239:34: sparse: expected struct task_struct *parent
kernel/signal.c:2239:34: sparse: got struct task_struct [noderef] __rcu *parent
kernel/signal.c:2269:24: sparse: sparse: incorrect type in assignment (different address spaces) @@ expected struct task_struct *parent @@ got struct task_struct [noderef] __rcu *parent @@
kernel/signal.c:2269:24: sparse: expected struct task_struct *parent
kernel/signal.c:2269:24: sparse: got struct task_struct [noderef] __rcu *parent
kernel/signal.c:2272:24: sparse: sparse: incorrect type in assignment (different address spaces) @@ expected struct task_struct *parent @@ got struct task_struct [noderef] __rcu *real_parent @@
kernel/signal.c:2272:24: sparse: expected struct task_struct *parent
kernel/signal.c:2272:24: sparse: got struct task_struct [noderef] __rcu *real_parent
kernel/signal.c:2307:17: sparse: sparse: incorrect type in assignment (different address spaces) @@ expected struct sighand_struct *sighand @@ got struct sighand_struct [noderef] __rcu *sighand @@
kernel/signal.c:2307:17: sparse: expected struct sighand_struct *sighand
kernel/signal.c:2307:17: sparse: got struct sighand_struct [noderef] __rcu *sighand
kernel/signal.c:2341:41: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected struct spinlock [usertype] *lock @@ got struct spinlock [noderef] __rcu * @@
kernel/signal.c:2341:41: sparse: expected struct spinlock [usertype] *lock
kernel/signal.c:2341:41: sparse: got struct spinlock [noderef] __rcu *
kernel/signal.c:2343:39: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected struct spinlock [usertype] *lock @@ got struct spinlock [noderef] __rcu * @@
kernel/signal.c:2343:39: sparse: expected struct spinlock [usertype] *lock
kernel/signal.c:2343:39: sparse: got struct spinlock [noderef] __rcu *
kernel/signal.c:2428:33: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected struct spinlock [usertype] *lock @@ got struct spinlock [noderef] __rcu * @@
kernel/signal.c:2428:33: sparse: expected struct spinlock [usertype] *lock
kernel/signal.c:2428:33: sparse: got struct spinlock [noderef] __rcu *
kernel/signal.c:2440:31: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected struct spinlock [usertype] *lock @@ got struct spinlock [noderef] __rcu * @@
kernel/signal.c:2440:31: sparse: expected struct spinlock [usertype] *lock
kernel/signal.c:2440:31: sparse: got struct spinlock [noderef] __rcu *
kernel/signal.c:2479:31: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected struct spinlock [usertype] *lock @@ got struct spinlock [noderef] __rcu * @@
kernel/signal.c:2479:31: sparse: expected struct spinlock [usertype] *lock
kernel/signal.c:2479:31: sparse: got struct spinlock [noderef] __rcu *
kernel/signal.c:2481:33: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected struct spinlock [usertype] *lock @@ got struct spinlock [noderef] __rcu * @@
kernel/signal.c:2481:33: sparse: expected struct spinlock [usertype] *lock
kernel/signal.c:2481:33: sparse: got struct spinlock [noderef] __rcu *
kernel/signal.c:2584:41: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected struct spinlock [usertype] *lock @@ got struct spinlock [noderef] __rcu * @@
kernel/signal.c:2584:41: sparse: expected struct spinlock [usertype] *lock
kernel/signal.c:2584:41: sparse: got struct spinlock [noderef] __rcu *
kernel/signal.c:2599:33: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected struct spinlock [usertype] *lock @@ got struct spinlock [noderef] __rcu * @@
kernel/signal.c:2599:33: sparse: expected struct spinlock [usertype] *lock
kernel/signal.c:2599:33: sparse: got struct spinlock [noderef] __rcu *
kernel/signal.c:2656:41: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected struct spinlock [usertype] *lock @@ got struct spinlock [noderef] __rcu * @@
kernel/signal.c:2656:41: sparse: expected struct spinlock [usertype] *lock
kernel/signal.c:2656:41: sparse: got struct spinlock [noderef] __rcu *
kernel/signal.c:2668:33: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected struct spinlock [usertype] *lock @@ got struct spinlock [noderef] __rcu * @@
kernel/signal.c:2668:33: sparse: expected struct spinlock [usertype] *lock
kernel/signal.c:2668:33: sparse: got struct spinlock [noderef] __rcu *
kernel/signal.c:2726:49: sparse: sparse: incorrect type in initializer (different address spaces) @@ expected struct sighand_struct *sighand @@ got struct sighand_struct [noderef] __rcu *sighand @@
kernel/signal.c:2726:49: sparse: expected struct sighand_struct *sighand
kernel/signal.c:2726:49: sparse: got struct sighand_struct [noderef] __rcu *sighand
kernel/signal.c:3052:27: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected struct spinlock [usertype] *lock @@ got struct spinlock [noderef] __rcu * @@
kernel/signal.c:3052:27: sparse: expected struct spinlock [usertype] *lock
kernel/signal.c:3052:27: sparse: got struct spinlock [noderef] __rcu *
kernel/signal.c:3081:29: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected struct spinlock [usertype] *lock @@ got struct spinlock [noderef] __rcu * @@
kernel/signal.c:3081:29: sparse: expected struct spinlock [usertype] *lock
kernel/signal.c:3081:29: sparse: got struct spinlock [noderef] __rcu *
kernel/signal.c:3138:27: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected struct spinlock [usertype] *lock @@ got struct spinlock [noderef] __rcu * @@
kernel/signal.c:3138:27: sparse: expected struct spinlock [usertype] *lock
kernel/signal.c:3138:27: sparse: got struct spinlock [noderef] __rcu *
kernel/signal.c:3140:29: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected struct spinlock [usertype] *lock @@ got struct spinlock [noderef] __rcu * @@
kernel/signal.c:3140:29: sparse: expected struct spinlock [usertype] *lock
kernel/signal.c:3140:29: sparse: got struct spinlock [noderef] __rcu *
kernel/signal.c:3291:31: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected struct spinlock [usertype] *lock @@ got struct spinlock [noderef] __rcu * @@
kernel/signal.c:3291:31: sparse: expected struct spinlock [usertype] *lock
kernel/signal.c:3291:31: sparse: got struct spinlock [noderef] __rcu *
kernel/signal.c:3294:33: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected struct spinlock [usertype] *lock @@ got struct spinlock [noderef] __rcu * @@
kernel/signal.c:3294:33: sparse: expected struct spinlock [usertype] *lock
kernel/signal.c:3294:33: sparse: got struct spinlock [noderef] __rcu *
kernel/signal.c:3683:27: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected struct spinlock [usertype] *lock @@ got struct spinlock [noderef] __rcu * @@
kernel/signal.c:3683:27: sparse: expected struct spinlock [usertype] *lock
kernel/signal.c:3683:27: sparse: got struct spinlock [noderef] __rcu *
kernel/signal.c:3695:37: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected struct spinlock [usertype] *lock @@ got struct spinlock [noderef] __rcu * @@
kernel/signal.c:3695:37: sparse: expected struct spinlock [usertype] *lock
kernel/signal.c:3695:37: sparse: got struct spinlock [noderef] __rcu *
kernel/signal.c:3700:35: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected struct spinlock [usertype] *lock @@ got struct spinlock [noderef] __rcu * @@
kernel/signal.c:3700:35: sparse: expected struct spinlock [usertype] *lock
kernel/signal.c:3700:35: sparse: got struct spinlock [noderef] __rcu *
kernel/signal.c:3705:29: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected struct spinlock [usertype] *lock @@ got struct spinlock [noderef] __rcu * @@
kernel/signal.c:3705:29: sparse: expected struct spinlock [usertype] *lock
kernel/signal.c:3705:29: sparse: got struct spinlock [noderef] __rcu *
kernel/signal.c:4159:31: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected struct spinlock [usertype] *lock @@ got struct spinlock [noderef] __rcu * @@
kernel/signal.c:4159:31: sparse: expected struct spinlock [usertype] *lock
kernel/signal.c:4159:31: sparse: got struct spinlock [noderef] __rcu *
kernel/signal.c:4171:33: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected struct spinlock [usertype] *lock @@ got struct spinlock [noderef] __rcu * @@
kernel/signal.c:4171:33: sparse: expected struct spinlock [usertype] *lock
kernel/signal.c:4171:33: sparse: got struct spinlock [noderef] __rcu *
kernel/signal.c:4189:11: sparse: sparse: incorrect type in assignment (different address spaces) @@ expected struct k_sigaction *k @@ got struct k_sigaction [noderef] __rcu * @@
kernel/signal.c:4189:11: sparse: expected struct k_sigaction *k
kernel/signal.c:4189:11: sparse: got struct k_sigaction [noderef] __rcu *
kernel/signal.c:4191:25: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected struct spinlock [usertype] *lock @@ got struct spinlock [noderef] __rcu * @@
kernel/signal.c:4191:25: sparse: expected struct spinlock [usertype] *lock
kernel/signal.c:4191:25: sparse: got struct spinlock [noderef] __rcu *

vim +2048 kernel/signal.c

1934
1935 void sigqueue_free(struct sigqueue *q)
1936 {
1937 unsigned long flags;
> 1938 spinlock_t *lock = &current->sighand->siglock;
1939
1940 BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));
1941 /*
1942 * We must hold ->siglock while testing q->list
1943 * to serialize with collect_signal() or with
1944 * __exit_signal()->flush_sigqueue().
1945 */
1946 spin_lock_irqsave(lock, flags);
1947 q->flags &= ~SIGQUEUE_PREALLOC;
1948 /*
1949 * If it is queued it will be freed when dequeued,
1950 * like the "regular" sigqueue.
1951 */
1952 if (!list_empty(&q->list))
1953 q = NULL;
1954 spin_unlock_irqrestore(lock, flags);
1955
1956 if (q)
1957 __sigqueue_free(q);
1958 }
1959
1960 int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type)
1961 {
1962 int sig = q->info.si_signo;
1963 struct sigpending *pending;
1964 struct task_struct *t;
1965 unsigned long flags;
1966 int ret, result;
1967
1968 BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));
1969
1970 ret = -1;
1971 rcu_read_lock();
1972 t = pid_task(pid, type);
1973 if (!t || !likely(lock_task_sighand(t, &flags)))
1974 goto ret;
1975
1976 ret = 1; /* the signal is ignored */
1977 result = TRACE_SIGNAL_IGNORED;
1978 if (!prepare_signal(sig, t, false))
1979 goto out;
1980
1981 ret = 0;
1982 if (unlikely(!list_empty(&q->list))) {
1983 /*
1984 * If an SI_TIMER entry is already queue just increment
1985 * the overrun count.
1986 */
1987 BUG_ON(q->info.si_code != SI_TIMER);
1988 q->info.si_overrun++;
1989 result = TRACE_SIGNAL_ALREADY_PENDING;
1990 goto out;
1991 }
1992 q->info.si_overrun = 0;
1993
1994 signalfd_notify(t, sig);
1995 pending = (type != PIDTYPE_PID) ? &t->signal->shared_pending : &t->pending;
1996 list_add_tail(&q->list, &pending->list);
1997 sigaddset(&pending->signal, sig);
1998 complete_signal(sig, t, type);
1999 result = TRACE_SIGNAL_DELIVERED;
2000 out:
2001 trace_signal_generate(sig, &q->info, t, type != PIDTYPE_PID, result);
2002 unlock_task_sighand(t, &flags);
2003 ret:
2004 rcu_read_unlock();
2005 return ret;
2006 }
2007
2008 /**
2009 * lock_parents_siglocks - Take current, real_parent, and parent's siglock
2010 * @lock_tracer: The tracers siglock is needed.
2011 *
2012 * There is no natural ordering to these locks so they must be sorted
2013 * before being taken.
2014 *
2015 * There are two complicating factors here:
2016 * - The locks live in sighand and sighand can be arbitrarily shared
2017 * - parent and real_parent can change when current's siglock is unlocked.
2018 *
2019 * To deal with this, first all of the sighand pointers are
2020 * gathered under current's siglock, and the sighand pointers are
2021 * sorted. As siglock lives inside of sighand this also sorts the
2022 * siglock's by address.
2023 *
2024 * Then the siglocks are taken in order dropping current's siglock if
2025 * necessary.
2026 *
2027 * Finally if parent and real_parent have not changed return.
2028 * If either parent has changed, drop their locks and try again.
2029 *
2030 * Changing sighand is an infrequent and somewhat expensive operation
2031 * (unshare or exec) and so even in the worst case this loop
2032 * should not loop too many times before all of the proper locks are
2033 * taken in order.
2034 *
2035 * CONTEXT:
2036 * Must be called with @current->sighand->siglock held
2037 *
2038 * RETURNS:
2039 * current's, real_parent's, and parent's siglock held.
2040 */
2041 static void lock_parents_siglocks(bool lock_tracer)
2042 __releases(&current->sighand->siglock)
2043 __acquires(&current->sighand->siglock)
2044 __acquires(&current->real_parent->sighand->siglock)
2045 __acquires(&current->parent->sighand->siglock)
2046 {
2047 struct task_struct *me = current;
> 2048 struct sighand_struct *m_sighand = me->sighand;
2049
2050 lockdep_assert_held(&m_sighand->siglock);
2051
2052 rcu_read_lock();
2053 for (;;) {
2054 struct task_struct *parent, *tracer;
2055 struct sighand_struct *p_sighand, *t_sighand, *s1, *s2, *s3;
2056
2057 parent = me->real_parent;
2058 tracer = ptrace_parent(me);
2059 if (!tracer || !lock_tracer)
2060 tracer = parent;
2061
2062 p_sighand = rcu_dereference(parent->sighand);
2063 t_sighand = rcu_dereference(tracer->sighand);
2064
2065 /* Sort the sighands so that s1 <= s2 <= s3 */
2066 s1 = m_sighand;
2067 s2 = p_sighand;
2068 s3 = t_sighand;
2069 if (s1 > s2)
2070 swap(s1, s2);
2071 if (s1 > s3)
2072 swap(s1, s3);
2073 if (s2 > s3)
2074 swap(s2, s3);
2075
2076 /* Take the locks in order */
2077 if (s1 != m_sighand) {
2078 spin_unlock(&m_sighand->siglock);
2079 spin_lock(&s1->siglock);
2080 }
2081 if (s1 != s2)
2082 spin_lock_nested(&s2->siglock, 1);
2083 if (s2 != s3)
2084 spin_lock_nested(&s3->siglock, 2);
2085
2086 /* Verify the proper locks are held */
> 2087 if (likely((s1 == m_sighand) ||
2088 ((me->real_parent == parent) &&
2089 (me->parent == tracer) &&
2090 (parent->sighand == p_sighand) &&
2091 (tracer->sighand == t_sighand)))) {
2092 break;
2093 }
2094
2095 /* Drop all but current's siglock */
2096 if (p_sighand != m_sighand)
2097 spin_unlock(&p_sighand->siglock);
2098 if (t_sighand != p_sighand)
2099 spin_unlock(&t_sighand->siglock);
2100
2101 /*
2102 * Since [pt]_sighand will likely change if we go
2103 * around, and m_sighand is the only one held, make sure
2104 * it is subclass-0, since the above 's1 != m_sighand'
2105 * clause very much relies on that.
2106 */
2107 lock_set_subclass(&m_sighand->siglock.dep_map, 0, _RET_IP_);
2108 }
2109 rcu_read_unlock();
2110 }
2111
2112 static void unlock_parents_siglocks(bool unlock_tracer)
2113 __releases(&current->real_parent->sighand->siglock)
2114 __releases(&current->parent->sighand->siglock)
2115 {
2116 struct task_struct *me = current;
> 2117 struct task_struct *parent = me->real_parent;
2118 struct task_struct *tracer = ptrace_parent(me);
2119 struct sighand_struct *m_sighand = me->sighand;
> 2120 struct sighand_struct *p_sighand = parent->sighand;
2121
2122 if (p_sighand != m_sighand)
2123 spin_unlock(&p_sighand->siglock);
2124 if (tracer && unlock_tracer) {
> 2125 struct sighand_struct *t_sighand = tracer->sighand;
2126 if (t_sighand != p_sighand)
2127 spin_unlock(&t_sighand->siglock);
2128 }
2129 }
2130
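
The address-ordered locking described in the lock_parents_siglocks() comment can be sketched in userspace. This is a minimal illustration, not the kernel code: pthread mutexes stand in for siglock, the helper names (sort3, lock3, unlock3) are made up for this example, and it leaves out the part where current's siglock is already held on entry and the parent pointers are re-verified after locking. Note that the three compare-and-swap steps leave the pointers in ascending address order.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Swap two mutex pointers. */
static void swap_ptr(pthread_mutex_t **a, pthread_mutex_t **b)
{
	pthread_mutex_t *tmp = *a;

	*a = *b;
	*b = tmp;
}

/* Sort three lock pointers so that s[0] <= s[1] <= s[2] by address. */
static void sort3(pthread_mutex_t *s[3])
{
	if ((uintptr_t)s[0] > (uintptr_t)s[1])
		swap_ptr(&s[0], &s[1]);
	if ((uintptr_t)s[0] > (uintptr_t)s[2])
		swap_ptr(&s[0], &s[2]);
	if ((uintptr_t)s[1] > (uintptr_t)s[2])
		swap_ptr(&s[1], &s[2]);
}

/*
 * Lock up to three mutexes, any of which may be the same object,
 * in ascending address order. Returns the number of distinct locks
 * actually taken.
 */
static int lock3(pthread_mutex_t *a, pthread_mutex_t *b, pthread_mutex_t *c)
{
	pthread_mutex_t *s[3] = { a, b, c };
	int taken = 1;

	sort3(s);
	pthread_mutex_lock(s[0]);
	if (s[1] != s[0]) {
		pthread_mutex_lock(s[1]);
		taken++;
	}
	if (s[2] != s[1]) {
		pthread_mutex_lock(s[2]);
		taken++;
	}
	return taken;
}

/* Release the locks taken by lock3(), highest address first. */
static void unlock3(pthread_mutex_t *a, pthread_mutex_t *b, pthread_mutex_t *c)
{
	pthread_mutex_t *s[3] = { a, b, c };

	sort3(s);
	if (s[2] != s[1])
		pthread_mutex_unlock(s[2]);
	if (s[1] != s[0])
		pthread_mutex_unlock(s[1]);
	pthread_mutex_unlock(s[0]);
}
```

Because aliased pointers end up adjacent after sorting, comparing neighbours is enough to avoid taking the same lock twice, which mirrors the 's1 != s2' and 's2 != s3' checks in the listing above.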

--
0-DAY CI Kernel Test Service
https://01.org/lkp

2022-05-23 06:43:15

by Doug Anderson

[permalink] [raw]
Subject: Re: [PATCH 03/16] kdb: Use real_parent when displaying a list of processes

Hi,

On Thu, May 19, 2022 at 4:49 PM Eric W. Biederman <[email protected]> wrote:
>
> Doug Anderson <[email protected]> writes:
>
> > Hi,
> >
> > On Wed, May 18, 2022 at 3:54 PM Eric W. Biederman <[email protected]> wrote:
> >>
> >> kdb has a bug: when using the ps command to display a list of
> >> processes, if a process is being debugged the debugger is shown as
> >> the parent process.
> >>
> >> This is silly, and I expect it never comes up in ptractice. As there
> >> is very little point in using gdb and kdb simultaneously. Update the
> >> code to use real_parent so that it is clear kdb does not want to
> >> display a debugger as the parent of a process.
> >
> > So I would tend to defer to Daniel, but I'm not convinced that the
> > behavior you describe for kdb today _is_ actually silly.
> >
> > If I was in kdb and I was listing processes, I might actually want to
> > see that a process's parent was set to gdb. Presumably that would tell
> > me extra information that might be relevant to my debug session.
> >
> > Personally, I'd rather add an extra piece of information into the list
> > showing the real parent if it's not the same as the parent. Then
> > you're not throwing away information.
>
> The name of the field is confusing for anyone who isn't intimate with
> the implementation details. The function getppid returns
> tsk->real_parent->tgid.
>
> If kdb wants information of what the tracer is that is fine, but I
> recommend putting that information in another field.
>
> Given that the original description says to give the information that
> ps gives, my sense is that kdb is currently wrong. Especially as it
> does not give you the actual parentage anywhere.
>
> I can certainly be convinced, but I do want some clarity. It looks very
> attractive to rename task->parent to task->ptracer and leave the field
> NULL when there is no tracer.

Fair enough. You can consider my objection rescinded.

Presumably, though, you're hoping for an Ack for your patch and you
plan to take it with the rest of the series. That's going to need to
come from Daniel anyway as he is the actual maintainer. I'm just the
peanut gallery. ;-)

-Doug

2022-05-23 06:58:05

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 00/16] ptrace: cleanups and calling do_cldstop with only siglock

Sebastian Andrzej Siewior <[email protected]> writes:

> On 2022-05-18 17:49:50 [-0500], Eric W. Biederman wrote:
>>
>> For ptrace_stop to work on PREEMPT_RT no spinlocks can be taken once
>> ptrace_freeze_traced has completed successfully. Which fundamentally
>> means the lock dance of dropping siglock and grabbing tasklist_lock does
>> not work on PREEMPT_RT. So I have worked through what is necessary so
>> that tasklist_lock does not need to be grabbed in ptrace_stop after
>> siglock is dropped.
> …
> It took me a while to realise that this is a follow-up; I somehow assumed
> that you added a few patches on top. Might have been yesterday's
> heat. b4 also refused to download this series because the v4 in this
> thread looked newer… Anyway. Both series applied:
>
> | =============================
> | WARNING: suspicious RCU usage
> | 5.18.0-rc7+ #16 Not tainted
> | -----------------------------
> | include/linux/ptrace.h:120 suspicious rcu_dereference_check() usage!
> |
> | other info that might help us debug this:
> |
> | rcu_scheduler_active = 2, debug_locks = 1
> | 2 locks held by ssdd/1734:
> | #0: ffff88800eaa6918 (&sighand->siglock){....}-{2:2}, at: lock_parents_siglocks+0xf0/0x3b0
> | #1: ffff88800eaa71d8 (&sighand->siglock/2){....}-{2:2}, at: lock_parents_siglocks+0x115/0x3b0
> |
> | stack backtrace:
> | CPU: 2 PID: 1734 Comm: ssdd Not tainted 5.18.0-rc7+ #16
> | Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
> | Call Trace:
> | <TASK>
> | dump_stack_lvl+0x45/0x5a
> | unlock_parents_siglocks+0xb6/0xc0
> | ptrace_stop+0xb9/0x390
> | get_signal+0x51c/0x8d0
> | arch_do_signal_or_restart+0x31/0x750
> | exit_to_user_mode_prepare+0x157/0x220
> | irqentry_exit_to_user_mode+0x5/0x50
> | asm_sysvec_apic_timer_interrupt+0x12/0x20
>
> That is ptrace_parent() in unlock_parents_siglocks().

How odd. I thought I had the appropriate lockdep config options enabled
in my test build to catch things like this. I guess not.

Now I am trying to think how to tell it that holding the appropriate
siglock makes this ok.

Eric

2022-05-23 07:07:31

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 03/16] kdb: Use real_parent when displaying a list of processes

Peter Zijlstra <[email protected]> writes:

> On Wed, May 18, 2022 at 05:53:42PM -0500, Eric W. Biederman wrote:
>> kdb has a bug: when using the ps command to display a list of
>> processes, if a process is being debugged the debugger is shown as
>> the parent process.
>>
>> This is silly, and I expect it never comes up in ptractice. As there
> ^^^^^^^^^
>
> Lol, love the new word :-)

It wasn't intentional but now I just might have to keep it.

Eric


2022-05-23 07:16:59

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 00/16] ptrace: cleanups and calling do_cldstop with only siglock

On Fri, May 20, 2022 at 02:32:24PM -0500, Eric W. Biederman wrote:
> Sebastian Andrzej Siewior <[email protected]> writes:
>
> > On 2022-05-18 17:49:50 [-0500], Eric W. Biederman wrote:
> >>
> >> For ptrace_stop to work on PREEMPT_RT no spinlocks can be taken once
> >> ptrace_freeze_traced has completed successfully. Which fundamentally
> >> means the lock dance of dropping siglock and grabbing tasklist_lock does
> >> not work on PREEMPT_RT. So I have worked through what is necessary so
> >> that tasklist_lock does not need to be grabbed in ptrace_stop after
> >> siglock is dropped.
> > …
> > It took me a while to realise that this is a follow-up; I somehow assumed
> > that you added a few patches on top. Might have been yesterday's
> > heat. b4 also refused to download this series because the v4 in this
> > thread looked newer… Anyway. Both series applied:
> >
> > | =============================
> > | WARNING: suspicious RCU usage
> > | 5.18.0-rc7+ #16 Not tainted
> > | -----------------------------
> > | include/linux/ptrace.h:120 suspicious rcu_dereference_check() usage!
> > |
> > | other info that might help us debug this:
> > |
> > | rcu_scheduler_active = 2, debug_locks = 1
> > | 2 locks held by ssdd/1734:
> > | #0: ffff88800eaa6918 (&sighand->siglock){....}-{2:2}, at: lock_parents_siglocks+0xf0/0x3b0
> > | #1: ffff88800eaa71d8 (&sighand->siglock/2){....}-{2:2}, at: lock_parents_siglocks+0x115/0x3b0
> > |
> > | stack backtrace:
> > | CPU: 2 PID: 1734 Comm: ssdd Not tainted 5.18.0-rc7+ #16
> > | Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
> > | Call Trace:
> > | <TASK>
> > | dump_stack_lvl+0x45/0x5a
> > | unlock_parents_siglocks+0xb6/0xc0
> > | ptrace_stop+0xb9/0x390
> > | get_signal+0x51c/0x8d0
> > | arch_do_signal_or_restart+0x31/0x750
> > | exit_to_user_mode_prepare+0x157/0x220
> > | irqentry_exit_to_user_mode+0x5/0x50
> > | asm_sysvec_apic_timer_interrupt+0x12/0x20
> >
> > That is ptrace_parent() in unlock_parents_siglocks().
>
> How odd. I thought I had the appropriate lockdep config options enabled
> in my test build to catch things like this. I guess not.
>
> Now I am trying to think how to tell it that holding the appropriate
> siglock makes this ok.

The typical annotation is something like:

rcu_dereference_protected(foo, lockdep_is_held(&bar))

Except in this case I think the problem is that bar depends on foo in
non-trivial ways. That is, foo is 'task->parent' and bar is
'task->parent->sighand->siglock' or something.

The other option is to use rcu_dereference_raw() in this one instance
and have a comment that explains the situation.
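
To make the dependency problem concrete, here is a small userspace model. It is only a sketch: deref_protected() and is_held() are hypothetical stand-ins for rcu_dereference_protected() and lockdep_is_held(), and the structs are stripped-down imitations, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct lock {
	bool held;
};

struct sighand {
	struct lock siglock;
};

struct task {
	struct task *parent;		/* models the __rcu-annotated pointer */
	struct sighand *sighand;
};

/* Model of lockdep_is_held(): true when the lock is currently held. */
static bool is_held(const struct lock *l)
{
	return l->held;
}

/* Model of rcu_dereference_protected(p, c): check c, then read p plainly. */
#define deref_protected(p, c) (assert(c), (p))

/*
 * Simple case ("bar" does not depend on "foo"): the protecting lock is
 * known up front, so the condition can be stated directly.
 */
static struct sighand *get_sighand(struct task *t, struct lock *outer)
{
	return deref_protected(t->sighand, is_held(outer));
}

/*
 * Awkward case: the lock lives behind the very pointer being read, so
 * the condition cannot even be written without an unchecked read first.
 */
static struct task *get_parent(struct task *t)
{
	struct task *raw = t->parent;	/* models rcu_dereference_raw() */

	return deref_protected(t->parent, is_held(&raw->sighand->siglock));
}
```

In get_parent() the unchecked read is unavoidable, which is roughly why falling back to a raw dereference plus an explanatory comment is on the table.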

Subject: Re: [PATCH 00/16] ptrace: cleanups and calling do_cldstop with only siglock

On 2022-05-18 17:49:50 [-0500], Eric W. Biederman wrote:
>
> For ptrace_stop to work on PREEMPT_RT no spinlocks can be taken once
> ptrace_freeze_traced has completed successfully. Which fundamentally
> means the lock dance of dropping siglock and grabbing tasklist_lock does
> not work on PREEMPT_RT. So I have worked through what is necessary so
> that tasklist_lock does not need to be grabbed in ptrace_stop after
> siglock is dropped.
…
It took me a while to realise that this is a follow-up; I somehow assumed
that you added a few patches on top. Might have been yesterday's
heat. b4 also refused to download this series because the v4 in this
thread looked newer… Anyway. Both series applied:

| =============================
| WARNING: suspicious RCU usage
| 5.18.0-rc7+ #16 Not tainted
| -----------------------------
| include/linux/ptrace.h:120 suspicious rcu_dereference_check() usage!
|
| other info that might help us debug this:
|
| rcu_scheduler_active = 2, debug_locks = 1
| 2 locks held by ssdd/1734:
| #0: ffff88800eaa6918 (&sighand->siglock){....}-{2:2}, at: lock_parents_siglocks+0xf0/0x3b0
| #1: ffff88800eaa71d8 (&sighand->siglock/2){....}-{2:2}, at: lock_parents_siglocks+0x115/0x3b0
|
| stack backtrace:
| CPU: 2 PID: 1734 Comm: ssdd Not tainted 5.18.0-rc7+ #16
| Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
| Call Trace:
| <TASK>
| dump_stack_lvl+0x45/0x5a
| unlock_parents_siglocks+0xb6/0xc0
| ptrace_stop+0xb9/0x390
| get_signal+0x51c/0x8d0
| arch_do_signal_or_restart+0x31/0x750
| exit_to_user_mode_prepare+0x157/0x220
| irqentry_exit_to_user_mode+0x5/0x50
| asm_sysvec_apic_timer_interrupt+0x12/0x20

That is ptrace_parent() in unlock_parents_siglocks().

Sebastian

2022-05-24 17:44:23

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 05/16] ptrace: Remove dead code from __ptrace_detach

Sorry for delay.

On 05/18, Eric W. Biederman wrote:
>
> Ever since commit 28d838cc4dfe ("Fix ptrace self-attach rule") it has
> been impossible to attach another thread in the same thread group.
>
> Remove the code from __ptrace_detach that was trying to support
> detaching from a thread in the same thread group.

Maybe I am totally confused, but I think you misunderstood this code
and thus this patch is very wrong.

The same_thread_group() check does NOT try to check whether the debugger
and tracee are in the same thread group; that is indeed impossible.

We need this check to know if the tracee was ptrace_reparented() before
__ptrace_unlink() or not.


> -static int ignoring_children(struct sighand_struct *sigh)
> -{
> - int ret;
> - spin_lock(&sigh->siglock);
> - ret = (sigh->action[SIGCHLD-1].sa.sa_handler == SIG_IGN) ||
> - (sigh->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDWAIT);
> - spin_unlock(&sigh->siglock);
> - return ret;
> -}

...

> @@ -565,14 +552,9 @@ static bool __ptrace_detach(struct task_struct *tracer, struct task_struct *p)
>
> dead = !thread_group_leader(p);
>
> - if (!dead && thread_group_empty(p)) {
> - if (!same_thread_group(p->real_parent, tracer))
> - dead = do_notify_parent(p, p->exit_signal);
> - else if (ignoring_children(tracer->sighand)) {
> - __wake_up_parent(p, tracer);
> - dead = true;
> - }
> - }

So the code above does:

- if !same_thread_group(p->real_parent, tracer), then the tracee was
ptrace_reparented(), and now we need to notify its natural parent
to let it know it has a zombie child.

- otherwise, the tracee is our natural child, and it is actually dead.
however, since we are going to reap this task, we need to wake up our
sub-threads possibly sleeping on ->wait_chldexit wait_queue_head_t.

See?

> + if (!dead && thread_group_empty(p))
> + dead = do_notify_parent(p, p->exit_signal);

No, this looks wrong. Or I missed something?

Oleg.


2022-05-24 19:19:38

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 08/16] ptrace: Only populate last_siginfo from ptrace

On 05/18, Eric W. Biederman wrote:
>
> The code in ptrace_signal to populate siginfo if the signal number
> changed is buggy. If the tracer continued the tracee using
> ptrace_detach it is guaranteed to use the real_parent (or possibly a
> new tracer) but definitely not the original tracer to populate si_pid
> and si_uid.

I guess nobody cares. As the comment says

If the debugger wanted something
specific in the siginfo structure then it should
have updated *info via PTRACE_SETSIGINFO.

otherwise I don't think si_pid/si_uid have any value.

However the patch looks fine to me, just the word "buggy" looks a bit
too strong imo.

Oleg.


2022-05-25 03:35:55

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 07/16] signal: Wake up the designated parent

I fail to understand this patch...

On 05/18, Eric W. Biederman wrote:
>
> Today if a process is ptraced only the ptracer will ever be woken up in
> wait

and why is this wrong?

> Fixes: 75b95953a569 ("job control: Add @for_ptrace to do_notify_parent_cldstop()")

how does this change fix 75b95953a569?

> static int child_wait_callback(wait_queue_entry_t *wait, unsigned mode,
> int sync, void *key)
> {
> struct wait_opts *wo = container_of(wait, struct wait_opts,
> child_wait);
> - struct task_struct *p = key;
> + struct child_wait_info *info = key;
>
> - if (!eligible_pid(wo, p))
> + if (!eligible_pid(wo, info->p))
> return 0;
>
> - if ((wo->wo_flags & __WNOTHREAD) && wait->private != p->parent)
> - return 0;
> + if ((wo->wo_flags & __WNOTHREAD) && (wait->private != info->parent))
> + return 0;

So. wait->private is the task T which is sleeping on wait_chldexit.

Before the patch the logic is clear. T called do_wait(__WNOTHREAD) and
we do not need to wake it up if it is not the "actual" parent of p.

After the patch we check if T is equal to the "parent" arg passed to
__wake_up_parent(). Why??? This arg is only used to find the
->signal->wait_chldexit wait_queue_head, and this is fine.

As I said, I don't understand this patch. But at least this change is
wrong in case when __wake_up_parent() is called by __ptrace_detach().
(you removed it in 5/16 but this looks wrong too). Sure, we can change
ptrace_detach() to use __wake_up_parent(p, p->parent), but for what?

I must have missed something.

Oleg.


2022-05-25 03:48:54

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 06/16] ptrace: Remove unnecessary locking in ptrace_(get|set)siginfo

On 05/18, Eric W. Biederman wrote:
>
> Since commit 9899d11f6544 ("ptrace: ensure arch_ptrace/ptrace_request
> can never race with SIGKILL") it has been unnecessary for
> ptrace_getsiginfo and ptrace_setsiginfo to use lock_task_sighand.

ACK


2022-05-25 12:40:57

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 07/16] signal: Wake up the designated parent

On 05/24, Oleg Nesterov wrote:
>
> I fail to understand this patch...
>
> On 05/18, Eric W. Biederman wrote:
> >
> > Today if a process is ptraced only the ptracer will ever be woken up in
> > wait
>
> and why is this wrong?
>
> > Fixes: 75b95953a569 ("job control: Add @for_ptrace to do_notify_parent_cldstop()")
>
> how does this change fix 75b95953a569?

OK, I guess you mean the 2nd do_notify_parent_cldstop() in ptrace_stop(),
the problematic case is current->ptrace == T. Right?

I dislike this patch anyway, but let me think more about it.

Oleg.


2022-05-26 02:17:00

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 07/16] signal: Wake up the designated parent

On 05/24, Oleg Nesterov wrote:
>
> On 05/24, Oleg Nesterov wrote:
> >
> > I fail to understand this patch...
> >
> > On 05/18, Eric W. Biederman wrote:
> > >
> > > Today if a process is ptraced only the ptracer will ever be woken up in
> > > wait
> >
> > and why is this wrong?
> >
> > > Fixes: 75b95953a569 ("job control: Add @for_ptrace to do_notify_parent_cldstop()")
> >
> > how does this change fix 75b95953a569?
>
> OK, I guess you mean the 2nd do_notify_parent_cldstop() in ptrace_stop(),
> the problematic case is current->ptrace == T. Right?
>
> I dislike this patch anyway, but let me think more about it.

OK, now that I understand the problem, the patch doesn't look bad to me,
although I'd ask to make the changelog more clear.

After this change __wake_up_parent() can't accept any "parent" from
p->parent thread group, but all callers look fine except ptrace_detach().

Oleg.


2022-05-26 08:52:51

by Oleg Nesterov

Subject: Re: [PATCH 05/16] ptrace: Remove dead code from __ptrace_detach

On 05/24, Oleg Nesterov wrote:
>
> Sorry for delay.
>
> On 05/18, Eric W. Biederman wrote:
> >
> > Ever since commit 28d838cc4dfe ("Fix ptrace self-attach rule") it has
> > been impossible to attach another thread in the same thread group.
> >
> > Remove the code from __ptrace_detach that was trying to support
> > detaching from a thread in the same thread group.
>
> may be I am totally confused, but I think you misunderstood this code
> and thus this patch is very wrong.
>
> The same_thread_group() check does NOT try to check if debugger and
> tracee is in the same thread group, this is indeed impossible.
>
> We need this check to know if the tracee was ptrace_reparented() before
> __ptrace_unlink() or not.
>
>
> > -static int ignoring_children(struct sighand_struct *sigh)
> > -{
> > - int ret;
> > - spin_lock(&sigh->siglock);
> > - ret = (sigh->action[SIGCHLD-1].sa.sa_handler == SIG_IGN) ||
> > - (sigh->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDWAIT);
> > - spin_unlock(&sigh->siglock);
> > - return ret;
> > -}
>
> ...
>
> > @@ -565,14 +552,9 @@ static bool __ptrace_detach(struct task_struct *tracer, struct task_struct *p)
> >
> > dead = !thread_group_leader(p);
> >
> > - if (!dead && thread_group_empty(p)) {
> > - if (!same_thread_group(p->real_parent, tracer))
> > - dead = do_notify_parent(p, p->exit_signal);
> > - else if (ignoring_children(tracer->sighand)) {
> > - __wake_up_parent(p, tracer);
> > - dead = true;
> > - }
> > - }
>
> So the code above does:
>
> - if !same_thread_group(p->real_parent, tracer), then the tracee was
> ptrace_reparented(), and now we need to notify its natural parent
> to let it know it has a zombie child.
>
> - otherwise, the tracee is our natural child, and it is actually dead.
> however, since we are going to reap this task, we need to wake up our
> sub-threads possibly sleeping on ->wait_chldexit wait_queue_head_t.
>
> See?
>
> > + if (!dead && thread_group_empty(p))
> > + dead = do_notify_parent(p, p->exit_signal);
>
> No, this looks wrong. Or I missed something?

Yes, but...

That said, it seems that we do not need __wake_up_parent() if it was our
natural child?

I'll recheck. Eric, I'll continue to read this series tomorrow, can't
concentrate on ptrace today.

Oleg.


2022-06-06 16:23:00

by Eric W. Biederman

Subject: Re: [PATCH 05/16] ptrace: Remove dead code from __ptrace_detach

Oleg Nesterov <[email protected]> writes:

> On 05/24, Oleg Nesterov wrote:
>>
>> Sorry for delay.
>>
>> On 05/18, Eric W. Biederman wrote:
>> >
>> > Ever since commit 28d838cc4dfe ("Fix ptrace self-attach rule") it has
>> > been impossible to attach another thread in the same thread group.
>> >
>> > Remove the code from __ptrace_detach that was trying to support
>> > detaching from a thread in the same thread group.
>>
>> may be I am totally confused, but I think you misunderstood this code
>> and thus this patch is very wrong.
>>
>> The same_thread_group() check does NOT try to check if debugger and
>> tracee is in the same thread group, this is indeed impossible.
>>
>> We need this check to know if the tracee was ptrace_reparented() before
>> __ptrace_unlink() or not.
>>
>>
>> > -static int ignoring_children(struct sighand_struct *sigh)
>> > -{
>> > - int ret;
>> > - spin_lock(&sigh->siglock);
>> > - ret = (sigh->action[SIGCHLD-1].sa.sa_handler == SIG_IGN) ||
>> > - (sigh->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDWAIT);
>> > - spin_unlock(&sigh->siglock);
>> > - return ret;
>> > -}
>>
>> ...
>>
>> > @@ -565,14 +552,9 @@ static bool __ptrace_detach(struct task_struct *tracer, struct task_struct *p)
>> >
>> > dead = !thread_group_leader(p);
>> >
>> > - if (!dead && thread_group_empty(p)) {
>> > - if (!same_thread_group(p->real_parent, tracer))
>> > - dead = do_notify_parent(p, p->exit_signal);
>> > - else if (ignoring_children(tracer->sighand)) {
>> > - __wake_up_parent(p, tracer);
>> > - dead = true;
>> > - }
>> > - }
>>
>> So the code above does:
>>
>> - if !same_thread_group(p->real_parent, tracer), then the tracee was
>> ptrace_reparented(), and now we need to notify its natural parent
>> to let it know it has a zombie child.
>>
>> - otherwise, the tracee is our natural child, and it is actually dead.
>> however, since we are going to reap this task, we need to wake up our
>> sub-threads possibly sleeping on ->wait_chldexit wait_queue_head_t.
>>
>> See?
>>
>> > + if (!dead && thread_group_empty(p))
>> > + dead = do_notify_parent(p, p->exit_signal);
>>
>> No, this looks wrong. Or I missed something?
>
> Yes, but...
>
> That said, it seems that we do not need __wake_up_parent() if it was our
> natural child?

Agreed on both counts.

Hmm. I see where the logic comes from. The ignoring_children test and
the __wake_up_parent are what do_notify_parent does when the parent
ignores children. Hmm. I even see all of this documented in the comment
above __ptrace_detach.

So I am just going to drop this change.

> I'll recheck. Eric, I'll continue to read this series tomorrow, can't
> concentrate on ptrace today.

No worries. This was entirely too close to the merge window so I
dropped it all until today.

Eric

2022-06-06 16:31:17

by Eric W. Biederman

Subject: Re: [PATCH 00/16] ptrace: cleanups and calling do_cldstop with only siglock

Kyle Huey <[email protected]> writes:

> On Thu, May 19, 2022 at 11:05 AM Eric W. Biederman
> <[email protected]> wrote:
>>
>> Sebastian Andrzej Siewior <[email protected]> writes:
>>
>> > On 2022-05-18 20:26:05 [-0700], Kyle Huey wrote:
>> >> Is there a git branch somewhere I can pull to test this? It doesn't apply
>> >> cleanly to Linus's tip.
>> >
>> > https://kernel.googlesource.com/pub/scm/linux/kernel/git/ebiederm/user-namespace.git ptrace_stop-cleanup-for-v5.19
>>
>> Yes that is the branch this all applies to.
>>
>> This is my second round of cleanups this cycle for this code.
>> I just keep finding little things that deserve to be changed,
>> when I am working on the more substantial issues.
>>
>> Eric
>
> When running the rr test suite, I see hangs like this

Thanks. I will dig into this.

Is there an easy way I can run the rr test suite to see if I can
reproduce this myself?

Thanks,
Eric

>
> [ 812.151505] watchdog: BUG: soft lockup - CPU#3 stuck for 548s!
> [condvar_stress-:12152]
> [ 812.151529] Modules linked in: snd_hda_codec_realtek
> snd_hda_codec_generic ledtrig_audio rfcomm cmac algif_hash
> algif_skcipher af_alg bnep dm_crypt intel_rapl_msr mei_hdcp
> snd_hda_codec_hdmi intel_rapl_common snd_hda_intel x86_pkg_temp_thermal
> snd_intel_dspcfg snd_intel_sdw_acpi nls_iso8859_1 intel_powerclamp
> snd_hda_codec coretemp snd_hda_core snd_hwdep snd_pcm rtl8723be
> btcoexist snd_seq_midi snd_seq_midi_event rtl8723_common kvm_intel
> rtl_pci snd_rawmidi rtlwifi btusb btrtl btbcm snd_seq kvm mac80211
> btintel btmtk snd_seq_device rapl bluetooth snd_timer
> intel_cstate hp_wmi cfg80211 serio_raw snd platform_profile
> ecdh_generic mei_me sparse_keymap efi_pstore wmi_bmof ee1004 joydev
> input_leds ecc libarc4 soundcore mei acpi_pad mac_hid
> sch_fq_codel ipmi_devintf ipmi_msghandler msr vhost_vsock
> vmw_vsock_virtio_transport_common vsock vhost_net vhost vhost_iotlb
> tap vhci_hcd usbip_core parport_pc ppdev lp parport ip_tables x_tables
> autofs4 btrfs blake2b_generic xor raid6_pq zstd_compress
> [ 812.151570] libcrc32c hid_generic usbhid hid i915 drm_buddy
> i2c_algo_bit ttm drm_dp_helper cec rc_core crct10dif_pclmul
> drm_kms_helper crc32_pclmul syscopyarea ghash_clmulni_intel
> sysfillrect sysimgblt fb_sys_fops aesni_intel crypto_simd cryptd r8169
> psmouse drm i2c_i801 realtek ahci i2c_smbus xhci_pci libahci
> xhci_pci_renesas wmi video
> [ 812.151584] CPU: 3 PID: 12152 Comm: condvar_stress- Tainted: G
> I L 5.18.0-rc1+ #2
> [ 812.151586] Hardware name: HP 750-280st/2B4B, BIOS A0.11 02/24/2016
> [ 812.151587] RIP: 0010:_raw_spin_unlock_irq+0x15/0x40
> [ 812.151591] Code: df e8 3f 1f 4a ff 90 5b 5d c3 66 66 2e 0f 1f 84
> 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 c6 07 00 0f 1f 00 fb 0f 1f
> 44 00 00 <bf> 01 00 00 00 e8 41 95 46 ff 65 8b 05
> 9a c1 9a 5f 85 c0 74 02 5d
> [ 812.151593] RSP: 0018:ffffa863c246bd70 EFLAGS: 00000246
> [ 812.151594] RAX: ffff8bc0913f6400 RBX: ffff8bc0913f6400 RCX: 0000000000000000
> [ 812.151595] RDX: 0000000000000002 RSI: 00000000000a0013 RDI: ffff8bc089b63180
> [ 812.151596] RBP: ffffa863c246bd70 R08: ffff8bc0811d6b40 R09: ffff8bc089b63180
> [ 812.151597] R10: 0000000000000000 R11: 0000000000000004 R12: ffff8bc0913f6400
> [ 812.151597] R13: ffff8bc089b63180 R14: ffff8bc0913f6400 R15: ffffa863c246be68
> [ 812.151598] FS: 00007f612dda5700(0000) GS:ffff8bc7e24c0000(0000)
> knlGS:0000000000000000
> [ 812.151599] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 812.151600] CR2: 000055e70715692e CR3: 000000010b4e8005 CR4: 00000000003706e4
> [ 812.151601] Call Trace:
> [ 812.151602] <TASK>
> [ 812.151604] do_signal_stop+0x228/0x260
> [ 812.151606] get_signal+0x43a/0x8e0
> [ 812.151608] arch_do_signal_or_restart+0x37/0x7d0
> [ 812.151610] ? __this_cpu_preempt_check+0x13/0x20
> [ 812.151612] ? __perf_event_task_sched_in+0x81/0x230
> [ 812.151616] ? __this_cpu_preempt_check+0x13/0x20
> [ 812.151617] exit_to_user_mode_prepare+0x130/0x1a0
> [ 812.151620] syscall_exit_to_user_mode+0x26/0x40
> [ 812.151621] ret_from_fork+0x15/0x30
> [ 812.151623] RIP: 0033:0x7f612dfcd125
> [ 812.151625] Code: 48 85 ff 74 3d 48 85 f6 74 38 48 83 ee 10 48 89
> 4e 08 48 89 3e 48 89 d7 4c 89 c2 4d 89 c8 4c 8b 54 24 08 b8 38 00 00
> 00 0f 05 <48> 85 c0 7c 13 74 01 c3 31 ed 58 5f ff
> d0 48 89 c7 b8 3c 00 00 00
> [ 812.151626] RSP: 002b:00007f612dda4fb0 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000038
> [ 812.151628] RAX: 0000000000000000 RBX: 00007f612dda5700 RCX: ffffffffffffffff
> [ 812.151628] RDX: 00007f612dda59d0 RSI: 00007f612dda4fb0 RDI: 00000000003d0f00
> [ 812.151629] RBP: 00007ffd59ad20b0 R08: 00007f612dda5700 R09: 00007f612dda5700
> [ 812.151630] R10: 00007f612dda59d0 R11: 0000000000000246 R12: 00007ffd59ad20ae
> [ 812.151631] R13: 00007ffd59ad20af R14: 00007ffd59ad20b0 R15: 00007f612dda4fc0
> [ 812.151632] </TASK>
>
> - Kyle

2022-06-07 06:51:42

by Eric W. Biederman

Subject: Re: [PATCH 07/16] signal: Wake up the designated parent

Oleg Nesterov <[email protected]> writes:

> On 05/24, Oleg Nesterov wrote:
>>
>> On 05/24, Oleg Nesterov wrote:
>> >
>> > I fail to understand this patch...
>> >
>> > On 05/18, Eric W. Biederman wrote:
>> > >
>> > > Today if a process is ptraced only the ptracer will ever be woken up in
>> > > wait
>> >
>> > and why is this wrong?
>> >
>> > > Fixes: 75b95953a569 ("job control: Add @for_ptrace to do_notify_parent_cldstop()")
>> >
>> > how does this change fix 75b95953a569?
>>
>> OK, I guess you mean the 2nd do_notify_parent_cldstop() in ptrace_stop(),
>> the problematic case is current->ptrace == T. Right?
>>
>> I dislike this patch anyway, but let me think more about it.
>
> OK, now that I understand the problem, the patch doesn't look bad to me,
> although I'd ask to make the changelog more clear.

I will see what I can do.

> After this change __wake_up_parent() can't accept any "parent" from
> p->parent thread group, but all callers look fine except
> ptrace_detach().

Having looked at it a little more I think the change was too
restrictive. For the !ptrace_reparented case there are possibly
two threads of the parent process that wait_consider_task will
allow to wait even with __WNOTHREAD specified. It is desirable
to wake them both up.

Which if I have had enough sleep reduces this patch to just:

diff --git a/kernel/exit.c b/kernel/exit.c
index f072959fcab7..c8156366b722 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1431,8 +1431,10 @@ static int child_wait_callback(wait_queue_entry_t *wait, unsigned mode,
 	if (!eligible_pid(wo, p))
 		return 0;

-	if ((wo->wo_flags & __WNOTHREAD) && wait->private != p->parent)
-		return 0;
+	if ((wo->wo_flags & __WNOTHREAD) &&
+	    (wait->private != p->parent) &&
+	    (wait->private != p->real_parent))
+		return 0;

 	return default_wake_function(wait, mode, sync, key);
 }


I think that solves the issue without missing any wake-ups and without
adding any extra ones.
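
To make the intended behaviour concrete, the filter can be modelled in
plain user-space C roughly as follows. This is only a sketch under
assumptions: struct task and should_wake() are hypothetical stand-ins
(for task_struct and for the check in child_wait_callback), and the
eligible_pid() test is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two task_struct fields that matter. */
struct task {
	struct task *parent;      /* current parent: the tracer while ptraced */
	struct task *real_parent; /* natural parent */
};

/*
 * Model of the proposed filter. "waiter" plays the role of
 * wait->private and "nothread" that of (wo->wo_flags & __WNOTHREAD).
 * With __WNOTHREAD, only the thread that is p's parent or p's
 * real_parent is woken; without it, every waiter is woken.
 */
static bool should_wake(const struct task *waiter, const struct task *p,
			bool nothread)
{
	if (nothread && waiter != p->parent && waiter != p->real_parent)
		return false;
	return true;
}
```

With a traced child whose parent is the tracer and whose real_parent is
the natural parent, both of those threads pass the filter under
__WNOTHREAD, while an unrelated thread sharing the wait queue does not.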

For the same set of reasons it looks like the __wake_up_parent in
__ptrace_detach is simply dead code. I don't think there is a case
where, when !ptrace_reparented, the thread that is the real_parent can
sleep in do_wait when the thread that was calling ptrace could not.

That needs a very close look to confirm.

Eric

2022-06-07 15:41:32

by Eric W. Biederman

Subject: Re: [PATCH 08/16] ptrace: Only populate last_siginfo from ptrace

Oleg Nesterov <[email protected]> writes:

> On 05/18, Eric W. Biederman wrote:
>>
>> The code in ptrace_signal to populate siginfo if the signal number
>> changed is buggy. If the tracer contined the tracee using
>> ptrace_detach it is guaranteed to use the real_parent (or possibly a
>> new tracer) but definitely not the origional tracer to populate si_pid
>> and si_uid.
>
> I guess nobody cares. As the comment says
>
> If the debugger wanted something
> specific in the siginfo structure then it should
> have updated *info via PTRACE_SETSIGINFO.
>
> otherwise I don't think si_pid/si_uid have any value.

No one has complained, so clearly no one cares. So it is definitely
not a regression, or even anything that needs to be backported.

However, with SI_USER, si_pid and si_uid are defined to be whoever
sent the signal. So I would argue that by definition those values
are wrong.
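
As an illustration of what a consumer of those fields relies on, here is
a small user-space sketch. It is not part of the patch:
siginfo_user_sender() is a hypothetical helper that trusts si_pid and
si_uid only when si_code says the signal came from kill(2).

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>

/*
 * Hypothetical helper: if the siginfo claims SI_USER (sent with
 * kill(2)), copy out the claimed sender pid and uid and return true.
 * For any other si_code the outputs are left untouched.
 */
static bool siginfo_user_sender(const siginfo_t *info, pid_t *pid, uid_t *uid)
{
	if (info->si_code != SI_USER)
		return false;

	*pid = info->si_pid;
	*uid = info->si_uid;
	return true;
}
```

A handler or tracer written this way reports the wrong sender whenever
si_pid and si_uid are filled in from a task that did not send the signal.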

> However the patch looks fine to me, just the word "buggy" looks a bit
> too strong imo.

I guess I am in general agreement. Perhaps I can just say the values
are wrong by definition?

Eric


2022-06-08 04:59:23

by Oleg Nesterov

Subject: Re: [PATCH 07/16] signal: Wake up the designated parent

On 06/06, Eric W. Biederman wrote:
>
> Which if I have had enough sleep reduces this patch to just:
>
> diff --git a/kernel/exit.c b/kernel/exit.c
> index f072959fcab7..c8156366b722 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -1431,8 +1431,10 @@ static int child_wait_callback(wait_queue_entry_t *wait, unsigned mode,
> if (!eligible_pid(wo, p))
> return 0;
>
> - if ((wo->wo_flags & __WNOTHREAD) && wait->private != p->parent)
> - return 0;
> + if ((wo->wo_flags & __WNOTHREAD) &&
> + (wait->private != p->parent) &&
> + (wait->private != p->real_parent))
> + return 0;
>
> return default_wake_function(wait, mode, sync, key);
> }
>
>
> I think that solves the issue without missing wake-ups without adding
> any more.

Agreed, and looks much simpler.

> For the same set of reasons it looks like the __wake_up_parent in
> __ptrace_detach is just simply dead code. I don't think there is a case
> where when !ptrace_reparented the thread that is the real_parent can
> sleep in do_wait when the thread that was calling ptrace could not.

Yes... this doesn't really differ from the case when one thread reaps
a natural child and another thread sleeps in do_wait().

> That needs a very close look to confirm.

Yes.

Oleg.

2022-06-08 05:01:58

by Oleg Nesterov

Subject: Re: [PATCH 08/16] ptrace: Only populate last_siginfo from ptrace

On 06/06, Eric W. Biederman wrote:
>
> Oleg Nesterov <[email protected]> writes:
>
> > However the patch looks fine to me, just the word "buggy" looks a bit
> > too strong imo.
>
> I guess I am in general agreement. Perhaps I can just say they values
> are wrong by definition?

Up to you. I won't really argue with "buggy".

Oleg.

2022-06-09 21:10:54

by Kyle Huey

Subject: Re: [PATCH 00/16] ptrace: cleanups and calling do_cldstop with only siglock

On Mon, Jun 6, 2022 at 9:12 AM Eric W. Biederman <[email protected]> wrote:
>
> Kyle Huey <[email protected]> writes:
>
> > On Thu, May 19, 2022 at 11:05 AM Eric W. Biederman
> > <[email protected]> wrote:
> >>
> >> Sebastian Andrzej Siewior <[email protected]> writes:
> >>
> >> > On 2022-05-18 20:26:05 [-0700], Kyle Huey wrote:
> >> >> Is there a git branch somewhere I can pull to test this? It doesn't apply
> >> >> cleanly to Linus's tip.
> >> >
> >> > https://kernel.googlesource.com/pub/scm/linux/kernel/git/ebiederm/user-namespace.git ptrace_stop-cleanup-for-v5.19
> >>
> >> Yes that is the branch this all applies to.
> >>
> >> This is my second round of cleanups this cycle for this code.
> >> I just keep finding little things that deserve to be changed,
> >> when I am working on the more substantial issues.
> >>
> >> Eric
> >
> > When running the rr test suite, I see hangs like this
>
> Thanks. I will dig into this.
>
> Is there an easy way I can run the rr test suite to see if I can
> reproduce this myself?

It should be as straightforward as:
1. git clone https://github.com/rr-debugger/rr.git
2. mkdir obj-rr && cd obj-rr
3. cmake ../rr
4. make -jN
5. make check

If you have trouble with it feel free to email me off list.

- Kyle


2022-06-22 17:32:55

by Eric W. Biederman

Subject: [PATCH 0/3] ptrace: Stop supporting SIGKILL for PTRACE_EVENT_EXIT


Recently I had a conversation where it was pointed out to me that
SIGKILL sent to a tracee stopped in PTRACE_EVENT_EXIT is quite
difficult for a tracer to handle.

Keeping SIGKILL working for anything after the process has been killed
is also a real pain from an implementation point of view.

So I am attempting to remove this wart in the userspace API and see
if anyone cares.

Eric W. Biederman (3):
signal: Ensure SIGNAL_GROUP_EXIT gets set in do_group_exit
signal: Guarantee that SIGNAL_GROUP_EXIT is set on process exit
signal: Drop signals received after a fatal signal has been processed

fs/coredump.c | 2 +-
include/linux/sched/signal.h | 1 +
kernel/exit.c | 20 +++++++++++++++++++-
kernel/fork.c | 2 ++
kernel/signal.c | 3 ++-
5 files changed, 25 insertions(+), 3 deletions(-)

Eric

2022-06-23 15:29:16

by Alexander Gordeev

Subject: Re: [PATCH 0/3] ptrace: Stop supporting SIGKILL for PTRACE_EVENT_EXIT

On Wed, Jun 22, 2022 at 11:43:37AM -0500, Eric W. Biederman wrote:
> Recently I had a conversation where it was pointed out to me that
> SIGKILL sent to a tracee stropped in PTRACE_EVENT_EXIT is quite
> difficult for a tracer to handle.
>
> Keeping SIGKILL working for anything after the process has been killed
> is also a real pain from an implementation point of view.
>
> So I am attempting to remove this wart in the userspace API and see
> if anyone cares.

Hi Eric,

With this series s390 hits the warning in exactly the same way. Is that expected?

Thanks!

2022-06-23 22:09:15

by Eric W. Biederman

Subject: Re: [PATCH 0/3] ptrace: Stop supporting SIGKILL for PTRACE_EVENT_EXIT

Alexander Gordeev <[email protected]> writes:

> On Wed, Jun 22, 2022 at 11:43:37AM -0500, Eric W. Biederman wrote:
>> Recently I had a conversation where it was pointed out to me that
>> SIGKILL sent to a tracee stropped in PTRACE_EVENT_EXIT is quite
>> difficult for a tracer to handle.
>>
>> Keeping SIGKILL working for anything after the process has been killed
>> is also a real pain from an implementation point of view.
>>
>> So I am attempting to remove this wart in the userspace API and see
>> if anyone cares.
>
> Hi Eric,
>
> With this series s390 hits the warning exactly same way. Is that expected?

Yes. I was working on this before I got your mysterious bug report. I
included you because I am including everyone I know who deals with the
userspace side of this since I am very deliberately changing the user
visible behavior of PTRACE_EVENT_EXIT.

I am going to start seeing if I can find any possible explanation for
your regression report. Since I don't have much to go on I expect I
will have to revert the last change in my ptrace_stop series that
apparently triggers the WARN_ON you reported. I really would have
expected the WARN_ON to be triggered in the patch in which it was
introduced, not the final patch in the series.


To the best of my knowledge changing PTRACE_EVENT_EXIT is both desirable
from a userspace semantics standpoint and from a kernel implementation
standpoint. If someone knows any differently and depends upon sending
SIGKILL to processes in PTRACE_EVENT_EXIT to steal the process away from
the tracer I would love to hear about that case.

Eric

2022-07-08 22:45:36

by Eric W. Biederman

Subject: Re: [PATCH 0/3] ptrace: Stop supporting SIGKILL for PTRACE_EVENT_EXIT

"Eric W. Biederman" <[email protected]> writes:

> Recently I had a conversation where it was pointed out to me that
> SIGKILL sent to a tracee stropped in PTRACE_EVENT_EXIT is quite
> difficult for a tracer to handle.
>
> Keeping SIGKILL working for anything after the process has been killed
> is also a real pain from an implementation point of view.
>
> So I am attempting to remove this wart in the userspace API and see
> if anyone cares.
>
> Eric W. Biederman (3):
> signal: Ensure SIGNAL_GROUP_EXIT gets set in do_group_exit
> signal: Guarantee that SIGNAL_GROUP_EXIT is set on process exit
> signal: Drop signals received after a fatal signal has been processed
>
> fs/coredump.c | 2 +-
> include/linux/sched/signal.h | 1 +
> kernel/exit.c | 20 +++++++++++++++++++-
> kernel/fork.c | 2 ++
> kernel/signal.c | 3 ++-
> 5 files changed, 25 insertions(+), 3 deletions(-)

RR folks, any comments?

Did I properly understand what Keno Fischer was asking for when we
talked in person?

Eric

2022-07-08 23:39:20

by Keno Fischer

Subject: Re: [PATCH 0/3] ptrace: Stop supporting SIGKILL for PTRACE_EVENT_EXIT

Hi Eric,

On Fri, Jul 8, 2022 at 6:25 PM Eric W. Biederman <[email protected]> wrote:
> > Recently I had a conversation where it was pointed out to me that
> > SIGKILL sent to a tracee stropped in PTRACE_EVENT_EXIT is quite
> > difficult for a tracer to handle.
> >
>
> RR folks any comments?
>
> Did I properly understand what Keno Fischer was asking for when we
> talked in person?

Yes, this is indeed what I had in mind. I have not yet had the opportunity
to try out your patch series (sorry), but from visual inspection, it does indeed
do what I wanted, which is to make sure that a tracee stays in
PTRACE_EVENT_EXIT for the tracer to inspect, even if there is another
SIGKILL incoming simultaneously (since otherwise it may be impossible
for the tracer to observe the PTRACE_EVENT_EXIT if two SIGKILLs
come in rapid succession). I will try to take this series for a proper spin
shortly.
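
For context, this is roughly how a tracer recognises the stop in
question. It is a minimal sketch, not rr's code: is_exit_event_stop() is
a hypothetical helper, and the status encoding is the one documented in
ptrace(2) for PTRACE_EVENT stops.

```c
#include <signal.h>
#include <stdbool.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: return true if a waitpid() status reports a
 * PTRACE_EVENT_EXIT stop. Per ptrace(2), such a stop is seen as
 * status >> 8 == (SIGTRAP | (PTRACE_EVENT_EXIT << 8)).
 */
static bool is_exit_event_stop(int status)
{
	return WIFSTOPPED(status) &&
	       (status >> 8) == (SIGTRAP | (PTRACE_EVENT_EXIT << 8));
}
```

The tracer only gets this stop if it asked for it with
PTRACE_O_TRACEEXIT. If the tracee is killed out of the stop before the
tracer's waitpid() returns, the tracer sees a plain death status
instead, which is the case described above.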

Keno

2022-07-12 20:54:49

by Eric W. Biederman

Subject: Re: [PATCH 0/3] ptrace: Stop supporting SIGKILL for PTRACE_EVENT_EXIT

Keno Fischer <[email protected]> writes:

> Hi Eric,
>
> On Fri, Jul 8, 2022 at 6:25 PM Eric W. Biederman <[email protected]> wrote:
>> > Recently I had a conversation where it was pointed out to me that
>> > SIGKILL sent to a tracee stropped in PTRACE_EVENT_EXIT is quite
>> > difficult for a tracer to handle.
>> >
>>
>> RR folks any comments?
>>
>> Did I properly understand what Keno Fischer was asking for when we
>> talked in person?
>
> Yes, this is indeed what I had in mind. I have not yet had the opportunity
> to try out your patch series (sorry), but from visual inspection, it does indeed
> do what I wanted, which is to make sure that a tracee stays in
> PTRACE_EVENT_EXIT for the tracer to inspect, even if there is another
> SIGKILL incoming simultaneously (since otherwise it may be impossible
> for the tracer to observe the PTRACE_EVENT_EXIT if two SIGKILLs
> come in rapid succession). I will try to take this series for a proper spin
> shortly.

Thanks,

I haven't yet figured out how to get the rr test suite to run
successfully. Something about my test machine and lack of perf counters
seems to be causing problems. So if you can perform the testing on your
side that would be fantastic.

Eric

2022-07-16 21:50:45

by Eric W. Biederman

Subject: Re: [PATCH 0/3] ptrace: Stop supporting SIGKILL for PTRACE_EVENT_EXIT

"Eric W. Biederman" <[email protected]> writes:

> Keno Fischer <[email protected]> writes:
>
>> Hi Eric,
>>
>> On Fri, Jul 8, 2022 at 6:25 PM Eric W. Biederman <[email protected]> wrote:
>>> > Recently I had a conversation where it was pointed out to me that
>>> > SIGKILL sent to a tracee stropped in PTRACE_EVENT_EXIT is quite
>>> > difficult for a tracer to handle.
>>> >
>>>
>>> RR folks any comments?
>>>
>>> Did I properly understand what Keno Fischer was asking for when we
>>> talked in person?
>>
>> Yes, this is indeed what I had in mind. I have not yet had the opportunity
>> to try out your patch series (sorry), but from visual inspection, it does indeed
>> do what I wanted, which is to make sure that a tracee stays in
>> PTRACE_EVENT_EXIT for the tracer to inspect, even if there is another
>> SIGKILL incoming simultaneously (since otherwise it may be impossible
>> for the tracer to observe the PTRACE_EVENT_EXIT if two SIGKILLs
>> come in rapid succession). I will try to take this series for a proper spin
>> shortly.
>
> Thanks,
>
> I haven't yet figured out how to get the rr test suite to run
> successfully. Something about my test machine and lack of perf counters
> seems to be causing problems. So if you can perform the testing on your
> side that would be fantastic.

Ok. I finally found a machine where I can run rr and the rr test suite.

It looks like there are a couple of the rr 5.5.0 tests that fail on
Linus's latest kernel simply because of changes in kernel behavior. In
particular clone_cleartid_coredump and fcntl_rw_hints. The
clone_cleartid_coredump test appears to fail because SIGSEGV no longer
kills all processes that share an mm, which was a deliberate change.

With the latest development version of rr, only detach_sigkill appears
to be failing on Linus's latest. That failure appears to be independent
of the patches in question as well. When run manually the
detach_sigkill test succeeds, so I am not quite certain what is going on.
Any thoughts?

As for my patchset it looks like it does not cause any new test failures
for rr so I will plan on getting it into linux-next shortly.

Eric

2022-07-16 23:40:32

by Kyle Huey

Subject: Re: [PATCH 0/3] ptrace: Stop supporting SIGKILL for PTRACE_EVENT_EXIT

On Sat, Jul 16, 2022 at 2:31 PM Eric W. Biederman <[email protected]> wrote:
>
> "Eric W. Biederman" <[email protected]> writes:
>
> > Keno Fischer <[email protected]> writes:
> >
> >> Hi Eric,
> >>
> >> On Fri, Jul 8, 2022 at 6:25 PM Eric W. Biederman <[email protected]> wrote:
> >>> > Recently I had a conversation where it was pointed out to me that
> >>> > SIGKILL sent to a tracee stropped in PTRACE_EVENT_EXIT is quite
> >>> > difficult for a tracer to handle.
> >>> >
> >>>
> >>> RR folks any comments?
> >>>
> >>> Did I properly understand what Keno Fischer was asking for when we
> >>> talked in person?
> >>
> >> Yes, this is indeed what I had in mind. I have not yet had the opportunity
> >> to try out your patch series (sorry), but from visual inspection, it does indeed
> >> do what I wanted, which is to make sure that a tracee stays in
> >> PTRACE_EVENT_EXIT for the tracer to inspect, even if there is another
> >> SIGKILL incoming simultaneously (since otherwise it may be impossible
> >> for the tracer to observe the PTRACE_EVENT_EXIT if two SIGKILLs
> >> come in rapid succession). I will try to take this series for a proper spin
> >> shortly.
> >
> > Thanks,
> >
> > I haven't yet figured out how to get the rr test suite to run
> > successfully. Something about my test machine and lack of perf counters
> > seems to be causing problems. So if you can perform the testing on your
> > side that would be fantastic.
>
> Ok. I finally found a machine where I can run rr and the rr test suite.
>
> It looks like there are a couple of the rr 5.5.0 test that fail on
> Linus's lastest kernel simply because of changes in kernel behavior. In
> particular clone_cleartid_coredump, and fcntl_rw_hints. The
> clone_cleartid_coredump appears to fail because SIGSEGV no longer kills
> all processes that share an mm. Which was a deliberate change.

Yeah, we changed to handle this in
https://github.com/rr-debugger/rr/commit/04bbacdbaba1cc496e92060014442bd1fd26b41d
and https://github.com/rr-debugger/rr/commit/1a3b389c2956e1844c0d07bf4297398bb6c561ea.

> With the lastest development version of rr, only detach_sigkill appears
> to be failing on Linus's latest. That failure appears to be independent
> of the patches in question as well. When run manually the
> detach_sigkill test succeeds so I am not quite certain what is going on,
> any thoughts?

If it fails before your changes I wouldn't worry about it too much;
there have been some other failures in that test lately.

- Kyle

> As for my patchset it looks like it does not cause any new test failures
> for rr so I will plan on getting it into linux-next shortly.
>
> Eric
>