2007-12-08 01:12:39

by Roland McGrath

[permalink] [raw]
Subject: [PATCH] arch_ptrace_stop


This adds support to allow asm/ptrace.h to define two new macros,
arch_ptrace_stop_needed and arch_ptrace_stop. These control special
machine-specific actions to be done before a ptrace stop. The new
code compiles away to nothing when the new macros are not defined.
This is the case on all machines to begin with.

On ia64, these macros will be defined to solve the long-standing
issue of ptrace vs register backing store.

Signed-off-by: Roland McGrath <[email protected]>
CC: Petr Tesarik <[email protected]>
CC: Tony Luck <[email protected]>
---
include/linux/ptrace.h | 35 +++++++++++++++++++++++++++++++++++
kernel/signal.c | 33 ++++++++++++++++++++++++++++++++-
2 files changed, 67 insertions(+), 1 deletions(-)

diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index ae8146a..7168757 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -128,6 +128,41 @@ int generic_ptrace_pokedata(struct task_struct *tsk, long addr, long data);
#define force_successful_syscall_return() do { } while (0)
#endif

+#ifndef arch_ptrace_stop_needed
+/**
+ * arch_ptrace_stop_needed - Decide whether arch_ptrace_stop() should be called
+ * @code: current->exit_code value ptrace will stop with
+ * @info: siginfo_t pointer (or %NULL) for signal ptrace will stop with
+ *
+ * This is called with the siglock held, to decide whether or not it's
+ * necessary to release the siglock and call arch_ptrace_stop() with the
+ * same @code and @info arguments. It can be defined to a constant if
+ * arch_ptrace_stop() is never required, or always is. On machines where
+ * this makes sense, it should be defined to a quick test to optimize out
+ * calling arch_ptrace_stop() when it would be superfluous. For example,
+ * if the thread has not been back to user mode since the last stop, the
+ * thread state might indicate that nothing needs to be done.
+ */
+#define arch_ptrace_stop_needed(code, info) (0)
+#endif
+
+#ifndef arch_ptrace_stop
+/**
+ * arch_ptrace_stop - Do machine-specific work before stopping for ptrace
+ * @code: current->exit_code value ptrace will stop with
+ * @info: siginfo_t pointer (or %NULL) for signal ptrace will stop with
+ *
+ * This is called with no locks held when arch_ptrace_stop_needed() has
+ * just returned nonzero. It is allowed to block, e.g. for user memory
+ * access. The arch can have machine-specific work to be done before
+ * ptrace stops. On ia64, register backing store gets written back to user
+ * memory here. Since this can be costly (requires dropping the siglock),
+ * we only do it when the arch requires it for this particular stop, as
+ * indicated by arch_ptrace_stop_needed().
+ */
+#define arch_ptrace_stop(code, info) do { } while (0)
+#endif
+
#endif

#endif
diff --git a/kernel/signal.c b/kernel/signal.c
index afa4f78..2e85fcb 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1594,6 +1594,17 @@ static inline int may_ptrace_stop(void)
}

/*
+ * Return nonzero if there is a SIGKILL that should be waking us up.
+ * Called with the siglock held.
+ */
+static int sigkill_pending(struct task_struct *tsk)
+{
+ return ((sigismember(&tsk->pending.signal, SIGKILL) ||
+ sigismember(&tsk->signal->shared_pending.signal, SIGKILL)) &&
+ !unlikely(sigismember(&tsk->blocked, SIGKILL)));
+}
+
+/*
* This must be called with current->sighand->siglock held.
*
* This should be the path for all ptrace stops.
@@ -1606,6 +1617,26 @@ static inline int may_ptrace_stop(void)
*/
static void ptrace_stop(int exit_code, int nostop_code, siginfo_t *info)
{
+ int killed = 0;
+
+ if (arch_ptrace_stop_needed(exit_code, info)) {
+ /*
+ * The arch code has something special to do before a
+ * ptrace stop. This is allowed to block, e.g. for faults
+ * on user stack pages. We can't keep the siglock while
+ * calling arch_ptrace_stop, so we must release it now.
+ * To preserve proper semantics, we must do this before
+ * any signal bookkeeping like checking group_stop_count.
+ * Meanwhile, a SIGKILL could come in before we retake the
+ * siglock. That must prevent us from sleeping in TASK_TRACED.
+ * So after regaining the lock, we must check for SIGKILL.
+ */
+ spin_unlock_irq(&current->sighand->siglock);
+ arch_ptrace_stop(exit_code, info);
+ spin_lock_irq(&current->sighand->siglock);
+ killed = sigkill_pending(current);
+ }
+
/*
* If there is a group stop in progress,
* we must participate in the bookkeeping.
@@ -1621,7 +1652,7 @@ static void ptrace_stop(int exit_code, int nostop_code, siginfo_t *info)
spin_unlock_irq(&current->sighand->siglock);
try_to_freeze();
read_lock(&tasklist_lock);
- if (may_ptrace_stop()) {
+ if (!unlikely(killed) && may_ptrace_stop()) {
do_notify_parent_cldstop(current, CLD_TRAPPED);
read_unlock(&tasklist_lock);
schedule();


2007-12-12 05:52:31

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] arch_ptrace_stop

On Fri, 7 Dec 2007 17:11:52 -0800 (PST) Roland McGrath <[email protected]> wrote:

>
> This adds support to allow asm/ptrace.h to define two new macros,
> arch_ptrace_stop_needed and arch_ptrace_stop. These control special
> machine-specific actions to be done before a ptrace stop. The new
> code compiles away to nothing when the new macros are not defined.
> This is the case on all machines to begin with.
>
> On ia64, these macros will be defined to solve the long-standing
> issue of ptrace vs register backing store.
>
> Signed-off-by: Roland McGrath <[email protected]>
> CC: Petr Tesarik <[email protected]>
> CC: Tony Luck <[email protected]>
> ---
> include/linux/ptrace.h | 35 +++++++++++++++++++++++++++++++++++
> kernel/signal.c | 33 ++++++++++++++++++++++++++++++++-
> 2 files changed, 67 insertions(+), 1 deletions(-)
>
> diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
> index ae8146a..7168757 100644
> --- a/include/linux/ptrace.h
> +++ b/include/linux/ptrace.h
> @@ -128,6 +128,41 @@ int generic_ptrace_pokedata(struct task_struct *tsk, long addr, long data);
> #define force_successful_syscall_return() do { } while (0)
> #endif
>
> +#ifndef arch_ptrace_stop_needed
> +/**
> + * arch_ptrace_stop_needed - Decide whether arch_ptrace_stop() should be called
> + * @code: current->exit_code value ptrace will stop with
> + * @info: siginfo_t pointer (or %NULL) for signal ptrace will stop with
> + *
> + * This is called with the siglock held, to decide whether or not it's
> + * necessary to release the siglock and call arch_ptrace_stop() with the
> + * same @code and @info arguments. It can be defined to a constant if
> + * arch_ptrace_stop() is never required, or always is. On machines where
> + * this makes sense, it should be defined to a quick test to optimize out
> + * calling arch_ptrace_stop() when it would be superfluous. For example,
> + * if the thread has not been back to user mode since the last stop, the
> + * thread state might indicate that nothing needs to be done.
> + */
> +#define arch_ptrace_stop_needed(code, info) (0)
> +#endif
> +
> +#ifndef arch_ptrace_stop
> +/**
> + * arch_ptrace_stop - Do machine-specific work before stopping for ptrace
> + * @code: current->exit_code value ptrace will stop with
> + * @info: siginfo_t pointer (or %NULL) for signal ptrace will stop with
> + *
> + * This is called with no locks held when arch_ptrace_stop_needed() has
> + * just returned nonzero. It is allowed to block, e.g. for user memory
> + * access. The arch can have machine-specific work to be done before
> + * ptrace stops. On ia64, register backing store gets written back to user
> + * memory here. Since this can be costly (requires dropping the siglock),
> + * we only do it when the arch requires it for this particular stop, as
> + * indicated by arch_ptrace_stop_needed().
> + */
> +#define arch_ptrace_stop(code, info) do { } while (0)

Mutter. These would be better as static inlines. A macro just invites
variable-unused warnings on non-ia64 and outright compilation errors on
ia64. Speaking from experience...

static inline void arch_ptrace_stop(int exit_code, siginfo_t *info)
{
}
#define arch_ptrace_stop arch_ptrace_stop

should work?

> /*
> + * Return nonzero if there is a SIGKILL that should be waking us up.
> + * Called with the siglock held.
> + */
> +static int sigkill_pending(struct task_struct *tsk)
> +{
> + return ((sigismember(&tsk->pending.signal, SIGKILL) ||
> + sigismember(&tsk->signal->shared_pending.signal, SIGKILL)) &&
> + !unlikely(sigismember(&tsk->blocked, SIGKILL)));
> +}

Could you please take a peek at the infrastructure added by
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc4/2.6.24-rc4-mm1/broken-out/add-lock_page_killable.patch
and see if there is exploitable commonality?

2007-12-12 22:37:20

by Roland McGrath

[permalink] [raw]
Subject: Re: [PATCH] arch_ptrace_stop

> Mutter. These would be better as static inlines. A macro just invites
> variable-unused warnings on non-ia64 and outright compilation errors on
> ia64. Speaking from experience...
>
> static inline void arch_ptrace_stop(int exit_code, siginfo_t *info)
> {
> }
> #define arch_ptrace_stop arch_ptrace_stop
>
> should work?

That's fine with me. I just followed the example of e.g. arch_ptrace_attach.

> > /*
> > + * Return nonzero if there is a SIGKILL that should be waking us up.
> > + * Called with the siglock held.
> > + */
> > +static int sigkill_pending(struct task_struct *tsk)
> > +{
> > + return ((sigismember(&tsk->pending.signal, SIGKILL) ||
> > + sigismember(&tsk->signal->shared_pending.signal, SIGKILL)) &&
> > + !unlikely(sigismember(&tsk->blocked, SIGKILL)));
> > +}
>
> Could you please take a peek at the infrastructure added by
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc4/2.6.24-rc4-mm1/broken-out/add-lock_page_killable.patch
> and see if there is exploitable commonality?

I haven't reviewed that whole set in detail yet. So I'd rather leave this
for a later cleanup. (This ptrace change alone, along with its ia64
pieces, can probably go in much earlier without disrupting anything else.)

The fatal_signal_pending function in that patch is not exactly the same as
this check, though it might turn out that it's actually equivalent due to
higher-level constraints.


Thanks,
Roland

2007-12-13 17:28:41

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] arch_ptrace_stop

Roland McGrath wrote:
>
> +static int sigkill_pending(struct task_struct *tsk)
> +{
> + return ((sigismember(&tsk->pending.signal, SIGKILL) ||
> + sigismember(&tsk->signal->shared_pending.signal, SIGKILL)) &&
> + !unlikely(sigismember(&tsk->blocked, SIGKILL)));
> +}

How is it possible that SIGKILL is blocked?

> static void ptrace_stop(int exit_code, int nostop_code, siginfo_t *info)
> {
> + int killed = 0;
> +
> + if (arch_ptrace_stop_needed(exit_code, info)) {
> + /*
> + * The arch code has something special to do before a
> + * ptrace stop. This is allowed to block, e.g. for faults
> + * on user stack pages. We can't keep the siglock while
> + * calling arch_ptrace_stop, so we must release it now.
> + * To preserve proper semantics, we must do this before
> + * any signal bookkeeping like checking group_stop_count.
> + * Meanwhile, a SIGKILL could come in before we retake the
> + * siglock. That must prevent us from sleeping in TASK_TRACED.
> + * So after regaining the lock, we must check for SIGKILL.
> + */
> + spin_unlock_irq(&current->sighand->siglock);
> + arch_ptrace_stop(exit_code, info);
> + spin_lock_irq(&current->sighand->siglock);
> + killed = sigkill_pending(current);
> + }
> +
> /*
> * If there is a group stop in progress,
> * we must participate in the bookkeeping.
> @@ -1604,7 +1635,7 @@ static void ptrace_stop(int exit_code, i
> spin_unlock_irq(&current->sighand->siglock);
> try_to_freeze();
> read_lock(&tasklist_lock);
> - if (may_ptrace_stop()) {
> + if (!unlikely(killed) && may_ptrace_stop()) {

Could you please explain this change in more details?

Currently ptrace_stop() schedules in TASK_TRACED state even if we have a
pending SIGKILL. With this patch this is still possible, but unless
arch_ptrace_stop_needed() is true and thus we will check sigkill_pending().

Suppose the task was SIGKILL'ed and does ptrace_notify(PTRACE_EVENT_EXIT),
now the resulting action depends on arch_ptrace_stop_needed().

I don't claim this is wrong, just trying to understand.

Thanks,

Oleg.

2007-12-13 17:31:47

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH] arch_ptrace_stop

On Thu, Dec 13, 2007 at 08:29:26PM +0300, Oleg Nesterov wrote:
> How is it possible that SIGKILL is blocked?

I *think* it's possible that kernel threads may block SIGKILL.
And I think init (pid 1) gets SIGKILL blocked.

--
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2007-12-13 17:39:25

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] arch_ptrace_stop

On 12/13, Matthew Wilcox wrote:
>
> On Thu, Dec 13, 2007 at 08:29:26PM +0300, Oleg Nesterov wrote:
> > How is it possible that SIGKILL is blocked?
>
> I *think* it's possible that kernel threads may block SIGKILL.
> And I think init (pid 1) gets SIGKILL blocked.

Yes. But this shouldn't matter because we can't ptrace them. /sbin/init
doesn't block SIGKILL, but it can't be ptraced either.

Also, SIGKILL is blocked only if kthread was created by kernel_thread() and
then it does daemonize(). kthread_create()'ed kthreads don't block SIGKILL
but ignore it. The latter behaviour is more correct, but we can't do the same
for kernel_thread() threads.

Oleg.

2007-12-13 22:42:52

by Roland McGrath

[permalink] [raw]
Subject: Re: [PATCH] arch_ptrace_stop

> How is it possible that SIGKILL is blocked?

It is probably not possible here. It's impossible for normal user threads,
and kernel threads or ones doing something weird inside the kernel can
probably never get here. But I'm just not trying to rely on any fancy
guarantees here. In this change I'm being conservative in just matching
the low-level logic that affects get_signal_to_deliver directly. I'd
prefer to get this in place now, where it's easy to be sure it's correct
vis a vis current behavior. Later on we can convince ourselves about the
correctness of simplifications to clean the logic up.

> Could you please explain this change in more details?

I'm glad to. I appreciate the review.

> Currently ptrace_stop() schedules in TASK_TRACED state even if we have a
> pending SIGKILL. With this patch this is still possible, but unless
> arch_ptrace_stop_needed() is true and thus we will check sigkill_pending().

Currently the siglock is always held throughout. The case this change
addresses is when no SIGKILL was already pending before we took the lock.
Currently, a new SIGKILL cannot come in until we've released the lock,
which is after we've set TASK_TRACED. The signal's sender will hold the
lock while checking each thread's state, waking up any in TASK_TRACED.

When arch_ptrace_stop_needed() is true, we release the siglock for an
unknown period (might block, etc). If a SIGKILL is sent there, it becomes
pending while we are in TASK_RUNNING or a normal blocked state. Next we
finish arch_ptrace_stop() and reacquire the siglock. Now entering
TASK_TRACED would leave us unkillable because SIGKILL is already pending
and nothing else (except PTRACE_CONT et al) will try to wake us up.

This was tested on ia64 by inserting an explicit sleep in arch_ptrace_stop
to simulate a very long block in a page fault, and then sending SIGKILL in
this interval. It was verified that this left the thread unkillable, and
that the sigkill_pending() check fixed that.

> Suppose the task was SIGKILL'ed and does ptrace_notify(PTRACE_EVENT_EXIT),
> now the resulting action depends on arch_ptrace_stop_needed().

For a SIGKILL that caused the exit to happen, then that SIGKILL was already
dequeued and is no longer pending by the time we get here, so nothing is
different.

For a new SIGKILL racing with an exit already in progress, then either it
comes before ptrace_notify(PTRACE_EVENT_EXIT) has taken the siglock, or
after it's released it (in ptrace_stop). If it comes after ptrace_stop
releases the siglock, then it wakes the thread up from that ptrace stop.
If it comes before ptrace_notify takes the siglock, then the thread stops
anyway with a pending SIGKILL. (That may be a problem, but it is not new.
We can discuss that case separately.)

The new third possibility is that it comes after ptrace_stop releases the
siglock to call arch_ptrace_stop. Then the new code will notice the
pending SIGKILL and skip the stop. The only way this differs from the
SIGKILL having arrived immediately after the stop is that the parent
SIGCHLD/wait notification with CLD_TRAPPED didn't happen. It's a race for
the parent even to notice that, since exit_code and the queued SIGCHLD's
siginfo_t will very quickly be replaced by the fresh notification sent for
the child's death.

> I don't claim this is wrong, just trying to understand.

I don't claim the logic is now perfect, just that this change is better for
the case it's intended for and not materially worse for others. We can
certainly do more to clean things up. But I don't want that complex
subject to delay the fixing and cleaning up of ia64 that this change
enables.


Thanks,
Roland

2007-12-16 16:23:15

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] arch_ptrace_stop

Roland, I am sorry for delay,

On 12/13, Roland McGrath wrote:
>
> > Currently ptrace_stop() schedules in TASK_TRACED state even if we have a
> > pending SIGKILL. With this patch this is still possible, but unless
> > arch_ptrace_stop_needed() is true and thus we will check sigkill_pending().
>
> Currently the siglock is always held throughout. The case this change
> addresses is when no SIGKILL was already pending before we took the lock.
> Currently, a new SIGKILL cannot come in until we've released the lock,
> which is after we've set TASK_TRACED. The signal's sender will hold the
> lock while checking each thread's state, waking up any in TASK_TRACED.
>
> When arch_ptrace_stop_needed() is true, we release the siglock for an
> unknown period (might block, etc). If a SIGKILL is sent there, it becomes
> pending while we are in TASK_RUNNING or a normal blocked state. Next we
> finish arch_ptrace_stop() and reacquire the siglock.

Yes, yes, I see.

> Now entering
> TASK_TRACED would leave us unkillable because SIGKILL is already pending
> and nothing else (except PTRACE_CONT et al) will try to wake us up.

But this doesn't differ from the case when SIGKILL was already pending when
we enter ptrace_stop, and arch_ptrace_stop_needed() == false, that was my
point.

Yes, arch_ptrace_stop() can take a long time, might block, etc. But what
about TIF_SYSCALL_TRACE ? The task can recieve SIGKILL while executing the
syscall which can also block and so on, but do_syscall_trace(entryexit == 1)
doesn't check the pending signal.

I should clarify my question. What I can't understand is the subtle dependency
on the result of arch_ptrace_stop_needed(). This means that it is hard to
predict the behaviour.

IOW, can't we

- ignore the pending SIGKILL (current behaviour)
-- OR --
- always check it unconditionally, before setting TASK_TRACED

? This looks a bit more consistent to me.

Please also note "before setting TASK_TRACED" above. With this patch we set
TASK_TRACED under ->siglock, and then change the ->state to TASK_RUNNING if
killed == 1. Minor, but this doesn't look correct, we can fool the tracer
which does ptrace_check_attach().

Oleg.