2007-06-06 05:47:55

by Satoru Takeuchi

[permalink] [raw]
Subject: [BUG] ptraced process waiting on syscall may return kernel internal errnos

Hi,

If there is a multithread process which is waiting on restartable syscall
and ptraced, some threads may return from syscalls with a errno which should
never be seen by user programs when they receive SIGSTOP. It is not a rare
case beacuse strace send SIGSTOP to attached process on its exit (e.g. on
receiving SIGINT from terminal).

I found this problem on 2.6.22-rc3 and I also confirmed 2.6.22-rc4 has same
problem. Probably this bug is in generic signal code because this problem
occurs on both i386 box and ia64 box.

This bug is very easy to recreate and I don't know whether or not the problem
has some relation with the following bug which reported recently by Benjamin
Herrenschmidt.

http://lkml.org/lkml/2007/6/4/468

I executed this recreate program on 2.6.22-rc4 with the following Linus's
patch and this bug also occured.

http://lkml.org/lkml/2007/6/4/471

For more details, please refer to the attached recreate program.



BTW, I found one more strace related bug. I'll report it soon...

Thanks,
Satoru

-------------------------------------------------------------------------------
/*
* recreate-signal-mt-ptrace-bug-pipe - recreate a signal bug.
*
* ---------------------------------------------------------------------------
*
* Problem
* =======
*
* If there is a multithread process which is in restartable syscall and
* ptraced, some threads may return from syscalls with a errno which should
* never be seen by user programs when they receive SIGSTOP. It is not a
* rare case beacuse strace send SIGSTOP to attached process on its exit.
*
* How to recreate
* ===============
*
* 1. run this program
*
* $ ./recreate-signal-mt-ptrace-bug-pipe &
*
* 2. run strace and attach this program
*
* $ strace -f -p $!
*
* 3. C-c on terminal (*1)
*
* (*1) Directly send SIGSTOP to ./recreate-signal-mt-ptrace-bug-pipe is
* also OK
*
* Expected Result
* ===============
*
* All threads of this program was detached safely
*
* Actual Result
* =============
*
* Some threads may return from read() with ERESTARTSYS and print the
* following message.
*
* read() failed with errno 512
*
* Note
* ====
*
* This program can't always recreate a problem. However recreate
* possibility is very high.
*
*----------------------------------------------------------------------
*
* Copyright 2007 Satoru Takeuchi <[email protected]>
*
* This software may be used and distributed according to the terms
* of the GNU General Public License, incorporated herein by reference.
*
*/

#include <unistd.h>
#include <sys/types.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <err.h>

static int fd[2];

void *thread_fn(void *arg)
{
char c;

if (read(fd[0], &c, sizeof(char)) < 0)
err(EXIT_FAILURE, "read() failed with errno %d\n", errno);

return NULL;
}

#define NTHREAD 64

int main(int argc, char **argv)
{
pthread_t t[NTHREAD];
int i;

if (pipe(fd) < 0)
err(EXIT_FAILURE, "pipe() failed");

for (i = 0; i < NTHREAD; i++)
if (pthread_create(&t[i], NULL, thread_fn, NULL)) {
warn("pthread_create() failed\n");
exit(EXIT_FAILURE);
}

for (i = 0; i < NTHREAD; i++)
if (!pthread_join(t[i], NULL))
warn("pthread_join() failed");

exit(EXIT_SUCCESS);
}


2007-06-06 10:59:22

by Roland McGrath

[permalink] [raw]
Subject: Re: [BUG] ptraced process waiting on syscall may return kernel internal errnos

Oleg and I were just discussing this issue in relation to other problems.
We established that it is never safe to clear TIF_SIGPENDING on another
thread. But I hadn't really thought through that it's sometimes not safe
to clear your own TIF_SIGPENDING either. That is, any time you are not
positive you cannot be in a syscall that will return a -ERESTART* code.
(I had the ptrace_stop case lurking in the back of my mind but hadn't
considered how it would really come up.)

I have a general recollection of thinking that dequeue_signal could only
be called on current and that it mattered somehow. But aside from
avoiding recalc_sigpending, and kernel threads with notifier_mask set, I
can't see off hand what it is. I won't testify that I think signalfd is
necessarily on safe ground, though.


Thanks,
Roland

---

[PATCH] Restrict clearing TIF_SIGPENDING

This patch should get a few birds. It prevents sigaction calls from
clearing TIF_SIGPENDING in other threads, which could leak -ERESTART*.
It fixes ptrace_stop not to clear it, which done at the syscall exit
stop could leak -ERESTART*. It probably removes the harm from
signalfd, at least assuming it never calls dequeue_signal on kernel
threads that might have used block_all_signals.

Signed-off-by: Roland McGrath <[email protected]>
---
kernel/signal.c | 40 ++++++++++++++++++++++++----------------
1 files changed, 24 insertions(+), 16 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index acdfc05..dc5797c 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -105,7 +105,11 @@ static int recalc_sigpending_tsk(struct
set_tsk_thread_flag(t, TIF_SIGPENDING);
return 1;
}
- clear_tsk_thread_flag(t, TIF_SIGPENDING);
+ /*
+ * We must never clear the flag in another thread, or in current
+ * when it's possible the current syscall is returning -ERESTART*.
+ * So we don't clear it here, and only callers who know they should do.
+ */
return 0;
}

@@ -121,7 +125,9 @@ void recalc_sigpending_and_wake(struct t

void recalc_sigpending(void)
{
- recalc_sigpending_tsk(current);
+ if (!recalc_sigpending_tsk(current))
+ clear_thread_flag(TIF_SIGPENDING);
+
}

/* Given the mask, find the first available signal that should be serviced. */
@@ -385,7 +391,8 @@ int dequeue_signal(struct task_struct *t
}
}
}
- recalc_sigpending_tsk(tsk);
+ if (likely(tsk == current))
+ recalc_sigpending();
if (signr && unlikely(sig_kernel_stop(signr))) {
/*
* Set a marker that we have dequeued a stop signal. Our
@@ -1580,8 +1587,9 @@ static void ptrace_stop(int exit_code, i
/*
* Queued signals ignored us while we were stopped for tracing.
* So check for any that we should take before resuming user mode.
+ * This sets TIF_SIGPENDING, but never clears it.
*/
- recalc_sigpending();
+ recalc_sigpending_tsk(current);
}

void ptrace_notify(int exit_code)

2007-06-06 15:36:42

by Linus Torvalds

[permalink] [raw]
Subject: Re: [BUG] ptraced process waiting on syscall may return kernel internal errnos



On Wed, 6 Jun 2007, Roland McGrath wrote:
>
> [PATCH] Restrict clearing TIF_SIGPENDING
>
> This patch should get a few birds. It prevents sigaction calls from
> clearing TIF_SIGPENDING in other threads, which could leak -ERESTART*.
> It fixes ptrace_stop not to clear it, which done at the syscall exit
> stop could leak -ERESTART*. It probably removes the harm from
> signalfd, at least assuming it never calls dequeue_signal on kernel
> threads that might have used block_all_signals.

Ok, this one is more complex than my suggested one-liner, but seems to fix
another bug. And it's logic in dequeue_signal() is a bit prettier than
Ben's (otherwise somewhat similar) patch.

HOWEVER: I'm still a bit worried about the fact that we have about ~180
calls to "recalc_sigpending()" around the kernel, and I'm not AT ALL sure
that they are all supposed to possibly clear the TIF_SIGPENDING flag. The
fact that we had basically two independent bugs in this area really makes
me more convinced that we probably should clear TIF_SIGPENDING in the only
path that really matters, namely the one that _depends_ on it being right
(the "do_signal()" path).

So I think that the *right* place to clear TIF_SIGPENDING is actually in
"get_signal_to_deliver()", because that function is called _only_ by the
actual per-architecture "I'm going to deliver a signal now".

(Oh, and we do need to handle it in the "notifier()" case too - ugly hack
for DRM).

So how does *this* patch look? It:

- totally removes the "clear_tsk_thread_flag()" from the signal pending
recalculations (ie now both "recalc_sigpending()" _and_
"recalc_sigpending_tsk()" will ever only _set_ that bit)

- in get_signal_to_deliver(), it adds a

if (!recalc_sigpending_tsk(current))
d_flag(TIF_SIGPENDING);

at the end.

- it removes the "recalc_sigpending_tsk()" from dequeue_signal(), since
that can never clear the signal any more.

So now we should do "recalc_sigpending()" only when signals may be *added*
(where messing with the "blocked" mask obviously is a form of adding
signals, and possibly the most common reason for having to recalculate the
sigpending mask).

Comments? This patch is _entirely_ and utterly untested, so I'm only
saying that this "feels" safer and more correct to me.

Linus

---
kernel/signal.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index acdfc05..82b3c1a 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -105,7 +105,6 @@ static int recalc_sigpending_tsk(struct task_struct *t)
set_tsk_thread_flag(t, TIF_SIGPENDING);
return 1;
}
- clear_tsk_thread_flag(t, TIF_SIGPENDING);
return 0;
}

@@ -385,7 +384,6 @@ int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info)
}
}
}
- recalc_sigpending_tsk(tsk);
if (signr && unlikely(sig_kernel_stop(signr))) {
/*
* Set a marker that we have dequeued a stop signal. Our
@@ -1859,6 +1857,8 @@ relock:
do_group_exit(signr);
/* NOTREACHED */
}
+ if (!recalc_sigpending_tsk(current))
+ clear_thread_flag(TIF_SIGPENDING);
spin_unlock_irq(&current->sighand->siglock);
return signr;
}

2007-06-06 23:08:16

by Paul Mackerras

[permalink] [raw]
Subject: Re: [BUG] ptraced process waiting on syscall may return kernel internal errnos

Linus Torvalds writes:

> So I think that the *right* place to clear TIF_SIGPENDING is actually in
> "get_signal_to_deliver()", because that function is called _only_ by the
> actual per-architecture "I'm going to deliver a signal now".

I agree that's the right place for real user processes, but I note
that there are drivers that have kernel threads that do basically
this:

if (signal_pending(current))
dequeue_signal(current, ...);

for example, drivers/block/nbd.c, and obviously they don't want to
still see signal_pending(current) after they have dequeued all the
pending signals.

Paul.

2007-06-07 03:21:17

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [BUG] ptraced process waiting on syscall may return kernel internal errnos

On Wed, 2007-06-06 at 03:59 -0700, Roland McGrath wrote:
> Oleg and I were just discussing this issue in relation to other problems.
> We established that it is never safe to clear TIF_SIGPENDING on another
> thread. But I hadn't really thought through that it's sometimes not safe
> to clear your own TIF_SIGPENDING either. That is, any time you are not
> positive you cannot be in a syscall that will return a -ERESTART* code.
> (I had the ptrace_stop case lurking in the back of my mind but hadn't
> considered how it would really come up.)

Ah, I missed that ptrace_stop() case indeed.

> I have a general recollection of thinking that dequeue_signal could only
> be called on current and that it mattered somehow.

I matter with that, and with the notifier thingy. It might matter for
other things as well, I don't know for sure yet though signalfd will
definitely cause it to be called for !current, though with my patch only
for shared signals.

> But aside from
> avoiding recalc_sigpending, and kernel threads with notifier_mask set, I
> can't see off hand what it is. I won't testify that I think signalfd is
> necessarily on safe ground, though.

Yeah, I'm a little worried too, the whole thing seems a bit fragile to
me.

I'm tempted to in fact make dequeue_signal() be the only one to every
clear TIF_SIGPENDING and only when tsk==current, but then, that does
mean there are a few cases, like when explicitely masking signals, where
we might end up with it spurriously set... thus causing spurrious -EINTR
returns for blocked signals.

I must admit at this point, it's becoming almost tempting to split it
into two different flags. TIF_SIGPENDING would then be the exact logic
"there is currently a non blocked signal pending" and could be cleared
at any time provided the siglock is help (which btw, is another source
of headaches if you grep ... people maniupate these things here or there
without the lock). And another one, TIF_SIGNALED, which would be set
always if TIG_SIGPENDING is set, and only ever cleared by
dequeue_signal, and causes the exception path to get into do_signal.

That would mean occasional spurrious trips in do_signal but that's
totally harmless and it wouldn't, I beleive, happen often enough to
impact performances significantly.

Cheers,
Ben.


2007-06-07 03:26:11

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [BUG] ptraced process waiting on syscall may return kernel internal errnos

On Wed, 2007-06-06 at 08:35 -0700, Linus Torvalds wrote:
>
> So I think that the *right* place to clear TIF_SIGPENDING is actually in
> "get_signal_to_deliver()", because that function is called _only_ by the
> actual per-architecture "I'm going to deliver a signal now".

That was my initial idea but it has an issue with kernel
threads :-( There are in kernel thingies that use signals to a certain
extend and rely on being able to dequeue and/or clear sigpending (or
else they just loop instead of waiting in various loops).

A bit of a can of worms if you ask me.

Ben.


2007-06-07 03:27:45

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [BUG] ptraced process waiting on syscall may return kernel internal errnos

On Wed, 2007-06-06 at 08:35 -0700, Linus Torvalds wrote:
> So now we should do "recalc_sigpending()" only when signals may be
> *added*
> (where messing with the "blocked" mask obviously is a form of adding
> signals, and possibly the most common reason for having to recalculate
> the
> sigpending mask).
>
> Comments? This patch is _entirely_ and utterly untested, so I'm only
> saying that this "feels" safer and more correct to me.

Oh and we still need to at least do the if (tsk == current) thingy
for the DRM notifier hack...

Ben.


2007-06-07 11:33:37

by Satoru Takeuchi

[permalink] [raw]
Subject: Re: [BUG] ptraced process waiting on syscall may return kernel internal errnos

Hi Linus,

At Wed, 6 Jun 2007 08:35:27 -0700 (PDT),
Linus Torvalds wrote:
> On Wed, 6 Jun 2007, Roland McGrath wrote:
> >
> > [PATCH] Restrict clearing TIF_SIGPENDING
> >
> > This patch should get a few birds. It prevents sigaction calls from
> > clearing TIF_SIGPENDING in other threads, which could leak -ERESTART*.
> > It fixes ptrace_stop not to clear it, which done at the syscall exit
> > stop could leak -ERESTART*. It probably removes the harm from
> > signalfd, at least assuming it never calls dequeue_signal on kernel
> > threads that might have used block_all_signals.
>
> Ok, this one is more complex than my suggested one-liner, but seems to fix
> another bug. And it's logic in dequeue_signal() is a bit prettier than
> Ben's (otherwise somewhat similar) patch.

I tested your patch and my problem didn't occur again, so it seems to my
this case at least. However sometimes another bug appears instead which
Roland regards it as strace bug.

http://lkml.org/lkml/2007/6/7/91

BTW, There seems to be many problem on signal code... I take a long at it
and detect more bugs (and of cource make patch if possible).

Thanks,

Satoru

2007-06-07 15:55:18

by Linus Torvalds

[permalink] [raw]
Subject: Re: [BUG] ptraced process waiting on syscall may return kernel internal errnos



On Thu, 7 Jun 2007, Satoru Takeuchi wrote:
>
> I tested your patch and my problem didn't occur again, so it seems to my
> this case at least.

Ok. I applied Roland's slightly bigger patch instead, since Ben and Paul
pointed out that some kernel threads depend on dequeue_signal clearing the
bit. You might want to test that one too (or just get current git), just
in case there are any differences in behaviour (not bloody likely, but
still..)

Thanks,

Linus

2007-06-07 22:25:23

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [BUG] ptraced process waiting on syscall may return kernel internal errnos

On Thu, 2007-06-07 at 08:54 -0700, Linus Torvalds wrote:
>
> On Thu, 7 Jun 2007, Satoru Takeuchi wrote:
> >
> > I tested your patch and my problem didn't occur again, so it seems to my
> > this case at least.
>
> Ok. I applied Roland's slightly bigger patch instead, since Ben and Paul
> pointed out that some kernel threads depend on dequeue_signal clearing the
> bit. You might want to test that one too (or just get current git), just
> in case there are any differences in behaviour (not bloody likely, but
> still..)

Looks good to me. Do you think we need to do something about the DRM
notifier thingy too ? Or just let the code as-is, that is use the
notifier/mask (if any) of the task doing the dequeue_signal() ? I don't
see anybody insane enough to use signalfd read() with the DRM lock held
but who knows ...

Oh and Roland patch doesn't prevent signalfd() from stealing synchronous
signals such as SIGSEGV etc... which I think would result in random
behaviour.... do you want a patch for that or we just consider it broken
API usage and let it as it is ?

Cheers,
Ben.


2007-06-08 03:07:36

by Satoru Takeuchi

[permalink] [raw]
Subject: Re: [BUG] ptraced process waiting on syscall may return kernel internal errnos

Hi Linus,

At Thu, 7 Jun 2007 08:54:33 -0700 (PDT),
Linus Torvalds wrote:
>
>
>
> On Thu, 7 Jun 2007, Satoru Takeuchi wrote:
> >
> > I tested your patch and my problem didn't occur again, so it seems to my
> > this case at least.
>
> Ok. I applied Roland's slightly bigger patch instead, since Ben and Paul
> pointed out that some kernel threads depend on dequeue_signal clearing the
> bit. You might want to test that one too (or just get current git), just
> in case there are any differences in behaviour (not bloody likely, but
> still..)

I tested 2.6.22-rc4-git1 and it works fine.

Thanks,

Satoru

2007-06-08 03:21:07

by Linus Torvalds

[permalink] [raw]
Subject: Re: [BUG] ptraced process waiting on syscall may return kernel internal errnos



On Fri, 8 Jun 2007, Benjamin Herrenschmidt wrote:
>
> Looks good to me. Do you think we need to do something about the DRM
> notifier thingy too ?

No. I think that anybody who uses that gets whatever he deserves. It's
designed for one particular usage scenario (where it should work fine), if
you try to use it for something else across threads, you've already lost.

> Oh and Roland patch doesn't prevent signalfd() from stealing synchronous
> signals such as SIGSEGV etc... which I think would result in random
> behaviour.... do you want a patch for that or we just consider it broken
> API usage and let it as it is ?

I think Davide ack'ed your patch, but I also think that one clashes with
the one I actually ended up applying. If you were to send me the signalfd
part (tested, preferably), I'd apply it, considering that it seems to be
the right thign to do, and it already got acked by Davide.

Or did I get confused by some other patches flying around?

(Gaah, there's been _way_ too much patching going on today, I'm not happy
about how -rc4 seems to not be calming down)

Linus

2007-06-08 05:30:37

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [BUG] ptraced process waiting on syscall may return kernel internal errnos

On Thu, 2007-06-07 at 20:18 -0700, Linus Torvalds wrote:
>
> > Oh and Roland patch doesn't prevent signalfd() from stealing synchronous
> > signals such as SIGSEGV etc... which I think would result in random
> > behaviour.... do you want a patch for that or we just consider it broken
> > API usage and let it as it is ?
>
> I think Davide ack'ed your patch, but I also think that one clashes with
> the one I actually ended up applying. If you were to send me the signalfd
> part (tested, preferably), I'd apply it, considering that it seems to be
> the right thign to do, and it already got acked by Davide.

Well, it's less urgent imho now that the real problems are fixed but
here it is, totally untested patch :-)
---

Don't let signalfd dequeue private signals off other threads

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
---

Index: linux-work/kernel/signal.c
===================================================================
--- linux-work.orig/kernel/signal.c 2007-06-08 15:28:00.000000000 +1000
+++ linux-work/kernel/signal.c 2007-06-08 15:28:44.000000000 +1000
@@ -363,7 +363,13 @@ static int __dequeue_signal(struct sigpe
*/
int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info)
{
- int signr = __dequeue_signal(&tsk->pending, mask, info);
+ int signr = 0;
+
+ /* We only dequeue private signals from ourselves, we don't let
+ * signalfd steal them
+ */
+ if (tsk == current)
+ signr = __dequeue_signal(&tsk->pending, mask, info);
if (!signr) {
signr = __dequeue_signal(&tsk->signal->shared_pending,
mask, info);


2007-06-11 22:19:28

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [BUG] ptraced process waiting on syscall may return kernel internal errnos


Don't let signalfd dequeue private signals off other threads (in the
case of things like SIGILL or SIGSEGV, trying to do so would result
in undefined behaviour on who actually gets the signal, since they
are force unblocked).

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
---

Davide asked me to re-send this one. So here it is.


Index: linux-work/kernel/signal.c
===================================================================
--- linux-work.orig/kernel/signal.c 2007-06-08 15:28:00.000000000 +1000
+++ linux-work/kernel/signal.c 2007-06-08 15:28:44.000000000 +1000
@@ -363,7 +363,13 @@ static int __dequeue_signal(struct sigpe
*/
int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info)
{
- int signr = __dequeue_signal(&tsk->pending, mask, info);
+ int signr = 0;
+
+ /* We only dequeue private signals from ourselves, we don't let
+ * signalfd steal them
+ */
+ if (tsk == current)
+ signr = __dequeue_signal(&tsk->pending, mask, info);
if (!signr) {
signr = __dequeue_signal(&tsk->signal->shared_pending,
mask, info);


2007-06-13 15:15:29

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [BUG] ptraced process waiting on syscall may return kernel internal errnos

Sorry for delay, I was completely offline,

On 06/06, Roland McGrath wrote:
>
> [PATCH] Restrict clearing TIF_SIGPENDING
>
> This patch should get a few birds. It prevents sigaction calls from
> clearing TIF_SIGPENDING in other threads, which could leak -ERESTART*.
> It fixes ptrace_stop not to clear it, which done at the syscall exit
> stop could leak -ERESTART*. It probably removes the harm from
> signalfd, at least assuming it never calls dequeue_signal on kernel
> threads that might have used block_all_signals.
>
> Signed-off-by: Roland McGrath <[email protected]>
> ---
> kernel/signal.c | 40 ++++++++++++++++++++++++----------------
> 1 files changed, 24 insertions(+), 16 deletions(-)
>
> diff --git a/kernel/signal.c b/kernel/signal.c
> index acdfc05..dc5797c 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -105,7 +105,11 @@ static int recalc_sigpending_tsk(struct
> set_tsk_thread_flag(t, TIF_SIGPENDING);
> return 1;
> }
> - clear_tsk_thread_flag(t, TIF_SIGPENDING);
> + /*
> + * We must never clear the flag in another thread, or in current
> + * when it's possible the current syscall is returning -ERESTART*.
> + * So we don't clear it here, and only callers who know they should do.
> + */
> return 0;
> }

This breaks cancel_freezing(). Somehow we should clear TIF_SIGPENDING for
kernel threads. Otherwise we may have subtle failures if try_to_freeze_tasks()
fails.

Also, whith this change do_sigaction()->recalc_sigpending_and_wake() doesn't
make sense any longer, yes?

> @@ -385,7 +391,8 @@ int dequeue_signal(struct task_struct *t
> }
> }
> }
> - recalc_sigpending_tsk(tsk);
> + if (likely(tsk == current))
> + recalc_sigpending();

In theory, flush_signals(t) needs a similar change. However, it is always
called with t == current. Perhaps it makes sense to make it flush_signals(void) ?
Do you see any valid usage of flush_signals(t) when t != current ?

(Actually, imho the same is true for dequeue_signal(). Except for signalfd.c
dequeue_signal() should operate on current. Perhaps it would be a bit cleaner
to have dequeue_signal_tsk(tsk) and dequeue_signal(void), the latter does
recalc_sigpending).

Oleg.

2007-06-13 22:07:27

by Roland McGrath

[permalink] [raw]
Subject: Re: [BUG] ptraced process waiting on syscall may return kernel internal errnos

> HOWEVER: I'm still a bit worried about the fact that we have about ~180
> calls to "recalc_sigpending()" around the kernel, and I'm not AT ALL sure
> that they are all supposed to possibly clear the TIF_SIGPENDING flag.

Most of those calls are in arch or compat code's sigreturn, sigsuspend, et
al functions, which are reasonably considered within the "private" realm of
the core signals code. Those aside, I count fewer than 40 other uses.
So maybe there is some hope of figuring this out and cleaning it up without
enormous amounts of effort. But those uses are not necessarily all trivial
to understand, which is why I was conservative.

allow_signal and disallow_signal do not touch anything that affects
recalc_sigpending, and by rights should not call it. But they used to
diddle the blocked set, and when we changed them we were afraid that some
of their callers might have subtle dependencies on recalc_sigpending having
been called. Off hand that seems unduly paranoid, but you never know.

The call in copy_process might be superfluous, but perhaps it can be
necessary if the thread was in syscall entry stop (TASK_TRACED) and so got
skipped for setting TIF_SIGPENDING (wants_signal) for a process-shared
signal, then entered fork.

The freezer stuff just makes me want to go hide in the corner. (It's
enough work already repressing the knowledge of what's happening when I
resume the laptop from suspend and my shells all report "Stopped" again
for the jobs already in job control stop.)

I can't tell off hand what's going on in sunrpc. I think it's only
dealing with its own kernel threads. Same goes for fs/lockd/clntproc.c.

nbd looks like some honest-to-goodness questionable diddling with user threads.
dlm too. autofs4 too, but wronger. 9p, ncpfs, saa5249 all seem
dubious to me, but it's pretty clear they do expect recalc_sigpending to
clear TIF_SIGPENDING and that's a primary thing they need done.

handle_vm86_trap must be something ancient that should have just been
using send_sigtrap.

arch/ia64/kernel/fsys.S seems to have recalc_sigpending open coded in
assembly, so heaven help them.

So from that quick survey, it does seem that recalc_sigpending itself
called directly is not often used for the kernel-thread situations. As
to current correctness, I'd say most of these calls do indeed depend on
it to clear TIF_SIGPENDING. Nearly all of its uses are at least somewhat
questionable and to me indicate that there should be some cleaner entry
points into the signals code that all these places can use to accomplish
what they need without having their fingers in the subtle internals of
signals data structures.


signalfd is a whole other can of worms, and I have not really looked into
it but would approach the whole area with caution.


Thanks,
Roland

2007-06-13 22:36:35

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [BUG] ptraced process waiting on syscall may return kernel internal errnos

On Wed, 2007-06-13 at 19:15 +0400, Oleg Nesterov wrote:
>
> This breaks cancel_freezing(). Somehow we should clear TIF_SIGPENDING
> for kernel threads. Otherwise we may have subtle failures if
> try_to_freeze_tasks() fails.

The freezer is crap... news at 11. Maybe a quick hack would be to let it
clear sigpending if tsk->mm == NULL but that's ugly. Note that there's
anything pretty about the freezer anyway...

> Also, whith this change do_sigaction()->recalc_sigpending_and_wake()
> doesn't make sense any longer, yes?

Well.. why was it _and_wake() in the first place anyway ? Or do I miss
something ? Why would we need to wake a thread for which we are removing
signals ?

What about something like:

do {
rm_from_queue_full(&mask, &t->pending);
- recalc_sigpending_and_wake(t);
t = next_thread(t);
} while (t != current);
+ recalc_sigpending();

> > @@ -385,7 +391,8 @@ int dequeue_signal(struct task_struct *t
> > }
> > }
> > }
> > - recalc_sigpending_tsk(tsk);
> > + if (likely(tsk == current))
> > + recalc_sigpending();
>
> In theory, flush_signals(t) needs a similar change. However, it is
> always
> called with t == current. Perhaps it makes sense to make it
> flush_signals(void) ?

Agreed.

> Do you see any valid usage of flush_signals(t) when t != current ?
>
> (Actually, imho the same is true for dequeue_signal(). Except for
> signalfd.c
> dequeue_signal() should operate on current. Perhaps it would be a bit
> cleaner
> to have dequeue_signal_tsk(tsk) and dequeue_signal(void), the latter
> does
> recalc_sigpending).

That's been part of the discussion so far ... so yes, maybe. I also
think dequeue_signal_tsk would then only dequeue shared signals... But
then, that means signalfd would have to do a if (tsk == current) to
know which one to call...

So at the end of the day, easier to test it inside dequeue_signal().

Ben.


2007-06-13 22:53:27

by Roland McGrath

[permalink] [raw]
Subject: Re: [BUG] ptraced process waiting on syscall may return kernel internal errnos

> This breaks cancel_freezing(). Somehow we should clear TIF_SIGPENDING for
> kernel threads. Otherwise we may have subtle failures if try_to_freeze_tasks()
> fails.

Ok.

> Also, whith this change do_sigaction()->recalc_sigpending_and_wake() doesn't
> make sense any longer, yes?

Yes. As we discussed at the time, it was my idea of the most conservative
fix to make it recalc_sigpending_and_wake there instead of just dropping
the recalc_sigpending_tsk call. That is, any subtle perturbations of
behavior from a place where TIF_SIGPENDING had been cleared before and now
no longer would be; either would be correct, but applications do not always
limit themselves to relying on the stated guarantees. But that was before
considering the -ERESTART* issue, which made me realize that we must rule
out clearing someone else's TIF_SIGPENDING across the board to prevent that
other class of actual bugs. So now there is no reason whatsoever to keep
the call in do_sigaction.

> In theory, flush_signals(t) needs a similar change. However, it is always
> called with t == current. Perhaps it makes sense to make it flush_signals(void) ?
> Do you see any valid usage of flush_signals(t) when t != current ?

I can't really imagine one, no. In fact, I don't think flush_signals is
something that really makes sense to have exported and used in the way that
it is.

> (Actually, imho the same is true for dequeue_signal(). Except for signalfd.c
> dequeue_signal() should operate on current. Perhaps it would be a bit cleaner
> to have dequeue_signal_tsk(tsk) and dequeue_signal(void), the latter does
> recalc_sigpending).

I would be happier if none of these things were exported as they are now
nor called directly from anywhere but the core signals code and kthread
setup code. If signalfd is a new core thing, it should be integrated well
with the signals code, rearranging internal calls as necessary to make that
clean.

The kernel thread uses of flush_signals seem to be just the simplified
variant of how kernel threads use dequeue_signal. If all you want is to
wake up an interruptible wait, you don't actually need to queue a signal.
You just need to set TIF_SIGPENDING on your kernel thread. With the recent
changes you can rely on no other thread ever clearing it, so it just needs
to clear it after waking up. We can support that with a simplified
interface so callers just use:

if (kthread_interrupted())
... bail out ...;

and:

kthread_interrupt(my_kthread);

Similarly, dequeue_signal is used by kernel threads just to get the signal
number and siginfo_t they were sent. I don't know how many of these uses
actually care what the signal was, or if it's ever a signal sent by some
normal kill call instead of an internal wakeup. Perhaps all they need is:

if (kthread_interrupted(&info)) {
... pay attention to info.si_signo et al ...
}

and perhaps:

kthread_interrupt(my_kthread, &info);

with NULL arguments for the simplified semantics of the first example.
This would get kernel thread code out of the business of knowing directly
about the siglock and all.


Thanks,
Roland

2007-06-13 23:02:18

by Roland McGrath

[permalink] [raw]
Subject: Re: [BUG] ptraced process waiting on syscall may return kernel internal errnos

> The freezer is crap... news at 11. Maybe a quick hack would be to let it
> clear sigpending if tsk->mm == NULL but that's ugly. Note that there's
> anything pretty about the freezer anyway...

I think it might be made not too unreasonable by adding a TASK_FROZEN state.
But I am still persuaded by my "hide in the corner" plan.

> Well.. why was it _and_wake() in the first place anyway ? Or do I miss
> something ? Why would we need to wake a thread for which we are removing
> signals ?

The bug was about a case where recalc_sigpending_tsk would set
TIF_SIGPENDING when it hadn't been set before (wants_signal). It has
nothing to do with the rm_from_queue_full being done there. It's just a
violation of the necessary rule that when you set TIF_SIGPENDING on another
thread you better call signal_wake_up on it.

> What about something like:
>
> do {
> rm_from_queue_full(&mask, &t->pending);
> - recalc_sigpending_and_wake(t);
> t = next_thread(t);
> } while (t != current);
> + recalc_sigpending();

There is no need for the +, just the -. The calling thread is the one
where know there is certainly no perturbation of behavior due to leaving
TIF_SIGPENDING set rather than clearing it. It's just going to exit the
syscall and deal with signal state properly on the way out either way.
Doing recalc_sigpending is an unnecessary optimization of the corner case.

> So at the end of the day, easier to test it inside dequeue_signal().

Before completely revamping the whole set of entrypoints to be saner all
around, yes.


Thanks,
Roland

2007-06-13 23:19:22

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [BUG] ptraced process waiting on syscall may return kernel internal errnos

On Wed, 2007-06-13 at 16:01 -0700, Roland McGrath wrote:
>
> > What about something like:
> >
> > do {
> > rm_from_queue_full(&mask, &t->pending);
> > - recalc_sigpending_and_wake(t);
> > t = next_thread(t);
> > } while (t != current);
> > + recalc_sigpending();
>
> There is no need for the +, just the -. The calling thread is the one
> where know there is certainly no perturbation of behavior due to leaving
> TIF_SIGPENDING set rather than clearing it. It's just going to exit the
> syscall and deal with signal state properly on the way out either way.
> Doing recalc_sigpending is an unnecessary optimization of the corner case.

Fair enough. I'll cook a patch for that one when I'm at work.

> > So at the end of the day, easier to test it inside dequeue_signal().
>
> Before completely revamping the whole set of entrypoints to be saner all
> around, yes.

btw, another interesting grep is to see how tweak TIF_SIGPENDING by
hand ... there's some scary bits in the tty code too...

Ben.


2007-06-14 00:03:27

by Roland McGrath

[permalink] [raw]
Subject: Re: [BUG] ptraced process waiting on syscall may return kernel internal errnos

> btw, another interesting grep is to see how tweak TIF_SIGPENDING by
> hand ... there's some scary bits in the tty code too...

All I see there are the calls just added recently by Oleg, which are
correct and necessary to fix a bug. Wrapping it in something less magical
might be nice. It is the right thing to do in any place that is
introducing an -ERESTART* error code without having just checked signal_pending.


Thanks,
Roland

2007-06-14 12:20:44

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [BUG] ptraced process waiting on syscall may return kernel internal errnos

Sorry for being late, I've just realized that you are discussing the freezer
here. ;-)

On Wednesday, 13 June 2007 17:15, Oleg Nesterov wrote:
> Sorry for delay, I was completely offline,
>
> On 06/06, Roland McGrath wrote:
> >
> > [PATCH] Restrict clearing TIF_SIGPENDING
> >
> > This patch should get a few birds. It prevents sigaction calls from
> > clearing TIF_SIGPENDING in other threads, which could leak -ERESTART*.
> > It fixes ptrace_stop not to clear it, which done at the syscall exit
> > stop could leak -ERESTART*. It probably removes the harm from
> > signalfd, at least assuming it never calls dequeue_signal on kernel
> > threads that might have used block_all_signals.
> >
> > Signed-off-by: Roland McGrath <[email protected]>
> > ---
> > kernel/signal.c | 40 ++++++++++++++++++++++++----------------
> > 1 files changed, 24 insertions(+), 16 deletions(-)
> >
> > diff --git a/kernel/signal.c b/kernel/signal.c
> > index acdfc05..dc5797c 100644
> > --- a/kernel/signal.c
> > +++ b/kernel/signal.c
> > @@ -105,7 +105,11 @@ static int recalc_sigpending_tsk(struct
> > set_tsk_thread_flag(t, TIF_SIGPENDING);
> > return 1;
> > }
> > - clear_tsk_thread_flag(t, TIF_SIGPENDING);
> > + /*
> > + * We must never clear the flag in another thread, or in current
> > + * when it's possible the current syscall is returning -ERESTART*.
> > + * So we don't clear it here, and only callers who know they should do.
> > + */
> > return 0;
> > }
>
> This breaks cancel_freezing(). Somehow we should clear TIF_SIGPENDING for
> kernel threads. Otherwise we may have subtle failures if try_to_freeze_tasks()
> fails.

Well, the only code path in which we'd want to call cancel_freezing() for kernel
threads is when the freezing of kernel threads. However, this only happens if
one of the kernel threads declares itself as freezable and the fails to call
try_to_freeze(), which is a bug. Thus I don't think that we need to worry
about that case too much.

Moreover, I'm not sure that it's a good idea at all to send signals to kernel
threads from the freezer, since in fact we only need to wake them up to make
them call try_to_freeze() (after we've set TIF_FREEZE for them).

Greetings,
Rafael


--
"Premature optimization is the root of all evil." - Donald Knuth

2007-06-14 12:58:51

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [BUG] ptraced process waiting on syscall may return kernel internal errnos

On 06/14, Rafael J. Wysocki wrote:
>
> Sorry for being late, I've just realized that you are discussing the freezer
> here. ;-)

my fault, I was going to cc you but forgot, sorry!

> On Wednesday, 13 June 2007 17:15, Oleg Nesterov wrote:
>
> > > @@ -105,7 +105,11 @@ static int recalc_sigpending_tsk(struct
> > > set_tsk_thread_flag(t, TIF_SIGPENDING);
> > > return 1;
> > > }
> > > - clear_tsk_thread_flag(t, TIF_SIGPENDING);
> > > + /*
> > > + * We must never clear the flag in another thread, or in current
> > > + * when it's possible the current syscall is returning -ERESTART*.
> > > + * So we don't clear it here, and only callers who know they should do.
> > > + */
> > > return 0;
> > > }
> >
> > This breaks cancel_freezing(). Somehow we should clear TIF_SIGPENDING for
> > kernel threads. Otherwise we may have subtle failures if try_to_freeze_tasks()
> > fails.
>
> Well, the only code path in which we'd want to call cancel_freezing() for kernel
> threads is when the freezing of kernel threads. However, this only happens if
> one of the kernel threads declares itself as freezable and the fails to call
> try_to_freeze(), which is a bug.

But this happens? We know a lot of reasons why try_to_freeze() can fail just
because some kthread waits for already frozen task.

> Thus I don't think that we need to worry
> about that case too much.

Well, we can have very subtle problems because a kernel thread may run with
TIF_SIGPENDING forever. This means in particualar that any wait_event_interruptible()
can't succeed. I think this is worse than explicit failure (like -ERESSTART... leak),
because it is hard to reproduce/debug.

> Moreover, I'm not sure that it's a good idea at all to send signals to kernel
> threads from the freezer, since in fact we only need to wake them up to make
> them call try_to_freeze() (after we've set TIF_FREEZE for them).

Yes! I completely agree.

Oleg.

2007-06-14 23:28:36

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [BUG] ptraced process waiting on syscall may return kernel internal errnos

On Thursday, 14 June 2007 14:58, Oleg Nesterov wrote:
> On 06/14, Rafael J. Wysocki wrote:
> >
> > Sorry for being late, I've just realized that you are discussing the freezer
> > here. ;-)
>
> my fault, I was going to cc you but forgot, sorry!

No problem. :-)

> > On Wednesday, 13 June 2007 17:15, Oleg Nesterov wrote:
> >
> > > > @@ -105,7 +105,11 @@ static int recalc_sigpending_tsk(struct
> > > > set_tsk_thread_flag(t, TIF_SIGPENDING);
> > > > return 1;
> > > > }
> > > > - clear_tsk_thread_flag(t, TIF_SIGPENDING);
> > > > + /*
> > > > + * We must never clear the flag in another thread, or in current
> > > > + * when it's possible the current syscall is returning -ERESTART*.
> > > > + * So we don't clear it here, and only callers who know they should do.
> > > > + */
> > > > return 0;
> > > > }
> > >
> > > This breaks cancel_freezing(). Somehow we should clear TIF_SIGPENDING for
> > > kernel threads. Otherwise we may have subtle failures if try_to_freeze_tasks()
> > > fails.
> >
> > Well, the only code path in which we'd want to call cancel_freezing() for kernel
> > threads is when the freezing of kernel threads. However, this only happens if
> > one of the kernel threads declares itself as freezable and the fails to call
> > try_to_freeze(), which is a bug.
>
> But this happens? We know a lot of reasons why try_to_freeze() can fail just
> because some kthread waits for already frozen task.
>
> > Thus I don't think that we need to worry
> > about that case too much.
>
> Well, we can have very subtle problems because a kernel thread may run with
> TIF_SIGPENDING forever. This means in particualar that any wait_event_interruptible()
> can't succeed. I think this is worse than explicit failure (like -ERESSTART... leak),
> because it is hard to reproduce/debug.
>
> > Moreover, I'm not sure that it's a good idea at all to send signals to kernel
> > threads from the freezer, since in fact we only need to wake them up to make
> > them call try_to_freeze() (after we've set TIF_FREEZE for them).
>
> Yes! I completely agree.

Hmm, what about the appended patch, then?

I've tested it a bit on a UP system.

Greetings,
Rafael


---
include/linux/freezer.h | 9 -----
include/linux/sched.h | 12 +++++++
include/linux/wait.h | 6 +--
kernel/power/process.c | 76 ++++++++++++++++++++++++++++++++++--------------
4 files changed, 70 insertions(+), 33 deletions(-)

Index: linux-2.6.22-rc4-mm2/kernel/power/process.c
===================================================================
--- linux-2.6.22-rc4-mm2.orig/kernel/power/process.c 2007-06-15 01:05:33.000000000 +0200
+++ linux-2.6.22-rc4-mm2/kernel/power/process.c 2007-06-15 01:33:52.000000000 +0200
@@ -75,19 +75,67 @@ void refrigerator(void)
__set_current_state(save);
}

-static void freeze_task(struct task_struct *p)
+static void send_fake_signal(struct task_struct *p)
{
unsigned long flags;

+ if (p->state == TASK_STOPPED)
+ force_sig_specific(SIGSTOP, p);
+ spin_lock_irqsave(&p->sighand->siglock, flags);
+ signal_wake_up(p, p->state == TASK_STOPPED);
+ spin_unlock_irqrestore(&p->sighand->siglock, flags);
+}
+
+/**
+ * freeze_user_process - freeze user space process @p.
+ *
+ * Kernel threads should not have TIF_FREEZE set at this point, so we must
+ * ensure that either p->mm is not NULL *and* PF_BORROWED_MM is unset, or
+ * TIF_FRREZE is left unset. The task_lock() is necessary to prevent races
+ * with exit_mm() or use_mm()/unuse_mm() from occuring.
+ */
+static int freeze_user_process(struct task_struct *p)
+{
+ int ret = 1;
+
+ task_lock(p);
+ if (!p->mm || (p->flags & PF_BORROWED_MM)) {
+ ret = 0;
+ } else if (!freezing(p)) {
+ rmb();
+ if (!frozen(p)) {
+ set_freeze_flag(p);
+ task_unlock(p);
+ send_fake_signal(p);
+ return ret;
+ }
+ }
+ task_unlock(p);
+ return ret;
+}
+
+/**
+ * freeze_task - freeze taks @p, regardless of whether or not it is a
+ * user space process.
+ *
+ * The task_lock() is necessary to prevent races with use_mm()/unuse_mm()
+ * from occuring.
+ */
+static void freeze_task(struct task_struct *p)
+{
if (!freezing(p)) {
rmb();
if (!frozen(p)) {
set_freeze_flag(p);
- if (p->state == TASK_STOPPED)
- force_sig_specific(SIGSTOP, p);
- spin_lock_irqsave(&p->sighand->siglock, flags);
- signal_wake_up(p, p->state == TASK_STOPPED);
- spin_unlock_irqrestore(&p->sighand->siglock, flags);
+ task_lock(p);
+ /* We don't want to send signals to kernel threads */
+ if (p->mm && !(p->flags & PF_BORROWED_MM)) {
+ task_unlock(p);
+ send_fake_signal(p);
+ } else {
+ task_unlock(p);
+ wake_up_state(p, TASK_INTERRUPTIBLE);
+ }
}
}
}
@@ -125,22 +173,8 @@ static int try_to_freeze_tasks(int freez
cancel_freezing(p);
continue;
}
- /*
- * Kernel threads should not have TIF_FREEZE set
- * at this point, so we must ensure that either
- * p->mm is not NULL *and* PF_BORROWED_MM is
- * unset, or TIF_FRREZE is left unset.
- * The task_lock() is necessary to prevent races
- * with exit_mm() or use_mm()/unuse_mm() from
- * occuring.
- */
- task_lock(p);
- if (!p->mm || (p->flags & PF_BORROWED_MM)) {
- task_unlock(p);
+ if (!freeze_user_process(p))
continue;
- }
- freeze_task(p);
- task_unlock(p);
} else {
freeze_task(p);
}
Index: linux-2.6.22-rc4-mm2/include/linux/sched.h
===================================================================
--- linux-2.6.22-rc4-mm2.orig/include/linux/sched.h 2007-06-15 01:05:33.000000000 +0200
+++ linux-2.6.22-rc4-mm2/include/linux/sched.h 2007-06-15 01:05:41.000000000 +0200
@@ -1797,6 +1797,18 @@ static inline void inc_syscw(struct task
}
#endif

+#ifdef CONFIG_PM
+static inline int freezing(struct task_struct *p)
+{
+ return unlikely(test_tsk_thread_flag(p, TIF_FREEZE));
+}
+#else
+static inline int freezing(struct task_struct *p)
+{
+ return 0;
+}
+#endif
+
#endif /* __KERNEL__ */

#endif
Index: linux-2.6.22-rc4-mm2/include/linux/freezer.h
===================================================================
--- linux-2.6.22-rc4-mm2.orig/include/linux/freezer.h 2007-06-15 01:05:33.000000000 +0200
+++ linux-2.6.22-rc4-mm2/include/linux/freezer.h 2007-06-15 01:05:41.000000000 +0200
@@ -15,14 +15,6 @@ static inline int frozen(struct task_str
}

/*
- * Check if there is a request to freeze a process
- */
-static inline int freezing(struct task_struct *p)
-{
- return test_tsk_thread_flag(p, TIF_FREEZE);
-}
-
-/*
* Request that a process be frozen
*/
static inline void set_freeze_flag(struct task_struct *p)
@@ -128,7 +120,6 @@ static inline void set_freezable(void)

#else
static inline int frozen(struct task_struct *p) { return 0; }
-static inline int freezing(struct task_struct *p) { return 0; }
static inline void set_freeze_flag(struct task_struct *p) {}
static inline void clear_freeze_flag(struct task_struct *p) {}
static inline int thaw_process(struct task_struct *p) { return 1; }
Index: linux-2.6.22-rc4-mm2/include/linux/wait.h
===================================================================
--- linux-2.6.22-rc4-mm2.orig/include/linux/wait.h 2007-06-15 01:05:33.000000000 +0200
+++ linux-2.6.22-rc4-mm2/include/linux/wait.h 2007-06-15 01:05:41.000000000 +0200
@@ -240,7 +240,7 @@ do { \
prepare_to_wait(&wq, &__wait, TASK_INTERRUPTIBLE); \
if (condition) \
break; \
- if (!signal_pending(current)) { \
+ if (!signal_pending(current) && !freezing(current)) { \
schedule(); \
continue; \
} \
@@ -281,7 +281,7 @@ do { \
prepare_to_wait(&wq, &__wait, TASK_INTERRUPTIBLE); \
if (condition) \
break; \
- if (!signal_pending(current)) { \
+ if (!signal_pending(current) && !freezing(current)) { \
ret = schedule_timeout(ret); \
if (!ret) \
break; \
@@ -327,7 +327,7 @@ do { \
TASK_INTERRUPTIBLE); \
if (condition) \
break; \
- if (!signal_pending(current)) { \
+ if (!signal_pending(current) && !freezing(current)) { \
schedule(); \
continue; \
} \

2007-06-15 00:07:41

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [BUG] ptraced process waiting on syscall may return kernel internal errnos

On Thu, 2007-06-14 at 16:58 +0400, Oleg Nesterov wrote:
>
> > Well, the only code path in which we'd want to call cancel_freezing() for kernel
> > threads is when the freezing of kernel threads. However, this only happens if
> > one of the kernel threads declares itself as freezable and the fails to call
> > try_to_freeze(), which is a bug.
>
> But this happens? We know a lot of reasons why try_to_freeze() can fail just
> because some kthread waits for already frozen task.

The freezer is a deadlock-o-matic by design. It's one of the major if
not the main reason why we don't use the code in kernel/power.c on
PowerBooks for suspend-to-ram. I don't like it and paulus hates it.

> Well, we can have very subtle problems because a kernel thread may run with
> TIF_SIGPENDING forever. This means in particualar that any wait_event_interruptible()
> can't succeed. I think this is worse than explicit failure (like -ERESSTART... leak),
> because it is hard to reproduce/debug.

Then just don't send signals to them...

> > Moreover, I'm not sure that it's a good idea at all to send signals to kernel
> > threads from the freezer, since in fact we only need to wake them up to make
> > them call try_to_freeze() (after we've set TIF_FREEZE for them).
>
> Yes! I completely agree.

Sounds like a plan...

Freezer problem 1/102893264 solved.

Ben.


2007-06-15 11:31:36

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [BUG] ptraced process waiting on syscall may return kernel internal errnos

On 06/15, Rafael J. Wysocki wrote:
>
> +static void freeze_task(struct task_struct *p)
> +{
> if (!freezing(p)) {
> rmb();
> if (!frozen(p)) {
> set_freeze_flag(p);
> - if (p->state == TASK_STOPPED)
> - force_sig_specific(SIGSTOP, p);
> - spin_lock_irqsave(&p->sighand->siglock, flags);
> - signal_wake_up(p, p->state == TASK_STOPPED);
> - spin_unlock_irqrestore(&p->sighand->siglock, flags);
> + task_lock(p);
> + /* We don't want to send signals to kernel threads */
> + if (p->mm && !(p->flags & PF_BORROWED_MM)) {
> + task_unlock(p);
> + send_fake_signal(p);
> + } else {
> + task_unlock(p);
> + wake_up_state(p, TASK_INTERRUPTIBLE);
> + }

I don't think this is enough. Note that recalc_sigpending() checks freezing().
So a kernel thread still can get TIF_SIGPENDING if it does recalc_sigpending().

> --- linux-2.6.22-rc4-mm2.orig/include/linux/wait.h 2007-06-15 01:05:33.000000000 +0200
> +++ linux-2.6.22-rc4-mm2/include/linux/wait.h 2007-06-15 01:05:41.000000000 +0200
> @@ -240,7 +240,7 @@ do { \
> prepare_to_wait(&wq, &__wait, TASK_INTERRUPTIBLE); \
> if (condition) \
> break; \
> - if (!signal_pending(current)) { \
> + if (!signal_pending(current) && !freezing(current)) { \
> schedule(); \
> continue; \
> } \

Personally, I think we should not modify wait_event_interruptible() and friends.
If a kernel thread wants to be frozen, it should take care about freezing()
itself.


OK, I guess I was too paranoid and you were right, it is better to ignore this
minor problem for now.

Oleg.

2007-06-15 21:41:51

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [BUG] ptraced process waiting on syscall may return kernel internal errnos

On Friday, 15 June 2007 13:31, Oleg Nesterov wrote:
> On 06/15, Rafael J. Wysocki wrote:
> >
> > +static void freeze_task(struct task_struct *p)
> > +{
> > if (!freezing(p)) {
> > rmb();
> > if (!frozen(p)) {
> > set_freeze_flag(p);
> > - if (p->state == TASK_STOPPED)
> > - force_sig_specific(SIGSTOP, p);
> > - spin_lock_irqsave(&p->sighand->siglock, flags);
> > - signal_wake_up(p, p->state == TASK_STOPPED);
> > - spin_unlock_irqrestore(&p->sighand->siglock, flags);
> > + task_lock(p);
> > + /* We don't want to send signals to kernel threads */
> > + if (p->mm && !(p->flags & PF_BORROWED_MM)) {
> > + task_unlock(p);
> > + send_fake_signal(p);
> > + } else {
> > + task_unlock(p);
> > + wake_up_state(p, TASK_INTERRUPTIBLE);
> > + }
>
> I don't think this is enough. Note that recalc_sigpending() checks freezing().
> So a kernel thread still can get TIF_SIGPENDING if it does recalc_sigpending().

Yes, you're right, I have overlooked that.

Still, this can be prevented by changing recalc_sigpending_tsk() in the following way:

--- linux-2.6.22-rc4.orig/kernel/signal.c 2007-06-07 00:01:48.000000000 +0200
+++ linux-2.6.22-rc4/kernel/signal.c 2007-06-15 21:22:00.000000000 +0200
@@ -99,13 +99,13 @@ static inline int has_pending_signals(si
static int recalc_sigpending_tsk(struct task_struct *t)
{
if (t->signal->group_stop_count > 0 ||
- (freezing(t)) ||
PENDING(&t->pending, &t->blocked) ||
PENDING(&t->signal->shared_pending, &t->blocked)) {
set_tsk_thread_flag(t, TIF_SIGPENDING);
return 1;
}
- clear_tsk_thread_flag(t, TIF_SIGPENDING);
+ if (!freezing(t))
+ clear_tsk_thread_flag(t, TIF_SIGPENDING);
return 0;
}

> > --- linux-2.6.22-rc4-mm2.orig/include/linux/wait.h 2007-06-15 01:05:33.000000000 +0200
> > +++ linux-2.6.22-rc4-mm2/include/linux/wait.h 2007-06-15 01:05:41.000000000 +0200
> > @@ -240,7 +240,7 @@ do { \
> > prepare_to_wait(&wq, &__wait, TASK_INTERRUPTIBLE); \
> > if (condition) \
> > break; \
> > - if (!signal_pending(current)) { \
> > + if (!signal_pending(current) && !freezing(current)) { \
> > schedule(); \
> > continue; \
> > } \
>
> Personally, I think we should not modify wait_event_interruptible() and friends.
> If a kernel thread wants to be frozen, it should take care about freezing()
> itself.

Yes, I agree, but wanted to get a working patch quickly. ;-)

I think we might define freezer-friendly versions of wait_event_interruptible()
and friends in <linux/freezer.h>, like this:

+#define wait_event_interruptible_ff(wq, condition) \
+({ \
+ int __ret = 0; \
+ if (!(condition) && !freezing(current)) \
+ __wait_event_interruptible(wq, \
+ (condition) || freezing(current), \
+ __ret); \
+ if (!(condition) && freezing(current)) \
+ __ret = -ERESTARTSYS; \
+ __ret; \
+})

and make the freezable kernel threads use them instead of the ones defined
in <linux/wait.h>. There are only a few threads that need that, BTW.

> OK, I guess I was too paranoid and you were right, it is better to ignore this
> minor problem for now.

Okay, but I still think we shouldn't send fake signals to kernel threads. :-)

I'd like to revisit it after 2.6.22 is out, if you don't mind.

Greetings,
Rafael


--
"Premature optimization is the root of all evil." - Donald Knuth