2003-02-12 01:57:17

by Roland McGrath

[permalink] [raw]
Subject: another subtle signals issue

I have run across another issue brought up my signals changes. As it
stands now, a SIG_DFL signal whose default action is to do nothing will
deliver the signal and only inside get_signal_to_deliver decide to ignore
it. So there can be a TIF_SIGPENDING wakeup that results in no signal
being delivered and is intended to have no visible effect, in particular
POSIX semantics are that it won't cause an EINTR result from some random
blocking call. It had been my assumption that this would be ok, though it
really ought to be optimized away.

However, it's clearly not ok for sys_semop. It's plain from reading the
code that it will always return EINTR if it gets a signal wakeup, and that
indeed explains some failures we're seeing in the LSB testsuite on current
kernels.

We are testing the patch below. Please do not put this patch in. The
optimization should go in eventually, but this patch needs to be cleaned up
so we don't bother queuing and dequeuing the ignored signal, and to have
some comments. Moreover, I want to better understand what is going on
before we put the question to bed.

I think sys_semop would be closer to right if it used ERESTARTSYS instead
of EINTR. However, using this patch fixes some other LSB test problems I
hadn't yet figured out, as well as semop. The other affected tests involve
wait4 and read (n_tty), which do appear to use ERESTARTSYS in the
appropriate fashion. Thus I am confused as to why this should matter to
those cases at all, and suspect there is some other problem lurking.

The reason I am concerned about this is that I think any case that is
broken by the lack of the optimization in the patch below must also be
broken vis a vis the semantics of stop signals and SIGCONT (when SIG_DFL,
SIG_IGN, or blocked). POSIX says that when a process is stopped by
e.g. SIGSTOP, and then continued by SIGCONT, any functions that were in
progress at the time of stop are unaffected unless SIGCONT runs a handler.
That is, nobody returns EINTR because of the stop/continue.

In the semop tests, a SIGCHLD signal (when set to SIG_DFL, which means do
nothing) was what made semop return EINTR when it should have kept
blocking. I haven't fully investigated the other cases whose behavior was
affected, but the same thing seems reasonably likely. It seems to me that,
anything whose behavior is perturbed by waking up, dequeueing a SIGCHLD and
doing nothing about it in get_signal_to_deliver, and resuming, would also
be perturbed by waking up, dequeuing a SIGSTOP, stopping, be continued by
someone sending SIGCONT, dequeuing a SIGCONT, do nothing about it, and
resuming. So if the patch below fixes it, it's already a bug.

As I said, sys_semop is clearly wrong in its use of EINTR. Any use of
EINTR is suspect and perhaps ought to be ERESTARTSYS (which gets morphed
into EINTR when it's appropriate to return EINTR). But that does not
explain the other behaviors I saw change, so I wonder what I am missing.
I would appreciate any thoughts you have on this.

That mystery is my primary concern. But that aside, it further seems wrong
to use ERESTARTSYS in semop when there is a timeout involved. Functions
such as semop and nanosleep are specified to forget their timed blocks when
the run a signal handler. But I haven't seen anything in POSIX that allows
unexpired timeouts to be perturbing by stopping and continuing. Say you
have some program that blocks in semop with a timeout of a minute, you wait
30 seconds and then hit C-z and then fg, all in under 5 seconds--it should
be more than 25 seconds before semop returns EAGAIN, but less than 30, and
it should never return EINTR in this case. Right now, I think semop will
return EINTR as soon as you fg, which is wrong. But changing it to
ERESTARTSYS would make it start the timer at 60 seconds again after more
than 30 seconds had already ticked off. Any place that uses a timeout
probably needs to do an intelligent restart like nanosleep does, or else
do something highly questionable like call do_signal itself (since it wants
to bail in the handling case and can stay inside its loop otherwise).


Thanks,
Roland



--- /home/roland/redhat/linux-2.5.59-1.1007/kernel/signal.c.~2~ Sun Feb 9 03:58:57 2003
+++ /home/roland/redhat/linux-2.5.59-1.1007/kernel/signal.c Tue Feb 11 13:45:25 2003
@@ -730,6 +732,11 @@ specific_send_sig_info(int sig, struct s
if (LEGACY_QUEUE(&t->pending, sig))
return 0;

+ if (sig_kernel_ignore(sig) &&
+ t->sighand->action[sig-1].sa.sa_handler == SIG_DFL &&
+ !sigismember(&t->blocked, sig))
+ return 0;
+
ret = send_signal(sig, info, &t->pending);
if (!ret && !sigismember(&t->blocked, sig))
signal_wake_up(t, sig == SIGKILL);
@@ -862,6 +869,12 @@ __group_send_sig_info(int sig, struct si
p->signal->curr_target = t;
}

+ if (sig_kernel_ignore(sig) &&
+ p->sighand->action[sig-1].sa.sa_handler == SIG_DFL) {
+ rm_from_queue(sigmask(sig), &p->signal->shared_pending);
+ return 0;
+ }
+
/*
* Found a killable thread. If the signal will be fatal,
* then start taking the whole group down immediately.


2003-02-12 02:39:30

by Linus Torvalds

[permalink] [raw]
Subject: Re: another subtle signals issue


On Tue, 11 Feb 2003, Roland McGrath wrote:
>
> I think sys_semop would be closer to right if it used ERESTARTSYS instead
> of EINTR.

You probably mean ERESTARTSYSNOHAND.

There are lots of system calls that simply are not restartable. So
TIF_SIGPENDING in general should be set only if required, and not "because
it's easier".

> The reason I am concerned about this is that I think any case that is
> broken by the lack of the optimization in the patch below must also be
> broken vis a vis the semantics of stop signals and SIGCONT (when SIG_DFL,
> SIG_IGN, or blocked). POSIX says that when a process is stopped by
> e.g. SIGSTOP, and then continued by SIGCONT, any functions that were in
> progress at the time of stop are unaffected unless SIGCONT runs a handler.
> That is, nobody returns EINTR because of the stop/continue.

This is what ERESTARTNOHAND does, but quite often if you get interrupted
you have to return _partial_ results, which is quite inefficient and
sometimes breaks programs (ie you get things like a read() from a pipe
that returns a partial result because you resized the window, and a
SIGWINCH happened - and that is _bad_).

The old code tried rather hard to make signals that were truly ignored
(SIGSTOP/SIGCONT is not of that kind) be total non-events because of
things like this.

Linus

2003-02-12 02:58:20

by Daniel Jacobowitz

[permalink] [raw]
Subject: Re: another subtle signals issue

On Tue, Feb 11, 2003 at 06:45:19PM -0800, Linus Torvalds wrote:
>
> On Tue, 11 Feb 2003, Roland McGrath wrote:
> >
> > I think sys_semop would be closer to right if it used ERESTARTSYS instead
> > of EINTR.
>
> You probably mean ERESTARTSYSNOHAND.
>
> There are lots of system calls that simply are not restartable. So
> TIF_SIGPENDING in general should be set only if required, and not "because
> it's easier".
>
> > The reason I am concerned about this is that I think any case that is
> > broken by the lack of the optimization in the patch below must also be
> > broken vis a vis the semantics of stop signals and SIGCONT (when SIG_DFL,
> > SIG_IGN, or blocked). POSIX says that when a process is stopped by
> > e.g. SIGSTOP, and then continued by SIGCONT, any functions that were in
> > progress at the time of stop are unaffected unless SIGCONT runs a handler.
> > That is, nobody returns EINTR because of the stop/continue.
>
> This is what ERESTARTNOHAND does, but quite often if you get interrupted
> you have to return _partial_ results, which is quite inefficient and
> sometimes breaks programs (ie you get things like a read() from a pipe
> that returns a partial result because you resized the window, and a
> SIGWINCH happened - and that is _bad_).
>
> The old code tried rather hard to make signals that were truly ignored
> (SIGSTOP/SIGCONT is not of that kind) be total non-events because of
> things like this.

For things with a timeout, shouldn't they be converted to use
ERESTART_RESTARTBLOCK? The situation Roland is describing is just
about the same as the original problem with nanosleep.

--
Daniel Jacobowitz
MontaVista Software Debian GNU/Linux Developer

2003-02-12 03:04:28

by Linus Torvalds

[permalink] [raw]
Subject: Re: another subtle signals issue


On Tue, 11 Feb 2003, Daniel Jacobowitz wrote:
>
> For things with a timeout, shouldn't they be converted to use
> ERESTART_RESTARTBLOCK? The situation Roland is describing is just
> about the same as the original problem with nanosleep.

The thing is, I don't think the case Roland is describing is the _real_
case.

The real case you want to look at is a simple pipe read. See fs/pipe.c,
pipe_read(), and grok it (or "pipe_write()", for that matter).

It should not return early for something like a SIGWINCH that is ignored.
Returning early literally breaks things like old versions of "tar" that
want full-sized reads and don't do internal blocking on their own etc.

Now, if a user does ^Z or somebody ptrace's you, we _have_ to return out
of the read(), and return a partial result. Fair enough. We'd prefer a
ptrace to not perturb the results at all, but that's just not possible
with the way tracing works. But there are signals that truly don't do
anything, and those we _can_ avoid causing partial reads. SIGWINCH is one
(very common) example.

I think this is even codified in POSIX, but if it isn't, I don't much
care: it's also a quality of implementation issue.

And the simple way to do this right is to not set TIF_SIGPENDING if you
don't have to.

Linus

2003-02-12 03:03:57

by Roland McGrath

[permalink] [raw]
Subject: Re: another subtle signals issue

> You probably mean ERESTARTSYSNOHAND.

Indeed, that's the right one for functions like semop that are specified in
1003.1-2001 explicitly to wake up when a signal was caught. I read the
SA_RESTART wording as not intending to apply to functions that say that
explicitly.

> There are lots of system calls that simply are not restartable.

POSIX permits partial results for cases like read and write. Those aside,
and leaving aside calls in uninterruptible sleeps (in which stops take
place only after the sleep wakes), POSIX would seem to require that they be
restarted when there is a job control stop and continue.

> > The reason I am concerned about this is that I think any case that is
> > broken by the lack of the optimization in the patch below must also be
> > broken vis a vis the semantics of stop signals and SIGCONT (when SIG_DFL,
> > SIG_IGN, or blocked). POSIX says that when a process is stopped by
> > e.g. SIGSTOP, and then continued by SIGCONT, any functions that were in
> > progress at the time of stop are unaffected unless SIGCONT runs a handler.
> > That is, nobody returns EINTR because of the stop/continue.
>
> This is what ERESTARTNOHAND does [...]

ERESTARTSYS and ERESTARTNOHAND only differ when a handler is run,
which is not the case I am talking about.

> The old code tried rather hard to make signals that were truly ignored
> (SIGSTOP/SIGCONT is not of that kind)

POSIX clearly specifies that stopping and continuing "shall not affect the
behavior of any function" (when SIGCONT is SIG_DFL or SIG_IGN, or is blocked).

2003-02-12 03:12:32

by Linus Torvalds

[permalink] [raw]
Subject: Re: another subtle signals issue


On Tue, 11 Feb 2003, Roland McGrath wrote:
>
> POSIX clearly specifies that stopping and continuing "shall not affect the
> behavior of any function" (when SIGCONT is SIG_DFL or SIG_IGN, or is blocked).

You just have to read it in a way that says "partial results are
permissible, and are part of the normal behaviour". And then the fact that
when ^Z happens you get partial results from pipes is not "different
behaviour" from a qualitative standpoint - even though in fact we'd get a
full result if the ^Z didn't happen.

Put another way: ^Z does perturb the exact details of a read from a pipe,
but it does not in any way change the _semantics_ of the read.

Linus

2003-02-12 03:13:33

by Roland McGrath

[permalink] [raw]
Subject: Re: another subtle signals issue

> I think this is even codified in POSIX, but if it isn't, I don't much
> care: it's also a quality of implementation issue.
>
> And the simple way to do this right is to not set TIF_SIGPENDING if you
> don't have to.

There is no argument about this. Dan and I are talking about real cases
that are definitely specified by POSIX, and you have not responded about them.

2003-02-12 03:32:19

by Linus Torvalds

[permalink] [raw]
Subject: Re: another subtle signals issue


On Tue, 11 Feb 2003, Roland McGrath wrote:
>
> There is no argument about this. Dan and I are talking about real cases
> that are definitely specified by POSIX, and you have not responded about them.

First off, realize that

- POSIX is less relevant than "what the 2.4.x" do. MUCH less. Binary
compatibility is very important, much more so than some paper standard.

- and even if you care 100% about POSIX, that still leaves the fact that
everybody who has ever implemented POSIX also did their "this is how we
differ" exception lists. It's part of the process.

With that out of the way, I think I _did_ respond about them: ^Z _will_
cause interruptible system calls to return EINTR/ERESTARTNOHAND or one of
the versions thereof. There are no ifs, buts and maybes about it. It will
happen. It's very fundamental to how Linux signal handling and job control
has always worked, and it's not even worth worrying about it (see above on
_why_ it's not worth worrying about).

Linus

2003-02-12 03:41:02

by Roland McGrath

[permalink] [raw]
Subject: Re: another subtle signals issue

> You just have to read it in a way that says "partial results are
> permissible, and are part of the normal behaviour". And then the fact that
> when ^Z happens you get partial results from pipes is not "different
> behaviour" from a qualitative standpoint - even though in fact we'd get a
> full result if the ^Z didn't happen.

I'm not talking about reading from pipes, that was your example. I was
talking about calls with timeouts, like semop, whose interface do not
permit partial results. Anyway, I find your reading insupportable even in
reference to read or write. read and write are explicitly specified to
return partial results when interrupted by a signal, and are not permitted
to do so otherwise. 1003.1-2001 2.4.4 defines "interrupted" in reference
only to signals that are caught.

2003-02-12 04:15:05

by Linus Torvalds

[permalink] [raw]
Subject: Re: another subtle signals issue


On Tue, 11 Feb 2003, Roland McGrath wrote:
>
> I'm not talking about reading from pipes, that was your example.

The reason I harp on pipes is that I remember something like that breaking
real programs in the past, which is a lot more important.

(Thinking about it, it can't have been pipe reading, it must have been
writing. Reading a pipe always returns partial results early if it is
successful. Writing doesn't. Trivial test-case appended, see what happens
when you ^Z it. And realize that more programs care about this than about
timed semaphore operations).

> I was
> talking about calls with timeouts, like semop, whose interface do not
> permit partial results.

And I have multiple times said that as far as Linux is concerned, ^Z is,
has always been, and certainly for the 2.6.x timeframe _will_ be a signal
that the kernel considers "caught".

Another way of saying it: just leave the system calls alone. If they
aren't restartable, they have to return -EINTR. And if you think that
disagrees with POSIX, and if RH really wants certification, then you
should advice RH to document the difference as just that - a difference.
And push it through as such.

Anyway, try out the following test-program in an xterm. Resize the xterm
and see what happens. Press ^Z and see what happens. I'm saying that the
^Z behaviour is something Linux always has done (and as such not something
you should worry about overmuch), while the SIGWINCH behaviour is new with
the new signal handling code and _is_ worrysome, since that can break old
programs (but I think your patch fixes it, so I'm not worried).

To recap:
- common functions are a lot more important than uncommon ones (which is
why the pipe thing is interesting)
- old behaviour is a lot more important than POSIX (existing binaries
have all been tested with old behaviour).

Considering that the SIGCHLD:SIG_IGN change to child reaping apparently
broke at least one existing program, I hope you can see my point. At some
point it just doesn't _matter_ what POSIX says.

Linus

---
#include <signal.h>
#include <stdio.h>

#define SIZE 131072

int main(void)
{
char buffer[SIZE];
int fd[2];

pipe(fd);
write(fd[1], buffer, SIZE);
}

2003-02-12 20:06:07

by Linus Torvalds

[permalink] [raw]
Subject: Re: another subtle signals issue


Btw, Roland, instead of your previous patch I would prefer something that
just makes "sig_ignored()" test the state of the signal better. Ie
something like the appended.

This should bring it much closer to the old code, which got this _right_.
For example, a signal that is blocked is _never_ ignored, since even if
the handler is SIG_IGN right now, it may not be by the time it is
unblocked.

I bet the blocked case accounts for some of the new test failures, while
the (SIG_DFL && sig_kernel_ignore(sig)) case will account for a few more.
In general I _think_ this should get us pretty much back to the old
behaviour.

At least this fixes the pipe write() / SIGWINCH testcase for me.

Linus

---
===== kernel/signal.c 1.69 vs edited =====
--- 1.69/kernel/signal.c Tue Feb 11 17:33:52 2003
+++ edited/kernel/signal.c Wed Feb 12 12:03:28 2003
@@ -141,14 +141,34 @@
(((t)->sighand->action[(signr)-1].sa.sa_handler != SIG_DFL) && \
((t)->sighand->action[(signr)-1].sa.sa_handler != SIG_IGN))

-#define sig_ignored(t, signr) \
- (!((t)->ptrace & PT_PTRACED) && \
- (t)->sighand->action[(signr)-1].sa.sa_handler == SIG_IGN)
-
#define sig_fatal(t, signr) \
(!T(signr, SIG_KERNEL_IGNORE_MASK|SIG_KERNEL_STOP_MASK) && \
(t)->sighand->action[(signr)-1].sa.sa_handler == SIG_DFL)

+static inline int sig_ignored(struct task_struct *t, int sig)
+{
+ void * handler;
+
+ /*
+ * Tracers always want to know about signals..
+ */
+ if (t->ptrace & PT_PTRACED)
+ return 0;
+
+ /*
+ * Blocked signals are never ignored, since the
+ * signal handler may change by the time it is
+ * unblocked.
+ */
+ if (sigismember(&t->blocked, sig))
+ return 0;
+
+ /* Is it explicitly or implicitly ignored? */
+ handler = t->sighand->action[sig-1].sa.sa_handler;
+ return handler == SIG_IGN ||
+ (handler == SIG_DFL && sig_kernel_ignore(sig));
+}
+
/*
* Re-calculate pending state from the set of locally pending
* signals, globally pending signals, and blocked signals.
@@ -642,7 +662,7 @@
* TIF_SIGPENDING
*/
state = TASK_STOPPED;
- if (!sigismember(&t->blocked, SIGCONT)) {
+ if (sig_user_defined(t, SIGCONT) && !sigismember(&t->blocked, SIGCONT)) {
set_tsk_thread_flag(t, TIF_SIGPENDING);
state |= TASK_INTERRUPTIBLE;
}

2003-02-12 21:01:29

by Roland McGrath

[permalink] [raw]
Subject: Re: another subtle signals issue

> Btw, Roland, instead of your previous patch I would prefer something that
> just makes "sig_ignored()" test the state of the signal better. Ie
> something like the appended.

This should be fine (almost). POSIX leaves it unspecified whether a
blocked, ignored signal is left pending or not. The only thing it requires
is that setting a blocked signal to SIG_IGN clears any pending signal, and
sigaction already does that.

When I started hacking on this, I only looked at the old new code (i.e.
Ingo's thread group rewrite) not the old old code (i.e. 2.4). Everything
that was not wrong I left as it was, and bailing early on SIG_IGN signals
without regard to blocking is what that code did when I started with it.
Bailing early can avoid going through the loop and possibly perturbing the
load balancing state, so I had assumed Ingo did it intentionally as an
optimization.

Your patch as is won't fix the ignored-SIG_DFL-interrupts bug in the MT
case. That is, in __group_send_sig_info if P blocks the signal but some
other thread does not, then the that thread will get woken up and be
subject to all those problems. So you need another check in or after the
loop, or to move the sig_ignored use after the loop. After the loop or in
the loop exit condition, you can't get there if the signal is blocked on T,
so for that test you can omit the blocked and ptrace checks that your
sig_ignored does.

I don't see any other problems with doing this instead of just the checks
inserted my previous patch.


Enjoy,
Roland

2003-02-12 21:13:44

by Linus Torvalds

[permalink] [raw]
Subject: Re: another subtle signals issue


On Wed, 12 Feb 2003, Roland McGrath wrote:
>
> Your patch as is won't fix the ignored-SIG_DFL-interrupts bug in the MT
> case. That is, in __group_send_sig_info if P blocks the signal but some
> other thread does not, then the that thread will get woken up and be
> subject to all those problems.

Yeah. There's another issue too, which is the "preferred thread" thing. We
should probably _prefer_ threads that are interruptible as opposed to
threads that are in disk wait, the same way we prefer threads that are not
stopped. It might improve throughput.

Linus

2003-02-12 21:36:57

by Linus Torvalds

[permalink] [raw]
Subject: Re: another subtle signals issue


On Wed, 12 Feb 2003, Roland McGrath wrote:
>
> This should be fine (almost). POSIX leaves it unspecified whether a
> blocked, ignored signal is left pending or not. The only thing it requires
> is that setting a blocked signal to SIG_IGN clears any pending signal, and
> sigaction already does that.

Hmm.. We could move the blocking test down, and only consider that for the
"SIG_DFL" case.

The function I did matches what the old signal code did, but the more
signals we can truly ignore, the better. I dunno.

Linus

2003-02-12 21:49:13

by Roland McGrath

[permalink] [raw]
Subject: Re: another subtle signals issue

> Yeah. There's another issue too, which is the "preferred thread" thing. We
> should probably _prefer_ threads that are interruptible as opposed to
> threads that are in disk wait, the same way we prefer threads that are not
> stopped. It might improve throughput.

I am really only concerned with the correctness issues, and don't have much
opinion on optimization choices like this. It think the tradeoffs on who
it give it to vs the complexity of the scan and such depend heavily on how
many threads you have and what they are doing.

2003-02-12 21:54:45

by Roland McGrath

[permalink] [raw]
Subject: Re: another subtle signals issue

> Hmm.. We could move the blocking test down, and only consider that for the
> "SIG_DFL" case.

That is the same resultant behavior as my last patch, though your patch
changed only as you've just described would leave the signal in the queue
though not wake anyone. That differs in practice only if someone calls
recalc_sigpending_tsk on some blocked threads for some reason.

> The function I did matches what the old signal code did, but the more
> signals we can truly ignore, the better. I dunno.

Once modified to work in the MT case the same as in the no-thread-groups case,
either way is fine by the spec and by me. Your call.