2007-12-02 15:14:51

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH 0/2] fix the long standing exec vs kill race

Depends on
[PATCH] __group_complete_signal: fix coredump with group stop race
http://marc.info/?l=linux-kernel&m=119653436116036

Needs review and testing.

Please comment, I think at least the idea is promising.

Oleg.


2007-12-02 16:59:19

by Simon Holm Thøgersen

[permalink] [raw]
Subject: Re: [PATCH 0/2] fix the long standing exec vs kill race


søn, 02 12 2007 kl. 18:14 +0300, skrev Oleg Nesterov:
> Depends on
> [PATCH] __group_complete_signal: fix coredump with group stop race
> http://marc.info/?l=linux-kernel&m=119653436116036
>
> Needs review and testing.
>
> Please comment, I think at least the idea is promising.
>
I have an issue that sounds related, but I might be completely off. I
would expect the simple attached program to keep receiving the same
signal, i.e. respond to
killall signal-exec -s SIGHUP

I tried your patches, but they didn't help.

Any ideas?


Simon Holm Thøgersen


Attachments:
signal-exec.c (469.00 B)

2007-12-02 17:17:49

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 0/2] fix the long standing exec vs kill race

On 12/02, Simon Holm Th?gersen wrote:
>
> s??n, 02 12 2007 kl. 18:14 +0300, skrev Oleg Nesterov:
> >
> > Please comment, I think at least the idea is promising.
> >
> I have an issue that sounds related, but I might be completely off. I
> would expect the simple attached program to keep receiving the same
> signal, i.e. respond to
> killall signal-exec -s SIGHUP
>
> I tried your patches, but they didn't help.
>
> Any ideas?
>
>
> Simon Holm Th??gersen

> #include <signal.h>
> #include <stdio.h>
> #include <unistd.h>
>
> static char **argv_;
>
> static void handler(int signal)
> {
> printf("got signal %d\n", signal);
> execv(argv_[0], argv_);
> }
>
> int main(int argc, char *argv[])
> {
> printf("spawned\n");
> argv_ = argv;
> if (signal(SIGTERM, handler) == SIG_ERR)
> err(1, "could not set signal handler for SIGTERM");
> if (signal(SIGHUP, handler) == SIG_ERR)
> err(1, "could not set signal handler for SIGTERM");
> sleep(60);
> return 0;
> }
>

I think this is another issue which should be solved (?).

exec() from the signal handler doesn't do sys_sigreturn(), so we don't unblock
the signal, and it remains blocked after exec().

Hmm. Is this linux bug, or application bug?

Oleg.

2007-12-02 18:01:10

by Simon Holm Thøgersen

[permalink] [raw]
Subject: Re: [PATCH 0/2] fix the long standing exec vs kill race


søn, 02 12 2007 kl. 20:18 +0300, skrev Oleg Nesterov:
> On 12/02, Simon Holm Th?gersen wrote:
> >
> > s??n, 02 12 2007 kl. 18:14 +0300, skrev Oleg Nesterov:
> > >
> > > Please comment, I think at least the idea is promising.
> > >
> > I have an issue that sounds related, but I might be completely off. I
> > would expect the simple attached program to keep receiving the same
> > signal, i.e. respond to
> > killall signal-exec -s SIGHUP
> >
> > I tried your patches, but they didn't help.
> >
> > Any ideas?
> >
> >
> > Simon Holm Th??gersen
>
> > #include <signal.h>
> > #include <stdio.h>
> > #include <unistd.h>
> >
> > static char **argv_;
> >
> > static void handler(int signal)
> > {
> > printf("got signal %d\n", signal);
> > execv(argv_[0], argv_);
> > }
> >
> > int main(int argc, char *argv[])
> > {
> > printf("spawned\n");
> > argv_ = argv;
> > if (signal(SIGTERM, handler) == SIG_ERR)
> > err(1, "could not set signal handler for SIGTERM");
> > if (signal(SIGHUP, handler) == SIG_ERR)
> > err(1, "could not set signal handler for SIGTERM");
> > sleep(60);
> > return 0;
> > }
> >
>
> I think this is another issue which should be solved (?).
>
> exec() from the signal handler doesn't do sys_sigreturn(), so we don't unblock
> the signal, and it remains blocked after exec().
>
> Hmm. Is this linux bug, or application bug?

Good question. I haven't been able to find something in the
documentation for execve(2) and signal(2) saying it shouldn't be
possible, and it works on Solaris 10, so I'd say it is a Linux bug.
Actually, having another look at the documentation, signal(7) mentions
that POSIX.1-2003 requires that execve is safe to call from inside a
signal handler.


Simon Holm Thøgersen

2007-12-02 18:53:56

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 0/2] fix the long standing exec vs kill race

On 12/02, Simon Holm Th?gersen wrote:
>
> s??n, 02 12 2007 kl. 20:18 +0300, skrev Oleg Nesterov:
> > On 12/02, Simon Holm Th?gersen wrote:
> > >
> > > I have an issue that sounds related, but I might be completely off. I
> > > would expect the simple attached program to keep receiving the same
> > > signal, i.e. respond to
> > > killall signal-exec -s SIGHUP
> > >
> > > I tried your patches, but they didn't help.
> > >
> > > Any ideas?
> > >
> > >
> > > Simon Holm Th??gersen
> >
> > > #include <signal.h>
> > > #include <stdio.h>
> > > #include <unistd.h>
> > >
> > > static char **argv_;
> > >
> > > static void handler(int signal)
> > > {
> > > printf("got signal %d\n", signal);
> > > execv(argv_[0], argv_);
> > > }
> > >
> > > int main(int argc, char *argv[])
> > > {
> > > printf("spawned\n");
> > > argv_ = argv;
> > > if (signal(SIGTERM, handler) == SIG_ERR)
> > > err(1, "could not set signal handler for SIGTERM");
> > > if (signal(SIGHUP, handler) == SIG_ERR)
> > > err(1, "could not set signal handler for SIGTERM");
> > > sleep(60);
> > > return 0;
> > > }
> > >
> >
> > I think this is another issue which should be solved (?).
> >
> > exec() from the signal handler doesn't do sys_sigreturn(), so we don't unblock
> > the signal, and it remains blocked after exec().
> >
> > Hmm. Is this linux bug, or application bug?
>
> Good question. I haven't been able to find something in the
> documentation for execve(2) and signal(2) saying it shouldn't be
> possible, and it works on Solaris 10, so I'd say it is a Linux bug.

Well, as I said, I don't know what would be the right behaviour,

> Actually, having another look at the documentation, signal(7) mentions
> that POSIX.1-2003 requires that execve is safe to call from inside a
> signal handler.

... but this doesn't look very clear to me.

- Linux can perfectly exec from inside a signal handler

- the application should know that the signal is blocked when the handler runs

- exec should preserve the ->blocked mask

So, is this really buggy? Do we break the "execve should be signal-safe" rule?
I don't know, but our CC: list is good ;)

Oleg.

2007-12-02 20:27:45

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 0/2] fix the long standing exec vs kill race



On Sun, 2 Dec 2007, Oleg Nesterov wrote:
>
> exec() from the signal handler doesn't do sys_sigreturn(), so we don't unblock
> the signal, and it remains blocked after exec().
>
> Hmm. Is this linux bug, or application bug?

I think that's an application bug.

The kernel does the obvious (and required) thing: it preserves the
list of blocked signals over the execve(). And if you call execve() from
within a signal handler, that list of blocked signals will obviously
include the signals that got blocked by the execution of the signal
itself.

(Side note: I also suspect that the program is not strictly POSIX
conforming, and that execve() isn't in the list of functions that are safe
to call from a signal handler in the first place, but that's a totally
separate issue).

So if havign the signal blocked isn't what the application wants, I'd
suggest one of:
- just set the signal mask by hand to whatever mask you want (perhaps
also marking the signal handler with SIGIGN or SIGDFL or whatever)
- alternatively, if you control the program being execve'd, just do it in
that progam instead.
- use siglongjmp in the signal handler to get out of the signal handler
context and do it that way.
- use a "sigatomic_t" flag, set it in the signal handler, and then do the
execve() in the main loop if it's set.

The last one is the safest one in many ways (since it doesn't care if you
get a hundred of those signals in close succession - and you could also
make it a counter or something if you want to actually count those
things).

Linus

2007-12-03 16:38:50

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 0/2] fix the long standing exec vs kill race



On Sun, 2 Dec 2007, Oleg Nesterov wrote:
>
> Depends on
> [PATCH] __group_complete_signal: fix coredump with group stop race
> http://marc.info/?l=linux-kernel&m=119653436116036
>
> Needs review and testing.
>
> Please comment, I think at least the idea is promising.

It looks clean and sane to me, but I'm currently more worried about
2.6.24, and even the first patch this depends on (coredump/stop race)
makes me a bit nervous since all these things tend to have some rather
subtle interactions with other parts that depended on the exact semantics
of all the signal issues.

So my gut feel - considering that none of the problems involved here are
exactly new - is that this is good material for early in the 2.6.25 cycle.

But I think the whole series looks ok, and if people press me and convince
me it's (a) well tested and (b) needed early, then I guess it can be
pushed into 2.6.24.

Anybody?

Linus

2007-12-03 17:41:06

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 0/2] fix the long standing exec vs kill race

On 12/03, Linus Torvalds wrote:
>
> On Sun, 2 Dec 2007, Oleg Nesterov wrote:
> >
> > Depends on
> > [PATCH] __group_complete_signal: fix coredump with group stop race
> > http://marc.info/?l=linux-kernel&m=119653436116036
> >
> > Needs review and testing.
> >
> > Please comment, I think at least the idea is promising.
>
> It looks clean and sane to me, but I'm currently more worried about
> 2.6.24, and even the first patch this depends on (coredump/stop race)
> makes me a bit nervous since all these things tend to have some rather
> subtle interactions with other parts that depended on the exact semantics
> of all the signal issues.
>
> So my gut feel - considering that none of the problems involved here are
> exactly new - is that this is good material for early in the 2.6.25 cycle.
>
> But I think the whole series looks ok, and if people press me and convince
> me it's (a) well tested and (b) needed early, then I guess it can be
> pushed into 2.6.24.

No, no, I don't think this should be pushed into 2.6.24 (even the first patch).

These problems are very old afaics, and nobody complained so far.

Even if correct, this needs more testing. I don't think this can break exec
or coredump in some "obvious" way, but I'm afraid this can introduce new
races / corner cases.

<offtopic>

I hope that with the new meaning of ->group_exit_task we can re-introduce the
"coredump signal "freezes" the thread group at sender's side" property, but we
need some hack to do this. OTOH, it was always a hack.

</offtopic>

Oleg.