2006-03-05 01:26:14

by Matthew Grant

Subject: PROBLEM: rt_sigsuspend() does not return EINTR on 2.6.16-rc2+

Problem is that new sys_rt_sigsuspend in kernel/signal.c in 2.6.16-rc2+
does not return EINTR.

System: Ubuntu i386 breezy badger.
Arch: i386
Kernel version: 2.6.16-rc5, gcc 3.3

This breaks the removable media handling of Nautilus on Ubuntu Breezy
Badger, GNOME 2.12. Drives mount, but mount status tracking in GNOME is
broken, and they are not shown as mounted. What is going on is that
rt_sigsuspend() gets called as part of the external call to pmount-hal
to mount the device. The reads of the FIFOs and sockets between hald
and Nautilus are not retried, as rt_sigsuspend(2) does not return
EINTR.


Here is the strace output for 2.6.16-rc5 for nautilus when trying to
mount a drive (problem happening):


read(3, "\1\2m$\0\0\0\0]\0 \1\4\0\0\0\0\0\0\0$\344\1\0000\375\246"..., 32) = 32
access("/usr/bin/pmount-hal", X_OK) = 0
getuid32() = 1000
rt_sigprocmask(SIG_SETMASK, NULL, [RTMIN], 8) = 0
write(20, "`\230\217@\0\0\0\0\0\0\0\0\362\372i@\370}W\10\0\0\0\200"..., 148) = 148
rt_sigprocmask(SIG_SETMASK, NULL, [RTMIN], 8) = 0
rt_sigsuspend([]) = ? ERESTARTNOHAND (To be restarted)
--- SIGRTMIN (Unknown signal 32) @ 0 (0) ---
sigreturn() = ? (mask now [RTMIN])
gettimeofday({1141503088, 150962}, NULL) = 0
poll([{fd=4, events=POLLIN}, {fd=3, events=POLLIN, revents=POLLIN}, {fd=8, events=POLLIN|POLLPRI}, {fd=10, events=POLLIN|POLLPRI}, {fd=14, events=POLLIN}], 5, 0) = 1


Comparative strace output under kernel 2.6.15.4:

poll([{fd=4, events=POLLIN}, {fd=3, events=POLLIN}, {fd=8, events=POLLIN|POLLPRI}, {fd=10, events=POLLIN|POLLPRI}, {fd=14, events=POLLIN}], 5, 0) = 0
gettimeofday({1141488494, 765282}, NULL) = 0
access("/usr/bin/pmount-hal", X_OK) = 0
getuid32() = 1000
rt_sigprocmask(SIG_SETMASK, NULL, [RTMIN], 8) = 0
write(20, "`\230\217@\0\0\0\0\0\0\0\0\362\372i@@\2U\10\0\0\0\200\0"..., 148) = 148
rt_sigprocmask(SIG_SETMASK, NULL, [RTMIN], 8) = 0
rt_sigsuspend([] <unfinished ...>
--- SIGRTMIN (Unknown signal 32) @ 0 (0) ---
<... rt_sigsuspend resumed> ) = -1 EINTR (Interrupted system call)
sigreturn() = ? (mask now [RTMIN])
gettimeofday({1141488494, 778060}, NULL) = 0
poll([{fd=4, events=POLLIN}, {fd=3, events=POLLIN}, {fd=8, events=POLLIN|POLLPRI}, {fd=10, events=POLLIN|POLLPRI}, {fd=14, events=POLLIN}], 5, 0) = 0
write(3, "*\2\3\0]\0 \1\344r\2\0+\0\1\0\22\0\7\0\\\0 \1\2\1\0\0\6"..., 44) = 44
ioctl(3, FIONREAD, [160]) =

I think David Woodhouse may be responsible for this....

Regards,

Matthew Grant


--
Matthew Grant <[email protected]>
Matthew's UNIX Box



2006-03-05 11:24:31

by David Woodhouse

Subject: Re: PROBLEM: rt_sigsuspend() does not return EINTR on 2.6.16-rc2+

On Sun, 2006-03-05 at 14:26 +1300, Matthew Grant wrote:
> Problem is that new sys_rt_sigsuspend in kernel/signal.c in 2.6.16-rc2+
> does not return EINTR.

It does for me -- try the trivial test case at
http://david.woodhou.se/sigsusptest.c

If you strace that under old and new kernels you'll see a difference in
the strace output, but it should be entirely cosmetic. The old code
would incestuously call do_signal() inside sys_rt_sigsuspend(), and
would never need to use the mechanism we have for restarting system
calls. Either it would know it delivered a signal and it would return
-EINTR, or it would know that it _didn't_, and it would loop for itself.
Now it behaves like all the other restartable syscalls, and ptrace will
actually see the -ERESTARTNOHAND return code which later gets converted
by the signal code either to -EINTR or to an actual restart, as
appropriate.

In short, I think what you've picked up on in the strace output is
entirely cosmetic, and shouldn't affect the behaviour of the program in
any way. In each case, it comes back from the signal and goes
immediately into gettimeofday() and then poll() -- it _has_ come out of
the sigsuspend(). You then find that poll() gives different results in
each case, and I'd be inclined to suspect that the _real_ change in
behaviour goes from that point.

> I think David Woodhouse may be responsible for this....

I read lkml sporadically; usually better to Cc me when I'm to blame :)

--
dwmw2

2006-03-06 08:27:56

by Matthew Grant

Subject: Re: PROBLEM: rt_sigsuspend() does not return EINTR on 2.6.16-rc2+

David,

OK, a major piece of software is broken for mounting removable media. A
kernel upgrade from 2.6.15 SHOULDn't do that.

Could you please tell me where I should go next?

Thanks and Cheers,

Matthew Grant

On Sun, 2006-03-05 at 11:24 +0000, David Woodhouse wrote:
> On Sun, 2006-03-05 at 14:26 +1300, Matthew Grant wrote:
> > Problem is that new sys_rt_sigsuspend in kernel/signal.c in 2.6.16-rc2+
> > does not return EINTR.
>
> It does for me -- try the trivial test case at
> http://david.woodhou.se/sigsusptest.c
>
> If you strace that under old and new kernels you'll see a difference in
> the strace output, but it should be entirely cosmetic. The old code
> would incestuously call do_signal() inside sys_rt_sigsuspend(), and
> would never need to use the mechanism we have for restarting system
> calls. Either it would know it delivered a signal and it would return
> -EINTR, or it would know that it _didn't_, and it would loop for itself.
> Now it behaves like all the other restartable syscalls, and ptrace will
> actually see the -ERESTARTNOHAND return code which later gets converted
> by the signal code either to -EINTR or to an actual restart, as
> appropriate.
>
> In short, I think what you've picked up on in the strace output is
> entirely cosmetic, and shouldn't affect the behaviour of the program in
> any way. In each case, it comes back from the signal and goes
> immediately into gettimeofday() and then poll() -- it _has_ come out of
> the sigsuspend(). You then find that poll() gives different results in
> each case, and I'd be inclined to suspect that the _real_ change in
> behaviour goes from that point.
>
> > I think David Woodhouse may be responsible for this....
>
> I read lkml sporadically; usually better to Cc me when I'm to blame :)
>
--
Matthew Grant <[email protected]>
Matthew's UNIX Box



2006-03-06 10:47:03

by Andrew Morton

Subject: Re: PROBLEM: rt_sigsuspend() does not return EINTR on 2.6.16-rc2+

Matthew Grant <[email protected]> wrote:
>
> OK, a major piece of software is broken for mounting removable media. A
> kernel upgrade from 2.6.15 SHOULDn't do that.

Yes, this is a serious problem and we need to get to the bottom of it.

Could you please describe, in super-simple steps, how one should go about
reproducing it?

Also, the entire strace output for both good and bad kernels might be
useful. Am wondering what your fd #4 refers to.

A socket, I guess. It _might_ not be a signal or poll problem at all (but
it probably is, given the track record of those patches).

Do you have time to do a git-bisect?

2006-03-06 11:25:33

by Paul Mackerras

Subject: Re: PROBLEM: rt_sigsuspend() does not return EINTR on 2.6.16-rc2+

Andrew Morton writes:

> Matthew Grant <[email protected]> wrote:
> >
> > OK, a major piece of software is broken for mounting removable media. A
> > kernel upgrade from 2.6.15 SHOULDn't do that.
>
> Yes, this is a serious problem and we need to get to the bottom of it.

I have been looking at the equivalent code on powerpc. This looks to
me like we aren't calling do_signal on the way back out of the system
call to userspace on x86 when the _TIF_RESTORE_SIGMASK thread_info
flag is set. I had a look at the code in arch/i386/kernel and I can't
see why we wouldn't be getting to do_signal, but x86 assembly is not
my strong point.

The fact that userspace is seeing the -ERESTARTNOHAND return value
from the rt_sigsuspend strongly suggests that we aren't actually
calling do_signal, though.

Paul.

2006-03-06 11:30:21

by David Woodhouse

Subject: Re: PROBLEM: rt_sigsuspend() does not return EINTR on 2.6.16-rc2+

On Mon, 2006-03-06 at 22:25 +1100, Paul Mackerras wrote:
> I have been looking at the equivalent code on powerpc. This looks to
> me like we aren't calling do_signal on the way back out of the system
> call to userspace on x86 when the _TIF_RESTORE_SIGMASK thread_info
> flag is set. I had a look at the code in arch/i386/kernel and I can't
> see why we wouldn't be getting to do_signal, but x86 assembly is not
> my strong point.

You can't see it because it's not in assembly -- it's in
do_notify_resume() in signal.c :)

> The fact that userspace is seeing the -ERESTARTNOHAND return value
> from the rt_sigsuspend strongly suggests that we aren't actually
> calling do_signal, though.

That's just because we do the ptrace stop _before_ do_signal now, just
as we already did for all other restartable syscalls. The signal really
is happening -- you can see the sigreturn().

--
dwmw2

2006-03-06 11:31:11

by David Woodhouse

Subject: Re: PROBLEM: rt_sigsuspend() does not return EINTR on 2.6.16-rc2+

On Mon, 2006-03-06 at 21:28 +1300, Matthew Grant wrote:
> OK, a major piece of software is broken for mounting removable media.
> A kernel upgrade from 2.6.15 SHOULDn't do that.
>
> Could you please tell me where I should go next?

Your approach seemed fine -- strace it, look for important differences.
You were quite right to pick up on the difference in sigsuspend()
behaviour, but in reality it's going to be a false positive, so you
should keep looking for other differences.

And as Andrew says, give us enough information to reproduce for
ourselves -- and describe the real problem, because 'rt_sigsuspend()
does not return -EINTR' isn't it, although it was a reasonable enough
assumption on your part.

--
dwmw2