Hello Roland,
I'm seeing a strange behaviour with waitid() in 2.6.12-rc3
(I'm fairly sure the behaviour also appears in other 2.6.x).
If we establish a handler for a signal with SA_RESTART, and that
signal interrupts a blocked waitid(), then waitid() fails with
the error EINVAL. I would expect the call to be restarted
(like wait(), waitpid(), etc.) or if you are choosing to design
waitid() to ignore SA_RESTART, then fail with EINTR.
I haven't been able to spot the source of the EINVAL in kernel
or glibc sources.
The program below can be used to demonstrate the problem.
Cheers,
Michael
/* waitid_intr.c
Michael Kerrisk, May 2005
Demonstration of a problem with waitid() on Linux (Linux 2.6.12-rc3).
If S_A_RESTART was specified for a signal handler, and that signal
is delivered during a blocked waitid(), then the waitid() fails
with EINVAL, when it should either be restarted or (if the intent
is to ignore SA_RESTART in the waitid() implementation) fail
with EINTR. This program demonstrates the problem.
The program creates two children. The first is a child that sleeps
for argv[2] seconds and is then reaped by waitid() in the parent.
The second child sends a signal (SIG) to the parent after argv[3]
seconds.
argv[1] specifies options for the waitid() call (see the code
below).
If the optional argv[4] is present (as any string), then the
handler for the signal sent by the second child is established
with SA_RESTART.
To demonstrate the waitid() problem, use the following command:
./waitid_intr esc 10 2 restart
*/
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/time.h>
#include <sys/wait.h>
#include <string.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <syscall.h>
#include <unistd.h>
#include <errno.h>
#define errExit(msg) { perror(msg); exit(EXIT_FAILURE); }
#define fatalErr(msg) { fprintf(stderr, "%s\n", msg); \
exit(EXIT_FAILURE); }
static void
handler(int sig)
{
int savedErrno;
savedErrno = errno;
printf("Caught signal\n");
errno = savedErrno;
} /* handler */
#define SIG SIGUSR1
int
main(int argc, char *argv[])
{
siginfo_t info;
int options, s;
char *p;
pid_t pid;
struct rusage ru;
struct sigaction sa;
setbuf(stdout, NULL);
printf("Parent's PID is %ld\n", (long) getpid());
if (argc < 4) {
fprintf(stderr, "%s {escHW} child-secs sig-secs [sa_restart]\n",
argv[0]);
exit(EXIT_FAILURE);
}
options = 0;
for (p = argv[1]; *p != '\0'; p++) {
if (*p == 'e') options |= WEXITED;
else if (*p == 's') options |= WSTOPPED;
else if (*p == 'c') options |= WCONTINUED;
else if (*p == 'H') options |= WNOHANG;
else if (*p == 'W') options |= WNOWAIT;
else fatalErr("Bad option letter");
}
/* Create first child to waitid() for */
pid = fork();
if (pid == -1) errExit("fork");
if (pid == 0) {
printf("child (PID = %ld) started\n", (long) getpid());
sleep(atoi(argv[2]));
printf("child (PID = %ld) finished\n", (long) getpid());
exit(EXIT_SUCCESS);
}
/* Parent falls through */
/* Set up handler for signal sent by child 2 */
sa.sa_handler = handler;
sigemptyset(&sa.sa_mask);
sa.sa_flags = (argc > 4) ? SA_RESTART : 0;
if (sigaction(SIG, &sa, NULL) == -1) errExit("sigaction");
/* Create a second child that will send parent a signal */
switch (fork()) {
case -1:
errExit("fork-2");
case 0:
sleep(atoi(argv[3]));
printf("Sending signal to parent\n");
if (kill(getppid(), SIG) == -1) errExit("kill");
exit(EXIT_SUCCESS);
default:
break;
}
/* Parent now waits for the first child */
memset(&info, 0, sizeof(siginfo_t));
s = waitid(P_PID, pid, &info, options);
//s = syscall(SYS_waitid, P_ALL, 0, &info, options, &ru);
//s = syscall(SYS_waitpid, -1, NULL, 0);
if (s == -1) errExit("waitid");
printf("waitid() returned %d\n", s);
printf(" ");
printf("si_pid=%ld; ", (long) info.si_pid);
printf("si_uid=%ld; ", (long) info.si_uid);
printf("si_signo=%ld; ", (long) info.si_signo);
printf("\n");
printf(" ");
printf("si_status=%ld; ", (long) info.si_status);
printf("si_code=%ld ", (long) info.si_code);
printf("(%s); ",
(info.si_code == CLD_EXITED) ? "CLD_EXITED" :
(info.si_code == CLD_KILLED) ? "CLD_KILLED" :
(info.si_code == CLD_DUMPED) ? "CLD_DUMPED" :
(info.si_code == CLD_TRAPPED) ? "CLD_TRAPPED" :
(info.si_code == CLD_STOPPED) ? "CLD_STOPPED" :
(info.si_code == CLD_CONTINUED) ? "CLD_CONTINUED" :
"????");
printf("\n");
} /* main */
--
Weitersagen: GMX DSL-Flatrates mit Tempo-Garantie!
Ab 4,99 Euro/Monat: http://www.gmx.net/de/go/dsl
Your test works fine on FC4 kernels (which are pretty current),
but it does hit the problem on my vanilla build. strace output suggests
something is clobbering two registers:
waitid(P_PID, 1740, Sending signal to parent
0xbfc8b198, WEXITED|WSTOPPED|WCONTINUED, NULL) = ? ERESTARTSYS (To be restarted)
--- SIGUSR1 (User defined signal 1) @ 0 (0) ---
--- SIGCHLD (Child exited) @ 0 (0) ---
write(1, "Caught signal", 13Caught signal) = 13
write(1, "\n", 1
) = 1
sigreturn() = ? (mask now [])
waitid(0x6cc /* P_??? */, 14, 0xbfc8b198, 0, NULL) = -1 EINVAL (Invalid argument)
I will look into it further.
Thanks,
Roland
Other machines are subject to the same risk and should have a
prevent_tail_call definition too. The asm-i386/linkage.h version probably
works fine for every machine. It might as well be generic, I'd say.
We fixed this bug before, but it didn't take. It may have been the case
that the problem was first noticed to occur in a CONFIG_REGPARM compile.
But it's not regparm functions that need not to make tail calls, it's
asmlinkage functions called with a user pt_regs frame on the stack
supplying their arguments. prevent_tail_call probably doesn't do anything
at all in regparm functions (your argument registers are going to be
clobbered, period). It was a braino to conditionalize that definition in
the first place.
Signed-off-by: Roland McGrath <[email protected]>
---
commit 20da6762d8c1921800686ce31b1563e3ac24880d
tree cffaf6d5044bb44f13bb540d750f1e44562901e4
parent 3b54f47d661b933498f0709e5ce215d0f285e928
author Roland McGrath <[email protected]> Thu, 26 May 2005 15:19:26 -0700
committer Roland McGrath <[email protected]> Thu, 26 May 2005 15:19:26 -0700
asm-i386/linkage.h | 4 +---
1 files changed, 1 insertion(+), 3 deletions(-)
Index: include/asm-i386/linkage.h
===================================================================
--- a2dc8eec0e30ffa0f39d2f835e018ee86387ff61/include/asm-i386/linkage.h (mode:100644)
+++ cffaf6d5044bb44f13bb540d750f1e44562901e4/include/asm-i386/linkage.h (mode:100644)
@@ -5,9 +5,7 @@
#define FASTCALL(x) x __attribute__((regparm(3)))
#define fastcall __attribute__((regparm(3)))
-#ifdef CONFIG_REGPARM
-# define prevent_tail_call(ret) __asm__ ("" : "=r" (ret) : "0" (ret))
-#endif
+#define prevent_tail_call(ret) __asm__ ("" : "=r" (ret) : "0" (ret))
#ifdef CONFIG_X86_ALIGNMENT_16
#define __ALIGN .align 16,0x90
From: Roland McGrath <[email protected]>
Date: Thu, 26 May 2005 15:22:17 -0700
> Other machines are subject to the same risk and should have a
> prevent_tail_call definition too. The asm-i386/linkage.h version probably
> works fine for every machine. It might as well be generic, I'd say.
Sparc, for one, doesn't need it. We pass the pt_regs in via a pointer
to the trap level stack frame which won't be released by a tail-call
in C code.
On Thu, 26 May 2005, David S. Miller wrote:
>
> > Other machines are subject to the same risk and should have a
> > prevent_tail_call definition too. The asm-i386/linkage.h version probably
> > works fine for every machine. It might as well be generic, I'd say.
>
> Sparc, for one, doesn't need it. We pass the pt_regs in via a pointer
> to the trap level stack frame which won't be released by a tail-call
> in C code.
x86 has largely tried to move in that direction too, ie a lot of the
asm-calls have been turned into FASTCALL() with %eax pointing to the
stack.
Roland, I applied the patch, but if there was some particular case that
triggered this, maybe it's worth trying to re-write that one.
Linus
> x86 has largely tried to move in that direction too, ie a lot of the
> asm-calls have been turned into FASTCALL() with %eax pointing to the
> stack.
>
> Roland, I applied the patch, but if there was some particular case that
> triggered this, maybe it's worth trying to re-write that one.
It's a danger for any system call. Here it was sys_waitid.
On Thursday 26 May 2005 19:22, Roland McGrath wrote:
> > x86 has largely tried to move in that direction too, ie a lot of the
> > asm-calls have been turned into FASTCALL() with %eax pointing to the
> > stack.
> >
> > Roland, I applied the patch, but if there was some particular case that
> > triggered this, maybe it's worth trying to re-write that one.
>
> It's a danger for any system call. Here it was sys_waitid.
On X86_64 the given program fails with Operation Not Supported error on
waitid() - both 32bit and 64bit modes. The problem seems to be WCONTINUED -
is it unimplemented on X86_64?
You are using the test with an old glibc that doesn't know about the waitid
systme call. Use strace to see what system calls are actually being made.
On Thursday 26 May 2005 20:11, Roland McGrath wrote:
> You are using the test with an old glibc that doesn't know about the waitid
> systme call. Use strace to see what system calls are actually being made.
I am using the current version of glibc - 2.3.4 - and man pages seem to know
about the waitid sys call and WCONTINUED(Since Linux 2.6.10). Also waitid
succeeds if I remove WCONTINUED flag, so it would seem that WCONTINUED is
not supported on X86_64 - I am running a 2.6.11-gentoo kernel.
Here is the strace output -
--------------------------------------------------------------------------
strace ./a.out esc 10 2 restart
execve("./a.out", ["./a.out", "esc", "10", "2", "restart"], [/* 51 vars */]) =
0
uname({sys="Linux", node="tux-gentoo", ...}) = 0
brk(0) = 0x502000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0x2aaaaaac0000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or
directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=91598, ...}) = 0
mmap(NULL, 91598, PROT_READ, MAP_PRIVATE, 3, 0) = 0x2aaaaaac1000
close(3) = 0
open("/lib/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0`\310\1\0"..., 640) =
640
lseek(3, 64, SEEK_SET) = 64
read(3, "\6\0\0\0\5\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0"..., 616) =
616
lseek(3, 680, SEEK_SET) = 680
read(3, "\4\0\0\0\20\0\0\0\1\0\0\0GNU\0\0\0\0\0\2\0\0\0\6\0\0\0"..., 32) = 32
fstat(3, {st_mode=S_IFREG|0755, st_size=1270208, ...}) = 0
lseek(3, 64, SEEK_SET) = 64
read(3, "\6\0\0\0\5\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0"..., 616) =
616
mmap(NULL, 2248808, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) =
0x2aaaaabc1000
mprotect(0x2aaaaacdd000, 1085544, PROT_NONE) = 0
mmap(0x2aaaaadc1000, 135168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|
MAP_DENYWRITE, 3, 0x100000) = 0x2aaaaadc1000
mmap(0x2aaaaade2000, 16488, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|
MAP_ANONYMOUS, -1, 0) = 0x2aaaaade2000
close(3) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0x2aaaaade7000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0x2aaaaade8000
mprotect(0x2aaaaaddc000, 12288, PROT_READ) = 0
arch_prctl(ARCH_SET_FS, 0x2aaaaade7ae0) = 0
munmap(0x2aaaaaac1000, 91598) = 0
open("/dev/urandom", O_RDONLY) = 3
read(3, ")\371\226%\260\224\'\351", 8) = 8
close(3) = 0
getpid() = 7682
write(1, "Parent\'s PID is 7682\n", 21Parent's PID is 7682
) = 21
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0x2aaaaade7b70) = 7683
child (PID = 7683) started
rt_sigaction(SIGUSR1, {0x400aa0, [], SA_RESTORER|SA_RESTART, 0x2aaaaabefca0},
NULL, 8) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0x2aaaaade7b70) = 7684
write(1, "Error Code from waitid - -1\n", 28Error Code from waitid - -1
) = 28
dup(2) = 3
fcntl(3, F_GETFL) = 0x8002 (flags O_RDWR|O_LARGEFILE)
brk(0) = 0x502000
brk(0x523000) = 0x523000
fstat(3, {st_mode=S_IFCHR|0600, st_rdev=makedev(136, 3), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0x2aaaaaac1000
lseek(3, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
write(3, "waitid: Operation not supported\n", 32waitid: Operation not
supported
) = 32
close(3) = 0
munmap(0x2aaaaaac1000, 4096) = 0
exit_group(1)
--
Question: Is it better to abide by the rules until they're changed or
help speed the change by breaking them?
On Thursday 26 May 2005 20:37, Parag Warudkar wrote:
With WCONTINUED - there is no waitid in the strace output -
> clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
> child_tidptr=0x2aaaaade7b70) = 7683
> child (PID = 7683) started
> rt_sigaction(SIGUSR1, {0x400aa0, [], SA_RESTORER|SA_RESTART,
> 0x2aaaaabefca0}, NULL, 8) = 0
> clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
> child_tidptr=0x2aaaaade7b70) = 7684
> write(1, "Error Code from waitid - -1\n", 28Error Code from waitid - -1
> ) = 28
Without WCONTINUED -
>clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0x2aaaaade7b70) = 7898
>child (PID = 7898) started
>rt_sigaction(SIGUSR1, {0x400aa0, [], SA_RESTORER|SA_RESTART, 0x2aaaaabefca0},
>NULL, 8) = 0
>clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0x2aaaaade7b70) = 7899
>wait4(7898, Sending signal to parent
>0x7fffffffeecc, WSTOPPED, NULL) = ? ERESTARTSYS (To be restarted)
>--- SIGUSR1 (User defined signal 1) @ 0 (0) ---
>--- SIGCHLD (Child exited) @ 0 (0) ---
>write(1, "Caught signal", 13Caught signal) = 13
>write(1, "\n", 1
>) = 1
>rt_sigreturn(0x1) = 61
It's of course beyond me why this difference - strace output should have had
mention of wait4(...) = ENOTSUP (or whatever) when WCONTINUED is used.
WCONTINUED is declared in include files and mentioned as supported in the man
pages.
Parag
Ok.. Nothing wrong with the kernel - I used the syscall(...) macro with 247 as
the syscall number (__NR_waitid) the program works as expected. My version of
glibc might not actually support WCONTINUED.
Thanks
Parag
> On Thu, 26 May 2005, David S. Miller wrote:
> >
> > > Other machines are subject to the same risk and should have a
> > > prevent_tail_call definition too. The asm-i386/linkage.h version
> > > probably
> > > works fine for every machine. It might as well be generic, I'd say.
> >
> > Sparc, for one, doesn't need it. We pass the pt_regs in via a pointer
> > to the trap level stack frame which won't be released by a tail-call
> > in C code.
>
> x86 has largely tried to move in that direction too, ie a lot of the
> asm-calls have been turned into FASTCALL() with %eax pointing to the
> stack.
>
> Roland, I applied the patch, but if there was some particular case that
> triggered this, maybe it's worth trying to re-write that one.
Did I miss something. Did this patch come through the
list? (I didn't see it.)
Cheers,
Michael
--
5 GB Mailbox, 50 FreeSMS http://www.gmx.net/de/go/promail
+++ GMX - die erste Adresse f?r Mail, Message, More +++