2009-01-13 22:38:32

by Michael Kerrisk

[permalink] [raw]
Subject: waitid() return value strangeness when infop is NULL

[Resend with right LKML address.]

Hello Roland,

Vegard sent me a report of a possible bug in the waitid(2) man page,
but on closer examination, the issue seems to be in the waitid()
system call.

POSIX.1 says of waitid():

RETURN VALUE
If WNOHANG was specified and there are no children to wait for,
0 shall be returned. If waitid() returns due to the change of
state of one of its children, 0 shall be returned. Otherwise,
-1 shall be returned and errno set to indicate the error.

The Linux waitid() follows this behavior, as long as infop is not
NULL, which is what POSIX requires:

The application shall ensure that the infop argument points to
a siginfo_t structure.

However, Linux's waitid() does permit infop to be NULL. But in this
case, a successful call returns the PID of the waited-for child.
Thinking about it, I suppose there is a rationale for doing this,
since otherwise the caller of waitid() has no way of obtaining the PID
of the waited-for child. Anyway, this behavior should either be
documented, or "fixed" (i.e., change waitid() to return 0 on success
even if infop is NULL). So, some questions:

a) Is it intentional to support the infop==NULL case for waitid() --
rather than simply giving an error return?

b) If so, what is the rationale for supporting this feature?

c) If the the answer a) is "yes", is it intentional for
waitid(idtype,id,NULL,options) to return the PID of the child, rather
than returning 0?

I'll post a test program in a follow-up message.

Cheers,

Michael



--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
git://git.kernel.org/pub/scm/docs/man-pages/man-pages.git
man-pages online: http://www.kernel.org/doc/man-pages/online_pages.html
Found a bug? http://www.kernel.org/doc/man-pages/reporting_bugs.html


2009-01-13 22:43:25

by Michael Kerrisk

[permalink] [raw]
Subject: Re: waitid() return value strangeness when infop is NULL

Test program below. Two test runs, of which the second shows the
behavior I'm reporting.

$ ./a.out
Created child with PID 7727
infop = 0xbfe608e0
waitid() returned 0
$ ./a.out n
Created child with PID 7729
infop = (nil)
waitid() returned 7729

===

#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <signal.h>
#include <unistd.h>

int
main(int argc, char *argv[])
{
siginfo_t info, *infop;
int options, idtype;
long s;
pid_t child;

child = fork();
if (child == 0)
exit(0);
printf("Created child with PID %ld\n", (long) child);

idtype = P_PID;
options = WEXITED;
infop = (argc > 1 && strchr(argv[1], 'n') != NULL) ? NULL : &info;

printf("infop = %p\n", infop);

s = syscall(SYS_waitid, idtype, child, infop, options, NULL);
if (s == -1) {
perror("waitid");
exit(EXIT_FAILURE);
}

printf("waitid() returned %ld\n", s);

exit(EXIT_SUCCESS);
}

2009-01-13 22:48:21

by Roland McGrath

[permalink] [raw]
Subject: Re: waitid() return value strangeness when infop is NULL

> a) Is it intentional to support the infop==NULL case for waitid() --
> rather than simply giving an error return?

No. It's just an artifact of the internals shared with wait4/waitpid.
Patch to follow.


Thanks,
Roland

2009-01-13 22:50:33

by Roland McGrath

[permalink] [raw]
Subject: [PATCH] sys_waitid: return -EFAULT for NULL

It's always been invalid to call waitid() with a NULL pointer. It was an
oversight that it was allowed (and acts like a wait4() call instead).

Signed-off-by: Roland McGrath <[email protected]>
---
kernel/exit.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index c7740fa..fa25790 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1760,6 +1760,8 @@ asmlinkage long sys_waitid(int which, pid_t upid,
enum pid_type type;
long ret;

+ if (unlikely(!infop))
+ return -EFAULT;
if (options & ~(WNOHANG|WNOWAIT|WEXITED|WSTOPPED|WCONTINUED))
return -EINVAL;
if (!(options & (WEXITED|WSTOPPED|WCONTINUED)))

2009-01-13 23:14:53

by Michael Kerrisk

[permalink] [raw]
Subject: Re: [PATCH] sys_waitid: return -EFAULT for NULL

On Wed, Jan 14, 2009 at 11:49 AM, Roland McGrath <[email protected]> wrote:
> It's always been invalid to call waitid() with a NULL pointer. It was an
> oversight that it was allowed (and acts like a wait4() call instead).
>
> Signed-off-by: Roland McGrath <[email protected]>

Modulo the observation that this change will break any Linux-specific
application that violate POSIX.1's requirement that infop not be NULL
(*), and rely on the existing Linux behavior for
waitd(idtype,id,NULL,options):

Acked-by: Michael Kerrisk <[email protected]>

(*) It seems unlikely that such applications exist, and we really
should make this change for POSIX.1 conformance.

> ---
> kernel/exit.c | 2 ++
> 1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/kernel/exit.c b/kernel/exit.c
> index c7740fa..fa25790 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -1760,6 +1760,8 @@ asmlinkage long sys_waitid(int which, pid_t upid,
> enum pid_type type;
> long ret;
>
> + if (unlikely(!infop))
> + return -EFAULT;
> if (options & ~(WNOHANG|WNOWAIT|WEXITED|WSTOPPED|WCONTINUED))
> return -EINVAL;
> if (!(options & (WEXITED|WSTOPPED|WCONTINUED)))
>



--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
git://git.kernel.org/pub/scm/docs/man-pages/man-pages.git
man-pages online: http://www.kernel.org/doc/man-pages/online_pages.html
Found a bug? http://www.kernel.org/doc/man-pages/reporting_bugs.html

2009-01-13 23:26:13

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] sys_waitid: return -EFAULT for NULL

On Wed, 14 Jan 2009 12:14:20 +1300
"Michael Kerrisk" <[email protected]> wrote:

> On Wed, Jan 14, 2009 at 11:49 AM, Roland McGrath <[email protected]> wrote:
> > It's always been invalid to call waitid() with a NULL pointer. It was an
> > oversight that it was allowed (and acts like a wait4() call instead).
> >
> > Signed-off-by: Roland McGrath <[email protected]>
>
> Modulo the observation that this change will break any Linux-specific
> application that violate POSIX.1's requirement that infop not be NULL
> (*), and rely on the existing Linux behavior for
> waitd(idtype,id,NULL,options):
>

Well yes. waitid() has been in there since 2.6.9. I assume that it
has had this wait4-emulation mode for that amount of time as well?

>
> (*) It seems unlikely that such applications exist, and we really
> should make this change for POSIX.1 conformance.

Well, we might get away with it. But formally speaking, we should live
with our Linux-specific screwup.

If we _are_ going to make this change then we should merge it as far
back in -stable as we can, to reduce the risk that people will develop
applications on kernel version A only to have then behave differently
on version B.

2009-01-13 23:36:59

by Michael Kerrisk

[permalink] [raw]
Subject: Re: [PATCH] sys_waitid: return -EFAULT for NULL

On Wed, Jan 14, 2009 at 12:24 PM, Andrew Morton
<[email protected]> wrote:
> On Wed, 14 Jan 2009 12:14:20 +1300
> "Michael Kerrisk" <[email protected]> wrote:
>
>> On Wed, Jan 14, 2009 at 11:49 AM, Roland McGrath <[email protected]> wrote:
>> > It's always been invalid to call waitid() with a NULL pointer. It was an
>> > oversight that it was allowed (and acts like a wait4() call instead).
>> >
>> > Signed-off-by: Roland McGrath <[email protected]>
>>
>> Modulo the observation that this change will break any Linux-specific
>> application that violate POSIX.1's requirement that infop not be NULL
>> (*), and rely on the existing Linux behavior for
>> waitd(idtype,id,NULL,options):
>>
>
> Well yes. waitid() has been in there since 2.6.9. I assume that it
> has had this wait4-emulation mode for that amount of time as well?

AFIACS, yes.

>> (*) It seems unlikely that such applications exist, and we really
>> should make this change for POSIX.1 conformance.
>
> Well, we might get away with it. But formally speaking, we should live
> with our Linux-specific screwup.

Yes, tricky. On the one hand, we shouldn't break the ABI. On the
other hand, POSIX.1 is explicit in disallowing the case that would
lead the ABI change made by this patch. (For what it is worth, the
man page was released at pretty much the same time as the syscall, and
has always documented that the return value on success was 0, and
Vegard was the first person to report this case that deviated from the
doc.)

> If we _are_ going to make this change then we should merge it as far
> back in -stable as we can, to reduce the risk that people will develop
> applications on kernel version A only to have then behave differently
> on version B.

Ack.

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
git://git.kernel.org/pub/scm/docs/man-pages/man-pages.git
man-pages online: http://www.kernel.org/doc/man-pages/online_pages.html
Found a bug? http://www.kernel.org/doc/man-pages/reporting_bugs.html

2009-01-13 23:48:05

by Roland McGrath

[permalink] [raw]
Subject: Re: [PATCH] sys_waitid: return -EFAULT for NULL

> Well yes. waitid() has been in there since 2.6.9. I assume that it
> has had this wait4-emulation mode for that amount of time as well?

Correct.

> Well, we might get away with it. But formally speaking, we should live
> with our Linux-specific screwup.

I agree with both of you and I don't care which we do.


Thanks,
Roland

2009-01-14 00:15:11

by Michael Kerrisk

[permalink] [raw]
Subject: Re: [PATCH] sys_waitid: return -EFAULT for NULL

On Wed, Jan 14, 2009 at 12:47 PM, Roland McGrath <[email protected]> wrote:
>> Well yes. waitid() has been in there since 2.6.9. I assume that it
>> has had this wait4-emulation mode for that amount of time as well?
>
> Correct.
>
>> Well, we might get away with it. But formally speaking, we should live
>> with our Linux-specific screwup.
>
> I agree with both of you and I don't care which we do.

I don't feel strongly about this, but lean towards the idea that we
should fix things, as per Roland's patch, and I suspect we can get
away with it, for the reasons I mentioned already.


--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
git://git.kernel.org/pub/scm/docs/man-pages/man-pages.git
man-pages online: http://www.kernel.org/doc/man-pages/online_pages.html
Found a bug? http://www.kernel.org/doc/man-pages/reporting_bugs.html

2009-01-14 00:25:14

by Ulrich Drepper

[permalink] [raw]
Subject: Re: [PATCH] sys_waitid: return -EFAULT for NULL

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Michael Kerrisk wrote:
> I don't feel strongly about this, but lean towards the idea that we
> should fix things, as per Roland's patch, and I suspect we can get
> away with it, for the reasons I mentioned already.

Yes, change it, it will only cause harm leaving it. I very much doubt
any programmer deliberately uses this case and if it's not deliberate
they likely want to know about it.

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEARECAAYFAkltMMcACgkQ2ijCOnn/RHQ8fQCgnghAaJQ4fkEMZCnQwsaza80X
WrIAn2z7tEsiMy8pYJs3gkdzO6CNiKTz
=WaAV
-----END PGP SIGNATURE-----

2009-01-14 00:33:38

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] sys_waitid: return -EFAULT for NULL



On Tue, 13 Jan 2009, Roland McGrath wrote:
>
> It's always been invalid to call waitid() with a NULL pointer. It was an
> oversight that it was allowed (and acts like a wait4() call instead).

I'm not going to take this.

If it was some new system call, of if there was some downside to out
behavior, I might be interested. As it is, our behaviour has zero
downside, and changing existing interfaces simply isn't worth it.

The alleged "downsides" are bogus:

- POSIX is not that strict. EFAULT is one of the odd error cases anyway,
and even explicit requirements are irrelevant: if somebody wants to get
strict conformance paperwork done, you just need to tell where you
differ, and you're basically done. But perhaps more important, nobody
cares.

- The "portability" argument is totally bogus, since it's not like you
compile programs without even testing to another UNIX _anyway_.

So I'm simply not going to potentially break binaries over something that
is so _totally_ irrelevant. Document it in the man-page instead.

Linus

2009-01-14 00:53:23

by Michael Kerrisk

[permalink] [raw]
Subject: Re: [PATCH] sys_waitid: return -EFAULT for NULL

On Wed, Jan 14, 2009 at 1:33 PM, Linus Torvalds
<[email protected]> wrote:
>
>
> On Tue, 13 Jan 2009, Roland McGrath wrote:
>>
>> It's always been invalid to call waitid() with a NULL pointer. It was an
>> oversight that it was allowed (and acts like a wait4() call instead).
>
> I'm not going to take this.
>
> If it was some new system call, of if there was some downside to out
> behavior, I might be interested. As it is, our behaviour has zero
> downside, and changing existing interfaces simply isn't worth it.

It has zero downside for *us*. But it is yet another example of Linux
littering the Unix landscape with unnecessary inconsistencies that
application writers must deal with. That's a downside for the app
writers. (But, given how long the existing behavior has been in the
wild, my argument here is somewhat academic...)

> The alleged "downsides" are bogus:
>
> - POSIX is not that strict.

Well, POSIX.1-2001 is fairly clear:

The application shall ensure that the infop argument points to
a siginfo_t structure.

(Admittedly, this is a requirement imposed on the application, rather
than the implementation, but the standard goes on to say that the
implemenation shall fill in the structure pointed to by infop.)

Cheers,

Michael

> EFAULT is one of the odd error cases anyway,
> and even explicit requirements are irrelevant: if somebody wants to get
> strict conformance paperwork done, you just need to tell where you
> differ, and you're basically done. But perhaps more important, nobody
> cares.
>
> - The "portability" argument is totally bogus, since it's not like you
> compile programs without even testing to another UNIX _anyway_.
>
> So I'm simply not going to potentially break binaries over something that
> is so _totally_ irrelevant. Document it in the man-page instead.

Well, that's doable of course.

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
git://git.kernel.org/pub/scm/docs/man-pages/man-pages.git
man-pages online: http://www.kernel.org/doc/man-pages/online_pages.html
Found a bug? http://www.kernel.org/doc/man-pages/reporting_bugs.html

2009-01-14 00:58:28

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] sys_waitid: return -EFAULT for NULL



On Wed, 14 Jan 2009, Michael Kerrisk wrote:
>
> It has zero downside for *us*. But it is yet another example of Linux
> littering the Unix landscape with unnecessary inconsistencies that
> application writers must deal with.

Bah. Not so. It matters not at all if you try to write portable code.

Linux has extensions. Deal with it. We have literally _thousands_ of
things that work on Linux but not on other OS's. The fact is, you can't
just recompile and assume something works, and waitid() has absolutely
nothing to do with it.

> Well, POSIX.1-2001 is fairly clear:
>
> The application shall ensure that the infop argument points to
> a siginfo_t structure.

Right. So the application should do that, and Linux does the right thing.
Problem solved.

Linus

2009-01-14 01:39:23

by Michael Kerrisk

[permalink] [raw]
Subject: Re: [PATCH] sys_waitid: return -EFAULT for NULL

On Wed, Jan 14, 2009 at 1:57 PM, Linus Torvalds
<[email protected]> wrote:
>
>
> On Wed, 14 Jan 2009, Michael Kerrisk wrote:
>>
>> It has zero downside for *us*. But it is yet another example of Linux
>> littering the Unix landscape with unnecessary inconsistencies that
>> application writers must deal with.
>
> Bah. Not so. It matters not at all if you try to write portable code.
>
> Linux has extensions.

This a behavior that was unintended by the implementer, wasn't
requested by application writers, isn't present on other Unix systems,
and isn't specified by the standards.

It isn't an extension. It's an accident.

And the fact that such accidents happen more often than necessary is
the real problem, rather than the fact that this API in particular is
inconsistent with expectations.

> Deal with it. We have literally _thousands_ of
> things that work on Linux but not on other OS's. The fact is, you can't
> just recompile and assume something works, and waitid() has absolutely
> nothing to do with it.
>
>> Well, POSIX.1-2001 is fairly clear:
>>
>> The application shall ensure that the infop argument points to
>> a siginfo_t structure.
>
> Right. So the application should do that, and Linux does the right thing.
> Problem solved.

Right. And this fact is why, while I incline to think we should fix
the interface, I don't feel strongly about it.

Cheers,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
git://git.kernel.org/pub/scm/docs/man-pages/man-pages.git
man-pages online: http://www.kernel.org/doc/man-pages/online_pages.html
Found a bug? http://www.kernel.org/doc/man-pages/reporting_bugs.html

2009-01-14 02:12:20

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] sys_waitid: return -EFAULT for NULL



On Wed, 14 Jan 2009, Michael Kerrisk wrote:
>
> It isn't an extension. It's an accident.

Bah. It doesn't matter. It's an ABI. Your arguments make no sense - simply
because there are other differences that means that if you develop under
one OS, you can't expect it to run on another.

Arguing that "accident" is somehow different than "extension" is a totally
inane and _idiotic_ argument. Because it's just arguing about the words,
not about the end result.

If you're doing a dissertation in English literature, arguing about the
words you use is valid. But if you're talking about operating systems,
it's just pointless masturbation.

Linus

2009-01-14 02:28:49

by Michael Kerrisk

[permalink] [raw]
Subject: Re: [PATCH] sys_waitid: return -EFAULT for NULL

On Wed, Jan 14, 2009 at 3:10 PM, Linus Torvalds
<[email protected]> wrote:
>
>
> On Wed, 14 Jan 2009, Michael Kerrisk wrote:
>>
>> It isn't an extension. It's an accident.
>
> Bah. It doesn't matter. It's an ABI. Your arguments make no sense - simply
> because there are other differences that means that if you develop under
> one OS, you can't expect it to run on another.

(I don't think implied such an expectation. On the other hand, we can
do things to make more or less difficult for writers of portable
applications.)

> Arguing that "accident" is somehow different than "extension" is a totally
> inane and _idiotic_ argument. Because it's just arguing about the words,
> not about the end result.
>
> If you're doing a dissertation in English literature, arguing about the
> words you use is valid. But if you're talking about operating systems,
> it's just pointless masturbation.

I agree that for practical purposes with respect to waitid(),
"extension" versus "accident" is only about words: we've had the
interface in its current form for so long that changing it shouldn't
be done without consideration.

But (and this was why I constrasted "extension" with "accident"), I
already made it clear that IMO there was another point that was more
important than this specific case:

: And the fact that such accidents happen more often than necessary is
: the real problem, rather than the fact that this API in particular is
: inconsistent with expectations.

but you responded to the point I found less important.

Cheers,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
git://git.kernel.org/pub/scm/docs/man-pages/man-pages.git
man-pages online: http://www.kernel.org/doc/man-pages/online_pages.html
Found a bug? http://www.kernel.org/doc/man-pages/reporting_bugs.html