As it is currently written, sys_select checks its return code to convert
ERESTARTNOHAND to EINTR. However, the check is within an if (tvp) clause, and
so if select is called from userspace with a NULL timeval, then it is possible
for the ERESTARTNOHAND errno to leak into userspace, which is incorrect. This
patch moves that check outside of the conditional, and prevents the errno leak.
Thanks & Regards
Neil
Signed-Off-By: Neil Horman <[email protected]>
select.c | 18 +++++-------------
1 file changed, 5 insertions(+), 13 deletions(-)
diff --git a/fs/select.c b/fs/select.c
index fe0893a..afcd691 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -415,20 +415,12 @@ asmlinkage long sys_select(int n, fd_set __user *inp, fd_set __user *outp,
rtv.tv_sec = timeout;
if (timeval_compare(&rtv, &tv) >= 0)
rtv = tv;
- if (copy_to_user(tvp, &rtv, sizeof(rtv))) {
-sticky:
- /*
- * If an application puts its timeval in read-only
- * memory, we don't want the Linux-specific update to
- * the timeval to cause a fault after the select has
- * completed successfully. However, because we're not
- * updating the timeval, we can't restart the system
- * call.
- */
- if (ret == -ERESTARTNOHAND)
- ret = -EINTR;
- }
+ if (copy_to_user(tvp, &rtv, sizeof(rtv)))
+ return -EFAULT;
}
+sticky:
+ if (ret == -ERESTARTNOHAND)
+ ret = -EINTR;
return ret;
}
On Tue, 16 Jan 2007 15:13:32 -0500
Neil Horman <[email protected]> wrote:
> As it is currently written, sys_select checks its return code to convert
> ERESTARTNOHAND to EINTR. However, the check is within an if (tvp) clause, and
> so if select is called from userspace with a NULL timeval, then it is possible
> for the ERESTARTNOHAND errno to leak into userspace, which is incorrect. This
> patch moves that check outside of the conditional, and prevents the errno leak.
the ERESTARTNOHAND thing is handled in arch specific signal code,
syscalls can return -ERESTARTNOHAND as much as they want (and your
change breaks the current behaviour of select()).
For example:
arch/x86_64/kernel/signal.c
/* Are we from a system call? */
if ((long)regs->orig_rax >= 0) {
/* If so, check system call restarting.. */
switch (regs->rax) {
case -ERESTART_RESTARTBLOCK:
case -ERESTARTNOHAND:
regs->rax = -EINTR;
break;
--
Paolo Ornati
Linux 2.6.20-rc5 on x86_64
On Mon, Jan 22, 2007 at 02:59:56PM +0100, Paolo Ornati wrote:
> On Tue, 16 Jan 2007 15:13:32 -0500
> Neil Horman <[email protected]> wrote:
>
> > As it is currently written, sys_select checks its return code to convert
> > ERESTARTNOHAND to EINTR. However, the check is within an if (tvp) clause, and
> > so if select is called from userspace with a NULL timeval, then it is possible
> > for the ERESTARTNOHAND errno to leak into userspace, which is incorrect. This
> > patch moves that check outside of the conditional, and prevents the errno leak.
>
> the ERESTARTNOHAND thing is handled in arch specific signal code,
In the signal handling path yes.
Not always in the case of select, though. Check core_sys_select:
if (!ret) {
ret = -ERESTARTNOHAND;
if (signal_pending(current))
goto out;
ret = 0;
}
...
out:
if (bits != stack_fds)
kfree(bits);
out_nofds:
return ret;
Its possible for core_sys_select to return ERESTARTNOHAND to sys_select, which
will in turn (as its currently written), return that value back to user space.
> syscalls can return -ERESTARTNOHAND as much as they want (and your
> change breaks the current behaviour of select()).
>
It doesn't break it, it fixes it. select isn't meant to ever return
ERESTARTNOHAND to user space:
http://www.opengroup.org/onlinepubs/009695399/functions/select.html
And ENORESTARTHAND isn't defined in the userspace errno.h, so this causes all
sorts of confusion for apps that don't expect it.
Neil
> For example:
>
> arch/x86_64/kernel/signal.c
>
> /* Are we from a system call? */
> if ((long)regs->orig_rax >= 0) {
> /* If so, check system call restarting.. */
> switch (regs->rax) {
> case -ERESTART_RESTARTBLOCK:
> case -ERESTARTNOHAND:
> regs->rax = -EINTR;
> break;
>
> --
> Paolo Ornati
> Linux 2.6.20-rc5 on x86_64
On Mon, 22 Jan 2007, Neil Horman wrote:
> On Mon, Jan 22, 2007 at 02:59:56PM +0100, Paolo Ornati wrote:
> >
> > the ERESTARTNOHAND thing is handled in arch specific signal code,
>
> In the signal handling path yes.
Right.
> Not always in the case of select, though. Check core_sys_select:
No, even in the case of select().
> if (!ret) {
> ret = -ERESTARTNOHAND;
> if (signal_pending(current))
> goto out;
> ret = 0;
Since we have "signal_pending(current)" being true, we _know_ that the
signal handling path will be triggered, so the ERESTARTNOHAND will be
changed into the appropriate error return (or restart) by the signal
handling code.
> Its possible for core_sys_select to return ERESTARTNOHAND to sys_select, which
> will in turn (as its currently written), return that value back to user space.
No. Exactly because sys_select() will always return through the system
call handling path, and that will turn the ERESTARTNOHAND into something
else.
NOTE! If you use "ptrace()", you may see the internal errors. But that's a
ptrace-only thing, and may have fooled you into thinking that the actual
_application_ sees those internal errors. It won't.
Of course, we could have some signal-handling bug here, but if so, it
would affect a lot more than just select(). Have you actually seen
ERESTARTNOINTR in the app (not just ptrace?)
Linus
On Mon, Jan 22, 2007 at 08:03:53AM -0800, Linus Torvalds wrote:
>
>
> On Mon, 22 Jan 2007, Neil Horman wrote:
>
> > On Mon, Jan 22, 2007 at 02:59:56PM +0100, Paolo Ornati wrote:
> > >
> > > the ERESTARTNOHAND thing is handled in arch specific signal code,
> >
> > In the signal handling path yes.
>
> Right.
>
> > Not always in the case of select, though. Check core_sys_select:
>
> No, even in the case of select().
>
> > if (!ret) {
> > ret = -ERESTARTNOHAND;
> > if (signal_pending(current))
> > goto out;
> > ret = 0;
>
> Since we have "signal_pending(current)" being true, we _know_ that the
> signal handling path will be triggered, so the ERESTARTNOHAND will be
> changed into the appropriate error return (or restart) by the signal
> handling code.
>
> > Its possible for core_sys_select to return ERESTARTNOHAND to sys_select, which
> > will in turn (as its currently written), return that value back to user space.
>
> No. Exactly because sys_select() will always return through the system
> call handling path, and that will turn the ERESTARTNOHAND into something
> else.
>
> NOTE! If you use "ptrace()", you may see the internal errors. But that's a
> ptrace-only thing, and may have fooled you into thinking that the actual
> _application_ sees those internal errors. It won't.
>
> Of course, we could have some signal-handling bug here, but if so, it
> would affect a lot more than just select(). Have you actually seen
> ERESTARTNOINTR in the app (not just ptrace?)
>
The error was reported to me second hand. I'm expecting a reproducer (although
to date, I'm still waiting for it, so I may have jumped the gun here). In fact,
I see what your saying now, down in the assembly glue for our arches (x86 in
this case) we jump to do_notify_resume since we have a pending signal, and
inside do_signal from there we fix up ERESTARTNOHAND to be something sane for
userspace. Ok, I withdraw this patch. I'll repost when/if I get my hands on
the reproducer and see that something is actually slipping through.
Neil
> Linus
On Mon, Jan 22, 2007 at 11:24:06AM -0500, Neil Horman wrote:
> The error was reported to me second hand. I'm expecting a reproducer (although
> to date, I'm still waiting for it, so I may have jumped the gun here). In fact,
I've asked for a repoducer weeks ago and nothing happened, nobody even
bothered to reply - I wrote a small test-app that might exhibit the problem,
but didn't manage to trigger it.
Bert
--
http://www.PowerDNS.com Open source, database driven DNS Software
http://netherlabs.nl Open and Closed source services
From: Neil Horman <[email protected]>
Date: Tue, 16 Jan 2007 15:13:32 -0500
> As it is currently written, sys_select checks its return code to convert
> ERESTARTNOHAND to EINTR. However, the check is within an if (tvp) clause, and
> so if select is called from userspace with a NULL timeval, then it is possible
> for the ERESTARTNOHAND errno to leak into userspace, which is incorrect. This
> patch moves that check outside of the conditional, and prevents the errno leak.
>
> Signed-Off-By: Neil Horman <[email protected]>
As has been probably mentioned, this change is bogus.
core_sys_select() returns -ERESTARTNOHAND in exactly the
case where that return value is legal, when a signal is
pending, so that the signal dispatch code (on return from
the system call) will "fixup" the -ERESTARTNOHAND return
value so that userspace does not see it.
What could be happening is that, somehow, we see the signal
pending in core_sys_select(), but for some reason by the time
the signal dispatch code would run, the signal is cleared.
I always found this scheme a little fishy, relying on signal
pending state to guarentee the fixup of the syscall restart
error return values.
For example, I see nothing that prevents another thread sharing
this signal state from clearing the signal between the syscall
checking in the other thread and the signal dispatch check running
in that other thread.
cpu 1 cpu 2
check sigpending
clear signal
syscall return
no signal panding
get -ERESTARTNOHAND
Perhaps something makes this no happen, but I don't see it :)
On Tue, Jan 23, 2007 at 09:59:26PM -0800, David Miller wrote:
> From: Neil Horman <[email protected]>
> Date: Tue, 16 Jan 2007 15:13:32 -0500
>
> > As it is currently written, sys_select checks its return code to convert
> > ERESTARTNOHAND to EINTR. However, the check is within an if (tvp) clause, and
> > so if select is called from userspace with a NULL timeval, then it is possible
> > for the ERESTARTNOHAND errno to leak into userspace, which is incorrect. This
> > patch moves that check outside of the conditional, and prevents the errno leak.
> >
> > Signed-Off-By: Neil Horman <[email protected]>
>
> As has been probably mentioned, this change is bogus.
>
> core_sys_select() returns -ERESTARTNOHAND in exactly the
> case where that return value is legal, when a signal is
> pending, so that the signal dispatch code (on return from
> the system call) will "fixup" the -ERESTARTNOHAND return
> value so that userspace does not see it.
>
> What could be happening is that, somehow, we see the signal
> pending in core_sys_select(), but for some reason by the time
> the signal dispatch code would run, the signal is cleared.
>
> I always found this scheme a little fishy, relying on signal
> pending state to guarentee the fixup of the syscall restart
> error return values.
>
> For example, I see nothing that prevents another thread sharing
> this signal state from clearing the signal between the syscall
> checking in the other thread and the signal dispatch check running
> in that other thread.
>
> cpu 1 cpu 2
> check sigpending
> clear signal
> syscall return
> no signal panding
> get -ERESTARTNOHAND
>
> Perhaps something makes this no happen, but I don't see it :)
This is exactly the theory that I've been tossing around with the person that
reported it to me. We're still witing on a reproduer, but I'm currently working
on a multithreaded test case to try and trigger user space leaks of
ERESTARTNOHAND to user space.
Neil
Hi Neil,
We've been having problems with this select patch change.
Specifically -- previously, when a ptrace attach was done to a task
blocked in a select() call and that task had a timeout value,
the task would restart the select() call with an updated timeout value.
With this patch in place, the task now instead returns EINTR.
A test that shows this issue is provided below.
We also confirmed that attaching to a program sitting
in select() with gdb makes the select get an EINTR, so this
behavior also shows up in gdb.
Thank you for your considerations in this matter.
------------------- -------------------
#include <stdio.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <time.h>
#include <signal.h>
#include <sys/wait.h>
#include <sys/ptrace.h>
int
main(int argc, char ** argv) {
pid_t kid;
if ((kid = fork()) == 0) {
int ms_wait = 2000;
int rval;
struct timeval timeout;
timeout.tv_sec = ms_wait / 1000;
timeout.tv_usec = (ms_wait % 1000) * 1000;
rval = select(0, NULL, NULL, NULL, &timeout);
if (rval == -1) {
int errcode = errno;
printf("Hey! Why did my select error off with errno %d (%s)?\n",
errcode, strerror(errcode));
fflush(stdout);
} else {
printf("select call completed, return value: %d\n", rval);
}
exit(0);
} else if (kid == (pid_t)-1) {
perror("fork");
} else {
int ms_wait = 500;
int rval;
struct timeval timeout;
/* Wait a bit to make sure kid has a chance to get into its
* select call
*/
timeout.tv_sec = ms_wait / 1000;
timeout.tv_usec = (ms_wait % 1000) * 1000;
rval = select(0, NULL, NULL, NULL, &timeout);
/* Now attach to the kid, then continue him.
*/
if (ptrace(PTRACE_ATTACH, kid, (void *)0, (void *)0) != 0) {
perror("ptrace");
}
if (waitpid(kid, &rval, 0) != kid) {
perror("waitpid");
}
if (ptrace(PTRACE_CONT, kid, (void *)0, (void *)0) != 0) {
perror("ptrace");
}
if (waitpid(kid, &rval, 0) != kid) {
perror("waitpid");
}
}
return 0;
}
On Fri, Aug 17, 2007 at 11:41:20AM -0400, John Blackwood wrote:
> Hi Neil,
>
> We've been having problems with this select patch change.
>
> Specifically -- previously, when a ptrace attach was done to a task
> blocked in a select() call and that task had a timeout value,
> the task would restart the select() call with an updated timeout value.
>
> With this patch in place, the task now instead returns EINTR.
>
> A test that shows this issue is provided below.
>
> We also confirmed that attaching to a program sitting
> in select() with gdb makes the select get an EINTR, so this
> behavior also shows up in gdb.
>
> Thank you for your considerations in this matter.
>
I'm on vacation at the moment, but will look at this more closely when I get
back on tuesday
Regards
Neil
> ------------------- -------------------
> #include <stdio.h>
> #include <sys/time.h>
> #include <sys/types.h>
> #include <unistd.h>
> #include <errno.h>
> #include <stdlib.h>
> #include <string.h>
> #include <ctype.h>
> #include <time.h>
> #include <signal.h>
> #include <sys/wait.h>
> #include <sys/ptrace.h>
>
> int
> main(int argc, char ** argv) {
> pid_t kid;
> if ((kid = fork()) == 0) {
> int ms_wait = 2000;
> int rval;
> struct timeval timeout;
>
> timeout.tv_sec = ms_wait / 1000;
> timeout.tv_usec = (ms_wait % 1000) * 1000;
> rval = select(0, NULL, NULL, NULL, &timeout);
> if (rval == -1) {
> int errcode = errno;
> printf("Hey! Why did my select error off with errno %d (%s)?\n",
> errcode, strerror(errcode));
> fflush(stdout);
> } else {
> printf("select call completed, return value: %d\n", rval);
> }
> exit(0);
> } else if (kid == (pid_t)-1) {
> perror("fork");
> } else {
> int ms_wait = 500;
> int rval;
> struct timeval timeout;
>
> /* Wait a bit to make sure kid has a chance to get into its
> * select call
> */
> timeout.tv_sec = ms_wait / 1000;
> timeout.tv_usec = (ms_wait % 1000) * 1000;
> rval = select(0, NULL, NULL, NULL, &timeout);
>
> /* Now attach to the kid, then continue him.
> */
> if (ptrace(PTRACE_ATTACH, kid, (void *)0, (void *)0) != 0) {
> perror("ptrace");
> }
> if (waitpid(kid, &rval, 0) != kid) {
> perror("waitpid");
> }
> if (ptrace(PTRACE_CONT, kid, (void *)0, (void *)0) != 0) {
> perror("ptrace");
> }
> if (waitpid(kid, &rval, 0) != kid) {
> perror("waitpid");
> }
> }
> return 0;
> }
--
/***************************************************
*Neil Horman
*Software Engineer
*Red Hat, Inc.
*[email protected]
*gpg keyid: 1024D / 0x92A74FA1
*http://pgp.mit.edu
***************************************************/