2007-01-11 00:37:29

by NeilBrown

[permalink] [raw]
Subject: PATCH - x86-64 signed-compare bug, was Re: select() setting ERESTARTNOHAND (514).

On Wednesday January 10, [email protected] wrote:
>
> In looking at the Linux code for ERESTARTNOHAND, I see that
> include/linux/errno.h says this errno should never make it to the user.
> However, in this instance we ARE seeing it. Looking around on google shows
> others are seeing it as well, though hits are few.
..
>
> Thoughts?

Just a 'me too' at this point.
The X server on my shiny new notebook (Core 2 Duo) occasionally dies
with 'select' repeatedly returning ERESTARTNOHAND. It is most
annoying!

You don't mention in the Email which kernel version you use but I see
from the web page you reference it is 2.6.19.1. I'm using
2.6.18.something.

I thought I'd have a quick look at the code, comparing i386 to x86-64
and guess what I found.....

On x86-64, regs->rax is "unsigned long", so the following is
needed....

I haven't tried it yet.

NeilBrown


Signed-off-by: Neil Brown <[email protected]>

### Diffstat output
./arch/x86_64/kernel/signal.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff .prev/arch/x86_64/kernel/signal.c ./arch/x86_64/kernel/signal.c
--- .prev/arch/x86_64/kernel/signal.c 2007-01-11 11:33:27.000000000 +1100
+++ ./arch/x86_64/kernel/signal.c 2007-01-11 11:34:01.000000000 +1100
@@ -331,7 +331,7 @@ handle_signal(unsigned long sig, siginfo
/* Are we from a system call? */
if ((long)regs->orig_rax >= 0) {
/* If so, check system call restarting.. */
- switch (regs->rax) {
+ switch ((long)regs->rax) {
case -ERESTART_RESTARTBLOCK:
case -ERESTARTNOHAND:
regs->rax = -EINTR;


2007-01-11 00:40:56

by Andi Kleen

[permalink] [raw]
Subject: Re: PATCH - x86-64 signed-compare bug, was Re: select() setting ERESTARTNOHAND (514).

On Thursday 11 January 2007 01:37, Neil Brown wrote:
> On Wednesday January 10, [email protected] wrote:
> >
> > In looking at the Linux code for ERESTARTNOHAND, I see that
> > include/linux/errno.h says this errno should never make it to the user.
> > However, in this instance we ARE seeing it. Looking around on google shows
> > others are seeing it as well, though hits are few.
> ..
> >
> > Thoughts?
>
> Just a 'me too' at this point.
> The X server on my shiny new notebook (Core 2 Duo) occasionally dies
> with 'select' repeatedly returning ERESTARTNOHAND. It is most
> annoying!

Normally it should be only visible in strace. Did you see it without
strace?

>
> You don't mention in the Email which kernel version you use but I see
> from the web page you reference it is 2.6.19.1. I'm using
> 2.6.18.something.
>
> I thought I'd have a quick look at the code, comparing i386 to x86-64
> and guess what I found.....
>
> On x86-64, regs->rax is "unsigned long", so the following is
> needed....

regs->rax is unsigned long.
I don't think your patch will make any difference. What do you think
it will change?

-Andi

2007-01-11 00:43:21

by David Miller

[permalink] [raw]
Subject: Re: PATCH - x86-64 signed-compare bug, was Re: select() setting ERESTARTNOHAND (514).

From: Neil Brown <[email protected]>
Date: Thu, 11 Jan 2007 11:37:05 +1100

> On x86-64, regs->rax is "unsigned long", so the following is
> needed....
>
> I haven't tried it yet.

Doesn't type promotion take care of that? Did you verify
that assember?

I checked the assembler on sparc64 for similar constructs
and it does the right thing.

2007-01-11 01:03:16

by NeilBrown

[permalink] [raw]
Subject: Re: PATCH - x86-64 signed-compare bug, was Re: select() setting ERESTARTNOHAND (514).

On Thursday January 11, [email protected] wrote:
> > Just a 'me too' at this point.
> > The X server on my shiny new notebook (Core 2 Duo) occasionally dies
> > with 'select' repeatedly returning ERESTARTNOHAND. It is most
> > annoying!
>
> Normally it should be only visible in strace. Did you see it without
> strace?

No, only in strace.

>
> >
> > You don't mention in the Email which kernel version you use but I see
> > from the web page you reference it is 2.6.19.1. I'm using
> > 2.6.18.something.
> >
> > I thought I'd have a quick look at the code, comparing i386 to x86-64
> > and guess what I found.....
> >
> > On x86-64, regs->rax is "unsigned long", so the following is
> > needed....
>
> regs->rax is unsigned long.
> I don't think your patch will make any difference. What do you think
> it will change?

If regs->rax is unsigned long, then I would think the compiler would
be allowed to convert

switch (regs->rax) {
case -514 : whatever;
}

to a no-op, as regs->rax will never have a negative value.

However it appears that the current compiler doesn't make that
optimisation so I guess I was too hasty.

Still, I think it would be safer to have the cast, in case the compiler
decided to be clever.... or does the C standard ensure against that?

Sorry for the noise,

NeilBrown

2007-01-11 01:37:12

by Andi Kleen

[permalink] [raw]
Subject: Re: PATCH - x86-64 signed-compare bug, was Re: select() setting ERESTARTNOHAND (514).

On Thursday 11 January 2007 02:02, Neil Brown wrote:
> On Thursday January 11, [email protected] wrote:
> > > Just a 'me too' at this point.
> > > The X server on my shiny new notebook (Core 2 Duo) occasionally dies
> > > with 'select' repeatedly returning ERESTARTNOHAND. It is most
> > > annoying!
> >
> > Normally it should be only visible in strace. Did you see it without
> > strace?
>
> No, only in strace.

strace leaks internal errors. At some point that should be fixed,
but it's not really a serious problem.

There was one other report of internal errors leaking without strace,
but it was vague and I never got confirmation.

> Still, I think it would be safer to have the cast, in case the compiler
> decided to be clever.... or does the C standard ensure against that?

It does.

-Andi

2007-01-11 04:11:11

by Sean Reifschneider

[permalink] [raw]
Subject: Re: PATCH - x86-64 signed-compare bug, was Re: select() setting ERESTARTNOHAND (514).

On Thu, Jan 11, 2007 at 12:02:53PM +1100, Neil Brown wrote:
>On Thursday January 11, [email protected] wrote:
>> Normally it should be only visible in strace. Did you see it without
>> strace?
>
>No, only in strace.

I am absolutely seeing it outside of strace. It is showing up as an errno
to the select call:

if (select(0, (fd_set *)0, (fd_set *)0, (fd_set *)0, &t) != 0) {
if (errno != EINTR) {
PyErr_SetFromErrno(PyExc_IOError);
return -1;
}

This code is seeing errno=514.

>> > You don't mention in the Email which kernel version you use but I see
>> > from the web page you reference it is 2.6.19.1. I'm using

The production system is running CentOS 4.4, 2.6.9 kernel. However, it
looks to be the same issue all the way up to 2.6.19.1, and google shows
reports of it on 2.6.17.

Thanks,
Sean
--
George Washington was first in war, first in peace -- and first to
have his birthday juggled to make a long weekend.
Sean Reifschneider, Member of Technical Staff <[email protected]>
tummy.com, ltd. - Linux Consulting since 1995: Ask me about High Availability

2007-01-11 19:39:51

by Denys Vlasenko

[permalink] [raw]
Subject: Re: PATCH - x86-64 signed-compare bug, was Re: select() setting ERESTARTNOHAND (514).

On Thursday 11 January 2007 02:02, Neil Brown wrote:
> If regs->rax is unsigned long, then I would think the compiler would
> be allowed to convert
>
> switch (regs->rax) {
> case -514 : whatever;
> }
>
> to a no-op, as regs->rax will never have a negative value.

In C, you never actually compare different types. They always
promoted to some common type first.

both sides of (impicit) == here get promoted to "biggest" integer,
in this case, to unsigned long. "-514" is an int, so it gets
sign extended to the width of "long" and then converted to
unsigned long.
--
vda