2002-09-19 14:40:32

by dvorak

Subject: Syscall changes registers beyond %eax, on linux-i386

Hi,

Recently I came across a situation where, on linux-i386, not only %eax was
altered after a syscall but also %ebx. I tracked this problem down to
gcc re-using a variable passed to a function.

This was found on a Debian system with a 2.4.17 kernel compiled with gcc
2.95.2 and verified on another system, kernel 2.4.18 compiled with 2.95.4.
Attached is a small program to test for this 'bug'.

A syscall gets its data off the stack; the stack looks like:

saved(edx)
saved(ecx)
saved(ebx)
return_address (somewhere in entry.S)

when the syscall is called.

The registers got there through the use of 'SAVE_ALL'.

After the syscall returns, these registers are restored using RESTORE_ALL
and execution is transferred to userland again.

A short snippet of sys_poll, with irrelevant data removed:

sys_poll(struct pollfd *ufds, .. , ..) {
...
ufds++;
...
}

It seems that gcc in certain cases optimizes in such a way that it changes
the variable ufds directly in its slot on the stack. This results in saved(ebx)
being overwritten, and thus in a changed %ebx on return from the system call.
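
The mechanism can be modelled in plain user-space C. The sketch below is not
kernel code: struct saved_regs, model_handler and model_syscall are
hypothetical names, and the in-place update that gcc generates is imitated by
writing through a pointer into the saved frame.

```c
#include <stdint.h>

/* Hypothetical model of the three saved argument registers; not the
 * kernel's real frame definition. */
struct saved_regs {
    uint32_t ebx;   /* syscall arg1, e.g. ufds */
    uint32_t ecx;   /* syscall arg2, e.g. nfds */
    uint32_t edx;   /* syscall arg3, e.g. timeout */
};

/* Models a handler whose first argument lives in the saved ebx slot and
 * is advanced in place, 8 bytes per entry (like ufds++ on a pollfd). */
static void model_handler(struct saved_regs *frame)
{
    uint32_t i;

    for (i = 0; i < frame->ecx; i++)
        frame->ebx += 8;
}

/* Models entry.S: save the registers, call the handler, then reload the
 * user's ebx from the (possibly modified) slot. */
static uint32_t model_syscall(uint32_t ebx, uint32_t ecx, uint32_t edx)
{
    struct saved_regs frame = { ebx, ecx, edx };   /* "SAVE_ALL" */

    model_handler(&frame);
    return frame.ebx;                              /* "RESTORE_ALL" */
}
```

With these assumptions, model_syscall(0x1000, 5, 1) returns 0x1028: the
"user" ebx comes back advanced by 8 bytes per entry.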

I don't know if this is considered a bug and, if it is, whose bug it is.
If it's not a bug, it means low-level userland programs need to be rewritten
to save all registers before a syscall and restore them on return.
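
As a rough illustration of what that could mean for userland, the sketch below
tells the compiler that the argument registers may come back changed, instead
of saving and restoring them by hand. raw_syscall3 is a hypothetical helper
name, not an existing libc function. On non-i386 builds it just falls back to
libc's syscall() so the example still compiles and runs.

```c
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>

/* Hypothetical three-argument syscall wrapper.
 * On i386 the argument registers are declared as in/out operands ("+b",
 * "+c", "+d"), so the compiler does not assume they survive int $0x80.
 * Note: the i386 path returns the raw kernel result (negative errno on
 * failure), while the fallback follows libc's -1/errno convention. */
static long raw_syscall3(long nr, long a1, long a2, long a3)
{
#if defined(__i386__)
    long ret;

    __asm__ __volatile__("int $0x80"
                         : "=a"(ret), "+b"(a1), "+c"(a2), "+d"(a3)
                         : "0"(nr)
                         : "memory");
    return ret;
#else
    return syscall(nr, a1, a2, a3);
#endif
}
```

For example, raw_syscall3(SYS_getpid, 0, 0, 0) should return the same value as
getpid().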

It shouldn't be a bug in gcc, since the C-standard doesn't talk about how to
pass variables and stuff. So it seems like a kernel(-gcc-interaction) bug.

To solve this issue, two solutions spring to mind:
1) Add a flag to gcc to tell it that it shouldn't do this optimization. This
won't work with the gcc's already out there.
2) When calling a syscall, explicitly push all variables an extra time. Since
the code in entry.S doesn't know the number of arguments a syscall takes, it
needs to push all 6 theoretical parameters every time, a not so nice
overhead.


I hope someone can shed some light on this issue. I am not reading
the linux-kernel mailing list myself, and would like to be cc'd if possible
(I'll also check the archives, so it's not 100% needed).

Thanks in advance,
Dvorak


Attachments:
(No filename) (2.05 kB)
reg-bug.c (1.21 kB)

2002-09-19 16:04:39

by Richard B. Johnson

Subject: Re: Syscall changes registers beyond %eax, on linux-i386

On Thu, 19 Sep 2002, dvorak wrote:

> Hi,
>
> recently i came across a situation were on linux-i386 not only %eax was
> altered after a syscall but also %ebx. I tracked this problem down, to
> gcc re-using a variable passed to a function.
>
> This was found on a debian system with a 2.4.17 kernel compiled with gcc
> 2.95.2 and verified on another system, kernel 2.4.18 compiled with 2.95.4
> Attached is small program to test for this 'bug'
>
> a syscall gets his data off the stack, the stack looks like:
>
> saved(edx)
> saved(ecx)
> saved(ebx)
> return_addres (somewhere in entry.S)
>
> When the syscall is called.
>
> the register came there through use of 'SAVE_ALL'.
>
> After the syscall returns these registers are restored using RESTORE_ALL
> and execution is transferred to userland again.
>
> A short snippet of sys_poll, with irrelavant data removed.
>
> sys_poll(struct pollfd *ufds, .. , ..) {
> ...
> ufds++;
> ...
> }
>
> It seems that gcc in certain cases optimizes in such a way that it changes
> the variable ufds as placed on the stack directly. Which results in saved(ebx)
> being overwritten and thus in a changed %ebx on return from the system call.
>

The 'C' compiler must make room on the stack for any local
variables except register types. If it was doing as you state, you
couldn't even execute a "hello world" program. Further, the local
variables are after the return address. It would screw up the return
address and you'd go off into hyper-space upon return.


> I don't know if this is considered a bug, and if it is, from whom.
> If it's not a bug it means low-level userland programs need to be rewritten
> to store all registers on a syscall and restore them on return.
>

No. Various 'C' implementers have standardized calling methods even
though it's not part of the 'C' standard. gcc and others assume that
a called procedure is not going to change any segments or index registers.
There are various optimization things, like "-fcaller-saves" where the
called procedure can destroy anything. You may be using something that
was wrongly compiled using that switch.


> It shouldn't be a bug in gcc, since the C-standard doesn't talk about how to
> pass variables and stuff. So it seems like a kernel(-gcc-interaction) bug.
>
> To solve this issue 2 solutions spring to mind
> 1) add a flag to gcc to tell it that it shouldn't do this optimization, this
> won't work with the gcc's already out there.
> 2) When calling a syscall explicitly push all variable an extra time, since
> the code in entry.S doesn't know the amount of variables to a syscall it
> needs to push all theoretical 6 parameters every time, a not so nice
> overhead.
>
>

There is a bug in some other code. Try this. It will show
that ebx is not being killed in a syscall. You can prove
that this code works by changing ebx to eax, which will
get destroyed and print "Broken" before exit.


#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

void barf(void);
void barf()
{
    puts("Broken\n");
    exit(0);
}

int main()
{
    __asm__ __volatile__("movl $0xdeadface, %ebx\n");
    (void)getpid();
    __asm__ __volatile__("cmpl $0xdeadface, %ebx\n"
                         "jnz barf\n");

    return 0;
}


Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
The US military has given us many words, FUBAR, SNAFU, now ENRON.
Yes, top management were graduates of West Point and Annapolis.

2002-09-19 17:04:54

by Brian Gerst

Subject: Re: Syscall changes registers beyond %eax, on linux-i386

Richard B. Johnson wrote:
> On Thu, 19 Sep 2002, dvorak wrote:
>
>
>>Hi,
>>
>>recently i came across a situation were on linux-i386 not only %eax was
>>altered after a syscall but also %ebx. I tracked this problem down, to
>>gcc re-using a variable passed to a function.
>>
>>This was found on a debian system with a 2.4.17 kernel compiled with gcc
>>2.95.2 and verified on another system, kernel 2.4.18 compiled with 2.95.4
>>Attached is small program to test for this 'bug'
>>
>>a syscall gets his data off the stack, the stack looks like:
>>
>>saved(edx)
>>saved(ecx)
>>saved(ebx)
>>return_addres (somewhere in entry.S)
>>
>>When the syscall is called.
>>
>>the register came there through use of 'SAVE_ALL'.
>>
>>After the syscall returns these registers are restored using RESTORE_ALL
>>and execution is transferred to userland again.
>>
>>A short snippet of sys_poll, with irrelavant data removed.
>>
>>sys_poll(struct pollfd *ufds, .. , ..) {
>> ...
>> ufds++;
>> ...
>>}
>>
>>It seems that gcc in certain cases optimizes in such a way that it changes
>>the variable ufds as placed on the stack directly. Which results in saved(ebx)
>>being overwritten and thus in a changed %ebx on return from the system call.
>>
>
>
> The 'C' compiler must make room on the stack for any local
> variables except register types. If it was doing as you state, you
> couldn't even execute a "hello world" program. Further, the local
> variables are after the return address. It would screw up the return
> address and you'd go off into hyper-space upon return.
>
>
>
>>I don't know if this is considered a bug, and if it is, from whom.
>>If it's not a bug it means low-level userland programs need to be rewritten
>>to store all registers on a syscall and restore them on return.
>>
>
>
> No. Various 'C' implementers have standardized calling methods even
> though it's not part of the 'C' standard. gcc and others assume that
> a called procedure is not going to change any segments or index registers.
> There are various optimization things, like "-fcaller-saves" where the
> called procedure can destroy anything. You may be using something that
> was wrongly compiled using that switch.
>
>
>
>>It shouldn't be a bug in gcc, since the C-standard doesn't talk about how to
>>pass variables and stuff. So it seems like a kernel(-gcc-interaction) bug.
>>
>>To solve this issue 2 solutions spring to mind
>>1) add a flag to gcc to tell it that it shouldn't do this optimization, this
>> won't work with the gcc's already out there.
>>2) When calling a syscall explicitly push all variable an extra time, since
>> the code in entry.S doesn't know the amount of variables to a syscall it
>> needs to push all theoretical 6 parameters every time, a not so nice
>> overhead.
>>
>>
>
>
> There is a bug in some other code. Try this. It will show
> that ebx is not being killed in a syscall. You can prove
> that this code works by changing ebx to eax, which will
> get destroyed and print "Broken" before exit.

The bug is only with _some_ syscalls, and getpid() is not one of them,
so your example is flawed. It happens when a syscall modifies one of
its parameter values. The solution is to assign the parameter to a
local variable before modifying it.
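
At the source level, that suggestion would look roughly like the sketch below.
It is a user-space illustration only: struct pollfd_model and count_requested
are made-up names, and the loop body is a placeholder rather than what
sys_poll really does. Whether a given compiler then leaves the argument slot
alone is not something the C language promises.

```c
/* Hypothetical stand-in for struct pollfd, so the sketch is self-contained. */
struct pollfd_model {
    int   fd;
    short events;
    short revents;
};

/* Walks the array through a local cursor; the parameter ufds itself is
 * never written to in the source. */
static unsigned int count_requested(struct pollfd_model *ufds,
                                    unsigned int nfds)
{
    struct pollfd_model *cur = ufds;   /* local copy of the parameter */
    unsigned int i, n = 0;

    for (i = 0; i < nfds; i++, cur++) {
        if (cur->events != 0)
            n++;
    }
    return n;
}
```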

--
Brian Gerst

2002-09-19 17:15:09

by Richard B. Johnson

Subject: Re: Syscall changes registers beyond %eax, on linux-i386

On Thu, 19 Sep 2002, Brian Gerst wrote:

> Richard B. Johnson wrote:
> > On Thu, 19 Sep 2002, dvorak wrote:
> >
> >
> >>Hi,
> >>
> >>recently i came across a situation were on linux-i386 not only %eax was
> >>altered after a syscall but also %ebx. I tracked this problem down, to
> >>gcc re-using a variable passed to a function.
> >>
> >>This was found on a debian system with a 2.4.17 kernel compiled with gcc
> >>2.95.2 and verified on another system, kernel 2.4.18 compiled with 2.95.4
> >>Attached is small program to test for this 'bug'
> >>
> >>a syscall gets his data off the stack, the stack looks like:
> >>
> >>saved(edx)
> >>saved(ecx)
> >>saved(ebx)
> >>return_addres (somewhere in entry.S)
> >>
> >>When the syscall is called.
> >>
> >>the register came there through use of 'SAVE_ALL'.
> >>
> >>After the syscall returns these registers are restored using RESTORE_ALL
> >>and execution is transferred to userland again.
> >>
> >>A short snippet of sys_poll, with irrelavant data removed.
> >>
> >>sys_poll(struct pollfd *ufds, .. , ..) {
> >> ...
> >> ufds++;
> >> ...
> >>}
> >>
> >>It seems that gcc in certain cases optimizes in such a way that it changes
> >>the variable ufds as placed on the stack directly. Which results in saved(ebx)
> >>being overwritten and thus in a changed %ebx on return from the system call.
> >>
> >
> >
> > The 'C' compiler must make room on the stack for any local
> > variables except register types. If it was doing as you state, you
> > couldn't even execute a "hello world" program. Further, the local
> > variables are after the return address. It would screw up the return
> > address and you'd go off into hyper-space upon return.
> >
> >
> >
> >>I don't know if this is considered a bug, and if it is, from whom.
> >>If it's not a bug it means low-level userland programs need to be rewritten
> >>to store all registers on a syscall and restore them on return.
> >>
> >
> >
> > No. Various 'C' implementers have standardized calling methods even
> > though it's not part of the 'C' standard. gcc and others assume that
> > a called procedure is not going to change any segments or index registers.
> > There are various optimization things, like "-fcaller-saves" where the
> > called procedure can destroy anything. You may be using something that
> > was wrongly compiled using that switch.
> >
> >
> >
> >>It shouldn't be a bug in gcc, since the C-standard doesn't talk about how to
> >>pass variables and stuff. So it seems like a kernel(-gcc-interaction) bug.
> >>
> >>To solve this issue 2 solutions spring to mind
> >>1) add a flag to gcc to tell it that it shouldn't do this optimization, this
> >> won't work with the gcc's already out there.
> >>2) When calling a syscall explicitly push all variable an extra time, since
> >> the code in entry.S doesn't know the amount of variables to a syscall it
> >> needs to push all theoretical 6 parameters every time, a not so nice
> >> overhead.
> >>
> >>
> >
> >
> > There is a bug in some other code. Try this. It will show
> > that ebx is not being killed in a syscall. You can prove
> > that this code works by changing ebx to eax, which will
> > get destroyed and print "Broken" before exit.
>
> The bug is only with _some_ syscalls, and getpid() is not one of them,
> so your example is flawed. It happens when a syscall modifies one of
> it's parameter values. The solution is to assign the parameter to a
> local variable before modifying it.
>

Well which one? Here is an ioctl(). It certainly modifies one
of its parameter values.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <termios.h>

void barf(void);
void barf()
{
    puts("Broken\n");
    exit(0);
}
int main()
{
    struct termios t;

    __asm__ __volatile__("movl $0xdeadface, %ebx\n");
    (void)ioctl(0, TCGETS, &t);
    (void)getpid();
    __asm__ __volatile__("cmpl $0xdeadface, %ebx\n"
                         "jnz barf\n");

    return 0;
}


Until you can show the syscall that doesn't follow the correct
rules, then my example is not flawed. In fact a modified example can
be used to find any broken calls.


Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
The US military has given us many words, FUBAR, SNAFU, now ENRON.
Yes, top management were graduates of West Point and Annapolis.

2002-09-19 17:39:50

by Petr Vandrovec

Subject: Re: Syscall changes registers beyond %eax, on linux-i386

On 19 Sep 02 at 13:22, Richard B. Johnson wrote:

> > >>A short snippet of sys_poll, with irrelavant data removed.
> > >>
> > >>sys_poll(struct pollfd *ufds, .. , ..) {
> > >> ...
> > >> ufds++;
> > >> ...
>
> Well which one? Here is an ioctl(). It certainly modifies one
> of its parameter values.

poll(), as was already noted. The program below should
print the same value for B= and F=, but it reports f + 8*c for B= instead
(where c = number of file descriptors passed to poll).
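
The factor of 8 is just sizeof(struct pollfd) on Linux, so "f + 8*c" in bytes
is the same address as f + c in pointer arithmetic. A small sketch of that
calculation follows. expected_ebx_after_poll and demo_fds are names invented
for this illustration.

```c
#include <poll.h>
#include <stdint.h>

/* Example array, only used to have a concrete base address. */
static struct pollfd demo_fds[5];

/* Address that ends up in %ebx if the kernel advanced its ufds argument
 * once per entry: base + c elements, i.e. base + c * sizeof(struct pollfd)
 * bytes. */
static uintptr_t expected_ebx_after_poll(const struct pollfd *f,
                                         unsigned int c)
{
    return (uintptr_t)(f + c);
}
```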

And you must call it from assembly, as your calls to getpid() or
ioctl() (or poll()) are wrapped in libc, and glibc's code begins with
push %ebx because %ebx is used by -fPIC code.

It is questionable whether we should try to not modify parameters
passed into functions. It is definitely nice behavior, but I think
that we should only guarantee that syscalls do not modify unused
registers.
Petr Vandrovec

#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/poll.h>

struct pollfd f[5];

int main(int argc, char* argv[]) {
    unsigned int i;
    void * reg;
    int ret;

    for (i = 0; i < 5; i++) {
        f[i].fd = 0;
        f[i].events = POLLIN;
    }
    /* 168 is poll on i386; %eax also receives the return value,
       so it is declared as an output as well */
    __asm__ __volatile__("int $0x80\n"
                         : "=a"(ret), "=b"(reg)
                         : "0"(168), "1"(f), "c"(5), "d"(1)
                         : "memory");
    printf("B=%p F=%p\n", reg, f);
    return 0;
}

2002-09-19 17:46:31

by Brian Gerst

Subject: Re: Syscall changes registers beyond %eax, on linux-i386

Richard B. Johnson wrote:
> On Thu, 19 Sep 2002, Brian Gerst wrote:
>>Richard B. Johnson wrote:
>>>There is a bug in some other code. Try this. It will show
>>>that ebx is not being killed in a syscall. You can prove
>>>that this code works by changing ebx to eax, which will
>>>get destroyed and print "Broken" before exit.
>>
>>The bug is only with _some_ syscalls, and getpid() is not one of them,
>>so your example is flawed. It happens when a syscall modifies one of
>>it's parameter values. The solution is to assign the parameter to a
>>local variable before modifying it.
>>
>
>
> Well which one? Here is an ioctl(). It certainly modifies one
> of its parameter values.
>
> #include <stdio.h>
> #include <unistd.h>
> #include <sys/ioctl.h>
> #include <termios.h>
>
> void barf(void);
> void barf()
> {
> puts("Broken\n");
> exit(0);
> }
> int main()
> {
> struct termios t;
>
> __asm__ __volatile__("movl $0xdeadface, %ebx\n");
> (void)ioctl(0, TCGETS, &t);
> (void)getpid();
> __asm__ __volatile__("cmpl $0xdeadface, %ebx\n"
> "jnz barf\n");
>
> return 0;
> }
>
>
> Until you can show the syscall that doesn't follow the correct
> rules, then my example is not flawed. In fact a modified example can
> be used to find any broken calls.

Well the original poster gave one valid example: sys_poll(). We're not
talking about it modifying userspace through a pointer. We're talking
about it taking its parameter on the kernel stack (which is really the
pt_regs structure saved from user space) and modifying it. That modified
value then gets restored to the user registers upon syscall exit.

This is what the kernel stack looks like inside a syscall (x86):
OLDSS
OLDESP
EFLAGS
CS
EIP
ORIG_EAX
ES
DS
EAX <- syscall number
EBP <- syscall arg6
EDI <- syscall arg5
ESI <- syscall arg4
EDX <- syscall arg3
ECX <- syscall arg2
EBX <- syscall arg1
(return address)
(local variables)

Everything above the return address is the pt_regs struct that gets
restored to user space. If the syscall modifies any of its args (*not
memory pointed to by the args*), they get written back to the stack in
the pt_regs area, and then get restored to userspace modified.
Understand now?
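
For reference, the same layout can be written as a C struct, lowest address
first. This is a user-space model for illustration, not the kernel's own
definition: pt_regs_model and arg_slot_offset are made-up names, and the
fields simply follow the diagram above, each as one 32-bit slot.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of the saved frame from the diagram, lowest address first
 * (the slot just above the return address comes first). */
struct pt_regs_model {
    uint32_t ebx;       /* syscall arg1 */
    uint32_t ecx;       /* syscall arg2 */
    uint32_t edx;       /* syscall arg3 */
    uint32_t esi;       /* syscall arg4 */
    uint32_t edi;       /* syscall arg5 */
    uint32_t ebp;       /* syscall arg6 */
    uint32_t eax;       /* syscall number */
    uint32_t ds;
    uint32_t es;
    uint32_t orig_eax;
    uint32_t eip;
    uint32_t cs;
    uint32_t eflags;
    uint32_t oldesp;
    uint32_t oldss;
};

/* Byte offset of the slot holding syscall argument argno (1..6). */
static size_t arg_slot_offset(unsigned int argno)
{
    static const size_t offsets[6] = {
        offsetof(struct pt_regs_model, ebx),
        offsetof(struct pt_regs_model, ecx),
        offsetof(struct pt_regs_model, edx),
        offsetof(struct pt_regs_model, esi),
        offsetof(struct pt_regs_model, edi),
        offsetof(struct pt_regs_model, ebp),
    };

    return offsets[argno - 1];
}
```

In this model arg1 sits at offset 0, which is exactly where a C function
called right after the registers are pushed expects to find its first stack
argument.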

--
Brian Gerst

2002-09-19 17:59:58

by Brian Gerst

Subject: Re: Syscall changes registers beyond %eax, on linux-i386

Petr Vandrovec wrote:
> On 19 Sep 02 at 13:22, Richard B. Johnson wrote:
>
>
>>>>>A short snippet of sys_poll, with irrelavant data removed.
>>>>>
>>>>>sys_poll(struct pollfd *ufds, .. , ..) {
>>>>> ...
>>>>> ufds++;
>>>>> ...
>>>>
>>Well which one? Here is an ioctl(). It certainly modifies one
>>of its parameter values.
>
>
> poll(), as was already noted. Program below should
> print same value for B= and F=, but it reports f + 8*c instead
> (where c = number of filedescriptors passed to poll).
>
> And you must call it from assembly, as your calls to getpid() or
> ioctl() (or poll()) are wrapped in libc - and glibc's code begins with
> push %ebx because of %ebx is used by -fPIC code.
>
> It is questinable whether we should try to not modify parameters
> passed into functions. It is definitely nice behavior, but I think
> that we should only guarantee that syscalls do not modify unused
> registers.
> Petr Vandrovec
> [email protected]

Now that I've thought about it more, I think the best solution is to go
through all the syscalls (a big job, I know), and declare the parameters
as const, so that gcc knows it can't modify them, and will throw a
warning if we try.

--
Brian Gerst


2002-09-19 17:54:46

by dvorak

Subject: Re: Syscall changes registers beyond %eax, on linux-i386

On Thu, Sep 19, 2002 at 01:22:35PM -0400, Richard B. Johnson wrote:
> On Thu, 19 Sep 2002, Brian Gerst wrote:
>
> > Richard B. Johnson wrote:
> > > On Thu, 19 Sep 2002, dvorak wrote:
> > >
> > >
> > >>Hi,
> > >>
> > >>recently i came across a situation were on linux-i386 not only %eax was
> > >>altered after a syscall but also %ebx. I tracked this problem down, to
> > >>gcc re-using a variable passed to a function.
> > >>
> > >>This was found on a debian system with a 2.4.17 kernel compiled with gcc
> > >>2.95.2 and verified on another system, kernel 2.4.18 compiled with 2.95.4
> > >>Attached is small program to test for this 'bug'
> > >>
<SNIP part of the explanation>

> > >>It seems that gcc in certain cases optimizes in such a way that it changes
> > >>the variable ufds as placed on the stack directly. Which results in saved(ebx)
> > >>being overwritten and thus in a changed %ebx on return from the system call.
> > >>
> > >
> > >
> > > The 'C' compiler must make room on the stack for any local
> > > variables except register types. If it was doing as you state, you
> > > couldn't even execute a "hello world" program. Further, the local
> > > variables are after the return address. It would screw up the return
> > > address and you'd go off into hyper-space upon return.

The problem is that it uses one of the _arguments_ passed to the function.
That argument gets modified; normally this happens on a copy, but there
is no guarantee that it doesn't modify the original argument as
put on the stack by the calling function.

> > > No. Various 'C' implementers have standardized calling methods even
> > > though it's not part of the 'C' standard. gcc and others assume that
> > > a called procedure is not going to change any segments or index registers.
> > > There are various optimization things, like "-fcaller-saves" where the
> > > called procedure can destroy anything. You may be using something that
> > > was wrongly compiled using that switch.
This is not what happens here. What happens is that one of the _arguments_
placed on the stack is being modified. Normally a calling function discards
these values after use (addl $0x10, %esp or similar), but in this case they
are reused (in the RESTORE_ALL call).

> >
> > The bug is only with _some_ syscalls, and getpid() is not one of them,
> > so your example is flawed. It happens when a syscall modifies one of
> > it's parameter values. The solution is to assign the parameter to a
> > local variable before modifying it.
> >
and only with _some_ compiler + kernel combinations.

> int main()
> {
> struct termios t;
>
> __asm__ __volatile__("movl $0xdeadface, %ebx\n");
> (void)ioctl(0, TCGETS, &t);
> (void)getpid();
> __asm__ __volatile__("cmpl $0xdeadface, %ebx\n"
> "jnz barf\n");
>
> return 0;
> }

> Until you can show the syscall that doesn't follow the correct
> rules, then my example is not flawed. In fact a modified example can
> be used to find any broken calls.
I put some assembler code in my original post that uses the sys_poll
syscall, of which I _know_ that it modifies one of its arguments; to be more
specific, its first argument, which is %ebx on passing in:
asmlinkage int sys_poll(struct pollfd * ufds, unsigned int nfds, long timeout)
....
for(i=0; i < (int)nfds; i++, ufds++, fds1++) {
....

And in fact we saw that the change in %ebx is proportional to the nfds
as passed to sys_poll.

Now, however, for sys_ioctl:
its first argument, fd (%ebx on passing), is never modified in the code.
Nowhere is there an fd++ or similar, so again this 'example' of yours is
flawed.

gtx.
dvorak


P.S. I think my original was quite clear and INCLUDED example code that can
easily be checked by someone who reads asm. I attach an extra copy which
explains all the asm in there for easier reference.


Attachments:
(No filename) (3.78 kB)
reg-bug.c (2.42 kB)

2002-09-19 18:22:18

by Richard B. Johnson

Subject: Re: Syscall changes registers beyond %eax, on linux-i386

On Thu, 19 Sep 2002, Brian Gerst wrote:

> Richard B. Johnson wrote:
> > On Thu, 19 Sep 2002, Brian Gerst wrote:
> >>Richard B. Johnson wrote:
> >>>There is a bug in some other code. Try this. It will show
> >>>that ebx is not being killed in a syscall. You can prove
> >>>that this code works by changing ebx to eax, which will
> >>>get destroyed and print "Broken" before exit.
> >>
> >>The bug is only with _some_ syscalls, and getpid() is not one of them,
> >>so your example is flawed. It happens when a syscall modifies one of
> >>it's parameter values. The solution is to assign the parameter to a
> >>local variable before modifying it.
> >>
> >
> >
> > Well which one? Here is an ioctl(). It certainly modifies one
> > of its parameter values.
> >
> > #include <stdio.h>
> > #include <unistd.h>
> > #include <sys/ioctl.h>
> > #include <termios.h>
> >
> > void barf(void);
> > void barf()
> > {
> > puts("Broken\n");
> > exit(0);
> > }
> > int main()
> > {
> > struct termios t;
> >
> > __asm__ __volatile__("movl $0xdeadface, %ebx\n");
> > (void)ioctl(0, TCGETS, &t);
> > (void)getpid();
> > __asm__ __volatile__("cmpl $0xdeadface, %ebx\n"
> > "jnz barf\n");
> >
> > return 0;
> > }
> >
> >
> > Until you can show the syscall that doesn't follow the correct
> > rules, then my example is not flawed. In fact a modified example can
> > be used to find any broken calls.
>
> Well the original poster gave one valid example: sys_poll(). We're not
> talking about it modifying userspace though a pointer. We're talking
> about it taking it's parameter on the kernel stack (which is really the
> pt_regs structure saved from user space) and modifying it. Which then
> gets restored to the user registers upon syscall exit.
>
> This is how the kernel stack looks like inside a syscall (x86):
> OLDSS
> OLDESP
> EFLAGS
> CS
> EIP
> ORIG_EAX
> ES
> DS
> EAX <- syscall number
> EBP <- syscall arg6
> EDI <- syscall arg5
> ESI <- syscall arg4
> EDX <- syscall arg3
> ECX <- syscall arg2
> EBX <- syscall arg1
> (return address)
> (local variables)
>
> Everything above the return address is the pt_regs struct that gets
> restored to user space. If the syscall modifies any of its args (*not
> memory pointed to by the args*), they get written back to the stack in
> the pt_regs area, and then get restored to userspace modified.
> Understand now?
>

Maybe. So, if the 'C' runtime library puts 0xdeadfeed into the ebx
register and executes a syscall, upon return from the syscall, this
value is no longer 0xdeadfeed? If this is true, then is the kernel
supposed to save the values of registers modified by user-code,
before calling the function? I expect that the 'C' runtime library
expects the index registers to be preserved and EBX is an index
register.

Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
The US military has given us many words, FUBAR, SNAFU, now ENRON.
Yes, top management were graduates of West Point and Annapolis.

2002-09-19 18:26:12

by Richard Henderson

Subject: Re: Syscall changes registers beyond %eax, on linux-i386

On Thu, Sep 19, 2002 at 02:04:43PM -0400, Brian Gerst wrote:
> Now that I've thought about it more, I think the best solution is to go
> through all the syscalls (a big job, I know), and declare the parameters
> as const, so that gcc knows it can't modify them, and will throw a
> warning if we try.

The parameter area belongs to the callee, and it may *always* be modified.


r~

2002-09-19 18:24:54

by Richard B. Johnson

Subject: Re: Syscall changes registers beyond %eax, on linux-i386

On Thu, 19 Sep 2002, dvorak wrote:

> On Thu, Sep 19, 2002 at 01:22:35PM -0400, Richard B. Johnson wrote:
> > On Thu, 19 Sep 2002, Brian Gerst wrote:
> >
> > > Richard B. Johnson wrote:
> > > > On Thu, 19 Sep 2002, dvorak wrote:
> > > >
> > > >
> > > >>Hi,
> > > >>
> > > >>recently i came across a situation were on linux-i386 not only %eax was
> > > >>altered after a syscall but also %ebx. I tracked this problem down, to
> > > >>gcc re-using a variable passed to a function.
> > > >>
> > > >>This was found on a debian system with a 2.4.17 kernel compiled with gcc
> > > >>2.95.2 and verified on another system, kernel 2.4.18 compiled with 2.95.4
> > > >>Attached is small program to test for this 'bug'
> > > >>
> <SNIP part of the explanation>
>
> > > >>It seems that gcc in certain cases optimizes in such a way that it changes
> > > >>the variable ufds as placed on the stack directly. Which results in saved(ebx)
> > > >>being overwritten and thus in a changed %ebx on return from the system call.
> > > >>
> > > >
> > > >
> > > > The 'C' compiler must make room on the stack for any local
> > > > variables except register types. If it was doing as you state, you
> > > > couldn't even execute a "hello world" program. Further, the local
> > > > variables are after the return address. It would screw up the return
> > > > address and you'd go off into hyper-space upon return.
>
> The problem is it uses one of the _arguments_ passed to the function,
> that argument gets modified, normally this happens on a copy, but there
> is no 'garantue' that is doesn't modify the original argument as
> putted on the stack by the calling function.
>
> > > > No. Various 'C' implementers have standardized calling methods even
> > > > though it's not part of the 'C' standard. gcc and others assume that
> > > > a called procedure is not going to change any segments or index registers.
> > > > There are various optimization things, like "-fcaller-saves" where the
> > > > called procedure can destroy anything. You may be using something that
> > > > was wrongly compiled using that switch.
> This is not what happens here, what happens is that one of the _arguments_
> placed on the stack is being modified, normally a calling function discards
> these values after use (addl $0x10, %esp or similar) but in this case they
> are reused. (in the RESTORE_ALL call)
>
> > >
> > > The bug is only with _some_ syscalls, and getpid() is not one of them,
> > > so your example is flawed. It happens when a syscall modifies one of
> > > it's parameter values. The solution is to assign the parameter to a
> > > local variable before modifying it.
> > >
> and only with _some_ compiler + kernel combinations.
[SNIPPED...]

Okay. Thanks for the explanation.


Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
The US military has given us many words, FUBAR, SNAFU, now ENRON.
Yes, top management were graduates of West Point and Annapolis.

2002-09-19 18:47:06

by Brian Gerst

Subject: Re: Syscall changes registers beyond %eax, on linux-i386

Richard Henderson wrote:
> On Thu, Sep 19, 2002 at 02:04:43PM -0400, Brian Gerst wrote:
>
>>Now that I've thought about it more, I think the best solution is to go
>>through all the syscalls (a big job, I know), and declare the parameters
>>as const, so that gcc knows it can't modify them, and will throw a
>>warning if we try.
>
>
> The parameter area belongs to the callee, and it may *always* be modified.
>
>
> r~
>

The parameters can not be modified if they are declared const though,
that's my point.

--
Brian Gerst

2002-09-19 18:53:04

by Richard Henderson

Subject: Re: Syscall changes registers beyond %eax, on linux-i386

On Thu, Sep 19, 2002 at 02:51:44PM -0400, Brian Gerst wrote:
> > The parameter area belongs to the callee, and it may *always* be modified.
>
> The parameters can not be modified if they are declared const though,
> that's my point.

Yes they can.

extern void bar(int x, int y, int z);
void foo(const int a, const int b, const int c)
{
    bar(a+1, b+1, c+1);
}

        subl    $12, %esp
        movl    20(%esp), %eax
        incl    %eax
        movl    %eax, 20(%esp)
        movl    16(%esp), %eax
        incl    %eax
        incl    24(%esp)
        movl    %eax, 16(%esp)
        addl    $12, %esp
        jmp     bar

(Not sure why gcc doesn't use incl on all three memories, nor
should it allocate that stack frame...)


r~

2002-09-19 19:10:21

by Richard B. Johnson

Subject: Re: Syscall changes registers beyond %eax, on linux-i386

On Thu, 19 Sep 2002, Brian Gerst wrote:

> Richard Henderson wrote:
> > On Thu, Sep 19, 2002 at 02:04:43PM -0400, Brian Gerst wrote:
> >
> >>Now that I've thought about it more, I think the best solution is to go
> >>through all the syscalls (a big job, I know), and declare the parameters
> >>as const, so that gcc knows it can't modify them, and will throw a
> >>warning if we try.
> >
> >
> > The parameter area belongs to the callee, and it may *always* be modified.
> >
> >
> > r~
> >
>
> The parameters can not be modified if they are declared const though,
> that's my point.

Yes. A temporary declaration change to compile the kernel and
see where it complains.

Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
The US military has given us many words, FUBAR, SNAFU, now ENRON.
Yes, top management were graduates of West Point and Annapolis.

2002-09-19 19:19:49

by Daniel Jacobowitz

Subject: Re: Syscall changes registers beyond %eax, on linux-i386

On Thu, Sep 19, 2002 at 02:04:43PM -0400, Brian Gerst wrote:
> Petr Vandrovec wrote:
> >On 19 Sep 02 at 13:22, Richard B. Johnson wrote:
> >
> >
> >>>>>A short snippet of sys_poll, with irrelavant data removed.
> >>>>>
> >>>>>sys_poll(struct pollfd *ufds, .. , ..) {
> >>>>> ...
> >>>>> ufds++;
> >>>>> ...
> >>>>
> >>Well which one? Here is an ioctl(). It certainly modifies one
> >>of its parameter values.
> >
> >
> >poll(), as was already noted. Program below should
> >print same value for B= and F=, but it reports f + 8*c instead
> >(where c = number of filedescriptors passed to poll).
> >
> >And you must call it from assembly, as your calls to getpid() or
> >ioctl() (or poll()) are wrapped in libc - and glibc's code begins with
> >push %ebx because %ebx is used by -fPIC code.
> >
> >It is questionable whether we should try to not modify parameters
> >passed into functions. It is definitely nice behavior, but I think
> >that we should only guarantee that syscalls do not modify unused
> >registers.
> > Petr Vandrovec
> > [email protected]
>
> Now that I've thought about it more, I think the best solution is to go
> through all the syscalls (a big job, I know), and declare the parameters
> as const, so that gcc knows it can't modify them, and will throw a
> warning if we try.

That's not going to help. As Richard said, the memory in question
belongs to the called function. GCC knows this. It can freely modify
it. The fact that the value of the parameter is const is a
language-level, semantic thing. It doesn't say anything about the
const-ness of that memory. Only the ABI does.

--
Daniel Jacobowitz
MontaVista Software Debian GNU/Linux Developer

2002-09-19 19:33:17

by Richard B. Johnson

Subject: Re: Syscall changes registers beyond %eax, on linux-i386

On Thu, 19 Sep 2002, Richard Henderson wrote:

> On Thu, Sep 19, 2002 at 02:51:44PM -0400, Brian Gerst wrote:
> > > The parameter area belongs to the callee, and it may *always* be modified.
> >
> > The parameters can not be modified if they are declared const though,
> > that's my point.
>
> Yes they can.
>
> extern void bar(int x, int y, int z);
> void foo(const int a, const int b, const int c)
> {
> bar(a+1, b+1, c+1);
> }
>
> subl $12, %esp
> movl 20(%esp), %eax
> incl %eax
> movl %eax, 20(%esp)
> movl 16(%esp), %eax
> incl %eax
> incl 24(%esp)
> movl %eax, 16(%esp)
> addl $12, %esp
> jmp bar
>
> (Not sure why gcc doesn't use incl on all three memory operands,
> nor why it allocates that stack frame...)
>
>
> r~
>

Well it's not modifying those values. It's putting the
constant value into a register and modifying the value
in the register before calling a function that takes int.
Note that the parameters passed to the function, a, b, and c,
are local copies. gcc can whack those any way it wants. In
fact, it does strange things above which may not be valid.
It subtracts an offset from esp for local variables ($12).
There aren't any local variables! Therefore, it has to
access the passed parameters at their pushed offset + 12.
Then, after it's through mucking with them, it collapses
the local stack area (levels the stack), then jumps
to the called function. It will use the earlier 'call'
return address to return to the caller.
It's really bad code because it could have done:

incl 0x04(%esp)
incl 0x08(%esp)
incl 0x0c(%esp)
jmp bar

Note that, in every case, the constant value was pushed onto the
stack and this function called. That copy of the constant value
can be trashed any way the callee wants. It's his copy.


I thought you were going to do something like:

Script started on Thu Sep 19 15:22:05 2002
# cat zzz.c

int foo(const int a, const int b, const int c)
{
a += b;
a += c;
return a;
}
# gcc -c -o zzz zzz.c
zzz.c: In function `foo':
zzz.c:6: warning: assignment of read-only location
zzz.c:7: warning: assignment of read-only location
# exit
exit

Script done on Thu Sep 19 15:22:23 2002

Which makes gcc barf when you attempt to modify the
const value. This allows you to check if the code is
doing the wrong thing.

Cheers,
Dick Johnson

2002-09-19 19:36:32

by Richard Henderson

Subject: Re: Syscall changes registers beyond %eax, on linux-i386

On Thu, Sep 19, 2002 at 03:40:52PM -0400, Richard B. Johnson wrote:
> Well it's not modifying those values.

It's not modifying "a", true, but it _is_ modifying the parameter
area. Which is exactly the kernel bug in question.
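
To make the distinction concrete, here is a minimal user-space model of the effect, not kernel code. It assumes the frame that SAVE_ALL leaves on the stack doubles as the handler's parameter area; the struct, the function names and the constant are hypothetical, chosen only for illustration.

```c
/* Hypothetical stand-in for the register frame that SAVE_ALL pushes.
 * Field names are illustrative, not the kernel's. */
struct saved_frame {
    unsigned long ebx;   /* arg1: ufds    */
    unsigned long ecx;   /* arg2: nfds    */
    unsigned long edx;   /* arg3: timeout */
};

#define MODEL_POLLFD_SIZE 8UL   /* sizeof(struct pollfd) on i386 */

/* Models a handler whose first parameter lives in the frame itself:
 * each "ufds++" is stored back into the saved %ebx slot. */
static long model_poll_in_place(struct saved_frame *frame)
{
    unsigned long i;

    for (i = 0; i < frame->ecx; i++)
        frame->ebx += MODEL_POLLFD_SIZE;
    return 0;
}

/* Same walk, but on a private copy, so the frame is left untouched. */
static long model_poll_on_copy(const struct saved_frame *frame)
{
    unsigned long ufds = frame->ebx;
    unsigned long i;

    for (i = 0; i < frame->ecx; i++)
        ufds += MODEL_POLLFD_SIZE;
    (void)ufds;
    return 0;
}

/* Returns the value user space would see in %ebx once the frame is
 * restored (the RESTORE_ALL step). */
static unsigned long ebx_after_syscall(unsigned long ufds,
                                       unsigned long nfds, int in_place)
{
    struct saved_frame frame = { ufds, nfds, 0 };

    if (in_place)
        model_poll_in_place(&frame);
    else
        model_poll_on_copy(&frame);
    return frame.ebx;
}
```

With three descriptors the in-place variant hands back ufds + 24, which matches the f + 8*c pattern reported earlier in the thread; the copying variant hands back ufds unchanged.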

> It's really bad code because it could have done:
>
> incl 0x04(%esp)
> incl 0x08(%esp)
> incl 0x0c(%esp)
> jmp bar

Yes, I know.


r~

2002-09-19 19:45:38

by Richard B. Johnson

Subject: Re: Syscall changes registers beyond %eax, on linux-i386

On Thu, 19 Sep 2002, Richard Henderson wrote:

> On Thu, Sep 19, 2002 at 03:40:52PM -0400, Richard B. Johnson wrote:
> > Well it's not modifying those values.
>
> It's not modifying "a", true, but it _is_ modifying the parameter
> area. Which is exactly the kernel bug in question.
>

Yep. This can't be found by the compiler. The parameter area is
writable so it looks like somebody needs to do some 'code inspection'
and some additional testing.

> > It's really bad code because it could have done:
> >
> > incl 0x04(%esp)
> > incl 0x08(%esp)
> > incl 0x0c(%esp)
> > jmp bar
>
> Yes, I know.
>

It's a problem with a 'general purpose' compiler that wants to
be "all things" to all people. If somebody made a gcc-compatible
compiler, tuned to the ix86 characteristics, I think we could
cut the extra instructions by at least 1/2, maybe more.

Cheers,
Dick Johnson

2002-09-19 20:20:08

by Mikael Pettersson

Subject: Re: Syscall changes registers beyond %eax, on linux-i386

Daniel Jacobowitz writes:
> That's not going to help. As Richard said, the memory in question
> belongs to the called function. GCC knows this. It can freely modify
> it. The fact that the value of the parameter is const is a
> language-level, semantic thing. It doesn't say anything about the
> const-ness of that memory. Only the ABI does.

Does Linux/x86 even have a proper ABI document? I've never seen one.
The closest I've seen would be the SVR4 i386 psABI, but it
deliberately doesn't define the raw syscall interface, only the
each-syscall-is-a-C-function one implemented by the C library,
and that interface doesn't suffer from the current issue.

IOW, the kernel may not be at fault if user-space code invokes int
$0x80 directly and then sees clobbered registers.

/Mikael

2002-09-19 22:41:21

by J.A. Magallon

Subject: Re: Syscall changes registers beyond %eax, on linux-i386


On 2002.09.19 Richard B. Johnson wrote:
>On Thu, 19 Sep 2002, Richard Henderson wrote:
>
[...]
>> > It's really bad code because it could have done:
>> >
>> > incl 0x04(%esp)
>> > incl 0x08(%esp)
>> > incl 0x0c(%esp)
>> > jmp bar
>>
[...]
>
>It's a problem with a 'general purpose' compiler that wants to
>be "all things" to all people. If somebody made a gcc-compatible
>compiler, tuned to the ix86 characteristics, I think we could
>cut the extra instructions by at least 1/2, maybe more.
>

Curiosity killed the cat....
Just tried it with gcc-3.2.
C code:
extern void bar(int x, int y, int z);
void foo(const int a, const int b, const int c)
{
bar(a+1, b+1, c+1);
}

- gcc -S -O0:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
subl $4, %esp
movl 16(%ebp), %eax
incl %eax
pushl %eax
movl 12(%ebp), %eax
incl %eax
pushl %eax
movl 8(%ebp), %eax
incl %eax
pushl %eax
call bar
addl $16, %esp
leave
ret

- gcc -S -O1:
pushl %ebp
movl %esp, %ebp
subl $12, %esp
movl 16(%ebp), %eax
incl %eax
pushl %eax
movl 12(%ebp), %eax
incl %eax
pushl %eax
movl 8(%ebp), %eax
incl %eax
pushl %eax
call bar
addl $16, %esp
movl %ebp, %esp
popl %ebp
ret

- gcc -S -O2:
movl 12(%esp), %eax
incl %eax
movl %eax, 12(%esp)
movl 8(%esp), %eax
incl %eax
movl %eax, 8(%esp)
movl 4(%esp), %eax
incl %eax
movl %eax, 4(%esp)
jmp bar

- gcc -S -O2 -march=[i686,pentium2,pentium3]:
incl 4(%esp)
movl 8(%esp), %eax
incl %eax
movl %eax, 8(%esp)
movl 12(%esp), %eax
incl %eax
movl %eax, 12(%esp)
jmp bar

- gcc -S -O2 -march=pentium4:
movl 8(%esp), %eax
addl $1, 4(%esp)
addl $1, %eax
movl %eax, 8(%esp)
movl 12(%esp), %eax
addl $1, %eax
movl %eax, 12(%esp)
jmp bar

--
J.A. Magallon <[email protected]> \ Software is like sex:
werewolf.able.es \ It's better when it's free
Mandrake Linux release 9.0 (Cooker) for i586
Linux 2.4.20-pre7-jam0 (gcc 3.2 (Mandrake Linux 9.0 3.2-1mdk))

2002-09-20 08:27:35

by George Anzinger

Subject: Re: Syscall changes registers beyond %eax, on linux-i386

Mikael Pettersson wrote:
>
> Daniel Jacobowitz writes:
> > That's not going to help. As Richard said, the memory in question
> > belongs to the called function. GCC knows this. It can freely modify
> > it. The fact that the value of the parameter is const is a
> > language-level, semantic thing. It doesn't say anything about the
> > const-ness of that memory. Only the ABI does.
>
> Does Linux/x86 even have a proper ABI document? I've never seen one.
> The closest I've seen would be the SVR4 i386 psABI, but it
> deliberately doesn't define the raw syscall interface, only the
> each-syscall-is-a-C-function one implemented by the C library,
> and that interface doesn't suffer from the current issue.
>
> IOW, the kernel may not be at fault if user-space code invokes int
> $0x80 directly and then sees clobbered registers.

Ah, that, indeed is the issue. As far as C is concerned,
the call is NOT a call, but a bit of asm. If the asm is
correctly written the problem goes away, not because the
register is not modified, but because C is on notice that it
MIGHT be modified and thus not to count on it.

As a practical matter, ebx is used to pass arg1 to the
kernel, so the asm code has to load it anyway. Listing it
again beyond the third ":" in the inline asm tells the
compiler not to rely on it being unmodified afterwards.
The same is true of all the registers used to
pass parameters. (These are: arg1 ebx, arg2 ecx, arg3 edx,
arg4 esi, arg5 edi, and arg6 ebp.)

So, is there a problem? Yes, neither the call stub macros
in asm/unistd.h nor those in glibc bother to list the used
registers beyond the third ":". And, if I understand this
right, the glibc code to save ebx in another register
suffers from the false assumption that THAT register can be
clobbered, but this is only true if C sees the code as a
function, not an inline asm, but most system calls in glibc
are coded as inline asm, not separate functions (not to be
confused with the C inline, which is a separate function).
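
A sketch of what such a stub could look like with GCC extended asm. It is not the asm/unistd.h or glibc macro; the name raw_syscall3 is made up for illustration. GCC does not accept a register that is both an input operand and a clobber, so the argument registers are written here as in/out operands ("+b", "+c", "+d"), which has the same effect of telling the compiler they may come back changed. On anything other than i386 the sketch simply falls back to libc's syscall(), so it still builds.

```c
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>

/* Hypothetical three-argument stub. On i386 the raw kernel convention
 * applies (negative errno in the return value); the fallback uses
 * libc's syscall(), which returns -1 and sets errno instead. */
static long raw_syscall3(long nr, long a1, long a2, long a3)
{
#if defined(__i386__)
    long ret;

    /* "+b", "+c", "+d" mark ebx, ecx and edx as read AND written, so
     * the compiler will not assume they survive the int $0x80. */
    __asm__ volatile ("int $0x80"
                      : "=a" (ret), "+b" (a1), "+c" (a2), "+d" (a3)
                      : "0" (nr)
                      : "memory", "cc");
    return ret;
#else
    return syscall(nr, a1, a2, a3);
#endif
}
```

For example, raw_syscall3(SYS_getpid, 0, 0, 0) should return the same value as getpid(). Older compilers may refuse the "b" constraint when building position-independent code, because %ebx is reserved as the GOT pointer there.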

At least that is how I see it. Comments?

-g

>
> /Mikael

--
George Anzinger [email protected]
High-res-timers:
http://sourceforge.net/projects/high-res-timers/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml

2002-09-20 12:21:25

by Richard B. Johnson

Subject: Re: Syscall changes registers beyond %eax, on linux-i386

On Fri, 20 Sep 2002, J.A. Magallon wrote:

>
> On 2002.09.19 Richard B. Johnson wrote:
> >On Thu, 19 Sep 2002, Richard Henderson wrote:
> >
> [...]
> >> > It's really bad code because it could have done:
> >> >
> >> > incl 0x04(%esp)
> >> > incl 0x08(%esp)
> >> > incl 0x0c(%esp)
> >> > jmp bar
> >>
> [...]
> >
> >It's a problem with a 'general purpose' compiler that wants to
> >be "all things" to all people. If somebody made a gcc-compatible
> >compiler, tuned to the ix86 characteristics, I think we could
> >cut the extra instructions by at least 1/2, maybe more.
> >
>
> Curiosity killed the cat....
> Just tried it with gcc-3.2.
> C code:
> extern void bar(int x, int y, int z);
> void foo(const int a, const int b, const int c)
> {
> bar(a+1, b+1, c+1);
> }
>
> - gcc -S -O0:
> pushl %ebp
> movl %esp, %ebp
> subl $8, %esp
> subl $4, %esp
> movl 16(%ebp), %eax
> incl %eax
> pushl %eax
> movl 12(%ebp), %eax
> incl %eax
> pushl %eax
> movl 8(%ebp), %eax
> incl %eax
> pushl %eax
> call bar
> addl $16, %esp
> leave
> ret
>
> - gcc -S -O1:
> pushl %ebp
> movl %esp, %ebp
> subl $12, %esp
> movl 16(%ebp), %eax
> incl %eax
> pushl %eax
> movl 12(%ebp), %eax
> incl %eax
> pushl %eax
> movl 8(%ebp), %eax
> incl %eax
> pushl %eax
> call bar
> addl $16, %esp
> movl %ebp, %esp
> popl %ebp
> ret
>
> - gcc -S -O2:
> movl 12(%esp), %eax
> incl %eax
> movl %eax, 12(%esp)
> movl 8(%esp), %eax
> incl %eax
> movl %eax, 8(%esp)
> movl 4(%esp), %eax
> incl %eax
> movl %eax, 4(%esp)
> jmp bar
>
> - gcc -S -O2 -march=[i686,pentium2,pentium3]:
> incl 4(%esp)
> movl 8(%esp), %eax
> incl %eax
> movl %eax, 8(%esp)
> movl 12(%esp), %eax
> incl %eax
> movl %eax, 12(%esp)
> jmp bar
>
> - gcc -S -O2 -march=pentium4:
> movl 8(%esp), %eax
> addl $1, 4(%esp)
> addl $1, %eax
> movl %eax, 8(%esp)
> movl 12(%esp), %eax
> addl $1, %eax
> movl %eax, 12(%esp)
> jmp bar
>
> --
> J.A. Magallon <[email protected]> \ Software is like sex:
> werewolf.able.es \ It's better when it's free
> Mandrake Linux release 9.0 (Cooker) for i586
> Linux 2.4.20-pre7-jam0 (gcc 3.2 (Mandrake Linux 9.0 3.2-1mdk))
>

Notice that it always gets some value from memory, modifies it,
then writes it back. Adding 1 to %eax is plain dumb. Those instructions
have to be fetched! Any instruction that's longer than the constant
long-word in that instruction should be reviewed. Also that 1 is
4 bytes long. It has a single-byte operand. That means the next instruction
fetch will be at an odd address if it started on even because that
sequence is 5 bytes in length.


.if 0
You can assemble this directly .....

You know there are continuous complaints about
ix86 processors being "register starved", but somehow
the 'C' compilers often don't use the capabilities that
are available with the processors. The following is some
'code' that will assemble. It doesn't do anything useful,
but shows some addressing capability that is often ignored.
.endif


foo: .long 0

bar: incl (foo) # Bump the value of foo directly
addl %eax,(foo) # Add eax to value in foo
addl $0x10,(foo) # Add constant to value in foo
addl (foo),%eax # Add value in foo to eax
pushl (foo) # Put value in foo onto stack
popl (foo) # Pop value on stack into foo
movl %eax, foo(%ebx) # Put eax value into memory at foo + ebx
incb (foo) # Atomic on a single CPU only; SMP needs a lock prefix
movl 14(%esp, %ebx), %eax # Get value from stack at offset
# ESP + EBX (good for local arrays)

.if 0

Most of the gcc code that deals with memory operands gets a value
from memory, modifies it, then writes it back. This is a "throw-back"
from processors that only have load and store operations. The ix86
processors can directly modify a single bit, anywhere in memory without
having to put it into a register. Of course, what the hardware
physically does may be quite another thing altogether. But I suggest
that the CPU/Hardware combination is more capable of doing the right
thing in executing the binary than any compiler that forces a load
into a register, modification of register contents, then a write
back to memory.

Timing tests with rdtsc show many cycles are often wasted with these forced
load and store operations.

.endif



Cheers,
Dick Johnson

2002-09-20 17:11:52

by Richard Henderson

Subject: Re: Syscall changes registers beyond %eax, on linux-i386

On Fri, Sep 20, 2002 at 08:27:32AM -0400, Richard B. Johnson wrote:
> Adding 1 to %eax is plain dumb.

No it isn't. P4 has a partial register stall on the
flags register when using incl. You'll notice that
we *do* use incl except when optimizing for P4.

> Also that 1 is 4 bytes long.

No it isn't. There is an 8-bit signed immediate form.
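
For reference, the 32-bit encodings in question, written out as byte arrays so the lengths can be compared. The array and function names are made up for illustration; the bytes are the standard IA-32 encodings.

```c
/* 32-bit mode encodings of the instructions under discussion. */
static const unsigned char enc_incl_eax[] =
    { 0x40 };                          /* incl %eax                  */
static const unsigned char enc_addl_imm8_eax[] =
    { 0x83, 0xC0, 0x01 };              /* addl $1, %eax  (imm8 form) */
static const unsigned char enc_addl_imm32_eax[] =
    { 0x05, 0x01, 0x00, 0x00, 0x00 };  /* addl $1, %eax  (imm32)     */
static const unsigned char enc_incl_mem[] =
    { 0xFF, 0x44, 0x24, 0x04 };        /* incl 4(%esp)               */
static const unsigned char enc_addl_imm8_mem[] =
    { 0x83, 0x44, 0x24, 0x04, 0x01 };  /* addl $1, 4(%esp)           */

/* The imm8 form stores its immediate in the last byte; the CPU
 * sign-extends it to 32 bits, which this helper mimics. */
static int imm8_value(const unsigned char *insn, unsigned len)
{
    return (signed char)insn[len - 1];
}
```

So the register form of addl with a small constant is 3 bytes against 1 for incl, and the memory form is 5 bytes against 4. The 5-byte register form only appears if the assembler is made to use the 32-bit immediate.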

As for the rest of the memory operand rant, the problem
is not that gcc won't try to use memory operands, it's
that the bit of code that's supposed to put these
memory operands back together is like 10 years old and
hasn't been taught about the memory aliasing subsystem.
So any time it sees a memory load cross a memory store,
it gives up.

Perhaps I'll have this fixed for gcc 3.4.



r~

2002-09-21 06:15:09

by Richard Henderson

Subject: Re: Syscall changes registers beyond %eax, on linux-i386

On Fri, Sep 20, 2002 at 01:32:05AM -0700, george anzinger wrote:
> So, is there a problem? Yes, neither the call stub macros
> in asm/unistd.h nor those in glibc bother to list the used
> registers beyond the third ":".

No, this is not the real problem. The real problem is that if
the program receives a signal during a system call, the kernel
will return all the way up to entry.S, deliver the signal and
then restart the syscall.

Except the syscall will restart with the corrupted registers.

Hilarity ensues.
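
A small user-space model of that sequence, under the assumption that the restarted call reloads its arguments from the same saved frame the first attempt scribbled on. The names are hypothetical and the restart code is only a stand-in for the kernel's internal value.

```c
#define MODEL_ERESTART (-512L)   /* stand-in for the kernel's restart code */

/* Hypothetical saved frame: only the two slots the model needs. */
struct restart_frame {
    unsigned long ebx;   /* arg1: ufds */
    unsigned long ecx;   /* arg2: nfds */
};

/* First attempt: walks the array in place (advancing the saved ebx by
 * one 8-byte pollfd per entry), then gets interrupted by a signal. */
static long model_interrupted_call(struct restart_frame *f, int interrupted)
{
    unsigned long n;

    for (n = 0; n < f->ecx; n++)
        f->ebx += 8;
    return interrupted ? MODEL_ERESTART : 0;
}

/* Returns the ufds value the restarted syscall would begin with. */
static unsigned long ufds_seen_by_restart(unsigned long ufds,
                                          unsigned long nfds)
{
    struct restart_frame f = { ufds, nfds };

    if (model_interrupted_call(&f, 1) == MODEL_ERESTART) {
        /* The signal is delivered, then the call is re-issued with
         * registers reloaded from the (now modified) frame. */
        return f.ebx;
    }
    return ufds;
}
```

With two descriptors the second attempt starts 16 bytes past the array the caller actually passed, so it reads whatever happens to follow it.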



r~

2002-09-21 08:04:35

by George Anzinger

Subject: Re: Syscall changes registers beyond %eax, on linux-i386

Richard Henderson wrote:
>
> On Fri, Sep 20, 2002 at 01:32:05AM -0700, george anzinger wrote:
> > So, is there a problem? Yes, neither the call stub macros
> > in asm/unistd.h nor those in glibc bother to list the used
> > registers beyond the third ":".
>
> No, this is not the real problem. The real problem is that if
> the program receives a signal during a system call, the kernel
> will return all the way up to entry.S, deliver the signal and
> then restart the syscall.
>
> Except the syscall will restart with the corrupted registers.
>
> Hilarity ensues.
>
I submit that BOTH of these are problems. And only the
kernel can fix the latter.

-g
--
George Anzinger [email protected]
High-res-timers:
http://sourceforge.net/projects/high-res-timers/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml

2002-09-21 15:03:18

by Richard Henderson

Subject: Re: Syscall changes registers beyond %eax, on linux-i386

On Sat, Sep 21, 2002 at 01:09:12AM -0700, george anzinger wrote:
> > Except the syscall will restart with the corrupted registers.
> >
> > Hilarity ensues.
> >
> I submit that BOTH of these are problems. And only the
> kernel can fix the latter.

If the latter is fixed, so is the former.


r~

2002-09-22 20:47:05

by Pavel Machek

Subject: Re: Syscall changes registers beyond %eax, on linux-i386

Hi!

> It's a problem with a 'general purpose' compiler that wants to
> be "all things" to all people. If somebody made a gcc-compatible
> compiler, tuned to the ix86 characteristics, I think we could
> cut the extra instructions by at least 1/2, maybe more.

Remember pgcc?

And btw cutting instructions by 1/2 might look nice but unless you can
keep it as fast as it was, it's useless.
Pavel
--
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.

2002-09-23 13:04:39

by Richard B. Johnson

Subject: Re: Syscall changes registers beyond %eax, on linux-i386

On Sun, 22 Sep 2002, Pavel Machek wrote:

> Hi!
>
> > It's a problem with a 'general purpose' compiler that wants to
> > be "all things" to all people. If somebody made a gcc-compatible
> > compiler, tuned to the ix86 characteristics, I think we could
> > cut the extra instructions by at least 1/2, maybe more.
>
> Remember pgcc?
>
> And btw cutting instructions by 1/2 might look nice but unless you can
> keep it as fast as it was, it's useless.
> Pavel
> --
Yes, but to see the effect of cutting down the instruction length, you
need to make benchmarks that emulate running 'forever'. Many benchmarks
access some memory over and over again in a loop. This does
not exercise the need to refill prefetch so the benchmarks ignore
the advantages obtained by reducing the number of instructions that
need to be fetched from memory.

Cheers,
Dick Johnson

2002-09-23 18:28:32

by Pavel Machek

Subject: Re: Syscall changes registers beyond %eax, on linux-i386

Hi!

> > > It's a problem with a 'general purpose' compiler that wants to
> > > be "all things" to all people. If somebody made a gcc-compatible
> > > compiler, tuned to the ix86 characteristics, I think we could
> > > cut the extra instructions by at least 1/2, maybe more.
> >
> > Remember pgcc?
> >
> > And btw cutting instructions by 1/2 might look nice but unless you can
> > keep it as fast as it was, it's useless.
> > Pavel
> > --
> Yes, but to see the effect of cutting down the instruction length, you
> need to make benchmarks that emulate running 'forever'. Many bench-

Specs contain things like perl and gcc, those are I believe far too
big to be put entirely into cache and emulate "Real Life" quite
well...

Pavel
--
Casualities in World Trade Center: ~3k dead inside the building,
cryptography in U.S.A. and free speech in Czech Republic.