2005-02-28 20:54:03

by Bernd Schubert

Subject: x86_64: 32bit emulation problems

Hi,

I'm just looking into a very strange problem. Some of our systems have
athlon64 CPUs. Due to our diskless nfs environment we currently still prefer
a 32bit userspace environment, but would like to be able to use a 64-bit
chroot environment.

Well, currently there seems to be a stat64() problem on NFS when an x86_64
kernel is booted and stat64() is called from a 32-bit libc.

Here's just an example:

hitchcock:/home/bernd/src/tests# ./test_stat64 /mnt/test/yp
stat() works fine.


hitchcock:/home/bernd/src/tests# ./test_stat32 /mnt/test/yp
stat for /mnt/test/yp failed


The test program looks rather simple:

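#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <stdlib.h>


int main(int argc, char **argv)
{
	char *dir;
	struct stat buf;

	/* Require a path argument; argv[1] is NULL otherwise. */
	if (argc < 2) {
		fprintf(stderr, "usage: %s <path>\n", argv[0]);
		return (1);
	}
	dir = argv[1];

	/* stat() returns -1 on failure and sets errno. */
	if (stat (dir, &buf) == -1)
		fprintf(stderr, "stat for %s failed \n", dir);
	else
		fprintf(stderr, "stat() works fine.\n");
	return (0);
}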


Here are the strace outputs:
=====================

32bit:
------
hitchcock:/home/bernd/src/tests# strace32 ./test_stat32 /mnt/test/yp
execve("./test_stat32", ["./test_stat32", "/mnt/test/yp"], [/* 39 vars */]) = 0
uname({sys="Linux", node="hitchcock", ...}) = 0
brk(0) = 0x80ad000
brk(0x80ce000) = 0x80ce000
stat64("/mnt/test/yp", {st_mode=S_IFDIR|0755, st_size=2704, ...}) = 0
write(2, "stat for /mnt/test/yp failed \n", 30stat for /mnt/test/yp failed
) = 30
exit_group(0) = ?

64bit:
-------
hitchcock:/home/bernd/src/tests# strace ./test_stat64 /mnt/test/yp
execve("./test_stat64", ["./test_stat64", "/mnt/test/yp"], [/* 39 vars */]) = 0
uname({sys="Linux", node="hitchcock", ...}) = 0
brk(0) = 0x572000
brk(0x593000) = 0x593000
stat("/mnt/test/yp", {st_mode=S_IFDIR|0755, st_size=2704, ...}) = 0
write(2, "stat() works fine.\n", 19stat() works fine.
) = 19
_exit(0) = ?



Does anyone have an idea what's going on? The ethereal capture also looks
pretty normal. The kernel of this system is 2.6.9, but it also happens on
another system with 2.6.11-rc5.
As usual we are using unfs3 for /etc and /var, but for me that looks like a
client problem. I'm even not sure if this is limited to NFS at all.


Thanks in advance,
Bernd


2005-02-28 21:00:11

by Bernd Schubert

Subject: Re: x86_64: 32bit emulation problems


> As usual we are using unfs3 for /etc and /var, but for me that looks like a
> client problem. I'm even not sure if this is limited to NFS at all.

Sorry, that was easy to test, of course. This problem doesn't seem to exist on
a local disk.

2005-03-01 20:24:17

by Andi Kleen

Subject: Re: x86_64: 32bit emulation problems

> 32bit:
> ------
> hitchcock:/home/bernd/src/tests# strace32 ./test_stat32 /mnt/test/yp
> execve("./test_stat32", ["./test_stat32", "/mnt/test/yp"], [/* 39 vars */]) = 0
> uname({sys="Linux", node="hitchcock", ...}) = 0
> brk(0) = 0x80ad000
> brk(0x80ce000) = 0x80ce000
> stat64("/mnt/test/yp", {st_mode=S_IFDIR|0755, st_size=2704, ...}) = 0

It returns 0 which is success. How can it match this code?

if (stat (dir, &buf) == -1)
fprintf(stderr, "stat for %s failed \n", dir);

It is most likely some kind of user space problem. I would change
it to int err = stat(dir, &buf);
and then go through it with gdb and see what value err gets assigned.

I cannot see any kernel problem.
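
The change suggested above could look roughly like the following. This is only a minimal sketch: the helper name `report_stat()` is hypothetical and not part of the original test program. It stores the return value of stat() in `err` and saves errno right away, so the reason for a failure can be printed instead of being inspected in gdb.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper, not part of the original test program.
 * Returns 0 if stat() succeeded, otherwise the saved errno value.
 */
int report_stat(const char *path)
{
    struct stat buf;
    int err = stat(path, &buf);  /* keep the return value, as suggested */
    int saved_errno = errno;     /* save it before another libc call can change it */

    if (err == -1) {
        fprintf(stderr, "stat for %s failed: err=%d errno=%d (%s)\n",
                path, err, saved_errno, strerror(saved_errno));
        return saved_errno;
    }

    fprintf(stderr, "stat() works fine.\n");
    return 0;
}
```

Calling `report_stat(argv[1])` from main() in place of the bare stat() check would show both the return value and the errno text for the failing path.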

> write(2, "stat for /mnt/test/yp failed \n", 30stat for /mnt/test/yp failed
> ) = 30
> exit_group(0) = ?

-Andi

2005-03-01 21:07:01

by Bernd Schubert

Subject: Re: x86_64: 32bit emulation problems

Hello Andi,

Sorry, due to some mail delivery problems I had to resend to the NFS list,
which prevented the answers there from reaching the other CCs.

> It is most likely some kind of user space problem. I would change
> it to int err = stat(dir, &buf);
> and then go through it with gdb and see what value err gets assigned.
>
> I cannot see any kernel problem.

The err value will become -1 here.

Trond Myklebust already suggested looking at the value of errno:

On Tuesday 01 March 2005 00:43, Bernd Schubert wrote:
> On Monday 28 February 2005 23:26, you wrote:
> > Given that strace shows that both syscalls (stat64() and stat())
> > succeed, I expect the "problem" is probably just glibc setting an
> > EOVERFLOW error in the 32-bit case. That's what it is supposed to do if
> > a 64 bit value overflows the 32-bit buffers.
>
> Right, thanks.
>
> > Have you tried looking at errno?
>
> bernd@hitchcock tests>./test_stat32 /mnt/test/yp
> stat for /mnt/test/yp failed
> ernno: 75 (Value too large for defined data type)
>
> But why does stat64() on a 64-bit kernel try to fill in larger data than
> on a 32-bit kernel, and larger data only for NFS mount points? Hmm, I
> will compare the TCP packets sent by the server tomorrow.

So I still think that's a kernel bug.
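
The EOVERFLOW behaviour Trond describes can be illustrated with a simplified model. This is a sketch and not glibc's actual code: the struct names `wide_stat` and `narrow_stat` and the function `narrow_stat_copy()` are hypothetical, and only three fields are modeled. It copies 64-bit values into 32-bit fields and fails with EOVERFLOW when a value would be truncated, even though the underlying syscall itself succeeded.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for what a stat64()-style syscall returns. */
struct wide_stat {
    uint64_t ino;
    int64_t  size;
    uint64_t blocks;
};

/* Hypothetical stand-in for a 32-bit struct stat. */
struct narrow_stat {
    uint32_t ino;
    int32_t  size;
    uint32_t blocks;
};

/*
 * Copy the wide result into the narrow buffer.
 * Returns 0 on success, or -1 with errno set to EOVERFLOW
 * if any value does not fit into its 32-bit field.
 */
int narrow_stat_copy(const struct wide_stat *in, struct narrow_stat *out)
{
    if (in->ino > UINT32_MAX ||
        in->size > INT32_MAX ||
        in->blocks > UINT32_MAX) {
        errno = EOVERFLOW;
        return -1;
    }

    out->ino = (uint32_t)in->ino;
    out->size = (int32_t)in->size;
    out->blocks = (uint32_t)in->blocks;
    return 0;
}
```

In this model a size of 2704, as seen in the strace output, fits easily, so the failure would have to come from some other field being wider than 32 bits. Which field, if any, exceeded 32 bits on the NFS mount is not established in this thread.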


Thanks,
Bernd

-- 
Bernd Schubert
Physikalisch Chemisches Institut / Theoretische Chemie
Universität Heidelberg
INF 229
69120 Heidelberg
e-mail: [email protected]