Hi,
I'm just looking into a very strange problem. Some of our systems have
athlon64 CPUs. Due to our diskless nfs environment we currently still prefer
a 32bit userspace environment, but would like to be able to use a 64-bit
chroot environment.
Well, currently there seems to be a stat64() NFS problem when an x86_64 kernel
is booted and stat64() is called from a 32-bit libc.
Here's just an example:
hitchcock:/home/bernd/src/tests# ./test_stat64 /mnt/test/yp
stat() works fine.
hitchcock:/home/bernd/src/tests# ./test_stat32 /mnt/test/yp
stat for /mnt/test/yp failed
The test program looks rather simple:
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
	char *dir;
	struct stat buf;

	/* a path argument is required */
	if (argc < 2) {
		fprintf(stderr, "usage: %s <path>\n", argv[0]);
		return (1);
	}
	dir = argv[1];

	/* stat() returns -1 on failure and sets errno */
	if (stat (dir, &buf) == -1)
		fprintf(stderr, "stat for %s failed \n", dir);
	else
		fprintf(stderr, "stat() works fine.\n");
	return (0);
}
Here are the strace outputs:
=====================
32bit:
------
hitchcock:/home/bernd/src/tests# strace32 ./test_stat32 /mnt/test/yp
execve("./test_stat32", ["./test_stat32", "/mnt/test/yp"], [/* 39 vars */]) = 0
uname({sys="Linux", node="hitchcock", ...}) = 0
brk(0) = 0x80ad000
brk(0x80ce000) = 0x80ce000
stat64("/mnt/test/yp", {st_mode=S_IFDIR|0755, st_size=2704, ...}) = 0
write(2, "stat for /mnt/test/yp failed \n", 30stat for /mnt/test/yp failed
) = 30
exit_group(0) = ?
64bit:
-------
hitchcock:/home/bernd/src/tests# strace ./test_stat64 /mnt/test/yp
execve("./test_stat64", ["./test_stat64", "/mnt/test/yp"], [/* 39 vars */]) = 0
uname({sys="Linux", node="hitchcock", ...}) = 0
brk(0) = 0x572000
brk(0x593000) = 0x593000
stat("/mnt/test/yp", {st_mode=S_IFDIR|0755, st_size=2704, ...}) = 0
write(2, "stat() works fine.\n", 19stat() works fine.
) = 19
_exit(0) = ?
Does anyone have an idea what's going on? The ethereal capture also looks pretty
normal. The kernel of this system is 2.6.9, but it also happens on another
system with 2.6.11-rc5.
As usual we are using unfs3 for /etc and /var, but to me that looks like a
client problem. I'm not even sure if this is limited to NFS at all.
Thanks in advance,
Bernd
> As usual we are using unfs3 for /etc and /var, but to me that looks like a
> client problem. I'm not even sure if this is limited to NFS at all.
Sorry, that was easy to test, of course. This problem doesn't seem to exist on
a local disk.
> 32bit:
> ------
> hitchcock:/home/bernd/src/tests# strace32 ./test_stat32 /mnt/test/yp
> execve("./test_stat32", ["./test_stat32", "/mnt/test/yp"], [/* 39 vars */]) = 0
> uname({sys="Linux", node="hitchcock", ...}) = 0
> brk(0) = 0x80ad000
> brk(0x80ce000) = 0x80ce000
> stat64("/mnt/test/yp", {st_mode=S_IFDIR|0755, st_size=2704, ...}) = 0
It returns 0 which is success. How can it match this code?
if (stat (dir, &buf) == -1)
	fprintf(stderr, "stat for %s failed \n", dir);
It is most likely some kind of user space problem. I would change
it to int err = stat(dir, &buf);
and then go through it with gdb and see what value err gets assigned.
I cannot see any kernel problem.
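
A minimal sketch of that change, assuming the same stat() call as in the test program. The helper name check_stat is hypothetical, and printing errno goes a little beyond what is suggested above; it is only meant to make the failure reason visible without gdb:

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of the original test program):
 * calls stat() on path, keeps the return value in err, and reports
 * errno when the call fails.
 * Returns 0 on success, otherwise the errno value left by stat(). */
int check_stat(const char *path)
{
    struct stat buf;
    int err = stat(path, &buf);

    if (err == -1) {
        int saved = errno;  /* save it before fprintf() can change it */
        fprintf(stderr, "stat for %s failed\n", path);
        fprintf(stderr, "errno: %d (%s)\n", saved, strerror(saved));
        return saved;
    }
    fprintf(stderr, "stat() works fine.\n");
    return 0;
}
```

Calling check_stat(argv[1]) from main() and setting a breakpoint on the line after stat() should show the value assigned to err.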
> write(2, "stat for /mnt/test/yp failed \n", 30stat for /mnt/test/yp failed
> ) = 30
> exit_group(0) = ?
-Andi
Hello Andi,
sorry, due to some mail sending/refusing problems, I had to resend to the
nfs-list, which prevented the answers there from being posted to the other CCs.
> It is most likely some kind of user space problem. I would change
> it to int err = stat(dir, &buf);
> and then go through it with gdb and see what value err gets assigned.
>
> I cannot see any kernel problem.
The err value will become -1 here.
Trond Myklebust already suggested looking at errno:
On Tuesday 01 March 2005 00:43, Bernd Schubert wrote:
> On Monday 28 February 2005 23:26, you wrote:
> > Given that strace shows that both syscalls (stat64() and stat())
> > succeed, I expect the "problem" is probably just glibc setting an
> > EOVERFLOW error in the 32-bit case. That's what it is supposed to do if
> > a 64 bit value overflows the 32-bit buffers.
>
> Right, thanks.
>
> > Have you tried looking at errno?
>
> bernd@hitchcock tests>./test_stat32 /mnt/test/yp
> stat for /mnt/test/yp failed
> ernno: 75 (Value too large for defined data type)
>
> But why does stat64() on a 64-bit kernel try to fill in larger data than
> on a 32-bit kernel, and larger data only for NFS mount points? Hmm, I
> will compare the TCP packets sent by the server tomorrow.
So I still think that's a kernel bug.
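
For illustration, here is a rough sketch of the kind of range check Trond describes, where a 64-bit value is copied into a 32-bit field and EOVERFLOW is reported if it does not fit. The function name narrow_to_u32 is hypothetical and this is not glibc's actual code; the thread also does not say which field of the stat result overflowed:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical helper: copy a 64-bit value into a 32-bit field.
 * Returns 0 on success, or -1 with errno set to EOVERFLOW when the
 * value is too large for the 32-bit destination. */
int narrow_to_u32(uint64_t wide, uint32_t *narrow)
{
    if (wide > UINT32_MAX) {
        errno = EOVERFLOW;
        return -1;
    }
    *narrow = (uint32_t)wide;
    return 0;
}
```

With a check like this the syscall itself can succeed, as strace shows, while the 32-bit caller still sees -1 and errno 75.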
Thanks,
Bernd
-- 
Bernd Schubert
Physikalisch Chemisches Institut / Theoretische Chemie
Universität Heidelberg
INF 229
69120 Heidelberg
e-mail: [email protected]