2005-02-28 20:55:07

by Bernd Schubert

Subject: x86_64: 32bit emulation problems

Hi,

I'm just looking into a very strange problem. Some of our systems have
Athlon64 CPUs. Due to our diskless NFS environment we currently still prefer
a 32bit userspace environment, but would like to be able to use a 64-bit
chroot environment.

Well, currently there seems to be a stat64() NFS problem when an x86_64 kernel
is booted and stat64() comes from a 32bit libc.

Here's just an example:

hitchcock:/home/bernd/src/tests# ./test_stat64 /mnt/test/yp
stat() works fine.


hitchcock:/home/bernd/src/tests# ./test_stat32 /mnt/test/yp
stat for /mnt/test/yp failed


The test program looks rather simple:

#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <stdlib.h>


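int main(int argc, char **argv)
{
	char *dir;
	struct stat buf;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <path>\n", argv[0]);
		return 1;
	}
	dir = argv[1];

	/* Report only whether stat() succeeded for the given path. */
	if (stat (dir, &buf) == -1)
		fprintf(stderr, "stat for %s failed \n", dir);
	else
		fprintf(stderr, "stat() works fine.\n");
	return (0);
}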


Here are the strace outputs:
=====================

32bit:
------
hitchcock:/home/bernd/src/tests# strace32 ./test_stat32 /mnt/test/yp
execve("./test_stat32", ["./test_stat32", "/mnt/test/yp"], [/* 39 vars */]) = 0
uname({sys="Linux", node="hitchcock", ...}) = 0
brk(0) = 0x80ad000
brk(0x80ce000) = 0x80ce000
stat64("/mnt/test/yp", {st_mode=S_IFDIR|0755, st_size=2704, ...}) = 0
write(2, "stat for /mnt/test/yp failed \n", 30stat for /mnt/test/yp failed
) = 30
exit_group(0) = ?

64bit:
-------
hitchcock:/home/bernd/src/tests# strace ./test_stat64 /mnt/test/yp
execve("./test_stat64", ["./test_stat64", "/mnt/test/yp"], [/* 39 vars */]) = 0
uname({sys="Linux", node="hitchcock", ...}) = 0
brk(0) = 0x572000
brk(0x593000) = 0x593000
stat("/mnt/test/yp", {st_mode=S_IFDIR|0755, st_size=2704, ...}) = 0
write(2, "stat() works fine.\n", 19stat() works fine.
) = 19
_exit(0) = ?



Does anyone have an idea what's going on? The Ethereal capture also looks pretty
normal. The kernel on this system is 2.6.9, but it also happens on another
system with 2.6.11-rc5.
As usual we are using unfs3 for /etc and /var, but to me this looks like a
client problem. I'm not even sure this is limited to NFS at all.


Thanks in advance,
Bernd


2005-02-28 21:04:54

by Bernd Schubert

Subject: Re: x86_64: 32bit emulation problems


> As usual we are using unfs3 for /etc and /var, but to me this looks like a
> client problem. I'm not even sure this is limited to NFS at all.

Sorry, that was easy to test, of course. This problem doesn't seem to exist on
a local disk.

2005-03-01 20:24:23

by Andi Kleen

Subject: Re: x86_64: 32bit emulation problems

> 32bit:
> ------
> hitchcock:/home/bernd/src/tests# strace32 ./test_stat32 /mnt/test/yp
> execve("./test_stat32", ["./test_stat32", "/mnt/test/yp"], [/* 39 vars */]) = 0
> uname({sys="Linux", node="hitchcock", ...}) = 0
> brk(0) = 0x80ad000
> brk(0x80ce000) = 0x80ce000
> stat64("/mnt/test/yp", {st_mode=S_IFDIR|0755, st_size=2704, ...}) = 0

It returns 0 which is success. How can it match this code?

if (stat (dir, &buf) == -1)
fprintf(stderr, "stat for %s failed \n", dir);

It is most likely some kind of user space problem. I would change
it to int err = stat(dir, &buf);
and then go through it with gdb and see what value err gets assigned.

I cannot see any kernel problem.

> write(2, "stat for /mnt/test/yp failed \n", 30stat for /mnt/test/yp failed
> ) = 30
> exit_group(0) = ?

-Andi

2005-03-01 21:07:49

by Bernd Schubert

Subject: Re: x86_64: 32bit emulation problems

Hello Andi,

Sorry, due to some mail sending/refusing problems I had to resend to the
NFS list, which prevented the answers there from reaching the other CCs.

> It is most likely some kind of user space problem. I would change
> it to int err = stat(dir, &buf);
> and then go through it with gdb and see what value err gets assigned.
>
> I cannot see any kernel problem.

The err value will become -1 here.

Trond Myklebust already suggested looking at errno:

On Tuesday 01 March 2005 00:43, Bernd Schubert wrote:
> On Monday 28 February 2005 23:26, you wrote:
> > Given that strace shows that both syscalls (stat64() and stat())
> > succeed, I expect the "problem" is probably just glibc setting an
> > EOVERFLOW error in the 32-bit case. That's what it is supposed to do if
> > a 64 bit value overflows the 32-bit buffers.
>
> Right, thanks.
>
> > Have you tried looking at errno?
>
> bernd@hitchcock tests>./test_stat32 /mnt/test/yp
> stat for /mnt/test/yp failed
> ernno: 75 (Value too large for defined data type)
>
> But why does stat64() on a 64-bit kernel try to fill in larger data than
> on a 32-bit kernel, and why only for NFS mount points? Hmm, tomorrow I
> will compare the TCP packets sent by the server.

So I still think that's a kernel bug.


Thanks,
Bernd

--
Bernd Schubert
Physikalisch Chemisches Institut / Theoretische Chemie
Universität Heidelberg
INF 229
69120 Heidelberg
e-mail: [email protected]

2005-03-01 21:48:36

by Andi Kleen

Subject: Re: x86_64: 32bit emulation problems

On Tue, Mar 01, 2005 at 10:07:01PM +0100, Bernd Schubert wrote:
> Hello Andi,
>
> Sorry, due to some mail sending/refusing problems I had to resend to the
> NFS list, which prevented the answers there from reaching the other CCs.
>
> > It is most likely some kind of user space problem. I would change
> > it to int err = stat(dir, &buf);
> > and then go through it with gdb and see what value err gets assigned.
> >
> > I cannot see any kernel problem.
>
> The err value will become -1 here.

strace didn't say so, and normally it doesn't lie about things like this.

> > bernd@hitchcock tests>./test_stat32 /mnt/test/yp
> > stat for /mnt/test/yp failed
> > ernno: 75 (Value too large for defined data type)

errno is undefined unless a system call returned -1 before or
you set it to 0 before.

> > But why does stat64() on a 64-bit kernel try to fill in larger data than

A 64bit kernel has no stat64(). All stats are 64bit.

> > on a 32-bit kernel, and why only for NFS mount points? Hmm, tomorrow I
> > will compare the TCP packets sent by the server.
>
> So I still think that's a kernel bug.

Your data so far doesn't support this assertion.

-Andi

2005-03-01 22:10:59

by Andreas Schwab

Subject: Re: x86_64: 32bit emulation problems

Bernd Schubert <[email protected]> writes:

>> It is most likely some kind of user space problem. I would change
>> it to int err = stat(dir, &buf);
>> and then go through it with gdb and see what value err gets assigned.
>>
>> I cannot see any kernel problem.
>
> The err value will become -1 here.

That's because there are some values in the stat64 buffer delivered by the
kernel which cannot be packed into the stat buffer that you pass to stat.
Use stat64 or _FILE_OFFSET_BITS=64.

> Trond Myklebust already suggested looking at errno:
>
> On Tuesday 01 March 2005 00:43, Bernd Schubert wrote:
>> On Monday 28 February 2005 23:26, you wrote:
>> > Given that strace shows that both syscalls (stat64() and stat())
>> > succeed,

The trace does not say anything about the user-level stat().

>> bernd@hitchcock tests>./test_stat32 /mnt/test/yp
>> stat for /mnt/test/yp failed
>> ernno: 75 (Value too large for defined data type)
>>
>> But why does stat64() on a 64-bit kernel try to fill in larger data than
>> on a 32-bit kernel, and why only for NFS mount points? Hmm, tomorrow I
>> will compare the TCP packets sent by the server.
>
> So I still think that's a kernel bug.

This has nothing to do with the kernel.

Andreas.

--
Andreas Schwab, SuSE Labs, [email protected]
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."

2005-03-01 22:20:21

by Andi Kleen

Subject: Re: x86_64: 32bit emulation problems

On Tue, Mar 01, 2005 at 11:10:38PM +0100, Andreas Schwab wrote:
> That's because there are some values in the stat64 buffer delivered by the
> kernel which cannot be packed into the stat buffer that you pass to stat.
> Use stat64 or _FILE_OFFSET_BITS=64.

If that had been the case strace would have reported EOVERFLOW
or E2BIG. But it returned 0 according to the log that was posted.

-Andi

2005-03-01 22:33:16

by Bernd Schubert

Subject: Re: x86_64: 32bit emulation problems

> strace didn't say so, and normally it doesn't lie about things like this.

Well, here are the updated source code and strace output. If you still
don't believe me, ask me for a login to our system ;)


#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <stdlib.h>


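int main(int argc, char **argv)
{
	char *dir;
	struct stat *buf;
	int err, saved_errno;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <path>\n", argv[0]);
		return 1;
	}
	dir = argv[1];

	buf = malloc(sizeof(struct stat));
	if (buf == NULL)
		return 1;

	errno = 0;

	/* Keep the return value and errno so they can be inspected separately. */
	err = stat(dir, buf);
	saved_errno = errno;	/* the fprintf() calls below may overwrite errno */
	if ( err ) {
		fprintf(stderr, "err = %i\n", err);
		fprintf(stderr, "stat for %s failed \n", dir);
		fprintf(stderr, "ernno: %i (%s)\n", saved_errno, strerror(saved_errno));
	} else
		fprintf(stderr, "stat() works fine.\n");

	free(buf);
	return (0);
}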


>
> > > bernd@hitchcock tests>./test_stat32 /mnt/test/yp
> > > stat for /mnt/test/yp failed
> > > ernno: 75 (Value too large for defined data type)
>
> errno is undefined unless a system call returned -1 before or
> you set it to 0 before.

See above.

>
> > > But why does stat64() on a 64-bit kernel try to fill in larger data
> > > than
>
> A 64bit kernel has no stat64(). All stats are 64bit.

bernd@hitchcock tests>strace32 ./test_stat32 /mnt/test/yp
execve("./test_stat32", ["./test_stat32", "/mnt/test/yp"], [/* 43 vars */]) = 0
uname({sys="Linux", node="hitchcock", ...}) = 0
brk(0) = 0x80ad000
brk(0x80ce000) = 0x80ce000
stat64("/mnt/test/yp", {st_mode=S_IFDIR|0755, st_size=2704, ...}) = 0
write(2, "err = -1\n", 9err = -1
) = 9
write(2, "stat for /mnt/test/yp failed \n", 30stat for /mnt/test/yp failed
) = 30
write(2, "ernno: 75 (Value too large for d"..., 50ernno: 75 (Value too large for defined data type)
) = 50
exit_group(0) = ?

You certainly know much better than I do, but I think strace shows that it's
calling stat64.

>
> > > on a 32-bit kernel, and why only for NFS mount points? Hmm,
> > > tomorrow I will compare the TCP packets sent by the server.
> >
> > So I still think that's a kernel bug.
>
> Your data so far doesn't support this assertion.

I have to admit that knfsd mount points are not affected, but on the other
hand, I really can't see anything in the Ethereal captures. If anyone is
interested, I have uploaded them:

http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/nfs-stat/


Cheers,
Bernd



2005-03-01 23:08:07

by Andi Kleen

Subject: Re: x86_64: 32bit emulation problems

> stat64("/mnt/test/yp", {st_mode=S_IFDIR|0755, st_size=2704, ...}) = 0

It returns 0. No error. Someone else in user space must be adding the EOVERFLOW.
glibc code does quite a lot of strange things with stat, perhaps
it comes from there.

> write(2, "err = -1\n", 9err = -1
> ) = 9
> write(2, "stat for /mnt/test/yp failed \n", 30stat for /mnt/test/yp failed
> ) = 30
> write(2, "ernno: 75 (Value too large for d"..., 50ernno: 75 (Value too large for defined data type)
> ) = 50
> exit_group(0) = ?

-Andi

2005-03-01 23:19:45

by Bernd Schubert

Subject: Re: x86_64: 32bit emulation problems

On Tuesday 01 March 2005 23:10, Andreas Schwab wrote:
> Bernd Schubert <[email protected]> writes:
> >> It is most likely some kind of user space problem. I would change
> >> it to int err = stat(dir, &buf);
> >> and then go through it with gdb and see what value err gets assigned.
> >>
> >> I cannot see any kernel problem.
> >
> > The err value will become -1 here.
>
> That's because there are some values in the stat64 buffer delivered by the
> kernel which cannot be packed into the stat buffer that you pass to stat.
> Use stat64 or _FILE_OFFSET_BITS=64.

Hmm, after compiling with -D_FILE_OFFSET_BITS=64 it works fine. But why does
it work without this option on a 32bit kernel, but not on a 64bit kernel?

32bit kernel, 32bit binary: always works
64bit kernel, 64bit binary: always works

64bit kernel, 32bit binary:
- always works on knfsd mount points
- always works with -D_FILE_OFFSET_BITS=64
- only works on unfs3 mount points with _FILE_OFFSET_BITS=64


Do I really have to write a bug report for every single Debian package that
accesses /etc and /var to make the maintainers recompile it with
-D_FILE_OFFSET_BITS=64?
Btw, what about SuSE: are all packages there compiled with this option? ;)


Cheers,
(a completely confused) Bernd

2005-03-01 23:22:13

by Andreas Schwab

Subject: Re: x86_64: 32bit emulation problems

Andi Kleen <[email protected]> writes:

> On Tue, Mar 01, 2005 at 11:10:38PM +0100, Andreas Schwab wrote:
>> That's because there are some values in the stat64 buffer delivered by the
>> kernel which cannot be packed into the stat buffer that you pass to stat.
>> Use stat64 or _FILE_OFFSET_BITS=64.
>
> If that had been the case strace would have reported EOVERFLOW
> or E2BIG.

No, the values are ok for stat64.

Andreas.


2005-03-01 23:41:20

by Andreas Schwab

Subject: Re: x86_64: 32bit emulation problems

Bernd Schubert <[email protected]> writes:

> Hmm, after compiling with -D_FILE_OFFSET_BITS=64 it works fine. But why does
> it work without this option on a 32bit kernel, but not on a 64bit kernel?

Most likely the inode number (which is the only non-filesize related item
that is different between struct stat and struct stat64) overflows ino_t.

Andreas.


2005-03-01 23:46:46

by Andreas Schwab

Subject: Re: x86_64: 32bit emulation problems

Bernd Schubert <[email protected]> writes:

> Hmm, after compiling with -D_FILE_OFFSET_BITS=64 it works fine. But why does
> it work without this option on a 32bit kernel, but not on a 64bit kernel?

See nfs_fileid_to_ino_t for why the inode number is different between
32bit and 64bit kernels.

Andreas.


2005-03-02 08:19:06

by Andi Kleen

Subject: Re: x86_64: 32bit emulation problems

On Wed, Mar 02, 2005 at 12:46:23AM +0100, Andreas Schwab wrote:
> Bernd Schubert <[email protected]> writes:
>
> > Hmm, after compiling with -D_FILE_OFFSET_BITS=64 it works fine. But why does
> > it work without this option on a 32bit kernel, but not on a 64bit kernel?
>
> See nfs_fileid_to_ino_t for why the inode number is different between
> 32bit and 64bit kernels.

Ok that explains it. Thanks.

It would probably be best to just do the shift unconditionally on 64bit kernels
too.

Trond, what do you think?

-Andi

2005-03-02 09:13:52

by Trond Myklebust

Subject: Re: x86_64: 32bit emulation problems

On Wednesday 02.03.2005 at 09:18 (+0100), Andi Kleen wrote:
> On Wed, Mar 02, 2005 at 12:46:23AM +0100, Andreas Schwab wrote:
> > Bernd Schubert <[email protected]> writes:
> >
> > > Hmm, after compiling with -D_FILE_OFFSET_BITS=64 it works fine. But why does
> > > it work without this option on a 32bit kernel, but not on a 64bit kernel?
> >
> > See nfs_fileid_to_ino_t for why the inode number is different between
> > 32bit and 64bit kernels.
>
> Ok that explains it. Thanks.
>
> It would probably be best to just do the shift unconditionally on 64bit kernels
> too.
>
> Trond, what do you think?

Why would this be more appropriate than defining __kernel_ino_t on the
x86_64 platform to be of the size that you actually want the kernel to
support?

I can see no good reason for truncating inode number values on platforms
that actually do support 64-bit inode numbers, but I can see several
reasons why you might want not to (utilities that need to detect hard
linked files for instance).

Cheers,
Trond
--
Trond Myklebust <[email protected]>

2005-03-02 11:35:13

by Bernd Schubert

Subject: Re: x86_64: 32bit emulation problems

On Wednesday 02 March 2005 10:13, Trond Myklebust wrote:
> On Wednesday 02.03.2005 at 09:18 (+0100), Andi Kleen wrote:
> > On Wed, Mar 02, 2005 at 12:46:23AM +0100, Andreas Schwab wrote:
> > > Bernd Schubert <[email protected]> writes:
> > > > Hmm, after compiling with -D_FILE_OFFSET_BITS=64 it works fine. But
> > > > why does it work without this option on a 32bit kernel, but not on a
> > > > 64bit kernel?
> > >
> > > See nfs_fileid_to_ino_t for why the inode number is different between
> > > 32bit and 64bit kernels.
> >
> > Ok that explains it. Thanks.

Many thanks also from me!

> >
> > It would probably be best to just do the shift unconditionally on 64bit
> > kernels too.
> >
> > Trond, what do you think?
>
> Why would this be more appropriate than defining __kernel_ino_t on the
> x86_64 platform to be of the size that you actually want the kernel to
> support?
>
> I can see no good reason for truncating inode number values on platforms
> that actually do support 64-bit inode numbers, but I can see several

Well, at least we would have a reason ;)

> reasons why you might want not to (utilities that need to detect hard
> linked files for instance).

Anyway, glibc already seems to have a check for that, so IMHO glibc could
also truncate the inode numbers if needed. After all, glibc probably knows
best whether it's compiled as 32bit or 64bit. I will take a look at the glibc
sources.

Many, many thanks to all for their help!

Best wishes,
Bernd

2005-03-02 16:58:28

by Trond Myklebust

Subject: Re: x86_64: 32bit emulation problems

On Wednesday 02.03.2005 at 12:33 (+0100), Bernd Schubert wrote:

> > I can see no good reason for truncating inode number values on platforms
> > that actually do support 64-bit inode numbers, but I can see several
>
> Well, at least we would have a reason ;)

A 32-bit emulation mode is clearly a "platform" which does NOT support
64-bit inode numbers, however there is (currently) no way for the kernel
to detect that you are running that. Any extra truncation should
therefore ideally be done by the emulation layer rather than the kernel
itself.

Cheers,
Trond

2005-03-02 18:21:17

by Bernd Schubert

Subject: Re: x86_64: 32bit emulation problems

On Wednesday 02 March 2005 17:53, Trond Myklebust wrote:
> On Wednesday 02.03.2005 at 12:33 (+0100), Bernd Schubert wrote:
> > > I can see no good reason for truncating inode number values on
> > > platforms that actually do support 64-bit inode numbers, but I can see
> > > several
> >
> > Well, at least we would have a reason ;)
>
> A 32-bit emulation mode is clearly a "platform" which does NOT support
> 64-bit inode numbers, however there is (currently) no way for the kernel
> to detect that you are running that. Any extra truncation should
> therefore ideally be done by the emulation layer rather than the kernel
> itself.
>

I already found the function in glibc and it looks as if it would be rather
easy to do there. I only hope the glibc maintainers will accept this kind
of fix (and won't say that nobody needs it).


Cheers,
Bernd


PS: Many thanks also for fixing other bugs in the NFS client. Up to 2.6.9, init
somehow could not open /dev/console on a read-only mount point. With 2.6.11
this problem has disappeared; thanks a lot for fixing this and other
problems. I never had the time to write a bug report for it.



2005-03-03 09:14:28

by Andi Kleen

Subject: Re: x86_64: 32bit emulation problems

On Wed, Mar 02, 2005 at 01:13:38AM -0800, Trond Myklebust wrote:
> On Wednesday 02.03.2005 at 09:18 (+0100), Andi Kleen wrote:
> > On Wed, Mar 02, 2005 at 12:46:23AM +0100, Andreas Schwab wrote:
> > > Bernd Schubert <[email protected]> writes:
> > >
> > > > Hmm, after compiling with -D_FILE_OFFSET_BITS=64 it works fine. But why does
> > > > it work without this option on a 32bit kernel, but not on a 64bit kernel?
> > >
> > > See nfs_fileid_to_ino_t for why the inode number is different between
> > > 32bit and 64bit kernels.
> >
> > Ok that explains it. Thanks.
> >
> > It would probably be best to just do the shift unconditionally on 64bit kernels
> > too.
> >
> > Trond, what do you think?
>
> Why would this be more appropriate than defining __kernel_ino_t on the
> x86_64 platform to be of the size that you actually want the kernel to
> support?
> I can see no good reason for truncating inode number values on platforms
> that actually do support 64-bit inode numbers, but I can see several

If you include 32bit emulation, x86-64 doesn't support them.
I guess you could make it dependent on CONFIG_COMPAT, but I expect
nearly everyone running x86-64 to have it set.

> reasons why you might want not to (utilities that need to detect hard
> linked files for instance).

32bit compatibility is a good reason, and 32bit compatibility
is fairly important.

Afaik the only "pure" 64bit architecture without 32bit emulation
is Alpha. In theory you could enable it unconditionally there,
but I'm not sure it's worth it.


-Andi

2005-03-03 09:23:30

by Andi Kleen

Subject: Re: x86_64: 32bit emulation problems

On Wed, Mar 02, 2005 at 08:53:07AM -0800, Trond Myklebust wrote:
> On Wednesday 02.03.2005 at 12:33 (+0100), Bernd Schubert wrote:
>
> > > I can see no good reason for truncating inode number values on platforms
> > > that actually do support 64-bit inode numbers, but I can see several
> >
> > Well, at least we would have a reason ;)
>
> A 32-bit emulation mode is clearly a "platform" which does NOT support
> 64-bit inode numbers, however there is (currently) no way for the kernel
> to detect that you are running that. Any extra truncation should
> therefore ideally be done by the emulation layer rather than the kernel
> itself.

The problem here is that glibc uses stat64() which supports
64bit inode numbers. But glibc does the overflow checking itself
and generates the EOVERFLOW in user space. Nothing we can do
about that. The 64bit inodes work under 32bit too, so your
code checking for 64bitness is totally bogus.

The old stat interface doesn't check that case currently either
(will fix that), but that's not the problem here.

But in general the emulation layer cannot do truncation because
it doesn't know if it is ok to do for the low level file system.
If anything this has to be done in the fs.

-Andi

2005-03-03 21:45:06

by Andi Kleen

Subject: Re: x86_64: 32bit emulation problems

> So what do you actually suggest? On the one hand you say even 32bit userspace
> supports 64bit inodes, if it wants. On the other hand you say the truncation
> needs to be done at the file system level.
> To my mind this is contradictory: the first statement suggests doing the
> truncation in userspace, the second says it can only be done in the kernel?

We have to live with glibc not supporting this, so it would probably be
best to always do the truncation in NFS.

-Andi

2005-03-03 21:49:29

by Trond Myklebust

Subject: Re: x86_64: 32bit emulation problems

On Thursday 03.03.2005 at 10:19 (+0100), Andi Kleen wrote:
> The problem here is that glibc uses stat64() which supports
> 64bit inode numbers. But glibc does the overflow checking itself
> and generates the EOVERFLOW in user space. Nothing we can do
> about that. The 64bit inodes work under 32bit too, so your
> code checking for 64bitness is totally bogus.

As far as the kernel is concerned, asm/posix_types defines
__kernel_ino_t as "unsigned long" on most platforms (except a few which
define it as "unsigned int"). We don't care what size type glibc itself
uses.

> The old stat interface doesn't check that case currently either
> (will fix that), but that's not the problem here.
>
> But in general the emulation layer cannot do truncation because
> it doesn't know if it is ok to do for the low level file system.
> If anything this has to be done in the fs.

Inode numbers are provided for informational reasons only. There are no
POSIX or SUSv3 interfaces that take an inode number as an argument. The
only processing you can do with an inode number is to compare it for
equality to another inode number.

So I don't see how the file system would be able to do a better job of
truncation here. In principle you should *never* truncate inode numbers.

Cheers,
Trond


2005-03-03 22:28:01

by Trond Myklebust

Subject: Re: x86_64: 32bit emulation problems

On Thursday 03.03.2005 at 22:46 (+0100), Andi Kleen wrote:

> > As far as the kernel is concerned, asm/posix_types defines
> > __kernel_ino_t as "unsigned long" on most platforms (except a few which
> > define it as "unsigned int"). We don't care what size type glibc itself
> > uses.
>
> That could easily be changed and even pass out 64bit inodes
> on 32bit systems. The stat64 syscall ABI allows this.
>
> Perhaps that should be done and then you could drop the truncation
> code.

That would be the ideal solution. I don't see that the current system of
truncating is helping anyone.

> Of course this would expose the glibc bug Bernd ran into on 32bit
> too, but at some point they have to fix that bogosity anyways.

Right.

Cheers,
Trond

2005-03-03 21:58:41

by Andi Kleen

Subject: Re: x86_64: 32bit emulation problems

On Thu, Mar 03, 2005 at 01:37:26PM -0800, Trond Myklebust wrote:
> On Thursday 03.03.2005 at 10:19 (+0100), Andi Kleen wrote:
> > The problem here is that glibc uses stat64() which supports
> > 64bit inode numbers. But glibc does the overflow checking itself
> > and generates the EOVERFLOW in user space. Nothing we can do
> > about that. The 64bit inodes work under 32bit too, so your
> > code checking for 64bitness is totally bogus.
>
> As far as the kernel is concerned, asm/posix_types defines
> __kernel_ino_t as "unsigned long" on most platforms (except a few which
> define it as "unsigned int"). We don't care what size type glibc itself
> uses.

That could easily be changed and even pass out 64bit inodes
on 32bit systems. The stat64 syscall ABI allows this.

Perhaps that should be done and then you could drop the truncation
code.

Of course this would expose the glibc bug Bernd ran into on 32bit
too, but at some point they have to fix that bogosity anyways.


> So I don't see how the file system would be able to do a better job of
> truncation here. In principle you should *never* truncate inode numbers.

Agreed, except we are stuck with broken user land here.

But if you ever choose to truncate, you should do it on 64bit
systems too, to avoid problems with the 32bit emulation.

-Andi

2005-03-03 21:28:57

by Bernd Schubert

Subject: Re: x86_64: 32bit emulation problems

On Thursday 03 March 2005 10:19, Andi Kleen wrote:
> On Wed, Mar 02, 2005 at 08:53:07AM -0800, Trond Myklebust wrote:
> > On Wednesday 02.03.2005 at 12:33 (+0100), Bernd Schubert wrote:
> > > > I can see no good reason for truncating inode number values on
> > > > platforms that actually do support 64-bit inode numbers, but I can
> > > > see several
> > >
> > > Well, at least we would have a reason ;)
> >
> > A 32-bit emulation mode is clearly a "platform" which does NOT support
> > 64-bit inode numbers, however there is (currently) no way for the kernel
> > to detect that you are running that. Any extra truncation should
> > therefore ideally be done by the emulation layer rather than the kernel
> > itself.
>
> The problem here is that glibc uses stat64() which supports
> 64bit inode numbers. But glibc does the overflow checking itself
> and generates the EOVERFLOW in user space. Nothing we can do
> about that. The 64bit inodes work under 32bit too, so your
> code checking for 64bitness is totally bogus.
>
> The old stat interface doesn't check that case currently either
> (will fix that), but that's not the problem here.
>
> But in general the emulation layer cannot do truncation because
> it doesn't know if it is ok to do for the low level file system.
> If anything this has to be done in the fs.
>

So what do you actually suggest? On the one hand you say even 32bit userspace
supports 64bit inodes, if it wants. On the other hand you say the truncation
needs to be done at the file system level.
To my mind this is contradictory: the first statement suggests doing the
truncation in userspace, the second says it can only be done in the kernel?

Cheers,
Bernd