2005-10-14 09:06:31

by Ruediger Oberhage

Subject: NFS client problem with kernel 2.6 and SGI IRIX 6.5

Dear Trond Myklebust,

my name is Ruediger Oberhage, I'm (amongst other duties)
administering computers for the Theoretical Physics in Essen of
the university Duisburg-Essen, Germany, and I do have a (client)
problem (severe to us) with the 2.6 kernel series and nfs, when
served from an SGI IRIX 6.5 system (type: Origin 200).

Since I use the Debian GNU/Linux distribution, I contacted its kernel
maintainer (Horms) first, and he pointed me to you (I'll add the
problem report(s) below).

The problem was registered with the Debian Bug Tracking System as
Bug#325117.

The summary is as follows: I have problems with the 2.6 series
kernel which do not occur with a 2.4 series kernel (and an otherwise
unchanged system). I discovered it with Mathematica version 5.0,
but I think that other programs are also affected (e.g. OpenOffice
1.1.4, which no longer finds its default (or any other) printer).
The symptom is that certain resources are reported missing which
are definitely there and which lie somewhere within the application
tree; that tree lies within a hierarchy that is nfs-auto-mounted from
the SGI system to the (Intel architecture) Linux client. File
contents (or whole files?) seem to get 'lost' somehow.

It doesn't seem to be the MSBit problem of the 32bit nfs cookies
(alone): the branch is exported with the IRIX '32bitclients'
option to avoid the 64bit cookies, which led to a similar problem
with the printer in OpenOffice under the 2.4 series kernels and
which vanished with the 32bit option. My reason for stating this
is that when I applied a 32bit 'SGI-IRIX-induced' patch for (early)
2.6 kernels (Debian's 2.6.8), the problem didn't go away, and it
still occurs with the 2.6.12 kernel, although a few kernel versions
earlier (2.6.10 or 11?) that part of the cookie problem was solved via
a translation table (once and for all, I hope).
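
To illustrate what the 'MSBit' issue refers to, here is a minimal sketch in Python. It assumes only that a directory cookie is an opaque unsigned value and that a 32-bit client may hold it in a signed 32-bit offset, which can represent at most 2^31 - 1. The helper names and sample values are hypothetical and are not taken from the kernel, glibc or IRIX.

```python
# Illustrative only: helper names and sample cookies are assumptions.
INT32_MAX = 2**31 - 1


def msb_set_32(cookie: int) -> bool:
    """True if bit 31 (the most significant bit of a 32-bit word) is set."""
    return bool(cookie & 0x80000000)


def fits_signed_32(cookie: int) -> bool:
    """True if the cookie can be stored unchanged in a signed 32-bit offset."""
    return 0 <= cookie <= INT32_MAX


# Hypothetical cookies: one small, one with the MSB set, one wider than 32 bits.
for cookie in (0x7FFFFFFF, 0x80000000, 0x1_0000_0000):
    print(hex(cookie), "msb:", msb_set_32(cookie), "fits:", fits_signed_32(cookie))
```

A cookie that fails `fits_signed_32` is the kind of value a 32-bit interface cannot pass on unchanged; whether that is what happens in this setup is exactly the open question.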

The problem occurs when requesting nfs v2 as well as nfs v3 protocol.
An LD_ASSUME_KERNEL does not seem to help, as it does with other
problems.

When testing or compiling kernels, I always used the 'debianized'
versions, but to my understanding, they are nearly unaltered compared
to the 'plain' kernels (see Debian changelogs).

The problem is severe to us, as the same configuration also exports
our home-directories, which are, of course, writeable, contrary to
the application-tree, which is read-only. Thus any help will be
welcome.

I'm willing to try whatever I can do to resolve the problem, but I
need guidance in what to do and what (else) you need to know.

Many thanks,
Ruediger Oberhage

Please find the 'Debian bug reports and replies' below
(sorry, it's long, but you may skip it should you prefer to
get it from Debian's Bug Tracking System directly!):

Package: kernel-image-2.6.8-2-686
Version: 2.6.8-16

Severity: critical

Hello!

This is about an (at least to us) critical bug within NFS in the
current Debian 3.1 (stable=sarge) version for the Intel i386
architecture, with kernel 2.6 only! None of the phenomena reported
occur(!) with kernel 2.4 (here 2.4.27, more precisely 2.4.27-2-686).

First symptom: when I change into any NFS-mounted directory or
subdirectory thereof and issue the command 'find . -print', I get
the following result:

/Net/Apps# find . -print
.
find: .: Value too large for defined data type

The same is true, if I address that directory 'from the outside':

/tmp# find /Net/Apps/. -print
/Net/Apps/.
find: /Net/Apps/.: Value too large for defined data type

[the '.' after the /Net/Apps/ is necessary, as this is a
symlink here! But the same happens, when that is not the
case!]

I've read about such a problem in the Ubuntu bug-tracking
system, and they claim to have a solution for this one.
This could be true, as this problem doesn't show, when I
use the Knoppix 4.0 DVD (which uses a 2.6.12-kernel, iirc).
I did compile and try under 'sarge' the latest kernel available
in the Debian repository at this time (2.6.11-7) from
kernel-source-2.6.11_2.6.11-7_all.deb and accessories via
'make-kpkg', a 'sarge'-version of
"kernel-image-2.6.11-1-686_2.6.11-7_i386.deb" so to speak,
and this one, too, shows the error. So it isn't gone in
Debian!
libc6 is: Version: 2.3.2.ds1-22, the 'standard one', but
I don't think, it does matter.
[As written above, it doesn't show up with kernel 2.4!]


The second problem is the critical failure of applications
in such an NFS-mounted tree. E.g. Mathematica v5.0 crashes
with a 'segmentation fault' after complaining not only about
problems with "fonts" (which can often be ignored) but
also about missing 'structures' (read: files!) in
that tree, which finally results in the abort. These files are
definitely there and not 'harmed' - it does work with a 2.4 kernel
and an otherwise unchanged 'sarge' system. [An LD_ASSUME_KERNEL=2.4
does not(!) help here for 2.6 kernels, as it does with e.g.
Maple v.8, where a missing 'errno' variable is (otherwise) reported
for libc6 by the dynamic linker with 2.6 kernels.]

This problem does not(!) go away with the KNOPPIX 4.0 DVD kernel
version, contrary to the 'find'-problem!

Also playing around with every parameter of the NFS-system (like
NFS-version (2 or 3), tcp, r/wsize etc.) that makes sense to me, did
not result in a working system.

The server(s) here is (are) Origin 200 SGI IRIX 6.5 system(s) with
xfs filesystems! But I don't think this matters, either, see the
'Ubuntu'-problem report. Linux servers might work, though, through
cancellation of errors in server and client.

I don't dare to use such a combination on the 'writable' NFS-home-
directories of our users, for fear of destroying files [the 'apps'
are mounted read-only (ro) and are not a problem in this regard].

As this concerns the (NFS-mounted) applications as well as the
home-directories of our users, I regard this problem as critical!
Thus the severity rating! It is probably less severe for someone
not using 'NFS' or using 'Linux only' systems - where I can't
say whether the problem arises. The only workaround for me is to use a
2.4 kernel, which isn't nice - udev/hal and other components highly
advisable for a desktop system (e.g. for USB memory sticks, other
removable media etc.) are not available then!

With the plea for a fast fix and best regards,
Ruediger Oberhage

-----

On Fri, Aug 26, 2005 at 10:52:06AM +0200, Ruediger Oberhage wrote:
> Package: kernel-image-2.6.8-2-686
> Version: 2.6.8-16
>
> Severity: critical

Hi,

Is it possible for you to test the 2.6.12 kernel package
that has been produced for Sarge? It's available at the
following URL as 2.6.12-5.99

It would be good to know if the problem was fixed between
2.6.8 and 2.6.12. If not I would recommend starting a dialog
with the upstream NFS maintainers, I can point you to the right place.
If so, we have a starting point to try and isolate the change
that resolves the problem, though it may prove too extensive
to be appropriate for backporting to 2.6.8.

Regards

-----

> Hi,

Hello,

many thanks for your kind reply.

> is it possible for you to test the 2.6.12 kernel package
> that has been produced for Sarge. Its available at the
> following URL as 2.6.12-5.99

Well, I did try some packages mentioned to be available on
http://packages.vergenet.net/testing/linux-2.6/
[linux-image-2.6.12-1-686_2.6.12-5.99.sarge1_i386.deb and
dependencies] and they didn't work, either, or more precisely showed
the same symptom.
[This, and a patch I found for the MSB problem of the 32bit
cookies (or even 64bit cookies without the export option), for which
SGI IRIX 6(.5) is notorious and which I applied to earlier 2.6
kernels, seem to indicate that the problem hasn't vanished in
between and is not related (only) to the 32bit nfs-cookie thing.
I'm not sure if I mentioned that in the original message.]

> It would be good to know if the problem was fixed between
> 2.6.8 and 2.6.12.

I don't think so (see above).

> If not I would recommend starting a dialog with the upstream NFS
> maintainers, I can point you to the right place.

That would be nice, thank you. I'm willing to try everything that
I'm carefully guided to :-), as long as my resources allow.
Since it is important to me for this to work, I'd like to help
where I can.

> If so, we have a starting point to try and isolate the change
> that resolve the problem. Though it may prove too extensive
> to be appropriate for backporting to 2.6.8.

Yes, I do understand this, and I would gladly be willing to
switch to a newer kernel. 2.6.8 is a non-optimal choice anyway
in my eyes, being the last kernel which has practically no
useful (udev) classes but the most general (e.g. the 'dvb' class
is still missing from its modules/drivers).

Thus it wouldn't be that hard for me to part with 2.6.8, but
a transition beyond 2.6.12 (e.g. 2.6.13) with 'sarge' might
be hard (or impossible?), too, regarding its 'tools' dependencies.

The most important thing would be to learn what's going wrong
with 'nfs', though, I think. At least to me and maybe to you, too.

Thanks again and regards,
Ruediger Oberhage

-----

tag 325117 +upstream
thanks

On Fri, Oct 07, 2005 at 09:11:56AM +0200, Ruediger Oberhage wrote:
> > Hi,
>
> Hello,
>
> many thanks for your kind reply.
>
> > is it possible for you to test the 2.6.12 kernel package
> > that has been produced for Sarge. Its available at the
> > following URL as 2.6.12-5.99
>
> Well, I did try some packages mentioned to be available on
> http://packages.vergenet.net/testing/linux-2.6/
> [linux-image-2.6.12-1-686_2.6.12-5.99.sarge1_i386.deb and
> dependencies] and they didn't work, either, or more precisely showed
> the same symptom.
> [This and a patch I found for the MSB-problem of the 32bit
> cookies (or even 64bit cookies without export-option) for which
> SGI IRIX 6(.5) is notorious and which I applied to earlier 2.6er
> kernels seem to indicate, that the problem hasn't vanished in
> between and is not related to (only) the 32bit nfs-cookie thing.
> I'm not sure if I mentioned that in the original message.]
>
> > It would be good to know if the problem was fixed between
> > 2.6.8 and 2.6.12.
>
> I don't think so (see above).

Yes I agree

> > If not I would recommend starting a dialog with the upstream NFS
> > maintainers, I can point you to the right place.
>
> That would be nice thank you. I'm willing to try everything that
> I'm carefully guided to :-), as long as my resources allow.
> Since it is important to me for this to work, I'd like to help
> where I can.

As I understand it, your problem seems to be with the NFS client,
not the NFS server portion of the kernel. The contact for
that is Trond Myklebust <[email protected]>, you
should also CC [email protected].

If you see anything related to this message in dmesg, send that too.

On the Debian side, it would be good to CC [email protected],
to keep this bug up to date. Upstream lives on CC, so it
probably won't drop off in a hurry.

> > If so, we have a starting point to try and isolate the change
> > that resolve the problem. Though it may prove too extensive
> > to be appropriate for backporting to 2.6.8.
>
> Yes, I do understand this, and I would gladly be willing to
> switch to a newer kernel. 2.6.8 is a non-optimal choice anyway
> in my eyes, being the last kernel which has practically no
> useful (udev) classes but the most general (e.g. the 'dvb' class
> is still missing from its modules/drivers).
>
> Thus it wouldn't be that hard for me to part with 2.6.8, but
> a transition beyond 2.6.12 (e.g. 2.6.13) with 'sarge' might
> be hard (or impossible?), too, regarding its 'tools' dependencies.
>
> The most important thing would be, to learn what's going wrong
> with 'nfs', though, I think. At least to me and may be to you, too.

--
H.-R. Oberhage
Mail: Univ. Duisburg-Essen E-Mail: [email protected]
Fachbereich Physik [email protected]
Campus Essen, S05 V07 E88
Universitaetsstrasse 5 Phone: {+49|0} 201 / 183-2493
45141 Essen, Germany FAX: {+49|0} 201 / 183-4578


2005-10-14 18:22:36

by Trond Myklebust

Subject: Re: NFS client problem with kernel 2.6 and SGI IRIX 6.5

On Fri, 14.10.2005 at 11:05 (+0200), Ruediger Oberhage wrote:
> Dear Trond Myklebust,
>
> my name is Ruediger Oberhage, I'm (amongst other duties)
> administering computers for the Theoretical Physics in Essen of
> the university Duisburg-Essen, Germany, and I do have a (client)
> problem (severe to us) with the 2.6 kernel series and nfs, when
> served from an SGI IRIX 6.5 system (type: Origin 200).
>
> Since I use the Debian GNU/Linux distribution, I contacted its kernel
> maintainer (Horms) first, and he pointed me to you (I'll add the
> problem report(s) below).
>
> The problem was registered with the Debian Bug Tracking System as
> Bug#325117.
>
> The summary is as follows: I do have problems with the 2.6 series
> kernel, which do not occur with a 2.4 series kernel (and an other-
> wise unchanged system). I discovered it with Mathematica version 5.0,
> but do think that other programs are also involved (e.g. OpenOffice
> 1.1.4, that doesn't find its default (or any other) printer any
> longer). The symptom is, that certain resources are reported
> missing, that are definitely there and which lie somewhere
> within the application-tree, that tree lying within a hierarchy
> being nfs-auto-mounted from the SGI system to the (Intel
> architecture) Linux client. File contents (or whole files?) seem to get
> 'lost' somehow.
>
> It doesn't seem to be the MSBit Problem of the 32bit nfs cookies
> (alone) - the branch is exported with the IRIX '32bitclients'
> option, to avoid the 64bit cookies, that led to a similar problem
> with the printer in OpenOffice under the 2.4 series kernels, and
> vanished with the 32bit-option. The reason for me to state this
> is, that when I applied a 32bit-'SGI-IRIX-induced'-patch for (early)
> 2.6 kernels (Debian's 2.6.8) the problem didn't go away, and it also
> still occurs when using the 2.6.12-kernel, where some kernel-version
> ago (2.6.10 or 11?) that part of the cookie problem was solved via a
> translation table (once and for all, I hope).
>

Have you tried running "strace" on this find command in order to figure
out which syscall is returning EOVERFLOW? If it is getdents, please
could you confirm that the same error occurs in the same place on
2.6.12?

...Oh, and could I have a binary tcpdump of the traffic between the
client and server when this happens. Please use something like

tcpdump -w /tmp/dump.out -s 9000 host <servername> and port 2049
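
Before sending such a capture, it may be worth confirming that the file really was written with the large snapshot length, so that packets are not truncated. The following is a minimal sketch in Python, assuming the file uses the classic pcap savefile format with its 24-byte global header. The function name `pcap_snaplen` and the synthetic demo header are illustrative assumptions and not part of tcpdump.

```python
import struct

# Magic numbers of the classic pcap savefile format
# (microsecond and nanosecond timestamp variants).
PCAP_MAGIC_USEC = 0xA1B2C3D4
PCAP_MAGIC_NSEC = 0xA1B23C4D
HEADER_FMT = "IHHiIII"  # magic, major, minor, thiszone, sigfigs, snaplen, linktype


def pcap_snaplen(header: bytes) -> int:
    """Return the snapshot length stored in a classic pcap global header."""
    if len(header) < 24:
        raise ValueError("a pcap global header is 24 bytes long")
    # The file may have been written in either byte order; try both.
    for endian in ("<", ">"):
        fields = struct.unpack(endian + HEADER_FMT, header[:24])
        if fields[0] in (PCAP_MAGIC_USEC, PCAP_MAGIC_NSEC):
            return fields[5]
    raise ValueError("not a classic pcap savefile")


# Synthetic little-endian header (format version 2.4, snaplen 9000, Ethernet).
demo_header = struct.pack("<" + HEADER_FMT, PCAP_MAGIC_USEC, 2, 4, 0, 0, 9000, 1)
print(pcap_snaplen(demo_header))
```

For a real capture, the first 24 bytes of the dump file would be read in binary mode and passed to `pcap_snaplen` instead of the synthetic header.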

Cheers,
Trond

2005-10-17 14:18:16

by James Pearson

Subject: Re: NFS client problem with kernel 2.6 and SGI IRIX 6.5

> The summary is as follows: I do have problems with the 2.6 series
> kernel, which do not occur with a 2.4 series kernel (and an other-
> wise unchanged system). I discovered it with Mathematica version 5.0,
> but do think that other programs are also involved (e.g. OpenOffice
> 1.1.4, that doesn't find its default (or any other) printer any
> longer). The symptom is, that certain resources are reported
> missing, that are definitely there and which lie somewhere
> within the application-tree, that tree lying within a hierarchy
> being nfs-auto-mounted from the SGI system to the (Intel
> architecture) Linux client. File contents (or whole files?) seem to get
> 'lost' somehow.
>
> It doesn't seem to be the MSBit Problem of the 32bit nfs cookies
> (alone) - the branch is exported with the IRIX '32bitclients'
> option, to avoid the 64bit cookies, that led to a similar problem
> with the printer in OpenOffice under the 2.4 series kernels, and
> vanished with the 32bit-option. The reason for me to state this
> is, that when I applied a 32bit-'SGI-IRIX-induced'-patch for (early)
> 2.6 kernels (Debian's 2.6.8) the problem didn't go away, and it also
> still occurs when using the 2.6.12-kernel, where some kernel-version
> ago (2.6.10 or 11?) that part of the cookie problem was solved via a
> translation table (once and for all, I hope).
>
> The problem occurs when requesting nfs v2 as well as nfs v3 protocol.
> An LD_ASSUME_KERNEL does not seem to help, as it does with other
> problems.
>
> When testing or compiling kernels, I always used the 'debianized'
> versions, but to my understanding, they are nearly unaltered compared
> to the 'plain' kernels (see Debian changelogs).
>
> The problem is severe to us, as the same configuration also exports
> our home-directories, which are, of course, writeable, contrary to
> the application-tree, which is read-only. Thus any help will be
> welcome.
>
> I'm willing to try whatever I can do to resolve the problem, but I
> need guidance in what to do and what (else) you need to know.

Is this similar to the issue in the following thread? :

http://marc.theaimsgroup.com/?l=linux-kernel&m=108741268200839&w=2

James Pearson

2005-10-17 15:41:12

by Ruediger Oberhage

Subject: Re: NFS client problem with kernel 2.6 and SGI IRIX 6.5

> Is this similar to the issue in the following thread? :

> http://marc.theaimsgroup.com/?l=linux-kernel&m=108741268200839&w=2

I still have to investigate Trond Myklebust's suggestions (strace on
'find' with the unmodified 2.6 kernel and with 2.6.12, and the tcpdump
of the malfunctioning nfs); I can hopefully do that tomorrow. From
memory, though, I think the following holds:

If 'your' thread above, which leads to the URL
http://www.fys.uio.no/~trondmy/src/2.4.18/linux-2.4.18-seekdir.dif
(no longer available, at least from here), describes the same thing as

http://www.ussg.iu.edu/hypermail/linux/kernel/0502.1/0506.html
http://kerneltrap.org/mailarchive/1/message/19372/thread

which is the patch that I applied to some kernels earlier than
2.6.11, then the behaviour is this: with that patch, or with a 'recent'
kernel (I believe starting with 2.6.11), the 'find'-problem
goes away (I'll re-check that), but the 'other' problems
(Mathematica and OpenOffice not finding certain 'resources')
remain.

Nevertheless, 'seekdir' sounds very promising as a cause for
the problem(s), so if 'linux-2.4.18-seekdir.dif' is handling a
problem different from '0506.html', then it may be worth
investigating.

Since I can't access linux-2.4.18-seekdir.dif, could you please
either check whether it's the same thing or else send me that patch?

Thank you very much,
Ruediger Oberhage

2005-10-19 16:53:41

by Ruediger Oberhage

Subject: Re: NFS client problem with kernel 2.6 and SGI IRIX 6.5

Hello again.

Some first findings regarding nfs problems:

First, I have to apologize for my memory (again :-)) serving me
wrong: I stated that the "find /nfsDir -print" problem was
(generally) gone with the 2.6.12 kernel; this is wrong(!).

The problem does exist for both (Debianized) kernels 2.6.8 as well
as 2.6.12 (the details follow below in the 'strace'-dump). The
(find-)problem does NOT exist for the (2.6.12-)kernel delivered on
the KNOPPIX 4.0 DVD!!! So there is a cure for some kernel for this
one. The 'resources'-problem (OpenOffice/Mathematica) still remains
for this kernel, too!

The second thing is, that James Pearson <[email protected]>
was very helpful with SGI IRIX specifics. From that it is now clear
that my nfs-server uses a "naming=version 1" variant of the xfs
filesystem, whereas a 'version 2' also exists and is standard with
IRIX 6.5.5 or newer. The problem might (or might not) vanish when
'version 2' is used. The transition, however, requires a total
backup, xfs-reformat, and restore procedure. Such a filesystem also
won't mount at all on IRIX versions prior to 6.5.5 (according to the
man-page). If you're willing to help, I would still like to find
and remove the cause of the problem. James suggests having a look
at a 'seekdir'-patch for 2.4 kernels
[http://marc.theaimsgroup.com/?l=linux-kernel&m=108741268200839&w=2],
but its URL
[http://www.fys.uio.no/~trondmy/src/2.4.18/linux-2.4.18-seekdir.dif]
doesn't seem to be available any longer. Nevertheless 'seekdir' could
be a hot candidate.

I've not yet found the time for the 'tcpdump'-analysis (sorry, but it
will follow), but I did the 'strace' on 'find /nfsdir -print' and
looked at 'getdents64', which Trond Myklebust asked me to pay attention
to. It reports different buffer-size (third) argument values (512 and
32768) for the two (failing!) kernels:
2.6.8:
getdents64(4, /* 9 entries */, 512) = 256
_llseek(4, 1826255771, [1826255771], SEEK_SET) = 0
getdents64(4, /* 2 entries */, 512) = 64
2.6.12:
getdents64(4, /* 9 entries */, 32768) = 256
_llseek(4, 1826255771, [1826255771], SEEK_SET) = 0
getdents64(4, /* 2 entries */, 32768) = 64
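
For comparing such traces between kernels, a small parser can pull out the relevant numbers. This is a minimal sketch in Python, assuming strace output in exactly the shape shown above; the regular expression and the function name `parse_getdents` are illustrative, not part of strace.

```python
import re

# Matches lines such as: getdents64(4, /* 9 entries */, 512) = 256
GETDENTS_RE = re.compile(
    r"getdents64\((?P<fd>\d+), /\* (?P<entries>\d+) entries \*/, "
    r"(?P<bufsize>\d+)\)\s*=\s*(?P<ret>-?\d+)"
)


def parse_getdents(trace: str) -> list:
    """Return one dict per getdents64 call found in the strace text."""
    return [
        {name: int(value) for name, value in match.groupdict().items()}
        for match in GETDENTS_RE.finditer(trace)
    ]


# The 2.6.8 excerpt from this message.
SAMPLE = """\
getdents64(4, /* 9 entries */, 512) = 256
_llseek(4, 1826255771, [1826255771], SEEK_SET) = 0
getdents64(4, /* 2 entries */, 512) = 64
"""

calls = parse_getdents(SAMPLE)
for call in calls:
    print(call)
```

In both excerpts every getdents64 call returns a non-negative byte count, i.e. none of the system calls shown fails. That would be consistent with the error being raised in user space after the call returns, but the trace alone does not prove it.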

As written above: for KNOPPIX 4.0 'find' answers in the way expected,
listing lots and lots of directories and files, without any "Value
too large for defined data type" error!

[The directory has 7 'normal' subdirectories and '..' and '.', no
regular files in them (the 9 entries?), but lots of them in sub-
subdirectories of that tree.]


Thanks for any further help - the 'tcpdump' will follow.

Regards,
Ruediger Oberhage

Please find the 'strace' diagnostics for both (failing) kernels
attached below - it is the same nfs-tree that is called here, for
both kernels:

Version 2.6.8 (Debian):
[Linux host 2.6.8-2-686 #1 Thu May 19 17:53:30 JST 2005 i686 GNU/Linux]
~$ find /Net/Apps/. -print
/Net/Apps/.
find: /Net/Apps/.: Value too large for defined data type

~$ strace find /Net/Apps/. -print
execve("/usr/bin/find", ["find", "/Net/Apps/.", "-print"], [/* 19
vars */]) = 0
uname({sys="Linux", node="host", ...}) = 0
brk(0) = 0x8055000
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40017000
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or
directory)
open("/etc/ld.so.preload", O_RDONLY) = -1 ENOENT (No such file or
directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=52022, ...}) = 0
old_mmap(NULL, 52022, PROT_READ, MAP_PRIVATE, 3, 0) = 0x40018000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or
directory)
open("/etc/ld.so.preload", O_RDONLY) = -1 ENOENT (No such file or
directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=52022, ...}) = 0
old_mmap(NULL, 52022, PROT_READ, MAP_PRIVATE, 3, 0) = 0x40018000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or
directory)
open("/lib/tls/libc.so.6", O_RDONLY) = 3
read(3,
"\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0`Z\1\000"..., 512) =
512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1254468, ...}) = 0
old_mmap(NULL, 1264780, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) =
0x40025000
old_mmap(0x4014f000, 36864, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED, 3, 0x129000) = 0x4014f000
old_mmap(0x40158000, 7308, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x40158000
close(3) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x4015a000
set_thread_area({entry_number:-1 -> 6, base_addr:0x4015a2a0,
limit:1048575, seg_32bit:1, contents:0, read_exec_only:0,
limit_in_pages:1, seg_not_present:0, useable:1}) = 0
munmap(0x40018000, 52022) = 0
brk(0) = 0x8055000
brk(0x8076000) = 0x8076000
brk(0) = 0x8076000
time(NULL) = 1129736098
open(".", O_RDONLY|O_LARGEFILE) = 3
fchdir(3) = 0
lstat64(".", {st_mode=S_IFDIR|S_ISGID|0755, st_size=4096, ...}) = 0
lstat64("/Net/Apps/.", {st_mode=S_IFDIR|0755, st_size=114, ...}) = 0
chdir("/Net/Apps/.") = 0
lstat64(".", {st_mode=S_IFDIR|0755, st_size=114, ...}) = 0
lstat64(".", {st_mode=S_IFDIR|0755, st_size=114, ...}) = 0
fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 1), ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0) = 0x40018000
write(1, "/Net/Apps/.\n", 12/Net/Apps/.) = 12
open(".", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 4
fstat64(4, {st_mode=S_IFDIR|0755, st_size=114, ...}) = 0
fcntl64(4, F_SETFD, FD_CLOEXEC) = 0
getdents64(4, /* 9 entries */, 512) = 256
_llseek(4, 1826255771, [1826255771], SEEK_SET) = 0
getdents64(4, /* 2 entries */, 512) = 64
close(4) = 0
write(2, "find: ", 6find: ) = 6
write(2, "/Net/Apps/.", 11/Net/Apps/.) = 11
write(2, ": Value too large for defined da"..., 39: Value too large
for defined data type) = 39
write(2, "\n", 1) = 1
fchdir(3) = 0
munmap(0x40018000, 4096) = 0
exit_group(1) = ?

#####

Version 2.6.12 (Debian)
[kernel-image-2.6-686_2.6.12-2.6.12-5.99.sarge1_i386.deb]:
[Linux host 2.6.12-1-686 #1 Mon Sep 12 08:34:03 UTC 2005 i686 GNU/Linux]
~$ find /Net/Apps/. -print
/Net/Apps/.
find: /Net/Apps/.: Value too large for defined data type

~$ strace find /Net/Apps/. -print
execve("/usr/bin/find", ["find", "/Net/Apps/.", "-print"], [/* 18
vars */]) = 0
uname({sys="Linux", node="host", ...}) = 0
brk(0) = 0x8055000
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f59000
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or
directory)
open("/etc/ld.so.preload", O_RDONLY) = -1 ENOENT (No such file or
directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=110500, ...}) = 0
old_mmap(NULL, 110500, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7f3e000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or
directory)
open("/lib/tls/libc.so.6", O_RDONLY) = 3
read(3,
"\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0`Z\1\000"..., 512) =
512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1254468, ...}) = 0
old_mmap(NULL, 1264780, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) =
0xb7e09000
old_mmap(0xb7f33000, 36864, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED, 3, 0x129000) = 0xb7f33000
old_mmap(0xb7f3c000, 7308, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7f3c000
close(3) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7e08000
set_thread_area({entry_number:-1 -> 6, base_addr:0xb7e08460,
limit:1048575, seg_32bit:1, contents:0, read_exec_only:0,
limit_in_pages:1, seg_not_present:0, useable:1}) = 0
munmap(0xb7f3e000, 110500) = 0
brk(0) = 0x8055000
brk(0x8076000) = 0x8076000
brk(0) = 0x8076000
time(NULL) = 1129736649
open(".", O_RDONLY|O_LARGEFILE) = 3
fchdir(3) = 0
lstat64(".", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat64("/Net/Apps/.", {st_mode=S_IFDIR|0755, st_size=114, ...}) = 0
chdir("/Net/Apps/.") = 0
lstat64(".", {st_mode=S_IFDIR|0755, st_size=114, ...}) = 0
lstat64(".", {st_mode=S_IFDIR|0755, st_size=114, ...}) = 0
fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0) = 0xb7f58000
write(1, "/Net/Apps/.\n", 12/Net/Apps/.) = 12
open(".", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 4
fstat64(4, {st_mode=S_IFDIR|0755, st_size=114, ...}) = 0
fcntl64(4, F_SETFD, FD_CLOEXEC) = 0
getdents64(4, /* 9 entries */, 32768) = 256
_llseek(4, 1826255771, [1826255771], SEEK_SET) = 0
getdents64(4, /* 2 entries */, 32768) = 64
close(4) = 0
write(2, "find: ", 6find: ) = 6
write(2, "/Net/Apps/.", 11/Net/Apps/.) = 11
write(2, ": Value too large for defined da"..., 39: Value too large
for defined data type) = 39
write(2, "\n", 1) = 1
fchdir(3) = 0
munmap(0xb7f58000, 4096) = 0
exit_group(1) = ?

--
H.-R. Oberhage
Mail: Univ. Duisburg-Essen E-Mail: [email protected]
Fachbereich Physik [email protected]
Campus Essen, S05 V07 E88
Universitaetsstrasse 5 Phone: {+49|0} 201 / 183-2493
45141 Essen, Germany FAX: {+49|0} 201 / 183-4578

2005-10-19 21:13:48

by Trond Myklebust

Subject: Re: NFS client problem with kernel 2.6 and SGI IRIX 6.5

On Wed, 19.10.2005 at 18:52 (+0200), Ruediger Oberhage wrote:
> Hello again.
>
> Some first findings regarding nfs problems:
>
> At first I have to apologize for my memory (again :-)) serving me
> wrong: I did state, that the "find /nfsDir -print" problem was
> (generally) gone with the 2.6.12 kernel; this is wrong(!).
>
> The problem does exist for both (Debianized) kernels 2.6.8 as well
> as 2.6.12 (the details follow below in the 'strace'-dump). The
> (find-)problem does NOT exist for the (2.6.12-)kernel delivered on
> the KNOPPIX 4.0 DVD!!! So there is a cure for some kernel for this
> one. The 'resources'-problem (OpenOffice/Mathematica) still remains
> for this kernel, too!

Recent kernels (2.6.13 and above - sorry, I thought it was 2.6.12) have
the following patch applied

http://client.linux-nfs.org/Linux-2.6.x/2.6.12/linux-2.6.12-43-dirent_fix.dif

This should normally suffice to fix the SGI problem.

Cheers,
Trond

2005-10-26 04:34:17

by Simon Horman [Horms]

Subject: Re: Bug#325117: NFS client problem with kernel 2.6 and SGI IRIX 6.5

On Wed, Oct 19, 2005 at 02:13:41PM -0700, Trond Myklebust wrote:
> On Wed, 19.10.2005 at 18:52 (+0200), Ruediger Oberhage wrote:
> > Hello again.
> >
> > Some first findings regarding nfs problems:
> >
> > At first I have to apologize for my memory (again :-)) serving me
> > wrong: I did state, that the "find /nfsDir -print" problem was
> > (generally) gone with the 2.6.12 kernel; this is wrong(!).
> >
> > The problem does exist for both (Debianized) kernels 2.6.8 as well
> > as 2.6.12 (the details follow below in the 'strace'-dump). The
> > (find-)problem does NOT exist for the (2.6.12-)kernel delivered on
> > the KNOPPIX 4.0 DVD!!! So there is a cure for some kernel for this
> > one. The 'resources'-problem (OpenOffice/Mathematica) still remains
> > for this kernel, too!
>
> Recent kernels (2.6.13 and above - sorry, I thought it was 2.6.12) have
> the following patch applied
>
> http://client.linux-nfs.org/Linux-2.6.x/2.6.12/linux-2.6.12-43-dirent_fix.dif
>
> This should normally suffice to fix the SGI problem.

Thanks, I'll confine subsequent discussion to [email protected]
as debian packaging issues don't need to be on lkml.

--
Horms

2005-10-26 08:05:04

by Ruediger Oberhage

Subject: Re: Bug#325117: NFS client problem with kernel 2.6 and SGI IRIX 6.5

Hello!

> > http://client.linux-nfs.org/Linux-2.6.x/2.6.12/linux-2.6.12-43-dirent_fix.dif
> >
> > This should normally suffice to fix the SGI problem.
>
> Thanks, I'll confine subsequent discussion to [email protected]
> as debian packaging issues don't need to be on lkml.

That's fine with me - for the moment at least. I'm busy applying
the patch to the "2.6.12-Debian-sarge" kernel package. The patch
doesn't apply automatically, but I think I succeeded in applying it
manually. My problem now is that the kernel doesn't compile - not
because of the patch or anything in the nfs area, but because of a
scsi driver (which I don't use). I'll try to sort things out, build a
properly patched kernel and report afterwards.

I also think that a 'tcpdump' of the nfs traffic only makes sense
after(!) applying the patch, i.e. taken from a patched kernel; so I'll
postpone its submission until I have (hopefully) succeeded.

Be warned, though, that the KNOPPIX kernel still has the 'resources'
problem, although not the 'find' one. Thus, if this kernel has the
'dirent_fix' patch incorporated, the patch doesn't suffice. This may
eventually lead to the re-involvement of the 'nfs-kernel-group'.

In the meantime, many thanks for the help - I'll try to do 'my'
homework as fast as my time allows.

Regards,
Ruediger Oberhage

2005-10-28 14:35:54

by Ruediger Oberhage

Subject: Re: Bug#325117: NFS client problem with kernel 2.6 and SGI IRIX 6.5

Hello all!

I would like to report/confirm success in solving the problem
described after applying the patch below:

> > http://client.linux-nfs.org/Linux-2.6.x/2.6.12/linux-2.6.12-43-dirent_fix.dif
> >
> > This should normally suffice to fix the SGI problem.

Yes, it effectively eliminates both types of problem for our/my
configuration here - the 'find' error as well as the 'resources'
error (= OpenOffice 'printer' and Mathematica 'fonts, files,
directories' etc.). Thus this patch is more effective than the
one in the KNOPPIX 4.0 kernel!

Thought you'd like to know.

My sincere thanks to all helping out in this, here.

As I normally don't read the lists involved, I won't see other
problems with nfs and the SGI configuration. Should you feel
that testing here could be of any help, then please don't
hesitate to ask me about it - I'd like to return the favour
granted, if I can.

> Thanks, I'll confine subsequent discussion to [email protected]
> as debian packaging issues don't need to be on lkml.

This is fine with me - I just wanted to let everyone involved know
about the outcome. [This is most probably my last report regarding
this 'bug'. Thus you're all going to miss this 'fine tcpdump'-list
I promised; that is, unless somebody asks for it :-).]

Thanks again,
Ruediger Oberhage
--
H.-R. Oberhage
Mail: Univ. Duisburg-Essen E-Mail: [email protected]
Fachbereich Physik [email protected]
Campus Essen, S05 V07 E88
Universitaetsstrasse 5 Phone: {+49|0} 201 / 183-2493
45141 Essen, Germany FAX: {+49|0} 201 / 183-4578