Recently I have been experiencing oddities with my home NFSv4 client/server
connection, which had worked very well for a long time. I don't know exactly
with which version it began. Once in a while I delete files, but after
rebooting the computers the next day, or after doing an ls on the
directory, the files are still there. I did not take this seriously at
first, but now I have a problem copying an ~600MB AVI onto the NFS share.
After 25MB or 30MB (each time after roughly this amount has been
transferred) the connection stalls:
nfs: server 10.10.0.1 not responding, timed out
On my client I have 2.6.17_rc3 and this fstab entry:
10.10.0.1:/pub/ /pub nfs4 tcp,async,hard,intr,noauto,user,sec=sys 0 0
This is only an example: I also tried udp instead of tcp, and soft and
timeo=8 instead of hard and intr.
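To make the variants easier to compare, here is a minimal sketch that splits an fstab line into the six fields named in fstab(5) and turns the option list into a dict (flags such as hard map to None). FstabEntry and parse_fstab_line are hypothetical names for illustration, and the sketch assumes all six fields are present, although fstab(5) lets the last two default to 0.

```python
from typing import Dict, NamedTuple, Optional


class FstabEntry(NamedTuple):
    spec: str                          # remote export, e.g. "server:/path"
    file: str                          # local mount point
    vfstype: str                       # filesystem type, here "nfs4"
    mntops: Dict[str, Optional[str]]   # options; bare flags map to None
    freq: int                          # dump(8) frequency
    passno: int                        # fsck(8) pass number


def parse_fstab_line(line: str) -> FstabEntry:
    """Parse one fstab line; assumes exactly six whitespace-separated fields."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    spec, file, vfstype, ops, freq, passno = fields
    mntops: Dict[str, Optional[str]] = {}
    for opt in ops.split(","):
        key, sep, value = opt.partition("=")
        mntops[key] = value if sep else None
    return FstabEntry(spec, file, vfstype, mntops, int(freq), int(passno))


if __name__ == "__main__":
    entry = parse_fstab_line(
        "10.10.0.1:/pub/ /pub nfs4 tcp,async,hard,intr,noauto,user,sec=sys 0 0"
    )
    print(entry.vfstype, entry.mntops["sec"], "hard" in entry.mntops)  # nfs4 sys True
```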
On the server I had 2.6.25 with these exports:
/exports      10.10.0.0/255.255.0.0(rw,fsid=0,insecure,no_subtree_check,sync)
/exports/pub  10.10.0.0/255.255.0.0(rw,nohide,insecure,no_subtree_check,async)
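For reference, a small sketch that splits such an exports line into the path and its client(options) entries, and flags the entry carrying fsid=0, which exports(5) uses to mark the root of the NFSv4 pseudo-filesystem. parse_exports_line and is_v4_pseudo_root are hypothetical helpers; the sketch ignores quoting, line continuations and "-option" defaults that exports(5) also allows.

```python
import re
from typing import List, Tuple

# A client spec optionally followed by "(opt1,opt2,...)".
_CLIENT_RE = re.compile(r"^([^()\s]+)(?:\(([^()]*)\))?$")


def parse_exports_line(line: str) -> Tuple[str, List[Tuple[str, List[str]]]]:
    """Return (export_path, [(client, [options...]), ...])."""
    path, *clients = line.split()
    parsed: List[Tuple[str, List[str]]] = []
    for spec in clients:
        m = _CLIENT_RE.match(spec)
        if m is None:
            raise ValueError(f"unsupported client spec: {spec!r}")
        opts = [o for o in (m.group(2) or "").split(",") if o]
        parsed.append((m.group(1), opts))
    return path, parsed


def is_v4_pseudo_root(options: List[str]) -> bool:
    """fsid=0 (or fsid=root) marks the NFSv4 pseudo-root in exports(5)."""
    return "fsid=0" in options or "fsid=root" in options


if __name__ == "__main__":
    for line in (
        "/exports 10.10.0.0/255.255.0.0(rw,fsid=0,insecure,no_subtree_check,sync)",
        "/exports/pub 10.10.0.0/255.255.0.0(rw,nohide,insecure,no_subtree_check,async)",
    ):
        path, clients = parse_exports_line(line)
        for client, opts in clients:
            print(path, client, "v4-root" if is_v4_pseudo_root(opts) else "", opts)
```

Laid out this way it is also easy to see that the pseudo-root is exported sync while /exports/pub is exported async.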
When this happened, the server had this in dmesg:
RPC: bad TCP reclen 0x0010011a (non-terminal)
or:
RPC: bad TCP reclen 0x3b468abe (large)
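To interpret these values: with RPC over TCP, every record fragment is preceded by a 4-byte big-endian record marker (RFC 5531 section 11, formerly RFC 1831); the top bit flags the last fragment and the low 31 bits give its length. The sketch below decodes a marker and mimics, in the order I believe svcsock.c applies them, the two checks behind the messages. HYPOTHETICAL_MAX_MESG is a placeholder, not the server's real limit (which comes from sv_max_mesg in the kernel code). It also assumes the value in the "(large)" message is printed after the top bit is masked off, so the marker on the wire would have been 0xbb468abe.

```python
import struct
from typing import Tuple

LAST_FRAGMENT = 0x80000000       # top bit set: this fragment ends the record
FRAGMENT_SIZE_MASK = 0x7FFFFFFF  # low 31 bits: fragment length in bytes

# Placeholder limit, NOT the kernel's real value; only illustrates the check.
HYPOTHETICAL_MAX_MESG = 1 << 20


def decode_marker(marker: int) -> Tuple[bool, int]:
    """Return (is_last_fragment, fragment_length) for a 32-bit marker."""
    return bool(marker & LAST_FRAGMENT), marker & FRAGMENT_SIZE_MASK


def decode_marker_bytes(header: bytes) -> Tuple[bool, int]:
    """Decode the first 4 bytes at a record boundary (network byte order)."""
    (marker,) = struct.unpack(">I", header[:4])
    return decode_marker(marker)


def classify(marker: int, max_mesg: int = HYPOTHETICAL_MAX_MESG) -> str:
    """Apply the two checks that produce the dmesg messages."""
    last, length = decode_marker(marker)
    if not last:
        return "non-terminal"   # cf. "bad TCP reclen ... (non-terminal)"
    if length > max_mesg:
        return "large"          # cf. "bad TCP reclen ... (large)"
    return "ok"


if __name__ == "__main__":
    print(decode_marker(0x0010011A))  # (False, 1048858)
    print(decode_marker(0xBB468ABE))  # (True, 994478782)
```

An arbitrary 32-bit word has its top bit clear about half the time (reported as non-terminal) and otherwise usually decodes to a huge length (reported as large). Seeing both kinds is therefore consistent with, though not proof of, the server reading payload bytes where it expects a record marker, i.e. the TCP byte stream arriving out of step.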
In 2.6.27_rc (running now) the corresponding messages in svcsock.c have
been reworded; they now say:
RPC: fragment too large: 0x6be367a9
or
RPC: fragment too large: 0x1287087c
Starting the NFS services yields
RPC: multiple fragments per record not supported
though no other problems show up.
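The record marking scheme lets a sender split one RPC record into several fragments, where only the final fragment has the top bit set; as the message says, the server code refuses records made of more than one fragment. For illustration only, a sketch of both sides of the scheme as RFC 5531 section 11 describes it. encode_record and iter_records are hypothetical helpers, not kernel functions.

```python
import struct
from typing import Iterator, List

LAST_FRAGMENT = 0x80000000
FRAGMENT_SIZE_MASK = 0x7FFFFFFF


def encode_record(payload: bytes, max_fragment: int) -> bytes:
    """Split one record into fragments of at most max_fragment bytes.

    Each fragment is preceded by a 4-byte big-endian marker; only the
    final fragment carries the LAST_FRAGMENT bit.
    """
    if not 0 < max_fragment <= FRAGMENT_SIZE_MASK:
        raise ValueError("max_fragment out of range")
    chunks: List[bytes] = [
        payload[i:i + max_fragment] for i in range(0, len(payload), max_fragment)
    ] or [b""]  # an empty record is still one (empty) terminal fragment
    out = bytearray()
    for i, chunk in enumerate(chunks):
        marker = len(chunk)
        if i == len(chunks) - 1:
            marker |= LAST_FRAGMENT
        out += struct.pack(">I", marker)
        out += chunk
    return bytes(out)


def iter_records(stream: bytes) -> Iterator[bytes]:
    """Reassemble records from a byte stream by concatenating fragments."""
    pos = 0
    record = bytearray()
    pending = False  # True while inside a record whose last fragment is unseen
    while pos < len(stream):
        if pos + 4 > len(stream):
            raise ValueError("truncated record marker")
        (marker,) = struct.unpack_from(">I", stream, pos)
        pos += 4
        length = marker & FRAGMENT_SIZE_MASK
        if pos + length > len(stream):
            raise ValueError("truncated fragment")
        record += stream[pos:pos + length]
        pos += length
        if marker & LAST_FRAGMENT:
            yield bytes(record)
            record = bytearray()
            pending = False
        else:
            pending = True
    if pending:
        raise ValueError("stream ended inside a record")
```

A server that accepts only single-fragment records would, in these terms, reject any marker whose top bit is clear instead of appending it to a pending record.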
What can I do to narrow this down?
Kind Regards, Konsti
--
GPG KeyID EF62FCEF
Fingerprint: 13C9 B16B 9844 EC15 CC2E A080 1E69 3FDA EF62 FCEF
Hm, maybe the Ethernet driver atl1 is the culprit.
I stumbled upon the thread "sendfile() broken with 2.6.26 + Apache 2 ?" and
especially http://marc.info/?l=linux-kernel&m=121619345703525&w=2
So I ran "ethtool -K eth0 tso off" (disabling TCP segmentation offload) and
was able to transfer the file twice in a row right away. All three files
(original, copy 1 and copy 2) have identical md5sums, and the transfer rate
was reasonable at ca. 12MB/s.
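To repeat this integrity check on further copies, a minimal sketch that hashes each file in chunks with Python's hashlib (so a 600MB file is never read into memory at once) and reports whether all digests match, equivalent in spirit to running md5sum on each file. md5_of and all_identical are hypothetical names; the file paths are taken from the command line.

```python
import hashlib
import sys
from pathlib import Path
from typing import Iterable


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """MD5 hex digest of a file, read in chunk_size pieces."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def all_identical(paths: Iterable[Path]) -> bool:
    """True if every file has the same MD5 digest (vacuously true if empty)."""
    return len({md5_of(p) for p in paths}) <= 1


if __name__ == "__main__":
    files = [Path(p) for p in sys.argv[1:]]
    digests = {p: md5_of(p) for p in files}
    for p, d in digests.items():
        print(d, p)
    print("identical" if len(set(digests.values())) <= 1 else "MISMATCH")
```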
Regards, Konsti