2008-08-25 13:17:17

by Konstantin Kletschke

Subject: NFSv4 stalls transferring files

Recently I have been experiencing oddities with my home NFSv4 client/server
connection, which had worked very well for a long time. I don't know exactly
with which version it began. Once in a while I delete files, but after
rebooting the computers the next day, or after doing an ls on the directory,
the files are still there. I did not take this seriously at first, but now I
have a problem copying a ~600MB AVI onto the NFS share. After 25MB or 30MB
(it happens every time after roughly this amount has been transferred) the
connection stalls:

nfs: server 10.10.0.1 not responding, timed out
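One way to pin down exactly where such a copy stalls is to copy in fixed-size chunks, force each chunk out to the server with fsync(), and print the running total, so the last line printed shows how far the transfer got. This is only a minimal Python sketch; copy_with_progress and the 1 MiB chunk size are hypothetical choices for illustration, not part of any NFS tool.

```python
import os
import sys

CHUNK = 1024 * 1024  # hypothetical chunk size: 1 MiB per write


def copy_with_progress(src: str, dst: str, chunk: int = CHUNK) -> int:
    """Copy src to dst chunk by chunk, printing the running byte count.

    Each chunk is fsync()ed so the printed total reflects data the
    server has accepted, not just data sitting in the client's cache.
    Returns the total number of bytes copied.
    """
    copied = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            block = fin.read(chunk)
            if not block:
                break  # end of source file
            fout.write(block)
            fout.flush()               # hand Python's buffer to the kernel
            os.fsync(fout.fileno())    # ask the kernel to push it to the server
            copied += len(block)
            print(f"{copied / 2**20:.1f} MiB copied", file=sys.stderr, flush=True)
    return copied
```

Called as copy_with_progress("movie.avi", "/pub/movie.avi") (hypothetical paths), a stall would leave the last progress line near the 25MB to 30MB mark. The per-chunk fsync() makes the copy slower, but with an async mount it keeps the counter from running ahead of what actually reached the server.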

On my client I have 2.6.17_rc3 and this fstab entry:

10.10.0.1:/pub/ /pub nfs4 tcp,async,hard,intr,noauto,user,sec=sys 0 0

This is only one example; I also tried udp, and I tried soft and timeo=8
instead of hard and intr.

On the server I had 2.6.25 with these exports:

/exports      10.10.0.0/255.255.0.0(rw,fsid=0,insecure,no_subtree_check,sync)
/exports/pub  10.10.0.0/255.255.0.0(rw,nohide,insecure,no_subtree_check,async)

When this happened, the server had this in dmesg:

RPC: bad TCP reclen 0x0010011a (non-terminal)

or:

RPC: bad TCP reclen 0x3b468abe (large)

In 2.6.27_rc (running now), the message in svcsock.c is worded differently;
it now says:

RPC: fragment too large: 0x6be367a9

or:

RPC: fragment too large: 0x1287087c

Starting the NFS services yields:

RPC: multiple fragments per record not supported

but without any other ill effects.
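For context: ONC RPC over TCP (RFC 5531, "Record Marking Standard") frames each message with a 4-byte big-endian record marker. The high bit flags the last fragment of a record and the low 31 bits give the fragment length. The server messages above appear to be complaints about such markers, so random-looking values like 0x3b468abe would fit a byte stream that was corrupted or out of sync. The sketch below is a hypothetical decoder illustrating the format, not the kernel code; MAX_FRAGMENT is an assumed limit chosen only for illustration, and the real server limit differs.

```python
import struct

LAST_FRAGMENT = 0x80000000  # high bit set: this fragment ends the RPC record
LENGTH_MASK = 0x7FFFFFFF    # low 31 bits: fragment length in bytes
MAX_FRAGMENT = 1 << 20      # hypothetical server limit, for illustration only


def decode_marker(raw: bytes) -> tuple[bool, int]:
    """Split a 4-byte big-endian record marker into (is_last, length)."""
    (word,) = struct.unpack(">I", raw)
    return bool(word & LAST_FRAGMENT), word & LENGTH_MASK


def classify(word: int, max_len: int = MAX_FRAGMENT) -> str:
    """Say how a receiver that accepts only single-fragment records of at
    most max_len bytes would treat this marker (a simplification)."""
    is_last, length = decode_marker(struct.pack(">I", word))
    if not is_last:
        return "non-terminal"  # record would continue in another fragment
    if length > max_len:
        return "too large"     # length exceeds what the receiver accepts
    return "ok"
```

With this sketch, the 2.6.25 value 0x0010011a classifies as "non-terminal" because its high bit is clear. Assuming the "(large)" value is logged with the high bit already stripped, a marker of 0x80000000 | 0x3b468abe classifies as "too large": a length of roughly 1GB, which no sane RPC request would carry.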


What can I do to narrow this down?

Kind Regards, Konsti

--
GPG KeyID EF62FCEF
Fingerprint: 13C9 B16B 9844 EC15 CC2E A080 1E69 3FDA EF62 FCEF


2008-08-25 19:34:16

by Konstantin Kletschke

Subject: Re: NFSv4 stalls transferring files


Hm, maybe the Ethernet driver atl1 is the culprit.
I stumbled upon the thread "sendfile() broken with 2.6.26 + Apache 2 ?" and
especially http://marc.info/?l=linux-kernel&m=121619345703525&w=2

So I ran "ethtool -K eth0 tso off" and was immediately able to transfer the
file twice. All three files (original, copy 1 and copy 2) have identical
md5sums, and the transfer rate was reasonable at about 12MB/s.
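The same integrity check can be scripted. As a sketch of the idea in Python, the helpers below hash each file in chunks (so a ~600MB AVI is never loaded into memory at once) and report whether all digests match. The function names md5sum and all_identical are hypothetical, chosen only for this illustration.

```python
import hashlib


def md5sum(path: str, chunk: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it 1 MiB at a time."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def all_identical(paths: list[str]) -> bool:
    """True if every listed file has the same MD5 digest."""
    digests = {md5sum(p) for p in paths}
    return len(digests) == 1
```

For example, all_identical(["movie.avi", "/pub/copy1.avi", "/pub/copy2.avi"]) (hypothetical paths) would return True in the situation described above; a single corrupted copy would make it return False.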

Regards, Konsti
