2010-06-25 04:44:10

by Jasper Mackenzie

Subject: Problem: Clients freeze on transfer of large files, w gigabit lan

Good Day All,
I and many others have been plagued by a problem that can be summarised
as follows:

The client hangs when copying large files TO the server. The transfer begins
quickly, then hangs, sometimes taking the client OS with it, until the
transfer starts again. In less extreme cases the transfer is sporadic.
I am using NFSv4 with gigabit NICs.

My experience is with Debian and Ubuntu clients and servers, so I
started a thread on the Ubuntu forums to find similar cases and give you
more to work from:
http://ubuntuforums.org/showthread.php?p=9269703
It appears that this problem is not restricted to Ubuntu and exists in
at least Fedora and Red Hat as well.

More details and user experiences are in the above forum thread.

Help in solving this would be greatly appreciated.

Thanks.

Jasper


2010-06-25 18:51:19

by Trond Myklebust

Subject: Re: Problem: Clients freeze on transfer of large files, w gigabit lan

On Fri, 2010-06-25 at 16:44 +1200, Jasper Mackenzie wrote:
> Good Day All,
> I and many others have been plagued by a problem that can be summarised
> as follows:
>
> The client hangs when copying large files TO the server. The transfer begins
> quickly, then hangs, sometimes taking the client OS with it, until the
> transfer starts again. In less extreme cases the transfer is sporadic.
> I am using NFSv4 with gigabit NICs.
>
> My experience is with Debian and Ubuntu clients and servers, so I
> started a thread on the Ubuntu forums to find similar cases and give you
> more to work from:
> http://ubuntuforums.org/showthread.php?p=9269703
> It appears that this problem is not restricted to Ubuntu and exists in
> at least Fedora and Red Hat as well.
>
> More details and user experiences are in the above forum thread.
>
> Help in solving this would be greatly appreciated.

Could this perhaps be related to the following bugzilla entry?
https://bugzilla.kernel.org/show_bug.cgi?id=16213

If so, then could you please try the proposed fix and see if it helps.

Cheers
Trond


2010-06-28 20:28:10

by Jasper Mackenzie

Subject: Re: Problem: Clients freeze on transfer of large files, w gigabit lan


>> >> I and many others have been plagued by a problem that can be
>> >> summarised
>> >> as follows:
>> >>
>> >> The client hangs when copying large files TO the server. The transfer
>> >> begins quickly, then hangs, sometimes taking the client OS with it,
>> >> until the transfer starts again. In less extreme cases the transfer is
>> >> sporadic.
>> >> I am using NFSv4 with gigabit NICs.
>> >> http://ubuntuforums.org/showthread.php?p=9269703
>> >> It appears that this problem is not restricted to Ubuntu and exists
>> >
>> > Could this perhaps be related to the following bugzilla entry?
>> > https://bugzilla.kernel.org/show_bug.cgi?id=16213
>> >
>> > If so, then could you please try the proposed fix and see if it helps.
>> >
>> > Cheers
>> > Trond
>> Thanks Trond,
>> The patch solved the client lockups with a patched vanilla kernel (I will
>> keep trying with an Ubuntu kernel, as it should do the same, but didn't).
>>
>> Unfortunately it doesn't fix the other problem (the two seem to be
>> separate): the transfer happens in bursts, which reduces the actual
>> throughput dramatically. That is, the transfer starts at 16mb/s (according
>> to nautilus), then 3 or 4 seconds later the progress bar stops and the HDD
>> activity stops too (it's the same with cp of course; nautilus just gives
>> me a good indication of transfer speed), then 4 or 5 seconds later it
>> starts again for 3 or 4 seconds... repeat ad nauseam.
>> Refer to the forum thread for graphs etc. of throughput if necessary.
>
> That is usually because the server is caching too much data instead of
> progressively writing it out. When the client calls 'commit' (the NFS
> equivalent of fsync()) then the disk on the server goes into a frenzy of
> writing, and the client does the RPC equivalent of twiddling its thumbs
> until the server is done...
>
> I'd suggest trying to lower the values
> of /proc/sys/vm/dirty_expire_centisecs
> and /proc/sys/vm/dirty_background_ratio on the server. You might also
> try lowering /proc/sys/vm/dirty_writeback_centisecs...
>
> Cheers
> Trond
I reduced them to 10% of the values I found them at, i.e. to 40, 1 and 20
respectively, with no improvement.
I think I need to play with it more to see if the lockups are gone. I'm
not sure the original problem is quite fixed...
Dammit, that was too easy!

Once I have tested more with Ubuntu, I will see how the people on the
above forum get on.

Any other ideas?

Thanks

Jasper

2010-06-28 20:13:27

by Trond Myklebust

Subject: Re: Problem: Clients freeze on transfer of large files, w gigabit lan

On Tue, 2010-06-29 at 07:45 +1200, Jasper Mackenzie wrote:
> >> I and many others have been plagued by a problem that can be
> >> summarised
> >> as follows:
> >>
> >> The client hangs when copying large files TO the server. The transfer
> >> begins quickly, then hangs, sometimes taking the client OS with it,
> >> until the transfer starts again. In less extreme cases the transfer is
> >> sporadic.
> >> I am using NFSv4 with gigabit NICs.
> >> http://ubuntuforums.org/showthread.php?p=9269703
> >> It appears that this problem is not restricted to Ubuntu and exists
> >
> > Could this perhaps be related to the following bugzilla entry?
> > https://bugzilla.kernel.org/show_bug.cgi?id=16213
> >
> > If so, then could you please try the proposed fix and see if it helps.
> >
> > Cheers
> > Trond
> Thanks Trond,
> The patch solved the client lockups with a patched vanilla kernel (I will
> keep trying with an Ubuntu kernel, as it should do the same, but didn't).
>
> Unfortunately it doesn't fix the other problem (the two seem to be
> separate): the transfer happens in bursts, which reduces the actual
> throughput dramatically. That is, the transfer starts at 16mb/s (according
> to nautilus), then 3 or 4 seconds later the progress bar stops and the HDD
> activity stops too (it's the same with cp of course; nautilus just gives
> me a good indication of transfer speed), then 4 or 5 seconds later it
> starts again for 3 or 4 seconds... repeat ad nauseam.
> Refer to the forum thread for graphs etc. of throughput if necessary.

That is usually because the server is caching too much data instead of
progressively writing it out. When the client calls 'commit' (the NFS
equivalent of fsync()) then the disk on the server goes into a frenzy of
writing, and the client does the RPC equivalent of twiddling its thumbs
until the server is done...

I'd suggest trying to lower the values
of /proc/sys/vm/dirty_expire_centisecs
and /proc/sys/vm/dirty_background_ratio on the server. You might also
try lowering /proc/sys/vm/dirty_writeback_centisecs...

Cheers
Trond
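
For anyone wanting to try the suggestion above, here is a minimal Python
sketch that reads the three tunables on the server and prints sysctl
commands for lowered values. It only prints; it does not change anything.
The helper names and the divide-by-ten default are assumptions made for
illustration, not something recommended in this thread, and suitable
values will depend on the server.

```python
from pathlib import Path

# The three tunables named in the message above, found under /proc/sys/vm.
TUNABLES = (
    "dirty_expire_centisecs",
    "dirty_background_ratio",
    "dirty_writeback_centisecs",
)


def read_tunables(vm_dir="/proc/sys/vm"):
    """Return {name: int} for each tunable that exists under vm_dir."""
    values = {}
    for name in TUNABLES:
        path = Path(vm_dir) / name
        if path.is_file():
            values[name] = int(path.read_text().strip())
    return values


def lowered(values, divisor=10):
    """Divide each value by divisor (an arbitrary choice), never below 1.

    The floor of 1 avoids writing 0, which would disable periodic
    writeback in the case of dirty_writeback_centisecs.
    """
    return {name: max(1, value // divisor) for name, value in values.items()}


def sysctl_commands(values):
    """Format values as `sysctl -w` commands, to be run as root."""
    return [f"sysctl -w vm.{name}={value}" for name, value in values.items()]


if __name__ == "__main__":
    current = read_tunables()
    print("current:", current)
    for command in sysctl_commands(lowered(current)):
        print(command)
```

Values set with sysctl -w are lost on reboot, so it is easy to back out
of an experiment.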


2010-06-28 19:45:51

by Jasper Mackenzie

Subject: Re: Problem: Clients freeze on transfer of large files, w gigabit lan


>> I and many others have been plagued by a problem that can be
>> summarised
>> as follows:
>>
>> The client hangs when copying large files TO the server. The transfer
>> begins quickly, then hangs, sometimes taking the client OS with it,
>> until the transfer starts again. In less extreme cases the transfer is
>> sporadic.
>> I am using NFSv4 with gigabit NICs.
>> http://ubuntuforums.org/showthread.php?p=9269703
>> It appears that this problem is not restricted to Ubuntu and exists
>
> Could this perhaps be related to the following bugzilla entry?
> https://bugzilla.kernel.org/show_bug.cgi?id=16213
>
> If so, then could you please try the proposed fix and see if it helps.
>
> Cheers
> Trond
Thanks Trond,
The patch solved the client lockups with a patched vanilla kernel (I will
keep trying with an Ubuntu kernel, as it should do the same, but didn't).

Unfortunately it doesn't fix the other problem (the two seem to be
separate): the transfer happens in bursts, which reduces the actual
throughput dramatically. That is, the transfer starts at 16mb/s (according
to nautilus), then 3 or 4 seconds later the progress bar stops and the HDD
activity stops too (it's the same with cp of course; nautilus just gives
me a good indication of transfer speed), then 4 or 5 seconds later it
starts again for 3 or 4 seconds... repeat ad nauseam.
Refer to the forum thread for graphs etc. of throughput if necessary.

Thanks.

Jasper
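
To put a rough number on the burst pattern described in this message,
the sketch below averages the rate over one burst-and-stall cycle. It
assumes the transfer runs at the full reported 16 (in whatever unit
nautilus shows) during each burst and at zero during each stall; the
function name is made up for this example, and the result is an
estimate, not a measurement.

```python
def effective_throughput(rate, active_s, stalled_s):
    """Average rate over one cycle of active_s seconds at `rate`
    followed by stalled_s seconds at zero, in the units of `rate`."""
    return rate * active_s / (active_s + stalled_s)


# Midpoints of the reported ranges: about 3.5 s active, 4.5 s stalled.
print(effective_throughput(16, 3.5, 4.5))  # 7.0

# The extremes of the reported ranges.
print(effective_throughput(16, 4, 4))  # 8.0
print(effective_throughput(16, 3, 5))  # 6.0
```

Under those assumptions the average comes out at somewhere between
about a third and a half of the rate seen during a burst.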