2004-08-24 11:08:08

by NeilBrown

Subject: Re: Strange delays on NFS server

On Monday August 16, [email protected] wrote:
>
> I bumped the nfsd's up to 64 (from 32) and subjectively the problem gets
> worse. I then reduced them to 16 and things are a bit better...

Odd.

>
> Would changing some of the bdflush settings help at all?

Maybe. I would start with
echo 200 > /proc/sys/vm/dirty_expire_centisecs

You said you are using ext3. Are you using data=journal or the
default data=ordered?

Also, it would be interesting to compare nfs ops per second against
disk i/os per second over time.
Something like..

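# prints one line per second: nfs ops delta, then sectors-written delta
while :
do
    # running total of NFSv3 ops: sum of the counters on the proc3 line
    perl -ne 'if (/^proc3/) { @a=split ; shift @a; shift @a; print eval(join("+", @a))." ";}' /proc/net/rpc/nfsd
    # running total of sectors written to hda (10th field of its line)
    perl -ne 'if (/hda /) { @a=split; print $a[9]."\n";}' /proc/diskstats
    sleep 1
done | perl -ne '@_=split; print( ($_[0]-$a[0])." ".($_[1]-$a[1])."\n"); @a=@_;'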

If the pauses correspond to periods with very low nfs ops/sec and very
high writes per second, then it confirms that it is a disk flushing
problem.
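
One way to make that comparison less manual is to mark the suspicious samples
automatically. The following is only a sketch: the function name flag_stalls
and both default thresholds are hypothetical and would need tuning for the
server in question. It reads the two-column output of the loop above on stdin.

```shell
# flag_stalls: read "nfs_ops_delta sectors_written_delta" lines on stdin
# and append " STALL?" to samples with few NFS ops but many sectors written.
# $1 = low-ops threshold, $2 = high-writes threshold (hypothetical defaults).
flag_stalls() {
    awk -v low_ops="${1:-10}" -v high_writes="${2:-5000}" '
        NF >= 2 {
            mark = ($1 < low_ops && $2 > high_writes) ? " STALL?" : ""
            # output: sample number, nfs ops, sectors written, optional mark
            printf "%d %s %s%s\n", NR, $1, $2, mark
        }'
}
```

Note that the first line printed by the loop is the raw counter totals rather
than a delta, so the first sample should be ignored.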

It would also be interesting to see if there was a pattern in the
timing, particularly how long the interval was between one pause and the
next.
Also getting these sets of numbers for different numbers of nfsd
threads could turn your subjective impression into objective data.

NeilBrown


-------------------------------------------------------
SF.Net email is sponsored by Shop4tech.com-Lowest price on Blank Media
100pk Sonic DVD-R 4x for only $29 -100pk Sonic DVD+R for only $33
Save 50% off Retail on Ink & Toner - Free Shipping and Free Gift.
http://www.shop4tech.com/z/Inkjet_Cartridges/9_108_r285
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2004-08-24 14:22:43

by Ian Thurlbeck

Subject: Re: Strange delays on NFS server

Neil Brown wrote:
> On Monday August 16, [email protected] wrote:
>
>>I bumped the nfsd's up to 64 (from 32) and subjectively the problem gets
>>worse. I then reduced them to 16 and things are a bit better...
>
>
> Odd.
>
>>Would changing some of the bdflush settings help at all?
>
>
> Maybe. I would start with
> echo 200 > /proc/sys/vm/dirty_expire_centisecs

> You said you are using ext3. Are you using data=journal or the
> default data=ordered?

I'm using the default on Fedora 1, ordered data.

> Also, it would be interesting to compare nfs ops per second against
> disk i/os per second over time.
> Something like..
>
> while :
> do
> perl -ne 'if (/^proc3/) { @a=split ; shift @a; shift @a; print eval(join("+", @a))." ";}' /proc/net/rpc/nfsd
> perl -ne 'if (/hda /) { @a=split; print $a[9]."\n";}' /proc/diskstats
> sleep 1
> done | perl -ne '@_=split; print( ($_[0]-$a[0])." ".($_[1]-$a[1])."\n"); @a=@_;'
>
> If the pauses correspond to periods with very low nfs ops/sec and very
> high writes per second, then it confirms that it is a disk flushing
> problem.

I'm using FC1 which has 2.4.22 as its base. I think the
/proc/sys/vm/dirty_expire_centisecs and the above scriptlet are
for 2.6 only. The nfs stats work, but the /proc/diskstats file
is missing.

Do you have any suggestions for /proc/sys/vm/bdflush instead? Here
are the current settings:

30 500 0 0 500 3000 60 20 0

> It would also be interesting to see if there was a pattern in the
> timing, particularly how long the interval was between one pause and the
> next.

I'll start keeping a note of the time of these events.

> Also getting these sets of numbers for different numbers of nfsd
> threads could turn your subjective impression into objective data.
>
> NeilBrown

Thanks

Ian
--
Ian Thurlbeck http://www.stams.strath.ac.uk/
Statistics and Modelling Science, University of Strathclyde
Livingstone Tower, 26 Richmond Street, Glasgow, UK, G1 1XH
Tel: +44 (0)141 548 3667 Fax: +44 (0)141 552 2079




2004-08-26 11:02:13

by Ian Thurlbeck

Subject: Re: Strange delays on NFS server (with piccies)

Neil Brown wrote:
> On Monday August 16, [email protected] wrote:
>
>>I bumped the nfsd's up to 64 (from 32) and subjectively the problem gets
>>worse. I then reduced them to 16 and things are a bit better...
>
>
> Odd.
>
>
>>Would changing some of the bdflush settings help at all?
>
>
> Maybe. I would start with
> echo 200 > /proc/sys/vm/dirty_expire_centisecs
>
> You said you are using ext3. Are you using data=journal or the
> default data=ordered?
>
> Also, it would be interesting to compare nfs ops per second against
> disk i/os per second over time.
> Something like..
>
> while :
> do
> perl -ne 'if (/^proc3/) { @a=split ; shift @a; shift @a; print eval(join("+", @a))." ";}' /proc/net/rpc/nfsd
> perl -ne 'if (/hda /) { @a=split; print $a[9]."\n";}' /proc/diskstats
> sleep 1
> done | perl -ne '@_=split; print( ($_[0]-$a[0])." ".($_[1]-$a[1])."\n"); @a=@_;'
>
> If the pauses correspond to periods with very low nfs ops/sec and very
> high writes per second, then it confirms that it is a disk flushing
> problem.
>
> It would also be interesting to see if there was a pattern in the
> timing, particularly how long the interval was between one pause and the
> next.
> Also getting these sets of numbers for different numbers of nfsd
> threads could turn your subjective impression into objective data.
>
> NeilBrown

Neil, and others

I've gathered some useful data (I hope) on the problem. I ran a variant
of Neil's script for the 2.4 kernel for most of a day (9.30-15.00).

It's all here:

http://www.stams.strath.ac.uk/~ian/nfs/

Files:
  data.all.raw    raw data, 3 columns: HH:MM:SS nfsops diskwrites
  data.all.plot   massaged data, 3 columns: SECONDS nfsops diskwrites
  data.all.eps    PostScript plot of the massaged data
  data.all.gif    GIF plot of the massaged data

(In the massaged data, missing samples have been filled in, because the
seconds field sometimes jumps by 2 seconds, and HH:MM:SS has been
converted to seconds from the start.)
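
The massaging step itself is not shown in this thread. A minimal sketch of one
way to do it follows, assuming the raw file has the three columns listed above.
The function name massage is hypothetical, and the fill rule (repeat the
previous row for each skipped second) is an assumption, not necessarily what
was done for data.all.plot.

```shell
# massage: convert "HH:MM:SS nfsops diskwrites" rows (stdin) into
# "SECONDS nfsops diskwrites", with SECONDS counted from the first row.
# ASSUMPTION: a skipped second is filled by repeating the previous row's values.
# Does not handle a run that crosses midnight.
massage() {
    awk '
        {
            split($1, t, ":")
            now = t[1] * 3600 + t[2] * 60 + t[3]
            if (NR == 1) {
                start = now
            } else {
                # emit one filler row per missing second
                for (s = prev + 1; s < now; s++) print s - start, ops, wr
            }
            print now - start, $2, $3
            prev = now; ops = $2; wr = $3
        }'
}
```

Usage would be along the lines of: massage < data.all.raw > data.all.plot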

I think this shows no correlation between NFS ops and disk writes
with respect to these big slowdowns (the five big peaks in the lower
graph). Something with a periodicity of 600 seconds is also writing to
the disk. (This was done with 32 nfsd threads, BTW.)
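
The spacing between write bursts can also be pulled out of the massaged data
rather than read off the plot. This is a sketch only: the function name
burst_intervals and the default threshold are hypothetical, and a burst is
simply taken to be a run of consecutive rows whose diskwrites value exceeds
that threshold.

```shell
# burst_intervals: read "SECONDS nfsops diskwrites" rows on stdin and print,
# for each write burst, its start second and the gap since the previous
# burst started ("-" for the first burst).
# $1 = diskwrites threshold (hypothetical default).
burst_intervals() {
    awk -v thresh="${1:-5000}" '
        $3 > thresh {
            if (!in_burst) {
                if (seen) {
                    print $1, $1 - last
                } else {
                    print $1, "-"
                }
                last = $1; seen = 1; in_burst = 1
            }
            next
        }
        # any row at or below the threshold ends the current burst
        { in_burst = 0 }'
}
```

A burst that briefly dips below the threshold will be reported as two bursts,
so the threshold may need adjusting before the gaps are meaningful.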

There is another similar set of files zooming in on the first 2
events (data.zoom.*). You can see from this graph that the disk
writing event lasts about 50 seconds, and the client machines hang on
NFS ops for this period (pretty annoying, I can tell you!).

Also in the directory is the output of "vmstat 1" showing one of these
events (vmstat.log).

Can anyone deduce anything from this?

Many thanks

Ian

PS: How big are these "wsect" counts in /proc/partitions in terms of bytes ?
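
The 2.4 variant of the script is not shown in the thread. As a rough sketch of
how the disk column could be read where /proc/diskstats is missing, the
following looks up the "wsect" column mentioned above by name. It assumes
/proc/partitions has a header row that names its columns (including "name" and
"wsect"); the function name wsect_for is hypothetical.

```shell
# wsect_for: print the cumulative "wsect" counter for one device.
# $1 = device name (e.g. hda), $2 = stats file (default /proc/partitions).
# ASSUMPTION: the first non-empty line is a header naming the columns.
# Prints nothing if no "wsect" column is found.
wsect_for() {
    awk -v dev="$1" '
        !w {
            # still looking for the header: locate the columns by name
            for (i = 1; i <= NF; i++) {
                if ($i == "name")  n = i
                if ($i == "wsect") w = i
            }
            next
        }
        $n == dev { print $w }
    ' "${2:-/proc/partitions}"
}
```

In the loop, a call such as wsect_for hda would take the place of the perl
command that reads /proc/diskstats.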

--
Ian Thurlbeck http://www.stams.strath.ac.uk/
Statistics and Modelling Science, University of Strathclyde
Livingstone Tower, 26 Richmond Street, Glasgow, UK, G1 1XH
Tel: +44 (0)141 548 3667 Fax: +44 (0)141 552 2079


