2004-08-11 10:55:21

by Ian Thurlbeck

Subject: Strange delays on NFS server


Dear All

I am getting strange delays on our NFS server. Everything will
be fine for maybe 5 to 10 minutes, then saving a small text file
from a client machine will suddenly take 15 seconds or more.
Similarly, mozilla will pause accessing a page for ages ~30secs
(I assume it's poking about in ~/.mozilla). Everything pops back to
normal afterwards. NFS server is lightly used at the moment.

System is running Fedora Core1 (2188 kernel), 3ware 7500 raid, ext3.
I have 32 nfsd processes, running NFS V3/tcp, 8192 block size,
single duplex 100Mbit etherpro. Clients have similar OS/mount opts.

I noticed a possible link to this:

http://sourceforge.net/mailarchive/message.php?msg_id=9080621

Anyone care to comment?

The NFS FAQ suggests raising:

# echo 262144 > /proc/sys/net/core/rmem_default
# echo 262144 > /proc/sys/net/core/rmem_max

Mine are 65536 and 131071 respectively, but the FAQ gives
strong caveats about doing this.
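
A minimal sketch, in Python, of comparing the current receive-buffer settings with the values in the FAQ commands above before changing anything. The helper names `read_rmem` and `compare_rmem` are made up for this illustration; only the two /proc files and the numbers come from the message.

```python
from pathlib import Path

# Values taken from the FAQ commands quoted above.
SUGGESTED = {"rmem_default": 262144, "rmem_max": 262144}


def read_rmem(base="/proc/sys/net/core"):
    """Read the live settings; only works on a Linux host exposing these files."""
    return {name: int(Path(base, name).read_text()) for name in SUGGESTED}


def compare_rmem(current, suggested=SUGGESTED):
    """Return {name: (current, suggested)} for each setting below the suggestion."""
    below = {}
    for name, want in suggested.items():
        have = current.get(name, 0)  # a missing entry is treated as 0
        if have < want:
            below[name] = (have, want)
    return below


if __name__ == "__main__":
    # The values reported in this message, so the sketch runs on any machine.
    reported = {"rmem_default": 65536, "rmem_max": 131071}
    for name, (have, want) in compare_rmem(reported).items():
        print(f"{name}: {have} is below the suggested {want}")
```

On the server itself, `compare_rmem(read_rmem())` would do the same check against the live values. The sketch only reports; it does not write to /proc, given the FAQ's caveats.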

Here are the stats from the server (I haven't rebooted since
I switched to V3):

[root@dunnet ipv4]# nfsstat -s
Server rpc stats:
calls           badcalls        badauth         badclnt         xdrcall
1699297854      2               2               0               0
Server nfs v2:
null            getattr         setattr         root            lookup          readlink
907181 1%       19385896 29%    375501 0%       0 0%            5647809 8%      7246 0%
read            wrcache         write           create          remove          rename
24270181 37%    0 0%            13513519 20%    261062 0%       321986 0%       64110 0%
link            symlink         mkdir           rmdir           readdir         fsstat
119741 0%       2444 0%         1372 0%         2242 0%         125821 0%       45434 0%

Server nfs v3:
null            getattr         setattr         lookup          access          readlink
2894636 0%      61177594 3%     1701443 0%      15723650 0%     39582580 2%     47421 0%
read            write           create          mkdir           symlink         mknod
1428854462 87%  71166229 4%     763544 0%       4621 0%         18314 0%        0 0%
remove          rmdir           rename          link            readdir         readdirplus
602276 0%       1631 0%         326031 0%       169218 0%       115075 0%       698649 0%
fsstat          fsinfo          pathconf        commit
54385 0%        72137 0%        2 0%            10272411 0%
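
As a cross-check on the v3 figures above, here is a small Python sketch that recomputes each operation's share from the raw counts. The function names are hypothetical, and the parsing assumes the layout shown in this message (a row of operation names followed by a row of "count pct%" cells); other nfsstat versions may format things differently.

```python
# NFS v3 rows copied from the nfsstat output above: a row of operation
# names, then a row of "<count> <pct>%" cells.
V3_ROWS = """\
null getattr setattr lookup access readlink
2894636 0% 61177594 3% 1701443 0% 15723650 0% 39582580 2% 47421 0%
read write create mkdir symlink mknod
1428854462 87% 71166229 4% 763544 0% 4621 0% 18314 0% 0 0%
remove rmdir rename link readdir readdirplus
602276 0% 1631 0% 326031 0% 169218 0% 115075 0% 698649 0%
fsstat fsinfo pathconf commit
54385 0% 72137 0% 2 0% 10272411 0%
"""


def parse_counts(text):
    """Map each operation name to its call count."""
    rows = text.strip().splitlines()
    counts = {}
    for names_row, values_row in zip(rows[0::2], rows[1::2]):
        names = names_row.split()
        # Drop the "NN%" tokens and keep only the counts.
        values = [int(tok) for tok in values_row.split() if not tok.endswith("%")]
        counts.update(zip(names, values))
    return counts


def percent_shares(counts):
    """Integer percentage of the total per operation, rounded down."""
    total = sum(counts.values())
    return {name: 100 * n // total for name, n in counts.items()}


if __name__ == "__main__":
    counts = parse_counts(V3_ROWS)
    shares = percent_shares(counts)
    for name in sorted(counts, key=counts.get, reverse=True)[:4]:
        print(f"{name:10s} {counts[name]:>12d} {shares[name]:3d}%")
```

With these counts, read comes out at 87% and write at 4%, matching the nfsstat columns. The same numbers give roughly seven WRITE calls for every COMMIT (71166229 against 10272411).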

Any suggestions?


Thanks

Ian
--
Ian Thurlbeck http://www.stams.strath.ac.uk/
Statistics and Modelling Science, University of Strathclyde
Livingstone Tower, 26 Richmond Street, Glasgow, UK, G1 1XH
Tel: +44 (0)141 548 3667 Fax: +44 (0)141 552 2079



-------------------------------------------------------
SF.Net email is sponsored by Shop4tech.com-Lowest price on Blank Media
100pk Sonic DVD-R 4x for only $29 -100pk Sonic DVD+R for only $33
Save 50% off Retail on Ink & Toner - Free Shipping and Free Gift.
http://www.shop4tech.com/z/Inkjet_Cartridges/9_108_r285
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2004-08-11 11:58:39

by Olaf Kirch

Subject: Re: Strange delays on NFS server

On Wed, Aug 11, 2004 at 11:55:17AM +0100, Ian Thurlbeck wrote:
> System is running Fedora Core1 (2188 kernel), 3ware 7500 raid, ext3.
> I have 32 nfsd processes, running NFS V3/tcp, 8192 block size,
> single duplex 100Mbit etherpro. Clients have similar OS/mount opts.

Anyone more familiar with the Fedora kernel - does it happen
to have io barriers enabled by default?

Olaf
--
Olaf Kirch | The Hardware Gods hate me.
[email protected] |
---------------+



2004-08-11 12:58:34

by Steve Dickson

Subject: Re: Strange delays on NFS server



Ian Thurlbeck wrote:

> I am getting strange delays on our NFS server. Everything will
> be fine for maybe 5 to 10 minutes, then saving a small text file
> from a client machine will suddenly take 15 seconds or more.
> Similarly, mozilla will pause accessing a page for ages ~30secs
> (I assume it's poking about in ~/.mozilla). Everything pops back to
> normal afterwards. NFS server is lightly used at the moment.

Boy... it sure sounds to me like some type of daemon or something is
starting up that is bringing the machine to its knees... Would it be
possible to have a top -i running on the server when this pause occurs?

SteveD.



2004-08-11 16:08:53

by Ian Thurlbeck

Subject: Re: Strange delays on NFS server

Steve Dickson wrote:
>
>
> Ian Thurlbeck wrote:
>
>> I am getting strange delays on our NFS server. Everything will
>> be fine for maybe 5 to 10 minutes, then saving a small text file
>> from a client machine will suddenly take 15 seconds or more.
>> Similarly, mozilla will pause accessing a page for ages ~30secs
>> (I assume it's poking about in ~/.mozilla). Everything pops back to
>> normal afterwards. NFS server is lightly used at the moment.
>
>
> Boy... it sure sounds to me like some type of daemon or something is
> starting up that is bringing the machine to its knees... Would it be
> possible to have a top -i running on the server when this pause occurs?
>
> SteveD.
>

OK, I've been running "top -d 1 -i" and trying to see what comes up when
the server freezes. I caught one instance where a delay coincided
with about 15 nfsd + 1 kjournald process appearing in the top
display. I'm simultaneously looking at a graphical network tool to try
and see the traffic going to the server - anyone got a better suggestion?

Tomorrow I'll run top in batch mode, logging to a file, and get some
users to note the exact time of any delays (using the ntp-synced clocks)
and see if I can see any correlation.
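
One way to do that correlation is sketched below in Python, assuming the user reports and the top snapshots are both reduced to HH:MM:SS strings from the same day. The function names, the 30-second window and the sample times in the comments are illustrative assumptions, not measurements from the server.

```python
from datetime import datetime, timedelta


def parse_clock(text):
    """Parse an HH:MM:SS wall-clock string (the date part is ignored)."""
    return datetime.strptime(text, "%H:%M:%S")


def match_reports(reported, snapshots, window_s=30):
    """Map each reported delay time to the snapshots within +/- window_s seconds.

    Both arguments are lists of HH:MM:SS strings. Times that straddle
    midnight are not handled in this sketch.
    """
    window = timedelta(seconds=window_s)
    hits = {}
    for rep in reported:
        t = parse_clock(rep)
        hits[rep] = [s for s in snapshots if abs(parse_clock(s) - t) <= window]
    return hits


if __name__ == "__main__":
    # Made-up sample times, only to show the call.
    snapshots = ["10:00:00", "10:00:20", "10:05:00"]
    print(match_reports(["10:00:10"], snapshots))
```

A report that matches no snapshot at all is as interesting as one that does, since it suggests the clocks or the logging interval need checking.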

I'll report back with any findings.

Many thanks

Ian
--
Ian Thurlbeck http://www.stams.strath.ac.uk/
Statistics and Modelling Science, University of Strathclyde
Livingstone Tower, 26 Richmond Street, Glasgow, UK, G1 1XH
Tel: +44 (0)141 548 3667 Fax: +44 (0)141 552 2079




2004-08-11 16:41:45

by Olaf Kirch

Subject: Re: Strange delays on NFS server

On Wed, Aug 11, 2004 at 05:08:45PM +0100, Ian Thurlbeck wrote:
> OK, I've been running "top -d 1 -i" and trying to see what comes up when
> the server freezes. I caught one instance where a delay coincided
> with about 15 nfsd + 1 kjournald process appearing in the top
> display. I'm simultaneously looking at a graphical network tool to try
> and see the traffic going to the server - anyone got a better suggestion?

This sounds exactly like the COMMIT stall problem for which I submitted
the early-writeout patch to this list about a week ago.

I've been thinking about this a little more. It may be that one reason
the problem is more pronounced in 2.6 than in 2.4 is the new io
barrier code. In 2.6 ext3 uses barriers by default; Suse's 2.6 has reiserfs
patches that add barriers (and enables them by default). We've reports of
this problem on both file systems.

JFS does i/o barriers while XFS does not; and this also fits the pattern
of what Ian reports. I dimly remember there's a kernel command line
option to turn off barriers at the block io level. Can you try if
that helps, Ian?

The more I think about this, the more I believe the early-writeout patch
is the right way to address this problem (short of turning off barriers).
When data hits the NFS server, it is supposed to go to disk rather
soonishly. This also covers most of the rewrite case, at least as long
as you have just one application writing to the file - all rewriting
happens in the client cache.

The crucial question is, what is a good heuristic to choose when to
initiate a write-out. Sequential writes to the end of file are easy
enough to detect.

I have a somewhat updated version of my patch that covers just this
case, and exports a sysctl to let you tune how often it initiates
an early write-out.
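
To make the idea concrete, here is a toy Python model of such a heuristic. It is not the patch mentioned above, whose code does not appear in this thread. It assumes that "sequential" means each write starts where the previous write to the same file ended, and the class name, the `flush_threshold` parameter and its default are invented for the illustration.

```python
class SequentialWriteTracker:
    """Toy model: ask for an early write-out after a run of sequential writes."""

    def __init__(self, flush_threshold=256 * 1024):
        # flush_threshold stands in for a tunable; name and default are made up.
        self.flush_threshold = flush_threshold
        self.next_offset = {}  # file id -> offset where a sequential write would start
        self.unflushed = {}    # file id -> bytes in the current sequential run

    def on_write(self, file_id, offset, length):
        """Record one write; return True when an early write-out should start."""
        sequential = self.next_offset.get(file_id) == offset
        self.next_offset[file_id] = offset + length
        if not sequential:
            # First write to this file, or a non-contiguous one: start a new run.
            self.unflushed[file_id] = length
            return False
        self.unflushed[file_id] = self.unflushed.get(file_id, 0) + length
        if self.unflushed[file_id] >= self.flush_threshold:
            self.unflushed[file_id] = 0
            return True
        return False


if __name__ == "__main__":
    tracker = SequentialWriteTracker(flush_threshold=16384)
    for i in range(4):
        # 8192-byte writes, matching the block size mentioned in this thread.
        print(i, tracker.on_write("fileA", i * 8192, 8192))
```

A real implementation would live in the kernel and would also have to compare the offset against the file size to tell appends from rewrites in the middle of a file; the model ignores that.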

Olaf
--
Olaf Kirch | The Hardware Gods hate me.
[email protected] |
---------------+



2004-08-11 16:53:24

by Phy Prabab

Subject: Re: Strange delays on NFS server

Is it possible to get the patch to test it out? I am
having some issues that are similar and would like to
see if your patch helps.

Thanks!
Phy

--- Olaf Kirch <[email protected]> wrote:

> On Wed, Aug 11, 2004 at 05:08:45PM +0100, Ian Thurlbeck wrote:
> > OK, I've been running "top -d 1 -i" and trying to see what comes up when
> > the server freezes. I caught one instance where a delay coincided
> > with about 15 nfsd + 1 kjournald process appearing in the top
> > display. I'm simultaneously looking at a graphical network tool to try
> > and see the traffic going to the server - anyone got a better suggestion?
>
> This sounds exactly like the COMMIT stall problem for which I submitted
> the early-writeout patch to this list about a week ago.
>
> I've been thinking about this a little more. It may be that one reason
> the problem is more pronounced in 2.6 than in 2.4 is the new io
> barrier code. In 2.6 ext3 uses barriers by default; Suse's 2.6 has reiserfs
> patches that add barriers (and enables them by default). We've reports of
> this problem on both file systems.
>
> JFS does i/o barriers while XFS does not; and this also fits the pattern
> of what Ian reports. I dimly remember there's a kernel command line
> option to turn off barriers at the block io level. Can you try if
> that helps, Ian?
>
> The more I think about this, the more I believe the early-writeout patch
> is the right way to address this problem (short of turning off barriers).
> When data hits the NFS server, it is supposed to go to disk rather
> soonishly. This also covers most of the rewrite case, at least as long
> as you have just one application writing to the file - all rewriting
> happens in the client cache.
>
> The crucial question is, what is a good heuristic to choose when to
> initiate a write-out. Sequential writes to the end of file are easy
> enough to detect.
>
> I have a somewhat updated version of my patch that covers just this
> case, and exports a sysctl to let you tune how often it initiates
> an early write-out.
>
> Olaf
> --
> Olaf Kirch | The Hardware Gods hate me.
> [email protected] |
> ---------------+



2004-08-11 16:58:17

by Christoph Hellwig

Subject: Re: Strange delays on NFS server

On Wed, Aug 11, 2004 at 06:41:35PM +0200, Olaf Kirch wrote:
> I've been thinking about this a little more. It may be that one reason
> the problem is more pronounced in 2.6 than in 2.4 is the new io
> barrier code.

2.6 mainline doesn't have barrier support.




2004-08-11 19:07:22

by Steve Dickson

Subject: Re: Strange delays on NFS server



Ian Thurlbeck wrote:

> I'm simultaneously looking at a graphical network tool to try
> and see the traffic going to the server - anyone got a better suggestion?

Grab the latest xosview... it shows disk, CPU and NFS activity graphically...

SteveD.



2004-08-11 19:43:03

by Norman Weathers

Subject: Re: Strange delays on NFS server


This is probably what I am experiencing as well. Using XFS on our NFS
server (2.6.7 kernel, Fedora Core 2), we get exceptional numbers. Using JFS,
it stalls to the point of being unusable. Does anyone know the kernel
command line option that Olaf is mentioning?

On Wednesday 11 August 2004 11:41, Olaf Kirch wrote:
> On Wed, Aug 11, 2004 at 05:08:45PM +0100, Ian Thurlbeck wrote:
> > OK, I've been running "top -d 1 -i" and trying to see what comes up when
> > the server freezes. I caught one instance where a delay coincided
> > with about 15 nfsd + 1 kjournald process appearing in the top
> > display. I'm simultaneously looking at a graphical network tool to try
> > and see the traffic going to the server - anyone got a better suggestion?
>
> This sounds exactly like the COMMIT stall problem for which I submitted
> the early-writeout patch to this list about a week ago.
>
> I've been thinking about this a little more. It may be that one reason
> the problem is more pronounced in 2.6 than in 2.4 is the new io
> barrier code. In 2.6 ext3 uses barriers by default; Suse's 2.6 has reiserfs
> patches that add barriers (and enables them by default). We've reports of
> this problem on both file systems.
>
> JFS does i/o barriers while XFS does not; and this also fits the pattern
> of what Ian reports. I dimly remember there's a kernel command line
> option to turn off barriers at the block io level. Can you try if
> that helps, Ian?
>
> The more I think about this, the more I believe the early-writeout patch
> is the right way to address this problem (short of turning off barriers).
> When data hits the NFS server, it is supposed to go to disk rather
> soonishly. This also covers most of the rewrite case, at least as long
> as you have just one application writing to the file - all rewriting
> happens in the client cache.
>
> The crucial question is, what is a good heuristic to choose when to
> initiate a write-out. Sequential writes to the end of file are easy
> enough to detect.
>
> I have a somewhat updated version of my patch that covers just this
> case, and exports a sysctl to let you tune how often it initiates
> an early write-out.
>
> Olaf

--

Norman Weathers
SIP Linux Cluster
TCE UNIX
ConocoPhillips
Houston, TX

Office: LO2003
Phone: ETN 639-2727
or (281) 293-2727



2004-08-12 08:04:32

by Ian Thurlbeck

Subject: Re: Strange delays on NFS server

Olaf Kirch wrote:
> On Wed, Aug 11, 2004 at 05:08:45PM +0100, Ian Thurlbeck wrote:
>
>>OK, I've been running "top -d 1 -i" and trying to see what comes up when
>>the server freezes. I caught one instance where a delay coincided
>>with about 15 nfsd + 1 kjournald process appearing in the top
>>display. I'm simultaneously looking at a graphical network tool to try
>>and see the traffic going to the server - anyone got a better suggestion?
>
>
> This sounds exactly like the COMMIT stall problem for which I submitted
> the early-writeout patch to this list about a week ago.
>
> I've been thinking about this a little more. It may be that one reason
> the problem is more pronounced in 2.6 than in 2.4 is the new io
> barrier code. In 2.6 ext3 uses barriers by default; Suse's 2.6 has reiserfs
> patches that add barriers (and enables them by default). We've reports of
> this problem on both file systems.

Olaf

I'm confused - you're talking about 2.6 but I'm running 2.4.22 (FC1).
Does your analysis apply to both 2.4 and 2.6 kernels?

Subjectively, the problem seems to be getting worse the longer the
server is running (uptime 80 days currently). This could be a
red herring, of course.

Ian

PS Red herring = misleading or incorrect clue or idea!

> JFS does i/o barriers while XFS does not; and this also fits the pattern
> of what Ian reports. I dimly remember there's a kernel command line
> option to turn off barriers at the block io level. Can you try if
> that helps, Ian?
>
> The more I think about this, the more I believe the early-writeout patch
> is the right way to address this problem (short of turning off barriers).
> When data hits the NFS server, it is supposed to go to disk rather
> soonishly. This also covers most of the rewrite case, at least as long
> as you have just one application writing to the file - all rewriting
> happens in the client cache.
>
> The crucial question is, what is a good heuristic to choose when to
> initiate a write-out. Sequential writes to the end of file are easy
> enough to detect.
>
> I have a somewhat updated version of my patch that covers just this
> case, and exports a sysctl to let you tune how often it initiates
> an early write-out.
>
> Olaf


--
Ian Thurlbeck http://www.stams.strath.ac.uk/
Statistics and Modelling Science, University of Strathclyde
Livingstone Tower, 26 Richmond Street, Glasgow, UK, G1 1XH
Tel: +44 (0)141 548 3667 Fax: +44 (0)141 552 2079




2004-08-12 15:15:33

by Ian Thurlbeck

Subject: Re: Strange delays on NFS server

Olaf Kirch wrote:
> On Wed, Aug 11, 2004 at 05:08:45PM +0100, Ian Thurlbeck wrote:
>
>>OK, I've been running "top -d 1 -i" and trying to see what comes up when
>>the server freezes. I caught one instance where a delay coincided
>>with about 15 nfsd + 1 kjournald process appearing in the top
>>display. I'm simultaneously looking at a graphical network tool to try
>>and see the traffic going to the server - anyone got a better suggestion?
>
>
> This sounds exactly like the COMMIT stall problem for which I submitted
> the early-writeout patch to this list about a week ago.
>
> I've been thinking about this a little more. It may be that one reason
> the problem is more pronounced in 2.6 than in 2.4 is the new io
> barrier code. In 2.6 ext3 uses barriers by default; Suse's 2.6 has reiserfs
> patches that add barriers (and enables them by default). We've reports of
> this problem on both file systems.
>
> JFS does i/o barriers while XFS does not; and this also fits the pattern
> of what Ian reports. I dimly remember there's a kernel command line
> option to turn off barriers at the block io level. Can you try if
> that helps, Ian?
>
> The more I think about this, the more I believe the early-writeout patch
> is the right way to address this problem (short of turning off barriers).
> When data hits the NFS server, it is supposed to go to disk rather
> soonishly. This also covers most of the rewrite case, at least as long
> as you have just one application writing to the file - all rewriting
> happens in the client cache.
>
> The crucial question is, what is a good heuristic to choose when to
> initiate a write-out. Sequential writes to the end of file are easy
> enough to detect.
>
> I have a somewhat updated version of my patch that covers just this
> case, and exports a sysctl to let you tune how often it initiates
> an early write-out.
>
> Olaf

Dear All

I ran top in batch mode this afternoon, turned off imapd/smb services
for good measure, and noted the times of any delays. I then checked
the top log file and they all coincide with something like this
turning up in the process list:

14:28:30 up 80 days, 5:25, 2 users, load average: 0.42, 0.15, 0.12
118 processes: 116 sleeping, 1 running, 1 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total    0.0%    0.0%    0.9%   0.0%     0.0%    0.0%   99.0%
Mem:   514664k av,  380296k used,  134368k free,       0k shrd,   37464k buff
        66096k active,             231056k inactive
Swap: 1052216k av,   14712k used, 1037504k free                  239788k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
 6365 root      16   0  1244 1244   880 R     0.9  0.2   0:15   0 top
  156 root      15   0     0    0     0 DW    0.0  0.0  25:45   0 kjournald


The kjournald process is quickly joined by a bunch of nfsd's:


14:28:34 up 80 days, 5:25, 2 users, load average: 0.63, 0.20, 0.14
118 processes: 115 sleeping, 2 running, 1 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total   38.0%    0.0%   31.4%   0.0%     0.0%    0.0%   30.4%
Mem:   514664k av,  509240k used,    5424k free,       0k shrd,   29328k buff
        64028k active,             361036k inactive
Swap: 1052216k av,   14712k used, 1037504k free                  375832k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
 6365 root      16   0  1244 1244   880 R     0.9  0.2   0:15   0 top
  156 root      15   0     0    0     0 DW    0.0  0.0  25:45   0 kjournald
 2021 root      15   0     0    0     0 DW    0.0  0.0  17:38   0 nfsd
 2037 root      15   0     0    0     0 DW    0.0  0.0  17:51   0 nfsd
 2042 root      15   0     0    0     0 DW    0.0  0.0  17:07   0 nfsd
 2044 root      15   0     0    0     0 DW    0.0  0.0  16:44   0 nfsd

And then the bdflush process joins in and the number of nfsd processes
grows, then shrinks back, and finally they all disappear.
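
For anyone wanting to scan a log like this automatically, here is a rough Python sketch that counts nfsd threads in uninterruptible sleep (a STAT value starting with "D") in each snapshot. It assumes the batch-mode layout shown above: a "HH:MM:SS up ..." line opens each snapshot, STAT is the eighth column and COMMAND the last. The function name is hypothetical, and other top versions order their columns differently.

```python
import re

# A snapshot starts with a line such as "14:28:34 up 80 days, ...".
SNAPSHOT = re.compile(r"^\s*(\d{2}:\d{2}:\d{2})\s+up\s")


def blocked_nfsd_per_snapshot(log_text):
    """Return [(timestamp, count)] of nfsd threads in D state per snapshot."""
    results = []
    for line in log_text.splitlines():
        m = SNAPSHOT.match(line)
        if m:
            results.append([m.group(1), 0])
            continue
        fields = line.split()
        # Process rows: PID USER PRI NI SIZE RSS SHARE STAT ... COMMAND
        if results and len(fields) >= 9 and fields[0].isdigit():
            if fields[-1] == "nfsd" and fields[7].startswith("D"):
                results[-1][1] += 1
    return [(ts, n) for ts, n in results]


# Shortened copy of the two snapshots quoted above.
SAMPLE = """\
14:28:30 up 80 days, 5:25, 2 users, load average: 0.42, 0.15, 0.12
118 processes: 116 sleeping, 1 running, 1 zombie, 0 stopped
PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
6365 root 16 0 1244 1244 880 R 0.9 0.2 0:15 0 top
156 root 15 0 0 0 0 DW 0.0 0.0 25:45 0 kjournald
14:28:34 up 80 days, 5:25, 2 users, load average: 0.63, 0.20, 0.14
118 processes: 115 sleeping, 2 running, 1 zombie, 0 stopped
PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
6365 root 16 0 1244 1244 880 R 0.9 0.2 0:15 0 top
156 root 15 0 0 0 0 DW 0.0 0.0 25:45 0 kjournald
2021 root 15 0 0 0 0 DW 0.0 0.0 17:38 0 nfsd
2037 root 15 0 0 0 0 DW 0.0 0.0 17:51 0 nfsd
2042 root 15 0 0 0 0 DW 0.0 0.0 17:07 0 nfsd
2044 root 15 0 0 0 0 DW 0.0 0.0 16:44 0 nfsd
"""

if __name__ == "__main__":
    for ts, n in blocked_nfsd_per_snapshot(SAMPLE):
        print(ts, n)
```

Run over the full top.log, the timestamps where the count jumps could then be compared with the delay times users wrote down.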

Full log is at:

http://www.stams.strath.ac.uk/~ian/nfs/

File top.log (2.8MB)

Events are around 14:29:10, 14:49:00, 14:47:50, and a belter at 15:11:05

Is there anything more useful I can do?

Many thanks

Ian
--
Ian Thurlbeck http://www.stams.strath.ac.uk/
Statistics and Modelling Science, University of Strathclyde
Livingstone Tower, 26 Richmond Street, Glasgow, UK, G1 1XH
Tel: +44 (0)141 548 3667 Fax: +44 (0)141 552 2079




2004-08-13 14:53:32

by Steve Dickson

Subject: Re: Strange delays on NFS server



Ian Thurlbeck wrote:

>   PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
>  6365 root      16   0  1244 1244   880 R     0.9  0.2   0:15   0 top
>   156 root      15   0     0    0     0 DW    0.0  0.0  25:45   0 kjournald
>  2021 root      15   0     0    0     0 DW    0.0  0.0  17:38   0 nfsd
>  2037 root      15   0     0    0     0 DW    0.0  0.0  17:51   0 nfsd
>  2042 root      15   0     0    0     0 DW    0.0  0.0  17:07   0 nfsd
>  2044 root      15   0     0    0     0 DW    0.0  0.0  16:44   0 nfsd
>
> And then the bdflush process joins in and the number of nfsd processes
> grows, then shrinks back, and finally they all disappear.
>
Hmm... this is kinda what I expected... nfsd is waiting on the local
filesystem.... It would be nice if nfsd knew how to speak aio... but
back to reality..... What happens if you double/triple (assuming you
have enough memory) the number of nfsd threads that are started? Does
it help or hurt? My guess is it probably will not make a difference,
but you never know....

SteveD.



2004-08-16 12:40:50

by Ian Thurlbeck

Subject: Re: Strange delays on NFS server

Steve Dickson wrote:
>
>
> Ian Thurlbeck wrote:
>
>>   PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
>>  6365 root      16   0  1244 1244   880 R     0.9  0.2   0:15   0 top
>>   156 root      15   0     0    0     0 DW    0.0  0.0  25:45   0 kjournald
>>  2021 root      15   0     0    0     0 DW    0.0  0.0  17:38   0 nfsd
>>  2037 root      15   0     0    0     0 DW    0.0  0.0  17:51   0 nfsd
>>  2042 root      15   0     0    0     0 DW    0.0  0.0  17:07   0 nfsd
>>  2044 root      15   0     0    0     0 DW    0.0  0.0  16:44   0 nfsd
>>
>> And then the bdflush process joins in and the number of nfsd processes
>> grows, then shrinks back, and finally they all disappear.
>>
> Hmm... this is kinda what I expected... nfsd is waiting on the local
> filesystem.... It would be nice if nfsd knew how to speak aio... but
> back to reality..... What happens if you double/triple (assuming you
> have enough memory) the number of nfsd threads that are started? Does
> it help or hurt? My guess is it probably will not make a difference

I bumped the nfsd's up to 64 (from 32) and subjectively the problem gets
worse. I then reduced them to 16 and things are a bit better...
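
A less subjective check on thread count is the "th" line in /proc/net/rpc/nfsd. The Python sketch below parses one such line, assuming the layout described in the Linux NFS-HOWTO for 2.4-era kernels: the thread count, then the number of times all threads were busy at once, then a usage histogram in seconds. The function name is made up, the sample line is invented, and later kernels may not fill in these fields.

```python
def parse_th_line(line):
    """Split a 'th' line into (threads, all_busy_count, histogram)."""
    fields = line.split()
    if len(fields) < 3 or fields[0] != "th":
        raise ValueError("not a 'th' line: %r" % line)
    threads = int(fields[1])
    all_busy = int(fields[2])
    # Remaining fields: seconds spent at each band of thread usage.
    histogram = [float(x) for x in fields[3:]]
    return threads, all_busy, histogram


def read_th_line(path="/proc/net/rpc/nfsd"):
    """Find and parse the 'th' line on a live NFS server."""
    with open(path) as f:
        for line in f:
            if line.startswith("th "):
                return parse_th_line(line)
    raise ValueError("no 'th' line found in %s" % path)


if __name__ == "__main__":
    # Invented sample line, only to show the shape of the result.
    print(parse_th_line("th 16 3 12.5 4.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0"))
```

Sampling this before and after a stall, at each thread count, would show whether the threads are actually all in use or are simply queued up behind the disk.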

Would changing some of the bdflush settings help at all?

Thanks

Ian
--
Ian Thurlbeck http://www.stams.strath.ac.uk/
Statistics and Modelling Science, University of Strathclyde
Livingstone Tower, 26 Richmond Street, Glasgow, UK, G1 1XH
Tel: +44 (0)141 548 3667 Fax: +44 (0)141 552 2079


