2004-08-24 09:48:29

by Ian Thurlbeck

[permalink] [raw]
Subject: Re: Strange delays on NFS server


Dear Olaf, NFS list

Olaf sent me a patch to test:

--- fs/nfsd/vfs.c.orig 2003-06-13 16:51:37.000000000 +0200
+++ fs/nfsd/vfs.c 2004-08-16 16:52:53.000000000 +0200
@@ -760,6 +760,21 @@
}
last_ino = inode->i_ino;
last_dev = inode->i_dev;
+ } else if (err >= 0 && !stable) {
+ /* If we've been writing several pages, schedule them
+ * for the disk immediately. The client may be streaming
+ * and we don't want to hang on a huge journal sync when the
+ * commit comes in
+ */
+ struct address_space *mapping;
+
+ /* This assumes a minimum page size of 1K, and will issue
+ * a filemap_flushfast call every 64 pages written by the
+ * client. */
+ if ((cnt & 4095) == 0
+ && ((offset >> 12) & 63) == 0
+ && (mapping = inode->i_mapping) != NULL)
+ filemap_fdatasync(mapping);
}

dprintk("nfsd: write complete err=%d\n", err);



I have tested the patched kernel - it is no different. In fact
I'd say it's slightly worse. I've been running "atop" to monitor
net/disk io and during an "event" I see that the main 3ware raid
array is writing at 100% (60MB/sec+) for up to 50 seconds. There
is little network traffic. During this period client machines are
unusable if they are requesting stuff off the server.

What on earth could be writing up to 3GB of data to the drive ?
As before the usual suspects of kjournald, kupdated, bdflush and
nfsd turn up on the process list.

Just how big is the nfs write cache on a typical server ? Is it
writing the whole lot one go ?

Thanks for your efforts

Ian
--
Ian Thurlbeck http://www.stams.strath.ac.uk/
Statistics and Modelling Science, University of Strathclyde
Livingstone Tower, 26 Richmond Street, Glasgow, UK, G1 1XH
Tel: +44 (0)141 548 3667 Fax: +44 (0)141 552 2079



-------------------------------------------------------
SF.Net email is sponsored by Shop4tech.com-Lowest price on Blank Media
100pk Sonic DVD-R 4x for only $29 -100pk Sonic DVD+R for only $33
Save 50% off Retail on Ink & Toner - Free Shipping and Free Gift.
http://www.shop4tech.com/z/Inkjet_Cartridges/9_108_r285
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2004-08-24 10:27:25

by Jan Bruvoll

[permalink] [raw]
Subject: Re: Strange delays on NFS server

Dear all,

Ian Thurlbeck wrote:

> I have tested the patched kernel - it is no different. In fact
> I'd say it's slightly worse. I've been running "atop" to monitor
> net/disk io and during an "event" I see that the main 3ware raid
> array is writing at 100% (60MB/sec+) for up to 50 seconds. There
> is little network traffic. During this period client machines are
> unusable if they are requesting stuff off the server.


I haven't had the opportunity to test things this thoroughly yet, but I
see the same symptoms on our server here - when the "hick-up" comes, the
server, if under heavy fire, will be completely unresponsive for as long
as it is trying to get out of the mess. After a little while, things
will clear up again (I guess that could have to do with clients just
giving up on things), just to clog up about a minute later.

And as Ian mentions, the clients will be stuck for as long as this goes on.

Thanks very much for your continued efforts - if there is anything I
could do to help debug this issue or otherwise contribute to the fixing
of this problem, I'd happily oblige!

Best regards
Jan



-------------------------------------------------------
SF.Net email is sponsored by Shop4tech.com-Lowest price on Blank Media
100pk Sonic DVD-R 4x for only $29 -100pk Sonic DVD+R for only $33
Save 50% off Retail on Ink & Toner - Free Shipping and Free Gift.
http://www.shop4tech.com/z/Inkjet_Cartridges/9_108_r285
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2004-08-25 02:02:55

by Greg Banks

[permalink] [raw]
Subject: Re: Strange delays on NFS server

On Tue, Aug 24, 2004 at 10:48:14AM +0100, Ian Thurlbeck wrote:
>
> What on earth could be writing up to 3GB of data to the drive ?
> As before the usual suspects of kjournald, kupdated, bdflush and
> nfsd turn up on the process list.
>
> Just how big is the nfs write cache on a typical server ?

It can be most of RAM.

> Is it
> writing the whole lot one go ?

It might be. What ratio of COMMIT to WRITE calls are you
seeing in nfsstat? Does the export have "sync" or "async"
in its options?

Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.


-------------------------------------------------------
SF.Net email is sponsored by Shop4tech.com-Lowest price on Blank Media
100pk Sonic DVD-R 4x for only $29 -100pk Sonic DVD+R for only $33
Save 50% off Retail on Ink & Toner - Free Shipping and Free Gift.
http://www.shop4tech.com/z/Inkjet_Cartridges/9_108_r285
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2004-08-25 08:40:47

by Ian Thurlbeck

[permalink] [raw]
Subject: Re: Strange delays on NFS server

Greg Banks wrote:
> On Tue, Aug 24, 2004 at 10:48:14AM +0100, Ian Thurlbeck wrote:
>
>>What on earth could be writing up to 3GB of data to the drive ?
>>As before the usual suspects of kjournald, kupdated, bdflush and
>>nfsd turn up on the process list.
>>
>>Just how big is the nfs write cache on a typical server ?
>
>
> It can be most of RAM.
>
>
>>Is it
>>writing the whole lot one go ?
>
>
> It might be. What ratio of COMMIT to WRITE calls are you
> seeing in nfsstat? Does the export have "sync" or "async"
> in its options?
>
> Greg.

Greg

Server stats since reboot Monday morning. It is lightly
loaded mostly:

Server nfs v3:
null getattr setattr lookup access readlink
85912 2% 1626277 46% 12375 0% 135833 3% 1295417 36% 773 0%
read write create mkdir symlink mknod
136099 3% 151569 4% 7899 0% 64 0% 194 0% 0 0%
remove rmdir rename link readdir readdirplus
6270 0% 37 0% 2931 0% 969 0% 266 0% 4893 0%
fsstat fsinfo pathconf commit
1508 0% 890 0% 0 0% 41205 1%

So that's about 4 writes for every commit.

File system export:

/export/raid50 @unix(rw,sync,no_wdelay,hide,nocrossmnt,secure,root_squash,
no_all_squash,subtree_check,secure_locks,mapping=identity,
anonuid=-2,anongid=-2)

I've got Neil's script running to correlate disk writes with
nfs ops and these slowdown event times. I'll post the results
in a day or so.

Thanks

Ian
--
Ian Thurlbeck http://www.stams.strath.ac.uk/
Statistics and Modelling Science, University of Strathclyde
Livingstone Tower, 26 Richmond Street, Glasgow, UK, G1 1XH
Tel: +44 (0)141 548 3667 Fax: +44 (0)141 552 2079



-------------------------------------------------------
SF.Net email is sponsored by Shop4tech.com-Lowest price on Blank Media
100pk Sonic DVD-R 4x for only $29 -100pk Sonic DVD+R for only $33
Save 50% off Retail on Ink & Toner - Free Shipping and Free Gift.
http://www.shop4tech.com/z/Inkjet_Cartridges/9_108_r285
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2004-08-25 10:02:35

by Greg Banks

[permalink] [raw]
Subject: Re: Strange delays on NFS server

On Wed, Aug 25, 2004 at 09:40:36AM +0100, Ian Thurlbeck wrote:
> Server stats since reboot Monday morning. It is lightly
> loaded mostly:
>
> Server nfs v3:
> null getattr setattr lookup access readlink
> 85912 2% 1626277 46% 12375 0% 135833 3% 1295417 36% 773 0%
> read write create mkdir symlink mknod
> 136099 3% 151569 4% 7899 0% 64 0% 194 0% 0 0%
> remove rmdir rename link readdir readdirplus
> 6270 0% 37 0% 2931 0% 969 0% 266 0% 4893 0%
> fsstat fsinfo pathconf commit
> 1508 0% 890 0% 0 0% 41205 1%

Yes, it is lightly loaded. You said your block size is 8K, so
the server has seen about 1.18 GB of writes in 3 days, which is
just over twice RAM size. How many of these slowdown events
did you see in that time?

Can you run "vmstat 1" and log the result please?

You said you have a 3ware RAID controller. What is RAID
configuration, especially RAID level and stripe width? What options
did you use to build the ext3 filesystem?

> So that's about 4 writes for every commit.

Ok, so your clients are doing enough COMMITs.

> File system export:
>
> /export/raid50 @unix(rw,sync,no_wdelay,hide,nocrossmnt,secure,root_squash,
> no_all_squash,subtree_check,secure_locks,mapping=identity,
> anonuid=-2,anongid=-2)

Hmm, no "async". So much for that idea.

Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.


-------------------------------------------------------
SF.Net email is sponsored by Shop4tech.com-Lowest price on Blank Media
100pk Sonic DVD-R 4x for only $29 -100pk Sonic DVD+R for only $33
Save 50% off Retail on Ink & Toner - Free Shipping and Free Gift.
http://www.shop4tech.com/z/Inkjet_Cartridges/9_108_r285
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2004-08-25 10:38:17

by Ian Thurlbeck

[permalink] [raw]
Subject: Re: Strange delays on NFS server

Greg Banks wrote:
> On Wed, Aug 25, 2004 at 09:40:36AM +0100, Ian Thurlbeck wrote:
>
>>Server stats since reboot Monday morning. It is lightly
>>loaded mostly:
>>
>>Server nfs v3:
>>null getattr setattr lookup access readlink
>>85912 2% 1626277 46% 12375 0% 135833 3% 1295417 36% 773 0%
>>read write create mkdir symlink mknod
>>136099 3% 151569 4% 7899 0% 64 0% 194 0% 0 0%
>>remove rmdir rename link readdir readdirplus
>>6270 0% 37 0% 2931 0% 969 0% 266 0% 4893 0%
>>fsstat fsinfo pathconf commit
>>1508 0% 890 0% 0 0% 41205 1%
>
>
> Yes, it is lightly loaded. You said your block size is 8K, so
> the server has seen about 1.18 GB of writes in 3 days, which is
> just over twice RAM size. How many of these slowdown events
> did you see in that time?

Well, at least 10 per day, probably more.

> Can you run "vmstat 1" and log the result please?

Not sure what you mean. Quick snapshot follows, but I'll leave it
running and watch what happens during an "event".

procs memory swap io system
cpu
r b swpd free buff cache si so bi bo in cs us
sy wa id
0 0 12248 43816 25148 70164 1 0 141 203 234 102 2
2 0 96
0 0 12248 43804 25148 70168 0 0 0 0 114 82 0
2 0 98
0 0 12248 43772 25180 70168 0 0 0 212 154 125 0
1 0 99
0 0 12248 43772 25180 70168 0 0 0 0 111 82 0
1 0 99


> You said you have a 3ware RAID controller. What is RAID
> configuration, especially RAID level and stripe width? What options
> did you use to build the ext3 filesystem?

Capacity: 247.04 GB (482505216 blocks)
Write Cache: In Use
Configuration: Striped with Parity 64K (RAID 5)
Stripe size: 64K (65536 bytes)

Ext3 options: anyone know what the default is for Fedora Core 1?

Here is output from tune2fs -l:

tune2fs 1.34 (25-Jul-2003)
Filesystem volume name: /spare
Last mounted on: <not available>
Filesystem UUID: 5070c833-cdfe-408b-ae12-ee4587e7ffc4
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal filetype needs_recovery
sparse_super large_file
Default mount options: (none)
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 30162944
Block count: 60312018
Reserved block count: 3015600
Free blocks: 34369240
Free inodes: 29320077
First block: 0
Block size: 4096
Fragment size: 4096
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 16384
Inode blocks per group: 512
Filesystem created: Mon May 19 03:28:12 2003
Last mount time: Mon Aug 23 08:45:09 2004
Last write time: Mon Aug 23 08:45:09 2004
Mount count: 31
Maximum mount count: -1
Last checked: Mon May 19 03:28:12 2003
Check interval: 0 (<none>)
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 128
Journal inode: 8
Default directory hash: tea
Directory Hash Seed: e37b261f-2740-4f6c-a970-d4aade2b54be

Ian

>
>>So that's about 4 writes for every commit.
>
>
> Ok, so your clients are doing enough COMMITs.
>
>
>>File system export:
>>
>>/export/raid50 @unix(rw,sync,no_wdelay,hide,nocrossmnt,secure,root_squash,
>>no_all_squash,subtree_check,secure_locks,mapping=identity,
>>anonuid=-2,anongid=-2)
>
>
> Hmm, no "async". So much for that idea.
>
> Greg.


--
Ian Thurlbeck http://www.stams.strath.ac.uk/
Statistics and Modelling Science, University of Strathclyde
Livingstone Tower, 26 Richmond Street, Glasgow, UK, G1 1XH
Tel: +44 (0)141 548 3667 Fax: +44 (0)141 552 2079



-------------------------------------------------------
SF.Net email is sponsored by Shop4tech.com-Lowest price on Blank Media
100pk Sonic DVD-R 4x for only $29 -100pk Sonic DVD+R for only $33
Save 50% off Retail on Ink & Toner - Free Shipping and Free Gift.
http://www.shop4tech.com/z/Inkjet_Cartridges/9_108_r285
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs