2005-11-15 18:51:50

by Joshua Baker-LePain

Subject: Odd errors and bad performance -- NFS/MD/CentOS 4

I recently upgraded 2 of my older, 2TB file servers from RH7.3(!!) to
CentOS 4. The servers each have 2 3ware 7500-8 boards and 16 drives. In
7.3, I ran the 3wares in hardware RAID5 mode (with a hot spare), and a
software RAID0 across the 2 arrays. I used XFS, and saw local write
speeds of 150MB/s and reads of 300MB/s.

Given RH's absurd attitude toward XFS, I decided it was time to bite the
bullet and transition to ext3. So I took a snapshot backup, reinstalled,
and tried to run in the same setup. Performance with ext3 was a joke.
Despite days of tweaking (all detailed on nahant-list), ext3 topped out at
about 34 MB/s writing.

So I replicated my setup in software RAID -- 2 RAID5s and a RAID0 of those
(I stayed away from 1 big RAID5 so as not to lose any redundancy compared
to the original setup, and RAID6 in CentOS 4 seems to have a bad resync
bug). Local performance was just fine -- 120MB/s writes and 180MB/s reads
(as measured by bonnie++ -- tiobench gave good numbers as well). Yay, say
I, now I can finally start restoring the data.

However, the NFS performance of these beasts is bad with the added fun of
odd quirks. Directory listings of even small directories can hang for
long (10+ min) periods of time on one client while instantaneously
returning on another. At random times, users report that they get "No
such file or directory" when trying to 'ls' dirs they know are there, or
get "Stale NFS file handle" when 'cd'ing into said directories. These
tend to be accompanied by the following in the client logs:

RPC: error 512 connecting to server $SERVER
nfs_statfs: statfs error = 512

Clients are a mix of both CentOS 3 and CentOS 4, on both 100Mbps and 1Gbps.
The FSs are exported with (rw,sync,no_root_squash) and mounted with
wsize=32768,rsize=32768,hard,intr,tcp. The servers are Gbps with the
following TCP related sysctl.conf options:

net.core.wmem_max = 8388608
net.core.rmem_max = 8388608
net.ipv4.tcp_rmem = 4096 16777216 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
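For anyone replicating this tuning: the settings above can be applied at runtime with sysctl before (or after) committing them to /etc/sysctl.conf. A minimal sketch, using exactly the values as posted (tcp_rmem and tcp_wmem take min/default/max in bytes):

```shell
# Apply the TCP buffer settings at runtime (as root).  The persistent
# copies live in /etc/sysctl.conf and are reloaded with `sysctl -p`.
sysctl -w net.core.wmem_max=8388608
sysctl -w net.core.rmem_max=8388608
sysctl -w net.ipv4.tcp_rmem="4096 16777216 16777216"
sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"

# Verify what the kernel actually accepted.
sysctl net.ipv4.tcp_rmem net.ipv4.tcp_wmem
```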

In terms of performance, a remote bonnie++ run via a gigabit connected
client gives these numbers:

Version 1.03       ------Sequential Output------ --Sequential Input- --Random-
                   -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine       Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
$HOST           4G           11392   2  8153   4           50265  13 250.2   0
                   ------Sequential Create------ --------Random Create--------
                   -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16    71   0  7031  10    76   0    70   0  8070   9    75   0
harry.egr.duke.edu,4G,,,11392,2,8153,4,,,50265,13,250.2,0,16,71,0,7031,10,76,0,70,0,8070,9,75,0

The write speed and the creation and deletion speeds seem awfully slow to
me. In addition, the load on the server goes *very* high despite little
actual CPU usage (see <http://www.duke.edu/~jlb17/md.html> for ganglia
generated graphs during the bonnie run). I've tried pinning the IRQs for
the network interfaces and 3wares to separate CPUs, but that had little to
no effect on performance.

While I can somewhat live with the performance (although I'd rather not
have to), the errors are frustrating as hell (especially as they're
difficult to reproduce at will). Is there a differential diagnosis for
this galaxy of symptoms (sorry -- my wife's a doc)? Any help would be
*much* appreciated.

--
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University


-------------------------------------------------------
This SF.Net email is sponsored by the JBoss Inc. Get Certified Today
Register for a JBoss Training Course. Free Certification Exam
for All Training Attendees Through End of 2005. For more info visit:
http://ads.osdn.com/?ad_id=7628&alloc_id=16845&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2005-11-15 23:24:11

by NeilBrown

Subject: Re: Odd errors and bad performance -- NFS/MD/CentOS 4

On Tuesday November 15, [email protected] wrote:
> I recently upgraded 2 of my older, 2TB file servers from RH7.3(!!) to
> centos-4. The servers each have 2 3ware 7500-8 boards and 16 drives. In
> 7.3, I ran the 3wares in hardware RAID5 mode (with a hot spare), and a
> software RAID0 across the 2 arrays. I used XFS, and saw local write
> speeds of 150MB/s and reads of 300MB/s.
>
> Given RH's absurd attitude toward XFS, I decided it was time to bite the
> bullet and transition to ext3. So I took a snapshot backup, reinstalled,
> and tried to run in the same setup. Performance with ext3 was a joke.
> Despite days of tweaking (all detailed on nahant-list), ext3 topped out at
> about 34 MB/s writing.

Sounds like the 3WARE driver is busted in recent kernels...

>
> So I replicated my setup in software RAID -- 2 RAID5s and a RAID0 of those
> (I stayed away from 1 big RAID5 so as not to lose any redundancy compared
> to the original setup, and RAID6 in centos 4 seems to have a bad resync
> bug). Local performance was just fine -- 120MB/s writes and 180MB/s reads
> (as measured by bonnie++ -- tiobench gave good numbers as well). Yay, say
> I, now I can finally start restoring the data.
>
> However, the NFS performance of these beasts is bad with the added fun of
> odd quirks. Directory listings of even small directories can hang for
> long (10+ min) periods of times on one client while instantaneously
> returning on another. At random times, users report that they get "No
> such file or directory" when trying to 'ls' dirs they know are there, or
> get "Stale NFS file handle" when 'cd'ing into said directories. These
> tend to be accompanied by the following in the client logs:

What kernel version, and in particular, what is the timestamp in
'uname -a'. There was a bug that was fixed around April (?) which
affected NFS service of EXT3 filesystems with hash-directories
enabled. This particularly hit Red Hat, as they turn on
hash-directories, and so probably affects CentOS too. You can try
using tune2fs to turn off hash-directories and see what happens.
That could explain the 'No such file or directory' and 'Stale' errors,
but I don't think it explains the slow directory listing.
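A sketch of Neil's tune2fs suggestion, for reference. The device name and mount point here are illustrative only; the filesystem should be unmounted first:

```shell
# Turn off the dir_index feature (hashed directories) on an ext3
# volume.  /dev/md2 and /export are assumed names -- substitute
# the real device and mount point.
umount /export
tune2fs -O ^dir_index /dev/md2

# Optionally rebuild existing directories so the old hash trees
# are cleared out, then remount.
e2fsck -fD /dev/md2
mount /export
```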

>
> In terms of performance, a remote bonnie++ run via a gigabit connected
> client gives these numbers:
>
> Version 1.03       ------Sequential Output------ --Sequential Input- --Random-
>                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine       Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> $HOST           4G           11392   2  8153   4           50265  13 250.2   0
>                    ------Sequential Create------ --------Random Create--------
>                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>                  16    71   0  7031  10    76   0    70   0  8070   9    75   0
> harry.egr.duke.edu,4G,,,11392,2,8153,4,,,50265,13,250.2,0,16,71,0,7031,10,76,0,70,0,8070,9,75,0
>
> The write speed and the creation and deletion speeds seem awfully slow to
> me. In addition, the load on the server goes *very* high despite little
> actual CPU usage (see <http://www.duke.edu/~jlb17/md.html> for ganglia
> generated graphs during the bonnie run). I've tried pinning the IRQs for
> the network interfaces and 3wares to separate CPUs, but that had little to
> no effect on performance.

Writing to NFS usually is slow as the server has to commit everything
to disk before returning. You can try 'data=journal' as a mount
option. It sometimes makes NFS writes faster, but it might make local
writes slower.
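One caveat worth noting when trying this: ext3 refuses to change the data journalling mode on a plain remount, so data=journal has to be set on a fresh mount. A sketch, with an assumed device and mount point:

```shell
# Mount with full data journalling.  /dev/md2 and /export are
# assumed names.  data= cannot be changed via `mount -o remount`,
# so unmount first.
umount /export
mount -t ext3 -o data=journal /dev/md2 /export

# Or persist it in /etc/fstab:
# /dev/md2  /export  ext3  defaults,data=journal  1 2
```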

NeilBrown



2005-11-16 13:46:01

by Joshua Baker-LePain

Subject: Re: Odd errors and bad performance -- NFS/MD/CentOS 4

On Wed, 16 Nov 2005 at 10:24am, Neil Brown wrote

> On Tuesday November 15, [email protected] wrote:
>>
>> Given RH's absurd attitude toward XFS, I decided it was time to bite the
>> bullet and transition to ext3. So I took a snapshot backup, reinstalled,
>> and tried to run in the same setup. Performance with ext3 was a joke.
>> Despite days of tweaking (all detailed on nahant-list), ext3 topped out at
>> about 34 MB/s writing.
>
> Sounds like the 3WARE driver is busted in recent kernels...

Not really -- I didn't tell you the whole story. I also tested ext2 and
XFS (via the centosplus kernel):

        write (MB/s)   read (MB/s)
ext2          81           180
ext3          34           222
XFS          109           213

Given that, I tried ext3 with an external journal, but even that didn't
help.

> What kernel version, and in particular, what is the timestamp in
> 'uname -a'. There was a bug that was fixed around April (?) which
> affected NFS service of EXT3 filesystems with hash-directories
> enabled. This particularly hit redhat as they turn on
> hash-directories, and so probably affects CentOS too. You can try
> using tune2fs to turn off hash-directories and see what happens.
> That could explain the 'No such file or directory' and 'Stale' errors,
> but I don't think it explains the slow directory listing.

uname -a says "2.6.9-11.ELsmp #1 SMP Wed Jun 8 17:54:20 CDT 2005". Also,
Seth pointed out to me on another list that this may well be fixed in the
most recent centos kernel, due to
<https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=158293>. I'm going
to give that kernel a shot.

> Writing to NFS usually is slow as the server has to commit everything
> to disk before returning. You can try 'data=journal' as a mount
> option. It sometimes makes NFS writes faster, but it might make local
> writes slower.

These systems are pure file servers, so local performance is of no
concern. I'll give data=journal a shot. Thanks!

--
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University



2005-11-16 16:58:23

by Joshua Baker-LePain

Subject: Re: Odd errors and bad performance -- NFS/MD/CentOS 4

On Wed, 16 Nov 2005 at 8:45am, Joshua Baker-LePain wrote

> On Wed, 16 Nov 2005 at 10:24am, Neil Brown wrote

>> Writing to NFS usually is slow as the server has to commit everything
>> to disk before returning. You can try 'data=journal' as a mount
>> option. It sometimes makes NFS writes faster, but it might make local
>> writes slower.
>
> These systems are pure file servers, so local performance is of no concern.
> I'll give data=journal a shot. Thanks!

data=journal was actually worse than the default data=ordered, while
data=writeback was slightly better (but not enough, in my mind, to justify
using it). I guess I just expected to see the CPUs maxed out if disk were
the bottleneck on a software RAID system.

Thanks again.

--
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University



2005-11-16 19:55:52

by Joshua Baker-LePain

Subject: Re: Odd errors and bad performance -- NFS/MD/CentOS 4

On Wed, 16 Nov 2005 at 8:45am, Joshua Baker-LePain wrote

> On Wed, 16 Nov 2005 at 10:24am, Neil Brown wrote

>> What kernel version, and in particular, what is the timestamp in
>> 'uname -a'. There was a bug that was fixed around April (?) which
>> affected NFS service of EXT3 filesystems with hash-directories
>> enabled. This particularly hit redhat as they turn on
>> hash-directories, and so probably affects CentOS too. You can try
>> using tune2fs to turn off hash-directories and see what happens.
>> That could explain the 'No such file or directory' and 'Stale' errors,
>> but I don't think it explains the slow directory listing.
>
> uname -a says "2.6.9-11.ELsmp #1 SMP Wed Jun 8 17:54:20 CDT 2005". Also,
> Seth pointed out to me on another list that this may well be fixed in the
> most recent centos kernel, due to
> <https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=158293>. I'm going to
> give that kernel a shot.

It didn't help. 'uname -a' now says "2.6.9-22.0.1.ELsmp #1 SMP Thu Oct 27
13:14:25 CDT 2005", but a client just hung doing 'df' and got this in the
logs:

RPC: error 512 connecting to server $SERVER
nfs_statfs: statfs error = 512

I've now disabled hash-directories. Any other hints on places to look?

--
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University

