2005-09-30 06:51:32

by subbie subbie

Subject: 3Ware 9500S-12 RAID controller -- poor performance

Dear list,

After almost two weeks of experimentation, Google
searches, and reading of posts, bug reports, and
discussions, I'm still far from an answer. I'm hoping
someone on this list can shed some light on the
subject.

I'm using a 3Ware 9500S-12 card and am able to produce
up to 400MB/s sustained read from my 12-disk 4.1TB
RAID5 SATA array (128MB cache onboard, ext3 formatted).
All is well when performing a single read -- it
works nicely and fast.

The system is a web server, serving mid-size files
(50MB each, on average). All hell breaks loose when
doing many concurrent reads: with anywhere between 200
and 400 concurrent streams, things simply grind to a
halt and the system transfers a maximum of 12-14MB/s.

I'm in the process of clearing up the array (this
will take some time) and restructuring it to JBOD
mode in order to use each disk individually. I will
use a filesystem more suitable for streaming large
files, such as XFS. But this will take time, and I
would very much appreciate the advice of people in the
know on whether this is going to help at all. It's
hard for me to do extreme experimentation (deleting,
formatting, reformatting) as this is a production
system with many files that I have no other place to
dump until they can be safely removed, though I'm
working on dumping them slowly to other, remote
machines.

I'm running the latest kernel, 2.6.13.2, and the latest
3Ware driver, taken from the 3ware.com web site, which
upon insmod also updates the card's firmware to the
latest version.

In my experiments, I've tried using a larger readahead,
currently at 16k (this helps; higher values do not
seem to help much), using the deadline scheduler for
this device, booting the system with the 'noapic'
option, and playing with a bunch of VM tunable
parameters that I'm not sure I should really be
touching. At the moment only the readahead
modification is in use, as the other changes simply
didn't help at all.
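
Concretely, the tuning was along these lines (the
device name sda is illustrative -- adjust to the actual
3ware unit; "16k" is read here as 16384 sectors):

# readahead, in 512-byte sectors ("16k" assumed to mean 16384)
blockdev --setra 16384 /dev/sda
# switch this device to the deadline I/O scheduler
echo deadline > /sys/block/sda/queue/scheduler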

With the stock kernel shipped with my distribution,
2.6.8, and its old 3ware driver, things were just as
bad but manifested themselves differently. The
system was visibly (top, vmstat...) spending most of
its time in io-wait, and the load average was extremely
high, in the area of 10 to 20. With the recent
kernel and driver mentioned above, the excessive
io-wait and load seem to have been resolved, and the
observed loadavg is between 1 and 4.

I don't have much experience with systems that are
supposed to stream many files concurrently off a
hardware RAID of this configuration, but my gut
feeling is that something is very wrong and I should
be seeing a much higher read throughput.

To preempt people's questions, I've tried to include
as much information as possible; a lot of it is pasted
below.

I've just seen that the 3ware driver shares the same
IRQ with my ethernet card, which has got me a little
worried -- should I be?

System uptime, exactly 1 day:

# cat /proc/interrupts
           CPU0       CPU1
  0:   21619638          0   XT-PIC  timer
  2:          0          0   XT-PIC  cascade
  8:          4          0   XT-PIC  rtc
 10:  268753224          0   XT-PIC  3w-9xxx, eth0
 14:         11          0   XT-PIC  ide0
 15:     337881          0   XT-PIC  libata
NMI:          0          0
LOC:   21110402   21557685
ERR:          0
MIS:          0

# free
             total       used       free     shared    buffers     cached
Mem:       2075260    2024724      50536          0       5184    1388408
-/+ buffers/cache:     631132    1444128
Swap:      3903784          0    3903784

# vmstat -n 1 (output of the last few seconds):
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff   cache   si   so    bi    bo    in    cs us sy id wa
 0  0      0  49932   4760 1392980    0    0 15636    32  3169  3697  4  6 30 60
 0  0      0  50816   4752 1392376    0    0  5844     0  3114  3929  3  5 91  1
 0  0      0  50924   4772 1391404    0    0  9360     0  3187  4348  6  6 76 13
 0  2      0  50552   4780 1391532    0    0 24976    44  4077  3906  3  7 65 25
 0  1      0  50444   4780 1392688    0    0 20192     0  5048  3914  7  8 56 30
 0  1      0  50568   4756 1392508    0    0 21248     0  4060  3603  4  6 48 41
 0  0      0  50704   4724 1392268    0    0 30004     0  3834  3369  4  9 65 22
 0  3      0  50556   4728 1392468    0    0  3248  1832  2974  4514  2  5 58 35
 0  3      0  50308   4724 1392200    0    0  1288   336  1766  1886  1  3 50 47
 0  4      0  50308   4732 1391852    0    0  2300   408  1919  2158  0  3 51 46
 0  4      0  50556   4736 1390692    0    0  1856   532  1488  1846  3  1 50 46
 0  3      0  50680   4740 1390620    0    0  4016  1296  1577  1682  2  2 50 47
 0  3      0  50432   4752 1391628    0    0  2180    72  1730  1945  2  2 51 46
 2  2      0  49924   4772 1391540    0    0 44372   564  3403  2847  4  5 50 42
 0  0      0  50684   4784 1391528    0    0 28640   216  3804  3847  7  8 69 16

# cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Xeon(TM) CPU 3.00GHz
stepping : 1
cpu MHz : 2993.035
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm pni monitor ds_cpl cid cx16 xtpr
bogomips : 5993.68

processor : 1
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Xeon(TM) CPU 3.00GHz
stepping : 1
cpu MHz : 2993.035
cache size : 1024 KB
physical id : 3
siblings : 2
core id : 3
cpu cores : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm pni monitor ds_cpl cid cx16 xtpr
bogomips : 5985.52

# lspci
0000:00:00.0 Host bridge: Intel Corporation E7520 Memory Controller Hub (rev 0c)
0000:00:00.1 ff00: Intel Corporation E7525/E7520 Error Reporting Registers (rev 0c)
0000:00:01.0 System peripheral: Intel Corporation E7520 DMA Controller (rev 0c)
0000:00:02.0 PCI bridge: Intel Corporation E7525/E7520/E7320 PCI Express Port A (rev 0c)
0000:00:04.0 PCI bridge: Intel Corporation E7525/E7520 PCI Express Port B (rev 0c)
0000:00:05.0 PCI bridge: Intel Corporation E7520 PCI Express Port B1 (rev 0c)
0000:00:06.0 PCI bridge: Intel Corporation E7520 PCI Express Port C (rev 0c)
0000:00:1d.0 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #1 (rev 02)
0000:00:1d.1 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #2 (rev 02)
0000:00:1d.2 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #3 (rev 02)
0000:00:1d.7 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB2 EHCI Controller (rev 02)
0000:00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev c2)
0000:00:1f.0 ISA bridge: Intel Corporation 82801EB/ER (ICH5/ICH5R) LPC Interface Bridge (rev 02)
0000:00:1f.1 IDE interface: Intel Corporation 82801EB/ER (ICH5/ICH5R) IDE Controller (rev 02)
0000:00:1f.2 IDE interface: Intel Corporation 82801EB (ICH5) SATA Controller (rev 02)
0000:00:1f.3 SMBus: Intel Corporation 82801EB/ER (ICH5/ICH5R) SMBus Controller (rev 02)
0000:01:00.0 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge A (rev 09)
0000:01:00.1 PIC: Intel Corporation 6700/6702PXH I/OxAPIC Interrupt Controller A (rev 09)
0000:01:00.2 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge B (rev 09)
0000:01:00.3 PIC: Intel Corporation 6700PXH I/OxAPIC Interrupt Controller B (rev 09)
0000:03:01.0 RAID bus controller: 3ware Inc 9xxx-series SATA-RAID
0000:05:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8050 Gigabit Ethernet Controller (rev 18)
0000:07:04.0 Ethernet controller: Intel Corporation 82541GI/PI Gigabit Ethernet Controller (rev 05)
0000:07:0c.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)

As you can see, this is a fairly recent motherboard
that's supposed to perform well. I don't know the
manufacturer of the board, as the machine is hosted
and I don't have physical access, but I can ask the
hosting company if anyone would like to know.

The ethernet card actually being used is the Intel
E1000, with NAPI support compiled in.

If there's any bit of information that's missing,
please let me know and I'd be happy to provide it
quickly.

If you can suggest a better (non-NetApp, non-EMC, etc.)
solution that is somehow affordable and can provide
very high read throughput, please let me know. I'm
very interested in solutions that can saturate
multiple gigabit links (of course, using more than one
machine ;)

Please CC me on any replies as I'm not subscribed to
the list.

Thank you for listening!





2005-10-05 20:07:46

by Miquel van Smoorenburg

Subject: Re: 3Ware 9500S-12 RAID controller -- poor performance

In article <[email protected]>,
subbie subbie <[email protected]> wrote:
>I'm using a 3Ware 9500S-12 card and am able to produce
>up to 400MB/s sustained read from my 12-disk 4.1TB
>RAID5 SATA array, 128MB cache onboard, ext3 formatted.
> All is well when performing a single read -- it
>works nice and fast.
>
>The system is a web server, serving mid-size files
>(50MB, each, on average). All hell breaks loose when
>doing many concurrent reads, anywhere between 200 to
>400 concurrent streams things simply grind to a halt
>and the system transfers a maximum of 12-14MB/s.

There are a couple of things you should do:

1. Use the CFQ I/O scheduler, and increase nr_requests:
echo cfq > /sys/block/hda/queue/scheduler
echo 1024 > /sys/block/hda/queue/nr_requests

2. Make sure that your filesystem knows about the stripe size
and number of disks in the array. E.g. for a raid5 array
with a stripe size of 64K and 6 disks (effectively 5,
because in every stripe-set there is one disk doing parity):

# ext3 fs, 5 disks, 64K stripe, units in 4K blocks
mkfs -text3 -E stride=$((64/4))

# xfs, 5 disks, 64K stripe, units in 512 bytes
mkfs -txfs -d sunit=$((64*2)) -d swidth=$((5*64*2))

3. Don't use partitions. Partitions do not start on a multiple of
(stripe_size * nr_disks), so your I/O will be misaligned
and the settings in (2) will have no effect, or an adverse one.
If you must use partitions, either build them manually
with sfdisk so that partitions do start on that multiple,
or use LVM.

4. Reconsider your stripe size for streaming large files.
If you have, say, 4 disks and a 64K stripe size, then a read
of a 256K block will keep all 4 disks busy. Many simultaneous
threads reading 256K blocks will result in thrashing disks, as
they all want to read from all 4 disks .. so in that case,
using a stripe size of 256K will make things better. One read
of 256K (in the ideal, aligned case) will just keep one disk
busy, and 4 reads can happen in parallel without thrashing.
Especially in this case, you need the alignment I talked about
in (3). (A combined worked sketch for points 2-4 follows after
point 5.)

5. Defragment the files.
If the files are written sequentially, they will not be fragmented.
But if they were stored by writing to thousands of them, appending
a few KB at a time in round-robin fashion, you need to defragment;
in the case of XFS, run xfs_fsr every so often.
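
To make points 2-4 concrete, here is a rough sketch for
rebuilding the 12-disk array as RAID5 with a 256K stripe
(11 data disks per stripe-set). The stripe size and device
name are assumptions, so adjust them to the real setup:

# assumed: 12-disk RAID5, 256K stripe => 11 data disks, device /dev/sda
STRIPE_KB=256
DATA_DISKS=11

# ext3: stride is the stripe size expressed in 4K filesystem blocks
mkfs -t ext3 -E stride=$((STRIPE_KB/4)) /dev/sda

# xfs: sunit and swidth are expressed in 512-byte sectors
mkfs -t xfs -d sunit=$((STRIPE_KB*2)) -d swidth=$((DATA_DISKS*STRIPE_KB*2)) /dev/sda

Whether 256K is the right stripe size depends on the block
size the web server actually issues; the idea is that one
typical read should map onto one disk.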

Good luck,

Mike.

2005-10-07 21:35:59

by Jon Burgess

Subject: Re: 3Ware 9500S-12 RAID controller -- poor performance

You might be interested in trying a small tool I wrote to perform some
parallel write tests on different Linux filesystems.

http://marc.theaimsgroup.com/?l=linux-kernel&m=107661735307313&w=2

At the time I wrote the tool, 18 months ago, both ext3 and
reiserfsV3 performed fairly badly at handling concurrent writes, and
only JFS and XFS excelled. Since then I believe the ext3 performance
has been greatly improved due to the block reservation scheme added in
2.6.10. AFAIK the reiserfs performance is only addressed in reiserfsV4.

The test code is fairly trivial and could easily be adapted to simulate
other workloads (like a web server) to help optimise your filesystem
and driver performance.

tiobench provides another threaded IO test http://tiobench.sourceforge.net/

Jon

2005-10-10 10:42:23

by subbie subbie

Subject: Re: 3Ware 9500S-12 RAID controller -- poor performance

OK,

I have now dumped RAID5 and am running all 12 disks
separately, each with its own XFS filesystem.

I did a very crude test of reading a single 1GB file
from each of my disks in parallel by putting 12 dd
processes into the background. Each file was read at
approximately 35MB/s, giving an aggregate of a little
over 400MB/s. According to 3Ware support, 400MB/s is
the "theoretical maximum" of this controller. I'm
very happy with these results.
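
The test was essentially along the lines of the sketch
below (the mount points and file name are illustrative,
not the exact paths used):

# spawn one sequential 1GB read per disk, then wait for all of them
for i in $(seq 1 12); do
    dd if=/disk$i/testfile of=/dev/null bs=1M count=1024 &
done
wait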

I want to run a killer test where 400 files are being
read in parallel to see what the combined throughput
would be. Can anyone recommend a benchmark utility
that would help me do so? I tried using bonnie/iozone
but they (to my limited understanding) won't do this.

Can anyone point me in the right direction?

Thank you





2005-10-10 10:54:27

by Dr. David Alan Gilbert

Subject: Re: 3Ware 9500S-12 RAID controller -- poor performance

* subbie subbie ([email protected]) wrote:
> OK,
>
> I now dumped RAID5 and am running all of my 12 disks
> separately each partitioned with XFS.
>
>
> I did a very crude test of reading a single 1GB file
> from each of my disks in parallel by putting 12 dd
> processes into the background. Each file was read at
> approximately 35MB/s giving an aggragate of a little
> over 400MB/s. According to 3Ware support, 400MB/s is
> the "theoretical maximum" of this controller. I'm
> very happy with these results.

Nice. Have you tried Software RAID5 on top of that?
I would be very interested to know how software RAID5
goes relative to the 3Ware hardware.
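
If you do try it, a minimal sketch would be something like
the following (device names are assumed, and it will of
course wipe whatever is on those disks):

# build md RAID5 over the 12 exported disks (names /dev/sda../dev/sdl assumed)
mdadm --create /dev/md0 --level=5 --raid-devices=12 --chunk=256 /dev/sd[a-l]
mkfs -t xfs /dev/md0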


Dave
--
-----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert | Running GNU/Linux on Alpha,68K| Happy \
\ gro.gilbert @ treblig.org | MIPS,x86,ARM,SPARC,PPC & HPPA | In Hex /
\ _________________________|_____ http://www.treblig.org |_______/

2005-10-10 11:07:58

by Mikael Abrahamsson

Subject: Re: 3Ware 9500S-12 RAID controller -- poor performance

On Mon, 10 Oct 2005, Dr. David Alan Gilbert wrote:

> Nice. Have you tried Software RAID5 on top of that? I would be very
> interested to know how software RAID5 goes relative to the 3Ware
> hardware.

There have been hundreds of emails regarding this on the
[email protected] list. Please look in the archives.

It's well known that 3ware HW RAID is slow when writing; the current
theory is that this is due to a lack of buffering, meaning that any
write makes it read a lot as well, destroying performance. Generally,
the performance numbers advertised by 3ware for writing are from a dd
to the drive itself (I got this number after making a support request
about it a few years back), without a filesystem. That goes very
quickly, but writing files on a filesystem is usually very slow
(10 megabytes/s or so). When doing SW RAID, the SW layer has access to
the memory block cache and can thus avoid a lot of physical reads on
the drives.

I never had any problems getting good read speeds on the HW raid.

My experience is with the 7500 series; the 9500 series has cache as
well, but this doesn't seem to have solved many of the performance
problems seen with the 7500 series.

--
Mikael Abrahamsson email: [email protected]

2005-10-10 11:22:35

by subbie subbie

Subject: Re: 3Ware 9500S-12 RAID controller -- poor performance

No. I'm running each disk with its own XFS partition.

Unfortunately I'm not in a position to experiment
much on this system.

Which tool can I use to stress all 12 disks by reading
many, many files in parallel?

Thanks

> Nice. Have you tried Software RAID5 on top of that?
> I would be very interested to know how software
> RAID5
> goes relative to the 3Ware hardware.
>




2005-10-10 12:36:38

by Mikael Abrahamsson

Subject: Re: 3Ware 9500S-12 RAID controller -- poor performance

On Mon, 10 Oct 2005, subbie subbie wrote:

> Which tool can I use to stress all 12 disks reading many many files in
> parallel?

Start 400 dd's? You will find that you destroy performance, as the
drive heads will be thrashing a lot.
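
Something along these lines would do it (the directory
layout here is assumed, not known):

# hypothetical sketch: 400 concurrent readers spread round-robin over 12 disks
for i in $(seq 1 400); do
    disk=$(( (i % 12) + 1 ))
    dd if=/disk$disk/file$i of=/dev/null bs=1M &
done
wait
# watch the aggregate throughput with: vmstat 1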

--
Mikael Abrahamsson email: [email protected]