Dear list,
After almost two weeks of experimentation, Google
searches, and reading posts, bug reports and
discussions, I'm still far from an answer. I'm hoping
someone on this list can shed some light on the
subject.
I'm using a 3Ware 9500S-12 card and am able to produce
up to 400MB/s sustained read from my 12-disk 4.1TB
RAID5 SATA array, 128MB cache onboard, ext3 formatted.
All is well when performing a single read -- it
works nice and fast.
The system is a web server serving mid-size files
(50MB each, on average). All hell breaks loose when
doing many concurrent reads: anywhere between 200 and
400 concurrent streams, things simply grind to a halt
and the system transfers a maximum of 12-14MB/s.
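For anyone wanting to reproduce the failure mode, it can be sketched with nothing more than parallel dd readers. The file size and reader count below are scaled-down placeholders so the sketch is safe to run anywhere; on the real box each stream would hit a distinct ~50MB file on the array:

```shell
# Sketch: N concurrent sequential readers against one scratch file.
# Substitute real files on the RAID volume (and more readers) to
# measure actual aggregate throughput.
FILE=$(mktemp)
dd if=/dev/zero of="$FILE" bs=1M count=4 2>/dev/null

# Launch 8 readers in parallel, then wait for all of them.
for i in 1 2 3 4 5 6 7 8; do
    dd if="$FILE" of=/dev/null bs=64k 2>/dev/null &
done
wait

rm -f "$FILE"
```

Comparing the wall-clock time of one reader against the parallel run (on files large enough to defeat the page cache) shows the per-stream collapse.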
I'm in the process of clearing up the array (this
would take some time) and restructuring it to JBOD
mode in order to use each disk individually. I will
use a filesystem more suitable to streaming large
files, such as XFS. But this would take time and I
would very much appreciate the advice of people in the
know if this is going to help at all. It's hard for
me to do extreme experimentation (deleting,
formatting, reformatting) as this is a production
system with many files that I have no other place to
dump until they can be safely removed, though I'm
working on moving them slowly to other, remote
machines.
I'm running the latest kernel, 2.6.13.2, and the
latest 3Ware driver, taken from the 3ware.com web
site, which, upon insmod, updates the card's firmware
to the latest version as well.
In my experiments, I've tried using larger readahead,
currently at 16k (this helps, higher values do not
seem to help much), using the deadline scheduler for
this device, booting the system with the 'noapic'
option and playing with a bunch of VM tunable
parameters which I'm not sure that I should really be
touching. At the moment only the readahead
modification is used as the other stuff simply didn't
help at all.
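For reference, the tunings described above amount to something like the following sketch. The device name /dev/sda is an assumption (the 3ware array typically appears as the first SCSI disk):

```shell
# Read-ahead: 16384 x 512-byte sectors = 8MB per stream,
# matching the "16k" setting mentioned above.
blockdev --setra 16384 /dev/sda

# Select the deadline I/O scheduler for this device (2.6 kernels).
echo deadline > /sys/block/sda/queue/scheduler

# Verify both settings took effect.
blockdev --getra /dev/sda
cat /sys/block/sda/queue/scheduler
```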
With the stock kernel shipped with my distribution,
2.6.8, and its old 3ware driver, things were just as
bad but manifested differently. The
system was visibly (top, vmstat...) spending most of
its time in io-wait and load average was extremely
high, in the area of 10 to 20. With the recent
kernel and driver mentioned above, the excessive
io-wait and load seems to have been resolved and
observed loadavg is between 1 and 4.
I don't have much experience with systems that are
supposed to stream many files concurrently off a
hardware RAID of this configuration, but my gut
feeling is that something is very wrong and I should
be seeing a much higher read throughput.
Trying to preempt people's questions, I've included
as much information as possible; a lot of it is
pasted below.
I've just seen that the 3ware driver shares the same
IRQ with my ethernet card, which has me a little
worried; should I be?
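A quick way to spot such sharing is that devices on one interrupt line appear comma-separated in /proc/interrupts. A minimal sketch (the optional file argument exists only so the helper can be tried against a saved copy):

```shell
# Print interrupt lines that list more than one device.
# Defaults to the live table; pass a saved copy for inspection.
show_shared_irqs() {
    grep ',' "${1:-/proc/interrupts}"
}
```

On the system below this prints the `10: ... 3w-9xxx, eth0` line.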
System uptime, exactly 1 day:
# cat /proc/interrupts
CPU0 CPU1
0: 21619638 0 XT-PIC timer
2: 0 0 XT-PIC cascade
8: 4 0 XT-PIC rtc
10: 268753224          0   XT-PIC  3w-9xxx, eth0
14: 11 0 XT-PIC ide0
15: 337881 0 XT-PIC libata
NMI: 0 0
LOC: 21110402 21557685
ERR: 0
MIS: 0
# free
             total       used       free     shared    buffers     cached
Mem:       2075260    2024724      50536          0       5184    1388408
-/+ buffers/cache:     631132    1444128
Swap:      3903784          0    3903784
# vmstat -n 1 (output of the last few seconds):
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff   cache   si   so    bi    bo    in    cs us sy id wa
 0  0      0  49932   4760 1392980    0    0 15636    32  3169  3697  4  6 30 60
 0  0      0  50816   4752 1392376    0    0  5844     0  3114  3929  3  5 91  1
 0  0      0  50924   4772 1391404    0    0  9360     0  3187  4348  6  6 76 13
 0  2      0  50552   4780 1391532    0    0 24976    44  4077  3906  3  7 65 25
 0  1      0  50444   4780 1392688    0    0 20192     0  5048  3914  7  8 56 30
 0  1      0  50568   4756 1392508    0    0 21248     0  4060  3603  4  6 48 41
 0  0      0  50704   4724 1392268    0    0 30004     0  3834  3369  4  9 65 22
 0  3      0  50556   4728 1392468    0    0  3248  1832  2974  4514  2  5 58 35
 0  3      0  50308   4724 1392200    0    0  1288   336  1766  1886  1  3 50 47
 0  4      0  50308   4732 1391852    0    0  2300   408  1919  2158  0  3 51 46
 0  4      0  50556   4736 1390692    0    0  1856   532  1488  1846  3  1 50 46
 0  3      0  50680   4740 1390620    0    0  4016  1296  1577  1682  2  2 50 47
 0  3      0  50432   4752 1391628    0    0  2180    72  1730  1945  2  2 51 46
 2  2      0  49924   4772 1391540    0    0 44372   564  3403  2847  4  5 50 42
 0  0      0  50684   4784 1391528    0    0 28640   216  3804  3847  7  8 69 16
# cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Xeon(TM) CPU 3.00GHz
stepping : 1
cpu MHz : 2993.035
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm pni monitor ds_cpl cid cx16 xtpr
bogomips : 5993.68
processor : 1
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Xeon(TM) CPU 3.00GHz
stepping : 1
cpu MHz : 2993.035
cache size : 1024 KB
physical id : 3
siblings : 2
core id : 3
cpu cores : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm pni monitor ds_cpl cid cx16 xtpr
bogomips : 5985.52
# lspci
0000:00:00.0 Host bridge: Intel Corporation E7520 Memory Controller Hub (rev 0c)
0000:00:00.1 ff00: Intel Corporation E7525/E7520 Error Reporting Registers (rev 0c)
0000:00:01.0 System peripheral: Intel Corporation E7520 DMA Controller (rev 0c)
0000:00:02.0 PCI bridge: Intel Corporation E7525/E7520/E7320 PCI Express Port A (rev 0c)
0000:00:04.0 PCI bridge: Intel Corporation E7525/E7520 PCI Express Port B (rev 0c)
0000:00:05.0 PCI bridge: Intel Corporation E7520 PCI Express Port B1 (rev 0c)
0000:00:06.0 PCI bridge: Intel Corporation E7520 PCI Express Port C (rev 0c)
0000:00:1d.0 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #1 (rev 02)
0000:00:1d.1 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #2 (rev 02)
0000:00:1d.2 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #3 (rev 02)
0000:00:1d.7 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB2 EHCI Controller (rev 02)
0000:00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev c2)
0000:00:1f.0 ISA bridge: Intel Corporation 82801EB/ER (ICH5/ICH5R) LPC Interface Bridge (rev 02)
0000:00:1f.1 IDE interface: Intel Corporation 82801EB/ER (ICH5/ICH5R) IDE Controller (rev 02)
0000:00:1f.2 IDE interface: Intel Corporation 82801EB (ICH5) SATA Controller (rev 02)
0000:00:1f.3 SMBus: Intel Corporation 82801EB/ER (ICH5/ICH5R) SMBus Controller (rev 02)
0000:01:00.0 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge A (rev 09)
0000:01:00.1 PIC: Intel Corporation 6700/6702PXH I/OxAPIC Interrupt Controller A (rev 09)
0000:01:00.2 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge B (rev 09)
0000:01:00.3 PIC: Intel Corporation 6700PXH I/OxAPIC Interrupt Controller B (rev 09)
0000:03:01.0 RAID bus controller: 3ware Inc 9xxx-series SATA-RAID
0000:05:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8050 Gigabit Ethernet Controller (rev 18)
0000:07:04.0 Ethernet controller: Intel Corporation 82541GI/PI Gigabit Ethernet Controller (rev 05)
0000:07:0c.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
As you can see, this is a fairly recent motherboard
that's supposed to perform well. I don't know the
manufacturer of this board as the machine is hosted
and I don't have physical access, though I can ask
the host if anyone would like to know.
The ethernet card actually being used is the Intel
E1000 with NAPI support compiled.
If there's any bit of information that's missing,
please let me know and I'd be happy to provide it
quickly.
If you can suggest a better (non-NetApp, EMC, etc.)
solution that is somehow affordable and can provide
very high read throughput, please let me know. I'm
very interested in solutions that can saturate
multiple gigabit links (of course, using more than one
machine ;)
Please CC me on any replies as I'm not subscribed to
the list.
Thank you for listening!
__________________________________
Yahoo! Mail - PC Magazine Editors' Choice 2005
http://mail.yahoo.com
Why are you booting with 'noapic'.. in my experience that will seriously
impact interrupt performance. Use the APIC if you've got it, which in this
case you definitely do.
Yes, having your gigabit NIC and RAID controller on the same IRQ (in PIC
mode) could definitely be a source of trouble.
In your web server testing, were you using an external traffic generator or
an on-host process? If you try on-host (eliminating the network throughput
and related interrupts) does performance improve?
So two biggest suggestions:
- Use the APIC. It is your friend.
- It looks like the 3ware card and gigabit nic are on different busses, but
the pirq lines are being routed to the same legacy interrupt in PIC mode. So
APIC mode should avoid that problem. If the controller and nic are actually
on the same bus, separate them.
Regards,
Ian Morgan
--
-------------------------------------------------------------------
Ian E. Morgan Vice President & C.O.O. Webcon, Inc.
imorgan at webcon dot ca PGP: #2DA40D07 http://www.webcon.ca
* Customized Linux network solutions for your business *
-------------------------------------------------------------------
On Thu, 29 Sep 2005, subbie subbie wrote:
> Dear list,
>
> After almost two weeks of experimentation, google
> searches and reading of posts, bug reports and
> discussions I'm still far from an answer. I'm hoping
> someone on this list could shed some light on the
> subject.
=== message truncated ===
On Thu, Sep 29, 2005 at 11:50:58PM -0700, you [subbie subbie] wrote:
> Dear list,
>
> After almost two weeks of experimentation, google
> searches and reading of posts, bug reports and
> discussions I'm still far from an answer. I'm hoping
> someone on this list could shed some light on the
> subject.
Hi,
We're having similar problems with 9500S-4LP and two Hitachi 250GB SATA
disks.
Currently we are running 2.6.12.5 and its 3w-9xxx driver. We have tried
numerous 2.4 (Red Hat and kernel.org) and 2.6 kernels and 3w-9xxx drivers
(kernel and 3ware.com).
The results are more or less the same: on 2.4 it corrupts data and is slow;
on 2.6 it doesn't corrupt data, but is still slow.
Our workload is VMware GSX Server. Multiple readers/writers grind
performance to a halt.
We've tried:
- the readahead tuning described at
  http://216.239.59.104/search?q=cache:08R-x7y-Kn8J:xn.pinkhamster.net/blog/tech/3ware_9500_notes.html+blockdev+stra+3w-9xxx&hl=en
  and
  http://www.linux-mag.com/index2.php?option=com_content&task=view&id=2033&Itemid=2304&pop=1&page=0
  e.g.:
    blockdev --setra 16384 /dev/sda
- different kernel IO schedulers
- parameters such as:
    blockdev --setra 8192 /dev/vg0/root
    echo 300 > /sys/block/sda/queue/iosched/read_expire
    echo 5 > /sys/block/sda/queue/iosched/writes_starved
None of this seems to make much difference.
Please drop me a note if you ever get any light on this issue...
-- v --
[email protected]
Dear Ian,
'noapic' was a recommendation by 3Ware / AMCC tech
support. It did not help at all, as expected.
Unfortunately they did not have any other
recommendations.
I've now removed 'noapic' and unfortunately nothing
has changed, really. See current stats below.
# cat /proc/interrupts
CPU0 CPU1
0: 64713875 355 IO-APIC-edge timer
2: 0 0 XT-PIC cascade
8: 3 1 IO-APIC-edge rtc
14: 7 6 IO-APIC-edge ide0
16: 176847225 191855106 IO-APIC-level eth0
18: 499139 336893 IO-APIC-level libata
48: 31491551 22761438 IO-APIC-level 3w-9xxx
NMI: 0 0
LOC: 64049632 64155206
ERR: 0
MIS: 0
# uptime
08:39:14 up 2 days, 23:54, 8 users, load average:
0.16, 0.13, 0.07
I have also tried playing with the parameters our
friend Ville mentioned in his post; nothing has come
of it.
I'm willing to give any developer here access to my
production machine so that they can see the situation
first hand. Performance is just awful.
I'm planning to ditch RAID5 on this card, try JBOD,
and spread files evenly across my 12 disks; hopefully
this will give some benefit.
Something is very wrong with this card / driver /
firmware and/or kernel combination; hopefully someone
can help out.
Much appreciated
--- Ian Morgan <[email protected]> wrote:
> Why are you booting with 'noapic'.. in my experience that will seriously
> impact interrupt performance. Use the APIC if you've got it, which in this
> case you definitely do.
>
> Yes, having your gigabit NIC and RAID controller on the same IRQ (in PIC
> mode) could definitely be a source of trouble.
>
> In your web server testing, were you using an external traffic generator or
> an on-host process? If you try on-host (eliminating the network throughput
> and related interrupts) does performance improve?
>
> So two biggest suggestions:
>
> - Use the APIC. It is your friend.
>
> - It looks like the 3ware card and gigabit nic are on different busses, but
> the pirq lines are being routed to the same legacy interrupt in PIC mode. So
> APIC mode should avoid that problem. If the controller and nic are actually
> on the same bus, separate them.
>
> Regards,
> Ian Morgan
=== message truncated ===
> Something is very wrong with this card / driver /
> firmware and or kernel combination, hopefully someone
> can help out.
I think I have to agree; see my post from:
http://marc.theaimsgroup.com/?l=linux-kernel&m=112282837926689&w=2
I've got about 30MB/s from a single-threaded version of my
backup code, which seems rather on the low side for
a modern RAID-5; with multiple writers I was seeing sub-5MB/s,
but that might be fair if it is seeking everywhere.
I'd be interested to hear how your experiments with jbod'ing them
go.
Dave
--
-----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert | Running GNU/Linux on Alpha,68K| Happy \
\ gro.gilbert @ treblig.org | MIPS,x86,ARM,SPARC,PPC & HPPA | In Hex /
\ _________________________|_____ http://www.treblig.org |_______/
A single thread writing at 30MB/s is still not on par
with 3ware's specs.
I see that you're also running RAID5 and in this case
3ware did report bad write performance on RAID5 and
that was fixed with recent firmwares.
The latest linux driver off their website also
includes the latest firmware inside it and flashes the
card upon load, make sure to use that.
I'm getting a little over 50MB/s when writing to my
RAID volume when completely idle, there's no reason
why you should get less.
As for read performance, nothing helps with many
concurrent reads; what I get is simply awful
performance no matter what I do.
I'll let you guys know once I try JBOD (as soon as all
the data is moved away).
According to Ville answering me privately:
> Unfortunately, it's not limited to just that
firmware version or kernel version or driver version.
I've tried several firmwares, 2.4.x and 2.6.x kernels
and driver version - no salvage.
I do agree, which leads me to believe this is
something very specific to the RAID controller
itself or its firmware.
Maybe something is so badly designed in this
controller that it can't physically do better than
that? Does anyone have experience with this controller
and its performance on windoze?
Can someone in the know give us some input?
Thanks
--- "Dr. David Alan Gilbert" <[email protected]> wrote:
> > Something is very wrong with this card / driver /
> > firmware and or kernel combination, hopefully someone
> > can help out.
>
> I think I have to agree; see my post from:
>
> http://marc.theaimsgroup.com/?l=linux-kernel&m=112282837926689&w=2
>
> I've got about 30MB/s from a single threaded version of my
> backup code - which seems rather on the low side for
> a modern RAID-5; with multiple writers I was seeing sub-5MB/s
> but that might be fair if it is seeking everywhere.
>
> I'd be interested to hear how your experiments with jbod'ing them
> go.
>
> Dave
* subbie subbie ([email protected]) wrote:
> A single thread writing at 30MB/s is still not on par
> with 3ware's specs.
>
> I see that you're also running RAID5 and in this case
> 3ware did report bad write performance on RAID5 and
> that was fixed with recent firmwares.
>
> The latest linux driver off their website also
> includes the latest firmware inside it and flashes the
> card upon load, make sure to use that.
I've got driver/firmware that is about 2 months old that
certainly helped; prior to that I was getting card
timeouts (although I also upgraded the e1000 driver
at the same time, so it might have been that rather
than the 3ware that helped).
(Note: I don't expect a driver to perform a dangerous
operation like firmware flashing on boot!)
> I'm getting a little over 50MB/s when writing to my
> RAID volume when completely idle, there's no reason
> why you should get less.
Well my ~30MB/s is sucking over gig ether and writing
in 10MB chunks; but still 50MB/s for RAID5 feels like
it sucks.
> I'll let you guys know once I try JBOD (as soon as all
> the data is moved away).
Nod.
Dave
Hi
>
> I've got driver/firmware that is about 2months old
> that
> certainly helped; prior to that I was getting card
> timeouts (although I also upgraded the e1000 driver
> at the same time so it might have been that rather
> than the 3ware that helped).
Two months old might be too old, make sure you have at
least 9.2.1.1
> (Note: I don't expect a driver to perform a dangerous
> operation like firmware flashing on boot!)
I believe they are writing to NVRAM or similar, so that
makes it much less risky; otherwise they wouldn't have
to write it on each load ... or I might be wrong and
they do it once, not sure.
> > I'm getting a little over 50MB/s when writing to my
> > RAID volume when completely idle, there's no reason
> > why you should get less.
>
> Well my ~30MB/s is sucking over gig ether and writing
> in 10MB chunks; but still 50MB/s for RAID5 feels like
> it sucks.
Yes, that's terrible.
Reading the release notes for the latest one, 9.3.0,
which also supports their very latest controller, the
9550SX, I get the feeling that write performance is
something of an ongoing issue with this family of
controllers, even though this latest driver version
disguises it as a "tweak" in write performance.
Another thing: the BBU (battery backup unit) that you
are using was also a can of worms for them, so this
makes it even more important for you to upgrade to the
very latest version; they seem to be having ongoing
issues with that one.
Good luck to us all.
I have seen some systems on which IRQ load balancing can have a detrimental
effect on some devices, such as gigabit Ethernet.
You could try disabling both the irqbalance userspace daemon (if that's part
of your distribution), and in-kernel IRQ balancing, if enabled
(CONFIG_IRQBALANCE).
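To check both of those quickly, something like the following should work (daemon name and config-file location are the usual conventions and may differ on your distribution):

```shell
# Is the userspace irqbalance daemon running?
pidof irqbalance >/dev/null 2>&1 && echo "irqbalance is running" \
    || echo "irqbalance not running"
# Was in-kernel IRQ balancing compiled in? /boot/config-$(uname -r) is
# the conventional location for the running kernel's configuration.
grep CONFIG_IRQBALANCE "/boot/config-$(uname -r)" 2>/dev/null \
    || echo "no CONFIG_IRQBALANCE entry found"
```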
For your NIC, try enabling NAPI interrupt mitigation, if available. This
will significantly reduce the interrupt load under high traffic volume.
I guess there's another obvious question that I forgot: Do you have the
3ware cache enabled or disabled? Are your ext3 filesystems mounted with the
'noatime' option?
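For what it's worth, the noatime part is easy to verify from the shell; here /data stands in for the real mount point of the RAID volume:

```shell
# Show the mount options for the volume as the kernel sees them; the
# noatime flag should appear in the options column if it is in effect.
grep ' /data ' /proc/mounts \
    || echo "/data not found in /proc/mounts"
# Adding noatime does not require unmounting:
# mount -o remount,noatime /data
```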
Regards,
Ian Morgan
--
-------------------------------------------------------------------
Ian E. Morgan Vice President & C.O.O. Webcon, Inc.
imorgan at webcon dot ca PGP: #2DA40D07 http://www.webcon.ca
* Customized Linux network solutions for your business *
-------------------------------------------------------------------
On Tue, 4 Oct 2005, subbie subbie wrote:
> Dear Ian,
>
> 'noapic' was a recommendation by 3Ware / AMCC tech
> support. It did not help at all, as expected.
> Unfortunately they did not have any other
> recommendations.
>
> I've now removed 'noapic' and unfortunately nothing
> has changed, really. See current stats below.
>
> # cat /proc/interrupts
> CPU0 CPU1
> 0: 64713875 355 IO-APIC-edge timer
> 2: 0 0 XT-PIC cascade
> 8: 3 1 IO-APIC-edge rtc
> 14: 7 6 IO-APIC-edge ide0
> 16: 176847225 191855106 IO-APIC-level eth0
> 18: 499139 336893 IO-APIC-level libata
> 48: 31491551 22761438 IO-APIC-level 3w-9xxx
> NMI: 0 0
> LOC: 64049632 64155206
> ERR: 0
> MIS: 0
>
> # uptime
> 08:39:14 up 2 days, 23:54, 8 users, load average:
> 0.16, 0.13, 0.07
>
> I have also tried playing with the parameters our
> friend Ville has mentioned in his post, nothing has
> come out of it.
>
> I'm willing to give any developer here access to my
> production machine so that they can see the situation
> first hand. Performance is just awful.
>
> I'm planning to ditch RAID5 on this card, try JBOD,
> and spread files evenly across my 12 disks;
> hopefully this will give some benefit.
>
> Something is very wrong with this card / driver /
> firmware and or kernel combination, hopefully someone
> can help out.
>
> Much appreciated
>
> --- Ian Morgan <[email protected]> wrote:
>
>> Why are you booting with 'noapic'? In my experience
>> that will seriously impact interrupt performance.
>> Use the APIC if you've got it, which in this case
>> you definitely do.
>>
>> Yes, having your gigabit NIC and RAID controller on
>> the same IRQ (in PIC mode) could definitely be a
>> source of trouble.
>>
>> In your web server testing, were you using an
>> external traffic generator or an on-host process?
>> If you try on-host (eliminating the network
>> throughput and related interrupts) does performance
>> improve?
>>
>> So two biggest suggestions:
>>
>> - Use the APIC. It is your friend.
>>
>> - It looks like the 3ware card and gigabit nic are
>> on different busses, but the pirq lines are being
>> routed to the same legacy interrupt in PIC mode. So
>> APIC mode should avoid that problem. If the
>> controller and nic are actually on the same bus,
>> separate them.
>>
>>
>> Regards,
>> Ian Morgan
>>
>> On Thu, 29 Sep 2005, subbie subbie wrote:
>>
>>> Dear list,
>>>
>>> After almost two weeks of experimentation, google
>>> searches and reading of posts, bug reports and
>>> discussions I'm still far from an answer. I'm hoping
>>> someone on this list could shed some light on the
>>> subject.
>>>
>>> I'm using a 3Ware 9500S-12 card and am able to produce
>>> up to 400MB/s sustained read from my 12-disk 4.1TB
>>> RAID5 SATA array, 128MB cache onboard, ext3 formatted.
>>> All is well when performing a single read -- it
>>> works nice and fast.
>>>
>>> The system is a web server, serving mid-size files
>>> (50MB each, on average). All hell breaks loose when
>>> doing many concurrent reads; anywhere between 200 and
>>> 400 concurrent streams, things simply grind to a halt
>>> and the system transfers a maximum of 12-14MB/s.
>>>
>>> I'm in the process of clearing up the array (this
>>> would take some time) and restructuring it to JBOD
>>> mode in order to use each disk individually. I will
>>> use a filesystem more suitable to streaming large
>>> files, such as XFS. But this would take time and I
>>> would very much appreciate the advice of people in the
>>> know if this is going to help at all. It's hard for
>>> me to do extreme experimentation (deleting,
>>> formatting, reformatting) as this is a production
>>> system with many files that I have no other place to
>>> dump until they can be safely removed. Though I'm
>>> working on dumping them slowly to other, remote,
>>> machines.
>>>
>>> I'm running the latest kernel, 2.6.13.2, and the latest
>>> 3Ware driver, taken from the 3ware.com web site, which
>>> upon insmod updates the card's firmware to the latest
>>> version as well.
>>>
>>> In my experiments, I've tried using larger readahead,
>>> currently at 16k (this helps; higher values do not
>>> seem to help much), using the deadline scheduler for
>>> this device, booting the system with the 'noapic'
>>> option, and playing with a bunch of VM tunable
>>> parameters which I'm not sure that I should really be
>>> touching. At the moment only the readahead
>>> modification is used, as the other stuff simply didn't
>>> help at all.
>>>
>>> With the stock kernel shipped with my distribution,
>>> 2.6.8, and its old 3ware driver, things were just as
>>> bad but manifested themselves differently. The
>>> system was visibly (top, vmstat...) spending most of
>>> its time in io-wait and the load average was extremely
>>> high, in the area of 10 to 20. With the recent
>>> kernel and driver mentioned above, the excessive
>>> io-wait and load seem to have been resolved and the
>>> observed loadavg is between 1 and 4.
>>>
>>> I don't have much experience with systems that are
>>> supposed to stream many files concurrently off a
>>> hardware RAID of this configuration, but my gut
>>> feeling is that something is very wrong and I should
>>> be seeing a much higher read throughput.
>>>
>>> Trying to preempt people's questions, I've tried to
>>> include as much information as possible; a lot of
>>> stuff is pasted below.
>>>
>>> I've just seen that the 3ware driver shares the same
>>> IRQ with my ethernet card, which has got me a little
>>> worried; should I be?
>>>
>>> System uptime, exactly 1 day:
>>>
>>> # cat /proc/interrupts
>>> CPU0 CPU1
>>> 0: 21619638 0 XT-PIC timer
>>> 2: 0 0 XT-PIC cascade
>>> 8: 4 0 XT-PIC rtc
>>> 10: 268753224 0 XT-PIC 3w-9xxx, eth0
>>> 14: 11 0 XT-PIC ide0
>>> 15: 337881 0 XT-PIC libata
>>> NMI: 0 0
>>> LOC: 21110402 21557685
>>> ERR: 0
>>> MIS: 0
>>>
>>> # free
>>> total used free shared buffers cached
>>> Mem: 2075260 2024724 50536 0 5184 1388408
>>> -/+ buffers/cache: 631132 1444128
>>> Swap: 3903784 0 3903784
>>>
>>> # vmstat -n 1 (output of the last few seconds):
>>> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
>>> 0 0 0 49932 4760 1392980 0 0 15636 32 3169 3697 4 6 30 60
>>> 0 0 0 50816 4752 1392376 0 0 5844 0 3114 3929 3 5 91 1
>>> 0 0 0 50924 4772 1391404 0 0 9360 0 3187 4348 6 6 76 13
>>> 0 2 0 50552 4780 1391532 0 0 24976 44 4077 3906 3 7 65 25
>>> 0 1 0 50444 4780 1392688 0 0 20192 0 5048 3914 7 8 56 30
>>> 0 1 0 50568 4756 1392508 0 0 21248 0 4060 3603 4 6 48 41
>>> 0 0 0 50704 4724 1392268 0 0 30004 0 3834 3369 4 9 65 22
>>> 0 3 0 50556 4728 1392468 0 0 3248 1832 2974 4514 2 5 58 35
>>> 0 3 0 50308 4724 1392200 0 0 1288 336 1766 1886 1 3 50 47
>>> 0 4 0 50308 4732 1391852 0 0 2300 408 1919 2158 0 3 51 46
>>> 0 4 0 50556 4736 1390692 0 0 1856 532 1488 1846 3 1 50 46
>>> 0 3 0 50680 4740 1390620 0 0 4016 1296 1577 1682 2 2 50 47
>>> 0 3 0 50432 4752 1391628 0 0 2180 72 1730 1945 2 2 51 46
>>> 2 2 0 49924 4772 1391540 0 0 44372 564 3403 2847 4 5 50 42
>>> 0 0 0 50684 4784 1391528 0 0 28640 216 3804 3847 7 8 69 16
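For reference, the readahead and elevator changes described in the quoted message are typically applied like this on a 2.6 kernel (the device name sda is a placeholder for the 3ware unit, and "16k" readahead is assumed to mean 16384 of blockdev's 512-byte sectors):

```shell
DEV=sda   # placeholder for the 3ware block device
# blockdev --setra counts in 512-byte sectors, so 16384 sectors is 8 MB
# of readahead; print the conversion to sanity-check the number:
echo "16384 sectors = $((16384 * 512 / 1024)) kB readahead"
# blockdev --setra 16384 /dev/$DEV
# The elevator can be switched per-device through sysfs:
# echo deadline > /sys/block/$DEV/queue/scheduler
```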
--- "Ian E. Morgan" <[email protected]> wrote:
> I have seen some systems on which IRQ load balancing
> can have a detrimental effect on some devices, such
> as gigabit Ethernet.
>
> You could try disabling both the irqbalance userspace
> daemon (if that's part of your distribution), and
> in-kernel IRQ balancing, if enabled (CONFIG_IRQBALANCE).
I don't have a userspace daemon for that, but I'll try
the kernel option.
>
> For your NIC, try enabling NAPI interrupt mitigation,
> if available. This will significantly reduce the
> interrupt load under high traffic volume.
It's always enabled in my configs.
> I guess there's another obvious question that I forgot:
> Do you have the 3ware cache enabled or disabled? Are
> your ext3 filesystems mounted with the 'noatime' option?
Write caching is enabled. I don't have much activity
across thousands of files, so noatime is less
critical, but the RAID volume is still mounted
noatime.
So basically I'll try the IRQ load balancing change
and see what happens.
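A simple before/after check for that experiment is to watch whether the eth0 and 3w-9xxx interrupts actually get spread across both CPUs; this filters the same per-CPU columns shown in the /proc/interrupts listings earlier in the thread:

```shell
# Print the CPU header line plus any rows for the NIC and 3ware
# controller (lines only appear on a machine with those devices):
awk 'NR==1 || /eth0|3w-9xxx/' /proc/interrupts
```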
Thanks