2005-03-08 13:38:20

by Holger Kiehl

Subject: Fusion-MPT much faster as module

Hello

On a four-CPU Opteron, compiling the Fusion-MPT driver as a module gives much
better performance than compiling it in. Here are some bonnie++ results:

Version 1.03         ------Sequential Output------    --Sequential Input-   --Random-
                     -Per Chr-  --Block--  -Rewrite-  -Per Chr-  --Block--  --Seeks--
Machine        Size  K/sec %CP  K/sec %CP  K/sec %CP  K/sec %CP  K/sec %CP   /sec %CP
compiled in  15872M  38366  71  65602  22  18348   4  53276  84  57947   7  905.4   2
module       15872M  51246  96 204914  70  57236  14  59779  96 264171  33  923.0   2

This happens with 2.6.10, 2.6.11 and 2.6.11-bk2. Controller is a
Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI.

Why is there such a large difference?
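
To put a number on the difference, the sketch below divides each "module" figure from the bonnie++ table by the matching "compiled in" figure. The struct, array and function names are made up for this illustration; only the numbers come from the table. Block writes come out at roughly 3.1 times and block reads at roughly 4.6 times the built-in figures, while seeks are nearly unchanged.

```c
#include <stdio.h>
#include <stddef.h>

/* One bonnie++ column: the built-in result and the modular result. */
struct bonnie_col {
	const char *name;
	double built_in;
	double module;
};

/* Values copied from the table above (K/sec, except seeks in /sec). */
static const struct bonnie_col cols[] = {
	{ "output per-chr", 38366.0,  51246.0 },
	{ "output block",   65602.0, 204914.0 },
	{ "output rewrite", 18348.0,  57236.0 },
	{ "input per-chr",  53276.0,  59779.0 },
	{ "input block",    57947.0, 264171.0 },
	{ "random seeks",     905.4,    923.0 },
};

#define NCOLS (sizeof(cols) / sizeof(cols[0]))

/* Ratio of modular to built-in throughput for column i. */
double speedup(size_t i)
{
	return cols[i].module / cols[i].built_in;
}

/* Print one line per column with its speedup factor. */
void print_speedups(void)
{
	size_t i;

	for (i = 0; i < NCOLS; i++)
		printf("%-16s %5.2fx\n", cols[i].name, speedup(i));
}
```

Calling print_speedups() from a main() prints the factor for every column.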

Holger
--


2005-03-21 23:29:32

by Andrew Morton

Subject: Re: Fusion-MPT much faster as module

Holger Kiehl wrote:
>
> Hello
>
> On a four-CPU Opteron, compiling the Fusion-MPT driver as a module gives much
> better performance than compiling it in. Here are some bonnie++ results:
>
> Version 1.03         ------Sequential Output------    --Sequential Input-   --Random-
>                      -Per Chr-  --Block--  -Rewrite-  -Per Chr-  --Block--  --Seeks--
> Machine        Size  K/sec %CP  K/sec %CP  K/sec %CP  K/sec %CP  K/sec %CP   /sec %CP
> compiled in  15872M  38366  71  65602  22  18348   4  53276  84  57947   7  905.4   2
> module       15872M  51246  96 204914  70  57236  14  59779  96 264171  33  923.0   2
>
> This happens with 2.6.10, 2.6.11 and 2.6.11-bk2. Controller is a
> Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI.
>
> Why is there such a large difference?
>

Holger, this problem remains unresolved, does it not? Have you done any
more experimentation?

I must say that something funny seems to be happening here. I have two
MPT-based Dell machines, neither of which is using a modular driver:


akpm:/usr/src/25> 0 hdparm -t /dev/sda

/dev/sda:
Timing buffered disk reads: 64 MB in 5.00 seconds = 12.80 MB/sec

That's a bit disappointing. Running 2.6.9-rc2-mm2(!) with a

SCSI storage controller: LSI Logic / Symbios Logic 53c1030 (rev 07)

controller on disks which should hit 50MB/sec.




And

bix:/home/akpm# hdparm -t /dev/sda

/dev/sda:
Timing buffered disk reads: 114 MB in 3.03 seconds = 37.57 MB/sec

with 2.6.11-rc4-mm1 using

Fusion MPT SCSI Host driver 3.01.16
scsi0 : ioc0: LSI53C1030, FwRev=01030600h, Ports=1, MaxQ=222, IRQ=25
scsi1 : ioc1: LSI53C1030, FwRev=01030600h, Ports=1, MaxQ=222, IRQ=26
Vendor: SEAGATE Model: ST3146807LW Rev: DS09
Type: Direct-Access ANSI SCSI revision: 03

Better, but again I'd expect >50MB/sec.

2005-03-22 07:37:07

by Janne Pikkarainen

Subject: Re: Fusion-MPT much faster as module

Hello everyone,

On Mon, 2005-03-21 at 15:27 -0800, Andrew Morton wrote:
> > On a four-CPU Opteron, compiling the Fusion-MPT driver as a module gives much
> > better performance than compiling it in. Here are some bonnie++ results:
> >
> > Version 1.03         ------Sequential Output------    --Sequential Input-   --Random-
> >                      -Per Chr-  --Block--  -Rewrite-  -Per Chr-  --Block--  --Seeks--
> > Machine        Size  K/sec %CP  K/sec %CP  K/sec %CP  K/sec %CP  K/sec %CP   /sec %CP
> > compiled in  15872M  38366  71  65602  22  18348   4  53276  84  57947   7  905.4   2
> > module       15872M  51246  96 204914  70  57236  14  59779  96 264171  33  923.0   2
> >
> > This happens with 2.6.10, 2.6.11 and 2.6.11-bk2. Controller is a
> > Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI.
> >
> > Why is there such a large difference?
> >
>
> Holger, this problem remains unresolved, does it not? Have you done any
> more experimentation?

Quick summary:
- older IBM xSeries 335 + kernel 2.4.26 = surprisingly slow
- older IBM xSeries 335 + kernel 2.6.8 = pretty fast
- newer IBM xSeries 335 + kernel 2.6.9 = pretty fast
- newer IBM xSeries 335 (and a 336) + kernel 2.6.10 = surprisingly slow

Longer story:
I'm administering a bunch of IBM xSeries 335 servers (and one 336, too),
all equipped with exactly the same SCSI controller as in the case
above. In every server the Fusion MPT driver is compiled straight into the
kernel, and the disk setup is two identical SCSI hard drives in RAID-1 mode.
The 2.6.x servers all use about the same kernel .config file.

One of the older servers (still using kernel 2.6.8) with P4 Xeon 2.0 GHz
and ~70 GB U320 SCSI disk gives me pretty good results:

---
hdparm -t /dev/sda

/dev/sda:
Timing buffered disk reads: 136 MB in 3.02 seconds = 45.01 MB/sec
---

Identical hardware, but with kernel 2.4.25:

---
hdparm -t /dev/sda

/dev/sda:
Timing buffered disk reads: 64 MB in 3.35 seconds = 19.10 MB/sec
---

A newer generation of x335 (using kernel 2.6.9) with dual P4 Xeon 3.0
GHz and ~70 GB U320 SCSI disk:

---
hdparm -t /dev/sda

/dev/sda:
Timing buffered disk reads: 130 MB in 3.07 seconds = 42.35 MB/sec
---

A still newer generation of x335 with P4 Xeon 3.06 GHz and ~140 GB
U320 SCSI disk, using kernel 2.6.10, is a big disappointment:

---
hdparm -t /dev/sda

/dev/sda:
Timing buffered disk reads: 48 MB in 3.11 seconds = 15.43 MB/sec
---

And the latest x336 with dual P4 Xeon 3.2 GHz (using kernel 2.6.10) with
~140 GB U320 SCSI disk is also very disappointing:

---
hdparm -t /dev/sda

/dev/sda:
Timing buffered disk reads: 58 MB in 3.02 seconds = 19.20 MB/sec
---

Some info about the oldest x335:

---
mptbase: Initiating ioc0 bringup
ioc0: 53C1030: Capabilities={Initiator}
Fusion MPT SCSI Host driver 3.01.09
scsi0 : ioc0: LSI53C1030, FwRev=01000e00h, Ports=1, MaxQ=222, IRQ=177
Vendor: LSILOGIC Model: 1030 IM Rev: 1000
Type: Direct-Access ANSI SCSI revision: 02
SCSI device sda: 143372288 512-byte hdwr sectors (73407 MB)
SCSI device sda: drive cache: write back
/dev/scsi/host0/bus0/target0/lun0: p1 p2 p3 p4 < p5 p6 p7 p8 >
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Attached scsi generic sg0 at scsi0, channel 0, id 0, lun 0, type 0
Vendor: IBM Model: 25P3495a S320 1 Rev: 1
Type: Processor ANSI SCSI revision: 02
---

A bit newer x335 with kernel 2.6.9:

---
mptbase: Initiating ioc0 bringup
ioc0: 53C1030: Capabilities={Initiator}
Fusion MPT SCSI Host driver 3.01.16
scsi0 : ioc0: LSI53C1030, FwRev=01000e00h, Ports=1, MaxQ=222, IRQ=22
Vendor: LSILOGIC Model: 1030 IM Rev: 1000
Type: Direct-Access ANSI SCSI revision: 02
SCSI device sda: 143372288 512-byte hdwr sectors (73407 MB)
SCSI device sda: drive cache: write back
/dev/scsi/host0/bus0/target0/lun0: p1 p2 p3 p4 < p5 p6 p7 p8 >
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Attached scsi generic sg0 at scsi0, channel 0, id 0, lun 0, type 0
Vendor: IBM Model: 25P3495a S320 1 Rev: 1
Type: Processor ANSI SCSI revision: 02
Attached scsi generic sg1 at scsi0, channel 0, id 8, lun 0, type 3
---

And the latest x335 we have:

---
Fusion MPT base driver 3.01.18
Copyright (c) 1999-2004 LSI Logic Corporation
ACPI: PCI interrupt 0000:01:01.0[A] -> GSI 22 (level, low) -> IRQ 169
mptbase: Initiating ioc0 bringup
ioc0: 53C1030: Capabilities={Initiator}
Fusion MPT SCSI Host driver 3.01.18
scsi0 : ioc0: LSI53C1030, FwRev=01032316h, Ports=1, MaxQ=222, IRQ=169
Vendor: LSILOGIC Model: 1030 IM Rev: 1000
Type: Direct-Access ANSI SCSI revision: 02
SCSI device sda: 286746624 512-byte hdwr sectors (146814 MB)
SCSI device sda: drive cache: write back
SCSI device sda: 286746624 512-byte hdwr sectors (146814 MB)
SCSI device sda: drive cache: write back
/dev/scsi/host0/bus0/target0/lun0: p1 p2 p3 p4 < p5 p6 p7 p8 >
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Attached scsi generic sg0 at scsi0, channel 0, id 0, lun 0, type 0
Vendor: IBM Model: 25P3495a S320 1 Rev: 1
Type: Processor ANSI SCSI revision: 02
Attached scsi generic sg1 at scsi0, channel 0, id 8, lun 0, type 3
---

x336:

---
mptbase: Initiating ioc0 bringup
ioc0: 53C1030: Capabilities={Initiator}
Fusion MPT SCSI Host driver 3.01.18
scsi0 : ioc0: LSI53C1030, FwRev=01032316h, Ports=1, MaxQ=222, IRQ=169
Vendor: LSILOGIC Model: 1030 IM Rev: 1000
Type: Direct-Access ANSI SCSI revision: 02
SCSI device sda: 286746624 512-byte hdwr sectors (146814 MB)
SCSI device sda: drive cache: write back
SCSI device sda: 286746624 512-byte hdwr sectors (146814 MB)
SCSI device sda: drive cache: write back
/dev/scsi/host0/bus0/target0/lun0: p1 p2 p3 p4 < p5 p6 p7 p8 >
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Attached scsi generic sg0 at scsi0, channel 0, id 0, lun 0, type 0
Vendor: IBM Model: 25P3495a S320 1 Rev: 1
Type: Processor ANSI SCSI revision: 02
Attached scsi generic sg1 at scsi0, channel 0, id 8, lun 0, type 3
---

I'll gladly be your test puppet and provide any further information
you may need. I can also upgrade the 2.6.8 server to 2.6.11
and/or test the Fusion MPT driver as a kernel module. I cannot reboot the
servers at will, though, except the 2.6.8 one, which is more or less only a
testbed server.


Best regards,

Janne Pikkarainen


2005-03-22 08:31:42

by Holger Kiehl

Subject: Re: Fusion-MPT much faster as module

On Mon, 21 Mar 2005, Andrew Morton wrote:

> Holger Kiehl wrote:
>>
>> Hello
>>
>> On a four-CPU Opteron, compiling the Fusion-MPT driver as a module gives much
>> better performance than compiling it in. Here are some bonnie++ results:
>>
>> Version 1.03         ------Sequential Output------    --Sequential Input-   --Random-
>>                      -Per Chr-  --Block--  -Rewrite-  -Per Chr-  --Block--  --Seeks--
>> Machine        Size  K/sec %CP  K/sec %CP  K/sec %CP  K/sec %CP  K/sec %CP   /sec %CP
>> compiled in  15872M  38366  71  65602  22  18348   4  53276  84  57947   7  905.4   2
>> module       15872M  51246  96 204914  70  57236  14  59779  96 264171  33  923.0   2
>>
>> This happens with 2.6.10, 2.6.11 and 2.6.11-bk2. Controller is a
>> Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI.
>>
>> Why is there such a large difference?
>>
>
> Holger, this problem remains unresolved, does it not? Have you done any
> more experimentation?
>
No. For now I just leave it as module.

> I must say that something funny seems to be happening here. I have two
> MPT-based Dell machines, neither of which is using a modular driver:
>
>
> akpm:/usr/src/25> 0 hdparm -t /dev/sda
>
> /dev/sda:
> Timing buffered disk reads: 64 MB in 5.00 seconds = 12.80 MB/sec
>
Got the same result when compiled in, always between 12 and 13 MB/s. As
module it is approx. 75 MB/s.

Hope that LSI Logic will find the problem.

Another question: is there a way to see in what SCSI mode (320, 160, etc.)
Fusion-MPT is running? I could not find anything in proc or dmesg. Adaptec
prints the following information in dmesg (and more in proc):

(scsi1:A:0): 320.000MB/s transfers (160.000MHz DT|IU|QAS, 16bit)

Or does Fusion-MPT have some other tool to show this information?

Holger

2005-03-22 10:32:24

by Chen, Kenneth W

Subject: RE: Fusion-MPT much faster as module

On Mon, 21 Mar 2005, Andrew Morton wrote:
> Holger, this problem remains unresolved, does it not? Have you done any
> more experimentation?
>
> I must say that something funny seems to be happening here. I have two
> MPT-based Dell machines, neither of which is using a modular driver:
>
> akpm:/usr/src/25> 0 hdparm -t /dev/sda
>
> /dev/sda:
> Timing buffered disk reads: 64 MB in 5.00 seconds = 12.80 MB/sec


Holger Kiehl wrote on Tuesday, March 22, 2005 12:31 AM
> Got the same result when compiled in, always between 12 and 13 MB/s. As
> module it is approx. 75 MB/s.


Half guess, half backed by data: it must be the initialization of the
driver_setup variable. When the driver is built in, every member of
driver_setup is initialized to zero, which isn't the fastest setting. When
compiled as a module, it gets first-class treatment with shiny performance
settings. Goofing around, this patch appears to give higher throughput.

Before:
/dev/sdc:
Timing buffered disk reads: 92 MB in 3.03 seconds = 30.32 MB/sec

After:
/dev/sdc:
Timing buffered disk reads: 174 MB in 3.02 seconds = 57.61 MB/sec


diff -Nurp linux-2.6.11/drivers/message/fusion/mptscsih.c linux-2.6.11.ken/drivers/message/fusion/mptscsih.c
--- linux-2.6.11/drivers/message/fusion/mptscsih.c 2005-03-01 23:38:37.000000000 -0800
+++ linux-2.6.11.ken/drivers/message/fusion/mptscsih.c 2005-03-22 02:18:21.000000000 -0800
@@ -96,7 +96,6 @@ MODULE_AUTHOR(MODULEAUTHOR);
 MODULE_DESCRIPTION(my_NAME);
 MODULE_LICENSE("GPL");

-#ifdef MODULE
 static int dv = MPTSCSIH_DOMAIN_VALIDATION;
 module_param(dv, int, 0);
 MODULE_PARM_DESC(dv, "DV Algorithm: enhanced = 1, basic = 0 (default=MPTSCSIH_DOMAIN_VALIDATION=1)");
@@ -112,7 +111,6 @@ MODULE_PARM_DESC(factor, "Min Sync Facto
 static int saf_te = MPTSCSIH_SAF_TE;
 module_param(saf_te, int, 0);
 MODULE_PARM_DESC(saf_te, "Force enabling SEP Processor: (default=MPTSCSIH_SAF_TE=0)");
-#endif

 /*=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/

@@ -1489,7 +1487,6 @@ mptscsih_init(void)
 		": Registered for IOC reset notifications\n"));
 	}

-#ifdef MODULE
 	dinitprintk((KERN_INFO MYNAM
 		": Command Line Args: dv=%d max_width=%d "
 		"factor=0x%x saf_te=%d\n",
@@ -1499,7 +1496,6 @@ mptscsih_init(void)
 	driver_setup.max_width = (width) ? 1 : 0;
 	driver_setup.min_sync_factor = factor;
 	driver_setup.saf_te = (saf_te) ? 1 : 0;;
-#endif

 	if(mpt_device_driver_register(&mptscsih_driver,
 	  MPTSCSIH_DRIVER) != 0 ) {
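
The mechanism behind the patch can be shown in user space. What follows is a minimal sketch, not the driver's code: struct demo_setup and both functions are hypothetical stand-ins, the dv and saf_te defaults follow the parameter descriptions in the patch, and the width and factor defaults are assumed values. In the guarded variant the defaults are copied only when MODULE is defined, so a build without it keeps the all-zero settings. The unguarded variant always copies them, which is what removing the #ifdef achieves.

```c
#include <string.h>

/* Illustrative defaults. DEMO_DV and DEMO_SAF_TE follow the parameter
 * descriptions quoted in the patch; DEMO_WIDTH and DEMO_FACTOR are
 * assumptions made for this sketch. */
#define DEMO_DV      1
#define DEMO_WIDTH   1
#define DEMO_FACTOR  0x08
#define DEMO_SAF_TE  0

/* Hypothetical stand-in for the driver's driver_setup structure. */
struct demo_setup {
	int dv;
	int max_width;
	int min_sync_factor;
	int saf_te;
};

/* Tunables, named after the variables visible in the patch. They are
 * always defined here so that both variants compile in one file. */
static int dv = DEMO_DV;
static int width = DEMO_WIDTH;
static int factor = DEMO_FACTOR;
static int saf_te = DEMO_SAF_TE;

/* Pre-patch shape: the copy happens only when MODULE is defined. */
struct demo_setup demo_init_guarded(void)
{
	struct demo_setup s;

	memset(&s, 0, sizeof(s));	/* mimics zero-initialized static storage */
#ifdef MODULE
	s.dv = dv ? 1 : 0;
	s.max_width = width ? 1 : 0;
	s.min_sync_factor = factor;
	s.saf_te = saf_te ? 1 : 0;
#endif
	return s;
}

/* Post-patch shape: the copy always happens. */
struct demo_setup demo_init_unguarded(void)
{
	struct demo_setup s;

	memset(&s, 0, sizeof(s));
	s.dv = dv ? 1 : 0;
	s.max_width = width ? 1 : 0;
	s.min_sync_factor = factor;
	s.saf_te = saf_te ? 1 : 0;
	return s;
}
```

Built without -DMODULE, demo_init_guarded() returns all zeros while demo_init_unguarded() returns the defaults. Built with -DMODULE, both return the same values.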


2005-03-22 10:43:14

by Andrew Morton

Subject: Re: Fusion-MPT much faster as module

"Chen, Kenneth W" wrote:
>
> On Mon, 21 Mar 2005, Andrew Morton wrote:
> > Holger, this problem remains unresolved, does it not? Have you done any
> > more experimentation?
> >
> > I must say that something funny seems to be happening here. I have two
> > MPT-based Dell machines, neither of which is using a modular driver:
> >
> > akpm:/usr/src/25> 0 hdparm -t /dev/sda
> >
> > /dev/sda:
> > Timing buffered disk reads: 64 MB in 5.00 seconds = 12.80 MB/sec
>
>
> Holger Kiehl wrote on Tuesday, March 22, 2005 12:31 AM
> > Got the same result when compiled in, always between 12 and 13 MB/s. As
> > module it is approx. 75 MB/s.
>
>
> Half guess, half backed by data: it must be the initialization of the
> driver_setup variable. When the driver is built in, every member of
> driver_setup is initialized to zero, which isn't the fastest setting. When
> compiled as a module, it gets first-class treatment with shiny performance
> settings. Goofing around, this patch appears to give higher throughput.

ooh, you actually looked at the code ;)

> Before:
> /dev/sdc:
> Timing buffered disk reads: 92 MB in 3.03 seconds = 30.32 MB/sec
>
> After:
> /dev/sdc:
> Timing buffered disk reads: 174 MB in 3.02 seconds = 57.61 MB/sec
>

Yes, that's it. Eric, you owe me about 10000 hours ;)

2005-03-22 10:52:53

by Arjan van de Ven

Subject: RE: Fusion-MPT much faster as module

On Tue, 2005-03-22 at 02:29 -0800, Chen, Kenneth W wrote:

> Before:
> /dev/sdc:
> Timing buffered disk reads: 92 MB in 3.03 seconds = 30.32 MB/sec
>
> After:
> /dev/sdc:
> Timing buffered disk reads: 174 MB in 3.02 seconds = 57.61 MB/sec


nice!

More proof that #ifdef MODULE is considered harmful... how much of it is
actually left in the kernel? Maybe we could kill it entirely from
drivers/* (of course it has a limited place in include/*)



2005-03-22 12:28:22

by Adrian Bunk

Subject: Re: Fusion-MPT much faster as module

On Tue, Mar 22, 2005 at 11:52:22AM +0100, Arjan van de Ven wrote:
> On Tue, 2005-03-22 at 02:29 -0800, Chen, Kenneth W wrote:
>
> > Before:
> > /dev/sdc:
> > Timing buffered disk reads: 92 MB in 3.03 seconds = 30.32 MB/sec
> >
> > After:
> > /dev/sdc:
> > Timing buffered disk reads: 174 MB in 3.02 seconds = 57.61 MB/sec
>
>
> nice!
>
> More proof that #ifdef MODULE is considered harmful... how much of it is
> actually left in the kernel? Maybe we could kill it entirely from
> drivers/* (of course it has a limited place in include/*)

Too many...

And there are places where it's actually useful:

#if defined(CONFIG_FOO) || (defined(MODULE) && defined(CONFIG_FOO_MODULE))

is a good way to express that driver bar can use functionality of driver
foo if it's available.
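
As a sketch of what that condition expresses (CONFIG_FOO and CONFIG_FOO_MODULE are the placeholder symbols from the line above; BAR_CAN_USE_FOO and bar_can_use_foo() are hypothetical names added for illustration): driver bar may call into foo when foo is built in, or when foo is a module and bar is itself being built as a module. A built-in bar cannot rely on a modular foo, because foo's symbols are not part of the kernel image that bar is linked into.

```c
/* Preprocessor form, using the condition from the message above. */
#if defined(CONFIG_FOO) || (defined(MODULE) && defined(CONFIG_FOO_MODULE))
#define BAR_CAN_USE_FOO 1
#else
#define BAR_CAN_USE_FOO 0
#endif

/* The same logic as an ordinary function, so that every combination can
 * be checked without rebuilding with different -D flags. */
int bar_can_use_foo(int foo_built_in, int foo_is_module, int bar_is_module)
{
	return foo_built_in || (bar_is_module && foo_is_module);
}
```

The only case that differs from a plain "is foo enabled at all" test is a modular foo with a built-in bar, where the result is 0.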

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2005-03-22 12:37:36

by Arjan van de Ven

Subject: Re: Fusion-MPT much faster as module


>
> And there are places where it's actually useful:
>
> #if defined(CONFIG_FOO) || (defined(MODULE) && defined(CONFIG_FOO_MODULE))
>
> is a good way to express that driver bar can use functionality of driver
> foo if it's available.

a good way? I'd disagree with that :)


2005-03-22 13:46:39

by Holger Kiehl

Subject: RE: Fusion-MPT much faster as module

On Tue, 22 Mar 2005, Chen, Kenneth W wrote:

> On Mon, 21 Mar 2005, Andrew Morton wrote:
>> Holger, this problem remains unresolved, does it not? Have you done any
>> more experimentation?
>>
>> I must say that something funny seems to be happening here. I have two
>> MPT-based Dell machines, neither of which is using a modular driver:
>>
>> akpm:/usr/src/25> 0 hdparm -t /dev/sda
>>
>> /dev/sda:
>> Timing buffered disk reads: 64 MB in 5.00 seconds = 12.80 MB/sec
>
>
> Holger Kiehl wrote on Tuesday, March 22, 2005 12:31 AM
>> Got the same result when compiled in, always between 12 and 13 MB/s. As
>> module it is approx. 75 MB/s.
>
>
> Half guess, half backed by data: it must be the initialization of the
> driver_setup variable. When the driver is built in, every member of
> driver_setup is initialized to zero, which isn't the fastest setting. When
> compiled as a module, it gets first-class treatment with shiny performance
> settings. Goofing around, this patch appears to give higher throughput.
>
Yes, that fixes it.

Many thanks!

Holger