2006-08-29 05:24:06

by djwong

[permalink] [raw]
Subject: [PATCH] aic94xx: Increase can_queue and cmds_per_lun

Hi all,

Below is a patch that sets cmd_per_lun and can_queue in the aic94xx
driver's scsi_host_template to better performing values than what's
there currently. The cmd_per_lun setting is stolen straight out of the
adp94xx source, and can_queue is derived from the max_scb value that we
calculate in asd_init_hw. To the best of my (admittedly limited)
knowledge, this method provides the correct values (can_queue = 443 in
both adp94xx and aic94xx on my 9405W) but if anybody knows better,
please enlighten me. :)

That said, the effect of leaving these values set to 1 is terrible
performance in the case of either (a) certain Maxtor SAS drives flying
solo or (b) flooding several disks with I/O simultaneously (md-raid).
There may be more scenarios where we see similar problems that I
haven't uncovered.

Just for grins, I ran bogodisk (an O_DIRECT-enabled read speed test)
against three different scenarios:

1) adp94xx 1.0.8-6, pounded into 2.6.18-rc4 [green]
2) aic94xx 1.0.2, without this patch [red]
3) aic94xx 1.0.2, with this [blue]

...with these results:

http://sweaglesw.net/~djwong/programs/bogodisk/bd_graphs/bad_sas.0.png

As you can see, the read performance is cut in half by the aic94xx
driver not getting a chance to have multiple I/Os in flight at any given
time. With the patch, the two drivers are fairly close bandwidth-wise.
Also thanks to Mike Anderson for helping me figure this out.

--D

Signed-off-by: Darrick J. Wong <[email protected]>

diff --git a/drivers/scsi/aic94xx/aic94xx_hwi.h b/drivers/scsi/aic94xx/aic94xx_hwi.h
index c7d5053..a2d8881 100644
--- a/drivers/scsi/aic94xx/aic94xx_hwi.h
+++ b/drivers/scsi/aic94xx/aic94xx_hwi.h
@@ -36,6 +36,9 @@
#include "aic94xx.h"
#include "aic94xx_sas.h"

+/* Leave a few empty data buffers. */
+#define ASD_FREE_SCBS 3
+
/* Define ASD_MAX_PHYS to the maximum phys ever. Currently 8. */
#define ASD_MAX_PHYS 8
#define ASD_PCBA_SN_SIZE 12
diff --git a/drivers/scsi/aic94xx/aic94xx_init.c b/drivers/scsi/aic94xx/aic94xx_init.c
index 3ec2e46..34a7b4b 100644
--- a/drivers/scsi/aic94xx/aic94xx_init.c
+++ b/drivers/scsi/aic94xx/aic94xx_init.c
@@ -71,7 +72,7 @@ static struct scsi_host_template aic94xx
.change_queue_type = sas_change_queue_type,
.bios_param = sas_bios_param,
.can_queue = 1,
- .cmd_per_lun = 1,
+ .cmd_per_lun = 2,
.this_id = -1,
.sg_tablesize = SG_ALL,
.max_sectors = SCSI_DEFAULT_MAX_SECTORS,
@@ -612,13 +616,17 @@ static int __devinit asd_pci_probe(struc
goto Err_free_cache;

asd_printk("device %s: SAS addr %llx, PCBA SN %s, %d phys, %d enabled "
- "phys, flash %s, BIOS %s%d\n",
+ "phys, flash %s, BIOS %s%d, SCBs %d\n",
pci_name(dev), SAS_ADDR(asd_ha->hw_prof.sas_addr),
asd_ha->hw_prof.pcba_sn, asd_ha->hw_prof.max_phys,
asd_ha->hw_prof.num_phys,
asd_ha->hw_prof.flash.present ? "present" : "not present",
asd_ha->hw_prof.bios.present ? "build " : "not present",
- asd_ha->hw_prof.bios.bld);
+ asd_ha->hw_prof.bios.bld,
+ asd_ha->hw_prof.max_scbs);
+
+ aic94xx_sht.can_queue = asd_ha->hw_prof.max_scbs - ASD_FREE_SCBS;
+ shost->can_queue = aic94xx_sht.can_queue;

if (use_msi)
pci_enable_msi(asd_ha->pcidev);


2006-08-30 17:19:46

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH] aic94xx: Increase can_queue and cmds_per_lun

On Mon, 2006-08-28 at 22:23 -0700, Darrick J. Wong wrote:
> - .cmd_per_lun = 1,
> + .cmd_per_lun = 2,

This is an old piece of code. Today we use 1 for the cmd_per_lun of
non-TCQ devices.

> + aic94xx_sht.can_queue = asd_ha->hw_prof.max_scbs - ASD_FREE_SCBS;

This is unnecessary unless you alter it before host alloc (which is
where it takes the shost values from the template).

Also, I think if you look at the rest of the driver, it's careful to
account for the need for required scbs in its internal queueing
algorithms, so the ASD_FREE_SCBS should be unnecessary.

> + shost->can_queue = aic94xx_sht.can_queue;

James


2006-08-30 19:03:16

by djwong

[permalink] [raw]
Subject: Re: [PATCH] aic94xx: Increase can_queue and cmds_per_lun

James Bottomley wrote:

> This is unnecessary unless you alter it before host alloc (which is
> where it takes the shost values from the template).
>
> Also, I think if you look at the rest of the driver, it's careful to
> account for the need for required scbs in its internal queueing
> algorithms, so the ASD_FREE_SCBS should be unnecessary.
>
>> + shost->can_queue = aic94xx_sht.can_queue;

Ok then, I think it collapses to this short patch:

--D

Signed-off-by: Darrick J. Wong <[email protected]>

diff --git a/drivers/scsi/aic94xx/aic94xx_init.c b/drivers/scsi/aic94xx/aic94xx_init.c
index 3e25e31..861d67b 100644
--- a/drivers/scsi/aic94xx/aic94xx_init.c
+++ b/drivers/scsi/aic94xx/aic94xx_init.c
@@ -623,6 +623,8 @@ static int __devinit asd_pci_probe(struc
asd_ha->hw_prof.bios.present ? "build " : "not present",
asd_ha->hw_prof.bios.bld);

+ shost->can_queue = asd_ha->hw_prof.max_scbs;
+
if (use_msi)
pci_enable_msi(asd_ha->pcidev);

2006-08-30 19:03:21

by djwong

[permalink] [raw]
Subject: Re: [PATCH] aic94xx: Increase can_queue and cmds_per_lun

James Bottomley wrote:

> This is unnecessary unless you alter it before host alloc (which is
> where it takes the shost values from the template).
>
> Also, I think if you look at the rest of the driver, it's careful to
> account for the need for required scbs in its internal queueing
> algorithms, so the ASD_FREE_SCBS should be unnecessary.
>
>> + shost->can_queue = aic94xx_sht.can_queue;

Ok then, I think it collapses to this short patch:

--D

Signed-off-by: Darrick J. Wong <[email protected]>

diff --git a/drivers/scsi/aic94xx/aic94xx_init.c b/drivers/scsi/aic94xx/aic94xx_init.c
index 3e25e31..861d67b 100644
--- a/drivers/scsi/aic94xx/aic94xx_init.c
+++ b/drivers/scsi/aic94xx/aic94xx_init.c
@@ -623,6 +623,8 @@ static int __devinit asd_pci_probe(struc
asd_ha->hw_prof.bios.present ? "build " : "not present",
asd_ha->hw_prof.bios.bld);

+ shost->can_queue = asd_ha->hw_prof.max_scbs;
+
if (use_msi)
pci_enable_msi(asd_ha->pcidev);

2006-08-30 19:04:56

by Luben Tuikov

[permalink] [raw]
Subject: Re: [PATCH] aic94xx: Increase can_queue and cmds_per_lun

--- "Darrick J. Wong" <[email protected]> wrote:
> Below is a patch that sets cmd_per_lun and can_queue in the aic94xx
> driver's scsi_host_template to better performing values than what's
> there currently. The cmd_per_lun setting is stolen straight out of the
> adp94xx source, and can_queue is derived from the max_scb value that we
> calculate in asd_init_hw. To the best of my (admittedly limited)
> knowledge, this method provides the correct values (can_queue = 443 in
> both adp94xx and aic94xx on my 9405W) but if anybody knows better,
> please enlighten me. :)

Enlightenment is the last thing one gets in mailing lists, especially
linux-scsi.

> That said, the effect of leaving these values set to 1 is terrible
> performance in the case of either

When I submitted this code last year in this mailing list, none of these
values were set to 1. Both cmd_per_lun and can_queue were/are set to the same
value, which is computed dynamically from the capabilities of the controller.

I've no idea who set "cmd_per_lun" to 1 and when.
I've no idea who set "can_queue" to anything other than what I
initilize it in my code as submitted last year.

> Just for grins, I ran bogodisk (an O_DIRECT-enabled read speed test)
> against three different scenarios:
>
> 1) adp94xx 1.0.8-6, pounded into 2.6.18-rc4 [green]
> 2) aic94xx 1.0.2, without this patch [red]
> 3) aic94xx 1.0.2, with this [blue]
>
> ...with these results:
>
> http://sweaglesw.net/~djwong/programs/bogodisk/bd_graphs/bad_sas.0.png

I just did some performance testing (of the code as I maintain it)
using your "bogodisk-0.5.2" and I get identical graph as your "green"
graph (but for different disk of course).

BTW, you need to print doubles as "%f" since using "%.2f" gives
me order of _four_ magnitude error, as opposed to order of -3
magnitude error when I use six decimal digits precision.

> As you can see, the read performance is cut in half by the aic94xx
> driver not getting a chance to have multiple I/Os in flight at any given
> time. With the patch, the two drivers are fairly close bandwidth-wise.
> Also thanks to Mike Anderson for helping me figure this out.

This is a very well known problem in existing storage tools/stacks
dealing with (powerful SCSI) LUs who implement queuing and or RAID.

> Signed-off-by: Darrick J. Wong <[email protected]>
>
> diff --git a/drivers/scsi/aic94xx/aic94xx_hwi.h b/drivers/scsi/aic94xx/aic94xx_hwi.h
> index c7d5053..a2d8881 100644
> --- a/drivers/scsi/aic94xx/aic94xx_hwi.h
> +++ b/drivers/scsi/aic94xx/aic94xx_hwi.h
> @@ -36,6 +36,9 @@
> #include "aic94xx.h"
> #include "aic94xx_sas.h"
>
> +/* Leave a few empty data buffers. */
> +#define ASD_FREE_SCBS 3
> +

This should be dynamically computed. Maybe you should
take a look at the original code I submitted. I'm sure
Bottomley has it.

They are not "free scbs", and the comment leaves
much to be desired.

> --- a/drivers/scsi/aic94xx/aic94xx_init.c
> +++ b/drivers/scsi/aic94xx/aic94xx_init.c
> @@ -71,7 +72,7 @@ static struct scsi_host_template aic94xx
> .change_queue_type = sas_change_queue_type,
> .bios_param = sas_bios_param,
> .can_queue = 1,
> - .cmd_per_lun = 1,
> + .cmd_per_lun = 2,

Why 2? Why not 3? If you can set this to 3, then why not 4?
But if you can set it to 4, why not 5?

This value should also be dynamically set, it should depend on
the type of LU and it shouldn't be present in the host template.
(But that's an architectural argument which leads nowhere on lsml.)

Good luck!

2006-08-30 19:25:43

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH] aic94xx: Increase can_queue and cmds_per_lun

On Wed, 2006-08-30 at 12:04 -0700, Luben Tuikov wrote:
> > --- a/drivers/scsi/aic94xx/aic94xx_init.c
> > +++ b/drivers/scsi/aic94xx/aic94xx_init.c
> > @@ -71,7 +72,7 @@ static struct scsi_host_template aic94xx
> > .change_queue_type = sas_change_queue_type,
> > .bios_param = sas_bios_param,
> > .can_queue = 1,
> > - .cmd_per_lun = 1,
> > + .cmd_per_lun = 2,
>
> Why 2? Why not 3? If you can set this to 3, then why not 4?
> But if you can set it to 4, why not 5?
>
> This value should also be dynamically set, it should depend on
> the type of LU and it shouldn't be present in the host template.
> (But that's an architectural argument which leads nowhere on lsml.)

actually, cmd_per_lun is pretty much a vestigial variable. It's used
for the initial queue depth before the driver decides to turn on TCQ.
Thus, since the non-tagged depth should always only be 1, the only
useful value to set it to is 1.

James


2006-08-30 20:56:14

by djwong

[permalink] [raw]
Subject: [PATCH v3] aic94xx: Increase can_queue for better performance

Hi all,

Below is a patch that sets can_queue in the aic94xx driver's scsi_host
to better performing values than what's there currently. It seems that
asd_ha->seq.can_queue reflects the number of requests that can be
queued per controller; so long as there's one scsi_host per controller,
it seems logical that the scsi_host ought to have the same can_queue
value. To the best of my (still limited) knowledge, this method
provides the correct value.

The effect of leaving this value set to 1 is terrible performance in
the case of either (a) certain Maxtor SAS drives flying solo or (b)
flooding several disks with I/O simultaneously (md-raid). There may be
more scenarios where we see similar problems that I haven't uncovered.

--D

Signed-off-by: Darrick J. Wong <[email protected]>

diff --git a/drivers/scsi/aic94xx/aic94xx_init.c b/drivers/scsi/aic94xx/aic94xx_init.c
index 3e25e31..5c6d457 100644
--- a/drivers/scsi/aic94xx/aic94xx_init.c
+++ b/drivers/scsi/aic94xx/aic94xx_init.c
@@ -623,6 +623,8 @@ static int __devinit asd_pci_probe(struc
asd_ha->hw_prof.bios.present ? "build " : "not present",
asd_ha->hw_prof.bios.bld);

+ shost->can_queue = asd_ha->seq.can_queue;
+
if (use_msi)
pci_enable_msi(asd_ha->pcidev);


2006-08-30 21:18:43

by djwong

[permalink] [raw]
Subject: Re: [PATCH v3] aic94xx: Increase can_queue for better performance

Darrick J. Wong wrote:
> Hi all,
>
> Below is a patch that sets can_queue in the aic94xx driver's scsi_host
> to better performing values than what's there currently. It seems that
> asd_ha->seq.can_queue reflects the number of requests that can be
> queued per controller; so long as there's one scsi_host per controller,
> it seems logical that the scsi_host ought to have the same can_queue
> value. To the best of my (still limited) knowledge, this method
> provides the correct value.
>
> The effect of leaving this value set to 1 is terrible performance in
> the case of either (a) certain Maxtor SAS drives flying solo or (b)
> flooding several disks with I/O simultaneously (md-raid). There may be
> more scenarios where we see similar problems that I haven't uncovered.

Let's try this again, after kicking Thunderbird in the head.

Signed-off-by: Darrick J. Wong <[email protected]>

diff --git a/drivers/scsi/aic94xx/aic94xx_init.c b/drivers/scsi/aic94xx/aic94xx_init.c
index 3e25e31..5c6d457 100644
--- a/drivers/scsi/aic94xx/aic94xx_init.c
+++ b/drivers/scsi/aic94xx/aic94xx_init.c
@@ -623,6 +623,8 @@ static int __devinit asd_pci_probe(struc
asd_ha->hw_prof.bios.present ? "build " : "not present",
asd_ha->hw_prof.bios.bld);

+ shost->can_queue = asd_ha->seq.can_queue;
+
if (use_msi)
pci_enable_msi(asd_ha->pcidev);

2006-08-31 05:29:33

by Luben Tuikov

[permalink] [raw]
Subject: Re: [PATCH] aic94xx: Increase can_queue and cmds_per_lun

--- James Bottomley <[email protected]> wrote:
> On Wed, 2006-08-30 at 12:04 -0700, Luben Tuikov wrote:
> > > --- a/drivers/scsi/aic94xx/aic94xx_init.c
> > > +++ b/drivers/scsi/aic94xx/aic94xx_init.c
> > > @@ -71,7 +72,7 @@ static struct scsi_host_template aic94xx
> > > .change_queue_type = sas_change_queue_type,
> > > .bios_param = sas_bios_param,
> > > .can_queue = 1,
> > > - .cmd_per_lun = 1,
> > > + .cmd_per_lun = 2,
> >
> > Why 2? Why not 3? If you can set this to 3, then why not 4?
> > But if you can set it to 4, why not 5?
> >
> > This value should also be dynamically set, it should depend on
> > the type of LU and it shouldn't be present in the host template.
> > (But that's an architectural argument which leads nowhere on lsml.)
>
> actually, cmd_per_lun is pretty much a vestigial variable. It's used
> for the initial queue depth before the driver decides to turn on TCQ.
> Thus, since the non-tagged depth should always only be 1, the only
> useful value to set it to is 1.

Who cares?