2009-06-23 11:25:29

by Andy Whitcroft

Subject: [PATCH 0/1] BZ#11120: AACRAID driver stalls under high load

We have had reports of stalls in the AACRAID driver under high load. This
seems to be related to the amount of concurrent IO that can be pushed
to the controller. Reducing the maximum queue count for this driver sorts
this out. For further details see the upstream and Ubuntu bugs:

http://bugzilla.kernel.org/show_bug.cgi?id=11120
http://bugs.launchpad.net/bugs/249964

Following this email is a patch from Mathias Urlichs to reduce the
queue size. This has been tested and confirmed to fix the issue by a
couple of those affected.

Patches against Linus' tree.

-apw

Mathias Urlichs (1):
Reduce AACRAID hardware queue size

drivers/scsi/aacraid/aacraid.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)


2009-06-23 11:25:42

by Andy Whitcroft

Subject: [PATCH 1/1] Reduce AACRAID hardware queue size

From: Mathias Urlichs <[email protected]>

BugLink: http://bugzilla.kernel.org/show_bug.cgi?id=11120
BugLink: http://bugs.launchpad.net/bugs/249964

Reduce the hardware queue size for the AACRAID controller. This controller
otherwise suffers adapter aborts and SCSI resets under high load:

aacraid: Host adapter abort request (0,0,2,0)
aacraid: Host adapter abort request (0,0,3,0)
aacraid: Host adapter reset request. SCSI hang ?
aacraid: Host adapter abort request (0,0,0,0)

Signed-Off-By: Mathias Urlichs <[email protected]>
Signed-off-by: Andy Whitcroft <[email protected]>
---
drivers/scsi/aacraid/aacraid.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/aacraid/aacraid.h b/drivers/scsi/aacraid/aacraid.h
index cdbdec9..0d5d036 100644
--- a/drivers/scsi/aacraid/aacraid.h
+++ b/drivers/scsi/aacraid/aacraid.h
@@ -24,7 +24,7 @@
#define AAC_MAX_LUN (8)

#define AAC_MAX_HOSTPHYSMEMPAGES (0xfffff)
-#define AAC_MAX_32BIT_SGBCOUNT ((unsigned short)256)
+#define AAC_MAX_32BIT_SGBCOUNT ((unsigned short)127)

/*
* These macros convert from physical channels to virtual channels
--
1.6.3.rc3.199.g24398

2009-06-23 12:12:20

by Matthias Urlichs

Subject: Re: [PATCH 1/1] Reduce AACRAID hardware queue size

Hi,

Andy Whitcroft:
> From: Mathias Urlichs <[email protected]>
>
Well ... "Matthias", if you please.

--
Matthias Urlichs | {M:U} IT Design @ m-u-it.de | [email protected]
Disclaimer: The quote was selected randomly. Really. | http://smurf.noris.de
v4sw7$Yhw6+8ln7ma7u7L!wl7DUi2e6t3TMWb8HAGen6g3a4s6Mr1p-3/-6 hackerkey.com
- -
At no time is freedom of speech more precious than when a man hits his
thumb with a hammer.



2009-06-23 14:11:28

by Andy Whitcroft

Subject: Re: [PATCH 1/1] Reduce AACRAID hardware queue size

On Tue, Jun 23, 2009 at 01:50:20PM +0200, Matthias Urlichs wrote:
> Hi,
>
> Andy Whitcroft:
> > From: Mathias Urlichs <[email protected]>
> >
> Well ... "Matthias", if you please.

Well that is most peculiar, your name is sufficiently alien to my naive
tongue that I am sure I cut-n-pasted it wholesale to there from somewhere.
It looks like I got it from the s-o-b line in the patch:

Signed-Off-By: Mathias Urlichs <[email protected]>

Yeah, it seems you got your name wrong in the original patch on the kernel
bugzilla, and I have propagated it from there:

http://bugzilla.kernel.org/show_bug.cgi?id=11120#c4

So I guess all of those references need changing. If these patches need
regenerating please let me know.

-apw

2009-06-23 15:33:17

by James Bottomley

Subject: Re: [PATCH 1/1] Reduce AACRAID hardware queue size

On Tue, 2009-06-23 at 12:25 +0100, Andy Whitcroft wrote:
> From: Mathias Urlichs <[email protected]>
>
> BugLink: http://bugzilla.kernel.org/show_bug.cgi?id=11120
> BugLink: http://bugs.launchpad.net/bugs/249964
>
> Reduce the hardware queue size for the AACRAID controller. This controller
> otherwise suffers adapter aborts and SCSI resets under high load:
>
> aacraid: Host adapter abort request (0,0,2,0)
> aacraid: Host adapter abort request (0,0,3,0)
> aacraid: Host adapter reset request. SCSI hang ?
> aacraid: Host adapter abort request (0,0,0,0)
>
> Signed-Off-By: Mathias Urlichs <[email protected]>
> Signed-off-by: Andy Whitcroft <[email protected]>
> ---
> drivers/scsi/aacraid/aacraid.h | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/scsi/aacraid/aacraid.h b/drivers/scsi/aacraid/aacraid.h
> index cdbdec9..0d5d036 100644
> --- a/drivers/scsi/aacraid/aacraid.h
> +++ b/drivers/scsi/aacraid/aacraid.h
> @@ -24,7 +24,7 @@
> #define AAC_MAX_LUN (8)
>
> #define AAC_MAX_HOSTPHYSMEMPAGES (0xfffff)
> -#define AAC_MAX_32BIT_SGBCOUNT ((unsigned short)256)
> +#define AAC_MAX_32BIT_SGBCOUNT ((unsigned short)127)

So I'm afraid this isn't a proper fix. It was a diagnostic test to see
if SGBCOUNT was the root cause for this card.

Incidentally, SGBCOUNT isn't queue depth, it's the maximum number of sectors
in an individual transfer. What we'd need to show for this to be the
fix is that every 32 bit aacraid card is affected, which, given the
paucity of bug reports, I don't think is the case.

Firstly, Matthias, can you see if on an unmodified aacraid, this fixes
the problem for you:

echo 63 > /sys/block/<disk>/queue/max_sectors_kb

The value is 63 because the sysfs parameter is in kB, while the driver
counts 512 byte blocks: 127 blocks is 63.5 kB, which rounds down to 63.
If it does, we can likely just add it to the udev unusual devices and not
bother with a kernel fix.

To fix the kernel properly, we'd need to add an AAC_QUIRK for this
adapter, which is a bit more work, so let's see if udev can fix it for us
first ...

James
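
As a rough sketch of the unit conversion described above (not part of the
original message): the driver limit is counted in 512 byte sectors, while
max_sectors_kb takes whole kB, so the value is rounded down. The helper
names sectors_to_kb and limit_disk are hypothetical; only the sysfs path
is taken from the message. limit_disk is defined but not called here,
since it needs root and a real disk name.

```shell
#!/bin/sh
# Convert a limit given in 512-byte sectors (the driver's unit) into
# whole kB (the unit of the sysfs max_sectors_kb file), rounding down.
sectors_to_kb() {
    echo $(( $1 * 512 / 1024 ))
}

# Hypothetical helper: apply the limit to a single disk.
#   $1 = disk name as it appears under /sys/block (for example sda)
#   $2 = limit in 512-byte sectors
# Must be run as root; the setting is lost on reboot.
limit_disk() {
    echo "$(sectors_to_kb "$2")" > "/sys/block/$1/queue/max_sectors_kb"
}

# 127 sectors * 512 bytes = 65024 bytes = 63.5 kB, rounded down to 63.
sectors_to_kb 127
```

With these definitions, `limit_disk <disk> 127` would write the same 63
that the echo command above writes by hand.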

2009-06-23 15:42:16

by Alan

Subject: Re: [PATCH 1/1] Reduce AACRAID hardware queue size

> Incidentally, SGBCOUNT isn't queue depth, it's the maximum number of sectors
> in an individual transfer. What we'd need to show for this to be the
> fix is that every 32 bit aacraid card is affected, which, given the
> paucity of bug reports, I don't think is the case.

It's usually a specific firmware revision that weird problems with aacraid
hardware are linked to, so it is worth trying different firmwares.

Alan

2009-06-23 16:05:19

by Matthias Urlichs

Subject: Re: [PATCH 1/1] Reduce AACRAID hardware queue size

Hi,

James Bottomley:
> Firstly, Matthias, can you see if on an unmodified aacraid, this fixes
> the problem for you:
>
> echo 63 > /sys/block/<disk>/queue/max_sectors_kb
>
Thank you, I'll do that tonight.

--
Matthias Urlichs | {M:U} IT Design @ m-u-it.de | [email protected]
Disclaimer: The quote was selected randomly. Really. | http://smurf.noris.de
- -
Why did the chicken cross the road?

Ralph Waldo Emerson: It didn't cross the road; it transcended it.

2009-07-01 15:59:03

by Andy Whitcroft

Subject: Re: [PATCH 1/1] Reduce AACRAID hardware queue size

On Tue, Jun 23, 2009 at 06:04:08PM +0200, Matthias Urlichs wrote:
> Hi,
>
> James Bottomley:
> > Firstly, Matthias, can you see if on an unmodified aacraid, this fixes
> > the problem for you:
> >
> > echo 63 > /sys/block/<disk>/queue/max_sectors_kb
> >
> Thank you, I'll do that tonight.

How did this one work out for you? I don't think I saw a reply either
way?

-apw

2009-07-01 16:29:55

by Matthias Urlichs

Subject: Re: [PATCH 1/1] Reduce AACRAID hardware queue size

Hi,

Andy Whitcroft:
> > > echo 63 > /sys/block/<disk>/queue/max_sectors_kb
> > >
> > Thank you, I'll do that tonight.
>
> How did this one work out for you? I don't think I saw a reply either
> way?
>
Bah. Thanks for the reminder; I totally forgot to send that email.
(Too much other stuff in my life right now. (Mostly good, fortunately.))

Short version: It didn't. The controller now seems dead.
It times out all the time, and is unable to find any disks.

Unfortunately the thing is built-in and can't easily be replaced;
I don't have any warranty info for this machine either, so ...

--
Matthias Urlichs | {M:U} IT Design @ m-u-it.de | [email protected]
Disclaimer: The quote was selected randomly. Really. | http://smurf.noris.de
v4sw7$Yhw6+8ln7ma7u7L!wl7DUi2e6t3TMWb8HAGen6g3a4s6Mr1p-3/-6 hackerkey.com
- -
Sex is a misdemeanor -- the more I miss, de meaner I get.