2005-10-23 20:45:41

by Robert Jones

[permalink] [raw]
Subject: 2.6.13.4 aacraid: Partitioning array makes adapter go "dead"

I apologize if I happen to break list etiquette, as this is the first
time I have posted here before. I'll try to make this as short as possible.

Here's some information regarding the machine I'm having the problem with:
2-Way Dual-Core AMD Opteron 275's
8GB Reg ECC
Tyan 2882-D (K8SD-Pro) motherboard (AMD-8000 series chipset)
2 Seagate ST336754LC's, updated to latest firmware (0004)
Adaptec 2230S (updated to latest firmware, 8205)

The machine boots wonderfully with any kernel I have tried (2.6.9,
2.6.12.9, 2.6.13.4, 2.6.14-rc5; all kernels are vanilla kernels) and
binds the array to /dev/sda, so long as the array is unpartitioned.
Here's what the kernel reports when booting:

-----------

SCSI device sda: 71619584 512-byte hdwr sectors (36669 MB)
sda: Write Protect is off
sda: Mode Sense: 03 00 00 00
sda: got wrong page
sda: assuming drive cache: write through
SCSI device sda: 71619584 512-byte hdwr sectors (36669 MB)
sda: Write Protect is off
sda: Mode Sense: 03 00 00 00
sda: got wrong page
sda: assuming drive cache: write through
sda: unknown partition table
Attached scsi removable disk sda at scsi0, channel 0, id 0, lun 0
Attached scsi generic sg0 at scsi0, channel 0, id 0, lun 0, type 0

-----------

However, once I partition the array things begin to go wrong. While the
partitioning succeeds, any point after this aacraid will state the
controller is "dead". Here's what the kernel is telling me at this point:

-----------

SCSI device sda: 71619584 512-byte hdwr sectors (36669 MB)
sda: Write Protect is off
sda: Mode Sense: 03 00 00 00
sda: got wrong page
sda: assuming drive cache: write through
sda: sda1 sda2 sda3
SCSI device sda: 71619584 512-byte hdwr sectors (36669 MB)
sda: Write Protect is off
sda: Mode Sense: 03 00 00 00
sda: got wrong page
sda: assuming drive cache: write through
sda: sda1 sda2 sda3

-----------

This looks fine too, however, when I reboot or try to access the array
directly such as a mke2fs /dev/sda1, the kernel says this:

-----------

aacraid: Host adapter reset request. SCSI hang ?
aacraid: Host adapter appears dead
scsi: Device offlined - not ready after error recovery: host 0 channel 0
id 0 lun 0
sd 0:0:0:0: SCSI error: return code = 0x6000000
end_request: I/O error, dev sda, sector 63
Buffer I/O error on device sda1, logical block 0
scsi0 (0:0): rejecting I/O to offline device
Buffer I/O error on device sda1, logical block 0
scsi0 (0:0): rejecting I/O to offline device
Buffer I/O error on device sda1, logical block 64
scsi0 (0:0): rejecting I/O to offline device
Buffer I/O error on device sda1, logical block 64

-----------

The same machine will install Win2K w/ Adaptec's driver just fine, so I
think the hardware is okay. I went ahead and pulled Adaptec's "official"
aacraid implementation and replaced the "vanilla" implementation in
2.6.9 with it. Unfortunately, this did not seem to help.

Also, if I remove the partitions aacraid is happy again and will bind
the array into /dev/sda as before. It's only after partitioning do
things go bad.

I have three of these machines and they don't have to go live until
January 1, 2006, so I have some time-and am willing-to knock around some
trial and error work with aacraid if needed.

Thanks,
Robert


2006-01-07 14:19:35

by Adrian Bunk

[permalink] [raw]
Subject: Re: 2.6.13.4 aacraid: Partitioning array makes adapter go "dead"

On Sun, Oct 23, 2005 at 03:45:31PM -0500, Robert Jones wrote:

>...
> Here's some information regarding the machine I'm having the problem with:
> 2-Way Dual-Core AMD Opteron 275's
> 8GB Reg ECC
> Tyan 2882-D (K8SD-Pro) motherboard (AMD-8000 series chipset)
> 2 Seagate ST336754LC's, updated to latest firmware (0004)
> Adaptec 2230S (updated to latest firmware, 8205)
>
> The machine boots wonderfully with any kernel I have tried (2.6.9,
> 2.6.12.9, 2.6.13.4, 2.6.14-rc5; all kernels are vanilla kernels) and
> binds the array to /dev/sda, so long as the array is unpartitioned.
> Here's what the kernel reports when booting:

Is this problem still present in kernel 2.6.15?

> SCSI device sda: 71619584 512-byte hdwr sectors (36669 MB)
> sda: Write Protect is off
> sda: Mode Sense: 03 00 00 00
> sda: got wrong page
> sda: assuming drive cache: write through
> SCSI device sda: 71619584 512-byte hdwr sectors (36669 MB)
> sda: Write Protect is off
> sda: Mode Sense: 03 00 00 00
> sda: got wrong page
> sda: assuming drive cache: write through
> sda: unknown partition table
> Attached scsi removable disk sda at scsi0, channel 0, id 0, lun 0
> Attached scsi generic sg0 at scsi0, channel 0, id 0, lun 0, type 0
>
> -----------
>
> However, once I partition the array things begin to go wrong. While the
> partitioning succeeds, any point after this aacraid will state the
> controller is "dead". Here's what the kernel is telling me at this point:
>
> -----------
>
> SCSI device sda: 71619584 512-byte hdwr sectors (36669 MB)
> sda: Write Protect is off
> sda: Mode Sense: 03 00 00 00
> sda: got wrong page
> sda: assuming drive cache: write through
> sda: sda1 sda2 sda3
> SCSI device sda: 71619584 512-byte hdwr sectors (36669 MB)
> sda: Write Protect is off
> sda: Mode Sense: 03 00 00 00
> sda: got wrong page
> sda: assuming drive cache: write through
> sda: sda1 sda2 sda3
>
> -----------
>
> This looks fine too, however, when I reboot or try to access the array
> directly such as a mke2fs /dev/sda1, the kernel says this:
>
> -----------
>
> aacraid: Host adapter reset request. SCSI hang ?
> aacraid: Host adapter appears dead
> scsi: Device offlined - not ready after error recovery: host 0 channel 0
> id 0 lun 0
> sd 0:0:0:0: SCSI error: return code = 0x6000000
> end_request: I/O error, dev sda, sector 63
> Buffer I/O error on device sda1, logical block 0
> scsi0 (0:0): rejecting I/O to offline device
> Buffer I/O error on device sda1, logical block 0
> scsi0 (0:0): rejecting I/O to offline device
> Buffer I/O error on device sda1, logical block 64
> scsi0 (0:0): rejecting I/O to offline device
> Buffer I/O error on device sda1, logical block 64
>
> -----------
>
> The same machine will install Win2K w/ Adaptec's driver just fine, so I
> think the hardware is okay. I went ahead and pulled Adaptec's "official"
> aacraid implementation and replaced the "vanilla" implementation in
> 2.6.9 with it. Unfortunately, this did not seem to help.
>
> Also, if I remove the partitions aacraid is happy again and will bind
> the array into /dev/sda as before. It's only after partitioning do
> things go bad.
>...
> Thanks,
> Robert

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2006-01-07 15:47:00

by Martin Drab

[permalink] [raw]
Subject: Re: 2.6.13.4 aacraid: Partitioning array makes adapter go "dead"

On Sat, 7 Jan 2006, Adrian Bunk wrote:

> On Sun, Oct 23, 2005 at 03:45:31PM -0500, Robert Jones wrote:
>
> >...
> > Here's some information regarding the machine I'm having the problem with:
> > 2-Way Dual-Core AMD Opteron 275's
> > 8GB Reg ECC
> > Tyan 2882-D (K8SD-Pro) motherboard (AMD-8000 series chipset)
> > 2 Seagate ST336754LC's, updated to latest firmware (0004)
> > Adaptec 2230S (updated to latest firmware, 8205)
> >
> > The machine boots wonderfully with any kernel I have tried (2.6.9,
> > 2.6.12.9, 2.6.13.4, 2.6.14-rc5; all kernels are vanilla kernels) and
> > binds the array to /dev/sda, so long as the array is unpartitioned.
> > Here's what the kernel reports when booting:
>
> Is this problem still present in kernel 2.6.15?

Correct me if I'm wrong, but I think this was a problem of setting
AAC_MAX_32BIT_SGBCOUNT in aacraid.h low enough, so that the transfer
chunks (for big transfers) are small enough for various disks to handle
it. For me 512 was low enough to work. I see now it's at 256. But that was
just a temporary workaround AFAIK, it should have been solved by the "new
comm" technology making it into the mainline. Mark Haverkamp missed the
14 day window for new patches for 2.6.14, however it is in the 2.6.15
allready, so I guess this may be fixed by

commit 8e0c5ebde82b08f6d996e11983890fc4cc085fab
Author: Mark Haverkamp <[email protected]>
Date: Mon Oct 24 10:52:22 2005 -0700

[SCSI] aacraid: Newer adapter communication iterface support

Received from Mark Salyzyn.

This patch adds the 'new comm' interface, which modern AAC based
adapters that are less than a year old support in the name of much
improved performance. These modern adapters support both the legacy and
the 'new comm' interfaces.

Signed-off-by: Mark Haverkamp <[email protected]>
Signed-off-by: James Bottomley <[email protected]>

But I may be mistaken, though. (The symptoms are but very simmilar to
those I witnessed in the past.)

...
> > -----------
> >
> > aacraid: Host adapter reset request. SCSI hang ?
> > aacraid: Host adapter appears dead
> > scsi: Device offlined - not ready after error recovery: host 0 channel 0
> > id 0 lun 0
> > sd 0:0:0:0: SCSI error: return code = 0x6000000
> > end_request: I/O error, dev sda, sector 63
> > Buffer I/O error on device sda1, logical block 0
> > scsi0 (0:0): rejecting I/O to offline device
> > Buffer I/O error on device sda1, logical block 0
> > scsi0 (0:0): rejecting I/O to offline device
> > Buffer I/O error on device sda1, logical block 64
> > scsi0 (0:0): rejecting I/O to offline device
> > Buffer I/O error on device sda1, logical block 64
> >
> > -----------
> >...
> > Thanks,
> > Robert
>
> cu
> Adrian

Martin

2006-01-07 16:32:19

by Martin Drab

[permalink] [raw]
Subject: Re: 2.6.13.4 aacraid: Partitioning array makes adapter go "dead"

> > On Sun, Oct 23, 2005 at 03:45:31PM -0500, Robert Jones wrote:
^^^^^^^^^^^
...
> commit 8e0c5ebde82b08f6d996e11983890fc4cc085fab
> Author: Mark Haverkamp <[email protected]>
> Date: Mon Oct 24 10:52:22 2005 -0700
^^^^^^^^^^
Hehe, this is just a mere conincidence, but a funny one at that. ;))

Martin