2005-01-14 20:12:01

by Aaron Gowatch

Subject: aacraid fails when RAID1 array is in anything but Optimal state

We're using Dell PowerEdge 750s with a Dell-rebranded Adaptec CERC 1.5/6ch
SATA adapter. The systems have 2 disks configured as RAID1. If the array
is in any state other than 'Optimal' (i.e. 'Degraded' or 'Rebuilding'), the
following error is displayed and the box subsequently panics because it's
unable to mount the root filesystem.

We've had the same experience with 2.6.7 and 2.6.9. Yesterday, as a test,
I installed Red Hat ES 3.0 and it does not exhibit this behavior, but that's
kernel 2.4 with a different aacraid driver. We've also updated to the
latest recommended firmware in an attempt to correct this.

Is there any way to make this work with kernel 2.6? Or is this expected
behavior for this controller under 2.6?

Red Hat/Adaptec aacraid driver (1.1.2-lk2 Jan 14 2005)
AAC0: kernel 4.1.4 build 7403
AAC0: monitor 4.1.4 build 7403
AAC0: bios 4.1.0 build 7403
AAC0: serial bf91c8fafaf001
scsi0 : aacraid
Vendor: DELL Model: CERC Mirror Rev: V1.0
Type: Direct-Access ANSI SCSI revision: 02
SCSI device sda: 78057216 512-byte hdwr sectors (39965 MB)
sda: Write Protect is off
SCSI device sda: drive cache: write through
sda:<3>aacraid: Host adapter reset request. SCSI hang ?
aacraid: Host adapter appears dead
scsi: Device offlined - not ready after error recovery: host 0 channel 0 id 0 lun 0
SCSI error : <0 0 0 0> return code = 0x6000000
end_request: I/O error, dev sda, sector 0
Buffer I/O error on device sda, logical block 0
scsi0 (0:0): rejecting I/O to offline device
Buffer I/O error on device sda, logical block 0
unable to read partition table
Attached scsi removable disk sda at scsi0, channel 0, id 0, lun 0
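
For anyone triaging the same failure across several machines, the two aacraid lines in the log above are the ones to look for. What follows is a minimal sketch in C, assuming the boot log is already in memory as a string. The function name `aac_count_failure_markers` is made up for illustration and is not part of the driver or of any existing tool.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Failure signatures copied from the boot log in this report. */
static const char *const aac_failure_markers[] = {
    "aacraid: Host adapter reset request",
    "aacraid: Host adapter appears dead",
};

/*
 * Hypothetical helper: count how many of the distinct failure
 * signatures above occur anywhere in `log`. Each signature is
 * counted at most once, so the result is 0, 1, or 2.
 */
int aac_count_failure_markers(const char *log)
{
    size_t n = sizeof(aac_failure_markers) / sizeof(aac_failure_markers[0]);
    size_t i;
    int hits = 0;

    assert(log != NULL);

    for (i = 0; i < n; i++) {
        if (strstr(log, aac_failure_markers[i]) != NULL)
            hits++;
    }
    return hits;
}
```

Fed the log above, the helper should report both markers; a boot that only prints the `scsi0 : aacraid` banner should report none.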

Thanks,
Aa.


2005-01-14 21:15:56

by Paul A. Sumner

Subject: Re: aacraid fails when RAID1 array is in anything but Optimal state

You might try the new aacraid module, 1.1.5[2371], from Adaptec's site. I
had some newer firmware that, combined with the stock 1.1.2-lk2 driver,
produced the exact 'SCSI hang ?' message. With the new module there are no
such problems. 2.6.9 is stable for me, except that I don't seem to be
getting the write performance I should yet (see my last post).

Hope this helps.


2005-01-14 23:44:23

by Alan

Subject: Re: aacraid fails when RAID1 array is in anything but Optimal state

On Fri, 2005-01-14 at 20:10, Aaron Gowatch wrote:
> We're using Dell PowerEdge 750s with a Dell-rebranded Adaptec CERC 1.5/6ch
> SATA adapter. The systems have 2 disks configured as RAID1. If the array
> is in any state other than 'Optimal' (i.e. 'Degraded' or 'Rebuilding'), the
> following error is displayed and the box subsequently panics because it's
> unable to mount the root filesystem.

This is a known bug in the 2.6 aacraid driver. It is fixed in current
Fedora Core 3 kernels, in 2.6.10-ac, and in 2.6.11-rc1. I've attached the
needed patch below.

diff -u --new-file --recursive --exclude-from /usr/src/exclude linux.vanilla-2.6.10/drivers/scsi/aacraid/commsup.c linux-2.6.10/drivers/scsi/aacraid/commsup.c
--- linux.vanilla-2.6.10/drivers/scsi/aacraid/commsup.c 2004-12-25 21:14:35.000000000 +0000
+++ linux-2.6.10/drivers/scsi/aacraid/commsup.c 2005-01-13 17:29:50.077160240 +0000
@@ -768,28 +768,6 @@
 		memset(cp, 0, 256);
 }
 
-
-/**
- *	aac_handle_aif		-	Handle a message from the firmware
- *	@dev: Which adapter this fib is from
- *	@fibptr: Pointer to fibptr from adapter
- *
- *	This routine handles a driver notify fib from the adapter and
- *	dispatches it to the appropriate routine for handling.
- */
-
-static void aac_handle_aif(struct aac_dev * dev, struct fib * fibptr)
-{
-	struct hw_fib * hw_fib = fibptr->hw_fib;
-	/*
-	 * Set the status of this FIB to be Invalid parameter.
-	 *
-	 *	*(u32 *)fib->data = ST_INVAL;
-	 */
-	*(u32 *)hw_fib->data = cpu_to_le32(ST_OK);
-	fib_adapter_complete(fibptr, sizeof(u32));
-}
-
 /**
  *	aac_command_thread	-	command processing thread
  *	@dev: Adapter to monitor
@@ -859,7 +837,6 @@
 			aifcmd = (struct aac_aifcmd *) hw_fib->data;
 			if (aifcmd->command == cpu_to_le32(AifCmdDriverNotify)) {
 				/* Handle Driver Notify Events */
-				aac_handle_aif(dev, fib);
 				*(u32 *)hw_fib->data = cpu_to_le32(ST_OK);
 				fib_adapter_complete(fib, sizeof(u32));
 			} else {
@@ -870,10 +847,6 @@
 				u32 time_now, time_last;
 				unsigned long flagv;
 
-				/* Sniff events */
-				if (aifcmd->command == cpu_to_le32(AifCmdEventNotify))
-					aac_handle_aif(dev, fib);
-
 				time_now = jiffies/HZ;
 
 				spin_lock_irqsave(&dev->fib_lock, flagv);
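
One way to read the patch: the removed aac_handle_aif() sets ST_OK and calls fib_adapter_complete(), and the driver-notify call site then does exactly the same again, so each such FIB was completed twice. The event-notify path likewise completed the FIB up front, before the rest of its handling. Whether that double completion is what makes the adapter appear dead is not stated in this thread. The sketch below is a user-space toy model of that reading only. Every name in it (`toy_fib`, `toy_adapter_complete`, `TOY_ST_OK`, and so on) is hypothetical and none of it is driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ST_OK 0u /* placeholder status value for this sketch only */

/* Toy stand-in for a FIB: a status word plus a completion counter. */
struct toy_fib {
    uint32_t status;
    int completions; /* how many times the FIB was handed back */
};

/* Stand-in for fib_adapter_complete(): record one hand-back. */
static void toy_adapter_complete(struct toy_fib *fib)
{
    fib->completions++;
}

/* Mirrors the removed helper: set the OK status, then complete. */
static void toy_handle_aif(struct toy_fib *fib)
{
    fib->status = TOY_ST_OK;
    toy_adapter_complete(fib);
}

/* Driver-notify path as it looked before the patch. */
int toy_notify_before_patch(struct toy_fib *fib)
{
    assert(fib != NULL);
    toy_handle_aif(fib);       /* first completion (inside the helper) */
    fib->status = TOY_ST_OK;
    toy_adapter_complete(fib); /* second completion of the same FIB */
    return fib->completions;
}

/* Driver-notify path with the helper call removed. */
int toy_notify_after_patch(struct toy_fib *fib)
{
    assert(fib != NULL);
    fib->status = TOY_ST_OK;
    toy_adapter_complete(fib); /* the only completion */
    return fib->completions;
}
```

In this model the pre-patch path hands the same FIB back twice and the patched path hands it back once, which is the only behavioral difference the second hunk introduces.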