2002-06-07 21:30:01

by James Bourne

Subject: 2.4.18 AIC7XXX hard lockup

Hi,
We are currently running 2.4.18 (Marcelo) on a Dell PE4400 with a
Perc3/DC and two onboard Adaptec controllers (two 7899 channels and one
7880).

On the first 7899 channel we have attached two Quantum DLT 7000 drives and a
BHTi Quad 7 changer; on the second there are another two Quantum DLT
7000 drives and a BHTi Q2 changer, as well as the internal CD-ROM.

When we initialize the JBs, the host system abruptly hangs. Booting
with aic7xxx=verbose on the kernel command line produces additional
output, but still nothing solid to troubleshoot with. The only
information I have been able to get out of it is:

scsi1:0:6:0: Attempting to queue an ABORT message

This happens with the stock 6.2.4 Adaptec driver in 2.4.18 and with the
6.2.5 driver under 2.4.18 (I patched and tested that one today). Attached
are kernel log segments covering boot to hang for the 6.2.4 and 6.2.5
driver revisions.
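
For anyone sifting through logs like the attached ones, a minimal sketch of pulling out the abort messages follows. It assumes the log lines look exactly like the one quoted above, and that the four numbers are host, channel, target and LUN; that field order is an assumption, not something confirmed in this thread. The function name `find_aborts` is made up for illustration.

```python
import re

# Matches lines such as "scsi1:0:6:0: Attempting to queue an ABORT message".
# Assumption: the four numeric fields are host:channel:target:lun.
ABORT_RE = re.compile(
    r"scsi(\d+):(\d+):(\d+):(\d+): Attempting to queue an ABORT message"
)


def find_aborts(log_text):
    """Return one (host, channel, target, lun) tuple per ABORT line found."""
    return [
        tuple(int(group) for group in match.groups())
        for match in ABORT_RE.finditer(log_text)
    ]


sample = "scsi1:0:6:0: Attempting to queue an ABORT message\n"
print(find_aborts(sample))  # [(1, 0, 6, 0)]
```

Grouping the results by target should show whether the aborts always hit the same device on the bus.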

Any information would be extremely helpful.

TIA and regards,
James Bourne


--
James Bourne, Supervisor Data Centre Operations
Mount Royal College, Calgary, AB, CA
http://www.mtroyal.ab.ca

******************************************************************************
This communication is intended for the use of the recipient to which it is
addressed, and may contain confidential, personal, and or privileged
information. Please contact the sender immediately if you are not the
intended recipient of this communication, and do not copy, distribute, or
take action relying on it. Any communication received in error, or
subsequent reply, should be deleted or destroyed.
******************************************************************************




Attachments:
hang-2002-06-06.gz (9.35 kB)
6.2.4 hang
hang-2002-06-07.gz (9.75 kB)
6.2.5 hang

2002-06-07 22:23:15

by Justin T. Gibbs

Subject: Re: 2.4.18 AIC7XXX hard lockup

>Now, what is happening is when we are initializing the JBs, the host
>system will abruptly hang. Turning on aic7xxx=verbose on the kernel
>command line has given additional output, but still nothing solid as far
>as trouble shooting information.

I'd like to see the debugging output anyway.

You might also try 6.2.8 which is in the latest Marcelo tree.

--
Justin

2002-06-07 22:28:13

by Andrew Morton

Subject: Re: 2.4.18 AIC7XXX hard lockup

James Bourne wrote:
>
> Hi,
> We are currently running 2.4.18 (Marcelo) on a Dell PE4400 with
> Perc3/DC and 2 on board adaptec controllers (two 7899 channels and one
> 7880).
>
> On the first 7899 channel we have attached 2 Quantum DLT 7000 drives and a
> BHTi Quad 7 changer, on the second there are another 2 Quantum DLT
> 7000 drives and a BHTi Q2 changer as well as the internal CDROM.
>
> Now, what is happening is when we are initializing the JBs, the host
> system will abruptly hang. Turning on aic7xxx=verbose on the kernel
> command line has given additional output, but still nothing solid as far
> as trouble shooting information. The only information I have been
> able to get out of it has been:
>
> scsi1:0:6:0: Attempting to queue an ABORT message
>
> This is the stock 6.2.4 adaptec driver in 2.4.18 and with the 6.2.5 driver
> under 2.4.18 (I've patched and tested this one today). Attached are
> kernel log segments of the information from boot to hang for 6.2.4 and
> 6.2.5 driver revisions...
>

It does that for me in 2.5 as well, very occasionally.

Try booting with the `nmi_watchdog=1' kernel boot option,
and see if that catches a backtrace.
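
Before relying on the watchdog, it may be worth checking that it is actually running: once it is active, the NMI counters in /proc/interrupts should keep increasing. A rough sketch of that check follows. It assumes the file has a line beginning with `NMI:` followed by one counter per CPU, and the helper names `nmi_counts` and `watchdog_ticking` are made up for illustration.

```python
def nmi_counts(interrupts_text):
    """Return the per-CPU NMI counters from /proc/interrupts-style text."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            counts = []
            # Keep the leading run of integers; some kernels append a text label.
            for field in fields[1:]:
                if not field.isdigit():
                    break
                counts.append(int(field))
            return counts
    return []


def watchdog_ticking(before, after):
    """True if any CPU's NMI counter increased between two snapshots."""
    return any(b < a for b, a in zip(nmi_counts(before), nmi_counts(after)))


# Hypothetical snapshots, taken a few seconds apart on a two-CPU machine.
before = "           CPU0       CPU1\nNMI:        100        100\n"
after = "           CPU0       CPU1\nNMI:        150        151\n"
print(watchdog_ticking(before, after))  # True
```

On the real machine, the two snapshots would come from reading /proc/interrupts twice with a short pause in between.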

-

2002-06-13 03:47:48

by James Bourne

Subject: Re: 2.4.18 AIC7XXX hard lockup

On Fri, 7 Jun 2002, Justin T. Gibbs wrote:

> >Now, what is happening is when we are initializing the JBs, the host
> >system will abruptly hang. Turning on aic7xxx=verbose on the kernel
> >command line has given additional output, but still nothing solid as far
> >as trouble shooting information.
>
> I'd like to see the debugging output anyway.
>
> You might also try 6.2.8 which is in the latest Marcelo tree.

Same thing with 6.2.8, but there is some additional info after a second
abort, which did not hang the machine.

Kernel logs are attached.

Regards
James Bourne

>
> --
> Justin
>

--
James Bourne, Supervisor Data Centre Operations
Mount Royal College, Calgary, AB, CA
http://www.mtroyal.ab.ca



"There are only 10 types of people in this world: those who
understand binary and those who don't."


Attachments:
scsi-dump-after-abort-2002-06-12.gz (34.09 kB)
hang-2002-06-12.gz (13.86 kB)