2001-11-01 17:11:54

by Holger Lubitz

Subject: aic7xxx 6.2.1 unstable?

hi,

i came to work today to find out that our news & web proxy server, which
i updated from 2.4.8 to 2.4.13 a week ago, hung after a week of uptime.
the machine was locked solid, last kernel messages on the console seemed
to indicate a scsi problem.

the machine in question is a 440gx based p3 800 mhz with onboard adaptec
7896 controller (dual channel, one u2w, one uw).

the drives are ibm ddys (1x 9gb, 1x 18gb) connected to the u2w port.

the problem is similar to another problem which turned up when i first
tried the "new" adaptec driver with 2.4.2. i later updated to aic7xxx
6.1.5, but the problem remained, so i kept using the old driver.

when i switched to 2.4.5 with 6.1.13 included, i felt adventurous again.
and to my surprise, everything was fine. the machine has not shown a
single scsi error since then, using the new driver. i updated to 2.4.8
some time later, the machine continued to run fine.

so, after nearly half a year without problems, i was reluctant to update
the aic driver again, but was trying nevertheless because of the
security fixes in recent kernels, which i considered "nice to have".
however, version 6.2.1 of the adaptec driver seems to have broken the
setup once again. i'm back to 2.4.8 for now.

this is not meant as a real bug report, more like a word of warning,
because there's not enough evidence that aic7xxx is the only possible
suspect. however, given that the problems have not been there for half a
year and the messages indicated a scsi problem, i consider it the
primary one. since the machine is a production machine, i am unable to
run extensive tests, but if i can do anything else to assist, i'd be
glad to.

holger


2001-11-01 19:41:06

by Justin T. Gibbs

Subject: Re: aic7xxx 6.2.1 unstable?

>hi,
>
>i came to work today to find out that our news & web proxy server, which
>i updated from 2.4.8 to 2.4.13 a week ago, hung after a week of uptime.
>the machine was locked solid, last kernel messages on the console seemed
>to indicate a scsi problem.

Care to share the console messages?

--
Justin

2001-11-02 13:18:11

by Holger Lubitz

Subject: Re: aic7xxx 6.2.1 unstable?

"Justin T. Gibbs" proclaimed:

> Care to share the console messages?

Sorry, but the machine was locked solid and I did not take the time to
write them down manually. I was hoping for something in the system
logfile, but all I found there were several tons of these:

Nov 1 04:52:31 darkstar kernel: Device 08:11 not ready.
Nov 1 04:52:31 darkstar kernel: I/O error: dev 08:11, sector 4735240

(not really helpful since this comes from a higher layer)
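
The device number in those lines can still be decoded. The sketch below assumes that the 2.4 kernel prints the device as a hexadecimal major:minor pair and that the conventional SCSI disk numbering applies (major 8, 16 minor numbers per disk). Under those assumptions, 08:11 would be the first partition on the second SCSI disk. The function name is made up for illustration.

```python
def decode_sd_device(dev: str) -> str:
    """Map a hex 'MAJOR:MINOR' pair from a kernel message to an sdXN name.

    Assumes the conventional SCSI disk layout: major 8, 16 minors per disk,
    where minor % 16 == 0 is the whole disk and 1..15 are partitions.
    """
    major_hex, minor_hex = dev.split(":")
    major, minor = int(major_hex, 16), int(minor_hex, 16)
    if major != 8:
        raise ValueError(f"major {major} is not the first SCSI disk major (8)")
    disk, part = divmod(minor, 16)
    name = "sd" + chr(ord("a") + disk)  # disk 0 -> sda, disk 1 -> sdb, ...
    return name if part == 0 else f"{name}{part}"


if __name__ == "__main__":
    print(decode_sd_device("08:11"))
```

If the assumptions hold, the errors point at a partition on the second disk rather than the first.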

As far as I remember, the console messages were some SCSI timeouts and
"trying to abort command", which the host adapter believed successful,
but the sequence repeated. I think that the second drive (hosting
/var/spool) got confused first. The system continued running until the
first drive (hosting the rest of the partitions) got confused, too
(approx. five hours later), and then the system hung. Sorry again that I
cannot provide more detail, which was why I did not want to formally
report a bug, just pass a word of warning, partly in the hope that
someone else was experiencing similar things and could fill in some
detail.

Holger

2001-11-03 11:15:17

by Matthias Andree

Subject: Re: aic7xxx 6.2.1 unstable?

On Thu, 01 Nov 2001, Holger Lubitz wrote:

> so, after nearly half a year without problems, i was reluctant to update
> the aic driver again, but was trying nevertheless because of the
> security fixes in recent kernels, which i considered "nice to have".
> however, version 6.2.1 of the adaptec driver seems to have broken the
> setup once again. i'm back to 2.4.8 for now.

I had some persistent difficulties (strange messages on bootup, but
otherwise fine operation) with 6.2.1 which went away when I upgraded to
6.2.4. This is not exactly helpful because you just don't play games on
a production machine, but might be a hint nonetheless. I'm currently
running 2.4.12-ac6 with some netfilter updates on my machines because
the SuSE 2.4.10 kernel that ships with 7.3 froze solid on at least two
machines for reasons I cannot tell.

--
Matthias Andree

"They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety." Benjamin Franklin