2002-02-03 16:05:04

by Alexander Sandler

Subject: 2.4.17: Bug?

Hi all.

I found something that looks like a bug.

The configuration is the following:
Dual CPU machine with Linux RedHat 7.1 running kernel 2.4.17
(official), connected to SAN with two FC-HBAs (QLogic 2200).

The bug appears when I start two processes, the first doing I/O
to the first LUN through the first HBA and the second doing I/O
to the second LUN through the second HBA. When I disconnect the
first HBA from the SAN, the machine goes into a four-minute SCSI
error recovery and then the first process exits with an I/O
error, as it should, while the second process gets stuck and
never returns (this is the problem: it should continue doing
I/O as if nothing had happened).

This problem appears on an SMP kernel. On a UP kernel,
everything works fine.
I found this while working on a volume manager driver.
This driver should be able to fail over to another HBA (if
available) in case of an error.

I have all the required hardware and software to work on this
problem, so I'll be glad to give a hand to whoever can
(should?) and/or will start working on it.

Alexandr Sandler.


2002-02-03 16:32:00

by Alexander Sandler

Subject: RE: 2.4.17: Bug?

It's 4.27beta.
QLogic currently has three different drivers on their web site. This one is
the oldest and the most stable. With the other two I wasn't even able to see
those LUNs.

One more thing I didn't mention. According to 'ps', the stuck process is
sleeping in __get_request_wait() from ll_rw_blk.c.

Alexandr Sandler.

> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> Sent: Sunday, February 03, 2002 4:27 PM
> To: [email protected]
> Cc: [email protected]
> Subject: Re: 2.4.17: Bug?
>
>
> In article <[email protected]> you wrote:
> > Hi all.
> > The configuration is the following:
> > Dual CPU machine with Linux RedHat 7.1 running kernel 2.4.17
> > (official), connected to SAN with two FC-HBAs (QLogic 2200).
>
> which driver are you using for that ?
>

2002-02-03 16:28:10

by Arjan van de Ven

Subject: Re: 2.4.17: Bug?

In article <[email protected]> you wrote:
> Hi all.
> The configuration is the following:
> Dual CPU machine with Linux RedHat 7.1 running kernel 2.4.17
> (official), connected to SAN with two FC-HBAs (QLogic 2200).

which driver are you using for that ?

2002-02-04 00:25:16

by Tim Pepper

Subject: Re: 2.4.17: Bug?

Sounds like you're using the qlogic 4.27beta or 4.36beta from the qlogic
website. The 4.46.12beta has a shorter timeout. In any of them you can
control this... Look at qla2x00.h.

t.

--
*********************************************************
* tpepper@vato dot org * Venimus, Vidimus, *
* http://www.vato.org/~tpepper * Dolavimus *
*********************************************************

2002-02-04 18:45:59

by Tim Pepper

Subject: Re: 2.4.17: Bug?

On Mon 04 Feb at 11:12:27 +0200 [email protected] done said:
> No no no no.
>
> This is a bug. For me, it took two hours to get released
> from that. There is no such thing as a two-hour timeout.
> And who said it is only two hours? I spoke with Arjan van de Ven
> and he told me that it may take up to 14 hours.
>
> Anyway, Arjan told me that he fixed this bug in the version that
> will be out with 2.4.18.

We're talking about different things. Looking at the original post I see
I missed that the concern was the hung process not the "long" error retry.

Anybody have a link to what Arjan fixed? I used to have occasional hangs like
this but they seemed to have gone away with qlogic's latest (4.46.12beta)
driver.

t.

--
*********************************************************
* tpepper@vato dot org * Venimus, Vidimus, *
* http://www.vato.org/~tpepper * Dolavimus *
*********************************************************

2002-02-04 20:45:05

by Arjan van de Ven

Subject: Re: 2.4.17: Bug?

On Mon, Feb 04, 2002 at 10:45:25AM -0800, Tim Pepper wrote:
> On Mon 04 Feb at 11:12:27 +0200 [email protected] done said:
> > No no no no.
> >
> > This is a bug. For me, it took two hours to get released
> > from that. There is no such thing as a two-hour timeout.
> > And who said it is only two hours? I spoke with Arjan van de Ven
> > and he told me that it may take up to 14 hours.
> >
> > Anyway, Arjan told me that he fixed this bug in the version that
> > will be out with 2.4.18.

Misunderstanding; I did not say (or intend to say) that it will go into
2.4.18; it's not good enough yet.

2002-02-05 19:03:38

by Alexander Sandler

Subject: RE: 2.4.17: Bug?

Sorry about this. I thought it was good enough.

Anyway, Arjan, do you have any suggestions for me? Given the device
detection problems in QLogic's drivers (those from their web site), it appears
that there is no solution for this problem right now. Am I correct?

> Misunderstanding; I did not say (or intend to say) that it
> will go into 2.4.18; it's not good enough yet.

Sasha.