Hi folks,
I'm seeing a problem on all kernels from 2.4.13pre1 up. I've lost the
ability to communicate with my storage through some QLogic 2200 Fibre
Channel cards. All the disks are detected and assigned device names. The
problem occurs when you attempt to write to the disks: the system prints
a message that the link is up, but makes no progress after that, and it
becomes unresponsive to the keyboard. Everything up to 2.4.12 works OK
(I'm putting together some comparative numbers), and the current -ac tree
works correctly as well.
I saw this behavior with 2.4.10 and the bounce memory patch by Jens Axboe,
but attributed it to operator error. I'm less sure now.
Any ideas about what is going on would be appreciated. I know this is a
sketchy description, but I'm hoping someone else has seen it too and can
help me get closer to a resolution.
Cary Dickens
Hewlett-Packard
Hardware:
4 processors, 4 GB RAM
45 fibre channel drives, set up in hardware RAID 0/1
2 direct Gigabit Ethernet connections between SPEC SFS prime client and
system under test
reiserfs
all NFS filesystems exported with sync,no_wdelay to ensure O_SYNC writes to
storage
NFS v3 UDP
On Tue, Oct 16 2001, DICKENS,CARY (HP-Loveland,ex2) wrote:
> I'm seeing a problem on all the kernels that are 2.4.13pre1 and up.
This smells like a bug in the pci64 conversion of qlogicfc. Maybe davem
has an idea, I'll take a look too.
BTW, I'd love to see no-bounce numbers for this setup once this bug is
resolved!
--
Jens Axboe
> From: Jens Axboe
> Date: Wed, 17 Oct 2001 08:18:37 +0200
>
> On Tue, Oct 16 2001, DICKENS,CARY (HP-Loveland,ex2) wrote:
> > I'm seeing a problem on all the kernels that are 2.4.13pre1 and up.
>
> This smells like a bug in the pci64 conversion of qlogicfc. Maybe davem
> has an idea, I'll take a look too.

Not if it broke in pre1 since the pci64 stuff went into pre2 :-)
Franks a lot,
David S. Miller
On Tue, Oct 16 2001, David S. Miller wrote:
> From: Jens Axboe
> Date: Wed, 17 Oct 2001 08:18:37 +0200
>
> On Tue, Oct 16 2001, DICKENS,CARY (HP-Loveland,ex2) wrote:
> > I'm seeing a problem on all the kernels that are 2.4.13pre1 and up.
>
> This smells like a bug in the pci64 conversion of qlogicfc. Maybe davem
> has an idea, I'll take a look too.
>
> Not if it broke in pre1 since the pci64 stuff went into pre2 :-)
Ah yes, maybe this is my off-by-one or Cary's :-)
He also writes that it broke with 2.4.10 + block-highmem which has the
same PCI changes, so that's why I jumped to that conclusion. Cary, can
you verify that 2.4.13-pre1 _doesn't_ work and that 2.4.12 does?
--
Jens Axboe
> Cary, can you verify that 2.4.13-pre1 _doesn't_ work and that 2.4.12 does?
It is my off-by-one error: 2.4.13-pre1 works as well as 2.4.12. Sorry
about that.
Cary
"David S. Miller" wrote:
>
> From: Jens Axboe
> Date: Wed, 17 Oct 2001 08:18:37 +0200
>
> On Tue, Oct 16 2001, DICKENS,CARY (HP-Loveland,ex2) wrote:
> > I'm seeing a problem on all the kernels that are 2.4.13pre1 and up.
>
> This smells like a bug in the pci64 conversion of qlogicfc. Maybe davem
> has an idea, I'll take a look too.
>
> Not if it broke in pre1 since the pci64 stuff went into pre2 :-)
since it broke as of pre2, the following things are suspect:
#if BITS_PER_LONG > 32
#define pci_dma_lo32(a) (a & 0xffffffff)
#define pci_dma_hi32(a) ((a >> 32) & 0xffffffff)
#else
#define pci_dma_lo32(a) (a & 0xffffffff)
#define pci_dma_hi32(a) 0
#endif
#if BITS_PER_LONG <= 32
#define VIRT_TO_BUS_LOW(a) (uint32_t)virt_to_bus(((void *)a))
#define VIRT_TO_BUS_HIGH(a) (uint32_t)(0x0)
#else
#define VIRT_TO_BUS_LOW(a) (uint32_t)(0xffffffff & virt_to_bus((void *)(a)))
#define VIRT_TO_BUS_HIGH(a) (uint32_t)(0xffffffff & (virt_to_bus((void *)(a))>
#endif

#if BITS_PER_LONG > 32
uint64_t request_dma; /* Physical address. */
#else
uint32_t request_dma; /* Physical address. */
#endif
The latter is abused instead of dma_addr_t and friends, and the same
pattern is used for several other physical-address variables as well.
> From: Arjan van de Ven
> Date: Wed, 17 Oct 2001 17:22:17 +0100
>
> "David S. Miller" wrote:
> > Not if it broke in pre1 since the pci64 stuff went into pre2 :-)
>
> since it broke as of pre2, the following things are suspect:

Wrong qlogic driver, Arjan :-) There hasn't been a reference to
virt_to_bus or anything like it in the qlogicfc driver since the early
2.3.x days, when the PCI DMA API first went into the tree.
Franks a lot,
David S. Miller
> From: "DICKENS,CARY (HP-Loveland,ex2)"
> Date: Wed, 17 Oct 2001 12:01:54 -0400
>
> It is my off by one error. 2.4.13-pre1 works as well as 2.4.12. Sorry
> about that.

So now please try the broken 2.4.13-preX kernels with
CONFIG_SCSI_QLOGIC_FC_FIRMWARE set. Does that make any difference?

I have a feeling that will make it work.
Franks a lot,
David S. Miller
> So now please try the broken 2.4.13-preX kernels with
> CONFIG_SCSI_QLOGIC_FC_FIRMWARE set, does that
> make any difference?
I've done that and the problem is still there. It no longer prints the
perpetual "link is up" message when I try to mount the Fibre Channel
disks; now it just stops. I booted without mounting any of the Fibre
Channel storage and ran fdisk on the disk in question. The output of
ps -eo cmd,wchan is:
fdisk /dev/sdc lock_page
Hope this helps,
Cary
> From: "DICKENS,CARY (HP-Loveland,ex2)"
> Date: Wed, 17 Oct 2001 15:16:03 -0400
>
> I've done that and the problem is still there. It no longer gives me the
> perpetual link is up message when trying to mount storage on the fibre
> channel disks. Now it just stops. I booted without any of the fibre
> storage being mounted and ran an fdisk on the storage in question. The
> response from the ps -eo cmd,wchan is:
>
> fdisk /dev/sdc lock_page

OK, and to reiterate, this is on an x86 system with HIGHMEM enabled?
Also, just to confirm: you have _not_ applied Jens' block highmem
patches on top of this 2.4.13-preX tree, right? It is just a vanilla
2.4.13-preX tree?
Franks a lot,
David S. Miller
> OK, and to reiterate, this is on an x86 system with HIGHMEM enabled?
> Also, just to confirm: you have _not_ applied Jens' block highmem
> patches on top of this 2.4.13-preX tree, right? It is just a vanilla
> 2.4.13-preX tree?
This _is_ an x86 system with HIGHMEM enabled. The kernel is 2.4.13-pre3
and does _not_ have Jens' block highmem patches.
The hardware:
4 processors (700 MHz Xeon), 4 GB RAM
45 fibre channel drives, set up in hardware RAID 0/1
2 direct Gigabit Ethernet connections between SPEC SFS prime client and
system under test
reiserfs
all NFS filesystems exported with sync,no_wdelay to ensure O_SYNC writes to
storage
NFS v3 UDP
Please try this patch:

--- linux/drivers/scsi/qlogicfc.h.~1~ Tue Nov 28 08:33:08 2000
+++ linux/drivers/scsi/qlogicfc.h Thu Oct 18 05:22:52 2001
@@ -62,13 +62,8 @@
  * determined for each queue request anew.
  */
 
-#if BITS_PER_LONG > 32
 #define DATASEGS_PER_COMMAND 2
 #define DATASEGS_PER_CONT 5
-#else
-#define DATASEGS_PER_COMMAND 3
-#define DATASEGS_PER_CONT 7
-#endif
 
 #define QLOGICFC_REQ_QUEUE_LEN 127 /* must be power of two - 1 */
 #define QLOGICFC_MAX_SG(ql) (DATASEGS_PER_COMMAND + (((ql) > 0) ? DATASEGS_PER_CONT*((ql) - 1) : 0))
David,
This appears to clear up the problems that I was having. I'm running
numbers now and will let you know how it goes.
Thanks,
Cary
> -----Original Message-----
> From: David S. Miller
> Sent: Thursday, October 18, 2001 6:25 AM
> Subject: Re: Problem with 2.4.14prex and qlogicfc