2006-08-22 14:46:38

by David Johnson

Subject: sym53c8xx PCI card broken in 2.6.18

Hi all,

I'm running a Sun Ultra Enterprise 450 (SPARC64) machine which has an on-board
SCSI controller and a PCI SCSI controller, both supported by the sym53c8xx
driver.

With 2.6.17.9 (and earlier) SCSI works perfectly, but with 2.6.18-rc4 and
2.6.18-rc4-git1 I'm getting errors on boot for all devices attached to the
PCI card, but all the devices attached to the on-board controller are
detected and configured OK.

lspci identifies the on-board controller as:
SCSI storage controller: LSI Logic / Symbios Logic 53c875 (rev 03)
and the PCI controller as:
SCSI storage controller: LSI Logic / Symbios Logic 53c875 (rev 14)

Here's the output from initialisation of the devices on the PCI card (repeated
for every device):
scsi2: sym-2.2.3
scsi 2:0:0:0 ABORT operation started
scsi 2:0:0:0 ABORT operation timed out
scsi 2:0:0:0 DEVICE RESET operation started
scsi 2:0:0:0 DEVICE RESET operation timed out
scsi 2:0:0:0 BUS RESET operation started
scsi 2:0:0:0 BUS RESET operation timed out
scsi 2:0:0:0 HOST RESET operation started
sym2: SCSI bus has been reset
scsi 2:0:0:0 HOST RESET operation timed out
scsi: device offlined - not ready after error recovery

The devices on the PCI controller are a mixture of 'Fujitsu MAG3182L SUN18G'
and 'Seagate ST318203LSUN18G' drives.

Looking through the changelogs between 2.6.17.9 and 2.6.18-rc4-git1, I can't
see any changes to sym53c8xx, so I'm guessing this has been caused by some
generic SCSI subsystem change. Let me know if I can do any more to debug.

Regards,
David.

--
David Johnson
http://www.david-web.co.uk - My Personal Website
http://www.penguincomputing.co.uk - Need a Web Developer?


2006-08-22 20:38:44

by David Miller

Subject: Re: sym53c8xx PCI card broken in 2.6.18


Sounds like the interrupts are being misconfigured for the
PCI card. Please post 2 pieces of information:

1) Boot logs with "ofdebug=2" given on the kernel command line
2) Output of "/usr/sbin/prtconf -pv"

Thanks.

2006-08-22 22:40:11

by David Johnson

Subject: Re: sym53c8xx PCI card broken in 2.6.18

On Tuesday 22 August 2006 21:39, you wrote:
> Sounds like the interrupts are being misconfigured for the
> PCI card. Please post 2 pieces of information:
>
> 1) Boot logs with "ofdebug=2" given on the kernel command line
> 2) Output of "/usr/sbin/prtconf -pv"
>

Both attached.

Now that I've let the system finish booting, there are also a few oopses that
seem related to the new openprom interface.

Regards,
David.

--
David Johnson
http://www.david-web.co.uk - My Personal Website
http://www.penguincomputing.co.uk - Need a Web Developer?


Attachments:
(No filename) (532.00 B)
dmesg (29.75 kB)
prtconf (41.02 kB)

2006-08-23 03:59:17

by Eric Brower

Subject: Re: sym53c8xx PCI card broken in 2.6.18

On 8/22/06, David Johnson wrote:
> On Tuesday 22 August 2006 21:39, you wrote:
> > Sounds like the interrupts are being misconfigured for the
> > PCI card. Please post 2 pieces of information:
> >
> > 1) Boot logs with "ofdebug=2" given on the kernel command line

The envctrl OOPS is definitely my fault in the blind conversion of the
driver to the OF interface-- of_find_property() return values should be
checked for NULL rather than relying upon a -1 value stored into lenp.
We can discuss this separately, since you are using an out-of-kernel
driver.
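
The check described above can be sketched in user space. This is a minimal sketch, not the envctrl driver or the kernel source: the struct layouts, mock_of_find_property() and prop_length_or_neg1() are simplified stand-ins invented for illustration, and "example-prop" style names are hypothetical. The mock is assumed to behave as described in this thread: it returns NULL for a missing property and writes the length only when the property is found.

```c
#include <stddef.h>
#include <string.h>

/* Simplified user-space stand-ins; NOT the kernel definitions. */
struct property {
	const char *name;
	int length;
	void *value;
	struct property *next;
};

struct device_node {
	struct property *properties;
};

/* Mock lookup: returns the matching property, or NULL if absent.
 * *lenp is written only when the property is found. */
static struct property *mock_of_find_property(struct device_node *np,
					      const char *name, int *lenp)
{
	struct property *pp;

	for (pp = np->properties; pp != NULL; pp = pp->next) {
		if (strcmp(pp->name, name) == 0) {
			if (lenp != NULL)
				*lenp = pp->length;
			return pp;
		}
	}
	return NULL;
}

/* Hypothetical helper: length of a property, or -1 if it is missing. */
static int prop_length_or_neg1(struct device_node *np, const char *name)
{
	int len;	/* deliberately uninitialized: only valid if pp != NULL */
	struct property *pp = mock_of_find_property(np, name, &len);

	if (pp == NULL)	/* test the returned pointer, not the length */
		return -1;
	return len;
}
```

The point of the sketch is that the returned pointer, not the length variable, decides whether the property exists.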

Thanks,
E


> > 2) Output of "/usr/sbin/prtconf -pv"
> >
>
> Both attached.
>
> Now that I've let the system finish booting, there are also a few oopses that
> seem related to the new openprom interface.
>
> Regards,
> David.
>
> --
> David Johnson
> http://www.david-web.co.uk - My Personal Website
> http://www.penguincomputing.co.uk - Need a Web Developer?
>
>
>


--
E

2006-08-23 04:01:38

by David Miller

Subject: Re: sym53c8xx PCI card broken in 2.6.18

From: "Eric Brower"
Date: Tue, 22 Aug 2006 20:59:14 -0700

> The envctrl OOPS is definitely my fault in the blind conversion of the
> driver to the OF interface-- of_find_property() return values should be
> checked for NULL rather than relying upon a -1 value stored into lenp.
> We can discuss this separately, since you are using an out-of-kernel
> driver.

Ok.

BTW, it is better to use "of_get_property()" if you are actually
interested in the value since it will return a void pointer to the
property value instead of a "struct property". of_find_property() is
useful if you just want to check for existence or if you want to
modify the property value.
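
The difference between the two lookups can be illustrated with a small user-space sketch. Everything here is an assumption made for illustration: the struct layouts and the mock_ functions are simplified stand-ins rather than the kernel code, and read_int_prop() and has_prop() are hypothetical helpers. The only behaviour taken from this thread is that the "get" variant hands back a pointer to the value while the "find" variant hands back the property structure.

```c
#include <stddef.h>
#include <string.h>

/* Simplified user-space stand-ins; NOT the kernel definitions. */
struct property {
	const char *name;
	int length;
	void *value;
	struct property *next;
};

struct device_node {
	struct property *properties;
};

/* Mock "find": returns the property structure, or NULL if absent. */
static struct property *mock_of_find_property(struct device_node *np,
					      const char *name, int *lenp)
{
	struct property *pp;

	for (pp = np->properties; pp != NULL; pp = pp->next) {
		if (strcmp(pp->name, name) == 0) {
			if (lenp != NULL)
				*lenp = pp->length;
			return pp;
		}
	}
	return NULL;
}

/* Mock "get": returns a pointer to the value itself, or NULL if absent. */
static void *mock_of_get_property(struct device_node *np,
				  const char *name, int *lenp)
{
	struct property *pp = mock_of_find_property(np, name, lenp);

	return pp ? pp->value : NULL;
}

/* Hypothetical helper: caller wants the value, so use the "get" form. */
static int read_int_prop(struct device_node *np, const char *name, int def)
{
	int len = 0;
	const int *val = mock_of_get_property(np, name, &len);

	if (val == NULL || len != (int)sizeof(*val))
		return def;
	return *val;
}

/* Hypothetical helper: caller only wants existence, so use "find". */
static int has_prop(struct device_node *np, const char *name)
{
	return mock_of_find_property(np, name, NULL) != NULL;
}
```

In both helpers the NULL check comes before any use of the length or the value.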

2006-08-23 05:35:18

by Eric Brower

Subject: Re: sym53c8xx PCI card broken in 2.6.18

On 8/22/06, David Miller wrote:
> From: "Eric Brower"
> Date: Tue, 22 Aug 2006 20:59:14 -0700
>
> > The envctrl OOPS is definitely my fault in the blind conversion of the
> > driver to the OF interface-- of_find_property() return values should be
> > checked for NULL rather than relying upon a -1 value stored into lenp.
> > We can discuss this separately, since you are using an out-of-kernel
> > driver.
>
> Ok.
>
> BTW, it is better to use "of_get_property()" if you are actually
> interested in the value since it will return a void pointer to the
> property value instead of a "struct property". of_find_property() is
> useful if you just want to check for existence or if you want to
> modify the property value.
>

Thanks, Dave. This driver is interested in property existence and
length-- some OBP versions don't create all expected envctrl
properties, and due to lack of implementation documentation the
property lengths are being checked as well; so of_find_property()
seems appropriate in this case.

Would you consider assigning -1 to lenp (if provided) in
of_find_property() when no matching device is found?

Thanks,
E

--
E

2006-08-23 05:50:04

by David Miller

Subject: Re: sym53c8xx PCI card broken in 2.6.18

From: "Eric Brower"
Date: Tue, 22 Aug 2006 22:35:14 -0700

> Would you consider assigning -1 to lenp (if provided) in
> of_find_property() when no matching device is found?

I think checking for NULL should be the first thing a caller of these
interfaces should do. So from that perspective, I don't think putting
anything in *lenp makes sense. Its value is undefined.

In fact since we'll leave *lenp alone if the property doesn't exist,
you can initialize it to -1 if you want to simplify your checks.
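
The suggestion to pre-initialize the length can be sketched as follows. This is a user-space illustration under stated assumptions, not kernel or driver code: the struct layouts and mock_of_find_property() are simplified stand-ins, and prop_len_matches() is a hypothetical helper. The sketch depends on the lookup leaving *lenp untouched when the property does not exist, as stated above.

```c
#include <stddef.h>
#include <string.h>

/* Simplified user-space stand-ins; NOT the kernel definitions. */
struct property {
	const char *name;
	int length;
	void *value;
	struct property *next;
};

struct device_node {
	struct property *properties;
};

/* Mock lookup: leaves *lenp alone when the property is not found. */
static struct property *mock_of_find_property(struct device_node *np,
					      const char *name, int *lenp)
{
	struct property *pp;

	for (pp = np->properties; pp != NULL; pp = pp->next) {
		if (strcmp(pp->name, name) == 0) {
			if (lenp != NULL)
				*lenp = pp->length;
			return pp;
		}
	}
	return NULL;
}

/* Hypothetical helper: true only if the property exists and has the
 * expected length. A missing property keeps len at -1, which can never
 * equal a valid (non-negative) expected length. */
static int prop_len_matches(struct device_node *np, const char *name,
			    int expected)
{
	int len = -1;	/* caller-side default, survives a failed lookup */

	(void)mock_of_find_property(np, name, &len);
	return len == expected;
}
```

With the default set by the caller, existence and length can be tested in one comparison without the lookup having to store anything on failure.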

2006-08-23 21:01:48

by Daniel Smolik

Subject: Re: sym53c8xx PCI card broken in 2.6.18

David Johnson wrote:
> Hi all,
>
> I'm running a Sun Ultra Enterprise 450 (SPARC64) machine which has an on-board
> SCSI controller and a PCI SCSI controller, both supported by the sym53c8xx
> driver.
>
> With 2.6.17.9 (and earlier) SCSI works perfectly, but with 2.6.18-rc4 and
> 2.6.18-rc4-git1 I'm getting errors on boot for all devices attached to the
> PCI card, but all the devices attached to the on-board controller are
> detected and configured OK.
>
> lspci identifies the on-board controller as:
> SCSI storage controller: LSI Logic / Symbios Logic 53c875 (rev 03)
> and the PCI controller as:
> SCSI storage controller: LSI Logic / Symbios Logic 53c875 (rev 14)
>
> Here's the output from initialisation of the devices on the PCI card (repeated
> for every device):
> scsi2: sym-2.2.3
> scsi 2:0:0:0 ABORT operation started
> scsi 2:0:0:0 ABORT operation timed out
> scsi 2:0:0:0 DEVICE RESET operation started
> scsi 2:0:0:0 DEVICE RESET operation timed out
> scsi 2:0:0:0 BUS RESET operation started
> scsi 2:0:0:0 BUS RESET operation timed out
> scsi 2:0:0:0 HOST RESET operation started
> sym2: SCSI bus has been reset
> scsi 2:0:0:0 HOST RESET operation timed out
> scsi: device offlined - not ready after error recovery
>
> The devices on the PCI controller are a mixture of 'Fujitsu MAG3182L SUN18G'
> and 'Seagate ST318203LSUN18G' drives.
>
> Looking through the changelogs between 2.6.17.9 and 2.6.18-rc4-git1, I can't
> see any changes to sym53c8xx, so I'm guessing this has been caused by some
> generic SCSI subsystem change. Let me know if I can do any more to debug.
>
> Regards,
> David.
>
I must say that I have had the same experience with an E250 and a D1000
disk array. I think it is a hardware problem, but I see the same symptoms
as described above. If I have the disk in the internal bay, on the internal
controller, everything works perfectly. But if I put the disk in the D1000,
I get the same error. I was using 2.6.18-rc3.

Dan