2003-07-18 08:20:12

by Anders Gustafsson

Subject: 2.6.0-test1 gets corrupted data when loading init

Hi,

With 2.6.0-test1 I get errors like unresolved symbols in the libraries used by
init, or "bad ELF file". Sometimes init manages to start, and then bash shows the
same kind of errors when it tries to run the init scripts. This is on a dual
Xeon, no highmem (512M), ServerWorks chipset, aic79xx SCSI, / on SCSI.

It breaks between 2.5.70 and 2.5.70-bk1, which contains an update to the
aic79xx driver, so my guess is that it's related to that.

SCSI-related hardware in the machine:

scsi0 : Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 1.3.0
<Adaptec AIC7902 Ultra320 SCSI adapter>
aic7902: Ultra320 Wide Channel A, SCSI Id=7, PCI 33 or 66Mhz, 512 SCBs

scsi1 : Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 1.3.0
<Adaptec AIC7902 Ultra320 SCSI adapter>
aic7902: Ultra320 Wide Channel B, SCSI Id=7, PCI 33 or 66Mhz, 512 SCBs

(scsi1:A:0): 320.000MB/s transfers (160.000MHz DT|IU|QAS, 16bit)
Vendor: SEAGATE Model: ST373307LW Rev: 0002
Type: Direct-Access ANSI SCSI revision: 03


--
Anders Gustafsson - [email protected] - http://0x63.nu/


2003-07-18 09:48:31

by Anders Gustafsson

Subject: Re: 2.6.0-test1 gets corrupted data when loading init

On Fri, Jul 18, 2003 at 10:34:58AM +0200, Anders Gustafsson wrote:
> It breaks between 2.5.70 and 2.5.70-bk1, which contains an update to the
> aic79xx driver, so my guess is that it's related to that.

http://linux.bkbits.net:8080/linux-2.5/[email protected] is the changeset that
makes it stop working.

(In case that changeset number isn't stable, its comments are:

Aic79xx Driver Update
o Change handling of the Rev. A packetized lun output bug
to be more efficient by having the sequencer copy the
single byte of valid lun data into the long lun field.

)

--
Anders Gustafsson - [email protected] - http://0x63.nu/

2003-07-18 11:25:23

by Anders Gustafsson

Subject: Re: 2.6.0-test1 gets corrupted data when loading init

On Fri, Jul 18, 2003 at 11:51:08AM +0200, Anders Gustafsson wrote:
> On Fri, Jul 18, 2003 at 10:34:58AM +0200, Anders Gustafsson wrote:
> > It breaks between 2.5.70 and 2.5.70-bk1, which contains an update to the
> > aic79xx driver, so my guess is that it's related to that.
>
> http://linux.bkbits.net:8080/linux-2.5/[email protected] is the changeset that
> makes it stop working.

Yes, and reverting that changeset with the attached patch (made against
2.6.0-test+bk) makes it work on 2.6.0-test1.

But I get these messages:

(scsi1:A:0:0): CDB: 0x12 0x1 0x80 0x0 0x60 0x0
(scsi1:A:0:0): Saw underflow (72 of 96 bytes). Treated as error
(scsi1:A:0:0): CDB: 0x12 0x1 0x80 0x0 0x60 0x0
(scsi1:A:0:0): Saw underflow (72 of 96 bytes). Treated as error
(scsi1:A:0): 160.000MB/s transfers (80.000MHz DT|IU|QAS, 16bit)
(scsi1:A:0:0): CDB: 0x12 0x1 0x80 0x0 0x60 0x0
(scsi1:A:0:0): Saw underflow (72 of 96 bytes). Treated as error
(scsi1:A:0:0): CDB: 0x12 0x1 0x80 0x0 0x60 0x0
(scsi1:A:0:0): Saw underflow (72 of 96 bytes). Treated as error
(scsi1:A:0): 160.000MB/s transfers (80.000MHz DT, 16bit)
(scsi1:A:0:0): CDB: 0x12 0x1 0x80 0x0 0x60 0x0
(scsi1:A:0:0): Saw underflow (72 of 96 bytes). Treated as error
(scsi1:A:0:0): CDB: 0x12 0x1 0x80 0x0 0x60 0x0
(scsi1:A:0:0): Saw underflow (72 of 96 bytes). Treated as error
(scsi1:A:0): 80.000MB/s transfers (40.000MHz DT, 16bit)
(scsi1:A:0:0): CDB: 0x12 0x1 0x80 0x0 0x60 0x0
(scsi1:A:0:0): Saw underflow (72 of 96 bytes). Treated as error


--
Anders Gustafsson - [email protected] - http://0x63.nu/

--- a/drivers/scsi/aic7xxx/aic79xx.seq Fri Jul 18 19:34:59 2003
+++ b/drivers/scsi/aic7xxx/aic79xx.seq Fri Jul 18 19:34:59 2003
@@ -261,15 +261,6 @@
clr A;
add CMDS_PENDING, 1;
adc CMDS_PENDING[1], A;
- if ((ahd->bugs & AHD_PKT_LUN_BUG) != 0) {
- /*
- * "Short Luns" are not placed into outgoing LQ
- * packets in the correct byte order. Use a full
- * sized lun field instead and fill it with the
- * one byte of lun information we support.
- */
- mov SCB_PKT_LUN[6], SCB_LUN;
- }
/*
* The FIFO use count field is shared with the
* tag set by the host so that our SCB dma engine
--- a/drivers/scsi/aic7xxx/aic79xx_core.c Fri Jul 18 19:34:59 2003
+++ b/drivers/scsi/aic7xxx/aic79xx_core.c Fri Jul 18 19:34:59 2003
@@ -8272,6 +8272,8 @@
download_consts[PKT_OVERRUN_BUFOFFSET] =
(ahd->overrun_buf - (uint8_t *)ahd->qoutfifo) / 256;
download_consts[SCB_TRANSFER_SIZE] = SCB_TRANSFER_SIZE_1BYTE_LUN;
+ if ((ahd->bugs & AHD_PKT_LUN_BUG) != 0)
+ download_consts[SCB_TRANSFER_SIZE] = SCB_TRANSFER_SIZE_FULL_LUN;
cur_patch = patches;
downloaded = 0;
skip_addr = 0;
--- a/drivers/scsi/aic7xxx/aic79xx_inline.h Fri Jul 18 19:34:59 2003
+++ b/drivers/scsi/aic7xxx/aic79xx_inline.h Fri Jul 18 19:34:59 2003
@@ -272,6 +272,10 @@
if ((scb->flags & SCB_PACKETIZED) != 0) {
/* XXX what about ACA?? It is type 4, but TAG_TYPE == 0x3. */
scb->hscb->task_attribute = scb->hscb->control & SCB_TAG_TYPE;
+ /*
+ * For Rev A short lun workaround.
+ */
+ scb->hscb->pkt_long_lun[6] = scb->hscb->lun;
} else {
if (ahd_get_transfer_length(scb) & 0x01)
scb->hscb->task_attribute = SCB_XFERLEN_ODD;

2003-07-22 21:29:52

by Justin T. Gibbs

Subject: Re: 2.6.0-test1 gets corrupted data when loading init

> On Fri, Jul 18, 2003 at 11:51:08AM +0200, Anders Gustafsson wrote:
>> On Fri, Jul 18, 2003 at 10:34:58AM +0200, Anders Gustafsson wrote:
>> > It breaks between 2.5.70 and 2.5.70-bk1, which contains an update to the
>> > aic79xx driver, so my guess is that it's related to that.
>>
>> http://linux.bkbits.net:8080/linux-2.5/[email protected] is the changeset that
>> makes it stop working.
>
> Yes, and reverting that changeset with the attached patch (made against
> 2.6.0-test+bk) makes it work on 2.6.0-test1.

There is a whole slew of later changesets that haven't made it in yet.
The root cause of your particular problem is not the lun copy optimization,
but a problem with the layout of a data structure that is DMA'ed to the
controller, combined with a controller erratum. The fix for this is
available in the 20030603 bksend file at my site:

http://people.FreeBSD.org/~gibbs/linux/SRC/

I will try to find some time later this week to review the code that
is now in 2.6 and generate updated changesets for that branch.

--
Justin

2003-07-23 17:11:14

by Anders Gustafsson

Subject: Re: 2.6.0-test1 gets corrupted data when loading init

On Tue, Jul 22, 2003 at 03:46:31PM -0600, Justin T. Gibbs wrote:
> There is a whole slew of later changesets that haven't made it in yet.
> The root cause of your particular problem is not the lun copy optimization,
> but a problem with the layout of a data structure that is DMA'ed to the
> controller, combined with a controller erratum. The fix for this is
> available in the 20030603 bksend file at my site:
>
> http://people.FreeBSD.org/~gibbs/linux/SRC/

Yes, thanks, that works just fine.

--
Anders Gustafsson - [email protected] - http://0x63.nu/