2007-11-28 17:42:16

by Anders Henke

Subject: broken dpt_i2o (was: ext2_check_page: bad entry in directory)

Hi,

I've been bitten by the problem noted in the lkml message of roughly the same
subject, dated Oct/24/2007.
My boxes were running 2.6.19 and have been upgraded to 2.6.23.1, but their
bootup failed when trying to mount the root (ext2) filesystem:

---cut
serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
00:08: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
00:09: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
Loading Adaptec I2O RAID: Version 2.4 Build 5go
Detecting Adaptec I2O RAID controllers...
ACPI: PCI Interrupt 0000:04:08.0[A] -> GSI 48 (level, low) -> IRQ 16
Adaptec I2O RAID controller 0 irq=16
BAR0 f8880000 - size= 100000
BAR1 f8a00000 - size= 1000000
dpti: If you have a lot of devices this could take a few minutes.
dpti0: Reading the hardware resource table.
TID 008 Vendor: ADAPTEC Device: AIC-7902 Rev: 00000001
TID 009 Vendor: ADAPTEC Device: AIC-7902 Rev: 00000001
TID 515 Vendor: ESG-SHV S Device: SCA HSBP M21 Rev: 0.080
TID 518 Vendor: ADAPTEC R Device: RAID-1 Rev: 3B0AD
scsi0 : Vendor: Adaptec Model: 2010S FW:3B0A
scsi 0:1:0:0: Direct-Access ADAPTEC RAID-1 3B0A PQ: 0
ANSI: 2
scsi 0:1:6:0: Processor ESG-SHV SCA HSBP M21 0.08 PQ: 0
ANSI: 2
Adaptec aacraid driver 1.1-5[2449]-ms
GDT-HA: Storage RAID Controller Driver. Version: 3.05
GDT-HA: Found 0 PCI Storage RAID Controllers
3ware Storage Controller device driver for Linux v1.26.02.002.
3ware 9000 Storage Controller device driver for Linux v2.26.02.010.
sd 0:1:0:0: [sda] 143374336 512-byte hardware sectors (73408 MB)
sd 0:1:0:0: [sda] Write Protect is off
sd 0:1:0:0: [sda] Write cache: enabled, read cache: enabled, supports
DPO and FUA
sd 0:1:0:0: [sda] 143374336 512-byte hardware sectors (73408 MB)
sd 0:1:0:0: [sda] Write Protect is off
sd 0:1:0:0: [sda] Write cache: enabled, read cache: enabled, supports
DPO and FUA
sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 >
sd 0:1:0:0: [sda] Attached SCSI disk
PNP: PS/2 Controller [PNP0303:PS2K] at 0x60,0x64 irq 1
PNP: PS/2 appears to have AUX port disabled, if this is incorrect please
boot with i8042.nopnp
serio: i8042 KBD port at 0x60,0x64 irq 1
mice: PS/2 mouse device common for all mice
md: raid1 personality registered for level 1
EDAC MC: Ver: 2.1.0 Oct 23 2007
TCP cubic registered
NET: Registered protocol family 1
NET: Registered protocol family 17
Starting balanced_irq
Using IPI Shortcut mode
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
VFS: Mounted root (ext2 filesystem) readonly.
Freeing unused kernel memory: 264k freed
EXT2-fs error (device sda1): ext2_check_page: bad entry in directory #2:
rec_len is smaller than minimal - offset=0, inode=0, rec_len=0,
name_len=0
Warning: unable to open an initial console.
Kernel panic - not syncing: No init found. Try passing init= option to
kernel.
Rebooting in 30 seconds..
---cut

Rebooting the box into 2.6.19 works without any problems.

I've checked the changelogs for 2.6.24-rc*, but haven't come across a
solution for this issue; maybe I've overlooked it.

This bug has been reported earlier: http://lkml.org/lkml/2007/10/24/224

I've contacted Jan Kara off-list; as booting into 2.6.19 works and e2fsck
on an e2image file doesn't show any errors, we assumed that the ext2
filesystem itself is fine.

As "everything is reported as being zero" is quite odd and Jan took a
guess that it might be block-layer or driver-related, I've assumed
that the driver is responsible for this; just out of curiosity,
I've manually replaced the dpt_i2o driver with the 2.6.19 one by copying
drivers/scsi/dpt_i2o.c, drivers/scsi/dpti.h and drivers/scsi/dpt/ into a
vanilla 2.6.23.1 kernel; using this kernel fixed the issue for me.

I haven't yet fine-tested from which kernel release on the dpt_i2o driver
behaves like this and spews out zeroed blocks when trying to mount
the rootfs. Maybe this is just some timing issue.
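
For readers wondering why a block of zeroes produces exactly the message in
the boot log: the following is a minimal userspace sketch, not the kernel
code. It assumes a simplified 8-byte directory entry header and the usual
"header plus name, rounded up to 4 bytes" record length rule; the names
dirent_sketch, REC_LEN_FOR, check_entry and check_block are made up for
illustration, and the real check in fs/ext2/dir.c does more than this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified directory entry header (illustrative, not the kernel struct). */
struct dirent_sketch {
    uint32_t inode;
    uint16_t rec_len;
    uint8_t  name_len;
    uint8_t  file_type;
};

/* Record length needed for a name: 8-byte header plus name, rounded up to 4. */
#define REC_LEN_FOR(name_len) (((name_len) + 8 + 3) & ~3)

/* Returns NULL if the entry looks sane, else a short description of the problem. */
static const char *check_entry(const struct dirent_sketch *de)
{
    assert(de != NULL);
    if (de->rec_len < REC_LEN_FOR(1))
        return "rec_len is smaller than minimal";
    if (de->rec_len & 3)
        return "unaligned directory entry";
    if (de->rec_len < REC_LEN_FOR(de->name_len))
        return "rec_len is too small for name_len";
    return NULL;
}

/* Checks the first entry of a raw directory block (offset 0). */
static const char *check_block(const unsigned char *block)
{
    struct dirent_sketch de;

    memcpy(&de, block, sizeof(de));
    return check_entry(&de);
}
```

Passing a zero-filled buffer to check_block() fails the very first test,
because rec_len=0 is below the 12-byte minimum. That matches a directory
block that arrived as all zeroes rather than one that was damaged on disk.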


For some strange reason, this doesn't affect all boxes running the
dpt_i2o driver.

Affected (verified on 6 out of 6 tested boxes so far):

Intel SE7501WV2S using an Adaptec 2010S with the following "lspci -vn"-section:

0000:04:08.0 0104: 1044:a511 (rev 01)
Subsystem: 1044:c035
Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 16
BIST result: 00
Memory at fe900000 (32-bit, non-prefetchable) [size=1M]
Memory at fb000000 (32-bit, prefetchable) [size=16M]
Memory at f8000000 (32-bit, prefetchable) [size=32M]
Expansion ROM at f6200000 [disabled] [size=32K]
Capabilities: [44] Power Management version 2

Not affected is, e.g., a box with a Supermicro X5DPR using an Adaptec 2015S
and the following "lspci -vn"-section:

0000:03:03.0 0104: 1044:a511 (rev 01)
Subsystem: 1044:c034
Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 16
BIST result: 00
Memory at f8300000 (32-bit, non-prefetchable) [size=1M]
Memory at fb000000 (32-bit, prefetchable) [size=16M]
Memory at fc000000 (32-bit, prefetchable) [size=32M]
Capabilities: [44] Power Management version 2

... and of course boxes not using a dpt_i2o-driven controller.

The Adaptec 2010S-boxes are currently running the Adaptec firmware 3B05,
while the Adaptec 2015S box is running firmware 3B0A. As those
controllers are capable of running the same firmware image, maybe
a firmware update might resolve this issue as well (well, unlikely
according to the changelog); the above bootup log is from an updated
box, so the firmware update didn't help. What really helps is the older
driver.




Anders
--
1&1 Internet AG System Design
Brauerstrasse 48 v://49.721.91374.50
D-76135 Karlsruhe f://49.721.91374.225

Amtsgericht Montabaur HRB 6484
Vorstand: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Andreas Gauger,
Thomas Gottschlich, Matthias Greve, Robert Hoffmann, Norbert Lang, Achim Weiss
Aufsichtsratsvorsitzender: Michael Scheeren


2007-11-29 12:32:12

by Anders Henke

Subject: Re: broken dpt_i2o in 2.6.23 (was: ext2_check_page: bad entry in directory)

On November 28 2007, Anders Henke wrote:
> As "everything is reported as being zero" is quite odd an Jan took a
> guess that it might be block-layer or driver-related, I've assumed
> that the driver is responsible for this; just out of the curiousity,
> I've manually replaced the dpt_i2o driver by the 2.6.19 one by copying
> driver/scsi/dpt_i2o.c driver/scsi/dpti.h and driver/scsi/dpt/ into a
> vanilla 2.6.23.1. kernel; using this kernel fixed the issue for me.
>
> I haven't yet fine-tested from which kernel release on the dpt_i2o driver
> behaves like this and spews out zeroed blocks when trying to mount
> the rootfs. Maybe this is just some timing issue.

I've started the fine-tests and can say so far that dpt_i2o from
2.6.22 is still fine. Test is simple:

anders@ista:/usr/src/linux-2.6.22/drivers/scsi/dpt$ cp -r dpt/ dpt_i2o.c dpti.h /usr/src/linux-2.6.23.1/drivers/scsi/

... recompile the kernel, reboot: works.

2.6.22 and 2.6.23 differ in terms of the dpt_i2o driver by three
patch sets:
- one small 2 KB set of patches from 2.6.22 to 2.6.23-rc1
- one 7 KB set of patches from 2.6.23-rc2 to 2.6.23-rc3
- one 162 KB set of patches from 2.6.23-rc9 to 2.6.23-rc10.

When applying the 2.6.23-rc1-based driver to "my" 2.6.23.1 kernel,
the "zero blocks" symptom shows up, so it's the "lucky" situation
that the smallest patch actually seems to be the broken one.

According to the 2.6.23-rc1 short-form changelog, there is
one major edit on the dpt_i2o driver:

FUJITA Tomonori

[SCSI] dpt_i2o: convert to use the data buffer accessors

Stephen Rothwell
dpt_i2o depends on virt_to_bus

Fujita, would you please take a look at this?

I think that something's broken in there, leading to the dpt_i2o
sending out blocks of zeroes right after initialization, at least on
some specific controllers (in this case, Adaptec 2010S on Intel
SE7501WV2S-based boxes).

I don't have in-depth kernel driver development knowledge, so I'm
quite out of ideas right now. Nevertheless, I'll add the diff
from 2.6.22 to 2.6.23-rc1 in terms of dpt_i2o:

---cut
diff -Nur linux-2.6.22/drivers/scsi/dpt_i2o.c linux-2.6.23-rc1/drivers/scsi/dpt_i2o.c
--- linux-2.6.22/drivers/scsi/dpt_i2o.c 2007-07-09 01:32:17.000000000 +0200
+++ linux-2.6.23-rc1/drivers/scsi/dpt_i2o.c 2007-07-22 22:41:00.000000000 +0200
@@ -2078,12 +2078,13 @@
u32 *lenptr;
int direction;
int scsidir;
+ int nseg;
u32 len;
u32 reqlen;
s32 rcode;

memset(msg, 0 , sizeof(msg));
- len = cmd->request_bufflen;
+ len = scsi_bufflen(cmd);
direction = 0x00000000;

scsidir = 0x00000000; // DATA NO XFER
@@ -2140,21 +2141,21 @@
lenptr=mptr++; /* Remember me - fill in when we know */
reqlen = 14; // SINGLE SGE
/* Now fill in the SGList and command */
- if(cmd->use_sg) {
- struct scatterlist *sg = (struct scatterlist *)cmd->request_buffer;
- int sg_count = pci_map_sg(pHba->pDev, sg, cmd->use_sg,
- cmd->sc_data_direction);

+ nseg = scsi_dma_map(cmd);
+ BUG_ON(nseg < 0);
+ if (nseg) {
+ struct scatterlist *sg;

len = 0;
- for(i = 0 ; i < sg_count; i++) {
+ scsi_for_each_sg(cmd, sg, nseg, i) {
*mptr++ = direction|0x10000000|sg_dma_len(sg);
len+=sg_dma_len(sg);
*mptr++ = sg_dma_address(sg);
- sg++;
+ /* Make this an end of list */
+ if (i == nseg - 1)
+ mptr[-2] = direction|0xD0000000|sg_dma_len(sg);
}
- /* Make this an end of list */
- mptr[-2] = direction|0xD0000000|sg_dma_len(sg-1);
reqlen = mptr - msg;
*lenptr = len;

@@ -2163,16 +2164,8 @@
len, cmd->underflow);
}
} else {
- *lenptr = len = cmd->request_bufflen;
- if(len == 0) {
- reqlen = 12;
- } else {
- *mptr++ = 0xD0000000|direction|cmd->request_bufflen;
- *mptr++ = pci_map_single(pHba->pDev,
- cmd->request_buffer,
- cmd->request_bufflen,
- cmd->sc_data_direction);
- }
+ *lenptr = len = 0;
+ reqlen = 12;
}

/* Stick the headers on */
@@ -2232,7 +2225,7 @@
hba_status = detailed_status >> 8;

// calculate resid for sg
- cmd->resid = cmd->request_bufflen - readl(reply+5);
+ scsi_set_resid(cmd, scsi_bufflen(cmd) - readl(reply+5));

pHba = (adpt_hba*) cmd->device->host->hostdata[0];

---cut

Personally, I guess that it's the large removal from line 2164 on
that's broken, which replaces a mapping routine with a static assignment.
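
To separate the two parts of the diff: the rewritten scatter-gather loop on
its own looks like it should produce the same message words as before. The
following is a minimal userspace sketch of that comparison, not driver code.
It assumes a plain struct in place of a DMA-mapped scatterlist; seg_sketch,
SGE_SIMPLE, SGE_END_OF_LIST, build_sgl_old and build_sgl_new are names made
up for illustration, and only the two flag values are taken from the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for one DMA-mapped segment (hypothetical, not struct scatterlist). */
struct seg_sketch {
    uint32_t dma_addr;
    uint32_t dma_len;
};

#define SGE_SIMPLE      0x10000000u /* flag used for ordinary elements in the diff */
#define SGE_END_OF_LIST 0xD0000000u /* flag used for the last element in the diff */

/* 2.6.22 style: emit all elements, then patch the last flag word after the loop. */
static size_t build_sgl_old(uint32_t *mptr, const struct seg_sketch *sg,
                            int sg_count, uint32_t direction, uint32_t *total)
{
    uint32_t *start = mptr;
    uint32_t len = 0;
    int i;

    assert(sg_count > 0);
    for (i = 0; i < sg_count; i++) {
        *mptr++ = direction | SGE_SIMPLE | sg->dma_len;
        len += sg->dma_len;
        *mptr++ = sg->dma_addr;
        sg++;
    }
    /* Make this an end of list */
    mptr[-2] = direction | SGE_END_OF_LIST | (sg - 1)->dma_len;
    *total = len;
    return (size_t)(mptr - start);
}

/* 2.6.23-rc1 style: patch the flag word inside the loop on the last iteration. */
static size_t build_sgl_new(uint32_t *mptr, const struct seg_sketch *sg,
                            int nseg, uint32_t direction, uint32_t *total)
{
    uint32_t *start = mptr;
    uint32_t len = 0;
    int i;

    assert(nseg > 0);
    for (i = 0; i < nseg; i++) {
        *mptr++ = direction | SGE_SIMPLE | sg[i].dma_len;
        len += sg[i].dma_len;
        *mptr++ = sg[i].dma_addr;
        /* Make this an end of list */
        if (i == nseg - 1)
            mptr[-2] = direction | SGE_END_OF_LIST | sg[i].dma_len;
    }
    *total = len;
    return (size_t)(mptr - start);
}
```

For the same segment list, both variants fill the buffer with identical
words and the same total length. This only covers the loop; it says nothing
about the removed single-buffer branch or about anything outside this hunk.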


Regards,

Anders

2007-12-05 00:58:23

by Andrew Morton

Subject: Re: broken dpt_i2o in 2.6.23 (was: ext2_check_page: bad entry in directory)

On Thu, 29 Nov 2007 13:31:50 +0100
Anders Henke wrote:

> On November 28 2007, Anders Henke wrote:
> > As "everything is reported as being zero" is quite odd an Jan took a
> > guess that it might be block-layer or driver-related, I've assumed
> > that the driver is responsible for this; just out of the curiousity,
> > I've manually replaced the dpt_i2o driver by the 2.6.19 one by copying
> > driver/scsi/dpt_i2o.c driver/scsi/dpti.h and driver/scsi/dpt/ into a
> > vanilla 2.6.23.1. kernel; using this kernel fixed the issue for me.
> >
> > I haven't yet fine-tested from which kernel release on the dpt_i2o driver
> > behaves like this and spews out zeroed blocks when trying to mount
> > the rootfs. Maybe this is just some timing issue.
>
> I've started the fine-tests and can say so far that dpt_i2o from
> 2.6.22 is still fine. Test is simple:
>
> anders@ista:/usr/src/linux-2.6.22/drivers/scsi/dpt$ cp -r dpt/ dpt_i2o.c dpti.h /usr/src/linux-2.6.23.1/drivers/scsi/
>
> ... recompile the kernel, reboot: works.
>
> 2.6.22 and 2.6.23 differ in terms of the dpt_i2o driver by two different
> patch sets:
> -one 2 Kb small set of patches from 2.6.22 to 2.6.22-rc1
> -one 7 Kb set of patches from 2.6.23-rc2 to 2.6.23-rc3
> -one 162 Kb set of patches from 2.6.23-rc9 to 2.6.23-rc10.
>
> When applying the 2.6.23-rc1-based driver to "my" 2.6.31.1 kernel,
> the "zero blocks"-symptom show up, so it's the "lucky" situation
> that the smallest patch actually seams to be the broken one.
>
> According to the 2.6.23-rc1 short-form changelog, there is
> one major edit on the dpt_i2o driver:
>
> FUJITA Tomonori
>
> [SCSI] dpt_i2o: convert to use the data buffer accessors
>
> Stephen Rothwell
> dpt_i2o depends on virt_to_bus
>
> Fujita, would you please take a look at this?

He won't have seen this. cc's added.

> I think that something's broken in there, leading to the dpt_i2o
> sending out blocks of zeroes right after initialization, at least on
> some specific controllers (in this case, Adaptec 2010S on Intel
> SE7501WV2S-based boxes).
>
> I don't have insight kernel driver development knowledge, so I'm
> quite out of help right now. Nevertheless, I'll add the diff
> from 2.6.22 to 2.6.23-rc1 in terms of dpt_i2o:
>

Can you please confirm that this revert (against 2.6.24-rc4) fixes the data
corruption problems?

Thanks.


diff -puN drivers/scsi/dpt_i2o.c~revert-dpt_i2o-convert-to-use-the-data-buffer-accessors drivers/scsi/dpt_i2o.c
--- a/drivers/scsi/dpt_i2o.c~revert-dpt_i2o-convert-to-use-the-data-buffer-accessors
+++ a/drivers/scsi/dpt_i2o.c
@@ -2062,13 +2062,12 @@ static s32 adpt_scsi_to_i2o(adpt_hba* pH
u32 *lenptr;
int direction;
int scsidir;
- int nseg;
u32 len;
u32 reqlen;
s32 rcode;

memset(msg, 0 , sizeof(msg));
- len = scsi_bufflen(cmd);
+ len = cmd->request_bufflen;
direction = 0x00000000;

scsidir = 0x00000000; // DATA NO XFER
@@ -2125,21 +2124,21 @@ static s32 adpt_scsi_to_i2o(adpt_hba* pH
lenptr=mptr++; /* Remember me - fill in when we know */
reqlen = 14; // SINGLE SGE
/* Now fill in the SGList and command */
+ if(cmd->use_sg) {
+ struct scatterlist *sg = (struct scatterlist *)cmd->request_buffer;
+ int sg_count = pci_map_sg(pHba->pDev, sg, cmd->use_sg,
+ cmd->sc_data_direction);

- nseg = scsi_dma_map(cmd);
- BUG_ON(nseg < 0);
- if (nseg) {
- struct scatterlist *sg;

len = 0;
- scsi_for_each_sg(cmd, sg, nseg, i) {
+ for(i = 0 ; i < sg_count; i++) {
*mptr++ = direction|0x10000000|sg_dma_len(sg);
len+=sg_dma_len(sg);
*mptr++ = sg_dma_address(sg);
- /* Make this an end of list */
- if (i == nseg - 1)
- mptr[-2] = direction|0xD0000000|sg_dma_len(sg);
+ sg++;
}
+ /* Make this an end of list */
+ mptr[-2] = direction|0xD0000000|sg_dma_len(sg-1);
reqlen = mptr - msg;
*lenptr = len;

@@ -2148,8 +2147,16 @@ static s32 adpt_scsi_to_i2o(adpt_hba* pH
len, cmd->underflow);
}
} else {
- *lenptr = len = 0;
- reqlen = 12;
+ *lenptr = len = cmd->request_bufflen;
+ if(len == 0) {
+ reqlen = 12;
+ } else {
+ *mptr++ = 0xD0000000|direction|cmd->request_bufflen;
+ *mptr++ = pci_map_single(pHba->pDev,
+ cmd->request_buffer,
+ cmd->request_bufflen,
+ cmd->sc_data_direction);
+ }
}

/* Stick the headers on */
@@ -2178,7 +2185,7 @@ static s32 adpt_i2o_to_scsi(void __iomem
hba_status = detailed_status >> 8;

// calculate resid for sg
- scsi_set_resid(cmd, scsi_bufflen(cmd) - readl(reply+5));
+ cmd->resid = cmd->request_bufflen - readl(reply+5);

pHba = (adpt_hba*) cmd->device->host->hostdata[0];

_

2007-12-05 01:12:30

by Andrew Morton

Subject: Re: broken dpt_i2o in 2.6.23 (was: ext2_check_page: bad entry in directory)

On Wed, 05 Dec 2007 10:04:03 +0900
FUJITA Tomonori wrote:

> On Tue, 4 Dec 2007 16:57:38 -0800
> Andrew Morton wrote:
>
> > On Thu, 29 Nov 2007 13:31:50 +0100
> > Anders Henke wrote:
> >
> > > On November 28 2007, Anders Henke wrote:
> > > > As "everything is reported as being zero" is quite odd an Jan took a
> > > > guess that it might be block-layer or driver-related, I've assumed
> > > > that the driver is responsible for this; just out of the curiousity,
> > > > I've manually replaced the dpt_i2o driver by the 2.6.19 one by copying
> > > > driver/scsi/dpt_i2o.c driver/scsi/dpti.h and driver/scsi/dpt/ into a
> > > > vanilla 2.6.23.1. kernel; using this kernel fixed the issue for me.
> > > >
> > > > I haven't yet fine-tested from which kernel release on the dpt_i2o driver
> > > > behaves like this and spews out zeroed blocks when trying to mount
> > > > the rootfs. Maybe this is just some timing issue.
> > >
> > > I've started the fine-tests and can say so far that dpt_i2o from
> > > 2.6.22 is still fine. Test is simple:
> > >
> > > anders@ista:/usr/src/linux-2.6.22/drivers/scsi/dpt$ cp -r dpt/ dpt_i2o.c dpti.h /usr/src/linux-2.6.23.1/drivers/scsi/
> > >
> > > ... recompile the kernel, reboot: works.
> > >
> > > 2.6.22 and 2.6.23 differ in terms of the dpt_i2o driver by two different
> > > patch sets:
> > > -one 2 Kb small set of patches from 2.6.22 to 2.6.22-rc1
> > > -one 7 Kb set of patches from 2.6.23-rc2 to 2.6.23-rc3
> > > -one 162 Kb set of patches from 2.6.23-rc9 to 2.6.23-rc10.
> > >
> > > When applying the 2.6.23-rc1-based driver to "my" 2.6.31.1 kernel,
> > > the "zero blocks"-symptom show up, so it's the "lucky" situation
> > > that the smallest patch actually seams to be the broken one.
> > >
> > > According to the 2.6.23-rc1 short-form changelog, there is
> > > one major edit on the dpt_i2o driver:
> > >
> > > FUJITA Tomonori
> > >
> > > [SCSI] dpt_i2o: convert to use the data buffer accessors
> > >
> > > Stephen Rothwell
> > > dpt_i2o depends on virt_to_bus
> > >
> > > Fujita, would you please take a look at this?
> >
> > He won't have seen this. cc's added.
> >
> > > I think that something's broken in there, leading to the dpt_i2o
> > > sending out blocks of zeroes right after initialization, at least on
> > > some specific controllers (in this case, Adaptec 2010S on Intel
> > > SE7501WV2S-based boxes).
> > >
> > > I don't have insight kernel driver development knowledge, so I'm
> > > quite out of help right now. Nevertheless, I'll add the diff
> > > from 2.6.22 to 2.6.23-rc1 in terms of dpt_i2o:
> > >
> >
> > Can you please confirm that this revert (against 2.6.24-rc4) fixes the data
> > corruption problems?
>
> Anders said that my patch is fine and seems that Matthew's hotplug
> conversion patch leads to the problem:
>
> http://marc.info/?l=linux-kernel&m=119641892129732&w=2

Oh. Jan broke message threading :(

So it's been nearly a week and nothing has happened? Do we revert that
change?

2007-12-05 01:32:00

by FUJITA Tomonori

Subject: Re: broken dpt_i2o in 2.6.23 (was: ext2_check_page: bad entry in directory)

On Tue, 4 Dec 2007 17:11:55 -0800
Andrew Morton wrote:

> On Wed, 05 Dec 2007 10:04:03 +0900
> FUJITA Tomonori wrote:
>
> > On Tue, 4 Dec 2007 16:57:38 -0800
> > Andrew Morton wrote:
> >
> > > On Thu, 29 Nov 2007 13:31:50 +0100
> > > Anders Henke wrote:
> > >
> > > > On November 28 2007, Anders Henke wrote:
> > > > > As "everything is reported as being zero" is quite odd an Jan took a
> > > > > guess that it might be block-layer or driver-related, I've assumed
> > > > > that the driver is responsible for this; just out of the curiousity,
> > > > > I've manually replaced the dpt_i2o driver by the 2.6.19 one by copying
> > > > > driver/scsi/dpt_i2o.c driver/scsi/dpti.h and driver/scsi/dpt/ into a
> > > > > vanilla 2.6.23.1. kernel; using this kernel fixed the issue for me.
> > > > >
> > > > > I haven't yet fine-tested from which kernel release on the dpt_i2o driver
> > > > > behaves like this and spews out zeroed blocks when trying to mount
> > > > > the rootfs. Maybe this is just some timing issue.
> > > >
> > > > I've started the fine-tests and can say so far that dpt_i2o from
> > > > 2.6.22 is still fine. Test is simple:
> > > >
> > > > anders@ista:/usr/src/linux-2.6.22/drivers/scsi/dpt$ cp -r dpt/ dpt_i2o.c dpti.h /usr/src/linux-2.6.23.1/drivers/scsi/
> > > >
> > > > ... recompile the kernel, reboot: works.
> > > >
> > > > 2.6.22 and 2.6.23 differ in terms of the dpt_i2o driver by two different
> > > > patch sets:
> > > > -one 2 Kb small set of patches from 2.6.22 to 2.6.22-rc1
> > > > -one 7 Kb set of patches from 2.6.23-rc2 to 2.6.23-rc3
> > > > -one 162 Kb set of patches from 2.6.23-rc9 to 2.6.23-rc10.
> > > >
> > > > When applying the 2.6.23-rc1-based driver to "my" 2.6.31.1 kernel,
> > > > the "zero blocks"-symptom show up, so it's the "lucky" situation
> > > > that the smallest patch actually seams to be the broken one.
> > > >
> > > > According to the 2.6.23-rc1 short-form changelog, there is
> > > > one major edit on the dpt_i2o driver:
> > > >
> > > > FUJITA Tomonori
> > > >
> > > > [SCSI] dpt_i2o: convert to use the data buffer accessors
> > > >
> > > > Stephen Rothwell
> > > > dpt_i2o depends on virt_to_bus
> > > >
> > > > Fujita, would you please take a look at this?
> > >
> > > He won't have seen this. cc's added.
> > >
> > > > I think that something's broken in there, leading to the dpt_i2o
> > > > sending out blocks of zeroes right after initialization, at least on
> > > > some specific controllers (in this case, Adaptec 2010S on Intel
> > > > SE7501WV2S-based boxes).
> > > >
> > > > I don't have insight kernel driver development knowledge, so I'm
> > > > quite out of help right now. Nevertheless, I'll add the diff
> > > > from 2.6.22 to 2.6.23-rc1 in terms of dpt_i2o:
> > > >
> > >
> > > Can you please confirm that this revert (against 2.6.24-rc4) fixes the data
> > > corruption problems?
> >
> > Anders said that my patch is fine and seems that Matthew's hotplug
> > conversion patch leads to the problem:
> >
> > http://marc.info/?l=linux-kernel&m=119641892129732&w=2
>
> Oh. Jan broke message threading :(
>
> So it's been nearly a week and nothing has happened? Do we revert that
> change?

SCSI people really want this conversion...

Matthew, did you have a chance to look at it?

2007-12-05 02:52:23

by Andrew Morton

Subject: Re: broken dpt_i2o in 2.6.23 (was: ext2_check_page: bad entry in directory)

On Wed, 05 Dec 2007 10:30:54 +0900 FUJITA Tomonori wrote:

> On Tue, 4 Dec 2007 17:11:55 -0800
> Andrew Morton wrote:
>
> > On Wed, 05 Dec 2007 10:04:03 +0900
> > FUJITA Tomonori wrote:
> >
> > > On Tue, 4 Dec 2007 16:57:38 -0800
> > > Andrew Morton wrote:
> > >
> > > > On Thu, 29 Nov 2007 13:31:50 +0100
> > > > Anders Henke wrote:
> > > >
> > > > > On November 28 2007, Anders Henke wrote:
> > > > > > As "everything is reported as being zero" is quite odd an Jan took a
> > > > > > guess that it might be block-layer or driver-related, I've assumed
> > > > > > that the driver is responsible for this; just out of the curiousity,
> > > > > > I've manually replaced the dpt_i2o driver by the 2.6.19 one by copying
> > > > > > driver/scsi/dpt_i2o.c driver/scsi/dpti.h and driver/scsi/dpt/ into a
> > > > > > vanilla 2.6.23.1. kernel; using this kernel fixed the issue for me.
> > > > > >
> > > > > > I haven't yet fine-tested from which kernel release on the dpt_i2o driver
> > > > > > behaves like this and spews out zeroed blocks when trying to mount
> > > > > > the rootfs. Maybe this is just some timing issue.
> > > > >
> > > > > I've started the fine-tests and can say so far that dpt_i2o from
> > > > > 2.6.22 is still fine. Test is simple:
> > > > >
> > > > > anders@ista:/usr/src/linux-2.6.22/drivers/scsi/dpt$ cp -r dpt/ dpt_i2o.c dpti.h /usr/src/linux-2.6.23.1/drivers/scsi/
> > > > >
> > > > > ... recompile the kernel, reboot: works.
> > > > >
> > > > > 2.6.22 and 2.6.23 differ in terms of the dpt_i2o driver by two different
> > > > > patch sets:
> > > > > -one 2 Kb small set of patches from 2.6.22 to 2.6.22-rc1
> > > > > -one 7 Kb set of patches from 2.6.23-rc2 to 2.6.23-rc3
> > > > > -one 162 Kb set of patches from 2.6.23-rc9 to 2.6.23-rc10.
> > > > >
> > > > > When applying the 2.6.23-rc1-based driver to "my" 2.6.31.1 kernel,
> > > > > the "zero blocks"-symptom show up, so it's the "lucky" situation
> > > > > that the smallest patch actually seams to be the broken one.
> > > > >
> > > > > According to the 2.6.23-rc1 short-form changelog, there is
> > > > > one major edit on the dpt_i2o driver:
> > > > >
> > > > > FUJITA Tomonori
> > > > >
> > > > > [SCSI] dpt_i2o: convert to use the data buffer accessors
> > > > >
> > > > > Stephen Rothwell
> > > > > dpt_i2o depends on virt_to_bus
> > > > >
> > > > > Fujita, would you please take a look at this?
> > > >
> > > > He won't have seen this. cc's added.
> > > >
> > > > > I think that something's broken in there, leading to the dpt_i2o
> > > > > sending out blocks of zeroes right after initialization, at least on
> > > > > some specific controllers (in this case, Adaptec 2010S on Intel
> > > > > SE7501WV2S-based boxes).
> > > > >
> > > > > I don't have insight kernel driver development knowledge, so I'm
> > > > > quite out of help right now. Nevertheless, I'll add the diff
> > > > > from 2.6.22 to 2.6.23-rc1 in terms of dpt_i2o:
> > > > >
> > > >
> > > > Can you please confirm that this revert (against 2.6.24-rc4) fixes the data
> > > > corruption problems?
> > >
> > > Anders said that my patch is fine and seems that Matthew's hotplug
> > > conversion patch leads to the problem:
> > >
> > > http://marc.info/?l=linux-kernel&m=119641892129732&w=2
> >
> > Oh. Jan broke message threading :(
> >
> > So it's been nearly a week and nothing has happened? Do we revert that
> > change?
>
> SCSI people really want this conversion...
>
> Matthew, did you have a chance to look at it?

It seems pretty improbable that a change of that nature could cause data
corruption. Anders, are you able to determine whether the revert (against
current Linus mainline or 2.6.24-rc4) fixes things? Because it would be
very strange...

This is a grave bug. It's really quite urgent...

Thanks.

drivers/scsi/dpt_i2o.c | 132 ++++++++++++++++++---------------------
drivers/scsi/dpti.h | 9 ++
2 files changed, 68 insertions(+), 73 deletions(-)

diff -puN drivers/scsi/dpt_i2o.c~revert-dpt_i2o-convert-to-scsi-hotplug-model drivers/scsi/dpt_i2o.c
--- a/drivers/scsi/dpt_i2o.c~revert-dpt_i2o-convert-to-scsi-hotplug-model
+++ a/drivers/scsi/dpt_i2o.c
@@ -173,20 +173,20 @@ static struct pci_device_id dptids[] = {
};
MODULE_DEVICE_TABLE(pci,dptids);

-static void adpt_exit(void);
-
-static int adpt_detect(void)
+static int adpt_detect(struct scsi_host_template* sht)
{
struct pci_dev *pDev = NULL;
adpt_hba* pHba;

+ adpt_init();
+
PINFO("Detecting Adaptec I2O RAID controllers...\n");

/* search for all Adatpec I2O RAID cards */
while ((pDev = pci_get_device( PCI_DPT_VENDOR_ID, PCI_ANY_ID, pDev))) {
if(pDev->device == PCI_DPT_DEVICE_ID ||
pDev->device == PCI_DPT_RAPTOR_DEVICE_ID){
- if(adpt_install_hba(pDev) ){
+ if(adpt_install_hba(sht, pDev) ){
PERROR("Could not Init an I2O RAID device\n");
PERROR("Will not try to detect others.\n");
return hba_count-1;
@@ -248,33 +248,34 @@ rebuild_sys_tab:
}

for (pHba = hba_chain; pHba; pHba = pHba->next) {
- if (adpt_scsi_register(pHba) < 0) {
+ if( adpt_scsi_register(pHba,sht) < 0){
adpt_i2o_delete_hba(pHba);
continue;
}
pHba->initialized = TRUE;
pHba->state &= ~DPTI_STATE_RESET;
- scsi_scan_host(pHba->host);
}

// Register our control device node
// nodes will need to be created in /dev to access this
// the nodes can not be created from within the driver
if (hba_count && register_chrdev(DPTI_I2O_MAJOR, DPT_DRIVER, &adpt_fops)) {
- adpt_exit();
+ adpt_i2o_sys_shutdown();
return 0;
}
return hba_count;
}


-static int adpt_release(adpt_hba *pHba)
+/*
+ * scsi_unregister will be called AFTER we return.
+ */
+static int adpt_release(struct Scsi_Host *host)
{
- struct Scsi_Host *shost = pHba->host;
- scsi_remove_host(shost);
+ adpt_hba* pHba = (adpt_hba*) host->hostdata[0];
// adpt_i2o_quiesce_hba(pHba);
adpt_i2o_delete_hba(pHba);
- scsi_host_put(shost);
+ scsi_unregister(host);
return 0;
}

@@ -881,7 +882,7 @@ static int adpt_reboot_event(struct noti
#endif


-static int adpt_install_hba(struct pci_dev* pDev)
+static int adpt_install_hba(struct scsi_host_template* sht, struct pci_dev* pDev)
{

adpt_hba* pHba = NULL;
@@ -1028,6 +1029,8 @@ static void adpt_i2o_delete_hba(adpt_hba


mutex_lock(&adpt_configuration_lock);
+ // scsi_unregister calls our adpt_release which
+ // does a quiese
if(pHba->host){
free_irq(pHba->host->irq, pHba);
}
@@ -1079,6 +1082,17 @@ static void adpt_i2o_delete_hba(adpt_hba
}


+static int adpt_init(void)
+{
+ printk("Loading Adaptec I2O RAID: Version " DPT_I2O_VERSION "\n");
+#ifdef REBOOT_NOTIFIER
+ register_reboot_notifier(&adpt_reboot_notifier);
+#endif
+
+ return 0;
+}
+
+
static struct adpt_device* adpt_find_device(adpt_hba* pHba, u32 chan, u32 id, u32 lun)
{
struct adpt_device* d;
@@ -2164,6 +2178,37 @@ static s32 adpt_scsi_to_i2o(adpt_hba* pH
}


+static s32 adpt_scsi_register(adpt_hba* pHba,struct scsi_host_template * sht)
+{
+ struct Scsi_Host *host = NULL;
+
+ host = scsi_register(sht, sizeof(adpt_hba*));
+ if (host == NULL) {
+ printk ("%s: scsi_register returned NULL\n",pHba->name);
+ return -1;
+ }
+ host->hostdata[0] = (unsigned long)pHba;
+ pHba->host = host;
+
+ host->irq = pHba->pDev->irq;
+ /* no IO ports, so don't have to set host->io_port and
+ * host->n_io_port
+ */
+ host->io_port = 0;
+ host->n_io_port = 0;
+ /* see comments in scsi_host.h */
+ host->max_id = 16;
+ host->max_lun = 256;
+ host->max_channel = pHba->top_scsi_channel + 1;
+ host->cmd_per_lun = 1;
+ host->unique_id = (uint) pHba;
+ host->sg_tablesize = pHba->sg_tablesize;
+ host->can_queue = pHba->post_fifo_size;
+
+ return 0;
+}
+
+
static s32 adpt_i2o_to_scsi(void __iomem *reply, struct scsi_cmnd* cmd)
{
adpt_hba* pHba;
@@ -3279,10 +3324,12 @@ static static void adpt_delay(int millis

#endif

-static struct scsi_host_template adpt_template = {
+static struct scsi_host_template driver_template = {
.name = "dpt_i2o",
.proc_name = "dpt_i2o",
.proc_info = adpt_proc_info,
+ .detect = adpt_detect,
+ .release = adpt_release,
.info = adpt_info,
.queuecommand = adpt_queue,
.eh_abort_handler = adpt_abort,
@@ -3297,62 +3344,5 @@ static struct scsi_host_template adpt_te
.use_clustering = ENABLE_CLUSTERING,
.use_sg_chaining = ENABLE_SG_CHAINING,
};
-
-static s32 adpt_scsi_register(adpt_hba* pHba)
-{
- struct Scsi_Host *host;
-
- host = scsi_host_alloc(&adpt_template, sizeof(adpt_hba*));
- if (host == NULL) {
- printk ("%s: scsi_host_alloc returned NULL\n",pHba->name);
- return -1;
- }
- host->hostdata[0] = (unsigned long)pHba;
- pHba->host = host;
-
- host->irq = pHba->pDev->irq;
- /* no IO ports, so don't have to set host->io_port and
- * host->n_io_port
- */
- host->io_port = 0;
- host->n_io_port = 0;
- /* see comments in scsi_host.h */
- host->max_id = 16;
- host->max_lun = 256;
- host->max_channel = pHba->top_scsi_channel + 1;
- host->cmd_per_lun = 1;
- host->unique_id = (uint) pHba;
- host->sg_tablesize = pHba->sg_tablesize;
- host->can_queue = pHba->post_fifo_size;
-
- if (scsi_add_host(host, &pHba->pDev->dev)) {
- scsi_host_put(host);
- return -1;
- }
-
- return 0;
-}
-
-static int __init adpt_init(void)
-{
- int count;
-
- printk("Loading Adaptec I2O RAID: Version " DPT_I2O_VERSION "\n");
-#ifdef REBOOT_NOTIFIER
- register_reboot_notifier(&adpt_reboot_notifier);
-#endif
-
- count = adpt_detect();
-
- return count > 0 ? 0 : -ENODEV;
-}
-
-static void adpt_exit(void)
-{
- while (hba_chain)
- adpt_release(hba_chain);
-}
-
-module_init(adpt_init);
-module_exit(adpt_exit);
+#include "scsi_module.c"
MODULE_LICENSE("GPL");
diff -puN drivers/scsi/dpti.h~revert-dpt_i2o-convert-to-scsi-hotplug-model drivers/scsi/dpti.h
--- a/drivers/scsi/dpti.h~revert-dpt_i2o-convert-to-scsi-hotplug-model
+++ a/drivers/scsi/dpti.h
@@ -28,9 +28,11 @@
* SCSI interface function Prototypes
*/

+static int adpt_detect(struct scsi_host_template * sht);
static int adpt_queue(struct scsi_cmnd * cmd, void (*cmdcomplete) (struct scsi_cmnd *));
static int adpt_abort(struct scsi_cmnd * cmd);
static int adpt_reset(struct scsi_cmnd* cmd);
+static int adpt_release(struct Scsi_Host *host);
static int adpt_slave_configure(struct scsi_device *);

static const char *adpt_info(struct Scsi_Host *pSHost);
@@ -47,6 +49,8 @@ static int adpt_device_reset(struct scsi

#define DPT_DRIVER_NAME "Adaptec I2O RAID"

+#ifndef HOSTS_C
+
#include "dpt/sys_info.h"
#include <linux/wait.h>
#include "dpt/dpti_i2o.h"
@@ -285,7 +289,7 @@ static s32 adpt_i2o_init_outbound_q(adpt
static s32 adpt_i2o_hrt_get(adpt_hba* pHba);
static s32 adpt_scsi_to_i2o(adpt_hba* pHba, struct scsi_cmnd* cmd, struct adpt_device* dptdevice);
static s32 adpt_i2o_to_scsi(void __iomem *reply, struct scsi_cmnd* cmd);
-static s32 adpt_scsi_register(adpt_hba* pHba);
+static s32 adpt_scsi_register(adpt_hba* pHba,struct scsi_host_template * sht);
static s32 adpt_hba_reset(adpt_hba* pHba);
static s32 adpt_i2o_reset_hba(adpt_hba* pHba);
static s32 adpt_rescan(adpt_hba* pHba);
@@ -295,7 +299,7 @@ static void adpt_i2o_delete_hba(adpt_hba
static void adpt_inquiry(adpt_hba* pHba);
static void adpt_fail_posted_scbs(adpt_hba* pHba);
static struct adpt_device* adpt_find_device(adpt_hba* pHba, u32 chan, u32 id, u32 lun);
-static int adpt_install_hba(struct pci_dev* pDev) ;
+static int adpt_install_hba(struct scsi_host_template* sht, struct pci_dev* pDev) ;
static int adpt_i2o_online_hba(adpt_hba* pHba);
static void adpt_i2o_post_wait_complete(u32, int);
static int adpt_i2o_systab_send(adpt_hba* pHba);
@@ -339,4 +343,5 @@ static void adpt_i386_info(sysInfo_S* si
#define FW_DEBUG_BLED_OFFSET 8

#define FW_DEBUG_FLAGS_NO_HEADERS_B 0x01
+#endif /* !HOSTS_C */
#endif /* _DPT_H */
_

2007-12-05 04:03:28

by FUJITA Tomonori

Subject: Re: broken dpt_i2o in 2.6.23 (was: ext2_check_page: bad entry in directory)

On Tue, 4 Dec 2007 16:57:38 -0800
Andrew Morton <[email protected]> wrote:

> On Thu, 29 Nov 2007 13:31:50 +0100
> Anders Henke <[email protected]> wrote:
>
> > On November 28 2007, Anders Henke wrote:
> > > As "everything is reported as being zero" is quite odd and Jan took a
> > > guess that it might be block-layer or driver-related, I've assumed
> > > that the driver is responsible for this; just out of curiosity,
> > > I've manually replaced the dpt_i2o driver by the 2.6.19 one by copying
> > > drivers/scsi/dpt_i2o.c, drivers/scsi/dpti.h and drivers/scsi/dpt/ into a
> > > vanilla 2.6.23.1 kernel; using this kernel fixed the issue for me.
> > >
> > > I haven't yet fine-tested from which kernel release on the dpt_i2o driver
> > > behaves like this and spews out zeroed blocks when trying to mount
> > > the rootfs. Maybe this is just some timing issue.
> >
> > I've started the fine-tests and can say so far that dpt_i2o from
> > 2.6.22 is still fine. Test is simple:
> >
> > anders@ista:/usr/src/linux-2.6.22/drivers/scsi/dpt$ cp -r dpt/ dpt_i2o.c dpti.h /usr/src/linux-2.6.23.1/drivers/scsi/
> >
> > ... recompile the kernel, reboot: works.
> >
> > 2.6.22 and 2.6.23 differ in terms of the dpt_i2o driver by three different
> > patch sets:
> > -one 2 Kb small set of patches from 2.6.22 to 2.6.23-rc1
> > -one 7 Kb set of patches from 2.6.23-rc2 to 2.6.23-rc3
> > -one 162 Kb set of patches from 2.6.23-rc9 to 2.6.23-rc10.
> >
> > When applying the 2.6.23-rc1-based driver to "my" 2.6.23.1 kernel,
> > the "zero blocks" symptom shows up, so it's the "lucky" situation
> > that the smallest patch actually seems to be the broken one.
> >
> > According to the 2.6.23-rc1 short-form changelog, there is
> > one major edit on the dpt_i2o driver:
> >
> > FUJITA Tomonori
> >
> > [SCSI] dpt_i2o: convert to use the data buffer accessors
> >
> > Stephen Rothwell
> > dpt_i2o depends on virt_to_bus
> >
> > Fujita, would you please take a look at this?
>
> He won't have seen this. cc's added.
>
> > I think that something's broken in there, leading to the dpt_i2o
> > sending out blocks of zeroes right after initialization, at least on
> > some specific controllers (in this case, Adaptec 2010S on Intel
> > SE7501WV2S-based boxes).
> >
> > I don't have in-depth kernel driver development knowledge, so I'm
> > quite out of help right now. Nevertheless, I'll add the diff
> > from 2.6.22 to 2.6.23-rc1 in terms of dpt_i2o:
> >
>
> Can you please confirm that this revert (against 2.6.24-rc4) fixes the data
> corruption problems?

Anders said that my patch is fine, and it seems that Matthew's hotplug
conversion patch leads to the problem:

http://marc.info/?l=linux-kernel&m=119641892129732&w=2

2007-12-05 10:15:15

by Anders Henke

Subject: Re: broken dpt_i2o in 2.6.23 (was: ext2_check_page: bad entry in directory)

On Tue, 4 Dec 2007 Andrew Morton wrote:
> On Wed, 05 Dec 2007 10:30:54 +0900 FUJITA Tomonori <[email protected]> wrote:
>
> > On Tue, 4 Dec 2007 17:11:55 -0800
> > Andrew Morton <[email protected]> wrote:
> >
> > > On Wed, 05 Dec 2007 10:04:03 +0900
> > > FUJITA Tomonori <[email protected]> wrote:
> > >
> > > > On Tue, 4 Dec 2007 16:57:38 -0800
> > > > Andrew Morton <[email protected]> wrote:
> > > >
> > > > > On Thu, 29 Nov 2007 13:31:50 +0100
> > > > > Anders Henke <[email protected]> wrote:
> > > > >
> > > > > > On November 28 2007, Anders Henke wrote:
> > > > > > > > As "everything is reported as being zero" is quite odd and Jan took a
> > > > > > > > guess that it might be block-layer or driver-related, I've assumed
> > > > > > > > that the driver is responsible for this; just out of curiosity,
> > > > > > > > I've manually replaced the dpt_i2o driver by the 2.6.19 one by copying
> > > > > > > > drivers/scsi/dpt_i2o.c, drivers/scsi/dpti.h and drivers/scsi/dpt/ into a
> > > > > > > > vanilla 2.6.23.1 kernel; using this kernel fixed the issue for me.
> > > > > > >
> > > > > > > I haven't yet fine-tested from which kernel release on the dpt_i2o driver
> > > > > > > behaves like this and spews out zeroed blocks when trying to mount
> > > > > > > the rootfs. Maybe this is just some timing issue.
> > > > > >
> > > > > > I've started the fine-tests and can say so far that dpt_i2o from
> > > > > > 2.6.22 is still fine. Test is simple:
> > > > > >
> > > > > > anders@ista:/usr/src/linux-2.6.22/drivers/scsi/dpt$ cp -r dpt/ dpt_i2o.c dpti.h /usr/src/linux-2.6.23.1/drivers/scsi/
> > > > > >
> > > > > > ... recompile the kernel, reboot: works.
> > > > > >
> > > > > > > 2.6.22 and 2.6.23 differ in terms of the dpt_i2o driver by three different
> > > > > > > patch sets:
> > > > > > > -one 2 Kb small set of patches from 2.6.22 to 2.6.23-rc1
> > > > > > > -one 7 Kb set of patches from 2.6.23-rc2 to 2.6.23-rc3
> > > > > > > -one 162 Kb set of patches from 2.6.23-rc9 to 2.6.23-rc10.
> > > > > > >
> > > > > > > When applying the 2.6.23-rc1-based driver to "my" 2.6.23.1 kernel,
> > > > > > > the "zero blocks" symptom shows up, so it's the "lucky" situation
> > > > > > > that the smallest patch actually seems to be the broken one.
> > > > > >
> > > > > > According to the 2.6.23-rc1 short-form changelog, there is
> > > > > > one major edit on the dpt_i2o driver:
> > > > > >
> > > > > > FUJITA Tomonori
> > > > > >
> > > > > > [SCSI] dpt_i2o: convert to use the data buffer accessors
> > > > > >
> > > > > > Stephen Rothwell
> > > > > > dpt_i2o depends on virt_to_bus
> > > > > >
> > > > > > Fujita, would you please take a look at this?
> > > > >
> > > > > He won't have seen this. cc's added.
> > > > >
> > > > > > I think that something's broken in there, leading to the dpt_i2o
> > > > > > sending out blocks of zeroes right after initialization, at least on
> > > > > > some specific controllers (in this case, Adaptec 2010S on Intel
> > > > > > SE7501WV2S-based boxes).
> > > > > >
> > > > > > > I don't have in-depth kernel driver development knowledge, so I'm
> > > > > > quite out of help right now. Nevertheless, I'll add the diff
> > > > > > from 2.6.22 to 2.6.23-rc1 in terms of dpt_i2o:
> > > > > >
> > > > >
> > > > > Can you please confirm that this revert (against 2.6.24-rc4) fixes the data
> > > > > corruption problems?
> > > >
> > > > Anders said that my patch is fine, and it seems that Matthew's hotplug
> > > > conversion patch leads to the problem:
> > > >
> > > > http://marc.info/?l=linux-kernel&m=119641892129732&w=2
> > >
> > > Oh. Jan broke message threading :(
> > >
> > > So it's been nearly a week and nothing has happened? Do we revert that
> > > change?
> >
> > SCSI people really want this conversion...
> >
> > Matthew, did you have a chance to look at it?
>
> It seems pretty improbable that a change of that nature could cause data
> corruption. Anders, are you able to determine whether the revert (against
> current Linus mainline or 2.6.24-rc4) fixes things? Because it would be
> very strange...
>
> This is a grave bug. It's really quite urgent...
>
> Thanks.
>
> drivers/scsi/dpt_i2o.c | 132 ++++++++++++++++++---------------------
> drivers/scsi/dpti.h | 9 ++
> 2 files changed, 68 insertions(+), 73 deletions(-)

I've done the following:

-untarred a clean 2.6.24-rc4 and compiled it with my 2.6.23.1 settings in order
to verify that the driver is still broken: checked, the box still won't
boot.

-patched the just-compiled kernel source with your patch, ran "make distclean"
(by means of "make-kpkg clean") and recompiled: the box boots fine.

I've put the captured console logs to
http://w.sysiphus.de/dpt_i2o/bootlog.2624-rc4-pristine
http://w.sysiphus.de/dpt_i2o/bootlog.2624-rc4-patched
... and the kernelconfig (which shouldn't matter) to
http://w.sysiphus.de/dpt_i2o/kernelconfig.2624-rc4
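
For anyone who wants to look at the "zeroed blocks" symptom from user space
on a kernel that does boot, something like the following could serve as a
rough check. It is only a minimal sketch and was not part of the tests
described above: the helper names is_all_zero() and count_zero_sectors() are
made up for illustration, and the 512-byte sector size is an assumption taken
from the sd log lines.

```c
#include <stdio.h>
#include <stddef.h>

/* Assumed sector size; the sd log reports 512-byte hardware sectors. */
#define SECTOR_SIZE 512

/* Hypothetical helper: return 1 if every byte in buf is zero, else 0. */
int is_all_zero(const unsigned char *buf, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		if (buf[i] != 0)
			return 0;
	return 1;
}

/*
 * Hypothetical helper: read up to `sectors` sectors from the start of
 * `path` and count how many of them contain only zero bytes.
 * Returns -1 if the file or device cannot be opened.
 */
long count_zero_sectors(const char *path, long sectors)
{
	unsigned char buf[SECTOR_SIZE];
	long zero = 0;
	long i;
	FILE *f = fopen(path, "rb");

	if (f == NULL)
		return -1;

	for (i = 0; i < sectors; i++) {
		/* Stop at a short read (end of device or I/O error). */
		if (fread(buf, 1, SECTOR_SIZE, f) != SECTOR_SIZE)
			break;
		if (is_all_zero(buf, SECTOR_SIZE))
			zero++;
	}

	fclose(f);
	return zero;
}
```

Calling count_zero_sectors() on the raw device (for example the sda seen in
the boot log, as root) and comparing the count between a working and a broken
driver would only give a hint: sectors that are legitimately empty read back
as zeroes too.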


Regards,

Anders
--
1&1 Internet AG Enter any 11-digit prime number to continue.
Brauerstrasse 48 v://49.721.91374.50
D-76135 Karlsruhe f://49.721.91374.225

Amtsgericht Montabaur HRB 6484
Vorstand: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Andreas Gauger,
Thomas Gottschlich, Matthias Greve, Robert Hoffmann, Norbert Lang, Achim Weiss
Aufsichtsratsvorsitzender: Michael Scheeren

2007-12-06 05:50:21

by FUJITA Tomonori

Subject: Re: broken dpt_i2o in 2.6.23 (was: ext2_check_page: bad entry in directory)

On Wed, 5 Dec 2007 11:14:41 +0100
Anders Henke <[email protected]> wrote:

> On Tue, 4 Dec 2007 Andrew Morton wrote:
> > On Wed, 05 Dec 2007 10:30:54 +0900 FUJITA Tomonori <[email protected]> wrote:
> >
> > > On Tue, 4 Dec 2007 17:11:55 -0800
> > > Andrew Morton <[email protected]> wrote:
> > >
> > > > On Wed, 05 Dec 2007 10:04:03 +0900
> > > > FUJITA Tomonori <[email protected]> wrote:
> > > >
> > > > > On Tue, 4 Dec 2007 16:57:38 -0800
> > > > > Andrew Morton <[email protected]> wrote:
> > > > >
> > > > > > On Thu, 29 Nov 2007 13:31:50 +0100
> > > > > > Anders Henke <[email protected]> wrote:
> > > > > >
> > > > > > > On November 28 2007, Anders Henke wrote:
> > > > > > > > > As "everything is reported as being zero" is quite odd and Jan took a
> > > > > > > > > guess that it might be block-layer or driver-related, I've assumed
> > > > > > > > > that the driver is responsible for this; just out of curiosity,
> > > > > > > > > I've manually replaced the dpt_i2o driver by the 2.6.19 one by copying
> > > > > > > > > drivers/scsi/dpt_i2o.c, drivers/scsi/dpti.h and drivers/scsi/dpt/ into a
> > > > > > > > > vanilla 2.6.23.1 kernel; using this kernel fixed the issue for me.
> > > > > > > >
> > > > > > > > I haven't yet fine-tested from which kernel release on the dpt_i2o driver
> > > > > > > > behaves like this and spews out zeroed blocks when trying to mount
> > > > > > > > the rootfs. Maybe this is just some timing issue.
> > > > > > >
> > > > > > > I've started the fine-tests and can say so far that dpt_i2o from
> > > > > > > 2.6.22 is still fine. Test is simple:
> > > > > > >
> > > > > > > anders@ista:/usr/src/linux-2.6.22/drivers/scsi/dpt$ cp -r dpt/ dpt_i2o.c dpti.h /usr/src/linux-2.6.23.1/drivers/scsi/
> > > > > > >
> > > > > > > ... recompile the kernel, reboot: works.
> > > > > > >
> > > > > > > > 2.6.22 and 2.6.23 differ in terms of the dpt_i2o driver by three different
> > > > > > > > patch sets:
> > > > > > > > -one 2 Kb small set of patches from 2.6.22 to 2.6.23-rc1
> > > > > > > > -one 7 Kb set of patches from 2.6.23-rc2 to 2.6.23-rc3
> > > > > > > > -one 162 Kb set of patches from 2.6.23-rc9 to 2.6.23-rc10.
> > > > > > > >
> > > > > > > > When applying the 2.6.23-rc1-based driver to "my" 2.6.23.1 kernel,
> > > > > > > > the "zero blocks" symptom shows up, so it's the "lucky" situation
> > > > > > > > that the smallest patch actually seems to be the broken one.
> > > > > > >
> > > > > > > According to the 2.6.23-rc1 short-form changelog, there is
> > > > > > > one major edit on the dpt_i2o driver:
> > > > > > >
> > > > > > > FUJITA Tomonori
> > > > > > >
> > > > > > > [SCSI] dpt_i2o: convert to use the data buffer accessors
> > > > > > >
> > > > > > > Stephen Rothwell
> > > > > > > dpt_i2o depends on virt_to_bus
> > > > > > >
> > > > > > > Fujita, would you please take a look at this?
> > > > > >
> > > > > > He won't have seen this. cc's added.
> > > > > >
> > > > > > > I think that something's broken in there, leading to the dpt_i2o
> > > > > > > sending out blocks of zeroes right after initialization, at least on
> > > > > > > some specific controllers (in this case, Adaptec 2010S on Intel
> > > > > > > SE7501WV2S-based boxes).
> > > > > > >
> > > > > > > > I don't have in-depth kernel driver development knowledge, so I'm
> > > > > > > quite out of help right now. Nevertheless, I'll add the diff
> > > > > > > from 2.6.22 to 2.6.23-rc1 in terms of dpt_i2o:
> > > > > > >
> > > > > >
> > > > > > Can you please confirm that this revert (against 2.6.24-rc4) fixes the data
> > > > > > corruption problems?
> > > > >
> > > > > Anders said that my patch is fine, and it seems that Matthew's hotplug
> > > > > conversion patch leads to the problem:
> > > > >
> > > > > http://marc.info/?l=linux-kernel&m=119641892129732&w=2
> > > >
> > > > Oh. Jan broke message threading :(
> > > >
> > > > So it's been nearly a week and nothing has happened? Do we revert that
> > > > change?
> > >
> > > SCSI people really want this conversion...
> > >
> > > Matthew, did you have a chance to look at it?
> >
> > It seems pretty improbable that a change of that nature could cause data
> > corruption. Anders, are you able to determine whether the revert (against
> > current Linus mainline or 2.6.24-rc4) fixes things? Because it would be
> > very strange...
> >
> > This is a grave bug. It's really quite urgent...
> >
> > Thanks.
> >
> > drivers/scsi/dpt_i2o.c | 132 ++++++++++++++++++---------------------
> > drivers/scsi/dpti.h | 9 ++
> > 2 files changed, 68 insertions(+), 73 deletions(-)
>
> I've done the following:
>
> -untarred a clean 2.6.24-rc4 and compiled it with my 2.6.23.1 settings in order
> to verify that the driver is still broken: checked, the box still won't
> boot.
>
> -patched the just-compiled kernel source with your patch, ran "make distclean"
> (by means of "make-kpkg clean") and recompiled: the box boots fine.
>
> I've put the captured console logs to
> http://w.sysiphus.de/dpt_i2o/bootlog.2624-rc4-pristine
> http://w.sysiphus.de/dpt_i2o/bootlog.2624-rc4-patched
> ... and the kernelconfig (which shouldn't matter) to
> http://w.sysiphus.de/dpt_i2o/kernelconfig.2624-rc4

Thanks for testing. So reverting Matthew's hotplug patch fixes the
problem, though I have no idea how the patch leads to this. It seems that
nobody has any clue about that. We need to revert that patch for the
moment.

2007-12-06 07:09:21

by Andrew Morton

Subject: Re: broken dpt_i2o in 2.6.23 (was: ext2_check_page: bad entry in directory)

On Thu, 06 Dec 2007 14:49:37 +0900 FUJITA Tomonori <[email protected]> wrote:

> > > drivers/scsi/dpt_i2o.c | 132 ++++++++++++++++++---------------------
> > > drivers/scsi/dpti.h | 9 ++
> > > 2 files changed, 68 insertions(+), 73 deletions(-)
> >
> > I've done the following:
> >
> > -untarred a clean 2.6.24-rc4 and compiled it with my 2.6.23.1 settings in order
> > to verify that the driver is still broken: checked, the box still won't
> > boot.
> >
> > -patched the just-compiled kernel source with your patch, ran "make distclean"
> > (by means of "make-kpkg clean") and recompiled: the box boots fine.
> >
> > I've put the captured console logs to
> > http://w.sysiphus.de/dpt_i2o/bootlog.2624-rc4-pristine
> > http://w.sysiphus.de/dpt_i2o/bootlog.2624-rc4-patched
> > ... and the kernelconfig (which shouldn't matter) to
> > http://w.sysiphus.de/dpt_i2o/kernelconfig.2624-rc4
>
> Thanks for testing. So reverting Matthew's hotplug patch fixes the
> problem, though I have no idea how the patch leads to this. It seems that
> nobody has any clue about that. We need to revert that patch for the
> moment.

OK, thanks. Let's leave it a couple of days for people to register objections,
have bright ideas, etc.