2008-09-14 07:14:39

by Andrew Morton

Subject: Re: [Bugme-new] [Bug 11564] New: ext3 I/O errors when <4096 blocksize on certain hardware


(switched to email. Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Sat, 13 Sep 2008 19:20:35 -0700 (PDT) [email protected] wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=11564
>
> Summary: ext3 I/O errors when <4096 blocksize on certain hardware
> Product: File System
> Version: 2.5
> KernelVersion: 2.6.27
> Platform: All
> OS/Version: Linux
> Tree: Mainline
> Status: NEW
> Severity: normal
> Priority: P1
> Component: ext3
> AssignedTo: [email protected]
> ReportedBy: [email protected]
>
>
> Latest working kernel version:2.6.17
> Earliest failing kernel version:2.6.18
> Distribution:Mandriva, OpenSUSE
> Hardware Environment: PIII-700 on i440BX (100FSB Tyan S1846)
> piix/sym53c8xx (SYM8751SP)
> dysfunctional HD: Quantum Atlas III QM39100TD-SW Rev: N1B0
> OK HD: IBM DPSS 309170; 07N3120; MLC: PS0S96 (Ultrastar)
> OK HD #2: 60G Seagate Barracuda PATA on piix
> Software Environment:typical, except all partitions formatted ext3 -I128 &
> -b1024 or -b2048 due to their small size (4.8G or less)
> Problem Description:
> Tail of most recent (Factory 2.6.27-rc6) /var/log/messages:
> Sep 13 21:29:23 xxxxx kernel: sd 0:0:1:0: [sda] Result: hostbyte=DID_SOFT_ERROR
> driverbyte=DRIVER_OK,SUGGEST_OK
> Sep 13 21:29:23 xxxxx kernel: end_request: I/O error, dev sda, sector 1810985
> Sep 13 21:29:23 xxxxx kernel: sd 0:0:1:0: [sda] Result: hostbyte=DID_SOFT_ERROR
> driverbyte=DRIVER_OK,SUGGEST_OK
> Sep 13 21:29:23 xxxxx kernel: end_request: I/O error, dev sda, sector 1811039
> Sep 13 21:29:23 xxxxx kernel: JBD: Detected IO errors while flushing file data
> on sda7
> Sep 13 21:29:23 xxxxx kernel: JBD: Detected IO errors while flushing file data
> on sda7
>
> Similar errors occur with other post-2.6.17 kernels. Typical result is rpm
> database corruption (see e.g. https://qa.mandriva.com/show_bug.cgi?id=32547 not
> reported by me) making system very difficult to use.
>
> I've run current Cookers on this hardware combination for several years, but
> just over a year ago started having trouble when the 2.6.17 kernel was
> upgraded. I ran the manufacturer's QDPS diagnostics on the Quantum shortly
> after the problem appeared about 13 or so months ago, and again a few days ago,
> both times OK according to QDPS. I ran the LSI controller's format program on
> it a few days ago too. I then tried installing fresh Mandriva 2007.1 (complete
> success) and OpenSUSE 10.2 (limited number of errors of this type). Trying to
> do a current install of Cooker or Factory is hopeless. I tested Factory by
> copying a Factory/11.0 installation from the PATA to sda7 on SCSI, then trying
> to update to current Factory, while Cooker was on sda7 for several years. The
> problem simply did and does not exist with the Mandriva 2.6.17 and old kernels
> using the Atlas III. I tried cloning the Atlas III to the Ultrastar, and cannot
> reproduce using either the Barracuda or the Ultrastar. Trying a different SCSI
> cable didn't help.
>
> Steps to reproduce:
> Try to use a wrong hardware combination.
>



2008-09-15 06:22:38

by Andreas Dilger

Subject: Re: [Bugme-new] [Bug 11564] New: ext3 I/O errors when <4096 blocksize on certain hardware

On Sep 14, 2008 00:14 -0700, Andrew Morton wrote:
> On Sat, 13 Sep 2008 19:20:35 -0700 (PDT) [email protected] wrote:
>
> > http://bugzilla.kernel.org/show_bug.cgi?id=11564
> > Tail of most recent (Factory 2.6.27-rc6) /var/log/messages:
> > Sep 13 21:29:23 xxxxx kernel: sd 0:0:1:0: [sda] Result: hostbyte=DID_SOFT_ERROR
> > driverbyte=DRIVER_OK,SUGGEST_OK
> > Sep 13 21:29:23 xxxxx kernel: end_request: I/O error, dev sda, sector 1810985
> > Sep 13 21:29:23 xxxxx kernel: sd 0:0:1:0: [sda] Result: hostbyte=DID_SOFT_ERROR
> > driverbyte=DRIVER_OK,SUGGEST_OK
> > Sep 13 21:29:23 xxxxx kernel: end_request: I/O error, dev sda, sector 1811039
> > Sep 13 21:29:23 xxxxx kernel: JBD: Detected IO errors while flushing file data
> > on sda7
> > Sep 13 21:29:23 xxxxx kernel: JBD: Detected IO errors while flushing file data
> > on sda7

I'd think from the above errors that the problem is in the device itself,
or in the SCSI layer. No amount of ext3 IO should be able to trigger SCSI
errors.
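
To make the layering in the quoted log explicit, here is a minimal sketch that
sorts the lines by the layer that emitted them: the SCSI disk driver (sd), the
block layer (end_request), and JBD. The regular expressions are assumptions
fitted only to the six lines quoted above (with the mail-wrapped lines
rejoined), not a general kernel log parser, and the function names are
hypothetical.

```python
import re
from collections import Counter

# Log lines copied from the report, with the mail-wrapped lines rejoined.
LOG = """\
Sep 13 21:29:23 xxxxx kernel: sd 0:0:1:0: [sda] Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK,SUGGEST_OK
Sep 13 21:29:23 xxxxx kernel: end_request: I/O error, dev sda, sector 1810985
Sep 13 21:29:23 xxxxx kernel: sd 0:0:1:0: [sda] Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK,SUGGEST_OK
Sep 13 21:29:23 xxxxx kernel: end_request: I/O error, dev sda, sector 1811039
Sep 13 21:29:23 xxxxx kernel: JBD: Detected IO errors while flushing file data on sda7
Sep 13 21:29:23 xxxxx kernel: JBD: Detected IO errors while flushing file data on sda7
"""

# One pattern per layer, ordered from the lowest layer upwards.
# These patterns are assumptions matched to the sample above only.
PATTERNS = [
    ("scsi", re.compile(r"sd [\d:]+ \[(?P<dev>\w+)\] Result: hostbyte=(?P<host>\w+)")),
    ("block", re.compile(r"end_request: I/O error, dev (?P<dev>\w+), sector (?P<sector>\d+)")),
    ("jbd", re.compile(r"JBD: Detected IO errors while flushing file data on (?P<dev>\w+)")),
]


def classify(line):
    """Return (layer, fields) for a recognised line, or None otherwise."""
    for layer, pattern in PATTERNS:
        match = pattern.search(line)
        if match:
            return layer, match.groupdict()
    return None


def summarize(text):
    """Count messages per layer and collect the failing sector numbers."""
    counts = Counter()
    sectors = []
    for line in text.splitlines():
        hit = classify(line)
        if hit is None:
            continue
        layer, fields = hit
        counts[layer] += 1
        if layer == "block":
            sectors.append(int(fields["sector"]))
    return counts, sectors


if __name__ == "__main__":
    counts, sectors = summarize(LOG)
    print(dict(counts))
    print(sectors)
```

In the sample, each sd result line is followed by an end_request error, and
the JBD messages come last, which is the ordering one would expect if the
failure is reported from below the filesystem.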

> > Similar errors occur with other post-2.6.17 kernels. Typical result is rpm
> > database corruption (see e.g. https://qa.mandriva.com/show_bug.cgi?id=32547
> > not reported by me) making system very difficult to use.
> >
> > The problem simply did and does not exist with the
> > Mandriva 2.6.17 and old kernels using the Atlas III. I tried cloning
> > the Atlas III to the Ultrastar, and cannot reproduce using either the
> > Barracuda or the Ultrastar. Trying a different SCSI cable didn't help.

This sounds like a case where git-bisect of 2.6.17-2.6.18 would be able
to isolate the problem fairly efficiently.
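
As a rough illustration of why bisection is efficient, the sketch below runs a
binary search over a linear list of commits, given one known-good and one
known-bad endpoint. It is a simplification: git-bisect works on the real
commit graph rather than a flat list, and the commit count and the position of
the regression used here are made-up numbers, not figures for
2.6.17..2.6.18. The function name and the is_bad callback are hypothetical;
in practice "is_bad" means building and booting a kernel and checking for the
I/O errors.

```python
def bisect_first_bad(commits, is_bad):
    """Find the first bad commit in a list ordered oldest to newest.

    Assumes commits[0] is known good and commits[-1] is known bad.
    Returns (first_bad_commit, number_of_commits_tested).
    """
    good, bad = 0, len(commits) - 1
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1  # one build-and-boot cycle in real life
        if is_bad(commits[mid]):
            bad = mid   # regression is at mid or earlier
        else:
            good = mid  # regression is after mid
    return commits[bad], tests


if __name__ == "__main__":
    # Hypothetical history: commit 0 is good, commit 10000 is bad,
    # and the regression was introduced by commit 6789.
    commits = list(range(10_001))
    first_bad, tests = bisect_first_bad(commits, lambda c: c >= 6789)
    print(first_bad, tests)
```

The number of kernels to test grows with the logarithm of the number of
commits, so even a range of several thousand commits needs only a dozen or so
builds. With a kernel git tree this would typically start with
"git bisect start", "git bisect good v2.6.17" and "git bisect bad v2.6.18",
assuming those tags are present in the tree being used.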

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.