LinuxLists.cc - 2.6.22-rc2: known regressions with patches

2007-05-24 14:05:44

Subject: 2.6.22-rc2: known regressions with patches

Hi all,

Here is a list of some known regressions in 2.6.22-rc2
with patches available.

Feel free to add new regressions/remove fixed etc.
http://kernelnewbies.org/known_regressions

Block devices

Subject : loop devices limited to one single device
References : http://lkml.org/lkml/2007/5/16/229
Submitter : Uwe Bugla <[email protected]>
Handled-By : Ken Chen <[email protected]>
Patch : http://lkml.org/lkml/2007/5/21/483
Status : patch available

File systems

Subject : 2.6.21-git10/11: files getting truncated on xfs
References : http://lkml.org/lkml/2007/5/9/410
Submitter : Jeremy Fitzhardinge <[email protected]>
Handled-By : David Chinner <[email protected]>
Patch : http://lkml.org/lkml/2007/5/12/93
Status : patch available

Memory management

Subject : bug in i386 MTRR initialization
References : http://lkml.org/lkml/2007/5/19/93
Submitter : Andrea Righi <[email protected]>
Status : patch available

SATA/PATA

Subject : pata_via appears to incorrectly detects 40-pin cable
References : http://lkml.org/lkml/2007/5/17/273
http://bugzilla.kernel.org/show_bug.cgi?id=8142
Submitter : Francis Russell <[email protected]>
Status : Not really a regression. Alan seems to have a general fix.
(Tejun Heo)

Subject : libata reset-seq merge broke sata_sil on sh
References : http://lkml.org/lkml/2007/5/10/63
Submitter : Paul Mundt <[email protected]>
Handled-By : Tejun Heo <[email protected]>
Caused-By : commit 4750def52cb2c21732dda9aa1d43a07db37b0186
Patch : http://lkml.org/lkml/2007/5/19/161
Status : patch available

x86-64

Subject : BUG: at mm/slab.c:777 __find_general_cachep()
References : http://lkml.org/lkml/2007/5/18/17
Submitter : Jeff Garzik <[email protected]>
Handled-By : Ben Collins <[email protected]>
Patch : http://lkml.org/lkml/2007/5/18/19
Status : patch available
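
Since every entry above uses the same "Field : value" layout, the list
can be filtered mechanically. The following is only a rough sketch: the
function name list_patched is made up for illustration, and it assumes
the entries are saved unquoted in a plain text file (lines prefixed with
"> " in replies would not match).

```shell
#!/bin/sh
# list_patched FILE (hypothetical helper): print the Subject of every
# entry whose Status field is exactly "patch available".
list_patched() {
    awk '
        # Remember the most recent Subject value, minus the field label.
        /^Subject[ \t]*:/ {
            sub(/^Subject[ \t]*:[ \t]*/, "")
            sub(/[ \t]+$/, "")
            subject = $0
            next
        }
        # When the Status field says a patch exists, emit that Subject.
        /^Status[ \t]*:/ {
            sub(/^Status[ \t]*:[ \t]*/, "")
            sub(/[ \t]+$/, "")
            if ($0 == "patch available") print subject
        }
    ' "$1"
}
```

Run as "list_patched regressions.txt", where regressions.txt is a
placeholder file name.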

Regards,
Michal

--
"Najbardziej brakowało mi twojego milczenia."
-- Andrzej Sapkowski "Coś więcej"

2007-05-24 15:08:33

by Alan

Subject: Re: 2.6.22-rc2: known regressions with patches

> Subject : pata_via appears to incorrectly detects 40-pin cable
> References : http://lkml.org/lkml/2007/5/17/273
> http://bugzilla.kernel.org/show_bug.cgi?id=8142
> Submitter : Francis Russell <[email protected]>
> Status : Not really a regression. Alan seems to have a general fix.
> (Tejun Heo)

The laptop one has been resolved, tested and fired at Andrew.

Note btw that the bugzilla and email refer to two differing bugs. The
SATA one (bugzilla ref) has gone to Jeff and is a different fix.

Alan

2007-05-24 16:55:24

by Andrew Morton

Subject: Re: 2.6.22-rc2: known regressions with patches

On Thu, 24 May 2007 16:11:16 +0100 Alan Cox <[email protected]> wrote:

> > Subject : pata_via appears to incorrectly detects 40-pin cable
> > References : http://lkml.org/lkml/2007/5/17/273
> > http://bugzilla.kernel.org/show_bug.cgi?id=8142
> > Submitter : Francis Russell <[email protected]>
> > Status : Not really a regression. Alan seems to have a general fix.
> > (Tejun Heo)
>
> The laptop one has been resolved, tested and fired at Andrew.

Do you recall the Subject: on that patch?

> Note btw that the bugzilla and email refer to two differing bugs. The
> SATA one (bugzilla ref) has gone to Jeff and is a different fix.

And that?

Thanks.

2007-05-27 00:36:15

by Jeremy Fitzhardinge

Subject: Re: 2.6.22-rc2: known regressions with patches

Michal Piotrowski wrote:
> File systems
>
> Subject : 2.6.21-git10/11: files getting truncated on xfs
> References : http://lkml.org/lkml/2007/5/9/410
> Submitter : Jeremy Fitzhardinge <[email protected]>
> Handled-By : David Chinner <[email protected]>
> Patch : http://lkml.org/lkml/2007/5/12/93
> Status : patch available
>

I'm satisfied the patch fixes the problem for me. Can we have some
movement to get it into at least -mm? This is a real, serious
data-corrupting bug. Ideally we should get it into -rc asap.

I've put the version I'm using below, but I haven't seen a properly
changelogged and signed-off version.

Thanks,
J

From: David Chinner <[email protected]>

On Sat, May 12, 2007 at 01:23:27PM +0200, Jan Engelhardt wrote:
>
> On May 10 2007 14:54, Jeremy Fitzhardinge wrote:
> >>>> What CPU architecture is this happening on? Not i686 with PAE by
> >>>> any chance?
> >>>>
> >>> Yes. Why?
> >>
> >> I have a bug report where NFS files are corrupted only with PAE clients.
> >> Corruption is at the end of the (newly untarred) files. Doesn't happen
> >> without PAE.
> >
> >Hm, suggestive, but I'm not convinced. Two differences to this situation:
> >
> > 1. Immediately after the clone ("untar"), the contents are completely
> > OK; it's only after a umount/mount cycle that problems appear
>
> And if you do a "sync" rather than umount/mount?

I doubt it will matter - I don't think we are marking the inode dirty at
the right point.

The change that was at fault modifies the way we update the file
size on the inode. We added an in-memory copy of the file size to
the in-memory copy of the disk inode's file size that we already
keep. We now only update the disk inode's (in memory copy) file size
on I/O completion. Because the generic code writes the inode out
before waiting for I/O to complete, the old file size gets written
out instead of the new one.

If the write was extending the file into an existing block there
would be no delalloc transaction to redirty the inode (happens on
log I/O completion). Hence when the I/O completes and the file size
gets updated to the in-core disk inode (which is marked dirty), the
linux inode remains clean. As a result, a sync will never flush the
inode to get the updated file size to disk.

What I don't understand is that on unmount dirty xfs inodes should get
written out. Clearly this is not happening - either there's a hole
in the writeback logic (unlikely - it was unchanged) or we've missed
some case where we need to update the filesize and mark the inode
dirty.

Hmmmm - if the write was just a short append to the file, then the
block that was written to should already be mapped. Then we'll just
look up the extent by doing a BMAPI_READ lookup, set the type to
IOMAP_READ and add the block to ioend we are building.

The type IOMAP_READ determines the I/O completion behaviour - in this case
it is xfs_end_bio_read(), which fails to update the file size....

Bingo.

A patch for you to try, Jeremy. I've just started a test run on it...

Cheers,

Dave.
---
fs/xfs/linux-2.6/xfs_aops.c | 23 ++++++++++++++++-------
1 file changed, 16 insertions(+), 7 deletions(-)

===================================================================
--- 2.6.x-xfs-new.orig/fs/xfs/linux-2.6/xfs_aops.c
+++ 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_aops.c
@@ -973,8 +973,9 @@ xfs_page_state_convert(

bh = head = page_buffers(page);
offset = page_offset(page);
- flags = -1;
- type = IOMAP_READ;
+ iomap_valid = 0;
+ flags = BMAPI_READ;
+ type = IOMAP_NEW;

/* TODO: cleanup count and page_dirty */

@@ -1004,14 +1005,14 @@ xfs_page_state_convert(
*
* Third case, an unmapped buffer was found, and we are
* in a path where we need to write the whole page out.
- */
+ */
if (buffer_unwritten(bh) || buffer_delay(bh) ||
((buffer_uptodate(bh) || PageUptodate(page)) &&
!buffer_mapped(bh) && (unmapped || startio))) {
- /*
+ /*
* Make sure we don't use a read-only iomap
*/
- if (flags == BMAPI_READ)
+ if (flags == BMAPI_READ)
iomap_valid = 0;

if (buffer_unwritten(bh)) {
@@ -1060,7 +1061,7 @@ xfs_page_state_convert(
* That means it must already have extents allocated
* underneath it. Map the extent by reading it.
*/
- if (!iomap_valid || type != IOMAP_READ) {
+ if (!iomap_valid || flags != BMAPI_READ) {
flags = BMAPI_READ;
size = xfs_probe_cluster(inode, page, bh,
head, 1);
@@ -1071,7 +1072,15 @@ xfs_page_state_convert(
iomap_valid = xfs_iomap_valid(&iomap, offset);
}

- type = IOMAP_READ;
+ /*
+ * We set the type to IOMAP_NEW in case we are doing a
+ * small write at EOF that is extending the file but
+ * without needing an allocation. We need to update the
+ * file size on I/O completion in this case so it is
+ * the same case as having just allocated a new extent
+ * that we are writing into for the first time.
+ */
+ type = IOMAP_NEW;
if (!test_and_set_bit(BH_Lock, &bh->b_state)) {
ASSERT(buffer_mapped(bh));
if (iomap_valid)
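
The failure described above (a short append at EOF whose new size never
reaches disk, so the file comes back shorter after a umount/mount cycle)
can be checked for with a size manifest. This is only a sketch and not
part of the patch: record_sizes and check_sizes are made-up helper names,
sizes are taken with "wc -c", file names are assumed to contain no
newlines, and the umount/mount step is left to be done by hand as root.

```shell
#!/bin/sh
# record_sizes DIR MANIFEST (hypothetical helper): write one
# "size<TAB>path" line per regular file under DIR.
# Keep MANIFEST outside DIR so it is not listed itself.
record_sizes() {
    find "$1" -type f | sort | while IFS= read -r f; do
        printf '%s\t%s\n' "$(wc -c < "$f" | tr -d ' ')" "$f"
    done > "$2"
}

# check_sizes MANIFEST (hypothetical helper): report every file whose
# current size differs from the recorded one; return 1 if any differ.
check_sizes() {
    status=0
    while IFS="$(printf '\t')" read -r want f; do
        have=$(wc -c < "$f" | tr -d ' ')
        if [ "$have" != "$want" ]; then
            printf 'size changed: %s (was %s, now %s)\n' "$f" "$want" "$have"
            status=1
        fi
    done < "$1"
    return "$status"
}

# Intended use (paths are placeholders):
#   1. append a few bytes to existing files under /mnt/xfs/tree
#   2. record_sizes /mnt/xfs/tree /tmp/sizes.before
#   3. umount /mnt/xfs && mount /mnt/xfs      (as root)
#   4. check_sizes /tmp/sizes.before
```

On a kernel with the bug, step 4 would be expected to list the appended
files with a smaller size; with the fix applied it should print nothing.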

2007-05-28 01:15:34

by David Chinner

Subject: Re: 2.6.22-rc2: known regressions with patches

On Sun, May 27, 2007 at 01:35:56AM +0100, Jeremy Fitzhardinge wrote:
> Michal Piotrowski wrote:
> > File systems
> >
> > Subject : 2.6.21-git10/11: files getting truncated on xfs
> > References : http://lkml.org/lkml/2007/5/9/410
> > Submitter : Jeremy Fitzhardinge <[email protected]>
> > Handled-By : David Chinner <[email protected]>
> > Patch : http://lkml.org/lkml/2007/5/12/93
> > Status : patch available
> >
>
> I'm satisfied the patch fixes the problem for me. Can we have some
> movement to get it into at least -mm? This is a real, serious
> data-corrupting bug. Ideally we should get it into -rc asap.

Patience, please. We like to have some QA coverage on changes that
affect the writeback path in such a subtle manner before saying it
is good to go. Just releasing the fix into the main tree would be
irresponsible as there is the real possibility of the fix causing
other subtle corruption problems.

The fact the fix Works For You doesn't mean it works for everyone so
we need to take the time to make sure the fix is correct rather than
doing a half-arsed job of it and potentially leaving a landmine that
explodes on the wider community after release.

That being said, the fix doesn't appear to have any landmines in
it so we'll be pushing it to Linus RSN...

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group

2007-05-29 20:14:01

by Jeremy Fitzhardinge

Subject: Re: 2.6.22-rc2: known regressions with patches

David Chinner wrote:
> Patience, please. We like to have some QA coverage on changes that
> affect the writeback path in such a subtle manner before saying it
> is good to go. Just releasing the fix into the main tree would be
> irresponsible as there is the real possibility of the fix causing
> other subtle corruption problems.
>

Oh, yes, I completely agree. But cooking in -mm isn't incompatible with
that.

J

2007-06-04 21:04:43

by Fabio Comolli

Subject: Re: 2.6.22-rc2: known regressions with patches

Hi.

> Block devices
>
> Subject : loop devices limited to one single device
> References : http://lkml.org/lkml/2007/5/16/229
> Submitter : Uwe Bugla <[email protected]>
> Handled-By : Ken Chen <[email protected]>
> Patch : http://lkml.org/lkml/2007/5/21/483
> Status : patch available
>

I just noticed that I have this issue; anyway, this patch works fine
for me (with -rc3)

[root@tycho mnt]# mkdir 1 2 3 4 5 6
[root@tycho mnt]# for i in `ls`
> do
> mount -o loop=/dev/loop$i /home/F-7-i386-DVD.iso /mnt/$i
> done
[root@tycho mnt]# df -k
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda2 4061572 2910092 941408 76% /
/dev 515908 36 515872 1% /dev
none 515908 16 515892 1% /dev/shm
/dev/sda4 27617068 7206108 19008084 28% /home
/home/F-7-i386-DVD.iso
2832620 2832620 0 100% /mnt/1
/home/F-7-i386-DVD.iso
2832620 2832620 0 100% /mnt/2
/home/F-7-i386-DVD.iso
2832620 2832620 0 100% /mnt/3
/home/F-7-i386-DVD.iso
2832620 2832620 0 100% /mnt/4
/home/F-7-i386-DVD.iso
2832620 2832620 0 100% /mnt/5
/home/F-7-i386-DVD.iso
2832620 2832620 0 100% /mnt/6
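
For anyone who wants to repeat the test above without depending on the
output of `ls`, here is a rough dry-run sketch. The function name
loop_mount_cmds is made up, the paths are assumed to contain no
whitespace, and the function only prints the mount commands, so nothing
is mounted until the output is piped to sh as root.

```shell
#!/bin/sh
# loop_mount_cmds ISO BASE COUNT (hypothetical helper): print one mount
# command per loop device, /dev/loop1 .. /dev/loopCOUNT, each mounting
# the same image on BASE/1 .. BASE/COUNT.
loop_mount_cmds() {
    iso=$1
    base=$2
    count=$3
    i=1
    while [ "$i" -le "$count" ]; do
        printf 'mount -o loop=/dev/loop%d %s %s/%d\n' "$i" "$iso" "$base" "$i"
        i=$((i + 1))
    done
}

# Dry run:   loop_mount_cmds /home/F-7-i386-DVD.iso /mnt 6
# Real run:  loop_mount_cmds /home/F-7-i386-DVD.iso /mnt 6 | sh   (as root)
```

If only the first mount succeeds, that would match the "limited to one
single device" symptom; six successful mounts match the df output above.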

Regards,
Fabio