This is going to be a terribly vague bug report, of a bug that I can't
reproduce on demand (at least not yet). I'll see if I can bang on this
over the 4th of July weekend and create a reproducible scenario, but
until then, I want to at least get *something* in writing that other
people can see. So, here goes...
When I run 2.5.73-mm[123] on a Mandrake Cooker system here, it generally
runs fine. However, when I run "urpmi --auto-select" to upgrade the
packages to the latest versions, rpm tends to freeze up during
installation of one of the packages. This did not seem to happen with
2.5.70-mm9, which was the kernel I ran before 2.5.73-mm1.
(It doesn't seem to be happening with 2.5.74 either, although I think
it's really too soon to say for sure.)
ps shows a process (an rpm process, IIRC) stuck in the D state.
The most unusual aspect of this system is that it's using loopback root.
The root filesystem is ReiserFS, contained within a file on a FAT32
partition.
I'll try to make this happen in a more controlled environment soon...
-Barry K. Nathan <[email protected]>
On Thu, Jul 03, 2003 at 02:05:41AM -0700, Barry K. Nathan wrote:
> When I run 2.5.73-mm[123] on a Mandrake Cooker system here, it generally
> runs fine. However, when I run "urpmi --auto-select" to upgrade the
> packages to the latest versions, rpm tends to freeze up during
> installation of one of the packages. This did not seem to happen with
> 2.5.70-mm9, which was the kernel I ran before 2.5.73-mm1.
[snip]
I've figured things out a bit more and filed a Bugzilla report:
http://bugme.osdl.org/show_bug.cgi?id=877
-Barry K. Nathan <[email protected]>
Hi Barry,
How repeatable are the freezes? Would you be able to get a new
kernel, and capture a sysrq-T trace once the system has frozen?
Then would you try booting with elevator=deadline and see if you
can get it to freeze?
Thanks
Nick
Barry K. Nathan wrote:
>On Thu, Jul 03, 2003 at 02:05:41AM -0700, Barry K. Nathan wrote:
>
>>When I run 2.5.73-mm[123] on a Mandrake Cooker system here, it generally
>>runs fine. However, when I run "urpmi --auto-select" to upgrade the
>>packages to the latest versions, rpm tends to freeze up during
>>installation of one of the packages. This did not seem to happen with
>>2.5.70-mm9, which was the kernel I ran before 2.5.73-mm1.
>>
>[snip]
>
>I've figured things out a bit more and filed a Bugzilla report:
>http://bugme.osdl.org/show_bug.cgi?id=877
>
>-Barry K. Nathan <[email protected]>
Nick Piggin <[email protected]> wrote:
>
> >I've figured things out a bit more and filed a Bugzilla report:
> >http://bugme.osdl.org/show_bug.cgi?id=877
Barry says the problem started with 2.5.73-mm1. There was a reiserfs patch
added in that kernel.
Does a `patch -R' of this fix it up?
fs/reiserfs/tail_conversion.c | 13 +++++++++++++
1 files changed, 13 insertions(+)
diff -puN fs/reiserfs/tail_conversion.c~reiserfs-unmapped-buffer-fix fs/reiserfs/tail_conversion.c
--- 25/fs/reiserfs/tail_conversion.c~reiserfs-unmapped-buffer-fix 2003-06-27 23:20:15.000000000 -0700
+++ 25-akpm/fs/reiserfs/tail_conversion.c 2003-06-27 23:20:15.000000000 -0700
@@ -143,6 +143,16 @@ void reiserfs_unmap_buffer(struct buffer
}
clear_buffer_dirty(bh) ;
lock_buffer(bh) ;
+ /* Remove the buffer from whatever list it belongs to. We are mostly
+ interested in removing it from per-sb j_dirty_buffers list, to avoid
+ BUG() on attempt to write not mapped buffer */
+ if ( !list_empty(&bh->b_assoc_buffers) && bh->b_page) {
+ struct inode *inode = bh->b_page->mapping->host;
+ struct reiserfs_journal *j = SB_JOURNAL(inode->i_sb);
+ spin_lock(&j->j_dirty_buffers_lock);
+ list_del_init(&bh->b_assoc_buffers);
+ spin_unlock(&j->j_dirty_buffers_lock);
+ }
clear_buffer_mapped(bh) ;
clear_buffer_req(bh) ;
clear_buffer_new(bh);
@@ -180,6 +190,9 @@ unmap_buffers(struct page *page, loff_t
}
bh = next ;
} while (bh != head) ;
+ if ( PAGE_SIZE == bh->b_size ) {
+ ClearPageDirty(page);
+ }
}
}
}
_
On Sun, Jul 06, 2003 at 07:37:22PM -0700, Andrew Morton wrote:
> Nick Piggin <[email protected]> wrote:
> >
> Barry says the problem started with 2.5.73-mm1. There was a reiserfs patch
> added in that kernel.
>
> Does a `patch -R' of this fix it up?
[patch snipped]
Yes, backing that patch out fixes it.
-Barry K. Nathan <[email protected]>
On Monday 07 July 2003 05:30, Barry K. Nathan wrote:
> On Sun, Jul 06, 2003 at 07:37:22PM -0700, Andrew Morton wrote:
> > Nick Piggin <[email protected]> wrote:
> >
> > Barry says the problem started with 2.5.73-mm1. There was a reiserfs
> > patch added in that kernel.
> >
> > Does a `patch -R' of this fix it up?
>
> [patch snipped]
>
> Yes, backing that patch out fixes it.
I had similar problems with my reiserfs root FS. For me, backing out only the
second hunk of the patch fixed it, too. I've attached the patch I used for
that. If anyone sees anything really bad in what I'm doing here, please let me
know, because I'm playing with my root FS... ;-)
Best regards
Thomas Schlichter
On Mon, 2003-07-07 at 11:58, Thomas Schlichter wrote:
> On Monday 07 July 2003 05:30, Barry K. Nathan wrote:
> > On Sun, Jul 06, 2003 at 07:37:22PM -0700, Andrew Morton wrote:
> > > Nick Piggin <[email protected]> wrote:
> > >
> > > Barry says the problem started with 2.5.73-mm1. There was a reiserfs
> > > patch added in that kernel.
> > >
> > > Does a `patch -R' of this fix it up?
> >
> > [patch snipped]
> >
> > Yes, backing that patch out fixes it.
>
> I had similar problems with my reiserfs root FS. For me, backing out only the
> second hunk of the patch fixed it, too. I've attached the patch I used for
> that. If anyone sees anything really bad in what I'm doing here, please let me
> know, because I'm playing with my root FS... ;-)
> diff -u linux-2.5.74-mm2/fs/reiserfs/tail_conversion.c.orig linux-2.5.74-mm2/fs/reiserfs/tail_conversion.c
> --- linux-2.5.74-mm2/fs/reiserfs/tail_conversion.c.orig 2003-06-23 09:26:10.000000000 -0700
> +++ linux-2.5.74-mm2/fs/reiserfs/tail_conversion.c 2003-06-23 09:26:10.000000000 -0700
> @@ -190,9 +190,6 @@ unmap_buffers(struct page *page, loff_t
> }
> bh = next ;
> } while (bh != head) ;
> - if ( PAGE_SIZE == bh->b_size ) {
> - ClearPageDirty(page);
> - }
> }
> }
> }
Heh, you read my mind. It makes more sense for this hunk to be causing
problems than the first one. Still, we should be allowed to clear the
dirty bit since we've cleaned all the buffers on the page.
-chris
Chris Mason <[email protected]> wrote:
>
> > diff -u linux-2.5.74-mm2/fs/reiserfs/tail_conversion.c.orig linux-2.5.74-mm2/fs/reiserfs/tail_conversion.c
> > --- linux-2.5.74-mm2/fs/reiserfs/tail_conversion.c.orig 2003-06-23 09:26:10.000000000 -0700
> > +++ linux-2.5.74-mm2/fs/reiserfs/tail_conversion.c 2003-06-23 09:26:10.000000000 -0700
> > @@ -190,9 +190,6 @@ unmap_buffers(struct page *page, loff_t
> > }
> > bh = next ;
> > } while (bh != head) ;
> > - if ( PAGE_SIZE == bh->b_size ) {
> > - ClearPageDirty(page);
> > - }
> > }
> > }
> > }
>
> Heh, you read my mind. It makes more sense for this hunk to be causing
> problems than the first one. Still, we should be allowed to clear the
> dirty bit since we've cleaned all the buffers on the page.
But we need to tell the VFS that the page was cleaned.
Could someone please make that clear_page_dirty() and retest?
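[Editorial sketch, not part of the original message.] The distinction being
drawn above can be illustrated with a small toy model. It assumes that the
difference between the two calls is that ClearPageDirty() only clears the
page's dirty flag, while clear_page_dirty() also updates the dirty-page
bookkeeping kept outside the page. Every name below (toy_page, toy_nr_dirty,
and the toy_* functions) is hypothetical and is not the kernel's code:

```c
#include <stdbool.h>

/* Toy stand-in for a page; NOT the kernel's struct page. */
struct toy_page {
    bool dirty;
};

/* Toy global count of dirty pages (hypothetical bookkeeping). */
static long toy_nr_dirty;

/* Mark a page dirty and account for it exactly once. */
static void toy_set_page_dirty(struct toy_page *p)
{
    if (!p->dirty) {
        p->dirty = true;
        toy_nr_dirty++;
    }
}

/* Flag-only clear: the counter is left untouched, so it goes stale. */
static void toy_clear_flag_only(struct toy_page *p)
{
    p->dirty = false;
}

/* Accounted clear: the flag and the counter are updated together. */
static void toy_clear_page_dirty(struct toy_page *p)
{
    if (p->dirty) {
        p->dirty = false;
        toy_nr_dirty--;
    }
}

/* Dirty one page, clean it with the given function, and return the
 * dirty count that is left behind. */
static long toy_leftover_after(void (*clear)(struct toy_page *))
{
    struct toy_page p = { .dirty = false };

    toy_nr_dirty = 0;
    toy_set_page_dirty(&p);
    clear(&p);
    return toy_nr_dirty;
}
```

In this model, toy_leftover_after(toy_clear_flag_only) leaves one page counted
as dirty even though no page is dirty any more, while
toy_leftover_after(toy_clear_page_dirty) leaves none. If the real accounting
behaves similarly, anything that waits for the dirty count to drop could stall,
which would be consistent with the rpm process stuck in the D state, but that
link is an inference and not something this thread confirms.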
On Mon, Jul 07, 2003 at 12:18:59PM -0700, Andrew Morton wrote:
> But we need to tell the VFS that the page was cleaned.
>
> Could someone please make that clear_page_dirty() and retest?
Ok, I just did that -- indeed, that appears to fix it. Beneath my
e-mail signature is the fix, turned into a patch.
-Barry K. Nathan <[email protected]>
--- 2.5.74-bk2/fs/reiserfs/tail_conversion.c 2003-07-03 01:13:37.000000000 -0700
+++ 2.5.74-bk2-iserv/fs/reiserfs/tail_conversion.c 2003-07-07 16:36:01.000000000 -0700
@@ -191,7 +191,7 @@
bh = next ;
} while (bh != head) ;
if ( PAGE_SIZE == bh->b_size ) {
- ClearPageDirty(page);
+ clear_page_dirty(page);
}
}
}