2001-07-16 11:30:03

by Gianluca Anzolin

Subject: 2.4.7-pre6 can't complete e2fsck

I've upgraded to 2.4.7-pre6aa1 and I'm seeing strange behaviour:

e2fsck /dev/hda3 never finishes, and I can't even stop the process with
CTRL+C. Alt+SysRq still works, and it tells me that the number of inactive
dirty pages increases while the active and free pages decrease.

Alt+SysRq+P says the kernel loops mainly in page_launder.

Is there a patch to solve this problem?

Gianluca Anzolin


2001-07-16 17:06:59

by Andrea Arcangeli

Subject: Re: 2.4.7-pre6 can't complete e2fsck

On Mon, Jul 16, 2001 at 01:29:33PM +0200, Gianluca Anzolin wrote:
> I've upgraded to 2.4.7-pre6aa1 and I'm seeing strange behaviour:
>
> e2fsck /dev/hda3 never finishes, and I can't even stop the process with
> CTRL+C. Alt+SysRq still works, and it tells me that the number of inactive
> dirty pages increases while the active and free pages decrease.
>
> Alt+SysRq+P says the kernel loops mainly in page_launder.
>
> Is there a patch to solve this problem?

The problem will go away if you back out this patch:

ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.7pre6aa1/40_blkdev-pagecache-5

I can reproduce it, so it will be fixed in the next release. Thanks for the
feedback.

Andrea

2001-07-16 18:28:51

by Andrea Arcangeli

Subject: Re: 2.4.7-pre6 can't complete e2fsck

On Mon, Jul 16, 2001 at 07:06:53PM +0200, Andrea Arcangeli wrote:
> I can reproduce so it will be fixed in the next release. thanks for the

Ok, it was because I developed the blkdev-pagecache and
00_drop_async-io-get_bh-1 patches in two separate trees.

When both patches passed all the regression testing I merged both
into 2.4.7pre6aa1 but unfortunately no reject reminded me I had to drop
the get_bh from the async handler used by the blkdev pagecache (sorry!).

So in short this incremental patch on top of 2.4.7pre6aa1 will fix your
problem (at least it did for mine):

--- 2.4.7pre6aa1/fs/block_dev.c.~1~ Mon Jul 16 19:16:44 2001
+++ 2.4.7pre6aa1/fs/block_dev.c Mon Jul 16 20:15:51 2001
@@ -105,7 +105,6 @@
 	do {
 		lock_buffer(bh);
 		set_buffer_async_io(bh);
-		atomic_inc(&bh->b_count);
 		set_bit(BH_Uptodate, &bh->b_state);
 		clear_bit(BH_Dirty, &bh->b_state);
 		bh = bh->b_this_page;
@@ -189,7 +188,6 @@
 		struct buffer_head * bh = arr[i];
 		lock_buffer(bh);
 		set_buffer_async_io(bh);
-		atomic_inc(&bh->b_count);
 	}

 	/* Stage 3: start the IO */


I guess I will keep the above patch separate from the blkdev patch to
ensure I won't forget about it (and also because, if somebody finds a
reason why dropping the 00_drop_async-io-get_bh-1 patch could be a good
thing in the long run, I won't need to rediff the blkdev patch).
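
For readers following the thread, here is a minimal user-space sketch in C of
why a leftover atomic_inc(&bh->b_count) could leave buffers unreclaimable. It
is not kernel code. struct toy_bh and every toy_* function are hypothetical
names made up for illustration, and the model assumes (based only on the patch
name) that after 00_drop_async-io-get_bh-1 the async completion handler no
longer drops a reference:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a buffer head; NOT the kernel's struct buffer_head. */
struct toy_bh {
    int  count;   /* models b_count */
    bool locked;  /* models the buffer lock held during I/O */
};

/* Submit path. take_ref models the atomic_inc() removed by the patch. */
static void toy_submit(struct toy_bh *bh, bool take_ref)
{
    bh->locked = true;
    if (take_ref)
        bh->count++;
}

/*
 * Completion handler. ASSUMPTION: it only unlocks the buffer and does
 * not decrement the count, as the drop-get_bh patch name suggests.
 */
static void toy_end_io(struct toy_bh *bh)
{
    assert(bh->locked);
    bh->locked = false;
}

/* A reclaimer may free the buffer only if it is unlocked and unreferenced. */
static bool toy_can_free(const struct toy_bh *bh)
{
    return !bh->locked && bh->count == 0;
}

/* Run one submit/complete cycle and report whether the buffer is freeable. */
static bool toy_io_cycle_reclaimable(bool take_ref)
{
    struct toy_bh bh = { 0, false };

    toy_submit(&bh, take_ref);
    toy_end_io(&bh);
    return toy_can_free(&bh);
}
```

In this toy model toy_io_cycle_reclaimable(true) leaves the count at 1, so a
reclaimer that keeps retrying such buffers never makes progress, which would
be consistent with the page_launder loop reported above. With take_ref false
the buffer becomes freeable once the I/O completes.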

Andrea

2001-07-17 23:24:33

by Kurt Garloff

Subject: Re: 2.4.7-pre6 can't complete e2fsck

On Mon, Jul 16, 2001 at 08:28:25PM +0200, Andrea Arcangeli wrote:
> On Mon, Jul 16, 2001 at 07:06:53PM +0200, Andrea Arcangeli wrote:
> > I can reproduce so it will be fixed in the next release. thanks for the
>
> Ok, it was because I developed the blkdev-pagecache and
> 00_drop_async-io-get_bh-1 patches in two separate trees.
>
> When both patches passed all the regression testing I merged both
> into 2.4.7pre6aa1 but unfortunately no reject reminded me I had to drop
> the get_bh from the async handler used by the blkdev pagecache (sorry!).
>
> So in short this incremental patch on top of 2.4.7pre6aa1 will fix your
> problem (at least it did for mine):

Works for me. (Before, I could trigger the bug just by running hdparm -tT a
couple of times.) A couple of machines, including my SMP iron here, are now
running stably (that is, for a day at most so far).

Regards,
--
Kurt Garloff <[email protected]> Eindhoven, NL
GPG key: See mail header, key servers Linux kernel development
SuSE GmbH, Nuernberg, FRG SCSI, Security

