2005-09-22 19:59:23

by Valdis Klētnieks

Subject: 2.6.14-rc2-mm1 - ext3 wedging up

Am seeing reproducible wedging up when writing large (20M+) files to an ext3
file system. Oddly enough, if something *else* writes files to the file system
as well, it will unwedge for a while and make progress. Also, a 'sync' command
will relieve things temporarily - but after a few megabytes it comes to a halt
again. Looks like a borkage someplace not causing it to actually finish
pushing dirty file pages out - gkrellm reports little/no disk activity in
progress. File activity on *other* filesystems continues unimpeded.

A representative sample sysrq-t output (doing an rpm2cpio | cpio -ivdm on the
FC4 kernel.src.rpm, lsof reports the file being extracted was
linux-2.6.13.tar.bz2).

[17187066.172000] cpio D C3A2EC8C 1928 9299 9144 9298 (NOTLB)
[17187066.172000] c3a2eca4 00000000 c011d897 c3a2ec8c cab666a0 cab66560 a830ef00 003d0f8b
[17187066.172000] 365c0400 00000000 00000282 c3a2ecac 001b7450 c3a2ece0 c3a2ecd0 c036e5bf
[17187066.172000] cb4c1f64 c04f43a8 001b7450 4b87ad6e c011e35a cab66560 c04f4120 00000019
[17187066.172000] Call Trace:
[17187066.172000] [<c036e5bf>] schedule_timeout+0x72/0x90
[17187066.172000] [<c036e532>] io_schedule_timeout+0xe/0x16
[17187066.172000] [<c02627bd>] blk_congestion_wait+0x53/0x68
[17187066.172000] [<c0139ec8>] balance_dirty_pages+0xe8/0x142
[17187066.172000] [<c0139fcf>] task_balance_dirty_pages+0xad/0xb6
[17187066.172000] [<c0139fe4>] balance_dirty_pages_ratelimited+0xc/0x92
[17187066.172000] [<c0136b2b>] generic_file_buffered_write+0x427/0x50f
[17187066.172000] [<c0136fb0>] __generic_file_aio_write_nolock+0x39d/0x3da
[17187066.172000] [<c01371e7>] generic_file_aio_write+0x62/0xb0
[17187066.172000] [<c01895ad>] ext3_file_write+0x1a/0x88
[17187066.172000] [<c014e9fc>] do_sync_write+0xb1/0xe6
[17187066.172000] [<c014eade>] vfs_write+0xad/0x156
[17187066.172000] [<c014ec22>] sys_write+0x3b/0x60
[17187066.172000] [<c01026b1>] syscall_call+0x7/0xb

/proc/meminfo says:
MemTotal: 255140 kB
MemFree: 11048 kB
Buffers: 17084 kB
Cached: 43020 kB
SwapCached: 23244 kB
Active: 200156 kB
Inactive: 16128 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 255140 kB
LowFree: 11048 kB
SwapTotal: 1052216 kB
SwapFree: 940176 kB
Dirty: 60 kB
Writeback: 4 kB
Mapped: 185012 kB
Slab: 19288 kB
CommitLimit: 1179784 kB
Committed_AS: 415788 kB
PageTables: 1368 kB
VmallocTotal: 777940 kB
VmallocUsed: 28268 kB
VmallocChunk: 747728 kB
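
To put the numbers above in perspective: Dirty and Writeback together are a few dozen kB out of roughly 249 MiB of RAM, so the writer is being throttled while almost nothing is left to flush. Below is a minimal Python sketch of that arithmetic. It assumes only the "Key: value kB" layout shown above; parse_meminfo and SAMPLE are names made up for this illustration, and SAMPLE copies just three of the fields.

```python
def parse_meminfo(text):
    """Parse '/proc/meminfo'-style 'Key: value kB' lines into a dict of kB ints."""
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a 'Key: value' line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


# Three fields copied from the report above (illustrative subset).
SAMPLE = """\
MemTotal: 255140 kB
Dirty: 60 kB
Writeback: 4 kB
"""

info = parse_meminfo(SAMPLE)
pending_kb = info["Dirty"] + info["Writeback"]
pending_pct = 100.0 * pending_kb / info["MemTotal"]
print(f"{pending_kb} kB dirty or under writeback ({pending_pct:.3f}% of RAM)")
```

On a live system the same function could be fed the contents of /proc/meminfo instead of SAMPLE.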

Here I kept entering 'sync' in another window - each time, I'd see an immediate
read/write flurry on gkrellm for 1-3 seconds, and then nothing until the next
sync - then it would start moving again.

[~]2 l /usr/src/valdis/kern/linux-2.6.13.tar.bz2
1244 -rw------- 1 valdis valdis 1263104 Sep 22 15:48 /usr/src/valdis/kern/linux-2.6.13.tar.bz2
[~]2 sync
[~]2 l /usr/src/valdis/kern/linux-2.6.13.tar.bz2
7920 -rw------- 1 valdis valdis 8092672 Sep 22 15:51 /usr/src/valdis/kern/linux-2.6.13.tar.bz2
[~]2 sync
[~]2 l /usr/src/valdis/kern/linux-2.6.13.tar.bz2
9464 -rw------- 1 valdis valdis 9669120 Sep 22 15:52 /usr/src/valdis/kern/linux-2.6.13.tar.bz2
[~]2 sync
[~]2 l /usr/src/valdis/kern/linux-2.6.13.tar.bz2
11516 -rw------- 1 valdis valdis 11770880 Sep 22 15:52 /usr/src/valdis/kern/linux-2.6.13.tar.bz2
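
The listings above give the file size after each sync, so the progress made per burst can be read off by subtraction. A small Python sketch of that arithmetic, using only the byte counts from the listings (the variable names are illustrative):

```python
# File sizes in bytes, copied from the four listings above.
sizes = [1263104, 8092672, 9669120, 11770880]

# Bytes gained between consecutive listings, i.e. after each 'sync'.
deltas = [after - before for before, after in zip(sizes, sizes[1:])]

for delta in deltas:
    print(f"{delta / 2**20:.2f} MiB written before the next stall")
```

Each sync buys between about 1.5 and 6.5 MiB of progress, which is consistent with the "few megabytes" described at the top of this message.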



2005-09-23 00:36:24

by Con Kolivas

Subject: Re: 2.6.14-rc2-mm1 - ext3 wedging up

On Fri, 23 Sep 2005 05:59, Valdis Klētnieks wrote:
> Am seeing reproducible wedging up when writing large (20M+) files to an
> ext3 file system. Oddly enough, if something *else* writes files to the
> file system as well, it will unwedge for a while and make progress. Also,
> a 'sync' command will relieve things temporarily - but after a few
> megabytes it comes to a halt again. Looks like a borkage someplace not
> causing it to actually finish pushing dirty file pages out - gkrellm
> reports little/no disk activity in progress. File activity on *other*
> filesystems continues unimpeded.


Could be the write throttling patches.

Try backing these out (in this order I think):

http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.14-rc2/2.6.14-rc2-mm1/broken-out/per-task-predictive-write-throttling-1-tweaks.patch
http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.14-rc2/2.6.14-rc2-mm1/broken-out/per-task-predictive-write-throttling-1.patch

Cheers,
Con

2005-09-23 07:20:58

by Valdis Klētnieks

Subject: Re: 2.6.14-rc2-mm1 - ext3 wedging up

On Fri, 23 Sep 2005 10:36:16 +1000, Con Kolivas said:

(Adding Andrea to the To: list...)

> On Fri, 23 Sep 2005 05:59, Valdis Klētnieks wrote:
> > Am seeing reproducible wedging up when writing large (20M+) files to an
> > ext3 file system. Oddly enough, if something *else* writes files to the
> > file system as well, it will unwedge for a while and make progress. Also,
> > a 'sync' command will relieve things temporarily - but after a few
> > megabytes it comes to a halt again. Looks like a borkage someplace not
> > causing it to actually finish pushing dirty file pages out - gkrellm
> > reports little/no disk activity in progress. File activity on *other*
> > filesystems continues unimpeded.
>
>
> Could be the write throttling patches.
>
> Try backing these out (in this order I think):
>
> http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.14-rc2/2.6.14-rc2-mm1/broken-out/per-task-predictive-write-throttling-1-tweaks.patch
> http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.14-rc2/2.6.14-rc2-mm1/broken-out/per-task-predictive-write-throttling-1.patch

Bingo. I haven't built a kernel with these excluded, but writing 0 to
/proc/sys/vm/dirty_ratio_centisecs fixes the problem, so I'm pretty sure
this is it.

(For the record, I've noticed the starvation issue that Andrea is trying to
address, where one process can lock out others, so I *do* think work is needed
here...)

Tuning/debugging info I gathered:

1) I was seeing 'future_pages' values averaging as high as 7K to 14K - is this
a "reasonable" number when writing a 38M file to a fairly sluggish laptop disk
(gkrellm showing 1M/sec to 3M/sec often, maybe 10M/sec if it's a single fairly
linear write..). Popular values that were seen on multiple trials included
14364, 13851, and 13338 (though a few times, a wedge would hit 14364, and a few
seconds later would "burp" up a bunch of disk I/O and drop to 7626 and stay there).

The reproducibility of the numbers probably says more about the fact that the
system is otherwise basically idle at 3AM than anything else (so the number of
available pages isn't bouncing around due to other processes).

2) 'centiseconds' value of 0 disabled as designed. The default value of '500'
is *waaay* too high on my laptop - even 100 is consistently too much. 40 was
consistently low enough, 50 was usually OK, 75 was usually *not* OK, but would
sometimes "stutter" through and not completely grind to a halt. I'm not sure
exactly where the "knee" is, or if it moves during higher-load (my laptop
works harder during the day in the office than 2AM at home, usually).

The patch includes documentation:

+dirty_ratio_centisecs
+-----------------
+
+Throttle the I/O if the per-task writing bandwidth is high enough for
+the dirty_ratio to be reached in less than dirty_ratio_centisecs. This
+makes the write throttling per-process and avoids making too much
+memory dirty at the same time. Ideally in the future we should add
+some feedback from the backing_dev_info to know the max disk bandwidth.

I'm pretty convinced that for this patch to work, it *will* need feedback from
the actual (not max) disk bandwidth and possibly the actual amount of RAM -
what works on Andrea's 1G workstation with (presumably) a real disk system
is waay too much for 256M and a single laptop-class disk.
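
One way to read the documentation text quoted above is as a simple prediction: at the task's current write rate, would the dirty limit be reached within dirty_ratio_centisecs? The Python sketch below models only that reading. It is not taken from the patch; the function name, its parameters and all of the numbers are assumptions for illustration, and how the patch actually measures per-task bandwidth is not shown in this thread.

```python
def should_throttle(dirty_limit_pages, dirty_pages, task_pages_per_sec, centisecs):
    """Return True if, at the task's current write rate, the dirty limit
    would be reached in less than `centisecs` hundredths of a second."""
    if centisecs == 0:
        return False  # 0 disables the heuristic, as reported above
    headroom = max(dirty_limit_pages - dirty_pages, 0)
    predicted = task_pages_per_sec * centisecs / 100.0  # pages dirtied in the window
    return predicted > headroom


# Hypothetical figures: about 3 MB/s in 4 kB pages, and a made-up dirty limit.
rate = 768
limit = 2000
for cs in (500, 40, 0):
    print(cs, should_throttle(limit, 0, rate, cs))
```

In this model the headroom scales with the dirty limit, and therefore with RAM, while the window is a fixed time. That is one way to see why a setting that behaves well on a 1G machine could throttle far too eagerly on a 256M one.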

For now, I'm leaving centisecs set to 40, and will see how that works - most
of my "problem cases" involve an FTP on a 10/100mbit connection, so that will
get tried tomorrow.....



2005-09-23 08:45:50

by Andrea Arcangeli

Subject: Re: 2.6.14-rc2-mm1 - ext3 wedging up

On Fri, Sep 23, 2005 at 03:20:33AM -0400, Valdis Klētnieks wrote:
> On Fri, 23 Sep 2005 10:36:16 +1000, Con Kolivas said:
>
> (Adding Andrea to the To: list...)
>
> > On Fri, 23 Sep 2005 05:59, Valdis Klētnieks wrote:
> > > Am seeing reproducible wedging up when writing large (20M+) files to an
> > > ext3 file system. Oddly enough, if something *else* writes files to the
> > > file system as well, it will unwedge for a while and make progress. Also,

So you get a total hang? I guess there's a bug somewhere...

> I'm pretty convinced that for this patch to work, it *will* need feedback from
> the actual (not max) disk bandwidth and possibly the actual amount of RAM -
> what works on Andrea's 1G workstation with (presumably) a real disk system
> is waay too much for 256M and a single laptop-class disk.

That's not the problem here if you get a total hang. This heuristic should
only reduce the amount of dirty memory, it should never grind a task
to a total hang, until some other task writes to the filesystem.

The sysrq shows the task sleeping in blk_congestion_wait.

> For now, I'm leaving centisecs set to 40, and will see how that works - most
> of my "problem cases" involve an FTP on a 10/100mbit connection, so that will
> get tried tomorrow.....

You should leave it at 0 until I find the buglet that hangs the system.

I'll have a look.

One other thing to change is to call balance_dirty only when the dirty
bit is toggled (so overwrites of dirty cache are not accounted, since
they generate no additional I/O on disk).
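
The accounting change described above can be illustrated with a toy model: only a clean-to-dirty transition counts, and rewriting a page that is already dirty counts for nothing. The Python sketch below is an illustration of that idea only, not kernel code; count_newly_dirtied and the page numbers are made up.

```python
def count_newly_dirtied(page_writes):
    """Count writes that flip a page from clean to dirty.

    page_writes is a sequence of page indexes in the order they are written;
    overwrites of an already-dirty page add no new I/O and are not counted.
    """
    dirty = set()
    newly_dirtied = 0
    for page in page_writes:
        if page not in dirty:
            dirty.add(page)
            newly_dirtied += 1
    return newly_dirtied


writes = [0, 1, 1, 0, 2]  # five writes touching three distinct pages
print(count_newly_dirtied(writes))
```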

Thanks.

2005-09-23 09:45:50

by Con Kolivas

Subject: Re: 2.6.14-rc2-mm1 - ext3 wedging up

On Fri, 23 Sep 2005 17:20, Valdis Klētnieks wrote:
> On Fri, 23 Sep 2005 10:36:16 +1000, Con Kolivas said:
>
> (Adding Andrea to the To: list...)
>
> > On Fri, 23 Sep 2005 05:59, Valdis Klētnieks wrote:
> > > Am seeing reproducible wedging up when writing large (20M+) files to an
> > > ext3 file system. Oddly enough, if something *else* writes files to
> > > the file system as well, it will unwedge for a while and make progress.
> > > Also, a 'sync' command will relieve things temporarily - but after a
> > > few megabytes it comes to a halt again. Looks like a borkage someplace
> > > not causing it to actually finish pushing dirty file pages out -
> > > gkrellm reports little/no disk activity in progress. File activity on
> > > *other* filesystems continues unimpeded.
> >
> > Could be the write throttling patches.
> >
> > Try backing these out (in this order I think):
> >
> > http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.14-rc2/2.6.14-rc2-mm1/broken-out/per-task-predictive-write-throttling-1-tweaks.patch
> > http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.14-rc2/2.6.14-rc2-mm1/broken-out/per-task-predictive-write-throttling-1.patch
>
> Bingo. I haven't built a kernel with these excluded, but writing 0 to
> /proc/sys/vm/dirty_ratio_centisecs fixes the problem, so I'm pretty sure
> this is it.
>
> (For the record, I've noticed the starvation issue that Andrea is trying to
> address, where one process can lock out others, so I *do* think work is
> needed here...)

I don't disagree, which is why I was excited by this work as well. Like all
things in the kernel it always ends up being more complicated than the
original plan, requiring reworking. So I do not remotely see this as a
problem at this early stage.

Cheers,
Con

2005-09-23 14:24:30

by Dave Kleikamp

Subject: Re: 2.6.14-rc2-mm1 - ext3 wedging up

On Fri, 2005-09-23 at 10:45 +0200, Andrea Arcangeli wrote:
> On Fri, Sep 23, 2005 at 03:20:33AM -0400, Valdis Klētnieks wrote:
> > On Fri, 23 Sep 2005 10:36:16 +1000, Con Kolivas said:
> >
> > (Adding Andrea to the To: list...)
> >
> > > On Fri, 23 Sep 2005 05:59, Valdis Klētnieks wrote:
> > > > Am seeing reproducible wedging up when writing large (20M+) files to an
> > > > ext3 file system. Oddly enough, if something *else* writes files to the
> > > > file system as well, it will unwedge for a while and make progress. Also,
>
> So you get a total hang? I guess there's a bug somewhere...

I get a similar hang running fsx on a jfs file system.
"echo 0 > /proc/sys/vm/dirty_ratio_centisecs" fixes it as well.

Thanks,
Shaggy
--
David Kleikamp
IBM Linux Technology Center

2005-09-23 15:32:25

by Andrea Arcangeli

Subject: Re: 2.6.14-rc2-mm1 - ext3 wedging up

Hello,

Can you try this updated patch? I believe the blk_congestion_wait is
just wrong there, since there may be just one page being flushed. That
sounds like a longstanding bug except it normally wouldn't trigger
because the dirty levels never go down near zero during heavy writes.

http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.14-rc1/per-task-predictive-write-throttling-3

2005-09-23 19:11:59

by Valdis Klētnieks

Subject: Re: 2.6.14-rc2-mm1 - ext3 wedging up

On Fri, 23 Sep 2005 17:31:58 +0200, Andrea Arcangeli said:
> Hello,
>
> Can you try this updated patch? I believe the blk_congestion_wait is
> just wrong there, since there may be just one page being flushed. That
> sounds like a longstanding bug except it normally wouldn't trigger
> because the dirty levels never go down near zero during heavy writes.
>
> http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.14-rc1/per-task-predictive-write-throttling-3

Will do, although it may be Sunday night or Monday morning before I can report
back - just got handed a few higher-priority tasks...



2005-09-23 20:57:34

by Dave Kleikamp

Subject: Re: 2.6.14-rc2-mm1 - ext3 wedging up

On Fri, 2005-09-23 at 17:31 +0200, Andrea Arcangeli wrote:
> Hello,
>
> Can you try this updated patch? I believe the blk_congestion_wait is
> just wrong there, since there may be just one page being flushed. That
> sounds like a longstanding bug except it normally wouldn't trigger
> because the dirty levels never go down near zero during heavy writes.

fsx is now stuck in a loop somewhere, using 100% cpu.

> http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.14-rc1/per-task-predictive-write-throttling-3

--
David Kleikamp
IBM Linux Technology Center

2005-09-23 20:59:18

by Dave Kleikamp

Subject: Re: 2.6.14-rc2-mm1 - ext3 wedging up

On Fri, 2005-09-23 at 15:57 -0500, Dave Kleikamp wrote:
> On Fri, 2005-09-23 at 17:31 +0200, Andrea Arcangeli wrote:
> > Hello,
> >
> > Can you try this updated patch? I believe the blk_congestion_wait is
> > just wrong there, since there may be just one page being flushed. That
> > sounds like a longstanding bug except it normally wouldn't trigger
> > because the dirty levels never go down near zero during heavy writes.
>
> fsx is now stuck in a loop somewhere, using 100% cpu.

I hit send a little early. It eventually responded to a ^C. I'll try
to get some more info.

--
David Kleikamp
IBM Linux Technology Center

2005-09-23 21:46:25

by Dave Kleikamp

Subject: Re: 2.6.14-rc2-mm1 - ext3 wedging up

On Fri, 2005-09-23 at 15:59 -0500, Dave Kleikamp wrote:
> On Fri, 2005-09-23 at 15:57 -0500, Dave Kleikamp wrote:
> > On Fri, 2005-09-23 at 17:31 +0200, Andrea Arcangeli wrote:
> > > Hello,
> > >
> > > Can you try this updated patch? I believe the blk_congestion_wait is
> > > just wrong there, since there may be just one page being flushed. That
> > > sounds like a longstanding bug except it normally wouldn't trigger
> > > because the dirty levels never go down near zero during heavy writes.
> >
> > fsx is now stuck in a loop somewhere, using 100% cpu.
>
> I hit send a little early. It eventually responded to a ^C. I'll try
> to get some more info.

I'd guess that it's spinning in balance_dirty_pages.
/proc/<pid>/future_dirty is 25650 for fsx. It appears that
nr_reclaimable is not going to zero for some reason.

--
David Kleikamp
IBM Linux Technology Center

2005-09-26 08:15:16

by Andrea Arcangeli

Subject: Re: 2.6.14-rc2-mm1 - ext3 wedging up

On Fri, Sep 23, 2005 at 04:46:19PM -0500, Dave Kleikamp wrote:
> On Fri, 2005-09-23 at 15:59 -0500, Dave Kleikamp wrote:
> > On Fri, 2005-09-23 at 15:57 -0500, Dave Kleikamp wrote:
> > > On Fri, 2005-09-23 at 17:31 +0200, Andrea Arcangeli wrote:
> > > > Hello,
> > > >
> > > > Can you try this updated patch? I believe the blk_congestion_wait is
> > > > just wrong there, since there may be just one page being flushed. That
> > > > sounds like a longstanding bug except it normally wouldn't trigger
> > > > because the dirty levels never go down near zero during heavy writes.
> > >
> > > fsx is now stuck in a loop somewhere, using 100% cpu.
> >
> > I hit send a little early. It eventually responded to a ^C. I'll try
> > to get some more info.
>
> I'd guess that it's spinning in balance_dirty_pages.
> /proc/<pid>/future_dirty is 25650 for fsx. It appears that

Ok the good news is that this isn't a bug in the basic algorithm, but
just in the implementation of it.

> nr_reclaimable is not going to zero for some reason.

Exactly, the !nr_reclaimable check is what I thought would have
prevented an infinite loop from triggering...

Unfortunately I couldn't reproduce it on my laptop, even though I was
working from the laptop all of last week (I even did a presentation with
this patch applied ;). I'll try to reproduce it with fsx now.

Thanks for the help!

2005-09-28 22:38:42

by Andrea Arcangeli

Subject: Re: 2.6.14-rc2-mm1 - ext3 wedging up

On Fri, Sep 23, 2005 at 04:46:19PM -0500, Dave Kleikamp wrote:
> On Fri, 2005-09-23 at 15:59 -0500, Dave Kleikamp wrote:
> I'd guess that it's spinning in balance_dirty_pages.
> /proc/<pid>/future_dirty is 25650 for fsx. It appears that
> nr_reclaimable is not going to zero for some reason.

Even if nr_reclaimable isn't going to zero, eventually the loop should
break out because pages_written must increase.

So this makes me think it might be the nr_unstable that destabilizes it,
and whatever it is, it is a bug in mainline as well, except it was well
hidden until now, because the dirty levels never approached zero during
heavy write-IO like it can happen with this feature enabled.

Basically whatever we account as "reclaimable" must be _written_out_ and
accounted as well in the "pages_written" otherwise it'll just hang.
If there's a problem, it shall be a longstanding one.
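
The argument above can be made concrete with a toy model of a throttling loop that has the two exits mentioned in this thread: nothing reclaimable is left, or enough pages have been written. The Python sketch below is a deliberate simplification and not the kernel's balance_dirty_pages; the function name, the write_chunk parameter and the iteration cap are assumptions added for illustration.

```python
def balance_dirty_pages_model(reclaimable, writable, write_chunk, max_iters=1000):
    """Toy throttling loop.

    reclaimable: pages counted as reclaimable
    writable:    the subset of those pages that writeback can really write
    write_chunk: pages the caller must see written before it may continue

    Returns the iteration on which the loop exits, or None if it is still
    spinning after max_iters passes (a stand-in for "hangs").
    """
    pages_written = 0
    for iteration in range(1, max_iters + 1):
        if reclaimable == 0:
            return iteration  # exit 1: nothing left to reclaim
        written_now = min(writable, write_chunk - pages_written)
        writable -= written_now
        reclaimable -= written_now
        pages_written += written_now
        if pages_written >= write_chunk:
            return iteration  # exit 2: wrote our share
        # the real code would sleep here, e.g. in blk_congestion_wait()
    return None


healthy = balance_dirty_pages_model(reclaimable=100, writable=100, write_chunk=32)
stuck = balance_dirty_pages_model(reclaimable=100, writable=0, write_chunk=32)
print(healthy, stuck)
```

When every reclaimable page can be written, the loop leaves on its first pass. When pages are counted as reclaimable but writeback never reaches them, neither exit condition can become true and the model spins until the cap.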

Can you try this new patch, which stops accounting "unstable" as
"reclaimable"? It should be possible to flush the dirty pages to disk, so
"nr_dirty" should be safe because they should always increase
"pages_written". I'm not sure if this fixes it, but it at least rules
nfs out of the equation (perhaps nfs pages are never accounted as
"pages_written", which would be a possible explanation for the infinite
loop).

This new update also makes sure to never account rewrites (except for
reiserfs where it's more difficult to change the code for this).

I tried with fsx (no params) but I couldn't reproduce any problem yet,
but I've no nfs workload involved in my test box.

http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.14-rc1/per-task-predictive-write-throttling-4

thanks for the help!

2005-10-01 00:27:08

by Dave Kleikamp

Subject: Re: 2.6.14-rc2-mm1 - ext3 wedging up

On Thu, 2005-09-29 at 00:38 +0200, Andrea Arcangeli wrote:
> On Fri, Sep 23, 2005 at 04:46:19PM -0500, Dave Kleikamp wrote:
> > On Fri, 2005-09-23 at 15:59 -0500, Dave Kleikamp wrote:
> > I'd guess that it's spinning in balance_dirty_pages.
> > /proc/<pid>/future_dirty is 25650 for fsx. It appears that
> > nr_reclaimable is not going to zero for some reason.

I tracked down my problem to a bug in jfs. jfs is explicitly setting
I_DIRTY in the i_state of a special inode, which prevents that inode from
being put on the s_dirty list. This must have been something I did a
long time ago when I was a newbie. I'm embarrassed I hadn't noticed it
until now.

I still had the problem even with your latest patch, but it's fixed with
this patch to jfs. I haven't yet tried the jfs patch with the earlier
versions of your patch to see if there is really a problem with them.

I don't have anything to say about the original problem reported on
ext3, since I only saw the problem on jfs.

> Even if nr_reclaimable isn't going to zero, eventually the loop should
> break out because pages_written must increase.
>
> So this makes me think it might be the nr_unstable that destabilizes it,
> and whatever it is, it is a bug in mainline as well, except it was well
> hidden until now, because the dirty levels never approached zero during
> heavy write-IO like it can happen with this feature enabled.
>
> Basically whatever we account as "reclaimable" must be _written_out_ and
> accounted as well in the "pages_written" otherwise it'll just hang.
> If there's a problem, it shall be a longstanding one.

Yep. My bad.

> Can you try this new patch, which stops accounting "unstable" as
> "reclaimable"? It should be possible to flush the dirty pages to disk, so
> "nr_dirty" should be safe because they should always increase
> "pages_written". I'm not sure if this fixes it, but it at least rules
> nfs out of the equation (perhaps nfs pages are never accounted as
> "pages_written", which would be a possible explanation for the infinite
> loop).
>
> This new update also makes sure to never account rewrites (except for
> reiserfs where it's more difficult to change the code for this).
>
> I tried with fsx (no params) but I couldn't reproduce any problem yet,
> but I've no nfs workload involved in my test box.
>
> http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.14-rc1/per-task-predictive-write-throttling-4
>
> thanks for the help!

JFS: jfs should not be playing with i_state

jfs has been explicitly setting i_state |= I_DIRTY on a special inode.
This prevented it from being put on the s_dirty list. Very stupid.

Signed-off-by: Dave Kleikamp <[email protected]>

diff --git a/fs/jfs/jfs_dmap.c b/fs/jfs/jfs_dmap.c
--- a/fs/jfs/jfs_dmap.c
+++ b/fs/jfs/jfs_dmap.c
@@ -305,7 +305,6 @@ int dbSync(struct inode *ipbmap)
 	filemap_fdatawrite(ipbmap->i_mapping);
 	filemap_fdatawait(ipbmap->i_mapping);
 
-	ipbmap->i_state |= I_DIRTY;
 	diWriteSpecial(ipbmap, 0);
 
 	return (0);
diff --git a/fs/jfs/jfs_imap.c b/fs/jfs/jfs_imap.c
--- a/fs/jfs/jfs_imap.c
+++ b/fs/jfs/jfs_imap.c
@@ -514,8 +514,6 @@ void diWriteSpecial(struct inode *ip, in
 	ino_t inum = ip->i_ino;
 	struct metapage *mp;
 
-	ip->i_state &= ~I_DIRTY;
-
 	if (secondary)
 		address = addressPXD(&sbi->ait2) >> sbi->l2nbperpage;
 	else
diff --git a/fs/jfs/jfs_txnmgr.c b/fs/jfs/jfs_txnmgr.c
--- a/fs/jfs/jfs_txnmgr.c
+++ b/fs/jfs/jfs_txnmgr.c
@@ -2396,7 +2396,6 @@ static void txUpdateMap(struct tblock *
 	 */
 	if (tblk->xflag & COMMIT_CREATE) {
 		diUpdatePMap(ipimap, tblk->ino, FALSE, tblk);
-		ipimap->i_state |= I_DIRTY;
 		/* update persistent block allocation map
 		 * for the allocation of inode extent;
 		 */
@@ -2407,7 +2406,6 @@ static void txUpdateMap(struct tblock *
 	} else if (tblk->xflag & COMMIT_DELETE) {
 		ip = tblk->u.ip;
 		diUpdatePMap(ipimap, ip->i_ino, TRUE, tblk);
-		ipimap->i_state |= I_DIRTY;
 		iput(ip);
 	}
 }

--
David Kleikamp
IBM Linux Technology Center

2005-10-02 10:27:35

by Andrea Arcangeli

Subject: Re: 2.6.14-rc2-mm1 - ext3 wedging up

On Fri, Sep 30, 2005 at 07:27:04PM -0500, Dave Kleikamp wrote:
> I tracked down my problem to a bug in jfs. jfs is explicitly setting

Ok great, this explains things, so perhaps my last hack attempt of not
accounting the unstable pages in the "nr_reclaimable" isn't needed.

What about Valdis, were you using jfs too along with ext3? If a single
fs has a bug the loop can happen (it could happen in mainline too,
except it was less likely to be visible there).

Note Valdis, your smtp server bounces back my emails.

2005-10-02 10:32:57

by Andrea Arcangeli

Subject: Re: 2.6.14-rc2-mm1 - ext3 wedging up

On Sun, Oct 02, 2005 at 12:27:26PM +0200, Andrea Arcangeli wrote:
> Note Valdis, your smtp server bounces back my emails.

here we go again:

<[email protected]>: host smtp.vt.edu[198.82.161.8] said: 550 This domain
is blacklisted,consult your postmaster (in reply to MAIL FROM command)

If you blacklist 0.0.0.0/0 as well you won't risk getting any more spam ;)

2005-10-02 13:51:20

by Dave Kleikamp

Subject: Re: 2.6.14-rc2-mm1 - ext3 wedging up

On Sun, 2005-10-02 at 12:27 +0200, Andrea Arcangeli wrote:
> On Fri, Sep 30, 2005 at 07:27:04PM -0500, Dave Kleikamp wrote:
> > I tracked down my problem to a bug in jfs. jfs is explicitly setting
>
> Ok great, this explains things, so perhaps my last hack attempt of not
> accounting the unstable pages in the "nr_reclaimable" isn't needed.

Maybe it is. I just retested the fixed jfs on 2.6.14-rc2-mm1, and I
still see the hang. I can probably debug it further on Monday if
necessary.

> What about Valdis, were you using jfs too along with ext3? If a single
> fs has a bug the loop can happen (it could happen in mainline too,
> except it was less likely to be visible there).
>
> Note Valdis, your smtp server bounces back my emails.
>
--
David Kleikamp
IBM Linux Technology Center

2005-10-03 01:05:27

by Valdis Klētnieks

Subject: Re: 2.6.14-rc2-mm1 - ext3 wedging up

On Sun, 02 Oct 2005 12:27:26 +0200, Andrea Arcangeli said:

> Ok great, this explains things, so perhaps my last hack attempt of not
> accounting the unstable pages in the "nr_reclaimable" isn't needed.
>
> What about Valdis, were you using jfs too along with ext3? If a single
> fs has a bug the loop can happen (it could happen in mainline too,
> except it was less likely to be visible there).

% zgrep -i jfs /proc/config.gz
# CONFIG_JFS_FS is not set

Sorry, this is an ext3-based system, no JFS here.

Another (possibly unimportant) data point: I was seeing it with 256M
of RAM, but after a recent upgrade to 768M, I'm not seeing it. Probably
need to reboot with mem=256 to replicate now...



2005-10-03 18:06:17

by Dave Kleikamp

Subject: Re: 2.6.14-rc2-mm1 - ext3 wedging up

On Sun, 2005-10-02 at 08:51 -0500, Dave Kleikamp wrote:
> On Sun, 2005-10-02 at 12:27 +0200, Andrea Arcangeli wrote:
> > On Fri, Sep 30, 2005 at 07:27:04PM -0500, Dave Kleikamp wrote:
> > > I tracked down my problem to a bug in jfs. jfs is explicitly setting
> >
> > Ok great, this explains things, so perhaps my last hack attempt of not
> > accounting the unstable pages in the "nr_reclaimable" isn't needed.
>
> Maybe it is. I just retested the fixed jfs on 2.6.14-rc2-mm1, and I
> still see the hang. I can probably debug it further on Monday if
> necessary.

I finally figured out what the problem was with jfs. There are really
three things I ended up fixing, but the most important was that the
reserved inodes that jfs uses for metadata were not in the inode hash.
__mark_inode_dirty() fails to add the inode to the superblock's dirty
list if hlist_unhashed() is true. Without being on the dirty list, the
inode is not even looked at by writeback_inodes().
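
The failure mode described above can be shown with a small user-space model: an inode that is not in the hash gets marked dirty but never lands on the superblock's dirty list, so writeback never visits it. The Python sketch below is an illustration only. The class and method names echo the kernel identifiers mentioned above, but the code is a made-up model, not the kernel's implementation.

```python
class Inode:
    def __init__(self, name, hashed):
        self.name = name
        self.hashed = hashed  # stand-in for "is in the inode hash"
        self.dirty = False


class SuperBlock:
    def __init__(self):
        self.s_dirty = []  # inodes queued for writeback

    def mark_inode_dirty(self, inode):
        inode.dirty = True
        # Models the behaviour described above: unhashed inodes are
        # flagged dirty but never queued on the dirty list.
        if inode.hashed and inode not in self.s_dirty:
            self.s_dirty.append(inode)

    def writeback_inodes(self):
        written = []
        for inode in self.s_dirty:
            inode.dirty = False
            written.append(inode.name)
        self.s_dirty.clear()
        return written


sb = SuperBlock()
regular = Inode("regular file", hashed=True)
special = Inode("metadata inode", hashed=False)
sb.mark_inode_dirty(regular)
sb.mark_inode_dirty(special)
written = sb.writeback_inodes()
print(written, special.dirty)
```

After writeback the regular inode is clean, while the unhashed one stays dirty indefinitely. In the toy throttling picture discussed earlier in the thread, that corresponds to dirty state that is never written out.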

The other problems are that jfs explicitly sets I_DIRTY (I already
reported that one) and that metadata_writepage may repeatedly redirty an
inode that is waiting on journal I/O without initiating the journal I/O.

> > What about Valdis, were you using jfs too along with ext3? If a single
> > fs has a bug the loop can happen (it could happen in mainline too,
> > except it was less likely to be visible there).

Unfortunately, this doesn't solve Valdis' problem, as he isn't using
jfs. Valdis, do you have any other file systems mounted besides ext3?
I wonder if another file system has a similar problem.
--
David Kleikamp
IBM Linux Technology Center


Attachments:
jfs-i_hash.patch (3.79 kB)

2005-10-03 18:31:41

by Valdis Klētnieks

Subject: Re: 2.6.14-rc2-mm1 - ext3 wedging up

On Mon, 03 Oct 2005 13:06:08 CDT, Dave Kleikamp said:

> Unfortunately, this doesn't solve Valdis' problem, as he isn't using
> jfs. Valdis, do you have any other file systems mounted besides ext3?
> I wonder if another file system has a similar problem.

Nope, all ext3 here..



2005-10-10 17:15:57

by Andrea Arcangeli

Subject: Re: 2.6.14-rc2-mm1 - ext3 wedging up

Hello,

So what's the status of this? Dave can you still reproduce hangs with
your last jfs fixes applied?

Valdis, did you test my last patch (you'll find it in my ftp area) that
removes the unstable pages from the equation? All ext3 as local fs is ok,
but do you use nfs for the networked fs? If not, can you post a way to
reproduce the hang? Is it enough to boot with mem=256M?

Thanks.

2005-10-10 17:21:24

by Dave Kleikamp

Subject: Re: 2.6.14-rc2-mm1 - ext3 wedging up

On Mon, 2005-10-10 at 19:15 +0200, Andrea Arcangeli wrote:
> Hello,
>
> So what's the status of this? Dave can you still reproduce hangs with
> your last jfs fixes applied?

With my latest jfs patch, I was unable to reproduce the hang on an
unmodified 2.6.14-rc2-mm1 kernel.

> Valdis, did you test my last patch (you'll find it in my ftp area) that
> removes the unstable pages from the equation? All ext3 as local fs is ok,
> but do you use nfs for the networked fs? If not, can you post a way to
> reproduce the hang? Is it enough to boot with mem=256M?
>
> Thanks.
>
--
David Kleikamp
IBM Linux Technology Center