2007-10-28 15:24:39

by Florin Iucha

Subject: pdflush stuck in D state with v2.6.24-rc1-192-gef49c32

Hello,

For the past week or two I have noticed that, some time after I log
in, my keyboard input becomes a bit sluggish: there is a small delay
between a keypress and the character appearing in the terminal. This
is on an AMD Athlon X2 4200+ with 2 GB RAM and just a gnome-terminal
open. The machine is as idle as possible, as monitored via the system
monitor applet. I could not get any hard data on it until now.

After I logged off from GNOME, I switched to the text console and ran
top, with the option of showing one CPU stats line for each CPU. Lo
and behold, one core is 100% idle, and the other one is 25% idle and
75% waiting. Periodically, a pdflush process in 'D' state rises to
the top. I did an 'echo t > /proc/sysrq-trigger' and this is what
it says for the two pdflush processes:

[ 3687.824424] pdflush S ffff8100057ffef8 0 247 2
[ 3687.824427] ffff8100057ffed0 0000000000000046 ffff8100057ffe70 ffffffff8022a96c
[ 3687.824431] ffff8100057fc000 ffff810003040770 ffff8100057fc208 0000000000000297
[ 3687.824434] ffff8100057ffe90 ffff810002c1ba10 ffff8100057ffed0 ffffffff8022b9d2
[ 3687.824438] Call Trace:
[ 3687.824440] [<ffffffff8022a96c>] enqueue_task_fair+0x21/0x34
[ 3687.824444] [<ffffffff8022b9d2>] set_user_nice+0x110/0x12c
[ 3687.824448] [<ffffffff80267165>] pdflush+0x0/0x1c3
[ 3687.824451] [<ffffffff80267234>] pdflush+0xcf/0x1c3
[ 3687.824455] [<ffffffff80245876>] kthread+0x49/0x77
[ 3687.824458] [<ffffffff8020c598>] child_rip+0xa/0x12
[ 3687.824463] [<ffffffff8024582d>] kthread+0x0/0x77
[ 3687.824466] [<ffffffff8020c58e>] child_rip+0x0/0x12
[ 3687.824468]
[ 3687.824470] pdflush D ffffffff805787c0 0 248 2
[ 3687.824473] ffff810006001d90 0000000000000046 0000000000000000 0000000000000286
[ 3687.824476] ffff8100057fc770 ffff810003062000 ffff8100057fc978 0000000106001da0
[ 3687.824480] 0000000000000003 ffffffff8023b1b2 0000000000000000 0000000000000000
[ 3687.824483] Call Trace:
[ 3687.824488] [<ffffffff8023b1b2>] __mod_timer+0xb8/0xca
[ 3687.824492] [<ffffffff8055c87a>] schedule_timeout+0x8d/0xb4
[ 3687.824496] [<ffffffff8023ad6c>] process_timeout+0x0/0xb
[ 3687.824499] [<ffffffff8055c79a>] io_schedule_timeout+0x28/0x33
[ 3687.824503] [<ffffffff8026bb24>] congestion_wait+0x6b/0x87
[ 3687.824506] [<ffffffff80245983>] autoremove_wake_function+0x0/0x38
[ 3687.824510] [<ffffffff8029e684>] writeback_inodes+0xcd/0xd5
[ 3687.824514] [<ffffffff80266dc4>] wb_kupdate+0xbb/0x10d
[ 3687.824518] [<ffffffff80267165>] pdflush+0x0/0x1c3
[ 3687.824520] [<ffffffff8026727d>] pdflush+0x118/0x1c3
[ 3687.824523] [<ffffffff80266d09>] wb_kupdate+0x0/0x10d
[ 3687.824527] [<ffffffff80245876>] kthread+0x49/0x77
[ 3687.824530] [<ffffffff8020c598>] child_rip+0xa/0x12
[ 3687.824535] [<ffffffff8024582d>] kthread+0x0/0x77
[ 3687.824538] [<ffffffff8020c58e>] child_rip+0x0/0x12
[ 3687.824540]
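The per-CPU "waiting" figure that top reports above is iowait time. As a minimal sketch of how such a figure can be derived, assuming the standard Linux /proc/stat layout (cumulative jiffies on each cpuN line in the order user, nice, system, idle, iowait, irq, softirq, steal), the Python below samples the file twice and reports each CPU's iowait share. The function names are made up for illustration, and this is an approximation, not top's own code.

```python
import os
import time

# Assumption: Linux /proc/stat layout, where each "cpuN" line holds
# cumulative jiffies as: user nice system idle iowait irq softirq steal ...
IOWAIT_FIELD = 4    # zero-based index of iowait among the counters
COUNTED_FIELDS = 8  # user..steal only; later guest fields are left out


def parse_cpu_lines(text):
    """Map 'cpu0', 'cpu1', ... to their cumulative jiffy counters."""
    cpus = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the aggregate "cpu" line and unrelated lines such as "intr".
        if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
            cpus[fields[0]] = [int(value) for value in fields[1:]]
    return cpus


def iowait_percent(before, after):
    """Share of elapsed jiffies each CPU spent waiting for I/O."""
    result = {}
    for cpu, old in before.items():
        deltas = [new - prev for new, prev in zip(after[cpu], old)]
        total = sum(deltas[:COUNTED_FIELDS])
        result[cpu] = 100.0 * deltas[IOWAIT_FIELD] / total if total else 0.0
    return result


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    with open("/proc/stat") as stat:
        first = parse_cpu_lines(stat.read())
    time.sleep(1.0)
    with open("/proc/stat") as stat:
        second = parse_cpu_lines(stat.read())
    for cpu, pct in sorted(iowait_percent(first, second).items()):
        print(f"{cpu}: {pct:.1f}% iowait")
```

Run as a script on a Linux host, it prints one iowait line per CPU over a one-second window.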

What could cause this? I use NFS4 to automount the home directories
from a Solaris 10 server, and this box has uncovered a few bugs in the
NFS4 code before (fixed in the 2.6.22 kernel).

I'll try running with 2.6.23 again for a few days, to see whether
pdflush gets stuck there too. Any other ideas?

florin

--
Bruce Schneier expects the Spanish Inquisition.
http://geekz.co.uk/schneierfacts/fact/163

2007-10-29 13:44:34

by Trond Myklebust

Subject: Re: pdflush stuck in D state with v2.6.24-rc1-192-gef49c32


On Sun, 2007-10-28 at 10:24 -0500, Florin Iucha wrote:
[...]
>
> What could cause this? I use NFS4 to automount the home directories
> from a Solaris10 server, and this box found a few bugs in the NFS4
> code (fixed in the 2.6.22 kernel).
>
> I'll try running with 2.6.23 again for a few days, to see if I get the
> pdflush stuck. Any other ideas?

One of them appears to be waiting for i/o congestion to clear up. If the
filesystem is NFS, then that means that some other thread is busy
writing data out to the server. You'll need to look at the rest of the
thread dump to figure out which thread is writing the data out, and
where it is getting stuck.
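Trond's suggestion amounts to filtering the SysRq-T dump for other tasks stuck in D state and reading their call traces. A rough Python sketch of that triage follows, assuming the dmesg format shown earlier in this thread (a "[timestamp] comm state ... pid ppid" header line followed by "[<address>] symbol+off/len" frames). The regular expressions and helper names are hypothetical and may need adjusting for other kernel versions.

```python
import re

# Assumed header layout after the timestamp:
#   "comm state <pc or 'running task'> ... pid ppid"
STAMP = re.compile(r"^\[\s*\d+\.\d+\]\s?")
HEADER = re.compile(
    r"^(?P<comm>\S+)\s+(?P<state>[RSDTtZX])\s+\S+.*?\s(?P<pid>\d+)\s+(?P<ppid>\d+)\s*$"
)
FRAME = re.compile(r"\[<[0-9a-f]+>\]\s+(?P<sym>\S+)")


def parse_task_dump(text):
    """Split a SysRq-T dump into tasks, each with its list of frame symbols."""
    tasks = []
    current = None
    for raw in text.splitlines():
        line = STAMP.sub("", raw, count=1).strip()
        header = HEADER.match(line)
        if header:
            current = {
                "comm": header["comm"],
                "state": header["state"],
                "pid": int(header["pid"]),
                "frames": [],
            }
            tasks.append(current)
            continue
        frame = FRAME.search(line)
        if frame and current is not None:
            # Drop the "+offset/length" suffix, keep the function name.
            current["frames"].append(frame["sym"].split("+")[0])
    return tasks


def blocked_tasks(tasks):
    """Tasks in uninterruptible sleep (state 'D')."""
    return [task for task in tasks if task["state"] == "D"]


def tasks_in(tasks, symbol):
    """Tasks whose trace mentions a function, e.g. 'fuse_dev_read'."""
    return [task for task in tasks if any(symbol in f for f in task["frames"])]


# Trimmed excerpt of the dump quoted in this thread.
SAMPLE = """\
[ 3687.824424] pdflush S ffff8100057ffef8 0 247 2
[ 3687.824438] Call Trace:
[ 3687.824451] [<ffffffff80267234>] pdflush+0xcf/0x1c3
[ 3687.824470] pdflush D ffffffff805787c0 0 248 2
[ 3687.824473] ffff810006001d90 0000000000000046 0000000000000000 0000000000000286
[ 3687.824483] Call Trace:
[ 3687.824503] [<ffffffff8026bb24>] congestion_wait+0x6b/0x87
[ 3687.824510] [<ffffffff8029e684>] writeback_inodes+0xcd/0xd5
"""

if __name__ == "__main__":
    for task in blocked_tasks(parse_task_dump(SAMPLE)):
        print(task["comm"], task["pid"], " <- ".join(task["frames"]))
```

Pointed at a full dump such as the dmesg linked later in the thread, blocked_tasks() narrows the search to sleepers in D state, and tasks_in() finds tasks sitting in a particular function.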

Trond

2007-10-29 15:02:14

by Florin Iucha

Subject: Re: pdflush stuck in D state with v2.6.24-rc1-192-gef49c32

On Mon, Oct 29, 2007 at 09:46:59AM -0400, Trond Myklebust wrote:
> > What could cause this? I use NFS4 to automount the home directories
> > from a Solaris10 server, and this box found a few bugs in the NFS4
> > code (fixed in the 2.6.22 kernel).
> >
> > I'll try running with 2.6.23 again for a few days, to see if I get the
> > pdflush stuck. Any other ideas?
>
> One of them appears to be waiting for i/o congestion to clear up. If the
> filesystem is NFS, then that means that some other thread is busy
> writing data out to the server. You'll need to look at the rest of the
> thread dump to figure out which thread is writing the data out, and
> where it is getting stuck.

Trond,

The full dmesg is at http://iucha.net/2.6.24-rc1/dmesg.stuck_pdflush.gz

Cheers,
florin


2007-10-29 18:43:44

by Trond Myklebust

Subject: Re: pdflush stuck in D state with v2.6.24-rc1-192-gef49c32


On Mon, 2007-10-29 at 10:01 -0500, Florin Iucha wrote:
> On Mon, Oct 29, 2007 at 09:46:59AM -0400, Trond Myklebust wrote:
> > > What could cause this? I use NFS4 to automount the home directories
> > > from a Solaris10 server, and this box found a few bugs in the NFS4
> > > code (fixed in the 2.6.22 kernel).
> > >
> > > I'll try running with 2.6.23 again for a few days, to see if I get the
> > > pdflush stuck. Any other ideas?
> >
> > One of them appears to be waiting for i/o congestion to clear up. If the
> > filesystem is NFS, then that means that some other thread is busy
> > writing data out to the server. You'll need to look at the rest of the
> > thread dump to figure out which thread is writing the data out, and
> > where it is getting stuck.
>
> Trond,
>
> The full dmesg is at http://iucha.net/2.6.24-rc1/dmesg.stuck_pdflush.gz
>
> Cheers,
> florin

I can't see any evidence of NFS traffic at all in those traces, but
there is a fuse process that is sleeping in :fuse:fuse_dev_read(). Could
that perhaps be relevant?

Trond

2007-10-29 18:48:19

by Florin Iucha

Subject: Re: pdflush stuck in D state with v2.6.24-rc1-192-gef49c32

On Mon, Oct 29, 2007 at 02:43:32PM -0400, Trond Myklebust wrote:
>
> On Mon, 2007-10-29 at 10:01 -0500, Florin Iucha wrote:
> > On Mon, Oct 29, 2007 at 09:46:59AM -0400, Trond Myklebust wrote:
> > > > What could cause this? I use NFS4 to automount the home directories
> > > > from a Solaris10 server, and this box found a few bugs in the NFS4
> > > > code (fixed in the 2.6.22 kernel).
> > > >
> > > > I'll try running with 2.6.23 again for a few days, to see if I get the
> > > > pdflush stuck. Any other ideas?
> > >
> > > One of them appears to be waiting for i/o congestion to clear up. If the
> > > filesystem is NFS, then that means that some other thread is busy
> > > writing data out to the server. You'll need to look at the rest of the
> > > thread dump to figure out which thread is writing the data out, and
> > > where it is getting stuck.
> >
> > Trond,
> >
> > The full dmesg is at http://iucha.net/2.6.24-rc1/dmesg.stuck_pdflush.gz
>
> I can't see any evidence of NFS traffic at all in those traces, but
> there is a fuse process that is sleeping in :fuse:fuse_dev_read(). Could
> that perhaps be relevant.

That might be the overzealous Ubuntu trying to make the NTFS partition
available. I will try to disable it and see if I can reproduce the
hang. BTW: with 2.6.24-rc1+ it happens after a couple of hours. With
2.6.23 it had not happened after 6 hours or so.

Cheers,
florin


2007-10-30 07:54:19

by Wu Fengguang

Subject: Re: pdflush stuck in D state with v2.6.24-rc1-192-gef49c32

On Sun, Oct 28, 2007 at 10:24:29AM -0500, Florin Iucha wrote:
[...]
> [ 3687.824468]
> [ 3687.824470] pdflush D ffffffff805787c0 0 248 2
> [ 3687.824473] ffff810006001d90 0000000000000046 0000000000000000 0000000000000286
> [ 3687.824476] ffff8100057fc770 ffff810003062000 ffff8100057fc978 0000000106001da0
> [ 3687.824480] 0000000000000003 ffffffff8023b1b2 0000000000000000 0000000000000000
> [ 3687.824483] Call Trace:
> [ 3687.824488] [<ffffffff8023b1b2>] __mod_timer+0xb8/0xca
> [ 3687.824492] [<ffffffff8055c87a>] schedule_timeout+0x8d/0xb4
> [ 3687.824496] [<ffffffff8023ad6c>] process_timeout+0x0/0xb
> [ 3687.824499] [<ffffffff8055c79a>] io_schedule_timeout+0x28/0x33
> [ 3687.824503] [<ffffffff8026bb24>] congestion_wait+0x6b/0x87
> [ 3687.824506] [<ffffffff80245983>] autoremove_wake_function+0x0/0x38
> [ 3687.824510] [<ffffffff8029e684>] writeback_inodes+0xcd/0xd5
> [ 3687.824514] [<ffffffff80266dc4>] wb_kupdate+0xbb/0x10d
> [ 3687.824518] [<ffffffff80267165>] pdflush+0x0/0x1c3
> [ 3687.824520] [<ffffffff8026727d>] pdflush+0x118/0x1c3
> [ 3687.824523] [<ffffffff80266d09>] wb_kupdate+0x0/0x10d
> [ 3687.824527] [<ffffffff80245876>] kthread+0x49/0x77
> [ 3687.824530] [<ffffffff8020c598>] child_rip+0xa/0x12
> [ 3687.824535] [<ffffffff8024582d>] kthread+0x0/0x77
> [ 3687.824538] [<ffffffff8020c58e>] child_rip+0x0/0x12
> [ 3687.824540]
>
> What could cause this? I use NFS4 to automount the home directories
> from a Solaris10 server, and this box found a few bugs in the NFS4
> code (fixed in the 2.6.22 kernel).
>
> I'll try running with 2.6.23 again for a few days, to see if I get the
> pdflush stuck. Any other ideas?

It could be triggered by the more aggressive writeback behavior - the
new code will keep on retrying as long as there are dirty inodes pending.
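To illustrate why retrying "as long as there are dirty inodes pending" can keep pdflush busy, here is a toy Python model. It is not the kernel's writeback code: the Inode class, the writable flag and the retry_while_dirty switch are all hypothetical. An inode that the filesystem never manages to clean stays on the requeue list, so a loop that only exits when nothing is pending keeps going until an external bound stops it.

```python
from dataclasses import dataclass


@dataclass
class Inode:
    ino: int
    dirty: bool = True
    # Hypothetical flag: False models an inode whose dirty state the
    # filesystem never clears, so writeback keeps requeueing it.
    writable: bool = True


def writeback_pass(inodes):
    """One sweep over the dirty list; return the inodes that stay dirty."""
    requeued = []
    for inode in inodes:
        if inode.dirty and inode.writable:
            inode.dirty = False          # written out and cleaned
        elif inode.dirty:
            requeued.append(inode)       # still dirty: requeue for a retry
    return requeued


def kupdate_passes(inodes, retry_while_dirty, max_passes=1000):
    """Count sweeps made by one periodic writeback run in this toy model."""
    passes = 0
    while passes < max_passes:
        passes += 1
        pending = writeback_pass(inodes)
        if not pending or not retry_while_dirty:
            break
        # The kernel waits between retries (the D-state trace above shows
        # congestion_wait); the model simply loops so the bound is visible.
    return passes


if __name__ == "__main__":
    def make_inodes():
        # Odd-numbered inodes can never be cleaned in this model.
        return [Inode(ino, writable=(ino % 2 == 0)) for ino in range(4)]

    print("give up after one pass:", kupdate_passes(make_inodes(), False))
    print("retry while dirty:", kupdate_passes(make_inodes(), True))
```

In the model, the one-pass variant leaves the stuck inodes for the next periodic run, while the retrying variant only stops when max_passes is reached.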

Florin, would you try the attached patches against 2.6.24-git?
They may generate big traffic of printk messages, but will help
debug the problem.

Thank you,
Fengguang


Attachments:
writeback-debug.patch (1.88 kB)
requeue_io-debug.patch (1.11 kB)

2007-10-30 11:43:00

by Florin Iucha

Subject: Re: pdflush stuck in D state with v2.6.24-rc1-192-gef49c32

On Tue, Oct 30, 2007 at 03:54:03PM +0800, Fengguang Wu wrote:
> On Sun, Oct 28, 2007 at 10:24:29AM -0500, Florin Iucha wrote:
> [...]
> >
> > What could cause this? I use NFS4 to automount the home directories
> > from a Solaris10 server, and this box found a few bugs in the NFS4
> > code (fixed in the 2.6.22 kernel).
> >
> > I'll try running with 2.6.23 again for a few days, to see if I get the
> > pdflush stuck. Any other ideas?
>
> It could be triggered by the more aggressive writeback behavior - the
> new code will keep on retrying as long as there are dirty inodes pending.
>
> Florin, would you try the attached patches against 2.6.24-git?
> They may generate big traffic of printk messages, but will help
> debug the problem.

I have updated to v2.6.24-rc1-334-g82798a1. After using my computer
for two hours, I left it idle overnight. This morning, pdflush is
again consuming 25% of a CPU. I will try Fengguang's patches today.

florin


2007-10-30 11:49:55

by Wu Fengguang

Subject: Re: pdflush stuck in D state with v2.6.24-rc1-192-gef49c32

On Tue, Oct 30, 2007 at 06:42:50AM -0500, Florin Iucha wrote:
> On Tue, Oct 30, 2007 at 03:54:03PM +0800, Fengguang Wu wrote:
> > It could be triggered by the more aggressive writeback behavior - the
> > new code will keep on retrying as long as there are dirty inodes pending.
> >
> > Florin, would you try the attached patches against 2.6.24-git?
> > They may generate big traffic of printk messages, but will help
> > debug the problem.
>
> I have updated to v2.6.24-rc1-334-g82798a1. After using my computer
> for two hours, I left the computer idle overnight. This morning,
> pdflushd is again consuming 25% of a CPU. I will try Fengguang's
> patches today.

Thank you.

Be sure to stop klogd ;-)
A `dmesg` will be sufficient for debugging.

btw, I cannot reproduce it with simple NFSv3 tests.

Fengguang

2007-10-30 11:55:18

by Florin Iucha

Subject: Re: pdflush stuck in D state with v2.6.24-rc1-192-gef49c32

On Tue, Oct 30, 2007 at 07:49:41PM +0800, Fengguang Wu wrote:
> On Tue, Oct 30, 2007 at 06:42:50AM -0500, Florin Iucha wrote:
> > On Tue, Oct 30, 2007 at 03:54:03PM +0800, Fengguang Wu wrote:
> > > It could be triggered by the more aggressive writeback behavior - the
> > > new code will keep on retrying as long as there are dirty inodes pending.
> > >
> > > Florin, would you try the attached patches against 2.6.24-git?
> > > They may generate big traffic of printk messages, but will help
> > > debug the problem.
> >
> > I have updated to v2.6.24-rc1-334-g82798a1. After using my computer
> > for two hours, I left the computer idle overnight. This morning,
> > pdflushd is again consuming 25% of a CPU. I will try Fengguang's
> > patches today.
>
> Thank you.
>
> Be sure to stop klogd ;-)
> A `dmesg` will be sufficient for debugging.
>
> btw, I cannot reproduce it with simple NFSv3 tests.

This is using NFSv4. And being idle in a GNOME session with home
mounted from the server is hardly a stressful experience. But what do
I know ;)

florin


2007-10-31 00:02:52

by Florin Iucha

Subject: Re: pdflush stuck in D state with v2.6.24-rc1-192-gef49c32

On Tue, Oct 30, 2007 at 07:49:41PM +0800, Fengguang Wu wrote:
> > > It could be triggered by the more aggressive writeback behavior - the
> > > new code will keep on retrying as long as there are dirty inodes pending.
> > >
> > > Florin, would you try the attached patches against 2.6.24-git?
> > > They may generate big traffic of printk messages, but will help
> > > debug the problem.
> >
> > I have updated to v2.6.24-rc1-334-g82798a1. After using my computer
> > for two hours, I left the computer idle overnight. This morning,
> > pdflushd is again consuming 25% of a CPU. I will try Fengguang's
> > patches today.
>
> Thank you.
>
> Be sure to stop klogd ;-)
> A `dmesg` will be sufficient for debugging.

I applied the patches and started a Linux kernel compilation, and
something really interesting happens. I run the build with the
equivalent of "make -j3" and watch it with 'top' in a separate
console. The build consumes 98% of both CPUs. If I pause the output
in the build console with Ctrl-S, one core goes idle while the other
sits at 50% iowait, then 75% iowait. When I resume the build with
Ctrl-Q, it uses both CPUs at 98-99% again. NFS4 use was minimal, as I
did not log in to GNOME but only logged in on the console. Also, which
CPU sits at 75% iowait changes occasionally. 'top' shows pdflush in D
state, using 5-6% of CPU.

Cheers,
florin


2007-10-31 03:52:54

by Florin Iucha

Subject: Re: pdflush stuck in D state with v2.6.24-rc1-192-gef49c32

On Tue, Oct 30, 2007 at 07:02:42PM -0500, Florin Iucha wrote:
> I have added the patches and started a linux kernel compilation, and
> something really interesting happens. I run the build with the
> equivalent of "make -j3" and in a separate console I am watching the
> build with 'top'. The build consumes 98% of both CPUs. If I stop the
> output in the build console with "Ctrl-S", one core goes to idle,
> while the other is in 50% waiting, then goes to 75% waiting. When I
> resume the build with "Ctrl-Q", the build starts to use both CPUs at
> 98-99%. The NFS4 use was minimal, as I did not login with Gnome, but
> just logged on the console. Also, the CPU that is in 75% waiting
> state changes occasionally. 'Top' shows pdflush in D state, using
> 5-6% of CPU.

I forgot the traces:

http://iucha.net/2.6.24-rc1/fw.1.gz
http://iucha.net/2.6.24-rc1/fw.2.gz
http://iucha.net/2.6.24-rc1/fw.3.gz

florin


2007-10-31 06:53:37

by Wu Fengguang

Subject: Re: pdflush stuck in D state with v2.6.24-rc1-192-gef49c32

On Tue, Oct 30, 2007 at 10:52:45PM -0500, Florin Iucha wrote:
> On Tue, Oct 30, 2007 at 07:02:42PM -0500, Florin Iucha wrote:
> > I have added the patches and started a linux kernel compilation, and
> > something really interesting happens. I run the build with the
> > equivalent of "make -j3" and in a separate console I am watching the
> > build with 'top'. The build consumes 98% of both CPUs. If I stop the
> > output in the build console with "Ctrl-S", one core goes to idle,
> > while the other is in 50% waiting, then goes to 75% waiting. When I
> > resume the build with "Ctrl-Q", the build starts to use both CPUs at
> > 98-99%. The NFS4 use was minimal, as I did not login with Gnome, but
> > just logged on the console. Also, the CPU that is in 75% waiting
> > state changes occasionally. 'Top' shows pdflush in D state, using
> > 5-6% of CPU.
>
> I forgot the traces:
>
> http://iucha.net/2.6.24-rc1/fw.1.gz
> http://iucha.net/2.6.24-rc1/fw.2.gz
> http://iucha.net/2.6.24-rc1/fw.3.gz

Sorry for the delay - I've been fixing our server today.

[ 263.685691] mm/page-writeback.c 655 wb_kupdate: pdflush(248) 24235 global 4593 0 0 wc _M tw 1024 sk 0
[ 263.789648] requeue_io 301: inode 4031199 size 562 at 08:07(sda7)
[ 263.789656] requeue_io 301: inode 4031231 size 329 at 08:07(sda7)
[ 263.789660] requeue_io 301: inode 4031255 size 177 at 08:07(sda7)
[ 263.789664] requeue_io 301: inode 4031268 size 94 at 08:07(sda7)
[ 263.789667] requeue_io 301: inode 4031329 size 88 at 08:07(sda7)
[ 263.789671] requeue_io 301: inode 4031351 size 74 at 08:07(sda7)
[ 263.789674] requeue_io 301: inode 4031408 size 175 at 08:07(sda7)
[ 263.789678] requeue_io 301: inode 4031413 size 129 at 08:07(sda7)
[ 263.789681] requeue_io 301: inode 4031415 size 391 at 08:07(sda7)
[ 263.789690] mm/page-writeback.c 655 wb_kupdate: pdflush(248) 24235 global 4593 0 0 wc _M tw 1024 sk 0
[ 263.890184] requeue_io 301: inode 4031199 size 562 at 08:07(sda7)
[ 263.890191] requeue_io 301: inode 4031231 size 329 at 08:07(sda7)
[ 263.890195] requeue_io 301: inode 4031255 size 177 at 08:07(sda7)
[ 263.890198] requeue_io 301: inode 4031268 size 94 at 08:07(sda7)
[ 263.890202] requeue_io 301: inode 4031329 size 88 at 08:07(sda7)
[ 263.890205] requeue_io 301: inode 4031351 size 74 at 08:07(sda7)
[ 263.890208] requeue_io 301: inode 4031408 size 175 at 08:07(sda7)
[ 263.890212] requeue_io 301: inode 4031413 size 129 at 08:07(sda7)
[ 263.890215] requeue_io 301: inode 4031415 size 391 at 08:07(sda7)
[ 263.890223] mm/page-writeback.c 655 wb_kupdate: pdflush(248) 24235 global 4593 0 0 wc _M tw 1024 sk 0

It's about sda7, not NFSv4.
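The conclusion that the problem is on sda7 rather than NFSv4 can be checked mechanically from the debug output. A small Python sketch, assuming only the two line formats printed by the debug patches as quoted above (the regular expressions and the summarize helper are hypothetical), counts requeued inodes per device, checks whether each wb_kupdate pass requeues the same set, and measures the spacing between passes.

```python
import re
from collections import Counter

KUPDATE = re.compile(r"^\[\s*(?P<ts>\d+\.\d+)\]\s+mm/page-writeback\.c \d+ wb_kupdate:")
REQUEUE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+requeue_io \d+: inode (?P<ino>\d+) "
    r"size (?P<size>\d+) at (?P<dev>\S+)"
)


def summarize(log):
    """Summarize requeue_io debug output per device and per wb_kupdate pass."""
    per_device = Counter()
    passes = []   # one set of requeued inode numbers per wb_kupdate line
    starts = []   # timestamp of each wb_kupdate line, in seconds
    for line in log.splitlines():
        if (m := KUPDATE.match(line)):
            starts.append(float(m["ts"]))
            passes.append(set())
        elif (m := REQUEUE.match(line)):
            per_device[m["dev"]] += 1
            # Assumption: a requeue line belongs to the latest wb_kupdate line.
            if passes:
                passes[-1].add(int(m["ino"]))
    non_empty = [p for p in passes if p]
    return {
        "per_device": dict(per_device),
        "same_inodes_each_pass": all(p == non_empty[0] for p in non_empty),
        "intervals_ms": [round((b - a) * 1000) for a, b in zip(starts, starts[1:])],
    }


# A trimmed excerpt of the log above (three of the nine inodes per pass).
SAMPLE = """\
[ 263.685691] mm/page-writeback.c 655 wb_kupdate: pdflush(248) 24235 global 4593 0 0 wc _M tw 1024 sk 0
[ 263.789648] requeue_io 301: inode 4031199 size 562 at 08:07(sda7)
[ 263.789656] requeue_io 301: inode 4031231 size 329 at 08:07(sda7)
[ 263.789660] requeue_io 301: inode 4031255 size 177 at 08:07(sda7)
[ 263.789690] mm/page-writeback.c 655 wb_kupdate: pdflush(248) 24235 global 4593 0 0 wc _M tw 1024 sk 0
[ 263.890184] requeue_io 301: inode 4031199 size 562 at 08:07(sda7)
[ 263.890191] requeue_io 301: inode 4031231 size 329 at 08:07(sda7)
[ 263.890195] requeue_io 301: inode 4031255 size 177 at 08:07(sda7)
[ 263.890223] mm/page-writeback.c 655 wb_kupdate: pdflush(248) 24235 global 4593 0 0 wc _M tw 1024 sk 0
"""

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

On this excerpt every requeued inode is on 08:07(sda7), each pass requeues the same inodes, and passes are about 100 ms apart; running it over the complete fw.*.gz traces would be the fuller check.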

Is it a Reiserfs? We have a fresh fix for it: http://lkml.org/lkml/2007/10/23/93

Thank you,
Fengguang

2007-10-31 12:16:18

by Florin Iucha

Subject: Re: pdflush stuck in D state with v2.6.24-rc1-192-gef49c32

On Wed, Oct 31, 2007 at 02:53:25PM +0800, Fengguang Wu wrote:
> [...]
>
> It's about sda7, not NFSv4.
>
> Is it a Reiserfs? We have a fresh fix for it: http://lkml.org/lkml/2007/10/23/93

Yes, it is reiserfs. Incidentally, it is the partition that holds
the kernel sources and build directory. The linked message states that
the same bug exists in 2.6.23, but I do not see this behavior with
2.6.23. Anyway, I will apply the patch and see what I get.

Thanks,
florin


2007-10-31 17:53:28

by Florin Iucha

Subject: Re: pdflush stuck in D state with v2.6.24-rc1-192-gef49c32

On Wed, Oct 31, 2007 at 07:16:06AM -0500, Florin Iucha wrote:
> [...]
>
> Yes, it is a Reiserfs. Incidentally it is the partition that holds
> the kernel sources and build directory. The message states that the
> same bug exists in 2.6.23 but I do not see the same behavior in
> 2.6.23. Anyway, I will apply the patch and see what I get.

Fengguang,

This patch does not fix the problem for me. Even light use of the
reiserfs filesystem, such as pulling updates into the linux-2.6 git
tree, caused one CPU to go to 75% iowait.

florin


2007-11-01 07:15:46

by Wu Fengguang

Subject: Re: pdflush stuck in D state with v2.6.24-rc1-192-gef49c32

On Wed, Oct 31, 2007 at 12:53:18PM -0500, Florin Iucha wrote:
> On Wed, Oct 31, 2007 at 07:16:06AM -0500, Florin Iucha wrote:
> > On Wed, Oct 31, 2007 at 02:53:25PM +0800, Fengguang Wu wrote:
> > > On Tue, Oct 30, 2007 at 10:52:45PM -0500, Florin Iucha wrote:
> > > > On Tue, Oct 30, 2007 at 07:02:42PM -0500, Florin Iucha wrote:
> > > > > I have added the patches and started a linux kernel compilation, and
> > > > > something really interesting happens. I run the build with the
> > > > > equivalent of "make -j3" and in a separate console I am watching the
> > > > > build with 'top'. The build consumes 98% of both CPUs. If I stop the
> > > > > output in the build console with "Ctrl-S", one core goes to idle,
> > > > > while the other is in 50% waiting, then goes to 75% waiting. When I
> > > > > resume the build with "Ctrl-Q", the build starts to use both CPUs at
> > > > > 98-99%. The NFS4 use was minimal, as I did not login with Gnome, but
> > > > > just logged on the console. Also, the CPU that is in 75% waiting
> > > > > state changes occasionally. 'Top' shows pdflush in D state, using
> > > > > 5-6% of CPU.
> > > >
> > > > I forgot the traces:
> > > >
> > > > http://iucha.net/2.6.24-rc1/fw.1.gz
> > > > http://iucha.net/2.6.24-rc1/fw.2.gz
> > > > http://iucha.net/2.6.24-rc1/fw.3.gz
> > >
> > > Sorry for the delay - I've been fixing our server today.
> > >
> > > [ 263.685691] mm/page-writeback.c 655 wb_kupdate: pdflush(248) 24235 global 4593 0 0 wc _M tw 1024 sk 0
> > > [ 263.789648] requeue_io 301: inode 4031199 size 562 at 08:07(sda7)
> > > [ 263.789656] requeue_io 301: inode 4031231 size 329 at 08:07(sda7)
> > > [ 263.789660] requeue_io 301: inode 4031255 size 177 at 08:07(sda7)
> > > [ 263.789664] requeue_io 301: inode 4031268 size 94 at 08:07(sda7)
> > > [ 263.789667] requeue_io 301: inode 4031329 size 88 at 08:07(sda7)
> > > [ 263.789671] requeue_io 301: inode 4031351 size 74 at 08:07(sda7)
> > > [ 263.789674] requeue_io 301: inode 4031408 size 175 at 08:07(sda7)
> > > [ 263.789678] requeue_io 301: inode 4031413 size 129 at 08:07(sda7)
> > > [ 263.789681] requeue_io 301: inode 4031415 size 391 at 08:07(sda7)
> > > [ 263.789690] mm/page-writeback.c 655 wb_kupdate: pdflush(248) 24235 global 4593 0 0 wc _M tw 1024 sk 0
> > > [ 263.890184] requeue_io 301: inode 4031199 size 562 at 08:07(sda7)
> > > [ 263.890191] requeue_io 301: inode 4031231 size 329 at 08:07(sda7)
> > > [ 263.890195] requeue_io 301: inode 4031255 size 177 at 08:07(sda7)
> > > [ 263.890198] requeue_io 301: inode 4031268 size 94 at 08:07(sda7)
> > > [ 263.890202] requeue_io 301: inode 4031329 size 88 at 08:07(sda7)
> > > [ 263.890205] requeue_io 301: inode 4031351 size 74 at 08:07(sda7)
> > > [ 263.890208] requeue_io 301: inode 4031408 size 175 at 08:07(sda7)
> > > [ 263.890212] requeue_io 301: inode 4031413 size 129 at 08:07(sda7)
> > > [ 263.890215] requeue_io 301: inode 4031415 size 391 at 08:07(sda7)
> > > [ 263.890223] mm/page-writeback.c 655 wb_kupdate: pdflush(248) 24235 global 4593 0 0 wc _M tw 1024 sk 0
> > >
> > > It's about sda7, not NFSv4.
> > >
> > > Is it a Reiserfs? We have a fresh fix for it: http://lkml.org/lkml/2007/10/23/93
> >
> > Yes, it is a Reiserfs. Incidentally it is the partition that holds
> > the kernel sources and build directory. The message states that the
> > same bug exists in 2.6.23 but I do not see the same behavior in
> > 2.6.23. Anyway, I will apply the patch and see what I get.
>
> Fengguang,
>
> This patch does not fix anything for me. Even such light use of the
> reiserfs filesystem as pulling the linux-2.6 git tree updates caused
> one CPU to go to 75% iowait.

Thank you, Florin. Could you provide more details about sda7, such as
the mount options and the output of `reiserfstune /dev/sda7`? I'll try to
reproduce it before asking for your help.

Fengguang

2007-11-01 12:26:16

by Florin Iucha

Subject: Re: pdflush stuck in D state with v2.6.24-rc1-192-gef49c32

On Thu, Nov 01, 2007 at 03:15:32PM +0800, Fengguang Wu wrote:
> On Wed, Oct 31, 2007 at 12:53:18PM -0500, Florin Iucha wrote:
> > This patch does not fix anything for me. Even such light use of the
> > reiserfs filesystem as pulling the linux-2.6 git tree updates caused
> > one CPU to go to 75% iowait.
>
> Thank you, Florin. Could you provide more details about sda7, such as
> the mount option and output of `reiserfstune /dev/sda7`? I'll try to
> reproduce it before asking for your help.

Fengguang,

root@zeus:~# mount | grep sda7
/dev/sda7 on /scratch type reiserfs (rw,noatime)
root@zeus:~# df -h /scratch/
Filesystem Size Used Avail Use% Mounted on
/dev/sda7 38G 32G 5.7G 85% /scratch
root@zeus:~# umount /dev/sda7
root@zeus:~# reiserfstune /dev/sda7
reiserfstune: Journal device has not been specified. Assuming journal is on the main device (/dev/sda7).

Current parameters:

Filesystem state: consistent

/scratch: Reiserfs super block in block 16 on 0x807 of format 3.6 with standard journal
Count of blocks on the device: 9765504
Number of bitmaps: 299
Blocksize: 4096
Free blocks (count of blocks - used [journal, bitmaps, data, reserved] blocks): 1471399
Root block: 2359332
Filesystem is clean
Tree height: 5
Hash function used to sort names: "r5"
Objectid map size 916, max 972
Journal parameters:
Device [0x0]
Magic [0x31037e64]
Size 8193 blocks (including 1 for journal header) (first block 18)
Max transaction length 1024 blocks
Max batch size 900 blocks
Max commit age 30
Blocks reserved by journal: 0
Fs state field: 0x0:
sb_version: 2
inode generation number: 5856766
UUID: a0191e80-be6e-47f6-8fd0-047e2d763a4a
LABEL: /scratch
Set flags in SB:
ATTRIBUTES CLEAN
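The reiserfstune numbers agree with the df output above. Each 4096-byte bitmap block tracks 4096 × 8 = 32768 blocks, so 9765504 blocks need ceil(9765504 / 32768) = 299 bitmaps. The device holds 9765504 × 4096 bytes, about 37.3 GiB, and the 1471399 free blocks come to about 5.6 GiB. GNU df -h rounds these up to 38G and 5.7G. Here is a small sketch of that arithmetic. The helper name `fs_geometry` is hypothetical, and it assumes one bitmap block covers blocksize × 8 blocks.

```shell
# Hypothetical helper: derive the bitmap count and sizes in GiB from the
# reiserfstune fields "Count of blocks", "Blocksize" and "Free blocks".
# Assumes one bitmap block tracks blocksize * 8 blocks.
fs_geometry() {
    awk -v b="$1" -v s="$2" -v f="$3" 'BEGIN {
        per_bitmap = s * 8                                   # blocks per bitmap block
        bitmaps = int((b + per_bitmap - 1) / per_bitmap)     # ceiling division
        gib = 1024 * 1024 * 1024
        printf "bitmaps=%d size=%.1fGiB free=%.1fGiB\n", bitmaps, b * s / gib, f * s / gib
    }'
}
```

Running `fs_geometry 9765504 4096 1471399` on the values above prints `bitmaps=299 size=37.3GiB free=5.6GiB`.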

And for bonus points:
###########
reiserfsck --check started at Thu Nov 1 07:09:56 2007
###########
Replaying journal..
Reiserfs journal '/dev/sda7' in blocks [18..8211]: 0 transactions replayed
Checking internal tree..finished
Comparing bitmaps..finished
Checking Semantic tree:
finished
No corruptions found
There are on the filesystem:
Leaves 247231
Internal nodes 1570
Directories 11330
Other files 722878
Data block pointers 8040675 (3880 of them are zero)
Safe links 0
###########
reiserfsck finished at Thu Nov 1 07:18:43 2007
###########

Cheers,
florin

Attachments:
(No filename) (2.53 kB)
signature.asc (189.00 B)
Digital signature

2007-11-01 13:03:45

by Wu Fengguang

Subject: Re: pdflush stuck in D state with v2.6.24-rc1-192-gef49c32

On Thu, Nov 01, 2007 at 07:25:58AM -0500, Florin Iucha wrote:
> On Thu, Nov 01, 2007 at 03:15:32PM +0800, Fengguang Wu wrote:
> > On Wed, Oct 31, 2007 at 12:53:18PM -0500, Florin Iucha wrote:
> > > This patch does not fix anything for me. Even such light use of the
> > > reiserfs filesystem as pulling the linux-2.6 git tree updates caused
> > > one CPU to go to 75% iowait.
> >
> > Thank you, Florin. Could you provide more details about sda7, such as
> > the mount option and output of `reiserfstune /dev/sda7`? I'll try to
> > reproduce it before asking for your help.
>
> Fengguang,
>
> root@zeus:~# mount | grep sda7
> /dev/sda7 on /scratch type reiserfs (rw,noatime)
> root@zeus:~# df -h /scratch/
> Filesystem Size Used Avail Use% Mounted on
> /dev/sda7 38G 32G 5.7G 85% /scratch
[...]

Thank you. The only difference from my reiserfs seems to be the
'noatime' option, which I tried without seeing any change.

Or could the system or fs size/age make a difference? If you happen
to have a spare/swap partition, could you make a new reiserfs on it,
mount it, copy several files smaller than 4KB into it, wait 30s,
and see what happens to pdflush?
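The recipe above can be scripted roughly as follows. This is a minimal sketch under stated assumptions. It must run as root. It uses a loop-mounted image file instead of a spare partition, which Fengguang suggests as an alternative in his next message. It assumes reiserfsprogs' `mkreiserfs` accepts `-q` (quiet) and `-f` (force on a non-block device). The image path, mount point and both function names are hypothetical, and only `make_small_files` is safe to run without root.

```shell
# Hypothetical helper: create N small files (300 bytes each, well under
# the 4KB page size) in DIR.
make_small_files() {
    dir=$1 n=$2 i=1
    while [ "$i" -le "$n" ]; do
        printf '%300s' '' > "$dir/small.$i"   # 300 spaces
        i=$((i + 1))
    done
}

# Hypothetical driver for the recipe. Must run as root; the image path,
# mount point and mkreiserfs flags are assumptions, not from the thread.
repro_small_files() {
    img=${1:-/tmp/reiser-test.img} mnt=${2:-/mnt/reiser-test}
    dd if=/dev/zero of="$img" bs=1M count=128 || return 1   # 128 MB image file
    mkreiserfs -q -f "$img" || return 1        # assumed flags, see above
    mkdir -p "$mnt" || return 1
    mount -o loop "$img" "$mnt" || return 1
    make_small_files "$mnt" 8
    sleep 35                                   # past the default 30s dirty expiry
    ps -eo pid,stat,comm | grep pdflush        # a "D" in STAT means blocked
}
```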

btw, what's the exact kernel version you are running?

Thank you,
Fengguang

2007-11-01 14:14:22

by Florin Iucha

Subject: Re: pdflush stuck in D state with v2.6.24-rc1-192-gef49c32

On Thu, Nov 01, 2007 at 09:03:33PM +0800, Fengguang Wu wrote:
> Or will the system or fs size/age make any difference? If you happen
> to have a spare/swap partition, could you make a new reiserfs and
> mount it and copy several less-than-4KB files into it and wait for 30s
> and see what happen to pdflush?

I will try that with a USB disk - I hope that won't make a difference.

> btw, what's the exact kernel version you are running?

I noticed it with the kernel in the $SUBJECT, as reported by 'git
describe'. I have pulled in new changesets since then.
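The version string in the subject has the usual `git describe` format: the nearest tag, the number of commits on top of it, then `g` followed by the abbreviated commit id. So v2.6.24-rc1-192-gef49c32 means 192 commits after v2.6.24-rc1, at commit ef49c32. Here is a small sketch that splits such a string. The function name is hypothetical.

```shell
# Hypothetical helper: split "git describe" output of the form
# TAG-N-gHASH into its parts. Tags such as v2.6.24-rc1 contain '-'
# themselves, so the string is taken apart from the right.
parse_describe() {
    d=$1
    case $d in
        *-[0-9]*-g[0-9a-f]*) ;;                       # long form: TAG-N-gHASH
        *) echo "tag=$d commits=0 hash="; return ;;   # exact tag match
    esac
    hash=${d##*-g}      # text after the last "-g"
    rest=${d%-g*}       # drop the "-gHASH" suffix
    count=${rest##*-}   # commits on top of the tag
    tag=${rest%-*}      # everything before the count
    echo "tag=$tag commits=$count hash=$hash"
}
```

`parse_describe v2.6.24-rc1-192-gef49c32` prints `tag=v2.6.24-rc1 commits=192 hash=ef49c32`.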

florin

Attachments:
(No filename) (669.00 B)
signature.asc (189.00 B)
Digital signature

2007-11-02 01:33:46

by Wu Fengguang

Subject: Re: pdflush stuck in D state with v2.6.24-rc1-192-gef49c32

On Thu, Nov 01, 2007 at 09:14:14AM -0500, Florin Iucha wrote:
> On Thu, Nov 01, 2007 at 09:03:33PM +0800, Fengguang Wu wrote:
> > Or will the system or fs size/age make any difference? If you happen
> > to have a spare/swap partition, could you make a new reiserfs and
> > mount it and copy several less-than-4KB files into it and wait for 30s
> > and see what happen to pdflush?
>
> I will try that with a USB disk - I hope that won't make a difference.

Thank you. I guess a reiserfs on a loop file would also be OK.

> > btw, what's the exact kernel version you are running?
>
> I noticed it with the kernel in the $SUBJECT, as reported by 'git
> describe'. I have pulled in new changesets since then.

And with the following patch applied?

---
fs/reiserfs/stree.c | 3 ---
1 file changed, 3 deletions(-)

--- linux-2.6.24-git17.orig/fs/reiserfs/stree.c
+++ linux-2.6.24-git17/fs/reiserfs/stree.c
@@ -1458,9 +1458,6 @@ static void unmap_buffers(struct page *p
}
bh = next;
} while (bh != head);
- if (PAGE_SIZE == bh->b_size) {
- cancel_dirty_page(page, PAGE_CACHE_SIZE);
- }
}
}
}

2007-11-02 02:10:20

by Florin Iucha

Subject: Re: pdflush stuck in D state with v2.6.24-rc1-192-gef49c32

On Fri, Nov 02, 2007 at 09:33:21AM +0800, Fengguang Wu wrote:
> > I will try that with a USB disk - I hope that won't make a difference.
>
> Thank you. I guess a reiserfs on loop file would also be OK.
>
> > > btw, what's the exact kernel version you are running?
> >
> > I noticed it with the kernel in the $SUBJECT, as reported by 'git
> > describe'. I have pulled in new changesets since then.
>
> And with the following patch applied?
>
> ---
> fs/reiserfs/stree.c | 3 ---
> 1 file changed, 3 deletions(-)
>
> --- linux-2.6.24-git17.orig/fs/reiserfs/stree.c
> +++ linux-2.6.24-git17/fs/reiserfs/stree.c
> @@ -1458,9 +1458,6 @@ static void unmap_buffers(struct page *p
> }
> bh = next;
> } while (bh != head);
> - if (PAGE_SIZE == bh->b_size) {
> - cancel_dirty_page(page, PAGE_CACHE_SIZE);
> - }
> }
> }
> }

... and with the above patch applied.

Copying 300 MB from root (ext3) to the new file system did not trigger
the pdflush condition. But then I did a
cd $MOUNTPOINT && find . -exec md5sum {} \;
and that brought one CPU to 75% iowait.
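The command above starts one md5sum process per path, and it also passes directories to md5sum, which rejects them with an error. The sketch below checksums only regular files and batches many paths per invocation with `-exec ... {} +`. One caveat: on a filesystem mounted without noatime or relatime (the default for kernels of this era), reading files still updates their access times. A "read-only" workload like this one can therefore still dirty inodes. The function name `checksum_tree` is hypothetical.

```shell
# Hypothetical helper: checksum every regular file under DIR, batching
# many paths per md5sum invocation instead of one process per path.
checksum_tree() {
    find "$1" -type f -exec md5sum {} +
}
```

Usage: `checksum_tree "$MOUNTPOINT" > /dev/null`.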

I have attached my .config, if it helps.

Cheers,
florin

Attachments:
(No filename) (0.00 B)
signature.asc (189.00 B)
Digital signature

2007-11-02 12:57:15

by Wu Fengguang

Subject: Re: pdflush stuck in D state with v2.6.24-rc1-192-gef49c32

On Thu, Nov 01, 2007 at 09:10:02PM -0500, Florin Iucha wrote:
> On Fri, Nov 02, 2007 at 09:33:21AM +0800, Fengguang Wu wrote:
> > > I will try that with a USB disk - I hope that won't make a difference.
> >
> > Thank you. I guess a reiserfs on loop file would also be OK.
> >
> > > > btw, what's the exact kernel version you are running?
> > >
> > > I noticed it with the kernel in the $SUBJECT, as reported by 'git
> > > describe'. I have pulled in new changesets since then.
> >
> > And with the following patch applied?
> >
> > ---
> > fs/reiserfs/stree.c | 3 ---
> > 1 file changed, 3 deletions(-)
> >
> > --- linux-2.6.24-git17.orig/fs/reiserfs/stree.c
> > +++ linux-2.6.24-git17/fs/reiserfs/stree.c
> > @@ -1458,9 +1458,6 @@ static void unmap_buffers(struct page *p
> > }
> > bh = next;
> > } while (bh != head);
> > - if (PAGE_SIZE == bh->b_size) {
> > - cancel_dirty_page(page, PAGE_CACHE_SIZE);
> > - }
> > }
> > }
> > }
>
> ... and with the above patch applied.
>
> Copying 300 MB from root (ext3) to the new file system did not trigger
> the pdflush condition. But then I did a
> cd $MOUNTPOINT && find . -exec md5sum {} \;
> and that brought one cpu to 75% iowait.

Immediately? Do you have the debug printk messages this time (with the
above patch)?
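When gathering those messages, tallying the `requeue_io` lines per device shows which filesystem pdflush keeps cycling through. This is the same reasoning that singled out sda7 in the earlier traces. Below is a minimal awk-based sketch. It assumes the message format seen in those traces (`requeue_io NNN: inode INO size SZ at MAJ:MIN(dev)`), which appears to come from the debugging patches used in this thread rather than from mainline. `requeue_summary` is a hypothetical name.

```shell
# Hypothetical helper: read kernel log lines on stdin and report, per
# device, how many requeue_io messages were seen and how many distinct
# inodes they covered. Assumes the debug format
# "requeue_io NNN: inode INO size SZ at MAJ:MIN(dev)".
requeue_summary() {
    awk '
        /requeue_io [0-9]+: inode / {
            for (i = 1; i < NF; i++) if ($i == "inode") ino = $(i + 1)
            dev = $NF
            sub(/^.*\(/, "", dev); sub(/\)$/, "", dev)   # "08:07(sda7)" -> "sda7"
            hits[dev]++
            if (!((dev, ino) in seen)) { seen[dev, ino] = 1; inodes[dev]++ }
        }
        END { for (d in hits) printf "%s requeues=%d inodes=%d\n", d, hits[d], inodes[d] }'
}
```

For example, `dmesg | requeue_summary`. In the traces quoted earlier, every requeue_io line points at sda7, and the same nine inodes come back roughly every 100 ms.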

> I have attached my .config, if it helps.

It's really curious - I tried your .config and commands, and still
could not trigger the high iowait. I'm running 64bit Intel Core 2,
and kernel 2.6.24-rc1-git6 with the above patch.

Fengguang

2007-11-02 13:32:49

by Florin Iucha

Subject: Re: pdflush stuck in D state with v2.6.24-rc1-192-gef49c32

On Fri, Nov 02, 2007 at 08:56:55PM +0800, Fengguang Wu wrote:
> > > ---
> > > fs/reiserfs/stree.c | 3 ---
> > > 1 file changed, 3 deletions(-)
> > >
> > > --- linux-2.6.24-git17.orig/fs/reiserfs/stree.c
> > > +++ linux-2.6.24-git17/fs/reiserfs/stree.c
> > > @@ -1458,9 +1458,6 @@ static void unmap_buffers(struct page *p
> > > }
> > > bh = next;
> > > } while (bh != head);
> > > - if (PAGE_SIZE == bh->b_size) {
> > > - cancel_dirty_page(page, PAGE_CACHE_SIZE);
> > > - }
> > > }
> > > }
> > > }
> >
> > ... and with the above patch applied.
> >
> > Copying 300 MB from root (ext3) to the new file system did not trigger
> > the pdflush condition. But then I did a
> > cd $MOUNTPOINT && find . -exec md5sum {} \;
> > and that brought one cpu to 75% iowait.
>
> Immediately? Do you have the debug printk messages this time(with the
> above patch)?

No, but I will add them this afternoon.

> > I have attached my .config, if it helps.
>
> It's really curious - I tried your .config and commands, and still
> could not trigger the high iowait. I'm running 64bit Intel Core 2,
> and kernel 2.6.24-rc1-git6 with the above patch.

Curious but 100% reproducible, at least on my box. What I'm going to
try is booting into the kernel with your patch and just doing the find
/ md5sum. It would be really interesting if the read-only access
triggers it.

florin

Attachments:
(No filename) (1.46 kB)
signature.asc (189.00 B)
Digital signature

2007-11-11 13:44:56

by Thomas Kuther

Subject: Re: pdflush stuck in D state with v2.6.24-rc1-192-gef49c32

Florin Iucha <florin <at> iucha.net> writes:

> > It's really curious - I tried your .config and commands, and still
> > could not trigger the high iowait. I'm running 64bit Intel Core 2,
> > and kernel 2.6.24-rc1-git6 with the above patch.
>
> Curious but 100% reproducible, at least on my box. What I'm going to
> try is booting into the kernel with your patch and just doing the find
> / md5sum. It would be really interesting if the read-only access
> triggers it.
>
> florin
>

I can confirm this issue too, with every .24-rc. I'm also using reiserfs on LVM.

And there is one more user on the Gentoo forums having the same issue.
http://forums.gentoo.org/viewtopic-t-612959.html

So you are not alone, Florin.

2007-12-04 10:28:38

by Ingo Molnar

Subject: [Bug 9291] pdflush stuck in D state with v2.6.24-rc1-192-gef49c32


* Thomas wrote:

> I can confirm this issue too on any .24-rc. I'm also using reiserfs on
> a LVM.
>
> And there is one more user on Gentoo forums having the same issue.
> http://forums.gentoo.org/viewtopic-t-612959.html
>
> So you are not alone, florian.

Any progress on this issue? It seems a bit stalled.

Ingo

2007-12-04 17:46:00

by Thomas Kuther

Subject: Re: [Bug 9291] pdflush stuck in D state with v2.6.24-rc1-192-gef49c32

On Tue, 04.12.07 11:28 Ingo Molnar wrote:

>
> * Thomas wrote:
>
> > I can confirm this issue too on any .24-rc. I'm also using reiserfs
> > on a LVM.
> >
> > And there is one more user on Gentoo forums having the same issue.
> > http://forums.gentoo.org/viewtopic-t-612959.html
> >
> > So you are not alone, florian.
>
> any progress on this issue? Seems a bit stalled.
>
> Ingo

For me the two patches
* mm-speed-up-writeback-ramp-up-on-clean-systems.patch
* reiserfs-writeback-fix.patch
solved the issue.

IIRC one was from this thread, the other from
http://lkml.org/lkml/2007/10/23/93

So since 2.6.24-rc2-git5 everything is fine again. No problems since.

Regards,
Thomas


Attachments:
signature.asc (189.00 B)