2009-10-07 08:13:50

by Fengguang Wu

Subject: [PATCH 00/45] some writeback experiments

Hi all,

Here is a collection of writeback patches on

- larger writeback chunk sizes
- single per-bdi flush thread (killing the foreground throttling writeouts)
- lumpy pageout
- sync livelock prevention
- writeback scheduling
- random fixes

Sorry for posting such a big series - there are many direct or implicit
dependencies, and one patch led to another before I could stop.

The lumpy pageout and nr_segments support are not complete and do not
cover all filesystems for now. It may be better to first convert some of
the ->writepages implementations to the generic routines to avoid duplicate work.

I managed to address many issues in the past week; however, there are still known
problems. Hints from filesystem developers are highly appreciated. Thanks!

The estimated writeback bandwidth is about 1/2 the real throughput
for ext2/3/4 and btrfs; noticeably bigger than the real throughput for NFS; and
cannot be estimated at all for XFS. Very interesting.

NFS writeback is very bumpy. The page numbers and network throughput "freeze"
together from time to time:

# vmmon -d 1 nr_writeback nr_dirty nr_unstable # (per 1-second samples)
nr_writeback nr_dirty nr_unstable
11227 41463 38044
11227 41463 38044
11227 41463 38044
11227 41463 38044
11045 53987 6490
11033 53120 8145
11195 52143 10886
11211 52144 10913
11211 52144 10913
11211 52144 10913
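
(For reference: the counters above are the ones exported in /proc/vmstat; a rough
standalone C sketch that samples them once per second - not the actual vmmon tool,
just enough to reproduce the sampling - would be:)

#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* read one named counter from /proc/vmstat, 0 if not found */
static unsigned long vmstat_read(const char *field)
{
    char line[128], name[64];
    unsigned long val = 0, v;
    FILE *f = fopen("/proc/vmstat", "r");

    if (!f)
        return 0;
    while (fgets(line, sizeof(line), f)) {
        if (sscanf(line, "%63s %lu", name, &v) == 2 &&
            strcmp(name, field) == 0) {
            val = v;
            break;
        }
    }
    fclose(f);
    return val;
}

int main(void)
{
    printf("nr_writeback nr_dirty nr_unstable\n");
    for (;;) {
        printf("%12lu %8lu %11lu\n",
               vmstat_read("nr_writeback"),
               vmstat_read("nr_dirty"),
               vmstat_read("nr_unstable"));
        sleep(1);
    }
    return 0;
}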

btrfs seems to maintain a private pool of writeback pages, which can go out of
control:

nr_writeback nr_dirty
261075 132
252891 195
244795 187
236851 187
228830 187
221040 218
212674 237
204981 237

XFS has very interesting "bumpy writeback" behavior: it tends to wait
until it has collected enough pages and then writes the whole world.

nr_writeback nr_dirty
80781 0
37117 37703
37117 43933
81044 6
81050 0
43943 10199
43930 36355
43930 36355
80293 0
80285 0
80285 0

Thanks,
Fengguang


2009-10-07 13:48:11

by Peter Staubach

Subject: Re: [PATCH 00/45] some writeback experiments

Wu Fengguang wrote:
> Hi all,
>
> Here is a collection of writeback patches on
>
> - larger writeback chunk sizes
> - single per-bdi flush thread (killing the foreground throttling writeouts)
> - lumpy pageout
> - sync livelock prevention
> - writeback scheduling
> - random fixes
>
> Sorry for posting such a big series - there are many direct or implicit
> dependencies, and one patch led to another before I could stop.
>
> The lumpy pageout and nr_segments support are not complete and do not
> cover all filesystems for now. It may be better to first convert some of
> the ->writepages implementations to the generic routines to avoid duplicate work.
>
> I managed to address many issues in the past week; however, there are still known
> problems. Hints from filesystem developers are highly appreciated. Thanks!
>
> The estimated writeback bandwidth is about 1/2 the real throughput
> for ext2/3/4 and btrfs; noticeably bigger than the real throughput for NFS; and
> cannot be estimated at all for XFS. Very interesting.
>
> NFS writeback is very bumpy. The page numbers and network throughput "freeze"
> together from time to time:
>

Yes. It appears that the problem is that too many pages get dirtied
and the network/server get overwhelmed by the NFS client attempting
to write out all of the pages as quickly as it possibly can.

I think that it would be better if we could better match the
number of pages which can be dirty at any given point with the
bandwidth available through the network and the server file
system and storage.

One approach that I have pondered is immediately queuing an
asynchronous request when enough pages are dirtied to be able
to completely fill an over-the-wire transfer. This sort of
seems like a per-file bdi, which doesn't seem quite like the
right approach to me. What would y'all think about that?
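
Just to show the shape of the idea from userspace (the real change would
live in the NFS client writeback path; the path and sizes below are made
up): kick off asynchronous writeback each time one full over-the-wire
transfer's worth of data has been dirtied, e.g. with sync_file_range():

#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define WSIZE (512 * 1024)    /* assumed over-the-wire transfer size */

int main(void)
{
    char buf[4096];
    off_t start = 0, dirtied = 0;
    int fd = open("/mnt/nfs/testfile", O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return 1;
    memset(buf, 'x', sizeof(buf));

    for (int i = 0; i < 256 * 1024; i++) {    /* dirty 1GB */
        if (write(fd, buf, sizeof(buf)) != sizeof(buf))
            break;
        dirtied += sizeof(buf);
        if (dirtied >= WSIZE) {
            /* queue async writeout for the range just dirtied,
               without waiting for completion */
            sync_file_range(fd, start, dirtied, SYNC_FILE_RANGE_WRITE);
            start += dirtied;
            dirtied = 0;
        }
    }
    close(fd);
    return 0;
}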

ps


> # vmmon -d 1 nr_writeback nr_dirty nr_unstable # (per 1-second samples)
> nr_writeback nr_dirty nr_unstable
> 11227 41463 38044
> 11227 41463 38044
> 11227 41463 38044
> 11227 41463 38044
> 11045 53987 6490
> 11033 53120 8145
> 11195 52143 10886
> 11211 52144 10913
> 11211 52144 10913
> 11211 52144 10913
>
> btrfs seems to maintain a private pool of writeback pages, which can go out of
> control:
>
> nr_writeback nr_dirty
> 261075 132
> 252891 195
> 244795 187
> 236851 187
> 228830 187
> 221040 218
> 212674 237
> 204981 237
>
> > XFS has very interesting "bumpy writeback" behavior: it tends to wait
> > until it has collected enough pages and then writes the whole world.
>
> nr_writeback nr_dirty
> 80781 0
> 37117 37703
> 37117 43933
> 81044 6
> 81050 0
> 43943 10199
> 43930 36355
> 43930 36355
> 80293 0
> 80285 0
> 80285 0
>
> Thanks,
> Fengguang
>

2009-10-07 14:27:39

by Theodore Ts'o

Subject: Re: [PATCH 00/45] some writeback experiments

On Wed, Oct 07, 2009 at 03:38:18PM +0800, Wu Fengguang wrote:
>
> The estimated writeback bandwidth is about 1/2 the real throughput
> for ext2/3/4 and btrfs; noticeably bigger than the real throughput for NFS; and
> cannot be estimated at all for XFS. Very interesting.

Can you expand on what you mean here? Estimated write bandwidth of
what? And what are you comparing it against?

I'm having trouble understanding your note (which I'm guessing you
wrote fairly late at night? :-)

Thanks,

- Ted

2009-10-07 14:46:00

by Fengguang Wu

Subject: Re: [PATCH 00/45] some writeback experiments

On Wed, Oct 07, 2009 at 10:26:32PM +0800, Theodore Ts'o wrote:
> On Wed, Oct 07, 2009 at 03:38:18PM +0800, Wu Fengguang wrote:
> >
> > The estimated writeback bandwidth is about 1/2 the real throughput
> > for ext2/3/4 and btrfs; noticeably bigger than the real throughput for NFS; and
> > cannot be estimated at all for XFS. Very interesting.
>
> Can you expand on what you mean here? Estimated write bandwidth of
> what? And what are you comparing it against?

Please refer to [PATCH 21/45] writeback: estimate bdi write bandwidth
and patch 22 - I have some numbers there :)
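
Roughly, the estimate tracks "pages written back / elapsed time" per bdi
with some smoothing. A toy sketch (made-up names and numbers, not the
patch itself):

#include <stdio.h>

/* update the per-bdi bandwidth estimate from one observation interval:
   "pages completed in this interval / interval length", smoothed with
   an exponentially weighted moving average */
static double update_bw(double bw, unsigned long pages_done, double seconds)
{
    double sample = pages_done / seconds;

    if (bw == 0.0)
        return sample;              /* first sample: take it as-is */
    return (bw * 7 + sample) / 8;   /* EWMA with weight 1/8 */
}

int main(void)
{
    /* pretend these page counts completed in successive 1s intervals */
    unsigned long samples[] = { 25600, 12800, 25600, 0, 51200 };
    double bw = 0.0;

    for (int i = 0; i < 5; i++) {
        bw = update_bw(bw, samples[i], 1.0);
        printf("interval %d: ~%.0f pages/s (~%.1f MB/s)\n",
               i, bw, bw * 4096 / 1e6);
    }
    return 0;
}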

> I'm having trouble understanding your note (which I'm guessing you
> wrote fairly late at night? :-)

Sorry - I wrote that when I was tired from debugging ;)

Thanks,
Fengguang

2009-10-07 15:19:41

by Fengguang Wu

Subject: Re: [PATCH 00/45] some writeback experiments

On Wed, Oct 07, 2009 at 09:47:14PM +0800, Peter Staubach wrote:
> Wu Fengguang wrote:
> > Hi all,
> >
> > Here is a collection of writeback patches on
> >
> > - larger writeback chunk sizes
> > - single per-bdi flush thread (killing the foreground throttling writeouts)
> > - lumpy pageout
> > - sync livelock prevention
> > - writeback scheduling
> > - random fixes
> >
> > Sorry for posting such a big series - there are many direct or implicit
> > dependencies, and one patch led to another before I could stop.
> >
> > The lumpy pageout and nr_segments support are not complete and do not
> > cover all filesystems for now. It may be better to first convert some of
> > the ->writepages implementations to the generic routines to avoid duplicate work.
> >
> > I managed to address many issues in the past week; however, there are still known
> > problems. Hints from filesystem developers are highly appreciated. Thanks!
> >
> > The estimated writeback bandwidth is about 1/2 the real throughput
> > for ext2/3/4 and btrfs; noticeably bigger than the real throughput for NFS; and
> > cannot be estimated at all for XFS. Very interesting.
> >
> > NFS writeback is very bumpy. The page numbers and network throughput "freeze"
> > together from time to time:
> >
>
> Yes. It appears that the problem is that too many pages get dirtied
> and the network/server get overwhelmed by the NFS client attempting
> to write out all of the pages as quickly as it possibly can.

In theory it should push pages as quickly as possible at first,
to fill up the server side queue.

> I think that it would be better if we could better match the
> number of pages which can be dirty at any given point with the
> bandwidth available through the network and the server file
> system and storage.

And then go into this steady state of matched network/disk bandwidth.

> One approach that I have pondered is immediately queuing an
> asynchronous request when enough pages are dirtied to be able
> to completely fill an over the wire transfer. This sort of
> seems like a per-file bdi, which doesn't seem quite like the
> right approach to me. What would y'all think about that?

Hmm, it sounds like unnecessary complexity, because it is not going to
help the busy-dirtier case anyway. And if we can do well on heavy IO,
the pre-flushing policy becomes less interesting.

>
> > # vmmon -d 1 nr_writeback nr_dirty nr_unstable # (per 1-second samples)
> > nr_writeback nr_dirty nr_unstable
> > 11227 41463 38044
> > 11227 41463 38044
> > 11227 41463 38044
> > 11227 41463 38044

I guess in the above 4 seconds, either the client or (more likely) the server
is blocked. A blocked server cannot send ACKs to knock down both
nr_writeback and nr_unstable. And the stuck nr_writeback will freeze
nr_dirty as well, because the dirtying process is throttled until
it receives enough "PG_writeback cleared" events; however, the bdi-flush
thread is also blocked when trying to clear more PG_writeback, because
the client-side nr_writeback limit has been reached. In summary,

server blocked => nr_writeback stuck => nr_writeback limit reached
=> bdi-flush blocked => no end_page_writeback() => dirtier blocked
=> nr_dirty stuck
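
The same chain can be seen in a toy model (the numbers and limits below are
completely made up; it only mirrors the dependencies above):

#include <stdio.h>

#define NR_WRITEBACK_LIMIT 11264   /* client-side in-flight limit (made up) */
#define DIRTY_THRESH       53000   /* dirty throttle threshold (made up) */

int main(void)
{
    long nr_dirty = 0, nr_writeback = 0;

    for (int sec = 0; sec < 10; sec++) {
        int server_blocked = (sec >= 3 && sec < 7);

        /* the server acks (completes) in-flight pages unless it is blocked */
        if (!server_blocked)
            nr_writeback = 0;

        /* bdi-flush converts dirty -> writeback, up to the in-flight limit */
        long can_start = NR_WRITEBACK_LIMIT - nr_writeback;
        long started = nr_dirty < can_start ? nr_dirty : can_start;
        nr_dirty -= started;
        nr_writeback += started;

        /* the dirtier keeps writing but is throttled at the dirty threshold */
        long room = DIRTY_THRESH - nr_dirty;
        nr_dirty += room > 20000 ? 20000 : room;

        printf("t=%2ds%s  nr_writeback=%6ld  nr_dirty=%6ld\n",
               sec, server_blocked ? " (server blocked)" : "",
               nr_writeback, nr_dirty);
    }
    return 0;
}

Once the server stalls, nr_writeback sits at its limit, the flusher can start
nothing new, the dirtier hits the threshold, and both counters freeze - the
same kind of plateau as in the samples above.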

Thanks,
Fengguang

> > 11045 53987 6490
> > 11033 53120 8145
> > 11195 52143 10886
> > 11211 52144 10913
> > 11211 52144 10913
> > 11211 52144 10913
> >
> > btrfs seems to maintain a private pool of writeback pages, which can go out of
> > control:
> >
> > nr_writeback nr_dirty
> > 261075 132
> > 252891 195
> > 244795 187
> > 236851 187
> > 228830 187
> > 221040 218
> > 212674 237
> > 204981 237
> >
> > XFS has very interesting "bumpy writeback" behavior: it tends to wait
> > until it has collected enough pages and then writes the whole world.
> >
> > nr_writeback nr_dirty
> > 80781 0
> > 37117 37703
> > 37117 43933
> > 81044 6
> > 81050 0
> > 43943 10199
> > 43930 36355
> > 43930 36355
> > 80293 0
> > 80285 0
> > 80285 0
> >
> > Thanks,
> > Fengguang
> >

2009-10-08 05:34:31

by Fengguang Wu

Subject: Re: [PATCH 00/45] some writeback experiments

On Wed, Oct 07, 2009 at 11:18:22PM +0800, Wu Fengguang wrote:
> On Wed, Oct 07, 2009 at 09:47:14PM +0800, Peter Staubach wrote:
> >
> > > # vmmon -d 1 nr_writeback nr_dirty nr_unstable # (per 1-second samples)
> > > nr_writeback nr_dirty nr_unstable
> > > 11227 41463 38044
> > > 11227 41463 38044
> > > 11227 41463 38044
> > > 11227 41463 38044
>
> I guess in the above 4 seconds, either the client or (more likely) the server
> is blocked. A blocked server cannot send ACKs to knock down both

Yeah, the server side is blocked. The nfsd threads are mostly blocked in
generic_file_aio_write(), in particular, on the i_mutex lock! I'm copying
one or two big files over NFS, so the i_mutex lock is heavily contended.

I'm using the default wsize=4096 for NFS-root..

wfg ~% ps -o pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:24,comm ax|g nfs
329 329 TS - -5 24 1 0.0 S< worker_thread nfsiod
4690 4690 TS - -5 24 0 0.1 D< generic_file_aio_write nfsd
4691 4691 TS - -5 24 0 0.0 D< generic_file_aio_write nfsd
4692 4692 TS - -5 24 0 0.0 D< generic_file_aio_write nfsd
4693 4693 TS - -5 24 0 0.0 D< generic_file_aio_write nfsd
4694 4694 TS - -5 24 0 0.1 D< generic_file_aio_write nfsd
4695 4695 TS - -5 24 1 0.1 D< generic_file_aio_write nfsd
4696 4696 TS - -5 24 1 0.0 D< log_wait_commit nfsd
4697 4697 TS - -5 24 0 0.0 D< generic_file_aio_write nfsd
wfg ~% ps -o pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:24,comm ax|g nfs
329 329 TS - -5 24 1 0.0 S< worker_thread nfsiod
4690 4690 TS - -5 24 0 0.1 D< generic_file_aio_write nfsd
4691 4691 TS - -5 24 0 0.0 D< generic_file_aio_write nfsd
4692 4692 TS - -5 24 0 0.0 D< generic_file_aio_write nfsd
4693 4693 TS - -5 24 0 0.0 D< sync_buffer nfsd
4694 4694 TS - -5 24 0 0.1 D< generic_file_aio_write nfsd
4695 4695 TS - -5 24 1 0.1 D< generic_file_aio_write nfsd
4696 4696 TS - -5 24 1 0.0 D< generic_file_aio_write nfsd
4697 4697 TS - -5 24 0 0.0 D< generic_file_aio_write nfsd

wfg ~% ps -o pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:24,comm ax|g nfs
329 329 TS - -5 24 1 0.0 S< worker_thread nfsiod
4690 4690 TS - -5 24 0 0.1 D< generic_file_aio_write nfsd
4691 4691 TS - -5 24 0 0.1 D< get_request_wait nfsd
4692 4692 TS - -5 24 0 0.1 D< generic_file_aio_write nfsd
4693 4693 TS - -5 24 0 0.1 S< svc_recv nfsd
4694 4694 TS - -5 24 0 0.1 D< generic_file_aio_write nfsd
4695 4695 TS - -5 24 0 0.1 D< generic_file_aio_write nfsd
4696 4696 TS - -5 24 0 0.1 S< svc_recv nfsd
4697 4697 TS - -5 24 1 0.1 D< generic_file_aio_write nfsd

wfg ~% ps -o pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:24,comm ax|g nfs
329 329 TS - -5 24 1 0.0 S< worker_thread nfsiod
4690 4690 TS - -5 24 1 0.1 D< get_write_access nfsd
4691 4691 TS - -5 24 0 0.1 D< generic_file_aio_write nfsd
4692 4692 TS - -5 24 0 0.1 D< generic_file_aio_write nfsd
4693 4693 TS - -5 24 1 0.1 D< generic_file_aio_write nfsd
4694 4694 TS - -5 24 1 0.1 D< get_write_access nfsd
4695 4695 TS - -5 24 0 0.1 D< generic_file_aio_write nfsd
4696 4696 TS - -5 24 0 0.1 D< generic_file_aio_write nfsd
4697 4697 TS - -5 24 0 0.1 D< generic_file_aio_write nfsd

Thanks,
Fengguang

> nr_writeback and nr_unstable. And the stuck nr_writeback will freeze
> nr_dirty as well, because the dirtying process is throttled until
> it receives enough "PG_writeback cleared" events; however, the bdi-flush
> thread is also blocked when trying to clear more PG_writeback, because
> the client-side nr_writeback limit has been reached. In summary,
>
> server blocked => nr_writeback stuck => nr_writeback limit reached
> => bdi-flush blocked => no end_page_writeback() => dirtier blocked
> => nr_dirty stuck
>
> Thanks,
> Fengguang
>
> > > 11045 53987 6490
> > > 11033 53120 8145
> > > 11195 52143 10886
> > > 11211 52144 10913
> > > 11211 52144 10913
> > > 11211 52144 10913
> > >
> > > btrfs seems to maintain a private pool of writeback pages, which can go out of
> > > control:
> > >
> > > nr_writeback nr_dirty
> > > 261075 132
> > > 252891 195
> > > 244795 187
> > > 236851 187
> > > 228830 187
> > > 221040 218
> > > 212674 237
> > > 204981 237
> > >
> > > XFS has very interesting "bumpy writeback" behavior: it tends to wait
> > > until it has collected enough pages and then writes the whole world.
> > >
> > > nr_writeback nr_dirty
> > > 80781 0
> > > 37117 37703
> > > 37117 43933
> > > 81044 6
> > > 81050 0
> > > 43943 10199
> > > 43930 36355
> > > 43930 36355
> > > 80293 0
> > > 80285 0
> > > 80285 0
> > >
> > > Thanks,
> > > Fengguang
> > >

2009-10-08 05:45:20

by Fengguang Wu

Subject: Re: [PATCH 00/45] some writeback experiments

On Thu, Oct 08, 2009 at 01:33:35PM +0800, Wu Fengguang wrote:
> On Wed, Oct 07, 2009 at 11:18:22PM +0800, Wu Fengguang wrote:
> > On Wed, Oct 07, 2009 at 09:47:14PM +0800, Peter Staubach wrote:
> > >
> > > > # vmmon -d 1 nr_writeback nr_dirty nr_unstable # (per 1-second samples)
> > > > nr_writeback nr_dirty nr_unstable
> > > > 11227 41463 38044
> > > > 11227 41463 38044
> > > > 11227 41463 38044
> > > > 11227 41463 38044
> >
> > I guess in the above 4 seconds, either the client or (more likely) the server
> > is blocked. A blocked server cannot send ACKs to knock down both
>
> Yeah, the server side is blocked. The nfsd threads are mostly blocked in
> generic_file_aio_write(), in particular, on the i_mutex lock! I'm copying
> one or two big files over NFS, so the i_mutex lock is heavily contended.
>
> I'm using the default wsize=4096 for NFS-root..

Just switched to a 512k wsize, and things improved: most of the time the 8
nfsd threads are not all blocked. However, the bumpiness still remains:

nr_writeback nr_dirty nr_unstable
11105 58080 15042
11105 58080 15042
11233 54583 18626
11101 51964 22036
11105 51978 22065
11233 52362 22577
10985 58538 13500
11233 53748 19721
11047 51999 21778
11105 50262 23572
11105 50262 20441
10985 52772 20721
10977 52109 21516
11105 48296 26629
11105 48296 26629
10985 52191 21042
11166 51456 22296
10980 50681 24466
11233 45352 30488
11233 45352 30488
11105 45475 30616
11131 45313 20355
11233 51126 22637
11233 51126 22637


wfg ~% ps -o pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:24,comm ax|g nfs
329 329 TS - -5 24 1 0.0 S< worker_thread nfsiod
4690 4690 TS - -5 24 1 0.1 S< svc_recv nfsd
4691 4691 TS - -5 24 0 0.1 S< svc_recv nfsd
4692 4692 TS - -5 24 0 0.1 R< ? nfsd
4693 4693 TS - -5 24 0 0.1 S< svc_recv nfsd
4694 4694 TS - -5 24 0 0.1 S< svc_recv nfsd
4695 4695 TS - -5 24 0 0.1 S< svc_recv nfsd
4696 4696 TS - -5 24 0 0.1 S< svc_recv nfsd
4697 4697 TS - -5 24 0 0.1 R< ? nfsd
wfg ~% ps -o pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:24,comm ax|g nfs
329 329 TS - -5 24 1 0.0 S< worker_thread nfsiod
4690 4690 TS - -5 24 0 0.1 D< generic_file_aio_write nfsd
4691 4691 TS - -5 24 0 0.1 S< svc_recv nfsd
4692 4692 TS - -5 24 1 0.1 D< log_wait_commit nfsd
4693 4693 TS - -5 24 0 0.1 D< generic_file_aio_write nfsd
4694 4694 TS - -5 24 0 0.1 S< svc_recv nfsd
4695 4695 TS - -5 24 0 0.1 D< generic_file_aio_write nfsd
4696 4696 TS - -5 24 0 0.1 D< generic_file_aio_write nfsd
4697 4697 TS - -5 24 1 0.1 D< generic_file_aio_write nfsd
wfg ~% ps -o pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:24,comm ax|g nfs
329 329 TS - -5 24 1 0.0 S< worker_thread nfsiod
4690 4690 TS - -5 24 1 0.1 S< svc_recv nfsd
4691 4691 TS - -5 24 0 0.1 S< svc_recv nfsd
4692 4692 TS - -5 24 1 0.1 R< ? nfsd
4693 4693 TS - -5 24 1 0.1 R< ? nfsd
4694 4694 TS - -5 24 1 0.1 R< ? nfsd
4695 4695 TS - -5 24 1 0.1 S< svc_recv nfsd
4696 4696 TS - -5 24 0 0.1 S< svc_recv nfsd
4697 4697 TS - -5 24 1 0.1 S< svc_recv nfsd
wfg ~% ps -o pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:24,comm ax|g nfs
329 329 TS - -5 24 1 0.0 S< worker_thread nfsiod
4690 4690 TS - -5 24 1 0.1 D< generic_file_aio_write nfsd
4691 4691 TS - -5 24 0 0.1 S< svc_recv nfsd
4692 4692 TS - -5 24 1 0.1 D< nfsd_sync nfsd
4693 4693 TS - -5 24 1 0.1 D< sync_buffer nfsd
4694 4694 TS - -5 24 1 0.1 D< generic_file_aio_write nfsd
4695 4695 TS - -5 24 1 0.1 S< svc_recv nfsd
4696 4696 TS - -5 24 0 0.1 S< svc_recv nfsd
4697 4697 TS - -5 24 1 0.1 D< generic_file_aio_write nfsd

Thanks,
Fengguang

> wfg ~% ps -o pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:24,comm ax|g nfs
> 329 329 TS - -5 24 1 0.0 S< worker_thread nfsiod
> 4690 4690 TS - -5 24 0 0.1 D< generic_file_aio_write nfsd
> 4691 4691 TS - -5 24 0 0.0 D< generic_file_aio_write nfsd
> 4692 4692 TS - -5 24 0 0.0 D< generic_file_aio_write nfsd
> 4693 4693 TS - -5 24 0 0.0 D< generic_file_aio_write nfsd
> 4694 4694 TS - -5 24 0 0.1 D< generic_file_aio_write nfsd
> 4695 4695 TS - -5 24 1 0.1 D< generic_file_aio_write nfsd
> 4696 4696 TS - -5 24 1 0.0 D< log_wait_commit nfsd
> 4697 4697 TS - -5 24 0 0.0 D< generic_file_aio_write nfsd
> wfg ~% ps -o pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:24,comm ax|g nfs
> 329 329 TS - -5 24 1 0.0 S< worker_thread nfsiod
> 4690 4690 TS - -5 24 0 0.1 D< generic_file_aio_write nfsd
> 4691 4691 TS - -5 24 0 0.0 D< generic_file_aio_write nfsd
> 4692 4692 TS - -5 24 0 0.0 D< generic_file_aio_write nfsd
> 4693 4693 TS - -5 24 0 0.0 D< sync_buffer nfsd
> 4694 4694 TS - -5 24 0 0.1 D< generic_file_aio_write nfsd
> 4695 4695 TS - -5 24 1 0.1 D< generic_file_aio_write nfsd
> 4696 4696 TS - -5 24 1 0.0 D< generic_file_aio_write nfsd
> 4697 4697 TS - -5 24 0 0.0 D< generic_file_aio_write nfsd
>
> wfg ~% ps -o pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:24,comm ax|g nfs
> 329 329 TS - -5 24 1 0.0 S< worker_thread nfsiod
> 4690 4690 TS - -5 24 0 0.1 D< generic_file_aio_write nfsd
> 4691 4691 TS - -5 24 0 0.1 D< get_request_wait nfsd
> 4692 4692 TS - -5 24 0 0.1 D< generic_file_aio_write nfsd
> 4693 4693 TS - -5 24 0 0.1 S< svc_recv nfsd
> 4694 4694 TS - -5 24 0 0.1 D< generic_file_aio_write nfsd
> 4695 4695 TS - -5 24 0 0.1 D< generic_file_aio_write nfsd
> 4696 4696 TS - -5 24 0 0.1 S< svc_recv nfsd
> 4697 4697 TS - -5 24 1 0.1 D< generic_file_aio_write nfsd
>
> wfg ~% ps -o pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:24,comm ax|g nfs
> 329 329 TS - -5 24 1 0.0 S< worker_thread nfsiod
> 4690 4690 TS - -5 24 1 0.1 D< get_write_access nfsd
> 4691 4691 TS - -5 24 0 0.1 D< generic_file_aio_write nfsd
> 4692 4692 TS - -5 24 0 0.1 D< generic_file_aio_write nfsd
> 4693 4693 TS - -5 24 1 0.1 D< generic_file_aio_write nfsd
> 4694 4694 TS - -5 24 1 0.1 D< get_write_access nfsd
> 4695 4695 TS - -5 24 0 0.1 D< generic_file_aio_write nfsd
> 4696 4696 TS - -5 24 0 0.1 D< generic_file_aio_write nfsd
> 4697 4697 TS - -5 24 0 0.1 D< generic_file_aio_write nfsd
>
> Thanks,
> Fengguang
>
> > nr_writeback and nr_unstable. And the stuck nr_writeback will freeze
> > nr_dirty as well, because the dirtying process is throttled until
> > it receives enough "PG_writeback cleared" events; however, the bdi-flush
> > thread is also blocked when trying to clear more PG_writeback, because
> > the client-side nr_writeback limit has been reached. In summary,
> >
> > server blocked => nr_writeback stuck => nr_writeback limit reached
> > => bdi-flush blocked => no end_page_writeback() => dirtier blocked
> > => nr_dirty stuck
> >
> > Thanks,
> > Fengguang
> >
> > > > 11045 53987 6490
> > > > 11033 53120 8145
> > > > 11195 52143 10886
> > > > 11211 52144 10913
> > > > 11211 52144 10913
> > > > 11211 52144 10913
> > > >
> > > > btrfs seems to maintain a private pool of writeback pages, which can go out of
> > > > control:
> > > >
> > > > nr_writeback nr_dirty
> > > > 261075 132
> > > > 252891 195
> > > > 244795 187
> > > > 236851 187
> > > > 228830 187
> > > > 221040 218
> > > > 212674 237
> > > > 204981 237
> > > >
> > > > XFS has very interesting "bumpy writeback" behavior: it tends to wait
> > > > until it has collected enough pages and then writes the whole world.
> > > >
> > > > nr_writeback nr_dirty
> > > > 80781 0
> > > > 37117 37703
> > > > 37117 43933
> > > > 81044 6
> > > > 81050 0
> > > > 43943 10199
> > > > 43930 36355
> > > > 43930 36355
> > > > 80293 0
> > > > 80285 0
> > > > 80285 0
> > > >
> > > > Thanks,
> > > > Fengguang
> > > >