2009-10-07 08:00:38

by Fengguang Wu

[permalink] [raw]
Subject: [PATCH 44/45] NFS: remove NFS_INO_FLUSHING lock

It was introduced in 72cb77f4a5ac, and the several issues have been
addressed in generic writeback:
- out of order writeback (or interleaved concurrent writeback)
addressed by the per-bdi writeback and wait queue in balance_dirty_pages()
- sync livelocked by a fast dirtier
addressed by throttling all to-be-synced dirty inodes

CC: Peter Zijlstra <[email protected]>
CC: Peter Staubach <[email protected]>
CC: Trond Myklebust <[email protected]>
Signed-off-by: Wu Fengguang <[email protected]>
---
fs/nfs/file.c | 9 ---------
fs/nfs/write.c | 11 -----------
include/linux/nfs_fs.h | 1 -
3 files changed, 21 deletions(-)

--- linux.orig/fs/nfs/file.c 2009-10-07 14:31:45.000000000 +0800
+++ linux/fs/nfs/file.c 2009-10-07 14:32:54.000000000 +0800
@@ -386,15 +386,6 @@ static int nfs_write_begin(struct file *
mapping->host->i_ino, len, (long long) pos);

start:
- /*
- * Prevent starvation issues if someone is doing a consistency
- * sync-to-disk
- */
- ret = wait_on_bit(&NFS_I(mapping->host)->flags, NFS_INO_FLUSHING,
- nfs_wait_bit_killable, TASK_KILLABLE);
- if (ret)
- return ret;
-
page = grab_cache_page_write_begin(mapping, index, flags);
if (!page)
return -ENOMEM;
--- linux.orig/fs/nfs/write.c 2009-10-07 14:31:45.000000000 +0800
+++ linux/fs/nfs/write.c 2009-10-07 14:32:54.000000000 +0800
@@ -387,26 +387,15 @@ static int nfs_writepages_callback(struc
int nfs_writepages(struct address_space *mapping, struct writeback_control *wbc)
{
struct inode *inode = mapping->host;
- unsigned long *bitlock = &NFS_I(inode)->flags;
struct nfs_pageio_descriptor pgio;
int err;

- /* Stop dirtying of new pages while we sync */
- err = wait_on_bit_lock(bitlock, NFS_INO_FLUSHING,
- nfs_wait_bit_killable, TASK_KILLABLE);
- if (err)
- goto out_err;
-
nfs_inc_stats(inode, NFSIOS_VFSWRITEPAGES);

nfs_pageio_init_write(&pgio, inode, wb_priority(wbc));
err = write_cache_pages(mapping, wbc, nfs_writepages_callback, &pgio);
nfs_pageio_complete(&pgio);

- clear_bit_unlock(NFS_INO_FLUSHING, bitlock);
- smp_mb__after_clear_bit();
- wake_up_bit(bitlock, NFS_INO_FLUSHING);
-
if (err < 0)
goto out_err;
err = pgio.pg_error;
--- linux.orig/include/linux/nfs_fs.h 2009-10-07 14:31:45.000000000 +0800
+++ linux/include/linux/nfs_fs.h 2009-10-07 14:32:54.000000000 +0800
@@ -208,7 +208,6 @@ struct nfs_inode {
#define NFS_INO_STALE (1) /* possible stale inode */
#define NFS_INO_ACL_LRU_SET (2) /* Inode is on the LRU list */
#define NFS_INO_MOUNTPOINT (3) /* inode is remote mountpoint */
-#define NFS_INO_FLUSHING (4) /* inode is flushing out data */
#define NFS_INO_FSCACHE (5) /* inode can be cached by FS-Cache */
#define NFS_INO_FSCACHE_LOCK (6) /* FS-Cache cookie management lock */



2009-10-07 13:12:39

by Peter Staubach

[permalink] [raw]
Subject: Re: [PATCH 44/45] NFS: remove NFS_INO_FLUSHING lock

Wu Fengguang wrote:
> It was introduced in 72cb77f4a5ac, and the several issues have been
> addressed in generic writeback:
> - out of order writeback (or interleaved concurrent writeback)
> addressed by the per-bdi writeback and wait queue in balance_dirty_pages()
> - sync livelocked by a fast dirtier
> addressed by throttling all to-be-synced dirty inodes
>

I don't think that we can just remove this support. It is
designed to reduce the effects from doing a stat(2) on a
file which is being actively written to.

If we do remove it, then we will need to replace this patch
with another. Trond and I hadn't quite finished discussing
some aspects of that other patch... :-)

Thanx...

ps

> CC: Peter Zijlstra <[email protected]>
> CC: Peter Staubach <[email protected]>
> CC: Trond Myklebust <[email protected]>
> Signed-off-by: Wu Fengguang <[email protected]>
> ---
> fs/nfs/file.c | 9 ---------
> fs/nfs/write.c | 11 -----------
> include/linux/nfs_fs.h | 1 -
> 3 files changed, 21 deletions(-)
>
> --- linux.orig/fs/nfs/file.c 2009-10-07 14:31:45.000000000 +0800
> +++ linux/fs/nfs/file.c 2009-10-07 14:32:54.000000000 +0800
> @@ -386,15 +386,6 @@ static int nfs_write_begin(struct file *
> mapping->host->i_ino, len, (long long) pos);
>
> start:
> - /*
> - * Prevent starvation issues if someone is doing a consistency
> - * sync-to-disk
> - */
> - ret = wait_on_bit(&NFS_I(mapping->host)->flags, NFS_INO_FLUSHING,
> - nfs_wait_bit_killable, TASK_KILLABLE);
> - if (ret)
> - return ret;
> -
> page = grab_cache_page_write_begin(mapping, index, flags);
> if (!page)
> return -ENOMEM;
> --- linux.orig/fs/nfs/write.c 2009-10-07 14:31:45.000000000 +0800
> +++ linux/fs/nfs/write.c 2009-10-07 14:32:54.000000000 +0800
> @@ -387,26 +387,15 @@ static int nfs_writepages_callback(struc
> int nfs_writepages(struct address_space *mapping, struct writeback_control *wbc)
> {
> struct inode *inode = mapping->host;
> - unsigned long *bitlock = &NFS_I(inode)->flags;
> struct nfs_pageio_descriptor pgio;
> int err;
>
> - /* Stop dirtying of new pages while we sync */
> - err = wait_on_bit_lock(bitlock, NFS_INO_FLUSHING,
> - nfs_wait_bit_killable, TASK_KILLABLE);
> - if (err)
> - goto out_err;
> -
> nfs_inc_stats(inode, NFSIOS_VFSWRITEPAGES);
>
> nfs_pageio_init_write(&pgio, inode, wb_priority(wbc));
> err = write_cache_pages(mapping, wbc, nfs_writepages_callback, &pgio);
> nfs_pageio_complete(&pgio);
>
> - clear_bit_unlock(NFS_INO_FLUSHING, bitlock);
> - smp_mb__after_clear_bit();
> - wake_up_bit(bitlock, NFS_INO_FLUSHING);
> -
> if (err < 0)
> goto out_err;
> err = pgio.pg_error;
> --- linux.orig/include/linux/nfs_fs.h 2009-10-07 14:31:45.000000000 +0800
> +++ linux/include/linux/nfs_fs.h 2009-10-07 14:32:54.000000000 +0800
> @@ -208,7 +208,6 @@ struct nfs_inode {
> #define NFS_INO_STALE (1) /* possible stale inode */
> #define NFS_INO_ACL_LRU_SET (2) /* Inode is on the LRU list */
> #define NFS_INO_MOUNTPOINT (3) /* inode is remote mountpoint */
> -#define NFS_INO_FLUSHING (4) /* inode is flushing out data */
> #define NFS_INO_FSCACHE (5) /* inode can be cached by FS-Cache */
> #define NFS_INO_FSCACHE_LOCK (6) /* FS-Cache cookie management lock */

2009-10-07 13:33:49

by Fengguang Wu

[permalink] [raw]
Subject: Re: [PATCH 44/45] NFS: remove NFS_INO_FLUSHING lock

On Wed, Oct 07, 2009 at 09:11:15PM +0800, Peter Staubach wrote:
> Wu Fengguang wrote:
> > It was introduced in 72cb77f4a5ac, and the several issues have been
> > addressed in generic writeback:
> > - out of order writeback (or interleaved concurrent writeback)
> > addressed by the per-bdi writeback and wait queue in balance_dirty_pages()
> > - sync livelocked by a fast dirtier
> > addressed by throttling all to-be-synced dirty inodes
> >
>
> I don't think that we can just remove this support. It is
> designed to reduce the effects from doing a stat(2) on a
> file which is being actively written to.

Ah OK.

> If we do remove it, then we will need to replace this patch
> with another. Trond and I hadn't quite finished discussing
> some aspects of that other patch... :-)

I noticed the i_mutex lock in nfs_getattr(). Do you mean that?

Thanks,
Fengguang

> > CC: Peter Zijlstra <[email protected]>
> > CC: Peter Staubach <[email protected]>
> > CC: Trond Myklebust <[email protected]>
> > Signed-off-by: Wu Fengguang <[email protected]>
> > ---
> > fs/nfs/file.c | 9 ---------
> > fs/nfs/write.c | 11 -----------
> > include/linux/nfs_fs.h | 1 -
> > 3 files changed, 21 deletions(-)
> >
> > --- linux.orig/fs/nfs/file.c 2009-10-07 14:31:45.000000000 +0800
> > +++ linux/fs/nfs/file.c 2009-10-07 14:32:54.000000000 +0800
> > @@ -386,15 +386,6 @@ static int nfs_write_begin(struct file *
> > mapping->host->i_ino, len, (long long) pos);
> >
> > start:
> > - /*
> > - * Prevent starvation issues if someone is doing a consistency
> > - * sync-to-disk
> > - */
> > - ret = wait_on_bit(&NFS_I(mapping->host)->flags, NFS_INO_FLUSHING,
> > - nfs_wait_bit_killable, TASK_KILLABLE);
> > - if (ret)
> > - return ret;
> > -
> > page = grab_cache_page_write_begin(mapping, index, flags);
> > if (!page)
> > return -ENOMEM;
> > --- linux.orig/fs/nfs/write.c 2009-10-07 14:31:45.000000000 +0800
> > +++ linux/fs/nfs/write.c 2009-10-07 14:32:54.000000000 +0800
> > @@ -387,26 +387,15 @@ static int nfs_writepages_callback(struc
> > int nfs_writepages(struct address_space *mapping, struct writeback_control *wbc)
> > {
> > struct inode *inode = mapping->host;
> > - unsigned long *bitlock = &NFS_I(inode)->flags;
> > struct nfs_pageio_descriptor pgio;
> > int err;
> >
> > - /* Stop dirtying of new pages while we sync */
> > - err = wait_on_bit_lock(bitlock, NFS_INO_FLUSHING,
> > - nfs_wait_bit_killable, TASK_KILLABLE);
> > - if (err)
> > - goto out_err;
> > -
> > nfs_inc_stats(inode, NFSIOS_VFSWRITEPAGES);
> >
> > nfs_pageio_init_write(&pgio, inode, wb_priority(wbc));
> > err = write_cache_pages(mapping, wbc, nfs_writepages_callback, &pgio);
> > nfs_pageio_complete(&pgio);
> >
> > - clear_bit_unlock(NFS_INO_FLUSHING, bitlock);
> > - smp_mb__after_clear_bit();
> > - wake_up_bit(bitlock, NFS_INO_FLUSHING);
> > -
> > if (err < 0)
> > goto out_err;
> > err = pgio.pg_error;
> > --- linux.orig/include/linux/nfs_fs.h 2009-10-07 14:31:45.000000000 +0800
> > +++ linux/include/linux/nfs_fs.h 2009-10-07 14:32:54.000000000 +0800
> > @@ -208,7 +208,6 @@ struct nfs_inode {
> > #define NFS_INO_STALE (1) /* possible stale inode */
> > #define NFS_INO_ACL_LRU_SET (2) /* Inode is on the LRU list */
> > #define NFS_INO_MOUNTPOINT (3) /* inode is remote mountpoint */
> > -#define NFS_INO_FLUSHING (4) /* inode is flushing out data */
> > #define NFS_INO_FSCACHE (5) /* inode can be cached by FS-Cache */
> > #define NFS_INO_FSCACHE_LOCK (6) /* FS-Cache cookie management lock */

2009-10-07 14:02:09

by Peter Staubach

[permalink] [raw]
Subject: Re: [PATCH 44/45] NFS: remove NFS_INO_FLUSHING lock

Wu Fengguang wrote:
> On Wed, Oct 07, 2009 at 09:11:15PM +0800, Peter Staubach wrote:
>> Wu Fengguang wrote:
>>> It was introduced in 72cb77f4a5ac, and the several issues have been
>>> addressed in generic writeback:
>>> - out of order writeback (or interleaved concurrent writeback)
>>> addressed by the per-bdi writeback and wait queue in balance_dirty_pages()
>>> - sync livelocked by a fast dirtier
>>> addressed by throttling all to-be-synced dirty inodes
>>>
>> I don't think that we can just remove this support. It is
>> designed to reduce the effects from doing a stat(2) on a
>> file which is being actively written to.
>
> Ah OK.
>
>> If we do remove it, then we will need to replace this patch
>> with another. Trond and I hadn't quite finished discussing
>> some aspects of that other patch... :-)
>
> I noticed the i_mutex lock in nfs_getattr(). Do you mean that?
>

Well, that's part of that support as well. That keeps a writing
application from dirtying more pages while the application doing
the stat is attempting to clean them.

Another approach that I suggested was to keep track of the
number of pages which are dirty on a per-inode basis. When
enough pages are dirty to fill an over the wire transfer,
then schedule an asynchronous write to transmit that data to
the server. This ties in with support to ensure that the
server/network is not completely overwhelmed by the client
by flow controlling the writing application to better match
the bandwidth and latencies of the network and server.
With this support, the NFS client tends not to fill memory
with dirty pages and thus, does not depend upon the other
parts of the system to flush these pages.

All of these recent pages make this current flushing happen
in a much more orderly fashion, which is great. However,
this can still lead to the client attempting to flush
potentially gigabytes all at once, which is more than most
networks and servers can handle reasonably.

ps


> Thanks,
> Fengguang
>
>>> CC: Peter Zijlstra <[email protected]>
>>> CC: Peter Staubach <[email protected]>
>>> CC: Trond Myklebust <[email protected]>
>>> Signed-off-by: Wu Fengguang <[email protected]>
>>> ---
>>> fs/nfs/file.c | 9 ---------
>>> fs/nfs/write.c | 11 -----------
>>> include/linux/nfs_fs.h | 1 -
>>> 3 files changed, 21 deletions(-)
>>>
>>> --- linux.orig/fs/nfs/file.c 2009-10-07 14:31:45.000000000 +0800
>>> +++ linux/fs/nfs/file.c 2009-10-07 14:32:54.000000000 +0800
>>> @@ -386,15 +386,6 @@ static int nfs_write_begin(struct file *
>>> mapping->host->i_ino, len, (long long) pos);
>>>
>>> start:
>>> - /*
>>> - * Prevent starvation issues if someone is doing a consistency
>>> - * sync-to-disk
>>> - */
>>> - ret = wait_on_bit(&NFS_I(mapping->host)->flags, NFS_INO_FLUSHING,
>>> - nfs_wait_bit_killable, TASK_KILLABLE);
>>> - if (ret)
>>> - return ret;
>>> -
>>> page = grab_cache_page_write_begin(mapping, index, flags);
>>> if (!page)
>>> return -ENOMEM;
>>> --- linux.orig/fs/nfs/write.c 2009-10-07 14:31:45.000000000 +0800
>>> +++ linux/fs/nfs/write.c 2009-10-07 14:32:54.000000000 +0800
>>> @@ -387,26 +387,15 @@ static int nfs_writepages_callback(struc
>>> int nfs_writepages(struct address_space *mapping, struct writeback_control *wbc)
>>> {
>>> struct inode *inode = mapping->host;
>>> - unsigned long *bitlock = &NFS_I(inode)->flags;
>>> struct nfs_pageio_descriptor pgio;
>>> int err;
>>>
>>> - /* Stop dirtying of new pages while we sync */
>>> - err = wait_on_bit_lock(bitlock, NFS_INO_FLUSHING,
>>> - nfs_wait_bit_killable, TASK_KILLABLE);
>>> - if (err)
>>> - goto out_err;
>>> -
>>> nfs_inc_stats(inode, NFSIOS_VFSWRITEPAGES);
>>>
>>> nfs_pageio_init_write(&pgio, inode, wb_priority(wbc));
>>> err = write_cache_pages(mapping, wbc, nfs_writepages_callback, &pgio);
>>> nfs_pageio_complete(&pgio);
>>>
>>> - clear_bit_unlock(NFS_INO_FLUSHING, bitlock);
>>> - smp_mb__after_clear_bit();
>>> - wake_up_bit(bitlock, NFS_INO_FLUSHING);
>>> -
>>> if (err < 0)
>>> goto out_err;
>>> err = pgio.pg_error;
>>> --- linux.orig/include/linux/nfs_fs.h 2009-10-07 14:31:45.000000000 +0800
>>> +++ linux/include/linux/nfs_fs.h 2009-10-07 14:32:54.000000000 +0800
>>> @@ -208,7 +208,6 @@ struct nfs_inode {
>>> #define NFS_INO_STALE (1) /* possible stale inode */
>>> #define NFS_INO_ACL_LRU_SET (2) /* Inode is on the LRU list */
>>> #define NFS_INO_MOUNTPOINT (3) /* inode is remote mountpoint */
>>> -#define NFS_INO_FLUSHING (4) /* inode is flushing out data */
>>> #define NFS_INO_FSCACHE (5) /* inode can be cached by FS-Cache */
>>> #define NFS_INO_FSCACHE_LOCK (6) /* FS-Cache cookie management lock */

2009-10-08 01:45:41

by Fengguang Wu

[permalink] [raw]
Subject: Re: [PATCH 44/45] NFS: remove NFS_INO_FLUSHING lock

On Wed, Oct 07, 2009 at 09:59:10PM +0800, Peter Staubach wrote:
> Wu Fengguang wrote:
> > On Wed, Oct 07, 2009 at 09:11:15PM +0800, Peter Staubach wrote:
> >> Wu Fengguang wrote:
> >>> It was introduced in 72cb77f4a5ac, and the several issues have been
> >>> addressed in generic writeback:
> >>> - out of order writeback (or interleaved concurrent writeback)
> >>> addressed by the per-bdi writeback and wait queue in balance_dirty_pages()
> >>> - sync livelocked by a fast dirtier
> >>> addressed by throttling all to-be-synced dirty inodes
> >>>
> >> I don't think that we can just remove this support. It is
> >> designed to reduce the effects from doing a stat(2) on a
> >> file which is being actively written to.
> >
> > Ah OK.
> >
> >> If we do remove it, then we will need to replace this patch
> >> with another. Trond and I hadn't quite finished discussing
> >> some aspects of that other patch... :-)
> >
> > I noticed the i_mutex lock in nfs_getattr(). Do you mean that?
> >
>
> Well, that's part of that support as well. That keeps a writing
> application from dirtying more pages while the application doing
> the stat is attempting to clean them.

Instead of blocking totally, we could throttle application writes.
The following two patches are the more gentle way of doing this,
however it does not guarantee to kill the livelock, since a busy
bdi-flush thread could writeback many pages to unfreeze the
application prematurely. Anyway I attach them as a demo of idea,
whether it be good or bad.

> Another approach that I suggested was to keep track of the
> number of pages which are dirty on a per-inode basis. When

Yes a per-inode dirty count should be trivial and may be good for others.

> enough pages are dirty to fill an over the wire transfer,
> then schedule an asynchronous write to transmit that data to

This should also be trivial to support if the location ordered
writeback infrastructure is ready.

> the server. This ties in with support to ensure that the
> server/network is not completely overwhelmed by the client
> by flow controlling the writing application to better match
> the bandwidth and latencies of the network and server.

I like that feature :)

> With this support, the NFS client tends not to fill memory
> with dirty pages and thus, does not depend upon the other
> parts of the system to flush these pages.
>
> All of these recent pages make this current flushing happen
> in a much more orderly fashion, which is great. However,

Thanks.

> this can still lead to the client attempting to flush
> potentially gigabytes all at once, which is more than most
> networks and servers can handle reasonably.

OK, I now see the need to keep mapping->nr_dirty under control: it
could make many NFS operations response in a more bounded fashion.
The good thing is, it can share infrastructure with the location
based writeback (http://lkml.org/lkml/2007/8/27/45 :)

Thanks,
Fengguang


Attachments:
(No filename) (2.88 kB)
writeback-throttle-sync-mapping.patch (2.43 kB)
nfs-no-i_mutex-for-livelock-prevention.patch (1.46 kB)
Download all attachments