Hi Trond,
I just noticed commit acdc53b2146c7ee67feb1f02f7bc3020126514b8 from 2010
reverts the effect commit 28c494c5c8d425e15b7b82571e4df6d6bc34594d from Chunk
in 2007.
Specifically it removes mutex lock/unlock in nfs_getattr.
Chuck added them:
- /* Flush out writes to the server in order to update c/mtime */
- if (S_ISREG(inode->i_mode))
+ /*
+ * Flush out writes to the server in order to update c/mtime.
+ *
+ * Hold the i_mutex to suspend application writes temporarily;
+ * this prevents long-running writing applications from blocking
+ * nfs_wb_nocommit.
+ */
+ if (S_ISREG(inode->i_mode)) {
+ mutex_lock(&inode->i_mutex);
nfs_wb_nocommit(inode);
+ mutex_unlock(&inode->i_mutex);
+ }
You removed them.
- /*
- * Flush out writes to the server in order to update c/mtime.
- *
- * Hold the i_mutex to suspend application writes temporarily;
- * this prevents long-running writing applications from blocking
- * nfs_wb_nocommit.
- */
+ /* Flush out writes to the server in order to update c/mtime. */
if (S_ISREG(inode->i_mode)) {
- mutex_lock(&inode->i_mutex);
- nfs_wb_nocommit(inode);
- mutex_unlock(&inode->i_mutex);
+ err = filemap_write_and_wait(inode->i_mapping);
+ if (err)
+ goto out;
}
/*
Do you recall why?
I noticed because a customer reported exactly the same symptoms the were
fixed by Chucks patch some years ago.
The comment on your patch says (in part):
Also replace nfs_wb_nocommit() with a
call to filemap_write_and_wait(), which doesn't need to hold the
inode->i_mutex.
It is certainly true that filemap_write_and_wait doesn't need to hold the
mutex, but neither did nfs_wb_nocommit. The mutex is held to stop "suspend
application writes temporarily" so no more pages get dirtied until all
the current dirty pages have been written out. i.e. to stop
generic_file_aio_write() from proceeding.
The particular test that shows the problem is a large write like
dd if=/dev/zero of=/mnt/nfs/somefile count=2000000
then in another window
ls -l /mnt/nfs
the "ls -l" will hang until the "dd" completes.
Can we put the mutex lock/unlock back please?
Thanks,
NeilBrown
On Tue, 2013-11-26 at 10:23 +-1100, NeilBrown wrote:
+AD4- On Mon, 25 Nov 2013 09:59:39 -0500 Chuck Lever +ADw-chuck.lever+AEA-oracle.com+AD4- wrote:
+AD4-
+AD4- +AD4- Hi Neil-
+AD4- +AD4-
+AD4- +AD4- On Nov 24, 2013, at 11:59 PM, NeilBrown +ADw-neilb+AEA-suse.de+AD4- wrote:
+AD4- +AD4-
+AD4- +AD4- +AD4-
+AD4- +AD4- +AD4- Hi Trond,
+AD4- +AD4- +AD4- I just noticed commit acdc53b2146c7ee67feb1f02f7bc3020126514b8 from 2010
+AD4- +AD4- +AD4- reverts the effect commit 28c494c5c8d425e15b7b82571e4df6d6bc34594d from Chunk
+AD4- +AD4- +AD4- in 2007.
+AD4- +AD4-
+AD4- +AD4- I'm wondering if a subsequent commit changed filemap+AF8-write+AF8-and+AF8-wait().
+AD4-
+AD4- I hadn't thought of that possibility. I've just had a look at the
+AD4- differences between acdc53b2146c7ee6 and now and cannot find anything that
+AD4- could be related.
To clarify a little: my understanding is that the current 2-pass code in
write+AF8-cache+AF8-pages() is supposed to prevent livelock. Instead of chasing
PAGECACHE+AF8-TAG+AF8-DIRTY tags (which are constantly being set if an
application is actively writing), we call tag+AF8-pages+AF8-for+AF8-writeback() once
in order to convert the current set of PAGECACHE+AF8-TAG+AF8-DIRTY tags into
PAGECACHE+AF8-TAG+AF8-TOWRITE tags, and then we have a second pass write those
pages back (and wait for completion).
IOW: the inode-+AD4-i+AF8-mutex should be unnecessary here...
Now that said, we recently added in the call to nfs+AF8-inode+AF8-dio+AF8-wait(). If
applications are using O+AF8-DIRECT, then +AF8-that+AF8- could livelock. There is
nothing currently preventing the applications from continuing to bump
the inode-+AD4-i+AF8-dio+AF8-count while we're waiting. Christoph has proposed some
locking changes that should fix that problem. I'm still evaluating his
patchset...
Cheers
Trond
Cheers
Trond
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust+AEA-netapp.com
http://www.netapp.com
Hi Neil-
On Nov 24, 2013, at 11:59 PM, NeilBrown <[email protected]> wrote:
>
> Hi Trond,
> I just noticed commit acdc53b2146c7ee67feb1f02f7bc3020126514b8 from 2010
> reverts the effect commit 28c494c5c8d425e15b7b82571e4df6d6bc34594d from Chunk
> in 2007.
I'm wondering if a subsequent commit changed filemap_write_and_wait().
There was some recent clean up that (very likely unintentionally) changed the serialization of some of these VFS utility functions.
> Specifically it removes mutex lock/unlock in nfs_getattr.
> Chuck added them:
>
> - /* Flush out writes to the server in order to update c/mtime */
> - if (S_ISREG(inode->i_mode))
> + /*
> + * Flush out writes to the server in order to update c/mtime.
> + *
> + * Hold the i_mutex to suspend application writes temporarily;
> + * this prevents long-running writing applications from blocking
> + * nfs_wb_nocommit.
> + */
> + if (S_ISREG(inode->i_mode)) {
> + mutex_lock(&inode->i_mutex);
> nfs_wb_nocommit(inode);
> + mutex_unlock(&inode->i_mutex);
> + }
>
>
> You removed them.
>
> - /*
> - * Flush out writes to the server in order to update c/mtime.
> - *
> - * Hold the i_mutex to suspend application writes temporarily;
> - * this prevents long-running writing applications from blocking
> - * nfs_wb_nocommit.
> - */
> + /* Flush out writes to the server in order to update c/mtime. */
> if (S_ISREG(inode->i_mode)) {
> - mutex_lock(&inode->i_mutex);
> - nfs_wb_nocommit(inode);
> - mutex_unlock(&inode->i_mutex);
> + err = filemap_write_and_wait(inode->i_mapping);
> + if (err)
> + goto out;
> }
>
> /*
>
>
> Do you recall why?
>
> I noticed because a customer reported exactly the same symptoms the were
> fixed by Chucks patch some years ago.
>
>
> The comment on your patch says (in part):
>
> Also replace nfs_wb_nocommit() with a
> call to filemap_write_and_wait(), which doesn't need to hold the
> inode->i_mutex.
>
> It is certainly true that filemap_write_and_wait doesn't need to hold the
> mutex, but neither did nfs_wb_nocommit. The mutex is held to stop "suspend
> application writes temporarily" so no more pages get dirtied until all
> the current dirty pages have been written out. i.e. to stop
> generic_file_aio_write() from proceeding.
>
> The particular test that shows the problem is a large write like
> dd if=/dev/zero of=/mnt/nfs/somefile count=2000000
> then in another window
> ls -l /mnt/nfs
>
> the "ls -l" will hang until the "dd" completes.
>
> Can we put the mutex lock/unlock back please?
>
> Thanks,
> NeilBrown
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
On Mon, 25 Nov 2013 23:41:02 +0000 "Myklebust, Trond"
<[email protected]> wrote:
> On Tue, 2013-11-26 at 10:23 +1100, NeilBrown wrote:
> > On Mon, 25 Nov 2013 09:59:39 -0500 Chuck Lever <[email protected]> wrote:
> >
> > > Hi Neil-
> > >
> > > On Nov 24, 2013, at 11:59 PM, NeilBrown <[email protected]> wrote:
> > >
> > > >
> > > > Hi Trond,
> > > > I just noticed commit acdc53b2146c7ee67feb1f02f7bc3020126514b8 from 2010
> > > > reverts the effect commit 28c494c5c8d425e15b7b82571e4df6d6bc34594d from Chunk
> > > > in 2007.
> > >
> > > I'm wondering if a subsequent commit changed filemap_write_and_wait().
> >
> > I hadn't thought of that possibility. I've just had a look at the
> > differences between acdc53b2146c7ee6 and now and cannot find anything that
> > could be related.
>
> To clarify a little: my understanding is that the current 2-pass code in
> write_cache_pages() is supposed to prevent livelock. Instead of chasing
> PAGECACHE_TAG_DIRTY tags (which are constantly being set if an
> application is actively writing), we call tag_pages_for_writeback() once
> in order to convert the current set of PAGECACHE_TAG_DIRTY tags into
> PAGECACHE_TAG_TOWRITE tags, and then we have a second pass write those
> pages back (and wait for completion).
> IOW: the inode->i_mutex should be unnecessary here...
Thanks for that pointer. You are right, write_cache_pages shouldn't loop
indefinitely any more.
And I also just noticed commit 72cb77f4a5ace37b12dcb47a0e8637a2c28ad881 and
NFS_INO_FLUSHING which is an extra reason that the i_mutex isn't needed.
Don't know how I missed that last night. Less haste, more speed I guess.
So it looks like I jumped the gun and there must be some other explanation.
Sorry 'bout that.
Thanks,
NeilBrown
>
>
> Now that said, we recently added in the call to nfs_inode_dio_wait(). If
> applications are using O_DIRECT, then _that_ could livelock. There is
> nothing currently preventing the applications from continuing to bump
> the inode->i_dio_count while we're waiting. Christoph has proposed some
> locking changes that should fix that problem. I'm still evaluating his
> patchset...
>
> Cheers
> Trond
>
> Cheers
> Trond
On Mon, 25 Nov 2013 09:59:39 -0500 Chuck Lever <[email protected]> wrote:
> Hi Neil-
>
> On Nov 24, 2013, at 11:59 PM, NeilBrown <[email protected]> wrote:
>
> >
> > Hi Trond,
> > I just noticed commit acdc53b2146c7ee67feb1f02f7bc3020126514b8 from 2010
> > reverts the effect commit 28c494c5c8d425e15b7b82571e4df6d6bc34594d from Chunk
> > in 2007.
>
> I'm wondering if a subsequent commit changed filemap_write_and_wait().
I hadn't thought of that possibility. I've just had a look at the
differences between acdc53b2146c7ee6 and now and cannot find anything that
could be related.
Thanks,
NeilBrown
>
> There was some recent clean up that (very likely unintentionally) changed the serialization of some of these VFS utility functions.
>
>
>