2013-11-25 04:59:56

by NeilBrown

[permalink] [raw]
Subject: The return of the hanging "ls"...


Hi Trond,
I just noticed commit acdc53b2146c7ee67feb1f02f7bc3020126514b8 from 2010
reverts the effect commit 28c494c5c8d425e15b7b82571e4df6d6bc34594d from Chunk
in 2007.

Specifically it removes mutex lock/unlock in nfs_getattr.
Chuck added them:

- /* Flush out writes to the server in order to update c/mtime */
- if (S_ISREG(inode->i_mode))
+ /*
+ * Flush out writes to the server in order to update c/mtime.
+ *
+ * Hold the i_mutex to suspend application writes temporarily;
+ * this prevents long-running writing applications from blocking
+ * nfs_wb_nocommit.
+ */
+ if (S_ISREG(inode->i_mode)) {
+ mutex_lock(&inode->i_mutex);
nfs_wb_nocommit(inode);
+ mutex_unlock(&inode->i_mutex);
+ }


You removed them.

- /*
- * Flush out writes to the server in order to update c/mtime.
- *
- * Hold the i_mutex to suspend application writes temporarily;
- * this prevents long-running writing applications from blocking
- * nfs_wb_nocommit.
- */
+ /* Flush out writes to the server in order to update c/mtime. */
if (S_ISREG(inode->i_mode)) {
- mutex_lock(&inode->i_mutex);
- nfs_wb_nocommit(inode);
- mutex_unlock(&inode->i_mutex);
+ err = filemap_write_and_wait(inode->i_mapping);
+ if (err)
+ goto out;
}

/*


Do you recall why?

I noticed because a customer reported exactly the same symptoms the were
fixed by Chucks patch some years ago.


The comment on your patch says (in part):

Also replace nfs_wb_nocommit() with a
call to filemap_write_and_wait(), which doesn't need to hold the
inode->i_mutex.

It is certainly true that filemap_write_and_wait doesn't need to hold the
mutex, but neither did nfs_wb_nocommit. The mutex is held to stop "suspend
application writes temporarily" so no more pages get dirtied until all
the current dirty pages have been written out. i.e. to stop
generic_file_aio_write() from proceeding.

The particular test that shows the problem is a large write like
dd if=/dev/zero of=/mnt/nfs/somefile count=2000000
then in another window
ls -l /mnt/nfs

the "ls -l" will hang until the "dd" completes.

Can we put the mutex lock/unlock back please?

Thanks,
NeilBrown


Attachments:
signature.asc (828.00 B)

2013-11-25 23:41:03

by Myklebust, Trond

[permalink] [raw]
Subject: Re: The return of the hanging "ls"...

On Tue, 2013-11-26 at 10:23 +-1100, NeilBrown wrote:
+AD4- On Mon, 25 Nov 2013 09:59:39 -0500 Chuck Lever +ADw-chuck.lever+AEA-oracle.com+AD4- wrote:
+AD4-
+AD4- +AD4- Hi Neil-
+AD4- +AD4-
+AD4- +AD4- On Nov 24, 2013, at 11:59 PM, NeilBrown +ADw-neilb+AEA-suse.de+AD4- wrote:
+AD4- +AD4-
+AD4- +AD4- +AD4-
+AD4- +AD4- +AD4- Hi Trond,
+AD4- +AD4- +AD4- I just noticed commit acdc53b2146c7ee67feb1f02f7bc3020126514b8 from 2010
+AD4- +AD4- +AD4- reverts the effect commit 28c494c5c8d425e15b7b82571e4df6d6bc34594d from Chunk
+AD4- +AD4- +AD4- in 2007.
+AD4- +AD4-
+AD4- +AD4- I'm wondering if a subsequent commit changed filemap+AF8-write+AF8-and+AF8-wait().
+AD4-
+AD4- I hadn't thought of that possibility. I've just had a look at the
+AD4- differences between acdc53b2146c7ee6 and now and cannot find anything that
+AD4- could be related.

To clarify a little: my understanding is that the current 2-pass code in
write+AF8-cache+AF8-pages() is supposed to prevent livelock. Instead of chasing
PAGECACHE+AF8-TAG+AF8-DIRTY tags (which are constantly being set if an
application is actively writing), we call tag+AF8-pages+AF8-for+AF8-writeback() once
in order to convert the current set of PAGECACHE+AF8-TAG+AF8-DIRTY tags into
PAGECACHE+AF8-TAG+AF8-TOWRITE tags, and then we have a second pass write those
pages back (and wait for completion).
IOW: the inode-+AD4-i+AF8-mutex should be unnecessary here...


Now that said, we recently added in the call to nfs+AF8-inode+AF8-dio+AF8-wait(). If
applications are using O+AF8-DIRECT, then +AF8-that+AF8- could livelock. There is
nothing currently preventing the applications from continuing to bump
the inode-+AD4-i+AF8-dio+AF8-count while we're waiting. Christoph has proposed some
locking changes that should fix that problem. I'm still evaluating his
patchset...

Cheers
Trond

Cheers
Trond
--
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust+AEA-netapp.com
http://www.netapp.com

2013-11-25 14:59:47

by Chuck Lever III

[permalink] [raw]
Subject: Re: The return of the hanging "ls"...

Hi Neil-

On Nov 24, 2013, at 11:59 PM, NeilBrown <[email protected]> wrote:

>
> Hi Trond,
> I just noticed commit acdc53b2146c7ee67feb1f02f7bc3020126514b8 from 2010
> reverts the effect commit 28c494c5c8d425e15b7b82571e4df6d6bc34594d from Chunk
> in 2007.

I'm wondering if a subsequent commit changed filemap_write_and_wait().

There was some recent clean up that (very likely unintentionally) changed the serialization of some of these VFS utility functions.



> Specifically it removes mutex lock/unlock in nfs_getattr.
> Chuck added them:
>
> - /* Flush out writes to the server in order to update c/mtime */
> - if (S_ISREG(inode->i_mode))
> + /*
> + * Flush out writes to the server in order to update c/mtime.
> + *
> + * Hold the i_mutex to suspend application writes temporarily;
> + * this prevents long-running writing applications from blocking
> + * nfs_wb_nocommit.
> + */
> + if (S_ISREG(inode->i_mode)) {
> + mutex_lock(&inode->i_mutex);
> nfs_wb_nocommit(inode);
> + mutex_unlock(&inode->i_mutex);
> + }
>
>
> You removed them.
>
> - /*
> - * Flush out writes to the server in order to update c/mtime.
> - *
> - * Hold the i_mutex to suspend application writes temporarily;
> - * this prevents long-running writing applications from blocking
> - * nfs_wb_nocommit.
> - */
> + /* Flush out writes to the server in order to update c/mtime. */
> if (S_ISREG(inode->i_mode)) {
> - mutex_lock(&inode->i_mutex);
> - nfs_wb_nocommit(inode);
> - mutex_unlock(&inode->i_mutex);
> + err = filemap_write_and_wait(inode->i_mapping);
> + if (err)
> + goto out;
> }
>
> /*
>
>
> Do you recall why?
>
> I noticed because a customer reported exactly the same symptoms the were
> fixed by Chucks patch some years ago.
>
>
> The comment on your patch says (in part):
>
> Also replace nfs_wb_nocommit() with a
> call to filemap_write_and_wait(), which doesn't need to hold the
> inode->i_mutex.
>
> It is certainly true that filemap_write_and_wait doesn't need to hold the
> mutex, but neither did nfs_wb_nocommit. The mutex is held to stop "suspend
> application writes temporarily" so no more pages get dirtied until all
> the current dirty pages have been written out. i.e. to stop
> generic_file_aio_write() from proceeding.
>
> The particular test that shows the problem is a large write like
> dd if=/dev/zero of=/mnt/nfs/somefile count=2000000
> then in another window
> ls -l /mnt/nfs
>
> the "ls -l" will hang until the "dd" completes.
>
> Can we put the mutex lock/unlock back please?
>
> Thanks,
> NeilBrown

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




2013-11-26 02:11:39

by NeilBrown

[permalink] [raw]
Subject: Re: The return of the hanging "ls"...

On Mon, 25 Nov 2013 23:41:02 +0000 "Myklebust, Trond"
<[email protected]> wrote:

> On Tue, 2013-11-26 at 10:23 +1100, NeilBrown wrote:
> > On Mon, 25 Nov 2013 09:59:39 -0500 Chuck Lever <[email protected]> wrote:
> >
> > > Hi Neil-
> > >
> > > On Nov 24, 2013, at 11:59 PM, NeilBrown <[email protected]> wrote:
> > >
> > > >
> > > > Hi Trond,
> > > > I just noticed commit acdc53b2146c7ee67feb1f02f7bc3020126514b8 from 2010
> > > > reverts the effect commit 28c494c5c8d425e15b7b82571e4df6d6bc34594d from Chunk
> > > > in 2007.
> > >
> > > I'm wondering if a subsequent commit changed filemap_write_and_wait().
> >
> > I hadn't thought of that possibility. I've just had a look at the
> > differences between acdc53b2146c7ee6 and now and cannot find anything that
> > could be related.
>
> To clarify a little: my understanding is that the current 2-pass code in
> write_cache_pages() is supposed to prevent livelock. Instead of chasing
> PAGECACHE_TAG_DIRTY tags (which are constantly being set if an
> application is actively writing), we call tag_pages_for_writeback() once
> in order to convert the current set of PAGECACHE_TAG_DIRTY tags into
> PAGECACHE_TAG_TOWRITE tags, and then we have a second pass write those
> pages back (and wait for completion).
> IOW: the inode->i_mutex should be unnecessary here...

Thanks for that pointer. You are right, write_cache_pages shouldn't loop
indefinitely any more.
And I also just noticed commit 72cb77f4a5ace37b12dcb47a0e8637a2c28ad881 and
NFS_INO_FLUSHING which is an extra reason that the i_mutex isn't needed.
Don't know how I missed that last night. Less haste, more speed I guess.

So it looks like I jumped the gun and there must be some other explanation.
Sorry 'bout that.

Thanks,
NeilBrown

>
>
> Now that said, we recently added in the call to nfs_inode_dio_wait(). If
> applications are using O_DIRECT, then _that_ could livelock. There is
> nothing currently preventing the applications from continuing to bump
> the inode->i_dio_count while we're waiting. Christoph has proposed some
> locking changes that should fix that problem. I'm still evaluating his
> patchset...
>
> Cheers
> Trond
>
> Cheers
> Trond


Attachments:
signature.asc (828.00 B)

2013-11-25 23:23:16

by NeilBrown

[permalink] [raw]
Subject: Re: The return of the hanging "ls"...

On Mon, 25 Nov 2013 09:59:39 -0500 Chuck Lever <[email protected]> wrote:

> Hi Neil-
>
> On Nov 24, 2013, at 11:59 PM, NeilBrown <[email protected]> wrote:
>
> >
> > Hi Trond,
> > I just noticed commit acdc53b2146c7ee67feb1f02f7bc3020126514b8 from 2010
> > reverts the effect commit 28c494c5c8d425e15b7b82571e4df6d6bc34594d from Chunk
> > in 2007.
>
> I'm wondering if a subsequent commit changed filemap_write_and_wait().

I hadn't thought of that possibility. I've just had a look at the
differences between acdc53b2146c7ee6 and now and cannot find anything that
could be related.

Thanks,
NeilBrown

>
> There was some recent clean up that (very likely unintentionally) changed the serialization of some of these VFS utility functions.
>
>
>


Attachments:
signature.asc (828.00 B)