2006-10-03 16:26:45

by Stephane Doyon

Subject: Re: Long sleep with i_mutex in xfs_flush_device(), affects NFS service, was: Re: several messages

[Ouch... terribly sorry for the mangled subject line on my previous post,
insufficient coffee I guess. Sheepishly re-posting in hope of untangling
the discussion threading mess...]

---------- Forwarded message ----------
Date: Tue, 3 Oct 2006 09:39:55 -0400 (EDT)
From: Stephane Doyon <[email protected]>
To: Trond Myklebust <[email protected]>, David Chinner <[email protected]>
Cc: <[email protected]>, <[email protected]>,
Shailendra Tripathi <[email protected]>
Subject: Re: several messages

Sorry for insisting, but it seems to me there's still a problem in need of
fixing: when writing a 5GB file over NFS to an XFS file system and hitting
ENOSPC, it takes on the order of 22 hours before my application gets an error,
whereas it would normally take about 2 minutes if the file system did not become
full.

Perhaps I was being a bit too "constructive" and drowned my point in
explanations and proposed workarounds... You are telling me that neither NFS
nor XFS is doing anything wrong, and I can understand your points of view, but
surely that behavior isn't considered acceptable?

On Tue, 26 Sep 2006, Trond Myklebust wrote:

> On Tue, 2006-09-26 at 16:05 -0400, Stephane Doyon wrote:
> > I suppose it's not technically wrong to try to flush all the pages of the
> > file, but if the server file system is full then it will be at its worst.
> > Also if you happened to be on a slower link and have a big cache to flush,
> > you're waiting around for very little gain.
>
> That all assumes that nobody fixes the problem on the server. If
> somebody notices, and actually removes an unused file, then you may be
> happy that the kernel preserved the last 80% of the apache log file that
> was being written out.
>
> ENOSPC is a transient error: that is why the current behaviour exists.

On Tue, 3 Oct 2006, David Chinner wrote:

> This deep in the XFS allocation functions, we cannot tell if we hold
> the i_mutex or not, and it plays no part in determining if we have
> space or not. Hence we don't touch it here.


> I doubt it's a good idea for an NFS server, either.
[...]
> Remember that XFS, like most filesystems, trades off speed for
> correctness as we approach ENOSPC. Many parts of XFS slow down as we
> approach ENOSPC, and this is just one example of where we need to be
> correct, not fast.
[...]
> IMO, this is a non-problem. You're talking about optimising a
> relatively rare corner case where correctness is more important than
> speed and your test case is highly artificial. AFAIC, if you are
> running at ENOSPC then you get what performance is appropriate for
> correctness and if you are continually running at ENOSPC, then buy
> some more disks.....

My recipe to reproduce the problem locally is admittedly somewhat artificial,
but the problematic usage definitely isn't: simply an app on an NFS client that
happens to fill up a file system. There must be some way to handle this better.
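
As a purely illustrative, application-side sketch (not a fix proposed in this
thread, and no substitute for addressing the behavior in NFS or XFS): with
asynchronous NFS writes, an error such as ENOSPC is typically only reported to
the application at fsync() or close(), so a writer that calls fsync()
periodically bounds how much cached data can pile up before it sees the error.
The function name and the sync_every parameter below are hypothetical, chosen
only for this example.

```python
import os


def write_with_periodic_fsync(path, chunks, sync_every=64):
    """Write chunks to path, calling fsync() every `sync_every` chunks.

    Returns the number of bytes written. Any OSError (for example ENOSPC)
    propagates as soon as write() or fsync() reports it, instead of
    surfacing only when the whole file has been flushed.
    """
    written = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for i, chunk in enumerate(chunks, start=1):
            view = memoryview(chunk)
            # os.write() may write fewer bytes than requested; loop until done.
            while view:
                n = os.write(fd, view)
                view = view[n:]
                written += n
            if i % sync_every == 0:
                # Force cached pages out now so a full server is noticed early.
                os.fsync(fd)
        # Final flush so any pending error is reported before we return.
        os.fsync(fd)
    finally:
        os.close(fd)
    return written
```

The trade-off is throughput: a smaller sync_every reports errors sooner but
flushes more often, which costs more on a slow link.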

Thanks


