From: Trond Myklebust
To: "neilb@suse.com", "chuck.lever@oracle.com"
CC: "Anna.Schumaker@netapp.com", "linux-kernel@vger.kernel.org", "linux-nfs@vger.kernel.org", "linux-fsdevel@vger.kernel.org"
Subject: Re: [PATCH/RFC] NFS: add nostatflush mount option.
Date: Fri, 22 Dec 2017 03:17:39 +0000
Message-ID: <1513912651.24909.5.camel@primarydata.com>
In-Reply-To: <878tdvk8ud.fsf@notabene.neil.brown.name>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 2017-12-22 at 09:35 +1100, NeilBrown wrote:
> On Thu, Dec 21 2017, Trond Myklebust wrote:
>
> > On Fri, 2017-12-22 at 07:59 +1100, NeilBrown wrote:
> > > On Thu, Dec 21 2017, Trond Myklebust wrote:
> > >
> > > > On Thu, 2017-12-21 at 10:39 -0500, Chuck Lever wrote:
> > > > > Hi Neil-
> > > > >
> > > > > > On Dec 20, 2017, at 9:57 PM, NeilBrown wrote:
> > > > > >
> > > > > > When an i_op->getattr() call is made on an NFS file
> > > > > > (typically from a 'stat' family system call), NFS
> > > > > > will first flush any dirty data to the server.
> > > > > >
> > > > > > This ensures that the mtime reported is correct and stable,
> > > > > > but has a performance penalty. 'stat' is normally thought
> > > > > > to be a quick operation, and imposing this cost can be
> > > > > > surprising.
> > > > >
> > > > > To be clear, this behavior is a POSIX requirement.
> > > > >
> > > > > > I have seen problems when one process is writing a large
> > > > > > file and another process performs "ls -l" on the containing
> > > > > > directory and is blocked for as long as it takes to flush
> > > > > > all the dirty data to the server, which can be minutes.
> > > > >
> > > > > Yes, a well-known annoyance that cannot be addressed
> > > > > even with a write delegation.
> > > > >
> > > > > > I have also seen a legacy application which frequently
> > > > > > calls "fstat" on a file that it is writing to. On a local
> > > > > > filesystem (and in the Solaris implementation of NFS) this
> > > > > > fstat call is cheap.
> > > > > > On Linux/NFS, this causes a noticeable
> > > > > > decrease in throughput.
> > > > >
> > > > > If the preceding write is small, Linux could be using
> > > > > a FILE_SYNC write, but Solaris could be using UNSTABLE.
> > > > >
> > > > > > The only circumstances where an application calling
> > > > > > 'stat()' might get an mtime which is not stable are times
> > > > > > when some other process is writing to the file and the two
> > > > > > processes are not using locking to ensure consistency, or
> > > > > > when one process is both writing and stating. In neither
> > > > > > of these cases is it reasonable to expect the mtime to be
> > > > > > stable.
> > > > >
> > > > > I'm not convinced this is a strong enough rationale
> > > > > for claiming it is safe to disable the existing
> > > > > behavior.
> > > > >
> > > > > You've explained cases where the new behavior is
> > > > > reasonable, but do you have any examples where the
> > > > > new behavior would be a problem? There must be a
> > > > > reason why POSIX explicitly requires an up-to-date
> > > > > mtime.
> > > > >
> > > > > What guidance would nfs(5) give on when it is safe
> > > > > to specify the new mount option?
> > > > >
> > > > > > In the most common cases where mtime is important
> > > > > > (e.g. make), no other process has the file open, so there
> > > > > > will be no dirty data and the mtime will be stable.
> > > > >
> > > > > Isn't it also the case that make is a multi-process
> > > > > workload where one process modifies a file, then
> > > > > closes it (which triggers a flush), and then another
> > > > > process stats the file? The new mount option does
> > > > > not change the behavior of close(2), does it?
> > > > >
> > > > > > Rather than unilaterally changing this behavior of 'stat',
> > > > > > this patch adds a "nosyncflush" mount option to allow
> > > > > > sysadmins whose applications are hurt by the current
> > > > > > behavior to disable it.
> > > > >
> > > > > IMO a mount option is at the wrong granularity. A
> > > > > mount point will be shared between applications that
> > > > > can tolerate the non-POSIX behavior and those that
> > > > > cannot, for instance.
> > > >
> > > > Agreed.
> > > >
> > > > The other thing to note here is that we now have an embryonic
> > > > statx() system call, which allows the application itself to
> > > > decide whether or not it needs up-to-date values for the
> > > > atime/ctime/mtime. While we haven't yet plumbed in the NFS
> > > > side, the intention was always to use that information to
> > > > turn off the writeback flushing when possible.
> > >
> > > Yes, if statx() were actually working, we could change the
> > > application to avoid the flush. But then if changing the
> > > application were an option, I suspect that - for my current
> > > customer issue - we could just remove the fstat() calls. I doubt
> > > they are really necessary. I think programmers often think of
> > > stat() (and particularly fstat()) as fairly cheap and so they use
> > > it whenever convenient. Only NFS violates this expectation.
> > >
> > > Also statx() is only a real solution if/when it gets widely
> > > used. Will "ls -l" default to AT_STATX_DONT_SYNC?
> > >
> > > Apart from the POSIX requirement (which only requires that the
> > > timestamps be updated, not that the data be flushed), do you know
> > > of any value gained from flushing data before stat()?
> > >
> >
> > POSIX requires that timestamps change as part of the read() or write()
>=20 > Does it require that they don't change at other times? Yes. > I see the (arguable) deviation from POSIX that I propose to be > completely inline with the close-to-open consistency semantics that > NFS > already uses, which are weaker than what POSIX might suggest. >=20 > As you didn't exactly answer the question, would it be fair to say > that > you don't know of any reason to flush-before-stat except that POSIX > seems to require it? The reason is to emulate POSIX semantics. That is the only reason, and that is why when we added a statx() function, I asked that we allow the application to specify that it doesn't need these POSIX semantics. --=20 Trond Myklebust Linux NFS client maintainer, PrimaryData trond.myklebust@primarydata.com --=-+daW7vOwWHXZr8YKLLFw Content-Type: application/pgp-signature; name="signature.asc" Content-Description: This is a digitally signed message part Content-Transfer-Encoding: 7bit -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAlo8eUsACgkQZwvnipYK APLrLg//QIp4JDXLfXu3374Vyc0vzQvDNQVRXmtT5VkpDUzce3y7NBz697iWaODH M0Gl8K5sfIT5tg7PnuNt7tkVEEJjh2xP5uWsrsuHxCsy+eLW2PHzER49ErQP0oH9 iN+RvlH1lvflCWtoWvJHA3Hdozzr6bGCmZVqh20XcuonNPnKnUAvvvkYbK24S3sL qfgQ+SazE/RIVF6ZxQbF6qUJQr1eMnFFKaQw+/0JlzF5FPW5R9Zguh21fyvHQ1iw zt7YEFxTxzhTrDAGo4txGOfLKTiDjp8zLybgwjDYiqMUNIUnDyyFcnYoQP7jantx aEP86W5VFsr4ffXm3ZBu+avTNhmM90MWkE/i4A6fDbdlXU+Csvi/2FEJYtpVvxcm r+vlayT5KioFn/A1Il4JJKg4utHHoAliTbKxuzbiKAljWLXxgWKId3x6dvQjhr9Z BETPSbqllWwatzHkzGg97sdpoJFcxCncmdW6R7PfkSPxYUUme9uN5kfLaNTdRoGV RYI3Uqb0eMvz/mDGSzKoVJh98bxWpj+5YIFg1v8mmSKxVa+GhF/nm2xwDTzvP4B0 Q2vh0fI/ne8oKdXK23J8eTjgO3EON1SRP66c8tlO3MF9pIKwrbuFKnjNzSgjk88i kJN9rIMEXoGYGZERDABulk1MthiF72ExBWpx/Pj4fOf/uQp0O7w= =Kg/g -----END PGP SIGNATURE----- --=-+daW7vOwWHXZr8YKLLFw--