Received: by 2002:a25:e7d8:0:0:0:0:0 with SMTP id e207csp4319136ybh; Tue, 17 Mar 2020 16:49:51 -0700 (PDT) X-Google-Smtp-Source: ADFU+vt+uTmJCEwIuI5gLT4KUY33aPp6aieIrOXMTpqWE9OBSO7EHL5eMpumT0XxtSgZujsdf2u4 X-Received: by 2002:a9d:1d07:: with SMTP id m7mr1489643otm.167.1584488990903; Tue, 17 Mar 2020 16:49:50 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1584488990; cv=none; d=google.com; s=arc-20160816; b=P7TheYj58b1rr7LLNAbHwvVu/WyOay9Yu8Rg8OrnjaMl+tOK3p2P/SJxKaGRB+1loX ynk5N5B6B4CizfMA8RwlMtew6G3jXoZuaBNE0jt1acTtWatq3QR/fP0PqgqCrEtiq1pO eDz6jqGg0/MwfrIw+R4AJqW3gVix1kq0bwIfdsKuS22EV5knM/05SWqhjFeui6ydsPte rEYgpKVn8BJrMOSKC3pWVcBGKbm1wFzhG/azdUUWV8J9QHSxy4BjzbFYSSorOagp1i0p xdEI9yEYjZj1USh+SH4zfB7mJtzVEx68PQQYAvMoiBuKlufkP/BCpyC+6XbXexySznyG HDZA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:message-id:references :in-reply-to:subject:cc:date:to:from; bh=txYYTteP8UhB0To7iMxFA7AB2KBRpBwspNTn+IHzzVw=; b=XZqcC8xlYkwiV4vZ0oy51SswE34TJIVV8FQKrO7bptB72uMvRwdc9VuRX7Poh3LtVB 7UtQPnln8Q8AvqC7AejG8eWjDYXGDpHE/PQCEPAQzM5+cjFWYD1BHo80gaPvFyY6fm+t hdAHRSWuoMCYdgHS4jgAqSYrK3jdP9ON/Yu5Z3SEgdb3xv+LQ+PcDTzfGBbp8Jrxv9WV q9QbFaCJM3exib2XBBtla+bAYlCMltaZ+POI5Yb6OEixIZfVc5gEj0H98LttCubyxbPZ XLRe2B+F6sSYRJOOVTSGQM8Y+JYGgsys7BzB09Sal2gpevX1xocH1xuOkMuRBgYWCr5w ZfNQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-nfs-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-nfs-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id d63si2491996oib.224.2020.03.17.16.49.04; Tue, 17 Mar 2020 16:49:50 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-nfs-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-nfs-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-nfs-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726680AbgCQXtC (ORCPT + 99 others); Tue, 17 Mar 2020 19:49:02 -0400 Received: from mx2.suse.de ([195.135.220.15]:34536 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726564AbgCQXtC (ORCPT ); Tue, 17 Mar 2020 19:49:02 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id CE37DAFB8; Tue, 17 Mar 2020 23:48:58 +0000 (UTC) From: NeilBrown To: Trond Myklebust , "Anna.Schumaker\@Netapp.com" Date: Wed, 18 Mar 2020 10:48:51 +1100 Cc: "linux-nfs\@vger.kernel.org" Subject: Re: [PATCH/RFC] NFS: Minimize COMMIT calls during writeback. In-Reply-To: <7a556bb02c9cd7ef219d156aff2a603b9c0fe22b.camel@hammerspace.com> References: <87y2s1rwof.fsf@notabene.neil.brown.name> <87v9n5rmrz.fsf@notabene.neil.brown.name> <87eetrs1p3.fsf@notabene.neil.brown.name> <7a556bb02c9cd7ef219d156aff2a603b9c0fe22b.camel@hammerspace.com> Message-ID: <875zf2sfto.fsf@notabene.neil.brown.name> MIME-Version: 1.0 Content-Type: multipart/signed; boundary="=-=-="; micalg=pgp-sha256; protocol="application/pgp-signature" Sender: linux-nfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org --=-=-= Content-Type: text/plain Content-Transfer-Encoding: quoted-printable On Tue, Mar 17 2020, Trond Myklebust wrote: > On Tue, 2020-03-17 at 21:41 +1100, NeilBrown wrote: >>=20 >> Fair enough. I don't usually think of writeback as due to memory >> pressure, but I can see that I filesystem wouldn't much care for the >> distinction. >>=20 >> Still there is no overlap imposed by the VM. The VM is sending a >> sensible streak of writepages requests - as dd dirties more pages, >> the >> VM asks NFS to write out more pages. All in a nice orderly fashion. > > The question essentially boils down to: "What is the purpose of the > background writeback?" My answer would be "To free up pages", something > that does not happen until COMMIT is sent. > > IOW: My concern is livelock, not serialisation. Right now, we do start > commit when a batch sent by the VM has been completely written (same as > we do when an O_DIRECT batch has been completely written). That ensures > that the pages can be freed. Your patch is deliberately designed to > defer that process that allows pages to be freed, and I'm not seeing > how it is bounded. In the current code the COMMIT for any given writepages() call only completes after: - all the WRITEs generated by that writeapages call completes, and then - a subsequent COMMIT completes. After my patch, the bound is a little more generous, but still a strong bound. The COMMIT will be complete after: - Any currently running COMMIT completes (there is usually only one), then - any writes that were submitted through writepages() before that COMMIT completed, completed, then - a subsequent COMMIT completes. So we wait extra time to allow for a COMMIT and some more WRITEs to complete. Once the pending COMMIT completes, no more work can be added to the set of things that need to be waited for, so it cannot live-lock. =20=20 >> >=20 >> > commit_info.rpcs_out is there in order to count the number of >> > outstanding COMMITs. I'm not seeing why we need to change that >> > definition, particularly given that there can be processes out >> > there >> > that are trying to wait for the count to go to zero. >> >=20 >> > The I/O completion structure already has its own reference counter, >> > and >> > I can see nothing stopping you from modifying >> > nfs_io_completion_commit() to do the same things you are trying to >> > do >> > in nfs_commit_end() here. >>=20 >> I don't see how that would work. >> There are two events that need to happen asynchronously when a >> counter >> hits zero. >> One is that a COMMIT needs to be sent when the number of outstanding >> writes from a particular set hits zero. That is the >> nfs_io_completion >> counter. >> The other is that the currently growing set of pending commits needs >> to >> be detached and allowed to drain so that another COMMIT will get >> sent. >> This is what I'm using rpcs_out for. It seems quite a natural >> extension >> of its current use. >>=20 > > You're not supposed to call kref_get() on something with a refcount of > zero. That means the nfs_io_completion counter defines the lifetime of > the object that your commit_info.ioc points to, and once that refcount > goes to zero, you should be removing it, which you are doing in > nfs_writepages(). Certainly kref_get() cannot be called on something with a refcount of zero - and if you have something that might have just gone to zero you can use kref_get_unless_zero() to ensure you don't do the wrong thing. I use kref_get_unless_zero() in nfs_writepages, but there is nowhere that might take a ref on a kref with refcount of zero. The io completion stored in commit_info.ioc has an extra refcount to reflect the reference from commit_info.ioc. The reference is removed and the reference count dropped using atomic ops (no locking) so nfs_writepages() might get the pointer and not be able to increment that refcount because it is already zero. RCU makes it safe to look at the refcount, and kref_get_unless_zero() makes it safe to try an increment it. > > commit_info.rpcs_out is non-zero for the lifetime of any COMMIT > operations. I'm not seeing how that relates to anything in the > preceding paragraph (other than that nfs_io_completion going to zero > will usually cause commit_info.rpcs_out to get incremented). > IOW: commit_info.rpcs_out goes to zero a long time after the object > pointed to by commit_info.ioc ought to be freed. Yes, I agree with this. I confirm that rpcs_out will only go to zero well after commit_info.ioc is freed. The purpose of my change to rpcs_out is to cause it to be non-zero *earlier*. As soon as there exists a detached io completion (which will eventually trigger a COMMIT), rpcs_out becomes non-zero. The count held by the io completion is transferred to the pending COMMIT and dropped when the COMMIT completes. Making it non-zero earlier allows nfs_writepages to detect if there is already a pending COMMIT *even if it hasn't been sent yet*. If there is no pending COMMIT (or detached io completion), then it must create a detached io completion so that a commit will follow. If there *is* a pending COMMIT or detached io completion, then it can leave its io completion attached because it can be sure that a COMMIT will complete in bounded time, and so the io completion that it has attached will be detached and processed. Thanks, NeilBrown --=-=-= Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEG8Yp69OQ2HB7X0l6Oeye3VZigbkFAl5xYeMACgkQOeye3VZi gbkg7BAAkrgcTxkC+4ZmUOTH4eP0fJRTN9PgrSRCLcEk/J7bw+uNEF2qk0+isao+ HbRTh0FoA1SgQUSOoqRIvOgCrCNM+rDYhLWOYB6BSGcYIsgGYfyEGHkVaIPXtawz NpDcGDhmLTMQpHtOjfX1ccTyOSTvsZPS3oGLFAEfZRYy8JoIpO4E6wd2vEJY46Ss X/x9j0WBCZq8LPl5VMe74AL4QpbyrajmOSG0KPxsZgkNHtjuw7RfdOVzV2HD3kbb w7pZV9QQlAhP0c9FLi23QVmcrygKX7u8sDkNe1Md+bvofmVxddJeHhBxJ36ZfmFA kTHvkCCbit1Ixfd4Ht50x5jMez59KCsmYiEBsMjZ77TcU1kmzITGEnJgUf9SYiQ0 o1EWytPHdxCnyZZdshgJ2SpjcJ/LPOHlk6xCCVLB6YMDMEbVtXDf7rQQzXUBBzTN 7v6QymMt46LjFDhq+D0utuVKyrsN7seXc8PwT4BvQfnpQFbhXRLjibsK28kzSsLw dbJoBj6wtvqROGLjDCl+qkUkqtCv0bVi5ur7i/D+Xyts0Yrtd6nZwG1e1Mz3Dj7D B8NUT3V6PUhNljTUCeTgAl76BQGJV3pLRVzScmskiRHvhFLiy/rVVCtFhh7u4g4S wiVXvIPN5xJ9iO1//48vUzy+xV1n4NJ0G4T+vVUInersHDPsJn4= =qLS9 -----END PGP SIGNATURE----- --=-=-=--