From: Mi Jinlong Subject: Re: [PATCH] VFS: Unlink should revoke all outstanding leases on file Date: Tue, 25 May 2010 18:14:56 +0800 Message-ID: <4BFBA320.8090102@cn.fujitsu.com> References: <4BED195F.3070504@cn.fujitsu.com> <20100514055844.109d2fdc@tlielax.poochiereds.net> <1273857471.4732.7.camel@localhost.localdomain> <20100514133819.5e383485@tlielax.poochiereds.net> <1273859968.4732.22.camel@localhost.localdomain> <1273861872.4732.34.camel@localhost.localdomain> <20100514192327.GA20192@fieldses.org> <4BF3B36F.80209@cn.fujitsu.com> <20100519155700.GE4581@fieldses.org> <4BF504DE.7010804@cn.fujitsu.com> <20100521210738.GK11675@fieldses.org> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Cc: Trond Myklebust , Jeff Layton , NFSv3 list , linux-fsdevel@vger.kernel.org, ebiederm@xmission.com, adobriyan@gmail.com, viro@ZenIV.linux.org.uk, jamie@shareable.org To: "J. Bruce Fields" Return-path: Received: from cn.fujitsu.com ([222.73.24.84]:50810 "EHLO song.cn.fujitsu.com" rhost-flags-OK-FAIL-OK-OK) by vger.kernel.org with ESMTP id S1752330Ab0EYKOx (ORCPT ); Tue, 25 May 2010 06:14:53 -0400 In-Reply-To: <20100521210738.GK11675@fieldses.org> Sender: linux-nfs-owner@vger.kernel.org List-ID: J. Bruce Fields : > On Thu, May 20, 2010 at 05:46:06PM +0800, Mi Jinlong wrote: >> J. Bruce Fields : >>> I don't know of any existing lock that does exactly what we want. >>> >>> Somebody at citi worked on a better lease implementation for a while, >>> but I don't think we ever really got it right; the last version I can >>> find is here: >>> >>> git://linux-nfs.org/~bfields/linux-topics.git leases >> When reading the code of the git, i found a patch which try to fix >> the lease's problem, but only for unlink. > > It's 8 patches together: > >> commit id: d5a08e556116c66fb60a448f805a40bf54314634 >> msg: "leases: break file leases on unlink." >> >> In this patch, it seems only add break_lease() and some other functions >> but seems don't avoid the problem of race. > > Look again: break_lease() is there, but there's also a break_lease_end() > after the unlink. Thanks. That's important. > >> Or there is some different >> at break_lease() with the community's kernel. >> >> Can you give me some message about the new lease? Thanks. > > So the 8 patches at that branch are: > leases: introduce per-inode lease enabling/disabling > VFS: clean up extra dereferences in do_unlinkat() > leases: break file leases on unlink. > leases: break file leases on rename. > leases: break leases on chown. > VFS: refactor sys_fchmod and sys_fchmodat > leases: break leases on chmod. > leases: break leases on link. > > Like I say, I don't think they're correct or I'd just copy them all to > the list. But maybe the comment on the first patch (appended) is > useful. As reading the first patch ("leases: introduce ... "), it's really useful. But, as Jamie Lokier said "new lease semantics should use new userspace API.", is there some new lease under development ? And, IMO, this problem we discussing is serious that should be fix ASAP. Can we fix this problem refer to this solutions? thanks, Mi Jinlong > > --b. > > leases: introduce per-inode lease enabling/disabling > > The current file lease implementation is inadequate (for the purposes of > nfs, and, we believe, for the purposes of Samba), in at least two ways: > > - Leases are broken only conflicting opens; but both nfsv4 > delegations and (we're told) Windows op locks actually require > that leases be broken on any operation that changes file > metadata--including unlink, link, rename, chmod, and chown. > > - The internal kernel api used for lease-breaking is inherently > racy, consisting as it does of a single break_lease() call. > (Consider this scenario: a file is not currently open and is > about to be unlinked. During unlink processing, a lookup is > done, and break_lease() is called. After the break_lease(), > but before the unlink completes, another user opens the file > and gets a read lease. The unlink then completes, but the > other user thinks their read lease is still valid. This > situation would be avoided if lease-granting for the inode > were disabled for the duration of the unlink.) > > We're primarily interested in the case of read leases for now. (Write > leases, which also must be broken on *access* to a file, are more > difficult to get completely right, and aren't used by the current nfs > server.) > > Fixing the second problem requires replacing break_lease() by a pair of > calls, here called break_lease() and break_lease_end(), between which > new leases are temporarily prohibited. > > We want to implement that temporary prohibition in a simple way that has > low impact in common (uncontended) cases. > > This patch adds a field, i_leasecount, which provides mutual exclusion > between inode-modifying operations and read leases in the same way the > i_writecount provides mutual exclusion between write opens and execs: > when i_leasecount is positive, it counts the number of leases on the > given inode, and when it's negative it counts the number of operations > which want leases temporarily disabled. This allows selective > enabling/disabling of leases on a per-inode basis. > > To that end, the functions leases_get_access() and leases_put_access() > are used when a lease is granted and returned, respectively. The > functions leases_deny_access() and leases_allow_access() are used to > prevent races between breaking-with-FMODE_WRITE and write-lease-granting > for the entire duration of a file operation. Currently, leases are > broken only when a file is opened or truncated; these functions will > allow leases to be broken on things like unlink and rename as well. > NFSv4 implements delegations using leases, and needs its delegations to > be revoked on unlinks, renames, chowns, etc. > > Note that this patch changes break_lease() and __break_lease(), such > that when they are called with FMODE_WRITE and return successfully, they > will leave leases disabled on the inode in question, and the caller must > eventually call break_lease_end() to re-enable leasing. As alluded to > in the scenario above, this behavior isn't necessary when breaking > without FMODE_WRITE: existing and new read-leases wouldn't need to be > revoked or blocked; and a write-lease-granting setlease won't race the > break_lease() because the latter is presumed to have been preceded by > something like a dget() on the dentry in question (where d_count or > i_count > 1 blocks write-lease-granting). > > This patch also closes a small existing open/lease race: a lease-related > race exists between the time that outstanding leases are broken (by > may_open()) and the time that, e.g., O_RDWR or O_WRONLY are reflected in > the inode's i_writecount variable (which will prevent subsequent > lease-granting setlease calls). Conceivably, a read lease could be > granted in the interim. > > To deal with this, may_open() is modified so that, on success and when > called with FMODE_WRITE, it will return with lease-granting disabled for > the inode in question. do_filp_open() is modified so that leasing is > re-enabled once everything is finished. Analogous changes are made on > truncation.