Received: by 2002:a25:31c3:0:0:0:0:0 with SMTP id x186csp1096741ybx; Fri, 1 Nov 2019 16:48:23 -0700 (PDT) X-Google-Smtp-Source: APXvYqwjAG2ulAt6xi3aPrdpa1NI66WJGaQpRLARMR0EhEv9ZGZQokSfXqWESN1es1wHvqWE0V6c X-Received: by 2002:a05:6402:88c:: with SMTP id e12mr15519410edy.170.1572652103109; Fri, 01 Nov 2019 16:48:23 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1572652103; cv=none; d=google.com; s=arc-20160816; b=LIAk8DRL8fstLy1RlslNzNq3nVmv4wIbjPscTgWbG09ABqcOdpgVizmMTBNcLrd7o3 wzI6bR9mZ64fsZ2M1xOlNFILkSD1EOJQEv8mOnc/yA4BNqTZvEobekpdMKZPsBZpAlkv 5xH8sm9fSNxtToofkRxARjQP/zCCaSZC6/LcdW4kN2lci8GjbWyBi1DpVXXD81DLsq1I xRev96eY9G8MysBlHL3cB05B2gTz9nbODM9ZU1DvsmmdFe60lbNV7Cw43Pdmljjql9j2 fwvZ0fKkT12xddX+ibZKJRtZ4fGGvtrwrwix7E7ipTLQXjBYqiD0WBjSdueUZ4lVGGac vbFg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date; bh=r+w/2mwN7X12LSBEQMjNVQ5EP0CL6coVSevfHiKJSK8=; b=pETmUJesztatiWfXncHpooGbIJTSGxAPwKvYbu7TlHe4FyLpzDQ8UpVbTFkv+bKkGb 3IJL9RNcIPXN+VmMRsFpRAjZKHky/fYhWbnvMHpPwwFy7MKRIHmLyJFiGlT6g/+YaeYW iHpcumBv0pN45pd5zFIbf2/pC0zaV/XyUwkAPmYD5RhJjGAiTvh/Rwr1mdmZopPLfb2O 7/wCAD0C/JAOGRLytI7pKdpovp2RXVkx7ckM0hvR9HWJcIkZyHgMIoCQLLaQPwfyVwlv IfQrHx4uYcUFhY8M2KO0VEN15YTmisaqsIWVHhwnIJYcwHWOT0HL0390VaBV6crj49OX IHIg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id i52si1724962ede.65.2019.11.01.16.47.58; Fri, 01 Nov 2019 16:48:23 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727025AbfKAXq0 (ORCPT + 99 others); Fri, 1 Nov 2019 19:46:26 -0400 Received: from zeniv.linux.org.uk ([195.92.253.2]:40806 "EHLO ZenIV.linux.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725989AbfKAXqZ (ORCPT ); Fri, 1 Nov 2019 19:46:25 -0400 Received: from viro by ZenIV.linux.org.uk with local (Exim 4.92.3 #3 (Red Hat Linux)) id 1iQgcY-0003GU-3x; Fri, 01 Nov 2019 23:46:22 +0000 Date: Fri, 1 Nov 2019 23:46:22 +0000 From: Al Viro To: Ritesh Harjani Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, wugyuan@cn.ibm.com, jlayton@kernel.org, hsiangkao@aol.com, Jan Kara , Linus Torvalds , "Paul E. McKenney" Subject: Re: [PATCH RESEND 1/1] vfs: Really check for inode ptr in lookup_fast Message-ID: <20191101234622.GM26530@ZenIV.linux.org.uk> References: <20190927044243.18856-1-riteshh@linux.ibm.com> <20191015040730.6A84742047@d06av24.portsmouth.uk.ibm.com> <20191022133855.B1B4752050@d06av21.portsmouth.uk.ibm.com> <20191022143736.GX26530@ZenIV.linux.org.uk> <20191022201131.GZ26530@ZenIV.linux.org.uk> <20191023110551.D04AE4C044@d06av22.portsmouth.uk.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20191023110551.D04AE4C044@d06av22.portsmouth.uk.ibm.com> User-Agent: Mutt/1.12.1 (2019-06-15) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Oct 23, 2019 at 04:35:50PM +0530, Ritesh Harjani wrote: > > > What we have guaranteed is > > > * ->d_lock serializes ->d_flags/->d_inode changes > > > * ->d_seq is bumped before/after such changes > > > * positive dentry never changes ->d_inode as long as you hold > > > a reference (negative dentry *can* become positive right under you) > > > > > > So there are 3 classes of valid users: those holding ->d_lock, those > > > sampling and rechecking ->d_seq and those relying upon having observed > > > the sucker they've pinned to be positive. > > :) Thanks for simplifying like this. Agreed. FWIW, after fixing several ceph bugs, add to that the following: * all places that turn a negative dentry into positive one are holding its parent exclusive or dentry has not been observable for anybody else. It had been present in the parent's list of children (negative and unhashed) and it might have been present in in-lookup hashtable. However, nobody is going to grab a reference to it from there without having grabbed ->d_lock on it and observed the state after it became positive. Which means that holding a reference to dentry *and* holding its parent at least shared stabilizes both ->d_inode and type bits in ->d_flags. The situation with barriers is more subtle - *IF* we had sufficient barriers to have ->d_inode/type bits seen right after having gotten the reference, we are fine. The only change possible after that point is negative->positive transition and that gets taken care of by barriers provided by ->i_rwsem. If we'd obtained that reference by d_lookup() or __d_lookup(), we are fine - ->d_lock gives a barrier. The same goes for places that grab references during a tree traversal, provided that they hold ->d_lock around that (fs/autofs/expire.c stuff). The same goes for having it found in inode's aliases list (->i_lock). I really hope that the same applies to accesses to file_dentry(file); on anything except alpha that would be pretty much automatic and on alpha we get the things along the lines of f = fdt[n] mb d = f->f_path.dentry i = d->d_inode assert(i != NULL) vs. see that d->d_inode is non-NULL f->f_path.dentry = d mb fdt[n] = f IOW, the barriers that make it safe to fetch the fields of struct file (rcu_dereference_raw() in __fcheck_files() vs. smp_store_release() in __fd_install() in the above) should *hopefully* take care of all stores visible by the time of do_dentry_open(). Sure, alpha cache coherency is insane, but AFAICS it's not _that_ insane. Question to folks familiar with alpha memory model: A = 0, B = NULL, C = NULL CPU1: A = 1 CPU2: r1 = A if (r1) { B = &A mb C = &B } CPU3: r2 = C; mb if (r2) { // &B r3 = *r2 // &A r4 = *r3 // 1 assert(r4 == 1) } is the above safe on alpha? [snip] > We may also need similar guarantees with __d_clear_type_and_inode(). Not really - pinned dentry can't go negative. In any case, with the audit I've done so far, I don't believe that blanket solutions like that are good idea - most of the places doing checks are safe as it is. The surface that needs to be taken care of is fairly small, actually; most of that is in fs/namei.c and fs/dcache.c.