Received: by 2002:a05:6a10:206:0:0:0:0 with SMTP id 6csp1758501pxj; Wed, 19 May 2021 13:14:13 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyHJ/UeSZV2ZBwyztQxMUtl+rfww1oAX5b8cy9sUC5whKBsTEuoVhxoAkt2DgydYPGEOvUF X-Received: by 2002:a92:cd85:: with SMTP id r5mr918887ilb.294.1621455253075; Wed, 19 May 2021 13:14:13 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1621455253; cv=none; d=google.com; s=arc-20160816; b=bjlR5dmvu2utBksAC2cfqyP+NnTFDPbLbgi1+SI7cWmFCbRNbALtLPdjozRC9uf0vF ZE7lCgVHpxT/mJVtrwiIMRUU1ofbOUqKSWgV/qymrhSkVyYArEhGZeDYLd5E4zbTnIkF tDReqYAdof5liM/mCjK0ocSbs00/giBFUQ8i07hzTYV2YrsVtXR/LmzQwZPPCMbWxGUi Rt6JUf/1zn5Tyx6NBam67RDUj5xgoktmhcYKWIQd3wIJB2pukK/kQlZjU1V4coCdiaXK nj0kI0K6h2GvcNmCOsIj2saThhACtnL4Qtlbs9RGG55thz6M/EoIOaGUbx0UEQbU4eDm WW5g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:in-reply-to:content-disposition :mime-version:references:message-id:subject:cc:to:from:date; bh=tLZfHl6zgAvmPyA2dojvBolHUz9pm8BjlKD1XCa5uTw=; b=TFAJg6Lc5T6ys+8Akqwn78lHUjQhgdXA7OUpxg87/BDVEF0f0SfqV3pVBxRUoNVPQO 5ZXtrbqnJ+UHdfOvoX/9HEMjkfNlsoEbVp6GE1Q6ZCo8mBkDZEHfVurgQiXSe1pV5XdT N4yhKNE839a2+6AFXJtMe2cTE9rDAUwSZ2JerjK2mzO5RIeVu/Ywx7xs863XyYhkK8yX jgnsSSL15m0WqJjIoSI9d7ho41snmMVKi+Gonf4JU/U8S9ctJ7O7gdtkuju36UUKp/cM 1pAv7QvcucqUnB/xZtEkn+qXCd/hUJbiu16/LLNcEkwJfOuG2TsYfRD1FowScv0yIjfE etaw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id a16si588450ilr.160.2021.05.19.13.14.00; Wed, 19 May 2021 13:14:13 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1354632AbhESP5k (ORCPT + 99 others); Wed, 19 May 2021 11:57:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33422 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239586AbhESP5k (ORCPT ); Wed, 19 May 2021 11:57:40 -0400 Received: from zeniv-ca.linux.org.uk (zeniv-ca.linux.org.uk [IPv6:2607:5300:60:148a::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 30C59C06175F; Wed, 19 May 2021 08:56:20 -0700 (PDT) Received: from viro by zeniv-ca.linux.org.uk with local (Exim 4.94 #2 (Red Hat Linux)) id 1ljOXW-00GH8f-H0; Wed, 19 May 2021 15:55:18 +0000 Date: Wed, 19 May 2021 15:55:18 +0000 From: Al Viro To: Andy Shevchenko Cc: Linus Torvalds , Jia He , Petr Mladek , Steven Rostedt , Sergey Senozhatsky , Rasmus Villemoes , Jonathan Corbet , Heiko Carstens , Vasily Gorbik , Christian Borntraeger , "Eric W . Biederman" , "Darrick J. Wong" , "Peter Zijlstra (Intel)" , Ira Weiny , Eric Biggers , "Ahmed S. Darwish" , "open list:DOCUMENTATION" , Linux Kernel Mailing List , linux-s390 , linux-fsdevel Subject: Re: [PATCH 12/14] d_path: prepend_path(): lift the inner loop into a new helper Message-ID: References: <20210519004901.3829541-1-viro@zeniv.linux.org.uk> <20210519004901.3829541-12-viro@zeniv.linux.org.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: Al Viro Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, May 19, 2021 at 11:07:04AM +0300, Andy Shevchenko wrote: > On Wed, May 19, 2021 at 12:48:59AM +0000, Al Viro wrote: > > ... and leave the rename_lock/mount_lock handling in prepend_path() > > itself > > ... > > > + if (!IS_ERR_OR_NULL(mnt_ns) && !is_anon_ns(mnt_ns)) > > + return 1; // absolute root > > + else > > + return 2; // detached or not attached yet > > Would it be slightly better to read > > if (IS_ERR_OR_NULL(mnt_ns) || is_anon_ns(mnt_ns)) > return 2; // detached or not attached yet > else > return 1; // absolute root > > ? > > Oh, I have noticed that it's in the original piece of code (perhaps separate > change if we ever need it?). The real readability problem here is not the negations. There are 4 possible states for vfsmount encoded via ->mnt_ns: 1) not attached to any tree, kept alive by refcount alone. ->mnt_ns == NULL. 2) long-term unattached. Not a part of any mount tree, but we have a known holder for it and until that's gone (making ->mnt_ns NULL), refcount is guaranteed to remain positive. pipe_mnt is an example of such. ->mnt_ns == MNT_NS_INTERNAL, which is encoded as ERR_PTR(-1), thus the use of IS_ERR_OR_NULL here (something I'd normally taken out and shot - use of that primitive is a sign of lousy API or of a cargo-culted "defensive programming"). 3) part of a temporary mount tree; not in anyone's namespace. ->mnt_ns points the tree in question, ->mnt_ns->seq == 0. 4) belongs to someone's namespace. ->mnt_ns points to that, ->mnt_ns->seq != 0. That's what we are looking for here. It's kludges all the way down ;-/ Note that temporary tree can't become a normal one or vice versa - mounts can get transferred to normal namespace, but they will see ->mnt_ns reassigned to that. IOW, ->mnt_ns->seq can't get changed without a change to ->mnt_ns. I suspect that the right way to handle that would be to have that state stored as explicit flags. All mounts are created (and destroyed) in state (1); state changes: commit_tree() - (1) or (3) to (3) or (4) umount_tree() - (3) or (4) to (1) clone_private_mount() - (1) to (2) open_detached_copy() - (1) to (3) copy_mnt_ns() - (1) to (4) mount_subtree() - (1) to (3) fsmount() - (1) to (3) init_mount_tree() - (1) to (4) kern_mount() - (1) to (2) kern_unmount{,_array}() - (2) to (1) commit_tree() has a pathological call chain that has it attach stuff to temporary tree; that's basically automount by lookup in temporary namespace. It can distinguish it from the usual (adding to normal namespace) by looking at the state of mountpoint we are attaching to - or simply describe all cases as "(1) or (3) to whatever state the mountpoint is". One really hot path where we check (1) vs. (2,3,4) is mntput_no_expire(), which is the initial reason behind the current representation. However, read from ->mnt_flags is just as cheap as that from ->mnt_ns and the same reasons that make READ_ONCE() legitimate there would apply to ->mnt_flags as well. We can't reuse MNT_INTERNAL for that, more's the pity - it's used to mark the mounts (kern_mount()-created, mostly) that need to destroyed synchronously on the final mntput(), with no task_work_add() allowed (think of module_init() failing halfway through, with kern_unmount() done to destroy the internal mounts already created; we *really* don't want to delay that filesystem shutdown until insmod(2) heads out to userland). Another headache is in LSM shite, as usual... Anyway, sorting that out is definitely a separate story.