Subject: Re: [PATCH 01/33] vfs: syscall: Add open_tree(2) to reference or clone a mount [ver #11]
To: David Howells, viro@zeniv.linux.org.uk
Cc: linux-api@vger.kernel.org, torvalds@linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
From: Alan Jenkins
Date: Thu, 2 Aug 2018 18:31:06 +0100
Message-ID: <7a292a44-7e17-572b-2d1e-0085ef400010@gmail.com>
In-Reply-To: <153313705165.13253.4602180607294286849.stgit@warthog.procyon.org.uk>
References: <153313703562.13253.5766498657900728120.stgit@warthog.procyon.org.uk> <153313705165.13253.4602180607294286849.stgit@warthog.procyon.org.uk>

Hi

I found this interesting, though I don't entirely follow the kernel mount/unmount code.
I had one puzzle about the code, and two questions which I was largely able to answer.

On 01/08/18 16:24, David Howells wrote:
> +void dissolve_on_fput(struct vfsmount *mnt)
> +{
> +	namespace_lock();
> +	lock_mount_hash();
> +	mntget(mnt);
> +	umount_tree(real_mount(mnt), UMOUNT_SYNC);
> +	unlock_mount_hash();
> +	namespace_unlock();
> +}

Can I ask why UMOUNT_SYNC is used here? I feel like I must have missed something, but doesn't it skip the IS_MNT_LOCKED() check in disconnect_mount()?

UMOUNT_SYNC seems used for non-lazy unmounts, and in internal cleanups where userspace wouldn't be able to see. But I think userspace can keep watching in this case, e.g. by `fd2 = openat(fd, ".", O_PATH)` (or `fd2 = open_tree(fd, ".", 0)`?). I would think this function should avoid using UMOUNT_SYNC, like lazy unmount avoids it.

> From: Al Viro
>
> open_tree(dfd, pathname, flags)
>
> Returns an O_PATH-opened file descriptor or an error.
> dfd and pathname specify the location to open, in usual
> fashion (see e.g. fstatat(2)). flags should be an OR of
> some of the following:
> 	* AT_PATH_EMPTY, AT_NO_AUTOMOUNT, AT_SYMLINK_NOFOLLOW -
> same meanings as usual
> 	* OPEN_TREE_CLOEXEC - make the resulting descriptor
> close-on-exec
> 	* OPEN_TREE_CLONE or OPEN_TREE_CLONE | AT_RECURSIVE -
> instead of opening the location in question, create a detached
> mount tree matching the subtree rooted at location specified by
> dfd/pathname. With AT_RECURSIVE the entire subtree is cloned,
> without it - only the part within in the mount containing the
> location in question. In other words, the same as mount --rbind
> or mount --bind would've taken.

One of the limitations documented for `mount --bind`, is that `mount -o bind,ro` is not atomic. There's a workaround if you need it, it's just a bit clunky. I wondered if it was possible to improve `mount` by changing the mount flags between OPEN_TREE_CLONE and move_mount().

    fd = open_tree(..., OPEN_TREE_CLONE);
    fchdir(fd);
    mount(NULL, ".", NULL, MS_REMOUNT | MS_BIND | newbindflags, NULL);
    move_mount(fd, ...);

Another closely-related limitation of `mount`, is that it can't atomically set the propagation type at mount time.

My conclusion was the above doesn't quite work yet. do_remount() still uses check_mnt(), so it doesn't accept detached mounts. I wonder if it can be changed in future.

> The detached tree will be
> dissolved on the final close of obtained file.

My last question turned out very dull, feel free to ignore.

It seems the only way to use MNT_FORCE[1], is to first attach the filesystem somewhere (and close the file descriptor). I wondered if there was a way to make things more regular. close_and_umount() feels too obscure to live though...

[1] "Ask the filesystem to abort pending requests before attempting the unmount. This may allow the unmount to complete without waiting for an inaccessible server. If, after aborting requests, some processes still have active references to the filesystem, the unmount will still fail."

...and I suppose it's much less useful than I thought. The point of MNT_FORCE is to kick out processes that were blocked _trying to access a file by name_, e.g. open() or stat(). But if we're considering a detached mount, then it's impossible to access it by name alone. You need an fd (or cwd or root), which would stop the filesystem being unmounted anyway.

close_and_umount(fd, MNT_FORCE) is pointless unless your process has other threads accessing the filesystem through the same fd, but that's a really bad idea anyway.

It could prevent someone else getting stuck indefinitely on /proc/$PID/fd, but that's still very obscure.

Regards
Alan