From: ebiederm@xmission.com (Eric W. Biederman)
To: "Michael Kerrisk (man-pages)"
Cc: Philipp Wendler, "Serge E. Hallyn", Christian Brauner, Aleksa Sarai,
 Reid Priedhorsky, Andy Lutomirski, Yang Bo, Jakub Wilk, Joseph Sible,
 Al Viro, werner@almesberger.net, linux-man, lkml, Containers,
 Stéphane Graber
References: <620c691a-065e-b894-4f06-7453012bc8d3@gmail.com>
Date: Wed, 09 Oct 2019 03:46:02 -0500
In-Reply-To: (Michael Kerrisk's message of "Wed, 9 Oct 2019 09:41:34 +0200")
Message-ID: <87y2xu71dh.fsf@x220.int.ebiederm.org>
Subject: Re: For review: rewritten pivot_root(2) manual page
X-Mailing-List: linux-kernel@vger.kernel.org

"Michael Kerrisk (man-pages)" writes:

> Hello Philipp,
>
> My apologies that it has taken a while to reply. (I had been hoping
> and waiting that a few more people might weigh in on this thread.)
>
> On 9/23/19 3:42 PM, Philipp Wendler wrote:
>> Hello Michael,
>>
>> On 23.09.19 at 14:04, Michael Kerrisk (man-pages) wrote:
>>
>>> I'm considering to rewrite these pieces to exactly
>>> describe what the system call does (which I already
>>> do in the third paragraph) and remove the "may or may not"
>>> pieces in the second paragraph. I'd welcome comments
>>> on making that change.
>>
>> I think that it would make the man page significantly easier to
>> understand if the vague wording and the meta discussion about it are
>> removed.
>
> It is my inclination to make this change, but I'd love to get more
> feedback on this point.
>
>>> DESCRIPTION
>> [...]
>>> pivot_root() changes the
>>> root directory and the current working directory of each process
>>> or thread in the same mount namespace to new_root if they point to
>>> the old root directory. (See also NOTES.) On the other hand,
>>> pivot_root() does not change the caller's current working
>>> directory (unless it is on the old root directory), and thus it
>>> should be followed by a chdir("/") call.
>>
>> There is a contradiction here with the NOTES (cf. below).
>
> See below.
>
>>> The following restrictions apply:
>>>
>>> - new_root and put_old must be directories.
>>>
>>> - new_root and put_old must not be on the same filesystem as the
>>>   current root. In particular, new_root can't be "/" (but can be
>>>   a bind mounted directory on the current root filesystem).
>>
>> Wouldn't "must not be on the same mountpoint" or something similar be
>> more clear, at least for new_root? The note in parentheses indicates
>> that new_root can actually be on the same filesystem as the current
>> root. However, ...
>
> For 'put_old', it really is "filesystem".

If we are going to be pedantic, "filesystem" is really the wrong concept
here. The section about bind mounts clarifies it, but I wonder if there
is a better term.

I think I would say:

    "new_root and put_old must not be on the same mount as the
    current root."

I think using "mount" instead of "filesystem" keeps the concepts less
confusing.

As I read through this email and see text that is trying to be precise
and clear, hitting the term "filesystem" is a bit jarring. pivot_root
doesn't care a thing about filesystems. pivot_root only cares about
mounts.
And by a "mount" I mean the thing that you get when you create a bind
mount or you call mount normally.

Michael, do you have man pages for the new mount API yet?

> For 'new_root', see below.
>
>>> - put_old must be at or underneath new_root; that is, adding a
>>>   nonnegative number of /.. to the string pointed to by put_old
>>>   must yield the same directory as new_root.
>>>
>>> - new_root must be a mount point. (If it is not otherwise a
>>>   mount point, it suffices to bind mount new_root on top of
>>>   itself.)
>>
>> ... this item actually makes the above item almost redundant regarding
>> new_root (except for the "/" case). So one could replace this item with
>> something like this:
>>
>> - new_root must be a mount point different from "/". (If it is not
>>   otherwise a mount point, it suffices to bind mount new_root on top
>>   of itself.)
>>
>> The above item would then only mention put_old (and maybe use clarified
>> wording on whether actually a different file system is necessary for
>> put_old or whether a different mount point is enough).
>
> Thanks. That's a good suggestion. I simplified the earlier bullet
> point as you suggested, and changed the text here to say:
>
> - new_root must be a mount point, but can't be "/". If it is not
>   otherwise a mount point, it suffices to bind mount new_root on
>   top of itself. (new_root can be a bind mounted directory on
>   the current root filesystem.)

How about:

- new_root must be the path to a mount, but can't be "/". Any path
  that is not already a mount can be converted into one by bind
  mounting the path onto itself.

>>> NOTES
>> [...]
>>> pivot_root() allows the caller to switch to a new root filesystem
>>> while at the same time placing the old root mount at a location
>>> under new_root from where it can subsequently be unmounted.
>>> (The fact that it moves all processes that have a root directory or
>>> current working directory on the old root filesystem to the new
>>> root filesystem frees the old root filesystem of users, allowing
>>> it to be unmounted more easily.)
>>
>> Here is the contradiction:
>> The DESCRIPTION says that root and current working dir are only changed
>> "if they point to the old root directory". Here in the NOTES it says
>> that any root or working directories on the old root file system (i.e.,
>> even if somewhere below the root) are changed.
>>
>> Which is correct?
>
> The first text is correct. I must have accidentally inserted
> "filesystem" into the paragraph just here during a global edit.
> Thanks for catching that.
>
>> If it indeed affects all processes with root and/or current working
>> directory below the old root, the text here does not clearly state what
>> the new root/current working directory of these processes is.
>> E.g., if a process is at /foo and we pivot to /bar, will the process be
>> moved to /bar (i.e., at / after pivot_root), or will the kernel attempt
>> to move it to some location like /bar/foo? Because the latter might not
>> even exist, I suspect that everything is just moved to new_root, but
>> this could be stated explicitly by replacing "to the new root
>> filesystem" in the above paragraph with "to the new root directory"
>> (after checking whether this is true).
>
> The text here now reads:
>
>    pivot_root() allows the caller to switch to a new root filesystem
>    while at the same time placing the old root mount at a location
>    under new_root from where it can subsequently be unmounted. (The
>    fact that it moves all processes that have a root directory or
>    current working directory on the old root directory to the new
>    root frees the old root directory of users, allowing the old root
>    filesystem to be unmounted more easily.)

Please use "mount" instead of "filesystem".
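
To make the mount-versus-filesystem distinction a little more concrete,
here is a minimal sketch in C that asks whether a path is the root of a
mount by looking it up in /proc/self/mountinfo. A directory that has been
bind mounted onto itself appears there even though it still lives on the
same filesystem as its parent. The helper name is_mount_point() is
hypothetical, the sketch assumes /proc is mounted, and it does not decode
the octal escapes (such as "\040" for a space) that mountinfo uses in
mount point names.

```c
#define _GNU_SOURCE
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if `path` is the root of a mount in the
 * caller's mount namespace, 0 if it is not, and -1 on error.
 * Assumes /proc is mounted; escaped mount point names are not decoded.
 */
int is_mount_point(const char *path)
{
    char resolved[PATH_MAX];
    char *line = NULL;
    size_t cap = 0;
    int found = 0;

    /* Canonicalize so the comparison is against an absolute path. */
    if (realpath(path, resolved) == NULL)
        return -1;

    FILE *f = fopen("/proc/self/mountinfo", "r");
    if (f == NULL)
        return -1;

    while (getline(&line, &cap, f) != -1) {
        int start = -1, end = -1;

        /* Fields: mount ID, parent ID, major:minor, root, mount point, ...
           %n records where the fifth field begins and ends. */
        if (sscanf(line, "%*d %*d %*s %*s %n%*s%n", &start, &end) == EOF)
            continue;
        if (start < 0 || end <= start)
            continue;

        size_t len = (size_t)(end - start);
        if (strlen(resolved) == len &&
            strncmp(line + start, resolved, len) == 0) {
            found = 1;
            break;
        }
    }

    free(line);
    fclose(f);
    return found;
}
```

Calling is_mount_point() on a plain subdirectory before and after bind
mounting it onto itself would be expected to return 0 and then 1, while
stat(2) reports the same st_dev in both cases.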
>>> EXAMPLE
>>> The program below demonstrates the use of pivot_root() inside a
>>> mount namespace that is created using clone(2). After pivoting to
>>> the root directory named in the program's first command-line
>>> argument, the child created by clone(2) then executes the program
>>> named in the remaining command-line arguments.
>>
>> Why not use the pivot_root(".", ".") in the example program?
>> It would make the example shorter, and also works if the process cannot
>> write to new_root (e.g., in a user namespace).
>
> I'm not sure. Some people have a bit of trouble to wrap their head
> around the pivot_root(".", ".") idea. (I possibly am one of them.)
> I'd be quite keen to hear other opinions on this. Unfortunately,
> few people have commented on this manual page rewrite.

I am happy as long as pivot_root(".", ".") is documented somewhere.
There is real code that uses it, so it is not going away.

Plus pivot_root(".", ".") is really what is desired in a lot of
situations where the caller of pivot_root is an intermediary and does
not control the new root filesystem. At that point the only path you
can be guaranteed to exist on the new root filesystem is "/".

Eric
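
For anyone trying to picture the pivot_root(".", ".") idiom discussed in
this thread, the following is a minimal sketch in C, not a definitive
implementation. It bind mounts the new root onto itself so that it is a
mount, changes into it, pivots with both arguments set to ".", and then
detaches the old root, which at that point is stacked on top of the new
root at "/". The names do_pivot_root() and enter_new_root() are
hypothetical. The sketch assumes the caller has CAP_SYS_ADMIN in a mount
namespace of its own and has already made mount propagation private
(for example with mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL)).

```c
#define _GNU_SOURCE
#include <sys/mount.h>
#include <sys/syscall.h>
#include <unistd.h>

/* glibc provides no wrapper for pivot_root(2), so go through syscall(2). */
static int do_pivot_root(const char *new_root, const char *put_old)
{
    return (int)syscall(SYS_pivot_root, new_root, put_old);
}

/*
 * Hypothetical helper: make `new_root` the root mount of the caller's
 * mount namespace and detach the old root.
 * Returns 0 on success, -1 on error with errno set.
 * Assumes a privileged caller in its own mount namespace with private
 * mount propagation.
 */
int enter_new_root(const char *new_root)
{
    /* Ensure new_root is a mount by bind mounting it onto itself. */
    if (mount(new_root, new_root, NULL, MS_BIND, NULL) == -1)
        return -1;

    /* Enter the new mount so that "." refers to it. */
    if (chdir(new_root) == -1)
        return -1;

    /* No directory inside new_root is needed for put_old: the old root
       ends up stacked on top of the new root at "/". */
    if (do_pivot_root(".", ".") == -1)
        return -1;

    /* Resolving "." now reaches the old root mount on top of the stack;
       detach it lazily. */
    if (umount2(".", MNT_DETACH) == -1)
        return -1;

    return chdir("/");
}
```

Because put_old is "." rather than a directory created under new_root,
this works even when the caller cannot write to the new root, and "/" is
the only path it relies on.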