Received: by 2002:a25:b794:0:0:0:0:0 with SMTP id n20csp2774254ybh; Mon, 5 Aug 2019 06:38:15 -0700 (PDT) X-Google-Smtp-Source: APXvYqxdT5DX8BhwZ6dW+GrJvsN1nCGjFrmI0S/hT9fhEh8IpSbbH/ZoxOMnAzLuPEc56dObTzOX X-Received: by 2002:a17:90a:2648:: with SMTP id l66mr17983305pje.65.1565012295126; Mon, 05 Aug 2019 06:38:15 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1565012295; cv=none; d=google.com; s=arc-20160816; b=R/PboNmiYoJyRPgp5mwD3G1QcdHiMV3vOnfrDdwTlMZ8TlyP8ShMLmJ+Pw6TqM9iWq SoLVX7QKL4+tC6yXPzxI4bbeSKAeGAas//kkxB9QL7uvDd3pcPtZFWTRWU3OxaxslhQ+ RxQBEm/ftnIXlFBO6ZEp6tz94xFPi0uvZjMpYf5+zJs7vfqJMxHrVhBGcd0XI3pbOyPi gFGGrLG0NO2YbvPpztXvKbiedTTSzcWbdcxaBBRxMdrZo2ZAxISxWVaOw7acsGqU6y6F NA5OftRmUAHq5T5zNrmWeQKZ8y9U803wexgxnh4mrWr2HKvATVI3D1R8x3s1sGGAbeZJ cTSg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date; bh=VYVWXMu2L/JGrkVrdd80TgaLBKehgqbyVuq4vZmLNr4=; b=XuTJNIDJ3dM2tS/kbPptlgJ10iwcI+AnIui7vZGFbAOPQRc/ZkdAzienQhX9Aqdbbr ejhBNiQHYyq2yvgTiPHRsGGVOPV8ZyJKOqgsdpOd9ORf+s2kI4b+WZ/Jxaq3X3bSTq0o Ajz10MKxGPc+lIc3I6b5bGiYAqidl6Bt9eHNjRgR9DYyG/0to/pgaHOW6eW4Fbo9+3QA ktDuOZwrrtjaMcJgXdN9PyVurL8Rkt6OUQ48hPX2D2TjGxjEgfb1dBM1/LzsC0p1dXEj PsB1E1KS8E0RrDcI7yOMRUHQTD9ZnFBeVH4g4RAgCyGWv97y/SGk9sS2cFNejW5Oa8fZ 4yAg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id h27si38090769pgh.388.2019.08.05.06.37.59; Mon, 05 Aug 2019 06:38:15 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728997AbfHENhY (ORCPT + 99 others); Mon, 5 Aug 2019 09:37:24 -0400 Received: from mx2.suse.de ([195.135.220.15]:51352 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726834AbfHENhY (ORCPT ); Mon, 5 Aug 2019 09:37:24 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 5495EAD18; Mon, 5 Aug 2019 13:37:22 +0000 (UTC) Date: Mon, 5 Aug 2019 23:37:44 +1000 From: Aleksa Sarai To: "Michael Kerrisk (man-pages)" Cc: "Serge E. Hallyn" , linux-man , Containers , lkml , Andy Lutomirski , Jordan Ogas , Al Viro , werner@almesberger.net, Aleksa Sarai Subject: Re: pivot_root(".", ".") and the fchdir() dance Message-ID: <20190805133744.7sm6wx2rspwiuxu5@mikami> References: <20190805103630.tu4kytsbi5evfrhi@mikami> <3a96c631-6595-b75e-f6a7-db703bf89bcf@gmail.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="np2ugc7obo2qaz74" Content-Disposition: inline In-Reply-To: <3a96c631-6595-b75e-f6a7-db703bf89bcf@gmail.com> User-Agent: NeoMutt/20180716 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org --np2ugc7obo2qaz74 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On 2019-08-05, Michael Kerrisk (man-pages) wrote: > On 8/5/19 12:36 PM, Aleksa Sarai wrote: > > On 2019-08-01, Michael Kerrisk (man-pages) wro= te: > >> I'd like to add some documentation about the pivot_root(".", ".") > >> idea, but I have a doubt/question. In the lxc_pivot_root() code we > >> have these steps > >> > >> oldroot =3D open("/", O_DIRECTORY | O_RDONLY | O_CLOEXEC); > >> newroot =3D open(rootfs, O_DIRECTORY | O_RDONLY | O_CLOEXEC); > >> > >> fchdir(newroot); > >> pivot_root(".", "."); > >> > >> fchdir(oldroot); // **** > >=20 > > This one is "required" because (as the pivot_root(2) man page states), > > it's technically not guaranteed by the kernel that the process's cwd > > will be the same after pivot_root(2): > >=20 > >> pivot_root() may or may not change the current root and the current > >> working directory of any processes or threads which use the old root > >> directory. > >=20 > > Now, if it turns out that we can rely on the current behaviour (and the > > man page you're improving is actually inaccurate on this point) then > > you're right that this fchdir(2) isn't required. >=20 > I'm not sure that I follow your logic here. In the old manual page > text that you quote above, it says that [pivot_root() may change the > CWD of any processes that use the old root directory]. Two points=20 > there: >=20 > (1) the first fchdir() has *already* changed the CWD of the calling > process to the new root directory, and Right, I (and presumably the LXC folks as well) must've missed the qualifier on the end of the sentence and was thinking that it said "you can't trust CWD after pivot_root(2)". My follow-up was going to be that we need to be in the old root to umount, but as you mentioned that shouldn't be necessary either since the umount will apply to the stacked mount (which is somewhat counter-intuitively the earlier mount not the later one -- I will freely admit that I don't understand all of the stacked and tucked mount logic in VFS). > (2) the manual page implied but did not explicitly say that the > CWD of processes using the old root may be changed *to the new root > directory* (rather than changed to some arbitrary location!); > presumably, omitting to mention that detail explicitly in the manual > page was an oversight, since that is indeed the kernel's behavior. >=20 > The point is, the manual page was written 19 years ago and has > barely been changed since then. Even at the time that the system > call was officially released (in Linux 2.4.0), the manual page was > already inaccurate in a number of details, since it was written=20 > about a year beforehand (during the 2.3 series), and the=20 > implementation already changed by the time of 2.4.0, but the manual > page was not changed then (or since, but I'm working on that). >=20 > The behavior has in practice always been (modulo the introduction > of mount namespaces in 2001/2002): >=20 > pivot_root() changes the root directory and the current > working directory of each process or thread in the same > mount namespace to new_root if they point to the old root > directory. >=20 > Given that this has been the behavior since Linux 2.4.0 was > released, it improbable that this will ever change, since, > notwithstanding what the manual page says, this would be an ABI > breakage. >=20 > I hypothesize that the original manual page text, written before > the system call was even officially released, reflects Werner's > belief at the time that perhaps in the not too distant future > the implementation might change. But, 18 years on from 2.4.0, > it has not. >=20 > And arguably, the manual page should reflect that reality, > describing what the kernel actually does, rather than speculating > that things might (after 19 years) still sometime change. I wasn't aware of the history of the man page, and took it as gospel that we should avoid making assumptions about current's CWD surrounding a pivot_root(2). Given the history (and that it appears the behaviour was never intended to be changed after being merged), we should definitely drop that text to avoid the confusion which has already caused us container folks to implement this in a more-convoluted-than-necessary fashion. In case you haven't noticed already, you might want to also send a patch to util-linux to also update pivot_root(8) which makes the same mistake in its text: > Note that, depending on the implementation of pivot_root, root and cwd > of the caller may or may not change. Then again, it's also possible this text is independently just as vague for other reasons. > > And this one is required because we are in @oldroot at this point, due > > to the first fchdir(2). If we don't have the first one, then switching > > from "." to "/" in the mount/umount2 calls should fix the issue. >=20 > See my notes above for why I therefore think that the second fchdir() > is also not needed (and therefore why switching from "." to "/" in the > mount()/umount2() calls is unnecessary. My gut feeling reading this was that operating on "." will result in you doing the later mount operations on @newroot (since "." is @newroot) not on the stacked mount which isn't your CWD. *But* my gut feeling is obviously wrong (since you've tested it), and I will again admit I don't understand quite how CWD references interact with mount operations -- especially in the context of stacked mounts. > Do you agree with my analysis? Minus the mount bits that I'm not too sure about (I defer to Christian/Serge/et al on that point), it seems reasonable to me. My only real argument for keeping it the way it is, is that it's much easier (for me, at least) to understand the semantics with explicit fchdir(2)s. But that's not really a good reason to continue doing it the way we do it now -- if it's documented in a man page that'd be more than sufficient to avoid confusion when reading snippets that do it without the fchdir(2)s. --=20 Aleksa Sarai Senior Software Engineer (Containers) SUSE Linux GmbH --np2ugc7obo2qaz74 Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEXzbGxhtUYBJKdfWmnhiqJn3bjbQFAl1IMSUACgkQnhiqJn3b jbR9TxAAr3w1iw2zyRsIJt+qye7WLDUYc70Qq8XBW/+Xmbq5Vcf2gKXB3WAJb5PM 3D0Btda2Dw9tT69QJdEvVyt27FgwUzQAWo0m1Ki7NXI0AIocXV+a32NrBbtB+XKI f2xY5J9oCCoJ1GErEO4SkchHcxWFyygGn3jQVavxfFDA2I6p7DOXtJ23X83ZTXqJ i/Z4pu3TAPeU4SogXFJhBsdwsL8JFcLjDhnyK6SZws5oatneKvitC213J1CwGBno G4PaYQ4JGmeg+ePSvogQigYBd7ZZ89yjJjum+HAqrwVu0AVkf5VGzWu/HNXRD22v auU4MocADXFwbqjr5kDI/q5hpjVA97u4mbu06Wv9rqGIl8l1swORNJC4lZWN6uwT uzASEopUcZKrCrsj7qEQmQnfMQXZz/1SNdl55TxRA2b/BOWozK0Q7utoLBx+UGBe 0wwiCr++euhANpRrGOmgZRDvwrhGL+56Qzo8GhkQpYffmOglpiVkdSzZ5wGhLusZ IyXZH2t/vuEAoq9xEEtBvZLWW4dKnRKzYWWvV0hLEqajHRyxQCskc1LjBapE8rMm E0UR94LE0TX/KnxDPX/8XDhNT7RGVlEhTF5qmBhT1sOfXrfZ3T0zW/awlbGDHfHu 1wtivpqvwbtdF8JpWiS9uU0Pnf+743nF10yDnvJnkJgBRBzaCfE= =7p3L -----END PGP SIGNATURE----- --np2ugc7obo2qaz74--