Received: by 2002:a25:c205:0:0:0:0:0 with SMTP id s5csp6178196ybf; Thu, 5 Mar 2020 14:54:46 -0800 (PST) X-Google-Smtp-Source: ADFU+vvq1cGKOtsYfH3GjFe6Zqi/SpWaU3Fd36cxZIByhe3uQwov+zRPNeOd0bLZitciAYc9gyu6 X-Received: by 2002:aca:d68a:: with SMTP id n132mr607014oig.40.1583448886337; Thu, 05 Mar 2020 14:54:46 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1583448886; cv=none; d=google.com; s=arc-20160816; b=ZvYz/BGgtufqMBIixTv3Qno/P/j88xCbCKETUadg+0u+eSB28ybq7wR2lpkL1cGIDz CKhNdvU8BDocmgGq/z/P/2cujWi1kBF8bcb5GMddAU53zIPSKPLxfLixtG0huyaJKX55 vZXwJYt1v9i7WQ+Drx2/kjRIkFKHk4Bo5rTeY2OTCbhRH4671OSchtPP+TwdzJuE0ZYY WCRZz1ud4uzRxNIKHQL40F5cyLl4MYES7b1Bas7JKZjTzVSlGlaFPC0j3KwTboiP3YQT lEFM+93UQmVr8gQVxzLNUhBgnHPtPxK4XAhtVUkdGbHIE6d5Gl1OqTACr5eqcT0qujUX DtpA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=KiDTbwvMn8WO9wINMLyHXgdgqSVs97pcrSC3Kr6hZio=; b=gxM0XHUjVj1lk5lN6d4+B2LD2Ir9gpfpnhUuOJTB2oWHkuMCiISh2TpG/bc4K8pH2H nvHi+wONuff0vtdH4dLF3M4h8KyeaX8xe9t8YMmBDCdXp0u2uugCPx/V/5jwN4Sk+amM vo+k3usMCQpb0sjP7RBubuykhrZmU4x8R2OXxMR6Kx3rvZerORuB55u147WSq+8nEtRI itWCdx2nYS3wP+wwJRI2lJ6ulXowTqcZDY+gDt4yyuEJRbwwbBRI+mE6WsM1C1pAJtok xEdCkMUcikOMKUFKkVWM0GVpH5pKvrtH2I2u+H49F/+5ePBW2bOjirAG/QqSxtHBXEdZ dSCA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@cloudflare.com header.s=google header.b="b1JAp8/I"; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=cloudflare.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id h203si250291oif.3.2020.03.05.14.54.32; Thu, 05 Mar 2020 14:54:46 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@cloudflare.com header.s=google header.b="b1JAp8/I"; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=cloudflare.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726191AbgCEWyJ (ORCPT + 99 others); Thu, 5 Mar 2020 17:54:09 -0500 Received: from mail-wr1-f68.google.com ([209.85.221.68]:40623 "EHLO mail-wr1-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726128AbgCEWyI (ORCPT ); Thu, 5 Mar 2020 17:54:08 -0500 Received: by mail-wr1-f68.google.com with SMTP id r17so129933wrj.7 for ; Thu, 05 Mar 2020 14:54:06 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cloudflare.com; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=KiDTbwvMn8WO9wINMLyHXgdgqSVs97pcrSC3Kr6hZio=; b=b1JAp8/IJOWv1m+6Ozb9cLVBTaZ5wHLqQQgeQusaZVS6W3rnbnp/lPkH0qGznO4pyL p4sMiQhR8IRLXYbFM3loBE7piAUHqV7dn0SuAPILrm0A9e/no2Q00SuI3L47iXeWImk5 yWWwW7qGg05rrmK182hAktdzxTMCU7+pwq5zY= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=KiDTbwvMn8WO9wINMLyHXgdgqSVs97pcrSC3Kr6hZio=; b=oEjUGvOIlMrn4/5hPa9P0zDBHAqLdauhfNmjaWoRnYGlyZsIyvCsjGsh64UhsrIobN 0b3QYGzOsVCurZIGjYnkYgzFMtlYES7wy/CV6R1dL2x/qOf0J1maGWoukWfFGTe4uvBZ ZJA3qFBD53g+pLzRGYgAgVLD6J+heQm77oIdFeDVF2GugQOxjk3UdLDl6nVGMdgja4bQ sIgBSZhDSdxi+Ny/X/5rmHctaXqRB4fLWaHKkroU1FRpSMCGh5mPto04uoVkyAu4Rdvg UJSeDMkrGK3R0j15L9ntMFkRbWTGM+HuKxgUrE40ayG+A5RGo96jMTBxKY1frS1+wi3B evxA== X-Gm-Message-State: ANhLgQ0FfHZfBcChFnzeHb+eLmkqZwS94fiHHaCCvp1a+Xc4MCh2tOjp 5hU309moRSZic28Yu8YizdswezNPn6zwlkKHtXrMoQ== X-Received: by 2002:adf:fc12:: with SMTP id i18mr202355wrr.354.1583448845752; Thu, 05 Mar 2020 14:54:05 -0800 (PST) MIME-Version: 1.0 References: <20200305193511.28621-1-ignat@cloudflare.com> <1583442550.3927.47.camel@HansenPartnership.com> <20200305222117.GA1291132@rani.riverdale.lan> In-Reply-To: <20200305222117.GA1291132@rani.riverdale.lan> From: Ignat Korchagin Date: Thu, 5 Mar 2020 22:53:54 +0000 Message-ID: Subject: Re: [PATCH] mnt: add support for non-rootfs initramfs To: Arvind Sankar , James Bottomley Cc: Al Viro , linux-fsdevel@vger.kernel.org, linux-kernel , kernel-team Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Mar 5, 2020 at 10:21 PM Arvind Sankar wrote: > > On Thu, Mar 05, 2020 at 01:09:10PM -0800, James Bottomley wrote: > > On Thu, 2020-03-05 at 19:35 +0000, Ignat Korchagin wrote: > > > The main need for this is to support container runtimes on stateless > > > Linux system (pivot_root system call from initramfs). > > > > > > Normally, the task of initramfs is to mount and switch to a "real" > > > root filesystem. However, on stateless systems (booting over the > > > network) it is just convenient to have your "real" filesystem as > > > initramfs from the start. > > > > > > This, however, breaks different container runtimes, because they > > > usually use pivot_root system call after creating their mount > > > namespace. But pivot_root does not work from initramfs, because > > > initramfs runs form rootfs, which is the root of the mount tree and > > > can't be unmounted. > > > > Can you say more about why this is a problem? We use pivot_root to > > pivot from the initramfs rootfs to the newly discovered and mounted > > real root ... the same mechanism should work for a container (mount > > namespace) running from initramfs ... why doesn't it? > > Not sure how it interacts with mount namespaces, but we don't use > pivot_root to go from rootfs to the real root. We use switch_root, which > moves the new root onto the old / using mount with MS_MOVE and then > chroot to it. > > https://www.kernel.org/doc/Documentation/filesystems/ramfs-rootfs-initramfs.txt > > > > > The sequence usually looks like: create and enter a mount namespace, > > build a tmpfs for the container in some $root directory then do > > > > > > cd $root > > mkdir old-root > > pivot_root . old-root > > mount -- > > make-rprivate /old-root > > umount -l /old-root > > rmdir /old-root > > > > Once that's done you're disconnected from the initramfs root. The > > sequence is really no accident because it's what the initramfs would > > have done to pivot to the new root anyway (that's where container > > people got it from). > > > > > > James > > Yes, to add to Arvind's point the above sequence will only work for "old style" initrd (block ramdisk with some filesystem image on top), but will not work for the "new style" initramfs (just a disguised tmpfs). The sequence will fail on "pivot_root" with EINVAL (see pivot_root(2)). In fact this patch conceptually tries to have the same behaviour as with "old style" initrd. As currently, if you use initrd: 1. The kernel will create an empty "dummy" initramfs 2. Create a ramdisk 3. Unpack the FS image into the ramdisk 4. Mount the the disk 5. Do switch_root/move etc So we have initial mount tree as: rootfs->some_initrd_fs (and pivot_root works here and you get empty rootfs by default) With this option we have similar in the end: rootfs->tmpfs and rootfs is empty, because the kernel never unpacked anything there.