Received: by 2002:a25:6193:0:0:0:0:0 with SMTP id v141csp2442034ybb; Mon, 30 Mar 2020 06:16:46 -0700 (PDT) X-Google-Smtp-Source: ADFU+vuo0J5b7xBfECKegd+ejVuvhIhowlALTxfGoV71PbiQQ6XC7SPt53LWe7bngOBHZyv5qOQv X-Received: by 2002:aca:5712:: with SMTP id l18mr7516906oib.178.1585574205790; Mon, 30 Mar 2020 06:16:45 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1585574205; cv=none; d=google.com; s=arc-20160816; b=ttB/7gHShKyLSuJ+kG2s8eH4riObFdrfcDkm47L7QmPKbgeRi34J/YLocMyziNJS/M p82XVhlFdiW+Y3EMzM3etMtPQDmvI0EIHpTIHzRw17rdvyLU3wMsyij64NHkCB7ldp6c ElPOJv9J5BwqYpiYZ4aoK45+SN74fDclX3IlKei1zcLgfgcfxgvd4hLdhl6/pLluJD7d l+mWUH6akxxOl+M0UWqn56zTR4jFkgaMuVDIayYKLcMxp3M3hY98MOmqvggeJReVYVac GVuFfmNBiNR1vn1O3XdW6NokfZDYN1A08zRjfXP9mN63QkOMfsxjJh3Vh9K/cDHbYQl4 jR9A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=LfIlxScnHZVuwjqrv90OqAsj+bqUS/oDr+dehBCUUOE=; b=TDjVWR5gJhWMDWy/xgdMY0nTNPrd0nnPys8xz6yOw4LrnrDcmllAm+Hunsf59B9/ha 3pPVcTuzQGKQu7BbWKFjWtnVRzp8OYIkP1lVkiylEoVlE2tLR7tZ3Us4zQa6rK8bN1qo nQp2Pmk+YtyfEPDt51di/iIXgA67tgwY+MK7UJ0BPDNAWib8TI+U0WSq0T4N44eeen0o 1Es3NkC+0fsvv5AGgykePIzRZXS0OdCXp4yhFy4trGyQrOdLLvReeepAxUMbto4mCZQh DuAt3zgo9Lk4Ih6pizzGYpmUVkLafBR1XtPiMX70wA/HDUv4Q7ofdARLozTRKlM8nkit oxfA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@cloudflare.com header.s=google header.b=B4aNZ73V; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=cloudflare.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id r5si6152804otp.241.2020.03.30.06.16.29; Mon, 30 Mar 2020 06:16:45 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@cloudflare.com header.s=google header.b=B4aNZ73V; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=cloudflare.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730294AbgC3NPg (ORCPT + 99 others); Mon, 30 Mar 2020 09:15:36 -0400 Received: from mail-wm1-f68.google.com ([209.85.128.68]:56088 "EHLO mail-wm1-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730288AbgC3NPg (ORCPT ); Mon, 30 Mar 2020 09:15:36 -0400 Received: by mail-wm1-f68.google.com with SMTP id r16so2055020wmg.5 for ; Mon, 30 Mar 2020 06:15:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cloudflare.com; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=LfIlxScnHZVuwjqrv90OqAsj+bqUS/oDr+dehBCUUOE=; b=B4aNZ73V9a79klahCoUBEKpxTe/HnntTbwiZqmpuAo4qgAHAUzTCFZ4D6yHiPwSNKV jPBfQm07NAzxWJbLkqdCN3/wxHjspIn3NqLgCPaMTbAromtv4JcAHh8gNtFR3yUu5svT /oX/k2OtIo4ilqDa5C3hr39bVL2yMEMkD/lJc= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=LfIlxScnHZVuwjqrv90OqAsj+bqUS/oDr+dehBCUUOE=; b=MwBNFsMKGsakzdqFCTmudtLoMk9qfQo6sLhDU6tag12Ee1GVAI7cVPHedfX1QumZMG MQ9n3utOApbt0Wzzy2BofM2jX2HBR/s0q3QyBBjQAH1/aPdx3TzaObb0RBKHRzhelmA+ n/rpE5LTr362iDkJLFbBe2ll8o45gUfQHkkwYB3UAIEVNVvhhnKqwQvo7VJ7XmPTak8W Ge4dkLxGa51MWXHAOMkneQoimAz8fWU4wRdM7jvRveI9QneSbDbc+bi/7/qXdFJgOTJv y9oWHY21e83keECrzbrXY3ImPyBIVgROcFwMPNnHpGVcwsuzKP76gJ4SuDuY5psgdPYB DNYg== X-Gm-Message-State: ANhLgQ2gaGk3I+5Qpxl4pL0dxQYRwBJp5Qzli0HjO+JhHSoDZx0j+Ur+ jrbff6WaBwENjJT7KxpEikqFHw== X-Received: by 2002:a7b:c842:: with SMTP id c2mr13931018wml.154.1585574133896; Mon, 30 Mar 2020 06:15:33 -0700 (PDT) Received: from localhost.localdomain ([88.144.89.159]) by smtp.gmail.com with ESMTPSA id f14sm21135398wmb.3.2020.03.30.06.15.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 30 Mar 2020 06:15:33 -0700 (PDT) From: Ignat Korchagin To: viro@zeniv.linux.org.uk, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Ignat Korchagin , kernel-team@cloudflare.com Subject: [PATCH 1/1] mnt: add support for non-rootfs initramfs Date: Mon, 30 Mar 2020 14:14:39 +0100 Message-Id: <20200330131439.2405-2-ignat@cloudflare.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20200330131439.2405-1-ignat@cloudflare.com> References: <20200330131439.2405-1-ignat@cloudflare.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org The main need for this is to support container runtimes on stateless Linux system (pivot_root system call from initramfs). Normally, the task of initramfs is to mount and switch to a "real" root filesystem. However, on stateless systems (booting over the network) it is just convenient to have your "real" filesystem as initramfs from the start. This, however, breaks different container runtimes, because they usually use pivot_root system call after creating their mount namespace. But pivot_root does not work from initramfs, because initramfs runs form rootfs, which is the root of the mount tree and can't be unmounted. One workaround is to do: mount --bind / / However, that defeats one of the purposes of using pivot_root in the cloned containers: get rid of host root filesystem, should the code somehow escapes the chroot. There is a way to solve this problem from userspace, but it is much more cumbersome: * either have to create a multilayered archive for initramfs, where the outer layer creates a tmpfs filesystem and unpacks the inner layer, switches root and does not forget to properly cleanup the old rootfs * or we need to use keepinitrd kernel cmdline option, unpack initramfs to rootfs, run a script to create our target tmpfs root, unpack the same initramfs there, switch root to it and again properly cleanup the old root, thus unpacking the same archive twice and also wasting memory, because the kernel stores compressed initramfs image indefinitely. With this change we can ask the kernel (by specifying nonroot_initramfs kernel cmdline option) to create a "leaf" tmpfs mount for us and switch root to it before the initramfs handling code, so initramfs gets unpacked directly into the "leaf" tmpfs with rootfs being empty and no need to clean up anything. This also bring the behaviour in line with the older style initrd, where the initrd is located on some leaf filesystem in the mount tree and rootfs remaining empty. Signed-off-by: Ignat Korchagin --- fs/namespace.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 47 insertions(+) diff --git a/fs/namespace.c b/fs/namespace.c index 85b5f7bea82e..a1ec862e8146 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -3701,6 +3701,49 @@ static void __init init_mount_tree(void) set_fs_root(current->fs, &root); } +#if IS_ENABLED(CONFIG_TMPFS) +static int __initdata nonroot_initramfs; + +static int __init nonroot_initramfs_param(char *str) +{ + if (*str) + return 0; + nonroot_initramfs = 1; + return 1; +} +__setup("nonroot_initramfs", nonroot_initramfs_param); + +static void __init init_nonroot_initramfs(void) +{ + int err; + + if (!nonroot_initramfs) + return; + + err = ksys_mkdir("/root", 0700); + if (err < 0) + goto out; + + err = do_mount("tmpfs", "/root", "tmpfs", 0, NULL); + if (err) + goto out; + + err = ksys_chdir("/root"); + if (err) + goto out; + + err = do_mount(".", "/", NULL, MS_MOVE, NULL); + if (err) + goto out; + + err = ksys_chroot("."); + if (!err) + return; +out: + printk(KERN_WARNING "Failed to create a non-root filesystem for initramfs\n"); +} +#endif /* IS_ENABLED(CONFIG_TMPFS) */ + void __init mnt_init(void) { int err; @@ -3734,6 +3777,10 @@ void __init mnt_init(void) shmem_init(); init_rootfs(); init_mount_tree(); + +#if IS_ENABLED(CONFIG_TMPFS) + init_nonroot_initramfs(); +#endif } void put_mnt_ns(struct mnt_namespace *ns) -- 2.20.1