2022-01-18 02:28:14

by CGEL

Subject: [PATCH v7 0/2] init/initramfs.c: make initramfs support pivot_root

From: Zhang Yunkai <[email protected]>

As Luis Chamberlain suggested, I split the patch:
[init/initramfs.c: make initramfs support pivot_root]
(https://lore.kernel.org/linux-fsdevel/[email protected]/)
into three.

The goal of this patch series is to make pivot_root() work on initramfs.

In the first patch, I introduce the function ramdisk_exec_exist(), which
checks whether 'ramdisk_execute_command' exists, using the LOOKUP_DOWN
lookup mode.

In the second patch, I create a second mount, which is called
'user root', and make it the root. The root then has a parent mount,
so it can be unmounted or used with pivot_root().

In the third patch, I fix rootfs_fs_type to ramfs. It is no longer used
directly, so it makes no sense to switch it between ramfs and tmpfs;
fixing it to ramfs simplifies the code.

Changes since V6:

Some bug fixes by Zhang Yunkai.


Changes since V5:

Remove the third patch and combine it with the second one.


Changes since V4:

Do some more code cleanup for the second patch, including:
- move 'ramdisk_exec_exist()' to 'init.h'
- remove unnecessary struct 'fs_rootfs_root'
- introduce 'revert_mount_rootfs()'
- [...]


Changes since V3:

Do a code cleanup for the second patch, as Christian Brauner suggested:
- remove the concept 'user root', which does not seem suitable.
- introduce inline function 'check_tmpfs_enabled()' to avoid duplicated
code.
- rename function 'mount_user_root' to 'prepare_mount_rootfs'
- rename function 'end_mount_user_root' to 'finish_mount_rootfs'
- join 'init_user_rootfs()' with 'prepare_mount_rootfs()'

Changes since V2:

In the first patch, I use vfs_path_lookup() in init_eaccess() to make the
path lookup follow the mount on '/'. After this, the problem reported by
Masami Hiramatsu is solved. Thanks for your report :/


Changes since V1:

In the first patch, I add the flag LOOKUP_DOWN to init_eaccess(), so
that it can check the filesystem mounted on '/'.

In the second patch, I control 'user root' with kconfig option
'CONFIG_INITRAMFS_USER_ROOT', and add some comments, as Luis Chamberlain
suggested.

In the third patch, I put 'rootfs_fs_type' under the control of
'CONFIG_INITRAMFS_USER_ROOT'.


Zhang Yunkai (2):
init/main.c: introduce function ramdisk_exec_exist()
init/do_mounts.c: create second mount for initramfs

fs/init.c | 11 +++++++++--
include/linux/init.h | 1 +
init/do_mounts.c | 45 ++++++++++++++++++++++++++++++++++++++++++++
init/do_mounts.h | 17 ++++++++++++++++-
init/initramfs.c | 12 +++++++++++-
init/main.c | 7 ++++++-
usr/Kconfig | 10 ++++++++++
7 files changed, 98 insertions(+), 5 deletions(-)

--
2.25.1


2022-01-18 02:28:16

by CGEL

Subject: [PATCH v7 1/2] init/main.c: introduce function ramdisk_exec_exist()

From: Zhang Yunkai <[email protected]>

Introduce the function ramdisk_exec_exist(), which checks whether
'ramdisk_execute_command' exists.

To make path lookup follow the mount on '/', use vfs_path_lookup() in
init_eaccess(), and use the filesystem that is mounted on '/' as the
root during path lookup.

Reported-by: Menglong Dong <[email protected]>
Signed-off-by: Zhang Yunkai <[email protected]>
---
fs/init.c | 11 +++++++++--
include/linux/init.h | 1 +
init/main.c | 7 ++++++-
3 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/fs/init.c b/fs/init.c
index 5c36adaa9b44..166356a1f15f 100644
--- a/fs/init.c
+++ b/fs/init.c
@@ -112,14 +112,21 @@ int __init init_chmod(const char *filename, umode_t mode)

int __init init_eaccess(const char *filename)
{
- struct path path;
+ struct path path, root;
int error;

- error = kern_path(filename, LOOKUP_FOLLOW, &path);
+ error = kern_path("/", LOOKUP_DOWN, &root);
if (error)
return error;
+ error = vfs_path_lookup(root.dentry, root.mnt, filename,
+ LOOKUP_FOLLOW, &path);
+ if (error)
+ goto on_err;
error = path_permission(&path, MAY_ACCESS);
+
path_put(&path);
+on_err:
+ path_put(&root);
return error;
}

diff --git a/include/linux/init.h b/include/linux/init.h
index d82b4b2e1d25..889d538b6dfa 100644
--- a/include/linux/init.h
+++ b/include/linux/init.h
@@ -149,6 +149,7 @@ extern unsigned int reset_devices;
void setup_arch(char **);
void prepare_namespace(void);
void __init init_rootfs(void);
+bool ramdisk_exec_exist(void);
extern struct file_system_type rootfs_fs_type;

#if defined(CONFIG_STRICT_KERNEL_RWX) || defined(CONFIG_STRICT_MODULE_RWX)
diff --git a/init/main.c b/init/main.c
index bb984ed79de0..ed1c63f4ed87 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1577,6 +1577,11 @@ void __init console_on_rootfs(void)
fput(file);
}

+bool __init ramdisk_exec_exist(void)
+{
+ return init_eaccess(ramdisk_execute_command) == 0;
+}
+
static noinline void __init kernel_init_freeable(void)
{
/* Now the scheduler is fully set up and can do blocking allocations */
@@ -1618,7 +1623,7 @@ static noinline void __init kernel_init_freeable(void)
* check if there is an early userspace init. If yes, let it do all
* the work
*/
- if (init_eaccess(ramdisk_execute_command) != 0) {
+ if (!ramdisk_exec_exist()) {
ramdisk_execute_command = NULL;
prepare_namespace();
}
--
2.25.1

2022-01-18 02:28:21

by CGEL

Subject: [PATCH v7 2/2] init/do_mounts.c: create second mount for initramfs

From: Zhang Yunkai <[email protected]>

When a container platform such as Docker initializes, it wants to use
pivot_root() so that currently mounted devices do not propagate to
containers. An example of the value in this: a USB device connected
prior to the creation of a container on the host gets disconnected
after the container is created; if the USB device was mounted in the
containers, but has already been removed and unmounted on the host,
the mount point will not go away until all containers unmount the
USB device.

Another reason for container platforms such as Docker to use pivot_root
is that upon initialization the network namespace is mounted under
/var/run/docker/netns/ on the host by dockerd. Without pivot_root,
Docker must either wait to create the network namespace prior to
the creation of containers or simply deal with leaking this to each
container.

pivot_root is supported if the rootfs is an initrd or block device, but
it's not supported if the rootfs uses an initramfs (tmpfs). This means
container platforms today must resort to using block devices if
they want to pivot_root from the rootfs. A workaround using chroot()
is not a clean, viable option, given that every container will have a
duplicate of every mount point on the host.

In order to support using container platforms such as Docker on
all the supported rootfs types we must extend Linux to support
pivot_root on initramfs as well. This patch does the work to do
just that.

pivot_root unmounts the rootfs mount from its parent mount and mounts
the new root in its place. On initramfs this doesn't work, because the
root filesystem has no parent mount, which is why pivot_root does not
support initramfs.
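
To illustrate the missing parent mount, here is a minimal userspace
sketch, not part of this patch. It relies on the documented
/proc/[pid]/mountinfo format, where the second field is the parent ID
and equals the mount's own ID for the top of the mount tree. The helper
name mount_is_tree_top() and the sample lines in the comment are
hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from the patch): given one line in the
 * /proc/[pid]/mountinfo format, report whether that mount is the top
 * of the mount tree, i.e. its parent ID equals its own mount ID.
 *
 * Returns 1 if it is the top, 0 if it has a distinct parent, and -1 if
 * the first two fields cannot be parsed.
 *
 * Illustrative input lines (made up for this sketch):
 *   "1 1 0:2 / / rw - rootfs rootfs rw"              -> top of the tree
 *   "25 1 8:1 / / rw,relatime - ext4 /dev/sda1 rw"   -> has a parent
 */
int mount_is_tree_top(const char *line)
{
	int mount_id, parent_id;

	assert(line != NULL);

	/* The first two fields are the mount ID and the parent ID. */
	if (sscanf(line, "%d %d", &mount_id, &parent_id) != 2)
		return -1;

	return mount_id == parent_id;
}
```

A mount for which this returns 1 is the kind of root that pivot_root
cannot detach, which is the situation the second mount is meant to
avoid.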

In order to make pivot_root supported on initramfs, we create a second
mount with type of rootfs before unpacking cpio, and change root to
this mount after unpacking.

While mounting the second rootfs, 'rootflags' is passed, which means
that options for the rootfs mount can now be set on the boot command
line. For example, the size of tmpfs can be set with
'rootflags=size=1024M'.
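
As a rough illustration of what a value such as 'size=1024M' means,
here is a userspace sketch. It is an assumption-laden approximation,
not the kernel's parser: it only handles a number with an optional K,
M or G suffix (powers of 1024) and ignores other forms such as
percentages. The names parse_size() and rootflags_size() are
hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: convert a string such as "1024M" to bytes.
 * Accepts an optional K, M or G suffix in either case.
 * Returns 0 if the string does not start with a number.
 */
unsigned long long parse_size(const char *s)
{
	char *end;
	unsigned long long val;

	assert(s != NULL);

	val = strtoull(s, &end, 0);
	if (end == s)
		return 0;

	switch (toupper((unsigned char)*end)) {
	case 'G':
		val <<= 10;
		/* fall through */
	case 'M':
		val <<= 10;
		/* fall through */
	case 'K':
		val <<= 10;
		break;
	default:
		break;
	}
	return val;
}

/*
 * Hypothetical helper: find "size=" in an option string and parse the
 * value that follows. This is a naive substring search, so it would
 * also match an option whose name merely ends in "size".
 * Returns 0 if no "size=" is present.
 */
unsigned long long rootflags_size(const char *flags)
{
	const char *p;

	assert(flags != NULL);

	p = strstr(flags, "size=");
	return p ? parse_size(p + strlen("size=")) : 0;
}
```

With these assumptions, rootflags_size("size=1024M") evaluates to
1073741824 bytes, i.e. 1 GiB.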

This patch is from:
[init/initramfs.c: make initramfs support pivot_root]
https://lore.kernel.org/all/[email protected]/
I fixed some console bugs.

Reported-by: [email protected]
Signed-off-by: Zhang Yunkai <[email protected]>
---
init/do_mounts.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
init/do_mounts.h | 17 ++++++++++++++++-
init/initramfs.c | 12 +++++++++++-
usr/Kconfig | 10 ++++++++++
4 files changed, 82 insertions(+), 2 deletions(-)

diff --git a/init/do_mounts.c b/init/do_mounts.c
index 762b534978d9..8a72c1fe17c8 100644
--- a/init/do_mounts.c
+++ b/init/do_mounts.c
@@ -650,6 +650,50 @@ void __init prepare_namespace(void)
}

static bool is_tmpfs;
+#ifdef CONFIG_INITRAMFS_MOUNT
+
+/*
+ * Give systems running from the initramfs and making use of pivot_root a
+ * proper mount so it can be umounted during pivot_root.
+ */
+int __init prepare_mount_rootfs(void)
+{
+ char *rootfs = "ramfs";
+
+ if (is_tmpfs)
+ rootfs = "tmpfs";
+
+ init_mkdir("/root", 0700);
+ return do_mount_root(rootfs, rootfs,
+ root_mountflags & ~MS_RDONLY,
+ root_mount_data);
+}
+
+/*
+ * Revert to previous mount by chdir to '/' and unmounting the second
+ * mount.
+ */
+void __init revert_mount_rootfs(void)
+{
+ init_chdir("/");
+ init_umount(".", MNT_DETACH);
+}
+
+/*
+ * Change root to the new rootfs that mounted in prepare_mount_rootfs()
+ * if cpio is unpacked successfully and 'ramdisk_execute_command' exist.
+ */
+void __init finish_mount_rootfs(void)
+{
+ init_mount(".", "/", NULL, MS_MOVE, NULL);
+ if (likely(ramdisk_exec_exist()))
+ init_chroot(".");
+ else
+ revert_mount_rootfs();
+}
+
+#define rootfs_init_fs_context ramfs_init_fs_context
+#else
static int rootfs_init_fs_context(struct fs_context *fc)
{
if (IS_ENABLED(CONFIG_TMPFS) && is_tmpfs)
@@ -657,6 +701,7 @@ static int rootfs_init_fs_context(struct fs_context *fc)

return ramfs_init_fs_context(fc);
}
+#endif

struct file_system_type rootfs_fs_type = {
.name = "rootfs",
diff --git a/init/do_mounts.h b/init/do_mounts.h
index 7a29ac3e427b..ae4ab306caa9 100644
--- a/init/do_mounts.h
+++ b/init/do_mounts.h
@@ -10,9 +10,24 @@
#include <linux/root_dev.h>
#include <linux/init_syscalls.h>

+extern int root_mountflags;
+
void mount_block_root(char *name, int flags);
void mount_root(void);
-extern int root_mountflags;
+
+#ifdef CONFIG_INITRAMFS_MOUNT
+
+int prepare_mount_rootfs(void);
+void finish_mount_rootfs(void);
+void revert_mount_rootfs(void);
+
+#else
+
+static inline int prepare_mount_rootfs(void) { return 0; }
+static inline void finish_mount_rootfs(void) { }
+static inline void revert_mount_rootfs(void) { }
+
+#endif

static inline __init int create_dev(char *name, dev_t dev)
{
diff --git a/init/initramfs.c b/init/initramfs.c
index 2f3d96dc3db6..881fb8ea4352 100644
--- a/init/initramfs.c
+++ b/init/initramfs.c
@@ -17,6 +17,8 @@
#include <linux/init_syscalls.h>
#include <linux/umh.h>

+#include "do_mounts.h"
+
static ssize_t __init xwrite(struct file *file, const char *p, size_t count,
loff_t *pos)
{
@@ -671,7 +673,12 @@ static void __init populate_initrd_image(char *err)
static void __init do_populate_rootfs(void *unused, async_cookie_t cookie)
{
/* Load the built in initramfs */
- char *err = unpack_to_rootfs(__initramfs_start, __initramfs_size);
+ char *err;
+
+ if (prepare_mount_rootfs())
+ panic("Failed to mount rootfs\n");
+
+ err = unpack_to_rootfs(__initramfs_start, __initramfs_size);
if (err)
panic_show_mem("%s", err); /* Failed to decompress INTERNAL initramfs */

@@ -685,11 +692,14 @@ static void __init do_populate_rootfs(void *unused, async_cookie_t cookie)

err = unpack_to_rootfs((char *)initrd_start, initrd_end - initrd_start);
if (err) {
+ revert_mount_rootfs();
#ifdef CONFIG_BLK_DEV_RAM
populate_initrd_image(err);
#else
printk(KERN_EMERG "Initramfs unpacking failed: %s\n", err);
#endif
+ } else {
+ finish_mount_rootfs();
}

done:
diff --git a/usr/Kconfig b/usr/Kconfig
index 8bbcf699fe3b..af2aeeef8635 100644
--- a/usr/Kconfig
+++ b/usr/Kconfig
@@ -52,6 +52,16 @@ config INITRAMFS_ROOT_GID

If you are not sure, leave it set to "0".

+config INITRAMFS_MOUNT
+ bool "Create second mount to make pivot_root() supported"
+ default y
+ help
+ Before unpacking cpio, create a second mount and make it become
+ the root filesystem. Therefore, initramfs will be supported by
+ pivot_root().
+
+ If container platforms is used with initramfs, say Y.
+
config RD_GZIP
bool "Support initial ramdisk/ramfs compressed using gzip"
default y
--
2.25.1

2022-01-19 20:46:15

by Mike Rapoport

Subject: Re: [PATCH v7 2/2] init/do_mounts.c: create second mount for initramfs

On Mon, Jan 17, 2022 at 01:43:52PM +0000, [email protected] wrote:
> From: Zhang Yunkai <[email protected]>
>
> If using container platforms such as Docker, upon initialization it
> wants to use pivot_root() so that currently mounted devices do not
> propagate to containers. An example of value in this is that
> a USB device connected prior to the creation of a containers on the
> host gets disconnected after a container is created; if the
> USB device was mounted on containers, but already removed and
> umounted on the host, the mount point will not go away until all
> containers unmount the USB device.
>
> Another reason for container platforms such as Docker to use pivot_root
> is that upon initialization the net-namspace is mounted under
> /var/run/docker/netns/ on the host by dockerd. Without pivot_root
> Docker must either wait to create the network namespace prior to
> the creation of containers or simply deal with leaking this to each
> container.
>
> pivot_root is supported if the rootfs is a initrd or block device, but
> it's not supported if the rootfs uses an initramfs (tmpfs). This means
> container platforms today must resort to using block devices if
> they want to pivot_root from the rootfs. A workaround to use chroot()
> is not a clean viable option given every container will have a
> duplicate of every mount point on the host.

Sorry if this was already answered.

My understanding is that you have initramfs with docker installed on it and
with one or more container images packed there. And the desire is to use
this initramfs to run docker containers and for that you need to enable
pivot_root for initramfs.

Have you tried packing docker and the containers to a block image that can
be loop-mounted from the initramfs? Then you can chroot to that loop
mounted filesystem and there pivot_root will be available for docker.

> In order to support using container platforms such as Docker on
> all the supported rootfs types we must extend Linux to support
> pivot_root on initramfs as well. This patch does the work to do
> just that.

--
Sincerely yours,
Mike.